Friday, September 17, 2010

The Indus argument continues

Last year much excitement and noise occurred, including on this blog [1, 2, 3], when a group of scientists (led by Rajesh Rao at the University of Washington, and including my colleague Ronojoy Adhikari) published a brief paper in Science supplying evidence, on statistical grounds, that the Indus symbols constituted a writing system. In their words, they "present evidence for the linguistic hypothesis by showing that the script’s conditional entropy is closer to those of natural languages than various types of nonlinguistic systems."

This rather modest claim outraged Steve Farmer, Richard Sproat and (presumably) Michael Witzel (FSW), who had previously "proved" that the Harappan civilization was not literate (the paper was subtitled "The myth of a literate Harappan civilization"). In a series of online screeds, they attacked the work of Rao et al: for reviews, see this previous post, and links and comments therein.

Now Richard Sproat has published his latest attack on Rao et al. in the journal Computational Linguistics. Rao et al. have a rejoinder, as does another set of researchers, and Sproat has a further response to both groups (but primarily to Rao et al.); all these rejoinders will appear in the December issue of Computational Linguistics.

To summarise quickly, the way I see it: Sproat claims (as he previously did on the internet) that Rao et al.'s use of "conditional entropy" is useless in distinguishing scripts from non-scripts, because one can construct non-scripts with the same conditional entropy, and because their extreme ("type 1" and "type 2") non-linguistic systems are artificial examples. Rao et al. respond that that is a mischaracterisation of what they did, observe that Sproat entirely fails to mention the second figure from the same paper or the more recent "block entropy" results, and repeat (in case it wasn't obvious) that they don't claim to prove anything, only offer evidence. They give inductive and Bayesian arguments for why the mass of evidence, including their own, should increase our belief that the Indus symbols were a script.

In connection with the Bayesian arguments, Rao et al. do me the honour of citing my blog post on the matter, thus giving this humble blog its first scholarly citation. My argument was as follows: Given prior degrees of belief, $P(S)$ for the script hypothesis and $P(NS)$ for the non-script hypothesis, and given "likelihoods" of data given each hypothesis, $P(D|S)$ and $P(D|NS)$, Bayes' theorem tells us how to calculate our posterior degrees of belief in each hypothesis given the data:
$P(S|D) = \frac{P(D|S)P(S)}{P(D|S)P(S) + P(D|NS)P(NS)}$
We can crudely estimate $P(D|NS)$ by looking at the "spread" of the language band in figure 1A of their Science paper and asking how likely it is that a generic non-language sequence would fall in that band: assuming that it can fall anywhere between the two extreme limits that they plot, we can eyeball it as 0.1 (the band occupies 10% of the total spread) [Update 17/09/2010: See the plot below, which is identical to the one in Science, except for the addition of Fortran (blue squares, to be ignored here).] Let us say a new language is very likely (say 0.9) to fall in the same band. Then $P(D|NS) = 0.1$, $P(D|S) = 0.9$. If we were initially neutrally placed between the hypotheses ($P(NS) = P(S) = 0.5$), then we get $P(S|D) = 0.9$: that is, after seeing these data we should be 90% convinced of the script hypothesis. Even if we started out rather strongly skeptical of the script hypothesis ($P(S) = 0.2$, $P(NS) = 0.8$), the Bayesian formula tells us that, after seeing the data, we would be almost 70% convinced ($P(S|D) = 0.69$).
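For anyone who wants to play with the numbers, here is a minimal Python sketch of the same two-hypothesis Bayes calculation. The function name `posterior_script` and its default likelihoods (0.9 and 0.1, the eyeballed values above) are my own illustrative choices, not anything from Rao et al.

```python
def posterior_script(prior_s, like_s=0.9, like_ns=0.1):
    """Posterior P(S|D) for two mutually exclusive hypotheses S and NS.

    prior_s : prior degree of belief P(S); P(NS) is taken as 1 - P(S)
    like_s  : likelihood P(D|S)  (eyeballed value, an assumption)
    like_ns : likelihood P(D|NS) (eyeballed value, an assumption)
    """
    prior_ns = 1.0 - prior_s
    evidence = like_s * prior_s + like_ns * prior_ns  # P(D)
    return like_s * prior_s / evidence


# Neutral prior, then a strongly skeptical prior.
for prior in (0.5, 0.2):
    print(f"P(S) = {prior:.1f} -> P(S|D) = {posterior_script(prior):.2f}")
```

Changing the two likelihoods is the quickest way to see how sensitive the conclusion is to the eyeballed 0.1 and 0.9.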

We can quibble with these numbers, but the general point is that this is how science works: we adjust our degrees of belief in hypotheses based on the data we have and the extent to which the hypotheses explain those data.

Sproat apparently disagrees with this "inductive" approach, and accuses Rao et al. of lack of clarity in their goals. On the first page, he clarifies that he was talking only of the Science paper and has not carefully analysed [correction 17/09/10] the more recent papers by Rao and colleagues; he says those works do not affect questions on the previous paper, writing,

'To give a stark example, if someone should eventually demonstrate rigorously that cottontop tamarins are capable of learning “regular” grammars, that would have no bearing on the questions currently surrounding Marc Hauser’s 2002 publication in Cognition.'

In this way Sproat succeeds in insinuating, without saying it, that the work of Rao et al. may have been fraudulent. (Link to Hauser case coverage)

A little later, on the claim that the arguments of FSW "had been accepted by many archaeologists and linguistics", he offers this belated evidence that such people do exist:

Perhaps they do not exist? But they do: Andrew Lawler, a science reporter who in 2004 interviewed a large number of people on both sides of the debate notes that “many others are convinced that Farmer, Witzel, and Sproat have found a way to move away from sterile discussions of decipherment, and they find few flaws in their arguments” (Lawler 2004, page 2029), and quotes the Sanskrit scholar George Thompson and University of Pennsylvania Professor Emeritus of Indian studies Frank Southworth.

Having thus convincingly cited a science reporter to prove that the academic community widely accepts FSW's thesis, he proceeds to the actual claims about the symbols; after a few pages of nitpicks not very different from the above, he addresses a point which he had previously raised in this comment: why does figure 1A in the Science paper not include Fortran? He suspects that Fortran's curve would have overlapped significantly with the languages, "compromising the visual aspect of the plot". I actually find that explanation credible(*), and I was not comfortable with the manner of presentation of the data in the Science paper: but I view this as a problem with the "system" rather than the authors. Enormous prestige is attached to publication in journals like Science. To allow more authors to publish, Science has a one-page "brevia" format (which Rao et al. used) that allows essential conclusions to be presented on that printed page, while the substance of the paper is in supplementary material online. Rao et al. can argue, correctly, that they hid nothing in their full paper (including the supplementary material); but obviously what was shown in the main "brevia" format was selected for maximum instantaneous visual impact. And they are not the only ones to do this. I'd argue that formats like "brevia" are designed to encourage this sort of thing, and the blame goes to journals like Science. It is annoying, but to compare it with the Hauser fraud is odious.

Sproat's response doesn't improve in the subsequent pages. He distinguishes between his preferred "deductive" way of interpreting data and the "inductive" approach preferred by Rao et al; he complains that they did not clarify this in their original paper (though I would have thought the language was clear enough, that they nowhere claimed to be "deducing" anything, only offering "evidence"); he nitpicks (as I would have expected) with the Bayesian arguments. Overall, for all his combativeness, he is notably vaguer in his assertions than previously. He ends on this petulant note:

I end by noting that Rao et al., in particular, might feel grateful that they were given an opportunity to respond in this forum. My colleagues and I were not so lucky: when we wrote a letter to Science outlining our objections to the original paper, the magazine refused to publish our letter, citing “space limitations”. Fortunately Computational Linguistics is still open for the exchange of critical discussion.

The openness of CL is to be applauded, but I can think of some additional explanations for why Computational Linguistics allowed the response while Science did not. One is that the Science paper by Rao et al. was not a vicious personal attack on another set of researchers, and as such, did not merit a "rejoinder" unless it could be shown that the paper was wrong. Another may have been the quality of Rao et al's response on this occasion (Sproat could, if he liked, offer us a basis for comparison by linking his rejected letter to Science) [update 17/09/10: here].

I don't expect this exchange in a scholarly journal to end the argument, but perhaps the participants can take a break now.

(*) UPDATE 17/09/2010: Rajesh Rao writes:

By the way, the reason that Fortran was included in Fig 1B rather than 1A is quite mundane: a reviewer asked us to compare DNA, proteins, and Fortran, and we included these in a separate plot in the revised version of the paper. Just to prove we didn't have any nefarious designs, I have attached a plot that Nisha Yadav created that includes Fortran in the Science Fig 1A plot. The result is what we expect.

The plot is below; the blue squares are the Fortran symbols.

Rajesh also remarks that the Bayesian posterior probability estimates -- that I derived from the bigram graph in the Science paper -- can probably be sharpened from the newer block entropy results. However, since Sproat makes it clear that he is only addressing the Science paper and is unwilling to let later work influence his perception, I think it's worth pointing out that the data in the Science paper are already rather convincing.

Tuesday, August 17, 2010

How to distinguish fake coin tosses...

Dilip posted an interesting problem the other day: if you were a professor teaching probability theory, and asked your students to toss a coin 100 times and write down the sequence of heads and tails that they obtained, and some of them cheated and simply made up a sequence of heads and tails, how could you tell?

It is interesting because, generically, very few sequences are truly random; and the human mind is certainly incapable of randomness. But what signs of non-randomness could you look for? The answer that Dilip, apparently, had in mind is that true random sequences will usually contain long "runs" of heads or tails (say, 6 or more heads, or 6 or more tails, in succession in 100 coin tosses). However, individuals making up random sequences will perceive such "runs" as non-random and correct for them. But this is not a very reliable answer by itself: the probability of a run of 6 or more (heads or tails) in 100 tosses is about 80%, so a truly random sequence will fail this test about a fifth of the time, while a smart student will probably throw in such runs. I argued that if one combined this with various other tests, one should be able to tell quite reliably. But at the moment I am unsure whether 100 tosses are enough for this. Certainly nobody could say for sure whether a sequence of 5 tosses was generated by a coin or a human. Meanwhile, I strongly doubt that any human could generate a sequence of 1000 random symbols (coin-tosses, numbers, whatever) that would fool the statistical tests. But can one reliably tell which of the following two is "random" and which isn't?

Sequence 1:

tthhhhhtthhhhtthhthhhtthhhhttththhhtthhhhhhtthhhhhhtthhthhhthhhhthhththtttththtthhtthhhhhttthhththtt

Sequence 2:
hthtthhtthtthhhthtthtttththhthhththhhhthhttttththhhthhhthtththhhthttththhthhhthththhhhthhthttththhth

Neither of these was generated by tossing a coin. One was made by me, by pressing "h" or "t" "randomly" on a keyboard (ie, I, a human being aware of the usual "pitfalls", was trying to generate a random sequence, fairly rapidly without thinking much about it). The other was made using the pseudorandom number generator in Python, which is based on the Mersenne twister. I would guess that the Mersenne twister is more random than I am: what I would be interested in knowing, from any experts reading this, is whether one of the above sequences can be demonstrated, statistically, to be so non-random that the chances are very high it was generated by me and not by the program. I am, moreover, interested in the method and not the answer (which you have a 50% chance of getting right randomly). If your test confidently singles out the Mersenne twister-generated sequence as the non-random one, it is safe to say that the problem is with your test and not with the Mersenne twister.
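As a starting point for anyone who wants to try, here is a rough Python sketch of the simplest statistics one might look at: the number of heads, the number of runs, and the longest run. The function names are mine, and this is only an illustration of the "runs" idea discussed above, not a test that settles the question. For a fair coin one expects about 50 heads and about 50.5 runs in 100 tosses.

```python
from itertools import groupby


def run_lengths(seq):
    """Lengths of the maximal blocks of identical symbols, in order."""
    return [len(list(group)) for _, group in groupby(seq)]


def summarise(seq):
    """A few crude statistics of an h/t string (must be non-empty)."""
    runs = run_lengths(seq)
    return {
        "tosses": len(seq),
        "heads": seq.count("h"),
        "runs": len(runs),
        "longest_run": max(runs),
    }


sequence_1 = ("tthhhhhtthhhhtthhthhhtthhhhttththhhtthhhhhhtthhhhhhtthhthhh"
              "thhhhthhththtttththtthhtthhhhhttthhththtt")
sequence_2 = ("hthtthhtthtthhhthtthtttththhthhththhhhthhttttththhhthhhtht"
              "ththhhthttththhthhhthththhhhthhthttththhth")

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, summarise(seq))
```

Whether any of these numbers is far enough from its expected value to be convincing, with only 100 tosses, is exactly the open question.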

The "bonus question" that came up in Dilip's blog is, what is the probability of observing a run of 6 or more heads or tails (let's call them 6-runs) in 100 coin tosses?

Kovendhan gave an approximate answer which seemed to work well in practice, but it turns out that he made two errors and a corrected calculation does poorly. The probability of a particular choice of 6 successive tosses (let's call it a 6-mer) being all heads or all tails is $(1/2)^5$ (it is $(1/2)^6$ for all heads, and the same for all tails). The probability that it is not all-heads or all-tails is then $1-(1/2)^5$. There are 95 ways to choose 6 successive tosses in 100. The probability that none of these 95 is all-heads or all-tails is $(1-(1/2)^5)^{95}$, or about 0.049. The probability of at least one stretch of 6 identical tosses (all heads or all tails) existing would then seem to be 0.951 -- pretty near certain. The approximation consists of neglecting the fact that adjacent 6-mers here are not independent: eg, if your chain of tosses is HHTHTT, not only does this fail to give a 6-run, but it will also fail to do so on any of the next 3 tosses at least.

Meanwhile, Kovendhan used $(1/2)^6$ instead of $(1/2)^5$ for the individual 6-run, and 94 for the number of 6-mers, which yields 0.772, but he reported 0.781 -- I'm not sure how he got that. My numerical experiments suggested the true number is a little above 0.8, which is close to Kovendhan's fortuitously incorrect calculation of his approximate method, but quite far from what his approximation should really give.
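The two versions of the approximation, and a numerical experiment of the kind mentioned above, can be sketched in a few lines of Python. This is not the code I originally used; the function names, the number of trials and the seed are all arbitrary choices for illustration.

```python
import random


def independent_6mer_estimate(p_single, n_windows):
    """P(at least one 6-run), pretending the windows are independent."""
    return 1.0 - (1.0 - p_single) ** n_windows


def has_run(tosses, k=6):
    """True if the sequence contains k or more identical tosses in a row (k >= 2)."""
    run = 1
    for prev, cur in zip(tosses, tosses[1:]):
        run = run + 1 if cur == prev else 1
        if run >= k:
            return True
    return False


def simulate(n_tosses=100, k=6, trials=20000, seed=0):
    """Monte Carlo estimate of P(at least one k-run in n_tosses fair tosses)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tosses = [rng.random() < 0.5 for _ in range(n_tosses)]
        hits += has_run(tosses, k)
    return hits / trials


print(independent_6mer_estimate(0.5 ** 5, 95))  # corrected approximation, about 0.951
print(independent_6mer_estimate(0.5 ** 6, 94))  # Kovendhan's inputs, about 0.772
print(simulate())                               # numerical experiment
```

With 20000 trials the statistical error on the simulated value is a few parts in a thousand, which is enough to tell 0.8 from 0.95 but not to settle the third decimal place.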

The moral is, be aware of approximations and, if possible, have an estimate of their effect. Kovendhan's approximation is in fact very similar to Linus Pauling's when he calculated the zero-temperature entropy of ice. In the crystal structure of ice, each oxygen atom has four neighbouring oxygen atoms, in a locally tetrahedral arrangement. Along each of these O-O bonds is a hydrogen atom, but not centrally located: two H atoms must be closer to the central O atom, and two must be closer to the neighbouring O atoms. Globally, there are many ways to satisfy this; to count the ways, Pauling essentially assumed that the configurations of the local "tetrahedra" could be counted independently. This is like Kovendhan's assumption about 6-mers; unfortunately, while two tetrahedra in ice share at most one corner, two 6-mers in the toss sequence can share up to 5 tosses, which makes the 6-mers much less independent than the tetrahedra in ice.
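For comparison, Pauling's counting can be written out in a couple of lines (this is my sketch of the standard textbook argument, with $N$ the number of oxygen atoms and $W$ the number of allowed configurations as my notation). There are $2N$ bonds, and each hydrogen can sit at either end of its bond, giving $2^{2N}$ arrangements in all. Around any one oxygen, only $\binom{4}{2} = 6$ of the $2^4 = 16$ arrangements have exactly two hydrogens close by. Treating the oxygens as independent, each one cuts the count by a factor $6/16$:
$W \approx 2^{2N} \left(\frac{6}{16}\right)^{N} = \left(\frac{3}{2}\right)^{N}, \qquad S = k_B \ln W \approx N k_B \ln\frac{3}{2}$
The independence assumption enters at exactly the same place as in the 6-mer estimate: a product of per-site probabilities is used even though neighbouring sites share hydrogens.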

I attempted an answer which I give below. A commenter observed that a formula exists for the probability of a run of 6 heads, and the same formula gives the probability of a run of 6 tails. However, the probability of six heads or tails is trickier.

My approach (which will be recognisable by physicists, computer scientists and others) was this: supposing we can calculate the probability P(N) that there are no 6-runs in N tosses, how do we calculate P(N+5), the probability that there are no 6-runs in N+5 tosses? If we can do this, we can start from P(5) = 1 (there are no 6-runs in 5 tosses, obviously) and build it up from there.

Naively, a 6-run can be built up at any of the five tosses between N and N+5: for example, if the previous five tosses up until N were all heads, then tossing heads again will give a 6-run. So we must consider all possibilities for both the five tosses from N-4 to N, and the five tosses for N+1 to N+5: ten tosses in total. There are $2^{10} = 1024$ possibilities for these 10 tosses, so it looks like a counting problem. The number of possibilities of "successful" runs in these 10 tosses can be enumerated as follows (where "N" stands for "any", and one can replace H with T in all these examples):
HHHHH HNNNN (32 possibilities: $2^4$ for the last 4 tosses, times 2 for replacing H with T)
THHHH HHNNN (16 possibilities)
NTHHH HHHNN (16 possibilities)
NNTHH HHHHN (16 possibilities)
NNNTH HHHHH (16 possibilities)
for a total of 96 possibilities. There are then 1024-96 = 928 cases where there is no run of 6 heads or tails. So the "naive" answer is P(N+5) = P(N)*928/1024. If we want P(100), we can use this to go all the way back to P(5) = 1:
$P(100) = P(95)\,(928/1024) = P(90)\,(928/1024)^2 = \ldots$
to get
$P(100) = P(5)\,(928/1024)^{19} \approx 0.154$
so the probability of at least one 6-run in 100 tosses is about 0.846.
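The count of 96 is easy to check by brute force, and the naive estimate follows directly. A small Python sketch (the function names are mine, for illustration):

```python
from itertools import product


def has_run(tosses, k=6):
    """True if the sequence contains k or more identical tosses in a row (k >= 2)."""
    run = 1
    for prev, cur in zip(tosses, tosses[1:]):
        run = run + 1 if cur == prev else 1
        if run >= k:
            return True
    return False


def count_with_run(length=10, k=6):
    """Number of H/T strings of the given length that contain a k-run."""
    return sum(has_run(t, k) for t in product("HT", repeat=length))


def naive_at_least_one(n_blocks=19):
    """Naive estimate: every block of 5 new tosses treated as independent."""
    total = 2 ** 10
    no_run = total - count_with_run(10, 6)
    return 1.0 - (no_run / total) ** n_blocks


print(count_with_run())       # 96 of the 1024 ten-toss strings
print(naive_at_least_one())   # about 0.846
```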

Unfortunately, this is still not correct, because the 1024 possibilities -- and the 96 possibilities with runs -- are not all equally probable: they are conditional on the premise that there is no 6-run up until the N-5th toss. So, for example, the case "HHHHH TNNNN" should be weighted by the fact that it would be disallowed for half the possible sequences prior to this (the ones that ended with H); the case THHHH HNNNN would disallow far fewer sequences (the ones that end in TTTTT, which is only 1 in 32 sequences).

Therefore, I considered as prior possibilities, the five tosses numbered N-9 to N-5 (that is, the five tosses preceding the current ten-mer). There are the following ten possibilities that should be distinguished:
NNNHT, NNNTH, NNHTT, NNTHH, NHTTT, NTHHH, HTTTT, THHHH, TTTTT, HHHHH
and each of these has a "prior probability" (1/4 for NNNHT, 1/32 for HTTTT, etc), and each disallows a fraction of the 1024 10-mers we are considering, as well as a fraction of the 96 10-mers that contain 6-runs. If you do the calculation separately for each of these 10 prior possibilities, and then weight the average by their prior probabilities, you end up with
P(no 6-run in 100 tosses) = $(1481219/1614480)^{19}$ = 0.195 roughly,
and P(at least 1 6-run) = 0.805 roughly.

This, as it turns out, is in excellent agreement with what I had previously obtained numerically.

Final question for anyone who has read this far: is this the exact answer? (I posted on Dilip's blog that I think it is, but don't go by that.) It is, however, "good enough" in my opinion, by two measures: the remaining error is small; and the issues have (I think) been adequately illustrated that the answer can be (laboriously) improved, if need be.
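One way for a reader to check this independently is to compute the probability by a different route. The sketch below is not the block-of-five calculation described above: it simply tracks the length of the current run, toss by toss, using exact fractions. The function name `prob_run_at_least` is my own, and the code assumes a fair coin and k of at least 2.

```python
from fractions import Fraction


def prob_run_at_least(k, n):
    """Exact P(some run of k or more identical tosses in n fair tosses), k >= 2.

    state[r] is the probability that no k-run has occurred so far and the
    current run of identical tosses has length r (1 <= r <= k-1).
    """
    if n < k:
        return Fraction(0)
    half = Fraction(1, 2)
    state = [Fraction(0)] * k
    state[1] = Fraction(1)          # after the first toss the run length is 1
    for _ in range(n - 1):
        new = [Fraction(0)] * k
        for r in range(1, k):
            new[1] += state[r] * half          # toss differs: run restarts
            if r + 1 < k:
                new[r + 1] += state[r] * half  # toss matches: run grows
            # if r + 1 == k a k-run has appeared; that probability is dropped
        state = new
    return 1 - sum(state)


print(float(prob_run_at_least(6, 100)))
```

Because the arithmetic is exact, comparing the printed value with 0.805 shows directly how much error remains in the estimate above.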

Friday, August 06, 2010

Hand over the master keys, or else...

I find it comical that India's security agencies (now joined by several other countries) are demanding the "encryption keys" to BlackBerry devices. Can our government's security experts be ignorant of basic cryptography?

BlackBerry's encryption methods are not new, not novel, not unique, not even unusual. The technology to encrypt e-mail has existed since the early 1990s, and is called OpenPGP (after PGP or Pretty Good Privacy, the first program to implement it). It is usable on pretty much all e-mail systems and is built into Blackberries. There are no "master keys" here: each user has a public key and a private key, and messages can be encrypted with the public key but decrypted only with the private key. (Conversely, messages can be digitally "signed" with the private key and the signature can be verified with the public key). If A wants to send an encrypted message to B, A encrypts it with B's public key -- which A should have a copy of. The public key is meant to be public, and it is common for people to display it on their personal webpages and elsewhere. But B's private key is needed to decrypt it, and only B has (or should have) that key. Wikipedia has a good description of public key cryptography.
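To make the public/private key idea concrete, here is a toy Python sketch using textbook RSA with tiny primes. It is purely illustrative and hopelessly insecure: the primes, the function names and the message are my own choices, and real OpenPGP software uses far larger keys, padding, and encrypts a per-message session key rather than the message itself.

```python
# Toy RSA key pair for "B" (tiny textbook primes; never use in practice).
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # B publishes this
private_key = (d, n)         # only B holds this


def encrypt(message, key):
    """A encrypts an integer message (0 <= message < n) with B's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """B decrypts with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


def sign(message, key):
    """B signs with the private key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def verify(message, signature, key):
    """Anyone can check the signature with B's public key."""
    exponent, modulus = key
    return pow(signature, exponent, modulus) == message


secret = 65
ciphertext = encrypt(secret, public_key)
print(decrypt(ciphertext, private_key) == secret)
print(verify(secret, sign(secret, private_key), public_key))
```

Note that nothing in this scheme resembles a "master key": each key pair is generated by its owner, and the private half never needs to leave the owner's device.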

As far as I can tell, BlackBerry's "enterprise security" is a somewhat different system to secure communication between BlackBerry's servers and the customer's device, but it too is key-based cryptography (3DES or AES) that requires a private key for each device. RIM, the makers of BlackBerry, say they do not possess copies of customers' private keys, and indeed it would be alarming if they did. They are not being pioneers here (except, perhaps, in bringing it to wide use among their customers): this is standard practice in cryptography.

The government can ban BlackBerries, but it will have to ban e-mail: all email can be encrypted, using a method that dates back to 1991. And in fact it's easier than that: webmail providers such as Google Mail allow the entire session to be encrypted, and it is trivial to do this by clicking a few checkboxes (even my GMail app on my non-BlackBerry phone does this) -- so no agency can snoop without accessing Google's own servers. Perhaps our security agencies will next demand the root password for Google's data servers.

Alternatively, our government can try addressing our real security problems, and their underlying causes.

Friday, July 30, 2010

Yet more thoughts on Apple

It's been several months since we got our Mac Mini [1, 2]. Previously my wife used a Linux laptop. It worked well, except when it didn't, and I had to help out.

The Mac is just the same, except that when it works well, it works beautifully. Steve Jobs values aesthetics above everything else. But when it doesn't work...

So a few days ago she calls me to say the computer is not booting. I go over to look. Not only is she quite correct, but there's no way of telling what the problem is: all Apple gives you is a white screen with an Apple logo and an endlessly-spinning counter.

I go online with my laptop, and find that there are ways to boot differently by holding down various key combinations on boot. First I try "safe mode". It boots, and all seems well; but when I try the regular boot it fails again. And now "safe mode" doesn't work either.

Then I try "verbose" boot. This gives a scrolling screen of boot messages, of the kind familiar to Unix/Linux users. I see some messages about the filesystem but I don't understand them. The boot gets stuck at a point that I can't make sense of.

Then I try "single user". This time, I get a boot prompt that helpfully tells me to "fsck -fy". I do so, and after some churning, it tells me "filesystem cannot be repaired." I think, huh? I have seen serious filesystem errors on linux and unix, which can be repaired only at the cost of losing data: but I have never seen a filesystem that could not be repaired.

Googling gives me the dubious advice that repeatedly trying fsck should fix the problem, but it does not. I try the disk repair tool that comes from Apple's install DVD, but that too refuses to repair the filesystem.

Finally, "backup and reinstall" is the only way to go. I get a USB hard drive, use my unix skills to mount it and format it with the HFS+ filesystem in single-user mode, and back up all my wife's data (only a couple of unimportant files failed to get copied, luckily). And I reformat and reinstall, as any good Windows sysadmin would do.

Thoughts:

• This has never happened to me on linux, which I've been using on my own computers for 10 years now, and on other computers for even longer. A couple of times the filesystem was sufficiently corrupted that some important system files got lost, but all I had to do was copy them over from another machine or reinstall the affected package.

• Linux, like OS X, typically uses a "journalled" filesystem (usually ext3 or ext4 on linux, HFS+ on Mac). This means that, after an "unclean" shutdown, the filesystem need not be thoroughly checked. But even when the shutdown is not unclean, Linux systems are usually set up to check the filesystem automatically once every 30 days, or once every 100 mounts (reboots), or thereabouts. This is just a precaution: hardware and software errors can always cause problems. As far as I can tell, Mac is not set up to do this. In fact, as far as we know, the machine was not shut off "uncleanly" at any time recently: what probably happened was that undetected filesystem errors grew until they became unrecoverable. Why does Mac OS X not schedule a periodic filesystem check? Is it because Jobs thinks users will get frustrated at that informationless, spinning progress indicator? If so, why not just tell the user that the filesystem is being checked? I'm sure most users won't mind.

• My wife -- and other non-techie users -- could not have recovered the computer on her own. From all accounts, Apple's customer service is good and very likely they'd have done exactly what I did, but they would have taken a few days rather than a few hours.

• We should have taken backups, and got away very lightly considering we didn't. After this incident, we bought a new USB hard drive and set up Apple's "Time Machine" on it. This, like all Apple software, is slick and shiny; how well it works remains to be seen, or hopefully will not need to be seen for a while.

• I strongly suspect that the "filesystem could not be recovered" message was not the truth, but an example of Apple's control-freakishness. The filesystem could perhaps be recovered only by losing a few files (a common-enough situation). Rather than let the user make that choice, Apple wants you to call customer service at the slightest sign of trouble -- by escalating that trouble, and also by hiding all useful information from the user, making it available only via arcane key combinations at boot time.

So if anyone out there is thinking of buying Apple: it's slick hardware and software, but in times of trouble, it's probably much harder to fix than Windows. And harder than other Unix-like systems, because it hides so much of its Unixness on the grounds of being user-friendly, or something. Still, for many people, the slickness probably makes up for anything else.

Friday, July 23, 2010

Giant steps

Madhav Chari, jazz pianist, performed with an all-Chennai trio -- consisting of himself, Naveen Kumar (bass) and Jeoraj George (drums) -- yesterday at the Museum Theatre in Chennai. I have written about Madhav before, when he performed with a French rhythm section [1,2] (who also back him on his recent CD, "Parisian thoroughfares"); and had previewed the concert here. Suffice it to say that it lived up to its prior billing. In an e-mailed announcement Madhav had declared it to be "absolutely the very first international standard jazz group from India since the incpetion of jazz in the country in 1927." It was. He said "We play jazz music: thats what we do." That's what they did. Over half of the programme was of Madhav's own compositions, beginning with "Tales of the south" (a reference, he said, both to New Orleans and to Chennai) and ending with "Blues for Havana". In addition they threw in pieces by Charlie Parker, Thelonious Monk, John Coltrane, Cole Porter, and Sherwin/Maschwitz's "A nightingale sang in Berkeley square" (which Madhav played unaccompanied). They nailed all of them. Jeoraj took several drum solos, while Naveen played extended bass solos on Madhav's "Rejoice" and "Blues for Havana".

Madhav repeatedly said that the band is still feeling its way and is not really a mature outfit, which is why they chose not to play Ellington. But if there were flubs, I did not notice. The Parker was taken at breakneck speed, Porter's "Love for Sale" and Madhav's "Tango sentimental" were rhythmically very complex, and the chord changes in Coltrane's "Giant Steps" are a challenge for the best musicians. The band sailed through all of them.

But almost equally entertaining was Madhav's patter before the songs. He declared Chennai the most advanced city for percussion in Asia (previously he had said that though Chennai audiences may not understand jazz, they understand music better than anyone). He has a dim view of what has long passed for "jazz" in this country (perpetrated by people like Louis Banks), and took several potshots at the elites of Mumbai, Kolkata and Delhi; he challenged anyone from those cities to measure themselves against Naveen and Jeoraj; he conceded that the sizeable audience yesterday (well over 400) may be achievable for jazz in Kolkata, but declared that there is no jazz drummer in that city who can keep time, so Chennai is ahead on that count.

Towards the end, he recounted a lady at a recent party asking him why he blew his own trumpet so much, and asked the audience (to resounding cheers): "Well, if I have the greatest jazz band in the history of India, am I supposed to keep quiet about it?"

Indeed, a few years ago I marvelled that there was a jazz pianist in this "conservative" city who was the equal of the best in New York. Now I find that there is an entire world-class jazz piano trio in this city -- but it now seems exciting rather than surprising. My opinion is that Madhav really does not need to blow his own trumpet. His piano, and his new rhythm section, are eloquent enough.

Wednesday, July 21, 2010

Should one pray for Hitch? - continued

Christopher Hitchens' own answer to the question is here, along with much other interesting stuff. In Hitch's words,

Well look, I mean, I think that prayer and holy water, and things like that are all fine. They don’t do any good, but they don’t necessarily do any harm. It’s touching to be thought of in that way. It makes up for those who tell me that I’ve got my just desserts... I have to say there’s some extremely nice people, including people known to you [interviewer Hugh Hewitt], have said that I’m in their prayers, and I can only say that I’m touched by the thought.

Yesterday I received my copy of his new memoirs, Hitch-22. The immediately striking thing is that he has chosen to be photographed smoking a cigarette for its cover. This was before the cancer diagnosis, and he does like to be considered a contrarian, but if he were superstitious I wonder whether he would now think of it as tempting fate. Hitchens is also known for his prodigious consumption of alcohol (I am surprised that the book cover does not portray him holding a glass of Scotch); and smoking and drinking are both significant risk factors for oesophageal cancer, especially in combination in large quantities.

If I were religious, I'd pray for him. As it is, I (like millions of other strangers) offer him my best wishes: I hope that he recovers fully and, meanwhile and afterwards, suppresses his contrarian urges sufficiently to obey his doctors when they ask him to stop poisoning his body in this way.

As for the material between the covers of his book: I have only read as far as the beginning of the third chapter (on his father). The "prologue with premonitions" is not his most memorable piece of writing, but that is only because his standards are so high. It is, however, sprinkled (as one would expect) with interesting anecdotes and thoughts. His portrait, in the next chapter, of his mother Yvonne -- her life, her death, his relationship with her, and his thoughts on her after she died -- is stunning and harrowing: if the book maintains that sort of intensity, it would be a life-altering experience for any reader, I would think. I have a large and growing pile of books that are only partially read, but despite the considerable bulk of this book, I will not be surprised if I finish it sooner than many other recent purchases.

Sunday, July 18, 2010

Today I read this article in Open magazine, on allegations that Sharad Pawar's daughter, Supriya Sule, is a citizen of Singapore and therefore should have her Indian citizenship revoked. The article unquestioningly quotes Mrunalini Kakade, who lost the election to Sule in 2009.

However, nowhere in the article is there evidence that she is a citizen of Singapore: the phrase used, consistently, is "Permanent Resident" which is a status for non-nationals, short of citizenship (Singapore Government web site, Wikipedia; links produced by a few seconds on google). What Open's rather breathless article says is

According to [Kakade's] petition, Supriya Sule holds 'Singapore citizenship'--Permanent Resident Identification Number S 69726251--in addition to her Indian one. This is against domestic rules that do not permit dual citizenship.

The giveaway, as Mrunalini Kakade tells Open, was Supriya Sule’s disclosure that she owns property in Singapore. Under the law of that country, only a permanent resident of Singapore is allowed to purchase property there...

"Besides, she is also the director of Laguna International Pvt Ltd. In this context, her nationality is shown as a 'Singapore Permanent Resident'... "

So, all the evidence that Kakade has supplied, at least as quoted by Open Magazine, suggests that Sule is a "permanent resident" of Singapore -- not a citizen -- just as thousands of Indian citizens are permanent residents of the United States. There is nothing in India's laws that prohibits citizens from permanent residency of another country.

What should we make of a news magazine that writes a 1300+ word article on this issue without addressing this point, or asking Kakade to clarify?

Friday, July 09, 2010

Kashmir

Cross-border terrorism is almost dead. Pakistan is engulfed in its own problems. So why does the Kashmir problem not die too?

Could it be because ordinary people do not like living in a police state? And, when they protest, they do not like being treated as terrorists and fired upon?

The local media are prevented from doing their jobs, and the "national" media (ignorant of Kashmiri, and broadcasting to those who are ignorant of Kashmiri) are free to lie. (Link via Shivam)

We shoot down unarmed protestors. Which incites more protest, and we shoot them down too. (Even unarmed motorcycles are not spared.) We ban the media. We squash civil liberties. And all this is "legalised" by the draconian Armed Forces Special Powers Act (which was originally framed for the north-east, and extended to Kashmir in 1990). Our "law" allows the army to fire on protestors, invade people's homes, search them, take people away without warrant, and be immune from prosecution for all this. That's the law that has ruled the north-east for over 50 years, and Jammu and Kashmir for 20 years.

Now, why do we call ourselves a democracy? Why do we pretend that we have a free press? And why do we expect the people of those states to be grateful for these things?

Tuesday, July 06, 2010

Should one pray for Hitch? And should he know?

The question is engaging the religious. Christopher Hitchens has been diagnosed with cancer. Given his well-known atheism, should a religious-minded well-wisher pray for him?

On the religious side, Rabbi David Wolpe, who has debated Hitchens frequently on religion, puts it very well (as quoted on Goldblog) in my opinion: "I would say it is appropriate and even mandatory to do what one can for another who is sick; and if you believe that praying helps, to pray. It is in any case an expression of one's deep hopes. So yes, I will pray for him, but I will not insult him by asking or implying that he should be grateful for my prayers."

I wish all religious leaders were so open-minded: too often, religious impositions are accompanied by the implication that one should be grateful for the favour, or the threat that one is condemned if one is not grateful.

A scientist on the Dish goes a bit further in asserting that one should not even inform Hitchens (let alone demand his gratitude) that one is praying for him: to do so would be "malicious". In support, he links this randomized trial on the effect of prayer on patients who had undergone coronary artery bypass graft surgery. The study showed that, on patients who did not know whether or not they were being prayed for, prayer had no effect; but patients who knew with certainty that they were being prayed for did significantly worse (exhibited more complications within 30 days of the procedure).

So there you have it. Pray if you like, but don't tell.

(Actually, I'd be surprised if those results were replicable with other ailments: the only explanation that I can think of is that patients who know they are being prayed for believe that their prognosis is particularly poor, and therefore are under more stress -- which is particularly relevant here since they are heart patients. In particular, patients were told, via messages in envelopes, either that they "may or may not be prayed for" or that they "will be prayed for". Perhaps the latter statement was truly frightening to a lot of the patients. I'm unconvinced that the study was ethical: at the minimum, they could have chosen a different ailment, on which stress would not have such a direct and obvious effect.)

Monday, June 28, 2010

Jazz in Chennai

Madhav Chari, who is in my opinion easily India's best-ever jazz musician and pretty much the only one of international standards that I have heard, has put together a new trio. The other two members are Naveen Kumar (electric bass) and Jeoraj George (drums).

Though his colleagues in the trio are little-known, Madhav says this trio is of "international standard". And knowing him, if he says that, I believe him. He does not give compliments readily, but he has praised these members to me in the past.

They play a concert on July 22 at the Museum Theatre, and Madhav conducts three workshops in the preceding weeks. The details are below.

WORKSHOPS:

(1) JAZZ AND WESTERN CLASSICAL MUSIC
JUL 3 SATURDAY 5.00 - 6.30 PM
VENUE: MUSEE MUSICALS

The main emphasis will be on the European roots of jazz music, western classical harmony and its development, jazz harmony-melody-rhythm configurations.

(2) JAZZ, ROCK, GOSPEL AND BLUES MUSIC
JUL 10 SATURDAY 5.00 - 6.30 PM
VENUE: MUSEE MUSICALS

Emphasis on the African cultural roots of jazz music, blues music and gospel music as twin music forms: one sacred the other "profane", blues as the basis for jazz and rock music. Swingin' the blues.

(3) JAZZ, HINDUSTANI AND CARNATIC MUSIC
JUL 17 SATURDAY 5.00 - 6.30 PM
VENUE: MUSEE MUSICALS
THIS WORKSHOP IS A CO-PARTNERSHIP WITH SRUTI MAGAZINE

The art of improvisation: what is necessary: similarities in process between carnatic/hindustani and jazz music. Actual differences between the music forms. Differences in cultural configurations between the west and India (jazz is essentially a western musical idiom even if part of its roots lie in Africa). Fusion and Con-Fusion: Pitfalls in thinking that is endemic to jazz and carnatic collaborations.

CONCERT:

(4) MADHAV CHARI PERFORMANCE WITH A JAZZ TRIO
JUL 22 THURSDAY 7.00 PM
VENUE: MUSEUM THEATRE
PRESENTED BY MUSEE MUSICALS

Please be seated at the venue no later than 6.45 PM.

In what sense are we a "socialist" republic?

In 1976, Indira Gandhi amended the Preamble of the Indian Constitution to insert the words "socialist" and "secular" in the description of the Indian republic. It is not clear to me what she meant by "socialist", but 34 years later, we still don't have a social security system or any kind of safety net for the vast majority of our people. We have a "public distribution system" for essential commodities, that is decrepit and corrupt but is pretty much the only resource for the poor. Our healthcare and education are terrible. We know that Mrs Gandhi, like her father, admired the Soviets, but in what respect, other than autocracy and midnight arrests, did she attempt to emulate them? (Mrs Gandhi made this amendment at the height of the Emergency. She did not choose to remove the word "democratic" from the Preamble, presumably because the Soviet bloc had its own definition of that word, as in "German Democratic Republic" -- the former East Germany. Why change words when you can merely change their meanings?)

The motivation for the above reflections was the recent decontrol of petrol prices. Now, subsidising petrol is the sort of broad-based subsidy that makes no sense to me: it benefits the rich as much as, or more than, the poor. I am all for removing such subsidies. I think we should also be charged more realistic amounts for water, electricity, and other things that we take for granted. I seldom pay more than Rs 5 for parking my car, and usually I pay nothing: our cities could earn huge revenue by just charging parking fees that bear a closer relation to the price of real estate. There is no possible argument for subsidising car owners to this extent.

But the question is, what will we get in return for removing the subsidies? Can the poor be assured of affordable food, good healthcare and education? The government has passed the "Right to Education" act but there is no clarity on how it is to be implemented, and I am worried that the only effect of the act will be to hamstring the existing private schools without providing any alternative. There seems to be zero movement, and indeed zero interest, on any of the other things that an allegedly "socialist" government should be providing to its needy people.

Balancing the budget is all very well, but there is surely no short-term rush for that: if we manage to lift 300 million people out of poverty in the next generation, the government's tax revenues will shoot up too. As George W Bush said, we need to make the pie higher. Besides, there are enough wasteful government schemes that we can trim without hurting millions of people in the process. But I do not for a moment believe that the poor will become magically prosperous via GDP growth alone. Thanks to India's spectacular recent growth, the urban middle class earns ten to fifty times as much as it did a generation ago; but we remain every bit as stingy in paying servants and workers, haggling for the last rupee. That's not going to change.

Meanwhile, without an education, the poor simply face no better prospects than unskilled labour -- whether in farming, industry, construction or homes -- and no means of fighting exploitation.

So while I am, in theory, happy to pay more for my petrol, I want to know what the government plans to do with my money, other than cut the deficit. Indira Gandhi made "Roti, kapda, makaan" a slogan: a generation or two later, a huge number of Indians lack even those essentials of life. Healthcare and education? Perhaps a century or two from now.

Monday, June 21, 2010

Statement from David Davidar's lawyer

Nilanjana (among others) posts a statement from David Davidar's lawyer. Below is a comment I just tried to leave on her site.

This gets bizarre. First, I can understand his lawyer vetting his statement, but why on earth is his lawyer speaking for him? Is it so that he can have a chance of denying it later?

Second, what is one supposed to make of this statement: "Mr. Davidar accepted the situation [that she did not want a secret romance], and their flirtatious relationship continued"? Surely that was a clear signal to him to back off.

And this one: "Mr. Davidar engaged in flirtatious banter with [Samantha Francis] for a short period of time. He did not engage in any conduct toward Ms. Francis that he knew or should have known was unwelcome." So he should not have known that flirtatious banter with a subordinate may be unwelcome?

As for the Frankfurt incident: he says she did not resist, she says she did. She goes into graphic details of how she resisted (climbing onto a window sill, pleading with him, curling into a foetal position, etc). Why does he not come out and say that all those specific statements were lies? What he says is "However, contrary to Ms Rundle’s claim, Mr. Davidar did not bully his way into her room, nor did he force himself upon her. Ms Rundle did not object when they kissed." It is possible he entered the room before she asked him to leave. He does not deny that she asked him to leave, or climbed on the window sill. If she "did not object" when they kissed, perhaps she had given up. In fact, the "foetal position" can be interpreted as not resisting.

It was an unequal relationship and he should have respected that. If she was not always negative -- if she sometimes even seemed to encourage him -- perhaps this widely-circulated anonymous blogpost may explain why.(*)

How he squares it with his wife is between him and his wife -- it is nobody else's business. I don't see why that should enter into his lawyer's statement, either. If he chooses to make a public statement on his wife, surely he can make the statement himself.

(*)Key quote from that post:
I flirted back, when he'd flirt, and I'm ashamed. But I blame him. I blame the way he manipulated us into thinking it was all part of the job, the "culture" of the office...

PS (21/06/10, 22:17): The other striking thing about that statement is the ratio of its length to its content. Huge stretches of it consist merely of "she invited him to tennis", "they had dinner together", "she asked him for a ride", "she sent him good wishes", and variations thereof. In Davidar's and his lawyer's minds, presumably, all this paints a damning portrait. If I had assumed that every woman who invited me to dinner or to a concert had been trying to flirt with me, maybe my life would have been as colourful as Davidar's. Such an attitude must make platonic friendship between the sexes completely impossible (and yes, some do argue that it is impossible).

Thursday, June 17, 2010

Spot the difference

What's the difference between this image from 2001 (here's a relevant article)

and this one from today's Hindu?

Answer: the position of the Indian soldiers.

Question: The 2001 photo caused tremendous outrage in India. Will the 2010 photo create a similar storm of protest?

I'm not hopeful.

Wednesday, June 16, 2010

The Davidar case

As everyone knows by now, David Davidar, publishing icon, faces claims of sexual harassment from two women from his time as president of Penguin Canada. Davidar previously worked in Penguin India and several women in the Indian publishing industry have declared their disbelief. For example, four women are quoted here, as follows: "David Davidar is a deeply loved and respected figure in publishing. Naturally, his many friends continue to believe in him, and always will"; "He is one of the most decent persons I know. I refuse to believe these allegations"; "This is the last thing anyone would expect to be levelled against David"; "I find it very difficult to believe these allegations could be true".

Why is that relevant? According to one of his defenders (who, however, acknowledges the gravity of his accuser's charges, the trauma she must be going through, and the necessity of justice if the charges are true): "I know character is no defence, but sometimes a man's character does count." But men (and women) display different characters to different people. If Davidar is guilty of harassing two women (and we should not be judging him based on media reports), the fact that he did not harass several other women is of no importance.

Besides, is that really his character? According to the late Dom Moraes, writing back in 2002, Davidar "drank a lot and liked to fall in love." Moraes relates an illustrative story, which does not sound like harassment, but does not induce much respect either.

I saw the Moraes link on Ashok Banker's blog. Banker refers to Davidar's "dark side", which he saw "quite frequently -- and believe me when I say, I’m not revealing all that I saw because some of it is darker than even I want to talk about publicly." Banker, in an earlier, now deleted post (still cached in google as I write, but I won't link) relates a much more salacious story, which is still not a clear case of harassment but does make one wonder. [Update 17/06/10: Banker has restored that post, with reader comments. He says he took it down because his server couldn't handle the load. The reader comments are interesting: see below.]

Neither Moraes' nor Banker's assessments of Davidar's character are of any more relevance, however, than that of Davidar's numerous defenders. What matters is what he did in Canada. The truth could be that he was an inveterate womaniser who, however, never stepped over the line, and these particular charges are false. It could also be that he was the perfect gentleman in all dealings with women, except in these two cases, where the charges are true. Or it could be anything in between, or anything beyond. We simply don't know, and while it is fun to speculate, it is not very productive to do so.

The sociology of jumping to a man's defence on the grounds of, essentially, "but he never assaulted me" does puzzle me, however. We saw a lot of that in the Anand Jon case, too. Meanwhile, Banker's own posts sound like "kicking a man while he's down", and -- other than the Moraes link, which was interesting because it was unbiased by current events -- rather unsubstantial. And the same can be said of my post here. And of course I'm not alone. In today's world, we all enjoy speculating on celebrity news, and speculating on others' speculations, and so on ad infinitum. But I do agree with Banker that the entirely unbalanced initial reactions from the Indian publishing industry deserved some counterpoint. So, which is better: restraint from all sides, or unrestrained speculation from all sides? The result is the same: nobody is any wiser. Let the case take its course through the Canadian legal system.

UPDATE 17/06/10: As already noted above, it is the comments by Davidar's friends that intrigue me, and the ones on Banker's blog are no different. Yes, Davidar has close friends, who never saw anything in him that would suggest he would be capable of such a thing. Yes, they hope that he can clear his name. But why write hundreds of words that have no bearing on this case, referring to their personal experiences with him as "another side to the story" even though it has nothing whatever to do with the story? I understand feeling the need to speak up when your close friend is accused of unsavoury things, but why not simply say something like: "I know David well and respect him, and would not think him capable of such conduct; I hope he can clear his name, but I recognise the seriousness of these charges and, if proved, want justice to be done" -- and then leave it at that?

Also worth reading: "What it feels like for a girl" -- an anonymous blogger's experiences in the Canadian publishing industry.

Friday, June 11, 2010

Friends that the gay community doesn't need

What the gay/lesbian community needs, as Andrew Sullivan (among others) points out, is friends in the mainstream. In 1992, only 42% of Americans personally knew someone who was gay or lesbian. Today, 77% do, and they also see that their gay/lesbian friends are completely normal, honest, straightforward people. That in itself accounts for the change in attitudes towards gays in the US (and, earlier, in Europe).

What the community does not need is a self-appointed activist who writes in a national newspaper that "homosexuality may sometimes have a lot to do with paedophilia, and, further, that if it is based on mutual consent, it is no big deal."

I certainly would not want a man who believed this teaching undergraduates. From Abi's blog, I see that this man, Ashley Tellis, has been sacked from his teaching position at IIT Hyderabad. Below is the comment I posted on Abi's blog:

I don't know what went on at IIT, but I agree with chitta. Please read that article by Tellis before making up your mind. This is not about gay rights. It is about paedophilia. When gay rights activists, all over the world, are struggling to remove conservative conceptions that gays are sexual perverts, Mr Tellis says "homosexuality may sometimes have a lot to do with paedophilia, and, further, that if it is based on mutual consent, it is no big deal."

Elsewhere he glories in his own paedophile activities with a Nepali boy: the article used to be here but seems to be gone now.

A man who thinks paedophilia is "no big deal" should not be teaching undergraduates: I wouldn't want my son in his class. A man who has admitted to paedophilia should be in jail. [Update 12/06/10: The article in question is here and he did not quite admit to paedophilia: he leaves it a little ambiguous. See comment 7 below.] And portraying this as a case of victimisation of gays does no service to the gay rights cause, and indeed, could do a great deal of damage by reinforcing negative (and, in the vast majority of cases, false) public stereotypes of gays.

To add to that: in the case of minors, "consent" makes no difference, for a variety of reasons, only a few of which he touches on (dismissively). But this is not, in my opinion, a topic worth arguing about. Paedophilia is off-limits. Conflating paedophilia with gay rights is the very last thing that gay activists need at the moment. (Besides, as Tellis himself points out, most paedophiles are heterosexuals: so why make that conflation at all?)

Monday, June 07, 2010

If you haven't heard of it, I'm sure you're not alone. I first heard of it in a magazine article a couple of weeks ago, and today I read this article on rediff.

The situation is that the main (essentially, only) highway into Manipur has been blocked by Naga rebels for over 50 days now. As a result, the Manipuris are short of petrol, medicines, and other essential supplies.

I can't imagine even a five-day blockade occurring in a "mainland" Indian state: the government would intervene, by force if need be. But a 50-day blockade of Manipur does not even register on the national consciousness.

Sunday, June 06, 2010

Universities and cities

Via Abi, here's a recent (well, nearly a month old) article by Sanjeev Sanyal arguing for better integration of universities with urban communities in India. Sanyal's argument is that walling off the campus (as the IITs and IIMs do) causes them to have no impact on their surroundings: to benefit the city, the university system must be integrated into it.

I couldn't agree more. But from a purely selfish point of view, Sanyal's other point -- that it is unfair and unrealistic to expect entire families to live in a remote walled-off location, and unproductive to supply schools, medical facilities, etc at that location simply because the city is too far away -- is equally important.

I just spent two days in Cologne. The university is in a pleasant campus-like space with academic buildings separated by green parks; but the "city" is a couple of minutes' walk away, the hotel where I stayed was a five or ten minute walk away, and the main railway station was a 20 minute walk from the hotel (I timed it this morning). There is an extensive tram and underground system but I simply didn't need to use it (though my hosts and I used it once, under time pressure).

Previously, I spent my postdoc time in Paris and New York, and it was a hugely positive experience to be living in the middle of the city and not in a walled-off community. The academic part of my university was indeed walled-off, but New Yorkers will know the special atmosphere that the unwalled New York University contributes to its neighbourhood, Greenwich Village.

And I grew up in Delhi University, which has lots of small walls but no all-encompassing wall; I think the student and faculty community had a positive influence on the area. Certainly it is one of my favourite parts of Delhi (perhaps the only part of the city that I like).

But the mania for walls is not confined to academic campuses in India. The papers are full of new housing developments that are located an insane distance from the city, but come equipped with school, hospital, clubhouse, and whatnot. Of course, those who can afford these will also have air-conditioned chauffeur-driven cars to transport them. But what is the ecological impact of all this?

Why are our Indian cities often somewhat unpleasant to walk around in? My theory is that the common Indian mindset of separating "shopping" from "residential" areas contributes to it. The newspapers in Chennai are full of complaints from residents in "residential" localities (like Besant Nagar) that shops are infringing their space and causing crowds and noise. But what they don't see is that the commercial activity also contributes to safety. In Cologne (at least in the city centre) you feel safe walking on the road at midnight because there are people around. In Indian residential areas you often don't feel safe after dark. Meanwhile, the "commercial" areas are overcrowded, noisy and dirty, and navigating them becomes an unusually unpleasant obstacle course. A better mixture of commercial and residential activity would, I feel, be beneficial all around: shoppers can avoid the madness of T.Nagar, residents can feel a little more secure (at the expense of putting up with a little more noise). Of course, the usual urban requirements, like clean sidewalks and sanitation, are necessary in addition.

Gated communities, academic and otherwise, are an escape from the urban chaos, but I think they are based on a false premise -- that such isolation is desirable. It is not, either for the residents or for the rest of the community.

Thursday, June 03, 2010

Terrorist weapons

(Hat tip: Sunil)

It seems the Israel Defence Force found deadly weapons on the Mavi Marmara, which was attacked by IDF commandos resulting in the deaths of many activists.

The IDF's photographs prove beyond doubt that the ship, which according to Wikipedia has a capacity of 1080 passengers, harboured a handful of knives. (I count perhaps 20 or 30, most of which look like kitchen knives, pocket knives or Swiss army knives). It also carried various kinds of plumbing equipment: a prominent wrench, a few spanners, hammers, screwdriver. And there is a stack of CD-ROMs.

The pictures are captioned: "Pictures of the weapons found on the Mavi Marmara ship where today, when IDF soldiers attempted to board the ship and redirect it to the Ashdod Port, the activists on board lynched the soldiers in a planned attack..."

And there you have it. Beware of carrying kitchen or plumbing equipment if you sail your own boat in international waters near Israel. Remember, if Israeli soldiers board your boat and you resist, it means you planned the attack on them, and they have the right to shoot you.

You can't make this stuff up.

Saturday, May 29, 2010

Systemic bias against emerging markets?

Some years ago, at a lunch discussion in Paris, an Israeli-American asked me what were the chances of a military takeover in India.

"Zero", I answered. I was met with incredulity and, I think, sniggers.

"You can't be so sure! How can you be sure?"

I tried to put it another way. I said something like this: "It is less than the chance of a military takeover in France, or Israel, or the United States -- simply because the army has almost no power in India, is entirely under the control of the political establishment, and it has no influence on Indian politics. Unlike in many developed countries."

More incredulity and audible chuckles, and the subject changed.

I got reminded of that exchange today in reading T N Ninan's takedown of the credit ratings issued by Standard & Poor's (S&P) to several countries. (Go and read it now, before reading on.)

Ninan points out that while India has a BBB- credit rating, Spain, Portugal and Italy had an A+ or better rating just two months ago, and so did Greece a year ago. Yet none of the economic indicators -- budget deficit, unemployment rate, public debt, GDP growth rate -- suggest that India should be more risky than these countries. "As recently as in March, S&P was 'affirming' Greece's BBB+ status (which, please note, was better than India's)." China's rating, while better than India's, was also till recently lower than these European countries'.

How can the world's premier rating agency get it so wrong? "The rating agencies argue that emerging markets have a higher political risk. Well, tell that to the Greeks, who are rioting in the streets of Athens!"

Ninan suspects "systemic bias against emerging markets" but I think there is another explanation: Euro-zone countries, like AIG, Citigroup and other Wall Street giants, are "too big to fail". If they screw up their economics, they will be bailed out (it's already happening) because the alternative is the disintegration of the euro as a currency, which, apart from the purely economic consequences, would be a blow to European pride too awful to contemplate.

In India we have no such illusions: we are bigger than all the PIGS put together, but not too big to fail. (This may not apply to China, but it is difficult to see who'd have deep enough pockets to bail out China, were it to become necessary). But in a way that is good news for us: there is nobody to bail us out, so we have an incentive to keep our systems functioning.

Tuesday, May 25, 2010

RIP, Martin Gardner

The man who "turned thousands of children into mathematicians, and thousands of mathematicians into children" is no more. James Randi's post here. NYT obit here.

I posted my thoughts on Gardner just a few months ago, when he turned 95.

Wednesday, May 05, 2010

I am currently reading E T Jaynes' "Probability Theory: The Logic of Science", his posthumous textbook published in 2003. Jaynes was a lifelong promoter of Bayesian methods in probability and statistics, the inventor of the "maximum entropy" method of assigning priors, and, for much of his career, at loggerheads with "orthodox" (or "frequentist") statisticians, who dismissed Bayesian ideas of "prior" and "posterior" probabilities except where these could be rigorously justified as limits of large numbers of trials. Jaynes, drawing on previous work of Cox, Polya, Jeffreys and others (including himself), argues that probability theory is the unique generalisation of Boolean logic to statements that have varying degrees of plausibility. Specifically, given three reasonable-sounding "desiderata", he shows that the rules of probability theory follow uniquely, with no reference to trials and sample spaces and the usual language. His point, hammered again and again throughout the book, is that prior information is essential and must not be thrown away: "If we humans threw away what we knew yesterday in reasoning about our problems today, we would be below the level of wild animals." Meanwhile, he condemns much orthodox statistics as "ad-hockery", and even when valid, of extremely limited applicability.

The book is full of interesting nuggets, historical insights and examples of misleading statistics. I just came across the following striking example.

According to Jaynes, the data in this example are real but the circumstances have been simplified. In experiment A, patients were given one of two treatments, an old one and a new one, and the numbers of "failures" (deaths) and "successes" (recoveries) were compared. The results were:

Experiment A
Old: 16519 failures, 4343 successes
(success rate 20.8 +/- 0.28 %)
New: 742 failures, 122 successes
(success rate 14.1 +/- 1.10 %)

Experiment B was the same experiment conducted two years later. The results were

Experiment B
Old: 3876 failures, 14488 successes
(success rate 78.9 +/- 0.30 %)
New: 1233 failures, 3907 successes
(success rate 76.0 +/- 0.60 %)

The results were "discouraging": the new treatment, in both experiments, showed a lower success rate.

Says Jaynes: "But then one of them had a brilliant idea: let us pool the data, simply adding up" the totals over experiments A and B for each method. This "pooled data" yields the results:

Pooled data
Old: 20395 failures, 18831 successes
(success rate 48.0 +/- 0.25 %)
New: 1975 failures, 4029 successes
(success rate 67.1 +/- 0.61 %)

And, lo and behold, the "pooled data" show the new method performing strikingly better. Says Jaynes, "they eagerly publish this gratifying conclusion, presenting only the pooled data; and become (for a short time) famous as great discoverers."

How is it that pooling the data changes the results? The point is that, when pooling in this manner, certain essential facts are being hidden: both methods performed much better in Experiment B; and experiment B contained many more instances of the new method, with somewhat fewer instances of the old method.
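
The reversal is easy to check numerically. The short Python sketch below recomputes the success rates from the counts quoted above and then pools them; only the counts come from Jaynes' example, and the variable and function names are chosen purely for illustration.

```python
# Counts quoted above: (failures, successes) per experiment and treatment.
data = {
    "A": {"old": (16519, 4343), "new": (742, 122)},
    "B": {"old": (3876, 14488), "new": (1233, 3907)},
}

def success_rate(failures, successes):
    """Fraction of patients who recovered."""
    return successes / (failures + successes)

def pooled(treatment):
    """Add up failures and successes for one treatment over both experiments."""
    failures = sum(data[exp][treatment][0] for exp in data)
    successes = sum(data[exp][treatment][1] for exp in data)
    return failures, successes

# Success rates within each experiment, then for the pooled counts.
rates = {exp: {t: success_rate(*data[exp][t]) for t in ("old", "new")} for exp in data}
rates["pooled"] = {t: success_rate(*pooled(t)) for t in ("old", "new")}

for label, r in rates.items():
    print(f"{label:>6}: old {100 * r['old']:.1f}%  new {100 * r['new']:.1f}%")

# Share of each treatment's patients who were in the high-success experiment B.
share_in_B = {t: sum(data["B"][t]) / sum(pooled(t)) for t in ("old", "new")}
print(share_in_B)
```

The last line makes the hidden fact explicit: roughly 86% of the new-treatment patients, but fewer than half of the old-treatment patients, were in experiment B, where both treatments did far better.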

Here is another example of dodgy statistics that I came across a while ago: this one is particularly distressing because it was a review, meant to settle a long-running argument.

Peter Duesberg believes that AIDS is not caused by the HIV virus, but by drug overuse (in the original San Francisco bay area outbreaks), malnutrition (in Africa), and the antiretroviral drugs themselves (in the HIV+ patients being treated). Today this is seen as a crackpot view, but back in the 1980s it was at least worthy of consideration. By 1994, mainstream HIV researchers were beginning to get fed up with his arguments.

One of Duesberg's arguments was that AIDS-like symptoms were induced by antiretroviral drugs like AZT (the first antiretroviral approved for use). An example of how he and the mainstream researchers could interpret statistical data in opposite ways is found in a review by Jon Cohen, "Reviewing the data - IV: Could Drugs, Rather Than a Virus, Be the Cause of AIDS?" One of the things at issue is how to interpret data from the "Concorde study", which tracked 877 individuals who were treated with AZT soon after entering the study (the "Imm" group), and 872 individuals who were given deferred treatment with AZT or not given AZT at all (the "Def" group). At the end of the three-year study, 96 deaths occurred in the "Imm" group, and 76 in the "Def" group. Duesberg is quoted as saying, in a written response to Science magazine: "The Concorde data exactly prove my point: The mortality of the AZT-treated HIV-positives was 25% higher than that of the placebo group."

But "25% higher" is a meaningless number. If four deaths occurred in the Def group and five in the Imm group, that would be an increase of 25% but nobody would consider that significant. If there were 400 deaths in the Def group and 500 deaths in the Imm group, most people's gut reaction would be that this is a significant increase. How to assess the significance in this case?

First, Cohen quotes experts who note that 22 of these deaths occurred from causes unrelated to AZT or AIDS, such as traffic accidents and suicides. Subtracting those leads to 81 Imm deaths and 69 Def deaths -- a 17% increase, but how significant is that?

Enter the "experts", and I quote:

In addition, say the critics, there is a deeper flaw in Duesberg's analysis: He does not take account of the total number of people in the Imm and Def groups. His reasoning for ignoring the denominator is, as he told Science in an interview, that "it was the same in the two groups." But National Institute of Allergy and Infectious Diseases Director Anthony Fauci says this type of analysis means "ignoring an important part of a calculation." Specifically, there were 96 total deaths out of 877 in the Imm group, implying that 10.9% of the people who were immediately treated with AZT died. In the deferred treatment group, there were 76 deaths among 872 people, or 8.7%.

The appropriate conclusion, say the authors of the Concorde study, is that the difference in mortality between Imm and Def groups is not 25% but 10.9% minus 8.7% -- or 2.2%. Subtracting the deaths from causes unrelated to AZT or AIDS, the difference drops to 1.3%. As the Concorde paper notes, neither difference (2.2% or 1.3%) is statistically significant.

So, apparently, the answer to bad statistics is atrocious statistics. (No wonder AIDS deniers are still around today.) What these people seem to be saying is that the corrected difference is 1.3% of the total population and is not statistically significant (why they assert this is unclear). If one person died in the Def group, and thirteen died in the Imm group, that difference would be the same 1.3%: would it still be statistically insignificant?

Actually, using some simple assumptions one can quickly check how significant these numbers really are. Suppose a patient in the Def group has a fixed probability p of dying during the experiment. (Of course, not all patients are equally fit, but without knowing other prior information, this is the best we can do.) Given the data (uncorrected, for now, for "other" deaths), our best estimate is p = 76/872 = 0.087. The number of deaths then follows a binomial distribution, a bell-shaped curve when the numbers are large: for a population size of N=872 (Def group), its mean is 76 and its standard deviation is the square root of Np(1-p), or about 8.3. For the Imm group, the numbers are nearly unchanged. 96 is more than two standard deviations away from 76, so it would seem that Duesberg was right in pronouncing it significant: there is only a 2% probability that one would see such numbers in the absence of any effect from AZT.
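This plug-in estimate can be reproduced in a few lines. The following is only a sketch under the assumptions just stated: p is fixed at 76/872, and the Imm group is treated as if it also had 872 patients. The variable names and the `binomial_pmf` helper are mine. It computes the standard deviation, the distance of 96 from the mean in units of that standard deviation, and the one-sided probability of seeing 96 or more deaths.

```python
from math import exp, lgamma, log, sqrt

N = 872          # size of the Def group (also used for Imm, as an approximation)
k = 76           # deaths observed in the Def group
K_observed = 96  # deaths observed in the Imm group

p = k / N                      # plug-in estimate of the death probability
sd = sqrt(N * p * (1 - p))     # binomial standard deviation
z = (K_observed - N * p) / sd  # distance of 96 from the mean, in standard deviations

def binomial_pmf(K, n, prob):
    """Probability of exactly K deaths among n patients, computed via logarithms."""
    log_coeff = lgamma(n + 1) - lgamma(K + 1) - lgamma(n - K + 1)
    return exp(log_coeff + K * log(prob) + (n - K) * log(1 - prob))

# One-sided tail: probability of K_observed or more deaths if p were exactly k/N.
upper_tail = sum(binomial_pmf(K, N, p) for K in range(K_observed, N + 1))

print(f"p = {p:.3f}, sd = {sd:.1f}, z = {z:.2f}, "
      f"P(K >= {K_observed}) = {upper_tail:.4f}")
```

Working with logarithms (`lgamma`) avoids overflow in the binomial coefficients. Note that this treats p as known exactly, which is precisely the weakness of this first calculation.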

But we can improve on this calculation. We assumed that p was equal to its best estimate, but of course any value of p, other than zero or one, could in theory produce these data. What we need to ask is: given that 76 deaths were seen in the Def group, what is the distribution of expected deaths in the Imm group if AZT had no effect, and where does the number 96 lie on that distribution? I won't get into the details here, but if we assume that we have no a priori expectation on the probability p that a person from Def would die, then the distribution of p is proportional to the likelihood of seeing 76 deaths given p. More generally, if there are N individuals in the population and one observes k deaths, the distribution of p is proportional to the probability of seeing k deaths given p; that is, it is proportional to $\left( N \atop k \right) p^k (1-p)^{N-k}$. The normalisation factor is obtained by integrating from 0 to 1. The probability of seeing $K$ deaths in the Imm trial, if AZT had no effect, is $\left( N \atop K \right) p^K (1-p)^{N-K}$, averaged over all values of $p$ with the preceding probability distribution for $p$. If we do the math, we get the following distribution for $K$:
$P(K) = (N+1) \left(N \atop K \right) \left(N \atop k \right) \frac{(k+K)! (2N-k-K)!} {(2N+1)!}$
(We have for simplicity assumed the total number of patients to be the same in both groups, since the actual difference is small.)

If we plot this as a function of $K$, we get a bell-shaped curve as follows:

The red line is the number 96 that were observed in the Imm group: it lies well within the "bell", and clearly it is not a significant difference.

What if we correct the numbers? There were evidently 15 unrelated deaths in the Imm group and 7 in the Def group; and 81 relevant deaths out of 862 in Imm, 69 out of 865 in Def. Taking N = 865 and k = 69, the plot is as follows:

The red line marks the observed number 81 in the Imm group, and statistically it is even less significant than earlier.

The statistics I have used date to the 19th century. What I find worrisome is that, in 1994, the scientific world was doing its best to shut Duesberg up, and marshalled its best statistics and published them in one of the most prestigious journals (Science) -- and this was the best it could do? The quoted extract above, claiming that 1.3% of 877 is "not statistically significant", is so horrifying to me that I have to wonder: what else in the biomedical literature has been "proved" with the aid of such statistics? Just to illustrate the point, here is the hypothetical case where 3 people died in the control group, and 15 died in the Imm group, out of a total of 872:

Clearly, in this case, 1.3% is statistically significant.
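For readers who want numbers rather than pictures, the distribution $P(K)$ given earlier can be evaluated directly. This is a minimal Python sketch, not the code used for the plots: the function names are mine, it assumes (as the formula does) a flat prior on $p$ and equal group sizes $N$, and it summarises each case by the one-sided probability of seeing at least the observed number of deaths.

```python
from math import exp, lgamma, log

def log_factorial(n):
    """log(n!) via the log-gamma function, to avoid huge intermediate numbers."""
    return lgamma(n + 1)

def log_binom(n, r):
    """log of the binomial coefficient 'n choose r'."""
    return log_factorial(n) - log_factorial(r) - log_factorial(n - r)

def predictive_pmf(K, k, N):
    """P(K) = (N+1) C(N,K) C(N,k) (k+K)! (2N-k-K)! / (2N+1)!"""
    log_p = (log(N + 1) + log_binom(N, K) + log_binom(N, k)
             + log_factorial(k + K) + log_factorial(2 * N - k - K)
             - log_factorial(2 * N + 1))
    return exp(log_p)

def upper_tail(K_observed, k, N):
    """Probability of K_observed or more deaths, given k deaths in the other group."""
    return sum(predictive_pmf(K, k, N) for K in range(K_observed, N + 1))

# (observed deaths in Imm, deaths in Def, group size), as quoted in the text.
cases = {
    "uncorrected (76 vs 96 of 872)": (96, 76, 872),
    "corrected (69 vs 81 of 865)": (81, 69, 865),
    "hypothetical (3 vs 15 of 872)": (15, 3, 872),
}
for label, (K_observed, k, N) in cases.items():
    print(f"{label}: P(K >= {K_observed}) = {upper_tail(K_observed, k, N):.4f}")
```

The same difference of twelve deaths gives very different tail probabilities in the three cases, which is why quoting "1.3%" on its own says nothing about significance.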

The point here is that statistics is not a trivial task. According to Jaynes, the large majority of "orthodox" 20th-century statisticians got things very wrong. But even where orthodox statistics are applicable, it is not a task to be done mechanically or unthinkingly. It is not fair to expect a biological, medical or clinical researcher to be an expert in this field. Biomedical journals routinely ask reviewers whether expert statistical reviews of manuscripts are necessary. Despite that, I wonder how much bad statistics slips through, and how much damage it causes.

Physicists usually do not undergo serious courses in statistics in their education, and don't commonly use orthodox statistical tests. Jaynes observes in his book that this is a good thing: the gut instinct of a physicist is often a better measure of significance than the "ad-hockery" of orthodox statistics. His solution is to start, in all instances, with the basic laws of probability theory and approach hypothesis testing as a Bayesian problem. This is not usually an easy task, but it is necessary.

Friday, April 02, 2010

Congratulations, Simon Singh

On an important legal victory. Sued for libel by the British Chiropractic Association, he earned a ruling that his speech was "fair comment", and the judges added that the court was not the place to settle scientific questions.

He faces further court action but this is a significant ruling. However, as he says, "It is extraordinary this action has cost £200,000 to establish the meaning of a few words." The fight to change British libel laws continues.

Wednesday, March 31, 2010

Obama and the art of compromise

Drill, baby, drill. Along "vast" stretches of the Atlantic coastline of the US.

The proposal — a compromise that will please oil companies and domestic drilling advocates but anger some residents of affected states and many environmental organizations — would end a longstanding moratorium on oil exploration along the East Coast from the northern tip of Delaware to the central coast of Florida, covering 167 million acres of ocean.

That's how the New York Times defines compromise. Please the oil companies and the politicians in their pockets, anger the residents and environmentalists. Can't please 'em all!

Friday, March 26, 2010

Your company introduces a car priced at Rs 1 lakh (US\$ 2500) and becomes internationally famous. A few months after production starts, a customer's brand-new car goes up in flames on the way home from the dealer. The family is too traumatised to consider buying another car. How do you react?

"We regret the inconvenience" -- hm, seems a bit inadequate.

"I fell off a bicycle when I was a kid, but later rode it, overcoming my fear" -- not an improvement.

I'm waiting to see where it goes from here. For me, the good name of the Tatas -- and it was a very good name at one time -- has been permanently tarnished by Tata Indicom (I'm a former customer) and Tata Motors (I'm not a customer but never seem to hear good things about their vehicles); and especially by their antics in Singur and Ratan Tata's subsequent cosying up to the butcher of Gujarat.

Thursday, March 18, 2010

Odds are, it's wrong

That's the title of a very interesting article at Science News, arguing that bad statistics is the dirty secret of science. I believe it is.

Though there is an entire field of physics called "statistical mechanics", the statistics there don't go beyond the 19th century. To this day, physics undergraduate and graduate programmes cover statistics minimally, or not at all. Perhaps it seems unimportant to theorists, but it is crucially important in testing hypotheses, which is what experiments claim to do. Or perhaps hypotheses in physics are sufficiently clear-cut, and experimental data sufficiently clean, that sophisticated hypothesis testing is not necessary.

In other fields, hypotheses are murky and plentiful, data are noisy and ambiguous, but the practitioners are still ignorant of statistics. When the field is medicine, and the question is of new drugs or therapies, it is a crucial matter.

Famously, Sir Roy Meadow -- creator of the discredited "Munchausen syndrome by proxy" hypothesis -- sent several mothers to jail with his expert evidence based on bogus statistics. The consequences of bad statistics may not always be equally bad, but if the medical literature is as riddled with them as recent articles suggest, the cumulative effect may be worse than anything Meadow did.

But bad statistics in the medical literature is just the starting point: there are problems throughout the practice of standard medicine. This is why, though I respect mainstream medicine and regard much "alternative medicine" as fraudulent and the rest as of very limited (and unvalidated) applicability, I was sufficiently annoyed by this post by Orac to leave this comment. (See also other comments there on dubious practices in the health industry.) I think Orac does, in theory, a great service by pointing out peddlers of pseudoscience and exposing their ignorance and, often, fraudulence. In practice, he preaches to the converted and, I suspect, antagonises nearly everyone else.

Wednesday, March 17, 2010

Airtel wants me to chat with young boys and girls

I saw this message after sending a text from my Airtel mobile. Words fail me.

In case the text is unreadable, it says: "SMS cost 0.30 INR Bal 452.20 INR. Mobile Chat! Call 543216 & chat with young boys & girls from Bangalore, Mumbai, Delhi, and Chennai." It continues, if you scroll down, "Charges at Rs 2/min." The blue dot is the light of my computer's webcam -- the only available device I had to capture the message before it vanished.

Maybe I should give Airtel the benefit of the doubt and assume they're not promoting paedophilia -- but it sounds plenty sleazy regardless. But, just in case they are promoting paedophilia, I'm alerting some activists, as well as putting up this blog post.

Tuesday, March 16, 2010

A new way to detect magnetic monopoles!

I have previously posted on monopoles and on homoeopathy, so it is only fitting that I post this gem of a scholarly paper. (Seen on Orac)

From the abstract:

In previous articles by this author and his colleagues in the Journal of Alternative and Complementary Medicine, it has been shown that physical reality consists of two uniquely different categories of substance, one being electric charge–based while the other appears to be magnetic charge–based. Normally, only the electric atom/molecule type of substance is accessible by our traditional measurement instruments. We label this condition as the uncoupled state of physical reality that is our long-studied, electric atom/molecule level of nature. The second level of physical reality is invisible to traditional measurement instruments when the system is in the uncoupled state but is accessible to these same instruments when the system is in the coupled state of physical reality... Part II of this article (in a forthcoming issue) explores the thermodynamics of complementary and alternative medicine (CAM) through five different space–time applications involving coupled state physics to show their relevance to today's medicine: (1) homeopathy; (2) the placebo effect; (3) long-range, room temperature, macroscopic-size-scale, information entanglement; (4) explanation for dark matter/energy plus possible human levitation; and (5) electrodermal diagnostic devices.

Yes, that's exactly what was missing in the physicists' picture: "a coupled state of physical reality." (Needless to say, Maxwell's equations, which suggest that a magnetic monopole -- if it existed -- would be rather easy to detect, must be wrong too.)

Friday, February 26, 2010

On mastery and singlemindedness

Of late I find myself getting into several discussions on "mastery". One example was here, where the topic under discussion was poetry, and my opinion was this: "To break the rules you need to know the rules. I'd say you need to do more than know the rules: you need to master the rules." (I also promised a longer write-up on my views on the subject, but this is not that write-up: it's more of a trial run.)

I don't claim to be an expert in poetry, but I think this principle applies widely. I heard it from a classical guitarist in Bangalore who had a most unorthodox posture, and would say "I'm sitting like this because, first, I have a physical problem with the standard posture, and second, I know what I am doing. If you are learning the instrument, you had better hold it the standard way." In science, there are many examples of scientists with mastery of the subject breaking rules -- the Dirac delta function being perhaps the best known -- but an average scientist who breaks rules is likely to produce crackpot research.

Here, however, I want to talk about a different question: does mastery of a field imply exclusion of ability, or interest, in other fields? The specific motivation is Sunil Mukhi's post today on mastery. He expresses his skepticism on the current scientific/academic trend favouring "interdisciplinarity" and "being a well-rounded individual" and "all that", and adds that "serious achievement requires concentration, knowledge, technique and depth."

Now, there is absolutely no doubt about that. Achievement in any field requires all of the above. But he cites as his example Sachin Tendulkar, saying that Sachin "single-mindedly focuses on what he does best" and suggesting that he has no interest in any other form of expertise.

But in Sachin's case this is not true. He is a fine bowler. To date he has 154 ODI wickets and 44 Test wickets, but those figures don't reveal his value: he is not called to bowl long spells as specialist bowlers do, but as a change bowler to break up a well-set partnership, and his success rate there is extraordinary. He seems to extract as much turn, sometimes, as Shane Warne or Muttiah Muralitharan. I am convinced that if he had applied a part of his batting focus to bowling, though he wouldn't have been the greatest batsman in history, he would have been by far the greatest allrounder -- greater than Garry Sobers. He is also an outstanding fielder. As for other sports: very few sportsmen -- in Tendulkar's class or not -- attempt more than one sport professionally, but I am sure Tendulkar has an amateur interest in several other sports. In particular, he has been photographed playing table-tennis (with concentration writ large on his face).

I have a big problem with the view, widespread in India, that mastery in one field requires exclusion of interest in other fields. Many Indian parents discourage their children from pursuing any other activity during the dreaded Board exams: anything other than study is viewed as a distraction. I read the complete Sherlock Holmes, cover to cover, and I don't think my results suffered. Nearly all great scientists that I can think of have had strong interests in other fields, and not just in other sciences. Far from distracting them, I think it has strengthened their primary work -- even if they never went fully "interdisciplinary".

Which brings me to Sunil's other example: Srinivasa Ramanujan. Says Sunil:

Recently a colleague, talking about his institution's undergrad admissions process, observed that "with the kind of breadth requirements we have, one wonders if Ramanujan, who only knew mathematics, would even get admission". That's basically my point, and I think Sachin's achievement validates it.

But Ramanujans are very rare and not replicable. I'd like to think that if a Ramanujan showed up at my institute, or Sunil's, his ability would be immediately recognised by the scientists there and we would make every effort to help him bypass the usual educational requirements. But it is terrible advice to a young mind to try and become a Ramanujan. Such a creature comes along once a century, or even more rarely.

Most of the great Indian scientists I can think of were multidisciplinary. Visvesvaraya had an extraordinary range of civil engineering achievements, from irrigation to flood protection to roadways. Jagadish Chandra Bose made significant contributions to plant physiology, membrane biophysics, and other fields, and is now recognised as Marconi's predecessor in wireless communication. C V Raman made contributions in light scattering, acoustics of musical instruments, crystal dynamics and properties. Subrahmanyan Chandrasekhar was famous for switching fields every ten years and achieving mastery of the new field: he wrote classic books on stellar structure, stellar dynamics, radiative transfer, plasma physics, and hydrodynamics. Yet Ramanujan seems to capture the popular imagination much more than these figures. His is a unique and romantic story, but should not be held up as an example to follow. He is not someone who broke the rules after first having mastered the rules: he seems to have never learned the rules, but achieved mastery all the same.

To me, "mastery" does not imply "singlemindedness". Nor does it imply remaining in the same field all one's life. And, in fact, I think Sachin Tendulkar is an excellent example of the former point, and I suspect he will continue to be an important figure in whatever he chooses to do after he retires from cricket.

Sachin Tendulkar is no Ramanujan. He has natural talent, yes, but is the product of a fine coach (Ramakant Achrekar), a school system that has produced many other fine cricketers, and, of course, his own hard work and study. Ramanujan barely knew how he produced his own results (which he largely supplied without proof, keeping mathematicians busy for the following century), and often attributed his insights to the Goddess Namagiri. Tendulkar's achievements are the results of extremely conscious hard work, and he is eminently worthy of emulation.