
The role of Mahalanobis in emergence of Modern Statistics

Posted on Dec 27, 2017 in Insights, Thoughts

This article started as simply a transcript of a lecture by Stephen Stigler titled “Mathematical Statistics as a Global Enterprise” at the Indian Statistical Institute (ISI) Delhi, commemorating the 125th birth year of Prasanta Chandra Mahalanobis. As I ruminated on what Stigler had to say about Mahalanobis and his role in the history and development of statistics (as a global enterprise), I started drawing parallels between the singular events in the life of Mahalanobis and those of other scientists, and inevitably, singular events in the history of statistics and the history of other sciences.

The result, then, is my crude attempt at crystallising the motivations that drove statisticians like Mahalanobis (and by extension, generations of statisticians) and understanding the phenomena that cemented statistics as the foundation of the age of information. As I emerge from the rabbit hole, I regret that little of the original lecture is left with me. It has largely been replaced by a web of thoughts. However, the University of Chicago Centre in Delhi (who organised the lecture in collaboration with ISI) will soon publish a recording of the original lecture, to which I will provide a link here, whenever it is available.


Making of a Scientific Revolution

In his brilliant TEDx talk, mathematician and historian Rohit Gupta (also known in the blogosphere as Compasswallah) asks “How do you create a scientific revolution?” After providing several examples, he answers, “Look at history itself and ask — where were the peaks of scientific awakening and what were the conditions and contexts that led to them?” The answer to this question is not easily apparent. However, to a studious mind, it is possible to reasonably trace the origins of a scientific achievement to one or a small number of very specific events.

As an example, consider the fact that you can read this article (written on a cold December evening in a dimly lit room in South Delhi, on a terribly slow internet connection) anywhere in the world with no more than a tap of your finger. We, dear reader, are communicating across time and space. Even this remarkable technological feat can be traced back to a singular time and place in history.

In December 1947, a trio of scientists at Bell Labs invented the transistor (announced publicly in 1948), a purely non-mechanical, electronic switch which could, unlike conventional switches, be turned on or off with nothing more than currents and voltages. The impact of this idea, which I have greatly oversimplified, on the world was unprecedented. The transistor won John Bardeen, William Shockley and Walter Brattain the Nobel Prize in Physics in 1956. (Bardeen would win another Physics Nobel in 1972, and remains the only person to have won it twice.) This transistor, however, was only part of the puzzle.

Across the corridors of Bell Labs, a 32-year-old mathematician, Claude Shannon, had been working on what would become the mathematical theory of communication. The transistor was the hardware, but Shannon’s theory would define the very limits of transfer of information. Shannon’s theory is so rich and seminal that it fills communication engineering curricula to this day. The entire edifice of digital communication rests on these breakthroughs: the hardware, and the math which ultimately dictates the software of digital communication. It is almost unbelievable that such a fortuitous confluence of researchers happened at Bell Labs in 1948. For a more detailed and highly readable history of the information age, read James Gleick’s The Information.

Examples of great collaborations can be found in physics and astronomy too. The power of Newton’s laws shows itself most prominently in the discovery of Neptune. Neptune’s position was mathematically predicted before it was directly observed, based primarily on the law of gravitation. Newton’s Principia Mathematica (specifically, the laws of gravity) benefited greatly from a long collaboration with Edmund Halley, the astronomer famous for predicting the return of the comet that now bears his name. Halley would sail off on scientific expeditions over the oceans, and it is believed that his readings of the positions of planetary bodies provided Newton with the data necessary to postulate the laws of astronomical motion, and ultimately, the law of gravity.

A similar and equally serendipitous collaboration (which also survived troubled waters) took place in statistics between the two world wars, between Ronald Fisher and P C Mahalanobis. This was an event that would change statistics forever.


Prasanta Chandra Mahalanobis — The Early Years

Precocity is overrated. We have come to overvalue this trait in students because of the hyper-competitive age we live in. It is true that many exceptionally smart people do show early signs of genius, but it is equally likely for remarkable individuals to have been thoroughly unremarkable in their youth. Mahalanobis was amongst the latter. He had quite an uneventful stint at Presidency College, Calcutta, other than the fact that he might have been taught by J. C. Bose, and that Netaji Subhas Chandra Bose was two years junior to him. At Cambridge, he interacted with Srinivasa Ramanujan. Ramanujan left anecdotes with everyone he met, and Mahalanobis recounts a very interesting one here. Nevertheless, there still was no foreshadowing. On his voyage back to India, he is known to have read all nine volumes of Biometrika, a revered statistical journal which counted Karl Pearson among its founders. While there was no precocity, Mahalanobis certainly did not lack tenacity and determination. After discovering the applications of statistical techniques to meteorology and anthropology, he would work on these problems for the greater part of his life, particularly in sampling and multivariate analysis.

David and Goliath

One of the characteristics of successful scientists is having courage. Once you get your courage up and believe that you can do important problems, then you can. If you think you can’t, almost surely you are not going to. Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn’t know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, “What would the average random code do?” He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts?

— Richard W Hamming (You and Your Research)

In 1923, Mahalanobis wrote a manuscript titled “On the Seat of Activity in the Upper Air”. It was a critique of the work of William Henry Dines, an independently wealthy English meteorologist who had invented the pressure tube anemometer. It was a device that could be mounted under a hot air balloon and, when deployed, would fly around with the balloon, recording air pressures at various altitudes. Following his experiments with the device, Dines inferred that the pressure at a height of 9 kilometres above sea level (the so-called “seat of activity”) is what affects pressure changes throughout earth’s atmosphere to the greatest degree. Mahalanobis examined Dines’ data from a rigorous statistical standpoint, and ended up estimating that the layer of atmosphere between 2 km and 4 km was statistically more telling than the 9 km estimate reported by Dines. Mahalanobis’ primary contention was that Dines had ignored the sequential nature of the measurements taken as the anemometer ascended with the balloon, which led to a false statistical correlation.

The manuscript did get published in the Memoirs of the Indian Meteorological Department, but was sharply criticised by Dines himself in a 1923 issue of Nature.

W H Dines’ response to Mahalanobis’ “Correlation of Upper Air Variables” in a 1923 issue of Nature.

In response to Dines’ letter, Mahalanobis said that Dines had still not addressed the original issue. It is difficult to tell if the feud was ever settled, but Mahalanobis’ apparent “effrontery” stands out prominently in this whole affair. He did not know that he was not supposed to take on a titan like Dines.

Collaboration with Fisher

The single longest collaboration of Mahalanobis’ life was with Ronald Fisher (who, among other things, introduced the immensely famous Iris flower dataset to statistics). Mahalanobis and Fisher were at Cambridge at around the same time, but strangely they never ran into each other. They did correspond frequently.

Mahalanobis’ initial work on meteorological data had a great impact on studies related to weather, rainfall, soil conditions and ultimately agriculture. Statistics was being recognised in India as a key discipline within anthropological studies as well. Mahalanobis’ work on anthropological data led to methods that could be applied to the classification of populations characterised by anthropological measurements. The measure at the heart of these methods came to be called the Mahalanobis distance (it is essentially a way of measuring the distance between a point and a set of points, with the set treated as a statistical distribution).
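To make the idea concrete, here is a minimal sketch in Python. It assumes the usual modern definition, d = sqrt((x - m)^T S^-1 (x - m)), where m is the mean of the sample and S its covariance matrix, and it is restricted to two variables so that the matrix inverse can be written out by hand. The function name `mahalanobis_2d` and the toy data are purely illustrative; they are not taken from Mahalanobis’ papers or from the lecture.

```python
from math import sqrt


def mahalanobis_2d(point, sample):
    """Mahalanobis distance from `point` to the 2-D distribution estimated from `sample`.

    `sample` is a list of (x, y) pairs; `point` is a single (x, y) pair.
    Illustrative helper only, limited to two variables.
    """
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n

    # Entries of the sample covariance matrix S (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)

    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; the distance is undefined")

    dx, dy = point[0] - mx, point[1] - my
    # d^2 = (p - m)^T S^-1 (p - m), with the 2x2 inverse of S expanded explicitly.
    d_squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(d_squared)


# Toy data in which x and y tend to rise together; the mean is (1, 1).
sample = [(0, 0), (1, 1), (2, 2), (0, 1), (2, 1)]

# Both points are equally far from the mean in the ordinary (Euclidean) sense.
along = mahalanobis_2d((2, 2), sample)   # follows the trend in the data
across = mahalanobis_2d((2, 0), sample)  # cuts against the trend

print(f"along the trend:  {along:.3f}")
print(f"across the trend: {across:.3f}")
```

The two test points are the same Euclidean distance from the mean, yet the one that cuts against the correlation in the data comes out much farther away. That sensitivity to the shape of the distribution is what makes the measure useful for deciding which population a set of measurements most plausibly belongs to.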

In November 1929, Mahalanobis submitted this work to the coveted Biometrika, but Karl Pearson (a co-founder of the journal) rejected his submission without comment. Mahalanobis wrote to Fisher about this, and the latter sympathised, recommending that Mahalanobis make the submission to the journal of the Royal Anthropological Society. However, the Royal Anthropological Society too rejected the submission because they thought it was being considered for publication in Biometrika.

When Fisher found out about this, he thought Mahalanobis and Biometrika had struck a deal behind his back (he was known to be a touchy person, and would often be easily offended). When confronted, Mahalanobis sent a five-page reply, explaining that he had actually sent two papers to Biometrika at the same time. Pearson had rejected only one of them, and had asked for an abridged version of the second. By ‘abridged’, he meant that Biometrika would only publish the data from the second paper, not the analysis. As far as Fisher was concerned, Mahalanobis had lied by omission, and he suggested that Mahalanobis continue entirely on his own, without involving Fisher.

Stephen Stigler shows how Fisher’s salutations in letters to Mahalanobis changed over the years.

Fisher and Mahalanobis had their ups and downs, but their partnership was to become a cornerstone of modern statistics.

The Fisher Lectures

The Indian Statistical Institute was founded in a small classroom in Presidency College in 1931. At this time, it was barely more than a society. Early sessions of the society consisted mostly of Mahalanobis delivering talks on his work and various related subjects. Sankhya, the Indian journal of statistics, was founded two years later. The first Indian Statistics Conference was held in 1938, with none other than Ronald Fisher presiding. The then Viceroy of India was also in attendance.

Over the coming years, Fisher visited India no fewer than eight times, and his lectures during these visits are the stuff of legend. He spoke about the theory of estimation, a topic on which he had authored many papers, the best known of which is On the Mathematical Foundations of Theoretical Statistics (a very readable summary of Fisher’s work in the early 20s has been published by Stigler). He also lectured about the design of experiments, and the theory of genetics, a topic that had captured the imagination of the best statisticians of the time.

Fisher saw these lectures as a chance to sum up his life’s work. But he was known to be a bad lecturer. When asked by an unsuspecting student to use better words, he famously said, “Young man, these are the best words!” Mahalanobis, however, was very well prepared for this. He made sure that his students (most prominently R C Bose and K R Nair) had done their homework on Fisher’s work well before the lectures happened. They would spend months studying the lecture material beforehand. The collection of these lectures was later turned into a book published by the University of Calcutta, which is one of the most definitive texts in modern statistics.

The Building of an Enterprise

The true contribution of Mahalanobis is that he brought together, with immense success, a confluence of people and ideas. Because of this alone, he succeeded on a grander scale, and against many more challenges, than Pearson or Fisher did. In his book, The Seven Pillars of Statistical Wisdom, Stigler writes,

With all the variety of statistical questions, approaches and interpretations, is there then no core science of statistics? If we are fundamentally dedicated to working in so many different sciences, from public policy to validating the discovery of the Higgs boson, and we are sometimes seen as mere service personnel, can we really be seen in any reasonable sense as a unified discipline, even as a science of our own?

This is an identity crisis that many statisticians must have suffered. Mathematical statistics was indeed once viewed simply as a service; remember that Mahalanobis’ contribution to Biometrika was valued only for its data, not its analysis. Today, statisticians are indispensable, and the AI revolution is, at its core, a victory of statistical analysis. We have come a long way, and few people can claim to be as instrumental to the development of statistics into a global enterprise as Prasanta Chandra Mahalanobis.

Jaidev Deshpande

Data Scientist and Software Developer. Jaidev specialises in building end-to-end machine learning applications to automate business processes.
