Principles of Data Analysis

Published by Cappella Archive, a micropublisher. (Why a micropublisher?)

The text (700 kB) can be downloaded free, in your choice of A4 size, letter size, or paperback size.

The paperback can be ordered directly from the publisher or (slightly more expensively) from Amazon UK. It is beautifully produced but doesn't cost much more than laser printing.

Read on for the cover, reviews, an excerpt, and errata.

From the Cover

Cover text by Nigel Dowrick, illustration by Rebecca Thorn.

This is a short book with a lot in it. As the title says, its topic is the principles of data analysis. The emphasis is on why things are done rather than on exactly how to do them. If you already know something about the subject, then working through this book will probably deepen your understanding. The book begins by identifying four general classes of data analysis problem, and uses elementary probability along with Bayes' theorem to explain exactly what each involves. The next two chapters use some simple distributions to illustrate these ideas. Further chapters discuss the Monte Carlo method (briefly), least-squares fitting (in some detail), and the problem of determining a distribution function from data. The book ends with an interesting pair of chapters on entropy: one on the maximum entropy method, and one actually about thermodynamics.

A major strength of the book is the inclusion of genuinely interesting worked examples. These deal with real problems and the way in which the author tackles them gives the reader confidence that they, too, could do likewise. The opportunity for this comes from a series of equally interesting questions (solutions or heavy hints kindly provided). There are also plenty of references in footnotes should the reader decide that they wish to know more about a particular topic.

As I've already said, the book is not aimed at absolute beginners. In general it's written for those who already know something about the subject, and want to understand more. Such people will not be disappointed. However, even beginners could learn a lot from Chapter 1, parts of the other chapters, and the examples. Given that it's available free (in electronic form) and that the author's web site isn't hard to track down(!) I'd recommend anyone who thinks they might be interested to download it. It's a small download (less than 700 kB) so what have you got to lose?

Published reviews

Physics Today writes:

....provides a fresh, succinct view of data analysis at a level suitable for working physicists, graduate students, and very advanced undergraduates....

The Observatory Magazine (Dec 2003) writes:

This neat little book was developed from a taught course, and what a good course it must have been if this text is a fair reflection---rigorous yet informal; pedagogical but not pedantic; fundamentally theoretical in character, but studded with examples, problems (with hints and answers!), and illustrative applications....


A book called ``Principles of Data Analysis'' might normally be considered a useful substitute for Mogadon. However, Prasenjit Saha has instead produced a delightful and lively book, likely to keep one awake at night tussling with the thoughts it provokes. Instead of serving up a wearisome account of Bayesian versus non-Bayesian theologies, he leads us on an entrancing path through a field dotted with charming examples - including ice-hockey pucks sliding around, the distribution of taxi numbers, the weirdness of quantum mechanics, and share option trading. It is illuminating and refreshing to see connections between these disparate subjects. Data analysis is now dangerously close to being fun.

Eric Grunwald
Managing Director, Perihelion Ltd (UK)

I thought it was great! Clear and concise, an eclectic but focused set of topics, very compact and elegant derivations, and great problems (several of which I tried!). I shall certainly keep it in mind for students.

Scott Tremaine
Professor and Astrophysics Department Chair
Princeton University

While this text is probably too concise for a novice, it is a useful supplement for those with some familiarity with the subject. It is well written, being both easy and entertaining to read, and contains many interesting examples and problems (some of which are quite challenging).

D.S. Sivia
Rutherford Appleton Laboratory (UK)
Author of ``Data Analysis: A Bayesian Tutorial''

If you would like to understand the root principles of data analysis, come to grips with illuminating problems, and enjoy the whole experience, then you could not do better than to read Prasenjit Saha's short book.

Matthew Colless
Director, Anglo-Australian Observatory

Your book is a good example of a philosophy which seems to be peculiar to astronomy which draws on so many disciplines, to transform the areas it uses. In other words, statistics is too important to be left to the statisticians and one should have a shot at looking at it through other spectacles!

Rajaram Nityananda
Director, National Centre for Radio Astrophysics (India)

I thought this book was extremely useful. On the one hand it is very concise and would probably need supplementation for a course in data analysis and statistics. On the other hand, the problems are extremely good and valuable. I am going to use this book in Fall 2003 for a graduate class on statistics and Bayesian statistics.

John Tonry
Professor, Institute for Astronomy
University of Hawaii

This is a clear and well-written book, that explains various aspects of data analysis from a Bayesian perspective. The examples and problems are extremely well chosen. Although the book is too concise to serve as a first exposure for an average student, it should be very useful for an advanced reader or with supplemental material.

Onuttom Narayan
Professor of Physics
University of California, Santa Cruz

Excerpt from the preface

What this book hopes to convey are ways of thinking (= principles) about data analysis problems, and how a small number of ideas are enough for a large number of applications. The material is organized into eight chapters:

(1) Basic probability theory, what it is with the Bayesians versus the Frequentists, and a bit about why quantum mechanics is weird (Bell's theorem).

(2) Binomial and Poisson distributions, and some toy problems introducing the key ideas of parameter fitting and model comparison.

(3) The central limit theorem, and why it makes Gaussians ubiquitous, from counting statistics to share prices.

(4) An interlude on Monte-Carlo algorithms.

(5) Least squares, and related things like the chi^2 test and error propagation. [Including the old problem of fitting a straight line amid errors in both x and y.]

(6) Distribution function fitting and comparison, and why the Kolmogorov-Smirnov test and variants of it with even longer names are not really arcane. [Sample problem: invent your own KS-like statistic.]

(7) Entropy in information theory and in image reconstruction.

(8) Thermodynamics and statistical physics reinterpreted as data analysis problems.

As you see, we are talking about data analysis in its broadest, most general, sense.


Errata

The following errors in the printed book are corrected in the online version.

In Chapter 5, equations (5.19), (5.24) and (5.27) erroneously raised the determinant of the covariance matrix to the L/2 power; it should simply be square-rooted. (Thanks to John Tonry for this correction.)
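
For orientation, the standard Gaussian integral shows why a square root is the expected power. This is only a sketch under an assumption: that L denotes the number of dimensions being integrated over and C the covariance matrix. The book's equations (5.19), (5.24) and (5.27) are not reproduced here and may be arranged differently.

```latex
\int \exp\!\left(-\tfrac{1}{2}\, x^{T} C^{-1} x\right) d^{L}x
  \;=\; (2\pi)^{L/2}\, |C|^{1/2}
```

In this identity the L/2 power attaches to the factor of 2π, while the determinant |C| appears only under a square root, whatever the value of L.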

Just after equation (M.12), a reference to it in the text was mislabelled as (M.11).

Equation (M.21) confused z and 2.