Sunday, March 3, 2013

Lies, damned lies, and statistics

In my experience, most PhD students in engineering and computer science must use quantitative data analysis and statistical techniques at some point in their research to evaluate and validate experimental data. Increasingly, social science researchers use these techniques as well, mainly through well-established software packages such as SPSS, R-Statistics, and others.

I have to say that most papers I read, even the relatively theoretical ones, have a strong empirical data analysis component. It is easy to lose sight of the problems associated with the statistical evaluation of empirical work, so I thought it would be a good idea to remind my readers and myself of some common pitfalls of statistical techniques:


  • Discarding unfavorable data

  • Loaded questions

  • Overgeneralization

  • Biased samples

  • Misreporting or misunderstanding of estimated error

  • False causality (demonstrated in the first sketch after this list)

  • Proof of the null hypothesis

  • Data dredging (demonstrated in the second sketch after this list)

  • Data manipulation
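
To make two of these pitfalls concrete, here are a couple of minimal Python sketches using NumPy and SciPy (the scenarios and variable names are purely illustrative, not drawn from any particular study). First, false causality: two independent series that merely share a time trend will look strongly correlated, even though neither influences the other.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Two independent yearly "measurements" that both trend upward,
    # e.g. ice-cream sales and drowning counts. Neither causes the other.
    n_years = 50
    trend = np.arange(n_years, dtype=float)
    series_a = trend + rng.normal(scale=5.0, size=n_years)
    series_b = trend + rng.normal(scale=5.0, size=n_years)

    r, p = stats.pearsonr(series_a, series_b)
    print(f"Raw correlation: r = {r:.2f} (p = {p:.1e})")  # strong and "significant"

    # Removing the shared trend (here by differencing) makes the spurious
    # relationship disappear: the year-to-year changes are unrelated.
    r_d, p_d = stats.pearsonr(np.diff(series_a), np.diff(series_b))
    print(f"After differencing: r = {r_d:.2f} (p = {p_d:.2f})")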
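
Second, data dredging: test enough hypotheses on pure noise and some will come out "significant" at the usual 0.05 level by chance alone.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # One hundred candidate variables, none of which truly differs
    # between the two groups: every value is pure noise.
    n_variables, n_per_group = 100, 30
    false_positives = 0
    for _ in range(n_variables):
        group_a = rng.normal(size=n_per_group)
        group_b = rng.normal(size=n_per_group)
        _, p_value = stats.ttest_ind(group_a, group_b)
        if p_value < 0.05:
            false_positives += 1

    print(f"'Significant' results on pure noise: {false_positives} of {n_variables}")
    # Roughly 5 spurious hits are expected at alpha = 0.05; a multiple-comparison
    # correction such as Bonferroni (alpha / n_variables) removes almost all of them.

Reporting only the handful of variables that cleared the 0.05 bar, while staying silent about the dozens of tests that did not, is precisely data dredging; correcting for multiple comparisons, or validating findings on a fresh dataset, guards against it.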

Wikipedia is a good source on each of these. A good statistics textbook, combined with some lighter reading, will also help.

There are also calls for researchers to make their code and datasets publicly available so that experiments can be repeated independently. This is increasingly becoming common practice, especially at high-profile journals and conferences, but releasing datasets and code bases still raises numerous practical issues.
