(or: Methodologists, atone!)
By far my favorite book on research methods, Unobtrusive Measures (by Eugene Webb and colleagues, first published in 1966), is a skeptical romp through social science where the authors take the position that most of what we call social science is wrong. The theme of the book is that research is likely wrong because research design is very difficult and researchers too easily substitute received wisdom and procedure for hard thinking about designing studies, experiments, measures, tests, and so on. Scientific conduct has a rote character that extensive training and preparation (e.g., making you get a Ph.D.) can reinforce. Peer review and the tenure system can be engines of conservatism.
So you perform a survey in which you ask a particular question of a particular group not because it means something as evidence or it is a particularly good idea. You do it because your advisor did it that way, or someone else (cite, year) did it that way and it is therefore respectable. And if someone did it before, it’s comparable. This is perfectly reasonable. It’s likely you are interested in a particular problem, but not really in the methods or statistics relevant to tests related to that problem, so you offload all of the thinking about statistics by performing the methods and statistics that everyone else does. It’s efficient.
Yet when you stop and actually think about the intricacies of any particular research design, it gets ugly. Einstein said, “Theory is something nobody believes, except the person who made it. An experiment is something everybody believes, except the person who made it.” For decades (since even before Webb in 1966), various cranky types have been alarmed at the misuse of quantitative research.
My own struggles with the topic led me to design a graduate course called Unorthodox Research Methods. The premise is this: Most research courses teach procedure, but we need to train our students to think about research design and evidence first and we are not doing a good job of that. (I’m revising the syllabus for this course and so I’m thinking about these issues again, hence this post.) The Internet is making us rethink many of our research methods, and Webb’s 1966 critique has never been more apt.
A blog post is only big enough for one example, so here’s a big one: A huge pitfall in our procedure-based methods education is the use of statistical significance. Even non-quants are familiar with those nagging asterisks that appear after all sorts of columns in all sorts of journal articles across the social sciences. Statistical significance is the end of conversation about method in many research projects. Once p < .05, you pack up your kit and go home. Why do you test significance this way? Because it’s a step in your list of steps. I think it is fair to say that most researchers have internalized this approach despite the fact that it is totally wrong and the statistics literature has railed against it for decades.
Just so we are clear: statistical significance is often useless – it’s not even a hint toward the right answer for your research project in many situations. Luckily for the truth, the rise of the Internet is about to cause this test to blow up in our faces. We have taught statistics so badly in the social sciences that most researchers do not appear to realize that the test of significance is about sampling. (Bam!) It is a test that helps you figure out whether the pattern you see could be nothing more than an accident of the small sample that you’ve got. And our samples are now changing.
Thanks to the Internet and our ongoing revolution in computing we are entering the era that the UK calls e-Social Science, and here in the US we call Computational Social Science. Fast processors. Big iron. Big datasets. Many variables.
Data from the cloud now potentially lets us test all kinds of social science questions (particularly if you are interested in human communication) that before would have by necessity sat in a small sample questionnaire. As social scientists turn toward “big data” they are going to trip over their bad habit of significance testing. The fact is, most methods courses and research procedures in wide use are obsessed with errors caused by sampling, especially small sample sizes. (Bam!) But as a sea of digital data opens up to the horizon, our problems are increasingly about specification error and not sample sizes, just as measures are increasingly unobtrusive and not self-reports.
Remember, statistical significance is about sampling. “Except in the limiting case of literally zero correlation, if the sample were large enough all of the coefficients would be significantly different from everything.” (McCloskey, p. 202). Take your study of communication patterns from 60 paper-and-pencil questionnaires and replicate it with a random sample of a million Facebook accounts (if you can get access… see this editorial). You’ll find that statistical significance, particularly at the arbitrary point of p < .05, tells you zip.
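If you want to see the sample-size point in numbers, here is a minimal sketch in Python. It is my illustration, not anything from McCloskey: the correlation of 0.01 is a made-up effect, the function name is hypothetical, and the p-value comes from the standard Fisher z approximation rather than an exact test.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: the true correlation is zero.

    Assumption: Fisher z-transformation with a normal approximation,
    which is a textbook shortcut and not an exact test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail area of the standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect: r = 0.01 explains 0.01% of the variance.
tiny_r = 0.01

for n in (60, 1_000_000):
    p = correlation_p_value(tiny_r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>9,}  r = {tiny_r}  p = {p:.3g}  ({verdict} at .05)")
```

The same negligible correlation is nowhere near significant with 60 questionnaires and overwhelmingly significant with a million accounts. Nothing about the effect changed, only the size of the sample.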
I think most of the solution is to de-emphasize procedure, as social science procedure is becoming much more volatile as information technology improves. We need to get people to understand that research design is a creative act, not the boring part of the research process. Students need to write new procedures, not memorize old ones. To that end, we need classes about evidence and research design. Figuring out how to do that is a challenge but we’ve got to step up to it. (If you’ve got ideas for revising the syllabus for my last attempt, send me an email or a comment.)
Chant it with me: Statistical significance does not equal substantive significance. Please chant it with me. This is something we ought to know already but it may take the big datasets of the Internet to teach us. What other lessons are in store?
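To put a rough number on the chant, the sketch below asks how small a correlation can be and still clear p < .05 at a given sample size. The function name and the sample sizes are my own illustrative choices, and the threshold again relies on the Fisher z approximation, so treat the output as ballpark figures.

```python
import math
from statistics import NormalDist


def smallest_significant_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| that a two-sided test would call significant at `alpha`.

    Assumption: Fisher z approximation, so this is an estimate
    rather than an exact critical value.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (60, 1_000, 1_000_000):
    r = smallest_significant_r(n)
    print(
        f"n = {n:>9,}  smallest significant |r| = {r:.4f}  "
        f"variance explained = {r * r:.4%}"
    )
```

With 60 respondents you need a correlation of roughly a quarter before the asterisk shows up. With a million, a correlation that explains a vanishing sliver of the variance earns the same asterisk, which is why the asterisk alone cannot tell you whether a finding matters.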
Deirdre McCloskey: Rhetoric Within the Citadel: Statistics (http://deirdremccloskey.org/docs/pdf/Article_181.pdf) and Why Economic Historians Should Stop Relying on Statistical Tests of Significance (http://deirdremccloskey.org/docs/pdf/Article_180.pdf)
John P. A. Ioannidis: Why Most Published Research Findings Are False (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/)
Jonathan A C Sterne and George Davey Smith: Sifting the Evidence: What’s Wrong with Significance Tests? (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/?tool=pubmed)