Research Replication in Social Computing

metaxas - May 1, 2012 @ 12:15 pm · Filed under Critical Thinking, Predicting, Research Replication, Social Media, Twitter

On the need for Research Replication in Social Computing

A call for replicating Social Computing results as a necessary step in the maturity of the field.

The field of Social Computing is rather new, but it has been one of the more active areas of Computer Science in the last few years. Many new conferences have been created to host the research efforts of computer scientists, social scientists, physicists, statisticians and many other researchers and practitioners. The excitement generated by the opportunities that opened up through the relative ease of retrieving large amounts of data has led many young researchers to dive in and uncover the modes of social interaction.

At the risk of oversimplifying, one could say that the research papers we produce follow the general pattern of observational sciences:

We collect data that arguably can capture the phenomenon we want to study,
we may apply some sophisticated statistical tools, test a hypothesis applying machine learning tools, and
analyze the results.

Our conclusions sometimes do not merely state the phenomenon we observed; they extend the specific findings to projections that go beyond what was observed.

One of the reasons that this approach seems familiar is that it resembles the one used in Experimental Computer Science. There, we measure the characteristics of the systems or algorithms we have built, and study their performance experimentally when exact analysis is not easy or even possible. This is a tried and true approach since, in the systems we build, we take great effort to avoid any behavior that is outside of the specifications. In the artificial worlds we create, we try to control all of their aspects, and this process has produced amazing technological results.

On the other hand, this approach may be inappropriate or incomplete compared to those used in Experimental Natural Sciences. Physicists, Biologists and Chemists would start with this approach to make initial sense of the data they are collecting, but this is just the beginning of the process. Replication of their research is normally needed to verify the validity of the original experiments. Sometimes the research results are not validated; nevertheless, even in this case the replication process provides insight into the workings of natural phenomena. Nature mostly repeats its phenomena consistently, but one may have to account for all the parameters that affect them. Sometimes this is not easy, and replication offers the best guarantee that the research findings are valid.

As we mentioned, Social Computing is now being done by researchers coming from many disciplines, but it is different from both Computer Science and the Natural Sciences. Though it has the potential of also becoming an experimental science, so far it is mostly an observational science. This, it turns out, is a very important distinction. Society is different from Nature in several important ways. Its basic building blocks are people, not atoms, molecules or chemical compounds. The complexity of their interactions is not easily tractable, to the degree that one may not even be able to enumerate all the factors that affect them. Moreover, people (and even social “bots” released in Social Media) do not behave consistently over time and under different conditions.

The closest relative to Social Computing is not Computer Science, we would argue, but Medical Science, where Natural Sciences phenomena are influenced by Social conditions. In both Medical and Natural Sciences, replication of results is considered an irreplaceable component of scientific progress. Any lab can make discoveries, but these discoveries are not considered valid until they have been independently replicated by other labs. Not surprisingly, replicating research findings is considered a critical publishing action, and researchers are getting credit for doing just that.

In Computer Science, replication has not been considered important or worthy of credit unless it reveals crucial flaws in the original research. It is unlikely, for example, that replicating Dijkstra’s Shortest Paths algorithm would contribute to the development of our discipline, and so it makes sense not to give credit to its replication. On the other hand, the inability to replicate Hopcroft and Tarjan’s tri-connected component algorithm was a significant development, and Gutwenger and Mutzel, who discovered the problem and corrected it, did receive credit for it.

We acknowledge the need for replicating Social Computing research results as a way of establishing that the patterns Social Media data reveal hold under all meaningful conditions. We believe that such research replication will give credibility to the field. Failing to do that, we may end up collecting a large number of conflicting results that could discredit the whole field.

2 Comments

B. Huberman

May 3, 2012 @ 3:52 am

1
See my recent letter in Nature:

http://www.nature.com/nature/journal/v482/n7385/full/482308d.html

metaxas

May 4, 2012 @ 7:21 am

2
Good points, Bernardo. I have incorporated them in my next posting.

When Computation met Society