At the recent CHI 2011 conference, I was asked to serve as a panelist to discuss the issue of replication of research results. As part of this RepliCHI panel, I wrote an essay arguing that replication isn't just the repetition of experiments or the rebuilding of systems, but an important step in building up greater understanding of a domain. Many panelists, including myself, were surprised when a large crowd showed up at the panel (>100 people?), ready to discuss this seemingly dry academic issue. Here is my essay, slightly edited:
One mainstream perspective on HCI is that it is a discipline built upon applied psychological science. Psychological science here refers to the understanding of mind and behavior, while ‘applied’ means the application of methods, findings, models, and theories from the psychology domain. One only has to look at the CHI annual proceedings to see that they are full of methods borrowed from Experimental Psychology, a particular approach to understanding mind and behavior based on scientific experimental methods. This approach has worked well for HCI, since computers can be seen as a kind of stimulus that is not only interesting, but could also augment cognition and intelligence.
Experimental psychology is based on the idea that, if you design the experiment and control the laboratory setting well enough, you end up with evidence to believe that the results of the experiment will generalize. These ideas around controlled experiments of course form the basis of the Scientific Method. As part of the scientific discovery process, we ask researchers to document their methodology and results, so that the work can be archived and replicated by others.
But my position is that replication is not the only goal. More importantly, if there are limitations to a study, later experiments might expand on the original to examine new contexts and other variables. In this way, the idea behind the replication and reproducibility of experiments is not just to ensure the validity of the results; it is also an essential part of the scientific dialog. After all, the reason we value research publications so much is not just that they document and archive the results of the research, but also that they let others stand on the shoulders of giants, to reproduce *and* to build on top of the results.
Take, for example, the great CHI '97 Browse Off in Atlanta, which aimed to bring together a number of hierarchical browsers to see which was the ‘best’. At the event, the Hyperbolic Browser was the clear winner. Although the event was not meant to be a controlled experiment, it was widely publicized, especially among information visualization researchers. Several years later, the comparison was replicated in a laboratory setting at PARC with the two top-performing systems from the event: the Hyperbolic Browser and Windows Explorer. Not just once, but twice, under different task conditions!
In the first experiment, the results were at odds with the Browse Off. Not only was there no difference between the browsers in terms of performance, but subject variation appeared to have more effect on the results than any other variable.
Further analyses showed that there was an interesting interaction effect between the amount of information scent available via the interface conditions and performance, with better information scent resulting in lower retrieval task times with Hyperbolic Browser.
In the second experiment, which was restricted to retrieval tasks and did not include comparison tasks, the Hyperbolic Browser was faster, and its users appeared to learn more of the tree structure than Explorer users did.
What’s interesting is that the interpretation of the results suggests that squeezing more information onto the screen does not by itself improve subjects' perceptual and search performance. Instead, the experiments show that there is a very complex interaction between visual attention and search on the one hand and the density of information on the display on the other. Under high scent conditions, information seems to ‘pop out’ in the Hyperbolic Browser, helping users achieve higher performance.
The extended example above shows that there are a number of fundamental problems with viewing experimental results as the end of a line of research inquiry. Instead, they are often the beginning. Further experiments often shed light on the complex interaction between the mind and behavior of the user and the system. Replication of results and further research efforts examining other contexts and variables are not just desirable, but an important part of the whole scientific exercise.
Lamping, J., R. Rao, and P. Pirolli, A focus + context technique based on hyperbolic geometry for visualizing large hierarchies, in CHI 95, ACM Conference on Human Factors in Computing Systems. 1995, ACM: New York.
 Peter Pirolli, Stuart K. Card, and Mija M. Van Der Wege. 2000. The effect of information scent on searching information: visualizations of large tree structures. In Proceedings of the working conference on Advanced visual interfaces (AVI ’00). ACM, New York, NY, USA, 161-172. http://doi.acm.org/10.1145/345513.345304
A great article about this issue just came out at the NYTimes over the weekend:
I enjoyed reading this post and the NYTimes article. Thanks!
I particularly agree that replication is not necessarily simple repetition of the same work, but often new exploration under varied conditions so that we can broaden or deepen the field's understanding. My colleagues and I were involved in one such replication in HCI research. Here is a simplified account.
At the CHI 2002 conference, I was intrigued by a paper by McGuffin and Balakrishnan [M&B 2002] on expanding targets. It has to do with a very basic action computer users perform all the time: target acquisition on a computer screen, meaning clicking on an icon, a menu, or a word. It is well known that the larger a target is, the easier (faster) it is to click on. This is more formally known as Fitts' law: MT = a + b log2(D/W + 1), where MT is the movement time, D is the distance to the target, W is the size of the target, and a and b are empirically determined constants. So if we want to make it easy for the user to select an icon or a menu, we should make it bigger. The problem is that you can't make every object big because the total screen space is limited. What is intriguing about M&B's demonstration is that you don't have to make every object big permanently. You only have to make the target bigger when you are well on your way to it. In fact, the target expansion can take place as late as when the cursor is 90% of the way to the destination and you still get the benefit of a large target!
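To make the formula concrete, here is a minimal Python sketch of the Fitts' law prediction. It assumes the base-2 logarithm that is conventional for this formulation, and the constants `a` and `b` are purely illustrative values, not numbers from [M&B 2002] or any other study; the function name is hypothetical too.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds: MT = a + b * log2(D/W + 1).

    a and b are illustrative placeholders; in practice they are
    fitted to measured data for a given device and user population.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# Same distance, original target versus a target expanded to twice the width.
small = fitts_movement_time(distance=512, width=16)
large = fitts_movement_time(distance=512, width=32)
print(f"original target: {small:.3f} s, expanded target: {large:.3f} s")
```

Doubling the width lowers the predicted time, which is why an expanded target is attractive. The formula alone does not say whether expanding the target late in the movement still helps; that is the empirical question the experiments address.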
This was obviously important, both practically and theoretically. I will spare you the theory part here. Practically, it means you can have your cake and eat it too! So it seemed an experiment worth replicating.
I was on sabbatical at the University of Paris-Sud from IBM Almaden. What really convinced my French colleagues (Stephane Conversy, Michel Beaudouin-Lafon, and Yves Guiard) and me to replicate M&B 2002 was a methodological hole in their experiment: the expanding-target trials were massed (repeated) in the same block. The participants could therefore anticipate the target expansion. In other words, they could, at least in principle, visualize, aim at, or budget for a much bigger target than the one presented to them. The benefit of target expansion measured in M&B 2002 could be merely an artifact of the experimental design, not a result of human online response to the expanded targets.
So we designed a new experiment with various conditions. In one condition, within a block of trials the target might expand, stay the same, or even shrink. Now the participants could not anticipate what the real target size would be until they were 90% of the way there.
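The difference between a massed block and a mixed block can be sketched in a few lines of Python. This is only an illustration of the idea: the condition labels, trial counts, and function names are hypothetical and are not taken from the actual experiment in [ZCBG 2003].

```python
import random

# Hypothetical labels for what the target does late in the movement.
CONDITIONS = ("expand", "static", "shrink")

def massed_block(condition, n_trials):
    """Every trial uses the same condition, so it can be anticipated."""
    return [condition] * n_trials

def mixed_block(n_per_condition, rng):
    """Equal numbers of each condition, shuffled within a single block."""
    trials = [c for c in CONDITIONS for _ in range(n_per_condition)]
    rng.shuffle(trials)
    return trials

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
print(massed_block("expand", 6))
print(mixed_block(2, rng))
```

In the massed block a participant can plan for the final target size from the first trial onward; in the mixed block the order carries no such information, so any benefit has to come from responding to the expansion as it happens.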
The results? Yes, the participants could still take advantage of the target expansion, even when the expansion could not be anticipated. Why? Because, just as with many other higher-level tasks in everyone's life, it took only about half of the total time (40 to 55%, depending on the experimental condition) to cover 90% of the ground. The finishing mile took the rest of the total time. See Figure 11 in [ZCBG 2003]. This was an insight gained in the replication, and it was not well known previously despite the fact that Fitts' law was one of the best-studied topics in HCI and in human-performance psychology. So in this case, the replication was worthwhile and informative. It had no trouble getting through the review process and was published the following year at the same conference (CHI 2003).
That was not the end of the story. Knowing that people can take advantage of target expansion in real time is still not sufficient to put it into practice. For one thing, how do we know which object on the screen is the intended target (vs. obstacles and distractors)? We dealt with that toward the end of our work in [ZCBG 2003]. There is some interesting recent work that follows in that direction; [R&L 2010] is one example.
[M&B 2002] Michael McGuffin and Ravin Balakrishnan. 2002. Acquisition of expanding targets. In Proceedings of the SIGCHI conference on Human factors in computing systems (CHI '02). DOI: 10.1145/503376.503388
[ZCBG 2003] Shumin Zhai, Stephane Conversy, Michel Beaudouin-Lafon, and Yves Guiard. 2003. Human on-line response to target expansion. In Proceedings of the SIGCHI conference on Human factors in computing systems (CHI '03). DOI: 10.1145/642611.642644
[R&L 2010] Jaime Ruiz and Edward Lank. 2010. Speeding pointing in tiled widgets: understanding the effects of target expansion and misprediction. In Proceedings of the 15th international conference on Intelligent user interfaces (IUI '10). ACM, New York, NY, USA, 229-238. DOI: 10.1145/1719970.1720002
Shumin Zhai, Google Inc, 28JUN2011
Shumin's example deserves its own blog entry, and it shows that there are definitely more examples of such replication in the HCI research literature. I hope that as a field we will document these cases so that we really understand how much scientific research is built incrementally upon the ideas of others before us.
I don't think that is an example of the benefits of replication. Shumin's work posed a new hypothesis and tested it experimentally. Replication would be repeating the experiment under identical conditions.
If the work had simply replicated a previous experiment instead of asking a new question, it likely wouldn't have gotten published. That's an important distinction when it comes to what motivation researchers have for replicating other work: the community does not reward simple replication.
I think you may have missed the original point of the essay. The essay was arguing exactly that replication "isn't just replication of experiments", but rather that once you do the replication, you will learn new things that open up new research questions.
The value of replication isn't just in confirming the results; the process itself opens new questions. In other words, if we focus only on that goal and take the term "replication" at face value, then we've lost the point of it all. Confirmation is only part of the goal. The benefits are also derived from the *process* of replication, because it opens new avenues of research.
Moreover, there are many different ways to do replication. You can reproduce the experimental conditions exactly. Sometimes you do that because the original data simply isn't available. But sometimes the replication reproduces the original conditions while adding new ones (the original had 2 interface conditions, and now we add a 3rd or 4th). Sometimes it adds new experimental measures (adding eye tracking when the original didn't have it).