There has been much discussion on Twitter, Facebook, and in blogs about problems with the paper reviewing system for HCI systems papers (see Landay's blog post and the resulting comment thread). Unlike papers on interaction methods or new input devices, systems are messy. You can't evaluate a system using a clean little lab study, or show that it performs 2% better than the last approach. Systems often try to solve a novel problem for which there was no previous approach. The value of these systems might not be quantifiable until they are deployed in the field and evaluated with large numbers of actual users. Yet doing such evaluation incurs a significant amount of time and engineering work, particularly compared to non-systems papers. The result, observed in conferences like CHI and UIST, is that systems researchers find it very difficult to get papers accepted. Reviewers reject messy systems papers that don't have a thorough evaluation of the system, or that don't compare the system against previous systems (which were often designed to solve a different problem).
At CHI 2010 there was an ongoing discussion about how to fix this problem. Can we create a conference/publishing process that is fair to systems work? Plans are afoot to incorporate iterative reviewing into the systems paper review process for UIST, giving authors a chance to have a dialogue with reviewers and address their concerns before publication.
However, I think the first step is to define a set of reviewing criteria for HCI systems papers. If reviewers don't agree on what makes a good systems paper, how can we encourage authors to meet a standard for publication?
Here's my list:
What do you think? Let's discuss.
Tessa Lau is a Research Staff Member and Manager at IBM's Almaden Research Center.
I'd like to second your first recommendation. I've reviewed a number of systems papers that do not provide a sufficiently compelling motivation or use case - why should I (or anyone) care about this system? Without this, the paper often represents technology in search of a problem.
Now, having read Don Norman's provocative article in the recent issue of interactions magazine - Technology First, Needs Last: The Research / Product Gulf - I have a keener appreciation for the possible contribution of some technologies in search of problems, but I still believe these are more the exception than the norm ... and that without adequately making the case for the human-centered value(s) the systems will help realize, such papers are probably more suitable for other venues.
One problem is that our field is moving so fast that we have to allow new ideas to cross-pollinate with other ideas rapidly. If we require evaluations on every paper, then we don't have the rapid turnaround required for innovations to cross paths with each other.
On the other hand, it seems wrong not to have some filter. Without filters, we might end up publishing ideas that seem interesting, but are actually quite useless.
I think you have a great start on a list of discussion points. One thing to keep in mind is that we should evaluate papers as a whole rather than in parts. I will often recommend accepting papers that are deficient in one area but very good in another.
I think it would be useful to some of us discussing your post if you could say more about the kinds of evidence you are referring to when you say "evidence that the system solves the problem" that are not user studies.
So, what are some examples of specific system problems ("clearly and convincingly presented"), and what would you consider appropriate evidence to show that your system solved the problem? Is it a set of usage scenarios that have been hard to address through previous designs, where you show how a single interface design can address them completely? Is it a new, significantly more efficient algorithm or mechanism, for example one to handle complex preferences around group permissions, that would be useful for the builders of group systems to know about? (In the latter case, would the evidence be performance testing using logs of previous queries as data?) Is it a new approach for using skin-tapping as input?
I am a strong proponent of rigorous gate-keeping at conferences, simply because I need some help figuring out which things are worth following in my limited time. At the same time, I think it is important to keep in mind all the different ways a systems paper can be really valuable and worth seeing at a conference like CHI. A systems paper could be interesting thanks to a thorough analysis of its deployment and usage (evaluation). Or, it could be interesting thanks to a well argued discussion of why it was built a particular way (design). Or, it might just demonstrate that a given interesting capability could be created at all. Or, it could be a careful argument about why a certain system would be really useful, even if it hasn't been built or evaluated yet (motivation/position paper). At the end, what I want are papers that stimulate thought and action. I'm not going to demand any particular levels of motivation, design or evaluation; rather, I'm going to ask whether the whole is innovative enough. This is a highly subjective decision, which is why I greatly value wise program committees who can make such a judgment on my behalf.
Thank you all for the interesting discussion. My goal was to initiate a discussion within our community, not to have all the answers.
Along those lines, Dan, the question you raise about what constitutes "appropriate evidence" is one that I'll turn back to the community for a collective answer.
For what it's worth, though, I don't think of your examples as "systems". The first is usage scenarios or design. The second is an algorithm. The third is an interaction method. Each of those is fairly self-contained and possible to evaluate using a fairly closed study.
What I meant by "system" is an implemented prototype that gives people access to new functionality that did not exist before. Examples of "systems" include CoScripter, ManyEyes, Landay's DENIM and SILK, Gajos's SUPPLE, Ko's WhyLine. How can we show that each of these systems is "innovative enough" (in David's words) to merit publication?
Hi, Tessa! I like your list, and I think that the bullet points are representative of good evaluation criteria for systems papers across computer science.
The main sticking point, as I see it, is "Evidence that the system solves the problem as presented." In some other areas of empirical computer science, we have repositories of test problems, suites of agreed-upon performance metrics, testing harnesses for software, and so forth. Usability testing is seen as the gold standard in HCI, though, and it's much harder to leverage such tools to make evaluation by user testing efficient. The effort devoted to user testing of a new system can sometimes rival the effort of having built the system in the first place--okay, I might be exaggerating a bit, but still...
If we could agree on some reasonable substitutes, that would be good. Some of the papers I've worked on have included GOMS models of performance, for example, but not everyone buys into a purely analytical approach. Sometimes, even worse, what I'd like to convey in a paper is a conceptual shift, a different way of thinking about some kind of problem, and that's even harder to evaluate than pure performance.
I've suggested it before and will suggest it again: an easy start could be to make accepted Interactivity demos 'worth' as much as a full paper at CHI: same presentation length, and the associated paper (maybe 6 pages long) needs to be of the same 'archival' status in the ACM digital library.
This could show a true commitment to systems research.