I remember exactly when I knew I didn't want to be a pure mathematician. It was the spring of my sophomore year and I was knee-deep in a math major at Stanford. It was 2003 and everyone was a computer science (CS) major, either prepping to be shipped off to Google or doing their own startups. So of course, being myself, I took not being a CS major as a badge of honor; I wasn't part of the code monkey zombie bandwagon. I had taken a programming course, but the ratio of time spent fighting the languages (C/C++) to time spent actually expressing ideas made CS feel nowhere near as stimulating as math. As a tangent, I do think that impression would've been pretty different if my first exposure had been to a functional programming language (Scheme or Clojure, for instance). That said, I was even more impatient at 19, so that might not be true.
What made me want to go into CS was actually a really good math department seminar. It was about algebraic topology, and for the first forty minutes or so I was riveted by the standard math cycle: concepts were defined, abstractions erected, machinery churned into theorems. Then, towards the end, the speaker started talking about applications to digit and handwriting recognition. At the time, I didn't know anything about machine learning (ML), but I left skeptical that, if you really wanted to tackle a problem like digit recognition, you would end up doing anything with Betti numbers or algebraic topology at all. This suspicion was confirmed that night during a Googling session on state-of-the-art techniques in digit recognition: if you started from the perspective of actually wanting to solve the problem, there were better, simpler, and more direct ways to do so.
This was my first brush with machine learning research and there was something specific about it that appealed to me. The math geek in me liked the technical machinery and how you adapt the same ideas to fit different setups. But what I really found fascinating was the process of looking at data and thinking about how to translate intuitions about a problem into actual code. I didn't appreciate this at the time, but that process of taking qualitative ideas and struggling to represent them computationally is the core of artificial intelligence (AI).
The little I learned about machine learning that night caused me to do a pretty big about-face. Luckily, I was already at Stanford, which had a fantastic suite of AI-related courses (machine learning, graphical models, etc.), which I proceeded to take. I then went to UC Berkeley to do a Ph.D. in CS, specializing in statistical natural language processing (NLP) and machine learning. I learned more from my awesome Ph.D. advisor, Dan Klein, than from anyone else, academically or professionally. Under his guidance and mentorship, I became a solid NLP researcher, winning multiple best-paper awards for our work. By the end of my time in grad school, I had a tenure-track faculty job offer from UMass Amherst and planned to do a post-doc at MIT before becoming a professor at UMass. I was on the path to having a pretty promising academic career, if I do say so myself.
At some point while at MIT, I decided to leave and do a startup because I felt my work as an academic wasn't going to have the impact I wanted it to have. I went into academic CS in order to design NLP models that would become the basis of mainstream consumer products. I left because that path from research to product rarely works, and when it does, it's because a company is built with research at its core (think Google). This wasn't a sudden realization, but one I had stewed on after observing academia and industry for years.
During grad school, I did a lot of consulting for 'data startups' (before 'big data' was a thing) and consistently ran into the same story: smart founders, usually not technical, would have some idea that involved NLP or ML and come to me to just 'hammer out a model' for them as a contractor. I would spend a few hours trying to get concrete about the problem they wanted to solve and then explain why the NLP they wanted was incredibly hard and, charitably, years away from being feasible; even then they'd need a team of good NLP people to make it happen, not me explaining ML to their engineers at the whiteboard a few hours a week. Usable fine-grained sentiment analysis is not going to be solved as a side project.
Often the founders of these companies were indeed finding real pain points, but their view of ML was that it was some kind of 'magic sauce' they could sprinkle on an idea to make a product. None of the thinking was constrained by what was feasible or even likely to work. They also couldn't recognize what data problems they had or which ones they could solve, because they weren't used to viewing the world in that way. If machine learning is a key part of a product, it can't just be grown separately and attached to a company's core later; it has to be there from the start, baked into the foundation.
On the academic side, I had become increasingly frustrated by the kinds of problems being worked on in my field, statistical natural language processing. Like any academic community, the work within NLP had become largely an internal dialogue about approaches to problems the community had itself reified into importance. Take, for example, syntactic parsing of natural language (essentially automatically diagramming a sentence). It's a problem with a whole history rooted in linguistics that goes back the better part of a century. The motivation for NLP work in the area has been that, at some point, if we want to really have semantic understanding of sentences, we need to nail syntax first. Of course, there have been more concrete uses of syntactic parses in machine translation and other areas, but the problem has had the status it has because of this historical roadmap. It's a completely valid problem, but it's frustratingly removed from other, more direct problems out there now: How can I summarize all the news or email I get into something digestible? How can I map customer questions directly to database queries? How can I find interesting discourse about a topic?
People in NLP do good work in these areas and on end-user problems in general, but by and large the community is oriented around developing tools to induce traditional linguistic structure, which in principle will facilitate downstream applications. Obviously, if we solved syntactic parsing, doing these and other real-world tasks might be easier. On the other hand, if we worked more directly on these kinds of problems, we might find that linguistic analysis isn't as essential as we thought, or we might get a better idea of which higher-order linguistic abstractions are actually worth inducing because they are roadblocks for specific applications.
My response to this concern was to focus my own research on setting up and tackling problems that I thought were closer to these kinds of real-world applications. It was while I was at MIT working on one such project that I began to doubt even this strategy. I was working on extracting snippets from social media reviews that covered various aspects of a restaurant. For instance, given the review "I loved the chicken parm, but the waiter was incredibly snooty," you would register "I loved the chicken parm" as a positive sentiment regarding the food and "the waiter was incredibly snooty" as a negative sentiment regarding service, while ignoring the rest of the review, which might not have any concrete or usable snippets.
Something like this would indeed be useful for a number of applications, but mostly as an add-on to something like Yelp: valuable, but with the NLP nowhere near the core of the product. In fact, when I thought about the uses of NLP research I had seen in products, most were peripheral to the core experience: increase ad clicks by 2%, increase session lengths by a minute or two, increase 'likes' by 1%. I left because I realized the only way NLP was going to be a core piece of a product was if someone like me was part of its formation. So I moved back from MIT to the Bay Area and co-founded Prismatic with Bradford Cross.
Nearly two years later, after learning a lot about industry and about making real products, I can confidently say that I'm happy I left academia. Prismatic is a pretty tight realization of how I would've wanted NLP and ML to work in a startup and manifest in a product. The relationship is symbiotic: the machine learning and technology are informing the possibilities for the product, and conversely, product needs are yielding interesting research. Various pieces of the machine learning (like the topics in a topic model) are first-class product elements. Many of the more ambitious NLP ideas I thought about during grad school will become first-class aspects of the product over the next few years.
Getting here wasn't easy.
My co-founder and I spent six unpaid months figuring out what would make a high-engagement experience and why other 'smart news' entrants never really stuck. Once we got seed funding, we didn't just rush to a fast MVP (minimum viable product); we spent the better part of a year tackling a tough research problem inside a startup and thoughtfully converging on a high-engagement product through lots of trial and error. We also couldn't have asked for a better early team than our first two hires, whom I knew from my time in Ph.D.-land: Jason Wolfe (from Berkeley) and Jenny Finkel (Stanford). All in all, I think we've carved out an interesting niche of strong computer science and artificial intelligence tightly focused on making smart, usable products a reality. What I like most about our approach is that we're always motivated by direct and real problems and the solutions are free to delve into deep abstractions and the technical trenches.
Aria Haghighi is co-founder of Prismatic. He is a multi-award-winning statistical natural language processing researcher and a former Microsoft Research Fellow. He holds a Ph.D. in Computer Science from UC Berkeley and a BS in Mathematics with distinction from Stanford.
Fantastic post. I love that Prismatic seems to be a sort of bridge between the market, and high level academic research. Academia needs more of that.
What you said about companies viewing machine learning as sprinkling salt is so true, and I think it applies to other fields also. I think the basic problem is that often the people in charge are not the people with the Ph.D. degrees in machine learning/artificial intelligence/statistics etc. So it's difficult to explain why what they need/want could take years. And often, they don't really want what's going to take years. They just want to make it look that way to customers.
In your case, I think you did a really smart thing to go out on your own. This way you don't have to explain it to anyone. Great article. I don't know what your company does but I'm going to check it out.
Interesting article, I hope that I don't ever have to work with someone with an eyebrow piercing.
two words : nice read :)
I appreciate the deep and thoughtful insight that's provided here. I'm not sure, however, what the failings of add-on features really are. I don't disagree that they are there, I just don't fully appreciate the differences.
I can see that other 'smart-news' companies haven't done as well. At least, the user experience is missing something. When I first discovered Prismatic, there was an intangible quality that encouraged me to share the experience with my peers. I imagine that many of them had pre-conceived ideas about what delivering the news was, and they're only augmenting the experience a little with rankings, recommendations and other data features. Is the difference that the news articles on Prismatic are more integrally linked, more naturally evolving one to the other?
Possibly the issue is really about a simple user experience. Making smart usable products usually means keeping it simple, not tacking on new features to clutter the whole experience. Hiding some of the features behind the scenes but affecting the way news is ranked, clustered and served may accomplish the same thing, whether the business owners are technical or not.
I've often faced the barriers caused by differences in perspective. As a developer, I find that most people I talk to don't understand the features that drive value into their products. There's a quality missing until someone can have a deep appreciation for detail. However, this is just a natural problem of not thinking about the same problems in the same way. This may be important because even business people can be trained to think about a problem in a useful way, even if they don't fully appreciate the technology they're working on.
Bottom line, I'm curious about the reasons why Prismatic is really a quality experience and so many others are not. I'm curious if the difference comes from deeper topic models (as the author suggests), or just the confidence in the underlying models to present topics clearly and simply.
Hi, very interesting thoughts and journey so far. I'm working in academic research as well (physics), and have been thinking for ages about whether and how it could be more beneficial if I was doing startups instead of going towards a professorship. These days I'm using my startup/hacker cultural experience more to change things and shake up the lab around me, with some success. This piece just made me think about my objectives again. And I just checked out Prismatic; it feels brilliant!
I can't see any obstacle academia imposes on you. What you wanted may simply be a group of people like you working on the same thing (i.e., making NLP products).
Why is it that we hear mostly justifications of personal decisions to leave academia for industry and not about people going the other way?
I also started grad school as a pure mathematician, and then discovered AI. But I am convinced that AI *is* mathematics (just not the flavors of math that live in Math Departments), applied to the fundamentally important problem of modeling the mind.
I'm glad to hear that you had a good experience creating a satisfying consumer product and commercializing it. But spend some time reading biographies and the history of science and mathematics, and consider what it would be like to spend decades working on problems that are worthy of the history books, things that seriously change the state of human knowledge, not just the quality of the web browsing experience. There are a lot of unimportant problems out there, both inside and outside of academia, and people build careers around them. But you're in a position to pick a problem you consider genuinely important. Think about it.
Thanks for your honesty in writing this post. I'm also a Berkeley PhD who left academia for a startup (Britely.com, a tool for capturing the best ideas you read online - often found via Prismatic!).
The people I'm working with now are wickedly smart, intensely passionate, and well-balanced. I have never been happier than I am now that I am building stuff. Publish or perish, no thank you. I couldn't do what I do (growth insights & strategy, new product development, social & behavioral design) without my PhD, but in the end I'm not very motivated by adding lines to my CV. I'd rather be making things that real people use to solve real problems.
YES: "What I like most about our approach is that we're always motivated by direct and real problems and the solutions are free to delve into deep abstractions and the technical trenches."
Hope our paths cross some day, Aria. (And for the record, I disagree w/ anon re: working with people with eyebrow piercings.)
Caneel Joyce | @caneel