According to a recent ReadWriteWeb blog post by Audrey Watters, 44% of enterprise users questioned had never heard of NoSQL and an additional 17% had no interest. So why are 61% of enterprise users either ignorant about or uninterested in NoSQL? This post contains my two cents worth on the topic.
At a recent trade show I attended, which highlighted NoSQL engines, there were many Web developers, mostly from startups. However, I was struck by the absence of enterprise users. Hence, my (totally unscientific) experience confirms the basic point of the above blog post.
Moreover, in my experience, most information spreads among enterprise users by word of mouth. Hence, if they don’t hear about something, it is because their professional network does not pass the word along. In other words, an interested enterprise professional generates additional interest. Non-interest generates the behavior seen in the above blog post. So why is enterprise interest lacking?
To get more color on the situation, I contacted a very senior technical guru at a large enterprise who is responsible for looking at new database management system (DBMS) technology for his company. I asked him how interested he was in NoSQL and, in effect, how interested his company was. He reported “no interest.” I asked him why.
He first said that the vast majority of his company’s applications fall into two classes: online transaction processing (OLTP), where there are frequent small updates to a database of structured records, and data warehouses/data marts, which assemble historical business data for ad-hoc query by analysts. Although there are some other applications around the “edges,” such as document management, these are not considered important.
He then made one comment about OLTP, one comment about warehouses, and one general comment. These follow.
No ACID Equals No Interest
Much of the OLTP data kept by this company is mission critical. Screwing it up causes people to lose their jobs. In his world, ACID is the gold standard for update to shared data sets. Any system that does not support real transactions is considered a nonstarter in his OLTP environment.
Even if a data set can get by with single-record transactions now (a common feature of NoSQL DBMSs), he is unwilling to guarantee that it will never need multi-record transactions in the future. Put differently, his company assumes that ACID may be required in the future for any OLTP data set, and nixes non-ACID systems.
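To make the single-record versus multi-record distinction concrete, here is a minimal sketch of a two-row update that must succeed or fail as a unit. It assumes Python's built-in sqlite3 module and a hypothetical accounts table and transfer function; none of these come from the conversation described above.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two rows; both updates commit or neither does."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # both rows change together
failed = transfer(conn, 1, 2, 500)  # overdraft: both updates are rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

With a store that only guarantees single-record atomicity, the application itself would have to detect and repair the case where the first update lands and the second does not.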
A Low-Level Query Language is Death
Data warehouses are subject to frequent ad-hoc queries like “Tell me whether pet rocks are selling better than Barbie dolls in the south?” Ted Codd’s pioneering paper, "A Relational Model of Data for Large Shared Data Banks," in 1970 advocated a user interface whereby one stated what is required and not how to fetch it from disk. In the subsequent 40 years of DBMS activity, high-level languages, like SQL, have been shown to offer ease of programming for such ad-hoc data warehouse inquiries. My enterprise guru’s company is rarely interested in the algorithmic record-at-a-time interfaces seen in most NoSQL products, as they are seen as a throwback to the days of IMS and CODASYL.
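The contrast between stating what is required and spelling out how to fetch it can be sketched as follows. The sales records, the table layout, and the use of Python's built-in sqlite3 module are all assumptions made for illustration; the loop stands in for a generic record-at-a-time interface and is not any particular product's API.

```python
import sqlite3

# Hypothetical sales records: (product, region, units).
records = [
    ("pet rock", "south", 120),
    ("barbie doll", "south", 300),
    ("pet rock", "north", 500),
    ("barbie doll", "south", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)

# Declarative: state the result wanted; the engine chooses how to compute it.
declarative = dict(conn.execute(
    "SELECT product, SUM(units) FROM sales "
    "WHERE region = 'south' GROUP BY product"
))

# Record-at-a-time: the program walks every record and does the
# filtering and aggregation itself.
record_at_a_time = {}
for product, region, units in records:
    if region == "south":
        record_at_a_time[product] = record_at_a_time.get(product, 0) + units
```

Both paths produce the same totals, but only the first leaves the choice of access path (index, scan, parallel plan) to the DBMS, and only the first can be typed in by an analyst on the spot.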
NoSQL Means No Standards
His company has a large number of databases (apparently more than 10,000), and the company is clearly concerned with the number of different kinds of interfaces their application programmers have to learn. Hence, standards are important to a large enterprise.
Seemingly, there are north of 50 NoSQL engines, each with a different user interface. Most have a data model, which is unique to that system, along with a one-off, record-at-a-time user interface. My enterprise guru was very concerned with the proliferation of such one-offs. In contrast, SQL offers a standard environment.
I want to close this blog post with a single comment: “Those who do not understand the lessons from previous generation systems are doomed to repeat their mistakes.” In other words, “Stand on the shoulders of those who came before you, not on their toes.”
Disclosure: Michael Stonebraker is associated with four startups that are either producers or consumers of database technology. Hence, his opinions should be considered in this light.
This blog makes me wonder why I pay $100 a year to ACM.
Are you seriously going to sit there and disregard a very viable set of database options just because one person in one enterprise environment says he's uninterested? Or are you pushing your own agenda in the disguise of public opinion?
How do we teach the up and coming professionals that they should use the best tool for the job, when presumably one of the top DB guys in the industry is waging a war on new technologies in the database field? I say presumably, because your continual dismissal of NoSQL solutions will render you irrelevant.
I am in no position to defend the author, but it seems to me that what he is writing here is not NoSQL bashing. This article is a valuable thing; it is making clear to any NoSQL vendor what the barriers are that need to be overcome.
I work for an ISV that sells software to large enterprises, and the issues raised here are the issues that would prevent us from using NoSQL. Our customers want to write their own reports using existing data warehouses; they want an RDBMS that fits into their existing support model.
"How do we teach the up and coming professionals that they should use the best tool for the job.."
You do that by teaching them to use the best tool for the job; the point is that NoSQL is not going to be the best tool for the job as long as these barriers remain. "The job" is rarely just the application itself: data lives on forever, enterprises want to use data everywhere, and NoSQL vendors need to embrace that reality if they want to be enterprise players.
Funny - I would take the same numbers and draw the opposite conclusion.
39% of enterprises ARE interested in nosql. Considering the nosql products themselves have only said they've been ready for a year or so, getting 39% of enterprises interested in that time is kind of amazing.
Srdjan Pejic, at the top of the article it was made clear that it isn't "just one person", "44% of enterprise users questioned had never heard of NoSQL and an additional 17% had no interest. So why are 61% of enterprise users either ignorant about or uninterested in NoSQL?" Not to mention the fact that ACM has featured many articles enthusiastic about NoSQL, does that validate your $100 a year?
In addition, it is quite clear that to an enterprise, NoSQL options are not 'viable' for exactly the reasons stated.
I'd have to say, though, that the disclaimer at the bottom of this article is uncalled for, especially since similar disclaimers have not appeared on articles by proponents of NoSQL solutions (who are also financially invested in that tech).
This is why Stonebraker, thankfully, is waging a counterargument to NoSQL:
The average NoSQL fan typically lacks the ability to compare and understand relational database performance vs. NoSQL alternatives.
Nowhere has Mike ever stated, "For specific large dataset problems, SQL continues to outperform NoSQL." Instead, I've seen him advocate for specific solutions to specific problems. CStore becomes Vertica, HStore becomes Volt, and those who know better chose Postgres over MySQL (anybody feel like implementing something better than a nested-loops join?!?).
In my personal growth, I came to understand that most of my own startup's scalability problems had been solved before. Any time we started to get excited about Cassandra, Bigtable, DryadLINQ, PNUTS, or K-V stores like Redis, Tokyo Cab, Couch, or Mongo - a more reasoned voice in our team was able to educate everyone else that a typical relational SQL solution was still quite scalable while offering far superior consistency or isolation. We saw time and time again that NoSQL hype can easily trend towards uninformed religion.
There are very few people working on problems that really need to care about NoSQL or consistency-relaxed alternatives. In essence, Stonebraker's opinion is _necessary_ to seriously question the NoSQL fanboy's understanding; he advocates different flavors of database solutions for different problems. That fact stands in stark contrast to your accusation that he ignores the best tool for the job, or is being rendered irrelevant.
First of all, please do not assume I am a NoSQL "fanboy". Also, how is it that you're sure I lack ability to "compare and understand relational database performance vs. NoSQL alternatives", as you put it?
A survey by Information Week is not a good representative of opinion. Most would say it's actually biased to favour established players like Microsoft and Oracle, so basing an article on those numbers is dubious at best.
Secondly, since you seem to have not read my comment carefully, I was complaining about the influence of this post by this author on "the best tool for the job" paradigm.
If your startup determines that basing your data store on a relational database is the best way to go, I will fully support you in that choice. Personally, I know that requirements my projects have fit better with a data model based on a K-V store like Mongo for stuff other than e-commerce. The e-commerce portion will go into something like Postgres, because the need for consistency is greater. Again, best tool for the job.
However, when influencers in our field start spewing what is essentially FUD (and mind you this is not the first instance of a post like this from Stonebraker), that is what I take issue with.
Jay Wright, why weren't any of these many enthusiastic articles referenced here as a counterpoint? Could it be because Mike has an agenda against NoSQL solutions?
As for the stats reference, refer to what I said above about Information Week.
I'm a senior level technical person at a large enterprise, and we are VERY interested in post-SQL and post-relational databases. So now you can change the title of the article to "Why are Only Half the Enterprises Interested in NoSQL", based on your sample size :-)
Our core business is serving information from non-relational stores. My group has been doing a lot of work lately with MongoDB. We're too busy generating tight, clean, readable, flexible, and fast applications to make it to those trade shows you mentioned.
I spent the past 20+ years working with (and at one time for) Oracle. I find it more than a bit ironic that people are now trying to defend MySQL, SQL Server, and Postgres as the "solid and reliable" database choices vs. the new NoSQL crowd. If the data is mission critical, it probably belongs in Oracle. Everything else is riskier. An argument against NoSQL based on risk is an argument against every other RDBMS. And if you think database "X" is better than Oracle in terms of reliability, put it at the top. The argument still stands.
And while SQL is fine when used as intended, as an ad-hoc query language, it is absolutely dismal as a programmatic interface. Interrupting the flow of code in the native language you are coding in to inject a blob of functional goo that requires endless buffering, translations, and a complete different error handling sub-environment should NEVER be considered the right way to do it.
In some of the subsystems we're working on, we're seeing an order of magnitude reduction in code size when moving from embedded SQL to MongoDB. And the code is infinitely more readable. It may not be a "standard" like SQL, but then again, neither is SQL. You still end up writing different abstraction layers and ORM junk and all kinds of other bloatware to deal with each vendor's "enhancements" to SQL.
You simply have to sacrifice too much when working with RDBMSs in order to reap the benefits. Forced, static schemas, jumping through SQL hoops, and all of the other 1970s techthink are not the future, regardless of what the enterprise guy you know says.
It's not really true that NOSQL does not support ACID properties. Some NOSQL solutions do support ACID very well. The confusion occurs due to the myriad different technologies that are clubbed together in the NOSQL segment and the message being propagated by the most "popular" ones being taken as representative of the whole group. This goes against the grain of the NOSQL movement that "one size does not fit all".
Another popular reason given for the non-adoption by enterprise firms is that NoSQL does not have standards. This should be looked at from the perspective of the movement being so young. Also, the different products have such different use cases that it is not realistic to come up with a standard. The different niche markets within the movement need to develop further before any standards can emerge. That being said, I agree that it is a negative for a lot of enterprise firms looking at this market, but those enterprise firms that feel the pain that some of these technologies say they would solve are actively looking at this group of technologies. Those who don't see the pain yet, or who don't see the advantages that some of these technologies might provide them, aren't interested. But I also believe that a lot of those firms are soon going to reach a point where they feel a need for some of these products that could manage and analyze huge amounts of data, given that a lot of firms are sitting on huge piles of data that they don't know what to do with or how best to leverage. Another issue with enterprise adoption is that most of these technologies are open source without much commercial support. But that situation is rapidly changing with new commercial entities coming up behind a lot of these technologies, and that might give enterprise adopters some level of confidence. Here at InfiniteGraph we had a conversation with a Fortune 500 firm VP who said that "while graph databases are new, the problems they solve are not".
Enterprises are interested in scalable, robust databases. They aren't interested in buzzwords like NoSQL. Technology does matter, not word of mouth.
As an example I can only say that SAP is currently developing its own column-store called NewDB. It's not a "let's try that" startup, it's based on a very mature TREX search index engine which can be used as general storage with good scalability and performance.
Oracle's CTO mentioned a year ago "we were always in the cloud, but there was a different word for it".
Enterprises may be conservative and slow, but they aren't stupid.
If it's said that on a 48-core machine the RDBMS is spending 85% of CPU in Core, they will start to search for a solution.
There are a couple reasons, I think, why "Enterprise Organisations" have little interest in NoSQL databases.
Part of it is "no ACID means No Way" as you mentioned. Honestly, I think this is less true factually than it is simply an implicit belief on the part of those communities. I've seen many cases where pessimistic locking ended up being a strong limiter to performance and was, upon examination, not merited. I've also seen many cases where the application layer negated the utility of ACID transactions. How many web apps slam updates of potentially stale data?
The second is that unlike startups the organisations are more segmented and more populated by specialists. The choice of DBMS is usually in the hands of a DBA group, where everyone has spent their career learning how to provide care and feeding for (usually one vendor's) RDBMS. Anything else is "weird", "scary", and ultimately a threat.
Finally there is "High level language" access. This is a spot where the NoSQL community is behind the curve. As a developer, SQL and Pig Latin (the query language of Pig, a query front end for MapReduce on Hadoop) are pretty much the same. Yet I know that I've run into hundreds of technical people, managers, QA people, even "power business users" who happily write (or at least cut and paste) SQL but whose response to even "cd \foo; dir" is some variant on "I'm not a programmer, don't make me code." I can't explain it. It doesn't make sense to me. But empirically it is very true.
The combination makes it virtually impossible to even consider using a NoSQL data store in such organisations in cases where it makes sense. And yes, there are cases where it makes sense and cases where it does not.
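The remark above about web apps that "slam updates of potentially stale data" can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical profile table carrying a version column; the version check shown is one common optimistic alternative to pessimistic locking, not something the comment itself prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile (id INTEGER PRIMARY KEY, email TEXT, version INTEGER)"
)
conn.execute("INSERT INTO profile VALUES (1, 'old@example.com', 1)")
conn.commit()

def read(conn, row_id):
    """Return (email, version) as a web page would load it into a form."""
    return conn.execute(
        "SELECT email, version FROM profile WHERE id = ?", (row_id,)
    ).fetchone()

def save_if_unchanged(conn, row_id, new_email, seen_version):
    """Apply the update only if the row still has the version that was read."""
    with conn:
        cur = conn.execute(
            "UPDATE profile SET email = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_email, row_id, seen_version),
        )
    return cur.rowcount == 1

# Two users load the same form, then submit one after the other.
_, version_a = read(conn, 1)
_, version_b = read(conn, 1)
first = save_if_unchanged(conn, 1, "a@example.com", version_a)
second = save_if_unchanged(conn, 1, "b@example.com", version_b)  # stale read
final_email, final_version = read(conn, 1)
```

Without the version predicate, the second submit would silently overwrite the first, even though each statement ran inside a perfectly valid ACID transaction.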