Top 12 Datanami Stories of 2019
2019 was an eventful year in the big data space, with enough intersecting story lines to keep a big data watcher enmeshed for hours, if not days, on end. We did our best to trace the story lines out for you, dear reader, to help you better understand what new technology is emerging, who’s using it for what, and why.
One useful editorial exercise to do at the beginning of the year, when the news flow is rather slow, is to determine what stories captured readers’ attention over the past year. To that end, we’ve compiled a list of the top dozen 2019 stories, as determined by pageview numbers, which are driven directly by the readers. We hope you find this list as worthwhile as we do.
Without further ado, here are the top 12 Datanami stories from 2019 (headlines are clickable, by the way):
There were three Hadoop distributors at the beginning of 2019, and only one at the end. The popping of the Hadoop bubble came on suddenly and was worse than some expected, but is the technology completely finished? Some technology leaders aren’t sure what can replace Hadoop on-prem.
Kafka was positioned by its backers as the anti-database, something designed specifically for managing event data, which databases aren’t designed to do. So when the Kafka folks backtracked a bit and embraced the database concept, it was newsworthy.
When Hewlett Packard Enterprise stepped up and acquired MapR’s assets in early August, it put to rest one of the most intriguing commercial sagas of the year. How HPE will monetize the MapR assets, which include the versatile MapR File System, however, is not yet clear.
Interest in emerging artificial intelligence and machine learning techniques was sustained throughout the year. Don’t expect the excitement around AI and ML to ebb in 2020. If anything, it’s set to increase.
The demise of MapR, which was spurred by the delay of a cash infusion from investors after first quarter 2019 results came in lower than expected, and the fire sale to HPE were among the biggest stories of the summer in the big data space. People were talking about it, and it showed.
With about 10 billion parameters, the world’s largest neural networks are maxing out the capacity of the biggest clusters, according to Intel’s GM of AI, Naveen Rao. That means the current 10x growth rate of the size of deep learning models is unsustainable. Will hardware innovation bail out AI researchers? It’s something to watch in 2020.
Early summer (remember summer?) brought a flurry of news, from the business struggles of Cloudera and MapR to the acquisitions of Tableau and Looker. Were venture capitalists preparing for a recession, or just rearranging their portfolios? Or maybe a bit of both?
Kafka excels at managing event data, while databases are best for managing state. Instead of compromising on data architectures when both approaches are required and cobbling them together, why not just bring them together into a single architecture? That’s essentially what the folks behind Kafka have done with ksqlDB, which they unveiled later in the year.
The answer is no, you may not stop doing ETL, probably ever. Any other questions?
One potential solution to the problem of not having enough people with the title of “data scientist” is just to change the definition of what a “data scientist” is. In response, titles like research scientist, deep learning engineer, and machine learning engineer are now trending.
Top 10 lists, which were in style among big data watchers at the beginning of the year, deteriorated quickly and became out of date by year’s end. Their replacement: The top 12 list, which is seen as fresh, exciting, and new.
Like our polarized national politics, the data community is divided over which programming language is best for data management and machine learning. But in the end, the story is really about Python’s remarkable ascendance. R has not kept up with Python in the popularity department, but reports of R’s death are greatly exaggerated.
Thanks for reading!
The Datanami Team