What to Expect at Strata This Week
We knew that big data was also fast. What we perhaps didn’t fully grasp was just how fast the technology changes. This week’s Strata Data Conference is shaping up to be a prime example of the punctuated equilibrium we’re currently experiencing.
Just a few years ago, this bi-annual event was called Strata + Hadoop World. But when Hadoop’s influence waned, O’Reilly Media and Cloudera changed the name. And while Cloudera still sells a Hadoop platform, the company is much more interested in clouds these days. That’s why it’s taken to calling itself the “enterprise data cloud company.”
So, what will we see in the Javits Center this week? It doesn’t take a crystal ball to know that AI will loom large. Machine learning and deep learning are key engines driving AI today. Whether you’re into Python, R, SAS, or another language, there’s something for you.
Attendees can spend their entire day hearing from data scientists about how they’re using ML to do AI, whether it’s feature engineering in Spark or anomaly detection with TensorFlow. You can check out the Strata Schedule for Wednesday, September 25 (the day that general sessions start) for the specifics.
AI will also have a large presence in the expo, as many third-party vendors hawk their wares. Companies selling AutoML solutions, such as DataRobot and H2O.ai, have received big rounds of funding from venture capitalists in recent months, which you can consider a proxy indicator for interest in the space.
The dirty little secret of ML, however, is that data scientists spend three-quarters of their time working with data. Data, it turns out, comes in all shapes and sizes, and you can’t just run it through the algorithm to get the answer (this won’t surprise regular Datanami readers).
The solution to this problem, to varying degrees, is the field of data engineering. From ETL to data cleansing to data catalogs, solutions are being developed to help address your data challenges. Some of the vendors looking to push the puck forward at Strata in this regard include Alation, Trifacta, io Tahoe, Dremio, StreamSets, and others in the data catalog, data prep, data integration, and DataOps spaces.
Data governance, privacy, and security are big demands that organizations must grapple with, especially since GDPR went into effect in 2018 and California’s CCPA is slated to go into effect in 2020 (although key aspects of CCPA have yet to be defined by the California legislature). To that end, solutions that can help automate some of the tasks required to implement data privacy and governance are gaining attention, including those from vendors like Immuta, Okera, Privacera, and others.
Security is never something that the line of business wants to implement, but you can be sure that it’s on the minds of your IT minders. For every breakthrough solution, the security team needs to ensure that it’s kosher. To that end, vendors like Zaloni and others have solutions that can help.
For all the love that governance, security, and ETL continue to get, good old data storage and processing continue to put bread on the table for a large chunk of the data-liking populace. To that end, there’s no shortage of vendors specializing in storage of big data, including ArangoDB, DataStax, MemSQL, MinIO, Timescale, Yellowbrick Data, and others.
For all its controversy, big data remains a “thing.” Like the phrase or not, it reflects a market demand and a style of computing that is growing — with Hadoop or without it. In New York City this week, the technologies and techniques for storing, securing, and processing big data will be front and center.
Stay tuned to Datanami for coverage from Strata Data Conference this week.