Spring Strata 2016 Coverage
Shiny new objects are easy to find in the big data space. So when the industry’s attention shifted towards processing streams of data in real time–as opposed to batch-style processing that was popular with first-generation Hadoop–we saw dozens of promising new technologies pop up seemingly overnight. One of them was Apache Kafka.
The interesting thing is that Kafka wasn’t actually new. Jay Kreps started writing the software, which serves as a messaging layer for moving data, when he worked at LinkedIn in 2008, and the software was contributed to open source in 2011. Read more…
Feature Articles from Spring Strata 2016
The big data ecosystem was on full display at last week’s Strata + Hadoop World conference in San Jose. At the ripe old age of 10, Hadoop is still the driving force, but newer frameworks like Spark and Kafka are gaining steam. Here are some of the top trends your Datanami editor pulled from the show based on observations and discussions with attendees and vendors. Read more…
Elastic today announced that it’s added a graph query engine to Elasticsearch. Users of the search engine now have the option of using their search indexes as the basis for conducting graph analyses. The new option will make it relatively easy for customers to conduct big data analysis for use cases such as fraud detection and product recommendations. Read more…
Shakespeare once pondered the nature of names, pointing out that “a rose by any other name would smell as sweet.” For data scientists, the meaning behind the title is not just an epistemological exercise, but a practical problem that has consequences upon that delicate dance between employer and employee.
The data scientist shortage is having all kinds of impacts on how organizations approach big data projects. Read more…
As we learned in the first part of this series, the gap between demand for skilled data scientists and supply is driving salaries north of $200,000 in some areas of the country. If big data analytics is to be democratized, steps must be taken to ensure that this short-term misalignment doesn’t turn into a long-term problem. Read more…
Today the ODPi issued the first set of documents that describes a standard distribution of basic runtime components for Hadoop, including YARN, HDFS, and MapReduce. Going forward, the organization is preparing a management specification for Hadoop as it considers which Hadoop problem area it will tackle next.
The ODPi was founded a year ago on the eve of the Spring Strata + Hadoop World conference as the Open Data Platform initiative to help rein in some of the complexity that’s impacting Hadoop distributors, software vendors, and users. Read more…
If your company is looking to hire a data scientist right now, good luck. Five years after Harvard Business Review first shone the spotlight on the data scientist shortage, the gap between data science supply and demand remains substantial. In fact, the gap may be getting bigger.
How big is the data science skills gap? Read more…
News in Brief from Spring Strata 2016
The global big data market is poised to explode over the next decade, according to a new forecast, topping an estimated $92 billion by 2026 as new streaming analytics technologies emerge.
Market researcher Wikibon said this week it expects the global demand for big data services to grow at a hefty 14.5 percent annual rate over the next decade. Read more…
An artificial intelligence (AI) startup out of Berkeley, California called Bonsai.ai won the Startup Showcase at Strata + Hadoop World today. The second- and third-place winners were also announced, as was the winner of the audience choice award.
At Strata + Hadoop World San Jose this week, I will present with my fellow Trifacta colleague, co-founder Joe Hellerstein, a session entitled “Architecting immediacy: The design of a high-performance, portable wrangling engine.”
A big part of our session will be discussing our new Photon Compute Framework, an enhancement at the core of Trifacta’s data wrangling interface. Read more…
The latest release of a No+SQL database management platform adds integration capabilities for legacy COBOL and Btrieve systems designed to allow users to update the data management engine underneath their existing applications.
Noting that a significant number of financial and other users continue to rely on legacy systems based on COBOL and Btrieve transactional database software, database specialist FairCom Corp. Read more…
MemSQL used the second full day of the Strata + Hadoop World conference to launch a new version of its distributed SQL database that pushes forward its hybrid transactional/analytical processing (HTAP) strategy, which is gaining steam across the industry as a blended form of computing.
MemSQL is part of a new class of in-memory, horizontally scalable, relational databases that are gaining momentum for the capability to ingest and analyze large amounts of data in near real time. Read more…
Hadoop is increasingly moving to the cloud, with the Gartner group reporting that over 50% of companies are considering a cloud-only or hybrid cloud solution for Big Data. Altiscale has been offering a high-performance, secure, multi-tenant cloud solution since 2014, with its multitenancy and performance capabilities driven by the use of namespaced Docker containers. Read more…
Platfora launched its end-to-end analytics application for Hadoop when the only other option was to build your own. To that end, Big Data Discovery has everything you need. But with today’s update to the tool–issued on the second day of the Strata + Hadoop World conference–Platfora is opening up the kimono a bit more in an effort to better integrate with popular tools in the ecosystem, namely Tableau and Spark SQL. Read more…
The distinction between traditional operational systems and event/stream processing has begun to blur. Stream-oriented approaches offer novel ways to build applications that yesterday would have used a more traditional stack, such as LAMP or something similar. Rather than have monolithic clients fetch, process and update data over a network, developers are building pipelines that push data through fixed processing. Read more…
In the era of RDBMS and modern data warehouses, business intelligence was mostly a solved problem. Any reasonably advanced tool would work with any reasonable database, and the only real work was deciding what to collect and how to present it. However, the rise of big data and its associated technologies has forced the market to solve all these old problems all over again, and we’re now left with a proliferation of software that can be difficult to differentiate. Read more…
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds.
With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets. Read more…
The way that computing is done is changing dramatically. Instead of a program with a finite input, we now have programs with infinite streams as inputs. Why does this matter, and why is the change happening now?
This matters because life doesn’t happen in neatly defined batches. Neither should your code. Read more…
We’re only three months into 2016, but it has been an exciting year in open source and big data. With a marked jump in growth, usage and queries on Apache Kafka (Redmonk), engineering and DevOps jobs requiring Kafka talent are creating huge demand for training and skills development, as users look to leverage new features and create new deployments. Read more…
In the run up to next week’s Hadoop confab in Silicon Valley, vendors are releasing a flock of automation and other tools aimed at beefing up the mainstream data processing framework. Among them is an attempt to incorporate data science with a leading Hadoop distribution via a machine-learning approach.
Boston-based data science automation specialist DataRobot said this week its machine-learning platform designed to fill the data science skills gap has been certified on Cloudera Enterprise 5. Read more…
This Just In from Spring Strata 2016
- MemSQL 5 Unveiled – (3/31/2016)
- AtScale Releases Intelligence Platform 4.0 – (3/30/2016)
- MapR Introduces New Stream Processing Quick Start Solution – (3/30/2016)
- Confluent Introduces Partner Program to Support Apache Kafka Ecosystem – (3/30/2016)
- Dataguise Announces DgSecure 6.0 – (3/30/2016)
- Robin Systems and Zettaset Partner to Deliver Containerization Solution for Mission-Critical Applications – (3/30/2016)
- Cloudwick Announces New Vulnerability Assessment for Cybersecurity Threat Detection – (3/30/2016)
- Looker Forms Alliance With IBM Cloud Data Services to Deliver Suite of Looker Blocks – (3/29/2016)
- Platfora 5.2 Now Available – (3/29/2016)
- Sinequa Collaborates With MapR to Power Real-Time Big Data Search and Analytics on Hadoop – (3/29/2016)
- Altiscale Announces Partnership With Tableau – (3/29/2016)
- Attunity Replicate for Apache Kafka Introduced – (3/29/2016)
- ManageEngine Expands Big Data Monitoring to Hadoop – (3/29/2016)
- Novetta Entity Analytics Version 2.7 Released – (3/29/2016)
- NetApp Advances Data Analytics Performance for Third Platform Enterprise Applications – (3/29/2016)
- Dato Unveils New Features Within Machine Learning Platform – (3/29/2016)
- Bigstep Adds MapR Converged Data Platform to its Technology Stack – (3/29/2016)
- Confluent Announces First Public Kafka Training Courses – (3/29/2016)
- Manthan Partners With SnapLogic to Accelerate Data Onboarding for Big Data Analytics – (3/29/2016)
- Teradata Revolutionizes Enterprise Data Lake Design and Deployment – (3/28/2016)
- Impetus Technologies Launches New Data Warehouse Transformation Practice – (3/28/2016)
- Citus Data Releases Citus 5.0 – (3/24/2016)
- Zoomdata Developer Network Launched – (3/24/2016)
- Mesosphere Announces Two New Major Product Releases – (3/24/2016)
- Mesosphere Raises $73.5M in Funding – (3/24/2016)
- Galactic Exchange Announces Beta Availability of ClusterGX – (3/24/2016)