AWS Looks to ‘Demystify’ Machine Learning
Amazon Web Services used a big data conference in the backyard of some of its largest government customers to showcase its AI and machine learning tools that are helping to funnel ever-larger volumes of data into its storage and computing infrastructure.
Making a pitch for better data management tools like metadata systems, AWS executives speaking at the event in Tysons Corner, Va., said the public cloud giant aims to go beyond democratizing big data to “demystify” AI and machine learning.
The combination of organized data and analytics will accelerate the building and deployment of machine learning models, many of which currently never make it to production. Those that are deployed often require up to 18 months to roll out, noted Ben Snively, a solution architect at AWS (NASDAQ: AMZN).
Open source tools for model development often advance a generation or two in the time it takes many enterprises to develop, train and launch a machine learning model, he added.
Snively asserted that the combination of big data and analytics along with AI and machine learning creates a “flywheel effect” in which organized, accessible data leads to faster insights, better products and—completing the cycle—more data.
(Hence, the cloud vendor forecasts as much as 180 zettabytes of widely varied and fast-moving data by 2025.)
As the company seeks to demystify machine automation technologies and move beyond the current technology “hype phase,” AWS executives note that deploying machine learning models and, eventually, full-blown platforms remains hard. Among the reasons is “dirty” data that must be cleansed before it can be widely accessed. The company estimates that 80 percent of data lakes currently lack metadata management systems that help determine data sources, formats and other attributes needed to wrangle big data.
That makes the heavy investments in data lakes “inefficient,” stressed Alan Halamachi, a senior manager for AWS solution architectures. “If data is not in a format where it can be widely consumed and accessible,” Halamachi said, machine learning developers will find themselves in “data jail.”
Once big data is wrangled and secured (“Hackers would like nothing more than to engineer a single breach with access to all of it,” Halamachi said), it can be combined with analytics on the inference side to accelerate training of machine learning models, Snively said.
Noting that most machine learning models built by enterprises never make it to production, the AWS engineers pitched several new tools, including the company's SageMaker machine and deep learning stack introduced in November. Snively described SageMaker as a tool for taking the “muck” out of developing machine learning models and said it is also designed to free data scientists from IT chores like standing up a server for model development.
The cloud giant is seeing more experimentation among its customers as they seek to connect big data with machine learning development. “Voice [recognition] systems are here to stay,” Snively asserted, and developers are investigating “new ways of interacting with those systems.”
“It’s really about demystifying AI and machine learning” and getting beyond the “magic box” phase, he added.