June 25, 2020

COVID Notebooks Aims to Speed Predictive Models

George Leopold

IBM is extending its new open source toolkit, which adds AI extensions to the Jupyter notebooks data science platform, into a COVID notebooks platform designed to help analyze real-time data about the pandemic.

The company’s Center for Open-Source Data and AI Technologies developed the COVID notebooks toolkit, which among other things addresses data quality issues related to coronavirus analytics. Along with compiling “authoritative” data on the pandemic, the IBM unit said it “clean[ed] up the most serious data-quality problems.”

“Policy makers are asking questions including: What stories can we tell in the aggregate? Are there patterns we see across the country? What regions or demographics are getting affected the most by the pandemic?” the company said in a blog post.

Because the underlying data about the pandemic changes daily, COVID notebooks is intended to let data scientists concentrate on building models rather than cleaning data. The tool also allows results in analysts’ notebooks to be updated frequently.

The open source Elyra toolkit and its visual pipeline editor, along with Kubeflow, can be used to update results with fresh data.

Data sources include the Johns Hopkins University COVID-19 Data Repository, which provides county-level information on the pandemic. The Johns Hopkins data set is widely used to develop predictive models for national and state forecasts of deaths attributed to the novel coronavirus.

Other sources include agencies such as New York City’s Department of Health and Mental Hygiene, which provides borough-level data on the early epicenter of the pandemic.

In one scenario, IBM said data could be analyzed to detect correlations between poverty and infection rates. “Open source developers and data scientists can easily build on these tools to extend the analysis to their individual use cases,” notebook developers added.
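A poverty-versus-infection analysis of that kind might look roughly like the sketch below. It is not taken from IBM’s notebooks: the column names (`fips`, `poverty_rate`, `confirmed`, `population`), the function name and the four-row tables are hypothetical stand-ins for county-level data, and the only library calls used are the standard Pandas `merge` and `corr` methods.

```python
import pandas as pd

# Hypothetical county-level inputs; real notebooks would load these from
# sources such as census tables and a case-count repository.
poverty = pd.DataFrame({
    "fips": ["00001", "00002", "00003", "00004"],
    "poverty_rate": [10.0, 15.0, 20.0, 25.0],  # percent of residents
})
cases = pd.DataFrame({
    "fips": ["00001", "00002", "00003", "00004"],
    "confirmed": [50, 120, 90, 400],           # cumulative confirmed cases
    "population": [10_000, 20_000, 10_000, 40_000],
})


def poverty_infection_correlation(poverty: pd.DataFrame,
                                  cases: pd.DataFrame) -> float:
    """Pearson correlation between poverty rate and cases per 100,000."""
    # Join on the county identifier; validate guards against duplicate keys.
    merged = poverty.merge(cases, on="fips", how="inner",
                           validate="one_to_one")
    # Normalize by population so large counties do not dominate.
    merged["cases_per_100k"] = (
        merged["confirmed"] / merged["population"] * 100_000
    )
    return merged["poverty_rate"].corr(merged["cases_per_100k"])


r = poverty_infection_correlation(poverty, cases)
print(f"Pearson r = {r:.3f}")
```

A correlation computed this way only flags an association across counties; it says nothing about cause, and it is sensitive to how complete each county’s reporting is.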

In May, IBM announced an extension of its Elyra AI Toolkit to the industry-standard JupyterLab user interface with the goal of simplifying development of AI and other data science models. The initial release included a visual editor for building AI pipelines along with the ability to run interactive notebooks as batch jobs. Other features include Python script execution and a “hybrid runtime” capability based on Jupyter Enterprise Gateway.

The COVID-19 toolkit incorporates Jupyter notebooks and Python data science libraries, including Pandas. Pandas DataFrames were used for data cleaning and analysis. IBM said it is also extending Pandas for natural language processing applications.
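As an illustration of the kind of DataFrame cleanup involved, the sketch below repairs two problems that commonly appear in cumulative case counts: missing daily reports and totals that dip after a correction. It is an assumption-laden example rather than IBM’s actual code; the function name, the `county`, `date` and `confirmed` columns and the sample values are all hypothetical.

```python
import pandas as pd


def clean_cumulative_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a long-format table of cumulative counts."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values(["county", "date"]).reset_index(drop=True)
    by_county = out.groupby("county")["confirmed"]
    # A missing report is filled with the county's last reported total.
    out["confirmed"] = by_county.ffill()
    # A cumulative total should never decrease, so carry the running maximum.
    out["confirmed"] = out.groupby("county")["confirmed"].cummax()
    # Daily new cases are the day-over-day difference within each county.
    out["new_cases"] = (
        out.groupby("county")["confirmed"].diff().fillna(0)
    )
    return out


# Hypothetical sample: county A has a dip, county B has a missing report.
raw = pd.DataFrame({
    "county": ["A", "A", "A", "A", "B", "B", "B"],
    "date": ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04",
             "2020-03-01", "2020-03-02", "2020-03-03"],
    "confirmed": [1, 3, 2, 5, 0, None, 4],
})
cleaned = clean_cumulative_counts(raw)
print(cleaned)
```

Forward-filling and taking a running maximum are only one possible policy; an analyst might instead prefer to flag such rows for review rather than overwrite them.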

Meanwhile, a graphical workflow editor built as part of the Elyra project ties the COVID-19 notebooks into workflows for running daily updates. Those data are collated into an appropriate format for easier analysis with tools like Pandas, IBM said.
