Sears Rides Hadoop Up Retail Mountain
Falling behind Walmart and Target in retail store sales, Sears hopes to rebound by investing heavily in Hadoop. Sears's revenue fell from $50 billion in 2008 to $42 billion last year. However, smarter marketing, enabled by keeping all of its data and targeting customers individually, has driven sizable growth over the past year: sales for the quarter ending July 28 were up 163% from the same quarter in 2011.
“With Hadoop we can keep everything, which is crucial because we don’t want to archive or delete meaningful data,” said Sears Chief Technology Officer Phil Shelley. Sears has seen its big data processing, especially for evaluating marketing campaigns, speed up since moving its data from Teradata and SAS onto Hadoop. According to Shelley, what once took six weeks now happens within a week on Hadoop, and the company's current 300-node cluster, which holds 2 PB of data, lets it keep 100% of its data rather than a meager 10%.
Sears's view of and strategy for big data is an interesting one. In addition to serving as Sears's CTO and Executive VP, Shelley runs MetaScale, a Sears subsidiary that aims to provide Hadoop services to other companies, much as Amazon does with Amazon Web Services.
It would seem that, in an effort to compete with Amazon, Sears has some catching up to do. On the other hand, Sears's big data efforts currently surpass Walmart's, which has only just started running ten Hadoop test nodes for experimental e-commerce analysis; Sears did that in 2010.
It is unfair to compare Sears, which historically makes its money from physical stores, to online retailers like Amazon. It may not even be fair to compare it to Target and Walmart, as Sears is more appliance-focused while Target and Walmart are more general.
That said, Sears, and specifically MetaScale, wants a place in the big data market, and it holds some interesting viewpoints on it. For example, Shelley sees little value for ETL in the modern era.
“ETL is an antiquated technique, and for large companies it’s inefficient and wasteful because you create multiple copies of data,” Shelley says. “Everybody used ETL because they couldn’t put everything in one place, but that has changed with Hadoop, and now we copy data, as a matter of principle, only when we absolutely have to copy.”
Shelley's principles are sound and may have driven the reported $500,000-per-year reduction in Sears's mainframe costs. Some, like Cloudera CEO Mike Olson, warn against abandoning ETL entirely. But to Shelley the move is intuitive. “If in three years you come up with a new query or analysis, it doesn’t matter because there’s no schema,” Shelley says. “You just go get the raw data and transform it into any format you need.”
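Shelley's "no schema" argument is what is commonly called schema-on-read: raw records are stored untouched, and structure is imposed only when a question is asked. Here is a minimal sketch of that idea in Python; the sample records and the function are purely illustrative, not Sears's actual pipeline:

```python
import json

# Raw event log kept verbatim ("keep everything"): no schema is
# imposed when the data is written, only when it is queried.
raw_records = [
    '{"customer": "A123", "channel": "email", "spend": 40.0}',
    '{"customer": "B456", "channel": "coupon", "spend": 15.5}',
    '{"customer": "A123", "channel": "email", "spend": 25.0}',
]

def query_spend_by_channel(records):
    """Schema-on-read: parse and shape the raw data at query time."""
    totals = {}
    for line in records:
        event = json.loads(line)  # interpret the raw bytes now, with
        channel = event["channel"]  # whatever schema today's question needs
        totals[channel] = totals.get(channel, 0.0) + event["spend"]
    return totals

print(query_spend_by_channel(raw_records))
# {'email': 65.0, 'coupon': 15.5}
```

A new analysis three years later simply means writing a new query function over the same raw records, rather than re-running an ETL job to produce yet another copy of the data in a new shape.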