
How Joyent Uses OS Virtualization to Deliver its Manta Object Storage Platform
Cloud infrastructure provider Joyent has plunged deeper into the big data game with Manta, its own cloud object store and data services platform, which the company claims will spur a wave of big data innovation in the cloud.
The recently announced Manta is described as a highly scalable, distributed object storage service with integrated compute. We caught up with Joyent Senior Vice President of Engineering, Bryan Cantrill, who oversees the development of Joyent’s core platform, to talk about their latest step into the big data arena and where they are going with Manta.
Cantrill explained that Joyent differentiates itself from competitors like Amazon and Rackspace through its fundamental approach to the entire stack. “In order to be able to innovate up-stack with cloud services, you need to own the entire stack of software, down to the hardware,” he said, noting that Joyent owns its own operating system, hypervisor, and language runtime. This, he explained, allows the company, with Manta, to attempt new ways to virtualize that he says others can’t provide.
“There is another way to virtualize, which is to do it not in terms of hardware, but in terms of the OS,” he explained. “Instead of giving a tenant a virtual microprocessor, you give them a virtual operating system that looks and feels and smells like its own machine, but it’s actually at a higher layer of abstraction.”
What this means, he says, is that the applications that run in a virtual OS actually run on the hardware with no intervening second operating system in the stack. Cantrill illustrates with a tenant who wants to spin up some infrastructure with a gigabyte of DRAM. In the hardware virtualized model, the hypervisor needs to take a gigabyte of DRAM and give that to the operating system that is sitting on top of the virtual hardware. “Now if you put your app on top of virtual hardware that only uses 500 megabytes of DRAM, there’s 500 megabytes left there that is simply lost to the system,” he says, explaining that the system has no real way of reclaiming it.
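The arithmetic behind Cantrill’s example can be sketched in a few lines of Python. This is only an illustration, not Joyent code: the function name is hypothetical, the figures are the ones from his example, and a gigabyte is treated as 1,000 megabytes to match his round numbers.

```python
def stranded_mb(reserved_mb: int, used_mb: int) -> int:
    """Memory handed to a guest OS that its application never touches.

    In this simplified model of hardware virtualization, the hypervisor
    cannot reclaim the remainder, so it is unavailable to other tenants.
    """
    return max(reserved_mb - used_mb, 0)


# Figures from the example: 1 GB reserved, an app that uses 500 MB.
reserved = 1000  # assumption: 1 GB treated as 1,000 MB
used = 500

lost = stranded_mb(reserved, used)
print(f"Stranded under hardware virtualization: {lost} MB")
```

Under the OS virtualized model Cantrill describes, the tenant’s processes are ordinary processes in the host OS, so this remainder would stay available to the rest of the system rather than being reserved up front.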
Cantrill says that with hardware virtualization, this is true across the board for all the physical resources, including DRAM, CPU, and the network. “With hardware-based virtualization, the hypervisor is forced into these really blunt, coarse decisions that really prohibit tenancy and performance.”
On the other hand, Cantrill explains that using an OS virtualized model – as Joyent does with its Manta service – you get the resources that you ask for because your processes are simply processes in the larger OS. “What that means is that tenancy is much higher with a much lighter weight abstraction,” says Cantrill. “You can get many more of them on the box and they can perform at a much higher level because they’re all on the hardware.”
While the downside is that the tenant is not able to pick their own operating system – they would need to execute with virtualized hardware to do that – the upside, says Cantrill, is a significant performance benefit because the user is able to run their applications directly on the hardware, where the data is.
“We’re the only cloud provider that owns our own operating system,” says Cantrill. “What OS level virtualization allows us to do uniquely is have an object storage service (Manta) in which you can put objects, and then when you wish to actually compute upon those objects, instead of needing to get the objects out of the service into transient compute (like you would with Amazon S3 and EC2), you can actually spin up a virtual OS instance on the data itself – on the storage node upon which the data lives – and then your compute can actually execute directly on the object.”
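The difference between the two models can be illustrated with a toy Python sketch. Nothing here is Manta’s actual API: `StoredObject`, `pull_then_compute`, and `compute_in_place` are hypothetical names, and “bytes moved” is a deliberately crude stand-in for network transfer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StoredObject:
    name: str
    node: str    # hypothetical identifier of the storage node holding the data
    data: bytes


def pull_then_compute(
    objects: List[StoredObject], fn: Callable[[bytes], int]
) -> Tuple[Dict[str, int], int]:
    """Transient-compute model: every object is copied out before fn runs."""
    bytes_moved = sum(len(o.data) for o in objects)
    results = {o.name: fn(o.data) for o in objects}
    return results, bytes_moved


def compute_in_place(
    objects: List[StoredObject], fn: Callable[[bytes], int]
) -> Tuple[Dict[str, int], int]:
    """Compute-on-storage model: fn runs where each object lives,
    so only the (small) results leave the storage node."""
    results = {o.name: fn(o.data) for o in objects}
    bytes_moved = sum(len(str(r).encode()) for r in results.values())
    return results, bytes_moved


def count_lines(data: bytes) -> int:
    """Example computation: count newline-terminated lines in an object."""
    return data.count(b"\n")


objects = [
    StoredObject("a.log", "node-1", b"x\n" * 1000),
    StoredObject("b.log", "node-2", b"y\n" * 2000),
]
pulled, pulled_bytes = pull_then_compute(objects, count_lines)
local, local_bytes = compute_in_place(objects, count_lines)
print(pulled == local, pulled_bytes, local_bytes)
```

Both functions produce the same results; what changes is how much data has to leave the storage nodes to get them.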
The use cases this model opens up are extensive, says Cantrill. “It’s one of the things that is so interesting about this technology,” he explains. “Every once in a while you come up with a technology that unlocks so much that your customers actually start to tell you how they’re going to use it.”
“For us personally, I can tell you that the Rubicon that we crossed is when we started to use Manta to implement Manta,” says Cantrill, explaining that there are a number of big data problems that they ran into as they implemented the object storage service, including metering. “We need to keep track of usage of the system, and the way that is historically done is by having big MapReduce clusters that run over logs. Within Manta, we can actually just pump those logs back into Manta and then run Manta jobs over the logs to meter the service – it was so much easier to do than the alternative.”
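The general shape of such a metering job, a per-log map step followed by a reduce step, might look like the Python sketch below. This is not Joyent’s metering code or the Manta job interface: the log format (one “account bytes” pair per line) and the function names are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Iterable


def map_phase(log_text: str) -> Counter:
    """Per-object step: total the bytes recorded for each account in one log.

    Assumed (hypothetical) log format: "<account> <bytes>" on each line.
    """
    usage = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip malformed lines rather than fail the whole job
        account, nbytes = parts
        usage[account] += int(nbytes)
    return usage


def reduce_phase(partials: Iterable[Counter]) -> Counter:
    """Combine the per-log totals into one usage figure per account."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


logs = [
    "alice 100\nbob 50\n",
    "alice 25\nthis line is malformed\n",
]
totals = reduce_phase(map_phase(log) for log in logs)
print(dict(totals))
```

In the arrangement Cantrill describes, the map step would run next to each stored log object, and only the small per-account totals would need to be combined.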
Log processing is one of the chief use cases that Cantrill says Manta is currently being used for. Transcoding multimedia is another, said Cantrill, who explained that with Manta, because the tenant doesn’t need to pull their images into transient compute, they can transcode them at rest and create new Manta objects as a result. “Transcoding in Manta is a one-liner,” says Cantrill.
Ultimately, Cantrill says Manta can be considered a big data cloud operating system, a claim that is becoming something of a trend. Late last month we saw the preview release of Hadoop 2.0, which includes the YARN resource manager; Arun Murthy says YARN turns Hadoop from a single application system “to a multi-application operating system.” Cantrill takes umbrage at this.
“I don’t mean to sound pejorative towards Hadoop, but in terms of an operating system, there is a very rigid technical definition of what an operating system is, and I build one and they don’t,” he explained. “I’ve been doing kernel development for 20 years, and that’s the operating system. Everyone loves to call themselves an operating system, but ultimately if there is no kernel component to what you’re doing, then what you’re doing is an app.”
Cantrill says that this is an important distinction because he believes that the OS still has a ways to go in its development, going as far as calling it the nexus of innovation. “I’ve heard so many times over my career that OS development is dead, and not interesting – everything that you can do with the operating system has been done – and we’ve done so much interesting OS work after those points. It’s like the patent examiner in the 1890s saying that everything has been invented.”
“I should be clear that Manta has got OS components for certain, but much of Manta is a distributed system on top of that operating system, so it’s really about the innovation in the operating system unlocking up-stack innovation and making sure that we work together as a team and make sure that if there’s something that we should do, we do it in the OS – we don’t try to work around it at a higher layer of software.”
Cantrill explains that this is the key difference between Joyent’s approach and other multi-tenancy cloud operating system efforts. “They are actually working around or reinventing abstractions that should exist in the OS,” he explains. “I think that what we’ve built can be said with a higher degree of confidence actually is a cloud operating system in that we are giving you the OS as the abstraction on your data.”