When talking about big data, one can analyze a relatively small amount of data really quickly, or build something like a Hadoop cluster and analyze a lot of data over a long period of time.
“Today we are living in a world of compromises,” said SAP’s Pushkar Bhat at a SAP forum in Delhi, India. “If you were to go deep into the data and you were to look into structured as well as unstructured, you start missing out on the realtime element, you start missing out on the simplicity, you start missing out on the high speed of the data itself.”
Bhat extolled the virtues of SAP HANA, which he presented as a possible way to bridge that compromise.
For Bhat, a lot has been done technologically over the last thirty years, but the hard drive has been left relatively untouched. “Frankly, for the last 30 years that hard drives have been in existence, we haven’t seen a whole lot of innovation.” For a long time, the only improvement was a higher spindle speed.
SAP, on the other hand, delved into the relationship between the memory and the disk, noting that it was the greatest source of inefficiency. Instead of improving that relationship, SAP essentially eliminated it by moving the data from the disk into RAM. According to Bhat, physics would only allow the disk to spin so quickly. “The first thing we did (with HANA) is we moved a lot of the data from disk to the main memory.”
This allows for increased parallelization. According to Bhat, instead of the number of parallel threads being proportional to the number of CPUs, it would be proportional to the number of cores. “We take all of this memory, break it off into small chunks to a particular core, and therefore create what is called RAM locality. Data is mapped directly to the core, and therefore it’s extremely faster in terms of how you process it.”
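The chunk-per-core idea Bhat describes can be illustrated with a minimal sketch. This is not SAP HANA's implementation; it only shows the general pattern of splitting in-memory data into one contiguous chunk per core and combining the partial results. The helper names `split_into_chunks` and `aggregate` are hypothetical, and Python worker processes stand in for cores.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous, nearly equal slices."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def aggregate(chunks, map_fn=map):
    """Sum each chunk via map_fn, then combine the partial sums.

    map_fn defaults to the serial built-in map; passing a process
    pool's map runs each chunk's sum in a separate worker.
    """
    return sum(map_fn(sum, chunks))


if __name__ == "__main__":
    data = list(range(1_000_000))      # illustrative in-memory column
    n_cores = os.cpu_count() or 1      # one chunk per available core
    chunks = split_into_chunks(data, n_cores)
    with ProcessPoolExecutor(max_workers=n_cores) as pool:
        total = aggregate(chunks, pool.map)
    print(total)
```

In this sketch each worker only ever touches its own chunk, which is the property the "RAM locality" description points at; a real in-memory database would additionally pin data to the memory nearest each core, which plain Python cannot express.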
The results, as Bhat reported them, are impressive. Reporting speed is 90-95% faster, as reports are generated almost as fast as data is input. Also, SAP HANA can analyze terabytes in a matter of minutes instead of hours or days.
SAP HANA has already been put to the test in the healthcare industry. Big data is a particular concern in healthcare at the genomic level, as researchers try to dive into the fundamentals of the human body and the diseases that afflict it. However, processing all the data that comes from gene sequencing takes an understandably significant amount of time.
With SAP HANA, according to Bhat, it does not have to. “For MKI, we said that for people who are suffering from cancer, where identifying the right type of treatment depending on what is wrong with your genes, used to take three days of computing time to do one person’s sequencing. We’re bringing that down to 20 minutes.”
Such a development would expand the reach of such intense genetic study of an individual. A test that takes three days is likely to be significantly more expensive than one that takes only twenty minutes.
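To put the quoted figures in perspective, the implied speedup can be worked out directly. The short calculation below takes the three days and 20 minutes from Bhat's statement at face value; the function name `speedup_factor` is only illustrative.

```python
def speedup_factor(before_minutes, after_minutes):
    """Return how many times faster the new runtime is than the old one."""
    return before_minutes / after_minutes


# Three days of computing time, expressed in minutes.
three_days_in_minutes = 3 * 24 * 60  # 4320

print(speedup_factor(three_days_in_minutes, 20))  # 216.0
```

If the reported numbers hold, that is a reduction of roughly two orders of magnitude in computing time per sequenced individual.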
However, that is just a test case. According to Bhat, the fully integrated version will be available soon. In fact, that was his most definitive statement. “This particular date is etched in stone. Q4 2012, we will stand on the stage and say that HANA is ready to go and run your (important thing).”
Further innovations include letting current BW users keep the alterations they have made when they implement SAP HANA, and giving business users the ability to look at the data at the level at which it is stored. “If you start going back and giving that level of granularity to the business, they will just love you for it.”
Initial results for processing terabytes of data quickly look impressive. As Q4 rolls around, it remains to be seen whether SAP HANA can maintain its reported success.