Data West Brings Technology Leaders to SDSC
Data and technology enthusiasts from around the world descended upon the San Diego Supercomputer Center (SDSC) for the third annual Data West conference, which is taking place this week on the campus of the University of California, San Diego.
Hosted by the SDSC’s Center for Large Scale Data Systems Research (CLDS) and the Data Science Hub Business Exchange (DSH BX), Data West 2018 is a two-day conference that allows leaders from industry, government, and academia to collaborate on the challenges and scientific, business and social opportunities presented by today’s big data collections and technologies.
The show began Wednesday morning in the East Wing of SDSC, where over 100 people attended dozens of sessions presented by academic researchers and industry representatives, and also perused HPC and AI solutions presented by 15 vendors in the technology expo.
Among those demonstrating compelling solutions was Amarnath Gupta, a database expert at SDSC who presented the findings of one of his social media research projects — specifically, on the nature of Twitter discussions of voter fraud.
By using graph analysis to trace the hashtags used in the online discussion, Gupta was able to show how similar messages propagated outward rapidly over a short period of time, and then how the network of communicators became more “dense” after the initial wave of tweets had ended. He also found that more than half of the 300 million or so tweets in his data set originated from just a handful of Twitter accounts, about 50 of them. That could indicate that the voter fraud discussion was actually managed centrally, and did not occur as organically as a neutral observer might assume.
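The concentration finding can be illustrated with a short Python sketch. This is not Gupta's pipeline or part of AWESOME; it is a simple counting measure, and the function name, the (account, text) tuple layout, and the sample data are all hypothetical.

```python
from collections import Counter

def top_account_share(tweets, k):
    """Fraction of all tweets posted by the k most active accounts.

    `tweets` is an iterable of (account, text) pairs; this layout is an
    assumption made for illustration, not a format from the research.
    """
    tweets = list(tweets)
    if not tweets:
        return 0.0
    counts = Counter(account for account, _text in tweets)
    top_total = sum(n for _account, n in counts.most_common(k))
    return top_total / len(tweets)

# Made-up sample: account "a" posts 3 of the 6 tweets.
sample = [
    ("a", "#x"), ("a", "#x"), ("a", "#y"),
    ("b", "#x"), ("c", "#x"), ("d", "#z"),
]
print(top_account_share(sample, 1))  # 0.5
```

A share above 0.5 for a very small k, as in the roughly 50 accounts Gupta described, is the kind of signal that would suggest a centrally driven conversation rather than an organic one.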
The system that Gupta built to conduct this research, which is composed of open source AsterixDB, Solr, Neo4j, Postgres, and Apache Spark tools (among others), is named “An Analytical Workbench for Exploration of SOcial MEdia,” or AWESOME, which might be the greatest name ever for a big data product. Gupta said there is a possibility that AWESOME could be turned into a commercial offering. Specifically, the platform could be used to build cybersecurity tools that companies could use to protect themselves from online attacks designed to hurt their brand and image.
The Data West Expo was full of sponsors that Datanami readers will recognize, including Cray, Dell EMC, Collibra, and Booz Allen Hamilton, as well as some that they may not, like Decision Sciences, AEEC Innovation Lab, and Chatham Hill.
One vendor making an inaugural appearance in the Expo was GigaIO, a Carlsbad, California startup that’s developed a new PCIe-based data fabric that could simplify how data moves in large compute clusters, not to mention speed it up considerably, without the need for complex, proprietary interconnects.
According to GigaIO vice president Steve Campbell, the FabreX switch delivers extremely low latency for today’s demanding AI and HPC workloads. In an x16 configuration, the FabreX latency is 43 nanoseconds, and for an x8 link, it’s about 86 nanoseconds. The switch, which will soon be generally available, supports the full PCIe Gen 3 transmission rate of 256 Gbits/sec at full duplex, and will support PCIe Gen 4’s 512 Gbits/sec rate soon.
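For readers wondering where those bandwidth numbers come from, the sketch below reproduces them from the per-lane signaling rates in the PCIe specifications (8 GT/s for Gen 3, 16 GT/s for Gen 4). It assumes the quoted figures refer to an x16 link with both directions added together, which the article does not state explicitly; the function names are illustrative only.

```python
# Per-lane signaling rate in giga-transfers/sec, per direction.
GT_PER_LANE = {3: 8, 4: 16}

def raw_duplex_gbits(gen, lanes=16):
    """Raw bit rate summed over both directions, before encoding overhead."""
    return GT_PER_LANE[gen] * lanes * 2

def usable_duplex_gbits(gen, lanes=16):
    """Same figure after the 128b/130b line encoding used by Gen 3 and Gen 4."""
    return raw_duplex_gbits(gen, lanes) * 128 / 130

print(raw_duplex_gbits(3))                # 256
print(raw_duplex_gbits(4))                # 512
print(round(usable_duplex_gbits(3), 1))   # 252.1
```

Under that assumption, the 256 Gbits/sec and 512 Gbits/sec figures are raw link rates; the payload rate after line encoding is slightly lower.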
The flexibility of the FabreX is another selling point. Its software-defined approach allows users to connect a variety of computational resources, including CPUs, GPUs, TPUs, and FPGAs, with other PCIe endpoint devices, including NVMe storage and traditional PCIe storage, without the “rip and replace” waste demanded by traditional interconnects.
Data West 2018 also featured an industry session by Hus Tigli, the founder and chairman of Xaxar, a San Diego startup that just came out of stealth with machine learning software designed to classify network data in real time.
Tigli says Xaxar’s algorithms, when implemented on data center networks via lightweight agents, are able to identify the different types of traffic with 85% to 95% accuracy, just by examining 14 fields in the packet header (no deep packet inspection necessary). By identifying the types of traffic, data center operators can create rules that allow the data to be routed more efficiently, such as via OpenFlow tables, adjustment of buffer settings, routing to dedicated physical links, or even using high-bandwidth photonic or peer-to-peer wireless links, he said.
The upshot of having a more efficient network, Tigli said, is higher server utilization rates. “If data traffic can be classified real time, rapidly, and correctly, it can be managed more intelligently, efficiently, and profitably,” he said. In tests with IBM, Xaxar’s technology contributed to a 50% reduction in the time required to complete a given server job, which translates to a doubling of the server utilization, Tigli said. In a data center with 50,000 servers, the technology could save $37 million in capital expenditure (CapEx) and $4.5 million per year in operating expenditure (OpEx) related to power and cooling, he said.
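The step from a 50% shorter job time to doubled utilization is simple arithmetic, sketched below in Python. It assumes a server is kept continuously busy with jobs of the same kind, so that work completed per hour is inversely proportional to job time; that assumption and the function name are illustrative and do not come from Xaxar or IBM.

```python
def throughput_gain(time_reduction):
    """Multiplier on jobs completed per unit time when each job takes
    `time_reduction` (a fraction between 0 and 1) less time to finish."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1 / (1 - time_reduction)

print(throughput_gain(0.5))   # 2.0
```

In other words, a job that takes half as long lets the same server finish twice as many jobs in the same window, which is the doubling Tigli described.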
Data West 2018 continues today with its Executive Forum, which is being held at the nearby Sanford Consortium for Regenerative Medicine on the Torrey Pines Mesa. Those speaking today include representatives from Pfizer, Cerner, the J. Craig Venter Institute, StemoniX, and Sirenas.