August 7, 2015

Three Tips for Building a Big-Data Back-End for Your Mobile App

Ashish Thusoo

With more companies striving to become data-driven, many businesses are developing mobile applications and integrating big data analytics into their products or services. As companies and their user bases grow, they must be prepared to handle the massive influx of data those mobile apps generate.

As virtual architecture becomes more widely adopted to manage large datasets of mobile data, companies need to find a solution that will allow them to focus on the valuable insights hidden in these datasets, instead of having to address infrastructure issues, such as costs, storage capacity and elasticity. Many organizations have taken to the cloud to avoid these issues.

Here are the top three tips for building a big-data-driven back end in the cloud to support mobile applications:

Invest Aggressively In Sophisticated Data Capability, But Not Necessarily Data Center Infrastructure

As mobile companies such as Ola Cabs and MyFitnessPal continue to embrace analytics and drive the latest surge in data volume, they are placing new demands on IT infrastructure built to handle massive amounts of data. The biggest challenge is how to meet growing performance demands while minimizing management overhead.

Companies should carefully consider whether it’s worth owning and maintaining data infrastructure. Alternatively, they can leverage cloud services such as AWS, Microsoft Azure or Google Compute Engine. Not only can the cloud help cut capital costs, but much more importantly, it can reduce product risk and shorten the time to critical results.

Additionally, in any big data project, storage capacity must be able to scale quickly to accommodate growing datasets, while computational power must be able to scale both up and down quickly and at a fine granularity. The cloud offers a convenient way to address these concerns and also offers pay-as-you-go models, so that businesses pay only for what they use, which provides enormous flexibility.

Another challenge with big data is finding innovative technological solutions that can pick up where traditional databases and existing scalable architectures leave off. When it comes to mobile applications, data is collected from multiple systems and must be presented in a format on which a person can take action. For instance, companies should consider using Hive to unlock raw JSON event data from databases. Hive comes with columnar formats, such as RCFile and ORC, which reduce the read operations in analytics queries by allowing each column to be accessed individually. By organizing data to be more easily readable and accessible, companies will be able to focus on the business insights in the data collected from their mobile applications rather than on the process of unlocking them.
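To make the columnar idea concrete, here is a minimal sketch in plain Python. It is not Hive, RCFile or ORC; it only illustrates why storing each field as its own column lets an analytics query read one field without touching the others. The event records and field names (user_id, event, duration_ms) are hypothetical.

```python
import json

# Hypothetical raw mobile events, one JSON document per line.
RAW_EVENTS = [
    '{"user_id": 1, "event": "open", "duration_ms": 120}',
    '{"user_id": 2, "event": "tap", "duration_ms": 45}',
    '{"user_id": 1, "event": "close", "duration_ms": 300}',
]


def to_columns(lines):
    """Pivot row-oriented JSON records into a dict of column lists."""
    columns = {}
    for row_index, line in enumerate(lines):
        record = json.loads(line)
        # A field seen for the first time is back-filled with None
        # for all earlier rows so every column has the same length.
        for key in record:
            columns.setdefault(key, [None] * row_index)
        # Append this row's value (or None if the field is absent).
        for key, values in columns.items():
            values.append(record.get(key))
    return columns


def average(columns, name):
    """Aggregate a single column; no other column is read."""
    values = [v for v in columns[name] if v is not None]
    return sum(values) / len(values)


columns = to_columns(RAW_EVENTS)
print(average(columns, "duration_ms"))  # 155.0
```

In a row-oriented layout, the same average would require parsing every full record. Real columnar formats add compression and indexing on top of this basic layout.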

Use Real-Time Querying to Maintain a Competitive Advantage

With mobile applications, data is consumed more quickly and frequently because of how accessible it is. To analyze this steady flow of large datasets, companies can use querying engines for optimizing ad-hoc interactive queries on data sources up to petabytes in size.

Back when big data was just getting off the ground, early adopters of open source Hadoop achieved competitive advantages through analysis of vast amounts of multi-structured data to gain actionable insights. Today, thanks to fast SQL-on-Hadoop solutions such as Presto-as-a-Service in the cloud, that next competitive advantage is real-time data query. Real-time query software benefits the bottom line of businesses by allowing decision makers to gain the actionable insights they need to make better decisions faster than the competition. In addition, real-time query, in conjunction with geo-location tools, gives companies the ability to track, interact with and influence customers in a way that drives sales while enhancing the customer experience, all in real time.

Real-time query solutions can help to accelerate speed to insight through an interactive and incremental process. They enable users to circumvent sluggish data-refinement pipelines by streaming detailed datasets directly into Hadoop. It’s also important to note that the metadata analyzed in Hadoop is shared by all processes. This means that if users are able to extract additional meaning from the data during real-time query sessions, these additions become visible to the other processes in the system. As a result, discovery is also accelerated, and all departments, such as marketing and operations, are able to see and use the data, interpreting its value as it applies to their specific role in the company.

Additionally, real-time query on Hadoop allows organizations to carry out full-fidelity analysis of data, picking up where insight and discoverability leave off. Along with providing full access to both summary and detailed information, real-time query software gives analysts the flexibility to easily ask unanticipated ad-hoc questions. With the ability to interact iteratively with vast stores of structured, semi-structured and unstructured data, end users can not only see trends, relationships and patterns hidden in the raw data, but all of the supporting details as well.

The ultimate payoff of real-time query on Hadoop for business is lower cost and higher profit. A cloud-based Hadoop solution can bring initial costs down from thousands to hundreds of dollars per terabyte because it utilizes open-source software running on a cluster of commodity servers. And with real-time query on Hadoop, the need to move data from one system to another is eliminated, saving organizations money and, more importantly, valuable time.

Enable Continued Flexibility

The mobile enterprise is gaining momentum because of its business value; however, corporate IT developers are struggling to keep up with the pace of mobile workers’ demands for access to key business processes and applications. The flexibility and elasticity of the cloud can provide the business agility to address these concerns.

When reviewing potential cloud solutions, businesses should consider a combination of auto-scaling and spot-pricing options to access the capacity needed. This will also help the business to save money and gain more value from its investment.
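As a rough illustration of why the combination matters, the sketch below compares a fixed cluster sized for peak demand against a cluster that scales with demand and runs part of its nodes at spot prices. All figures are hypothetical: the hourly demand, the per-node prices (in cents) and the share of nodes placed on spot capacity are assumptions chosen for the arithmetic, not quotes from any cloud provider.

```python
# Hypothetical node demand for four consecutive hours.
HOURLY_DEMAND = [2, 2, 10, 4]

# Hypothetical prices in cents per node-hour.
ON_DEMAND_CENTS = 40
SPOT_CENTS = 10


def fixed_cluster_cost(hourly_demand, on_demand_price):
    """Cost of keeping a cluster sized for the peak hour running all the time."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * on_demand_price


def autoscaled_cost(hourly_demand, on_demand_price, spot_price, spot_fraction):
    """Cost when node count follows demand and a share of nodes runs on spot."""
    total = 0
    for nodes in hourly_demand:
        spot_nodes = int(nodes * spot_fraction)  # rounded down
        on_demand_nodes = nodes - spot_nodes
        total += on_demand_nodes * on_demand_price + spot_nodes * spot_price
    return total


fixed = fixed_cluster_cost(HOURLY_DEMAND, ON_DEMAND_CENTS)
scaled = autoscaled_cost(HOURLY_DEMAND, ON_DEMAND_CENTS, SPOT_CENTS, 0.5)
print(fixed, scaled)  # 1600 450
```

The sketch ignores the fact that spot capacity can be reclaimed by the provider, which is one reason to keep part of the cluster on regular on-demand nodes.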

Additionally, in the big data solution selection process, it’s important to think holistically and consider the business priorities, remembering that these priorities are subject to change. That said, there is no “one-size-fits-all” infrastructure solution. What makes the cloud attractive is its elastic infrastructure, which can enable other dimensions that make the business more agile and high performing.

Finally, different teams across a company must also have access to the best analysis tools for supporting decision-making across various types of big data workflows. In the mobile world, there are certain queries you’ll want to perform after the fact. For example, advertising workflows usually involve ad-hoc analysis of historical data on past behaviors. On the other hand, there are emerging use cases in which real-time streaming analysis is needed to provide contextual services. An example of this would be asking an app, “What is the best Italian restaurant nearby that’s open right now?”
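The restaurant question combines three filters (cuisine, opening hours, distance) with a ranking by rating. A minimal sketch of that logic follows; the restaurant records, coordinates, ratings and opening hours are invented for illustration, and a production service would run the equivalent query against live data rather than an in-memory list.

```python
import math
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    cuisine: str
    lat: float
    lon: float
    rating: float
    opens: int   # opening hour, 0-23
    closes: int  # closing hour, 1-24


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def best_nearby(restaurants, cuisine, lat, lon, hour, max_km):
    """Return the highest-rated open restaurant of a cuisine within max_km."""
    candidates = [
        r for r in restaurants
        if r.cuisine == cuisine
        and r.opens <= hour < r.closes
        and distance_km(lat, lon, r.lat, r.lon) <= max_km
    ]
    return max(candidates, key=lambda r: r.rating, default=None)


# Hypothetical data; none of these places are real.
RESTAURANTS = [
    Restaurant("Trattoria A", "italian", 37.7750, -122.4180, 4.2, 11, 22),
    Restaurant("Osteria B", "italian", 37.7760, -122.4200, 4.8, 17, 23),
    Restaurant("Faraway C", "italian", 37.3382, -121.8863, 4.9, 11, 23),
    Restaurant("Thai D", "thai", 37.7751, -122.4190, 5.0, 11, 23),
]

lunch = best_nearby(RESTAURANTS, "italian", 37.7749, -122.4194, hour=12, max_km=2.0)
print(lunch.name)  # Trattoria A
```

At noon only Trattoria A passes all three filters; by evening the higher-rated Osteria B is open and would be returned instead.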

To Sum It Up…

These three tips can help ensure that organizations are fully equipped to handle vast amounts of data and scale to the needs of their growing business. It’s now standard to have business intelligence and analytics integrated into mobile applications to assist businesses with decision-making and collaboration across all departments. Mobile companies must take the proper steps in order to focus on the value of their data, instead of worrying about how to manage it all.


About the author: Before co-founding Qubole, Ashish ran Facebook’s Data Infrastructure team; under his leadership the team built one of the largest data processing and analytics platforms in the world. This platform achieved not just the bold aim of making data accessible to analysts, engineers and data scientists, but drove the “big data” revolution. In the process of scaling Facebook’s big data infrastructure, he helped drive the creation of a host of tools, technologies and templates that are used industry wide today. Ashish also contributed to the development of the Apache Hive data warehouse infrastructure project for Hadoop.

