June 28, 2018

Blockchain Solutions Usher in Era of Trusted Big Data

Tripp Smith


Big Data technologies have matured to enable ingestion and storage of massive transactional and event data for integration and analysis, but governance capabilities remain a persistent challenge. Big Data platforms like Apache Hadoop and Apache Spark have fundamentally changed the economics of storing and mining data that were previously inaccessible to engineers and data scientists, powering an ecosystem of new business opportunities in many industries.

Industries with high-criticality operations, such as healthcare, financial services, and government, have been more reluctant to adopt Big Data technologies. In these industries, the cost of “getting it wrong” vastly outweighs the value of “getting it right.” Emerging Blockchain-powered solutions seek to address these problems by enabling advanced consistency, auditability, and security features aimed at a new generation of big data problems.

Consider the following oversimplified scenarios:

  • A pharmaceutical company wants to select patients for participation in a research study for a drug treatment promising substantial benefits for patients with a specific type of condition, but potential adverse side effects for patients who are incorrectly selected for the trial; each incremental patient added to the trial potentially adds hundreds of thousands of dollars of benefit, but each incorrectly selected participant could add millions of dollars of liability.
  • An online media company wants to identify consumers to target with an online advertisement that promises to dramatically boost sales for the correctly identified consumers; each correctly identified consumer might add dollars of benefit, while each incorrectly selected consumer costs only a fraction of a cent.

See the difference? The pharmaceutical company’s ethical responsibility and potential liability for errors far outweigh the incremental gains of adding an additional participant to the study. While the media company might be willing to rely on “mostly correct” web logs that duplicate or omit some events in order to increase its overall lift, the pharmaceutical company requires a very high level of confidence in its data to make a decision. Blockchain technologies are disrupting the economics of trusted data. While Big Data technologies have provided the infrastructure to process data at scale, the addition of blockchain technologies enables those data to be trusted for highly critical use cases.
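
The asymmetry between the two scenarios can be made concrete with a small break-even calculation. The sketch below is illustrative only: the dollar figures are hypothetical values chosen to match the orders of magnitude mentioned above, the function name is invented for this example, and the model captures monetary cost alone, not ethical responsibility.

```python
def break_even_precision(benefit_per_correct: float, cost_per_incorrect: float) -> float:
    """Smallest fraction of correct selections at which expected value is non-negative.

    Expected value per selection is p * benefit - (1 - p) * cost.
    Setting that to zero and solving for p gives cost / (benefit + cost).
    """
    return cost_per_incorrect / (benefit_per_correct + cost_per_incorrect)


# Hypothetical figures, assumed for illustration (not taken from any real study).
pharma = break_even_precision(benefit_per_correct=200_000, cost_per_incorrect=2_000_000)
media = break_even_precision(benefit_per_correct=1.00, cost_per_incorrect=0.005)

print(f"pharma break-even precision: {pharma:.3f}")
print(f"media break-even precision:  {media:.3f}")
```

Under these assumed numbers, the pharmaceutical company needs roughly nine out of ten selections to be correct before adding participants pays off, while the media company breaks even when fewer than one in a hundred is correct.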

Blockchain is increasingly the victim of heightened marketing buzz. Suffice it to say that Blockchain is not Bitcoin, and platforms in the larger Blockchain technology ecosystem, such as Hyperledger Fabric, generally consist of:

  • a network of participating nodes,
  • an immutable distributed ledger for storing data,
  • a consensus algorithm for guaranteeing that data is consistent across nodes,
  • optionally, a facility for smart contracts to apply computations to transactions on data in the network.

| | Big Data Innovation | Blockchain Innovation |
|---|---|---|
| Cluster Computing | Many nodes within a single organization participate in a trusted cluster to divide the effort of execution in parallel across all nodes in the cluster | Many nodes in a trusted network, potentially across multiple organizations, work together to process and store transactions |
| Storage | Enables efficient physical storage of data on inexpensive commodity hardware, providing high-throughput reads and writes for arbitrary unstructured, polystructured, or structured data | Provides a facility to store a sequence of transactions in an immutable ledger that makes it highly improbable that data can be intentionally or unintentionally changed once it is committed to the ledger; the physical ledger storage can be a variety of conventional or Big Data backends |
| Consistency | Stores data redundantly across nodes to support robust consistency in the case of failure of one or more nodes in the cluster | Provides consensus algorithms that verify transactions and commit transactions to the ledger consistently across nodes in the network, even if they are maintained by separate organizations |
| Compute | Enables distributed massively parallel processing (MPP) to divide compute workloads across the cluster | Enables “Smart Contracts,” which allow code to be deployed to the network to perform computation consistently and autonomously on transactions within the network |
| Security | Provides security principals, role-based access controls, and access control list features to control access to data within the cluster; provides features for encryption at rest and wire encryption for data within the cluster | Provides identity management through private keys, and security to encrypt transactions or contracts on the network so that access to data or contracts can be restricted to a subset of users |
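
The “immutable ledger” idea described above can be illustrated with a minimal hash-chained list. This is a sketch under stated assumptions, not how Hyperledger Fabric or any specific platform implements its ledger: the block layout, function names, and sample payloads are hypothetical, and it deliberately omits the network of nodes and the consensus algorithm. It only shows why a committed record is hard to change unnoticed: each block stores the hash of its predecessor, so editing an old record breaks verification.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the hash of the previous block."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append(ledger, payload):
    """Append a new block that is chained to the current last block."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    index = len(ledger)
    ledger.append(
        {
            "index": index,
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
    )


def verify(ledger):
    """Recompute every hash and check each link back to the genesis value."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(ledger):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical sample records, for illustration only.
ledger = []
append(ledger, {"event": "sample collected", "patient": "P-001"})
append(ledger, {"event": "sample analyzed", "patient": "P-001"})
intact = verify(ledger)

# Silently editing a committed record is detected on the next verification.
ledger[0]["payload"]["patient"] = "P-999"
tampered_detected = not verify(ledger)

print(intact, tampered_detected)
```

In a real network, many nodes hold copies of the ledger and a consensus algorithm decides which blocks are committed, so an attacker would also have to rewrite every later block on enough nodes to go undetected.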


Big Data and Blockchain technologies can be used together to support a new ecosystem of viable use cases that place a high value on trusted data, data sharing, and transaction consistency, and that are not feasible with Big Data technologies alone. This enables industry applications with higher criticality, both in historically highly regulated industries, such as financial services, healthcare, and government, and in industries such as marketing and media that face new regulatory challenges like GDPR.

Emerging Blockchain solutions are enabling trusted Big Data use cases that bring the efficiency and granularity of distributed computing at scale to historically highly manual domains such as:

  • Customs and Border Control: The U.S. Department of Homeland Security (DHS) has awarded a grant of $192,380 to blockchain project Factom to support beta testing of a platform aimed at securing data from Border Patrol cameras and sensors. Factom’s project combines blockchain technology with critical infrastructure, such as sensors and cameras, to protect the integrity and authenticity of data collected by these devices.
  • Healthcare: Alphabet’s DeepMind is building a blockchain-inspired tool on Trillian, called Verifiable Data Audit, that will track how patient data is used and why; for example, it would record that blood test data was checked against the NHS national algorithm to detect possible acute kidney injury. This type of platform can be used to extend the FHIR open standard for interoperability in healthcare.
  • Insurance: AIG and Standard Chartered converted a multinational, controlled master policy written in the UK, and three local policies in the U.S., Singapore and Kenya, into a “smart contract” that provides a shared view of policy data and documentation in real time. Blockchain enables visibility into coverage and premium payment at the local and master level as well as automated notifications to network participants following payment events.
  • Food Safety Supply Chain: Tyson Foods announced an initiative that will lay the foundation for a digital transformation of their supplier management system through a partnership with FoodLogiQ, a Durham, NC-based software developer focused on food traceability and mapping the world’s food supply chain. Blockchain can trace the parties involved in the mass production and distribution of food to identify the source of potential contamination during food safety scares.
  • Environmental Crime: The World Wildlife Fund has introduced blockchain technology to the Pacific Islands’ tuna industry to help stamp out illegal fishing and human rights abuses. Consumers may have unknowingly bought tuna from illegal, unreported and unregulated fishing and, even worse, from operators who use slave labour. Blockchain technology means that soon a simple scan of tuna packaging using a smartphone app will reveal where and when the fish was caught, by which vessel and fishing method.
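
The audit trail in the healthcare example above rests on a verifiable log; Trillian, the system named there, is built on Merkle trees. The sketch below shows the general idea only and is not Trillian's API or its exact tree construction: the function names, the rule of duplicating the last node on odd-sized levels, and the sample log entries are all assumptions made for illustration. A single root hash summarizes every entry, so changing any one entry changes the root.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries):
    """Return a hex root hash summarizing a list of string entries.

    Leaves and interior nodes use different prefixes so that a leaf
    cannot be confused with an interior node.
    """
    if not entries:
        return _sha256(b"").hex()
    level = [_sha256(b"\x00" + entry.encode("utf-8")) for entry in entries]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # simplification: duplicate the last node
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Hypothetical audit-log entries, for illustration only.
audit_log = [
    "blood test data checked against kidney injury algorithm",
    "result reviewed by clinician",
    "record accessed for billing",
]
root_before = merkle_root(audit_log)

# Rewriting one historical entry produces a different root.
audit_log[1] = "result reviewed by unknown party"
root_after = merkle_root(audit_log)

print(root_before != root_after)
```

An auditor who has saved an earlier root can therefore detect later rewrites of the log without storing the full log contents.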

Each of these use cases represents new applications and opportunities for Big Data that were not feasible without the auditability and security features enabled by Blockchain. Amidst the hype surrounding Blockchain topics like Bitcoin, Blockchain is building lasting value for new use cases based on Trusted Big Data.

About the author: Tripp Smith is the CTO of Clarity Insights, the largest consultancy in the US focused solely on data analytics. Tripp is a data analytics thought leader whose career has spanned entrepreneurial ventures and Fortune 500 companies centered around big data, technology vision, strategy, and product development. He is focused on technology vision, product development, business development, partner alliances, coaching, and mentoring.