In this week’s recap of some of the noteworthy items from recently-published scholarly research on data-intensive computing topics, we take a brief look at a new file system designed to address secure data; the past, present and future of sentiment analysis and opinion mining; some novel approaches to business intelligence and more.
Let’s get started with one of the most interesting research pieces that came across our desk:
The Jigsaw Distributed File System
Researchers from the University of Arkansas’ division of Medical Sciences and the computer science department at Embry-Riddle Aeronautical University have introduced the Jigsaw Distributed File System (JigDFS), which is aimed at securely storing and fetching files on large-scale networks.
Given the sensitive medical data that necessitates this kind of research, the team describes how files in JigDFS are sliced into small segments using an Information Dispersal Algorithm (IDA) and then distributed recursively onto different nodes.
They argue that JigDFS provides fault-tolerance against node failures while assuring confidentiality, integrity, and availability of the stored data. Layered encryption is applied to each file segment with keys produced by a hashed-key chain algorithm. Recursive IDA and layered encryption enhance users’ anonymity and provide a degree of plausible deniability.
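To make the two ideas in that description a little more concrete, here is a minimal Python sketch. It is not the JigDFS implementation: the XOR-parity split below is a deliberately simple stand-in for a real IDA (it survives the loss of only one segment), the key chain is just repeated SHA-256 hashing, and the names `disperse`, `recover`, and `key_chain` are hypothetical. Encryption of the segments themselves is left out.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int):
    """Split data into k equal-size segments plus one XOR parity segment.

    Toy stand-in for an IDA: any single lost segment can be rebuilt.
    Returns the k + 1 segments and the original length (to strip padding).
    """
    seg_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = reduce(xor_bytes, segments)
    return segments + [parity], len(data)


def recover(segments, orig_len: int) -> bytes:
    """Rebuild the data when at most one segment is missing (None)."""
    missing = [i for i, s in enumerate(segments) if s is None]
    if len(missing) > 1:
        raise ValueError("this toy scheme tolerates only one lost segment")
    segments = list(segments)
    if missing:
        present = [s for s in segments if s is not None]
        # XOR of all surviving segments equals the missing one.
        segments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(segments[:-1])[:orig_len]


def key_chain(seed: bytes, n: int):
    """Derive n keys where each key is the SHA-256 hash of the previous one."""
    keys = []
    key = hashlib.sha256(seed).digest()
    for _ in range(n):
        keys.append(key)
        key = hashlib.sha256(key).digest()
    return keys


if __name__ == "__main__":
    segs, length = disperse(b"patient record 42", 4)
    segs[2] = None  # simulate a failed node
    print(recover(segs, length))
    print([k.hex()[:8] for k in key_chain(b"seed", 3)])
```

In a scheme like the one the researchers describe, each segment would presumably be encrypted with a key from such a chain before being handed to the next node; how JigDFS actually combines the two steps is not detailed here.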
The team says that JigDFS is envisioned to be an ideal long-term storage solution for developing secure data archiving systems.
New Approaches to Sentiment Mining
A team centered at NUS in Singapore argues in the journal Intelligent Systems that as the Web plays an increasingly significant role in people's social lives, it contains more and more information about their opinions and sentiments.
The distillation of knowledge from this huge amount of unstructured information, known as opinion mining and sentiment analysis, is a task that has recently attracted growing interest for purposes such as customer service, financial market prediction, public security monitoring, election investigation, and health-related quality-of-life measurement.
To highlight the way forward for this branch of big data research, the team illustrates past, present, and future trends of sentiment analysis by delving into the evolution of different tools and techniques, from heuristics to discourse structure, from coarse- to fine-grained analysis, from keyword- to concept-level opinion mining.
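For readers unfamiliar with the keyword-level end of that spectrum, the following Python sketch shows roughly what the simplest form of such analysis looks like. The word lists and the `keyword_sentiment` function are hypothetical and purely illustrative; they are not taken from the paper.

```python
import re

# Tiny, made-up lexicons for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def keyword_sentiment(text: str) -> int:
    """Score text as (# positive keywords) minus (# negative keywords)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


if __name__ == "__main__":
    print(keyword_sentiment("Great screen, but terrible battery and slow support"))
    print(keyword_sentiment("The service was not good"))
```

The second call scores "not good" as positive, since a bare keyword count ignores negation. Limitations of this kind are one reason the field has been moving toward finer-grained, concept-level techniques.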
An Integrated Approach to BI
European researchers have contributed to a volume on modern business intelligence, describing their work toward an integrated framework for business intelligence operations.
They argue that IT support in the manufacturing sector has reached a watershed, with digital components beginning to permeate all products and processes. The classical divide between “technical” IT and “business” IT is increasingly blurring. Data from design, manufacturing, product use, service, and support is made available across the complete product lifecycle and supply chain.
In their opinion, this goes hand in hand with the diffusion of sensor and identification technology and the availability of relevant information streams on the customer side—leading to unprecedented amounts of data.
According to the researchers, in this case the challenge is to purposefully apply emerging BI concepts for a comprehensive decision support that integrates product and shop floor design phases, the steering and design of operational industrial processes, as well as big and unstructured data sources. They bring those pieces together in order to derive an integrated framework for management and decision support in the manufacturing sector.
APIs for Social Science Analysis: Some Ethical Notes
A team from the University of Ghent elaborates on the analysis of big data harvested via social media Application Programming Interfaces (APIs), in social science disciplines in general and in communication sciences in particular.
The team starts with a brief description of API-harvested big data. Next, three issues related to the use of APIs and big data in social sciences are identified and addressed. More concretely, the paper discusses the ethical and practical-methodological challenges related to the use of APIs and big data in social sciences, as well as the programming and computer skills needed to understand the big data ecosystem in all its complexity, from technical and practical issues to its political aspects.
In this way, the team’s discussion contributes to the development of a more comprehensive and reliable foundation for the use of big data or API-harvested data in social sciences and communication sciences.