Our world can increasingly be expressed as interconnected data sets. From the information we consciously put on social networking sites, to the statistics we generate by carrying smartphones and swiping our credit cards, we are walking data generators. Data collection has changed enormously, and data analysis is scrambling to catch up. This new area of science is Big Data, also called XDATA or e-science: wading through vast oceans of information from multiple sources to find meaning.
The interconnectedness of XDATA
This new Big Data frontier has exciting applications across a wide variety of disciplines. According to Raghu Raghavendra, Vice Dean for Global Academic Initiatives for the USC Viterbi School of Engineering, “So much data is collected, there are opportunities to discover new phenomena.” But first we must meet the challenges of storing, accessing and processing these large swaths of content.
Three faculty members from USC Viterbi have received a competitive award from the Defense Advanced Research Projects Agency (DARPA) as part of the Obama Administration’s “Big Data Research and Development Initiative,” a push to advance technologies for the collection, storage, preservation, management, analysis and sharing of huge quantities of data.
The team of three co-PIs from the Electrical Engineering Systems Department, Viktor Prasanna, Yogesh Simmhan and Raghu Raghavendra, will develop software tools for the storage, accessibility and processing of data.
Whether it is climate sensors, social media interactions or economic activity, we can collect terabytes of data per day on any given subject, but gleaning meaning from it remains a serious challenge. Assistant Professor Simmhan explains, “Data is only as useful as the analysis that you can draw from it, and that’s why our analytical and data management techniques are so critical.”
The complexity of the information compounds this challenge. Often the relationships between data streams are as important as the data itself, so effective algorithms that merge and compare these connections need to be in place. Management systems must also address the dynamism of the data in question: with real-time data collection, terabytes of information can be generated daily or even hourly.
Accessing such vast amounts of information quickly and efficiently is a significant challenge. “One of the key stumbling blocks is actually getting the data from the hard disk,” says Simmhan. To solve this issue, the team is developing a system that partitions data across multiple hardware platforms and conducts analytics in parallel.
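The partition-then-analyze idea described above can be illustrated with a minimal sketch. This is not the team's system, whose design the article does not describe. It assumes a toy data set of (key, value) readings, uses threads on a single machine as a stand-in for separate hardware platforms, and the names `partition`, `analyze` and `parallel_totals` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Assign each (key, value) record to a partition based on its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Assumed placement rule: sum of the key's bytes modulo the partition
        # count, so records with the same key always land in the same partition.
        index = sum(key.encode("utf-8")) % num_partitions
        parts[index].append((key, value))
    return parts


def analyze(part):
    """Compute per-key totals for a single partition (the local analysis)."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def parallel_totals(records, num_partitions=4):
    """Partition the records, analyze the partitions concurrently, then merge."""
    parts = partition(records, num_partitions)
    # Threads stand in for independent machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(analyze, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


# Hypothetical sensor readings, invented for illustration only.
readings = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4), ("sensor-c", 1)]
totals = parallel_totals(readings)
```

Because each partition is analyzed independently and only the small per-partition summaries are merged, no single worker has to read the whole data set, which is the bottleneck Simmhan describes.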
This important work puts the USC Viterbi School of Engineering among a select few institutions pushing the boundaries of this new area of science, which has far-reaching and potentially life-changing applications. “We have a chance to be a part of the global leadership in this space,” says Simmhan.
Big Data could also open the door to more citizen science, both in data collection and analysis. Volunteers could potentially turn their cell phones into sensors for scientific studies by sharing the data streams the devices constantly collect. The availability of Big Data on shared networks also means that analysis could be crowdsourced to a wide audience of amateur scientists around the world.
The applications of Big Data analytics are only just beginning to unfold. Practical data such as consumer habits, energy usage and transportation logistics can inform more efficient systems that cut down on waste. And data-driven models of climate change, living systems and human behavior could not only better inform science and policy, but perhaps also approach the holy grail of Big Data: predicting events with certainty. “And this,” as Raghavendra says, “is just the beginning.”
*The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Approved for Public Release, Distribution Unlimited