Getting the Dirt on Big Data
Authors
Waterman, K Krasnow
Hendler, James A.
Issue Date
2013-09-10
Type
Article
Abstract
“Dirty data,” data that is incomplete or incorrect, presents a significant challenge in producing trustworthy data analytics. The old technology adage “garbage in, garbage out” applies: if you analyze erroneous information, you produce erroneous results. For big data analytics, this means misunderstanding the broad brushstrokes of the data: its statistical themes and trends. The primary industry approach to this challenge has been to propose correcting all of the dirty data, in projects that cost millions of dollars and take many years, resulting in an endless game of catch-up. We believe that a better solution is to accept that data is flawed and use related data to refine the analytic results. The success of this “broad” data approach, which uses Linked Data to augment and visualize big data, can be seen in examples that are relatively quick and easy to produce.
Description
Full Citation
K. Krasnow Waterman and J. Hendler, Getting the Dirt on Big Data, Big Data, 1(3), 2013.
Publisher
Mary Ann Liebert, Inc.