Getting the Dirt on Big Data

Authors
Waterman, K Krasnow
Hendler, James A.
Issue Date
2013-09-10
Terms of Use
Attribution-NonCommercial-NoDerivs 3.0 United States
Full Citation
K. Krasnow Waterman and J. Hendler, Getting the Dirt on Big Data, Big Data, 1(3), 2013.
Abstract
“Dirty data” (data that is incomplete or incorrect) presents a significant challenge in producing trustworthy data analytics. The old technology adage “garbage in, garbage out” applies to this problem: if you analyze erroneous information, you produce erroneous results. For big data analytics, this means misunderstanding the broad brushstrokes of the data: its statistical themes and trends. The primary industry approach to this challenge has been to propose correcting all of the dirty data, in projects that cost millions of dollars and take many years, resulting in an endless game of catch-up. We believe that a better solution is to accept that data is flawed and to use related data to refine the analytic results. The success of this “broad” data approach, which uses Linked Data to augment and visualize big data, can be seen in examples that are relatively quick and easy to produce.
Publisher
Mary Ann Liebert, Inc.