Author: Rashid, Sabbir; Chastain, Katherine; Stingone, Jeanette; McGuinness, Deborah L.; McCusker, Jamie
Full Citation: Rashid, S. M., Chastain, K., Stingone, J. A., McGuinness, D. L., & McCusker, J. P. (2017). The Semantic Data Dictionary Approach to Data Annotation & Integration. SemSci@ISWC, 2017.
Abstract: A standard approach to describing datasets is through the use of data dictionaries: tables that contain information about the content, description, and format of each data variable. While this approach is helpful for human readability, it is difficult for a machine to understand the meaning behind the data. Consequently, tasks involving the combination of data from multiple sources, such as data integration or schema merging, are not easily automated. In response, we present the Semantic Data Dictionary (SDD) specification, which allows for extension and integration of data from multiple domains using a common metadata standard. We have developed a structure based on the Semanticscience Integrated Ontology’s (SIO) high-level, domain-agnostic conceptualization of scientific data, which is then annotated with more specific terminology from domain-relevant ontologies. The SDD format will make the specification, curation, and search of data much easier than direct search of data dictionaries, not only through terminology alignment but also through the use of “compositional” classes for column descriptions, rather than requiring a 1:1 mapping from column to class.
Publisher: CEUR Workshop Proceedings