Annotating Diverse Scientific Data with HAScO

McGuinness, Deborah L.
The Human-Aware Data Acquisition Infrastructure (HADatAc)
Ontologies are widely used across many scientific fields, most notably in roles related to acquiring, preparing, integrating, and managing data resources. Data acquisition and preparation activities are often difficult to reuse because they tend to be domain dependent and also depend on how data is acquired: through measurement, subject-elicitation, and/or model-generation activities. Therefore, tools developed for preparing data from one scientific activity often cannot be easily adapted to prepare data from other scientific activities. We introduce the Human-Aware Science Ontology (HAScO), which integrates a collection of well-established science-related ontologies and aims to address issues related to data annotation for large data ecosystems, where data can come from diverse sources including sensors, lab results, and questionnaires. The work reported in the paper is based on our experience developing HAScO and using it to annotate data collections to facilitate data exploration and analysis for numerous scientific projects, three of which are described. Data files produced by scientific studies are processed to identify the objects they contain (a gene, for instance) and annotate them with the appropriate ontological terms. One benefit we realized from preserving scientific data provenance is that software platforms can support scientists in exploring and preparing data for analysis, since the meaning of the data and the interrelationships among data elements are explicit.
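To make the annotation step in the abstract more concrete, the following is a minimal sketch, not the HAScO or HADatAc implementation. It assumes a data file arrives as CSV and that a mapping from column names to ontology terms is available. The function name `annotate`, the column names, and the `http://example.org/onto#` IRIs are all hypothetical placeholders; they are not actual HAScO terms.

```python
import csv
import io

# Hypothetical mapping from column names to ontology term IRIs.
# The example.org IRIs are placeholders, not real HAScO terms.
COLUMN_TERMS = {
    "gene_id": "http://example.org/onto#Gene",
    "body_temp_c": "http://example.org/onto#BodyTemperature",
}


def annotate(csv_text, column_terms):
    """Return one annotation record per cell whose column has a known term."""
    annotations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            term = column_terms.get(column)
            if term is None:
                continue  # no term known for this column; leave it unannotated
            annotations.append(
                {"row": row_number, "column": column, "value": value, "term": term}
            )
    return annotations


# Tiny in-memory sample standing in for a study's data file.
SAMPLE = "gene_id,body_temp_c,notes\nBRCA1,36.8,baseline\n"
records = annotate(SAMPLE, COLUMN_TERMS)
for record in records:
    print(record)
```

Keeping the term next to each value, rather than only in a separate codebook, is one way a platform could make the meaning of the data explicit to downstream exploration tools; a real system would presumably also record provenance such as the source file and acquisition method.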