A Linked Data Representation for Summary Statistics and Grouping Criteria
Authors
McCusker, Jamie
Dumontier, Michel
Chari, Shruthi
McGuinness, Deborah L.
Issue Date
2019
Type
Article
Abstract
Summary statistics are fundamental to data science and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, we have not had a suitable linked data representation available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) class-based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium’s provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.
Description
Full Citation
Jim McCusker, Michel Dumontier, Shruthi Chari, Joanne Luciano and Deborah L. McGuinness. A Linked Data Representation for Summary Statistics and Grouping Criteria. Semantic Statistics (SemStats). Co-located with the International Semantic Web Conference, Auckland, NZ, October, 2019.
Publisher
CEUR-WS