dc.contributor.author: McCusker, Jamie
dc.contributor.author: Dumontier, Michel
dc.contributor.author: Chari, Shruthi
dc.contributor.author: McGuinness, Deborah L.
dc.date.accessioned: 2023-01-28T18:40:34Z
dc.date.available: 2023-01-28T18:40:34Z
dc.date.issued: 2019
dc.identifier.citation: Jim McCusker, Michel Dumontier, Shruthi Chari, Joanne Luciano and Deborah L. McGuinness. A Linked Data Representation for Summary Statistics and Grouping Criteria. Semantic Statistics (SemStats). Co-located with the International Semantic Web Conference, Auckland, NZ, October, 2019. [en_US]
dc.identifier.uri: https://ceur-ws.org/Vol-2549/article-04.pdf
dc.identifier.uri: https://hdl.handle.net/20.500.13015/6456
dc.description.abstract: Summary statistics are fundamental to data science, and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, we have not had a suitable linked data representation available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) Class based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, that would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO), and the World Wide Web Consortium’s provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute. [en_US]
dc.publisher: CEUR-WS [en_US]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/ [*]
dc.title: A Linked Data Representation for Summary Statistics and Grouping Criteria [en_US]
dc.type: Article [en_US]


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States