Automating Population Health Studies through Semantics and Statistics

New, Alexander
Qi, Miao
Chari, Shruthi
Rashid, Sabbir
Seneviratne, Oshani
McCusker, Jamie
Erickson, John S.
McGuinness, Deborah L.
Bennett, Kristin P.
Terms of Use
Attribution-NonCommercial-NoDerivs 3.0 United States
Full Citation
Alexander New, Miao Qi, Shruthi Chari, Sabbir M. Rashid, Oshani Seneviratne, Jim McCusker, John S. Erickson, Deborah L. McGuinness and Kristin P. Bennett. Automating Population Health Studies through Semantics and Statistics. Semantic Statistics (SemStats) workshop, co-located with the International Semantic Web Conference, Auckland, NZ, October 2019.
With the rapid development of the Semantic Web, machines can understand the contextual meaning of data, and this capability extends to automated, semantics-driven statistical reasoning. This paper introduces a semantics-driven, automated approach for solving population health problems with descriptive statistical models. By fusing semantic and machine learning techniques, our semantically-targeted analytics framework automatically discovers informative subpopulations that have subpopulation-specific risk factors significantly associated with health conditions such as hypertension and type II diabetes. Built on our health analysis ontology and knowledge graphs, the automated semantically-targeted analysis architecture allows analysts to rapidly and dynamically conduct studies for different health outcomes, risk factors, cohorts, and analysis methods. It also lets the full analysis pipeline be specified modularly, in a reusable, domain-specific way, through knowledge graph cartridges: application-specific fragments of the underlying knowledge graph. We evaluate the semantically-targeted analysis framework for risk analysis using the National Health and Nutrition Examination Survey (NHANES) and conclude that it can readily be extended to many other learning and statistical tasks, and to datasets from other domains, in the future.
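The abstract describes studies that are specified declaratively (outcome, risk factors, cohort, method) and then run per subpopulation. The Python sketch below illustrates only that general idea, under stated assumptions: a hypothetical `StudySpec` dataclass stands in for a knowledge graph cartridge (the framework's actual cartridges are RDF fragments of its knowledge graph, which are not reproduced here), subpopulations are supplied by a grouping variable rather than discovered automatically, and a Pearson chi-square test on a 2x2 table is swapped in for the paper's statistical models. The names `StudySpec`, `chi2_2x2`, and `run_study` and the toy records are illustrative, not NHANES data or the authors' code.

```python
import math
from dataclasses import dataclass


# HYPOTHETICAL stand-in for a "knowledge graph cartridge": a plain dataclass
# that declares what to study. The real cartridges are RDF graph fragments.
@dataclass(frozen=True)
class StudySpec:
    outcome: str          # binary health outcome, e.g. "hypertension"
    risk_factors: tuple   # binary candidate risk factors
    subgroup_by: str      # variable whose values define the subpopulations
    alpha: float = 0.05   # significance threshold


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    With 1 degree of freedom, the p-value is erfc(sqrt(chi2 / 2)).
    Returns (0.0, 1.0) when a margin is empty and the test is undefined.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def run_study(spec, records):
    """Return {subgroup: [(risk_factor, chi2, p), ...]} for significant factors only."""
    groups = {}
    for r in records:
        groups.setdefault(r[spec.subgroup_by], []).append(r)

    findings = {}
    for g, rows in sorted(groups.items()):
        hits = []
        for rf in spec.risk_factors:
            # 2x2 cells: exposed+outcome, exposed+no outcome,
            # unexposed+outcome, unexposed+no outcome
            a = sum(1 for r in rows if r[rf] and r[spec.outcome])
            b = sum(1 for r in rows if r[rf] and not r[spec.outcome])
            c = sum(1 for r in rows if not r[rf] and r[spec.outcome])
            d = sum(1 for r in rows if not r[rf] and not r[spec.outcome])
            chi2, p = chi2_2x2(a, b, c, d)
            if p < spec.alpha:
                hits.append((rf, round(chi2, 3), p))
        findings[g] = hits
    return findings


# Toy, made-up records: in group "A" smoking tracks the outcome, in "B" it does not.
def block(group, smoker, outcome, n):
    return [{"group": group, "smoker": smoker, "hypertension": outcome} for _ in range(n)]


records = (
    block("A", True, True, 20) + block("A", True, False, 5)
    + block("A", False, True, 5) + block("A", False, False, 20)
    + block("B", True, True, 10) + block("B", True, False, 10)
    + block("B", False, True, 10) + block("B", False, False, 10)
)

spec = StudySpec(outcome="hypertension", risk_factors=("smoker",), subgroup_by="group")
result = run_study(spec, records)
print(result)  # group "A" flags "smoker" (chi2 = 18.0); group "B" flags nothing
```

Changing the study then means changing only the `StudySpec` value, not the pipeline code, which loosely mirrors the reusable, modular specification the abstract attributes to cartridges. Note that the chi-square approximation is unreliable when expected cell counts are small.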