Semantic modeling of cohort descriptions

Chari, Shruthi
Electronic thesis
Computer science
To address the challenge of supporting study applicability assessment, this thesis makes three contributions. First, we build a Study Cohort Ontology (SCO) that encodes the vocabulary used to describe study populations in Table 1s. The ontology's coverage is sufficient to represent every Table 1 of the research studies in our evaluation dataset. Second, we construct RDF knowledge graphs modeled on SCO to expose these population descriptions declaratively. These knowledge graphs support explorations that would be difficult to undertake without a declarative specification. Third, we create a cohort similarity visualization strategy, powered by the functional capabilities of our knowledge graph representation, that conveys cohort similarity details to a physician at a glance. We briefly describe each contribution in the rest of this abstract.
Finally, we evaluate the semantic and functional capabilities of SCO and the Table 1 knowledge graphs in supporting applications that help a physician assess study applicability. We present three scenarios that a physician might use to determine study applicability. The first scenario, Study Match, determines whether a study population is similar to a given patient. The second scenario, Study Limitation, evaluates the diversity of study populations to expose, for example, underrepresentation and study bias. The third scenario, Study Quality Evaluation, analyzes Table 1s to check for conformance to required best practices, such as adequate population size and equal distribution of populations among study arms. Additionally, we support cohort similarity visualizations that show, at a glance, how similar a study population is to a clinical population. Our applications showcase the ability of our semantic approach to provide a deeper understanding of guideline evidence by making the reported characteristics of study populations visible, and they demonstrate that inferences can be drawn on these characteristics.
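As an illustration only, the Study Match and Study Quality Evaluation scenarios can be sketched in a few lines of Python. This is not the thesis implementation, which operates on SCO-based knowledge graphs. The class `ArmSummary`, the functions `patient_matches` and `arms_balanced`, the "within k standard deviations" rule, the 10% balance tolerance, and all numbers are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ArmSummary:
    """Hypothetical summary of one study arm as reported in a Table 1."""
    name: str
    size: int          # number of participants in the arm
    mean_age: float    # reported mean age, in years
    sd_age: float      # reported standard deviation of age, in years


def patient_matches(arm: ArmSummary, patient_age: float, k: float = 2.0) -> bool:
    """Study Match (assumed rule): the patient's age lies within
    k standard deviations of the arm's reported mean age."""
    return abs(patient_age - arm.mean_age) <= k * arm.sd_age


def arms_balanced(arms: list, tolerance: float = 0.1) -> bool:
    """Study Quality Evaluation (assumed rule): the largest and smallest
    arm sizes differ by at most `tolerance` times the largest arm size."""
    sizes = [arm.size for arm in arms]
    return (max(sizes) - min(sizes)) <= tolerance * max(sizes)


# Made-up example values, not taken from any real study.
arms = [
    ArmSummary("intervention", size=250, mean_age=56.0, sd_age=9.5),
    ArmSummary("control", size=240, mean_age=57.0, sd_age=10.0),
]

if __name__ == "__main__":
    for arm in arms:
        print(arm.name, "matches 70-year-old patient:", patient_matches(arm, 70.0))
    print("arms balanced:", arms_balanced(arms))
```

A real applicability check would compare many cohort variables at once and would need a clinically justified similarity rule; the single-variable threshold here only shows the shape of the computation.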
SCO is the first building block of our knowledge representation approach. It is an extensible, open-source ontology that encodes the vocabulary and models the structure of Table 1s. We adopted a bottom-up approach to modeling and revised the structure of SCO after investigating a number of the research studies on which the pharmacological treatment recommendations in the American Diabetes Association's Standards of Medical Care 2018 Guidelines are based. Leveraging the structure of SCO, we model the study arms of each Table 1 (a study arm is a group of study participants who receive an intervention or are put on a control regime), together with their associated characteristics and descriptive statistics, in RDF knowledge graphs. Table 1s vary in how they describe cohort variables and interventions, in which descriptive statistical measures they use to report aggregations, and in how they represent units. Representing aggregations in OWL and RDF is a long-studied research problem, and there are multiple approaches to modeling them. Our modeling of descriptive statistical measures on cohort variables (mean and standard deviation, median and interquartile range, etc.) in RDF knowledge graphs makes it possible to run queries and draw inferences across measure and unit representations.
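To make the idea of querying across measure representations concrete, here is a minimal sketch in plain Python that stores Table 1 facts as subject-predicate-object triples. It does not use SCO's actual vocabulary or an RDF library: every `ex:` identifier, the helper functions, and the numeric values are hypothetical and chosen only for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All "ex:" names are hypothetical placeholders, not SCO terms,
# and all values are made up.
TRIPLES = {
    ("ex:study1", "ex:hasStudyArm", "ex:arm_intervention"),
    ("ex:study1", "ex:hasStudyArm", "ex:arm_control"),
    # Intervention arm reports age as mean and standard deviation.
    ("ex:arm_intervention", "ex:hasCharacteristic", "ex:age_intervention"),
    ("ex:age_intervention", "ex:variable", "ex:Age"),
    ("ex:age_intervention", "ex:mean", 56.0),
    ("ex:age_intervention", "ex:standardDeviation", 9.5),
    ("ex:age_intervention", "ex:unit", "ex:Year"),
    # Control arm reports age as median and interquartile range.
    ("ex:arm_control", "ex:hasCharacteristic", "ex:age_control"),
    ("ex:age_control", "ex:variable", "ex:Age"),
    ("ex:age_control", "ex:median", 57.0),
    ("ex:age_control", "ex:lowerQuartile", 50.0),
    ("ex:age_control", "ex:upperQuartile", 64.0),
    ("ex:age_control", "ex:unit", "ex:Year"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def central_tendency(characteristic):
    """Return (measure, value) for a characteristic, whether it was
    reported as a mean or as a median; None if neither is present."""
    for measure in ("ex:mean", "ex:median"):
        values = objects(characteristic, measure)
        if values:
            return measure, values[0]
    return None


def summarize(study, variable):
    """Map each arm of a study to the central tendency of one variable."""
    summary = {}
    for arm in objects(study, "ex:hasStudyArm"):
        for characteristic in objects(arm, "ex:hasCharacteristic"):
            if variable in objects(characteristic, "ex:variable"):
                summary[arm] = central_tendency(characteristic)
    return summary


if __name__ == "__main__":
    for arm, measure in sorted(summarize("ex:study1", "ex:Age").items()):
        print(arm, measure)
```

Because each statistic is attached to the characteristic as its own property, one query can retrieve a central value for every arm even when the arms report different measures. In an actual RDF setting the same pattern would typically be expressed as a SPARQL query over the knowledge graph.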
Treatment recommendations within Clinical Practice Guidelines are justified by findings from research studies that are often based on highly selective populations. As a result, when physicians treat complicated patients who do not wholly align with guideline recommendations, they face challenges in determining whether the study evidence, and in particular the study population descriptions (often captured in the first table of the published work and thus commonly referred to as Table 1), aligns with their clinical population.
May 2019
School of Science
Rensselaer Polytechnic Institute, Troy, NY