Tetherless World Videos/Presentations



Recent Submissions

  • Item
    Eliciting Survey Knowledge with Semantic Data Dictionaries
    (2024-02-28) Santos, Henrique; Pinheiro, Paulo; McGuinness, Deborah L.
    Many countries perform surveys to gather data from their population to support decision-making and the development of public policies. Questionnaires are possibly the most used type of data acquisition instrument in surveys, although additional kinds may be employed (especially in health-related surveys). In the United States, the NHANES is a national health and nutrition examination survey conducted by the National Center for Health Statistics, designed to collect data on adults' and children's health and nutritional status. Data is organized in several tables, each containing variables related to a specific theme, such as demographics or dietary information. In addition, data dictionaries are available to (sometimes partially) document the tables' contents. While data is mostly provided by survey participants, instruments might be collecting data related to other entities (e.g., from participants' households and families, as well as laboratory results from blood and urine samples provided by participants). All this complex knowledge can often only be elicited by humans when analyzing and understanding the data dictionaries in combination with the data. The representation of this knowledge in a machine-interpretable format could facilitate further use of the data. We detail how Semantic Data Dictionaries (SDDs) have been used to elicit knowledge about surveys, using the publicly available NHANES data and data dictionaries. In SDDs, we formalize the semantics of variables, including entities, attributes, and more, using terminology from relevant ontologies, and demonstrate how they are used in an automated process to generate a rich knowledge graph that enables downstream tasks in support of survey data analysis.
  • Item
    Facilitating Reuse of Mental Health Questionnaires via Knowledge Graphs
    (The Healthcare and Life Sciences Symposium, 2023-05-08) Santos, Henrique; Rook, Kelsey; Pinheiro, Paulo; Gruen, Daniel M.; Chorpita, Bruce F.; McGuinness, Deborah L.
    Questionnaires are one of the most common instrument types for screening patients for mental disorders. They are composed of items whose answers are typically scored to determine the elevation on a specified dimension, and hence the statistical probabilities associated with the corresponding disorder or diagnosis. The Patient Health Questionnaire (PHQ-9) and the Generalized Anxiety Disorder (GAD-7) questionnaire, for instance, measure levels of depression and anxiety respectively, and can be used to support diagnosis of depression and generalized anxiety disorder. Some questionnaires are multidimensional, such as the Revised Children's Anxiety and Depression Scale (RCADS), and can thereby estimate elevations on multiple dimensions that underlie a variety of disorders. Mental health screening questionnaires are designed so that each item assesses specific symptoms whose pattern of co-occurrence (often organized in a subscale) allows estimation of how likely such symptoms would occur in the absence of the disorder whose symptoms the items represent. Questionnaire users typically estimate how likely a set of co-occurring symptoms would be (i.e., a score) in the general population as a strategy to estimate the likelihood that the respondent has a disorder warranting mental health services. The RCADS is a 47-item, youth self-report questionnaire with six subscales (separation anxiety disorder, social phobia, generalized anxiety disorder, panic disorder, obsessive compulsive disorder, and major depressive disorder). It also yields a Total Anxiety Scale (sum of the 5 anxiety subscales) and a Total Internalizing Scale (sum of all 6 subscales). Items are rated on a 4-point Likert scale from 0 ("never") to 3 ("always"). A Parent Version (RCADS-P) similarly assesses parent report of a youth’s symptoms across the same six subscales.
    Brief versions of the RCADS questionnaires are available as well (RCADS-25), yielding only three scores: Total Anxiety, Total Depression, and Total Anxiety and Depression. RCADS questionnaires have been translated into 19 languages. Recently there has been increased interest in supporting widespread adoption of common measures in mental health to support estimation and measurement of clinical dimensions across settings, contexts, and nations. However, the current measurement architecture in mental health is essentially based on a text-only document representation of each questionnaire, with limited knowledge of how they were created, how they relate to other questionnaires, how items relate to symptoms, which in turn relate to disorders, how short and long versions are related, etc. This places significant constraints on the types of use cases that can be supported, with especially limited support for such pursuits as shortening the number of items, translations to new languages, reuse of items in new questionnaires, and even the combination of items from different questionnaires. We present our progress towards tackling these challenges. Our solution is composed of: (1) a model of mental health symptoms, scales, disorders, and their relationships as an ontology; (2) the representation of questionnaire instruments as a knowledge graph, using standardized terminology; and (3) a software infrastructure for operationalizing the management and distribution of semantic questionnaires (Semantic Instrument Repository - SIR). Using the RCADS questionnaires as a use case, we encode their (sub)scales in an ontology, reusing existing terminology from relevant sources. We expand our base Human-Aware Science Ontology (HAScO) to include questionnaire structure, and propose a new ontology for encoding and aligning mental health terminology, such as symptoms, scales, and disorders.
    SIR supports authoring, curation, and dissemination of questionnaires, their elements, and relationships between these elements, thus allowing questionnaires to contain mental health semantics.
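
The RCADS scoring arithmetic described in the abstract above (a Total Anxiety Scale that sums the five anxiety subscales, and a Total Internalizing Scale that sums all six subscales) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' or SIR's implementation: the function name, the dictionary keys, and the example scores are hypothetical, and it starts from already-computed subscale scores because the abstract does not give the item-to-subscale assignment.

```python
# Subscale names follow the abstract; the key spellings are hypothetical.
ANXIETY_SUBSCALES = (
    "separation_anxiety_disorder",
    "social_phobia",
    "generalized_anxiety_disorder",
    "panic_disorder",
    "obsessive_compulsive_disorder",
)
DEPRESSION_SUBSCALE = "major_depressive_disorder"


def rcads_totals(subscale_scores):
    """Return the two summary scales from a dict of six raw subscale scores."""
    expected = set(ANXIETY_SUBSCALES) | {DEPRESSION_SUBSCALE}
    missing = expected - set(subscale_scores)
    if missing:
        raise ValueError(f"missing subscale scores: {sorted(missing)}")

    # Total Anxiety Scale: sum of the 5 anxiety subscales.
    total_anxiety = sum(subscale_scores[name] for name in ANXIETY_SUBSCALES)
    # Total Internalizing Scale: sum of all 6 subscales.
    total_internalizing = total_anxiety + subscale_scores[DEPRESSION_SUBSCALE]
    return {
        "total_anxiety": total_anxiety,
        "total_internalizing": total_internalizing,
    }


# Made-up subscale scores, for illustration only.
example_scores = {
    "separation_anxiety_disorder": 5,
    "social_phobia": 10,
    "generalized_anxiety_disorder": 6,
    "panic_disorder": 3,
    "obsessive_compulsive_disorder": 4,
    "major_depressive_disorder": 8,
}
print(rcads_totals(example_scores))
```

With the made-up scores above, the anxiety subscales sum to 28 and adding the depression subscale gives 36.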
  • Item
    RPIrates: Fun with OpenAI, GPTStudio and R!
    (The Rensselaer IDEA, 2023-02-14) Erickson, John S.
    Seemingly everyone has been talking about the impact of OpenAI's ChatGPT on, well, everything, including writing code. In this very special RPIrates we talk specifically about generating great, and sometimes not-so-great, R code based on OpenAI's "Codex" models, which are easily accessed via the OpenAI API with the help of RStudio extensions (addins) provided by the GPTStudio package. After a quick icebreaker in which we demo GPTStudio in action, we briefly review transformer neural networks; we talk about how code completion tools like Microsoft's IntelliCode Compose apply TNN frameworks to write scarily excellent code; we tour OpenAI's Codex documentation; and then we get back to more hands-on work with GPTStudio. We also talk about why code completion frameworks like Codex are so great at Python and JavaScript but spotty with R, and of course, the ethics; oh, the ethics! Many links for further learning and research are provided.
  • Item
    Intro to Web Science (Oct 2022)
    (2022-10) Erickson, John S.