Eliciting Survey Knowledge with Semantic Data Dictionaries

Authors
Santos, Henrique
Pinheiro, Paulo
McGuinness, Deborah L.
Issue Date
2024-02-28
Terms of Use
Attribution-ShareAlike 3.0 United States
Full Citation
Santos, Henrique; Pinheiro, Paulo; McGuinness, Deborah L. Eliciting Survey Knowledge with Semantic Data Dictionaries. In 2024 Mobilizing Computable Biomedical Knowledge (MCBK) North America Chapter Meeting. Virtual. (2024)
Abstract
Many countries conduct surveys to gather data from their populations in support of decision-making and the development of public policies. Questionnaires are possibly the most widely used data acquisition instrument in surveys, although other kinds may be employed, especially in health-related surveys. In the United States, the National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics, is designed to collect data on the health and nutritional status of adults and children. The data is organized in several tables, each containing variables related to a specific theme, such as demographics or dietary information. In addition, data dictionaries are available that document the tables' contents, sometimes only partially. While most data is provided by survey participants, instruments may also collect data about other entities (e.g., participants' households and families, or laboratory results from blood and urine samples provided by participants). Often, this complex knowledge can be elicited only by humans who analyze the data dictionaries in combination with the data. Representing this knowledge in a machine-interpretable format could facilitate further use of the data. We detail how Semantic Data Dictionaries (SDDs) have been used to elicit knowledge about surveys, using the publicly available NHANES data and data dictionaries. In SDDs, we formalize the semantics of variables, including entities, attributes, and more, using terminology from relevant ontologies, and we demonstrate how SDDs are used in an automated process to generate a rich knowledge graph that enables downstream tasks in support of survey data analysis.
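The idea of turning an SDD plus a data row into knowledge graph statements can be illustrated with a minimal sketch. This is not the authors' pipeline. The column names below follow a simplified SDD-style layout, every `ex:` term is a hypothetical placeholder rather than a term from a real ontology, and the data record is made up for illustration. Only the variable names `SEQN` and `RIDAGEYR` are taken from the public NHANES demographics table.

```python
# Simplified SDD-style rows (an assumption for illustration, not a real SDD).
# "??participant" is an implicit entity: it has no column of its own in the data.
SDD_ROWS = [
    {"Column": "??participant", "Entity": "ex:Human"},
    {"Column": "RIDAGEYR", "Attribute": "ex:Age",
     "attributeOf": "??participant", "Unit": "ex:Year"},
]


def row_to_triples(sdd_rows, record, row_id):
    """Turn one data record into (subject, predicate, object) triples."""
    triples = []
    nodes = {}

    # First pass: create one node per entity described in the SDD.
    for row in sdd_rows:
        if row.get("Entity"):
            node = f"ex:{row['Column'].lstrip('?')}-{row_id}"
            nodes[row["Column"]] = node
            triples.append((node, "rdf:type", row["Entity"]))

    # Second pass: attach each observed attribute value to its entity.
    for row in sdd_rows:
        attribute = row.get("Attribute")
        column = row["Column"]
        if not attribute or column not in record:
            continue
        value_node = f"ex:{column}-{row_id}"
        triples.append((value_node, "rdf:type", attribute))
        triples.append((value_node, "ex:isAttributeOf", nodes[row["attributeOf"]]))
        triples.append((value_node, "ex:hasValue", record[column]))
        if row.get("Unit"):
            triples.append((value_node, "ex:hasUnit", row["Unit"]))

    return triples


# Made-up record; the values are not real NHANES data.
record = {"SEQN": "1001", "RIDAGEYR": "42"}
triples = row_to_triples(SDD_ROWS, record, record["SEQN"])
for triple in triples:
    print(triple)
```

In this sketch the age value is attached to an explicit participant node instead of being left as a bare column value, which is the kind of entity and attribute distinction the abstract describes. A real implementation would use terms from actual ontologies and an RDF library rather than plain tuples.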