The CHEAR Data Repository: Facilitating children’s environmental health and exposome research through data harmonization, pooling and accessibility

Stingone, Jeanette
Pinheiro, Paulo
Meola, Jay
McCusker, Jamie
Bengoa, Sofia
Kovatch, Patricia
McGuinness, Deborah L.
Teitelbaum, Susan
CHEAR (Child Health Exposure Analysis Repository)
Funded by the U.S. National Institute of Environmental Health Sciences, the Children’s Health Exposure Analysis Resource (CHEAR) provides scientific investigators with access to laboratory and statistical analyses aimed at incorporating and expanding environmental exposures within their research. To benefit the broader research community, the CHEAR Data Center has created a public data repository that houses deidentified data from studies accepted into the CHEAR program. To date, 26 studies have submitted data covering more than 41,000 specimens, more than 3,000 mothers and children, and 139 environmental chemicals. The goal of this repository is to promote the secondary analysis of pooled CHEAR studies by providing data in a manner that is findable, accessible, interoperable and reusable (FAIR). The repository has been constructed by coupling the open-source Human-aware Data Acquisition Framework with semantic annotation templates that transform CHEAR datasets into machine-readable knowledge graphs. These tools facilitate the ingestion, semantic mapping, harmonization and accessibility of data (epidemiologic, clinical and biomarker) and metadata across the multiple studies within the CHEAR program. We demonstrate how users of the public repository can simultaneously search, view, and download data from multiple CHEAR studies. The repository can be searched by factors that include health outcomes, biological markers of exposure and common covariates. Because data have been harmonized to a common vocabulary (the CHEAR ontology), downloaded datasets automatically contain CHEAR-wide harmonized codes and labels for variables that are present in multiple studies. By selecting common data elements, users can create customized datasets with accompanying codebooks in a format that is easily imported into statistical analysis software. For maximal FAIR impact, we have promoted the CHEAR data, tools and methods through Google Dataset Search, GitHub, and BioPortal.
The repository will encourage secondary analysis of pooled CHEAR studies, facilitating investigations that leverage larger sample sizes and greater exposure variability.
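
The idea of pooling studies through a shared vocabulary can be illustrated with a minimal Python sketch. It is not the repository's implementation: the study identifiers, column names, harmonized codes and the helper functions `harmonize`, `common_elements`, `pool` and `codebook` are all hypothetical, and the codes are not actual CHEAR ontology terms. The sketch only shows how study-specific variables might be renamed to common codes, how the data elements shared by every study might be found, and how a pooled dataset and a simple codebook might be built from them.

```python
# Hypothetical mapping from each study's own column names to shared codes.
# These codes are invented for illustration; they are NOT CHEAR ontology terms.
HARMONIZED_CODES = {
    "study_a": {"mat_age": "maternal_age", "bw_g": "birth_weight_g",
                "pb_ugdl": "blood_lead_ug_dl"},
    "study_b": {"AgeMother": "maternal_age", "BirthWt": "birth_weight_g",
                "parity": "parity"},
}


def harmonize(study_id, rows):
    """Rename a study's columns to the shared codes, dropping unmapped ones."""
    mapping = HARMONIZED_CODES[study_id]
    return [{mapping[k]: v for k, v in row.items() if k in mapping}
            for row in rows]


def common_elements(study_ids):
    """Return the harmonized codes that are present in every listed study."""
    code_sets = [set(HARMONIZED_CODES[s].values()) for s in study_ids]
    return sorted(set.intersection(*code_sets))


def pool(raw, variables):
    """Stack harmonized rows from all studies, keeping the chosen variables."""
    pooled = []
    for study_id, rows in raw.items():
        for row in harmonize(study_id, rows):
            record = {"study_id": study_id}
            record.update({v: row.get(v) for v in variables})
            pooled.append(record)
    return pooled


def codebook(study_ids, variables):
    """List, for each pooled variable, the source column used in each study."""
    entries = []
    for v in variables:
        sources = {s: src for s in study_ids
                   for src, code in HARMONIZED_CODES[s].items() if code == v}
        entries.append({"variable": v, "source_columns": sources})
    return entries


# Made-up example records, one per study.
raw = {
    "study_a": [{"mat_age": 29, "bw_g": 3300, "pb_ugdl": 1.2}],
    "study_b": [{"AgeMother": 34, "BirthWt": 2950, "parity": 1}],
}
shared = common_elements(list(raw))
pooled = pool(raw, shared)
book = codebook(list(raw), shared)
```

In this sketch, only the variables mapped in both studies (maternal age and birth weight) survive into the pooled dataset, while the study-specific ones (blood lead, parity) are left out unless a user asks for them explicitly.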