Enhancing treatment effect estimation using machine learning and large language models

Authors
Neehal, Nafis
Issue Date
2025-05
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer science
Abstract
In modern healthcare, researchers and clinicians face numerous challenges when determining how well treatments work. Randomized Clinical Trials (RCTs) and observational studies are the most popular methods for evaluating treatment effectiveness, but each has limitations. Observational studies are easier to conduct than RCTs but contain inherent biases that affect their reliability in measuring treatment effects. RCTs, while considered the scientific gold standard, are hindered by high costs, long timelines, and difficulties recruiting enough participants. These problems are compounded by two key issues: the growing complexity of selecting the right baseline features needed for valid results, and the frequent lack of diversity among trial participants, which limits how widely the findings apply to different populations. Recent advances in Large Language Models show potential for automating aspects of clinical trial design, especially baseline feature selection, though careful evaluation is needed to guard against fabricated (hallucinated) outputs. Beyond these research challenges, healthcare systems struggle to identify eligible patients for specialized programs, particularly in complex care management (CCM). The current system relies heavily on physician referrals, creating bottlenecks that prevent many qualified patients from receiving helpful services. This thesis presents novel AI- and machine-learning-based methods that improve treatment effect measurement in both observational studies and clinical trials, while also improving access to essential healthcare programs for eligible patients. We begin by tackling the challenge of bias in observational studies, where confounding variables typically distort treatment effect estimates (Chapter 2). We develop a novel hybrid matching algorithm that combines multiple matching techniques to address this issue.
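The hybrid matching idea can be illustrated with a minimal sketch. This abstract does not detail which matchers Chapter 2 combines, so this example assumes one common combination: exact matching on categorical confounders, followed by greedy 1:1 nearest-neighbor matching on an estimated propensity score within each exact-match stratum. All function and variable names here are hypothetical, not the thesis's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def hybrid_match(X_cat, X_num, treated):
    """Hypothetical hybrid matching: exact match on categorical confounders,
    then greedy 1:1 nearest-neighbor matching on the propensity score
    within each exact-match stratum, without replacement."""
    X = np.column_stack([X_cat, X_num])
    # Propensity model: P(treated = 1 | covariates).
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    pairs = []
    for stratum in {tuple(row) for row in X_cat}:
        in_stratum = np.all(X_cat == np.asarray(stratum), axis=1)
        treated_idx = np.where(in_stratum & (treated == 1))[0]
        control_idx = list(np.where(in_stratum & (treated == 0))[0])
        for t in treated_idx:
            if not control_idx:
                break  # controls in this stratum are exhausted
            # Nearest available control by propensity-score distance.
            j = min(range(len(control_idx)),
                    key=lambda k: abs(ps[control_idx[k]] - ps[t]))
            pairs.append((t, control_idx.pop(j)))
    return pairs
```

Matching within exact strata guarantees treated and control units agree on the categorical confounders, while the propensity score balances the remaining covariates.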
Using a type-2 diabetes (T2D) health management program as our test case, we apply causal inference methods to observational data from a regional health insurance provider. Through hybrid matching and survival analysis, we evaluate both T2D onset timing and acute care utilization (emergency and inpatient visits). Our results reveal the program's dual impact: it significantly accelerates early T2D detection and substantially reduces participants' need for acute care services, though it shows no significant effect on T2D onset after the initial two-month period. While our hybrid matching approach proved effective for observational studies, we encountered significant challenges in identifying optimal baseline features for matching treated and control groups to minimize bias. This challenge becomes even more critical in clinical trials, where the stakes are higher and mistakes in feature selection can bias treatment effect estimation and contaminate trial outcomes. In Chapter 3, we address this by exploring the potential of Large Language Models (LLMs) to assist in baseline feature selection for clinical trial design. We create two unique datasets covering nearly 1,700 clinical trials to benchmark how state-of-the-art models like GPT-4o and LLaMa3 can assist researchers in identifying crucial baseline features. Our evaluation process, combining LLM-as-a-Judge and human expert validation, reveals that while state-of-the-art general-purpose LLMs offer substantial benefits in this domain, their performance remains mediocre, with significant room for improvement. In Chapter 4, we extend our research to address a critical challenge in clinical trials: achieving both accurate treatment effect measurement and demographic representativeness simultaneously.
We introduce the "Framework for Research In Synthetic Control Arms" (FRESCA) to tackle inequity in hybrid clinical trials: studies that combine randomized controlled trial (RCT) participants with historical synthetic controls borrowed from existing trial data or real-world evidence. In this context, equity refers to ensuring trial populations adequately represent the target population for which the treatment is intended, particularly regarding protected attributes like age, gender, and race/ethnicity. Synthetic controls are non-randomized patients selected from historical data to supplement or replace standard control group recruits, potentially reducing trial costs and recruitment challenges. FRESCA implements a dual-stage approach that first employs propensity score matching (PSM) to recommend appropriate synthetic control patients, then applies Iterative Proportional Fitting (IPF) to adjust the population distribution to match target demographic characteristics. We perform an initial validation of FRESCA using data from SPRINT (the Systolic Blood Pressure Intervention Trial), with NHANES (the National Health and Nutrition Examination Survey) serving as our target population reference. Initial results demonstrate FRESCA's ability to create more representative hybrid trial populations while maintaining the statistical validity of treatment effect estimates. Chapter 5 expands our evaluation of FRESCA across multiple clinical trials, including SPRINT and ALLHAT, with NHANES again serving as the target population. We compare two new methods that integrate PSM and IPF against baseline and state-of-the-art methods, confirming the effectiveness of combining propensity and equity adjustments in achieving both accurate and representative treatment effect estimates. Notably, our findings show that even with reduced RCT recruitment supplemented by synthetic controls, these methods maintain accuracy and equity across various trials and outcomes.
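The second, equity-adjustment stage of a FRESCA-style pipeline can be sketched with a minimal Iterative Proportional Fitting (raking) routine: unit weights are adjusted cyclically so the weighted margin of each protected attribute matches target proportions (for example, proportions drawn from NHANES). This is a sketch of the general IPF technique, not FRESCA's exact implementation, and all names are hypothetical.

```python
import numpy as np

def ipf_weights(attrs, target_margins, n_iter=200, tol=1e-10):
    """Iterative Proportional Fitting (raking): find unit weights so the
    weighted marginal distribution of each categorical attribute matches
    the given target proportions. `attrs` maps attribute name -> array of
    category labels; `target_margins` maps name -> {category: proportion}."""
    n = len(next(iter(attrs.values())))
    w = np.ones(n)
    for _ in range(n_iter):
        for name, labels in attrs.items():
            total = w.sum()
            for cat, target in target_margins[name].items():
                mask = labels == cat
                observed = w[mask].sum()
                if observed > 0:
                    w[mask] *= target * total / observed  # rake this margin
        # Stop once every margin is within tolerance of its target.
        deviation = max(
            abs(w[attrs[name] == cat].sum() / w.sum() - target)
            for name in attrs
            for cat, target in target_margins[name].items()
        )
        if deviation < tol:
            break
    return w
```

Each pass forces one attribute's weighted margins to the targets exactly; cycling over attributes converges to weights satisfying all margins simultaneously whenever the targets are jointly feasible.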
This chapter also highlights the critical influence of treatment and control group sizes on estimation precision, and the importance of balanced synthetic control usage for accurate treatment effect estimates. Finally, in Chapter 6, we address the challenge of healthcare access by examining how patients can be better connected with appropriate healthcare programs. We present a novel end-to-end machine learning approach that simulates the physician's decision-making process for Complex Care Management (CCM) program referrals. Using proprietary Electronic Health Record (EHR) and Claims data from a regional health provider, we tackle challenges such as non-stationarity, dataset imbalance, and partial labeling in time-series data. Our approach successfully identifies additional eligible patients for CCM program referrals who might otherwise be overlooked, providing clear justifications for their selection and potentially bridging the gap between available healthcare resources and patient needs.
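Two of the data challenges named above have standard, simple mitigations that a sketch can illustrate: a chronological train/test split (rather than a shuffled one) respects non-stationarity, and class weighting counteracts label imbalance. The synthetic data and model below are hypothetical stand-ins, not the thesis's actual EHR/claims pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical synthetic stand-in for longitudinal EHR/claims features,
# ordered chronologically; positives (CCM-eligible) are deliberately rare.
rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 1.5).astype(int)

# Chronological split, no shuffling: train on earlier records and test on
# later ones, so evaluation reflects non-stationary deployment conditions.
split = int(0.8 * n)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# class_weight="balanced" reweights the loss to counteract label imbalance,
# so the rare positive class is not simply ignored by the model.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

A linear model also keeps per-patient justifications inspectable via its coefficients, in the spirit of the "clear justifications" goal above; partial labeling would need additional machinery (e.g., positive-unlabeled learning) not shown here.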
Description
May 2025
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY