DSpace@RPI

DSpace@RPI is a repository of Rensselaer Polytechnic Institute's theses and dissertations, available in digital format largely from 2006 to the present, along with other selected resources.

Recent Submissions

  • Item
    LOKE: Linked Open Knowledge Extraction for Automated Knowledge Graph Construction
    (arXiv, 2023-11-15) McCusker, Jamie
    While the potential of Open Information Extraction (Open IE) for Knowledge Graph Construction (KGC) may seem promising, we find the alignment of Open IE extraction results with existing knowledge graphs to be inadequate. The advent of Large Language Models (LLMs), especially the commercially available OpenAI models, has reset expectations for what is possible with deep learning models and has created a new field called prompt engineering. We investigate the use of GPT models and prompt engineering for knowledge graph construction with the Wikidata knowledge graph to address a problem similar to Open IE, which we call Open Knowledge Extraction (OKE), using an approach we call the Linked Open Knowledge Extractor (LOKE, pronounced like "Loki"). We consider the entity linking task essential to the construction of real-world knowledge graphs. We merge the CaRB benchmark scoring approach with data from the TekGen dataset for the LOKE task. We then show that a well-engineered prompt, paired with a naive entity linking approach (which we call LOKE-GPT), outperforms AllenAI's OpenIE 4 implementation on the OKE task, although it over-generates triples compared to the reference set due to overall triple scarcity in the TekGen set. Through an analysis of entity linkability in the CaRB dataset, as well as outputs from OpenIE 4 and LOKE-GPT, we see that LOKE-GPT extractions and the "silver" TekGen triples show that the task differs significantly from OIE in content, if not in structure. Through this analysis and a qualitative analysis of sentence extractions via all methods, we found that LOKE-GPT extractions are of high utility for the KGC task and suitable for use in semi-automated extraction settings.
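The naive entity linking step described in the abstract above can be sketched as follows. This is an illustrative stand-in, not the LOKE implementation: `link_entity`, `link_triples`, the toy label table, and the sample triples are all assumptions, with a small in-memory dictionary standing in for a live Wikidata label lookup.

```python
# Illustrative sketch of a naive entity-linking pass over Open-IE-style
# (subject, predicate, object) triples. A toy label table stands in for
# a Wikidata lookup; all names and data here are assumptions.

LABEL_TO_QID = {  # toy stand-in for a Wikidata label index
    "douglas adams": "Q42",
    "the hitchhiker's guide to the galaxy": "Q25169",
}

def link_entity(surface_form: str):
    """Naively link a surface form by lowercased exact label match."""
    return LABEL_TO_QID.get(surface_form.strip().lower())

def link_triples(triples):
    """Keep only triples whose subject and object both link to QIDs."""
    linked = []
    for subj, pred, obj in triples:
        s, o = link_entity(subj), link_entity(obj)
        if s is not None and o is not None:
            linked.append((s, pred, o))
    return linked

triples = [
    ("Douglas Adams", "wrote", "The Hitchhiker's Guide to the Galaxy"),
    ("Douglas Adams", "born in", "Cambridge"),  # object not in toy table
]
print(link_triples(triples))  # only the fully linkable triple survives
```

A real system would replace the dictionary with a search over Wikidata labels and aliases; the over-generation noted in the abstract would then surface as triples whose arguments fail to link.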
  • Item
    Customizable Knowledge Graph Visualization using the Whyis Knowledge Explorer
    (CEUR, 2024-11-11) McCusker, Jamie
    Network visualization over large knowledge graphs suffers from multiple challenges: graphs have varying and sometimes multiple ways to represent what people expect a "link" to be; everything from direct triples to complex chemical interactions, social constructs, and OWL property restrictions can be considered a link. Additionally, large knowledge graphs cannot be usefully visualized as a whole: they are simply too large and complex, and any patterns are lost in the noise even when there is enough computational ability to represent them. The Whyis Knowledge Explorer is a component of the Whyis knowledge graph development framework that addresses these issues. It allows for fast, customizable network visualization of large-scale knowledge graphs. By providing a "starting point" with any specific node, users can explore the graph piece by piece, building up a view by expanding selected nodes on demand, making local exploration easier. By using "data views", the component provides a consistent user interface over a wide range of entity types and can handle both simple and complex relationships between entities. These data views publish a consistent output from multiple templates and can be extended through plugins as well as by the implementing Knowledge Graph App (KGApp). Entity types can also be assigned custom styles through CSS using Cytoscape.js styling. Additionally, links can be qualified with certainty values, showing more probable links with greater weight. We also use the same interface to provide a summary view of the knowledge graph by automatically generating concept maps of instantiated types, allowing users to see and explore overall usage patterns in the knowledge graph and highlighting both intended design and knowledge curation issues. This component has been a key part of many Whyis-based projects and is mature and scalable.
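Certainty-weighted link styling of the kind described above can be sketched as a Cytoscape.js-style stylesheet built in Python. The `selector` and `mapData` syntax follows Cytoscape.js conventions, but the function name, width range, and opacity values are illustrative assumptions, not the Whyis defaults.

```python
# Illustrative sketch: build a Cytoscape.js-style stylesheet where edge
# width and opacity scale with a "certainty" data attribute, so more
# probable links render with greater weight. Values are assumptions,
# not the Whyis Knowledge Explorer defaults.

def certainty_stylesheet(min_width=1.0, max_width=8.0):
    """Return a Cytoscape.js-style list of selector/style entries."""
    return [
        {"selector": "node", "style": {"label": "data(label)"}},
        {
            "selector": "edge",
            # mapData linearly maps certainty in [0, 1] onto a range
            "style": {
                "width": f"mapData(certainty, 0, 1, {min_width}, {max_width})",
                "opacity": "mapData(certainty, 0, 1, 0.3, 1)",
            },
        },
    ]

sheet = certainty_stylesheet()
print(sheet[1]["style"]["width"])  # mapData(certainty, 0, 1, 1.0, 8.0)
```

Such a stylesheet would be serialized to JSON and handed to a Cytoscape.js instance on the client side.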
  • Item
    ChatBS: An Exploratory Sandbox for Bridging Large Language Models with the Open Web
    (CEUR, 2024-11-11) Erickson, John S.; Santos, Henrique; McCusker, Jamie; McGuinness, Deborah L.; Hendler, James A.
    The recent widespread public availability of generative large language models (LLMs) has drawn much attention from the academic community, which has run experiments in order to learn more about their strengths and drawbacks. From prompt engineering and fine-tuning to fact-checking and task-solving, researchers have pursued several approaches to try to take advantage of these tools. As some of the most powerful LLMs are "closed" and only accessible through web APIs with prior authorization, combining LLMs with the open web is still a challenge. In this evolving landscape, tools that can facilitate the exploration of the capabilities and limitations of LLMs are desirable, especially when connecting with traditional web features such as search and structured data. This article presents ChatBS, a web-based exploratory sandbox for LLMs that works as a front-end for prompting LLMs with user inputs. It provides features such as entity resolution from open knowledge graphs, web search using LLM outputs, and popular prompting techniques (e.g., multiple submissions, "step-by-step"). ChatBS has been extensively used in Rensselaer Polytechnic Institute's Data INCITE courses and research, serving as a key tool for utilizing LLM outputs at scale in these contexts.
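The two prompting techniques named above can be sketched generically. This is not the ChatBS implementation: `step_by_step`, `majority_answer`, and the deterministic `fake_model` stub (standing in for a real LLM API call) are all assumed names for illustration.

```python
# Illustrative sketch of two common prompting techniques: "step-by-step"
# prompting and multiple submissions combined by majority vote.
# `fake_model` is a deterministic stub standing in for an LLM API call;
# all names here are assumptions, not the ChatBS implementation.

from collections import Counter

def step_by_step(prompt: str) -> str:
    """Append a chain-of-thought cue to the user prompt."""
    return prompt.rstrip() + "\nLet's think step by step."

def majority_answer(model, prompt: str, n: int = 5) -> str:
    """Submit the same prompt n times and return the most common reply."""
    replies = [model(prompt) for _ in range(n)]
    return Counter(replies).most_common(1)[0][0]

def fake_model(prompt: str) -> str:  # stand-in for a real LLM call
    return "42" if "step by step" in prompt else "unsure"

print(majority_answer(fake_model, step_by_step("What is 6 x 7?")))  # 42
```

With a real, non-deterministic model, the majority vote is what makes multiple submissions useful; with the stub it simply returns the single repeated reply.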
  • Item
    Design and simulation of cryogenic systems and calibration techniques for the nEXO neutrinoless double beta decay experiment
    (Rensselaer Polytechnic Institute, Troy, NY, 2024-08) Tidball, Adam; Newberg, Heidi; Brown, Ethan
    This dissertation presents a study of the dynamics of cryogenic plants and the implementation of radon injection as a calibration strategy for the nEXO neutrinoless double beta decay experiment. The research focused on three main areas: the design and optimization of cryogenic systems, the simulation of $^{220}$Rn progeny propagation in a xenon flow, and the development of a laboratory test stand to validate these simulations. Simulations of the EXO-200 xenon plant were developed using the Aspen engineering software suite, validating the modeling strategy for application to the nEXO refrigerant plants. Performance evaluations, conducted with Aspen Plus, included steady-state simulations and dynamic modeling of dual-stage heater and condenser systems based on those in EXO-200. These simulations assessed their ability to regulate pressure and temperature during and after transient upset conditions. Ramped pump speed changes were simulated to reflect typical operational variations, with both heater and condenser models utilizing PID controllers for temperature and pressure regulation. The dual-stage heater mitigated temperature excursions to 0.2 K while ensuring full xenon evaporation, achieving a temperature differential of just 0.001 K. The condenser model limited pressure excursions to $\Delta P = 6.21 \times 10^{-4}$ bar, well within the 0.35 bar TPC wall limit. Power consumption for the simulated condenser and heater closely matched theoretical values, confirming the reliability of this simulation strategy. Additionally, data from the LXTS slow control system was used to validate the modeling strategy, further demonstrating its applicability to simulating heat exchanger dynamics in xenon recirculation plants. Models were then created of the nEXO Novec-7000 refrigerant system to explore the viability of different operating conditions and plant configurations.
These analyses revealed that using 1-inch piping for the entire Novec-7000 plant was inadequate for meeting the circulation needs of the nEXO system. Moreover, the "top deck" configuration, where the xenon circulation pump is placed above the cryostat, was found to be unsuitable due to cavitation issues in the pump caused by insufficient inlet pressure. The effect of incorporating valves with varying flow coefficients was analyzed, ruling out the use of valves with flow coefficient $C_V=13.2$, while indicating that valves with $C_V=26.5$ work under a variety of configurations. Results of these simulations allowed estimates to be placed on the Novec-7000 pump inlet pressure under these different system conditions, specifying which configurations ensure recirculation of refrigerant and which do not. For the unsuitable "top deck" configuration, required Novec-7000 vessel operating pressures to prevent recirculation pump cavitation are instead presented. A radon injection simulation model was developed in this study to leverage the entire decay chain of $^{220}$Rn down to stable $^{208}$Pb to understand the flow of calibration radioisotopes through the nEXO xenon plant. Statistical analyses, including z-tests, p-tests, and Kolmogorov-Smirnov tests, demonstrated the infeasibility of using late-chain isotopes for determining velocity and diffusion coefficients in a small-scale test stand. Fitting procedures applied to the model using krypton tracer data yielded best-fit velocity and diffusion parameters, which were $v = 0.635 \pm 0.005 \, \text{m/s}$ and $D = (2.0 \pm 1.9) \times 10^{-2} \, \text{m}^2/\text{s}$, respectively. These parameters were then used to estimate radon arrival times in the nEXO radon injection design, with an estimated arrival time to the TPC of $128 \pm 7_{\text{stat}} \pm 16_{\text{geom}}$ s in agreement with the calculated upper limit of 145 s.
The developed simulation code will be made available to the collaboration, ensuring its utility in ongoing and future research. A concentration vs. time trend was determined for an updated TPC model with four outlet lines and a tangentially oriented inlet line using SolidWorks flow simulation. These results were coupled with outputs from the radon injection simulation code to generate a plot of activity vs. time in the TPC. From this plot, it is determined that activity in the TPC reaches a local maximum at 0.08 days, after which point total activity drops to 10% after 1.41 days and to 1% after 2.67 days. The updated TPC model is shown to promote effective mixing similar to a previously studied model with tangential inlet and outlet lines called Orientation 6. In contrast, Orientation 2, with radially positioned supply and return lines, demonstrates less effective mixing. Activity trends were determined for each of these configurations, with those encouraging better mixing correlating to higher activities in the TPC. These trends confirm that calibration with $^{220}$Rn is feasible using a standard 20 kBq source, as 44% of the $^{220}$Rn isotopes emanated from the calibration source reach the TPC. A laboratory test stand was constructed to experimentally validate the radon injection simulation model in a dual-phase system. This test stand was designed to be sensitive to xenon scintillation produced by alphas emitted along the $^{220}$Rn decay chain. The characterization of the QDrive and PT-100 components was detailed, ensuring their reliability for experimental use. Helium leak testing was performed to guarantee the integrity of the system, with a global leak rate of $8.37 \times 10^{-10}$ mbar L/s placed on the test stand, with no detection of localized leakage.
The leak rate is below the approximate threshold of $\sim 10^{-7}$ mbar L/s which indicates that xenon leakage is dominated by molecular rather than bulk fluid flow, confirming the system's integrity and suitability for commissioning with xenon \cite{pfeiffer2013leak}. Cooling of the xenon condenser was initially achieved with a copper braid in the original design, which was upgraded to a custom thermal link with enhanced cooling capacity calculated to be 67 W for 40 K temperature differentials across the link. Custom data acquisition software was developed to support the robust operation of the test stand, facilitating efficient data collection and analysis. Argon was studied as a potential inexpensive proxy to xenon for research and development purposes, which necessitated exploring strategies to shift its short-wavelength scintillation light to wavelengths detectable by the photodiode in the test stand. Tetraphenyl butadiene (TPB) spectroscopy studies were conducted to explore the wavelength-shifting efficiency of TPB dissolved in ethanol and toluene for this application. The studies indicated that TPB dissolved in toluene showed markedly higher degrees of wavelength shifting compared to TPB dissolved in ethanol, with TPB coating thickness found to correlate to wavelength-shifting efficiency. The TPB-coated slides exhibited flaking over time, making them unsuitable for incorporation in a high-purity experimental apparatus, indicating that future work must be done to refine the deposition strategy before this material can be used in the test stand.
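The decay-chain modeling summarized in the abstract above rests on the Bateman equations; a minimal two-member sketch follows. The function name is an assumption, and the half-lives are approximate literature values, not the parameters fitted in the dissertation.

```python
# Minimal sketch of the two-member Bateman solution for a parent/daughter
# decay pair, as used (in far more elaborate form) in full decay-chain
# transport models. Half-lives below are approximate literature values.

import math

def bateman_daughter(n1_0, t_half_parent, t_half_daughter, t):
    """Daughter atoms at time t, starting from n1_0 pure parent atoms."""
    l1 = math.log(2) / t_half_parent   # parent decay constant (1/s)
    l2 = math.log(2) / t_half_daughter # daughter decay constant (1/s)
    return n1_0 * l1 / (l2 - l1) * (math.exp(-l1 * t) - math.exp(-l2 * t))

# Rn-220 (t_half ~ 55.6 s) decaying to Po-216 (t_half ~ 0.145 s):
n_po216 = bateman_daughter(1e6, 55.6, 0.145, 10.0)
print(f"~{n_po216:.0f} Po-216 atoms present after 10 s")
```

Because the daughter here is much shorter-lived than the parent, the chain quickly reaches secular equilibrium, which is why the full simulation must track the whole chain down to stable $^{208}$Pb.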
  • Item
    Advances in the theory, implementation, and application of mechanistic models in downstream bioprocessing of monoclonal antibodies
    (Rensselaer Polytechnic Institute, Troy, NY, 2024-07) Zhang, Wendi; Przybycien, Todd, TP
    Mechanistic models are powerful tools for process characterization and optimization. Although their usage in academia is common, their widespread use in the biopharmaceutical industry is hindered by several factors. First, derivation of first-principles models is very difficult, sometimes due to insufficient understanding of the underlying phenomena or limitations in the experimental techniques or instruments. Deriving empirical models also requires instinct and experience in relevant fields. Second, there are many unit operations involved in the production process, and there is a lack of a powerful, yet flexible and shared platform to perform these simulations. Third, the workflow for using these developed models is not clear. Applying these models in an already established industry workflow and interpreting the results require advanced knowledge in multiple disciplines. In this thesis, we aim to address these challenges, particularly for protein A chromatography and precipitation capture purification operations in the downstream bioprocessing of monoclonal antibodies. In protein A chromatography, we derived a novel isotherm model and implemented it with the general rate model in well-known chromatography modeling software packages including GoSilico and CADET. Experiments were designed and performed to demonstrate that the model is compatible with an array of resins, mAbs, and experimental conditions. We show that the model can be easily trained and that it offers excellent chromatogram predictions. We further established an intuitive workflow showing how to use the model for process development.
Beyond these practical applications, we also used models of varying complexity to understand the heterogeneous surface properties of protein A resins and to determine what compromises are needed when applying complicated semi-empirical and first-principles mechanistic models to practical situations where speed, simplicity, and accuracy are required simultaneously. In precipitation, we focused on the solution, implementation, and benchmarking of a mechanistic model for solid particle formation kinetics called the population balance model in the free and open-source software package CADET. CADET already supports the solution of many models for other unit operations, and introducing the population balance model to the CADET family drives integrated in silico process development. In addition, we applied the model to characterize antibody precipitation kinetics: experiments were conducted and a workflow was established to apply the model. The results were further used to inform future process development directions.
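For context on the isotherm modeling described above, the classical Langmuir isotherm, a common baseline that mechanistic chromatography models extend, can be evaluated as follows. The function name and parameter values are illustrative assumptions; this is not the novel isotherm derived in the thesis.

```python
# Illustrative sketch of the classical Langmuir adsorption isotherm,
# a standard baseline that mechanistic chromatography models build on:
#     q = q_max * K * c / (1 + K * c)
# Parameter values are illustrative, not fitted to any real resin or mAb.

def langmuir_q(c, q_max, k_eq):
    """Bound protein q (mg/mL resin) at liquid-phase concentration c (mg/mL)."""
    return q_max * k_eq * c / (1.0 + k_eq * c)

q_max, k_eq = 60.0, 2.5  # illustrative capacity and equilibrium constant
for c in (0.1, 1.0, 10.0):
    print(f"c = {c:5.1f} mg/mL -> q = {langmuir_q(c, q_max, k_eq):6.2f} mg/mL")
```

The isotherm saturates at `q_max` as `c` grows, which is the qualitative behavior any extended protein A isotherm must also reproduce at high load.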

Communities in DSpace@RPI

Select a community to browse its collections.
