Re-querying as a method of LLM code improvement

Authors
Harsham, Kyle
Issue Date
2025-12
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer science
Abstract
Large Language Models (LLMs) have seen incredible growth in both usage and capabilities in recent years. One of their main modern use cases is code generation: many companies are turning to LLMs for this task because these systems are (at least thought to be) faster and more cost-effective than human programmers. However, when these systems are used at scale, myriad issues arise: generated code can perform poorly at application scale, it can behave erratically on edge cases, and, importantly, large quantities of generated code are impractical or impossible to verify by hand. These problems raise the obvious question of code verification: is the code generated by an LLM or LRM plausibly correct? That is, is it plausible that the code performs the requested task correctly? Note the qualifier of plausibility in the previous sentence: because program verification is not Turing-decidable, it is not reasonable to expect to be able to verify arbitrary code generated by an LLM in every case. Thus, we present a method, derived from the test-driven development paradigm, for giving a user high confidence that generated code performs as the user expects. Test-driven development is the practice of writing test cases before writing the functions themselves; it lets a programmer both think more deeply about what they are trying to create and confirm during development that their code behaves as expected. Our method requires a user to write test cases for the code they intend to have an LLM generate, thus retaining the benefits that test-driven development brings: in writing test cases, a user thinks through their query thoroughly before making it and can be confident that the generated code behaves as expected. Further, our method implements automatic re-querying when generated code does not pass all tests.
In essence, a user asks an LLM to generate code, and the code is then run against the predefined test cases to check correctness. If the code fails any test case, the LLM is automatically sent a follow-up query describing what its generated code failed to accomplish. This gives the LLM a chance to fix any mistakes it may have made, which improves overall results.
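The re-querying loop described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual implementation: `query_llm` is a hypothetical stub standing in for a real LLM API call, and the user-written tests are represented as (arguments, expected result) pairs for a single function named `add`.

```python
def query_llm(prompt: str) -> str:
    # Hypothetical stub for a real LLM API call. For illustration,
    # it returns buggy code at first and a fix once the re-query
    # mentions test failures.
    if "failed" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"  # first attempt: buggy

def run_tests(code: str, tests) -> list:
    """Execute generated code and return descriptions of failing tests."""
    namespace = {}
    exec(code, namespace)
    failures = []
    for args, expected in tests:
        if namespace["add"](*args) != expected:
            failures.append(f"add{args} should return {expected}")
    return failures

def generate_with_requery(task: str, tests, max_attempts: int = 3) -> str:
    """Ask the model for code; re-query with failure details until all
    user-written tests pass or the attempt budget is exhausted."""
    prompt = task
    for _ in range(max_attempts):
        code = query_llm(prompt)
        failures = run_tests(code, tests)
        if not failures:
            return code  # all predefined tests pass
        # Automatic re-query: tell the model which tests its code failed.
        prompt = task + "\nYour code failed these tests: " + "; ".join(failures)
    raise RuntimeError("no passing candidate within the attempt budget")

tests = [((1, 2), 3), ((0, 5), 5)]
code = generate_with_requery("Write add(a, b) returning the sum.", tests)
```

In a real system, `run_tests` would execute the candidate code in a sandbox rather than with a bare `exec`, and the follow-up prompt would typically include the failing inputs, expected outputs, and actual outputs.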
Description
December 2025
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY