Re-querying as a method of LLM code improvement
Authors
Harsham, Kyle
Issue Date
2025-12
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer science
Abstract
Large Language Models (LLMs) have seen incredible growth in both usage and capabilities in recent years. One of the main use cases for these systems in the modern world is code generation. Many companies are turning to LLMs for code generation because these systems are (at least thought to be) faster and more cost-effective than human employees. However, when these systems are used at scale, myriad issues arise. For example, code can perform poorly when used at application scale; code can perform erratically on edge cases; and, importantly, large quantities of generated code are impractical or impossible to verify by hand.
These problems give rise to the obvious question regarding code verification: is the code generated by an LLM or LRM plausibly correct? That is, is it plausible that the code performs the requested task correctly? Notice the qualifier of plausibility in the previous sentence. It is there because program verification is not Turing-decidable, so it is not reasonable to expect to be able to verify arbitrary code generated by an LLM in every case.
Thus, we present a method, derived from the test-driven development paradigm, for giving a user high confidence that code performs as they expect. Test-driven development is the
concept of writing test cases before writing the actual functions. This allows a programmer
to both contemplate more deeply what they are trying to create, and ensure that their code
performs as they expect during development. Our method requires a user to write test cases
for the code they intend to have an LLM generate, thus retaining the benefits that test-driven
development brings. In writing test cases, a user will have a chance to think through their
query thoroughly before making it and be confident that the generated code performs as
they expect. Further, our method implements automatic re-querying when generated code
does not pass all tests. In essence, a user asks an LLM to generate code, and then the code
is run against predefined test cases to verify correctness. If the code fails to pass any of the
test cases, the LLM is automatically sent a follow-up query outlining what the code that it
generated failed to accomplish. This approach gives the LLM a chance to fix any mistakes it may have made, improving the overall results.
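The generate-test-re-query loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: `query_llm` is a hypothetical stand-in for any chat-completion call, and each test is assumed to be a function that inspects the generated code's namespace and returns a pass/fail flag plus a failure message.

```python
from typing import Callable

# A test takes the namespace produced by the generated code and
# returns (passed, failure_message).
Test = Callable[[dict], tuple[bool, str]]

def generate_with_requery(prompt: str,
                          tests: list[Test],
                          query_llm: Callable[[str], str],
                          max_attempts: int = 3) -> str:
    """Ask an LLM for code, run user-written tests, and re-query on failure."""
    feedback = ""
    for _ in range(max_attempts):
        code = query_llm(prompt + feedback)  # hypothetical LLM call
        namespace: dict = {}
        try:
            exec(code, namespace)  # load the generated function(s)
            failures = [msg for ok, msg in (t(namespace) for t in tests) if not ok]
        except Exception as exc:
            failures = [f"generated code raised {exc!r}"]
        if not failures:
            return code  # all predefined tests passed
        # Follow-up query outlining what the generated code failed to do.
        feedback = ("\n\nYour previous code failed these tests:\n"
                    + "\n".join(failures))
    raise RuntimeError("no candidate passed all tests")
```

In use, the caller supplies the test cases before querying, exactly as in test-driven development; the loop only surfaces code that has already passed them.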
Description
December 2025
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY