LLM-Based Code Generation for Querying Temporal Tabular Financial Data

Authors
Lashuel, Mohamed
Kurdistan, Gulrukh
Green, Aaron
Erickson, John S.
Seneviratne, Oshani
Bennett, Kristin P.
Issue Date
2024-10-22
Type
Article
Abstract
We examine the question: how well can large language models (LLMs) answer questions about temporal tabular financial data by generating code? Focusing on two LLMs, GPT-4 and Llama 3, we compare their ability to generate coherent and effective Python, R, and SQL code from natural language prompts. We design an experiment to assess LLM performance on natural language prompts over a large temporal financial dataset, using a set of queries with hand-crafted R code answers. To investigate the strengths and weaknesses of the LLMs, each query was designed along several factors that characterize its financial meaning and its complexity. We demonstrate how to create zero-shot prompts that generate code answering natural language queries about temporal financial tabular data, and we develop a specific system prompt for each language to ensure that the generated code correctly answers time-oriented questions. We run this experiment on both LLMs, assess whether the generated code is executable and correct, and assess the efficiency of the produced Python, SQL, and R code. We find that while LLMs show promising performance, it varies greatly across languages, models, and experimental factors. GPT-4 performs best on Python (95.2% correctness) but is significantly weaker on SQL (87.6% correctness) and R (79.0% correctness). Llama 3 is less successful at generating code overall, but it achieves its best results in R (71.4% correctness). A multi-factor statistical analysis of the results with respect to the defined experimental factors provides further insight into the specific challenges each LLM faces in code generation. Our preliminary results on this modest benchmark demonstrate a framework for developing larger, comprehensive, and unique benchmarks for both temporal financial tabular data and R code generation.
While benchmarks already exist for Python and SQL, we help fill the gap for R. Powerful AI agents for text-to-code generation, as demonstrated in this work, provide a critical capability for next-generation AI-based natural language financial intelligence systems and chatbots, directly addressing the complex challenges of querying temporal tabular financial data.
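The evaluation step described in the abstract (checking whether generated code is executable, and then whether it returns the correct answer) can be illustrated with a minimal Python sketch. This is not the authors' harness. The convention that generated code reads a DataFrame named `df` and stores its answer in a variable named `result`, the helper names `answers_match`, `evaluate_generated_code`, and `correctness_rate`, the numeric tolerance, and the toy `ACME` price data are all hypothetical assumptions. The reference answer is assumed to be precomputed; in the paper, reference answers come from hand-crafted R code.

```python
import math
import numbers

import pandas as pd


def answers_match(result, expected, rel_tol=1e-6):
    """Compare a generated answer with a reference answer, tolerating float noise."""
    if isinstance(expected, pd.DataFrame):
        if not isinstance(result, pd.DataFrame):
            return False
        try:
            # Ignore row labels and dtype differences (e.g. int64 vs float64).
            pd.testing.assert_frame_equal(
                result.reset_index(drop=True),
                expected.reset_index(drop=True),
                check_dtype=False,
                rtol=rel_tol,
            )
        except AssertionError:
            return False
        return True
    if isinstance(expected, numbers.Real):
        return isinstance(result, numbers.Real) and math.isclose(
            float(result), float(expected), rel_tol=rel_tol
        )
    try:
        return bool(result == expected)
    except (TypeError, ValueError):
        # e.g. comparing a Series to a scalar yields an ambiguous truth value
        return False


def evaluate_generated_code(code, df, expected):
    """Label generated code as 'not executable', 'incorrect', or 'correct'.

    Hypothetical convention: the code reads a DataFrame named `df` and
    stores its answer in a variable named `result`.
    """
    namespace = {"pd": pd, "df": df.copy()}
    try:
        # WARNING: exec runs untrusted model output; sandbox it in practice.
        exec(code, namespace)
    except Exception:
        return "not executable"
    if "result" not in namespace:
        return "incorrect"
    return "correct" if answers_match(namespace["result"], expected) else "incorrect"


def correctness_rate(labels):
    """Percentage of queries whose generated code was labeled 'correct'."""
    return 100.0 * sum(label == "correct" for label in labels) / len(labels)


# Toy temporal dataset (hypothetical ticker and prices, for illustration only).
prices = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-02-01"]),
        "ticker": ["ACME", "ACME", "ACME"],
        "close": [10.0, 12.0, 20.0],
    }
)

# Query: "What was ACME's average closing price in January 2024?"
expected = 11.0  # reference answer: (10.0 + 12.0) / 2

# Hypothetical model outputs for the query above.
candidates = {
    "date-filtered": (
        "jan = df[(df['date'] >= pd.Timestamp('2024-01-01'))"
        " & (df['date'] < pd.Timestamp('2024-02-01'))]\n"
        "result = jan['close'].mean()"
    ),
    "ignores dates": "result = df['close'].mean()",
    "wrong column": "result = df['closing_price'].mean()",
}

labels = {
    name: evaluate_generated_code(code, prices, expected)
    for name, code in candidates.items()
}
print(labels)
# {'date-filtered': 'correct', 'ignores dates': 'incorrect', 'wrong column': 'not executable'}
print(round(correctness_rate(list(labels.values())), 1))  # 33.3
```

The "ignores dates" candidate runs but averages over all rows, which is the kind of temporal mistake the system prompts are meant to prevent. Keeping "not executable" separate from "incorrect" matches the two checks the abstract reports. A real harness would also run each snippet in an isolated process with a timeout rather than through `exec`.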
Full Citation
M. Lashuel, G. Kurdistan, A. Green, J.S. Erickson, O. Seneviratne and K.P. Bennett, "LLM-Based Code Generation for Querying Temporal Tabular Financial Data," 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr), Hoboken, New Jersey, USA, 2024.
Publisher
IEEE