Financial data processing for LLMs
The project involved analyzing, structuring, and validating financial datasets used to train and optimize LLMs for finance-specific applications. I prepared large volumes of structured and unstructured financial data, including reports, risk assessment datasets, and compliance-related documents. A critical component was building reasoning into dataset preparation, so that the models could go beyond data retrieval and perform logical financial analysis, variance checks, and compliance reasoning. The datasets comprised tens of thousands of financial records, ranging from numerical transactions to unstructured regulatory texts, all of which were transformed into training-ready formats.
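
As an illustration of what turning a numerical record into a training-ready, reasoning-bearing example could look like, here is a minimal Python sketch. It is not the project's actual pipeline. The field names (account, period, budget, actual), the 10% tolerance, the prompt/completion layout, and the JSON Lines output are all assumptions made for the example, and the sample record is made up.

```python
import json


def build_variance_example(record, threshold_pct=10.0):
    """Turn one budget-vs-actual record into a prompt/completion pair.

    The completion spells out the variance arithmetic so that the training
    target contains the reasoning step, not only the verdict.
    All field names and the threshold are hypothetical.
    """
    budget = record["budget"]
    actual = record["actual"]
    if budget == 0:
        # A percentage variance is undefined for a zero budget.
        raise ValueError("budget must be non-zero")

    variance = actual - budget
    variance_pct = round(variance * 100 / budget, 2)
    flagged = abs(variance_pct) > threshold_pct

    prompt = (
        f"Account: {record['account']}\n"
        f"Period: {record['period']}\n"
        f"Budget: {budget:.2f}\n"
        f"Actual: {actual:.2f}\n"
        "Is the variance within tolerance?"
    )
    verdict = "exceeds" if flagged else "is within"
    completion = (
        f"Variance = actual - budget = {actual:.2f} - {budget:.2f} "
        f"= {variance:+.2f} ({variance_pct:+.2f}% of budget). "
        f"This {verdict} the {threshold_pct:.1f}% tolerance."
    )
    return {
        "prompt": prompt,
        "completion": completion,
        "meta": {
            "variance": variance,
            "variance_pct": variance_pct,
            "flagged": flagged,
        },
    }


def to_jsonl_line(example):
    """Serialize one example as a single JSON Lines row."""
    return json.dumps(example)


# Made-up sample record, for illustration only.
sample = {"account": "Travel", "period": "2023-Q4",
          "budget": 10000.0, "actual": 11500.0}
print(to_jsonl_line(build_variance_example(sample)))
```

Keeping the computed values in a separate meta field is one possible design choice: it lets a validation pass recheck the arithmetic in each completion before the example is admitted to the training set.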