Sanskrit LLM Training Data Curation
Currently working on a self-initiated project to curate and label a dataset for training a Large Language Model (LLM) focused on the Sanskrit language. The work involves collecting text from a range of classical and contemporary Sanskrit sources and annotating it for several NLP tasks: classifying text types, identifying named entities, and writing high-quality prompt-response pairs. The goal is a robust, culturally aware Sanskrit LLM. The project reflects my understanding of linguistic nuance and of data preparation for modern AI models, as well as my commitment to preserving and digitizing cultural heritage through AI.