Math eval
The main goal of this evaluation project is to solve AP-level mathematics questions and determine whether the LLM answers them correctly.
I have broad experience in data labeling and preparation of AI training data, with a particular focus on developing high-quality datasets for machine learning models. My work spans designing complex and diverse prompts, iterative testing that uncovers model weaknesses, and writing gold-standard responses that improve model performance. I have worked on tasks such as code generation, code completion, error correction, and test-case generation in multiple programming languages, including Python and SQL. I make sure the datasets are robust, scalable, and aligned with specific AI benchmarks by employing rigorous input validation, edge case analysis, and clear documentation. With a strong background in data analysis and machine learning, I am qualified for this position because I understand the subtleties of model behavior and requirements. I am proficient with tools such as Python, pandas, and SQL for preprocessing data, and I apply domain-specific knowledge to create datasets that solve real-world problems. Attention to detail and a focus on accuracy and quality help me enhance AI model performance and ensure the practical applicability of AI models.
This project focused on generating an SFT dataset of prompts and gold-standard responses to help improve the performance of a customer’s model on coding-related tasks. One of the biggest priorities was ensuring that the delivery contained only data rows whose prompt is shown to elicit an erroneous or incorrect response from the customer model. To do this, we had to iteratively query the customer model with each prompt, testing until we got an incorrect response.
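The filtering step described above can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual tooling: `query_model`, `is_correct`, `max_attempts`, and the row layout are all hypothetical names introduced here, and the real customer-model API and correctness check are not described in the source.

```python
from typing import Callable, Dict, List, Optional


def find_failing_response(
    prompt: str,
    query_model: Callable[[str], str],       # hypothetical: sends a prompt, returns the model's reply
    is_correct: Callable[[str, str], bool],  # hypothetical: judges a reply for a given prompt
    max_attempts: int = 5,                   # assumed retry budget per prompt
) -> Optional[str]:
    """Query the model repeatedly; return the first incorrect reply, or None."""
    for _ in range(max_attempts):
        response = query_model(prompt)
        if not is_correct(prompt, response):
            return response
    return None


def select_rows(
    prompts: List[str],
    query_model: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    max_attempts: int = 5,
) -> List[Dict[str, str]]:
    """Keep only prompts that were shown to elicit an incorrect reply."""
    rows = []
    for prompt in prompts:
        failing = find_failing_response(prompt, query_model, is_correct, max_attempts)
        if failing is not None:
            rows.append({"prompt": prompt, "failing_response": failing})
    return rows
```

Prompts that never produce an incorrect reply within the retry budget are simply left out, which matches the rule that only failure-eliciting prompts go into the delivery.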
Review prompts generated by other labelers.
The main aim of this project was to evaluate an LLM’s response to a given prompt and rectify (if applicable) any incorrect steps within the LLM’s response.
M.S. Computer Science, Computer Science
MBA, Business Administration
Graduate assistant
Data analyst