Finance Research Evaluation Specialist

Design and run research-grade evaluation frameworks for AI agents in financial workflows on a part-time, remote contract (20+ hrs/week). Requires an advanced finance-related degree and deep finance domain experience; pay is $6–$8/hr.

Generative AI, RLHF

100% Remote · Hourly · $6–$8/hr

Compensation: $6–$8/hr
Eligibility: Worldwide
Experience: Expert
Posted: Jun 30, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We help people start and grow careers teaching AI by consolidating opportunities, building a unified portfolio, and connecting contributors with industry projects.

This role is posted through OpenTrain and gives you direct, hands-on influence on how finance-focused AI systems are evaluated and improved.

About AI training and evaluation work

AI training (data labeling and evaluation) is the human side of building modern AI systems. People create benchmarks, rate outputs, curate test cases, and analyze model failures so production AI behaves reliably in real-world workflows.

These projects are often remote and flexible, making them a great fit for experienced researchers who want to shape how financial AI agents reason, decide, and automate tasks.

The role

OpenTrain is recruiting a Finance Research Evaluation Specialist to design rigorous evaluation frameworks, benchmarks, scoring rubrics, and research-grade protocols for AI agents operating in financial domains. You will work closely with researchers, engineers, and product teams to improve enterprise finance workflows.

Title: Finance Research Evaluation Specialist (contract, part-time).
Client: OpenTrain (posted via OpenTrain).
Workload: 20+ hours per week, remote, worldwide; English required.

What you'll do

Design and evolve evaluation frameworks for AI agents in financial domains.
Build benchmark suites, scoring methodologies, quality rubrics, and evaluation protocols.
Conduct applied research on financial reasoning, decision-making, and workflow automation.
Develop and curate datasets, test cases, and benchmark environments for enterprise finance challenges.
Analyze model behavior, failure modes, and performance trends to guide product and research improvements.
Collaborate with researchers, ML engineers, and product teams to operationalize evaluation findings.
Publish internal research findings, technical reports, and best-practice recommendations.

Requirements

This is an expert-level research role. Candidates must demonstrate deep finance domain knowledge and experience building rigorous analytical or evaluation frameworks.

Advanced degree in Finance, Economics, Financial Engineering, Quantitative Finance, or a related field (required).
Deep domain expertise in investment research, capital markets, risk management, corporate finance, accounting, or financial analysis.
Experience conducting rigorous research, developing analytical methodologies, or building evaluation frameworks.
Strong analytical and critical-thinking skills and excellent written and verbal communication.
Experience working in highly collaborative, cross-functional environments.

Helpful background (preferred)

Experience evaluating or benchmarking large language models, AI agents, reasoning systems, or enterprise AI applications.
Familiarity with agentic workflows, AI evaluation methodologies, synthetic data generation, or model alignment techniques.
Knowledge of machine learning, financial AI, decision intelligence, or computational finance.
Experience designing industry benchmarks, assessment frameworks, or performance standards; a publication record is a plus.
Proficiency with Python, SQL, statistical analysis, or machine-learning frameworks is beneficial.

Logistics, pay, and tools

This is a contract, part-time position requiring 20+ hours per week and is open to remote contributors worldwide. Work language: English.

Pay: Hourly, USD $6–$8/hr (listed range); the typical rate is $8/hr.
Data type: text. Labeling tasks: evaluation ratings and data collection.
Tools: proprietary tooling (details provided after selection).
Employment types: Contractor, Part-time.

How to apply

Apply through your OpenTrain account by submitting your profile, relevant research samples, and a brief note about your finance evaluation experience. Shortlisted candidates will be contacted for a technical discussion and a sample task.

OpenTrain helps you track the application and build a portfolio of your AI training work as you contribute to industry projects.