Data Science Expert — Python, SQL, GenAI

Design realistic, reproducible end-to-end data science problems and verify solutions using Python and SQL. This contract role suits senior data scientists (5+ years) with strong ML/statistics foundations and hands-on GenAI experience.

Coding & Software

100% Remote · Hourly

Compensation: $15–$40/hr

Eligibility: Worldwide

Experience: Expert

Posted: Apr 5, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect experienced contributors with projects that shape how state-of-the-art AI systems learn from human examples. Creating an OpenTrain account is free.

About AI training and why this work matters

AI training (data labeling, annotation, human feedback) is the human side of building intelligent systems. Contributors create, verify, and evaluate the example data and problem statements that modern models learn from — work that is remote, flexible, accessible, and on the cutting edge of AI.

Work remotely and often on a flexible schedule.
Contribute to how models behave across industries like finance, telecom, healthcare, and e-commerce.
Projects range from technical coding tasks to prompt writing and evaluation for generative AI.

The role

We are hiring Data Science Experts to author and verify computational, business-realistic data science problems. You will create original, reproducible Python- and SQL-based problems that mirror end-to-end analytical workflows, and you will verify correct solutions using standard data science libraries.

Contract, part-time, project-based work (not permanent).
Expected contribution: ~10–20 hours/week during active phases (see Required qualifications for availability).
Location restriction: USA (see application details).

What you'll do

Author complete, computationally intensive data science problems and verify solutions so problems are deterministic and reproducible. Problems should include realistic business context, span ingestion through deployment considerations, and require programmatic (not manual) solutions.

Design end-to-end tasks: ingestion, cleaning, EDA, feature engineering, modeling, validation, and deployment considerations.
Produce Python code (Pandas, NumPy, SciPy, scikit-learn, statsmodels) and SQL queries that implement and verify solutions.
Ensure reproducibility: fixed random seeds, deterministic processing, clear instructions, and verifiable outputs.
Create problems reflecting domains such as fraud detection, forecasting, optimization, risk, and customer analytics.
Write clear documentation, solution explanations, and test cases for each problem.
Include evaluation criteria and, where applicable, summary or scoring rubrics for model outputs.
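
As a rough illustration of what "deterministic and verifiable" can mean in practice, the sketch below generates a small synthetic dataset from a fixed seed, answers a question with a SQL window function, and checks the result against an independent Python implementation. The table name txn, the column names, the helper functions, and the business question (largest transaction per customer) are all hypothetical and chosen only for the example. It uses the Python standard library so it runs anywhere, and it assumes the bundled SQLite is version 3.25 or newer (needed for window functions). A real problem would typically use Pandas, NumPy, or scikit-learn as listed above.

```python
import hashlib
import json
import random
import sqlite3


def make_transactions(seed=42, n=200):
    """Generate synthetic (txn_id, customer_id, amount) rows from a fixed seed."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return [
        (i, rng.randint(1, 10), round(rng.uniform(5.0, 500.0), 2))
        for i in range(n)
    ]


def top_txn_sql(rows):
    """Largest transaction per customer, computed with a SQL window function."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE txn (txn_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    con.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)
    query = """
        SELECT customer_id, txn_id, amount
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id
                       ORDER BY amount DESC, txn_id ASC  -- explicit tie-break
                   ) AS rn
            FROM txn
        )
        WHERE rn = 1
        ORDER BY customer_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


def top_txn_python(rows):
    """Independent reference implementation used to verify the SQL answer."""
    best = {}
    for txn_id, customer_id, amount in rows:
        key = (-amount, txn_id)  # same ordering as the SQL ORDER BY clause
        if customer_id not in best or key < best[customer_id][0]:
            best[customer_id] = (key, (customer_id, txn_id, amount))
    return [best[c][1] for c in sorted(best)]


def checksum(result):
    """Stable fingerprint of an answer, usable in an automated test case."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


rows = make_transactions(seed=42)
sql_answer = top_txn_sql(rows)
py_answer = top_txn_python(rows)

# The two implementations must agree, and a rerun must give the same fingerprint.
assert sql_answer == py_answer
assert checksum(sql_answer) == checksum(top_txn_sql(make_transactions(seed=42)))
```

The explicit tie-break in ORDER BY is the detail that removes ambiguity: without it, two transactions with the same amount could be returned in either order, and the expected answer would no longer be unique.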

Required qualifications

5+ years of hands-on data science experience with demonstrated business impact.
Expert Python for data science: Pandas, NumPy, SciPy, scikit-learn, statsmodels.
Expert SQL skills: complex joins, aggregations, window functions, and database operations.
Strong statistical and ML foundations: feature engineering, model selection, evaluation, error analysis.
Ability to design deterministic, reproducible problems (fixed seeds, no stochastic ambiguity).
Comfort using visualization libraries for EDA and communication (Matplotlib; Seaborn a plus).
Familiarity with big-data and scalable processing concepts (partitioning, memory/performance considerations).
Experience with GenAI technologies (LLMs, RAG, prompt engineering, vector DBs).
Understanding of MLOps and model deployment basics (packaging, reproducibility, monitoring).
Experience with modern ML frameworks (TensorFlow or PyTorch; LangChain a bonus).
Written English proficiency at C1+ (able to write clear business problem statements).
Availability to contribute ~10–20 hours/week during active project phases (time commitment varies).
Location: restricted to the USA.

Who should apply

Apply if you are a senior/principal data scientist or ML engineer who enjoys designing realistic, code-first problems and documenting reproducible solutions. This work suits contributors who can translate business needs into technical tasks and who take care to make solutions verifiable and deterministic.

You have production experience delivering measurable business outcomes with data science.
You write clean, well-documented Python and SQL and can produce testable solution code.
You’re comfortable describing assumptions, edge cases, and evaluation metrics in written English.

Compensation, schedule, and project details

This is contract, part-time work paid hourly. Projects are short-to-medium term and project-based rather than permanent.

Pay: hourly, $15–$40/hr.
Expected availability during active phases: approximately 10–20 hours/week (time requirements may vary across projects).
Platform, tools, and instructions are provided by the project.
Labeling tasks for this role include text generation, question answering, evaluation/rating, coding/problem-writing, and text summarization.

How to apply and next steps

If this role fits your background, prepare an OpenTrain profile highlighting relevant projects, code samples, and a short summary of business impact. You will be screened for experience, coding ability, and written English before being invited to project onboarding.

Have examples ready: reproducible notebooks, SQL scripts, problem statements, or short writeups showing your end-to-end work.
Be prepared for a short technical screening and a sample authoring task to demonstrate reproducibility and clarity.