Senior Data Science Task Designer (Python & SQL)

Design realistic, end-to-end, computationally intensive data science problems to train and evaluate advanced AI systems; requires Master’s/PhD, 5+ years’ experience, expert Python and strong SQL. Remote contract, part-time (<20 hrs/week) at $50/hr.

Coding & Software

Compensation: $50/hr (100% remote, hourly)
Eligibility: Open worldwide
Experience: Entry
Posted: Dec 3, 2025

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Creating an OpenTrain account is free.

We connect experienced practitioners with projects that shape how state-of-the-art AI behaves. Contributors build reusable skills and portfolios while working remotely and flexibly.

About AI training work

AI training (data labeling, annotation, and human-feedback work) is the human side of building modern AI systems. Contributors provide the examples, evaluations, and ground truth that models learn from.

This role focuses on creating high-fidelity, full-pipeline data science problems that are used to evaluate and train advanced AI. The work is technical and impactful, and the time commitment is often flexible.

The role

You will design complex, computationally intensive data science problems that simulate realistic end-to-end workflows across industries (telecom, finance, government, e-commerce, healthcare, etc.).

Each scenario must be deterministic, require non-trivial reasoning across the full data pipeline, and be implemented and verified in Python using standard data science libraries. Your problems and verified solutions will be used to evaluate and train advanced AI systems.

What you’ll do

Design deterministic, full-pipeline problems: ingestion → cleaning → exploratory analysis → feature engineering → modeling → validation → deployment considerations.
Implement reference solutions in Python using pandas, numpy, scipy, scikit-learn, statsmodels (and other libraries as appropriate).
Verify and document correct answers, edge cases, and expected outputs so solutions are unambiguous and reproducible.
Incorporate scalability and big-data considerations, and describe realistic data volumes and compute constraints.
Provide clear business context and problem statements written in advanced, precise English (C1+).
Optionally incorporate GenAI elements where relevant (LLMs, RAG, prompt design, vector DBs) and note MLOps/deployment implications.
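
As a rough illustration of what "deterministic" and "reproducible" can mean in practice, the sketch below builds a small synthetic dataset from a fixed seed, runs a toy cleaning, feature-engineering, and modeling step, and checks that two runs produce the same answer. It is not taken from OpenTrain's actual task format: the column names, the telecom-style scenario, and the helper functions (make_dataset, reference_solution, fingerprint) are all hypothetical, and a real task would be far larger and more demanding.

```python
import hashlib

import numpy as np
import pandas as pd


def make_dataset(seed: int = 0, n_rows: int = 1_000) -> pd.DataFrame:
    """Generate a synthetic (hypothetical) telecom-style table from a fixed seed."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "customer_id": np.arange(n_rows),
        "monthly_usage_gb": rng.gamma(shape=2.0, scale=5.0, size=n_rows),
        "tenure_months": rng.integers(1, 73, size=n_rows),
    })
    # Inject missing values at seeded positions so the cleaning step has work to do.
    missing_idx = rng.choice(n_rows, size=n_rows // 20, replace=False)
    df.loc[missing_idx, "monthly_usage_gb"] = np.nan
    return df


def reference_solution(df: pd.DataFrame) -> dict:
    """Toy pipeline: impute, engineer one feature, fit a least-squares line."""
    clean = df.copy()
    # Cleaning: median imputation of the missing usage values.
    median = clean["monthly_usage_gb"].median()
    clean["monthly_usage_gb"] = clean["monthly_usage_gb"].fillna(median)
    # Feature engineering: log-transform the skewed usage column.
    clean["log_usage"] = np.log1p(clean["monthly_usage_gb"])
    # Modeling: degree-1 polynomial fit of log usage on tenure.
    slope, intercept = np.polyfit(
        clean["tenure_months"].to_numpy(dtype=float),
        clean["log_usage"].to_numpy(),
        deg=1,
    )
    # Round the outputs so the expected answer is stated at a fixed precision.
    return {
        "n_imputed": int(df["monthly_usage_gb"].isna().sum()),
        "slope": round(float(slope), 6),
        "intercept": round(float(intercept), 6),
    }


def fingerprint(answer: dict) -> str:
    """Hash the answer so reviewers can compare runs with a single string."""
    payload = repr(sorted(answer.items())).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Verification: the same seed must yield the same answer on every run.
first = reference_solution(make_dataset(seed=42))
second = reference_solution(make_dataset(seed=42))
assert first == second
print(first["n_imputed"], fingerprint(first) == fingerprint(second))
```

Fixing the seed, rounding reported values, and comparing a hash of the final answer are one possible way to make expected outputs unambiguous; other verification schemes would work equally well.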

Requirements

This role is senior and technical. Applicants are expected to meet all of the qualifications below.

Master’s or PhD in Data Science, Statistics, Mathematics, Computer Science, or a closely related quantitative field.
At least 5 years of hands-on data science experience with proven business impact (industry, consulting, or similar).
Expert Python skills for data science, including pandas, numpy, scipy, scikit-learn, and statsmodels.
Strong proficiency in SQL and database operations for large-scale data manipulation and analysis.
Deep understanding of statistical analysis and machine learning algorithms, including their assumptions, limitations, and practical use cases.
Demonstrated ability to design deterministic, computationally intensive problems that span the full data science pipeline.
Experience with GenAI technologies (LLMs, RAG, prompt engineering, vector databases) and familiarity with MLOps concepts and model deployment workflows.
Working knowledge of modern AI/ML frameworks such as TensorFlow, PyTorch, and LangChain.
Advanced English (C1 or higher) with strong technical writing skills for clear, structured problem statements and solutions.
Reliable laptop/desktop, stable internet connection, and sufficient availability to take on recurring project tasks.

Location restrictions

Candidates cannot be based in the following locations:
Countries and regions: Iran, Cuba, North Korea, Syria, Sudan, Venezuela, Myanmar; Switzerland; China, Taiwan, Kenya; Armenia, Israel, Kazakhstan, UAE, Netherlands, Serbia, Kyrgyzstan, Turkey, Uzbekistan, Belarus, Russia, Ukraine, Abkhazia, South Ossetia.
US states: Alaska, Arkansas, California, Connecticut, Delaware, Georgia, Hawaii, Illinois, Indiana, Kansas, Louisiana, Maine, Maryland, Massachusetts, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, Ohio, Oregon, Tennessee, Utah, Vermont, Washington, West Virginia.
Other territories and locations: Antarctica, Aruba, Åland Islands, Saint Barthélemy, Bonaire, Sint Eustatius and Saba, Bouvet Island, Cocos (Keeling) Islands, Democratic Republic of the Congo, Cook Islands, Christmas Island, Western Sahara, Falkland Islands (Malvinas), French Guiana, Guadeloupe, South Georgia and the South Sandwich Islands, Heard Island and McDonald Islands, British Indian Ocean Territory, Northern Mariana Islands, Martinique, New Caledonia, Norfolk Island, Niue, French Polynesia, Saint Pierre and Miquelon, Pitcairn, Réunion, Saint Helena, Ascension and Tristan da Cunha, Svalbard and Jan Mayen, Sint Maarten (Dutch part), French Southern Territories, Tokelau, United States Minor Outlying Islands, Holy See, Virgin Islands (British), Wallis and Futuna, Mayotte.

Compensation, time commitment, and contract

This is a remote, part-time contract role with flexible scheduling.

Pay: USD 50 per hour. Time requirement: Less than 20 hours per week. Employment types: Contractor, Part-time.

How to apply and next steps

If this role matches your background, apply through the OpenTrain platform. OpenTrain helps you build a profile and connect with AI training projects; creating an account is free.

Be prepared to share examples of past data science work, snippets of reproducible Python solutions, and writing samples that demonstrate clear technical documentation. Selected candidates may be asked to complete a short design exercise.