Data Scientist — Mathematical Statistics (Python)

Entry-level, remote contract role for Python-savvy data scientists to run statistical analyses with numpy/scipy/statsmodels, clean messy datasets, and communicate findings; part-time (<20 hrs/week), $25/hr, worldwide.

Category: Coding & Software
Compensation: $25/hr (hourly, 100% remote)
Eligibility: Worldwide
Experience: Entry
Posted: Sep 3, 2025

About OpenTrain

OpenTrain is the #1 platform for building careers in AI training and data labeling. We connect contributors with projects that shape how AI systems learn by providing opportunities across annotation, evaluation, and human-in-the-loop workflows.

Work found through OpenTrain is often remote, flexible, and accessible to people who bring careful thinking, domain knowledge, or strong language and technical skills.

Why AI training and data science work matters

AI models learn from examples and human judgment: your statistical analyses and clear documentation directly influence model behavior and downstream decisions. This role sits at the intersection of real-world data work and model evaluation, offering hands-on experience with reproducible analysis and experimental design.

Many contributors choose this work for the flexibility, the chance to shape cutting-edge systems, and the ability to work remotely while building a portfolio of analysis projects.

The role

You will perform statistical analyses and data-cleaning tasks using Python libraries (numpy, pandas, scipy, statsmodels), then summarize results for stakeholders with clear narratives and visual summaries. This is a part-time contractor role intended for entry-level candidates with a strong foundation in mathematical statistics and practical experience in Python-based analysis.

Work is remote and flexible (under 20 hours per week). You will maintain reproducible notebooks and document methods and assumptions carefully.

What you'll do day-to-day

Clean and wrangle messy datasets to prepare for analysis and modeling.
Select and run appropriate statistical tests (t-tests, chi-square, ANOVA) and post-hoc analyses where needed.
Fit and evaluate linear and logistic regression models using statsmodels/scipy.
Compute and interpret p-values, confidence intervals, and effect sizes; run power analyses.
Check model assumptions and diagnostics; apply non-parametric methods when appropriate.
Design or assess experiments and A/B tests, including power calculations.
Summarize findings with concise narratives and visual summaries for stakeholders.
Maintain reproducible notebooks, document methods clearly, and note assumptions and limitations.
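
As a rough illustration of the testing and reporting tasks listed above, the following is a minimal sketch of a two-group comparison that reports a p-value, a confidence interval, and an effect size. The helper name `welch_summary`, the choice of Welch's t-test, and the synthetic data are assumptions made for this example; actual project instructions may call for different methods.

```python
import numpy as np
from scipy import stats


def welch_summary(control, treatment, alpha=0.05):
    """Welch t-test, CI for the mean difference, and Cohen's d (hypothetical helper)."""
    a = np.asarray(control, dtype=float)
    b = np.asarray(treatment, dtype=float)

    # Welch's t-test does not assume equal variances in the two groups.
    t_stat, p_value = stats.ttest_ind(b, a, equal_var=False)

    diff = b.mean() - a.mean()
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)
    se = np.sqrt(var_a / a.size + var_b / b.size)

    # Welch-Satterthwaite degrees of freedom, used for the confidence interval.
    df = se**4 / (
        (var_a / a.size) ** 2 / (a.size - 1) + (var_b / b.size) ** 2 / (b.size - 1)
    )
    t_crit = stats.t.ppf(1 - alpha / 2, df)

    # Cohen's d based on the pooled standard deviation.
    pooled_sd = np.sqrt(
        ((a.size - 1) * var_a + (b.size - 1) * var_b) / (a.size + b.size - 2)
    )

    return {
        "t": float(t_stat),
        "p": float(p_value),
        "diff": float(diff),
        "se": float(se),
        "ci": (float(diff - t_crit * se), float(diff + t_crit * se)),
        "cohens_d": float(diff / pooled_sd),
    }


# Synthetic data for illustration only (not project data).
rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=2.0, size=200)
treatment = rng.normal(loc=10.6, scale=2.0, size=200)

result = welch_summary(control, treatment)
print(result)
```

In a write-up, the interval and effect size would normally be reported alongside the p-value, together with the assumptions behind the test (independent samples, roughly normal sampling distribution of the means).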

Requirements

Strong Python for analysis: numpy, pandas, scipy, statsmodels.
Mastery of hypothesis testing: t-test, chi-square, ANOVA, and appropriate post-hoc tests.
Ability to calculate and interpret p-values, confidence intervals, and effect sizes.
Experience with correlation (Pearson, Spearman) and regression (linear, logistic) modeling.
Comfortable with probability distributions, normality checks, and non-parametric alternatives.
Experience extracting insights from messy data via exploratory data analysis and cleaning.
Knowledge of experimental design, A/B testing, and power analysis.
Reproducible workflows: notebooks, version control, and clear documentation.
Strong analytical writing and the ability to explain assumptions and limitations.
SQL for data extraction and joins is a plus.
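
To make the normality-check and correlation requirements above more concrete, here is a small sketch, assuming scipy is available. The function name `compare_groups` and the rule "use Mann-Whitney U when Shapiro-Wilk rejects normality" are illustrative assumptions, not a prescribed workflow; choosing a test from a preliminary normality check is only one of several defensible approaches.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Pick a two-sample test from a Shapiro-Wilk check (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shapiro-Wilk on each group; a small p-value suggests non-normal data.
    looks_normal = all(stats.shapiro(x)[1] > alpha for x in (a, b))

    if looks_normal:
        _, p_value = stats.ttest_ind(a, b, equal_var=False)
        return "welch_t", float(p_value)

    # Non-parametric alternative when normality is doubtful.
    _, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "mann_whitney_u", float(p_value)


# Synthetic, strongly skewed data for illustration only.
rng = np.random.default_rng(1)
skewed_a = rng.exponential(scale=1.0, size=200)
skewed_b = rng.exponential(scale=1.5, size=200)
test_name, p_value = compare_groups(skewed_a, skewed_b)
print(test_name, p_value)

# Pearson measures linear association; Spearman measures monotonic association.
x = rng.normal(size=200)
y = np.exp(x)  # monotonic but non-linear in x
pearson_r = stats.pearsonr(x, y)[0]
spearman_rho = stats.spearmanr(x, y)[0]
print(pearson_r, spearman_rho)
```

Because `y` is a strictly increasing function of `x`, the Spearman coefficient reflects the perfect rank agreement, while the Pearson coefficient is lower because the relationship is not linear.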

Who should apply

This role is a fit for entry-level data scientists, recent graduates, or analysts who are comfortable with mathematical statistics and want to build experience applying those skills in AI-training and evaluation work.

You should enjoy working with code and data, documenting your process, and translating technical results into clear, actionable summaries for non-technical stakeholders.

Compensation & logistics

Pay: $25 USD per hour. Employment type: contractor, part-time. Time requirement: less than 20 hours per week. Location: worldwide (remote).

Labeling context: this project involves computer-code/programming data and evaluation/rating tasks carried out with general-purpose tools; the listing does not name a specific labeling application. You will deliver analyses, annotated code notebooks, and evaluation judgments as specified by project instructions.

How to apply and next steps

Prepare a short portfolio or links to reproducible notebooks that demonstrate your use of numpy/pandas/scipy/statsmodels, hypothesis testing, regression modeling, and clear write-ups of assumptions and results.

During onboarding you will receive project-specific instructions, example notebooks, and quality guidelines. Tasks are contract-based and scheduled flexibly to suit part-time availability.