Python Infrastructure Engineer — LLM Training & Agent Tooling

Build and own Python infrastructure for LLM training and agent evaluation as a remote part-time contractor (US & Canada). Requires 5+ years Python, Docker, CI/CD, FastAPI/Flask and a test-driven, security-aware mindset; pay tiers Junior $34, Mid $37, Senior $42/hr.

Coding & Software

100% Remote Hourly · $37/hr

Compensation: $37/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Jul 25, 2025


About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. It’s where people start and grow careers teaching AI: discover open projects, build a profile, and apply in minutes.

We support contributors across the industry who annotate, evaluate, and improve AI systems. This role is an opportunity to join that ecosystem as a developer shaping the tooling researchers rely on every day.

About AI training work

AI training is the human side of building models — people create, review, and evaluate the examples that teach modern systems. Work ranges from labeling images and audio to writing and scoring model outputs and building evaluation systems for agents.

These roles are often fully remote, flexible, and accessible: contributors directly influence how state-of-the-art AI behaves while working part-time or around other commitments.

The role

We’re seeking senior‑minded Python engineers to own infrastructure for an LLM-training workflow and agent evaluation tooling. You will deliver reusable repositories, automated evaluation pipelines, secure sandboxes, and developer environments so researchers can iterate quickly.

This is a part-time contractor role (20+ hours/week) open to candidates located in the United States and Canada. Short‑listed candidates will complete a timed HackerRank assessment and a platform coding test before recruiter interviews.

Build secure sandboxes and task frameworks for agent evaluation.
Design and implement automated scoring and CI/CD pipelines.
Create developer environments (devcontainers, Makefiles, pre-commit) and clear docs for researcher adoption.

What you’ll do day-to-day

You’ll be the technical owner of infrastructure components that support LLM training and agent evaluation. Expect hands-on engineering, close collaboration with researchers, and delivering production-ready tooling.

Author modular FastAPI/Flask back-ends with Pydantic validation and structured logging.
Write and maintain pytest unit, integration, and functional tests with high coverage.
Design GitHub Actions (or equivalent) to lint, test, build, and deploy; manage secrets and caching.
Author multi-stage Dockerfiles, optimize images, and use docker-compose (Kubernetes experience is a plus).
Create devcontainers, Makefiles, .env workflows, and pre-commit hooks to streamline onboarding.
Integrate security scanners (Trivy/Snyk) into CI and apply least-privilege practices.
Support and pair-program with AI researchers; write concise usage docs and run onboarding sessions.

Requirements

All requirements below are core to the role. Candidates who cannot meet these should not apply.

5+ years of professional Python experience writing production-grade code, including async I/O, packaging, and refactoring.
CS/Engineering degree or equivalent hands-on experience.
Test-driven mindset: writes unit, integration, and functional tests with pytest; aims for high coverage.
Linux power-user: comfortable with bash, grep, curl, jq, permissions, and basic networking.
Docker expertise: multi-stage Dockerfiles, image optimization, docker-compose; K8s is a plus.
CI/CD ownership: designs GitHub Actions (preferred) or similar workflows; manages secrets and caching.
FastAPI or Flask proficiency for building modular REST/async services.
Dev-environment setup: devcontainers, Makefiles, .env workflows, and pre-commit integration.
Experience building sandboxes, scoring pipelines, or evaluation frameworks for LLMs/agents.
Familiarity with AI coding assistants (Copilot, Claude Code, Cursor) and a habit of using them responsibly.
Security awareness: hardening images, least-privilege, and integrating scanners.
Version-control discipline: semantic commits, branch hygiene, and thorough code reviews.
Screening readiness: able to complete a timed HackerRank assessment and platform coding test within 48 hours of invite.

Who should apply

This role suits engineers who enjoy owning infrastructure, partnering with researchers, and shipping developer-facing tooling. You should be comfortable mentoring others, documenting systems clearly, and making pragmatic security trade-offs.

Senior-minded engineers who can lead infra projects end-to-end.
People who prefer part-time, flexible contractor work and can commit 20+ hours/week.
Candidates with prior experience on platforms like OpenTrain or Alignerr, especially with coding-focused tasks (considered a strong bonus).

Compensation, schedule, and hiring process

This is a part-time contractor position for residents of the United States and Canada. The expected weekly commitment is at least 20 hours.

Short‑listed candidates will complete a timed HackerRank assessment and a platform coding test before recruiter interviews. Be prepared to complete those assessments within 48 hours of invitation.

Pay tiers (USD, hourly): Junior $34/hr; Mid $37/hr; Senior $42/hr.
Contract type: contractor, part-time. Remote work within US & Canada only.
Hiring steps: skills screen, timed HackerRank + platform coding test, recruiter interview, technical interviews and references.