Python Infrastructure Engineer — LLM Training & Tooling

Join an OpenTrain project building the infrastructure that powers LLM training and agent evaluation: design sandboxes, CI/CD, Dockerized services, and developer tooling. Remote contract work for Asia‑Low candidates, 20+ hrs/week, tiered hourly pay $9–$16.

Coding & Software

Compensation: $12.5/hr (hourly, 100% remote)
Eligibility: Worldwide
Experience: Intermediate
Posted: Jul 25, 2025


About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Creating an OpenTrain account is free, and the platform connects contributors to projects where they can learn, grow, and directly shape how cutting‑edge AI systems behave.

Why AI training work matters

AI training (data labeling, human feedback, and evaluation) is the human side of model building: teams need reliable, well‑engineered tooling and infrastructure so researchers can train and evaluate models safely and at scale. These projects are remote, flexible, and a practical way to work at the cutting edge of AI development.

The Role

We’re hiring senior‑minded Python infrastructure engineers to design and maintain the systems used for LLM training and agent evaluation. You’ll build secure sandboxes and task environments, create CI/CD pipelines, containerize services, write test‑driven code, and support researchers relying on these tools.

What you’ll do day‑to‑day

Architect and maintain secure sandbox environments and task runtimes for agent evaluation and LLM experiments.
Develop backend services using FastAPI or Flask with clear validation, logging, and auth patterns.
Write clean, test‑driven Python (unit/integration/functional tests with pytest) and refactor legacy modules.
Create and debug multi‑stage Dockerfiles, docker‑compose setups, and developer containers (devcontainer.json).
Design and own CI/CD workflows (GitHub Actions or similar) that lint, test, build, and deploy while handling secrets and caching.
Build scoring pipelines, reusable repositories, and developer workflows (Makefiles, .env patterns, pre‑commit hooks).
Integrate security scanning (Trivy/Snyk) into CI and apply least‑privilege practices to images and services.
Support researchers through documentation, pair‑programming, and responsive tooling support.
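
As a rough illustration of the test‑driven scoring work described above, here is a minimal sketch of a scoring step for agent evaluation, written with pytest‑style tests. The names `TaskResult` and `score_run` and the weighted pass‑rate metric are hypothetical, chosen only for this example; they are not taken from the project's codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one agent task (hypothetical structure for illustration)."""
    task_id: str
    passed: bool
    weight: float = 1.0


def score_run(results: list[TaskResult]) -> float:
    """Return the weighted pass rate in [0, 1]; an empty run scores 0.0."""
    total = sum(r.weight for r in results)
    if total <= 0:
        return 0.0
    earned = sum(r.weight for r in results if r.passed)
    return earned / total


# pytest collects functions named test_*; plain asserts are enough.
def test_score_run_weighted():
    results = [
        TaskResult("a", passed=True, weight=3.0),
        TaskResult("b", passed=False, weight=1.0),
    ]
    assert score_run(results) == 0.75


def test_score_run_empty():
    assert score_run([]) == 0.0
```

Saved as a file such as `test_scoring.py` (an assumed name), this could be run with `pytest test_scoring.py`; keeping the scoring function pure makes it easy to test without a sandbox or network access.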

Requirements

5+ years of professional Python experience, including production‑grade code, packaging, async I/O, and refactoring.
Testing mindset: writes unit, integration, and functional tests with pytest and focuses on coverage and reliability.
Linux power‑user: daily CLI use (bash, grep, curl, jq, systemd) and basic networking and permission handling.
Container expertise: author and debug multi‑stage Dockerfiles; comfortable with docker‑compose (Kubernetes a plus).
CI/CD ownership: designs GitHub Actions or similar that lint, test, build, deploy, and manage secrets & caching.
FastAPI or Flask proficiency: builds modular REST/async services using Pydantic for validation.
Dev‑environment setup: creates devcontainer.json, Makefiles, .env workflows, and pre‑commit hooks.
LLM/agent infrastructure exposure: experience building sandboxes, scoring pipelines, or evaluation frameworks for agents.
AI coding assistants: familiarity with Cursor, Claude Code, or Copilot, and the ability to use them safely, is a plus.
Collaboration skills: explains tools to researchers, writes concise docs, and pair‑programs when needed.
Security awareness: applies least‑privilege, hardens images, and integrates scanners (Trivy/Snyk) into CI.
Version‑control discipline: semantic commits, PR templates, and code‑review best practices.
Screening readiness: able to complete a timed HackerRank assessment and platform coding test within 48 hours of invite.

Compensation, schedule & location

This is a contract, part‑time role that requires 20+ hours/week and is fully remote for candidates located in the Asia‑Low region (Afghanistan through Vietnam).

Hourly rates are tiered by experience: Junior $9 USD/hr, Middle $12 USD/hr, Senior $16 USD/hr.

Hiring process & next steps

Selected candidates will be asked to complete a quick HackerRank assessment and a platform coding test before scheduling recruiter interviews. Expect to complete timed coding tasks within 48 hours of an invite.

When applying, be prepared to share examples of relevant work (GitHub repos or short code samples) and to discuss past projects that demonstrate the qualifications listed above.