Data Engineer for AI Training Pipelines
Work remotely as a contract Data Engineer building ETL pipelines and data models that feed next‑generation AI systems. This is a flexible contractor role of 20+ hrs/week, paying $30–$130/hr, with opportunities for full‑time contracting.
Coding Software
Compensation: $30–$130/hr
Eligibility: Worldwide
Experience: Entry
Posted: Jun 28, 2026
About OpenTrain
OpenTrain is the #1 platform for people starting and growing careers in AI training and data labeling. We help contributors discover projects, build unified portfolios, and grow sustainable freelance careers shaping how AI systems are trained.
Joining an OpenTrain project connects you to real, paid work that directly influences AI behavior while giving you tools to manage and showcase your skills across the industry.
Why AI training matters
AI models learn from examples created and curated by people. Data engineers in AI training shape the quality, reliability, and scale of those examples by building the data pipelines, models, and automation that feed training systems.
This work is remote, flexible, and accessible. You can contribute technical expertise without giving up freelance or contract work, on cutting‑edge systems that depend on your data solutions.
The role
We are hiring a Data Engineer contractor to design, build, and maintain robust data solutions that support AI model development. You will focus on ETL pipeline design, data modeling, automation with Python and SQL, and ensuring data quality at scale.
This is a remote contractor opportunity with a baseline commitment of 20+ hours per week; full‑time contract opportunities are available for candidates who want to scale up.
What you'll do
You will own core data engineering tasks that enable AI training workflows and evaluation pipelines.
- Design, build, and maintain scalable ETL pipelines for ingestion, transformation, and integration.
- Develop and optimize logical and physical data models in modern data warehousing environments.
- Write advanced SQL and Python scripts to automate data workflows and processes.
- Monitor, troubleshoot, and enhance data infrastructure for performance and scalability.
- Ensure data accuracy, consistency, and security across platforms.
- Document data flows and system designs.
- Collaborate with cross‑functional teams to understand requirements and deliver reliable data solutions.
Requirements
You must be able to demonstrate hands‑on experience with the core skills listed below. Because the role is remote, it also requires strong self‑management and communication.
- Expert‑level proficiency in MySQL for complex querying, schema design, and performance tuning.
- Strong programming skills in Python for data manipulation and automation.
- Hands‑on experience building and maintaining production ETL pipelines.
- Demonstrated expertise with data warehousing concepts and best practices.
- Excellent written and verbal communication skills with clear documentation habits.
- Proven ability to manage and prioritize tasks in a dynamic, remote work setting.
- Detail‑oriented with a commitment to delivering high‑quality data solutions.
Helpful background (nice to have)
The following experiences are not required but will help you be successful and move faster in the role.
- Experience with cloud data warehouse platforms such as Amazon Redshift, Snowflake, or Google BigQuery.
- Background in data modeling or data architecture.
- Familiarity with agile development methodologies and collaborative engineering workflows.
Compensation, hours, and employment type
This is a contractor opportunity that supports part‑time or full‑time contracting arrangements depending on workload and availability.
- Pay: $30–$130/hr.
- Hours: typically a minimum of 20 hours per week; full‑time contracts may be available.
- Work location: Remote (worldwide).
- Employment types: Contractor, Part‑time (full‑time contract opportunities possible).
- Data work types: computer programming and code datasets, plus evaluation and rating tasks using third‑party or other labeling tools.
How to apply
If you meet the requirements and want to work on data engineering that directly impacts AI systems, create or sign into your OpenTrain account, complete your profile, and submit your application for this project.
In your application, highlight relevant MySQL and Python projects, ETL pipelines you built or maintained, and any cloud data warehouse experience. Be prepared to share examples of documentation or system diagrams that demonstrate your approach to data quality and scalability.