SFT Dataset Engineer — JUCE C++ Audio DSP (Qwen3-Coder)
Build a 3,000–5,000 example supervised fine-tuning dataset and run QLoRA on Qwen3‑Coder focused on JUCE/C++ audio DSP. Expert C++/JUCE and DSP background required; 20+ hrs/week, fixed-price contract of $1,000.
Coding & Software
$1000 fixed price
Compensation
Worldwide
Eligibility
Expert
Experience
Feb 27, 2026
Posted
Open worldwide
About OpenTrain
OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect technical contributors with cutting-edge AI training projects — a fast-growing way to work in tech that is remote, flexible, and accessible to people who want to shape how AI systems behave.
- Free to join and designed for people who want to start and grow careers teaching AI.
- Projects span annotation, RLHF, transcription, code labeling, and more — this role focuses on code and audio DSP.
Why AI training and this project matter
AI models learn from high-quality examples prepared by people. This project directly improves a generative coding model's ability to write and reason about real-time audio DSP in JUCE/C++ by creating curated instruction-response pairs and fine-tuning Qwen3‑Coder.
Work is remote, impact-driven, and sits at the intersection of software engineering, DSP, and machine learning — an opportunity to influence how future audio tools are built.
- Contribute to real model improvements via supervised fine-tuning and QLoRA.
- Work on domain-specific examples (processBlock, AudioBuffer, filters, oscillators, delay, reverb, plugin architecture).
The role
We are hiring an ML engineer / dataset curator with deep experience in C++ audio plugin development and DSP to build a supervised fine-tuning (SFT) dataset and run QLoRA fine-tuning on Qwen3‑Coder. This is a contractor, part-time engagement for 20+ hours/week on a fixed‑price contract of $1,000.
You will extract DSP‑relevant code from provided open-source repositories and convert tutorials, blog posts, forum Q&A, and textbook content into clean ChatML training examples, with LLM-assisted generation and secondary quality filtering.
- Employment type: Contractor, part-time. Worldwide applicants welcome.
- Time commitment: 20+ hours/week during the engagement.
- Compensation: Fixed price $1,000 for the contract (see deliverables).
What you'll do
Execute a reproducible pipeline from source repositories and documentation to SFT-ready examples, and perform QLoRA fine-tuning on Qwen3‑Coder using the Unsloth tooling supplied by the client.
- Extract DSP-relevant C++ functions and examples from 40+ open-source GitHub repos (e.g., Surge, ChowDSP, Airwindows, Vital, JUCE framework).
- Generate high-quality instruction-response pairs using LLM-assisted pipelines (Bespoke Curator or Distilabel with Claude/GPT‑4).
- Convert blog posts, tutorials, forum Q&A, and free textbook content into clean ChatML-formatted training examples.
- Perform quality filtering with a second LLM pass, deduplicate examples, and finalize a 3,000–5,000 example dataset covering processBlock, AudioBuffer, juce_dsp filters, oscillators, delay lines, reverb, virtual analog modeling, plugin architecture, and real-time DSP best practices.
- Run QLoRA fine-tuning on Qwen3‑Coder using Unsloth and produce reproducible training logs and checkpoints.
- Deliver a comprehensive resources document listing all repo URLs, blog links, textbook references, and any clone scripts used; the client will provide an initial resource document and clone script to the hired candidate.
Requirements
Do not apply unless you meet the core technical requirements: strong C++ and JUCE experience, solid DSP knowledge, and hands-on ML engineering experience with model fine-tuning workflows and LLM-assisted data curation.
- Expert-level experience in C++ audio plugin development using the JUCE framework.
- Strong digital signal processing background: filters, oscillators, delay/reverb algorithms, virtual analog techniques, and real-time considerations.
- Hands-on experience creating datasets for supervised fine-tuning and running QLoRA-style low-rank adaptation fine-tuning pipelines.
- Familiarity with LLM-assisted data curation workflows and tools (examples: Bespoke Curator, Distilabel, Claude, GPT‑4).
- Ability to produce ChatML-formatted training examples and perform systematic deduplication and QA.
- Comfortable with git, cloning multiple repos, and producing reproducible scripts and documentation.
Preferred and helpful
These are not strict requirements but will make your application stand out and help accelerate the project ramp-up.
- Previous contributions to relevant open-source audio projects (Surge, Airwindows, Vital, ChowDSP, etc.).
- Existing personal or public JUCE audio projects we can review — the client explicitly says examples would be super helpful.
- Experience using Unsloth or similar tooling for model fine-tuning and experimentation.
Deliverables, timeline, and how to apply
Deliverables: a curated 3,000–5,000 example ChatML-formatted SFT dataset, deduplication and QA reports, training logs and QLoRA checkpoints for Qwen3‑Coder, and a comprehensive resources document and clone script. The client will provide an initial resource list and clone script to start.
To apply, submit a brief cover note describing your JUCE/C++ and DSP experience, links to any example audio projects or repos, and a short outline of how you'd approach building the dataset and running QLoRA. Include availability for 20+ hours/week and confirm acceptance of the fixed-price $1,000 contract.
- Application materials: cover note, links to example projects, brief plan for dataset creation and fine-tuning, and availability.
- Contract terms: fixed-price $1,000; part-time contractor role; global applicants accepted.