Fine-Tune Qwen3-Coder on JUCE/C++ Audio DSP — Dataset Curation & QLoRA Training
OpenTrain AI · Remote · Worldwide · Posted Jun 9, 2026
Labeling Overview:
We need an ML engineer to build a supervised fine-tuning (SFT) dataset for a JUCE/C++ audio DSP coding model. The work involves: (1) extracting DSP-relevant C++ functions from 40+ open-source GitHub repos (Surge, ChowDSP, Airwindows, Vital, JUCE framework, etc.), (2) generating high-quality instruction-response pairs using LLM-assisted pipelines (Bespoke Curator or Distilabel with Claude/GPT-4), (3) converting blog posts, tutorials, forum Q&A, and free textbook content into clean ChatML-formatted training examples, (4) quality filtering with a second LLM pass and deduplication, and (5) running QLoRA fine-tuning on Qwen3-Coder using Unsloth. Target: 3,000–5,000 examples covering processBlock, AudioBuffer, juce_dsp filters, oscillators, delay lines, reverb, virtual analog modeling, plugin architecture, and real-time DSP best practices. A comprehensive resource document with all repo URLs, blog links, textbook references, and a clone script will be provided to the hired candidate.