Position: the Stochastic Parrot in the Coal Mine. Model Collapse is a Threat to Low-Resource Communities
Devon Jarvis, Richard Klein, Benjamin Rosman, Steven James, Stefano Sarao Mannelli
Core AI workload signals detected from paper context and implementation/artifact evidence.
Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factor ...
s contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.
Results & Benchmarks
No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.
Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates.
Implementation Evidence Summary
Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.
Reproduction Risks
- Estimate is based on paper-only reproduction flow
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence disclosure
Evidence graph: 2 refs, 1 links.
Utility signals: depth 60/100, grounding 58/100, status medium.
Implementation Status
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
- No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
- Track assumptions and missing details in an experiment log before coding.
Reproduction readiness
Hardware requirements
- Expect multi-day setup/compute for meaningful reproduction based on current guidance.
No verified implementation available
- · No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.
No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.
Hugging Face artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Models
Datasets
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
Research context
Tasks
None detected
Methods
Transformer
Domains
Natural Language Processing
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.