Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars
- 30,252
- Last push
- Jul 17, 2024 (664d ago)
Risk flags
- No push in 12+ months
- No CI pipeline detected
- No tagged releases
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
Core AI workload signals detected from paper context and implementation/artifact evidence.
Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model. Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT-001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning. Our code and data are available at https://github.com/yizhongw/self-instruct.
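The loop described in the abstract (generate candidates, filter invalid or near-duplicate ones, grow the pool) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `generate` is a hypothetical callable standing in for a language-model call, `is_valid` is an invented heuristic, the 0.7 threshold is an assumption, and `difflib.SequenceMatcher` is swapped in as a simple similarity measure. Only instruction generation is shown; producing input and output samples and finetuning are omitted.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_valid(instruction: str) -> bool:
    """Hypothetical validity heuristic: reject very short or very long text."""
    n_words = len(instruction.split())
    return 3 <= n_words <= 150


def filter_new(candidates, pool, threshold=0.7):
    """Keep candidates that are valid and not too similar to known tasks."""
    kept = []
    for cand in candidates:
        if not is_valid(cand):
            continue
        # Compare against the existing pool and against this batch's keepers.
        if any(similarity(cand, seen) >= threshold for seen in pool + kept):
            continue
        kept.append(cand)
    return kept


def bootstrap(seed_tasks, generate, rounds=3, prompt_size=2, rng=None):
    """Grow a task pool by prompting `generate` with examples from the pool."""
    rng = rng or random.Random(0)
    pool = list(seed_tasks)
    for _ in range(rounds):
        examples = rng.sample(pool, min(prompt_size, len(pool)))
        candidates = generate(examples)  # assumed to return a list of strings
        pool.extend(filter_new(candidates, pool))
    return pool
```

In use, `generate` would wrap whatever model API is available and return a list of candidate instruction strings; the resulting pool would then feed a separate finetuning step.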
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Instruction tuning | T5-LM | ROUGE-L | 25.7 | paper-derived | No explicit refs |
| Instruction tuning | GPT3 | ROUGE-L | 6.8 | paper-derived | No explicit refs |
| Instruction tuning | T0 | ROUGE-L | 33.1 | paper-derived | No explicit refs |
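For readers auditing the ROUGE-L figures above, the metric is commonly defined as an F-measure over the longest common subsequence (LCS) of prediction and reference tokens. The sketch below assumes lowercase whitespace tokenization and a balanced F1; the paper's evaluation script may tokenize, stem, or weight differently, so treat it as illustrative only. Multiplying the result by 100 gives the scale used in the table.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 in [0, 1] under simple whitespace tokenization (assumed)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A corpus-level score would typically be the mean of `rouge_l_f1` over all prediction and reference pairs.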
tatsu-lab/stanford_alpaca is the strongest maintained implementation based on ranking signals. License is declared (Apache-2.0). Dependency/environment manifests are present.
Evidence graph: 3 refs, 3 links.
Utility signals: depth 90/100, grounding 85/100, status high.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Community adoption signal (16919 stars)
Code and documentation to train Stanford's Alpaca models, and generate the data.
Preserved for provenance. Not recommended as the default path for new builds.
Dependencies pinned, manual setup needed
Quick start
git clone https://github.com/tatsu-lab/stanford_alpaca.git
cd stanford_alpaca
pip install -r requirements.txt
No additional verified repositories beyond the primary recommendation.
No trustworthy direct or curated related Hugging Face artifacts were found yet. Use targeted Hugging Face searches derived from the paper title and method context to locate candidate models, datasets, and demos.
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Tasks
Instruction tuning
Methods
Transformer
Domains
Natural Language Processing, Large Language Models
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.