OpenTrain AI
Maintained implementation available (PyTorch)

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh +7 more

February 26, 2021 | arXiv: 2103.00020
1 repo | 33,116 stars | ~a few hours to reproduce

Abstract

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple...

Results & Benchmarks

Task                  Dataset    Metric    Value
Image classification  CIFAR-10   Accuracy  101
Image classification  CIFAR-100  Accuracy  102

Best Implementation

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

33.1k stars | 4.0k forks | updated Mar 2026 | MIT
Signals: License, CI, Deps, Docker
  • Selected openai/CLIP as the strongest maintained implementation for new work.
  • Includes CI workflow signals.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.
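
The repository description, "predict the most relevant text snippet given an image", can be illustrated with a minimal sketch of the scoring step: normalize the image and text embeddings, take cosine similarities, scale them, and apply a softmax. This sketch does not load the actual model. The toy three-dimensional vectors, the function names (`l2_normalize`, `softmax`, `rank_snippets`), and the `logit_scale` of 100 are all assumptions made for illustration; real embeddings would come from the image and text encoders in openai/CLIP.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_snippets(image_embedding, text_embeddings, logit_scale=100.0):
    """Return one probability per text snippet for a single image.

    logit_scale is an assumed temperature for this sketch, not a value
    taken from this page.
    """
    img = l2_normalize(image_embedding)
    logits = []
    for text in text_embeddings:
        txt = l2_normalize(text)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Hypothetical toy embeddings standing in for encoder outputs.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}

probs = rank_snippets(image_embedding, list(text_embeddings.values()))
best = max(zip(text_embeddings, probs), key=lambda pair: pair[1])[0]
print(best)
```

Because the candidate snippets are ordinary text, changing the label set only means changing the strings that get embedded, which is what makes this formulation usable for zero-shot classification.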

Reproduction Path

  1. Start with openai/CLIP and validate setup instructions in README.
  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
  3. Log exact dependency versions and runtime environment for reproducibility.
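
The last step, logging dependency versions and the runtime environment, could be sketched as follows using only the Python standard library. The helper name `snapshot_environment` and the package list passed to it are hypothetical; the list should be replaced with whatever the repository's own dependency manifest actually declares.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, OS, and installed package versions in one dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Assumed package list for illustration only; adjust to the real manifest.
snapshot = snapshot_environment(["torch", "torchvision", "ftfy", "regex", "tqdm"])
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving this JSON next to each run's results makes it easier to tell a genuine regression from a difference in environment.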

Time to first repro: a few hours

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Research Context