OpenTrain AI
Maintained implementation available | PyTorch | Pretrained models available

Kosmos-2: Grounding Multimodal Large Language Models to the World

June 1, 2023 | arXiv: 2306.14824
1 repo | 22,088 stars | about a few days to reproduce
arXiv PDF

Abstract

Results & Benchmarks

Task                         Model    Dataset    Value
Natural language processing  FewVLM   Flickr30k  31.0
Natural language processing  MetaLM   Flickr30k  43.4

Hardware Requirements

  • Expect multi-day setup and compute time for a meaningful reproduction; no specific hardware figures are listed.

Best Implementation

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

22.1k stars | 2.7k forks | Updated Jan 2026 | MIT license
  • Selected microsoft/unilm as the strongest maintained implementation for new work.
  • Repository activity is within the last 24 months.

Reproduction Path

  1. Start with microsoft/unilm and validate setup instructions in README.

  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.

  3. Log exact dependency versions and runtime environment for reproducibility.
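
The logging step above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not something shipped with microsoft/unilm; the function name `snapshot_environment` and its `packages` parameter are hypothetical names chosen for this example.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages=None):
    """Collect interpreter, OS, and package versions as a plain dict.

    `packages` is a hypothetical parameter for this sketch: pass a list of
    distribution names to record only those, or None to record everything
    installed in the current environment.
    """
    installed = {}
    if packages is None:
        # Walk every installed distribution visible to this interpreter.
        for dist in metadata.distributions():
            name = dist.metadata["Name"]
            if name:
                installed[name] = dist.version
    else:
        for name in packages:
            try:
                installed[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                # Record missing packages explicitly instead of failing.
                installed[name] = None
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": dict(sorted(installed.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be redirected to a file
    # and stored next to the run's results.
    print(json.dumps(snapshot_environment(), indent=2))
```

Saving this output alongside each run gives a record to compare against when a later attempt produces different numbers. It does not capture GPU driver or CUDA versions, which would need to be recorded separately.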

  • Time to first repro: a few days
  • No CI workflows detected
  • Dependency manifest is missing

Additional Implementations

No additional verified repositories beyond the primary recommendation.

Hugging Face Artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts.