Maintained implementation available

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

February 1, 2024 · arXiv: 2402.04615

4 repos · 381 stars · ~a few hours to reproduce

Abstract

Results & Benchmarks

Task	Model	Metric	Value
Computer vision	Baseline	App Unseen.	67.6
Computer vision	ScreenAI	App Unseen.	87.8
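Reading both rows as the same metric on the same scale (an assumption, since the page does not say what the App Unseen metric measures), the gap between ScreenAI and the baseline works out as:

```latex
\Delta_{\text{abs}} = 87.8 - 67.6 = 20.2 \text{ points}, \qquad
\Delta_{\text{rel}} = \frac{20.2}{67.6} \approx 0.299 \; (\approx 29.9\%)
```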

Best Implementation

kyegomez/ScreenAI

Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"

Stars: 381 · Forks: 37 · Last push: Mar 2026 · License: MIT

License ✓

CI ✓

Deps ✓

Docker –

kyegomez/ScreenAI was selected as the strongest maintained implementation for new work.
It includes a CI workflow.
It includes a dependency/environment manifest.
The repository has been active within the last 24 months.

Reproduction Path

1. Start with kyegomez/ScreenAI and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and the runtime environment for reproducibility.
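Step 3 can be made concrete with a small script. The sketch below is an assumption and not part of kyegomez/ScreenAI: it uses only the Python standard library (3.8+) to record the interpreter, OS, and the versions of an illustrative package list, then prints the result as JSON so it can be saved next to each run (for example `python capture_env.py > environment.json`, where `capture_env.py` is a hypothetical filename). The package names in the example are placeholders; substitute the ones in the repository's manifest.

```python
import importlib.metadata
import json
import platform


def capture_environment(packages):
    """Return interpreter, OS, and package versions as a JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Record missing packages as None instead of silently dropping them.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # ASSUMPTION: illustrative package names; replace them with the
    # dependencies listed in the repository's own manifest.
    env = capture_environment(["torch", "numpy"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Recording `None` for a missing package makes an incomplete environment visible in the log instead of hiding it.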

Time to first repro: a few hours.

No repository-level red flags were detected, but paper-specific preprocessing and hyperparameter details may still be under-specified.

Additional Implementations

Official

google-research-datasets/screen_annotation (Confidence: low)
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper `ScreenAI: A Vision-Language Model for UI and Infographics Understanding`.
Stars: 88 · Forks: 10 · Last push: Mar 2024

Community

niuzaisheng/ScreenAgent (Confidence: low)
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
Stars: 585 · Forks: 64 · Last push: Nov 2024 · License: NOASSERTION

Hugging Face Artifacts

No trustworthy Hugging Face artifacts, either directly linked to the paper or curated as related, have been found yet.

Continue with targeted Hugging Face searches:

models

arxiv:2402.04615 ScreenAI Vision-Language

datasets

arxiv:2402.04615 ScreenAI dataset

spaces

arxiv:2402.04615 ScreenAI demo
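The three searches above can be turned into Hub search links with a short standard-library sketch. It assumes the Hub's web pages for models, datasets, and spaces accept a `search` query parameter; the helper name `search_url` is hypothetical. Because the Hub search may mainly match repository names, long multi-word queries like these can return few or no results, so treat the links as starting points.

```python
from urllib.parse import urlencode

HUB = "https://huggingface.co"

# Queries copied from the page, keyed by Hub section.
QUERIES = {
    "models": "arxiv:2402.04615 ScreenAI Vision-Language",
    "datasets": "arxiv:2402.04615 ScreenAI dataset",
    "spaces": "arxiv:2402.04615 ScreenAI demo",
}


def search_url(section, query):
    """Build a Hub web-search URL for one section (hypothetical helper)."""
    if section not in ("models", "datasets", "spaces"):
        raise ValueError(f"unknown Hub section: {section}")
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{HUB}/{section}?{urlencode({'search': query})}"


if __name__ == "__main__":
    for section, query in QUERIES.items():
        print(section, search_url(section, query))
```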

Research Context