Results & Benchmarks
| Task | Dataset | Metric | Value |
|---|---|---|---|
| Computer vision | Baseline | App Unseen. | 67.6 |
| Computer vision | ScreenAI | App Unseen. | 87.8 |
Best Implementation
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
- Selected kyegomez/ScreenAI as the strongest maintained implementation for new work.
- Includes CI workflow signals.
- Includes dependency/environment manifest signals.
- Repository activity is within the last 24 months.
Reproduction Path
1. Start with kyegomez/ScreenAI and validate the setup instructions in its README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and the runtime environment for reproducibility.
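The logging step above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not part of kyegomez/ScreenAI; the function names and the output filename `environment_snapshot.json` are assumptions chosen for this example.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment() -> dict:
    """Collect interpreter, OS, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        # Sort case-insensitively so snapshots diff cleanly between runs.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def write_snapshot(path: str = "environment_snapshot.json") -> dict:
    """Write the snapshot to a JSON file (hypothetical filename) and return it."""
    snap = snapshot_environment()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snap, fh, indent=2)
    return snap


if __name__ == "__main__":
    info = write_snapshot()
    print(f"Recorded {len(info['packages'])} packages for Python {info['python']}")
```

Saving one snapshot per run next to the results makes it possible to compare environments later if a reproduced number drifts. GPU driver and accelerator library versions are not captured here and would need to be recorded separately.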
Additional Implementations
Official
- google-research-datasets/screen_annotation (Confidence: low)
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper `ScreenAI: A Vision-Language Model for UI and Infographics Understanding`.
Stars: 88 | Forks: 10 | Last push: Mar 2024
Community
- niuzaisheng/ScreenAgent (Confidence: low)
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
Stars: 585 | Forks: 64 | Last push: Nov 2024 | License: NOASSERTION
Hugging Face Artifacts
No trustworthy Hugging Face artifacts, either direct or curated, have been found yet.
Continue with targeted Hugging Face searches: