SODIUM: From Open Web Data to Queryable Databases
Chuxuan Hu, Philip Li, Maxwell Yang, Daniel Kang
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
During research, domain experts often ask analytical questions whose answers require integrating data from a wide range of web sources. Thus, they must spend substantial effort searching, extracting, and organizing raw data before analysis can begin. We formalize this process as the SODIUM task, where we conceptualize open domains such as the web as latent databases that must be systematically instantiated to support ...
downstream querying. Solving SODIUM requires (1) conducting in-depth and specialized exploration of the open web, which is further strengthened by (2) exploiting structural correlations for systematic information extraction and (3) integrating collected information into coherent, queryable database instances. To quantify the challenges in automating SODIUM, we construct SODIUM-Bench, a benchmark of 105 tasks derived from published academic papers across 6 domains, where systems are tasked with exploring the open web to collect and aggregate data from diverse sources into structured tables. Existing systems struggle with SODIUM tasks: we evaluate 6 advanced AI agents on SODIUM-Bench, with the strongest baseline achieving only 46.5% accuracy. To bridge this gap, we develop SODIUM-Agent, a multi-agent system composed of a web explorer and a cache manager. Powered by our proposed ATP-BFS algorithm and optimized through principled management of cached sources and navigation paths, SODIUM-Agent conducts deep and comprehensive web exploration and performs structurally coherent information extraction. SODIUM-Agent achieves 91.1% accuracy on SODIUM-Bench, outperforming the strongest baseline by approximately 2 times and the weakest by up to 73 times.
Results & Benchmarks
No concrete benchmark grounding is available yet. Treat the page as context or an implementation starting point only.
During research, domain experts often ask analytical questions whose answers require integrating data from a wide range of web sources.
Implementation Evidence Summary
Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.
Reproduction Risks
- Estimate is based on paper-only reproduction flow
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence disclosure
Evidence graph: 2 refs, 1 links.
Utility signals: depth 60/100, grounding 58/100, status medium.
Implementation Status
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
- No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
- Track assumptions and missing details in an experiment log before coding.
Reproduction readiness
Hardware requirements
- Expect multi-day setup/compute for meaningful reproduction based on current guidance.
No verified implementation available
- · No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.
No benchmark numbers could be verified. You will not be able to validate reproduction correctness against published numbers.
Hugging Face artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Models
Datasets
Spaces
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
Research context
Tasks
Agentic tool use
Methods
Agentic systems
Domains
AI Agents
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.