Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars
- 140
- Last push
- Oct 2, 2025 (159d ago)
Risk flags
- No CI pipeline detected
- No Docker setup
Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, Nan Tang
Core AI workload signals detected from paper context and implementation/artifact evidence.
Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barriers to accessing relational databases. The emergence of Large Language Models has introduced a novel paradigm in NL2SQL tasks, enhancing capabilities dramatically. However, this raises a critical question: Are we fully prepared to deploy NL2SQL models in production? To address the posed questions, we present a multi-angle NL2SQL evaluation framework, NL2SQL360, to facilitate the design and test of new NL2SQL methods for researchers. Through NL2SQL360, we conduct a detailed comparison of leading NL2SQL methods across a range of application scenarios, such as different data domains and SQL characteristics, offering valuable insights for selecting the most appropriate NL2SQL methods for specific needs. Moreover, we explore the NL2SQL design space, leveraging NL2SQL360 to automate the identification of an optimal NL2SQL solution tailored to user-specific needs. Specifically, NL2SQL360 identifies an effective NL2SQL method, SuperSQL, distinguished under the Spider dataset using the execution accuracy metric. Remarkably, SuperSQL achieves competitive performance with execution accuracy of 87% and 62.66% on the Spider and BIRD test sets, respectively.
Researcher verdict
This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on hkustdial/nl2sql360. Use it as an implementation baseline, then validate benchmark parity before adapting it.
Why this page is still worth reading
Benchmark trust
Concrete benchmark findings are present and can be audited against the extracted evidence.
Use this page as
Start here when you need the most practical implementation path quickly.
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Natural language to SQL (NL2SQL) | Spider test set | Execution accuracy | 87% | llm-grounded | paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0] |
| Natural language to SQL (NL2SQL) | BIRD test set | Execution accuracy | 62.66% | llm-grounded | paper.abstract, evidencePack.paperSections[id=paper_abstract] |
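When auditing the figures above, it helps to be clear about what execution accuracy measures: a predicted query counts as correct when running it returns the same result as running the gold query. The sketch below is a simplified illustration of that idea using Python's built-in `sqlite3` module. It is not the official Spider or BIRD evaluator and not NL2SQL360's code; the `run_query` and `execution_accuracy` helpers, the toy `singer` table, and the choice to compare results as order-insensitive multisets are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the query's rows as a multiset, or None if the SQL fails to run."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result multisets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        # A failing gold query or a failing prediction never counts as correct.
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, used only to exercise the helpers.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 41), (3, 'Cy', 25);
    """
)

pairs = [
    # Different SQL text, same rows returned: counted as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Runs, but returns a different count: counted as wrong.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(*) FROM singer WHERE age < 40"),
    # Invalid SQL: counted as wrong.
    ("SELEC name FROM singer",
     "SELECT name FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2%}")
```

Because results are compared rather than query strings, two differently written queries can both be correct. Real evaluators add details this sketch omits, such as handling of `ORDER BY`, value normalization, and timeouts, so expect reported numbers to depend on the exact evaluation script.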
hkustdial/nl2sql360 is the strongest maintained implementation based on ranking signals. License is declared (MIT). Dependency/environment manifests are present.
LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0], researcherSummary.coreClaim, guidance.riskFlags[0], repos[0].fullName, repos[1].fullName, repos[2].fullName, paper.title, summary.hasReliableImplementation
Evidence graph: 3 refs, 3 links.
Utility signals: depth 90/100, grounding 85/100, status high.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
AI-generated summary grounded in paper metadata and artifact signals.
The paper introduces NL2SQL360, a multi-angle NL2SQL evaluation framework designed to support the design and testing of new NL2SQL methods for researchers. This page includes benchmark evidence for natural language to SQL (NL2SQL) on the Spider test set. Reproduction guidance focuses on implementation viability and concrete risk controls.
Use hkustdial/nl2sql360 first because deterministic ranking and extracted evidence align on implementation viability. Start with the repo setup path, then validate benchmark reproduction before adaptation.
🔥[VLDB'24] Official repository for the paper “The Dawn of Natural Language to SQL: Are We Fully Ready?”
Preserved for provenance. Not recommended as the default path for new builds.
Follow the direct implementation path
Start with hkustdial/nl2sql360 and validate setup instructions in README.
Reproduce the baseline result with the provided defaults before modifying hyperparameters.
Log exact dependency versions and runtime environment for reproducibility.
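The last step above, logging dependency versions and the runtime environment, can be sketched with the Python standard library alone. This is a minimal illustration, not something taken from the NL2SQL360 repository: the `capture_environment` helper and the layout of the snapshot dictionary are assumptions, and the repository may ship its own environment manifests that you should prefer.

```python
import json
import platform
from importlib import metadata


def capture_environment():
    """Collect the interpreter version, OS string, and installed package versions."""
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip distributions with broken metadata
    )
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }


snapshot = capture_environment()
snapshot_json = json.dumps(snapshot, indent=2)

# Print a short summary; store snapshot_json next to your run outputs.
print(f"Python {snapshot['python']} on {snapshot['platform']}")
print(f"{len(snapshot['packages'])} installed packages recorded")
```

Saving `snapshot_json` alongside each reproduction run makes it possible to tell later whether a gap from the reported numbers comes from a changed dependency rather than from the method itself.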
Please visit https://github.com/HKUSTDial/NL2SQL360 to get the official code!
No additional community repositories detected yet.
No trustworthy direct or curated Hugging Face artifacts related to this paper were found yet.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches derived from the paper title and method context to locate candidate models, datasets, and demos.
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Tasks
Natural language processing
Methods
Transformer
Domains
Natural Language Processing
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.