
Researcher verdict

Recommended implementation path available

implementation baseline
Benchmark trust: grounded evidence

This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on hkustdial/nl2sql360. Use it as an implementation baseline, then validate benchmark parity before adapting it.

Why this page is still worth reading

  • Benchmark findings give you an audit trail for validation before picking an implementation path.
  • A concrete repository path exists via hkustdial/nl2sql360, so this page can act as a practical starting point.
  • Reproduction risks are surfaced explicitly, which helps you decide whether the paper is worth immediate prototyping.

Benchmark trust

Concrete benchmark findings are present and can be audited against the extracted evidence.

Use this page as

Start here when you need the most practical implementation path quickly.

Results & Benchmarks

Freshness tier: cold
Direct + Inferred Evidence
Natural language to SQL (NL2SQL)
Spider test set
Execution accuracy
87%
Source: llm grounded
Natural language to SQL (NL2SQL)
BIRD test set
Execution accuracy
62.66%
Source: llm grounded

Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task | Dataset | Metric | Value | Source | Evidence refs
Natural language to SQL (NL2SQL) | Spider test set | Execution accuracy | 87% | llm-grounded | paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0]
Natural language to SQL (NL2SQL) | BIRD test set | Execution accuracy | 62.66% | llm-grounded | paper.abstract, evidencePack.paperSections[id=paper_abstract]

Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barriers to accessing relational databases.

Use This Implementation Because…

Confidence: high

hkustdial/nl2sql360 is the strongest maintained implementation based on ranking signals. License is declared (MIT). Dependency/environment manifests are present.


Reproduction Risks

  • No CI workflows detected

Evidence disclosure

LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0], researcherSummary.coreClaim, guidance.riskFlags[0], repos[0].fullName, repos[1].fullName, repos[2].fullName, paper.title, summary.hasReliableImplementation

Evidence graph: 3 refs, 3 links.

Utility signals: depth 90/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

hkustdial/nl2sql360
best maintained
Maintenance: Recently updated
Confidence: High
Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars
140
Last push
Oct 2, 2025 (159d ago)
Releases · Dependencies

Risk flags

  • No CI pipeline detected
  • No Docker setup

hkustdial/nl2sql_survey
historical official
Maintenance: Active
Confidence: High
Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars
1,345
Last push
Mar 3, 2026 (7d ago)

Risk flags

  • No CI pipeline detected
  • No tagged releases
  • No Docker setup

BugMaker-Boyan/NL2SQL360
Maintenance: Stale
Confidence: Medium
Reproducibility: Moderate

Official implementation from Papers with Code

Stars
10
Last push
Sep 1, 2024 (555d ago)
Dependencies

Risk flags

  • No push in 12+ months
  • No CI pipeline detected
  • No tagged releases

Paper summary


AI-generated summary grounded in paper metadata and artifact signals.

The paper introduces NL2SQL360, a multi-angle NL2SQL evaluation framework designed to help researchers design and test new NL2SQL methods. This page includes benchmark evidence for natural language to SQL (NL2SQL) on the Spider and BIRD test sets. Reproduction guidance focuses on implementation viability and concrete risk controls.

Key contributions

  • The paper introduces NL2SQL360, a multi-angle NL2SQL evaluation framework designed to support the design and testing of new NL2SQL methods for researchers.
  • Using NL2SQL360, the authors systematically compare leading NL2SQL methods across diverse application scenarios, including different data domains and SQL characteristics.
  • The NL2SQL360 framework is used to explore the NL2SQL design space and to automatically identify an NL2SQL solution that is tailored to user-specific requirements.
  • Using NL2SQL360, the authors identify SuperSQL as an effective NL2SQL method on the Spider dataset when evaluated by execution accuracy.
  • SuperSQL achieves competitive execution accuracy scores of 87% on the Spider test set and 62.66% on the BIRD test set.
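Execution accuracy (EX), the metric behind both figures above, counts a prediction as correct when running the predicted SQL returns the same result as running the gold SQL on the same database. The sketch below illustrates the idea with Python's built-in sqlite3 module. It is a simplified stand-in, not the NL2SQL360 evaluator or the official Spider/BIRD scripts: it assumes results can be compared as unordered multisets of rows, whereas official evaluators may apply extra rules (such as order sensitivity for ORDER BY queries or value normalization). The function names `results_match` and `execution_accuracy` and the toy `singer` table are hypothetical.

```python
import sqlite3
from collections import Counter


def results_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Return True if the predicted query runs and yields the same rows as the gold query.

    Simplification (assumption): rows are compared as an unordered multiset.
    """
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter compares row multisets, so duplicates matter but order does not.
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list) -> float:
    """Fraction of (predicted_sql, gold_sql) pairs whose execution results match."""
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, pred, gold) for pred, gold in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 25), (3, 'Cy', 40);
        """
    )
    pairs = [
        ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
        ("SELECT count(*) FROM singer", "SELECT count(*) FROM singer WHERE age < 30"),
        ("SELECT nme FROM singer", "SELECT name FROM singer"),  # invalid column
    ]
    print(execution_accuracy(conn, pairs))  # prints 0.3333333333333333
```

The first pair matches even though the SQL text differs, which is the point of an execution-based metric; the third pair fails to execute and is scored as wrong.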

Implementation guidance

Start with hkustdial/nl2sql360: the deterministic repository ranking and the extracted evidence both support it as a viable implementation. Follow the repository's setup path, then validate benchmark reproduction before adapting it.
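One way to make "validate benchmark reproduction" concrete is to compare your reproduced execution-accuracy numbers against the figures reported on this page (87% on the Spider test set, 62.66% on the BIRD test set) within an explicit tolerance. This is a minimal sketch: the split keys, the 1-point default tolerance, and the `parity_report` helper are hypothetical choices, not part of hkustdial/nl2sql360.

```python
# Reported SuperSQL execution accuracy (percent), as quoted on this page.
# The dictionary keys are hypothetical labels, not identifiers from the repository.
REPORTED_EX = {
    "spider_test": 87.0,
    "bird_test": 62.66,
}


def parity_report(reproduced: dict, tolerance: float = 1.0) -> dict:
    """Compare reproduced EX (percent) with reported EX.

    `tolerance` is in percentage points; 1.0 is an arbitrary default (assumption).
    Returns a status string per split: "ok", "missing", or the signed gap.
    """
    report = {}
    for split, reported in REPORTED_EX.items():
        if split not in reproduced:
            report[split] = "missing"
            continue
        gap = reproduced[split] - reported
        report[split] = "ok" if abs(gap) <= tolerance else f"off by {gap:+.2f} pts"
    return report


if __name__ == "__main__":
    # Illustrative inputs only; these are not reproduced results.
    print(parity_report({"spider_test": 86.4, "bird_test": 60.1}))
```

With the illustrative inputs, Spider falls within tolerance and is marked `ok`, while BIRD is flagged as `off by -2.56 pts`. Keeping the tolerance explicit makes the parity decision auditable instead of a judgment call made after the fact.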

Reproducibility notes

  • Lack of continuous integration workflows can allow unnoticed breaking changes to enter the main branch, leading to non-reproducible or failing runs over time.
  • Without automated testing across versions, environment or dependency drift may cause discrepancies between the reported NL2SQL360 results and newly reproduced ones.
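To catch the dependency drift described above, you can diff a package-version snapshot recorded at reproduction time against the current environment. A minimal sketch, assuming each snapshot is a plain `{package_name: version}` dict (for example, parsed from a `pip freeze` file or produced by a logging script); `diff_snapshots` is a hypothetical helper, and the package versions in the example are illustrative, not taken from the repository's manifests.

```python
def diff_snapshots(recorded: dict, current: dict) -> dict:
    """Return {package: (recorded_version, current_version)} for every difference.

    A None on either side means the package was added or removed.
    """
    changes = {}
    for name in sorted(set(recorded) | set(current)):
        old, new = recorded.get(name), current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


if __name__ == "__main__":
    # Illustrative snapshots only.
    recorded = {"numpy": "1.26.4", "pandas": "2.1.0"}
    current = {"numpy": "2.0.0", "pandas": "2.1.0", "tqdm": "4.66.1"}
    print(diff_snapshots(recorded, current))
    # prints {'numpy': ('1.26.4', '2.0.0'), 'tqdm': (None, '4.66.1')}
```

An empty result means the environment matches the recorded one; any entry is a candidate explanation when reproduced numbers diverge from the reported ones.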

Best implementation now

hkustdial/nl2sql360
Confidence: High
Reproducibility: Moderate

🔥[VLDB'24] Official repository for the paper “The Dawn of Natural Language to SQL: Are We Fully Ready?”

Stars: 140
Forks: 16
Last push: Oct 2, 2025
License: MIT
Official implementation from Papers with Code
Repository link is mentioned in the paper metadata
Strong overlap with paper title keywords
Community adoption signal (140 stars)
License ✓
CI –
Deps ✓
Docker –
  • Selected hkustdial/nl2sql360 as the strongest maintained implementation for new work.
  • Includes dependency/environment manifest signals.
  • Repository activity is within the last 24 months.
  • Official repository is preserved separately as historical context.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

hkustdial/nl2sql_survey
Stars: 1,345
Last push: Mar 3, 2026

Reproduction path

Direct

Follow the direct implementation path

  1. Start with hkustdial/nl2sql360 and validate the setup instructions in the README.
  2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
  3. Log exact dependency versions and the runtime environment for reproducibility.
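Step 3 can be automated with the standard library alone. The sketch below records the interpreter version, OS string, and installed package versions as JSON so they can be stored next to each run's results. `environment_snapshot` is a hypothetical helper, and the output format is an assumption rather than anything the repository prescribes.

```python
import json
import platform
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect interpreter, OS, and installed-package versions for a run log."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that report no name
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Sorted so snapshots from different runs diff cleanly.
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print as JSON; redirect to a file to keep it alongside the run's results.
    print(json.dumps(environment_snapshot(), indent=2))
```

The `packages` mapping uses the same `{package_name: version}` shape as a `pip freeze`-style snapshot, so two saved runs can be compared directly.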

Time to first repro: a few hours
No CI workflows detected

Additional implementations

Official

  • BugMaker-Boyan/NL2SQL360
    Confidence: Medium

    Please visit https://github.com/HKUSTDial/NL2SQL360 to get the official code!

    Stars: 10
    Forks: 0
    Last push: Sep 1, 2024
    License: MIT

Community

No additional community repositories detected yet.

Hugging Face artifacts

No trustworthy direct or curated related Hugging Face artifacts were found yet.

Continue with targeted Hugging Face searches derived from the paper title and method context. Start with models, then check datasets and spaces if you need evaluation data or demos.

Research context

Tasks

Natural language processing

Methods

Transformer

Domains

Natural Language Processing


