What is the best open-source implementation of "The Dawn of Natural Language to SQL: Are We Fully Ready?"?

The best maintained implementation is hkustdial/nl2sql360 with 140 stars on GitHub. Confidence: high. Reproducibility: Moderate.

How reproducible is "The Dawn of Natural Language to SQL: Are We Fully Ready?"?

Estimated time to first reproduction: a few hours. Risk flags: No CI workflows detected. Start with hkustdial/nl2sql360 and validate setup instructions in README.

What framework is used to implement "The Dawn of Natural Language to SQL: Are We Fully Ready?"?

No framework was detected for the primary implementation.

The Dawn of Natural Language to SQL: Are We Fully Ready?

Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, Nan Tang

Published: Jun 3, 2024

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 3

Top repo stars: 140

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: none

Time to first repro: a few hours

1 risk flag

Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barriers to accessing relational databases. The emergence of Large Language Models has introduced a novel paradigm in NL2SQL tasks, enhancing capabilities dramatically. However, this raises a critical question: Are we fully prepared to deploy NL2SQL models in production? To address the posed questions, we present a multi-angle NL2SQL evaluation framework, NL2SQL360, to facilitate the design and test of new NL2SQL methods for researchers. Through NL2SQL360, we conduct a detailed comparison of leading NL2SQL methods across a range of application scenarios, such as different data domains and SQL characteristics, offering valuable insights for selecting the most appropriate NL2SQL methods for specific needs. Moreover, we explore the NL2SQL design space, leveraging NL2SQL360 to automate the identification of an optimal NL2SQL solution tailored to user-specific needs. Specifically, NL2SQL360 identifies an effective NL2SQL method, SuperSQL, distinguished under the Spider dataset using the execution accuracy metric. Remarkably, SuperSQL achieves competitive performance with execution accuracy of 87% and 62.66% on the Spider and BIRD test sets, respectively.

Technical details

Canonical key: arxiv-2406.01265

Cache status: Fresh

Generated at: Mar 9, 2026, 10:47 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

LLM status: ready (legacy_benchmark_findings_trimmed)

LLM model: openai/gpt-5.1-20251113

LLM generated: Mar 9, 2026, 1:51 PM

LLM content type: researcher_benchmark_brief

HF policy: hf-relevance-v27

LLM evidence refs: paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0], researcherSummary.coreClaim, guidance.riskFlags[0], repos[0].fullName, repos[1].fullName, repos[2].fullName, paper.title, summary.hasReliableImplementation

Researcher verdict

Recommended implementation path available

implementation baseline

Benchmark trust: grounded evidence

This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on hkustdial/nl2sql360. Use it as an implementation baseline, then validate benchmark parity before adapting it.

Why this page is still worth reading

Benchmark findings give you an audit trail for validation before picking an implementation path.
A concrete repository path exists via hkustdial/nl2sql360, so this page can act as a practical starting point.
Reproduction risks are surfaced explicitly, which helps decide whether the paper is worth immediate prototyping.

Benchmark trust

Concrete benchmark findings are present and can be audited against the extracted evidence.

Use this page as

Start here when you need the most practical implementation path quickly.

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Natural language to SQL (NL2SQL)

Spider test set

Execution accuracy

87%

Source: llm grounded

Natural language to SQL (NL2SQL)

BIRD test set

Execution accuracy

62.66%

Source: llm grounded

Benchmark evidence drill-down

2 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Natural language to SQL (NL2SQL)	Spider test set	Execution accuracy	87%	llm-grounded	paper.abstract, evidencePack.paperSections[id=paper_abstract], researcherSummary.benchmarkSnapshot[0]
Natural language to SQL (NL2SQL)	BIRD test set	Execution accuracy	62.66%	llm-grounded	paper.abstract, evidencePack.paperSections[id=paper_abstract]
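
Both findings above use execution accuracy, which scores a predicted query by running it and comparing its result with the result of the gold query. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. The toy table, the queries, and the function names are hypothetical, and the order-insensitive multiset comparison is an assumption; the official Spider and BIRD evaluators may apply different comparison rules, so this should not be read as the benchmark's scoring code.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result multisets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        # A failing gold query never counts as a match.
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Ben', 45), (3, 'Cy', 27);
""")

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Valid SQL, different result: counted as wrong.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age > 30"),
    # Misspelled column, so execution fails: counted as wrong.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
accuracy = execution_accuracy(conn, pairs)
```

Because the comparison is on results rather than SQL text, the first pair counts as correct even though the two queries are written differently.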

Use This Implementation Because…

Confidence: high

hkustdial/nl2sql360 is the strongest maintained implementation based on ranking signals. License is declared (MIT). Dependency/environment manifests are present.

Reproduction Risks

No CI workflows detected

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 90/100, grounding 85/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

hkustdial/nl2sql360

best maintained

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 140
Last push: Oct 2, 2025 (159d ago)

Releases · Dependencies

Risk flags

No CI pipeline detected
No Docker setup

hkustdial/nl2sql_survey

historical official

Maintenance: Active

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 1,345
Last push: Mar 3, 2026 (7d ago)

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

BugMaker-Boyan/NL2SQL360

alternative

Maintenance: Stale

Confidence: Medium

Reproducibility: Moderate

Official implementation from Papers with Code

Stars: 10
Last push: Sep 1, 2024 (555d ago)

Dependencies

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

Paper summary

AI-generated

AI-generated summary grounded in paper metadata and artifact signals.

The paper introduces NL2SQL360, a multi-angle NL2SQL evaluation framework designed to support the design and testing of new NL2SQL methods for researchers. This page includes benchmark evidence for Natural language to SQL (NL2SQL) on the Spider and BIRD test sets. Reproduction guidance focuses on implementation viability and concrete risk controls.

Key contributions

The paper introduces NL2SQL360, a multi-angle NL2SQL evaluation framework designed to support the design and testing of new NL2SQL methods for researchers.
Using NL2SQL360, the authors systematically compare leading NL2SQL methods across diverse application scenarios, including different data domains and SQL characteristics.
The NL2SQL360 framework is used to explore the NL2SQL design space and to automatically identify an NL2SQL solution that is tailored to user-specific requirements.
Leveraging NL2SQL360, the authors identify SuperSQL as an effective NL2SQL method under the Spider dataset when evaluated using execution accuracy.
SuperSQL achieves competitive execution accuracy scores of 87% on the Spider test set and 62.66% on the BIRD test set.

Implementation guidance

Use hkustdial/nl2sql360 first: both the deterministic ranking signals and the extracted evidence point to it as a viable implementation. Follow the repo's setup path, then confirm that you can reproduce the reported benchmark numbers before adapting the code.
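
One way to make the benchmark validation step concrete is a small parity check against the reported SuperSQL numbers. This is a sketch only: the two reference values come from the paper abstract, while the dataset keys, the `parity_report` helper, the 1.0-point tolerance, and the example reproduced scores are all assumptions chosen for illustration.

```python
# Reported SuperSQL execution accuracy (%), taken from the paper abstract.
REPORTED = {"spider_test": 87.0, "bird_test": 62.66}


def parity_report(reproduced, reported=REPORTED, tolerance=1.0):
    """Compare reproduced scores with the reported ones.

    `tolerance` is in percentage points and is an assumed threshold,
    not a value taken from the paper.
    """
    report = {}
    for name, expected in reported.items():
        actual = reproduced.get(name)
        if actual is None:
            report[name] = "missing"
        elif abs(actual - expected) <= tolerance:
            report[name] = "ok"
        else:
            report[name] = f"off by {actual - expected:+.2f}"
    return report


# Hypothetical reproduced numbers, for illustration only.
example = parity_report({"spider_test": 86.4, "bird_test": 60.1})
```

A run that lands outside the chosen tolerance is a signal to check prompts, model versions, and dataset splits before building on the code.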

Reproducibility notes

Lack of continuous integration workflows can allow unnoticed breaking changes to enter the main branch, leading to non-reproducible or failing runs over time.
Environment or dependency drift, due to no automated testing across versions, may cause discrepancies between reported NL2SQL360 results and newly reproduced results.
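
Without CI, drift has to be caught by hand. A minimal sketch of one approach, assuming you keep a recorded mapping of package names to versions from a known-good run: compare it with the current environment and list every mismatch. The `dependency_drift` helper and the package names below are hypothetical placeholders, not dependencies of NL2SQL360.

```python
def dependency_drift(recorded, current):
    """Return {package: (recorded_version, current_version)} for every mismatch.

    A version of None means the package is absent from that snapshot.
    """
    drift = {}
    for name in sorted(set(recorded) | set(current)):
        before, after = recorded.get(name), current.get(name)
        if before != after:
            drift[name] = (before, after)
    return drift


# Hypothetical snapshots with placeholder package names.
recorded = {"libalpha": "1.0.0", "libbeta": "3.4.0", "libgamma": "0.9.1"}
current = {"libalpha": "1.2.0", "libbeta": "3.4.0", "libdelta": "2.0.0"}
drift = dependency_drift(recorded, current)
```

An empty result means the two environments agree on every listed package; any entry is a candidate explanation for a result that no longer matches.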

Best implementation now

hkustdial/nl2sql360

Confidence: High

Reproducibility: Moderate

🔥[VLDB'24] Official repository for the paper “The Dawn of Natural Language to SQL: Are We Fully Ready?”

Stars: 140

Forks: 16

Last push: Oct 2, 2025

License: MIT

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

Strong overlap with paper title keywords

Community adoption signal (140 stars)

License ✓

CI –

Deps ✓

Docker –

Selected hkustdial/nl2sql360 as the strongest maintained implementation for new work.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.
Official repository is preserved separately as historical context.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

hkustdial/nl2sql_survey

Stars: 1,345

Last push: Mar 3, 2026

Reproduction path

Direct

Follow the direct implementation path

1. Start with hkustdial/nl2sql360 and validate setup instructions in README.
2. Reproduce the baseline result with the provided defaults before modifying hyperparameters.
3. Log exact dependency versions and runtime environment for reproducibility.
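
The logging step can be sketched with the Python standard library alone. The example below records the interpreter version, the platform string, and the version of every installed distribution as JSON. The `environment_snapshot` name and the output layout are assumptions for illustration; the repository may document its own way of pinning the environment, which should take precedence.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect interpreter, OS, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.metadata.get("Version")
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


snapshot = environment_snapshot()
# Serialize so the snapshot can be stored next to the benchmark results.
snapshot_json = json.dumps(snapshot, indent=2)
```

Saving `snapshot_json` alongside each run's outputs makes it possible to tell later whether a changed score came from the code or from the environment.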

Time to first repro: a few hours

No CI workflows detected

Additional implementations

Official

BugMaker-Boyan/NL2SQL360
Confidence: Medium

Please visit https://github.com/HKUSTDial/NL2SQL360 to get the official code!

Stars: 10

Forks: 0

Last push: Sep 1, 2024

License: MIT