TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Q: How reproducible is "TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation"?

Estimated time to first reproduction: a few days. Risk flags: (1) no repository-level reproducibility signals are currently available; (2) the estimate assumes artifact-level reproduction, and full training reproduction may require additional details from the paper. Use the paper-linked Hugging Face release as the starting artifact, then reconstruct training and evaluation settings from the paper.

Hanyu Guo, Jiedong Yang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu

Published: May 21, 2026

No direct implementation yet

Evidence: Inferred

Domain fit: AI-core

Verified repos: 0

Time to first repro: a few days

2 risk flags

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.
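The abstract does not say how "structurally valid" is judged. As a minimal sketch, assuming a route is a sequence of legs (line, boarding station, alighting station) and that validity means every leg rides a known line between two distinct stops on it, with consecutive legs transferring at the same station, a checker might look like this. The `Leg` type, the `line_stops` mapping, and the same-station transfer rule are all hypothetical; the released evaluation code is the authoritative definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    line: str    # hypothetical: line name as emitted by the model
    board: str   # boarding station
    alight: str  # alighting station


def is_structurally_valid(route: list[Leg], line_stops: dict[str, list[str]]) -> bool:
    """Hypothetical check: each leg uses a known line between two distinct stops
    on that line, and consecutive legs transfer at a shared station."""
    if not route:
        return False
    for leg in route:
        stops = line_stops.get(leg.line)
        if stops is None or leg.board not in stops or leg.alight not in stops:
            return False
        if leg.board == leg.alight:
            return False
    # Assumption: transfers happen at the same station (walking links not modeled).
    for prev, nxt in zip(route, route[1:]):
        if prev.alight != nxt.board:
            return False
    return True
```

For example, with `line_stops = {"Line 1": ["A", "B", "C"], "Line 2": ["C", "D"]}`, the route `[Leg("Line 1", "A", "C"), Leg("Line 2", "C", "D")]` passes, while alighting at `B` and then boarding at `C` fails the transfer check. A real evaluation would likely need to allow walking transfers between nearby stations.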

Technical details

Canonical key: arxiv-2605.22355

Generated at: May 23, 2026, 5:27 AM

Artifact coverage: direct

LLM model: openai/gpt-5.1-20251113

LLM generated: May 23, 2026, 5:28 AM

Implementation starting point

Benchmarks: thin evidence


Results & Benchmarks

Direct + Inferred Evidence

Preference-aware transit route planning

TransitLM Preference-Aware Planning benchmark

Label Preference Compliance

96.02%

Source: llm grounded

Benchmark evidence drill-down

1 finding

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Preference-aware transit route planning	TransitLM Preference-Aware Planning benchma…	Label Preference Compliance	96.02%	llm-grounded	No explicit refs
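The brief reports only the aggregate score and does not define the metric. As a minimal sketch, assuming Label Preference Compliance is the percentage of generated routes whose preference label matches the one requested in the query, it could be computed as follows. The label strings and the `(requested, produced)` record layout are hypothetical; the released evaluation code should be treated as the authoritative definition.

```python
from collections import Counter


def preference_compliance(records):
    """Hypothetical metric: percentage of records whose produced route label
    equals the requested preference label.

    records: iterable of (requested_label, produced_label) pairs.
    Returns (overall_percent, per_label_percent).
    """
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    totals, hits = Counter(), Counter()
    for requested, produced in records:
        totals[requested] += 1
        if requested == produced:
            hits[requested] += 1
    overall = 100.0 * sum(hits.values()) / len(records)
    per_label = {label: 100.0 * hits[label] / n for label, n in totals.items()}
    return overall, per_label
```

Reporting a per-label breakdown alongside the overall figure helps show whether a high aggregate score hides weak compliance on a rarer preference.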

Implementation Evidence Summary

Confidence: low

No direct maintained repository implementation was found, but paper-linked Hugging Face artifacts are available.

Reproduction Risks

Estimate assumes artifact-level reproduction; full training reproduction may require additional paper details.

Hardware Notes

Based on current guidance, expect several days of setup and compute before a meaningful reproduction.

Evidence disclosure

Evidence graph: 2 refs, 1 link.

Utility signals: depth 60/100, grounding 58/100, status medium.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

Use the paper-linked Hugging Face release as the starting artifact, then reconstruct training and evaluation settings from the paper.
Use the paper PDF and its citation graph to design a baseline reproduction.
Track assumptions and missing details in an experiment log before coding.
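One lightweight way to keep such a log is a structured record that tags every setting with where it came from, so assumptions can be listed and revisited once more paper details surface. The sketch below is hypothetical (the `ExperimentLog` and `LogEntry` names and the source tags are illustrative, not part of any TransitLM release) and uses only the Python standard library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LogEntry:
    setting: str   # e.g. "base model", "learning rate"
    value: str     # value used in this run
    source: str    # "paper", "dataset card", or "assumption"
    note: str = ""


@dataclass
class ExperimentLog:
    run_id: str
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, setting: str, value: str, source: str, note: str = "") -> None:
        """Append one setting together with its provenance."""
        self.entries.append(LogEntry(setting, value, source, note))

    def assumptions(self) -> list[LogEntry]:
        """Return only settings that were guessed rather than taken from a source."""
        return [e for e in self.entries if e.source == "assumption"]

    def to_json(self) -> str:
        """Serialize the whole log (nested entries included) for versioning."""
        return json.dumps(asdict(self), indent=2)
```

For instance, recording the dataset ID `GD-ML/TransitLM` with source `"paper"` and an unstated training hyperparameter with source `"assumption"` makes `assumptions()` return just the guessed setting, which is the list to re-check against the paper before trusting results.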