Matched via arXiv identifier search · Strong overlap with paper title keywords
- Stars
- 0
- Last push
- Feb 26, 2026 (107d ago)
Risk flags
- No CI pipeline detected
- No tagged releases
- No Docker setup
Efe Kahraman, Giulio Tosato
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
We investigate document page ordering on 5,461 shuffled WOO documents (Dutch freedom of information releases) using page embeddings. These documents are heterogeneous collections such as emails, legal texts, and spreadsheets compiled into single PDFs, where semantic ordering signals are unreliable. We compare five methods, including pointer networks, seq2seq transformers, and specialized pairwise ranking models. The ...
best performing approach successfully reorders documents up to 15 pages, with Kendall's tau ranging from 0.95 for short documents (2-5 pages) to 0.72 for 15 page documents. We observe two unexpected failures: seq2seq transformers fail to generalize on long documents (Kendall's tau drops from 0.918 on 2-5 pages to 0.014 on 21-25 pages), and curriculum learning underperforms direct training by 39% on long documents. Ablation studies suggest learned positional encodings are one contributing factor to seq2seq failure, though the degradation persists across all encoding variants, indicating multiple interacting causes. Attention pattern analysis reveals that short and long documents require fundamentally different ordering strategies, explaining why curriculum learning fails. Model specialization achieves substantial improvements on longer documents (+0.21 tau).
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Transformer | Random | 2–5 pages | 0.007 | paper-derived | No explicit refs |
| Transformer | Greedy NN | 2–5 pages | 0.168 | paper-derived | No explicit refs |
| Transformer | TSP NN | 6–10 pages | 0.147 | paper-derived | No explicit refs |
We investigate document page ordering on 5,461 shuffled WOO documents (Dutch freedom of information releases) using page embeddings.
danderfer/Comp_Sci_Sem_2 is the closest maintained adjacent implementation (Matches contextual method/domain keyword: transformer). It is not paper-verified; validate algorithm and evaluation setup against the paper before trusting reported metrics. Community adoption signal: 181 GitHub stars.
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence graph: 3 refs, 3 links.
Utility signals: depth 100/100, grounding 85/100, status high.
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
Matched via arXiv identifier search · Strong overlap with paper title keywords
Risk flags
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
Hardware requirements
No verified implementation available
Framework baselines
Modern transformer training baseline.
Reference transformer building block implementation.
These are not paper-verified. Use them as reference points when no direct implementation is available.
Matches contextual method/domain keyword: transformer
No additional verified repositories beyond the primary recommendation.
These repositories had low-confidence matching signals and are hidden by default.
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Models
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
Tasks
Transformer
Methods
Transformer
Domains
None detected
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.