MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training

Q: What is the best open-source implementation of "MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training"?

The best maintained implementation is jdrechsel13/sympy-random-latex with 2 stars on GitHub. Confidence: high. Reproducibility: Strong.

Q: How reproducible is "MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training"?

Estimated time to first reproduction: a few hours. Risk flags: Top repository has low community adoption. Start with jdrechsel13/sympy-random-latex and validate setup instructions in README.

Q: Are there pretrained models available for "MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training"?

Yes, 2 Hugging Face models were found, both listed as curated related artifacts rather than direct paper-linked ones. The top result is aieng-lab/MathBERT-mamut with 45,070 downloads.

Q: What framework is used to implement "MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training"?

No ML framework was detected for the primary implementation.

Jonathan Drechsel, Anja Reusch, Steffen Herbold

Published: Feb 28, 2025

Best maintained implementation now

Evidence: Direct

Domain fit: AI-core

Verified repos: 4

Top repo stars: 2

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: none

Time to first repro: a few hours

1 risk flag


Mathematical formulas are a fundamental and widely used component in various scientific fields, serving as a universal language for expressing complex concepts and relationships. While state-of-the-art transformer models excel in processing and understanding natural language, they encounter challenges with mathematical notation, which involves a complex structure and diverse representations. This study focuses on the development of specialized training datasets to enhance the encoding of mathematical content. We introduce Math Mutator (MAMUT), a framework capable of generating equivalent and falsified versions of a given mathematical formula in LaTeX notation, effectively capturing the mathematical variety in notation of the same concept. Based on MAMUT, we have generated four large mathematical datasets containing diverse notation. Experiments show that models trained on these datasets exhibit new SoTA performance on mathematical retrieval tasks. We publish our code, generated datasets, and pretrained mathematical models: https://github.com/aieng-lab/math-mutator.
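To make the idea of "equivalent and falsified versions" of a formula concrete, here is a minimal, self-contained sketch. It is not MAMUT's code or API: the tuple-based expression tree, the function names (`to_latex`, `equivalent`, `falsify`, `holds`), and the mutation strategies (swapping commutative operands, renaming variables, bumping one constant) are all assumptions chosen for illustration. A numeric spot check stands in for a real equivalence test.

```python
import random

# Toy expression tree (hypothetical, not MAMUT's data model):
#   int | str symbol | (op, left, right) with op in {"add", "mul", "pow", "eq"}
COMMUTATIVE = {"add", "mul", "eq"}


def _group(node, ops):
    """Render a child, adding parentheses if its operator is in `ops`."""
    text = to_latex(node)
    if isinstance(node, tuple) and node[0] in ops:
        return "\\left(" + text + "\\right)"
    return text


def to_latex(node):
    """Render the toy tree as a LaTeX string."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    if op == "eq":
        return to_latex(left) + " = " + to_latex(right)
    if op == "add":
        return to_latex(left) + " + " + to_latex(right)
    if op == "mul":
        return _group(left, ("add",)) + " \\cdot " + _group(right, ("add",))
    if op == "pow":
        return _group(left, ("add", "mul", "pow")) + "^{" + to_latex(right) + "}"
    raise ValueError(f"unknown operator: {op}")


def evaluate(node, env):
    """Evaluate one side of an equation for concrete integer variable values."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    return {"add": a + b, "mul": a * b, "pow": a ** b}[op]


def holds(equation, variables, rng, trials=20):
    """Spot-check an identity at random integer points (a heuristic, not a proof)."""
    _, lhs, rhs = equation
    for _ in range(trials):
        env = {name: rng.randint(1, 9) for name in variables}
        if evaluate(lhs, env) != evaluate(rhs, env):
            return False
    return True


def equivalent(node, rng, rename):
    """Equivalent variant: rename symbols and randomly swap commutative operands."""
    if isinstance(node, str):
        return rename.get(node, node)
    if isinstance(node, int):
        return node
    op, left, right = node
    left, right = equivalent(left, rng, rename), equivalent(right, rng, rename)
    if op in COMMUTATIVE and rng.random() < 0.5:
        left, right = right, left
    return (op, left, right)


def _int_paths(node, path=()):
    """Yield the index path to every integer constant in the tree."""
    if isinstance(node, int):
        yield path
    elif isinstance(node, tuple):
        yield from _int_paths(node[1], path + (1,))
        yield from _int_paths(node[2], path + (2,))


def _replace_at(node, path, fn):
    """Return a copy of the tree with fn applied to the leaf at `path`."""
    if not path:
        return fn(node)
    parts = list(node)
    parts[path[0]] = _replace_at(node[path[0]], path[1:], fn)
    return tuple(parts)


def falsify(equation, variables, rng, max_tries=10):
    """Falsified variant: bump one constant, keep it only if the check fails."""
    paths = list(_int_paths(equation))
    for _ in range(max_tries):
        candidate = _replace_at(equation, rng.choice(paths), lambda c: c + rng.randint(1, 3))
        if not holds(candidate, variables, rng):
            return candidate
    raise ValueError("could not produce a falsified variant")


# Binomial formula: (a + b)^2 = a^2 + 2ab + b^2
binomial = (
    "eq",
    ("pow", ("add", "a", "b"), 2),
    ("add", ("add", ("pow", "a", 2), ("mul", 2, ("mul", "a", "b"))), ("pow", "b", 2)),
)
rng = random.Random(0)
same = equivalent(binomial, rng, {"a": "x", "b": "y"})
wrong = falsify(binomial, ["a", "b"], rng)

print("original:  ", to_latex(binomial))
print("equivalent:", to_latex(same))
print("falsified: ", to_latex(wrong))
```

Pairs built this way (original with an equivalent variant as a positive, original with a falsified variant as a negative) illustrate the kind of training signal the abstract describes; the actual framework operates on LaTeX and may use entirely different transformations.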

Technical details

Canonical key: arxiv-2502.20855

Cache status: Stale (SWR served)

Generated at: Apr 30, 2026, 11:22 PM

Artifact coverage: direct

HF provider: ok (token)

PWC source used: Yes

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

implementation starting point

Benchmarks: thin evidence


Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

· Retrieval / indexing, MATH: Accuracy 93.98 (source: paper fulltext)
· Retrieval / indexing, MP BERT: Recall 99.5 (source: paper fulltext)
· Retrieval / indexing, MP BERT -random-falses: Recall 99.7 (source: paper fulltext)
· Retrieval / indexing, MP BERT -constant-falses: Recall 99.2 (source: paper fulltext)

Benchmark evidence drill-down

4 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Retrieval / indexing	MATH	Accuracy	93.98	paper-derived	No explicit refs
Retrieval / indexing	MP BERT	Recall	99.5	paper-derived	No explicit refs
Retrieval / indexing	MP BERT -random-falses	Recall	99.7	paper-derived	No explicit refs
Retrieval / indexing	MP BERT -constant-falses	Recall	99.2	paper-derived	No explicit refs


Use This Implementation Because…

Confidence: high

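jdrechsel13/sympy-random-latex is the strongest maintained implementation based on ranking signals. CI workflows are present. A license is declared but reported as NOASSERTION, so the specific license is not identified; check the repository's license file before reuse.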


Reproduction Risks

Top repository has low community adoption

Evidence disclosure

Evidence graph: 4 refs, 4 links.

Utility signals: depth 90/100, grounding 95/100, status high.

Implementation Comparison

Top 3 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

jdrechsel13/sympy-random-latex

best maintained

Maintenance: Stale risk

Confidence: High

Reproducibility: Strong

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 2
Last push: Jul 8, 2025 (298d ago)

CI, Dependencies

Risk flags

No tagged releases
No Docker setup

aieng-lab/math-mutator

historical official

Maintenance: Recently updated

Confidence: High

Reproducibility: Moderate

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 6
Last push: Mar 19, 2026 (44d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

aieng-lab/transformer-math-evaluation

alternative

Maintenance: Stale risk

Confidence: Medium

Reproducibility: Moderate

Official implementation from Papers with Code

Stars: 2
Last push: Jul 8, 2025 (298d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Best implementation now

jdrechsel13/sympy-random-latex

Confidence: High

Reproducibility: Strong

A computer algebra system written in pure Python with a randomized LaTeX Formula Generator

Stars: 2

Forks: 0

Last push: Jul 8, 2025

License: NOASSERTION

Official implementation from Papers with Code

Repository link is mentioned in the paper metadata

License ✓

CI ✓

Deps ✓

Docker –

Selected jdrechsel13/sympy-random-latex as the strongest maintained implementation for new work.
Includes CI workflow signals.
Includes dependency/environment manifest signals.
Repository activity is within the last 24 months.

Historical official implementation

Preserved for provenance. Not recommended as the default path for new builds.

aieng-lab/math-mutator

Stars: 6

Last push: Mar 19, 2026

Reproduction readiness

Setup Required

Time to first repro: hours

Last checked: Apr 30, 2026

Dependencies pinned, manual setup needed

· jdrechsel13/sympy-random-latex has pyproject.toml but requires manual environment setup.
· Last push was 298 days ago — expect possible dependency version conflicts.
· No Dockerfile — you will set up the environment manually.


Quick start

git clone https://github.com/jdrechsel13/sympy-random-latex.git
cd sympy-random-latex
pip install -e .

Additional implementations

Official

aieng-lab/transformer-math-evaluation
Confidence: Medium


Stars: 2

Forks: 0

Last push: Jul 8, 2025

License: Apache-2.0

aieng-lab/transformer-math-pretraining
Confidence: Medium

Framework to pretrain mathematical aware transformer models using MAMUT datasets

Stars: 1

Forks: 0

Last push: Jul 8, 2025

License: Apache-2.0

Community

No additional community repositories detected yet.

Hugging Face artifacts

No direct paper-linked artifacts were found. Showing strongest curated related artifacts for faster exploration.

Models

aieng-lab/MathBERT-mamut

Curated Related

Downloads: 45,070

Likes: 0

aieng-lab/math_pretrained_bert_mamut

Curated Related

Downloads: 45

Likes: 2
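
A minimal sketch of how one of the listed checkpoints might be loaded to embed LaTeX formulas. The model ID comes from the list above; everything else is an assumption: that the checkpoint loads through the generic `AutoTokenizer` / `AutoModel` classes of the `transformers` library, that `torch` is installed, and that mean pooling is a reasonable way to get one vector per formula. The helper name `embed_formulas` is hypothetical, and the model card should be checked for the recommended usage.

```python
MODEL_ID = "aieng-lab/MathBERT-mamut"  # model ID as listed on this page


def embed_formulas(formulas, model_id=MODEL_ID):
    """Return one mean-pooled embedding per LaTeX string.

    Assumed usage, not taken from the model card. The heavy dependencies are
    imported lazily, and the checkpoint is downloaded on first call.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(formulas, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, tokens, dim)

    # Average over real tokens only, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)


# Example call (requires network access and the packages above):
#   vectors = embed_formulas([r"(a+b)^2 = a^2 + 2ab + b^2", r"E = mc^2"])
#   print(vectors.shape)
```

Mean pooling is only one option; taking the first-token vector or using a task-specific head may fit a given retrieval setup better.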


Datasets

ddrg/named_math_formulas

Curated Related

Downloads: 107

Likes: 19

Updated: Jan 16, 2026

ddrg/math_formulas

Curated Related

Downloads: 149

Likes: 9

Updated: Jul 8, 2025