OpenTrain AI
Maintained implementation available · none · Pretrained Models Available

MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training

Jonathan Drechsel, Anja Reusch, Steffen Herbold

February 28, 2025 · arXiv: 2502.20855
4 repos · 2 stars · ~a few hours to reproduce

Abstract

Mathematical formulas are a fundamental and widely used component in various scientific fields, serving as a universal language for expressing complex concepts and relationships. While state-of-the-art transformer models excel in processing and understanding natural language, they encounter challenges with mathematical notation, which involves a complex structure and diverse representations. This study focuses on the...

Results & Benchmarks

Task | Dataset | Metric | Value
Retrieval / indexing | MATH | Accuracy | 93.98
Retrieval / indexing | MP BERT | Recall | 99.5
Retrieval / indexing | MP BERT -random-falses | Recall | 99.7
Retrieval / indexing | MP BERT -constant-falses | Recall | 99.2

Best Implementation

A computer algebra system written in pure Python with a randomized LaTeX Formula Generator

2 stars · 0 · Jul 2025 · License: NOASSERTION
CI · Deps · Docker
  • Selected jdrechsel13/sympy-random-latex as the strongest maintained implementation for new work.
  • Includes a CI workflow.
  • Includes a dependency/environment manifest.
  • Repository activity is within the last 24 months.
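To make "randomized LaTeX formula generator" concrete, here is a minimal, stdlib-only sketch of the general idea: one expression tree is rendered into different but mathematically equivalent LaTeX strings by randomly choosing between notations (`\cdot` versus `\times` for products, `\frac{}{}` versus an inline slash for quotients). The node classes `Sym`, `Mul`, `Div` and the function `render` are hypothetical names invented for this illustration; they are not the API of jdrechsel13/sympy-random-latex, which may work quite differently.

```python
import random
from dataclasses import dataclass
from typing import Union


# Hypothetical expression-tree nodes for this sketch (not from sympy-random-latex).
@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Div:
    num: "Expr"
    den: "Expr"


Expr = Union[Sym, Mul, Div]


def render(expr: Expr, rng: random.Random) -> str:
    """Render expr as LaTeX, choosing one of several equivalent notations at random."""
    if isinstance(expr, Sym):
        return expr.name
    if isinstance(expr, Mul):
        # \cdot and \times denote the same multiplication.
        op = rng.choice([r" \cdot ", r" \times "])
        return render(expr.left, rng) + op + render(expr.right, rng)
    if isinstance(expr, Div):
        num, den = render(expr.num, rng), render(expr.den, rng)
        if rng.random() < 0.5:
            return rf"\frac{{{num}}}{{{den}}}"
        # Parenthesize both sides so the inline form keeps the same grouping.
        return f"({num}) / ({den})"
    raise TypeError(f"unsupported node: {expr!r}")


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the variants are reproducible
    expr = Div(Mul(Sym("a"), Sym("b")), Sym("c"))  # (a * b) / c
    for _ in range(5):
        print(render(expr, rng))
```

Passing an explicitly seeded `random.Random` instance, rather than calling the module-level `random` functions, keeps the generated variants reproducible across runs.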

Reproduction Path

  1. Start with jdrechsel13/sympy-random-latex and validate the setup instructions in its README.
  2. Reproduce the baseline result with the provided defaults before modifying any hyperparameters.
  3. Log the exact dependency versions and runtime environment for reproducibility.
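Step 3 can be done with the Python standard library alone. The sketch below records the interpreter version, the OS string and the installed versions of a list of packages via `importlib.metadata`. The package names `sympy` and `mpmath` are assumptions (SymPy depends on mpmath); substitute whatever the repository's dependency manifest actually lists.

```python
import json
import platform
from importlib import metadata


def environment_report(packages):
    """Return interpreter, OS and package versions as a JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list; replace it with the names from the repository's
    # dependency manifest.
    report = environment_report(["sympy", "mpmath"])
    print(json.dumps(report, indent=2, sort_keys=True))
```

Redirecting the output to a file (for example `python log_env.py > environment.json`, where `log_env.py` is only a hypothetical name for this script) and keeping it next to the results gives a record of the exact environment; `pip freeze` is a complementary way to capture every installed package.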

Time to first repro: a few hours
Note: the top repository has low community adoption.

Additional Implementations

Official

Community

No additional community repositories detected yet.

Hugging Face Artifacts

No direct paper-linked artifacts were found.