Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs

Q: How reproducible is "Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs"?

Estimated time to first reproduction: a few days. Risk flags: No repository-level reproducibility signals are currently available, Estimate is based on paper-only reproduction flow. No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.

Q: Are there pretrained models available for "Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs"?

Yes, 3 Hugging Face models found. The top result is NYTK/named-entity-recognition-nerkor-hubert-hungarian with 2,188 downloads.

Vinicius Anjos de Almeida, Sandro Saorin da Silva, Josimar Chire, Leonardo Vicenzi, Nícolas Henrique Borges, Helena Kociolek, Sarah Miriã de Castro Rocha, Frederico Nassif Gomes, Júlia Cristina Ferreira, Oge Marques, Lucas Emanuel Silva e Oliveira

Published: Mar 27, 2026

No direct paper-linked artifacts found; showing strongest related artifacts

Evidence: Curated Related

Domain fit: AI-core

Verified repos: 0

Core AI workload signals detected from paper context and implementation/artifact evidence.

Time to first repro: a few days

2 risk flags

arXiv PDF

Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce. In this study, we aimed to evaluate BERT-based models and large language models (LLMs) for clinical NER in Portuguese and to test strategies for addressing multilabel imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and ...

Read full abstract

mmBERT with LLMs such as GPT-5 and Gemini-2.5, using the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. Iterative stratification, weighted loss, and oversampling were explored to mitigate class imbalance. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models. Iterative stratification improved class balance and overall performance. Multilingual BERT models, particularly mmBERT, perform strongly for Portuguese clinical NER and can run locally with limited computational resources. Balanced data-splitting strategies further enhance performance.

Technical details

Canonical key: arxiv-2603.26510

Cache status: Stale (SWR served)

Generated at: May 30, 2026, 5:25 AM

Artifact coverage: curated_related

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

context only

Benchmarks: thin evidence

Time to repro: a few days

2 risk flags

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Natural language processing

biobertpt-all

SemClinBr

0.6443

Source: paper fulltext

Natural language processing

0.6830

SemClinBr

0.6573

Source: paper fulltext

Natural language processing

0.3671

SemClinBr

0.7602

Source: paper fulltext

Benchmark evidence drill-down

3 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Natural language processing	biobertpt-all	SemClinBr	0.6443	paper-derived	No explicit refs
Natural language processing	0.6830	SemClinBr	0.6573	paper-derived	No explicit refs
Natural language processing	0.3671	SemClinBr	0.7602	paper-derived	No explicit refs

Clinical notes contain valuable unstructured information.

Implementation Evidence Summary

Confidence: low

Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

Estimate is based on paper-only reproduction flow

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 75/100, grounding 78/100, status high.

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
Track assumptions and missing details in an experiment log before coding.

Time to first repro: a few days

Best available artifact: NYTK/named-entity-recognition-nerkor-hubert-hungarian