
Compressing Language Models for Specialized Domains

Miles Williams, George Chrysostomou, Vitor Jeronymo, Nikolaos Aletras · Feb 25, 2025 · Citations: 0

Abstract

Language models (LMs) excel at tasks across diverse domains, yet require substantial computational resources during inference. Compression techniques such as pruning and quantization offer a practical path towards efficient LM deployment, exemplified by their ability to preserve performance on general-purpose benchmarks. However, general-purpose LM compression methods can negatively affect performance in specialized domains (e.g. biomedical or legal). Recent work has sought to address this issue, but requires a computationally expensive full-parameter fine-tuning pipeline. To this end, we propose MixCal, a novel calibration method designed to improve the in-domain performance of compressed LMs in a post-training setting. Through extensive experimentation, we demonstrate that MixCal substantially outperforms existing approaches on domain-specific tasks and preserves general performance. Notably, these performance gains are achieved while also reducing the computational cost of LM compression.

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: Law, Medicine

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: None
  • Quality controls: Calibration
  • Confidence: 0.45
  • Flags: low_signal, possible_false_positive

Research Summary

Contribution Summary

  • Language models (LMs) excel at tasks across diverse domains, yet require substantial computational resources during inference.
  • Compression techniques such as pruning and quantization offer a practical path towards efficient LM deployment, exemplified by their ability to preserve performance on general-purpose benchmarks.
  • However, general-purpose LM compression methods can negatively affect performance in specialized domains (e.g. biomedical or legal).

Why It Matters For Eval

  • Compression techniques such as pruning and quantization offer a practical path towards efficient LM deployment, exemplified by their ability to preserve performance on general-purpose benchmarks.
