Skip to content
← Back to explorer

Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE

Christian Møller Dahl, Torben Johansen, Christian Vedel · Feb 21, 2024 · Citations: 0

Abstract

This paper introduces OccCANINE, an open-source tool that maps occupational descriptions to HISCO codes. Manual coding is slow and error-prone; OccCANINE replaces weeks of work with results in minutes. We fine-tune CANINE on 15.8 million description-code pairs from 29 sources in 13 languages. The model achieves 96 percent accuracy, precision, and recall. We also show that the approach generalizes to three systems - OCC1950, OCCICEM, and ISCO-68 - and release them open source. By breaking the "HISCO barrier," OccCANINE democratizes access to high-quality occupational coding, enabling broader research in economics, economic history, and related disciplines.

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: Coding

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: None
  • Quality controls: Not reported
  • Confidence: 0.35
  • Flags: low_signal, possible_false_positive

Research Summary

Contribution Summary

  • This paper introduces OccCANINE, an open-source tool that maps occupational descriptions to HISCO codes.
  • Manual coding is slow and error-prone; OccCANINE replaces weeks of work with results in minutes.
  • We fine-tune CANINE on 15.8 million description-code pairs from 29 sources in 13 languages.

Related Papers