Automated Coding of Communication Data Using ChatGPT: Consistency Across Subgroups
Jiangang Hao, Wenju Cui, Patrick Kyllonen, Emily Kerzabi · Oct 23, 2025 · Citations: 0
Data freshness
Extraction: FreshCheck recency before relying on this page for active eval decisions. Use stale pages as context and verify against current hub results.
Metadata refreshed
Mar 6, 2026, 3:17 PM
RecentExtraction refreshed
Mar 14, 2026, 2:04 AM
FreshExtraction source
Persisted extraction
Confidence 0.70
Abstract
Assessing communication and collaboration at scale depends on a labor intensive task of coding communication data into categories according to different frameworks. Prior research has established that ChatGPT can be directly instructed with coding rubrics to code the communication data and achieves accuracy comparable to human raters. However, whether the coding from ChatGPT or similar AI technology perform consistently across different demographic groups, such as gender and race, remains unclear. To address this gap, we introduce three checks for evaluating subgroup consistency in LLM-based coding by adapting an existing framework from the automated scoring literature. Using a typical collaborative problem-solving coding framework and data from three types of collaborative tasks, we examine ChatGPT-based coding performance across gender and racial/ethnic groups. Our results show that ChatGPT-based coding perform consistently in the same way as human raters across gender or racial/ethnic groups, demonstrating the possibility of its use in large-scale assessments of collaboration and communication.