Skip to content
← Back to explorer

Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation

Rishikesh Devanathan, Varun Nathan, Ayush Kumar · Aug 25, 2025 · Citations: 0

Abstract

Synthetic data is increasingly critical for contact centers, where privacy constraints and data scarcity limit the availability of real conversations. However, generating synthetic dialogues that are realistic and useful for downstream applications remains challenging. In this work, we benchmark multiple generation strategies guided by structured supervision on call attributes (Intent Summaries, Topic Flows, and Quality Assurance (QA) Forms) across multiple languages. To test downstream utility, we evaluate synthetic transcripts on an automated quality assurance (AutoQA) task, finding that prompts optimized on real transcripts consistently outperform those optimized on synthetic transcripts. These results suggest that current synthetic transcripts fall short in capturing the full realism of real agent-customer interactions. To highlight these downstream gaps, we introduce a diagnostic evaluation framework comprising 17 metrics across four dimensions: (1) Emotional and Sentiment Arcs, (2) Linguistic Complexity, (3) Interaction Style, and (4) Conversational Properties. Our analysis shows that even with structured supervision, current generation strategies exhibit measurable deficiencies in sentiment fidelity, disfluency modeling, behavioral variation, and conversational realism. Together, these results highlight the importance of diagnostic, metric-driven evaluation for synthetic conversation generation intended for downstream applications.

Human Data Lens

  • Uses human feedback: No
  • Feedback types: None
  • Rater population: Unknown
  • Unit of annotation: Unknown
  • Expertise required: Multilingual

Evaluation Lens

  • Evaluation modes: Automatic Metrics
  • Agentic eval: None
  • Quality controls: Not reported
  • Confidence: 0.30
  • Flags: low_signal, possible_false_positive

Research Summary

Contribution Summary

  • Synthetic data is increasingly critical for contact centers, where privacy constraints and data scarcity limit the availability of real conversations.
  • However, generating synthetic dialogues that are realistic and useful for downstream applications remains challenging.
  • In this work, we benchmark multiple generation strategies guided by structured supervision on call attributes (Intent Summaries, Topic Flows, and Quality Assurance (QA) Forms) across multiple languages.

Why It Matters For Eval

  • In this work, we benchmark multiple generation strategies guided by structured supervision on call attributes (Intent Summaries, Topic Flows, and Quality Assurance (QA) Forms) across multiple languages.
  • These results suggest that current synthetic transcripts fall short in capturing the full realism of real agent-customer interactions.

Related Papers