PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

Sudip Bhujel · Mar 3, 2026 · Citations: 0

Automatic Metrics · Coding · Expert Verification · Medicine · Pairwise Preference

Abstract

Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of rare training-set content. We present PrivMedChat, an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue. Our design enforces differential privacy at every training stage that directly accesses dialogue-derived supervision: (i) Differentially Private Stochastic Gradient Descent (DP-SGD) for medical SFT and (ii) DP-SGD for reward model learning from preference pairs. To limit additional privacy expenditure during alignment, we apply DP-SGD to the PPO actor and critic when operating on dialogue-derived prompts, while the reward model remains fixed after DP training. We also introduce an annotation-free preference construction strategy that pairs physician responses with filtered non-expert generations to produce scalable preference data without clinician labeling. Experiments on medical dialogue benchmarks show that PrivMedChat at $\varepsilon=7$ achieves the highest ROUGE-L of 0.156 among all DP models, reduces clinical hallucinations to 1.4% and harmful advice to 0.4%, and obtains the highest overall score of 2.86 in a 3-model LLM-jury evaluation, while producing membership-inference signals that are near chance (AUC 0.510-0.555). We open-source our code at https://github.com/sudip-bhujel/privmedchat.
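To make stage (ii) of the abstract concrete, the following is a minimal sketch of one DP-SGD step on a pairwise-preference (Bradley-Terry style) reward loss. It is not taken from the PrivMedChat code: the linear reward model, the function name `dp_sgd_reward_step`, and all hyperparameter values are assumptions chosen for illustration. It shows only the generic DP-SGD recipe of clipping each pair's gradient and adding Gaussian noise before averaging.

```python
import numpy as np

def dp_sgd_reward_step(w, chosen, rejected, lr=0.1, clip_norm=1.0,
                       noise_multiplier=1.0, rng=None):
    """One DP-SGD update for a linear reward model r(x) = w . x (illustrative only).

    chosen, rejected: arrays of shape (batch, dim) holding feature vectors of
    the preferred and non-preferred response in each preference pair.
    """
    rng = np.random.default_rng() if rng is None else rng
    diff = chosen - rejected                    # (batch, dim)
    margin = diff @ w                           # r(chosen) - r(rejected)
    # Gradient of -log(sigmoid(margin)) w.r.t. w is -(1 - sigmoid(margin)) * diff.
    sigmoid = 1.0 / (1.0 + np.exp(-margin))
    per_pair_grads = -(1.0 - sigmoid)[:, None] * diff
    # Clip each pair's gradient to L2 norm <= clip_norm (bounds per-pair sensitivity).
    norms = np.linalg.norm(per_pair_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_pair_grads * scale[:, None]
    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / len(diff)
    return w - lr * noisy_mean_grad
```

A call such as `dp_sgd_reward_step(np.zeros(2), np.array([[3.0, 0.0]]), np.array([[0.0, 0.0]]), noise_multiplier=0.0)` disables the noise so the clipping effect can be inspected in isolation. Translating a noise multiplier into a concrete $\varepsilon$ (such as the reported $\varepsilon=7$) requires a privacy accountant, which this sketch omits.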

HFEPX Relevance Assessment

This paper has direct human-feedback and/or evaluation protocol signal and is likely useful for eval pipeline design.

Eval-Fit Score

65/100 • Medium

Useful as a secondary reference; validate protocol details against neighboring papers.

Human Feedback Signal

Detected

Evaluation Signal

Detected

HFEPX Fit

High-confidence candidate

If you are doing eval pipeline work, start here:

Human Eval Hub · LLM-as-Judge Hub · Pairwise Preference Hub · Tool-Use Eval Hub

Human Data Lens

Uses human feedback: Yes
Feedback types: Pairwise Preference, Expert Verification
Rater population: Mixed
Unit of annotation: Unknown
Expertise required: Medicine, Coding
Extraction source: Persisted extraction

Evaluation Lens

Evaluation modes: Automatic Metrics
Agentic eval: None
Quality controls: Not reported
Confidence: 0.70
Flags: None

Protocol And Measurement Signals

Benchmarks / Datasets

No benchmark or dataset names were extracted from the available abstract.

Reported Metrics

ROUGE-L
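Since ROUGE-L is the one automatic metric named in the abstract, a small sketch of how it is commonly computed may help when checking comparability across papers. This is an assumption-laden illustration: the abstract does not say which tokenizer, stemming, or F-measure weighting the authors used, so the whitespace tokenization and balanced F1 below are placeholders, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L F1 with lowercase whitespace tokenization (assumed)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For the reference "take the tablet twice daily" and the candidate "take one tablet daily", the longest common subsequence is "take tablet daily" (3 tokens), giving precision 3/4, recall 3/5, and F1 of 2/3. Scores computed with a different tokenizer or weighting are not directly comparable to the 0.156 reported in the abstract.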

Research Brief

Deterministic synthesis

Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of rare training-set content. HFEPX signals include Pairwise Preference, Expert Verification, and Automatic Metrics, with confidence 0.70. Updated from current HFEPX corpus.

Generated Mar 4, 2026, 4:38 AM · Grounded in abstract + metadata only

Key Takeaways

Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of…
We present PrivMedChat, an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue.

Researcher Actions

Compare its human-feedback setup against pairwise and rubric hubs.
Identify benchmark choices from full text before operationalizing conclusions.
Validate metric comparability (ROUGE-L).

Caveats

Generated from title, abstract, and extracted metadata only; full-paper implementation details are not parsed.
Extraction confidence is probabilistic and should be validated for critical decisions.

Recommended Queries

human-eval protocol design
pairwise preference data quality
inter-rater agreement adjudication

Research Summary

Contribution Summary

Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of rare training-set content.
We present PrivMedChat, an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue.
Experiments on medical dialogue benchmarks show that PrivMedChat at $\varepsilon=7$ achieves the highest ROUGE-L of 0.156 among all DP models, reduces clinical hallucinations to 1.4% and harmful advice to 0.4%, and obtains the highest overall…

Why It Matters For Eval

Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of rare training-set content.
Experiments on medical dialogue benchmarks show that PrivMedChat at $\varepsilon=7$ achieves the highest ROUGE-L of 0.156 among all DP models, reduces clinical hallucinations to 1.4% and harmful advice to 0.4%, and obtains the highest overall…
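The abstract reports membership-inference AUC values close to 0.5. As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen training member receives a higher attack score than a randomly chosen non-member. The abstract does not describe the attack itself, so the scores here (for example, negative per-example loss) and the function name `membership_auc` are assumptions.

```python
def membership_auc(member_scores, nonmember_scores):
    """AUC of a score-based membership attack (Mann-Whitney formulation).

    Returns the fraction of (member, non-member) pairs in which the member
    has the higher score; ties count as half. 0.5 means the attack is at chance.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

With members scored `[0.9, 0.4]` and non-members scored `[0.5, 0.4]`, the function returns 0.625; identical score distributions return 0.5. The pairwise loop is quadratic, which is acceptable for small audits but would usually be replaced by a rank-based computation on larger evaluation sets.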

Researcher Checklist

Pass: Human feedback protocol is explicit
Detected: Pairwise Preference, Expert Verification

Pass: Evaluation mode is explicit
Detected: Automatic Metrics

Gap: Quality control reporting is missing
No calibration/adjudication/IAA control explicitly detected.

Gap: Benchmark or dataset anchors are missing
No benchmark/dataset anchor extracted from abstract.

Pass: Metric reporting is present
Detected: ROUGE-L

Related Papers

Papers are ranked by protocol overlap, extraction signal alignment, and semantic proximity.

Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Protocol Overlap · Citations: 0 · Relevance: 10.90
Shared tags: Pairwise Preference, Expert Verification, Medicine
- Shared 3 HFEPX protocol tags
- Aligned human feedback protocol

Multi-Objective Alignment of Language Models for Personalized Psychotherapy
Protocol Overlap · Citations: 0 · Relevance: 10.90
Shared tags: Pairwise Preference, Expert Verification, Medicine
- Shared 3 HFEPX protocol tags
- Aligned human feedback protocol

Multi-Agent Comedy Club: Investigating Community Discussion Effects on LLM Humor Generation
Protocol Overlap · Citations: 0 · Relevance: 8.20
Shared tags: Pairwise Preference, Expert Verification
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

A Scalable Framework for Evaluating Health Language Models
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

Cold-Start Personalization via Training-Free Priors from Structured World Models
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Pairwise Preference, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

Diffusion Model in Latent Space for Medical Image Segmentation Task
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

DistillNote: Toward a Functional Evaluation Framework of LLM-Generated Clinical Note Summaries
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

ExpGuard: LLM Content Moderation in Specialized Domains
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification
Protocol Overlap · Citations: 0 · Relevance: 6.80
Shared tags: Expert Verification, Medicine
- Shared 2 HFEPX protocol tags
- Aligned human feedback protocol
