- Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu · Feb 27, 2026 · Citations: 0
LLM As Judge Coding Multilingual
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols.
- RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Zeyi Liao, Jaylen Jones, Linxi Jiang, Yuting Ning, Eric Fosler-Lussier · May 28, 2025 · Citations: 0
Automatic Metrics General
Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities.
- MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
Zhongxi Wang, Yueqian Lin, Jingyang Zhang, Hai Helen Li, Yiran Chen · Mar 3, 2026 · Citations: 0
Automatic Metrics General
Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to systematically test whether alignment generalizes to audio, image, and video inputs.
- What Matters For Safety Alignment?
Xing Li, Hui-Ling Zhen, Lihao Yin, Xianzhi Yu, Zhenhua Dong · Jan 7, 2026 · Citations: 0
Automatic Metrics General
This paper presents a comprehensive empirical study on the safety alignment capabilities.
- MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs
Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav, Ava Cai, Kevin Zhu · Feb 21, 2026 · Citations: 0
Automatic Metrics General
We propose MANATEE, an inference-time defense that uses density estimation over a benign representation manifold.
- Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
Somnath Banerjee, Rima Hazra, Animesh Mukherjee · Feb 14, 2026 · Citations: 0
Automatic Metrics Coding Multilingual
Yet safety pipelines, benchmarks, and alignment still largely target English and a handful of high-resource languages, implicitly assuming safety and factuality "transfer" across languages.
- Reasoning Up the Instruction Ladder for Controllable Language Models
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar · Oct 30, 2025 · Citations: 0
Automatic Metrics General
Our finetuned models achieve consistent improvements on instruction following and instruction hierarchy benchmarks, achieving roughly a 20% improvement on the IHEval conflict setup.
- When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Suriyakumar, Marzyeh Ghassemi · Jun 9, 2025 · Citations: 0
Automatic Metrics General
In this work, we seek to understand whether style patterns compromise LLM safety, how superficial style alignment increases model vulnerability, and how best to mitigate these risks during alignment.
- ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts
Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul, Pakhapoom Sarapat · Mar 5, 2026 · Citations: 0
LLM As Judge Automatic Metrics General
Using ThaiSafetyBench, we evaluate 24 LLMs, with GPT-4.1 and Gemini-2.5-Pro serving as LLM-as-a-judge evaluators.
- ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
Che Wang, Fuyao Zhang, Jiaming Zhang, Ziqi Zhang, Yinghui Wang · Feb 24, 2026 · Citations: 0
Automatic Metrics General
Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution.
- Luna-2: Scalable Single-Token Evaluation with Small Language Models
Vatsal Goel, Rishon Dsouza, Nikhil Ega, Amey Ramesh Rambatla, Rob Friel · Feb 20, 2026 · Citations: 0
LLM As Judge Automatic Metrics General
We present Luna-2, a novel architecture that leverages decoder-only small language models (SLMs) into a deterministic evaluation model to reliably compute complex task-specific LLMAJ metrics (e.g.
- Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation
Yonathan Ron, Shiri Gilboa, Tammuz Dubnov · Feb 21, 2026 · Citations: 0
Automatic Metrics Law
We introduce Whisper: Courtside Edition, a novel multi-agent large language model (LLM) pipeline that enhances Whisper transcriptions without retraining.
- Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR
Carlos Carvalho, Francisco Teixeira, Thomas Rolland, Alberto Abad · Mar 5, 2026 · Citations: 0
- PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration
Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery · Mar 5, 2026 · Citations: 0
- Measuring the Redundancy of Decoder Layers in SpeechLLMs
Adel Moumen, Guangzhi Sun, Philip C Woodland · Mar 5, 2026 · Citations: 0
- MUTEX: Leveraging Multilingual Transformers and Conditional Random Fields for Enhanced Urdu Toxic Span Detection
Inayat Arshad, Fajar Saleem, Ijaz Hussain · Mar 5, 2026 · Citations: 0
- RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks
Alexandra Diaconu, Mădălina Vînaga, Bogdan Alexe · Mar 2, 2026 · Citations: 0
- End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation
Minghui Wu, Haitao Tang, Jiahuan Fan, Ruizhi Liao, Yanyong Zhang · Mar 2, 2026 · Citations: 0
- DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement
Minghui Wu, Xueling Liu, Jiahuan Fan, Haitao Tang, Yanyong Zhang · Mar 2, 2026 · Citations: 0
- A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment
Zarif Ishmam, Zarif Mahir, Shafnan Wasif, Md. Ishtiak Moin · Feb 26, 2026 · Citations: 0
- TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
Tran Nguyen Anh, Truong Dinh Dung, Vo Van Nam, Minh N. H. Nguyen · Sep 7, 2025 · Citations: 0
- Chain of Correction for Full-text Speech Recognition with Large Language Models
Zhiyuan Tang, Dong Wang, Zhikai Zhou, Yong Liu, Shen Huang · Apr 2, 2025 · Citations: 0