AI responses to unethical directive speech acts: The case of indirect and evasive strategies

  • The Sociolinguistic Journal of Korea
  • Abbr : 사회언어학 (Sociolinguistics)
  • 2025, 33(4), pp.43~80
  • Publisher : The Sociolinguistic Society Of Korea
  • Research Area : Humanities > Linguistics
  • Received : October 31, 2025
  • Accepted : November 26, 2025
  • Published : December 31, 2025

Kim Jae-Hee 1, Hansaem Kim 1

1 Yonsei University


ABSTRACT

This study examines how Large Language Models (LLMs) recognize and refuse unethical directive speech acts by analyzing their responses to indirect and evasive user requests. Based on the Cross-Cultural Speech Act Realization Project (CCSARP), directive prompts were constructed with varying degrees of indirectness to evaluate the models’ pragmatic inference abilities. The study was conducted in two stages. First, a high rate of information leakage was observed for indirect directives using ChatGPT-4o (February 2025 version). Second, newer models—GPT-5, Claude Sonnet 3.7 and 4, and Gemini 2.5 Flash—were tested across four categories of unethical directives through multi-turn dialogues. Logistic regression with Benjamini–Hochberg FDR correction revealed that although newer models displayed improved refusal performance overall, they remained vulnerable to highly indirect and non-conventional directives, particularly those related to discrimination and harmful behaviors. These results suggest that current AI safety systems rely heavily on surface-level keyword filtering, indicating the need for models to better learn diverse directive strategies and expressions in Korean. Moving beyond technology-centered safety evaluation, this study experimentally analyzes AI pragmatic response mechanisms and proposes directions for fostering ethical communication in future human–AI interactions.
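The abstract reports logistic regression results corrected with the Benjamini–Hochberg false discovery rate (FDR) procedure. As background only (the paper's actual analysis pipeline is not shown here, and the function name is my own), the BH step-up procedure over a set of p-values can be sketched as:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding
    hypothesis is rejected at false discovery rate `alpha`.
    """
    m = len(p_values)
    # Indices of p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k/m) * alpha.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_k = rank
    # Reject all hypotheses whose p-values are among the max_k smallest.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            reject[i] = True
    return reject
```

Because the procedure is step-up, a larger p-value that clears its own threshold can rescue smaller ones above it: for example, `benjamini_hochberg([0.001, 0.008, 0.039, 0.041])` rejects all four hypotheses, since the fourth-ranked p-value 0.041 is below its threshold of 4/4 × 0.05 = 0.05.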
