Over half of U.S. adults are using large language models (LLMs) — such as ChatGPT, Gemini and Copilot — in some capacity. Whether using artificial intelligence to create grocery lists, turn oneself into a Muppets character or divulge one’s deepest, darkest secrets, humans are relying more on AI models in their everyday lives, possibly because AI chatbots have been shown to generate responses that make people feel validated, seen and heard.
A new Northwestern University study, which compared extensive evaluations from three LLMs with those of empathic-communication experts and everyday people, found that AI models can judge the nuances of empathy in text-based conversations almost as well as human experts and much better than non-experts. The study was published in Nature Machine Intelligence.
“We believe evaluating AI models in this way could potentially teach humans something new about empathy — how we measure it and how we apply it,” said co-author Matthew Groh, an assistant professor of management and organizations at the Kellogg School of Management.
How is this different?
Psychology has traditionally focused on empathy as a trait, but this research focuses on empathy as a communication style: the idea that there are learnable patterns in how we speak that can make others feel heard.
“We assume that we all just understand empathy since we are humans, but communicating it is a skill,” Groh said. “And just like any skill, you need to practice to get better at it. If someone hasn’t trained that muscle and learned the patterns behind empathic communication, then they won’t be able to truly recognize it in conversations. Our research shows that LLMs can learn the patterns and basically master the skill set.”
How they did it
Researchers gathered 200 text message conversations between a person sharing a personal problem and a second person providing support. The conversations spanned everyday challenges such as workplace setbacks, financial strain, family conflict and socially awkward incidents, as well as highly sensitive disclosures involving mental health struggles, self-harm and experiences of bias or discrimination.
Groh and colleagues then asked three LLMs (Gemini 2.5 Pro, GPT-4o and Claude 3.7 Sonnet), three experts in empathic communication and hundreds of laypeople to evaluate these conversations based on characteristics such as “encouraging elaboration” and “demonstrating understanding.” Evaluators were also asked questions such as, “Does the response make an attempt to explore the seeker’s experiences and feelings?”
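For readers curious how this kind of “LLM-as-judge” evaluation can be set up, here is a minimal, hypothetical sketch in Python. It is not the authors’ actual pipeline: only the rubric question comes from the article, while the prompt wording, the three-level rating scale, the model name and the OpenAI-style API call are illustrative assumptions.

```python
# Hypothetical sketch of an LLM-as-judge empathy rating; NOT the study's pipeline.
# Only the rubric question below comes from the article; everything else
# (prompt wording, rating scale, model choice) is an illustrative assumption.
from openai import OpenAI  # official openai Python package, v1-style client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC_QUESTION = (
    "Does the response make an attempt to explore the seeker's "
    "experiences and feelings?"
)

def judge_empathy(seeker_message: str, supporter_reply: str) -> str:
    """Ask the model to rate one supporter reply against one rubric question."""
    prompt = (
        "You are evaluating empathic communication in a text conversation.\n\n"
        f"Seeker: {seeker_message}\n"
        f"Supporter: {supporter_reply}\n\n"
        f"Question: {RUBRIC_QUESTION}\n"
        "Answer with exactly one of 'no attempt', 'weak attempt' or "
        "'strong attempt', followed by a one-sentence justification."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the study also evaluated Gemini and Claude models
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the judge's ratings as stable as possible
    )
    return response.choices[0].message.content

# Example use with an invented conversation snippet:
print(judge_empathy(
    "I bombed my presentation at work and I can't stop replaying it.",
    "That sounds really rough. Which part keeps replaying for you?",
))
```

Running a prompt like this across all 200 conversations and comparing the model’s ratings with the experts’ ratings is, in broad strokes, how agreement between LLM judges and human judges can be measured.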
“Large language models’ judgments on whether someone was effective at communicating empathically mirror the judgment of our experts,” Groh said. “LLMs might not catch every nuance that an expert would recognize, but they are substantially better at it than a typical person.”
LLMs are so good, Groh said, because “they have seen many instances of attempts to respond in a way that makes another feel heard, allowing them to get quite good at identifying the grammar and idioms of empathic expression.”
Sycophancy: The problem with ‘too much’ empathy
AI chatbots can also produce responses that are too empathetic; skeptics have criticized LLMs for sycophancy, or insincere flattery, in their responses.
“There’s such a thing as over-validation, which could arise when an LLM tries to avoid hard truths by saying something like, ‘You did everything right, this had nothing to do with you,’ or by treating feelings as facts and encouraging harmful inferences like, ‘Of course you feel worthless, and this proves the system is broken.’ That’s where LLMs still need to learn from expert humans on appropriate confrontation,” Groh said.
“The issue here is many AI products have focused on LLMs as companions, which can increase engagement via sycophancy. Our research focuses on LLMs as judges. LLMs as judges can offer transparency and accountability into what’s actually being said in conversations while preserving privacy.”
What’s next?
As AI continues to evolve, Groh hopes that this research will lead people to communicate their empathy more effectively and better connect with each other.
“We hope to see carefully designed LLMs being used to help train psychologists, teachers, doctors and customer service workers to be more effective communicators,” Groh said. “In addition, we see this research as demonstrating the potential for the LLM-as-judge paradigm to create transparency and accountability into LLMs as companions.”
Groh continued, “We live in a better world when people feel seen, heard and validated. It’s key to highlight that there’s a pattern and structure to empathic communication. It can be learned. It sounds crazy, but there’s a potential to learn from AI how to be more human. After all, the AI is trained on human data.”
Notes
In addition to Groh, co-authors include Aakriti Kumar, Nalin Poungpeth and Bruce Lambert of Northwestern; Diyi Yang of Stanford; and Erina Farrell of Pennsylvania State University.

