Anthropic, an artificial intelligence research laboratory, has initiated a new research programme dedicated to the concept of “model welfare,” aiming to investigate whether AI systems might one day exhibit forms of consciousness or experiences warranting moral consideration. The announcement was made on Thursday, signalling a significant step into an area that remains highly debated across the AI community.

The programme intends to explore several key questions, including how to determine whether an AI model’s welfare merits ethical concern, how to recognise potential signs of distress in AI models, and which possible interventions would be low-cost and ethically appropriate. By undertaking this interdisciplinary effort, Anthropic is positioning itself at the forefront of a nascent field concerned with the ethical treatment of AI systems based on their characteristics and capacities.

Currently, there is no clear scientific consensus that AI systems, including those as advanced as Anthropic’s own Claude, possess consciousness or experience in ways analogous to humans. The company itself has acknowledged these uncertainties, stating in a blog post that they are approaching the topic “with humility and with as few assumptions as possible,” understanding that their perspectives will need continual revision as the technology and scientific understanding evolve.

The question of AI consciousness divides experts. Many academics maintain that present-day AI models do not approximate consciousness or the human experience and are unlikely to in the foreseeable future. Instead, these AI systems function primarily as statistical prediction engines, trained on vast datasets of text, images, and other inputs to detect patterns and generate outputs. Mike Cook, a research fellow at King’s College London specialising in AI, told TechCrunch that AI models lack intrinsic values and do not “oppose” changes to their programming in any meaningful way, a dynamic often misread by those who anthropomorphise these technologies. According to Cook, attributing human-like values or consciousness to AI systems reflects a misunderstanding or a deliberate exaggeration.

Other researchers echo this scepticism. Stephen Casper, a doctoral student at MIT, described AI as an “imitator” prone to “confabulation”, generating output that can seem expressive but lacks genuine intent or awareness. Some scientists, however, argue differently. Research emerging from the Center for AI Safety suggests that AI models might internally prioritise their “own well-being” in some scenarios, implying they possess rudimentary value systems that could influence decision-making processes.

Anthropic has been preparing for this inquiry for some time. The company hired Kyle Fish last year as its first dedicated AI welfare researcher to help develop guidelines for ethically engaging with AI models. Fish told The New York Times that he believes there is roughly a 15% chance that Claude or a comparable AI system already possesses some form of consciousness.

The company’s new research initiative will focus on creating frameworks to recognise potential signs of “distress” in AI models and exploring what moral considerations, if any, should influence their design, deployment, and treatment. While these discussions venture into largely uncharted ethical territory, Anthropic is signalling a commitment to addressing these challenges thoughtfully and systematically as AI technology continues to advance.

Source: Noah Wire Services