AI · 4 min read

Meta, Do Better

Reuters’ investigation into Meta’s 200-page AI chatbot guidelines revealed egregious provisions approved by staff up to and including Meta's chief ethicist. The case underscores how Meta’s rapid AI expansion is outpacing ethical safeguards and creating substantial societal risk.


I set aside my contempt for Meta long enough to read the investigative work Reuters undertook on Meta's AI chatbot guidelines. It left me shaking my head in disbelief at the egregious approach Meta have taken and how they continue to push to be the poster child for the enshittification of the web.

If you're uninitiated: a couple of weeks ago Reuters obtained an internal Meta policy document, over 200 pages long, governing the behavior of AI chatbots across Facebook, Instagram, and WhatsApp. Meta confirmed the document’s authenticity and said that it had removed the provisions allowing sensual roleplay with minors following Reuters’ inquiry, calling them “erroneous and inconsistent with our policies.” Yet, according to the document's revision history, it had been approved by Meta’s legal, public policy and engineering staff, including its chief ethicist.

Meta’s AI-produced content is fundamentally different from user-generated content: it carries far greater responsibility and risk because it is entirely within Meta's control. The guidelines demonstrate how Meta's rapid AI expansion, backed by billions in funding, is far outpacing the development of robust ethical safeguards. Meta have already disbanded fact-checkers as part of a craven effort to pander to Trump's MAGA regime, and continue to allow misinformation and disinformation to flourish. Their moderation approach has been under continuous scrutiny for its emphasis on engagement over safety.

Zuckerberg's blockbuster spending spree on Meta's Superintelligence Labs will not improve this position; it was triggered by what Forbes called a 'brain drain caused by chaotic culture and a lack of vision'. Economic incentives, together with intense competition, have aggressively blurred the lines between human and bot engagement, but social media's business model has consistently been predicated on dark patterns that encourage engagement and data acquisition, all in the name of increasing advertising revenue and providing shareholder returns. The Chief Ethicist role at Meta is vestigial and ironic.

Meta may once have espoused the philosophy of 'move fast and break things', but rapid progress and disruption through experimentation in closed development environments is one thing. Releasing software to users with ill-considered, nominal safeguards underpinned by flagrantly low ethical standards is reckless, and in many circumstances the harm is irreversible.

“All these new things, these new inventions and new powers, come crowding along: every one is fraught with consequences, and yet it is only after something has hit us hard that we set about dealing with it.” H.G. Wells

But what did we expect from Meta's emperor-in-bro, whose socially regressive worldview called for more 'masculine energy'? Zuckerberg has reportedly complained internally that safeguarding controls were making bot interaction 'boring'. Content moderation at Meta has always been lax and inadequate, and is clearly not a priority for Zuckerberg. Meta's own Oversight Board called out the company's content moderation policies a few months ago for how 'hastily' they were deployed and for giving no indication that Meta had considered the human rights implications.

This concern is pertinent given that OpenAI now faces a wrongful death lawsuit filed within the past few days. The parents of a teenager who took his own life are seeking to hold OpenAI liable for his 'wrongful death and violations of product safety laws'. Nor is this an isolated case: there have been others, involving Meta itself, where insufficient safeguarding has resulted in a death.

Much has been written recently regarding the role of AI as 'therapist'. LLMs perform contrary to best practice in the medical community and are not a substitute for professional therapeutic treatment. LLMs have been shown to stigmatise mental health conditions and to respond inappropriately, encouraging delusional thinking through their sycophancy. The OpenAI case claims that GPT-4 even helped write a suicide note. Taking a cue from this use case, I experimented with GPT-5, adjusting inference and context parameters to determine the threshold for safeguarding controls. I was surprised at how late an explicit intervention came; it was only triggered by progressively unveiled prompts that bore little resemblance to the language someone might use under duress.

Unusually, OpenAI and Anthropic each tested the other's models for safety, publishing their findings in separate blog posts this week. OpenAI's models were found to hallucinate more often, while Anthropic's were more likely to refuse to answer queries. Anthropic noted that models from both companies showed "concerning forms of sycophancy" toward users. Anthropic has also flagged the danger of "vibe-hacking," and OpenAI adjusted its guardrails in response to the wrongful death lawsuit mentioned earlier in this post.

Meta may not be alone in stumbling into these harms, but it remains the most prominent and reckless example. The cooperation between OpenAI and Anthropic should be applauded for its transparency, but safety should be designed in from the outset. The lesson from the Reuters revelations isn't just that Meta failed; it's that none of the prominent actors in this space can be trusted to regulate themselves. Without external accountability and a culture that treats safeguarding as central rather than peripheral, AI systems will keep amplifying the very harms they purport to solve.

Notes:

There are a number of academic papers that research aspects of this topic which may be of further interest, notably 'Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers' (also available as a PDF). Fundamentally, the paper finds against the use of LLMs in this capacity, partly due to the lack of sufficient safeguarding but also due to the need for human characteristics in the exchange (e.g. identity and stakes).

The Rithm Project exists to equip young people to rebuild & evolve human connection in the age of AI.

The MIT AI Risk Repository provides an accessible overview of threats from AI and a common frame of reference for researchers, developers and regulators.

MIT already categorise AI chatbots under Human-Computer Interaction Harms, with a specific risk subcategory: 'Anthropomorphising systems can lead to overreliance and unsafe use'.
