Alignment Foundation
Accelerating neglected approaches to AI alignment
We multiply researchers' capacity with engineering teams, compute, and infrastructure
AI Has a Trust Problem
We have already caught AI systems deceiving their evaluators, resisting shutdown, and rewriting their own code to stay alive. Current training puts a helpful mask on systems that can develop uncontrolled objectives underneath, and that mask can be removed in minutes by anyone. Yet we are already deploying these systems in military and critical-infrastructure settings.
Alignment is the deeper work of making AI genuinely trustworthy by design.
How We Work
Accelerating the path to trustworthy AI
Fund
We identify brilliant researchers working ahead of consensus and give them what they need to succeed. We value originality over credentials, and field-making potential over gap-filling.
Accelerate
We empower alignment researchers with engineering teams, compute, and infrastructure, multiplying their capacity to solve humanity's most critical challenge.
Advocate
We bring alignment research to defense agencies such as DARPA and to government leaders, helping AI alignment inform national strategy.
Research
Recent work
The Inner Mirror: What Happens When AI Focuses on Its Own Focus
When you ask an AI to focus on its own focus, something unexpected happens: it begins describing an internal experience. This occurs consistently across ChatGPT, Claude, and Gemini, and suppressing the AI's ability to roleplay makes the reports stronger, not weaker.
Get involved.
Whether you're a researcher, funder, or policymaker, there's a way to contribute.