DeepMind’s latest: An AI for handling mathematical proofs

Just like AlphaZero, AlphaProof in most cases used two main components. The first was a huge neural net with a few billion parameters that learned to work in the Lean environment through trial and error. It was rewarded for each proven or disproven statement and penalized for each reasoning step it took, which was a way of incentivizing short, elegant proofs.

It was also trained to use a second component, which was a tree search algorithm. This explored all possible actions that could be taken to push the proof forward at each step. Because the number of possible actions in mathematics can be near infinite, the job of the neural net was to look at the available branches in the search tree and commit computational budget only to the most promising ones.

After a few weeks of training, the system could score well on most math competition benchmarks based on problems sourced from past high school-level competitions, but it still struggled with the most difficult of them. To tackle these, the team added a third component that hadn’t been in AlphaZero. Or anywhere else.

Spark of humanity

The third component, called Test-Time Reinforcement Learning (TTRL), roughly emulated the way mathematicians approach the most difficult problems. The learning part relied on the same combination of neural nets with search tree algorithms. The difference came in what it learned from. Instead of relying on a broad database of auto-formalized problems, AlphaProof working in the TTRL mode started its work by generating an entirely new training dataset based on the problem it was dealing with.

The process involved creating countless variations of the original statement, some simplified a little bit more, some more general, and some only loosely connected to it. The system then attempted to prove or disprove them. It was roughly what most humans do when they’re facing a particularly hard puzzle, the AI equivalent of saying, “I don’t get it, so let’s try an easier version of this first to get some practice.” This allowed AlphaProof to learn on the fly, and it worked amazingly well.

What's Hot

You’ll Want to Cancel Your Cloud Account and Buy This 1TB Drive

Netflix lands a deal with MLB to air select live baseball games

This Excellent LG OLED Is Deeply Discounted Before Black Friday

DeepMind’s latest: An AI for handling mathematical proofs

The Verge’s 2025 holiday gift guide: 100 of the best gifts we’ve found

Stargate is coming back to TV, thanks to Prime Video

Why January Ventures is funding underrepresented AI founders

You’ll Want to Cancel Your Cloud Account and Buy This 1TB Drive

PayPal’s blockchain partner accidentally minted $300 trillion in stablecoins

The best AirPods deals for October 2025

How to Disable Some or All AI Features on your Samsung Galaxy Phone

PayPal’s blockchain partner accidentally minted $300 trillion in stablecoins

The best AirPods deals for October 2025

Latest Post

You’ll Want to Cancel Your Cloud Account and Buy This 1TB Drive

Netflix lands a deal with MLB to air select live baseball games

This Excellent LG OLED Is Deeply Discounted Before Black Friday

Subscribe to Updates

What's Hot

DeepMind’s latest: An AI for handling mathematical proofs

Spark of humanity

Related Posts

Subscribe to Updates