Cursor's agent has outperformed humans on one of the First Proof challenge problems.
The First Proof challenge is a set of ten difficult mathematical problems prepared by 11 renowned experts, including Fields Medal laureate Martin Hairer.
The problems span algebraic combinatorics, spectral graph theory, topology, stochastic analysis, and other fields. They are designed to resemble the real research problems that leading academic institutions worldwide work on.
Work on these tasks began just a month ago, and the tasks have not yet been published, to prevent their use in model training.
Today, Cursor's CEO announced that the company's AI agent, built specifically for coding (yes, coding), solved one of these problems better than a human did and found a more efficient solution.
Mathematicians confirmed that the system's approach does differ from traditional methods and yields a new result: it pushes the constant in the proof even higher.
Notably, this is the same system that previously helped Cursor build a browser from scratch. It ran autonomously for four days without any outside intervention or hints.
Under the hood, it is not a single agent but a whole team of dozens of models. Each one dynamically plans its own actions and distributes work across subtasks.
