⚡ Quick Answer
The Erdős problem LLM experiment is interesting because it tests whether language models can assist with hard mathematical reasoning, not just explain known results. But one experiment does not prove LLMs can independently solve advanced math research problems; it shows they may be useful collaborators in tightly guided settings.
If an Erdős problem fell to an LLM, we'd be looking at a history-book headline. That's why a Show HN post about yet another run at an Erdős-style problem with language models drew eyes fast. But math doesn't care about confidence. It asks for proof, structure, and no hand-waving at all. That's the real test.
What the Hacker News Erdős problem LLM experiment is really testing
The Hacker News Erdős problem LLM experiment is really asking whether large language models can add anything to mathematical discovery when the pressure rises. In practice, the question is tighter: can a model produce conjectures, proof sketches, counterexamples, or search directions that still look useful after an expert tears through them? That's a high bar. Paul Erdős became famous not only for sheer output, but for posing problems that reward deep combinatorial insight, so an Erdős-style challenge makes a smart stress test for LLMs. The distinction matters because language models often do well on textbook-flavored math, then stumble when they have to build a fresh proof. Work from OpenAI, Google DeepMind, and research organizations such as Epoch AI suggests that benchmark wins in math don't transfer cleanly to open-ended research work. So the Show HN post matters less as a victory lap and more as a probe into where current systems bend, and where they simply snap.
Can LLMs solve advanced math research problems on their own
No, LLMs probably can't solve advanced math research problems on their own in a dependable way right now. They can produce arguments that look elegant on the page, but advanced mathematics punishes even tiny logical holes, and models still create those holes far too often. That's the whole issue. Google DeepMind's AlphaGeometry and its related formal reasoning work suggest that performance improves when systems combine symbolic search with learned components, instead of leaning on language generation by itself. That should cool the hype around pure-chatbot discovery claims. A mathematician reading an LLM proof sketch has to check every step, test edge cases, and often rebuild the argument from scratch anyway. So the model may save time. Or it may just move the work around. We'd argue the current generation looks more like an imaginative but shaky research assistant than an autonomous theorem prover.
Why LLMs on Erdős problems expose the gap between fluency and proof
LLMs on Erdős problems expose the gap between fluency and proof, because mathematical writing can sound persuasive long before it becomes correct. Language models are trained to continue patterns, not to keep an airtight deductive state intact across long chains of reasoning. And that training objective shows up fast in combinatorics and number theory. Terence Tao has written and spoken about using language models for exploratory assistance while warning that verification still matters, which lines up with what many working mathematicians think. A plausible lemma isn't a proved lemma. An AI experiment built around an Erdős problem may offer a fresh angle or shrink literature-review work, but the standard at the end stays exactly the same. Math is brutally democratic. Either the argument holds, or it doesn't.
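To make the "plausible is not proved" point concrete, here is a minimal Python sketch using a classic textbook example. It is not taken from the Show HN experiment; the polynomial and the helper names `is_prime` and `first_counterexample` are chosen here purely for illustration. The claim "n² + n + 41 is always prime" survives forty consecutive checks and then fails.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test; slow but exact for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(limit: int) -> Optional[int]:
    """Return the smallest n in [0, limit) where n*n + n + 41 is NOT prime,
    or None if the pattern holds on the whole range."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None


# The pattern looks airtight on the first forty inputs...
print(first_counterexample(40))   # None
# ...and breaks on the very next one: 40*40 + 40 + 41 = 1681 = 41 * 41.
print(first_counterexample(100))  # 40
```

A finite check like this can only refute a claim, never prove it, which is why fluent evidence and an actual proof are different things.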
How language models for mathematical discovery may still become useful
Language models for mathematical discovery may still turn out to be useful as idea generators, literature guides, and formalization aides. That's the lane with real traction. It's a smaller claim, yes, but also a more believable one. Projects tied to Lean, Isabelle, and other proof assistants already hint at a future where models suggest steps while formal systems check validity. We'd argue that's far more credible than raw chat output standing alone. Google DeepMind's AlphaProof work and ongoing theorem-proving efforts point the same way: hybrid systems tend to outperform freeform text-only methods on hard reasoning tasks. So if you're asking whether the Show HN experiment matters, we'd say yes, but not because it proves full autonomy. It matters because it points to a collaborative workflow where LLMs widen the search space and humans or formal tools close it. That's worth watching.
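The "model proposes, checker decides" workflow can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `stub_proposer` is a hard-coded stand-in for model output (no real LLM API is called), and Sidon sets, a topic Erdős worked on, are used only because they have a simple exact check. The problem from the Show HN post is not represented.

```python
from itertools import combinations_with_replacement
from typing import Callable, Iterable, List, Optional, Sequence


def is_sidon(candidate: Sequence[int]) -> bool:
    """Exact verifier: True iff all sums a + b (a <= b) are pairwise distinct."""
    if len(set(candidate)) != len(candidate):
        return False  # repeated elements are not allowed
    seen = set()
    for a, b in combinations_with_replacement(sorted(candidate), 2):
        s = a + b
        if s in seen:
            return False  # two different pairs share the same sum
        seen.add(s)
    return True


def stub_proposer() -> Iterable[List[int]]:
    """Hypothetical stand-in for an LLM: yields confident-looking guesses."""
    yield [1, 2, 3, 4]  # wrong: 1 + 4 == 2 + 3
    yield [1, 3, 5, 7]  # wrong: 1 + 7 == 3 + 5
    yield [1, 2, 5, 7]  # correct: all ten sums are distinct


def first_verified(
    candidates: Iterable[List[int]],
    verifier: Callable[[Sequence[int]], bool],
) -> Optional[List[int]]:
    """Return the first candidate the verifier accepts, or None."""
    for candidate in candidates:
        if verifier(candidate):
            return candidate
    return None


print(first_verified(stub_proposer(), is_sidon))  # [1, 2, 5, 7]
```

The design choice worth noticing is that the proposer is never trusted: it only widens the search, and acceptance depends entirely on the deterministic check. In a research setting that check would be a proof assistant or a human expert rather than a short function.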
Key Takeaways
- ✓ The Hacker News Erdős problem LLM experiment offers more signal than spectacle.
- ✓ LLMs on Erdős problems can assist reasoning, but they still need heavy verification.
- ✓ Advanced math research exposes the limits of fluent text generation very quickly.
- ✓ Mathematical discovery needs proof, not persuasion, and models often blur the two.
- ✓ The real value likely sits in collaboration workflows, not solo AI breakthroughs.


