PartnerinAI

Erdős problem solved with LLMs? What the experiment shows

A closer look at the Hacker News Erdős problem LLM experiment, and whether LLMs can solve advanced math research problems.

📅 April 27, 2026 · 6 min read · 📝 1,268 words

⚡ Quick Answer

The Erdős problem LLM experiment is interesting because it tests whether language models can assist with hard mathematical reasoning, not just explain known results. But one experiment does not prove LLMs can independently solve advanced math research problems; it shows they may be useful collaborators in tightly guided settings.

If an Erdős problem fell to an LLM, we'd be looking at a history-book headline. That's why a Show HN post about yet another run at an Erdős-style problem with language models drew eyes fast. But math doesn't care about confidence. It asks for proof, structure, and no hand-waving at all. That's the real test.

What the Hacker News Erdős problem LLM experiment is really testing

The Hacker News Erdős problem LLM experiment is really asking whether large language models can add anything to mathematical discovery when the pressure rises. Big claim. In practice, the question is tighter: can a model produce conjectures, proof sketches, counterexamples, or search directions that still look useful after an expert tears through them? That's a high bar. Paul Erdős became famous not only for sheer output, but for posing problems that reward deep combinatorial insight, so an Erdős-style challenge makes a smart stress test for LLMs. The distinction matters because language models often do well on textbook-flavored math, then stumble when they have to build a fresh proof. Research from OpenAI, Google DeepMind, and independent groups like Epoch AI repeatedly suggests that benchmark wins in math don't transfer cleanly to open-ended research work. So the Show HN post matters less as a victory lap and more as a probe into where current systems bend, and where they simply snap.

Can LLMs solve advanced math research problems on their own

No, LLMs probably can't solve advanced math research problems on their own in a dependable way right now. They can produce arguments that look elegant on the page, but advanced mathematics punishes even tiny logical holes, and models still create those holes far too often. That's the whole issue. Google DeepMind's AlphaGeometry and its related formal reasoning work make clear that performance improves when systems combine symbolic search with learned components, instead of leaning on language generation by itself. That should cool the hype around pure-chatbot discovery claims. A mathematician reading an LLM proof sketch has to check every step, test edge cases, and often rebuild the argument from scratch anyway. So the model may save time, or it may just move the work around. We'd argue the current generation looks more like an imaginative but shaky research assistant than an autonomous theorem prover.
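To make the "learned proposer plus symbolic checker" idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: `propose_candidates` is a hypothetical stand-in for a model (it just returns hard-coded guesses), and the toy task is finding a closed form for 1 + 3 + ... + (2n - 1). It is not how AlphaGeometry or any real system works.

```python
from typing import Callable

Candidate = tuple[str, Callable[[int], int]]


def propose_candidates() -> list[Candidate]:
    """Hypothetical stand-in for a model: guessed closed forms for 1 + 3 + ... + (2n - 1)."""
    return [
        ("n * (n + 1)", lambda n: n * (n + 1)),
        ("2 * n - 1", lambda n: 2 * n - 1),
        ("n ** 2", lambda n: n ** 2),
    ]


def sum_of_first_odds(n: int) -> int:
    """Ground truth, computed directly from the definition."""
    return sum(2 * k - 1 for k in range(1, n + 1))


def surviving_candidates(limit: int = 200) -> list[str]:
    """Keep only the guesses that match the definition for every n in 1..limit."""
    survivors = []
    for name, formula in propose_candidates():
        if all(formula(n) == sum_of_first_odds(n) for n in range(1, limit + 1)):
            survivors.append(name)
    return survivors


print(surviving_candidates())  # ['n ** 2']
```

Note what the checker does and doesn't buy you. It throws out the wrong guesses cheaply, but agreeing on 200 cases is still not a proof. Someone, or some formal tool, has to close that gap.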

Why LLMs on Erdos problems expose the gap between fluency and proof

LLMs on Erdős problems expose the gap between fluency and proof because mathematical writing can sound persuasive long before it becomes correct. Language models are tuned to continue patterns, not to keep an airtight deductive state intact across long chains of reasoning. And that design choice shows up fast in combinatorics and number theory. Terence Tao has written and spoken about using language models for exploratory assistance while warning that verification still matters, which lines up with what many working mathematicians think. Here's the thing: a plausible lemma isn't a proved lemma. An AI experiment built around an Erdős problem may offer a fresh angle or cut down on literature-review work, but the standard at the end stays exactly the same. Math is brutally democratic. Either the argument holds, or it doesn't.

How language models for mathematical discovery may still become useful

Language models for mathematical discovery may still turn out useful as idea generators, literature guides, and formalization aides. That's the lane with real traction. It's a smaller claim, yes, but also a more believable one. Projects tied to Lean, Isabelle, and other proof assistants already hint at a future where models suggest steps while formal systems check validity. We'd argue that's far more credible than raw chat output standing alone. Google DeepMind's AlphaProof work and ongoing theorem-proving efforts point the same way: hybrid systems tend to outperform freeform text-only methods on hard reasoning tasks. So if you're asking whether the Show HN experiment matters, we'd say yes, but not because it proves full autonomy. It matters because it points to a collaborative workflow where LLMs widen the search space and humans or formal tools close it. That's worth watching.
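For a feel of what "formal systems check validity" means, here is a tiny Lean 4 sketch. It assumes only Lean's core library (no mathlib), and the theorem name `add_comm_example` is ours, chosen for illustration. The statements are deliberately trivial; the point is that the checker accepts them only because a complete proof is supplied.

```lean
-- Lean 4, core library only (no mathlib import assumed).

-- A statement together with a proof term the kernel can check.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, accepted because both sides compute to 1681.
-- Changing the right-hand side to 41 * 40 would make this fail to check.
example : 40 * 40 + 40 + 41 = 41 * 41 := rfl
```

In a hybrid workflow, a model could suggest the proof term or tactic, and a checker like this would have the final say. A suggestion that sounds right but is wrong simply doesn't compile.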

Key Statistics

The MATH benchmark paper reported that earlier large language models often scored below expert-human levels on competition-style mathematics, even when they sounded confident. That gap helps explain why open-ended research math remains a much steeper challenge than polished benchmark prompts.
Google DeepMind reported in 2024 that AlphaGeometry solved a majority of International Mathematical Olympiad geometry problems from a curated benchmark set. The result matters because it came from a hybrid reasoning system, not from freeform language generation alone.
OpenAI and other labs have shown strong gains on GSM8K and similar math benchmarks, where top systems now exceed 90% under some prompting setups. Those numbers are impressive, but they mostly reflect structured problem solving rather than original mathematical discovery.
Formal proof ecosystems such as Lean’s mathlib have grown to tens of thousands of theorems contributed by a global community. That scale points to a realistic path for AI in mathematics: models that work alongside formal systems and human experts, not apart from them.

Key Takeaways

  • The Hacker News Erdős problem LLM experiment offers more signal than spectacle.
  • LLMs on Erdős problems can assist reasoning, but they still need heavy verification.
  • Advanced math research exposes the limits of fluent text generation very quickly.
  • Mathematical discovery needs proof, not persuasion, and models often blur the two.
  • The real value likely sits in collaboration workflows, not solo AI breakthroughs.