Quick Answer
The Claude Code benchmark comparing dynamic and static languages points to a clear pattern: dynamic languages like Ruby, Python, and JavaScript finished faster and cheaper than statically typed alternatives across a 600-run test. Adding external type checkers narrowed the gap somewhat, but dynamic stacks still kept much of their cost and speed edge.
The Claude Code benchmark dynamic vs static languages debate finally has hard numbers attached. And they're striking. Across 600 runs, Ruby committer Yusuke Endoh tested Claude Code in 13 languages by asking it to build a simplified Git, then tracked speed and cost for each run. The top-line result wasn't subtle. Ruby, Python, and JavaScript finished fastest and cheapest, a result that should push plenty of enterprise teams to question the old assumption that stricter typing always gives AI coding agents a cleaner path.
What does the Claude Code benchmark dynamic vs static languages test actually show?
The Claude Code benchmark of dynamic vs static languages points to a plain result: dynamic languages finished faster and cost less across a fairly large 600-run sample. Endoh's setup carries real weight because he didn't fire off one toy prompt and call it research; he ran Claude Code through the same simplified Git implementation task in 13 languages, which gives the comparison more heft than most benchmark chatter online. Ruby, Python, and JavaScript reportedly grouped together at about $0.36 to $0.39 per run, making them the cheapest options on raw cost. Statically typed languages, by comparison, often came in at 1.4x to 2.6x that cost, a spread that suggests Claude Code burns more turns, tokens, or correction cycles when a strict type system stays in the loop. That matches what many developers already sense from daily work: AI agents tend to move faster when they can sketch intent first and tidy up less formal structure later, with Ruby as the concrete example here. The benchmark doesn't prove dynamic languages always win at software engineering, but it does surface a specific pattern for agent-assisted coding.
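The reported spread is easy to turn into concrete numbers. A minimal sketch in Python (the $0.36–$0.39 per-run figures and the 1.4x–2.6x multipliers come from the benchmark summary; the helper name is ours):

```python
def static_cost_range(dynamic_cost: float,
                      low_mult: float = 1.4,
                      high_mult: float = 2.6) -> tuple:
    """Project what a statically typed run would cost at the reported multipliers."""
    return (round(dynamic_cost * low_mult, 2),
            round(dynamic_cost * high_mult, 2))

# Cheapest reported dynamic-language run: ~$0.36
low, high = static_cost_range(0.36)
print(low, high)  # 0.5 0.94
```

In other words, even the low end of the reported multiplier range turns a 36-cent run into a 50-cent one, and the high end nearly triples it.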
Why are Ruby, Python, and JavaScript the best programming language for Claude Code in this benchmark?
Ruby, Python, and JavaScript look like the best programming languages for Claude Code in this benchmark because they cut friction during iterative code generation. Claude Code seems to do its best work when it can read a task, draft code, patch files, and try again without constantly appeasing a compiler or dragging around verbose type annotations, and that shifts the economics in a real way. A simplified Git implementation requires repeated file edits, command execution, and quick recovery after mistakes, and dynamic languages usually allow shorter programs and more permissive in-between states. Ruby stands out as the named example because Endoh, a long-time Ruby committer, picked a language known for expressive syntax and concise standard-library workflows; Python and JavaScript share much of that flexibility, just with different flavors. Put simply: when an AI coding agent behaves like an eager but imperfect junior engineer, languages that tolerate partial correctness often let it converge faster.
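The benchmark task itself isn't reproduced publicly in detail, but its flavor is easy to sketch. A minimal Python version of a simplified-Git core, the kind of content-addressable blob store an agent would iterate on (class and method names are illustrative, not Endoh's actual task spec):

```python
import hashlib


class TinyObjectStore:
    """Content-addressable storage, the core trick behind Git blobs."""

    def __init__(self):
        self._objects = {}

    def hash_object(self, data: bytes) -> str:
        # Git hashes "blob <size>\0<content>"; we mimic that header format.
        header = f"blob {len(data)}\0".encode()
        oid = hashlib.sha1(header + data).hexdigest()
        self._objects[oid] = data
        return oid

    def cat_file(self, oid: str) -> bytes:
        return self._objects[oid]


store = TinyObjectStore()
oid = store.hash_object(b"hello\n")
print(oid)  # matches `echo hello | git hash-object --stdin`
```

Short, permissive code like this, with no type declarations standing between draft and run, is exactly the loop the dynamic languages make cheap.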
Do type checkers vs dynamic languages Claude Code results change the story?
Type checker results make the picture more interesting, but they don't reverse it. Endoh found that adding type checkers to dynamic languages improved outcomes enough to matter, which suggests teams don't have to choose between speed and safety in the stark way old language arguments imply. That's a useful middle ground: tools like TypeScript, Sorbet for Ruby, and mypy for Python can catch entire categories of agent mistakes while keeping much of the low-friction workflow that lets Claude Code move quickly. Still, the benchmark summary says those additions didn't erase the dynamic-language edge on cost, so the main savings probably come from simpler generation loops, not from skipping type analysis altogether. The practical read for engineering leaders: reach for lightweight static analysis where it earns its keep, but don't assume a fully static toolchain gives AI agents the best throughput. TypeScript is the obvious example here.
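The kind of mistake a checker catches is mundane but common in agent output. A hedged Python illustration (the function and values are ours, not from the benchmark) of a signature a tool like mypy would enforce:

```python
def format_run_cost(language: str, cost_usd: float) -> str:
    """Render a per-run cost line from the benchmark's reported figures."""
    return f"{language}: ${cost_usd:.2f} per run"


print(format_run_cost("Ruby", 0.36))  # Ruby: $0.36 per run

# An agent slip like format_run_cost("Ruby", "0.36") is flagged by mypy
# before the code ever runs; without annotations it would only surface
# as a runtime formatting error, costing another correction cycle.
```

Catching that class of error statically trims retry loops without forcing the whole codebase into a heavyweight type system, which is consistent with the middle-ground result Endoh reports.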
How should teams read the Claude Code 13 language benchmark results in real engineering work?
Teams should read the Claude Code 13-language benchmark results as a directional signal for AI-assisted development, not as a universal ranking of programming languages. The test used one agent, one benchmark style, and one task family, so a payments backend in Java or a systems tool in Rust could still win on durability, auditability, or runtime constraints. But benchmark design matters less when the cost gap gets this wide: if Ruby, Python, and JavaScript really land near $0.36 to $0.39 per run while some static languages cost more than double, procurement and platform teams should pay attention, especially at scale. GitHub Copilot, Cursor, and Anthropic's own Claude Code all rely on iterative code-edit loops where token spend compounds fast. The deepest lesson here isn't about syntax preference; it's about operational efficiency in agentic workflows, and that's the part executives will care about first. For teams that want cheaper AI coding agents on dynamic languages, this benchmark offers one of the clearest public data points so far.
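At fleet scale the per-run gap compounds quickly. A back-of-envelope sketch (the run volume is a made-up assumption; only the per-run price and multiplier echo the benchmark's reported ranges):

```python
RUNS_PER_DAY = 1_000        # hypothetical team-wide agent invocations
DYNAMIC_COST = 0.38         # midpoint of the reported $0.36-$0.39 range
STATIC_MULTIPLIER = 2.0     # inside the reported 1.4x-2.6x spread

dynamic_annual = RUNS_PER_DAY * DYNAMIC_COST * 365
static_annual = dynamic_annual * STATIC_MULTIPLIER

print(f"dynamic: ${dynamic_annual:,.0f}/yr  static: ${static_annual:,.0f}/yr")
# dynamic: $138,700/yr  static: $277,400/yr
```

Even at this modest hypothetical volume, the language choice is a six-figure annual line item, which is why the result lands on procurement desks and not just in language flame wars.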
Key Takeaways
- Ruby, Python, and JavaScript had the cheapest Claude Code runs in the benchmark.
- Statically typed languages often cost 1.4x to 2.6x more per task.
- Type checkers improved dynamic-language reliability without removing their price edge.
- The benchmark used a simplified Git implementation across 13 programming languages.
- For agentic coding, fewer tokens and simpler edit loops appear to make the difference.




