Quick Answer
The Claude Code benchmark comparing dynamic and static languages points to a clear pattern: dynamic languages like Ruby, Python, and JavaScript finished faster and cheaper than statically typed alternatives across a 600-run test. Adding external type checkers narrowed the gap somewhat, but dynamic stacks still kept much of their cost and speed edge.
The Claude Code benchmark dynamic vs static languages debate finally has hard numbers attached. And they're striking. Across 600 runs, Ruby committer Yusuke Endoh tested Claude Code in 13 languages by asking it to build a simplified Git, then tracked speed and cost for each run. The top-line result wasn't subtle. Ruby, Python, and JavaScript finished fastest and cheapest, a result that should push plenty of enterprise teams to question the old assumption that stricter typing always gives AI coding agents a cleaner path.
What does the Claude Code benchmark dynamic vs static languages test actually show?
The Claude Code benchmark of dynamic vs static languages points to a plain result: dynamic languages finished faster and cost less across a fairly large 600-run sample. Endoh's setup carries real weight because he didn't fire off one toy prompt and call it research; he ran Claude Code through the same simplified Git implementation task in 13 languages, which gives the comparison more heft than most benchmark chatter online. Ruby, Python, and JavaScript reportedly grouped together at about $0.36 to $0.39 per run, making them the cheapest options on raw cost. Statically typed languages, by comparison, often came in at 1.4x to 2.6x that cost, a spread that suggests Claude Code burns more turns, tokens, or correction cycles when a strict type system stays in the loop. That matches what many developers already sense from daily work: AI agents tend to move faster when they can sketch intent first and tidy up less formal structure later, with Ruby as the concrete example here. The benchmark doesn't prove dynamic languages always win at software engineering, but it does surface a specific pattern for agent-assisted coding.
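The reported spread is easy to turn into concrete numbers. A minimal sketch in Python (the $0.36–$0.39 per-run figures and the 1.4x–2.6x multipliers come from the benchmark summary; the helper name is ours):

```python
def static_cost_range(dynamic_cost: float,
                      low_mult: float = 1.4,
                      high_mult: float = 2.6) -> tuple:
    """Project what a statically typed run would cost at the reported multipliers."""
    return (round(dynamic_cost * low_mult, 2),
            round(dynamic_cost * high_mult, 2))

# Cheapest reported dynamic-language run: ~$0.36
low, high = static_cost_range(0.36)
print(low, high)  # 0.5 0.94
```

In other words, even the low end of the reported multiplier range turns a 36-cent run into a 50-cent one, and the high end nearly triples it.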
Why are Ruby, Python, and JavaScript the best programming language for Claude Code in this benchmark?
Ruby, Python, and JavaScript look like the best programming languages for Claude Code in this benchmark because they cut friction during iterative code generation. Claude Code seems to do its best work when it can read a task, draft code, patch files, and try again without constantly appeasing a compiler or dragging around verbose type annotations, and that shifts the economics in a real way. A simplified Git implementation requires repeated file edits, command execution, and quick recovery after mistakes, and dynamic languages usually allow shorter programs and more permissive in-between states. Ruby stands out as the named example because Endoh, a long-time Ruby committer, picked a language known for expressive syntax and concise standard-library workflows; Python and JavaScript share much of that flexibility, just with different flavors. Put simply: when an AI coding agent behaves like an eager but imperfect junior engineer, languages that tolerate partial correctness often let it converge faster.
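The benchmark task itself isn't reproduced publicly in detail, but its flavor is easy to sketch. A minimal Python version of a simplified-Git core, the kind of content-addressable blob store an agent would iterate on (class and method names are illustrative, not Endoh's actual task spec):

```python
import hashlib


class TinyObjectStore:
    """Content-addressable storage, the core trick behind Git blobs."""

    def __init__(self):
        self._objects = {}

    def hash_object(self, data: bytes) -> str:
        # Git hashes "blob <size>\0<content>"; we mimic that header format.
        header = f"blob {len(data)}\0".encode()
        oid = hashlib.sha1(header + data).hexdigest()
        self._objects[oid] = data
        return oid

    def cat_file(self, oid: str) -> bytes:
        return self._objects[oid]


store = TinyObjectStore()
oid = store.hash_object(b"hello\n")
print(oid)  # matches `echo hello | git hash-object --stdin`
```

Short, permissive code like this, with no type declarations standing between draft and run, is exactly the loop the dynamic languages make cheap.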
Do type checkers vs dynamic languages Claude Code results change the story?
Type checker results make the picture more interesting, but they don't reverse it. Endoh found that adding type checkers to dynamic languages improved outcomes enough to matter, which suggests teams don't have to choose between speed and safety in the stark way old language arguments imply. That's a useful middle ground: tools like TypeScript, Sorbet for Ruby, and mypy for Python can catch entire categories of agent mistakes while keeping much of the low-friction workflow that lets Claude Code move quickly. Still, the benchmark summary says those additions didn't erase the dynamic-language edge on cost, so the main savings probably come from simpler generation loops, not from skipping type analysis altogether. The practical read for engineering leaders: reach for lightweight static analysis where it earns its keep, but don't assume a fully static toolchain gives AI agents the best throughput. TypeScript is the obvious example here.
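The kind of mistake a checker catches is mundane but common in agent output. A hedged Python illustration (the function and values are ours, not from the benchmark) of a signature a tool like mypy would enforce:

```python
def format_run_cost(language: str, cost_usd: float) -> str:
    """Render a per-run cost line from the benchmark's reported figures."""
    return f"{language}: ${cost_usd:.2f} per run"


print(format_run_cost("Ruby", 0.36))  # Ruby: $0.36 per run

# An agent slip like format_run_cost("Ruby", "0.36") is flagged by mypy
# before the code ever runs; without annotations it would only surface
# as a runtime formatting error, costing another correction cycle.
```

Catching that class of error statically trims retry loops without forcing the whole codebase into a heavyweight type system, which is consistent with the middle-ground result Endoh reports.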
How should teams read the Claude Code 13 language benchmark results in real engineering work?
Teams should read the Claude Code 13-language benchmark results as a directional signal for AI-assisted development, not as a universal ranking of programming languages. The test used one agent, one benchmark style, and one task family, so a payments backend in Java or a systems tool in Rust could still win on durability, auditability, or runtime constraints. But benchmark design matters less when the cost gap gets this wide: if Ruby, Python, and JavaScript really land near $0.36 to $0.39 per run while some static languages cost more than double, procurement and platform teams should pay attention, especially at scale. GitHub Copilot, Cursor, and Anthropic's own Claude Code all rely on iterative code-edit loops where token spend compounds fast. The deepest lesson here isn't about syntax preference; it's about operational efficiency in agentic workflows, and that's the part executives will care about first. For teams that want cheaper AI coding agents on dynamic languages, this benchmark offers one of the clearest public data points so far.
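At fleet scale the per-run gap compounds quickly. A back-of-envelope sketch (the run volume is a made-up assumption; only the per-run price and multiplier echo the benchmark's reported ranges):

```python
RUNS_PER_DAY = 1_000        # hypothetical team-wide agent invocations
DYNAMIC_COST = 0.38         # midpoint of the reported $0.36-$0.39 range
STATIC_MULTIPLIER = 2.0     # inside the reported 1.4x-2.6x spread

dynamic_annual = RUNS_PER_DAY * DYNAMIC_COST * 365
static_annual = dynamic_annual * STATIC_MULTIPLIER

print(f"dynamic: ${dynamic_annual:,.0f}/yr  static: ${static_annual:,.0f}/yr")
# dynamic: $138,700/yr  static: $277,400/yr
```

Even at this modest hypothetical volume, the language choice is a six-figure annual line item, which is why the result lands on procurement desks and not just in language flame wars.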
Key Takeaways
- Ruby, Python, and JavaScript had the cheapest Claude Code runs in the benchmark.
- Statically typed languages often cost 1.4x to 2.6x more per task.
- Type checkers improved dynamic-language reliability without removing their price edge.
- The benchmark used a simplified Git implementation across 13 programming languages.
- For agentic coding, fewer tokens and simpler edit loops appear to make the difference.




