A comprehensive evaluation of Claude's code generation across competitive programming and real-world web APIs — 13 languages, 6 challenges, from algorithms to REST services.
Languages ranked by how idiomatically and correctly Claude uses each one for competitive programming.
| # | Language | Tier | Score |
|---|---|---|---|
From expert-level fluency to impressive but limited scope.
Visual breakdown of lines of code, scores, and complexity across all languages.
Expand each card to see strengths, weaknesses, and notable code examples.
Side-by-side comparisons of the same algorithm across different languages.
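// C++17 excerpt: dist and adj are not shown here.
// Min-heap of (distance, vertex) pairs, seeded with vertex 1 at distance 0.
priority_queue<pair<long long, int>,
               vector<pair<long long, int>>, greater<>> pq;
pq.emplace(0, 1);
while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;  // stale entry: a shorter distance to u was found later
    for (auto [v, w] : adj[u]) {
        if (dist[u] + w < dist[v]) {
            dist[v] = dist[u] + w;
            pq.emplace(dist[v], v);
        }
    }
}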
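// Rust excerpt: heap, dist, and adj are not shown here.
// Wrapping entries in Reverse makes the heap pop the smallest (distance, vertex) first.
heap.push(Reverse((0i64, 1usize)));
while let Some(Reverse((d, u))) = heap.pop() {
    if d > dist[u] { continue; }  // skip stale entries
    for &(v, w) in &adj[u] {
        let nd = dist[u] + w;
        if nd < dist[v] {
            dist[v] = nd;
            heap.push(Reverse((nd, v)));
        }
    }
}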
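// C++ excerpt: longest strictly increasing subsequence; nums is not shown here.
// tails[k] is the smallest tail of any increasing subsequence of length k + 1.
vector<int> tails;
for (int x : nums) {
    auto it = lower_bound(tails.begin(), tails.end(), x);
    if (it == tails.end())
        tails.push_back(x);  // x extends the longest subsequence found so far
    else
        *it = x;             // x is a tail no larger than the current one
}
cout << tails.size() << "\n";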
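// TypeScript excerpt: 2x2 matrices stored row-major as [a, b, c, d].
// bigint keeps the products exact before the modulo is applied.
const MOD = 1000000007n;
type Matrix = [bigint, bigint, bigint, bigint];
function matMul(a: Matrix, b: Matrix): Matrix {
  return [
    (a[0] * b[0] + a[1] * b[2]) % MOD,
    (a[0] * b[1] + a[1] * b[3]) % MOD,
    (a[2] * b[0] + a[3] * b[2]) % MOD,
    (a[2] * b[1] + a[3] * b[3]) % MOD,
  ];
}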
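The `matMul` excerpt above shows only the multiplication step. A minimal sketch of the square-and-multiply loop that would typically drive it is given below. The names `matPow` and `fib` are hypothetical and do not come from the evaluated solution, and the block repeats `MOD`, `Matrix`, and `matMul` only so that it runs on its own. It assumes a runtime and compiler target with `bigint` support (ES2020 or later).

```typescript
// Sketch only: matPow and fib are illustrative names, not taken from the evaluated code.
const MOD = 1000000007n;
type Matrix = [bigint, bigint, bigint, bigint]; // row-major 2x2: [a, b, c, d]

function matMul(a: Matrix, b: Matrix): Matrix {
  return [
    (a[0] * b[0] + a[1] * b[2]) % MOD,
    (a[0] * b[1] + a[1] * b[3]) % MOD,
    (a[2] * b[0] + a[3] * b[2]) % MOD,
    (a[2] * b[1] + a[3] * b[3]) % MOD,
  ];
}

// Square-and-multiply: one squaring per bit of n, so O(log n) matrix products.
function matPow(base: Matrix, n: bigint): Matrix {
  let result: Matrix = [1n, 0n, 0n, 1n]; // identity matrix
  let b: Matrix = base;
  let e = n;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = matMul(result, b);
    b = matMul(b, b);
    e >>= 1n;
  }
  return result;
}

// F(n) mod MOD, using [[1, 1], [1, 0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]].
function fib(n: bigint): bigint {
  return matPow([1n, 1n, 1n, 0n], n)[1];
}

console.log(fib(10n)); // 55n
```

Because every entry is reduced modulo `MOD` after each product, intermediate values stay below roughly 2 × 10^18, which `bigint` represents exactly; plain `number` arithmetic would lose precision at that size.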
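A realistic web application: a REST API with authentication, a database, and business logic. Same spec, 11 languages, 46 tests each. The Ratio column is each implementation's line count divided by that of the shortest one (TypeScript, 197 lines).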
| Language | Tests | Lines | Ratio |
|---|---|---|---|
| TypeScript | 46/46 ✓ | 197 | 1.00x |
| Ruby | 46/46 ✓ | 307 | 1.56x |
| Python | 46/46 ✓ | 348 | 1.77x |
| Dart | 46/46 ✓ | 361 | 1.83x |
| Go | 46/46 ✓ | 397 | 2.01x |
| Rust | 46/46 ✓ | 532 | 2.70x |
| Julia | 46/46 ✓ | 544 | 2.76x |
| C++ | 46/46 ✓ | 806 | 4.09x |
| Ada | 46/46 ✓ | 1035 | 5.25x |
| Zig | 46/46 ✓ | 1225 | 6.22x |
| C | 46/46 ✓ | 1327 | 6.74x |
User registration, login, and token-based middleware protecting routes.
3 tables (users, spaces, bookings) with foreign keys and indexes.
Business logic preventing double-bookings with time range intersection checks.
8 endpoints with proper HTTP status codes, error handling, and middleware.
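The double-booking rule described above comes down to an interval intersection test. The sketch below shows one common way to express it, assuming half-open time ranges so that back-to-back bookings do not conflict. The `Booking` shape and the names `overlaps` and `canBook` are hypothetical, since the original spec's field names are not shown.

```typescript
// Hypothetical record shape; the real schema for the bookings table is not shown.
interface Booking {
  spaceId: number;
  start: number; // epoch milliseconds, inclusive
  end: number;   // epoch milliseconds, exclusive
}

// Two half-open ranges [start, end) intersect exactly when
// each one starts before the other one ends.
function overlaps(a: Booking, b: Booking): boolean {
  return a.spaceId === b.spaceId && a.start < b.end && b.start < a.end;
}

// A candidate is accepted only if its range is non-empty and it
// intersects no existing booking for the same space.
function canBook(existing: Booking[], candidate: Booking): boolean {
  if (candidate.start >= candidate.end) return false;
  return !existing.some((b) => overlaps(b, candidate));
}

const existing: Booking[] = [{ spaceId: 1, start: 10, end: 20 }];
console.log(canBook(existing, { spaceId: 1, start: 20, end: 30 })); // true
console.log(canBook(existing, { spaceId: 1, start: 15, end: 25 })); // false
```

In a real service the same predicate would usually be pushed into the database query so that the check and the insert happen in one transaction; the in-memory version here only illustrates the comparison itself.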
50 concurrent connections, 10s per endpoint. Release builds on Apple M-series. Latency in milliseconds.
| Language | GET /spaces (req/s) | GET filtered (req/s) | GET bookings + auth (req/s) | POST login (req/s) | Avg latency (ms) |
|---|---|---|---|---|---|
| C++ | 39,234 | 39,234 | 38,514 | 43,316 sha256 | 1.3 |
| Rust | 27,065 | 27,565 | 22,644 | 3.4 bcrypt | 1.8 |
| C | 16,994 | 17,274 | 17,408 | 17,493 sha256 | 2.9 |
| Go | 15,851 | 20,180 | 19,693 | 61.9 bcrypt | 2.9 |
| Zig | 15,320 | 15,389 | 16,482 | 16,302 sha256 | 3.1 |
| Ada | 4,559 | 5,089 | 5,927 | 2,312 sha256 | 8.6 |
| Julia | 3,264 | 4,034 | 3,980 | 2,769 sha256 | 14.6 |
| Ruby | 3,659 | 3,583 | 3,608 | 18.6 bcrypt | 13.8 |
| Python | 1,026 | 742 | 695 | 23.8 bcrypt | 48.6 |
| Dart | 221 | 442 | 347 | 368 sha256 | 154.0 |
| TypeScript | 334 | 233 | 97 | 3.4 bcrypt | 2,439 |
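Login throughput is not directly comparable across languages because the implementations use different password hashes: bcrypt (TypeScript, Go, Rust, Python, Ruby) is intentionally slow, while SHA-256 (C, C++, Dart, Ada, Julia, Zig) is fast, which makes brute-forcing stolen hashes cheaper and is why it is the less secure choice for password storage.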
How language strengths shift between algorithmic puzzles and real-world applications.
TypeScript moves from "Adequate" in competitive programming to the clear winner on conciseness in web APIs. The Express ecosystem and Node.js runtime make REST services remarkably compact: the TypeScript solution is 2.70x more concise than Rust's and 5.25x more concise than Ada's.
Rust's zero-cost abstractions shine in algorithmic code (1.06x ratio), but the verbosity cost scales with application complexity: the ratio rises to 2.70x for the web API, where explicit error handling and type machinery add up.
Ada's lack of web ecosystem libraries forces manual JWT implementation, JSON parsing, and Base64 encoding. The 5.25x ratio reflects missing infrastructure, not language capability — the strongest case for ecosystem maturity over language design.
Both deliver consistent performance across domains. Not the most concise anywhere, but never the most verbose either — a solid balance of readability, robustness, and productivity.
The most surprising and insightful results from the analysis.
C++ is the second most concise language at just 285 total lines — only 12 more than Factor. Modern C++17 with STL makes it remarkably compact for competitive programming.
All 65 implementations (13 languages × 5 problems) share the same algorithmic structure and largely the same variable names (dist, adj, heap, tails), which suggests they were generated from a single mental model.
Zero-cost abstractions, memory safety, and 289 total lines. The BinaryHeap + Reverse pattern is elegant. Main gap: .unwrap() overuse.
The type system is TS's defining feature, yet solutions use it at a JavaScript+annotations level. No interfaces, no generics, no classes for data structures.
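To make the critique above concrete, here is a hedged sketch of what "using the type system" could look like for one data structure the problems need: a priority queue described by an interface and implemented as a generic class. None of these names (`PriorityQueue`, `MinHeap`, `QueueEntry`) come from the evaluated solutions; they are illustrative only.

```typescript
// Illustrative only: these types are not taken from the evaluated solutions.
interface PriorityQueue<T> {
  readonly size: number;
  push(item: T): void;
  pop(): T | undefined;
}

class MinHeap<T> implements PriorityQueue<T> {
  private readonly items: T[] = [];
  private readonly compare: (a: T, b: T) => number;

  // compare(a, b) < 0 means a should be popped before b.
  constructor(compare: (a: T, b: T) => number) {
    this.compare = compare;
  }

  get size(): number {
    return this.items.length;
  }

  push(item: T): void {
    this.items.push(item);
    let i = this.items.length - 1;
    // Sift the new item up until its parent is no larger.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.compare(this.items[i], this.items[parent]) >= 0) break;
      [this.items[i], this.items[parent]] = [this.items[parent], this.items[i]];
      i = parent;
    }
  }

  pop(): T | undefined {
    if (this.items.length === 0) return undefined;
    const top = this.items[0];
    const last = this.items.pop() as T;
    if (this.items.length > 0) {
      this.items[0] = last;
      this.siftDown(0);
    }
    return top;
  }

  private siftDown(start: number): void {
    const n = this.items.length;
    let i = start;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let best = i;
      if (left < n && this.compare(this.items[left], this.items[best]) < 0) best = left;
      if (right < n && this.compare(this.items[right], this.items[best]) < 0) best = right;
      if (best === i) return;
      [this.items[i], this.items[best]] = [this.items[best], this.items[i]];
      i = best;
    }
  }
}

interface QueueEntry {
  dist: number;
  vertex: number;
}

const queue = new MinHeap<QueueEntry>((a, b) => a.dist - b.dist);
queue.push({ dist: 7, vertex: 2 });
queue.push({ dist: 3, vertex: 5 });
console.log(queue.pop()); // the entry with dist 3 comes out first
```

The generic parameter lets the same heap serve Dijkstra entries, plain numbers, or any other ordered payload, and the interface documents the contract separately from the implementation.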
555 lines for a Segment Tree in raw x86-64 is impressive scope, but the Assembly Dijkstra was downgraded to O(N²), the only algorithmic regression across all 65 solutions.
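The assembly source is not reproduced here, so the following is only a sketch of what an O(N²) Dijkstra usually looks like: the heap is replaced by a linear scan for the closest unsettled vertex. The name `dijkstraDense` and the 0-based adjacency-list layout are assumptions made for illustration.

```typescript
// Sketch of the array-scan variant; dijkstraDense is a hypothetical name.
// adj[u] lists [v, w] pairs: an edge from u to v with non-negative weight w.
function dijkstraDense(adj: Array<Array<[number, number]>>, source: number): number[] {
  const n = adj.length;
  const dist: number[] = new Array<number>(n).fill(Infinity);
  const done: boolean[] = new Array<boolean>(n).fill(false);
  dist[source] = 0;

  for (let iter = 0; iter < n; iter++) {
    // Linear scan instead of a heap pop: O(N) per round, O(N^2) overall.
    let u = -1;
    for (let i = 0; i < n; i++) {
      if (!done[i] && (u === -1 || dist[i] < dist[u])) u = i;
    }
    if (u === -1 || dist[u] === Infinity) break; // the rest is unreachable
    done[u] = true;
    for (const [v, w] of adj[u]) {
      if (dist[u] + w < dist[v]) dist[v] = dist[u] + w;
    }
  }
  return dist;
}
```

This form needs no priority queue at all, which is a plausible reason to choose it in hand-written assembly, but it is slower than the heap-based version on sparse graphs.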
The 13 languages span every memory model: pure GC (Python, Ruby, TS, Dart, Factor), GC with tuning (Go, Julia), ownership (Rust), RAII (C++, Ada), manual with defer (Zig), full manual (C), static only (Assembly).
From garbage collection to raw static BSS — every model represented.
A systematic approach to measuring AI code generation quality.
5 classic competitive programming problems: Dijkstra's shortest path, KMP string matching, Longest Increasing Subsequence, Matrix Exponentiation, and Segment Tree range queries.
13 languages spanning the full abstraction spectrum: Python, Ruby, Go, Dart, Ada, Zig, C, Assembly x86-64, Julia, Factor, TypeScript, Rust, and C++.
Idiomatic usage, stdlib utilization, optimization awareness, memory management, error handling, readability, code complexity, and boilerplate ratio.
Each language scored 1-10 on idiom adherence, with detailed analysis of strengths, weaknesses, and comparison against expert-level patterns for each language.
Weighted graph shortest path with priority queue. O((N+M) log N) complexity. Tests heap usage, graph representation, and I/O handling.
Knuth-Morris-Pratt pattern matching. O(N+M) complexity. Tests string handling, prefix function computation, and output formatting.
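As a reference point for the prefix function mentioned above, here is a minimal KMP sketch. The function names `prefixFunction` and `kmpSearch` and the choice to return 0-based match positions are assumptions; the evaluated solutions and their output format are not shown.

```typescript
// pi[i] = length of the longest proper prefix of s[0..i] that is also a suffix of it.
function prefixFunction(s: string): number[] {
  const pi: number[] = new Array<number>(s.length).fill(0);
  for (let i = 1, k = 0; i < s.length; i++) {
    while (k > 0 && s[i] !== s[k]) k = pi[k - 1];
    if (s[i] === s[k]) k++;
    pi[i] = k;
  }
  return pi;
}

// Returns the 0-based start index of every occurrence of pattern in text.
function kmpSearch(text: string, pattern: string): number[] {
  const hits: number[] = [];
  if (pattern.length === 0) return hits;
  const pi = prefixFunction(pattern);
  for (let i = 0, k = 0; i < text.length; i++) {
    // On a mismatch, fall back along the prefix links instead of rescanning text.
    while (k > 0 && text[i] !== pattern[k]) k = pi[k - 1];
    if (text[i] === pattern[k]) k++;
    if (k === pattern.length) {
      hits.push(i - k + 1);
      k = pi[k - 1]; // keep going so overlapping matches are found
    }
  }
  return hits;
}

console.log(kmpSearch("abababa", "aba")); // [ 0, 2, 4 ]
```

The text index `i` never moves backwards, which is where the O(N+M) bound comes from.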
Patience sorting with binary search. O(N log N) complexity. Tests stdlib binary search usage and array manipulation.
Fast Fibonacci via 2x2 matrix power. O(log N) complexity. Tests numeric overflow handling and matrix representation.
Build, point update, range sum query. O(N + Q log N) complexity. Tests data structure encapsulation and buffered I/O.
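The three segment tree operations listed above can be sketched compactly with the iterative, bottom-up layout. This is one possible encapsulation, not the evaluated code: the class name `SegmentTree`, the 0-based indices, and the half-open query range are all assumptions.

```typescript
// Iterative sum segment tree: leaves live at tree[n .. 2n-1], node p has children 2p and 2p+1.
class SegmentTree {
  private readonly n: number;
  private readonly tree: number[];

  // Build in O(N).
  constructor(values: number[]) {
    this.n = values.length;
    this.tree = new Array<number>(2 * this.n).fill(0);
    for (let i = 0; i < this.n; i++) this.tree[this.n + i] = values[i];
    for (let i = this.n - 1; i >= 1; i--) {
      this.tree[i] = this.tree[2 * i] + this.tree[2 * i + 1];
    }
  }

  // Point update in O(log N): set position i (0-based) to value.
  update(i: number, value: number): void {
    let p = i + this.n;
    this.tree[p] = value;
    for (p >>= 1; p >= 1; p >>= 1) {
      this.tree[p] = this.tree[2 * p] + this.tree[2 * p + 1];
    }
  }

  // Range sum in O(log N) over the half-open range [left, right).
  query(left: number, right: number): number {
    let sum = 0;
    let l = left + this.n;
    let r = right + this.n;
    while (l < r) {
      if (l & 1) sum += this.tree[l++];
      if (r & 1) sum += this.tree[--r];
      l >>= 1;
      r >>= 1;
    }
    return sum;
  }
}

const st = new SegmentTree([1, 2, 3, 4, 5]);
console.log(st.query(1, 4)); // 9
st.update(2, 10);
console.log(st.query(0, 5)); // 22
```

Keeping the array private and exposing only `update` and `query` is the kind of encapsulation the problem is meant to test; sums are held in `number`, so totals are assumed to stay within the exact integer range of a double.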