Research Report 2026

How Well Does AI Write Code?

A comprehensive evaluation of Claude's code generation across competitive programming and real-world web APIs — 13 languages, 6 challenges, from algorithms to REST services.

13 Languages · 6 Challenges · 70 Solutions · 9.9K Lines of Code

Competency Scores

Languages ranked by how idiomatically and correctly Claude uses each one for competitive programming.

# Language Tier Score Lines Ratio

Language Tier List

From expert-level fluency to impressive but limited scope.

S (Expert): Python, C++
A (Strong): Rust, Julia, C
B (Competent): Go, Dart, Ada, Factor
C (Adequate): TypeScript, Zig, Ruby
D (Limited): Assembly x86-64

Comparative Analysis

Visual breakdown of lines of code, scores, and complexity across all languages.

[Charts: Lines of Code by Problem; Score vs Lines of Code; Boilerplate Percentage; Competency Radar]

Best Code Examples

Side-by-side comparisons of the same algorithm across different languages.

Dijkstra Min-Heap — C++

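// Min-heap of (distance, vertex) pairs: greater<> keeps the smallest distance on top.
priority_queue<pair<long long, int>,
  vector<pair<long long, int>>, greater<>> pq;
pq.emplace(0, 1);  // start at vertex 1 with distance 0

while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;  // stale entry: a shorter path to u was already found
    for (auto [v, w] : adj[u]) {
        if (dist[u] + w < dist[v]) {  // relax edge u -> v
            dist[v] = dist[u] + w;
            pq.emplace(dist[v], v);
        }
    }
}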

Dijkstra Min-Heap — Rust

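// BinaryHeap is a max-heap; wrapping entries in Reverse turns it into a min-heap.
heap.push(Reverse((0i64, 1usize)));

while let Some(Reverse((d, u))) = heap.pop() {
    if d > dist[u] { continue; } // stale entry: a shorter path to u was already found
    for &(v, w) in &adj[u] {
        let nd = dist[u] + w;
        if nd < dist[v] { // relax edge u -> v
            dist[v] = nd;
            heap.push(Reverse((nd, v)));
        }
    }
}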

LIS — C++ (31 lines total)

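// tails[k] is the smallest tail of any strictly increasing subsequence of length k + 1.
vector<int> tails;
for (int x : nums) {
    // First position whose tail is >= x.
    auto it = lower_bound(tails.begin(),
                         tails.end(), x);
    if (it == tails.end())
        tails.push_back(x);  // x extends the longest subsequence so far
    else
        *it = x;             // x is a smaller tail for that length
}
cout << tails.size() << "\n";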

Matrix Exponentiation — TypeScript

// bigint keeps products exact: two residues below MOD can multiply past 2^53.
const MOD = 1000000007n;
// 2x2 matrix stored row-major as [m00, m01, m10, m11].
type Matrix = [bigint,bigint,bigint,bigint];

function matMul(a: Matrix, b: Matrix): Matrix {
  return [
    (a[0]*b[0] + a[1]*b[2]) % MOD,
    (a[0]*b[1] + a[1]*b[3]) % MOD,
    (a[2]*b[0] + a[3]*b[2]) % MOD,
    (a[2]*b[1] + a[3]*b[3]) % MOD,
  ];
}

Booking API

A realistic web application — REST API with authentication, database, and business logic. Same spec, 11 languages, 46 tests each.

Language     Tests      Lines   Ratio   Framework
TypeScript   46/46 ✓    197     1.00x   Express
Ruby         46/46 ✓    307     1.56x   Sinatra
Python       46/46 ✓    348     1.77x   Flask
Dart         46/46 ✓    361     1.83x   Shelf
Go           46/46 ✓    397     2.01x   Fiber
Rust         46/46 ✓    532     2.70x   Axum
Julia        46/46 ✓    544     2.76x   HTTP.jl
C++          46/46 ✓    806     4.09x   cpp-httplib
Ada          46/46 ✓    1035    5.25x   AWS
Zig          46/46 ✓    1225    6.22x   Raw sockets
C            46/46 ✓    1327    6.74x   Raw sockets

JWT Authentication

User registration, login, and token-based middleware protecting routes.
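
The token flow described above can be sketched with the Python standard library alone, which is roughly what the implementations without a JWT library had to build by hand. This is a minimal illustration, not the code under evaluation: the HS256 algorithm, the `SECRET` value, the `sub`/`exp` claim choice, and the helper names `sign_token` and `verify_token` are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical signing key; load from configuration in practice


def b64url(data):
    """Base64url-encode bytes without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    """Decode a base64url string, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input):
    """HMAC-SHA256 over 'header.payload', base64url-encoded."""
    digest = hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()
    return b64url(digest)


def sign_token(user_id, ttl_seconds=3600, now=None):
    """Issue a token for user_id that expires ttl_seconds after now."""
    now = int(time.time()) if now is None else now
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": str(user_id), "exp": now + ttl_seconds}
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + "." + payload
    return signing_input + "." + _signature(signing_input)


def verify_token(token, now=None):
    """Return the subject claim if the token is authentic and unexpired, else None."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    expected = _signature(header + "." + payload)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(b64url_decode(payload))
    if claims["exp"] <= now:
        return None
    return claims["sub"]
```

A route-protecting middleware would call `verify_token` on the bearer token and reject the request when it returns `None`.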

SQLite Database

3 tables (users, spaces, bookings) with foreign keys and indexes.
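
One possible shape for such a schema, sketched with Python's built-in `sqlite3` module. Only the three table names come from the report; every column name, the `CHECK` constraint, the index name, and the `open_db` helper are hypothetical.

```python
import sqlite3

# Hypothetical schema: only the table names are taken from the report.
SCHEMA = """
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE spaces (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    capacity INTEGER NOT NULL
);
CREATE TABLE bookings (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    space_id   INTEGER NOT NULL REFERENCES spaces(id),
    start_time TEXT NOT NULL,
    end_time   TEXT NOT NULL,
    CHECK (start_time < end_time)
);
-- Supports "bookings for this space in this time window" lookups.
CREATE INDEX idx_bookings_space_time
    ON bookings (space_id, start_time, end_time);
"""


def open_db(path=":memory:"):
    """Open a database, enable foreign key enforcement, and create the tables."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this is set on each connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Storing timestamps as ISO 8601 text keeps string comparison consistent with chronological order, which is why the `CHECK` constraint can compare the two columns directly.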

Overlap Validation

Business logic preventing double-bookings with time range intersection checks.
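
The intersection check reduces to a single comparison pair. The sketch below assumes half-open intervals, so a booking that ends exactly when another starts is allowed; the report does not say which convention the spec uses, and the function names are illustrative.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Half-open ranges [start, end) intersect iff each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def can_book(existing, start, end):
    """Return True if [start, end) is a valid range that hits no existing booking.

    existing is an iterable of (start, end) pairs for the same space.
    """
    if start >= end:
        return False
    return not any(overlaps(start, end, s, e) for s, e in existing)
```

For example, with an existing 09:00 to 10:00 booking, a request for 10:00 to 11:00 is accepted and a request for 09:30 to 10:30 is rejected.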

RESTful Routing

8 endpoints with proper HTTP status codes, error handling, and middleware.

Stress Test Results

50 concurrent connections, 10s per endpoint. Release builds on Apple M-series. Latency in milliseconds.

Language     GET /spaces   GET filtered   GET bookings     POST login        Avg latency   Peak mem
             (req/s)       (req/s)        (req/s + auth)   (req/s)           (ms)
C++          39,234        39,234         38,514           43,316 (sha256)   1.3           9 MB
Rust         27,065        27,565         22,644           3.4 (bcrypt)      1.8           6 MB
C            16,994        17,274         17,408           17,493 (sha256)   2.9           5 MB
Go           15,851        20,180         19,693           61.9 (bcrypt)     2.9           30 MB
Zig          15,320        15,389         16,482           16,302 (sha256)   3.1           2 MB
Ada          4,559         5,089          5,927            2,312 (sha256)    8.6           12 MB
Julia        3,264         4,034          3,980            2,769 (sha256)    14.6          569 MB
Ruby         3,659         3,583          3,608            18.6 (bcrypt)     13.8          43 MB
Python       1,026         742            695              23.8 (bcrypt)     48.6          50 MB
Dart         221           442            347              368 (sha256)      154.0         28 MB
TypeScript   334           233            97               3.4 (bcrypt)      2,439         67 MB

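Login throughput depends mainly on the password hashing algorithm, so the POST login column is not comparable between the two groups. bcrypt (TypeScript, Go, Rust, Python, Ruby) is deliberately slow to make brute-force attacks expensive; SHA-256 (C, C++, Dart, Ada, Julia, Zig) is fast, which makes it a weaker choice for password storage.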
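
The gap between the two groups can be reproduced in a few lines. bcrypt is not in the Python standard library, so this sketch substitutes PBKDF2-HMAC-SHA256 as a stand-in for a deliberately slow hash; the iteration count and function names are assumptions, and the timings will differ from the report's figures.

```python
import hashlib
import os
import time


def fast_hash(password, salt):
    """One SHA-256 pass over salt + password: cheap to compute, cheap to brute-force."""
    return hashlib.sha256(salt + password.encode()).digest()


def slow_hash(password, salt, iterations=200_000):
    """PBKDF2-HMAC-SHA256, standing in for bcrypt: the iteration count sets the cost."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def seconds_for(fn, *args):
    """Wall-clock time of a single call."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    salt = os.urandom(16)
    print("sha256 :", seconds_for(fast_hash, "hunter2", salt))
    print("pbkdf2 :", seconds_for(slow_hash, "hunter2", salt))
```

Because every login attempt pays the slow hash once, a server using it is capped at a low login rate per core by design.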

[Charts: Requests/sec for GET /spaces; Peak Memory Usage]

The Right Tool for the Job

How language strengths shift between algorithmic puzzles and real-world applications.

Web APIs

TypeScript Dominates APIs

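From "Adequate" in competitive programming to the clear winner on conciseness in web APIs. The Express ecosystem and Node.js runtime make REST services remarkably compact: the Rust version is 2.70x as long and the Ada version 5.25x. Compactness is not speed, however: TypeScript also records the highest average latency in the stress test (2,439 ms).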

Competitive programming: 6.5/10 · Booking API: 197 lines (1.00x)
⚙️
Algorithms

Rust Excels at Pure Logic

Zero-cost abstractions shine in algorithmic code (1.06x ratio). But the verbosity cost scales with application complexity — 2.70x for APIs, where explicit error handling and type machinery add up.

Competitive programming: 8.5/10 · Booking API: 532 lines (2.70x)
🛠
Ecosystem Gap

Ada: Maximum Verbosity in Web

Ada's lack of web ecosystem libraries forces manual JWT implementation, JSON parsing, and Base64 encoding. The 5.25x ratio reflects missing infrastructure, not language capability — the strongest case for ecosystem maturity over language design.

Competitive programming: 7/10 (2.07x) · Booking API: 1035 lines (5.25x)
⚖️
Versatile

Go & Dart: Steady All-rounders

Both deliver consistent performance across domains. Not the most concise anywhere, but never the most verbose either — a solid balance of readability, robustness, and productivity.

Go: competitive programming 7.5/10 · Booking API: 397 lines (2.01x)
Dart: competitive programming 7.5/10 · Booking API: 361 lines (1.83x)

What We Discovered

The most surprising and insightful results from the analysis.

🏆

C++ Shatters the Verbosity Myth

C++ is the second most concise language at just 285 total lines — only 12 more than Factor. Modern C++17 with STL makes it remarkably compact for competitive programming.

🔎

Translation Bias Persists

The 65 implementations share almost identical algorithmic structure and variable names (dist, adj, heap, tails), which suggests they were generated from a single mental model rather than written idiomatically for each language.

🚀

Rust Delivers on Promises

Zero-cost abstractions, memory safety, and 289 total lines. The BinaryHeap + Reverse pattern is elegant. Main gap: .unwrap() overuse.

😕

TypeScript: Biggest Disappointment

The type system is TS's defining feature, yet solutions use it at a JavaScript+annotations level. No interfaces, no generics, no classes for data structures.

💥

Assembly at Scale

555 lines for a Segment Tree in raw x86-64 is impressive scope, but Dijkstra was downgraded to O(N²) — the only algorithmic regression across all 60 solutions.

📊

Memory Model Spectrum

The 13 languages span every memory model: pure GC (Python, Ruby, TS, Dart, Factor), GC with tuning (Go, Julia), ownership (Rust), RAII (C++, Ada), manual with defer (Zig), full manual (C), static only (Assembly).

The Full Spectrum

From garbage collection to raw static BSS — every model represented.

Pure GC: Python, Ruby, TypeScript, Dart, Factor
GC with awareness: Go, Julia
Ownership (compile-time): Rust
RAII (scope-based): C++, Ada
Manual with defer: Zig
Full manual: C
Static BSS only: Assembly x86-64

How We Evaluated

A systematic approach to measuring AI code generation quality.

01. Problem Selection

Five classic competitive programming problems: Dijkstra's shortest path, KMP string matching, Longest Increasing Subsequence, Matrix Exponentiation, and Segment Tree range queries.

02. Language Coverage

13 languages spanning the full abstraction spectrum: Python, Ruby, Go, Dart, Ada, Zig, C, Assembly x86-64, Julia, Factor, TypeScript, Rust, and C++.

03. Evaluation Criteria

Idiomatic usage, stdlib utilization, optimization awareness, memory management, error handling, readability, code complexity, and boilerplate ratio.

04. Scoring Framework

Each language is scored from 1 to 10 on idiom adherence, with a detailed analysis of its strengths and weaknesses and a comparison against expert-level patterns for that language.

Problems Evaluated

Dijkstra's Shortest Path

Weighted graph shortest path with priority queue. O((N+M) log N) complexity. Tests heap usage, graph representation, and I/O handling.

KMP String Matching

Knuth-Morris-Pratt pattern matching. O(N+M) complexity. Tests string handling, prefix function computation, and output formatting.
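
For reference, a compact Python sketch of the technique follows. It is a textbook formulation rather than one of the evaluated solutions, and the function names are illustrative.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until one can be extended.
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text, pattern):
    """Return the start index of every (possibly overlapping) match of pattern in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = pi[k - 1]  # keep the longest border so overlapping matches are found
    return matches
```

The text index never moves backwards, which is where the O(N+M) bound comes from: each fallback step undoes an earlier increment of `k`.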

Longest Increasing Subsequence

Patience sorting with binary search. O(N log N) complexity. Tests stdlib binary search usage and array manipulation.

Matrix Exponentiation

Fast Fibonacci via 2x2 matrix power. O(log N) complexity. Tests numeric overflow handling and matrix representation.
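
The technique rests on the identity that the n-th power of [[1, 1], [1, 0]] is [[F(n+1), F(n)], [F(n), F(n-1)]], combined with exponentiation by squaring. A minimal Python sketch, reusing the modulus and flat row-major layout from the TypeScript excerpt (the function names are illustrative):

```python
MOD = 1_000_000_007


def mat_mul(a, b):
    """Multiply two 2x2 matrices stored row-major as (m00, m01, m10, m11), modulo MOD."""
    return (
        (a[0] * b[0] + a[1] * b[2]) % MOD,
        (a[0] * b[1] + a[1] * b[3]) % MOD,
        (a[2] * b[0] + a[3] * b[2]) % MOD,
        (a[2] * b[1] + a[3] * b[3]) % MOD,
    )


def mat_pow(m, n):
    """Raise m to the n-th power with O(log n) multiplications (square-and-multiply)."""
    result = (1, 0, 0, 1)  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


def fib(n):
    """F(n) mod MOD, read from the top-right entry of [[1, 1], [1, 0]]^n."""
    return mat_pow((1, 1, 1, 0), n)[1]
```

Python integers are arbitrary precision, so the overflow handling this problem tests in other languages does not arise here.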

Segment Tree Range Queries

Build, point update, range sum query. O(N + Q log N) complexity. Tests data structure encapsulation and buffered I/O.
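
A minimal Python sketch of the three operations, using the iterative bottom-up layout. This is one common formulation and not necessarily the one the evaluated solutions use; the class and method names are illustrative, and queries use half-open ranges by assumption.

```python
class SegmentTree:
    """Sum segment tree: leaves live at tree[n:], node i covers children 2i and 2i+1."""

    def __init__(self, values):
        values = list(values)
        self.n = len(values)
        self.tree = [0] * self.n + values
        # Build internal nodes bottom-up in O(N).
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, index, value):
        """Set values[index] = value and refresh its ancestors in O(log N)."""
        i = index + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def query(self, left, right):
        """Return sum(values[left:right]) in O(log N)."""
        total = 0
        lo, hi = left + self.n, right + self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step past it
                total += self.tree[lo]
                lo += 1
            if hi & 1:  # hi is exclusive: step back onto a left child and take it
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total
```

For example, `SegmentTree([1, 3, 5, 7, 9, 11]).query(1, 4)` sums the elements at indices 1 through 3.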