When you feed a large language model (LLM) structured data, you probably reach for JSON—it’s the universal standard. But here’s the problem: JSON wasn’t designed with token costs in mind. Every quote, comma, bracket, and repeated key name is money flying out the window, especially when you’re processing thousands of records through an API like OpenAI’s or Anthropic’s.

TOON (Token-Oriented Object Notation) is a new data format designed specifically to fix this. It cuts token counts by 30–60% compared with JSON while keeping the same data model, and in benchmarks it actually improves LLM parsing accuracy. The GitHub repo has 21.5K+ stars, with implementations in TypeScript, Python, Go, Rust, and .NET.

What Is TOON?

TOON is a compact, human-readable encoding of the JSON data model. The core idea: move repeated structure declarations to the top and state the array length upfront. It borrows from two formats you already know:

  • YAML—indentation-based nesting (no curly braces needed)
  • CSV—tabular layout for uniform object arrays (field headers declared once, data streamed row by row)

The result is a format that feels familiar, reads naturally, and costs far less to send to an LLM.

JSON vs. TOON: Side by Side

Take this typical JSON for a list of hikes:

{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    { "id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320, "companion": "ana", "wasSunny": true },
    { "id": 2, "name": "Ridge Overlook", "distanceKm": 9.2, "elevationGain": 540, "companion": "luis", "wasSunny": false },
    { "id": 3, "name": "Wildflower Loop", "distanceKm": 5.1, "elevationGain": 180, "companion": "sam", "wasSunny": true }
  ]
}

The same data in TOON:

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

Notice: hikes[3] declares there are 3 items, and {id,name,...} declares the fields once. No repeated key names, no curly braces, no commas between objects.
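To make the tabular layout concrete, here is a deliberately simplified Python sketch of how a uniform object array maps onto TOON's header-plus-rows form. This is an illustration only, not the official encoder: the real spec also handles quoting, escaping, nesting, and non-uniform arrays.

```python
def encode_tabular(key, rows):
    """Encode a uniform list of dicts in TOON's tabular layout (simplified)."""
    fields = list(rows[0].keys())
    # Header declares the array length and the field names once.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"

    def fmt(v):
        # JSON-style scalars: lowercase booleans, plain numbers and strings.
        if isinstance(v, bool):
            return "true" if v else "false"
        return str(v)

    lines = [header]
    for row in rows:
        # Each object becomes one indented CSV-style row.
        lines.append("  " + ",".join(fmt(row[f]) for f in fields))
    return "\n".join(lines)

hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "wasSunny": True},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2, "wasSunny": False},
]
print(encode_tabular("hikes", hikes))
# hikes[2]{id,name,distanceKm,wasSunny}:
#   1,Blue Lake Trail,7.5,true
#   2,Ridge Overlook,9.2,false
```

The key names appear exactly once in the header, which is where the bulk of the token savings comes from.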

Benchmarks: Real Numbers

The official benchmarks test LLM comprehension across 209 data retrieval questions using 4 models. The results are striking:

| Format | Accuracy | Tokens | Acc / 1K tokens |
| --- | --- | --- | --- |
| TOON | 76.4% | 2,759 | 27.7 |
| JSON compact | 73.7% | 3,104 | 23.7 |
| YAML | 74.5% | 3,749 | 19.9 |
| JSON pretty | 75.0% | 4,587 | 16.4 |
| XML | 72.1% | 5,221 | 13.8 |

TOON achieves higher accuracy (76.4% vs. JSON pretty's 75.0%) while using 39.9% fewer tokens. The token savings are even more dramatic on tabular data:

  • 100 employee records: TOON uses 2,498 tokens vs. JSON compact’s 3,924 (36% savings)
  • GitHub repositories: TOON uses 1,553 tokens vs. JSON compact’s 2,354 (34% savings)

Key Features of TOON

  • Token-Efficient: 30–60% fewer tokens than equivalent JSON
  • Lossless round-trip: Deterministic encoding back to JSON
  • LLM-Friendly guardrails: Explicit [N] length markers and {fields} headers give models a clear schema to follow
  • Minimal syntax: No curly braces, no quotes on keys, indentation does the work
  • Tabular arrays: Uniform objects collapse into CSV-style tables
  • Multi-language SDKs: TypeScript, Python, Go, Rust, .NET
  • File extension: .toon, media type text/toon

When NOT to Use TOON

TOON isn’t a universal replacement for JSON. It’s optimized for a specific sweet spot:

  • Deeply nested or irregular structures: JSON compact often wins when data is highly nested and almost none of its arrays qualify for the tabular layout (~0% tabular eligibility)
  • Semi-uniform data: When only 40–60% of arrays are tabular, the savings diminish
  • Pure flat tables: Plain CSV is smaller than TOON for single-level tables (TOON adds ~5–10% overhead for structure)
  • Latency-critical applications: Some local/quantized models (e.g., Ollama) may process compact JSON faster despite TOON’s lower token count. Measure your specific setup.
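A rough way to check whether your data sits in TOON's sweet spot is to ask how many of its arrays can collapse into the tabular form. The heuristic below is an illustration of that idea, not the official eligibility rule from the spec: an array qualifies when every element is an object with the same keys and only primitive values.

```python
def is_tabular(arr):
    """Heuristic: can this array use TOON's CSV-style tabular layout?

    Illustrative only; the real spec defines eligibility precisely.
    """
    if not arr or not all(isinstance(x, dict) for x in arr):
        return False
    keys = list(arr[0].keys())
    for obj in arr:
        # Same keys in the same order, and no nested containers.
        if list(obj.keys()) != keys:
            return False
        if any(isinstance(v, (dict, list)) for v in obj.values()):
            return False
    return True

print(is_tabular([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]))  # True
print(is_tabular([{"id": 1}, {"id": 2, "extra": []}]))               # False
```

If most of your arrays fail this check, compact JSON (or CSV for a single flat table) may serve you better.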

How to Use TOON in Your Projects

JavaScript / TypeScript

import { stringify, parse } from '@toon-format/toon';

// Encode JSON to TOON
const toon = stringify({
  users: [
    { id: 1, name: 'Alice', age: 30 },
    { id: 2, name: 'Bob', age: 25 }
  ]
});
// users[2]{id,name,age}:
//   1,Alice,30
//   2,Bob,25

// Decode TOON back to JSON
const json = parse(toon);
console.log(json.users[0].name); // "Alice"

Python

import toon

# Encode
data = {
    "users": [
        {"id": 1, "name": "Alice", "age": 30},
        {"id": 2, "name": "Bob", "age": 25}
    ]
}
toon_string = toon.dumps(data)
# users[2]{id,name,age}:
#   1,Alice,30
#   2,Bob,25

# Decode
parsed = toon.loads(toon_string)

TOON vs. Other Data Formats

| Format | Goal | Best For |
| --- | --- | --- |
| JSON | Universal standard | General-purpose API responses |
| JSONL | Streaming / newline-delimited records | Logs, training data, large file processing |
| SQLite | Queryable local database | RAG pipelines, offline-first apps |
| Markdown | Human-readable structured docs | Prompts, documentation, reports |
| TOON | Token-minimized LLM input | High-frequency API calls, large tabular datasets |

The Bigger Picture: AI Is Reshaping Data Formats

TOON is part of a broader trend: LLMs are forcing us to rethink the formats we’ve used for decades. JSON was designed for computers, not for token-based context windows. When humans—and the models that process our text—pay per token, compression isn’t just optimization, it’s economics.

As context windows grow, we’re stuffing more structured data into prompts: RAG results, database query outputs, agent tool call parameters. TOON’s 30–60% token savings compound quickly in production systems making thousands of API calls per day.
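A quick back-of-the-envelope calculation shows how these savings compound. Every number below is a hypothetical assumption for illustration, not vendor pricing:

```python
# Hypothetical workload and price; adjust for your own setup.
calls_per_day = 10_000
json_tokens_per_call = 3_000
savings_rate = 0.40            # mid-range of the 30-60% figure
price_per_1k_tokens = 0.003    # assumed input-token price in USD

tokens_saved_daily = calls_per_day * json_tokens_per_call * savings_rate
usd_saved_monthly = tokens_saved_daily / 1000 * price_per_1k_tokens * 30
print(f"{tokens_saved_daily:,.0f} tokens/day, ~${usd_saved_monthly:,.2f}/month")
# 12,000,000 tokens/day, ~$1,080.00/month
```

Even at modest volumes, the per-token savings translate into a line item worth tracking.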

The format spec lives at github.com/toon-format/spec, and there are online playgrounds where you can convert JSON to TOON interactively and compare token counts.

Verdict

TOON isn’t replacing JSON, but for a specific, increasingly common use case (sending structured data to LLMs at scale) it delivers real savings. 30–60% fewer tokens means lower API bills, faster responses, and more room in your context window for actual content.

If you’re building AI-powered applications with frequent structured data exchanges, TOON is worth evaluating. The TypeScript and Python SDKs make integration straightforward, and the format’s familiarity means your team can read it without learning a new notation.


Frequently Asked Questions

Is TOON a replacement for JSON?

No. TOON is a translation layer—use JSON programmatically in your code, then convert to TOON before sending to an LLM. The encoding is lossless and deterministic.

How much can I save on token costs?

Benchmarks show 30–60% token reduction depending on data structure. For tabular data (uniform object arrays), savings are typically 35–40%. For deeply nested data, the savings are smaller.

Does TOON improve accuracy over JSON?

Yes. In official benchmarks, TOON achieved 76.4% accuracy vs JSON’s 75.0% while using 40% fewer tokens. The explicit [N] length markers and {fields} headers help LLMs reliably parse structure.

Which languages have TOON libraries?

TypeScript/JavaScript (@toon-format/toon), Python (toon), Go, Rust, .NET, and more. See github.com/toon-format/toon for the full list.

When should I NOT use TOON?

TOON is not ideal for deeply nested, irregular structures; pure flat tabular data (use CSV instead); or latency-critical setups where you’ve benchmarked and found JSON faster on your specific model.
