Abstract
CBOR-Web defines a method for websites to expose a machine-native copy of their content as a parallel channel alongside existing HTML. A single file — index.cbor — placed at the root of a domain contains the entire site's content in structured binary format. AI agents fetch this file in one request and obtain every page, product, and data point without parsing HTML, CSS, or JavaScript.
1. Protocol
An AI agent discovers and reads CBOR-Web content through a single HTTP request:
GET /index.cbor HTTP/1.1
Host: example.com
Accept: application/cbor
The response is a CBOR-encoded document (Content-Type: application/cbor) whose first three bytes are 0xD9D9F7, the encoding of the self-described CBOR tag (tag 55799, RFC 8949 §3.4.6).
| Response | Meaning |
|---|---|
| 200 OK + application/cbor | CBOR-Web supported. The body contains the entire site. |
| 404 Not Found | CBOR-Web not available. Fall back to HTML. |
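
The two outcomes in the table can be sketched as a small classifier. This is a minimal illustration, not part of the specification: the function name `classify_response` and the `"unexpected"` result for any response not listed in the table are assumptions made for the example.

```python
# Bytes 0xD9 0xD9 0xF7 encode CBOR tag 55799 (self-described CBOR).
SELF_DESCRIBE_PREFIX = bytes.fromhex("d9d9f7")


def classify_response(status: int, content_type: str, body: bytes) -> str:
    """Return 'cbor-web', 'fallback-html', or 'unexpected' (hypothetical labels)."""
    # Ignore media type parameters such as "; charset=..." and letter case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status == 200 and media_type == "application/cbor":
        # A CBOR-Web document is expected to start with the self-describe tag.
        if body.startswith(SELF_DESCRIBE_PREFIX):
            return "cbor-web"
        return "unexpected"
    if status == 404:
        return "fallback-html"
    # Assumption: anything else (redirects, 5xx, wrong media type) is left
    # to the caller, since the table above does not cover it.
    return "unexpected"
```

An agent would pass in the status code, the Content-Type header, and the raw body from whichever HTTP client it uses.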
2. Document Structure
An index.cbor file contains a CBOR map with the following top-level keys:
| Key | Name | Type | Required | Description |
|---|---|---|---|---|
| 0 | @type | text | Yes | "cbor-web" |
| 1 | @version | uint | Yes | 3 for this version |
| 2 | site | map | Yes | Domain, name, languages, contact, geo |
| 3 | security | map | Yes | Authentication, access tiers (T0/T1/T2), rate limits |
| 4 | navigation | map | Recommended | Menus, hierarchy, breadcrumbs |
| 5 | pages | array | Yes | All pages with structured content blocks |
| 6 | meta | map | Yes | Timestamp, generator, signature |
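
As a rough sketch, an agent could check the top-level map against this table once the file has been decoded. The example assumes the CBOR document is already decoded into a Python `dict` with integer keys (the decoder itself is out of scope here); `check_top_level` and the message wording are hypothetical.

```python
# Top-level keys from the table above: key -> (name, expected Python type).
REQUIRED = {
    0: ("@type", str),
    1: ("@version", int),
    2: ("site", dict),
    3: ("security", dict),
    5: ("pages", list),
    6: ("meta", dict),
}
RECOMMENDED = {4: ("navigation", dict)}


def check_top_level(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, (name, expected) in {**REQUIRED, **RECOMMENDED}.items():
        if key not in doc:
            # Only required keys are reported when absent.
            if key in REQUIRED:
                problems.append(f"missing required key {key} ({name})")
        elif not isinstance(doc[key], expected):
            problems.append(f"key {key} ({name}) should be {expected.__name__}")
    if isinstance(doc.get(0), str) and doc[0] != "cbor-web":
        problems.append('key 0 (@type) should be "cbor-web"')
    if isinstance(doc.get(1), int) and doc[1] != 3:
        problems.append("key 1 (@version) should be 3")
    return problems
```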
2.1 Page Entry
Each element in the pages array (key 5) contains:
| Field | Type | Required | Description |
|---|---|---|---|
| "path" | text | Yes | URL path ("/", "/products") |
| "title" | text | Yes | Page title |
| "lang" | text | Yes | BCP 47 language code |
| "access" | text | Yes | "T0", "T1", or "T2" |
| "content" | array | Yes | Ordered content blocks |
| "hash" | bstr | Recommended | SHA-256 of serialised content |
| "updated" | tag 1 | Recommended | Last modification timestamp |
| "_describe" | text | Optional | Publisher guidance for AI navigation |
| "_l" | uint | Optional | Depth level (0-4) for progressive reading |
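
A page entry can be checked in the same spirit. The sketch below assumes a decoded page is a Python `dict` with text keys. It also assumes the caller already holds the serialised content bytes that the hash covers, because the exact serialisation is not defined in this section. The names `check_page` and `hash_matches` are illustrative.

```python
import hashlib

REQUIRED_FIELDS = {"path": str, "title": str, "lang": str, "access": str, "content": list}
ACCESS_TIERS = {"T0", "T1", "T2"}


def check_page(page: dict) -> list:
    """Return a list of problems found in one entry of the pages array."""
    problems = [
        f"missing or mistyped field {name!r}"
        for name, kind in REQUIRED_FIELDS.items()
        if not isinstance(page.get(name), kind)
    ]
    access = page.get("access")
    if isinstance(access, str) and access not in ACCESS_TIERS:
        problems.append(f"unknown access tier {access!r}")
    level = page.get("_l")
    if level is not None and not (isinstance(level, int) and 0 <= level <= 4):
        problems.append("'_l' should be an integer from 0 to 4")
    return problems


def hash_matches(page: dict, serialised_content: bytes) -> bool:
    """Compare the optional "hash" field with SHA-256 of the given bytes."""
    expected = page.get("hash")
    return isinstance(expected, bytes) and hashlib.sha256(serialised_content).digest() == expected
```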
2.2 Content Blocks
Content blocks are the atomic units of a page. Each block is a CBOR map with a required "t" (type) field:
| Code | Type | Keys | Description |
|---|---|---|---|
| h | Heading | l (1-6), v | Section heading |
| p | Paragraph | v | Body text |
| ul | Unordered list | v (array) | Bullet list |
| ol | Ordered list | v (array) | Numbered list |
| table | Table | headers, rows | Data table |
| cta | Call to action | v, href | Button or link |
| q | Quote | v, attr | Citation with source |
| img | Image | src, alt | Image reference |
| code | Code | v, lang | Source code |
| dl | Definitions | v (array of maps) | Term/definition pairs |
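
To illustrate how an agent might consume these blocks, the following sketch flattens a decoded "content" array into plain text. The output conventions (Markdown-like headings and bullets) are a choice made for the example, not something the protocol prescribes. The "dl" type is skipped because the key names inside its maps are not given in this section.

```python
def render_block(block: dict) -> str:
    """Render one content block as plain text, dispatching on its "t" field."""
    kind = block["t"]
    if kind == "h":
        return "#" * block["l"] + " " + block["v"]
    if kind == "p":
        return block["v"]
    if kind == "ul":
        return "\n".join(f"- {item}" for item in block["v"])
    if kind == "ol":
        return "\n".join(f"{i}. {item}" for i, item in enumerate(block["v"], 1))
    if kind == "table":
        lines = [" | ".join(block["headers"])]
        lines += [" | ".join(str(cell) for cell in row) for row in block["rows"]]
        return "\n".join(lines)
    if kind == "cta":
        return f"[{block['v']}]({block['href']})"
    if kind == "q":
        return f"> {block['v']} ({block['attr']})"
    if kind == "img":
        return f"[image: {block['alt']}] {block['src']}"
    if kind == "code":
        return block["v"]
    return ""  # "dl" and unknown block types are skipped in this sketch


def render_page(content: list) -> str:
    """Join the rendered blocks of a page, separated by blank lines."""
    return "\n\n".join(text for text in map(render_block, content) if text)
```

For example, a heading block followed by a paragraph and a two-item list renders as a title line, a blank line, the paragraph, a blank line, and two bullet lines.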
3. Discovery Methods
An AI agent uses the following methods to discover CBOR-Web support, in order of priority:
| Priority | Method | Example |
|---|---|---|
| 1 (highest) | Direct access | GET /index.cbor |
| 2 | DNS TXT record | _cbor-web.example.com TXT "v=cbor-web; url=..." |
| 3 | HTML link element | <link rel="alternate" type="application/cbor" href="..."> |
| 4 | cbor.txt | Plain text file at root with index URL |
| 5 | robots.txt | CBOR-Web: /index.cbor |
| 6 | llms.txt | Section referencing index.cbor |
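
The priority order can be sketched as follows. This assumes the DNS TXT value and the robots.txt directive take exactly the shapes shown in the table; the parsing rules (splitting on ";" and on the first ":") are guesses based on those examples. The method labels in `PRIORITY` and all function names are hypothetical.

```python
from typing import Optional

# Method labels, highest priority first, mirroring the table above.
PRIORITY = ["direct", "dns_txt", "html_link", "cbor_txt", "robots_txt", "llms_txt"]


def parse_txt_record(txt: str) -> Optional[str]:
    """Extract the url field from a record such as 'v=cbor-web; url=...'."""
    fields = {}
    for part in txt.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields.get("url") if fields.get("v") == "cbor-web" else None


def parse_robots_txt(text: str) -> Optional[str]:
    """Return the value of the first 'CBOR-Web:' line in robots.txt, if any."""
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cbor-web":
            return value.strip()
    return None


def pick_index_url(found: dict) -> Optional[str]:
    """Given method label -> URL (or None), return the highest-priority URL."""
    for method in PRIORITY:
        if found.get(method):
            return found[method]
    return None
```

Note that this naive TXT parser would mis-split a URL that itself contains a semicolon.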
4. Access Tiers
| Tier | Name | Authentication | Content |
|---|---|---|---|
| T2 | Open | None | Public pages, metadata |
| T1 | Authenticated | CBORW Token / API key | Premium content, full data |
| T0 | Institutional | eIDAS 2.0 / X.509 EV | Government, verified identity |
index.cbor creates a verifiable chain: signature → DNS public key → CBORW token → Ethereum wallet → legal entity. AI agents assess source trustworthiness through this chain.
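
One way an agent might apply the tiers is to keep only the pages it is cleared to read. The sketch below rests on an assumption not stated in the table: the tiers are ordered from T2 (least restricted) to T0 (most restricted), and clearance for one tier implies access to the less restricted ones. `TIER_RANK` and `readable_pages` are illustrative names.

```python
# Assumed ordering: a higher rank means a more restricted tier.
TIER_RANK = {"T2": 0, "T1": 1, "T0": 2}


def readable_pages(pages: list, agent_tier: str = "T2") -> list:
    """Return the pages whose "access" tier is within the agent's clearance."""
    limit = TIER_RANK[agent_tier]
    return [
        page
        for page in pages
        # Pages with a missing or unknown tier are treated as unreadable.
        if page.get("access") in TIER_RANK and TIER_RANK[page["access"]] <= limit
    ]
```

An unauthenticated agent would call `readable_pages(pages)` and receive only the T2 entries.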
5. Comparison with Existing Standards
| Standard | Format | Content | Relation to CBOR-Web |
|---|---|---|---|
| robots.txt | Text | Crawl permissions | Complementary |
| sitemap.xml | XML | URL list | Replaced by index.cbor |
| llms.txt | Markdown | Text summary for LLMs | Complementary |
| index.html | HTML | Human-readable page | Parallel; index.cbor is for machines |
| JSON-LD | JSON | Structured entities | Complementary (entities vs full content) |
6. Efficiency
Measured on a production website (deltopide.fr, 25 March 2026):
| Metric | HTML | CBOR-Web | Improvement |
|---|---|---|---|
| File size | 41,537 bytes | 2,487 bytes | ×17 |
| LLM tokens | ~10,400 | ~620 | ×17 |
| HTTP requests | 9+ | 1 | ×9 |
| Useful signal | 12% | 100% | ×8 |
| Formats to parse | 4 | 1 | — |
| Accent encodings | 3 | 1 (UTF-8) | — |
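
The improvement factors in the table are simple ratios of the HTML figure to the CBOR-Web figure. The short check below recomputes them from the numbers above. The token counts are the approximate values from the table, and the variable names are only for illustration.

```python
# (HTML value, CBOR-Web value) pairs copied from the table above.
measurements = {
    "File size (bytes)": (41_537, 2_487),
    "LLM tokens (approx.)": (10_400, 620),
    "HTTP requests": (9, 1),
}

# Improvement factor = HTML value / CBOR-Web value, rounded to one decimal.
ratios = {name: round(html / cbor, 1) for name, (html, cbor) in measurements.items()}

# Useful signal goes the other way: 12% of the HTML payload versus 100%.
signal_gain = round(100 / 12, 1)

for name, ratio in ratios.items():
    print(f"{name}: x{ratio}")
print(f"Useful signal: x{signal_gain}")
# File size (bytes): x16.7
# LLM tokens (approx.): x16.8
# HTTP requests: x9.0
# Useful signal: x8.3
```

The table rounds these to ×17, ×17, ×9, and ×8.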
7. History
Chapter 1 — Two researchers, one idea (2013)
In 2013, Carsten Bormann (University of Bremen) and Paul Hoffman (ICANN) invented CBOR (Concise Binary Object Representation). Its data model is based on JSON's and extends it with byte strings, tags, and non-text map keys, all encoded in binary. Published as RFC 7049 by the IETF. Designed for constrained devices: thermometers, door locks, electricity meters.
Chapter 2 — Internet Standard (2020)
In December 2020, RFC 8949 replaced the original specification. CBOR became an Internet Standard (STD 94) — the highest level of validation at the IETF. Adopted by WebAuthn, COSE, CoAP. Millions of devices spoke CBOR without anyone noticing.
Chapter 3 — AI agents arrive (2024-2026)
AI agents started browsing the web — and found it absurd. 2.86 MB average per page. 86 requests to display a single page. 95% noise for 5% useful content. 50 billion AI crawler requests per day (Cloudflare 2025).
Chapter 4 — One file, the entire site (2026)
index.cbor — placed next to index.html. The HTML for humans, the CBOR for machines. A 160 KB Shopify page becomes 8 KB of CBOR. A complete 84-page site fits in 700 KB.
Chapter 5 — The ecological imperative
Data centres consumed 415 TWh of electricity in 2024, projected to reach 945 TWh by 2030 (IEA). Every unnecessary byte transferred is energy wasted. CBOR-Web doesn't just make the web faster for AI — it makes it lighter for the planet.
Chapter 6 — A standard that belongs to everyone
The CBOR-Web read protocol is published under CC0 — Public Domain. No licence to pay, no permission to ask. Anyone can read an index.cbor. The reading protocol belongs to humanity.
8. Implementations
Live Sites
| Site | Method | Pages |
|---|---|---|
| deltopide.fr | Direct /index.cbor + DNS TXT + cbor.txt + robots.txt + llms.txt + HTML link | 1 |
| laforetnousregale.fr | DNS TXT | 5 |
| pacific-planet.com | DNS TXT | 6 |
| verdetao.com | DNS TXT | 15 |
| eloiseplot-dieteticienne.com | HTML link | 6 |
| crm.laforetnousregale.fr | Direct /index.cbor | 1 |
Tools
| Tool | Language | Purpose |
|---|---|---|
| text2cbor | Rust | Convert HTML websites to index.cbor |
| cbor-crawl | Rust | AI-side crawler for index.cbor files |
9. References
- [RFC 8949] CBOR — Concise Binary Object Representation (STD 94)
- [RFC 8610] CDDL — Concise Data Definition Language
- [RFC 9052] COSE — CBOR Object Signing and Encryption
- [EU 2024/1183] eIDAS 2.0 — European Digital Identity Framework
- [EIP-20] ERC-20 Token Standard
- [IEA 2025] Energy and AI — Data centre energy demand
- [Cloudflare 2025] From Googlebot to GPTBot — AI crawler statistics
- [HTTP Archive 2024] Web Almanac — Page weight statistics
10. Licence
The read protocol is CC0 — Public Domain. The full specification is CC BY-ND 4.0. Tools are MIT.
The reading protocol belongs to everyone. The creation tools are where value lives.