CBOR-Web — Machine-Readable Binary Web Content Standard

Abstract

CBOR-Web defines a method for websites to expose a machine-native copy of their content as a parallel channel alongside existing HTML. A single file, index.cbor, placed at the root of a domain contains the entire site's content in a structured binary format. AI agents fetch this file in one request and obtain every page, product, and data point without parsing HTML, CSS, or JavaScript.

Status of This Document: This is a draft specification published for community review. It is not yet an IETF Internet-Draft. The specification is stable and implemented in production on multiple sites.

1. Protocol

An AI agent discovers and reads CBOR-Web content through a single HTTP request:

GET /index.cbor HTTP/1.1
Host: example.com
Accept: application/cbor

The response is a CBOR-encoded document (Content-Type: application/cbor) that begins with the three bytes 0xD9D9F7, the encoding of the self-described CBOR tag (tag 55799, RFC 8949 §3.4.6).

Response	Meaning
`200 OK` + `application/cbor`	CBOR-Web supported. The body contains the entire site.
`404 Not Found`	CBOR-Web not available. Fall back to HTML.
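
The exchange above can be sketched in Python with only the standard library. This is a minimal sketch, not a reference client: the names `fetch_index` and `classify_response` are illustrative, the use of HTTPS is an assumption (the request above does not name a scheme), and any response other than the two listed in the table is reported as unexpected.

```python
import urllib.error
import urllib.request

# Encoding of tag 55799 (self-described CBOR, RFC 8949 section 3.4.6).
SELF_DESCRIBED_CBOR = bytes.fromhex("d9d9f7")


def classify_response(status: int, content_type: str, body: bytes) -> str:
    """Map an /index.cbor response onto the two rows of the table above."""
    # Ignore media type parameters and letter case when comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status == 200 and media_type == "application/cbor":
        # Only accept bodies that start with the self-described CBOR tag.
        return "cbor-web" if body.startswith(SELF_DESCRIBED_CBOR) else "unexpected"
    if status == 404:
        return "fallback-html"
    return "unexpected"


def fetch_index(host: str, timeout: float = 10.0):
    """Fetch /index.cbor and return (status, content type, body)."""
    request = urllib.request.Request(
        f"https://{host}/index.cbor",  # assumption: the site is served over HTTPS
        headers={"Accept": "application/cbor"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return response.status, content_type, response.read()
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; only the status code matters here.
        return error.code, "", b""
```

A caller would pass the result of `fetch_index("example.com")` to `classify_response` and fall back to HTML unless the answer is `"cbor-web"`.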

2. Document Structure

An index.cbor file contains a CBOR map with the following top-level keys:

Key	Name	Type	Required	Description
0	@type	text	Yes	`"cbor-web"`
1	@version	uint	Yes	`3` for this version
2	site	map	Yes	Domain, name, languages, contact, geo
3	security	map	Yes	Authentication, access tiers (T0/T1/T2), rate limits
4	navigation	map	Recommended	Menus, hierarchy, breadcrumbs
5	pages	array	Yes	All pages with structured content blocks
6	meta	map	Yes	Timestamp, generator, signature
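
To make the layout concrete, the following sketch builds the smallest document that carries every required top-level key. It rests on several assumptions: the nested maps are left empty because their fields are not listed in this section, the recommended navigation key is omitted, and the tiny encoder (`encode_head`, `encode`) is a hypothetical helper that covers only unsigned integers, byte strings, text strings, arrays, and maps as defined in RFC 8949.

```python
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte and its argument (RFC 8949 section 3)."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 2**8:
        return bytes([(major << 5) | 24, value])
    if value < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)


def encode(item) -> bytes:
    """Encode a small subset of Python values as CBOR data items."""
    if isinstance(item, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(item, int) and item >= 0:
        return encode_head(0, item)  # major type 0: unsigned integer
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item  # major type 2: byte string
    if isinstance(item, str):
        data = item.encode("utf-8")
        return encode_head(3, len(data)) + data  # major type 3: text string
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        pairs = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + pairs  # major type 5: map
    raise TypeError(f"unsupported type: {type(item).__name__}")


document = {
    0: "cbor-web",  # @type
    1: 3,           # @version
    2: {},          # site (fields not specified in this section)
    3: {},          # security (fields not specified in this section)
    5: [],          # pages
    6: {},          # meta (fields not specified in this section)
}

# Prefix the map with tag 55799 so the file starts with 0xD9D9F7.
index_cbor = bytes.fromhex("d9d9f7") + encode(document)
```

The result is 24 bytes long: the three tag bytes, a one-byte map header, and six key/value pairs.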

2.1 Page Entry

Each element in the pages array (key 5) is a map with the following fields:

Field	Type	Required	Description
`"path"`	text	Yes	URL path (`"/"`, `"/products"`)
`"title"`	text	Yes	Page title
`"lang"`	text	Yes	BCP 47 language code
`"access"`	text	Yes	`"T0"`, `"T1"`, or `"T2"`
`"content"`	array	Yes	Ordered content blocks
`"hash"`	bstr	Recommended	SHA-256 of serialised content
`"updated"`	tag 1	Recommended	Last modification timestamp (epoch-based date/time)
`"_describe"`	text	Optional	Publisher guidance for AI navigation
`"_l"`	uint	Optional	Depth level (0-4) for progressive reading
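
A consumer might check each decoded page entry against this table before using it. The sketch below assumes the entry has already been decoded into a Python dict by some CBOR library; the function name `page_entry_problems` and the wording of the messages are illustrative, and the `"updated"` and `"_describe"` fields are not checked.

```python
REQUIRED_FIELDS = {
    "path": str,
    "title": str,
    "lang": str,
    "access": str,
    "content": list,
}
ACCESS_TIERS = {"T0", "T1", "T2"}


def page_entry_problems(page: dict) -> list:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in page:
            problems.append(f"missing required field '{name}'")
        elif not isinstance(page[name], expected):
            problems.append(f"field '{name}' should be {expected.__name__}")

    access = page.get("access")
    if isinstance(access, str) and access not in ACCESS_TIERS:
        problems.append("field 'access' should be 'T0', 'T1', or 'T2'")

    # A SHA-256 digest is always 32 bytes long.
    digest = page.get("hash")
    if digest is not None and not (isinstance(digest, bytes) and len(digest) == 32):
        problems.append("field 'hash' should be a 32-byte SHA-256 digest")

    # bool is a subclass of int in Python, so exclude it explicitly.
    level = page.get("_l")
    if level is not None:
        is_uint = isinstance(level, int) and not isinstance(level, bool)
        if not (is_uint and 0 <= level <= 4):
            problems.append("field '_l' should be an integer from 0 to 4")
    return problems
```

Returning a list rather than raising on the first error lets a crawler log every problem in a page entry at once.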

2.2 Content Blocks

Content blocks are the atomic units of a page. Each block is a CBOR map with a required "t" (type) field set to one of the codes below, plus the keys listed for that type:

Code	Type	Keys	Description
`h`	Heading	`l` (1-6), `v`	Section heading
`p`	Paragraph	`v`	Body text
`ul`	Unordered list	`v` (array)	Bullet list
`ol`	Ordered list	`v` (array)	Numbered list
`table`	Table	`headers`, `rows`	Data table
`cta`	Call to action	`v`, `href`	Button or link
`q`	Quote	`v`, `attr`	Citation with source
`img`	Image	`src`, `alt`	Image reference
`code`	Code	`v`, `lang`	Source code
`dl`	Definitions	`v` (array of maps)	Term/definition pairs
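
As an illustration of how an agent might consume these blocks, the sketch below flattens a decoded content array into plain text. It handles only a subset of the block types, and it assumes that `v` is a text string for headings, paragraphs, and code, and an array of text strings for lists; the table above does not spell out these item types. The Markdown-like output and the names `render_block` and `render_page` are choices made for this example.

```python
def render_block(block: dict) -> str:
    """Render one decoded content block as plain text."""
    kind = block["t"]
    if kind == "h":
        # Heading level 1-6 becomes a run of '#' characters.
        return "#" * block["l"] + " " + block["v"]
    if kind in ("p", "code"):
        return block["v"]
    if kind == "ul":
        return "\n".join(f"- {item}" for item in block["v"])
    if kind == "ol":
        return "\n".join(f"{n}. {item}" for n, item in enumerate(block["v"], start=1))
    # Block types not covered by this sketch are shown as placeholders.
    return f"[{kind} block]"


def render_page(content: list) -> str:
    """Join the rendered blocks of a page in their original order."""
    return "\n\n".join(render_block(block) for block in content)
```

Because blocks are ordered, a simple join preserves the reading order the publisher intended.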

3. Discovery Methods

An AI agent uses the following methods to discover CBOR-Web support, in order of priority:

Priority	Method	Example
1 (highest)	Direct access	`GET /index.cbor`
2	DNS TXT record	`_cbor-web.example.com TXT "v=cbor-web; url=..."`
3	HTML link element	`<link rel="alternate" type="application/cbor" href="...">`
4	cbor.txt	Plain text file at root with index URL
5	robots.txt	`CBOR-Web: /index.cbor`
6	llms.txt	Section referencing index.cbor
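
Three of these methods involve extracting a location from text, which can be sketched as follows. The helper names are hypothetical, the TXT record is assumed to be a list of `key=value` pairs separated by semicolons (only `v` and `url` appear in the example above), and DNS resolution itself is left out because the Python standard library has no TXT lookup.

```python
from html.parser import HTMLParser
from typing import Optional


def parse_txt_record(value: str) -> dict:
    """Split a '_cbor-web' TXT value into lower-cased keys and their values."""
    fields = {}
    for part in value.split(";"):
        key, separator, field_value = part.strip().partition("=")
        if separator:
            fields[key.strip().lower()] = field_value.strip()
    return fields


def index_path_from_robots(robots_txt: str) -> Optional[str]:
    """Return the value of the first 'CBOR-Web:' line in robots.txt, if any."""
    for line in robots_txt.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "cbor-web":
            return value.strip()
    return None


class _AlternateLinkFinder(HTMLParser):
    """Remember the href of the first <link rel=alternate type=application/cbor>."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.href is None
            and tag == "link"
            and attributes.get("rel") == "alternate"
            and attributes.get("type") == "application/cbor"
        ):
            self.href = attributes.get("href")


def href_from_html(html: str) -> Optional[str]:
    """Return the CBOR alternate link target found in an HTML page, if any."""
    finder = _AlternateLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.href
```

An agent following the priority order would try the direct request first and consult these parsers only when it fails.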

4. Access Tiers

Tier	Name	Authentication	Content
T2	Open	None	Public pages, metadata
T1	Authenticated	CBORW Token / API key	Premium content, full data
T0	Institutional	eIDAS 2.0 / X.509 EV	Government, verified identity
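
One way an agent could apply the tiers is to filter pages by their `"access"` field. The sketch below makes an assumption the table does not state: that tiers are cumulative, so a T1 agent may also read T2 pages and a T0 agent may read everything. If tiers are meant to be disjoint, the comparison would need to be an equality test instead. The name `readable_paths` is illustrative.

```python
# Assumption: a lower tier number means broader access.
TIER_RANK = {"T0": 0, "T1": 1, "T2": 2}


def readable_paths(pages: list, agent_tier: str) -> list:
    """Return the paths of pages an agent at agent_tier could read."""
    agent_rank = TIER_RANK[agent_tier]
    return [
        page["path"]
        for page in pages
        if TIER_RANK[page["access"]] >= agent_rank
    ]
```

An unauthenticated agent would call this with `"T2"` and see only the open pages.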

Trust Chain: A signed index.cbor creates a verifiable chain: signature → DNS public key → CBORW token → Ethereum wallet → legal entity. AI agents can use this chain to assess the trustworthiness of a source.

5. Comparison with Existing Standards

Standard	Format	Content	Relation to CBOR-Web
`robots.txt`	Text	Crawl permissions	Complementary
`sitemap.xml`	XML	URL list	Replaced by `index.cbor`
`llms.txt`	Markdown	Text summary for LLMs	Complementary
`index.html`	HTML	Human-readable page	Parallel — `index.cbor` is for machines
JSON-LD	JSON	Structured entities	Complementary (entities vs full content)

6. Efficiency

Measured on a production website (deltopide.fr, 25 March 2026):

Metric	HTML	CBOR-Web	Improvement
File size	41,537 bytes	2,487 bytes	×17
LLM tokens	~10,400	~620	×17
HTTP requests	9+	1	×9
Useful signal	12%	100%	×8
Formats to parse	4	1	—
Accent encodings	3	1 (UTF-8)	—
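
The improvement factors in the table are rounded ratios of the two measured columns. The short calculation below reproduces them from the figures above; the variable names are illustrative and the token counts are the approximate values given in the table.

```python
html_bytes, cbor_bytes = 41_537, 2_487
html_tokens, cbor_tokens = 10_400, 620  # approximate counts from the table
html_signal, cbor_signal = 12, 100      # useful signal, in percent

size_ratio = html_bytes / cbor_bytes      # about 16.7, reported as x17
token_ratio = html_tokens / cbor_tokens   # about 16.8, reported as x17
signal_ratio = cbor_signal / html_signal  # about 8.3, reported as x8

# Share of bytes saved by fetching index.cbor instead of the HTML page.
bytes_saved_percent = 100 * (1 - cbor_bytes / html_bytes)  # about 94

print(f"size x{size_ratio:.1f}, tokens x{token_ratio:.1f}, signal x{signal_ratio:.1f}")
print(f"bytes saved: {bytes_saved_percent:.0f}%")
```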

7. History

Chapter 1 — Two researchers, one idea (2013)

In 2013, Carsten Bormann (University of Bremen) and Paul Hoffman (ICANN) invented CBOR, the Concise Binary Object Representation. Its data model extends that of JSON, adding byte strings and tags among other things, and it is encoded in binary. Published as RFC 7049 by the IETF. Designed for constrained devices: thermometers, door locks, electricity meters.

Chapter 2 — Internet Standard (2020)

In December 2020, RFC 8949 replaced the original specification. CBOR became an Internet Standard (STD 94), the highest maturity level on the IETF standards track. It is used by WebAuthn and COSE and is commonly paired with CoAP. Millions of devices spoke CBOR without anyone noticing.

Chapter 3 — AI agents arrive (2024-2026)

AI agents started browsing the web — and found it absurd. 2.86 MB average per page. 86 requests to display a single page. 95% noise for 5% useful content. 50 billion AI crawler requests per day (Cloudflare 2025).

Chapter 4 — One file, the entire site (2026)

index.cbor — placed next to index.html. The HTML for humans, the CBOR for machines. A 160 KB Shopify page becomes 8 KB of CBOR. A complete 84-page site fits in 700 KB.

Chapter 5 — The ecological imperative

Data centres consumed 415 TWh of electricity in 2024, projected to reach 945 TWh by 2030 (IEA). Every unnecessary byte transferred is energy wasted. CBOR-Web doesn't just make the web faster for AI — it makes it lighter for the planet.

Chapter 6 — A standard that belongs to everyone

The CBOR-Web read protocol is published under CC0 — Public Domain. No licence to pay, no permission to ask. Anyone can read an index.cbor. The reading protocol belongs to humanity.

8. Implementations

Live Sites

Site	Method	Pages
deltopide.fr	Direct `/index.cbor` + DNS TXT + cbor.txt + robots.txt + llms.txt + HTML link	1
laforetnousregale.fr	DNS TXT	5
pacific-planet.com	DNS TXT	6
verdetao.com	DNS TXT	15
eloiseplot-dieteticienne.com	HTML link	6
crm.laforetnousregale.fr	Direct `/index.cbor`	1

Tools

Tool	Language	Purpose
text2cbor	Rust	Convert HTML websites to `index.cbor`
cbor-crawl	Rust	AI-side crawler for `index.cbor` files

9. References

[RFC 8949] CBOR — Concise Binary Object Representation (STD 94)
[RFC 8610] CDDL — Concise Data Definition Language
[RFC 9052] COSE — CBOR Object Signing and Encryption
[EU 2024/1183] eIDAS 2.0 — European Digital Identity Framework
[EIP-20] ERC-20 Token Standard
[IEA 2025] Energy and AI — Data centre energy demand
[Cloudflare 2025] From Googlebot to GPTBot — AI crawler statistics
[HTTP Archive 2024] Web Almanac — Page weight statistics

10. Licence

The read protocol is CC0 — Public Domain. The full specification is CC BY-ND 4.0. Tools are MIT.

The reading protocol belongs to everyone. The creation tools are where value lives.
