<!DOCTYPE html><html lang="en"><head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Content IDs (CIDs)</title>
<link rel="stylesheet" href="spec.css"><link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><rect x=%220%22 y=%220%22 width=%22100%22 height=%22100%22 fill=%22%2300ff75%22></rect></svg>"><meta name="twitter:card" content="summary_large_image"><meta name="twitter:title" property="og:title" content="DASL: Content IDs (CIDs)"><meta name="twitter:description" property="og:description" content="DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash with enough metadata to be extensible (to add new hash types in the future) and to indicate whether they are pointing to raw bytes or to structured data."><meta name="twitter:image" property="og:image" content="https://dasl.ing/cid.png"><meta name="twitter:image:alt" content="Very colourful stripes, so colourful it hurts"><meta name="twitter:url" property="og:url" content="https://dasl.ing/"><meta property="og:site_name" content="DASL"><meta property="og:locale" content="en"><meta name="theme-color" content="#00ff75"></head>
<body><div class="nav-back">A specification of the <a href="/">DASL Project</a>.</div><main><header><h1>Content IDs (CIDs)</h1><table><tbody><tr><th>date</th><td>2025-01-17</td></tr><tr><th>editors</th><td><a href="https://berjon.com/">Robin Berjon</a> <<a href="mailto:[email protected]">[email protected]</a>><br><a href="https://bumblefudge.com/">Juan Caballero</a> <<a href="mailto:[email protected]">[email protected]</a>></td></tr><tr><th>issues</th><td><a href="https://github.com/darobin/dasl.ing/issues">list</a>, <a href="https://github.com/darobin/dasl.ing/issues/new">new</a></td></tr><tr><th>abstract</th><td><div id="abstract">
<p>
DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash
with enough metadata to be extensible (to add new hash types in the future) and to indicate whether
they are pointing to raw bytes or to structured data.
</p>
</div></td></tr></tbody></table></header>
<section>
<h2>Introduction</h2>
<p>
DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash
with enough metadata to be extensible (to add new hash types in the future) and to indicate whether
they are pointing to raw bytes or to structured data. If you're simply using DASL CIDs as identifiers, you
can almost certainly just use the string as an opaque ID and worry no further.
</p>
<p>
A DASL CID can be represented as a string or as an array of bytes. If you wish to understand the
internals of a CID, it has the following structure:
</p>
<ol>
<li>
A <code>b</code> prefix (only in string form), indicating that the rest of the string is base32-encoded.
This is an extensibility point for supporting other CID encodings in the future; base32 is currently the only one.
</li>
<li>
A version number, which is currently always 1.
</li>
<li>
A content codec, which is a flag indicating whether it is pointing to structured or raw
data.
</li>
<li>
A hash type, which is always SHA-256 ([<a href="#ref-sha256" class="ref">sha256</a>]).
</li>
<li>
A hash size, indicating how many bytes long the digest is.
</li>
<li>
A digest, which is the hash of the content being identified.
</li>
</ol>
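<p>
As an illustration of the structure above, here is a minimal Python sketch that assembles a CID for raw
bytes. The function name <code>make_raw_cid</code> is hypothetical, and the sketch assumes that the base32
string carries no <code>=</code> padding (as is the case for IPFS CIDs, of which DASL CIDs are a subset).
</p>

```python
import base64
import hashlib


def make_raw_cid(content: bytes):
    """Build a CID for raw bytes; returns (CID bytes, CID string)."""
    digest = hashlib.sha256(content).digest()
    # version 1, codec 0x55 (raw), hash type 0x12 (SHA-256), hash size, digest
    cid_bytes = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    # Assumption: lowercase base32 without "=" padding, behind the "b" prefix.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return cid_bytes, "b" + encoded


cid_bytes, cid_string = make_raw_cid(b"hello world")
print(cid_bytes[:4].hex())  # 01551220
print(cid_string)
```

<p>
Because the four header bytes are fixed for raw content, every such CID string begins with
<code>bafkrei</code>.
</p>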
</section>
<section>
<h2>Parsing CIDs</h2>
<p>
Use the following steps to <dfn id="dfn-parse-a-cid-string">parse a CID string</dfn>:
</p>
<ol>
<li>Accept a string <var>CID</var>.</li>
<li>Remove the first character from <var>CID</var> and store it in <var>prefix</var>.</li>
<li>If <var>prefix</var> is not equal to <code>b</code>, throw an error.</li>
<li>
Decode the rest of <var>CID</var> using
<a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the base32 algorithm from
RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var> ([<a href="#ref-rfc4648" class="ref">rfc4648</a>]).
</li>
<li>Return the result of applying the steps to <a href="#dfn-decode-a-cid" class="dfn-ref">decode a CID</a> to <var>CID bytes</var>.</li>
</ol>
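<p>
The string-level steps can be sketched in Python as follows. This is a non-normative illustration: the
function name <code>cid_string_to_bytes</code> is hypothetical, it stops at producing <var>CID bytes</var>
rather than decoding them, and it assumes that the base32 text is unpadded, so padding is restored before
handing it to the standard library decoder.
</p>

```python
import base64


def cid_string_to_bytes(cid: str) -> bytes:
    """Check the "b" prefix and base32-decode the rest into CID bytes."""
    if not cid:
        raise ValueError("empty CID string")
    prefix, rest = cid[0], cid[1:]
    if prefix != "b":
        raise ValueError("unsupported CID prefix: " + repr(prefix))
    # The lowercase alphabet is required, so reject any uppercase input.
    if rest != rest.lower():
        raise ValueError("CID string must be lowercase")
    # Assumption: the input is unpadded; b32decode expects "=" padding.
    padding = "=" * (-len(rest) % 8)
    # b32decode raises binascii.Error (a ValueError subclass) on bad input.
    return base64.b32decode(rest.upper() + padding)


print(cid_string_to_bytes("bafkreaa").hex())  # 01551200
```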
<p>
Use the following steps to <dfn id="dfn-parse-a-binary-cid">parse a binary CID</dfn>:
</p>
<ol>
<li>Accept an array of bytes <var>binary CID</var>.</li>
<li>
Remove the first byte in <var>binary CID</var> and store it in <var>prefix</var>.
</li>
<li>
If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256
prefix), throw an error.
</li>
<li>Store the rest of <var>binary CID</var> in <var>CID bytes</var>.</li>
<li>Return the result of applying the steps to <a href="#dfn-decode-a-cid" class="dfn-ref">decode a CID</a> to <var>CID bytes</var>.</li>
</ol>
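<p>
The binary case only needs to check and strip the leading null byte. A minimal Python sketch, with the
hypothetical name <code>binary_cid_to_bytes</code>, might look like this:
</p>

```python
def binary_cid_to_bytes(binary_cid: bytes) -> bytes:
    """Check the 0x00 prefix of a binary CID and return the CID bytes."""
    if len(binary_cid) == 0:
        raise ValueError("empty binary CID")
    prefix, cid_bytes = binary_cid[0], binary_cid[1:]
    if prefix != 0x00:
        raise ValueError("binary CID must start with a null byte")
    return cid_bytes


print(binary_cid_to_bytes(bytes([0x00, 0x01, 0x55, 0x12, 0x00])).hex())  # 01551200
```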
<p>
Use the following steps to <dfn id="dfn-decode-a-cid">decode a CID</dfn>:
</p>
<ol>
<li>Accept an array of bytes <var>CID bytes</var>.</li>
<li>
Remove the first byte in <var>CID bytes</var> and store it in <var>version</var>.
</li>
<li>If <var>version</var> is not equal to <code>1</code>, throw an error.</li>
<li>
Remove the next byte in <var>CID bytes</var> and store it in <var>codec</var>.
</li>
<li>
If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42),
throw an error ([<a href="#ref-dcbor42" class="ref">dcbor42</a>]).
</li>
<li>
Remove the next byte in <var>CID bytes</var> and store it in <var>hash type</var>.
</li>
<li>
If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256), throw an error ([<a href="#ref-sha256" class="ref">sha256</a>]).
</li>
<li>Read an LEB128 <var>hash size</var> from <var>CID bytes</var> ([<a href="#ref-leb128" class="ref">leb128</a>]).</li>
<li>
If the number of bytes left in <var>CID bytes</var> is smaller than <var>hash size</var>,
throw an error. Note that <var>hash size</var> may be zero, which encodes an empty CID.
At this point, there are two options:
<ul>
<li>
<u>We are expecting <var>CID bytes</var> to <em>only</em> contain a CID.</u> If
the number of bytes left in <var>CID bytes</var> is greater than <var>hash size</var>,
throw an error. Store the remaining <var>CID bytes</var> in <var>digest</var>.
</li>
<li>
<u>We are reading from the beginning of <var>CID bytes</var> and it may contain more data.</u>
This is the case for instance when processing a CAR block ([<a href="#ref-car" class="ref">car</a>]). Only read <var>hash size</var>
bytes from <var>CID bytes</var> and leave the rest available for further processing.
Store the <var>hash size</var> bytes from <var>CID bytes</var> in <var>digest</var>.
</li>
</ul>
</li>
<li>
Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>,
and <var>digest</var>.
</li>
</ol>
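<p>
The decoding steps can be sketched in Python as follows. This is an illustration under stated assumptions
rather than a reference implementation: the names <code>DecodedCID</code>, <code>read_leb128</code>,
<code>decode_cid</code> and the <code>allow_trailing</code> flag are hypothetical, and <var>hash size</var>
is assumed to be an unsigned LEB128 integer. The flag selects between the two options described in the
steps: a strict decode, or a decode that leaves trailing bytes for further processing.
</p>

```python
import hashlib
from typing import NamedTuple


class DecodedCID(NamedTuple):
    version: int
    codec: int
    hash_type: int
    hash_size: int
    digest: bytes


CODECS = {0x55: "raw", 0x71: "dCBOR42"}
SHA_256 = 0x12


def read_leb128(data: bytes, offset: int):
    """Read an unsigned LEB128 integer; returns (value, next offset)."""
    value, shift = 0, 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear: this was the last byte
            return value, offset
        shift += 7


def decode_cid(cid_bytes: bytes, allow_trailing: bool = False):
    """Decode CID bytes; returns (DecodedCID, bytes left after the digest)."""
    if len(cid_bytes) < 3:
        raise ValueError("CID is too short")
    version, codec, hash_type = cid_bytes[0], cid_bytes[1], cid_bytes[2]
    if version != 1:
        raise ValueError("unsupported CID version")
    if codec not in CODECS:
        raise ValueError("unsupported codec")
    if hash_type != SHA_256:
        raise ValueError("unsupported hash type")
    hash_size, offset = read_leb128(cid_bytes, 3)
    remaining = len(cid_bytes) - offset
    if remaining < hash_size:
        raise ValueError("digest is shorter than hash size")
    if remaining > hash_size and not allow_trailing:
        raise ValueError("unexpected bytes after digest")
    digest = cid_bytes[offset:offset + hash_size]
    rest = cid_bytes[offset + hash_size:]
    return DecodedCID(version, codec, hash_type, hash_size, digest), rest


example = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(b"hello world").digest()
decoded, rest = decode_cid(example)
print(CODECS[decoded.codec], decoded.hash_size, rest)  # raw 32 b''
```

<p>
When reading from a larger buffer, such as a CAR block, passing <code>allow_trailing=True</code> returns
the bytes that follow the digest instead of raising an error.
</p>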
</section>
<section>
<h2>Relationship to IPFS</h2>
<p>
You don't need to understand IPFS in order to use DASL. This section is for informational
purposes only.
</p>
<p>
DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a>
with the following properties:
</p>
<ul>
<li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li>
<li>
Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable
(and subdomain-usable) string encoding.
</li>
<li>
Only the <code>raw</code> binary multicodec (0x55) and the <code>dag-cbor</code> multicodec (0x71) are used, with the
latter used only for dCBOR42-conformant DAGs.
</li>
<li>Only SHA-256 (0x12) is used for the hash function.</li>
<li>
The CID isn't the boss of anyone, but the expectation is that, regardless of size, resources
<em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with UnixFS canonicalization
in IPFS systems) but rather hashed in their entirety and content-addressed directly. That being
said, a DASL CID can point to a piece of dCBOR42 metadata that describes this kind of
chunking, if needed. (A separate specification may be added for that.)
</li>
<li>
This set of options has the added advantage that all the aforementioned single-byte prefixes require no
additional varint processing or byte-fiddling.
</li>
</ul>
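<p>
To illustrate the last point, the following Python sketch encodes values as unsigned varints (LEB128) and
checks that each prefix value used by DASL encodes to a single byte equal to the value itself. The helper
<code>encode_varint</code> is a hypothetical name written for this example.
</p>

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)
            return bytes(out)


DASL_PREFIXES = {"version": 0x01, "raw": 0x55, "dCBOR42": 0x71, "SHA-256": 0x12}

# Every value below 0x80 is its own one-byte varint.
for name, code in DASL_PREFIXES.items():
    assert encode_varint(code) == bytes([code]), name

# By contrast, any value of 0x80 or more needs at least two bytes.
print(encode_varint(0x80).hex())  # 8001
```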
</section>
<section><h2>References</h2><dl><dt id="ref-car">[car]</dt><dd>Robin Berjon & Juan Caballero. <a href="https://dasl.ing/car.html"><cite>Content-Addressable aRchives (CAR)</cite></a>. 2025-01-17. URL: <a href="https://dasl.ing/car.html">https://dasl.ing/car.html</a></dd><dt id="ref-dcbor42">[dcbor42]</dt><dd>Robin Berjon & Juan Caballero. <a href="https://dasl.ing/dcbor42.html"><cite>Deterministic CBOR with tag 42 (dCBOR42)</cite></a>. 2025-01-17. URL: <a href="https://dasl.ing/dcbor42.html">https://dasl.ing/dcbor42.html</a></dd><dt id="ref-leb128">[leb128]</dt><dd>Wikipedia. <a href="https://en.wikipedia.org/wiki/LEB128"><cite>LEB128</cite></a>. Retrieved December 2024. URL: <a href="https://en.wikipedia.org/wiki/LEB128">https://en.wikipedia.org/wiki/LEB128</a></dd><dt id="ref-rfc4648">[rfc4648]</dt><dd>S. Josefsson. <a href="https://www.rfc-editor.org/rfc/rfc4648"><cite>The Base16, Base32, and Base64 Data Encodings</cite></a>. October 2006. URL: <a href="https://www.rfc-editor.org/rfc/rfc4648">https://www.rfc-editor.org/rfc/rfc4648</a></dd><dt id="ref-sha256">[sha256]</dt><dd>National Institute of Standards and Technology, <cite>Secure Hash Algorithm. NIST FIPS 180-2</cite>. August 2002.</dd></dl></section></main></body></html>