
Feedback #15

Open
hannahhoward opened this issue Dec 4, 2024 · 2 comments

Comments

@hannahhoward

  1. What is the root problem you're trying to address with DASL?

This part I think I get.

  1. CIDs have a lot of optionality, a lot of it isn't used. It makes working with CIDs kinda complicated, and implementing a full CID library very hard.
  2. Also maybe it makes it hard to use CIDs in browsers? (not clear if this is the driving use case)

Further use case details would be helpful.
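To make the "lot of optionality" point concrete, here's a hypothetical validator sketch showing how few fixed bytes the restricted profile actually needs. The function name and the exact allowed set are my assumptions, based on the CIDv1 layout (multibase prefix, version, codec, multihash code, digest length, digest) with base32-lower multibase, raw (0x55) / dag-cbor (0x71) codecs, and sha-256; blake3 (0x1e) is also in DASL but omitted to keep the sketch short:

```python
import base64

# Allowed codecs in this sketch: raw (0x55) and dag-cbor (0x71).
ALLOWED_CODECS = {0x55, 0x71}

def is_dasl_cid(cid: str) -> bool:
    """Check a CID string against a DASL-style restricted subset."""
    if not cid.startswith("b"):          # multibase: base32-lower only
        return False
    try:
        # CIDs strip base32 padding; re-add it before decoding
        body = cid[1:].upper()
        raw = base64.b32decode(body + "=" * (-len(body) % 8))
    except Exception:
        return False
    if len(raw) != 36:                   # 4 prefix bytes + 32-byte digest
        return False
    # All values below are < 0x80, so each varint is a single byte here.
    version, codec, hash_code, hash_len = raw[0], raw[1], raw[2], raw[3]
    return (version == 0x01 and codec in ALLOWED_CODECS
            and hash_code == 0x12 and hash_len == 0x20)
```

The point of the sketch: the whole "parser" is a prefix check and four byte comparisons, versus a full multibase/multicodec/multihash library for general CIDs.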

  2. Is the solution the right solution for the problem?

As I understand it, DASL is just a subset of CID around which you propose to do interoperability. If this is the case, I'd just say that, and skip the "we're making a new standard".

The reason I feel this way is that compatibility with CIDs seems to be an explicit goal. Even in this reduced spec you're willing to pay a price in complexity to maintain CID compatibility (why else keep these otherwise-unused bytes for multibase, version, encoding, and hash algorithm?). Is not breaking compatibility with CIDs in the future also a goal? It seems that way.

In this case, it's a useful subset/convention, closer to something like Javascript strict mode or using prettier to format your code.

Again, still very useful, and especially useful if the target context you'd want to use this in is well defined (I get the feeling it's CIDs in browsers, but again, not 100% clear).

I think the first thing to decide is if this is a subset that will be permanently compatible with the larger CID standard or not. If compatibility is intended, then even if you follow @b5 's suggestion to "hide the IPFS" the fact that it is an explicit subset of a larger standard is something to remain explicit about. Otherwise it's implied insider knowledge.

  3. Chunking

One piece stands out as very different from the others:

"Regardless of size, resources should not be "chunked" into a DAG or Merkle tree (as historically done with UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly."

It stands out because it's the only thing that goes well beyond "I will limit what is allowed for CIDs". I can somewhat intuit the goal here (incremental verifiability is certainly not natively supported in browsers), but it's not explicit.

Also as a spec, absent the context of IPFS and UnixFS annoyances, it's super unclear what constitutes linked data vs chunked data. Clearly there are some DAGs here, unless you're saying CIDs are not allowed as fields in dCBOR 42 -- which, given you're using 0x71, is implied. I think maybe you just need to define more precisely what a "resource" is (maybe you mean an HTTP resource?).

BTW, chunked encoding is a real transport concern. I tend to feel our failure in the IPFS design was doing it BEFORE the moment of transport rather than just in time. That's why Blake3 can be kinda awesome -- you can chunk at transport time rather than encoding time. There's even an interesting proposal from @Gozala to do Blake3 for structured IPLD data in a very interesting way: https://github.com/Gozala/merkle-reference/blob/main/docs/spec.md
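To illustrate why chunking at encoding time is a different beast from whole-content hashing: the same bytes get a different identifier for every choice of chunking parameter, while a whole-content hash never moves. A toy stdlib sketch (a naive two-level Merkle construction of my own for illustration, not UnixFS, Blake3, or BAO):

```python
import hashlib

def whole_content_id(data: bytes) -> str:
    # Chunking-independent: one hash over the full content.
    return hashlib.sha256(data).hexdigest()

def naive_merkle_root(data: bytes, chunk_size: int) -> str:
    # Encoding-time chunking: hash each chunk, then hash the
    # concatenated leaf digests to get a root.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    leaves = b"".join(hashlib.sha256(c).digest() for c in chunks)
    return hashlib.sha256(leaves).hexdigest()

data = b"x" * 1000
# The whole-content id is stable across implementations and time...
assert whole_content_id(data) == whole_content_id(data)
# ...but the Merkle root changes with an encoding-time parameter,
# so two encoders must agree on chunking to agree on the identifier.
assert naive_merkle_root(data, 256) != naive_merkle_root(data, 512)
```

This is the interoperability cost the spec's "no chunking" rule avoids: a whole-content hash needs no agreement beyond the hash function itself.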

@hannahhoward hannahhoward changed the title from Further context needed to Feedback on Dec 4, 2024
@mishmosh
Contributor

mishmosh commented Dec 5, 2024

Hey, thanks for the thoughtful comments.

2. Also maybe it makes it hard to use CIDs in browsers? (not clear if this is the driving use case)

I wouldn't say browsers are the driving use case, but consistency (determinism) across browsers and desktop implementations (and even hosted services) is. For example:

  • I want to use one implementation to generate a CID for a given video testimony, and publish that CID onchain
  • Next year, I want to fetch that video and use another implementation to generate a CID and make sure it's the same CID as the one from the public chain
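A minimal sketch of what that determinism requirement implies, assuming the DASL raw-codec + sha-256 path (CIDv1 bytes: version, codec, multihash code, digest length, digest; base32-lower with a 'b' multibase prefix). The function name is hypothetical; the point is that any conforming implementation hashing the same bytes must emit the byte-identical string:

```python
import base64
import hashlib

def dasl_raw_cid(data: bytes) -> str:
    """CIDv1 for raw bytes: 0x01 (version), 0x55 (raw codec),
    0x12 0x20 (sha-256 multihash, 32-byte digest), then the digest,
    base32-lower encoded with the 'b' multibase prefix, no padding."""
    raw = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(data).digest()
    return "b" + base64.b32encode(raw).decode().lower().rstrip("=")

video = b"...testimony bytes..."       # placeholder content
cid_published = dasl_raw_cid(video)    # CID published onchain today
cid_recomputed = dasl_raw_cid(video)   # recomputed later, any implementation
assert cid_published == cid_recomputed
```

There's no implementation-chosen parameter anywhere in that path, which is exactly what makes the onchain round-trip safe.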

There's some tension here because BLAKE3 is heavy to run in browsers but useful for large files (@bnewbold goes into more detail on atproto's divergent needs in #1).

As I understand it, DASL is just a subset of CID around which you propose to do interoperability. If this is case, I'd just say that, and skip the "we're making a new standard".

Noted! People are saying both "this is too IPFS-centric" and "this isn't IPFS-related enough" so there's clearly something here...

@bumblefudge
Collaborator

Re: gozala's just-in-time chunking, there is a proposal open to make these merkle-aware CIDs an alternative to CIDv1s, i.e. CIDvM (not calling it v2 cuz I'm not sure there's backwards compatibility worked out or promised). It's not quite "ready to merge", it would give up one of the precious single-byte multicodecs, and lots of big design questions are open, but it's something I'm tracking re: the future of multiformats, and it seems relevant here as a back-of-mind/long-tail planning consideration.

On the other hand, I don't think having DASLs point to the "whole file hash" rather than to the root CID of a presumed [recursively retrieved] DAG is only about incremental verification and just-in-time chunking; it's also about cleaner interop with other use cases, like CIDs that function as NIHs/dataURIs, CIDs that can be binary master keys in package managers/binary archives, etc. A better framing would be that "chunking at transport time" is one of many use cases where people want an identifier for the "whole content", not a promise that may time out in unpredictable recursive delivery 😈
