WIP: Schema Evolution document rough draft #23655
base: main
Conversation
> Having an expected field means that the field is expected to be there, but the inverse is not true.
_Not_ having an expected field with key _K_ does not mean that there _can't_ be a field with key _K_.
Field _K_ might be in the tree, but it's invisible to a schema that doesn't know about it and that schema can (in theory) pretend it doesn't exist.
I wonder how a reader should feel about this uncertainty. Is this a statement of current capabilities (the ability to ignore extra fields), or a statement of possible future changes in capabilities or behaviors?
@noencke agree that this can be unclear. May be good to give a concrete example.
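For example, something along these lines could work (hypothetical data, sketched in plain TypeScript rather than real SharedTree APIs; the names are illustrative only):

```typescript
// A node as stored in the tree, containing a field ("nickname") that an
// older schema does not declare as expected.
const storedPerson: Record<string, string> = { name: "Alice", nickname: "Al" };

// The older view schema only expects "name". Reading through that schema
// simply never surfaces "nickname" -- the field is present in the tree,
// but invisible to this reader.
const oldViewKeys = ["name"];
const visible = Object.fromEntries(
  Object.entries(storedPerson).filter(([key]) => oldViewKeys.includes(key)),
);
// visible exposes only { name: "Alice" }; "nickname" still exists in storedPerson.
```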
Documents contain stored schema A : 1, 1d
Maintain view schema B in application source code : 2, 1d
Documents contain stored schema B : 2, 1d
```
Your diagram is missing the step in between that should read "application is not available, please come back in 48 hours" :)
Even "live service" applications, like web apps, might allow users to continue using an older/cached version of an application client long after a deployment of a new version.
In these scenarios, when does the document get upgraded?
When does the client move to the new version of the application?
The developer cannot know, and therefore cannot guarantee that the stored schema and view schema of their application will be the same for all clients and documents.
I think you are missing the main reasons:
- it's very risky (if this process goes wrong, data is likely lost). There is no way to adhere to safe velocity principles - it's all or nothing.
- in most environments, it's just too expensive (it would cost millions if not billions of dollars to do that at the scale of SharePoint - for example, converting all Word documents).

Many of the points you raised could be somewhat addressed. For example, an application (of a specific version) can be kicked out of a collab session when the upgrade for a document happens - as long as there are "before" and "after" versions of the app available, and the "after" version has been available long enough to reach all users.
The application cannot function unless the view schema and the stored schema agree.

How then does a developer in this situation go about upgrading their application schema?
The developer must do a **staged rollout** - an incremental set of application deployments that release client code only when it is safe to do so.
I think reference to https://github.com/microsoft/FluidFramework/blob/main/packages/dds/SchemaVersioning.md is due here, maybe even much earlier in the document

> In this way, if a schema node changes, then all schema nodes that transitively reference/contain that node will also have to change.
Therefore a typical application schema will observe a "spine" of its schema tree change when _anything_ in the schema changes, climbing up to the application's root node.
Since the root node always changes, even a very small change to a leaf schema node results in a "replacement" of the entire schema.
Some notes:
- The schema has to change no matter what, in the way you describe it (the spine part) - what used to be "Dog" is now some kind of Dog-prime (at least its meaning changes). The naming, however (does it stay Dog, or become Dog2, or something else), feels like an orthogonal problem to me. I'd think that the type identifier of the root field needs to change, as otherwise I'd expect ViewableTree.viewWith() to do a shallow check at the root and be satisfied (not do deep evaluation) if the view schema's root field type exactly matches the stored schema's root field type. Maybe another way to say it - I think type identifiers should have exactly one schema representation and never change. It's like GUIDs in COM/OLE - once reserved by Excel, it stays Excel's ID (Excel can add more IDs, but an old ID can't be reused by Word).
- That said, the above point does not matter much: if we follow the principles of safe velocity, then the app (or library) needs to deploy any changes like that slowly and thus maintain both schemas for quite a while. So, to some extent, there is no way around having Dog2, and it's not a problem that is solved by having some APIs.
- Maybe it's just me, but up to this point (as well as in the title / first paragraph) there is too much emphasis on schema, and not much on the data. I.e. the data needs to change. I even wonder if there are cases where the schema actually stays the same (for example, the app wants to add a prefix to each dog name, but otherwise the schema is exactly the same).
The developer must do a **staged rollout** - an incremental set of application deployments that release client code only when it is safe to do so.

1. Initially, the documents are using stored schema A and the application is written against view schema A.
2. View schema B is written into the application client code.
I'd summarize this paragraph by saying that r/w support of schema B is shipped dark, but engaged if a document is opened in schema B. Then elaborate on what that means. I'd not mention views or UI - in many cases changes are abstracted away from presentation layers.
If the application loads a document with stored schema A, it views that document with view schema A.
If instead it loads a document with stored schema B, it views it with view schema B.
Both versions of the view/UI/application/etc. must continue to be maintained until after step 4.
This new version of the application is deployed and clients can begin updating to it.
This could be (and likely has to be, in most scenarios) done not through code deployment, but through flighting. Either way, if we go into these details, it might be worth mentioning that the application should be prepared to pull this flight out (possibly by deploying a kill-bit feature gate) if something goes wrong, and figure out what to do (situation specific) after that with documents in schema B.
If every client is guaranteed to have write permissions, then the developer can also remove view schema A and the code written against it in the same deployment.
This is because any client that encounters a document with stored schema A can upgrade it to stored schema B before interacting with it.
However, if the client is a read-only client, then it can't do this, and it needs to retain the code to read view schema A in case it opens a document with stored schema A.
This caveat is made more manageable by [adapters](#adapters), introduced later in this document.
I do not understand that part. I'd start by making the claim that documents at rest can't be broken (for most scenarios), and thus the ability to open a document in the old schema has to be present forever, in some form.
In most cases, I do not see read-only file handling as special. The expectation I'd have is that the document is converted to schema B on open and most of the app logic only speaks in schema B. If that results in the document being dirty (changes that can't be submitted), so be it - this can be handled (ignore these changes when calculating whether the document is dirty / don't send any ops).
What is true (and maybe that's what you wanted to say) is that the ability to write in the old schema can be removed after some time. But that's orthogonal to read-only files.
> To minimize the time between client saturation and document upgrades, the ability to upgrade the document can be made part of the deployment in step 1 but "shipped dark" and remotely enabled in this step via a feature flight.

4. Wait until all documents have upgraded to the new stored schema.
After this **document saturation** has been achieved, the application can do another deployment, removing view schema A and all of its related code, if it was not able to do so in step 3 (because some of its clients are read-only).
This section assumes this is achievable. It is not, in most scenarios. The expectation most developers and users would have is that a document that was not opened for 20 years can still be opened - even if it's a single document.

A **schema adapter** helps to reduce the code bifurcation/duplication (described in the previous section) that results from doing a staged rollout.
One of the main pain points of the staged rollout process is that the application has to maintain two view schemas while the migration is ongoing (or forever!), since the client might encounter documents of either the old stored schema or the new stored schema.
In practice, this can mean a lot of redundant application code as the UI is bound to two different data models, but only one is ever used at a time.
Consider not using "UI"; instead find something neutral (an application may have no UI).
In the context of a schema migration, this means that the application can write code (e.g. the code that binds the view schema to the UI) _only against the new view schema_.
The code that binds to the old view schema is deleted at the same time that the new view schema and the new adapter are added.
This is a net benefit because it's simpler to write the adapter's code than it is to write the UI binding code.
During the migration, when the application loads a document with the old stored schema, it first adapts the data in the document to the new view schema before the data is read by the app.
I assume (from the steps 1-5 below) that adapters are bi-directional. I.e. they can convert A -> B, and as changes are made (in schema B), appropriate transformation is done to reflect changes in schema A.
If so, it might be worth mentioning it here
2. Write an adapter from schema A to schema B, which will adapt the data in _stored_ schema A into the data in _view_ schema B.
Replace view schema A with view schema B and deploy a new version of the application.
3. Wait for [client saturation](#migration), then enable the upgrading from stored schema A to stored schema B.
4. If applicable, remove the adapter from the source code and redeploy after [document saturation](#migration).
> If applicable

Given that it likely does not apply to any scenario, I'd either remove #4, or be a bit more specific (if you believe that all documents were converted).
```

> Similar to a staged rollout without an adapter, read-only clients must preserve the adapter for more of the migration than read/write clients.
Therefore the advantages to using an adapter vs. not (i.e. maintaining the code bifurcation to support the old view schema) are extra valuable for applications with read-only clients.
Similar feedback as above about read-only permissions / scenarios.

An application could even write a "reverse" adapter to adapt the new schema to the old schema.
This is useful if the application code wishes to move to a new view schema - and begin development on binding the UI to the model as soon as possible - but it can't do any upgrades to the stored schema of the documents yet.
The application can use an adapter from schema A to schema B to read the document, and an adapter from schema B to schema A to write to the document.
I got lost here. I assumed (and left the comment above to that effect) that bi-directionality was automatically included in writing an adapter.
If that's not the case, then I think the steps/description in the earlier section are incorrect, because the app would need (in addition to having adapters and shipping their usage dark) to also maintain code that can r/w in schema A directly. Only when clients doing the A -> B conversion on open reach 100% saturation (and the app makes the call that there is enough data that it's not going back) can clients remove direct support of A and start using the one-directional adapter.
However, it must be employed preemptively.
The developer needs to include the "Unknown" type ahead of time, anticipating that "there might be some new stuff here someday".
* To mitigate that problem, another option would be to have the "Unknown" type be automatic - or "opt-out" rather than "opt-in".
That is, all applications must handle the "Unknown" type in every field, unless they intentionally make the decision not to.
I doubt it's possible. There could be myriads of places in the code where data is deconstructed and constructed back, losing any unknown fields. The trivial example where this happens is constructors - something like this:

```typescript
class Point {
	constructor(private x: number, private y: number) {}
}
```
If I have a Point serialized in the data model, I may clone the data directly by instantiating a new Point instance, but that would lose a property 'z' that is added to the data model by future versions of the application.
The "unknown content" strategy that you described is used in WXP, but it's very hard to design and use right. It takes a lot of iterations, as developers are very likely to find that whatever they dreamed up is not sufficient the moment they actually have a need to use it. It's a very powerful tool, but not something that magically happens on its own.
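To make that failure mode concrete, a runnable sketch of the round-trip loss (plain TypeScript, not real SharedTree APIs; the field "z" is a stand-in for any field added by a future schema):

```typescript
// This client's Point only knows about x and y; a future schema added "z".
class Point {
  constructor(public x: number, public y: number) {}
}

// Data as persisted by a newer client.
const stored = { x: 1, y: 2, z: 3 };

// A naive clone via the constructor silently drops the unknown field "z".
const clone = new Point(stored.x, stored.y);
// Serializing the clone now yields {"x":1,"y":2} -- "z" is gone on round-trip.
```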

> TODO: This is one of Noah's off-the-cuff ideas. Whether or not filtering makes sense for optional fields (or at all) is debatable.

This strategy could be used in combination with the "Unknown" type - if Unknown is in the field's set of allowed types, then the foreign type will present as an Unknown, otherwise it will be filtered.
I'd say it's similar to the above - it's hard to ensure that unknown content round-trips correctly.
If the app is not testing it (doing end-to-end coverage using some artificial scenarios), then it's not working, whether the runtime provides this support or not. And if they are testing it, they likely know where they need to handle unknown content (it needs to be explicit), so option #1 is sufficient.
The newer schema can simply add the "Unknown" type to the set of allowed types if it doesn't already exist.
Then the removed type, when present in the document, will present as an "Unknown".

This does not require the developer to plan ahead - "Unknown" can be added to the new view schema when the field is first removed, rather than having to be added preemptively in the original schema as is the case when adding an allowed type. |
Does this describe an existing tool, or does it just enumerate "we can build it if you ask"?

In the event that the developer merely wants to rename the _identifier_ of a node, but otherwise keep the node the same, we can support that with an aliasing feature.
Specifically, when defining the node in the view schema it can be supplied with both a "view identifier" (the new name) and a "stored identifier" (the existing name).
The view identifier will show up in the source code when reading/writing the tree, but it will be automatically translated to the legacy stored identifier when reading/writing to the document.
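A sketch of how such aliasing might look (a hypothetical API - the names `viewIdentifier` and `storedIdentifier` are illustrative, not an existing SharedTree feature):

```typescript
// Hypothetical aliasing record attached to a node's view schema.
interface NodeSchemaAlias {
  viewIdentifier: string;   // what application code sees, e.g. "Canine"
  storedIdentifier: string; // what stays persisted, e.g. the legacy "Dog"
}

// Translation happens in both directions at the read/write boundary;
// identifiers without an alias pass through unchanged.
function toStored(alias: NodeSchemaAlias, id: string): string {
  return id === alias.viewIdentifier ? alias.storedIdentifier : id;
}
function toView(alias: NodeSchemaAlias, id: string): string {
  return id === alias.storedIdentifier ? alias.viewIdentifier : id;
}

const alias: NodeSchemaAlias = { viewIdentifier: "Canine", storedIdentifier: "Dog" };
```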
Can this be generalized? I think you are describing any scenario that can be solved by a bi-directional adapter (where we continue to write in the old schema, I assume forever).
One danger to this approach is if an old client attempts to _copy_ a node with a new optional field that is invisible to the client.
Had this copy occurred on a client with the new view schema, the copy would include the optional field, but since the copy occurred on the old client, the client didn't know to also copy the new field.
Depending on the application scenario, this could be interpreted as data loss.
Therefore, tolerating new optional fields in this way should be a feature enabled by policy choice rather than automatically.
Yep, what I've described above (in case my comment above was not clear).
# SharedTree Schema Evolution

This document provides the necessary background for understanding SharedTree's strategy for conforming data to a given user-provided schema. |
An interesting question to ask here: should the title / this intro paragraph be focused solely on schema, or should it be inclusive and cover data migration/transformation? The former includes the latter, but not necessarily the other way around. The example I used below - what if I want to add some prefix to the data in some field? It stays of string type.
3. The developer waits until all (or most) clients have upgraded to this new application version.
The number of clients that have moved to the new version is known as the **client saturation**.
It may be impossible to know for sure when 100% client saturation has been achieved, so a developer may estimate when the client saturation is sufficient - e.g. after 99% of clients online in the last 30 days have updated to the new version.
After client saturation, the stored schema of the documents can now be upgraded to B.
How does it happen? Is it a full rewrite of a document?
What happens with trailing ops?
What happens if two changes of schema crisscross?
This is the most interesting part that deserves a lot of details (maybe not in this document, but that's something that will define success of migration).
Similar questions exist even when using adapters, including bidirectional adapters / flows.
Let's use classical example - adding a new field. What happens if two clients add the same field? In most cases clients would be Ok with FWW semantics here (LWW could result in data loss). I can go into more details on scenarios (from Loop) and reasoning.
It takes time, it requires code bifurcation/duplication, and it's complicated to understand and coordinate.
The next sections will explore ways to reduce the amount of effort a developer must take to change their schema.

### Adapters
I assume adapters do not exist today (i.e. it's a concept for the future).
That said, even if that's the case, it would be nice to see some example pseudo-code.
Some readers would find reading code way more rewarding than reading English :)
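As a starting point, a rough sketch of what a bi-directional adapter might look like (a hypothetical API and hypothetical schema shapes - adapters are a proposed concept, not a shipped feature):

```typescript
// Illustrative stored/view shapes; "name" was renamed to "firstName" in schema B.
interface PersonA { name: string }      // stored schema A
interface PersonB { firstName: string } // view schema B

// A bi-directional adapter converts in both directions at the document boundary.
interface SchemaAdapter<Stored, View> {
  toView(stored: Stored): View;
  toStored(view: View): Stored;
}

const personAdapter: SchemaAdapter<PersonA, PersonB> = {
  toView: (a) => ({ firstName: a.name }),
  toStored: (b) => ({ name: b.firstName }),
};

// On load of an A-schema document, the app only ever sees B-shaped data:
const viewed = personAdapter.toView({ name: "Alice" });
// Writes made against view schema B are translated back before persisting:
const storedBack = personAdapter.toStored({ firstName: "Bob" });
```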
For each of these expected fields...

3. **Fields have a _kind_.**
The kind of a field determines its behavior (e.g. in the case of SharedTree, this might affect how concurrent edits to the field are merged).
I suggest "might affect" -> "determines"
I think it is useful to explicitly state that the merge semantics/behavior is associated with the field kind.
The kind of a field determines its behavior (e.g. in the case of SharedTree, this might affect how concurrent edits to the field are merged).
One critical aspect of a field's kind is its _multiplicity_.
The multiplicity describes how many nodes a field is allowed to hold.
The Intentional tree model defines the following:
This might only be confusing to those who know the original Intentional model (and it seems like you might be redefining that to just mean our internal model), but the Intentional model just has sequences. It might be useful to have a third name, idk.
* **Required fields** must have exactly one node at all times.
* **Optional fields** can have either zero nodes or one node.
* **Sequence fields** can have any number of nodes (including zero).
* **Forbidden fields** must have exactly zero nodes (this is useful in some complicated scenarios but not something a typical user would desire).
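One way to picture the four multiplicities is as a validity check on a field's node count (illustrative TypeScript only, not SharedTree's actual implementation):

```typescript
// Illustrative encoding of field multiplicity.
type Multiplicity = "required" | "optional" | "sequence" | "forbidden";

// How many nodes may a field of each kind hold?
function isValidNodeCount(kind: Multiplicity, count: number): boolean {
  if (kind === "required") return count === 1;                // exactly one
  if (kind === "optional") return count === 0 || count === 1; // zero or one
  if (kind === "sequence") return count >= 0;                 // any number
  return count === 0;                                         // forbidden: none
}
```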
IMO, unless this is relevant (it's not mentioned anywhere else) I suggest either removing this or adding a concrete example of the use. It's not obvious why this is useful, so it's mostly confusing.
A field under the "pet" key of a node of type "Person" might only allow nodes of type "Dog" or "Cat".

> As we'll see later when introducing the existing [SharedTree schema](#modern-sharedtree-schema), it's also useful to allow a "wildcard" field key for SharedTree map nodes.
This means a node is allowed to have any number of optional/sequence fields under any arbitrary keys - keys which may not be known outside of runtime.
Where do we allow a wildcard for sequences?

> In fact, by modeling the tree as having a single, special root _field_ rather than having a root _node_, there is no distinction between adding a new allowed type at the root field vs. any other field in the tree.

1. **Remove an allowed type.**
1. **Change the type of a node.**
The example of a field going from containing dogs to cats is confusing; was this just a side effect of a rename? Or did you change the type of the field (if so, why wasn't that just a case of 1 + 2)?
After this **document saturation** has been achieved, the application can do another deployment, removing view schema A and all of its related code, if it was not able to do so in step 3 (because some of its clients are read-only).
It's important that this happens after document saturation, or else there will be read-only clients that fail to open a document with stored schema A because they only know about view schema B.

> In some environments, there's no guarantee that document saturation will ever occur - and this step is never completed.
I would change "some environments" to "most environments"
Description
This is a PR for gathering feedback on a rough draft of a document detailing some aspects of SharedTree schema evolution.
It will not be checked in.