Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A way forward with the JsonProvider #1478

Open
mlaily opened this issue Mar 5, 2023 · 17 comments
Open

A way forward with the JsonProvider #1478

mlaily opened this issue Mar 5, 2023 · 17 comments

Comments

@mlaily
Copy link
Contributor

mlaily commented Mar 5, 2023

Hello!

The more I use the JsonProvider and the more I work inside its code base, the more I feel it needs a big update.

I think some kind of "strict" mode with less (surprising/bad) magic is much needed.

(For instance, inferring empty strings as null made sense with other providers that don't have types other than strings, but I don't think it makes much sense with json. In the same vein, multiplicity inference for arrays doesn't feel natural at all with json...)

I also think it might be a good idea to use System.Text.Json types as the erased implementation instead of the custom parser it currently uses.

This is easier said than done of course, but first this raises a few questions.

What about backward compatibility?!
What would be the preferred approach for such a venture?

I'm thinking it might be best to create a new provider instead of trying to fit these changes into the existing one.
What's your opinion about that?

And generally speaking, do you have any recommendations regarding this?

Thanks.

@Numpsy
Copy link

Numpsy commented Mar 24, 2023

I'm thinking it might be best to create a new provider instead of trying to fit these changes into the existing one.
What's your opinion about that?

fwiw, I've seen comments elsewhere about breaking the type providers for different data types into different packages so maybe that's a place to start for that, otherwise just creating something new might help to avoid any legacy stuff altogether?

@mlaily
Copy link
Contributor Author

mlaily commented Mar 29, 2023

I went ahead and started a JsonProvider2 implementation.
https://github.com/fsprojects/FSharp.Data/compare/main...mlaily:FSharp.Data:jsonprovider2?expand=1

Here are some progress screenshots and side-by-side comparisons with the original JsonProvider...

An array is an array. It does not depend on the number of items it contains:
image

An empty string is still a string, not an optional value:
image

0, 1, or "yes" are not booleans:
image

@goswinr
Copy link

goswinr commented May 8, 2023

@mlaily I love this work! Is your JsonProvider2 working ? I guess these changes are not merging here soon. Could you publish your own nuget?

@mlaily
Copy link
Contributor Author

mlaily commented May 8, 2023

@goswinr thanks for your interest.

This new json provider is still a prototype: there are a lot of things that are still changing or are not tested properly and can crash.

My branch is available but it's way too early for a release.

Handling null or missing values properly is proving challenging, but I'm learning a lot and making progress.

Ideally, I would also like to use System.Text.Json, but for now all my changes are based on the existing JsonValue parser.

I'll worry about how to release it when it works well enough for production use :)

@dlidstrom
Copy link

I just learnt about empty strings being inferred as null. Using this file as a sample:

{
  "Lines": [
    "             **",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "              *",
    "              *",
    "    *****     *",
    "   *******     ",
    "    *****     *",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "               "
  ]
}

Now reading this file using the provider results in an array for Lines being:

[|"             **"; "              *"; "              *"; "              *";
  "    *****     *"; "   *******     "; "    *****     *"; "              *"|]

Very surprising and interesting to me that I now find this newly created issue. Are the nuances of the JsonProvider documented somewhere? For example, I don't find any mention of null in the reference documentation.

@goswinr
Copy link

goswinr commented Sep 11, 2023

@dlidstrom does inferenceMode = InferenceMode.NoInference help? in type MyJson = JsonProvider<myFilePath, InferenceMode = InferenceMode.NoInference>

@dlidstrom
Copy link

@goswinr It doesn't seem to help. Here is my test program and files:

#r "nuget: FSharp.Data"

open FSharp.Data

type Provider = JsonProvider<"sample.json", InferenceMode = InferenceMode.NoInference>

let sample = Provider.Load "test.json"

// show the Lines from json value and from generated type
printfn
  "%A\n%A"
  (sample.JsonValue.Properties() |> Array.find (fst >> ((=) "Lines")) |> snd)
  (sample.Lines)

sample.json:

{
  "Lines": [
    ""
  ]
}

test.json:

{
  "Lines": [
    "             **",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "              *",
    "              *",
    "    *****     *",
    "   *******     ",
    "    *****     *",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "               "
  ]
}

It looks like the provider reads all lines, as can be seen when inspecting the JsonValue, but "provides" only a subset.
Output:

PS C:\> dotnet fsi .\Scratch.fsx
[
  "             **",
  "              *",
  "               ",
  "               ",
  "               ",
  "               ",
  "              *",
  "              *",
  "    *****     *",
  "   *******     ",
  "    *****     *",
  "              *",
  "               ",
  "               ",
  "               ",
  "               ",
  "               "
]
[|"             **"; "              *"; "              *"; "              *";
  "    *****     *"; "   *******     "; "    *****     *"; "              *"|]

@Thorium
Copy link
Member

Thorium commented Nov 15, 2023

Yes it seems MS puts quite an effort for System.Text.Json (also Newtonsoft would probably be faster than custom implementation).

System.Text.Json source code is open (https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Json/src/System/Text/Json) and one of the maintainers @eiriktsarpalis has been active in F# community. I don't know if there would be any quick wins without rewriting a lot, and do we even want dependencies like System.Memory to FSharp.Data (to use Spans).

@Thorium
Copy link
Member

Thorium commented Nov 16, 2023

His advice was:

just use Utf8JsonReader under wraps. That way make sure it's compliant and also take advantage of all the vectorization code it's using under the hood.

This makes sense:
The current FSharp.Data parser JsonValue (see: https://github.com/fsprojects/FSharp.Data/blob/main/src/FSharp.Data.Json.Core/JsonValue.fs#L39)
would actually map the System.Text.Json Utf8JsonReader values (just testing here: https://gist.github.com/Thorium/09a67ce08adee4fcc02ec7f0048e6962).
So the parsing could be replaced without re-building the whole domain model.

And now that FSharp.Data is split to multiple NuGet packages, maybe a dependency of System.Text.Json to FSharp.Data.Json.Core would be acceptable?

@mlaily
Copy link
Contributor Author

mlaily commented Nov 17, 2023

I haven't worked on it for a while but I actually have started to use System.Text.Json on my branch.

I opted to use the JsonNode api instead of the JsonDocument one. https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/use-dom

It seemed the most fitting for a balance between (user)-convenience and performance for a general use library.
(Keep in mind we want to be able to serialize json, not only deserialize it, and IMO an F# immutable wrapper over a mutable DOM is perfect. It's also very close to what is done with the XmlProvider and I think the two of them having a similar feature set is a good idea.)


The parsing somewhat works (not for all cases yet I think), but serialization is currently non-existent.

One big problem I had that I decided to ignore for now is that I didn't manage to make the dependency on System.Text.Json work everywhere when using the compiled provider.
In my tests it works with Visual Studio Code without issue, but I can't figure out how to make it work with Visual Studio! (IIRC a project using the compiled provider referencing STJ fails to compile, complaining that it doesn't find STJ or something similar)

Another thing is my JsonWrapper object backing every provided type after erasure is currently a struct. My reasoning is that this JsonWrapper is only there because there are members on JsonNode I don't want to expose, and if I could I would use JsonNode directly as the backing type for provided types, so I want to make it have the smallest footprint possible. I'm not 100% sure it's the right call...

Anyway if you are curious I pushed my current version to my branch (still in a somewhat PoC-state):

(side note: the JsonNode design makes it hard to use properly, e.g. because it uses null to represent json null tokens, and plenty of other strange decisions, but it still seems to me like the most fitting api to use for our case)

@Thorium
Copy link
Member

Thorium commented Nov 17, 2023

I'm trying to get this working. So far:

  • I updated System.Text.Json to version 8.0.0. There are more serialization options which may be helpful.
  • I merged main to this branch. No conflicts, but now the build wanted me to run some code formatting tool: dotnet fake build -t Format
  • "build.cmd" failed as I didn't have .NET 6.0.5. That happens in main branch too, which is Ironic as this provider is using .NET5. "dotnet build" works but I need to use "build.cmd -t pack" to generate packet. So I copied the latest C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Ref\6.0.25 to required C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Ref\6.0.5 and now I can run "build.cmd -t pack"
  • Packed. Then I unzipped \bin\FSharp.Data.6.3.0.nupkg and copied over my files in my Nuget cache at C:\Users\(myuser)\.nuget\packages\fsharp.data\6.3.0.
  • Then, to avoid missing STJ complaint, I copied netstandard2.0 dll versions of System.Text.Json.dll and Microsoft.Bcl.AsyncInterfaces.dll from my nuget-cache to a temp-folder and then to nuget-cache both folders FSharp.Data\6.3.0\typeproviders\fsharp41\netstandard2.0 and FSharp.Data\6.3.0\lib\netstandard2.0 (The temp is needed because NuGet-restores will remove them easily.) If those are needed in the package, there could be added some post-build deployment task.

Now I'm this far:

Error	FS3021	Unexpected exception from provided type 'FSharp.Data.JsonProvider2,Sample="{\"x\": 1}"' 
member 'GetMethods': The type provider 'ProviderImplementation.JsonProvider2' reported an error: 
The design-time type 'System.Text.Json.Nodes.JsonNode' utilized by a type provider was not found in the target reference assembly 
set '[tgt assembly FSharp.Core, Version=8.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a;
 ---<cut list of assemblies...but not System.Text.Json here>---
]'. You may be referencing a profile which contains fewer types than those needed by the type provider you are using.

@mlaily
Copy link
Contributor Author

mlaily commented Nov 17, 2023

I forgot to mention another interesting thing about this: if you dotnet build a test project referencing FSharp.Data containing my version based on STJ, it works, and then VS is able to build it too! (at least until you clean your bin output folder...)

@Thorium
Copy link
Member

Thorium commented Nov 17, 2023

As far as I know:

System.Text.Json is released on multiple frameworks:
net462 / netstandard2.0 / net6.0 / net7.0 / net8.0

TypeProvider uses the runtime-code on compile-time.
Different IDEs may use different versions (e.g. VS2022 may be running on net462, meanwhile VSCode can be running on net6 or whatever).

After compilation is the easy part, because if the customer referenced the System.Text.Json to their project (or other dependencies did it), then it probably picks the correct runtime-version anyway.

@Thorium
Copy link
Member

Thorium commented Nov 21, 2023

I tried to play with this version but I think it goes very complex to try to expose the System.Text.Json types on design-time.

I think what I would rather prefer, is just to use the current "obsolete" JSON-model mapping and add 2 methods:

  • For input, current .Parse, to have sister method like ".ParseWithSystemTextJson"
  • For output, current .JsonValue to have sister method like ".JsonValueWithSystemTextJson" (or JsonSaveOptions overload but that's messy).

And so, not to expose System.Text.Json types to outside at all. Design-time would work as it is, no System.Text.Json, and runtime user would have option to use faster serialization if needed. Keep backward compatibility. Just map current Json types to Utf8JsonReader types with simple match clause. And, actually, if even more independency wanted, the System.Text.Json.dll could be loaded with reflection, so that if it's missing, the methods would just throw an error, that way there would not be dependency needed to Nuget-packet level at all to the actual library.

@mlaily
Copy link
Contributor Author

mlaily commented Nov 21, 2023

I think exposing the underlying JsonNode is a desirable feature. The providers are not perfect, and it's very valuable as a user to be able to get full control and bypass the provider if you need it, and if the provider is based on STJ I would expect to get control of the STJ primitives, and not just some opaque wrapper. (Same as the XmlProvider)

I don't like the idea of choosing an implementation based on a method name but without actually seeing anything else change.
Also, throwing at runtime if STJ is missing seems worse than failing at build time...

I agree adding STJ as a dependency of FSharp.Data isn't that great, but I don't see a good path around that...

@Thorium
Copy link
Member

Thorium commented Nov 22, 2023

hmm, maybe the ideas are both doable..

Maybe the "faster serializer" I was thinking (not as good as raw STJ as you pointed out) could be a separate nuget package depending both fsharp.data and system.text.json. That way current old provider can stay as it is, and old projects could benefit with minimal migration.

@mlaily
Copy link
Contributor Author

mlaily commented Nov 22, 2023

That would probably be ideal if we could make the providers independent yeah.

Even though I'm aware some work has already been done to go in that direction, I'm not sure how easy it will be though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants