[dotnet] Tolerate invalid UTF-16 strings in DevTools JSON response #14972
base: trunk
@@ -600,7 +600,19 @@ private void ProcessMessage(string message)
var methodParts = method.Split(new char[] { '.' }, 2);
var eventData = messageObject["params"];

LogTrace("Recieved Event {0}: {1}", method, eventData.ToString());
@nvborisenko You mentioned trying out JavaScriptEncoder.UnsafeRelaxedJsonEscaping. I cannot edit the generated code directly, but I added a stand-in here, and it did not fix the exception.
I modified the generator to use JavaScriptEncoder.UnsafeRelaxedJsonEscaping; it didn't help. The worst part is that it works with Newtonsoft.Json (Selenium v4.23?).
I do not understand how Newtonsoft accepts this nonsense input. It's not real JSON, just raw binary data.
Maybe we should intercept the post data and sanitize it ourselves?
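The sanitizing idea could be sketched as a pre-pass over the raw message text before it reaches System.Text.Json. `LoneSurrogateSanitizer` is a hypothetical helper, not part of Selenium, and it assumes the bad data arrives as escaped lone surrogates such as `\uDB40` (the case narrowed down later in this thread). It replaces each unpaired surrogate escape with `\uFFFD`, so unlike a converter-based fallback it discards the original value instead of preserving it.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: replaces escaped UTF-16 surrogates that are not part of a
// valid high/low pair (e.g. a lone "\uDB40") with "\uFFFD", the replacement character.
public static class LoneSurrogateSanitizer
{
    public static string Sanitize(string json)
    {
        var sb = new StringBuilder(json.Length);
        int i = 0;
        while (i < json.Length)
        {
            char c = json[i];
            if (c != '\\' || i + 1 >= json.Length)
            {
                sb.Append(c);
                i++;
                continue;
            }

            if (json[i + 1] != 'u' || !TryReadEscape(json, i, out char unit))
            {
                // Any other escape such as \" or \\: copy both characters unchanged,
                // so an escaped backslash followed by "uDB40" is left alone.
                sb.Append(c).Append(json[i + 1]);
                i += 2;
                continue;
            }

            if (char.IsHighSurrogate(unit) && TryReadEscape(json, i + 6, out char next) && char.IsLowSurrogate(next))
            {
                // Valid surrogate pair: keep both escapes as they are.
                sb.Append(json, i, 12);
                i += 12;
            }
            else if (char.IsSurrogate(unit))
            {
                // Unpaired high or low surrogate: replace it.
                sb.Append("\\uFFFD");
                i += 6;
            }
            else
            {
                // Ordinary \uXXXX escape.
                sb.Append(json, i, 6);
                i += 6;
            }
        }
        return sb.ToString();
    }

    // Parses a complete "\uXXXX" escape starting at 'start'; returns false otherwise.
    private static bool TryReadEscape(string s, int start, out char unit)
    {
        unit = '\0';
        if (start + 6 > s.Length || s[start] != '\\' || s[start + 1] != 'u')
            return false;
        if (!ushort.TryParse(s.AsSpan(start + 2, 4), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out ushort value))
            return false;
        unit = (char)value;
        return true;
    }
}
```

For example, `{"postData":"MZ\uDB40x"}` becomes `{"postData":"MZ\uFFFDx"}`, while a valid pair such as `\uDB40\uDC49` passes through untouched.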
One more observation: in the JSON response I see the binary data as is, probably not even encoded! This is why VS Code displays the JSON improperly. The CDP specification doesn't say it should be encoded (base64). I think this is why they deprecated it.
PS: But why could Newtonsoft.Json parse it successfully? I think it is an issue in that library. I haven't checked yet, but I guess that:
1. using an old Selenium version (v4.23?), we upload a binary file,
2. we download it back manually,
3. the contents of the "original" and "downloaded" files differ.
I'm not good at this kind of activity. It is easier to:
- Does Newtonsoft.Json really parse it correctly?
I narrowed down the issue. It looks like the character U+DB40 appears here for some reason. It is the first half (a high surrogate) of a UTF-16 surrogate pair, but the second half is missing.
I opened the raw Firefox Installer.exe file and found this segment:
Note how in this case it is just a regular I.
What is happening here? CDP is giving us back broken JSON; this may be a bug on their end.
I will investigate whether it is possible to work around this somehow.
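The failure can be checked in isolation with a small sketch. `LoneSurrogateCheck` is a hypothetical helper, and it assumes the lone surrogate arrives as a JSON escape such as `\uDB40`. As far as I know, `Utf8JsonReader.Read()` only validates that a `\u` escape has four hex digits, so the token is accepted, and the error surfaces later when `GetString()` tries to unescape it into UTF-16 and throws `InvalidOperationException`.

```csharp
#nullable enable
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper: reports whether GetString() throws for the first JSON string value.
public static class LoneSurrogateCheck
{
    public static bool GetStringThrows(string json)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        while (reader.Read())
        {
            // Skip property names, braces, etc.; only inspect string values.
            if (reader.TokenType != JsonTokenType.String)
                continue;
            try
            {
                reader.GetString();
                return false;
            }
            catch (InvalidOperationException)
            {
                // The escaped value cannot be turned into valid UTF-16 (e.g. an unpaired surrogate).
                return true;
            }
        }
        return false;
    }
}
```

With a lone `\uDB40` the check reports a failure; with the complete pair `\uDB40\uDC49` it does not.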
I found a workaround using a custom converter.
We can add it to the postData property:
as well as to PostDataEntry.Bytes:
The converter:
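using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

internal sealed class InvalidUtf16Converter : JsonConverter<string>
{
    // Also invoke Read for JSON null tokens; GetString() returns null for them.
    public override bool HandleNull => true;

    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        try
        {
            return reader.GetString();
        }
        catch (InvalidOperationException)
        {
            // The value could not be unescaped into valid UTF-16:
            // fall back to the raw (still escaped) bytes, one char per byte.
            var bytes = reader.ValueSpan;
            var sb = new StringBuilder(bytes.Length);
            foreach (var b in bytes)
                sb.Append(Convert.ToChar(b));
            return sb.ToString();
        }
    }

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options) =>
        writer.WriteStringValue(value);
}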
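For illustration, this is how such a converter could be attached to a property with `[JsonConverter]`. `RequestSketch` is a hypothetical stand-in for the generated `Network.Request` model (the attribute lines from the thread itself are not reproduced here), and the converter is repeated so the block compiles on its own.

```csharp
#nullable enable
using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// Same fallback converter as above, repeated so this sketch is self-contained.
internal sealed class InvalidUtf16Converter : JsonConverter<string>
{
    public override bool HandleNull => true;

    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        try
        {
            return reader.GetString();
        }
        catch (InvalidOperationException)
        {
            // Raw, still-escaped bytes of the token, one char per byte.
            var bytes = reader.ValueSpan;
            var sb = new StringBuilder(bytes.Length);
            foreach (var b in bytes)
                sb.Append(Convert.ToChar(b));
            return sb.ToString();
        }
    }

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options) =>
        writer.WriteStringValue(value);
}

// Hypothetical model standing in for a generated CDP type with a "postData" string.
public sealed class RequestSketch
{
    [JsonPropertyName("postData")]
    [JsonConverter(typeof(InvalidUtf16Converter))]
    public string? PostData { get; set; }
}
```

Deserializing `{"postData":"MZ\uDB40x"}` into `RequestSketch` then yields the literal text `MZ\uDB40x` (the escape sequence as six characters) instead of throwing. One caveat of this design: `GetString()` also throws `InvalidOperationException` for non-string tokens such as numbers, so the fallback would silently return that token's raw text too.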
Since we have a good idea of which models may have arbitrary data,
However, it only works with JsonElement.Deserialize<T>. When using JsonNode.Deserialize<T>, the exception happens before the converter is even reached. Even JsonNode.ToString() throws an exception!
For this reason, we may need to change various JsonNodes to JsonElements, such as DevToolsEventReceivedEventArgs.EventData (that property is used everywhere in the CDP-generated code, but luckily the line e.EventData.Deserialize(eventData.EventArgsType) works exactly the same; it just hits a different overload), and also DevToolsCommandData.Result.
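The "different overload" point can be illustrated with a sketch that deserializes the same `params` payload once from a `JsonNode` and once from a `JsonElement`, through the `Type`-based extension overloads on `JsonSerializer`. `RequestWillBeSentSketch` and `OverloadSketch` are hypothetical stand-ins for generated event types; the data here is valid, so this only shows that the call site stays the same and does not reproduce the invalid UTF-16 behavior.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical stand-in for a CDP-generated event args type.
public sealed class RequestWillBeSentSketch
{
    public string? RequestId { get; set; }
    public double Timestamp { get; set; }
}

public static class OverloadSketch
{
    public static (object? FromNode, object? FromElement) DeserializeBoth(string paramsJson, Type eventArgsType)
    {
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

        // JsonSerializer.Deserialize(this JsonNode?, Type, JsonSerializerOptions?)
        JsonNode? node = JsonNode.Parse(paramsJson);
        object? fromNode = node.Deserialize(eventArgsType, options);

        // JsonSerializer.Deserialize(this JsonElement, Type, JsonSerializerOptions?)
        using JsonDocument doc = JsonDocument.Parse(paramsJson);
        object? fromElement = doc.RootElement.Deserialize(eventArgsType, options);

        return (fromNode, fromElement);
    }
}
```

One design point if EventData becomes a JsonElement: an element is only valid while its owning JsonDocument is alive, so an element stored beyond that scope needs `JsonElement.Clone()`.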
I am not familiar enough with the CDP code generation process to make generator changes. Are there any docs for this?
Most likely you are interested in https://github.com/SeleniumHQ/selenium/tree/trunk/third_party/dotnet/devtools/src/generator/Templates
There are no docs. Generally speaking, the build process generates the CDP classes from *.cdl files (I don't remember exactly; they are the contract definitions) by applying *.hbs templates (https://handlebarsjs.com/).
Repro test:
[Test]
public void AAA()
{
    driver.Manage().Network.StartMonitoring().GetAwaiter().GetResult(); // this enables serialization
    driver.Navigate().GoToUrl("https://tus.io/demo");
    Thread.Sleep(10_000);
    driver.FindElement(By.Id("P0-0")).SendKeys("C:\\Users\\Nick\\Downloads\\Firefox Installer.exe");
    Thread.Sleep(60_000);
}
Passes both on Firefox and Chrome.
third_party/dotnet/devtools/src/generator/Templates/DevToolsSessionDomains.hbs
…olsSessionDomains`
…el/selenium into devtools-file-upload
Co-authored-by: Nikolay Borisenko <[email protected]>
}
catch (InvalidOperationException)
{
// Backwards compatibility with Newtonsoft tolerating invalid UTF-16 sequences
Let's improve this comment:
- don't mention Newtonsoft
- don't use the word "sometimes"; we should always know what is happening
Removed the mention of Newtonsoft.
I would also like to know why this is happening! I don't know what is causing it.
Fall back to reading the value as bytes instead of a string.
The System.Text.Json library throws an exception when the CDP remote end sends a non-encoded string as binary data.
Using JavaScriptEncoder.UnsafeRelaxedJsonEscaping doesn't help because the string is actually a byte[].
https://chromedevtools.github.io/devtools-protocol/tot/Network/#type-Request - here the "postData" property
is a string, which we cannot deserialize properly. This property is marked as deprecated, and the new "postDataEntries"
property is suggested instead, where the data is most likely base64 encoded.
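If consumers move to "postDataEntries", reading the body could look like the sketch below. It assumes each entry carries a base64-encoded "bytes" field, which the comment above only states as likely; `PostDataEntriesSketch` is a hypothetical helper that works on the raw request JSON rather than on Selenium's generated types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: concatenates the decoded "bytes" of every postDataEntries item.
public static class PostDataEntriesSketch
{
    public static byte[] ReadBody(string requestJson)
    {
        using JsonDocument doc = JsonDocument.Parse(requestJson);
        var body = new List<byte>();
        if (doc.RootElement.TryGetProperty("postDataEntries", out JsonElement entries))
        {
            foreach (JsonElement entry in entries.EnumerateArray())
            {
                if (entry.TryGetProperty("bytes", out JsonElement bytes))
                {
                    // Assumes base64: GetBytesFromBase64 decodes a base64 JSON string value.
                    body.AddRange(bytes.GetBytesFromBase64());
                }
            }
        }
        return body.ToArray();
    }
}
```

Because base64 text is plain ASCII, this path never has to unescape arbitrary binary data as UTF-16, which is what makes "postData" problematic.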
Since this solution is so targeted, do you think we should try to target Network.Request.postData specifically here?
No, we cannot control the source for the generator; it is provided by Google.
In this comment we should record everything we know about why this "strange" workaround exists.
I don't think so; it would add complexity and lead to even more "strange workarounds". As soon as Google removes the deprecated property, we can "easily" remove this workaround.
But on the other hand, why not be specific? If we generate the attribute conditionally, will it still be friendly to the upcoming AOT support?
No, adding complexity to something that will be deprecated is not a good idea.
- Chromium removes the "bad property" (postData) - we "remove" the workaround
- CDP is supposed to be replaced by BiDi
Speaking of removing complexity, wouldn't it be cool to convert the handlebars generator into a modern incremental source generator? That sounds like a fun little project.
Comment added.
My bad for forgetting the file header. Fixed.
User description
Description
This restores the Newtonsoft.Json behavior of tolerating invalid UTF-16 strings.
Motivation and Context
Fixes #14903