-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a fast path for ASCII strings #620
Conversation
This optimization is based on a few assumptions: - Most strings are ASCII only. - Most strings had their coderange scanned already. If the above is true, then by checking the string coderange, we can use a much more streamlined function to encode ASCII strings. Before: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 140.000 i/100ms oj 230.000 i/100ms rapidjson 108.000 i/100ms Calculating ------------------------------------- json 1.464k (± 1.4%) i/s (682.83 μs/i) - 7.420k in 5.067573s oj 2.338k (± 1.5%) i/s (427.64 μs/i) - 11.730k in 5.017336s rapidjson 1.075k (± 1.6%) i/s (930.40 μs/i) - 5.400k in 5.025469s Comparison: json: 1464.5 i/s oj: 2338.4 i/s - 1.60x faster rapidjson: 1074.8 i/s - 1.36x slower ``` After: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 189.000 i/100ms oj 228.000 i/100ms rapidjson 108.000 i/100ms Calculating ------------------------------------- json 1.903k (± 1.2%) i/s (525.55 μs/i) - 9.639k in 5.066521s oj 2.306k (± 1.3%) i/s (433.71 μs/i) - 11.628k in 5.044096s rapidjson 1.069k (± 2.4%) i/s (935.38 μs/i) - 5.400k in 5.053794s Comparison: json: 1902.8 i/s oj: 2305.7 i/s - 1.21x faster rapidjson: 1069.1 i/s - 1.78x slower ```
break; | ||
case ENC_CODERANGE_VALID: | ||
if (RB_UNLIKELY(state->ascii_only)) { | ||
convert_UTF8_to_ASCII_only_JSON(buffer, obj, state->script_safe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reflection, the split that might have the more impact is script_safe
, as it would also speedup the ASCII path.
I pushed an extra commit that re-use the escape table from a81ec47
For now it's only applied to the ASCII fast path, but I think we could optimize UTF8 too with the assumption that most bytes in an UTF8 string are ASCII. But I'd rather not complexify this PR more. |
This performs noticeably better than the boolean logic. Before: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 189.000 i/100ms oj 228.000 i/100ms rapidjson 108.000 i/100ms Calculating ------------------------------------- json 1.903k (± 1.2%) i/s (525.55 μs/i) - 9.639k in 5.066521s oj 2.306k (± 1.3%) i/s (433.71 μs/i) - 11.628k in 5.044096s rapidjson 1.069k (± 2.4%) i/s (935.38 μs/i) - 5.400k in 5.053794s Comparison: json: 1902.8 i/s oj: 2305.7 i/s - 1.21x faster rapidjson: 1069.1 i/s - 1.78x slower ``` After: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 224.000 i/100ms oj 230.000 i/100ms rapidjson 107.000 i/100ms Calculating ------------------------------------- json 2.254k (± 1.6%) i/s (443.69 μs/i) - 11.424k in 5.069999s oj 2.318k (± 1.4%) i/s (431.32 μs/i) - 11.730k in 5.060421s rapidjson 1.081k (± 1.9%) i/s (925.05 μs/i) - 5.457k in 5.049738s Comparison: json: 2253.8 i/s oj: 2318.5 i/s - same-ish: difference falls within error rapidjson: 1081.0 i/s - 2.08x slower ``` The escape table is taken directly from Mame's PR. Co-Authored-By: Yusuke Endoh <[email protected]>
9a00965
to
e2a2a33
Compare
} | ||
|
||
beg = pos + 1; | ||
switch (ch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for elaborating on this change in your blog post ❤️
I was wondering if we could entirely eliminate this switch
at an expense of static memory by storing escaped values instead of booleans in the lookup table. Just as an example, in ruby:
JSON_ESCAPE_TABLE['"'] = "\\\"
JSON_ESCAPE_TABLE['\\'] = "\\\\"
JSON_ESCAPE_TABLE['\b'] = "\\b"
# ...
# default: case can be pre-calculated as well for other characters
It may not result in a visible improvement in realistic benchmarks due to having not that many chars to escape but could be visible on a micro-benchmark specifically crafted to measure escaping performance at no cost (except static memory) for the main flow.
I could be missing something that makes it hard to store non-boolean values in escape_table
but would love to at least learn about this as well. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be worth trying, but there's a couple things making this not an obvious win.
Escaped values in this case aren't scalar. You can either store them as a table of char *
:
static const char *[256] escape_table = {
0, 0, 0, "\\n", // ....
};
If you do this:
char *
is8B
on 64-bit platform, whereaschar
is just1B
, so we went from256B
to2kiB
, not nothing.char *
doesn't store the length, so presumably to append these you'd need to callstrlen
, they are shot string so probably not very slow, but given we're just trying to eliminate a case, it may actually be slower.char *
isn't stored contiguiously, so that's another memory region to keep in CPU cache.
Alternatively, you could store the escaped strings inline, e.g.:
struct escaped_char {
unsigned char len;
char bytes[8];
}
static const struct escaped_char[256] escape_table = {
{0, 0}, {0, 0}, {2, "\\n"}, // ....
}
But this would make the escape table even bigger.
Maybe you could only do this for strings that are known to be pure-ASCII so the table only has 128
entries, keeping the memory usage down, but then you can't really re-use it for UTF-8.
Note that all of these are just "static assumption", performance isn't always easy to predict, so maybe your suggestion would actually help.
Did this get applied to the Java parser anywhere? I can take it if not but I don't want to dive in and waste my time if it's already there. |
As mentioned previously, I don't know enough about Java to do much to it, and I don't really have any incentive either. If you want to speedup the Java version feel free to do it. |
@byroot Ok thanks for clarifying. You've done so many PRs lately I lost track! I'll do the port and push a PR. The code in the Java extension will be pretty much the same as what you wrote for the C version. |
Good news: much of the structure of the Java code already mimics the C code, and a profile of the twitter.json file dumping shows most of the overhead is in two places: handling each character (because it's using heavier UTF-8 logic always) and picking the right logic for each object type (bad lookup of handler logic based on the Ruby class, most likely). Ought to be able to speed this up nicely. |
Work is proceeding on this and other optimizations for the JRuby extension in #725. |
Fix: #562
This optimization is based on a few assumptions:
If the above is true, then by checking the string coderange, we can use a much more streamlined function to encode ASCII strings.
Before:
After: