Add a fast path for ASCII strings #620

casperisfine · 2024-10-17T14:01:44Z

This optimization is based on a few assumptions:

Most strings are ASCII only.
Most strings had their coderange scanned already.

If the above is true, then by checking the string coderange, we can use a much more streamlined function to encode ASCII strings.

Before:

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   140.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.464k (± 1.4%) i/s  (682.83 μs/i) -      7.420k in   5.067573s
                  oj      2.338k (± 1.5%) i/s  (427.64 μs/i) -     11.730k in   5.017336s
           rapidjson      1.075k (± 1.6%) i/s  (930.40 μs/i) -      5.400k in   5.025469s

Comparison:
                json:     1464.5 i/s
                  oj:     2338.4 i/s - 1.60x  faster
           rapidjson:     1074.8 i/s - 1.36x  slower

After:

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   189.000 i/100ms
                  oj   228.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.903k (± 1.2%) i/s  (525.55 μs/i) -      9.639k in   5.066521s
                  oj      2.306k (± 1.3%) i/s  (433.71 μs/i) -     11.628k in   5.044096s
           rapidjson      1.069k (± 2.4%) i/s  (935.38 μs/i) -      5.400k in   5.053794s

Comparison:
                json:     1902.8 i/s
                  oj:     2305.7 i/s - 1.21x  faster
           rapidjson:     1069.1 i/s - 1.78x  slower

This optimization is based on a few assumptions: - Most strings are ASCII only. - Most strings had their coderange scanned already. If the above is true, then by checking the string coderange, we can use a much more streamlined function to encode ASCII strings. Before: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 140.000 i/100ms oj 230.000 i/100ms rapidjson 108.000 i/100ms Calculating ------------------------------------- json 1.464k (± 1.4%) i/s (682.83 μs/i) - 7.420k in 5.067573s oj 2.338k (± 1.5%) i/s (427.64 μs/i) - 11.730k in 5.017336s rapidjson 1.075k (± 1.6%) i/s (930.40 μs/i) - 5.400k in 5.025469s Comparison: json: 1464.5 i/s oj: 2338.4 i/s - 1.60x faster rapidjson: 1074.8 i/s - 1.36x slower ``` After: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 189.000 i/100ms oj 228.000 i/100ms rapidjson 108.000 i/100ms Calculating ------------------------------------- json 1.903k (± 1.2%) i/s (525.55 μs/i) - 9.639k in 5.066521s oj 2.306k (± 1.3%) i/s (433.71 μs/i) - 11.628k in 5.044096s rapidjson 1.069k (± 2.4%) i/s (935.38 μs/i) - 5.400k in 5.053794s Comparison: json: 1902.8 i/s oj: 2305.7 i/s - 1.21x faster rapidjson: 1069.1 i/s - 1.78x slower ```

casperisfine · 2024-10-17T14:02:26Z

ext/json/ext/generator/generator.c

+            break;
+        case ENC_CODERANGE_VALID:
+            if (RB_UNLIKELY(state->ascii_only)) {
+                convert_UTF8_to_ASCII_only_JSON(buffer, obj, state->script_safe);


After reflection, the split that might have the more impact is script_safe, as it would also speedup the ASCII path.

casperisfine · 2024-10-17T15:12:46Z

I pushed an extra commit that re-use the escape table from a81ec47

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   224.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   107.000 i/100ms
Calculating -------------------------------------
                json      2.254k (± 1.6%) i/s  (443.69 μs/i) -     11.424k in   5.069999s
                  oj      2.318k (± 1.4%) i/s  (431.32 μs/i) -     11.730k in   5.060421s
           rapidjson      1.081k (± 1.9%) i/s  (925.05 μs/i) -      5.457k in   5.049738s

Comparison:
                json:     2253.8 i/s
                  oj:     2318.5 i/s - same-ish: difference falls within error
           rapidjson:     1081.0 i/s - 2.08x  slower

For now it's only applied to the ASCII fast path, but I think we could optimize UTF8 too with the assumption that most bytes in an UTF8 string are ASCII. But I'd rather not complexify this PR more.

This performs noticeably better than the boolean logic. Before: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 189.000 i/100ms oj 228.000 i/100ms rapidjson 108.000 i/100ms Calculating ------------------------------------- json 1.903k (± 1.2%) i/s (525.55 μs/i) - 9.639k in 5.066521s oj 2.306k (± 1.3%) i/s (433.71 μs/i) - 11.628k in 5.044096s rapidjson 1.069k (± 2.4%) i/s (935.38 μs/i) - 5.400k in 5.053794s Comparison: json: 1902.8 i/s oj: 2305.7 i/s - 1.21x faster rapidjson: 1069.1 i/s - 1.78x slower ``` After: ``` == Encoding twitter.json (466906 bytes) ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23] Warming up -------------------------------------- json 224.000 i/100ms oj 230.000 i/100ms rapidjson 107.000 i/100ms Calculating ------------------------------------- json 2.254k (± 1.6%) i/s (443.69 μs/i) - 11.424k in 5.069999s oj 2.318k (± 1.4%) i/s (431.32 μs/i) - 11.730k in 5.060421s rapidjson 1.081k (± 1.9%) i/s (925.05 μs/i) - 5.457k in 5.049738s Comparison: json: 2253.8 i/s oj: 2318.5 i/s - same-ish: difference falls within error rapidjson: 1081.0 i/s - 2.08x slower ``` The escape table is taken directly from Mame's PR. Co-Authored-By: Yusuke Endoh <[email protected]>

nvasilevski · 2025-01-06T15:13:51Z

ext/json/ext/generator/generator.c

+            }
+
+            beg = pos + 1;
+            switch (ch) {


Thanks for elaborating on this change in your blog post ❤️

I was wondering if we could entirely eliminate this switch at an expense of static memory by storing escaped values instead of booleans in the lookup table. Just as an example, in ruby:

JSON_ESCAPE_TABLE['"'] = "\\\" JSON_ESCAPE_TABLE['\\'] = "\\\\" JSON_ESCAPE_TABLE['\b'] = "\\b" # ... # default: case can be pre-calculated as well for other characters

It may not result in a visible improvement in realistic benchmarks due to having not that many chars to escape but could be visible on a micro-benchmark specifically crafted to measure escaping performance at no cost (except static memory) for the main flow.

I could be missing something that makes it hard to store non-boolean values in escape_table but would love to at least learn about this as well. Thanks!

It could be worth trying, but there's a couple things making this not an obvious win.

Escaped values in this case aren't scalar. You can either store them as a table of char *:

static const char *[256] escape_table = { 0, 0, 0, "\\n", // .... };

If you do this:

char * is 8B on 64-bit platform, whereas char is just 1B, so we went from 256B to 2kiB, not nothing.

char * doesn't store the length, so presumably to append these you'd need to call strlen, they are shot string so probably not very slow, but given we're just trying to eliminate a case, it may actually be slower.

char * isn't stored contiguiously, so that's another memory region to keep in CPU cache.

Alternatively, you could store the escaped strings inline, e.g.:

struct escaped_char { unsigned char len; char bytes[8]; } static const struct escaped_char[256] escape_table = { {0, 0}, {0, 0}, {2, "\\n"}, // .... }

But this would make the escape table even bigger.

Maybe you could only do this for strings that are known to be pure-ASCII so the table only has 128 entries, keeping the memory usage down, but then you can't really re-use it for UTF-8.

Note that all of these are just "static assumption", performance isn't always easy to predict, so maybe your suggestion would actually help.

headius · 2025-01-08T18:21:24Z

Did this get applied to the Java parser anywhere? I can take it if not but I don't want to dive in and waste my time if it's already there.

byroot · 2025-01-08T20:02:38Z

As mentioned previously, I don't know enough about Java to do much to it, and I don't really have any incentive either.

If you want to speedup the Java version feel free to do it.

headius · 2025-01-08T21:53:46Z

@byroot Ok thanks for clarifying. You've done so many PRs lately I lost track!

I'll do the port and push a PR. The code in the Java extension will be pretty much the same as what you wrote for the C version.

headius · 2025-01-08T22:55:00Z

Good news: much of the structure of the Java code already mimics the C code, and a profile of the twitter.json file dumping shows most of the overhead is in two places: handling each character (because it's using heavier UTF-8 logic always) and picking the right logic for each object type (bad lookup of handler logic based on the Ruby class, most likely).

Ought to be able to speed this up nicely.

@byroot

This is new specialized logic to reduce overhead when appending ASCII-only strings to the generated JSON. Original code by @byroot See ruby#620

@byroot

Also includes updated logic for generate (generate_json_string) based on current C code. Original code by @byroot See ruby#620

headius · 2025-01-09T08:46:28Z

Work is proceeding on this and other optimizations for the JRuby extension in #725.

casperisfine commented Oct 17, 2024

View reviewed changes

casperisfine force-pushed the optimize-ascii-strings branch from 9a00965 to e2a2a33 Compare October 17, 2024 15:15

byroot merged commit 7106b03 into ruby:master Oct 17, 2024
73 checks passed

byroot mentioned this pull request Oct 17, 2024

Make JSON.generate 1.75x as fast #562

Closed

casperisfine deleted the optimize-ascii-strings branch October 17, 2024 15:28

nvasilevski reviewed Jan 6, 2025

View reviewed changes

headius added a commit to headius/json that referenced this pull request Jan 9, 2025

Port convert_UTF8_to_ASCII_only_JSON to Java

be45c9a

This is new specialized logic to reduce overhead when appending ASCII-only strings to the generated JSON. Original code by @byroot See ruby#620

headius added a commit to headius/json that referenced this pull request Jan 9, 2025

Port convert_UTF8_to_JSON from C

38c7831

Also includes updated logic for generate (generate_json_string) based on current C code. Original code by @byroot See ruby#620

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a fast path for ASCII strings #620

Add a fast path for ASCII strings #620

casperisfine commented Oct 17, 2024 •

edited

Loading

casperisfine Oct 17, 2024

casperisfine commented Oct 17, 2024

nvasilevski Jan 6, 2025

byroot Jan 7, 2025

headius commented Jan 8, 2025

byroot commented Jan 8, 2025

headius commented Jan 8, 2025

headius commented Jan 8, 2025

headius commented Jan 9, 2025

Add a fast path for ASCII strings #620

Add a fast path for ASCII strings #620

Conversation

casperisfine commented Oct 17, 2024 • edited Loading

casperisfine Oct 17, 2024

Choose a reason for hiding this comment

casperisfine commented Oct 17, 2024

nvasilevski Jan 6, 2025

Choose a reason for hiding this comment

byroot Jan 7, 2025

Choose a reason for hiding this comment

headius commented Jan 8, 2025

byroot commented Jan 8, 2025

headius commented Jan 8, 2025

headius commented Jan 8, 2025

headius commented Jan 9, 2025

casperisfine commented Oct 17, 2024 •

edited

Loading