Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a fast path for ASCII strings #620

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

casperisfine
Copy link

@casperisfine casperisfine commented Oct 17, 2024

Fix: #562

This optimization is based on a few assumptions:

  • Most strings are ASCII only.
  • Most strings had their coderange scanned already.

If the above is true, then by checking the string coderange, we can use a much more streamlined function to encode ASCII strings.

Before:

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   140.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.464k (± 1.4%) i/s  (682.83 μs/i) -      7.420k in   5.067573s
                  oj      2.338k (± 1.5%) i/s  (427.64 μs/i) -     11.730k in   5.017336s
           rapidjson      1.075k (± 1.6%) i/s  (930.40 μs/i) -      5.400k in   5.025469s

Comparison:
                json:     1464.5 i/s
                  oj:     2338.4 i/s - 1.60x  faster
           rapidjson:     1074.8 i/s - 1.36x  slower

After:

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   189.000 i/100ms
                  oj   228.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.903k (± 1.2%) i/s  (525.55 μs/i) -      9.639k in   5.066521s
                  oj      2.306k (± 1.3%) i/s  (433.71 μs/i) -     11.628k in   5.044096s
           rapidjson      1.069k (± 2.4%) i/s  (935.38 μs/i) -      5.400k in   5.053794s

Comparison:
                json:     1902.8 i/s
                  oj:     2305.7 i/s - 1.21x  faster
           rapidjson:     1069.1 i/s - 1.78x  slower

This optimization is based on a few assumptions:

  - Most strings are ASCII only.
  - Most strings had their coderange scanned already.

If the above is true, then by checking the string coderange, we can
use a much more streamlined function to encode ASCII strings.

Before:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   140.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.464k (± 1.4%) i/s  (682.83 μs/i) -      7.420k in   5.067573s
                  oj      2.338k (± 1.5%) i/s  (427.64 μs/i) -     11.730k in   5.017336s
           rapidjson      1.075k (± 1.6%) i/s  (930.40 μs/i) -      5.400k in   5.025469s

Comparison:
                json:     1464.5 i/s
                  oj:     2338.4 i/s - 1.60x  faster
           rapidjson:     1074.8 i/s - 1.36x  slower

```

After:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   189.000 i/100ms
                  oj   228.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.903k (± 1.2%) i/s  (525.55 μs/i) -      9.639k in   5.066521s
                  oj      2.306k (± 1.3%) i/s  (433.71 μs/i) -     11.628k in   5.044096s
           rapidjson      1.069k (± 2.4%) i/s  (935.38 μs/i) -      5.400k in   5.053794s

Comparison:
                json:     1902.8 i/s
                  oj:     2305.7 i/s - 1.21x  faster
           rapidjson:     1069.1 i/s - 1.78x  slower
```
break;
case ENC_CODERANGE_VALID:
if (RB_UNLIKELY(state->ascii_only)) {
convert_UTF8_to_ASCII_only_JSON(buffer, obj, state->script_safe);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reflection, the split that might have the more impact is script_safe, as it would also speedup the ASCII path.

@casperisfine
Copy link
Author

I pushed an extra commit that re-use the escape table from a81ec47

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   224.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   107.000 i/100ms
Calculating -------------------------------------
                json      2.254k (± 1.6%) i/s  (443.69 μs/i) -     11.424k in   5.069999s
                  oj      2.318k (± 1.4%) i/s  (431.32 μs/i) -     11.730k in   5.060421s
           rapidjson      1.081k (± 1.9%) i/s  (925.05 μs/i) -      5.457k in   5.049738s

Comparison:
                json:     2253.8 i/s
                  oj:     2318.5 i/s - same-ish: difference falls within error
           rapidjson:     1081.0 i/s - 2.08x  slower

For now it's only applied to the ASCII fast path, but I think we could optimize UTF8 too with the assumption that most bytes in an UTF8 string are ASCII. But I'd rather not complexify this PR more.

This performs noticeably better than the boolean logic.

Before:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   189.000 i/100ms
                  oj   228.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.903k (± 1.2%) i/s  (525.55 μs/i) -      9.639k in   5.066521s
                  oj      2.306k (± 1.3%) i/s  (433.71 μs/i) -     11.628k in   5.044096s
           rapidjson      1.069k (± 2.4%) i/s  (935.38 μs/i) -      5.400k in   5.053794s

Comparison:
                json:     1902.8 i/s
                  oj:     2305.7 i/s - 1.21x  faster
           rapidjson:     1069.1 i/s - 1.78x  slower
```

After:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   224.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   107.000 i/100ms
Calculating -------------------------------------
                json      2.254k (± 1.6%) i/s  (443.69 μs/i) -     11.424k in   5.069999s
                  oj      2.318k (± 1.4%) i/s  (431.32 μs/i) -     11.730k in   5.060421s
           rapidjson      1.081k (± 1.9%) i/s  (925.05 μs/i) -      5.457k in   5.049738s

Comparison:
                json:     2253.8 i/s
                  oj:     2318.5 i/s - same-ish: difference falls within error
           rapidjson:     1081.0 i/s - 2.08x  slower
```

The escape table is taken directly from Mame's PR.

Co-Authored-By: Yusuke Endoh <[email protected]>
@casperisfine casperisfine force-pushed the optimize-ascii-strings branch from 9a00965 to e2a2a33 Compare October 17, 2024 15:15
@byroot byroot merged commit 7106b03 into ruby:master Oct 17, 2024
73 checks passed
@casperisfine casperisfine deleted the optimize-ascii-strings branch October 17, 2024 15:28
}

beg = pos + 1;
switch (ch) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for elaborating on this change in your blog post ❤️

I was wondering if we could entirely eliminate this switch at an expense of static memory by storing escaped values instead of booleans in the lookup table. Just as an example, in ruby:

JSON_ESCAPE_TABLE['"'] = "\\\"
JSON_ESCAPE_TABLE['\\'] = "\\\\"
JSON_ESCAPE_TABLE['\b'] = "\\b"
# ...
# default: case can be pre-calculated as well for other characters

It may not result in a visible improvement in realistic benchmarks due to having not that many chars to escape but could be visible on a micro-benchmark specifically crafted to measure escaping performance at no cost (except static memory) for the main flow.

I could be missing something that makes it hard to store non-boolean values in escape_table but would love to at least learn about this as well. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be worth trying, but there's a couple things making this not an obvious win.

Escaped values in this case aren't scalar. You can either store them as a table of char *:

static const char *[256] escape_table = {
  0, 0, 0, "\\n", // ....
};

If you do this:

  • char * is 8B on 64-bit platform, whereas char is just 1B, so we went from 256B to 2kiB, not nothing.
  • char * doesn't store the length, so presumably to append these you'd need to call strlen, they are shot string so probably not very slow, but given we're just trying to eliminate a case, it may actually be slower.
  • char * isn't stored contiguiously, so that's another memory region to keep in CPU cache.

Alternatively, you could store the escaped strings inline, e.g.:

struct escaped_char {
  unsigned char len;
  char bytes[8];
}

static const struct escaped_char[256] escape_table = {
  {0, 0}, {0, 0}, {2, "\\n"}, // ....
}

But this would make the escape table even bigger.

Maybe you could only do this for strings that are known to be pure-ASCII so the table only has 128 entries, keeping the memory usage down, but then you can't really re-use it for UTF-8.

Note that all of these are just "static assumption", performance isn't always easy to predict, so maybe your suggestion would actually help.

@headius
Copy link
Contributor

headius commented Jan 8, 2025

Did this get applied to the Java parser anywhere? I can take it if not but I don't want to dive in and waste my time if it's already there.

@byroot
Copy link
Member

byroot commented Jan 8, 2025

As mentioned previously, I don't know enough about Java to do much to it, and I don't really have any incentive either.

If you want to speedup the Java version feel free to do it.

@headius
Copy link
Contributor

headius commented Jan 8, 2025

@byroot Ok thanks for clarifying. You've done so many PRs lately I lost track!

I'll do the port and push a PR. The code in the Java extension will be pretty much the same as what you wrote for the C version.

@headius
Copy link
Contributor

headius commented Jan 8, 2025

Good news: much of the structure of the Java code already mimics the C code, and a profile of the twitter.json file dumping shows most of the overhead is in two places: handling each character (because it's using heavier UTF-8 logic always) and picking the right logic for each object type (bad lookup of handler logic based on the Ruby class, most likely).

Ought to be able to speed this up nicely.

headius added a commit to headius/json that referenced this pull request Jan 9, 2025
This is new specialized logic to reduce overhead when appending
ASCII-only strings to the generated JSON.

Original code by @byroot

See ruby#620
headius added a commit to headius/json that referenced this pull request Jan 9, 2025
Also includes updated logic for generate (generate_json_string)
based on current C code.

Original code by @byroot

See ruby#620
@headius
Copy link
Contributor

headius commented Jan 9, 2025

Work is proceeding on this and other optimizations for the JRuby extension in #725.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants