Overhead due to unneeded __error assignments #1592

awelzel · 2023-11-10T15:03:02Z

Hope that's a valid test. I'm not sure why there's all the redundant (*__self).__error = __error assignments, but it doesn't come for free. I took the generated C++ code of the following example (spicyc -D jit -T and copying away the cc files and commands) and patched out the obvious ones that seemed superflous. This results in a 10-12% raw parsing performance improvement for this microbenchmark. (Debian 12, GCC 12.2).

module Test;

type K = unit {
  x: skip bytes &size=1;
};

type L = unit {
  k: K;
};

type M = unit {
  l: L;
};

public type N = unit {
  : M[] &eod;
};

Command	Mean [s]	Min [s]	Max [s]	Relative
`cat ./test-data.txt \| spicy-driver Test.hlto`	6.011 ± 0.109	5.875	6.177	1.11 ± 0.02
`cat ./test-data.txt \| spicy-driver Test3.hlto`	5.413 ± 0.051	5.375	5.497	1.00

test-data.txt was generated as follows:

python3 -c 'print("x" * 5000000, end="")' > test-data.txt

There might some performance improvements by removing the unneeded assignments (or somehow convincing the compiler to not emit code for them).

Patch:

--- Test.cc     2023-11-10 15:07:04.152724653 +0100
+++ Test3.cc    2023-11-10 15:49:59.353374911 +0100
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     ::hilti::rt::debug::dedent(std::string("spicy"));
     __result = std::make_tuple(__cur, __lah, __lahe, __error);
     return __result;
@@ -275,7 +275,7 @@
         (*__self).__error = __error;
           __location__("nested.spicy:7:10-9:2");
         (void());
-        __error = (*__self).__error;
+        // __error = (*__self).__error;
         ::hilti::rt::StrongReference<::hilti::rt::Stream> filtered = ::hilti::rt::StrongReference<::hilti::rt::Stream>();
         if ( ! (::hilti::rt::Bool(static_cast<bool>(filtered))) ) {
             __result = (*__self).__parse_Test_L_stage2(__data, __begin, __cur, __trim, __lah, __lahe, __error);
@@ -294,7 +294,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:7:10-9:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     return __result;
 }
 
@@ -313,11 +313,11 @@
     (*__self).__error = __error;
       __location__("nested.spicy:12:3");
     (void());
-    __error = (*__self).__error;
-    (*__self).__error = __error;
+    // __error = (*__self).__error;
+    // (*__self).__error = __error;
       __location__("nested.spicy:11:10-13:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     ::hilti::rt::debug::dedent(std::string("spicy"));
     __result = std::make_tuple(__cur, __lah, __lahe, __error);
     return __result;
@@ -334,7 +334,7 @@
         (*__self).__error = __error;
           __location__("nested.spicy:11:10-13:2");
         (void());
-        __error = (*__self).__error;
+        // __error = (*__self).__error;
         ::hilti::rt::StrongReference<::hilti::rt::Stream> filtered = ::hilti::rt::StrongReference<::hilti::rt::Stream>();
         if ( ! (::hilti::rt::Bool(static_cast<bool>(filtered))) ) {
             __result = (*__self).__parse_Test_M_stage2(__data, __begin, __cur, __trim, __lah, __lahe, __error);
@@ -353,7 +353,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:11:10-13:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     return __result;
 }
 
@@ -374,7 +374,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:15:17-17:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     ::hilti::rt::debug::dedent(std::string("spicy"));
     __result = std::make_tuple(__cur, __lah, __lahe, __error);
     return __result;
@@ -426,7 +426,7 @@
         (*__self).__error = __error;
           __location__("nested.spicy:15:17-17:2");
         (void());
-        __error = (*__self).__error;
+        // __error = (*__self).__error;
         ::hilti::rt::StrongReference<::hilti::rt::Stream> filtered = ::hilti::rt::StrongReference<::hilti::rt::Stream>();
         if ( ! (::hilti::rt::Bool(static_cast<bool>(filtered))) ) {
             __result = (*__self).__parse_Test_N_stage2(__data, __begin, __cur, __trim, __lah, __lahe, __error);
@@ -445,7 +445,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:15:17-17:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     return __result;
 }

The text was updated successfully, but these errors were encountered:

bbannier · 2023-11-10T15:05:57Z

We have #1444 for removing dead stores but didn't have a benchmark showing it was useful. Would you be open to compare topic/bbannier/dead_stores against the baseline ec87b43?

awelzel · 2023-11-13T08:54:36Z

We have #1444 for removing dead stores but didn't have a benchmark showing it was useful. Would you be open to compare topic/bbannier/dead_stores against the baseline ec87b43?

As discussed in DM, might be more reasonable to test a rebased version. There's also nothing fancy around the benchmark. The reproducer is essentially testing how quickly Spicy can discard one byte at a time within a nested unit structure.

We use this to remove two statement constructs that the main optimizer may leave behind: 1. `default<void>()` 2. `(*self).__error = __error; __error = (*self).__error;` The second case is a quite specific situation that eventually, once we have CFG/DFG tracking, the main optimizer should be able to cover more generically. However, for now, it's just not nice to always have these blocks in the generated C++ code, so adding this special case seems useful. Per #1592, case 2 may also have overhead. Closes #1592.

We use this to remove two statement constructs that the main optimizer may leave behind: 1. `default<void>()` 2. `(*self).__error = __error; __error = (*self).__error;` The second case is a quite specific situation that eventually, once we have CFG/DFG tracking, the main optimizer should be able to cover more generically. However, for now, it's just not nice to always have these blocks in the generated C++ code, so adding this special case seems useful. Couples notes on (2): - Per #1592, case 2 may also have overhead. Closes #1592. - Technically, this optimization isn't always correct: subsequent code could assume that `(*self).__error` is set, whereas after removal it's not (or not to the expected value). However, `__error` is strictly-internal state, and we know that we don't use it any different, so this seems ok until we have more general optimizer logic.

bbannier added Cleanup Codegen labels Nov 10, 2023

This was referenced Nov 10, 2023

Overhead due to default std::tuple initialization in generated code #1593

Open

std::string parameters for debug::indent() and debug::dedent() in generated code #1591

Closed

rsmmr linked a pull request Aug 19, 2024 that will close this issue

Reduce generated C++ code for bytes parsing #1837

Merged

rsmmr closed this as completed in 15ef0e6 Sep 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overhead due to unneeded __error assignments #1592

Overhead due to unneeded __error assignments #1592

awelzel commented Nov 10, 2023 •

edited

Loading

bbannier commented Nov 10, 2023

awelzel commented Nov 13, 2023

Overhead due to unneeded __error assignments #1592

Overhead due to unneeded __error assignments #1592

Comments

awelzel commented Nov 10, 2023 • edited Loading

bbannier commented Nov 10, 2023

awelzel commented Nov 13, 2023

awelzel commented Nov 10, 2023 •

edited

Loading