Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhead due to unneeded __error assignments #1592

Closed
awelzel opened this issue Nov 10, 2023 · 2 comments · Fixed by #1837
Closed

Overhead due to unneeded __error assignments #1592

awelzel opened this issue Nov 10, 2023 · 2 comments · Fixed by #1837

Comments

@awelzel
Copy link
Contributor

awelzel commented Nov 10, 2023

Hope that's a valid test. I'm not sure why there's all the redundant (*__self).__error = __error assignments, but it doesn't come for free. I took the generated C++ code of the following example (spicyc -D jit -T and copying away the cc files and commands) and patched out the obvious ones that seemed superflous. This results in a 10-12% raw parsing performance improvement for this microbenchmark. (Debian 12, GCC 12.2).

module Test;

type K = unit {
  x: skip bytes &size=1;
};

type L = unit {
  k: K;
};

type M = unit {
  l: L;
};

public type N = unit {
  : M[] &eod;
};
Command Mean [s] Min [s] Max [s] Relative
cat ./test-data.txt | spicy-driver Test.hlto 6.011 ± 0.109 5.875 6.177 1.11 ± 0.02
cat ./test-data.txt | spicy-driver Test3.hlto 5.413 ± 0.051 5.375 5.497 1.00

test-data.txt was generated as follows:

python3 -c 'print("x" * 5000000, end="")' > test-data.txt

There might some performance improvements by removing the unneeded assignments (or somehow convincing the compiler to not emit code for them).


Patch:

--- Test.cc     2023-11-10 15:07:04.152724653 +0100
+++ Test3.cc    2023-11-10 15:49:59.353374911 +0100
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     ::hilti::rt::debug::dedent(std::string("spicy"));
     __result = std::make_tuple(__cur, __lah, __lahe, __error);
     return __result;
@@ -275,7 +275,7 @@
         (*__self).__error = __error;
           __location__("nested.spicy:7:10-9:2");
         (void());
-        __error = (*__self).__error;
+        // __error = (*__self).__error;
         ::hilti::rt::StrongReference<::hilti::rt::Stream> filtered = ::hilti::rt::StrongReference<::hilti::rt::Stream>();
         if ( ! (::hilti::rt::Bool(static_cast<bool>(filtered))) ) {
             __result = (*__self).__parse_Test_L_stage2(__data, __begin, __cur, __trim, __lah, __lahe, __error);
@@ -294,7 +294,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:7:10-9:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     return __result;
 }
 
@@ -313,11 +313,11 @@
     (*__self).__error = __error;
       __location__("nested.spicy:12:3");
     (void());
-    __error = (*__self).__error;
-    (*__self).__error = __error;
+    // __error = (*__self).__error;
+    // (*__self).__error = __error;
       __location__("nested.spicy:11:10-13:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     ::hilti::rt::debug::dedent(std::string("spicy"));
     __result = std::make_tuple(__cur, __lah, __lahe, __error);
     return __result;
@@ -334,7 +334,7 @@
         (*__self).__error = __error;
           __location__("nested.spicy:11:10-13:2");
         (void());
-        __error = (*__self).__error;
+        // __error = (*__self).__error;
         ::hilti::rt::StrongReference<::hilti::rt::Stream> filtered = ::hilti::rt::StrongReference<::hilti::rt::Stream>();
         if ( ! (::hilti::rt::Bool(static_cast<bool>(filtered))) ) {
             __result = (*__self).__parse_Test_M_stage2(__data, __begin, __cur, __trim, __lah, __lahe, __error);
@@ -353,7 +353,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:11:10-13:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     return __result;
 }
 
@@ -374,7 +374,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:15:17-17:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     ::hilti::rt::debug::dedent(std::string("spicy"));
     __result = std::make_tuple(__cur, __lah, __lahe, __error);
     return __result;
@@ -426,7 +426,7 @@
         (*__self).__error = __error;
           __location__("nested.spicy:15:17-17:2");
         (void());
-        __error = (*__self).__error;
+        // __error = (*__self).__error;
         ::hilti::rt::StrongReference<::hilti::rt::Stream> filtered = ::hilti::rt::StrongReference<::hilti::rt::Stream>();
         if ( ! (::hilti::rt::Bool(static_cast<bool>(filtered))) ) {
             __result = (*__self).__parse_Test_N_stage2(__data, __begin, __cur, __trim, __lah, __lahe, __error);
@@ -445,7 +445,7 @@
     (*__self).__error = __error;
       __location__("nested.spicy:15:17-17:2");
     (void());
-    __error = (*__self).__error;
+    // __error = (*__self).__error;
     return __result;
 }

@bbannier
Copy link
Member

We have #1444 for removing dead stores but didn't have a benchmark showing it was useful. Would you be open to compare topic/bbannier/dead_stores against the baseline ec87b43?

@awelzel
Copy link
Contributor Author

awelzel commented Nov 13, 2023

We have #1444 for removing dead stores but didn't have a benchmark showing it was useful. Would you be open to compare topic/bbannier/dead_stores against the baseline ec87b43?

As discussed in DM, might be more reasonable to test a rebased version. There's also nothing fancy around the benchmark. The reproducer is essentially testing how quickly Spicy can discard one byte at a time within a nested unit structure.

@rsmmr rsmmr linked a pull request Aug 19, 2024 that will close this issue
rsmmr added a commit that referenced this issue Aug 19, 2024
We use this to remove two statement constructs that the main optimizer may leave
behind:

    1. `default<void>()`
    2. `(*self).__error = __error; __error = (*self).__error;`

The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.

Per #1592, case 2 may also have overhead. Closes #1592.
rsmmr added a commit that referenced this issue Aug 19, 2024
We use this to remove two statement constructs that the main optimizer may leave
behind:

    1. `default<void>()`
    2. `(*self).__error = __error; __error = (*self).__error;`

The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.

Per #1592, case 2 may also have overhead. Closes #1592.
rsmmr added a commit that referenced this issue Aug 20, 2024
We use this to remove two statement constructs that the main optimizer may leave
behind:

    1. `default<void>()`
    2. `(*self).__error = __error; __error = (*self).__error;`

The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.

Couples notes on (2):

    - Per #1592, case 2 may also have overhead. Closes #1592.
    - Technically, this optimization isn't always correct: subsequent
      code could assume that `(*self).__error` is set, whereas after
      removal it's not (or not to the expected value). However,
      `__error` is strictly-internal state, and we know that we don't
      use it any different, so this seems ok until we have more
      general optimizer logic.
rsmmr added a commit that referenced this issue Sep 20, 2024
We use this to remove two statement constructs that the main optimizer
may leave behind:

    1. `default<void>()`
    2. `(*self).__error = __error; __error = (*self).__error;`

The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.

Couples notes on (2):

    - Per #1592, case 2 may also have overhead. Closes #1592.
    - Technically, this optimization isn't always correct: subsequent
      code could assume that `(*self).__error` is set, whereas after
      removal it's not (or not to the expected value). However,
      `__error` is strictly-internal state, and we know that we don't
      use it any different, so this seems ok until we have more
      general optimizer logic.
rsmmr added a commit that referenced this issue Sep 23, 2024
We use this to remove two statement constructs that the main optimizer
may leave behind:

    1. `default<void>()`
    2. `(*self).__error = __error; __error = (*self).__error;`

The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.

Couples notes on (2):

    - Per #1592, case 2 may also have overhead. Closes #1592.
    - Technically, this optimization isn't always correct: subsequent
      code could assume that `(*self).__error` is set, whereas after
      removal it's not (or not to the expected value). However,
      `__error` is strictly-internal state, and we know that we don't
      use it any different, so this seems ok until we have more
      general optimizer logic.
rsmmr added a commit that referenced this issue Sep 25, 2024
We use this to remove two statement constructs that the main optimizer
may leave behind:

    1. `default<void>()`
    2. `(*self).__error = __error; __error = (*self).__error;`

The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.

Couples notes on (2):

    - Per #1592, case 2 may also have overhead. Closes #1592.
    - Technically, this optimization isn't always correct: subsequent
      code could assume that `(*self).__error` is set, whereas after
      removal it's not (or not to the expected value). However,
      `__error` is strictly-internal state, and we know that we don't
      use it any different, so this seems ok until we have more
      general optimizer logic.
@rsmmr rsmmr closed this as completed in 15ef0e6 Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants