WIP: Prism work, rebase with squashed commits #404

egiurleo · 2025-01-14T23:15:35Z

This is a test for upstreaming our Prism work. In this PR, I've rebased on Prism master and squashed many of our commits (398 commits --> 117)

Motivation

Test plan

See included automated tests.

Initial attempt at Prism to Sorbet translation layer Create and run Prism benchmarking script Translate more node types in `pipeline.cc` Wrap Prism C API into new `Prism::Parser` class And move all other Prism related code to the new parser/prism component Translate more node types Introduce memory comparison script for Prism and Sorbet parsers Translate more node types Write memory comparison script for Prism and Sorbet parser Translate more node types Add VS Code tasks for common operations Use `setup-bazel` to cache Bazel in CI Compare location data between Prism and Sorbet parsers To make sure our translation layer correctly translates node locations, we create a task that, for every Prism regression test, runs the test with location comparison on. Merge location and regression testing tasks Translate more node types CI improvements Translate more node types Use a `repository_rule` to get the RUBY_ROOT env var Up to this point, in order to build Prism, we've needed to manually pass the path to the correct RUBY version to the build process because Bazel does not take the user's environment into account and will just use the system's default Ruby version (which is often very old). This is a workaround for our workaround, where we use a Bazel `repository_rule` to get the `RUBY_ROOT` environment variable, write it to a file within the Bazel file system, and then pull the value from that file while building Prism. Fix output of Prism location test errors Co-Authored-By: Vinicius Stock <[email protected]> Co-Authored-By: Alex Momchilov <[email protected]> Co-Authored-By: Emily Samp <[email protected]> Co-Authored-By: Alexander Momchilov <[email protected]>

This debug config launches the main Sorbet executable to type check the provided file path (with Prism). Add vscode-lldb to recommended extensions list Add support for debugging Prism tests in VSCode Remove unnecessary task to set BAZEL_EXEC_ROOT It turns out that we only need `sourceMap`'s value to be set to the workspace folder with any arbitrary key. So not setting `BAZEL_EXEC_ROOT` is fine.

In our `downcast` helper, we check that the node argument is actually a Prism node. However, doing this requires us to check the `type` field, which will fail if the node is a `nullptr`. By doing a `nullptr` check in the `ENFORCE`, we guarantee we won't crash by checking for a field on a `nullptr`. Co-authored-by: Alex Momchilov <[email protected]>

Only do dynamic constant assignment workaround in specific cases Only do the dynamic constant assignment workaround when translating a constant from a `PM_CONSTANT_WRITE_NODE` and `PM_CONSTANT_PATH_NODE`. Test dynamic constant workaround for all constant assignment types Add more test cases to rescue prism regression tests Handle all constant path assignment node types #317

Stop surfacing Prism's EOF error It usually appears when there's an unclosed statements/assignments, where the root error will also be surfaced. So it's not exactly important to surface this error. But more importantly, because Prism puts this error at the very last line of the file when there's a missing `end`, it can't be represented in Sorbet's error comments. Keep Prism error id's type for easy comparison We currently don't use the converted value and keeping the `pm_diagnostic_id_t` type makes it easy to compare the value with `PM_ERR_*` constants directly. Add a task to display Prism's parse errors Address feedback Change location test's naming convention By not replacing `/` with `_`, the test naming structure is more consistent with normal regression tests. But more importantly, it allows us to run location tests that are under a folder like `error_recovery/assign`, more easily. Generate error for Sorbet's `assign.rb` error recovery fixture The error location doesn't match what Sorbet currently generates. But the idea is to first make parse trees match, accumulate more recovery tests, and then come up with a strategy to generate errors that match locations. #318 #319

To make it clearer that these are the regression tests we've built, as opposed to pre-existing Sorbet tests Add test corpus with Prism The purpose of these tests is to run the Sorbet tests that already exist, but to use the Prism parser instead of the Sorbet parser. As we fix more edge cases, we will include more of the test suite and run them in CI. Add tasks to run corpus tests with Prism Update the test target on CI Add Prism handling for packager tests too Because pipeline_tests handles packager tests differently, we need to add Prism handling for them too. Otherwise, we'll get test conflict errors when `test_corpus_prism`'s pattern covers packager tests. Follow up on #323 - Fix the command of single Prism regression test task - Update launch.json with the new task name Fix `launch.json` for prism regression tests Make test Pipeline parse with Prism in the case of `indexOne` Seperate test input for regression and corpus tests Remove error recovery tests from prism test corpus And start running corpus tests in CI Co-Authored-By: Emily Samp <[email protected]>

Sorbet's legacy parser treats `~[Integer]` and `-[Integer]` as integer literals, but Prism treats `~[Integer]` as a method call. (Technically, both of them are method calls.) So in the translator, we need to give `~` special handling when its receiver is an integer. We don't need the same for Float or Numeric because only Integer has the `~` method.

Unlike Integer or Float, Complex literals with prefix `-` or `+` should be translated to method calls on the Complex literal.

When we create a new translator instance, we haven't been maintaining the value of the unique counter, which keeps track of anonymous arguments in order to give them unique names. This was resulting in different parse trees for Sorbet and Prism.

Exclude any that are failing because of error recovery or legitimate errors.

Remove desugar/csend test from exclusions

`uniqueCounter` is a counter that is used to give unique ids to anonymous arguments while parsing them in Sorbet. Thus far, we have coordinated the `uniqueCounter` between multiple Translators by passing the value back and forth, but this approach is brittle. With this new approach, we can maintain the same `uniqueCounter` between multiple translators: 1. The parent translator maintains the source-of-truth counter in a variable called `uniqueCounterStorage` 2. It also maintains a pointer to that value called `uniqueCounter` 3. Child translators are created with a dummy value in `uniqueCounterStorage` that will never be used, as well as the `uniqueCounter` pointer that points to their parent translator's storage 4. Incrementing the `uniqueCounter` happens via the pointer, which will always point to the parent's `uniqueCounter` value Co-authored-by: Alexander Momchilov <[email protected]>

Add prism regression test for singleton method defs Add more newly-passing tests to Prism corpus Add `desugar/defs_not_self` to Prism corpus tests

Handle forwarding super node with block argument Correctly handle calls to `super` with block Change argument name to `blockArgumentNode` Refactor `translateArguments` to remove unused parameter And make block parameter optional Refactor index operator writes to use refactored arguments logic Refactor `PM_CALL_NODE` case to use refactored arguments logic Refactor `PM_INDEX_TARGET_NODE` case to use refactored arguments logic Refactor `PM_SUPER_NODE` case to use refactored arguments logic Refactor `translateArguments` to handle block argument as well Add a `super` test case to `def_block_param` regression test Pass block arguments to super calls Add case to `def_singleton` Prism regression test Testing the translation of a call to super that passes a block argument.

When the splat arg's expression is an `Arg`, we need to translate it to a `Restarg` instead of a `SplatLhs`.

1. We don't need to translate implicit rest nodes in array patterns 2. We need to return an `ArrayPatternWithTail` instead of an `ArrayPattern` when the pattern ends with an implicit rest node

Currently, corpus tests may fail with Prism for a few reasons: 1. The error checking fails with Prism. This is expected. 2. There are tree mismatches. And sometimes, a test fail because of both 1 and 2. This means that by skipping the test altogether, we are potentially skipping legitimately failing tests. This commit updates the test runner to skip the error message checking when running corpus tests with Prism and updates the skipped tests in the BUILD file. So now if a test is skipped, it's only because of the second reason. And we should minimize the number of tests that are skipped in that case. Co-authored-by: Emily Samp <[email protected]>

Rather than specifying whether to *skip* the logic, we should say whether to replace the constant node with a dynamic const assignment. Then, remove unnecessary type check and move this logic to the top of the method so we short-circuit any unnecessary code in that case. Run two more tests that are only failing because of error differences Remove `desugar/regex` from failing tests Remove `desugar/range` from excluded tests Co-Authored-By: Stan Lo <[email protected]>

1. It fixes the case where a hash pattern doesn't have a value, which will generate a `PM_IMPLICIT_NODE` that we currently don't handle. 2. It fixes the translation of hash patterns with a named splat node, like `in **o`.

Run `rewriter/def_delegator` test as part of prism corpus Remove `rewriter/rails/delegate` test from skipped rewriter tests Refactor `translateHash` into `translateKeyValuePairs` The logic of `translateHash` was kind of confusing because it was handling both vanilla hashes and also assocs/assoc splat nodes. By isolating one part of that method, which generates key/value pairs, and then calling that method in both cases, the two code paths become cleaner and more obvious.

When the method call is `[]`, the message location should be just the begin location of `[]`.

Add `parser/kwnil_errors` to skipped tests with a note Run `local_vars/z_super_args_splats_blocks` test in Prism corpus

And exclude failing tests

Add `namer/arguments` test to Prism corpus Run two more namer tests that now pass

In cases like `foo.()`, the message would be `:call`, but the message location would have null start and end. In this case, we need to fall back to use the call operator location. Include LSP tests in Prism Corpus tests

This is not covered in parse tree location tests, but reflected in symbol table tests: when we write `FOO = 1` (or other constant write nodes), the location of `FOO` should just be the name, not the entire assignment. In those cases, we should use `name_loc` instead of `base.location`.

When the symbol is used as `a: 1`, the location should not include the colon. Run Prism Corpus against all Sorbet component tests

Use improved libprism source to build Prism without dependency on Ruby The main blocker for upstreaming Prism parser to Sorbet is that we currently rely on Ruby to build Prism, as its required by Prism's `rake template` task. But by package the files generated by `rake template` in Prim's release, we will be able to remove the dependency on Ruby while significantly simplify the build configurations for Prism. Use another mock source

When ENFORCE macros fail, they use fmt (via spdlog) to format error messages. Add a formatter specialization for pm_node_type to properly display Prism node types in these error messages. Instead of showing numeric values, the formatter uses pm_node_type_to_str to display human-readable node type names.

Add empty Gemfile to use Ruby LSP

ee42436

st0012 force-pushed the prism-squashed branch 2 times, most recently from a4c9ea7 to 234c4a3 Compare January 21, 2025 12:56

egiurleo and others added 27 commits January 21, 2025 13:00

Extract up_cast helper

fb1c987

Implement the translation layer for all node types

f178ebb

Fix edge case for lambda with no params

53cd9a9

Fix float translation

68a8d63

Fix integer translation

9538b76

Properly translate complex literals with prefix - or +

323ef31

Unlike Integer or Float, Complex literals with prefix `-` or `+` should be translated to method calls on the Complex literal.

Correctly translate prefixed integer literals

a0eb5da

Handle special integers with signs too

fe1e2f6

Handle underscore in integer literals too

36a02cf

Add desugar tests to Prism corpus

4a04fa1

Exclude any that are failing because of error recovery or legitimate errors.

Make LHS of conditional call operator assignment CSend

86e43ae

Remove desugar/csend test from exclusions

Translate def nodes with receivers as DefS

298302f

Add prism regression test for singleton method defs Add more newly-passing tests to Prism corpus Add `desugar/defs_not_self` to Prism corpus tests

Fix splat arg translation in Prism

a49afaf

When the splat arg's expression is an `Arg`, we need to translate it to a `Restarg` instead of a `SplatLhs`.

Handle implicit rest nodes in array patterns

c44a700

1. We don't need to translate implicit rest nodes in array patterns 2. We need to return an `ArrayPatternWithTail` instead of an `ArrayPattern` when the pattern ends with an implicit rest node

Properly translate pattern matching value node with patternTranslate()

88a13b4

st0012 and others added 17 commits January 21, 2025 13:00

Improve pattern matching support with hash patterns

dabd5cf

1. It fixes the case where a hash pattern doesn't have a value, which will generate a `PM_IMPLICIT_NODE` that we currently don't handle. 2. It fixes the translation of hash patterns with a named splat node, like `in **o`.

Handle implicit hash literal too

9de2ba7

Update test/BUILD for rewriter tests

01ed4fc

Add local_vars tests to prism corpus and exclude failing test

4bc3677

Fix bracket method call's location translation

0d147ed

When the method call is `[]`, the message location should be just the begin location of `[]`.

Generate empty Args node for method definitions with empty parentheses

92aa2e7

Handle translation of post params

1fc4dd0

Add `parser/kwnil_errors` to skipped tests with a note Run `local_vars/z_super_args_splats_blocks` test in Prism corpus

Correctly translate post while loop

6a0aef7

Correctly translate post until loop

b046638

Add namer tests to prism corpus

1a3ec98

And exclude failing tests

Modify locations of kw arguments and params to match legacy parser

15d8ae9

Add `namer/arguments` test to Prism corpus Run two more namer tests that now pass

Fix call with empty message's location translation

d0116e9

In cases like `foo.()`, the message would be `:call`, but the message location would have null start and end. In this case, we need to fall back to use the call operator location. Include LSP tests in Prism Corpus tests

Fix symbol location for hash, keyword argument keys

0676c30

When the symbol is used as `a: 1`, the location should not include the colon. Run Prism Corpus against all Sorbet component tests

st0012 force-pushed the prism-squashed branch from 234c4a3 to 0c73cb0 Compare January 21, 2025 13:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Prism work, rebase with squashed commits #404

WIP: Prism work, rebase with squashed commits #404

egiurleo commented Jan 14, 2025

WIP: Prism work, rebase with squashed commits #404

Are you sure you want to change the base?

WIP: Prism work, rebase with squashed commits #404

Conversation

egiurleo commented Jan 14, 2025

Motivation

Test plan