Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Prism work, rebase with squashed commits #404

Draft
wants to merge 45 commits into
base: master
Choose a base branch
from
Draft

Conversation

egiurleo
Copy link

This is a test for upstreaming our Prism work. In this PR, I've rebased on Prism master and squashed many of our commits (398 commits --> 117)

Motivation

Test plan

See included automated tests.

@st0012 st0012 force-pushed the prism-squashed branch 2 times, most recently from a4c9ea7 to 234c4a3 Compare January 21, 2025 12:56
egiurleo and others added 27 commits January 21, 2025 13:00
Initial attempt at Prism to Sorbet translation layer

Create and run Prism benchmarking script

Translate more node types in `pipeline.cc`

Wrap Prism C API into new `Prism::Parser` class

And move all other Prism related code to the new parser/prism component

Translate more node types

Introduce memory comparison script for Prism and Sorbet parsers

Translate more node types

Write memory comparison script for Prism and Sorbet parser

Translate more node types

Add VS Code tasks for common operations

Use `setup-bazel` to cache Bazel in CI

Compare location data between Prism and Sorbet parsers

To make sure our translation layer correctly translates node locations, we create a task that, for every Prism regression test, runs the test with location comparison on.

Merge location and regression testing tasks

Translate more node types

CI improvements

Translate more node types

Use a `repository_rule` to get the RUBY_ROOT env var

Up to this point, in order to build Prism, we've needed to manually pass the path to the correct RUBY version to the build process because Bazel does not take the user's environment into account and will just use the system's default Ruby version (which is often very old).

This is a workaround for our workaround, where we use a Bazel `repository_rule` to get the `RUBY_ROOT` environment variable, write it to a file within the Bazel file system, and then pull the value from that file while building Prism.

Fix output of Prism location test errors

Co-Authored-By: Vinicius Stock <[email protected]>
Co-Authored-By: Alex Momchilov <[email protected]>
Co-Authored-By: Emily Samp <[email protected]>
Co-Authored-By: Alexander Momchilov <[email protected]>
This debug config launches the main Sorbet executable to type check
the provided file path (with Prism).

Add vscode-lldb to recommended extensions list

Add support for debugging Prism tests in VSCode

Remove unnecessary task to set BAZEL_EXEC_ROOT

It turns out that we only need `sourceMap`'s value to be set to the
workspace folder with any arbitrary key. So not setting `BAZEL_EXEC_ROOT`
is fine.
In our `downcast` helper, we check that the node argument is actually
a Prism node. However, doing this requires us to check the `type` field,
which will fail if the node is a `nullptr`. By doing a `nullptr`
check in the `ENFORCE`, we guarantee we won't crash by checking for
a field on a `nullptr`.

Co-authored-by: Alex Momchilov <[email protected]>
Only do dynamic constant assignment workaround in specific cases

Only do the dynamic constant assignment workaround when translating
a constant from a `PM_CONSTANT_WRITE_NODE` and
`PM_CONSTANT_PATH_NODE`.

Test dynamic constant workaround for all constant assignment types

Add more test cases to rescue prism regression tests

Handle all constant path assignment node types

#317
Stop surfacing Prism's EOF error

It usually appears when there's an unclosed statements/assignments, where
the root error will also be surfaced. So it's not exactly important to
surface this error.

But more importantly, because Prism puts this error at the very last line
of the file when there's a missing `end`, it can't be represented in
Sorbet's error comments.

Keep Prism error id's type for easy comparison

We currently don't use the converted value and keeping the `pm_diagnostic_id_t`
type makes it easy to compare the value with `PM_ERR_*` constants directly.

Add a task to display Prism's parse errors

Address feedback

Change location test's naming convention

By not replacing `/` with `_`, the test naming structure is more consistent
with normal regression tests. But more importantly, it allows us to run
location tests that are under a folder like `error_recovery/assign`, more
easily.

Generate error for Sorbet's `assign.rb` error recovery fixture

The error location doesn't match what Sorbet currently generates. But the
idea is to first make parse trees match, accumulate more recovery tests,
and then come up with a strategy to generate errors that match locations.

#318
#319
To make it clearer that these are the regression tests we've built,
as opposed to pre-existing Sorbet tests

Add test corpus with Prism

The purpose of these tests is to run the Sorbet tests that already exist,
but to use the Prism parser instead of the Sorbet parser. As we fix
more edge cases, we will include more of the test suite and run them
in CI.

Add tasks to run corpus tests with Prism

Update the test target on CI

Add Prism handling for packager tests too

Because pipeline_tests handles packager tests differently, we need to add
Prism handling for them too. Otherwise, we'll get test conflict errors
when `test_corpus_prism`'s pattern covers packager tests.

Follow up on #323

- Fix the command of single Prism regression test task
- Update launch.json with the new task name

Fix `launch.json` for prism regression tests

Make test Pipeline parse with Prism in the case of `indexOne`

Seperate test input for regression and corpus tests

Remove error recovery tests from prism test corpus

And start running corpus tests in CI

Co-Authored-By: Emily Samp <[email protected]>
Sorbet's legacy parser treats `~[Integer]` and `-[Integer]` as integer literals,
but Prism treats `~[Integer]` as a method call.

(Technically, both of them are method calls.)

So in the translator, we need to give `~` special handling when its receiver is an integer.
We don't need the same for Float or Numeric because only Integer has the `~` method.
Unlike Integer or Float, Complex literals with prefix `-` or `+` should
be translated to method calls on the Complex literal.
When we create a new translator instance, we haven't been maintaining
the value of the unique counter, which keeps track of anonymous arguments
in order to give them unique names. This was resulting in different parse
trees for Sorbet and Prism.
Exclude any that are failing because of error recovery or
legitimate errors.
Remove desugar/csend test from exclusions
`uniqueCounter` is a counter that is used to give unique ids to anonymous arguments
while parsing them in Sorbet. Thus far, we have coordinated the `uniqueCounter` between
multiple Translators by passing the value back and forth, but this approach is brittle.

With this new approach, we can maintain the same `uniqueCounter` between multiple
translators:

1. The parent translator maintains the source-of-truth counter in a variable called
  `uniqueCounterStorage`
2. It also maintains a pointer to that value called `uniqueCounter`
3. Child translators are created with a dummy value in `uniqueCounterStorage` that
  will never be used, as well as the `uniqueCounter` pointer that points to their
  parent translator's storage
4. Incrementing the `uniqueCounter` happens via the pointer, which will always
  point to the parent's `uniqueCounter` value

Co-authored-by: Alexander Momchilov <[email protected]>
Add prism regression test for singleton method defs

Add more newly-passing tests to Prism corpus

Add `desugar/defs_not_self` to Prism corpus tests
Handle forwarding super node with block argument

Correctly handle calls to `super` with block

Change argument name to `blockArgumentNode`

Refactor `translateArguments` to remove unused parameter

And make block parameter optional

Refactor index operator writes to use refactored arguments logic

Refactor `PM_CALL_NODE` case to use refactored arguments logic

Refactor `PM_INDEX_TARGET_NODE` case to use refactored arguments logic

Refactor `PM_SUPER_NODE` case to use refactored arguments logic

Refactor `translateArguments` to handle block argument as well

Add a `super` test case to `def_block_param` regression test

Pass block arguments to super calls

Add case to `def_singleton` Prism regression test

Testing the translation of a call to super that passes a block argument.
When the splat arg's expression is an `Arg`, we need to translate it to a `Restarg`
instead of a `SplatLhs`.
1. We don't need to translate implicit rest nodes in array patterns
2. We need to return an `ArrayPatternWithTail` instead of an `ArrayPattern`
   when the pattern ends with an implicit rest node
Currently, corpus tests may fail with Prism for a few reasons:

1. The error checking fails with Prism. This is expected.
2. There are tree mismatches.

And sometimes, a test fail because of both 1 and 2.

This means that by skipping the test altogether, we are potentially skipping
legitimately failing tests.

This commit updates the test runner to skip the error message checking when
running corpus tests with Prism and updates the skipped tests in the BUILD file.

So now if a test is skipped, it's only because of the second reason. And
we should minimize the number of tests that are skipped in that case.

Co-authored-by: Emily Samp <[email protected]>
Rather than specifying whether to *skip* the logic, we should
say whether to replace the constant node with a dynamic const assignment.

Then, remove unnecessary type check and move this logic to
the top of the method so we short-circuit any unnecessary
code in that case.

Run two more tests that are only failing because of error differences

Remove `desugar/regex` from failing tests

Remove `desugar/range` from excluded tests

Co-Authored-By: Stan Lo <[email protected]>
st0012 and others added 17 commits January 21, 2025 13:00
1. It fixes the case where a hash pattern doesn't have a value, which will
   generate a `PM_IMPLICIT_NODE` that we currently don't handle.
2. It fixes the translation of hash patterns with a named splat node, like
   `in **o`.
Run `rewriter/def_delegator` test as part of prism corpus

Remove `rewriter/rails/delegate` test from skipped rewriter tests

Refactor `translateHash` into `translateKeyValuePairs`

The logic of `translateHash` was kind of confusing because it was handling
both vanilla hashes and also assocs/assoc splat nodes. By isolating one
part of that method, which generates key/value pairs, and then calling
that method in both cases, the two code paths become cleaner and more
obvious.
When the method call is `[]`, the message location should be just the begin
location of `[]`.
Add `parser/kwnil_errors` to skipped tests with a note

Run `local_vars/z_super_args_splats_blocks` test in Prism corpus
And exclude failing tests
Add `namer/arguments` test to Prism corpus

Run two more namer tests that now pass
In cases like `foo.()`, the message would be `:call`, but the message location
would have null start and end. In this case, we need to fall back to use
the call operator location.

Include LSP tests in Prism Corpus tests
This is not covered in parse tree location tests, but reflected in symbol
table tests: when we write `FOO = 1` (or other constant write nodes), the
location of `FOO` should just be the name, not the entire assignment.

In those cases, we should use `name_loc` instead of `base.location`.
When the symbol is used as `a: 1`, the location should not include the colon.

Run Prism Corpus against all Sorbet component tests
Use improved libprism source to build Prism without dependency on Ruby

The main blocker for upstreaming Prism parser to Sorbet is that we currently
rely on Ruby to build Prism, as its required by Prism's `rake template` task.

But by package the files generated by `rake template` in Prim's release,
we will be able to remove the dependency on Ruby while significantly simplify
the build configurations for Prism.

Use another mock source
When ENFORCE macros fail, they use fmt (via spdlog) to format error messages.
Add a formatter specialization for pm_node_type to properly display Prism node
types in these error messages. Instead of showing numeric values, the formatter
uses pm_node_type_to_str to display human-readable node type names.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants