PNP rule #1

Closed

wants to merge 2 commits into from

Conversation

@danjuv danjuv (Owner) commented Nov 21, 2019

This PR attempts to add PnP support to the yarn_install rule in bazel. PnP is designed to replace the node_modules folder with a lookup table pointing directly to the yarn cache.

The core idea behind the design is to emulate the way PnP resolving works by providing the integrity checksum as input to bazel instead of individual files. This means we save the time normally spent symlinking node_modules into the sandbox/repository directories, since we only need to pass around a single .pnp.js file (the file containing the lookup table).

The main drawback of this design is that the pnp file references the yarn cache, which isn't sandboxed by bazel (and we don't want it to be, since sandboxing it would be more time consuming than fine-grained node_modules). While bazel cannot guarantee the hermeticity of the yarn cache, determinism is a core feature of yarn, so the rules as a whole will still be deterministic.
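
For illustration (not code from this PR), here's a rough sketch of how resolution goes through the generated .pnp.js instead of a node_modules tree; the calls shown are Yarn 1's PnP runtime API and the paths are made up:

// load and activate the generated lookup table
const pnp = require("/path/to/workspace/.pnp.js");
pnp.setup(); // installs the PnP resolver into Node's module machinery

// The table maps each package name and reference to its location in the
// yarn cache, so nothing needs to be symlinked into the sandbox.
const resolved = pnp.resolveRequest("lodash", "/path/to/workspace/");
console.log(resolved); // absolute path into the yarn cache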

@uri-canva uri-canva left a comment
@joscha can you review the typescript code to make sure it's idiomatic?
@toby does this make sense to you?

@@ -239,6 +239,7 @@ def _nodejs_binary_impl(ctx):
"TEMPLATED_repository_args": _to_manifest_path(ctx, ctx.file._repository_args),
"TEMPLATED_script_path": _to_execroot_path(ctx, ctx.file.entry_point),
"TEMPLATED_vendored_node": "" if is_builtin else strip_external(ctx.file._node.path),
"TEMPLATED_pnp_file": _to_manifest_path(ctx, ctx.file.pnp_file),


This doesn't need to be its own substitution: if we use this we don't use TEMPLATED_bazel_require_script, so you can replace that with this when pnp_file is set.

Owner Author

I think we'd need to keep it if we're going to stay backwards compatible though.

Owner Author

I think I'd prefer it to have its own substitution; it makes it easier in the launcher script to check whether we are in PnP mode or not. WDYT?

@@ -207,7 +206,7 @@ if [ "${EXPECTED_EXIT_CODE}" -eq "0" ]; then
# handled by the node process.
# If we had merely forked a child process here, we'd be responsible
# for forwarding those OS interactions.
exec "${node}" "${NODE_OPTIONS[@]}" "${MAIN}" "${ARGS[@]}"
exec "${node}" "${ARGS[@]}" "${NODE_OPTIONS[@]}" "$(pwd)/TEMPLATED_script_path"


Why the change from MAIN to $(pwd)/TEMPLATED_script_path? Does it not work with the relative path? If so, why not change the value of MAIN?

Owner Author

Because MAIN is a node script that patches the resolver to fix the resolve order for node_modules in some cases (like Angular). Since we don't have node_modules anymore, I replaced that with just invoking the entry script directly.

We can change the value of MAIN though; that will also work.


filegroup(
name = "${pkg}",
srcs = []


Add a comment here explaining that the srcs are intentionally empty, otherwise it would look like a mistake.

const contents = `package(default_visibility = ["//visibility:public"])
exports_files([
".pnp.js",
"package.json",


4 space indent in BUILD.bazel files.

load("@build_bazel_rules_nodejs//internal/npm_install:node_module_library.bzl", "node_module_library")

filegroup(
name = "${pkg}",


4 space indent in BUILD.bazel files.

): Promise<Set<string>> {
const flattenedDeps: Set<string> = new Set();
deps.forEach((key: string) => {
// loop over transitive dependencies here


Remove this comment, it's not relevant anymore.


flattenedDeps.forEach(createBuildFile);

processDependencies(devDependencies, "devDependencies");


We should create build files for dev dependencies too. Are there also test dependencies or other dependency types we should handle?


def _create_build_files(repository_ctx, rule_type, node, lock_file):
]


This is empty? Is this intentional?


const pnpPath = path.join(workspacePath, ".pnp.js")
const pnpFile = fs.readFileSync(pnpPath, "utf8");
const patchedPnpFile = pnpFile.replace("issuerModule ? issuerModule.filename : `${process.cwd()}/`;", `"${process.cwd()}"`);


Add a comment explaining the intended replacement here. Otherwise it's very confusing.

@uri-canva uri-canva left a comment

Add an explanation of the design of the change in the PR description. Copy that in the commit message too, so it will show up in the patch when you create it with git format-patch.

"1" if error_on_build_files else "0",
repository_ctx.path(lock_file),
",".join(repository_ctx.attr.included_files),
])


This should be changed to execute the new generator when using pnp, but still run the old one when not.

repository_ctx.path(repository_ctx.attr.package_json.package + "/package.json"),
])
if result.return_code:
fail("deleting postinstall scripts failed: \nSTDOUT:\n%s\nSTDERR:\n%s" % (result.stdout, result.stderr))


Why are we deleting the postinstall scripts? Did this come from the other patch?

Owner Author

Yes, this code came from the yarn_install patch in the canva repo.


args = [
repository_ctx.path(yarn),
"--cwd",
root,
"--network-timeout",
str(repository_ctx.attr.network_timeout * 1000), # in ms
"--modules-folder",
repository_ctx.path("node_modules"),


What is this change for? Did it come from another patch?

@danjuv danjuv (Owner Author) commented Nov 22, 2019

@uri-canva changes in

substitutions = {
"TEMPLATED_args": " ".join([
expand_location_into_runfiles(ctx, a)
for a in templated_args
]),


Spurious change, revert: the smaller the change the easier it will be to apply cleanly when we update the baseline.

@@ -258,15 +259,15 @@ def _nodejs_binary_impl(ctx):
# entry point is only needed in runfiles if it is a .js file
if ctx.file.entry_point.extension == "js":
runfiles = depset([ctx.file.entry_point], transitive = [runfiles])

run_files = [ctx.outputs.loader]


In this context there's now a variable called runfiles and one called run_files. That is too confusing.

files = node_tool_files + [
ctx.outputs.loader,
] + ctx.files._source_map_support_files +
files = node_tool_files + run_files + ctx.files._source_map_support_files +


Why are node_tool_files, node_modules and sources added to both files and transitive_files? Does pnp_file also need to be added to both? Or neither?

Owner Author

Not sure, I tried to change as little as possible; even if the node_modules lists are passed around, they're going to be empty.

ALL_ARGS=(TEMPLATED_args $NODE_REPOSITORY_ARGS "$@")
for ARG in "${ALL_ARGS[@]}"; do
case "$ARG" in
--bazel_node_modules_manifest=*) MODULES_MANIFEST="${ARG#--bazel_node_modules_manifest=}" ;;
--nobazel_patch_module_resolver)
MAIN="TEMPLATED_script_path"
NODE_OPTIONS+=( "--require" "$bazel_require_script" )


Restore this to stay backwards compatible.

MAIN="TEMPLATED_script_path"
NODE_OPTIONS=("--require" "${pnp_file}")
fi

ALL_ARGS=(TEMPLATED_args $NODE_REPOSITORY_ARGS "$@")
for ARG in "${ALL_ARGS[@]}"; do
case "$ARG" in
--bazel_node_modules_manifest=*) MODULES_MANIFEST="${ARG#--bazel_node_modules_manifest=}" ;;
--nobazel_patch_module_resolver)


I see this is conditional on passing --nobazel_patch_module_resolver from https://github.com/bazelbuild/rules_nodejs/blob/5b4e4a0fa403af0aa201ee94e00e9807daee6dbd/internal/providers/node_runtime_deps_info.bzl#L62. Can our code do the same, rather than looking for pnp_file?

"JAVA_HOME",
"LDFLAGS",
"PATH",
]


Rather than including the changes in the patches we already use, create a new branch with them, and rebase this branch on that branch, so the PR only shows changes in the current patch, not all patches together.

Owner Author

I had a go at this; it seems to introduce so many merge conflicts that I don't think it's worthwhile.


This will be necessary once we apply the patches in the repo anyway: we don't want a single patch that contains all the changes we made to upstream, we want a separate patch for each logical change, so we know when we can stop applying each one because upstream has merged our PR, or because we don't need it anymore.

Owner Author

Good point, I didn't think about it that way, I'll split this out.


repository_ctx.template(
repository_ctx.template(


Spurious change, revert (it might require editing the file in an editor that doesn't strip trailing whitespace from lines).

@@ -267,6 +309,7 @@ cd "{root}" && "{npm}" {npm_args}
)
if result.return_code:
fail("pre_process_package_json.js failed: \nSTDOUT:\n%s\nSTDERR:\n%s" % (result.stdout, result.stderr))
is_pnp = result.stdout.strip()


It looks like there is a json parsing library for starlark: bazelbuild/bazel#3732 (comment)

Please use that instead of this, this is too hacky.

writeFileSync(`BUILD.bazel`, contents);
}

main();

inject process.argv here:

Suggested change
main();
main(process.argv[3]);

so that the file can be required without side effects or potential errors.


or ideally call main from a CLI entry file.
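
For example, a minimal sketch of that split (file names here are hypothetical):

// generate.ts: exports main and has no side effects when required
export async function main(workspacePath: string): Promise<void> {
  // read `${workspacePath}/package.json` and emit the BUILD.bazel files here
}

// cli.ts: the only place that reads process.argv
import { main } from "./generate";

// the argv index depends on how the generator is invoked; the existing code
// uses process.argv.slice(2)[0]
main(process.argv[2]).catch(err => {
  console.error(err);
  process.exit(1);
});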


const workspacePath = process.argv.slice(2)[0];

function mkdirp(p: string) {

Node 10.12 has fs.mkdir(..., { recursive: true }) - do we need to support older versions?

writeFileSync(`${pkg}/BUILD.bazel`, contents);
}

async function main() {

Suggested change
async function main() {
async function main(workspacePath: string) {

}
}

function writeFileSync(p: string, content: string) {

OOI: do all these need to be synchronous?

Also nit: (p: string, content: string) is a much narrower signature than the original writeFileSync on fs. You could either inline the mkdirp calls (there are only two invocations anyway) or make your own writeFileSync a typeof fs.writeFileSync to align the signatures. I'd probably do the former.

Owner Author

Not necessarily; I didn't particularly want to deal with callbacks, but I see now that fs has a promises property in node@^10, so it might be better to just use that.

async function main() {
const packageJson = require(path.join(workspacePath, "package.json"));
const dependencies = Object.keys(packageJson.dependencies || {});
const devDependencies = Object.keys(packageJson.dependencies || {});

Suggested change
const devDependencies = Object.keys(packageJson.dependencies || {});
const devDependencies = Object.keys(packageJson.devDependencies || {});

also: what about peer and optional dependencies?

* not the entry_point's directory. PnP assumes they are the same so when bazel creates the
* pnp file in a different directory, the pnp files can't resolve the paths.
*/
function main() {

Suggested change
function main() {
function main(workspacePath: string) {

const pnpFile = fs.readFileSync(pnpPath, "utf8");
const patchedPnpFile = pnpFile.replace(
"issuerModule ? issuerModule.filename : `${process.cwd()}/`;",
`"${process.cwd()}"`

Suggested change
`"${process.cwd()}"`
JSON.stringify(process.cwd())


const pnpPath = path.join(workspacePath, ".pnp.js");
const pnpFile = fs.readFileSync(pnpPath, "utf8");
const patchedPnpFile = pnpFile.replace(

can you add some docs here with an example of what this is supposed to match? Also, do we expect to only replace the first match?

Owner Author

We expect there to only ever be one match; this is a hack to force the path we look for PnP modules in to be the same as where yarn_install is invoked. If we don't do this, the pnp resolver defaults to the path the nodejs binary is invoked from, which for bazel is different.

I suppose it's not necessarily only supposed to replace once, so we should do a global replace to be safe.
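
A rough sketch of what this step could look like with the explanatory comment and a global replace (the matched literal comes from the generated .pnp.js and may change between yarn versions, so treat this as a best-effort sketch rather than the final code):

import * as fs from "fs";
import * as path from "path";

function patchPnpFile(workspacePath: string): string {
  const pnpPath = path.join(workspacePath, ".pnp.js");
  const pnpFile = fs.readFileSync(pnpPath, "utf8");

  // .pnp.js normally derives the resolution root from the requiring module,
  // falling back to the node process's working directory. Under bazel, node
  // runs from a different directory than the one yarn_install was invoked in,
  // so pin the root to this generator's working directory instead.
  const needle = "issuerModule ? issuerModule.filename : `${process.cwd()}/`;";

  // split/join replaces every occurrence and avoids having to escape the
  // regex-special characters in the needle
  return pnpFile.split(needle).join(JSON.stringify(process.cwd()));
}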

@@ -53,9 +53,13 @@ function main() {
// Note: there is no equivalent npm functionality to clean out individual packages
// from the npm cache.
clearYarnFilePathCaches(pkg);
console.log(isPnp(pkg));

Suggested change
console.log(isPnp(pkg));

leftover?

Owner Author

This script is called by bazel as a way to tell the rule whether the workspace is a pnp workspace; bazel can only read from stdout, so we need to log this to the console.

const packageJson = require(path.join(workspacePath, "package.json"));
const dependencies = Object.keys(packageJson.dependencies || {});
const devDependencies = Object.keys(packageJson.dependencies || {});
dependencies.concat(devDependencies).forEach(createBuildFile);

Cheap enough, but there is probably no advantage to creating a new combined array and then iterating over it.

Object.keys(packageJson.dependencies || {}).forEach(createBuildFile);
Object.keys(packageJson.devDependencies || {}).forEach(createBuildFile);

will be as expressive but half the LOC.

or if you're keen:

[
  'dependencies',
  'devDependencies',
  'peerDependencies',
  'optionalDependencies'
]
  .forEach(d => Object.keys(packageJson[d] || {}).forEach(createBuildFile));

Owner Author

second one is nice

if (!existsSync(path.dirname(p))) {
mkdirSync(path.dirname(p), { recursive: true });
}
return fsp.writeFile(p, content);

This will return a Promise object now, which is quite confusing because you use the sync methods above.

I suggest at least changing the signature to:
function writeFile(p: string, content: string): Promise<void> {

or even better:

async function writeFile(p: string, content: string): Promise<void> {
  await fsp.mkdir(path.dirname(p), { recursive: true });
  return await fsp.writeFile(p, content);
}

Owner Author

yeah, this should be renamed


writeFileSync(`${pkg}/BUILD.bazel`, contents);
return writeFileSync(`${packagePath}/BUILD.bazel`, contents);

writeFile is not synchronous anymore, which makes it return a promise and thus also makes createBuildFile async.

Owner Author

This is intentional; I should be dealing with all the async stuff with Promise.all, though I may have missed something.

"peerDependencies",
"optionalDependencies"
].map(d => {
Promise.all(

the value of this is never returned up from the map, so if anything gets rejected it will be unhandled. It also means that your outer Promise.all waits for nothing:

Suggested change
Promise.all(
return Promise.all(
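
As a rough, self-contained sketch of the shape after this change (packageJson and createBuildFile below are stand-ins for the real names in the diff):

// stand-ins for the values read from package.json and the real generator
const packageJson: Record<string, Record<string, string>> = {
  dependencies: { lodash: "^4.17.0" },
  devDependencies: { typescript: "^3.7.0" },
};

async function createBuildFile(pkg: string): Promise<void> {
  // write `${pkg}/BUILD.bazel` here
}

async function generateAll(): Promise<void> {
  await Promise.all(
    ["dependencies", "devDependencies", "peerDependencies", "optionalDependencies"].map(d =>
      // the expression-bodied arrow returns the inner Promise.all, so the
      // outer Promise.all waits for every write and surfaces any rejection
      Promise.all(Object.keys(packageJson[d] || {}).map(createBuildFile))
    )
  );
}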

Promise.all(
Object.keys(packageJson[d] || {})
.map(createBuildFile)
.concat([writeFileSync(`BUILD.bazel`, contents)])

Suggested change
.concat([writeFileSync(`BUILD.bazel`, contents)])
.concat([writeFileSync("BUILD.bazel", contents)])

or

Suggested change
.concat([writeFileSync(`BUILD.bazel`, contents)])
.concat([writeFileSync('BUILD.bazel', contents)])

as you're not using any interpolation.

@@ -53,9 +53,13 @@ function main() {
// Note: there is no equivalent npm functionality to clean out individual packages
// from the npm cache.
clearYarnFilePathCaches(pkg);
console.log(JSON.stringify({pnp: isPnp(pkg)}));

Suggested change
console.log(JSON.stringify({pnp: isPnp(pkg)}));
process.stdout.write(JSON.stringify({pnp: isPnp(pkg)}));

makes it more obvious that this is a deliberate print to stdout.

@@ -0,0 +1,20 @@
Copyright 2019 Erick Johnson <[email protected]>
Owner Author

Third-party json parser; we can't introduce it as an http_archive since you can't patch the workspace file in a repo.


I don't think that fully explains why the vendoring is necessary, but there is an example of this upstream already so I think this is ok:
https://github.com/bazelbuild/rules_nodejs/tree/a13f2b6e2a132954fb63b7a905818cf8a6ceb824/third_party/github.com/bazelbuild/bazel-skylib/rules

@@ -200,7 +200,7 @@ def _nodejs_binary_impl(ctx):
node_tool_files.append(ctx.file._bazel_require_script)

if not ctx.outputs.templated_args_file:
templated_args = ctx.attr.templated_args
templated_args = [a for a in ctx.attr.templated_args]
Owner Author

This string list is frozen, so I created a new list that we can mutate (to add the pnp flag if necessary).

@danjuv danjuv changed the base branch from master to basedpr November 28, 2019 02:41
@@ -200,7 +200,7 @@ def _nodejs_binary_impl(ctx):
node_tool_files.append(ctx.file._bazel_require_script)

if not ctx.outputs.templated_args_file:
templated_args = ctx.attr.templated_args
templated_args = [a for a in ctx.attr.templated_args]


Try using .to_list(), I'm not 100% sure it works, but if it does, it's nicer than an identity list comprehension.

@@ -226,6 +226,10 @@ def _nodejs_binary_impl(ctx):

is_builtin = ctx.attr._node.label.workspace_name in ["nodejs_%s" % p for p in BUILT_IN_NODE_PLATFORMS]

output_files = [ctx.outputs.loader]


Why is this called output_files? Isn't ctx.file.pnp_file an input file?

@@ -277,6 +310,7 @@ cd "{root}" && "{npm}" {npm_args}
)
if result.return_code:
fail("pre_process_package_json.js failed: \nSTDOUT:\n%s\nSTDERR:\n%s" % (result.stdout, result.stderr))
is_pnp = json_parse(result.stdout)["pnp"]


Since you can parse json, why not parse package.json directly, rather than going through the script?

Owner Author

Because Starlark doesn't support file i/o as far as I know


Owner Author

oh nice, makes sense it's exposed on repository rules

@uri-canva uri-canva left a comment

Very close! I think if we can read the package.json from starlark then it's good.

@danjuv danjuv (Owner Author) commented Jan 2, 2020

see https://canvadev.atlassian.net/browse/DEVF-831 for more details

@danjuv danjuv closed this Jan 2, 2020