Permute improvements #154

Open
ethteck opened this issue Jan 30, 2023 · 2 comments · May be fixed by #158

Comments

@ethteck
Contributor

ethteck commented Jan 30, 2023

The permuter should be able to work off of improvements it has already found, deepening the search space. Ideally this would work when running multithreaded.

Each Permuter would have an immutable starting point, and main.py or something else would create new Permuters when improvements to a base are found. We also need to worry about caching results between Permuters to avoid storing the same improvements multiple times. (simon to fill in details)
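A minimal sketch of that loop, assuming hypothetical names (search(), try_next_candidate(), base_score, and the constructor arguments) rather than the permuter's real API:

```python
from typing import List, Set


def search(permuters: List["Permuter"]) -> None:
    # Shared across all Permuters so the same improvement is never stored twice.
    seen_hashes: Set[str] = set()
    while True:
        for permuter in list(permuters):
            cand = permuter.try_next_candidate()  # hypothetical
            if cand is None:
                continue
            if cand.score < permuter.base_score and cand.hash not in seen_hashes:
                seen_hashes.add(cand.hash)
                # Deepen the search: spawn a new Permuter whose (immutable)
                # starting point is the improved source.
                permuters.append(Permuter(source=cand.source, hashes=seen_hashes))
```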

One pitfall of this approach is that it's sure to generate tons of noisy mutations to the original code, especially with each new layer of findings. To address that, we should try to eliminate meaningless code mutations (#155).

@ethteck
Contributor Author

ethteck commented Jan 30, 2023

expanded version of #32

@simonlindholm
Owner

> We also need to worry about caching results between Permuters to avoid storing the same improvements multiple times. (simon to fill in details)

Basically, Permuter.hashes for all the Permuters needs to point to the same set. I think we may get away with just making the Permuter constructor accept hashes as an argument.
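Something like the following shape; the constructor signature is an assumption, and the only point is the aliasing:

```python
from typing import Set


class Permuter:
    def __init__(self, source: str, hashes: Set[str]) -> None:
        self.source = source
        # No copy here: all instances mutate the same set object, so a hash
        # recorded by one Permuter is deduplicated in all of them.
        self.hashes = hashes


shared_hashes: Set[str] = set()
permuters = [Permuter(src, shared_hashes) for src in ("dir_a/base.c", "dir_b/base.c")]
```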

There are some subtleties around threading here: the parent keeps one Permuter per directory you pass on the command line, and each worker keeps a copy of all the Permuters. For each candidate result a Permuter finds, it checks should_output() to see whether it should be output, and if so runs record_result(), which puts it in the hash set so it won't be found again in the future (unless score = 0, since multiple score = 0 matches are fine). This is done in each worker as a prefilter, and again in the parent process if the worker's filter matches. The prefilter is an optimization that avoids serializing the AST into C and sending it across process (or network) boundaries for every candidate.

I don't think this is actually something you need to be aware of: if all Permuter.hashes point to the same set, the pickling done when starting the multiprocessing workers will ensure the same thing happens in the workers too.
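For illustration only, a loose sketch of that two-stage filter; the queue plumbing and try_next_candidate() are placeholders, not the actual scheduler code:

```python
import multiprocessing as mp


def worker_loop(permuter, out_queue: "mp.Queue") -> None:
    while True:
        cand = permuter.try_next_candidate()  # hypothetical
        if cand is None:
            continue
        # Prefilter in the worker: only candidates that pass should_output()
        # get serialized back to C and sent across the process/network boundary.
        if permuter.should_output(cand):
            out_queue.put(cand)


def parent_loop(permuter, in_queue: "mp.Queue") -> None:
    while True:
        cand = in_queue.get()
        # The worker's copy of the hash set may be stale, so the parent
        # re-checks and performs the authoritative record_result().
        if permuter.should_output(cand):
            permuter.record_result(cand)  # records the hash unless score == 0
            print(f"improvement found with score {cand.score}")
```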

I see a few possible ways forward for this:

  1. Have Permuter grow a field of promising starting points, and remember which point the current iteration started from, so it knows what score counts as good.
  2. Create a new wrapper class that takes Permuter's place, keeping a list of old-style Permuters as starting points and forwarding calls to them.
  3. Use a separate Permuter for each starting point, with main.py managing creation and deletion of Permuters on the fly.
  4. Give up on a wholly automatic solution and pick out the starting points at startup instead, managing them as separate Permuters with shared hashes.

4 feels clearly easiest. The UI is a bit unclear, but we could perhaps add a flag meaning "in addition to the starting point, also try to start from the top N entries", or maybe some way of manually picking out which ones look reasonable to start from.

3 is pretty terrible, especially with how messy the scheduler code in main.py is (some of which is permuter@home's fault).

1 and 2 are about the same. If you wanted to sync between workers, it would still need changes to the scheduler code, but the changes would be smaller. (It would need some p@h-specific changes though, which 3 does not.) You'd probably add a message like "found new good starting point: source X with score/hash Y". 2 is probably slightly nicer than 1, but on reflection the difference doesn't feel significant.
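For what option 2 might look like, a very rough sketch with invented names; it also folds in option 1's idea of remembering which base the current iteration started from:

```python
import random
from typing import List, Optional, Tuple


class MultiStartPermuter:
    def __init__(self, starts: List["Permuter"]) -> None:
        # Each wrapped Permuter keeps its own immutable starting point.
        self.starts = starts

    def try_next_candidate(self) -> Optional[Tuple["Permuter", object]]:
        # Return the base alongside the candidate, so "is this an improvement?"
        # is judged against that base's score rather than the global best.
        base = random.choice(self.starts)
        cand = base.try_next_candidate()  # hypothetical
        return None if cand is None else (base, cand)

    def add_start(self, permuter: "Permuter") -> None:
        # Invoked when a "found new good starting point: source X with
        # score/hash Y" message arrives.
        self.starts.append(permuter)
```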

@ethteck ethteck linked a pull request Mar 23, 2023 that will close this issue