Permute improvements #154

Open
ethteck opened this issue Jan 30, 2023 · 2 comments · May be fixed by #158

Comments

@ethteck
Contributor

ethteck commented Jan 30, 2023

The permuter should be able to work off of improvements it has already found, deepening the search space. Ideally this would work when running multithreaded.

Each Permuter would have an immutable starting point, and main.py or something else would create new Permuters when improvements to a base are found. We also need to worry about caching results between Permuters to avoid storing the same improvements multiple times. (simon to fill in details)
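A minimal sketch of that loop, assuming hypothetical names (search(), try_next_candidate(), base_score, and the constructor arguments) rather than the permuter's real API:

```python
from typing import List, Set


def search(permuters: List["Permuter"]) -> None:
    # Shared across all Permuters so the same improvement is never stored twice.
    seen_hashes: Set[str] = set()
    while True:
        for permuter in list(permuters):
            cand = permuter.try_next_candidate()  # hypothetical
            if cand is None:
                continue
            if cand.score < permuter.base_score and cand.hash not in seen_hashes:
                seen_hashes.add(cand.hash)
                # Deepen the search: spawn a new Permuter whose (immutable)
                # starting point is the improved source.
                permuters.append(Permuter(source=cand.source, hashes=seen_hashes))
```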

One pitfall of this approach is that it's sure to generate tons of noisy mutations to the original code, especially with each new layer of findings. To address that, we should try to eliminate meaningless code mutations (#155).

@ethteck
Contributor Author

ethteck commented Jan 30, 2023

expanded version of #32

@simonlindholm
Owner

> We also need to worry about caching results between Permuters to avoid storing the same improvements multiple times. (simon to fill in details)

Basically, Permuter.hashes for all the Permuters needs to point to the same set. I think we may get away with just making the Permuter constructor accept hashes as an argument.
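Something like the following shape; the constructor signature is an assumption, and the only point is the aliasing:

```python
from typing import Set


class Permuter:
    def __init__(self, source: str, hashes: Set[str]) -> None:
        self.source = source
        # No copy here: all instances mutate the same set object, so a hash
        # recorded by one Permuter is deduplicated in all of them.
        self.hashes = hashes


shared_hashes: Set[str] = set()
permuters = [Permuter(src, shared_hashes) for src in ("dir_a/base.c", "dir_b/base.c")]
```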

There are some subtleties around threading here: the parent keeps one Permuter per directory you pass on the command line, and each worker keeps a copy of all the Permuters. For each candidate result a Permuter finds, it checks should_output() to see whether it should be output, and if so runs record_result(), which puts it in the hash set so it won't be found again in the future (unless score = 0, since multiple score = 0 matches are fine). This is done in each worker as a prefilter, and again in the parent process if the worker's filter matches. The prefilter is an optimization that avoids serializing the AST into C and sending it across process (or network) boundaries for every candidate.

I don't think this is actually something you need to be aware of: if all Permuter.hashes point to the same set, the pickling done when starting the multiprocessing workers will ensure the same thing happens in the workers too.
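For illustration only, a loose sketch of that two-stage filter; the queue plumbing and try_next_candidate() are placeholders, not the actual scheduler code:

```python
import multiprocessing as mp


def worker_loop(permuter, out_queue: "mp.Queue") -> None:
    while True:
        cand = permuter.try_next_candidate()  # hypothetical
        if cand is None:
            continue
        # Prefilter in the worker: only candidates that pass should_output()
        # get serialized back to C and sent across the process/network boundary.
        if permuter.should_output(cand):
            out_queue.put(cand)


def parent_loop(permuter, in_queue: "mp.Queue") -> None:
    while True:
        cand = in_queue.get()
        # The worker's copy of the hash set may be stale, so the parent
        # re-checks and performs the authoritative record_result().
        if permuter.should_output(cand):
            permuter.record_result(cand)  # records the hash unless score == 0
            print(f"improvement found with score {cand.score}")
```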

I see a few possible ways forward for this:

  1. Have Permuter grow a field of promising starting points, and remember which point the current iteration started from, so it knows what score counts as good.
  2. Create a new wrapper class that takes Permuter's place, keeping a list of old-style Permuters as starting points and forwarding calls to them.
  3. Use a separate Permuter for each starting point, with main.py managing creation and deletion of Permuters on the fly.
  4. Give up on a wholly automatic solution and pick out the starting points at startup instead, managing them as separate Permuters with shared hashes.

4 feels clearly easiest. The UI is a bit unclear, but we could perhaps add a flag meaning "in addition to the starting point, also try to start from the top N entries", or maybe some way of manually picking out which ones look reasonable to start from.

3 is pretty terrible, especially with how messy the scheduler code in main.py is (some of which is permuter@home's fault).

1 and 2 are about the same. If you wanted to sync between workers, it would still need changes to the scheduler code, but the changes would be smaller. (It would need some p@h-specific changes though, which 3 does not.) You'd probably add a message like "found new good starting point: source X with score/hash Y". 2 is probably slightly nicer than 1, but on reflection the difference doesn't feel significant.
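For what option 2 might look like, a very rough sketch with invented names; it also folds in option 1's idea of remembering which base the current iteration started from:

```python
import random
from typing import List, Optional, Tuple


class MultiStartPermuter:
    def __init__(self, starts: List["Permuter"]) -> None:
        # Each wrapped Permuter keeps its own immutable starting point.
        self.starts = starts

    def try_next_candidate(self) -> Optional[Tuple["Permuter", object]]:
        # Return the base alongside the candidate, so "is this an improvement?"
        # is judged against that base's score rather than the global best.
        base = random.choice(self.starts)
        cand = base.try_next_candidate()  # hypothetical
        return None if cand is None else (base, cand)

    def add_start(self, permuter: "Permuter") -> None:
        # Invoked when a "found new good starting point: source X with
        # score/hash Y" message arrives.
        self.starts.append(permuter)
```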

@ethteck ethteck linked a pull request Mar 23, 2023 that will close this issue