Correction of benchmark results #87
@kvas7andy Thanks for filing this issue with a detailed explanation. Could we split this into three separate issues to facilitate the discussion?
Hi @blumu, sure, let's split it into three. The only thing is that I will get back to the discussion tomorrow.
Choice 1: Gym-register new environments, with the version-0 ones preserving the same AttackerGoal as before.
@kvas7andy Is your commit above addressing all three problems mentioned in this issue, or just some of them? (By the way, if you could split them into separate bugs, that would be helpful.) Many thanks!
I moved …
Hi everyone,
I found several bugs while checking the code of the ipynb notebooks with benchmark results for the 3 environments: TinyToy, ToyCTF, and Chain.
I think my findings might be useful for the community that uses this nice implementation of cyberattack simulation.
The attacker-goal field `own_atleast_percent: float = 1.0` is included as an AND condition for raising the flag `done = True`. Since this requires owning 100% of the nodes before the episode terminates, for TinyToy and ToyCTF (but not Chain) it leads to long training durations, a wrong RL signal for evaluating the Q function, and low sample efficiency.
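To illustrate the problem, here is a minimal sketch of how an AND-composed goal test behaves; the field names follow this issue, while the class layout and helper function are my own simplifications for illustration, not the actual CyberBattleSim source:

```python
from dataclasses import dataclass

@dataclass
class AttackerGoal:
    # Simplified stand-in for CyberBattleSim's attacker goal;
    # only the fields discussed in this issue are kept.
    reward: float = 0.0
    own_atleast: int = 0
    own_atleast_percent: float = 1.0  # 1.0 => must own 100% of the nodes

def goal_reached(goal: AttackerGoal, cumulative_reward: float,
                 owned_count: int, node_count: int) -> bool:
    # All sub-goals are AND-ed together: one hard-to-satisfy sub-goal
    # (owning every node) keeps `done` False for the whole episode,
    # even after the 6-node CTF solution has already been captured.
    return (cumulative_reward >= goal.reward
            and owned_count >= goal.own_atleast
            and owned_count / node_count >= goal.own_atleast_percent)
```

With `own_atleast_percent=1.0`, `goal_reached(goal, r, 6, 10)` stays `False` even after the CTF flags are owned, so the agent keeps collecting uninformative transitions until the step limit.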
This means keeping `own_atleast_percent: 1.0` in the initialization of the "v0" versions of the toyctf and tinytoy environments, and creating new envs 'CyberBattleTiny-v1' and 'CyberBattleToyCTF-v1' with defaults `own_atleast_percent=0` and `own_atleast=6` (see the registration sketch below). This is reasonable because the CTF solution includes only 6 nodes to be owned, and with correct reward engineering, training stops at the attack that owns those 6 nodes with the highest reward.

Figure 6: 1500 max iterations during training of 20 episodes, before the PR
Figure 7: training on both 20 and 200 episodes; either use more RL techniques or learn for more episodes
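As a concrete sketch of the proposed Choice 1, the new "-v1" ids could be Gym-registered alongside the untouched "v0" entries. The entry points and the `AttackerGoal` import path below are assumptions for illustration, not the exact CyberBattleSim registration code:

```python
from gym.envs.registration import register
# Import path is an assumption; AttackerGoal lives somewhere under cyberbattle._env.
from cyberbattle._env.cyberbattle_env import AttackerGoal

for env_id, entry_point in [
    ('CyberBattleTiny-v1', 'cyberbattle._env.cyberbattle_tiny:CyberBattleTiny'),        # assumed
    ('CyberBattleToyCTF-v1', 'cyberbattle._env.cyberbattle_toyctf:CyberBattleToyCtf'),  # assumed
]:
    register(
        id=env_id,
        entry_point=entry_point,
        # New defaults: terminate as soon as the 6 nodes of the CTF solution
        # are owned, instead of requiring 100% node ownership.
        kwargs={'attacker_goal': AttackerGoal(own_atleast_percent=0.0,
                                              own_atleast=6)},
    )
```

After registration, `gym.make('CyberBattleToyCTF-v1')` would pick up the new goal, while `gym.make('CyberBattleToyCTF-v0')` keeps the old behavior.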