yawning_titan.envs.specific.nsa_node_def#
A node-based network environment that can be configured in multiple different ways.
- Currently supports:
The 18 node network from the research paper.
A network creator that allows you to use multiple topologies and change the connectivity of the network.
- Red agent actions (an illustrative sketch of a red turn follows this list):
- Spread:
Tries to spread to each node connected to a compromised node.
- Randomly infect:
Tries to randomly infect every currently un-compromised node.
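As a rough illustration of the two red actions described above, a single red turn could be pictured as follows. This is not the library's implementation; the function name, graph representation, and parameter defaults are assumptions drawn from the configurable parameters listed below.

```python
import random

def red_agent_turn(graph, compromised, chance_to_spread=0.01,
                   chance_to_randomly_compromise=0.15,
                   spread_vs_random_intrusion=0.5):
    """Illustrative sketch of one hard-coded red turn (not the library's code).

    graph is a dict mapping each node to the set of nodes it connects to;
    compromised is the set of currently compromised nodes.
    """
    if random.random() < spread_vs_random_intrusion:
        # Spread: try to infect each node connected to a compromised node.
        for node in list(compromised):
            for neighbour in graph[node]:
                if neighbour not in compromised and random.random() < chance_to_spread:
                    compromised.add(neighbour)
    else:
        # Randomly infect: try every currently un-compromised node.
        for node in graph:
            if node not in compromised and random.random() < chance_to_randomly_compromise:
                compromised.add(node)
    return compromised
```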
- Configurable parameters (a construction example follows this list):
- chance_to_spread
This is the chance for the red agent to spread between nodes.
- chance_to_spread_during_patch
This is the chance that, when a compromised node is patched, the red agent “escapes” to nearby nodes and compromises them.
- chance_to_randomly_compromise
This is the chance that the red agent randomly infects an un-compromised node.
- cost_of_isolate
The cost (negative reward) associated with performing the isolate action (initially set to 10 based on data from the paper).
- cost_of_patch
The cost (negative reward) associated with performing the patch action (initially set to 5 based on data from the paper).
- cost_of_nothing
The cost (negative reward) associated with performing the do nothing action (initially set to 0 based on data from the paper).
- end
The number of steps that the blue agent must survive in order to win.
- spread_vs_random_intrusion
The chance that the red agent will choose the spread action on its turn as opposed to the random intrusion action.
- punish_for_isolate
Either True or False. If True, the agent is punished each step based on the number of isolated nodes.
- reward_method
- Either 0, 1 or 2. Each constitutes a different method of rewarding the agent:
0 is the paper's reward system.
1 is my reward system, which rewards based on the number of un-compromised nodes.
2 is the minimal reward system. The agent gets 1 for a win or -1 for a loss.
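As an example of how these parameters fit together, the environment can be constructed directly with keyword arguments. The values below simply mirror the defaults shown in the class signature in the Classes section; any of them can be changed, and network is left at its default of None here.

```python
from yawning_titan.envs.specific.nsa_node_def import NodeEnv

# Example values only: every keyword mirrors the defaults from the class
# signature below and can be changed to reconfigure the environment.
env = NodeEnv(
    chance_to_spread=0.01,
    chance_to_spread_during_patch=0.01,
    chance_to_randomly_compromise=0.15,
    cost_of_isolate=10,
    cost_of_patch=5,
    cost_of_nothing=0,
    end=1000,
    spread_vs_random_intrusion=0.5,
    punish_for_isolate=False,
    reward_method=1,
    network=None,
)
```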
Classes
NodeEnv: Class that creates an environment similar to that presented in Ridley 17 (Ref above).
- class yawning_titan.envs.specific.nsa_node_def.NodeEnv(chance_to_spread=0.01, chance_to_spread_during_patch=0.01, chance_to_randomly_compromise=0.15, cost_of_isolate=10, cost_of_patch=5, cost_of_nothing=0, end=1000, spread_vs_random_intrusion=0.5, punish_for_isolate=False, reward_method=1, network=None)[source]#
Class that creates an environment similar to that presented in Ridley 17 (Ref above).
- __init__(chance_to_spread=0.01, chance_to_spread_during_patch=0.01, chance_to_randomly_compromise=0.15, cost_of_isolate=10, cost_of_patch=5, cost_of_nothing=0, end=1000, spread_vs_random_intrusion=0.5, punish_for_isolate=False, reward_method=1, network=None)[source]#
- reset()[source]#
Reset the environment to the default state.
- Returns:
A new starting observation (numpy array)
- step(action)[source]#
Take one timestep within the environment.
Execute the actions for both the Blue RL agent and the hard-coded Red agent.
- Parameters:
action – The action value generated from the Blue RL agent (int)
- Returns:
The next environment observation (numpy array)
reward: The reward value for that timestep (int)
done: Whether the episode is done (bool)
info: A dictionary containing info about the current state
- Return type:
observation
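A short usage sketch of the reset/step loop documented above. It assumes the environment exposes the standard Gym interface (in particular an action_space to sample from); in a real run the action would come from the Blue RL agent rather than being sampled at random.

```python
from yawning_titan.envs.specific.nsa_node_def import NodeEnv

env = NodeEnv()                    # default configuration
observation = env.reset()          # new starting observation (numpy array)

done = False
total_reward = 0
while not done:
    # Random action as a stand-in for the Blue RL agent's choice;
    # action_space is assumed to follow the standard Gym interface.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    total_reward += reward

print(f"Episode finished with total reward {total_reward}")
```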