yawning_titan.envs.generic.core.reward_functions
A collection of reward functions used by the generic network environment.
You can select the reward function that you wish to use in the config file under settings. Each reward function takes a single parameter called args: a dictionary that contains the following information:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of a turn
- end_isolation: The isolation status of all the nodes at the end of a turn
- start_blue: The env as the blue agent can see it before the blue agent's turn
- end_blue: The env as the blue agent can see it after the blue agent's turn
The reward function returns a single number (integer or float) that is the blue agent's reward for that turn.
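For orientation, below is a minimal sketch of a function that satisfies this contract. The key names come from the list above; the scoring rule, and the assumption that the state entries map node names to 0 (safe) or 1 (compromised), are illustrative rather than taken from the library.

```python
# A minimal sketch of a custom reward function for the generic environment.
# Assumption (illustrative, not from the library docs): start_state and
# end_state are dicts mapping node names to 0 (safe) or 1 (compromised).
def my_reward(args: dict) -> float:
    start_state = args["start_state"]  # node states before the blue turn
    end_state = args["end_state"]      # node states after the blue turn

    # Hypothetical rule: -1 for every node newly compromised this turn.
    newly_compromised = sum(
        1
        for node, after in end_state.items()
        if after == 1 and start_state.get(node) == 0
    )
    return -float(newly_compromised)
```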
Functions
- dcbo_cost_func: Calculate the cost function for DCBO using a set of fixed action cost values.
- experimental_rewards: Calculate the reward for the current state of the environment.
- num_nodes_safe: Provide reward based on the proportion of nodes safe within the environment.
- one_per_timestep: Give a reward of 0.1 for every timestep that the blue agent is alive.
- punish_bad_actions: Punish the blue agent for taking bad actions (bad moves).
- safe_nodes_give_rewards: Give 1 reward for every safe node at that timestep.
- standard_rewards: Calculate the reward for the current state of the environment.
- zero_reward: Return zero reward per timestep.
- yawning_titan.envs.generic.core.reward_functions.standard_rewards(args)
Calculate the reward for the current state of the environment.
Actions cost a certain amount, and blue is rewarded for removing red (compromised) nodes and for reducing the vulnerability of nodes.
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
The reward earned by the blue agent for this turn
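As a rough illustration of the behaviour described above (a simplified sketch, not the library's actual implementation), a reward of this shape might combine a flat action cost with bonuses for cleaned nodes and for reduced vulnerability; the cost value and weighting below are hypothetical.

```python
# Illustrative only: a simplified reward in the spirit of standard_rewards.
# ACTION_COST and the equal weighting of the two bonuses are hypothetical.
ACTION_COST = 1.0  # hypothetical flat cost charged for the blue action

def standard_style_reward(args: dict) -> float:
    # Nodes that went from compromised (1) back to safe (0) this turn.
    cleaned = sum(
        1
        for node, before in args["start_state"].items()
        if before == 1 and args["end_state"].get(node) == 0
    )
    # Total drop in vulnerability across all nodes (positive = improvement).
    vulnerability_drop = sum(
        args["start_vulnerabilities"][node] - args["end_vulnerabilities"][node]
        for node in args["start_vulnerabilities"]
    )
    return cleaned + vulnerability_drop - ACTION_COST
```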
- yawning_titan.envs.generic.core.reward_functions.experimental_rewards(args)
Calculate the reward for the current state of the environment.
Actions cost a certain amount, and blue is rewarded for removing red (compromised) nodes and for reducing the vulnerability of nodes.
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
The reward earned by the blue agent for this turn
- yawning_titan.envs.generic.core.reward_functions.one_per_timestep(args)
Give a reward of 0.1 for every timestep that the blue agent is alive.
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
0.1
- yawning_titan.envs.generic.core.reward_functions.zero_reward(args)
Return zero reward per timestep.
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
0
- yawning_titan.envs.generic.core.reward_functions.safe_nodes_give_rewards(args)
Give 1 reward for every safe node at that timestep.
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
The reward earned by the blue agent for this turn
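A short sketch of this rule, again assuming (illustratively) that end_state maps node names to 0 (safe) or 1 (compromised):

```python
# Sketch of the "1 reward per safe node" rule; the 0/1 state encoding
# is an assumption for illustration, not taken from the library.
def safe_node_count_reward(args: dict) -> float:
    return float(sum(1 for state in args["end_state"].values() if state == 0))
```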
- yawning_titan.envs.generic.core.reward_functions.punish_bad_actions(args)
Punish the blue agent for taking bad actions (bad moves).
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
The reward earned by the blue agent for this turn
- yawning_titan.envs.generic.core.reward_functions.num_nodes_safe(args)
Provide reward based on the proportion of nodes safe within the environment.
- Parameters:
args – A dictionary containing information from the environment for the given timestep
- Returns:
The calculated reward
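Under the same illustrative state encoding as above, the proportion-based variant might look like this sketch:

```python
# Sketch of a proportion-based reward: the fraction of nodes that are safe.
# The 0/1 state encoding is an assumption for illustration.
def safe_node_proportion_reward(args: dict) -> float:
    end_state = args["end_state"]
    if not end_state:
        return 0.0
    return sum(1 for state in end_state.values() if state == 0) / len(end_state)
```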
- yawning_titan.envs.generic.core.reward_functions.dcbo_cost_func(args)
Calculate the cost function for DCBO using a set of fixed action cost values.
- Parameters:
args – A dictionary containing the items listed in the module description above
- Returns:
The cost for DCBO
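Since the description mentions a set of fixed action cost values, one way to picture this is a lookup table keyed by the blue action; the action names and cost values below are hypothetical placeholders, not the library's.

```python
# Illustrative only: a fixed-cost lookup in the spirit of dcbo_cost_func.
# The action names and the cost values are hypothetical placeholders.
FIXED_ACTION_COSTS = {
    "restore_node": 2.0,
    "reduce_vulnerability": 1.0,
    "do_nothing": 0.0,
}

def dcbo_style_cost(args: dict) -> float:
    # Unknown actions fall back to a hypothetical default cost of 1.0.
    return FIXED_ACTION_COSTS.get(args["blue_action"], 1.0)
```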