yawning_titan.envs.generic.core.reward_functions

A collection of reward functions used by the generic network environment.

You can select the reward function that you wish to use in the config file under settings. Each reward function takes a single parameter, args, a dictionary that contains the following information:

- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

The reward function returns a single number (integer or float) that is the blue agent's reward for that turn.
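For orientation, a custom reward function only needs to accept this args dictionary and return a number. The sketch below is illustrative rather than part of the library, and it assumes end_state is a mapping from node names to 1 (compromised) or 0 (safe), which this page does not guarantee:

def fewer_compromised_nodes(args: dict) -> float:
    """Illustrative only: penalise each node that is compromised after the turn.

    Assumes ``end_state`` maps node names to 1 (compromised) or 0 (safe);
    that encoding is an assumption, not part of the documented contract.
    """
    end_state = args["end_state"]
    num_compromised = sum(1 for state in end_state.values() if state == 1)
    return -float(num_compromised)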

Functions

dcbo_cost_func

Calculate the cost function for DCBO using a set of fixed action cost values.

experimental_rewards

Calculate the reward for the current state of the environment.

num_nodes_safe

Provide reward based on the proportion of nodes safe within the environment.

one_per_timestep

Give a reward of 0.1 for every timestep that the blue agent is alive.

punish_bad_actions

Punish bad actions (bad moves).

safe_nodes_give_rewards

Give 1 reward for every safe node at that timestep.

standard_rewards

Calculate the reward for the current state of the environment.

zero_reward

Return zero reward per timestep.

yawning_titan.envs.generic.core.reward_functions.standard_rewards(args)

Calculate the reward for the current state of the environment.

Actions cost a certain amount, and blue is rewarded for removing red from nodes and for reducing the vulnerability of nodes.

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

The reward earned for this specific turn for the blue agent
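As a rough sketch of the kind of calculation described above (not the library's actual implementation), a reward of this shape charges a fixed cost per action and pays out for nodes cleaned of red and for vulnerability reductions. The action names, cost values, and 0/1 state encoding below are all assumptions:

# Hypothetical action names and costs; the real function uses its own fixed values.
ACTION_COSTS = {"restore_node": 1.0, "reduce_vulnerability": 0.5, "do_nothing": 0.0}

def standard_style_reward(args: dict) -> float:
    """Sketch only: action cost plus rewards for cleaned nodes and
    vulnerability reductions, under assumed encodings."""
    cost = ACTION_COSTS.get(args["blue_action"], 0.0)
    # Nodes that were compromised (1) before the turn and safe (0) after it.
    cleaned = sum(
        1
        for node, before in args["start_state"].items()
        if before == 1 and args["end_state"][node] == 0
    )
    # Total reduction in vulnerability across all nodes (lower is better).
    vuln_reduction = sum(
        args["start_vulnerabilities"][node] - args["end_vulnerabilities"][node]
        for node in args["start_vulnerabilities"]
    )
    return cleaned + vuln_reduction - cost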

yawning_titan.envs.generic.core.reward_functions.experimental_rewards(args)

Calculate the reward for the current state of the environment.

Actions cost a certain amount, and blue is rewarded for removing red from nodes and for reducing the vulnerability of nodes.

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

The reward earned for this specific turn for the blue agent

yawning_titan.envs.generic.core.reward_functions.one_per_timestep(args)

Give a reward of 0.1 for every timestep that the blue agent is alive.

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

0.1

yawning_titan.envs.generic.core.reward_functions.zero_reward(args)

Return zero reward per timestep.

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

0

yawning_titan.envs.generic.core.reward_functions.safe_nodes_give_rewards(args)

Give 1 reward for every safe node at that timestep.

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

The reward earned for this specific turn for the blue agent
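A sketch consistent with that description, again assuming end_state maps node names to 1 (compromised) or 0 (safe):

def safe_nodes_reward_sketch(args: dict) -> int:
    # One reward point per node that is safe (state 0) after blue's turn.
    return sum(1 for state in args["end_state"].values() if state == 0)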

yawning_titan.envs.generic.core.reward_functions.punish_bad_actions(args)

Punish bad actions (bad moves).

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

The reward earned for this specific turn for the blue agent
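The docstring does not spell out which actions count as bad. One plausible reading, sketched below purely as an assumption, is to penalise an action whose target node was already safe, so the move could not have achieved anything:

def punish_wasted_action_sketch(args: dict) -> float:
    """Illustrative guess only; the library's actual criteria may differ."""
    blue_node = args["blue_node"]
    # Acting on a node that was already safe (state 0) is treated as wasted.
    if blue_node is not None and args["start_state"].get(blue_node) == 0:
        return -1.0
    return 0.0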

yawning_titan.envs.generic.core.reward_functions.num_nodes_safe(args)

Provide reward based on the proportion of nodes safe within the environment.

Parameters:
args – A dictionary containing information from the environment for the given timestep

Returns:

The calculated reward
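A proportion-based sketch under the same assumed 0/1 state encoding (not the library's actual implementation):

def proportion_safe_sketch(args: dict) -> float:
    # Fraction of nodes that are safe (state 0) at the end of the turn.
    end_state = args["end_state"]
    safe = sum(1 for state in end_state.values() if state == 0)
    return safe / len(end_state) if end_state else 0.0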

yawning_titan.envs.generic.core.reward_functions.dcbo_cost_func(args)

Calculate the cost function for DCBO using a set of fixed action cost values.

Parameters:

args – A dictionary containing the following items:
- network_interface: Interface with the network
- blue_action: The action that the blue agent has taken this turn
- blue_node: The node that the blue agent has targeted for their action
- start_state: The state of the nodes before the blue agent has taken their action
- end_state: The state of the nodes after the blue agent has taken their action
- start_vulnerabilities: The vulnerabilities before the blue agent's turn
- end_vulnerabilities: The vulnerabilities after the blue agent's turn
- start_isolation: The isolation status of all the nodes at the start of the turn
- end_isolation: The isolation status of all the nodes at the end of the turn
- start_blue: The environment as the blue agent can see it before the blue agent's turn
- end_blue: The environment as the blue agent can see it after the blue agent's turn

Returns:

The cost for DCBO
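As a sketch of the shape of such a function, a fixed-cost table keyed by action works; the action names and cost values below are invented for illustration, not the library's real table:

# Hypothetical fixed per-action costs for the DCBO setting.
DCBO_ACTION_COSTS = {"reduce_vulnerability": 1.0, "restore_node": 2.0, "scan": 0.0}

def dcbo_cost_sketch(args: dict) -> float:
    # Look up a fixed cost for the action blue took this turn.
    return DCBO_ACTION_COSTS.get(args["blue_action"], 0.0)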