Module `functional`

The functional module implements various functions needed for reinforcement learning calculations.

Exposed functions:

loss.entropy()
loss.policy_gradient()
vtrace.from_logits()

Exposed functions

Loss functions (`functional.loss`)

Collection of loss functions necessary for reinforcement learning objective calculations.

pytorch_seed_rl.functional.loss.entropy(logits: torch.Tensor) → torch.Tensor

Return the entropy loss, i.e., the negative entropy of the policy.

Adding this term to the training objective discourages an RL model from converging prematurely to a near-deterministic policy.
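As a rough illustration of the quantity described above, the negative entropy of a softmax policy can be computed directly from logits. This is a minimal sketch and not the library's implementation: the helper name `entropy_loss_sketch` is hypothetical, and returning one value per distribution (rather than a summed or averaged scalar) is an assumption, since the reduction used by `loss.entropy()` is not stated here.

```python
import torch
import torch.nn.functional as F


def entropy_loss_sketch(logits: torch.Tensor) -> torch.Tensor:
    """Negative entropy of softmax(logits), one value per distribution.

    Hypothetical helper for illustration; not the library's implementation.
    """
    log_probs = F.log_softmax(logits, dim=-1)  # numerically stable log pi(a)
    probs = log_probs.exp()                    # pi(a)
    # Entropy is -sum(p * log p); the loss is its negative, so the signs cancel.
    return (probs * log_probs).sum(dim=-1)


# A uniform policy has maximal entropy, hence the lowest loss.
uniform = entropy_loss_sketch(torch.zeros(1, 4))
# A sharply peaked policy has entropy close to zero, hence a loss close to zero.
peaked = entropy_loss_sketch(torch.tensor([[100.0, 0.0, 0.0, 0.0]]))
```

Minimizing this value pushes the policy toward higher entropy, which is why the term works as an exploration bonus when added to the total loss.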

Vtrace (`functional.vtrace`)

Functions to compute V-trace off-policy actor-critic targets.

All exposed functions return a VTraceFromLogitsReturns.

class pytorch_seed_rl.functional.vtrace.VTraceFromLogitsReturns(vs, pg_advantages, log_rhos, behavior_action_log_probs, target_action_log_probs)

Bases: tuple

property behavior_action_log_probs: Alias for field number 3

property log_rhos: Alias for field number 2

property pg_advantages: Alias for field number 1

property target_action_log_probs: Alias for field number 4

property vs: Alias for field number 0
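The "alias for field number" entries above are what Python generates for a named tuple, so the result can presumably be read by name, by index, or by unpacking. The sketch below rebuilds a stand-in with the same field names and order using `collections.namedtuple`; it is an assumption that the real class behaves the same way, and the string values are placeholders for the tensors the library would return.

```python
from collections import namedtuple

# Stand-in with the field names and order listed above; the real class lives in
# pytorch_seed_rl.functional.vtrace and would hold tensors instead of strings.
VTraceFromLogitsReturns = namedtuple(
    "VTraceFromLogitsReturns",
    ["vs", "pg_advantages", "log_rhos",
     "behavior_action_log_probs", "target_action_log_probs"],
)

returns = VTraceFromLogitsReturns("vs", "pg_adv", "log_rhos", "behavior", "target")

# Access by name, by index, or by unpacking all five fields at once.
vs_by_name = returns.vs
vs_by_index = returns[0]
vs, pg_advantages, log_rhos, behavior_lp, target_lp = returns
```

Access by name is usually the safer choice, since it keeps working if fields are ever reordered.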

pytorch_seed_rl.functional.vtrace.from_logits(behavior_policy_logits: torch.Tensor, target_policy_logits: torch.Tensor, values: torch.Tensor, bootstrap_value: torch.Tensor, actions: torch.Tensor, discounts: torch.Tensor, rewards: torch.Tensor, clip_rho_threshold: float = 1.0, clip_pg_rho_threshold: float = 1.0) → pytorch_seed_rl.functional.vtrace.VTraceFromLogitsReturns

V-trace for softmax policies.

Parameters

behavior_policy_logits (torch.Tensor) – The policy logits used for action sampling during interaction with the environment.
target_policy_logits (torch.Tensor) – The policy logits returned by the learning model.
values (torch.Tensor) – The values returned by the learning model.
bootstrap_value (torch.Tensor) – The value used for bootstrapping (usually the most recent value returned by the learning model).
actions (torch.Tensor) – The actions used during interaction with the environment.
discounts (torch.Tensor) – The per-step discount factors applied to future rewards.
rewards (torch.Tensor) – The original rewards.
clip_rho_threshold (float) – Clipping threshold for the importance weights (rho) used when computing the value targets (vs). See the paper for details.
clip_pg_rho_threshold (float) – Clipping threshold for the importance weights (rho) used when computing the policy gradient advantages. See the paper for details.
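To make the role of each parameter concrete, here is a minimal sketch of the V-trace recursion written in plain PyTorch. It is not the library's code: `vtrace_sketch` and `action_log_probs` are hypothetical names, the time-major `[T, B]` layout (with logits shaped `[T, B, A]` and `bootstrap_value` shaped `[B]`) is an assumption, and it returns only `vs` and `pg_advantages` rather than a full `VTraceFromLogitsReturns`.

```python
import torch
import torch.nn.functional as F


def action_log_probs(logits: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """log pi(a_t | x_t) for the taken actions; logits [T, B, A], actions [T, B]."""
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)


def vtrace_sketch(behavior_policy_logits, target_policy_logits, values,
                  bootstrap_value, actions, discounts, rewards,
                  clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Illustrative V-trace; assumes time-major [T, B] inputs."""
    with torch.no_grad():
        # Log importance weights between target and behavior policy.
        log_rhos = (action_log_probs(target_policy_logits, actions)
                    - action_log_probs(behavior_policy_logits, actions))
        rhos = log_rhos.exp()
        clipped_rhos = rhos.clamp(max=clip_rho_threshold)
        cs = rhos.clamp(max=1.0)  # trace-cutting coefficients

        # Temporal-difference errors, weighted by the clipped importance weights.
        values_tp1 = torch.cat([values[1:], bootstrap_value.unsqueeze(0)], dim=0)
        deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

        # Backward recursion for vs - V(x_s).
        acc = torch.zeros_like(bootstrap_value)
        corrections = []
        for t in reversed(range(values.shape[0])):
            acc = deltas[t] + discounts[t] * cs[t] * acc
            corrections.append(acc)
        corrections.reverse()
        vs = torch.stack(corrections) + values

        # Policy gradient advantages use the value targets of the next step.
        vs_tp1 = torch.cat([vs[1:], bootstrap_value.unsqueeze(0)], dim=0)
        pg_advantages = rhos.clamp(max=clip_pg_rho_threshold) * (
            rewards + discounts * vs_tp1 - values)
    return vs, pg_advantages


# On-policy toy case: identical logits, so every importance weight equals 1.
T, B, A = 3, 1, 2
logits = torch.zeros(T, B, A)
vs, pg_advantages = vtrace_sketch(
    behavior_policy_logits=logits, target_policy_logits=logits,
    values=torch.zeros(T, B), bootstrap_value=torch.zeros(B),
    actions=torch.zeros(T, B, dtype=torch.long),
    discounts=torch.full((T, B), 0.5), rewards=torch.ones(T, B))
```

With identical behavior and target logits, zero values, rewards of 1 and a discount of 0.5, `vs` reduces to the discounted return from each step: 1.75, 1.5 and 1.0.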
