The PVL-Delta
Introduction
The PVL-Delta model is a classic model in mathematical psychology for modelling reward learning over a discrete set of choices. It is used in particular with the Iowa Gambling Task (IGT), a standard task in cognitive modelling in which participants choose between four decks of cards, each with different reward and loss probabilities, and must learn which decks are better to choose from. The PVL-Delta model combines principles from Prospect Valence Learning (PVL) models and Expectancy Valence (EV) models, and consists of three steps. First, the observed reward is transformed into a subjective value $u_t$ according to Prospect Theory: the reward is passed through a power function governed by the reward sensitivity parameter $A$, and losses are additionally scaled by a loss aversion parameter $w$ that controls how much losses and rewards differ in subjective value.
\[u_t = \begin{cases} r_t^{A} & \text{if } r_t \geq 0 \\ -w \cdot |r_t|^{A} & \text{if } r_t < 0 \end{cases}\]
where $r_t$ is the observed reward, $A$ is the reward sensitivity, and $w$ is the loss aversion. The next step uses a classic Rescorla-Wagner learning rule to update the expected value of the chosen option:
\[V_{t,c} = V_{t-1,c} + \alpha \cdot (u_t - V_{t-1,c})\]
where $V_{t,c}$ is the expected value of option $c$ at time $t$, $V_{t-1, c}$ is the expected value at time $t-1$, $\alpha$ is the learning rate, and $u_t$ is the subjective value of the reward. Only the chosen option is updated; the expected values of all other options remain unchanged. Finally, the action probabilities are calculated using a softmax function over the expected values, scaled by an action precision parameter $\beta$:
\[P(a_t = i) = \sigma(V_{t,i} \cdot \beta)\]
where $P(a_t = i)$ is the probability of choosing action $i$ at time $t$, $V_{t,i}$ is the expected value of option $i$ at time $t$, and $\beta$ is the action precision. $\sigma$ is the softmax function, which ensures that the action probabilities sum to 1, defined as:
\[\sigma(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}\]
The sampled action then denotes which deck is chosen at each timestep.
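To make these three steps concrete, the following is a minimal sketch of a single PVL-Delta update in plain Julia. It only illustrates the equations above and is not the implementation used internally by ActionModels; the function and variable names are invented for this example.
#Softmax over a vector of values
softmax_probs(x) = exp.(x) ./ sum(exp.(x))
#One PVL-Delta step: transform the reward, update the chosen option, and compute action probabilities
function pvl_delta_step(V, chosen, r; α, A, w, β)
    #Prospect theory transformation of the observed reward
    u = r >= 0 ? r^A : -w * abs(r)^A
    #Rescorla-Wagner (delta rule) update of the chosen option only
    V = copy(V)
    V[chosen] += α * (u - V[chosen])
    #Softmax action probabilities scaled by the action precision β
    action_probabilities = softmax_probs(V .* β)
    return V, action_probabilities
end
#Example: a loss of 50 on deck 2, starting from initial expected values of zero
V, p = pvl_delta_step(zeros(4), 2, -50.0; α = 0.1, A = 0.5, w = 2.0, β = 1.0)
With these example parameter values, the subjective value of the loss is $-2 \cdot 50^{0.5} \approx -14.1$, so the expected value of deck 2 drops to about $-1.41$ while the other decks remain at 0.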
In total, the PVL-Delta model then has five parameters:
- the learning rate $\alpha \in [0,1]$, which controls how quickly the expected values are updated
- the reward sensitivity $A \in [0,1]$, which controls how quickly gains in subjective value diminish as the observed reward increases, with $A = 1$ meaning that the subjective value is equal to the observed reward
- the loss aversion $w \in [0, \infty)$, which controls how much more heavily losses are weighted than gains, with $w = 1$ meaning that losses and gains are weighted equally
- the action precision $\beta \in [0, \infty)$, which controls how much noise there is in the action selection, with higher values of $\beta$ making action selection more deterministic
- the initial expected value $V_{0,c} \in \mathbb{R}$ for each option $c$, by default set to 0 for all options.
And there is one state:
- the expected value $V_{t,c} \in \mathbb{R}$ for each option $c$ at time $t$.
It takes two observations:
- the index of the chosen option $c_t \in \{1, 2, \ldots, n\}$, which in the IGT is the index of the deck chosen at time $t$
- the observed reward $r_t \in \mathbb{R}$, which is the reward received for the chosen option at time $t$.
And returns one action:
- the chosen option $a_t \in \{1, 2, \ldots, n\}$, which is the index of the option chosen at time $t+1$.
Notably, in this example, as is common in many datasets, the action returned by the PVL-Delta is the index of the deck chosen at that same timestep. This means that the action must be sampled before the reward is observed, which is ensured by setting act_before_update = true when creating the model (see below). Had the dataset instead been structured so that the reward from the previous trial is observed in the same timestep as the current action is chosen, we would have set act_before_update = false when creating the model.
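Had the data followed that other convention, the only change would be to this flag; a hypothetical alternative (not used in this example) would look like:
#Hypothetical alternative: the reward from the previous trial arrives in the same timestep as the current choice
ActionModel(PVLDelta(n_options = 4, act_before_update = false))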
Implementation in ActionModels
In this section, we will demonstrate how to use the premade PVL-Delta model in ActionModels.
We first load the ActionModels package, StatsPlots for plotting results, and DataFrames and CSV for loading data.
using ActionModels
using StatsPlots
using DataFrames, CSV
Then we create the PVL-Delta action model.
action_model = ActionModel(
PVLDelta(
#The number of options, which is the number of decks in the IGT
n_options = 4,
#The various parameters of the PVL-Delta
learning_rate = 0.1,
action_noise = 1,
reward_sensitivity = 0.5,
loss_aversion = 1,
#The initial expected value for each option. This should be the same length as `n_options`.
initial_value = zeros(4),
#Set to true if the action is made before the reward is observed
act_before_update = true,
),
)
-- ActionModel --
Action model function: pvl_delta_act_before_update
Number of parameters: 5
Number of states: 1
Number of observations: 2
Number of actions: 1
And we can now use this action model to simulate behaviour or fit to data. The PVL-Delta model takes two observations: the deck chosen and the reward received.
#Create observations
observations = [
(1, 75.0),
(1, -50.0),
(2, 100.0),
(2, -25.0),
(3, 50.0),
(3, -75.0),
(4, 0.0),
(4, 25.0),
(1, 100.0),
(1, -100.0),
(2, 50.0),
(2, -50.0),
(3, 75.0),
(3, -25.0),
(4, 0.0),
(4, 100.0),
]
#Instantiate agent
agent = init_agent(action_model, save_history = true)
#Simulate behaviour
simulated_actions = simulate!(agent, observations);
#Extract the history of expected values
expected_values = get_history(agent, :expected_value)
#Collect into a Matrix
expected_values = transpose(hcat(expected_values...))
#Plot expectations over time
expectation_plot = plot(
0:length(observations), expected_values,
ylabel = "Expected Value",
legend = :topright,
label = ["Deck 1" "Deck 2" "Deck 3" "Deck 4"],
)
#Plot rewards colored by deck choice
deck_choices = [observation[1] for observation in observations]
rewards = [observation[2] for observation in observations]
rewards_plot = scatter(
1:length(rewards), rewards;
group = deck_choices,
xlabel = "Timestep",
ylabel = "Reward",
label = nothing,
markersize = 6,
ylims = (minimum(rewards)-20, maximum(rewards)+20),
)
#Plot together
plot_title = plot(
title = "Expected rewards over time by deck",
grid = false,
showaxis = false,
bottom_margin = -180Plots.px,
)
plot(plot_title, expectation_plot, rewards_plot, layout = (3, 1))
#TODO: Do this with a proper IGT environment
And finally, we can also fit the model to data. We will do this by fitting it to behavioural data from Ahn et al. (2014), in which healthy controls and subjects with addictions to heroin and amphetamine played the IGT. We will here only use a subset of the data; see the tutorial on the IGT for more details on the data, and for an example of fitting the full dataset.
First, we load the data:
#Read in data on healthy controls
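#Note: docs_path is assumed to be defined by the documentation build environment; replace it with your own path to the example data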
data = CSV.read(
joinpath(docs_path, "example_data", "ahn_et_al_2014", "IGTdata_healthy_control.txt"),
DataFrame,
)
data[!, :clinical_group] .= "control"
#Make column with total reward
data[!, :reward] = Float64.(data[!, :gain] + data[!, :loss])
#Make column with subject ID as string
data[!, :subjID] = string.(data[!, :subjID])
#Select the first three subjects
data = filter(row -> row[:subjID] in ["103", "104", "337"], data)
show(data)
200×7 DataFrame
Row │ trial deck gain loss subjID clinical_group reward
│ Int64 Int64 Int64 Int64 String String Float64
─────┼─────────────────────────────────────────────────────────────
1 │ 1 3 50 0 103 control 50.0
2 │ 2 3 60 0 103 control 60.0
3 │ 3 3 40 -50 103 control -10.0
4 │ 4 4 50 0 103 control 50.0
5 │ 5 3 55 0 103 control 55.0
6 │ 6 1 100 0 103 control 100.0
7 │ 7 1 120 0 103 control 120.0
8 │ 8 2 100 0 103 control 100.0
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
194 │ 94 2 120 0 104 control 120.0
195 │ 95 4 80 0 104 control 80.0
196 │ 96 2 120 0 104 control 120.0
197 │ 97 4 40 0 104 control 40.0
198 │ 98 2 140 0 104 control 140.0
199 │ 99 4 80 0 104 control 80.0
200 │ 100 2 110 0 104 control 110.0
185 rows omitted
We can then create the PVL-Delta action model. In this case, the data is structured so that the action on each timestep is made before the reward is observed, so we set act_before_update = true.
action_model = ActionModel(PVLDelta(n_options = 4, act_before_update = true))
-- ActionModel --
Action model function: pvl_delta_act_before_update
Number of parameters: 5
Number of states: 1
Number of observations: 2
Number of actions: 1
We can then create the full model, and sample from the posterior.
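#Priors for the session-level parameters: LogitNormal has support on (0,1), matching the learning rate
#and reward sensitivity, while LogNormal has support on (0,∞), matching the action noise and loss aversion.
#The initial expected values are left at their default of zero.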
population_model = (
learning_rate = LogitNormal(),
action_noise = LogNormal(),
reward_sensitivity = LogitNormal(),
loss_aversion = LogNormal(),
)
model = create_model(
action_model,
population_model,
data;
action_cols = :deck,
observation_cols = (chosen_option = :deck, reward = :reward),
session_cols = :subjID,
)
chns = sample_posterior!(model)
Chains MCMC chain (1000×20×2 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 2
Samples per chain = 1000
Wall duration = 118.83 seconds
Compute duration = 117.98 seconds
parameters = learning_rate.session[1], learning_rate.session[2], action_noise.session[1], action_noise.session[2], reward_sensitivity.session[1], reward_sensitivity.session[2], loss_aversion.session[1], loss_aversion.session[2]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
learning_rate.session[1] 0.0458 0.0468 0.0021 512.1666 583.0329 0.9999 4.3411
learning_rate.session[2] 0.3822 0.2069 0.0054 1453.0000 1153.7302 1.0009 12.3154
action_noise.session[1] 3.2320 2.4843 0.1072 427.6931 1073.4886 0.9999 3.6251
action_noise.session[2] 8.3371 6.7214 0.2013 1174.5419 1232.1044 1.0004 9.9553
reward_sensitivity.session[1] 0.5674 0.2681 0.0171 329.2441 1097.5590 0.9995 2.7906
reward_sensitivity.session[2] 0.2689 0.1410 0.0041 1201.5269 970.4318 0.9999 10.1840
loss_aversion.session[1] 0.6686 0.2495 0.0090 632.5294 493.6759 1.0000 5.3612
loss_aversion.session[2] 0.4836 0.4468 0.0138 1327.0148 1108.4794 1.0004 11.2476
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
learning_rate.session[1] 0.0099 0.0212 0.0329 0.0513 0.1667
learning_rate.session[2] 0.0719 0.2128 0.3574 0.5265 0.8127
action_noise.session[1] 0.3411 1.2863 2.6705 4.5825 9.6468
action_noise.session[2] 1.8648 4.0217 6.4728 10.2668 26.0845
reward_sensitivity.session[1] 0.0832 0.3114 0.6597 0.7937 0.9180
reward_sensitivity.session[2] 0.0666 0.1614 0.2464 0.3528 0.5988
loss_aversion.session[1] 0.1781 0.5277 0.6631 0.7948 1.2128
loss_aversion.session[2] 0.0811 0.2117 0.3559 0.6085 1.5911
TODO: plot results
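In the meantime, a quick way to inspect the posterior is the plotting recipe for MCMC chains (available once StatsPlots is loaded), which shows trace and density plots for each session-level parameter:
#Trace and density plots for each parameter in the chains
plot(chns)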
This page was generated using Literate.jl.