SQAB Notes -- Day 1
Jay Moore, Some effects of procedural variables on the dynamics of operant choice:
It is not only the overall rate of reinforcement that influences responding in choice paradigms, but also the amount of delay discounting. Procedural variations (inter-response times, changeover delays, etc.) have a big impact, and can effectively change the independent variable.
Increasing the first delay to reinforcement in a schedule decreases preference for that schedule. This indicates that the pigeon is sensitive to the total time elapsed from the beginning of the schedule to its end.
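A minimal sketch of how this kind of delay sensitivity is commonly modeled — standard hyperbolic discounting, V = A / (1 + kD), not necessarily the model from the talk; the discount rate k is an arbitrary illustrative value:

```python
# Hyperbolic discounting sketch: V = A / (1 + k * D).
# Illustrates how lengthening the first delay D lowers a schedule's value,
# consistent with the preference shift described above.
# k = 0.2 is an invented illustrative value, not a figure from the talk.

def discounted_value(amount, delay, k=0.2):
    """Value of a reinforcer of size `amount` delivered after `delay` seconds."""
    return amount / (1 + k * delay)

# Value falls steadily as the initial delay grows:
for delay in (2, 5, 10, 20):
    print(delay, round(discounted_value(10, delay), 2))
```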
Federico Sanabria, The dynamics of conditioning and extinction:
How can you measure learning? Probe sessions have some problems, and may not be completely ecologically valid.
Instead, you can use probabilistic automaintenance, and measure how well the animal can extract the US signal from noise.
They have a Momentum/Pavlovian model, which tracks both what the animal did on the previous trial, as well as conditioning, i.e., factoring in the reinforcement status on the last trial.
With long inter-trial intervals (increased delays between USs), their model implies that positive momentum will have *more* impact. Also, conditioning is faster when the US presentation is *more* sporadic, they have apparently found.
In terms of the probability of producing an operant response, jumping from p=0.1 to p=0.2 produces a proportionally larger increase in responding than increasing from either p=0.025 to p=0.05 or from p=0.05 to p=0.1.
Chris Podlesnik and Tim Shahan. Extinction, relapse, and behavioral momentum:
Adding response-independent (variable-time) reinforcement to a schedule (i.e., variable interval 60 s versus variable interval 60 s plus variable time 15 s) strengthens the stimulus-reinforcer relation but weakens the response-reinforcer relation. Relative resistance to extinction, however, was greater in the schedule that included the variable-time component.
Relapse of extinguished operant behavior also depends on the Pavlovian relation, as shown using a procedure similar to the one above.
Nevin and Grace's (2001) augmented model of extinction incorporates a power-law function describing resistance to extinction, and in 2005 they added a parameter to amplify the disruptive effect. The model gets pretty good fits, with r-squared values typically around 0.75 for individuals and 0.95 for group data.
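As a rough sketch, the behavioral-momentum account of extinction is often written as log(B_x/B_0) = -x(c + d·r)/r^a, where x is extinction sessions and r the baseline reinforcement rate; the parameter values below are invented purely for illustration:

```python
# Sketch of a behavioral-momentum extinction equation of roughly the form
#   log10(B_x / B_0) = -x * (c + d*r) / r**a
# c = disruption from terminating the contingency, d = scaling of
# generalization decrement, a = sensitivity exponent.
# All parameter values are made up for illustration only.

def proportion_of_baseline(x, r, c=1.0, d=0.001, a=0.5):
    """Predicted B_x / B_0 after x extinction sessions at baseline rate r."""
    return 10 ** (-x * (c + d * r) / r ** a)

# Behavior trained with a richer reinforcement rate is predicted to be
# more resistant to extinction:
rich = proportion_of_baseline(5, r=120)
lean = proportion_of_baseline(5, r=30)
print(rich, lean)
```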
If you reduce the disruptive effects suppressing behavior, you'll get an increase in behavior, or relapse. This may be relevant to drug abuse, because you get similar results in alcohol and cocaine self-administration.
Bruce Curry, The tautology of the matching law in consumer behavior analysis:
Cost matching is more of a standard economic approach, but there are all sorts of interesting topics to be studied in the matching law.
If you specify the model correctly, the risk of tautology should be low. Things that are testable (i.e., falsifiable) cannot be tautological.
You can escape from tautology by reference to aggregation, adding coefficients across brands or across consumers. If you do not do so, you run into a danger of circularity.
Matching could be seen as optimization, which would theoretically justify a certain form of matching.
If you test your specific functional form against a free-form regression model (i.e., neural networks or kernel analysis), and find that the free form is better, you have reason to doubt your functional form.
McDowell's generalized matching law equation contains an error term, and he is worried about it. It could be treated as an extraneous or latent variable in consumer analysis, for example. Error terms already have to be accounted for in the regression, which may be why you cannot also insert them into the original equation.
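For concreteness, the generalized matching law is usually fit as a straight line in log-log coordinates, log(B1/B2) = a·log(r1/r2) + log b, with the error term absorbed by the regression. A self-contained sketch with invented, noise-free data (generated with a = 0.9, log b = 0.1):

```python
import math

# Fitting the generalized matching law
#   log10(B1/B2) = a * log10(r1/r2) + log10(b) + error
# by ordinary least squares on the log ratios. The data points are
# invented for illustration, not from the talk.

def fit_matching(log_r, log_b_ratio):
    """Return (slope a, intercept log_b) from simple linear regression."""
    n = len(log_r)
    mx = sum(log_r) / n
    my = sum(log_b_ratio) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_r, log_b_ratio))
    sxx = sum((x - mx) ** 2 for x in log_r)
    a = sxy / sxx
    return a, my - a * mx

log_r = [math.log10(r) for r in (0.25, 0.5, 1.0, 2.0, 4.0)]
log_b_ratio = [0.9 * x + 0.1 for x in log_r]   # perfect a=0.9, log b=0.1 data
a, log_b = fit_matching(log_r, log_b_ratio)
print(a, log_b)  # recovers a = 0.9, log b = 0.1
```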
Many questioners doubted that consumer relations can adequately be explained by the matching law, because there are too many random variables at play in real life behavior.
People started arguing (pretty virulently) about whether you can use the R sub e approximation in probabilistic data. They agreed that the solution is to put in a feedback function for the error term.
Steven Hursh, Exponential demand and cross-price demand interactions, extensions for multiple reinforcers:
There are two concepts of value: scalar value (the dose/potency/size of the reinforcer or commodity) and essential value (the importance of the commodity to the subjects, i.e., their ordinal preferences).
The form of the demand function and the slope across qualitative demand curves can tell us about the relative essential value of a stimulus.
The rate constant of the demand function (alpha) is what distinguishes commodities (i.e., luxuries vs. normal goods). Different doses of the same reinforcer have the same rate constants, and the only difference is the starting quantity.
His equation is log Q = log Q0 + k * (e^(-alpha * Q0 * C) - 1), where Q0 = the scalar value (consumption at zero price), alpha = essential value, or the rate of change in elasticity, and k = a scaling constant.
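A sketch of that exponential demand equation in code; the parameter values are illustrative choices, not figures from the talk:

```python
import math

# Exponential demand sketch:
#   log10(Q) = log10(Q0) + k * (exp(-alpha * Q0 * C) - 1)
# Q = consumption, C = unit price, Q0 = consumption at zero price,
# alpha = rate of change in elasticity (essential value), k = range constant.
# Parameter values below are invented for illustration.

def log_demand(C, Q0=100.0, alpha=0.005, k=2.0):
    """log10 of predicted consumption at unit price C."""
    return math.log10(Q0) + k * (math.exp(-alpha * Q0 * C) - 1)

# Consumption falls as price rises; a smaller alpha (greater essential
# value) would make the curve more inelastic:
for price in (0.1, 1.0, 10.0):
    print(price, 10 ** log_demand(price))
```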
Most research has been into scalar values instead of essential values, but often drug companies use behavioral pharmacology research on rodents to determine how similar the effects of two different drugs will be upon reinstatement. By comparing them to opiates, they can quantify the potential for dependency.
People assume that there must be negative feedback effects at higher doses that cause the inverted-U dose-response curve you see with most drugs, due either to diminished motor responses or to simple toxicity. However, you get the same inverted-U dose-response curve with food, and nobody would attribute that to toxicity or diminished motor responses; instead it is explained as a preference. So maybe there isn't negative feedback but merely an attenuation in pleasure that produces the downward-sloping portion of the graph. This might come about because the animal has already reached its optimal level of responding based on dopamine receptors, for instance.
Ido Erev, Learning and decisions from experience:
It is useful to go back to learning and behavior research in order to gain insight into behavioral economics. He uses a simple "clicking paradigm" in most of his experiments, where participants simply click on one of two buttons to gain a reward.
Any behavior can be rationalized by different priors, so the whole non-rational approach has limitations. Nevertheless, in terms of seeking the highest expected value within his clicking paradigm, here are the ways in which subjects commonly deviate from optimality:
Underweighting of rare events. Say that the two buttons have reward schedules of either 1 with p=0.9 and -10 with p=0.1, or simply a constant 0. The best response would be to choose the constant 0, but people only do so 40% of the time. When a schedule with rewards of 10 with p=0.1 and -1 with p=0.9 is pitted against a constant reward of 0, subjects tend to prefer the 0, even though in this case it has the lower expected value. When asked in questionnaires following the task, subjects overestimate the probability of the rare outcome in both tasks, yet they act as if they underweight it.
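The arithmetic behind both gambles is quick to verify:

```python
# Expected-value check for the two gambles described above.
# Gamble A: +1 with p=0.9, -10 with p=0.1  ->  EV = 0.9 - 1.0 = -0.1
# Gamble B: +10 with p=0.1, -1 with p=0.9  ->  EV = 1.0 - 0.9 = +0.1
# The safe option is a constant 0 in both cases, so 0 beats A but loses to B.

def expected_value(outcomes):
    """outcomes: list of (payoff, probability) pairs."""
    return sum(payoff * p for payoff, p in outcomes)

ev_a = expected_value([(1, 0.9), (-10, 0.1)])   # worse than the constant 0
ev_b = expected_value([(10, 0.1), (-1, 0.9)])   # better than the constant 0
print(ev_a, ev_b)
```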
Payoff variability effect. High payoff variability tends to move behavior towards more random choices, even if there is still one schedule that has a higher overall expected value.
Big eyes effect. If you show the mean foregone payoff of the alternatives that were not chosen, individuals tend to prefer risky alternatives with low expected value. Seeing the mean, subjects may begin to assume that there are better alternatives out there. This is a counter-intuitive effect because it does not display the typical loss aversion seen in other experiments.
Regressive exploration. Individuals display too much exploration in binary choice tasks, but not enough exploration in tasks with multiple schedules.
Allais paradox. This effect shows how individuals deviate from expected utility theory: if subjects prefer lottery A over B, they will not necessarily prefer A + C over B + C, where C is some other lottery. This has also been demonstrated in rats and bees. In humans, there is an experience-description gap: although individuals report that they would act consistently with expected utility theory, in behavioral tests they usually fail to do so.
He talked about applying his research to fixing his own patterns of irrationality. In his introspection, he noted that we plan our behavior to be careful, but it doesn't usually work out that way.
He also applied this research to the broken-window effect on cheating in exams. The optimal threshold for punishment depends on how many students are cheating: an individual might as well cheat if lots of others are cheating, but should not if he or she would be the only one; these are the two equilibria at the extremes. He suggests that in this scenario continuous punishment would be the optimal solution from the teacher's perspective.
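The two-equilibria intuition can be sketched with toy best-response dynamics: each round, students cheat only if the fraction of cheaters last round exceeded some threshold. The threshold and starting fractions are invented; this is not Erev's actual model.

```python
# Toy best-response dynamics for the cheating scenario: cheating pays
# only when it is sufficiently common, so the population snaps to one of
# two extremes depending on where it starts. Parameters are illustrative.

def iterate(fraction, threshold=0.3, rounds=50):
    """Run best-response dynamics from an initial cheating fraction."""
    for _ in range(rounds):
        fraction = 1.0 if fraction > threshold else 0.0
    return fraction

print(iterate(0.6))  # starts above threshold -> everyone cheats
print(iterate(0.1))  # starts below threshold -> nobody cheats
```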