Monday, May 7, 2012

Multiplicative effects in sensory panel data

In a previous post I used JAGS to build the Bayesian equivalent of a two-way ANOVA. Effects were estimated for products, panelists and their interaction. In this post that model is rebuilt into a version that is both simpler and more advanced. The simplification is that the interaction between panelists and products is removed. The advanced part is a multiplicative effect that represents scale usage by panelists. To be clear, this scale effect comes on top of a panelist-dependent location effect.

Description of a scale effect

For me, in sensory analysis, a scale effect means that some panelists use a wider part of the scale than others. Imagine three products with 'true' mean effects A=3, B=4 and C=5. A panelist with a large scale size might score these as A=2, B=4 and C=6. A panelist with a small scale size might score A=3.5, B=4 and C=4.5. On top of that, the normal additive effect is present, so a panelist with a normal scale size and a low average might score A=2, B=3 and C=4. I would object to negative scale sizes: a panelist who scores A=5, B=4 and C=3 disagrees and does not have a scale of -1; such a panelist provides evidence against A=3, B=4 and C=5.
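The examples above can be written as location plus scale times a centered product effect. A minimal numeric sketch in plain R (the `score` function and the values are mine, for illustration only, not part of the model):

```r
# Hypothetical illustration: score = location + scale * centered product effect
true_effects <- c(A = 3, B = 4, C = 5)
centered     <- true_effects - mean(true_effects)   # c(-1, 0, 1)

score <- function(location, scale) location + scale * centered

score(location = 4, scale = 2)    # wide scale:   A=2, B=4, C=6
score(location = 4, scale = 0.5)  # narrow scale: A=3.5, B=4, C=4.5
score(location = 3, scale = 1)    # low average:  A=2, B=3, C=4
```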

Implementation of a scale effect

The implementation follows the model of the previous post, with some changes. The essential changes are highlighted here; the full script is at the end of the post.

Fit the data

The multiplication itself is shown in this line:
 fit[i] <-  grandmean +  mPanelist[Panelist[i]] + 
            mProduct[Product[i]] * sPanelist[Panelist[i]]
The part grandmean + mPanelist[ ] is the location for the specific panelist: the overall mean (grandmean) plus the individual offset in mPanelist[ ].

mProduct[ ] contains the effects of the products. These are multiplied by sPanelist[ ] to get the product effects as scored by the individual panelists.
In contrast to the previous model, where mProduct[1] was defined as 0, in this model mProduct is drawn from a normal distribution with mean 0. This is chosen so that the scaling influences products with low means in the same way as products with high means. The model expresses this as mProduct[i] ~ dnorm(0,tauProduct).

sPanelist[ ] is the scaling. As stated before, it has to be positive. Also, a scaling of 1 (one) would mean no scaling, while scalings of 1/2 and 2 should be equally far from no scaling. I therefore chose to apply the exponential function to a normal distribution. The normal distribution has mean 0, which becomes 1 after exponentiation. The precision is 9, which corresponds to a standard deviation of 1/3. Hence multiplication factors over exp(2/3) ≈ 2 or under exp(-2/3) ≈ 1/2 would need convincing data. This precision seems reasonable, but could be investigated further. As it is, the widest scale could be about 4 times as wide as the narrowest scale, which seems quite a lot to me.
sPanelist[ ] <- exp(EsPanelist[ ])
EsPanelist[ ] ~ dnorm(0,9) 
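The implied prior on the scale can be checked directly in R. With precision 9 (standard deviation 1/3), the central 95% of the prior mass falls roughly between 0.52 and 1.92:

```r
# Prior on sPanelist = exp(Normal(mean 0, sd 1/3)), i.e. a lognormal
sd_prior <- 1/3                       # precision 9 in JAGS terms
interval <- exp(qnorm(c(0.025, 0.975), mean = 0, sd = sd_prior))
round(interval, 2)                    # approximately 0.52 and 1.92
```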

Estimates for products

Product estimates are now scale dependent. As the average scale over the panel is not by definition 1, the scale needs to be incorporated on top of the location effects.
       meanProduct[ ] <- grandmean + mean(mPanelist[1:nPanelist]) + 
                         mProduct[ ]*exp(mean(EsPanelist[1:nPanelist])) 
grandmean + mean(mPanelist[1:nPanelist]) is the offset.
exp(mean(EsPanelist[1:nPanelist])) is the average scale over the panel. This specific formula is chosen so that a scale of 2 has a similar sized effect as a scale of 1/2.
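Averaging on the log scale makes this the geometric mean of the panelist scales, which is exactly why a scale of 2 and a scale of 1/2 cancel. A quick check in plain R (the variable name `scales` is mine):

```r
# Geometric mean of scales via the mean on the log scale
scales <- c(2, 0.5)
exp(mean(log(scales)))   # 1: a scale of 2 and a scale of 1/2 cancel exactly
mean(scales)             # 1.25: the arithmetic mean would not
```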

Results

Panelist scale

The figure shows the posterior 95% intervals for the panelist scales. All intervals cover 1. This may be because the scale factor is determined on limited data; with 6 products and two replications the intervals seem wide. It should also be noted that in a separate run, where the prior for the scale had precision 4, the posterior intervals were a bit wider but visually similar. This shows that the prior has the desired regularizing effect on the posterior.
We learn from the scale information that panelists 13 and 28 have rather wide scale usage. There are no panelists whose scale usage is particularly small compared to the panel as a group.

Product results

In the end, the product effects are almost indistinguishable from the previous results. The same product pairs appear different, and the product means shift by at most about 0.1. The standard errors of the products are slightly smaller, e.g. 0.20 for the first product vs. 0.22 in the interaction model.

          Mean        SD    Naive SE Time-series SE
choc1 6.990775 0.2004567 0.003169499    0.003679465
choc2 6.532153 0.1991429 0.003148726    0.003090198
choc3 4.824947 0.2137638 0.003379902    0.003806849
choc4 6.294411 0.1984455 0.003137699    0.003369465
choc5 6.680679 0.1999901 0.003162121    0.003291441
choc6 6.383033 0.1991875 0.003149432    0.003189546

Conclusion

The model with a multiplicative scale effect is an interesting model to use. It makes sense in terms of how a panel works, which is aesthetically pleasing. In terms of product results it is close to a regular ANOVA on the example data. For panel validation it has added value: it shows which panelists have a particularly high or low scale usage.
The disadvantage is the cumbersome analysis compared to a standard ANOVA. The model is also incomplete; session effects and round effects, which are often part of a sensory analysis, still need to be incorporated. Finally, the model has not been field-tested or simulation-tested in any way. Before presenting its output as sensory results I would want some feeling for how it compares to ANOVA.

R script

library(SensoMineR)
library(coda)
library(R2jags)

FullContrast <- function(n) {
  UseMethod('FullContrast',n)
}
FullContrast.default <- function(n) stop('FullContrast only takes integer and factors')
FullContrast.integer <- function(n) {
  mygrid <- expand.grid(v1=1:n,v2=1:n)
  mygrid <- mygrid[mygrid$v1<mygrid$v2,]
  rownames(mygrid) <- 1:nrow(mygrid)
  as.matrix(mygrid)
}
FullContrast.factor <- function(n) {
  FullContrast(nlevels(n))
}

data(chocolates)
names(sensochoc)

data_list <- with(sensochoc, list(Panelist  = as.numeric(Panelist),
                                  nPanelist = nlevels(Panelist),
                                  Product   = as.numeric(Product),
                                  nProduct  = nlevels(Product),
                                  y         = CocoaA,
                                  N         = nrow(sensochoc)))
data_list$Productcontr  <- FullContrast(sensochoc$Product)
data_list$nProductcontr <- nrow(data_list$Productcontr)


model.file <- file.path(tempdir(),'mijnmodel.txt')

mymodel <- function() {
  # core of the model
  for (i in 1:N) {
    fit[i] <- grandmean + mPanelist[Panelist[i]] + mProduct[Product[i]] * sPanelist[Panelist[i]]
    y[i] ~ dnorm(fit[i],tau)
  }
  # grand mean and residual
  tau ~ dgamma(0.001,0.001)
  gsd <- sqrt(1/tau)
  grandmean ~ dnorm(0,.001)
  # variable Panelist distribution
  for (i in 1:nPanelist) {
    mPanelist[i] ~ dnorm(0,tauPanelist)
  }
  tauPanelist ~ dgamma(0.001,0.001)
  sdPanelist <- sqrt(1/tauPanelist)
  # Product distribution
  for (i in 1:nProduct) {
    mProduct[i] ~ dnorm(0,tauProduct)
  }
  tauProduct ~ dgamma(0.001,0.001)
  sdProduct <- sqrt(1/tauProduct)
  # distribution of the multiplicative scale effect
  for (i in 1:nPanelist) {
    sPanelist[i] <- exp(EsPanelist[i])
    EsPanelist[i] ~ dnorm(0,9)
  }
  # getting the interesting data
  # true means for Panelist
  for (i in 1:nPanelist) {
    meanPanelist[i] <- grandmean + mPanelist[i] + mean(mProduct[1:nProduct])*sPanelist[i]
  }
  # true means for Product
  for (i in 1:nProduct) {
    meanProduct[i] <- grandmean + mProduct[i]*exp(mean(EsPanelist[1:nPanelist])) + mean(mPanelist[1:nPanelist])
  }
  for (i in 1:nProductcontr) {
    Productdiff[i] <- meanProduct[Productcontr[i,1]] - meanProduct[Productcontr[i,2]]
  }
}

write.model(mymodel,con=model.file)

inits <- function() list(
  grandmean   = rnorm(1,3,1),
  mPanelist   = c(0,rnorm(data_list$nPanelist-1)),
  mProduct    = c(0,rnorm(data_list$nProduct-1)),
  EsPanelist  = rnorm(data_list$nPanelist),
  tau         = runif(1,1,2),
  tauPanelist = runif(1,1,3),
  tauProduct  = runif(1,1,3)
)

parameters <- c('sdPanelist','sdProduct','gsd','meanPanelist','meanProduct','Productdiff','sPanelist')
jagsfit <- jags(data=data_list,inits=inits,model.file=model.file,parameters.to.save=parameters,n.chains=4,DIC=FALSE,n.iter=10000)

jagsfit
# plot(jagsfit) # not shown in blog
jagsfit.mc <- as.mcmc(jagsfit)

# plot(jagsfit.mc) # not shown in blog

fitsummary <- summary(jagsfit.mc)
# extract the scale effects and plot them
sPanelist <- fitsummary$quantiles[ grep('sPanelist',rownames(fitsummary$quantiles)),]
colnames(sPanelist) <-c('x2.5','x25','x50','x75','x97.5')
sPanelist <- as.data.frame(sPanelist)
sPanelist$pnum <- 1:nrow(sPanelist)
library(ggplot2)
limits <- aes(ymax = sPanelist$x97.5, ymin=sPanelist$x2.5) 
p <- ggplot(sPanelist, aes(y=x50, x=pnum)) 
p + geom_point() + geom_errorbar(limits, width=0.2) + scale_y_log10('Panelist scale') + scale_x_continuous("Panelist number") 

# extract differences
Productdiff <- fitsummary$quantiles[ grep('Productdiff',rownames(fitsummary$quantiles)),]
# extract differences different from 0
data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]
# get the product means
ProductMean <- fitsummary$statistics[ grep('meanProduct',rownames(fitsummary$quantiles)),]
rownames(ProductMean) <- levels(sensochoc$Product)
ProductMean
