Monday, April 2, 2012

Linking apples' liking to analytical data

This post describes the last puzzle piece of the model: the link from instrumental to sensory data. Together with the previous pieces, this leads to a model running from physico-chemical measurements to sensory data, to consumers' perception, and finally to liking.
In this post I chose to use generalized additive models (GAM). The aim is to build models for SJuiciness, SSweetness, SCrispness and SMealiness. The latter is included because the previous post suggested it might be of interest, so I want to keep it in the picture.
GAM appears to be a relatively simple method to apply. However, even with only four explanatory variables, the 17 rows in the data are too few to allow an extensive model, so I had to restrict the degrees of freedom of the smoothers.
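To make the restriction concrete, here is a minimal sketch of the degrees-of-freedom budget. It assumes that each s(x, k) term costs k degrees of freedom in total (one linear plus k - 1 non-parametric), which is how the summaries further down report them; the helper name residual_df is only for illustration.

```r
# Residual degrees of freedom left after fitting an intercept plus
# n_terms smooth terms that each use smooth_df degrees of freedom.
# (Hypothetical helper, not part of the gam package.)
residual_df <- function(n_obs, n_terms, smooth_df) {
  n_obs - 1 - n_terms * smooth_df
}

n_obs <- 17                      # rows actually used in the fits
budget <- data.frame(smooth_df = 1:4)
budget$residual_df <- residual_df(n_obs, n_terms = 4, budget$smooth_df)
print(budget)
```

With four variables, a 4 df smoother on each would leave nothing for the residuals, so 3 df per term is the practical upper limit here.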

Data preparation and SJuiciness

library(xlsReadWrite)
library(ggplot2)
library(gam)
datain <- read.xls('condensed.xls')
#remove storage conditions
datain <- datain[-grep('bag|net',datain$Products,ignore.case=TRUE),]
datain$week <-  sapply(strsplit(as.character(datain$Products),'_'),
    function(x) x[[2]])
dataval <- datain
vars <- names(dataval)[-1]
for (descriptor in vars) {
  dataval[,descriptor] <- as.numeric(gsub('[[:alpha:]]','',
          dataval[,descriptor]))
}
#SJuiciness
indepV <- grep('^A',vars,value=TRUE)
MJuicL <- gam(SJuiciness ~ AInstrumental.firmness + AJuice.release + 
        ASoluble.solids + ATitratable.acidity ,data=dataval)
MJuics2 <- gam(SJuiciness ~ s(AInstrumental.firmness,2) + s(AJuice.release,2) + 
        s(ASoluble.solids,2) + s(ATitratable.acidity,2) ,data=dataval)
MJuics3 <- gam(SJuiciness ~ s(AInstrumental.firmness,3) + s(AJuice.release,3) + 
        s(ASoluble.solids,3) + s(ATitratable.acidity,3) ,data=dataval)
summary(MJuicL)
Call: gam(formula = SJuiciness ~ AInstrumental.firmness + AJuice.release + 
    ASoluble.solids + ATitratable.acidity, data = dataval)
Deviance Residuals:
    Min      1Q  Median      3Q     Max 
-5.9405 -2.5110 -0.6978  3.1524  7.9597 

(Dispersion Parameter for gaussian family taken to be 18.5651)

    Null Deviance: 1002.575 on 16 degrees of freedom
Residual Deviance: 222.7816 on 12 degrees of freedom
AIC: 103.9845 
1 observation deleted due to missingness 

Number of Local Scoring Iterations: 2 
DF for Terms


                       Df
(Intercept)             1
AInstrumental.firmness  1
AJuice.release          1
ASoluble.solids         1
ATitratable.acidity     1

summary(MJuics2)
Call: gam(formula = SJuiciness ~ s(AInstrumental.firmness, 2) + s(AJuice.release, 
    2) + s(ASoluble.solids, 2) + s(ATitratable.acidity, 2), data = dataval)
Deviance Residuals:
    Min      1Q  Median      3Q     Max 
-5.3781 -2.1730 -0.8181  1.9755  6.2297 

(Dispersion Parameter for gaussian family taken to be 19.4993)

    Null Deviance: 1002.575 on 16 degrees of freedom
Residual Deviance: 155.997 on 8.0002 degrees of freedom
AIC: 105.9262 
1 observation deleted due to missingness 

Number of Local Scoring Iterations: 2 

DF for Terms and F-values for Nonparametric Effects

                             Df Npar Df  Npar F  Pr(F)
(Intercept)                   1                       
s(AInstrumental.firmness, 2)  1       1 0.77410 0.4046
s(AJuice.release, 2)          1       1 1.14455 0.3159
s(ASoluble.solids, 2)         1       1 0.40054 0.5444
s(ATitratable.acidity, 2)     1       1 1.04256 0.3371

summary(MJuics3)

Call: gam(formula = SJuiciness ~ s(AInstrumental.firmness, 3) + s(AJuice.release, 
    3) + s(ASoluble.solids, 3) + s(ATitratable.acidity, 3), data = dataval)
Deviance Residuals:
       2        3        4        5        6        7        8        9 
 4.06906 -0.28794 -1.14197 -0.20378 -1.77053  0.68596  2.67061 -0.38020 
      12       13       14       15       16       17       18       19 
-4.06569  3.64093 -1.10219 -0.08048  1.03191 -1.13588  2.64431 -3.02135 
      20 
-1.55275 

(Dispersion Parameter for gaussian family taken to be 20.1907)

    Null Deviance: 1002.575 on 16 degrees of freedom
Residual Deviance: 80.7624 on 4 degrees of freedom
AIC: 102.735 
1 observation deleted due to missingness 

Number of Local Scoring Iterations: 4 

DF for Terms and F-values for Nonparametric Effects

                             Df Npar Df  Npar F  Pr(F)
(Intercept)                   1                       
s(AInstrumental.firmness, 3)  1       2 0.78051 0.5174
s(AJuice.release, 3)          1       2 1.04430 0.4316
s(ASoluble.solids, 3)         1       2 0.31434 0.7468
s(ATitratable.acidity, 3)     1       2 1.25657 0.3772

anova(MJuics3,MJuics2,MJuicL)
Analysis of Deviance Table

Model 1: SJuiciness ~ s(AInstrumental.firmness, 3) + s(AJuice.release, 
    3) + s(ASoluble.solids, 3) + s(ATitratable.acidity, 3)
Model 2: SJuiciness ~ s(AInstrumental.firmness, 2) + s(AJuice.release, 
    2) + s(ASoluble.solids, 2) + s(ATitratable.acidity, 2)
Model 3: SJuiciness ~ AInstrumental.firmness + AJuice.release + ASoluble.solids + 
    ATitratable.acidity
  Resid. Df Resid. Dev      Df Deviance Pr(>Chi)
1    4.0000     80.762                          
2    8.0002    155.997 -4.0002  -75.235   0.4444
3   12.0000    222.782 -3.9998  -66.785   0.5077
par(mfrow=c(2,2))
plot(MJuics2,se=TRUE)

plot(MJuics3,se=TRUE)
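As a side note on the AIC values in the summaries above, the following sketch recomputes them from the residual deviance and residual degrees of freedom. It assumes the usual Gaussian log-likelihood with the variance counted as one extra parameter; the function name gaussian_aic is hypothetical and the input numbers are copied from the printed summaries.

```r
# AIC of a Gaussian model from its residual deviance (residual sum of squares).
# Parameters counted: degrees of freedom of the mean model plus the variance.
gaussian_aic <- function(resid_dev, n_obs, resid_df) {
  n_par <- (n_obs - resid_df) + 1
  n_obs * (log(2 * pi * resid_dev / n_obs) + 1) + 2 * n_par
}

aic_linear <- gaussian_aic(222.7816, n_obs = 17, resid_df = 12)
aic_s2     <- gaussian_aic(155.997,  n_obs = 17, resid_df = 8.0002)
aic_s3     <- gaussian_aic(80.7624,  n_obs = 17, resid_df = 4)
print(round(c(linear = aic_linear, s2 = aic_s2, s3 = aic_s3), 3))
```

The three values lie within about three units of each other, which fits the picture that extra curvature on all four variables buys little.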
From these plots and tables it was concluded that ASoluble.solids is not needed. AJuice.release and ATitratable.acidity may have non-linear effects, while AInstrumental.firmness has a linear effect. Based on this, two models are defined with different levels of curvature.
MJuicsSEL2 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,2) + 
         s(ATitratable.acidity,2) ,data=dataval)
MJuicsSEL3 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,3) + 
         s(ATitratable.acidity,3) ,data=dataval)
anova(MJuicL,MJuicsSEL2,MJuicsSEL3) 
Analysis of Deviance Table

Model 1: SJuiciness ~ AInstrumental.firmness + AJuice.release + ASoluble.solids + 
    ATitratable.acidity
Model 2: SJuiciness ~ AInstrumental.firmness + s(AJuice.release, 2) + 
    s(ATitratable.acidity, 2)
Model 3: SJuiciness ~ AInstrumental.firmness + s(AJuice.release, 3) + 
    s(ATitratable.acidity, 3)
  Resid. Df Resid. Dev      Df Deviance Pr(>Chi)  
1        12     222.78                            
2        11     177.48 0.99998   45.302  0.05808 .
3         9     113.53 2.00006   63.953  0.07927 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
From this it is concluded that the model MJuicsSEL3 is the best model to predict SJuiciness.
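The Pr(>Chi) column in the table above can be reproduced by hand, which helps when judging how strong the evidence is. The sketch below assumes that the change in deviance is scaled by the dispersion of the largest model (residual deviance divided by residual df) and then referred to a chi-squared distribution; the printed numbers are consistent with that. The helper name deviance_test is hypothetical.

```r
# Chi-squared test on a change in deviance, scaled by the dispersion
# estimated from the largest model in the comparison.
deviance_test <- function(dev_change, df_change, resid_dev_big, resid_df_big) {
  dispersion <- resid_dev_big / resid_df_big
  pchisq(dev_change / dispersion, df = df_change, lower.tail = FALSE)
}

# Values copied from the analysis of deviance table; the largest model
# (MJuicsSEL3) has residual deviance 113.53 on 9 degrees of freedom.
p_sel2 <- deviance_test(45.302, 0.99998, 113.53, 9)
p_sel3 <- deviance_test(63.953, 2.00006, 113.53, 9)
print(round(c(MJuicsSEL2 = p_sel2, MJuicsSEL3 = p_sel3), 4))
```

Both steps fall between 0.05 and 0.1, so the preference for MJuicsSEL3 rests on fairly weak evidence given only 17 rows.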

SSweetness, SCrispness and SMealiness

For brevity, the full results for the other three responses are not shown; only the plots of the final models are given.
SSweetness is mainly related to AJuice.release (3 df), with smaller linear effects of the other three variables.

SCrispness is non-linearly related to AInstrumental.firmness (3 df); the other variables have much less influential linear effects.
SMealiness again has a 3 df smoother, with the other effects much less influential.

Discussion

GAM

GAM is a suitable method for examining models where no interactions are expected but non-linear effects are. It is simple to use, and it shows which variables have the larger effects and approximately what these effects look like. However, it does have its problems. For instance, there is an artifact in AInstrumental.firmness around 60, where there is a small bump in the responses. If this represents a physical reality, then the rest of the curves seem fairly smooth by comparison. It would also indicate that the amount of data (types of apples) needs to be much larger to get a good understanding of the relations between the variables. A second problem is the absence of any interactions. In many data sets, interactions between variables are needed.
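The interaction problem can be illustrated with simulated data; none of the numbers below come from the apple data. In this sketch cubic polynomials in lm() stand in for the smoothers, which is an assumption made so that the example runs in base R without the gam package.

```r
set.seed(1)
n  <- 200
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
# Simulated response driven purely by an interaction, plus a little noise
y  <- x1 * x2 + rnorm(n, sd = 0.1)

# Additive fit: flexible in each variable separately, but no interaction
additive         <- lm(y ~ poly(x1, 3) + poly(x2, 3))
# Same additive terms plus one interaction term
with_interaction <- lm(y ~ poly(x1, 3) + poly(x2, 3) + x1:x2)

r2_additive    <- summary(additive)$r.squared
r2_interaction <- summary(with_interaction)$r.squared
print(c(additive = r2_additive, with_interaction = r2_interaction))
```

However flexible the separate curves are, the additive fit cannot pick up this structure, whereas a single interaction term does.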

Overall model for liking

The various parts of the model have now been defined. However, these separate parts do not yet make a whole model. They need to be assembled and subsequently compared with a more traditional model such as (multiblock) PLS.
