Sunday, October 7, 2012

Footbal ordinal model: examination and predictions

In the previous entry an ordinal model for football games was developed. It is now time to look a bit better at the model and use it. This means three sections; A look at likelihood and link function, a model interpretation part, which focuses on the effect of playing away or at home and a look at some predictions.

model definition

Just to recall the model defined. 
clm4b <- clm(oGoals ~OffenseClub + DefenseClub*OffThuis, data=StartData)

link function and likelihood

Looking at the vignette 'Analysis of ordinal data with cumulative link models — estimation with the R-package ordinal' there is a section where the link function is examined. I am not the expert here, but logit, probit and cloglog are most often used. Logit and probit are similar, except for extreme results. cloglog and loglog are asymetrical. Cauchit, I never encountered in the literature, but since it's there it is taken along. A model with logit link is a proportional odds model, a model with cloglog link is a proportional hazard model. See also McCullagh and Nelder (2nd edition, 1989). 
The example in the vignette ran thus:
links <- c("logit", "probit", "cloglog", "loglog", "cauchit")
sapply(links, function(link) {
      clm(oGoals ~OffenseClub + DefenseClub*OffThuis 
          ,link=link,data=StartData)$logLik })
    logit    probit   cloglog    loglog   cauchit 
-906.3591 -906.9762 -914.2396 -915.1181 -924.0992 
Luckily the proportional odds model holds.

Slice can be used to plot the behavior of the likelihood as function of parameter estimates. It does give one plot for each parameter. To keep this post short only the first 9 are shown. These include the estimates of the category thresholds. It can be seen that thresholds 5|6 and 6|7 are asymmetrical, but close to the maximum likelihood value they are close to quadratic (detail plot not shown).

slice.fm4b <- slice(clm4b, lambda = 5)
par(mfrow = c(3,3))
plot(slice.fm4b,1:9)

Parameter interpretation

The most interesting parameter to assess is the interaction; DefenseClub*OffThuis. It is not simple to do so, all parameters are dependent on each other and the odd parameter is not present to keep the model estimable. As a way out the difference for each team between playing at home or away is used. Ideally this is against a similar team, so. The chances for a team when playing against itself are used. It is a very abstract idea; a team which plays at home against itself when away. The merit is in the interpretation.
library(lattice)
teams <- data.frame(Off=levels(StartData$OffenseClub),
    Def=levels(StartData$OffenseClub))
homeaway <- morepred(clm4b,teams)
longha <- reshape(homeaway[,-1],
            idvar='club2',varying=list(chance=names(homeaway)[3:5]),
            direction='long',v.names='chance',
            times=c('Home','Draw','Away'),timevar='Location')
dotplot(club2 ~ chance,groups=Location,data=longha,
    auto.key=list(columns=3,space='bottom'),
    xlim=c(0,1),xlab='Chance',
    main='Chance of a club to win from itself, depending on home or away')
As can be seen, VVV-Venlo has a lot of advantage playing at home, as do Heracles Almelo, AZ and ADO Den Haag (these all have a black dot far right and a blue dot far left). In contrast, SC Heerenveen, FC Utrecht and De Graafschap show better chances away than at home. This I don't believe, if this were a Bayesian model this believe might be enforced, as this is a frequentist model I have to live the estimates. Clearly  the plot could be improved with confidence intervals, hopefully showing quite some overlap for these estimates. Going Bayesian is clearly in the frame some when, as Gianluca showed. 

Predictions

Previously it was attempted to predict the outcomes of one weekend of games of current season. The results and predictions given below.

Roda JC      - Utrecht      0-1
PEC Zwolle   - Groningen    1-2
RKC Waalwijk - VVV          1-1
Vitesse      - Heracles     1-1
NEC          - Willem II    0-0
ADO Den Haag - Ajax         1-1
Twente       - Heerenveen   1-0
NAC Breda    - AZ           2-1
PSV          - Feyenoord    3-0
These are the original expectations:
         club1           club2      win1     equal      win2
1      Roda JC      FC Utrecht 0.4580659 0.2126926 0.3291782
2 RKC Waalwijk       VVV-Venlo 0.6076020 0.2180298 0.1743364
3      Vitesse Heracles Almelo 0.5723334 0.2275537 0.2000907
4 ADO Den Haag            Ajax 0.1037534 0.1511710 0.7446496
5    FC Twente   SC Heerenveen 0.6605607 0.1558135 0.1822923
6    NAC Breda              AZ 0.2539698 0.2627759 0.4832506
7          PSV       Feyenoord 0.5055082 0.2147899 0.2796468
The new model has different predictions; Most interesting FC Utrecht has now a slightly better chance to win than lose, while in the previous model it was predicted to lose. Big changes are in RKC Waalwijk-VVV Venlo and FC Twente-SC Heerenveen
         club1           club2       win1     equal      win2
1      Roda JC      FC Utrecht 0.36966744 0.2506325 0.3797000
2 RKC Waalwijk       VVV-Venlo 0.78326557 0.1343054 0.0824290
3      Vitesse Heracles Almelo 0.69169975 0.1776653 0.1306350
4 ADO Den Haag            Ajax 0.09980196 0.1591822 0.7410159
5    FC Twente   SC Heerenveen 0.47168784 0.2140003 0.3143119
6    NAC Breda              AZ 0.29419688 0.2577802 0.4480229
7          PSV       Feyenoord 0.47611532 0.2253700 0.2985147

code for predictions:

topred <- read.table(textConnection("
            'Roda JC'        'FC Utrecht'
            'PEC Zwolle'     'FC Groningen'
            'RKC Waalwijk'   'VVV-Venlo'
            'Vitesse'        'Heracles Almelo'
            'NEC'            'Willem II'
            'ADO Den Haag'   'Ajax'
            'FC Twente'      'SC Heerenveen'
            'NAC Breda'      'AZ'
            'PSV'            'Feyenoord'"
    ),col.names=c('Off','Def'))
morepred(clm4b,topred)

Additional code

fbpredict <- function(object,club1,club2) {
  UseMethod('fbpredict',object)
}

fbpredict.polr <- function(object,club1,club2) {
  top <- data.frame(OffenseClub=c(club1,club2),DefenseClub=c(club2,club1),OffThuis=c(1,0))
  prepred <- predict(object,top,type='p')
  oo <- outer(prepred[2,],prepred[1,])
  rownames(oo) <- 0:(ncol(prepred)-1)
  colnames(oo) <- rownames(oo)
  class(oo) <- c('fboo',class(oo))
  attr(oo,'row') <- club1
  attr(oo,'col') <- club2
  wel <- c(sum(oo[upper.tri(oo)]),sum(diag(oo)),sum(oo[lower.tri(oo)]))
  names(wel) <- c(club1,'draw',club2)
  return(list(details=oo,'summary chances'=wel))
}

fbpredict.clm <- function(object,club1,club2) {
  top <- data.frame(OffenseClub=c(club1,club2),DefenseClub=c(club2,club1),OffThuis=c(1,0))
  prepred <- predict(object,top,type='p')$fit
  oo <- outer(prepred[2,],prepred[1,])
  rownames(oo) <- 0:(ncol(prepred)-1)
  colnames(oo) <- rownames(oo)
  class(oo) <- c('fboo',class(oo))
  attr(oo,'row') <- club1
  attr(oo,'col') <- club2
  wel <- c(sum(oo[upper.tri(oo)]),sum(diag(oo)),sum(oo[lower.tri(oo)]))
  names(wel) <- c(club1,'draw',club2)
  return(list(details=oo,'summary chances'=wel))
}

print.fboo <- function(x,...) {
  cat(attr(x,'row'),'in rows against',attr(x,'col'),'in columns \n')
  class(x) <- class(x)[-1]
  attr(x,'row') <- NULL
  attr(x,'col') <- NULL
  oo <- formatC(x,format='f',width=4)
  oo <- gsub('\\.0+$','       ',oo)
  oo <- substr(oo,1,6)
  print(oo,quote=FALSE,justify='left')
}

morepred <- function(mymodel,topred) {
  UseMethod('morepred',mymodel)
}

morepred.polr <- function(mymodel,topred) {
  topred <- topred[topred[,1] %in% mymodel$xlevels$OffenseClub & 
          topred[,2] %in% mymodel$xlevels$OffenseClub ,]
  ap <- lapply(1:nrow(topred),function(irow) {
        fbp <- fbpredict(mymodel,as.character(topred[irow,1]),
            as.character(topred[irow,2]))
        sec2 <- fbp[[2]]
        mydf <- data.frame(club1=topred[irow,1],
            club2=topred[irow,2],
            win1=sec2[1],
            equal=sec2[2],
            win2=sec2[3])
      })
  dc <- do.call(rbind,ap)
  rownames(dc) <- 1:nrow(dc)
  dc
}

morepred.sclm <- function(mymodel,topred) {
  topred <- topred[topred[,1] %in% mymodel$xlevels$OffenseClub & 
          topred[,2] %in% mymodel$xlevels$OffenseClub ,]
  ap <- lapply(1:nrow(topred),function(irow) {
        fbp <- fbpredict(mymodel,as.character(topred[irow,1]),
            as.character(topred[irow,2]))
        sec2 <- fbp[[2]]
        mydf <- data.frame(club1=topred[irow,1],
            club2=topred[irow,2],
            win1=sec2[1],
            equal=sec2[2],
            win2=sec2[3])
      })
  dc <- do.call(rbind,ap)
  rownames(dc) <- 1:nrow(dc)
  dc
}

No comments:

Post a Comment