Sunday, October 14, 2012

Putting a football model into JAGS

In this post the football model is programmed into JAGS. There are all the reasons to do so. Jags 3.3 is recently released, I was stimulated by Gianluca's post . Obviously I could copy the model in his paper, but that would be too easy and not sure about copyright either. So, in this post, variations of the model of an earlier post is recreated.

Data structure

The starting data looks like this:
 head(Jold)
    Seizoen      Datum      Thuisclub          Uitclub Thuisscore Uitscore
1 2011-2012 2011-08-05      Excelsior        Feyenoord          0        2
2 2011-2012 2011-08-06   RKC Waalwijk  Heracles Almelo          2        2
3 2011-2012 2011-08-06        Roda JC     FC Groningen          2        1
4 2011-2012 2011-08-06  SC Heerenveen              NEC          2        2
5 2011-2012 2011-08-06      VVV-Venlo       FC Utrecht          0        0
6 2011-2012 2011-08-07   ADO Den Haag          Vitesse          0        0
The data has to be made ready to be placed into JAGS. All factors have to be made numbers. Note that each game is still one row. The two outcomes will be resolved in JAGS
JAGSData <- with(Jold,list(

N=length(Datum),
nclub = nlevels(Thuisclub),
HomeClub=(1:nlevels(Thuisclub))[Thuisclub],
OutClub=(1:nlevels(Uitclub))[Uitclub],
HomeMadeGoals=Thuisscore,
OutMadeGoals=Uitscore))
str(JAGSData)
List of 6
 $ N            : int 306
 $ nclub        : int 18
 $ HomeClub     : int [1:306] 5 14 15 16 18 1 3 4 11 7 ...
 $ OutClub      : int [1:306] 9 10 6 12 8 17 13 2 7 3 ...
 $ HomeMadeGoals: int [1:306] 0 2 2 2 0 0 3 1 0 2 ...
 $ OutMadeGoals : int [1:306] 2 2 1 2 0 0 1 4 1 0 ...

Model

The model in JAGS has to contain all the details and all distributions. That makes it much longer, but also gives the opportunity to fiddle around with details. For now it is reasonably simple. Each club has two properties, an attack strength (Astr) and a defense strength (Dstr). Based on the combination of these strengths is the number of goals, again Poisson distributed. The interesting part is in the way attack and defense strength are constructed. These are created from an overall strength (TopStr) and an attack/defense balance (AD). 
fbmodel1 <- function() {
  for (i in 1:N) {
    HomeMadeGoals[i] ~ dpois(HS[i])
    OutMadeGoals[i]  ~ dpois(OS[i])
    log(HS[i]) <- Home + Astr[HomeClub[i]] + Dstr[OutClub[i]]
    log(OS[i]) <-        Dstr[HomeClub[i]] + Astr[OutClub[i]]
  }
  for (i in 1:nclub) {
    Astr[i] <-  TopStr[i]+AD[i]
    Dstr[i] <-  TopStr[i]-AD[i]
    TopStr[i] ~ dnorm(0,tauTop)
    AD[i]     ~ dnorm(0,tauAD)
  }
  tauAD <- pow(sigmaAD,-2)
  sigmaAD ~ dunif(0,100)
  tauTop <- pow(sigmaTop,-2)
  sigmaTop ~ dunif(0,100)
  Home ~ dnorm(0,.0001)
}
To make this model work it needs a series on incantations in R. To keep things short only a basic output plot is given
params <- c("TopStr","AD","sigmaAD","sigmaTop","Home")
inits <- function(){
    list(TopStr=rnorm(JAGSData$nclub),
         AD=rnorm(JAGSData$nclub),
         sigmaAD=runif(1),
         sigmaTop=runif(0,.5),
         Home=rnorm(1)
         )
}
jagsfit <- jags(JAGSData, model=fbmodel1, inits=inits, 
                parameters=params,progress.bar="gui",
                n.iter=4000)
plot(jagsfit)

Parameter extraction

As the interest is in the teams, their properties are extracted. As Gianluca wrote we considered a slightly more complex structure in which we included information on each team's propensity to be "good", "average", or "poor". This helped avoid overshrinkage in the estimations  To examine this I looked at the teams relative scores.The plots look like they have a shoulder, so there is a point there.  
jags.smat <- jagsfit$BUGSoutput$summary
for (i in 1:nlevels(Jold$Thuisclub))  
  rownames(jags.smat) <- sub(paste('[',i,']',sep=''),levels(Jold$Thuisclub)[i],
      rownames(jags.smat),fixed=TRUE) 
plot(density(jags.smat[grep('^Top',rownames(jags.smat)),1])
                       ,main='Distribution of Strength')
points(x=jags.smat[grep('^Top',rownames(jags.smat)),1],
       y=rep(0,nlevels(Jold$Thuisclub))) 
plot(density(jags.smat[grep('^AD',rownames(jags.smat)),1]),
main='Distribution of Attack / Defense')
points(x=jags.smat[    grep('^AD',rownames(jags.smat)),1],
y=rep(0,nlevels(Jold$Thuisclub)))

model variation

There are a number of ways to create distributions for attack and defense strength (Astr and Dstr). One variation is to make them independent, but also to give them fatter tails, which is handled by a t distribution rather than a normal distribution. This fatter tail could be accommodate teams much better or worse than the majority. Unfortunately this did not work, the number of degrees of freedom for the t distribution varied a lot and was too high to make this an alternative model.
# section replaced in fbmodel1 
 for (i in 1:nclub) {
    Astr[i] ~ dt(0,tauStr,nuStr)
    Dstr[i] ~ dt(0,tauStr,nuStr)
  }
  tauStr <- pow(sigmaStr,-2)
  sigmaStr ~ dunif(0,100)
  nuStr <- 1/InuStr
  InuStr ~ dunif(0,.5)

No comments:

Post a Comment