Sunday, May 18, 2014

European MEP data, part 3

As final post on European MEPs voting I wanted to look at the individual MEP. The variables examined are how often present and how often present but not voted. The latter might be a marker of sign in and slope off. The analysis chosen is a hierarchical Bayesian analysis, which should push individual's outcomes towards the population behavior.
As mentioned before, votewatch's data describe how often MEPs voted what in the European Parliament. For each MEP the number of votes, percentages Yes, No, Abstain, number of elections and number of elections not voted. A description of the data can be found in this pdf.

Reading data

Data read as previously. All code shown at end of the post.

Modelling Data

Initially I wanted to use a beta binomial model as the beta distribution is the conjugate prior for the binomial and often seen in the text books. Then I realized the usual approach would be using the logit transformation. The wish to compare the two approaches lead me to adding the probit and finally a different way to calculate the beta binomial. Calculations are done in Jags. In text the models are displayed, end of text is full code. The number of samples is chosen 1000, which is more than needed for interpretation of the data, but with the comparison between methods a few extra samples seemed nice.

Model 1: beta binomial

The model is simple enough not to need much explanation. Priors on a and b are intended to be non-informative. 
mbb1 <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbin(p[i],nvotes[i])
    p[i]  ~ dbeta(a,b)
  }
  a ~ dexp(.001)
  b ~ dexp(.001)

Model 2: Beta binomial

The thought leading to this specification is that with beta distribution conjugate prior, it should be possible to calculate posteriour distributions for probabilities outside of Jags, thereby saving Jags a bit of time and saving a load of data transfer between Jags and R.
mbb2 <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbetabin(a,b,nvotes[i])
  }
  a ~ dexp(.001)
  b ~ dexp(.001)

Model 3: Logit

The model is reasonable straigthforward. The awkward names for mean and sd, ap and asd respectively, are chosen thus so samples can be extracted similar to model 1.
mlogit <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbin(p[i],nvotes[i])
    logit(p[i])  <- lp[i]
    lp[i] ~ dnorm(ap,tau)
  }
  ap ~ dnorm(0,.001)
  tau <- pow(asd,-2)
  asd ~dunif(0,10)

Model 4: Probit

This is just replacing logit() by probit().
mprobit <- mlogit <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbin(p[i],nvotes[i])
    logit(p[i])  <- lp[i]
    lp[i] ~ dnorm(ap,tau)    
  }
  ap ~ dnorm(0,.001)
  tau <- pow(asd,-2)
  asd ~dunif(0,10)

Results

Presence of MEPs

The beta distribution obtained shows that most of the MEPs are signed in. MEPs with more that 20% of the votes not singed in are rare.

The best performing MEPs are shown below for beta binomial 1, logit and probit model are shown below. The results show different MEPs for each calculation.  This is an artifact due to using a sampler, these MEPs are way to close to each other for meaningful ordering
beta-binomial
      Last.Name         First.Name                              National.Party
463 STAVRAKAKIS           Georgios              Panhellenic Socialist Movement
218      JEGGLE          Elisabeth Christlich Demokratische Union Deutschlands
80         AUDY        Jean-Pierre           Union pour un Mouvement Populaire
452        DESS             Albert     Christlich-Soziale Union in Bayern e.V.
273       LISEK          Krzysztof                       Platforma Obywatelska
372       ZEMKE Janusz Władysław                Sojusz Lewicy Demokratycznej
           Mean           p5         p95
463 0.001439957 4.612416e-05 0.004372737
218 0.001450050 3.465712e-05 0.004839997
80  0.001470705 4.752273e-05 0.004617277
452 0.001491458 3.185192e-05 0.005023073
273 0.001497189 4.672592e-05 0.004592358
372 0.001499812 5.112743e-05 0.004861630
logit
      Last.Name         First.Name        Mean           p5         p95
57         BACH            Georges 0.002845830 0.0005561477 0.006789563
514  LUDVIGSSON               Olle 0.002854100 0.0005746037 0.006441176
548 SCHALDEMOSE           Christel 0.002882655 0.0006066471 0.006735655
372       ZEMKE Janusz Władysław 0.002925083 0.0006555610 0.006699789
85       SCHWAB            Andreas 0.002927801 0.0006034058 0.007039026
273       LISEK          Krzysztof 0.002940536 0.0006825662 0.007001949
Probit
      Last.Name First.Name        Mean           p5         p95
687 EHRENHAUSER     Martin 0.002929976 0.0006652270 0.006671044
57         BACH    Georges 0.002935575 0.0006351852 0.006839898
98       MATULA      Iosif 0.002947576 0.0006889951 0.006754586
449      CERCAS  Alejandro 0.002953899 0.0006167549 0.007239345
564      BELDER   Bastiaan 0.002961148 0.0006210927 0.007109177
184      LANGEN     Werner 0.002967294 0.0006372111 0.006767979

The worst MEPs are quite different from each other, the calculations yield similar order
Beta-Binomial 
               Last.Name First.Name
116           TREMATERRA       Gino
5      PETROVIĆ JAKOVINA     Sandra
734               MORVAI  Krisztina
27           VADIM TUDOR   Corneliu
396              CROWLEY      Brian
60  VITKAUSKAITE BERNARD    Justina
                                                  National.Party      Mean
116 Unione dei Democratici cristiani e dei Democratici di Centro 0.2961918
5                            Socijaldemokratska partija Hrvatske 0.2974738
734                              JOBBIK MAGYARORSZÁGÉRT MOZGALOM 0.3184573
27                                         Partidul România Mare 0.3411132
396                                            Fianna Fáil Party 0.5118451
60                                                 Darbo partija 0.5233919
           p5       p95
116 0.2507316 0.3439449
5   0.2031520 0.4045347
734 0.2862299 0.3524033
27  0.3084684 0.3742007
396 0.4744408 0.5481621
60  0.4469903 0.5995429
logit
               Last.Name First.Name      Mean        p5       p95
116           TREMATERRA       Gino 0.3010280 0.2535735 0.3496571
734               MORVAI  Krisztina 0.3222543 0.2894436 0.3566998
5      PETROVIĆ JAKOVINA     Sandra 0.3393291 0.2280846 0.4542476
27           VADIM TUDOR   Corneliu 0.3447283 0.3108516 0.3791264
396              CROWLEY      Brian 0.5189438 0.4840589 0.5560512
60  VITKAUSKAITE BERNARD    Justina 0.5530630 0.4794193 0.6288764
probit
               Last.Name First.Name      Mean        p5       p95
116           TREMATERRA       Gino 0.3033747 0.2566566 0.3510163
734               MORVAI  Krisztina 0.3217914 0.2889948 0.3545175
5      PETROVIĆ JAKOVINA     Sandra 0.3389094 0.2317717 0.4564703
27           VADIM TUDOR   Corneliu 0.3461575 0.3142402 0.3773828
396              CROWLEY      Brian 0.5193148 0.4849846 0.5543466
60  VITKAUSKAITE BERNARD    Justina 0.5556912 0.4837601 0.6299831

Second binomial
     name                   mean      q05       q95      
[1,] "TREMATERRA"           0.2956197 0.2491403 0.3420113
[2,] "PETROVIĆ JAKOVINA"   0.2998581 0.2078026 0.4042624
[3,] "MORVAI"               0.3183077 0.2855259 0.3524003
[4,] "VADIM TUDOR"          0.3411089 0.3064908 0.3754585
[5,] "CROWLEY"              0.511809  0.4767657 0.5477082
[6,] "VITKAUSKAITE BERNARD" 0.5231093 0.4554704 0.5919841
It thus seems that beta binomial have slightly different outcomes than logit and probit, with beta-binomial giving slightly smaller parameter values. The shift for PETROVIĆ JAKOVINA is due to less data. It appears beta binomial had a stronger effect of the prior.
             Last.Name TotalPossible nNotIn
5   PETROVIĆ JAKOVINA            40     15
27         VADIM TUDOR           514    179
734             MORVAI           514    167

Signed in, not voting

To save duplication, only logit and beta-binomial are shown. It would seem that signing in but not voting is structural, the typical MEP does this a few to 15% of the votes.
The best MEPs in this respect include an independent, which did surprise me. As a Dutch voter, having a member of SGP there does not surprise me, while strong Christianity is not my political flavor, they do have a trustworthy reputation many other politician can be jealous of. In terms of logit against beta-binomial, the logit now has lower numbers, which suggests that the beta-binomial again has again more pull toowards the population. 
beta binomial
         Last.Name       First.Name                    National.Party
350        BŘEZINA              Jan                       Independent
175     GROSSETÊTE        Françoise Union pour un Mouvement Populaire
21   MAZEJ KUKOVIČ           Zofija     Slovenska demokratska stranka
333     BRATKOWSKI Arkadiusz Tomasz        Polskie Stronnictwo Ludowe
564         BELDER         Bastiaan  Staatkundig Gereformeerde Partij
55    PAPANIKOLAOU         Georgios                    Nea Demokratia
           Mean           p5        p95
350 0.004855719 0.0012133168 0.01027123
175 0.005019880 0.0011882656 0.01056993
21  0.006102050 0.0007614780 0.01547558
333 0.006392797 0.0007950631 0.01628781
564 0.006590296 0.0021129110 0.01337183
55  0.006678283 0.0022042544 0.01360759
logit
       Last.Name   First.Name        Mean          p5        p95
175   GROSSETÊTE    Françoise 0.007681981 0.003254561 0.01387197
350      BŘEZINA          Jan 0.007820303 0.003207069 0.01421726
55  PAPANIKOLAOU     Georgios 0.008982877 0.003933873 0.01604022
452         DESS       Albert 0.009006247 0.003923229 0.01594389
376        CUTAŞ George Sabin 0.009011560 0.003966942 0.01585792
237      CANCIAN      Antonio 0.009144514 0.004149473 0.01669278

Finally, the worst politicians in this respect. Note that Jerzy BUZEK was president of the EP 2009-2012, Martin Schultz president of EP since 2012, which probably means they were there but did not vote in that role. Comparing beta-binomial to logit, the beta-binomial again pulls a bit stronger to the population. 
binomial
    Last.Name            First.Name                          National.Party
686    ZIOBRO              Zbigniew                        Solidarna Polska
492     DÉSIR                Harlem                        Parti Socialiste
766     BLOOM               Godfrey       United Kingdom Independence Party
418    BAALEN Johannes Cornelis van Volkspartij voor Vrijheid en Democratie
542    SCHULZ                Martin Sozialdemokratische Partei Deutschlands
123     BUZEK                 Jerzy                   Platforma Obywatelska
         Mean        p5       p95
686 0.3490508 0.3118425 0.3865028
492 0.3921216 0.3546858 0.4314310
766 0.3928973 0.3527089 0.4342107
418 0.4118334 0.3731467 0.4485519
542 0.4815201 0.4464745 0.5160744
123 0.5878925 0.5531420 0.6217835
logit
    Last.Name            First.Name      Mean        p5       p95
686    ZIOBRO              Zbigniew 0.3567185 0.3177950 0.3941565
492     DÉSIR                Harlem 0.4000572 0.3610226 0.4393352
766     BLOOM               Godfrey 0.4013182 0.3633043 0.4407786
418    BAALEN Johannes Cornelis van 0.4211575 0.3836522 0.4590419
542    SCHULZ                Martin 0.4903926 0.4539334 0.5266999
123     BUZEK                 Jerzy 0.6003081 0.5642402 0.6344741

Code

library(gdata)
r1 <- read.xls('votewatch-europe-yes-no-votes-data-11-december-2013.preprocessed.xls',
    stringsAsFactors=TRUE)
# adapted in source file:
# headers
# number of decimals
# "

# fix a double
r1$National.Party <- as.character(r1$National.Party)
(uu <- unique(r1$National.Party[r1$Country=='Hungary' & 
                  grepl('ri Sz',r1$National.Party)]))
r1$National.Party[r1$National.Party %in% uu] <- uu[1]
r1$National.Party <- factor(r1$National.Party)

r1$Date <- as.Date(r1$Sdate)
r1$Group <- relevel(r1$Group,'S&D')
r1$nYES=round(r1$pYES*r1$TotalDone/100)
r1$nNO=round(r1$pNO*r1$TotalDone/100)
r1$nAbstain=round(r1$pAbstain*r1$TotalDone/100)
# 3 vars before  sum to TotalDone
r1$nNoVote=round(r1$pNoVote*r1$TotalPossible/100)
r1$nNotIn <- r1$TotalPossible-r1$nNoVote-r1$TotalDone
# 5 vars before sum to TotalPossible
n.iter=1000

library(R2jags)
datain <- list(nchosen=r1$nNotIn,
    nvotes=r1$TotalPossible,
    N=nrow(r1))


mbb1 <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbin(p[i],nvotes[i])
    p[i]  ~ dbeta(a,b)
  }
  a ~ dexp(.001)
  b ~ dexp(.001)

params <- c('a','b','p')
inits <- function() {
  list(a=runif(1,0,5),
      b=runif(1,0,5),
      p=runif(datain$N,0,1))
}

jfbb1 <- jags(datain, 
    model=mbb1, 
    inits=inits, 
    parameters=params,
    progress.bar="gui",
    n.iter=n.iter)

pd <- seq(0,1,by=.01)
png('plotbb1.png')
plot(dbeta(pd,
        mean(jfbb1$BUGSoutput$sims.matrix[,1]),
        mean(jfbb1$BUGSoutput$sims.matrix[,2])),
    x=pd,
    type='l',
    main='MEPs signing in',
    xlab='Proportion not signed in',
    ylab='MEP Density')
dev.off()
##############
mbb2 <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbetabin(a,b,nvotes[i])
  }
  a ~ dexp(.001)
  b ~ dexp(.001)

params2 <- c('a','b')
inits2 <- function() {
  list(a=runif(1,0,5),
      b=runif(1,0,5))
}

jfbb2 <- jags(datain, 
    model=mbb2, 
    inits=inits2, 
    parameters=params2,
    progress.bar="gui",
    n.iter=n.iter)

res2 <- subset(r1,,c(
        Last.Name,
        First.Name, 
        Country, 
        Group, 
        National.Party,
        nNotIn,
        TotalPossible))    
res2 <- merge(res2,
    data.frame(a=jfbb2$BUGSoutput$sims.matrix[,1],
        b=jfbb2$BUGSoutput$sims.matrix[,2]))

res2$p <- rbeta(
    nrow(res2),
    res2$a+res2$nNotIn,
    res2$b+res2$TotalPossible-res2$nNotIn)

##############
mlogit <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbin(p[i],nvotes[i])
    logit(p[i])  <- lp[i]
    lp[i] ~ dnorm(ap,tau)    
  }
  ap ~ dnorm(0,.001)
  tau <- pow(asd,-2)
  asd ~dunif(0,10)
paramslp <- c('ap','asd','p')
initslp <- function() {
  list(ap=rnorm(1,0),
      asd=runif(1,0,10),
      lp=rnorm(datain$N,0,1))
}

jflogit <- jags(datain, 
    model=mlogit, 
    inits=initslp, 
    parameters=paramslp,
    progress.bar="gui",
    n.iter=n.iter)

mprobit <- mlogit <- function() {
  for (i in 1:N) {
    nchosen[i] ~ dbin(p[i],nvotes[i])
    logit(p[i])  <- lp[i]
    lp[i] ~ dnorm(ap,tau)    
  }
  ap ~ dnorm(0,.001)
  tau <- pow(asd,-2)
  asd ~dunif(0,10)

jfprobit <- jags(datain, 
    model=mprobit, 
    inits=initslp, 
    parameters=paramslp,
    progress.bar="gui",
    n.iter=n.iter)

# dispay results
makres <- function(bf) {
  sm <- bf$BUGSoutput$sims.matrix[,-(1:3)]
  res <- r1[,c(1,2,3,5)]
  res$Mean <- apply(sm,2,mean)
  res$p5 <- apply(sm,2,quantile,p=.05)
  res$p95 <- apply(sm,2,quantile,p=.95)
  res <- res[order(res$Mean),]
}
rbb1 <- makres(jfbb1)
rlogit <- makres(jflogit)
rprobit <- makres(jfprobit)

head(rbb1[,-3])
head(rlogit[,-(3:4)])
head(rprobit[,-(3:4)])
tail(rbb1[,-3])
tail(rlogit[,-(3:4)])
tail(rprobit[,-(3:4)])

t(sapply(rbb1$Last.Name[761:766],function(x) {
          mm=mean(res2$p[res2$Last.Name==x])
          qq=quantile(res2$p[res2$Last.Name==x],c(0.05,0.95))
          list(name=as.character(x),mean=mm,q05=qq[1],q95=qq[2])
        }
    ))


r1[r1$Last.Name %in% rbb1$Last.Name[762:764],c(1,11,18)]


####################

datain2 <- list(nchosen=r1$nNoVote,
    nvotes=r1$TotalPossible-r1$nNotIn,
    N=nrow(r1))

jfbbnv <- jags(datain2, 
    model=mbb1, 
    inits=inits, 
    parameters=params,
    progress.bar="gui",
    n.iter=n.iter)

jflogitnv <- jags(datain2, 
    model=mlogit, 
    inits=initslp, 
    parameters=paramslp,
    progress.bar="gui",
    n.iter=n.iter)

png('novote.png')
plot(dbeta(pd,
        mean(jfbbnv$BUGSoutput$sims.matrix[,1]),
        mean(jfbbnv$BUGSoutput$sims.matrix[,2])),
    x=pd,
    type='l',
    main='MEPs not voting',
    xlab='Proportion not voting',
    ylab='MEP Density')
dev.off()

rbbnv <- makres(jfbbnv)
rlogitnv <- makres(jflogitnv)
head(rbbnv[,-3])
head(rlogitnv[,-(3:4)])
tail(rbbnv[,-3])
tail(rlogitnv[,-(3:4)])


No comments:

Post a Comment