Category Archives: rstats

Getting R2WinBUGS to talk to WinBUGS 1.4 on Ubuntu 12.04 LTS

Disclaimer 1: WinBUGS is old and not maintained. There are other packages to use, if you would like to take advantage of more modern developments in MCMC such as:

  • PyMC which transparently implements adaptive Metropolis-Hastings proposals (among other great features), or
  • the LaplacesDemon R package, which dispenses guidance on whether or not your chain converged, or
  • the as of yet released STAN, which will use an automatically tuned Hamiltonian Monte Carlo sampler when it can and (presumably) a WinBUGS like Gibbs sampler when it can’t.

Disclaimer 2: There are also WinBUGS alternatives, like JAGS and OpenBUGS, that are both currently maintained and cross platform (Windows, Mac, and linux). They are worth checking out if you want to maintain some legacy BUGS code.

If you are set on using WinBUGS, the installation is remarkably easy on Ubuntu (easier than Windows 7, in fact).The steps are as follows:

1. Install R. (R 2.14.2)
2. Install wine.  (wine-1.4)
3. Install WinBUGS via wine and setup R2WinBugs. That guide was written for Ubuntu 10.04. Some modifications for Ubuntu 12.04:

  • Ignore the bits about wine 1.0. Wine 1.4 works great.
  • The R2WinBUGS example won’t work. When you run this:
    > schools.sim <- bugs( data, inits, parameters, model.file, n.chains=3, n.iter=5000)
    

    WinBUGS will pop-up, but it will get stuck at its license screen. If you close the license screen, nothing happens. If you close the WinBUGS window, you get:

    schools.sim p11-kit: couldn't load module: /usr/lib/i386-linux-gnu/pkcs11/gnome-keyring-pkcs11.so: /usr/lib/i386-linux-gnu/pkcs11/gnome-keyring-pkcs11.so: cannot open shared object file: No such file or directory
    err:ole:CoGetClassObject class {0003000a-0000-0000-c000-000000000046} not registered
    err:ole:CoGetClassObject class {0003000a-0000-0000-c000-000000000046} not registered
    err:ole:CoGetClassObject no class object {0003000a-0000-0000-c000-000000000046} could be created for context 0x3&lt;/code&gt;
    
    Error in bugs.run(n.burnin, bugs.directory, WINE = WINE, useWINE = useWINE, : Look at the log file and try again with 'debug=TRUE' to figure out what went wrong within Bugs.
    

    Which isn’t a particularly helpful error message.

  • The error is that the intermediate files that R2WinBUGS produces are not getting shared with WinBUGS, so WinBUGS thinks it doesn’t have to do anything. As mentioned by ‘zcronix’ in the comment thread for the instructions, it is a two step fix: (1) create a temporary directory to store those files and (2) tell R2WinBugs about it with the working.directory and clearWD options.

    In your shell:

    nathanvan@nathanvan-N61Jq:~$ cd .wine/drive_c/
    nathanvan@nathanvan-N61Jq:~/.wine/drive_c$ mkdir temp
    nathanvan@nathanvan-N61Jq:~/.wine/drive_c$ cd temp
    nathanvan@nathanvan-N61Jq:~/.wine/drive_c/temp$ mkdir Rtmp
    

    In R:

    > schools.sim <- bugs( data, inits, parameters, model.file, n.chains=3, n.iter=5000, working.directory='~/.wine/drive_c/temp/Rtmp/', clearWD=TRUE)
    

Hopefully that will work for you too.

R is not C

I keep trying to write R code like it was C code. It is a habit I’m trying to break myself of.

For example, the other day I need to construct a model matrix of 1’s and 0’s in the standard, counting in binary, pattern. My solution was:

n <- 8
powers <- 2^(0:(n-1))
NN <- (max(powers)*2)
designMatrix <- matrix( NA, nrow=NN, ncol=n)
for( ii in 0:(NN-1) ) {
     leftOver <- ii
     for ( jj in 1:n ) {
          largest <- rev(powers)[jj]
          if ( leftOver != 0 && largest <= leftOver ) {
               designMatrix[ii+1,jj] <- 1	
               leftOver <- leftOver - largest
          } else {
               designMatrix[ii+1,jj] <- 0
          }
     }	
} 
print(designMatrix)

The code works, but it is a low-level re-implementation of something that already exists in base R. R is not C, because base R has pieces that implement statistical ideas for you. Consider:

expand.grid                package:base                R Documentation

Create a Data Frame from All Combinations of Factors

Description:

     Create a data frame from all combinations of the supplied vectors
     or factors.  See the description of the return value for precise
     details of the way this is done.

So then instead of writing (and debugging!) a function to make a binary model matrix, I could have simply used a one-liner:

# Note that c(0,1) is encased in list() so that
# rep(..., n) will repeat the object c(0,1) n 
# times instead of its default behavior of 
# concatenating the c(0,1) objects. 
designMatrix_R <- as.matrix( expand.grid( rep( list(c(0,1) ), n) ) )

I like it. It is both shorter and easier to debug. Now I just need to figure out how to find these base R functions before I throw up my hands and re-implement them in C.

Revolution R with Eclipse Helios

One of the reasons that I don’t often take advantage of the cool features in Revolution R is that I absolutely can’t stand their Visual Studio interface. Previously, if I wanted to run something in RevoR, I fired up the RGui.exe that comes buried in their distribution and used R’s built in script editor. My normal workflow is to use StatEt inside of Eclipse, so dealing with R’s meager editor was always painful. (Although less painful than the bloated VS-standalone alternative.)

Over the break, I ran across Luke Miller’s excellent post on getting Eclipse setup with StatEt the right way. I was able to follow his tutorial to get vanilla 64-bit R setup on a new installation of 64-bit Eclipse Helios. Once that was working, I changed two things to add a second shortcut for Revo R.

First, I followed his directions to install rJava in RevoR:

C:\Users\nathanvan>cd C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin
C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin>R.exe

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...
Type 'revo()' to visit http://www.revolutionanalytics.com for the latest
Revolution R news, 'forum()' for the community forum, or 'readme()'
for release notes.

> install.packages("rJava")
...
package 'rJava' successfully unpacked and MD5 sums checked

The downloaded packages are in
C:\Users\nathanvan\AppData\Local\Temp\RtmpG3tMzb\downloaded_packages

And then installed rj in RevoR, once again using his directions.

C:\Revolution\Revo-4.0\RevoEnt64\R-2.11.1\bin>R CMD INSTALL --no-test-load "C:\Users\nathanvan\Downloads\rj_0.5.2-1.tar.gz"
...
* DONE (rj)

And finally setup Eclipse with a second Run Configuration which I named Revo-R-x64-2.11.1. Now I can run the 64bit version of RevoR without having to deal with the VisualStudio interface. If I get around to it, I’ll post some performance numbers. (The last time I used the VS interface, it was noticeably slower than calling RGui.exe directly.)

Messing with R packages

This was really frustrating. I’m trying to modify a package from Matt Johnson and although I could get the package he sent me to install flawlessly, I couldn’t un-tar it, make a change, re-tar it, and then R CMD INSTALL it. I was about to pull out my hair. The error I got was:
ERROR: cannot extract package from ‘hrm-rev9.tar.gz’

The secret: you have to have the name correct.
R CMD INSTALL hrm-rev9.tar.gz
barfs. But
R CMD INSTALL hrm_0.1-9.tar.gz
works fine. I’m sure it’s somewhere in the docs. I just couldn’t find it.

As always, I made a script to do it for me: (Updated 6/17/2010 15:41)

#!/bin/bash
# Quick script to tar & gzip the package, remove the old one, and install the new one
# I'll add options automatically tag and release it later.

#Set the library that I'm using
LIB="/home/vanhoudn/R/i486-pc-linux-gnu-library/2.10/"

#Commit
svn commit -m "Build commit"

#get the revision number from svn
REV=`svn info -R | grep Revision | cut -d: -f 2 | sort -g -r | head -n 1 | sed 's/ //g'`

#Build the filename
FILENAME="hrm_0.1-$REV.tar.gz"

# I need to tar up the pkg so I can install it.
# Jump to the parent directory and work from there.
cd ..
# Exclude any hidden files under the directories (svn has a bunch)
# and add the named files
tar czf $FILENAME --exclude '.*' hrm/DESCRIPTION hrm/NAMESPACE hrm/src hrm/R

# Remove the old version of the package
R CMD REMOVE -l $LIB hrm

# Install the new package
R CMD INSTALL $FILENAME

# Clean up
rm $FILENAME

# Go back to our previous directory
cd hrm

StatEt in Ubuntu 10.04

I wanted a “lightweight” version of Eclipse to run R from Ubuntu. (I installed eclipse-pde using apt-get. It worked fine.) Once it was running, I installed StatEt via the “Install new software” feature from http://download.walware.de/eclipse-3.5. While it was downloading, I opened up an R console and ran install.packages("rJava"). When the installation of both StatEt and rJava finished I restarted Eclipse. This is when things stopped working and I couldn’t really find any step-by step directions on how to proceed. Here is what I did:

  1. Run -> Run Configurations
  2. Click on R-Console in the left pane. This will create a new run configuration. Change the name to “R 2.10”
  3. Click on the “R_Config” tab. Choose “Selected Configuration:” and then hit the “Configure…” button.
  4. Click “Add”. Change “Location (R_Home):” to “/usr/lib/R” and click “Detect Default Properties/Settings” Click “Ok” until you are back to the “Run Configurations” window
  5. This is the important step. Without it you will get

    Launching the R Console was cancelled, because it seems starting the Java process/R engine failed.
    Please make sure that R package 'rJava' with JRI is installed and look into the Troubleshooting section on the homepage.

    Click on the JRE tab. In the “VM Arguments” box add
    -Drjava.path=/home/<username>/R/i486-pc-linux-gnu-library/2.10/rJava

    Where <username> is your username. (You are providing the path to rJava, for some reason, even though Eclipse will detect it during the setup in the “R_Config” step, it doesn’t seem to share that information with the JRE.)

  6. Click Run. It should work.

Pegging your multicore CPU in Revolution R, Good and Bad

Seven of eight cores at maximum usage

I take an almost unhealthy pleasure in pushing my computer to its limits. This has become easier with Revolution R and its free license for academic use. One of its best features is debugger that allows you to step through R code interactively like you can with python on PyDev. The other useful thing it packages is a simple way to run embarrassingly parallel jobs on a multicore box with the doSMP package.


library(doSMP)

# This declares how many processors to use.
# Since I still wanted to use my laptop, during the simulation I chose cores-1.
workers <- startWorkers(7)
registerDoSMP(workers)

# Make Revolution R not try to go multi-core since we're already explicitly running in parallel
# Tip from: http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html
setMKLthreads(1)

chunkSize <- ceiling(runs / getDoParWorkers())
smpopts <- list(chunkSize=chunkSize)

#This just let's me see how long the simulation ran
beginTime <- Sys.time()

#This is the crucial piece. It parallelizes a for loop among the workers and aggregates their results
#with cbind. Since my function returns c(result1, result2, result3), r becomes a matrix with 3 rows and
# "runs" columns.
r <- foreach(icount(runs), .combine=cbind, .options.smp=smpopts) %dopar% {
# repeatExperiment is just a wrapper function that returns a c(result1, result2, result3)
tmp <- repeatExperiment(N,ratingsPerQuestion, minRatings, trials, cutoff, studentScores)
}

runTime <- Sys.time() - beginTime

#So now I can do something like this:
boxplot(r[1,], r[2,], r[3,],
main=paste("Distribution of Percent of rmse below ", cutoff,
"\n Runs=", runs, " Trials=",trials, " Time=",round(runTime,2)," mins\n",
"scale: ",scaleLow,"-",scaleHigh,
sep=""),
names=c("Ave3","Ave5","Ave7"))

If you are intersested in finding out more of about this, their docs are pretty good.

The only drawback is that Revolution R is a bit rough around the edges and crashes much more than it should. Worse, for me at least the support forum doesn’t show any posts when I’m logged in and I can’t post anything. Although I’ve filled out (what I think is) the appropriate web-form no one has gotten back to me about fixing my account. I’m going to try twitter in a bit. Your mileage may vary.

Update: 6/9/2010 22:03 EST

Revolution Analytics responded to my support request after I mentioned it on twitter. Apparently, they had done something to the forums which corrupted my account. Creating a new account fixed the problem, so now I can report the bugs that I
find and get some help.

Update: 6/11/2010 16:03 EST

It turns out that you get a small speed improvement by setting setMKLthreads(1). Apparently, the libraries Revolution R links against attempt to use multiple cores by default. If you are explicitly parrallel programing, this means that your code is competing with itself for resources. Thanks for the tip!