Stan tutorial for beginners in ~6 mins: Bayesian Data Analysis Software

Stan is a software for Bayesian computation
just like winbugs and jags. Take a look at winbugs or jags videos in my
channel if you are interested about those software.
Instead of using gibbs sampling, stan uses a method called Hamiltonian monte carlo or
HMC. For complex models, HMC may be more effective
than gibbs sampler. Stan uses C++, so it is fast. But you don’t
have to learn c++ to use it. The language stan uses is similar to winbugs and also there
is a R package to run the Bayesian computation from R.
Let us see how to install it first. In google we search “stan bayesian” to get the home
page of stan. Go to rstan link to get info about how to install it. I am using windows
7. I installed R, and the Rtools. For R go to r-project, get the latest version of R.
I am using R 3.1.2. Also go to Rtools page to download and install Rtools. I am using
Rtools 32. Within R, to install rstan, copy the commands
from the website and run in R. It takes a while to complete the installation. Restart
R to start using stan. Let us explain how rstan works using an example:
popularly known as “eight schools” example. You can get the description of this example
in the Rstan vignette. I am running R from RStudio.
The data need to be declared first in a block. In this example, we first write data block.
In here we express how the data list should look like, with data type and restrictions.
Inside it, number of schools J is declared as an integer, with the restriction that it
can not be lower than 0. A semicolon at the end of each line means that
the command has ended. This semicolon is required at the end of every
command in stan. To make a comment, we use double slash.
Now, y is a vector of real numbers with length J and sigma is a vector of real numbers with
length J with the restriction that lower value of standard errors sigma can be 0.
The parameters are also declared similarly in a separate block. Here we put the parameters
of interest. Mean mu, population standard deviation tau
are scalar with tau having a non-negative restriction. And eta is a vector with J elements.
We can also define transformed identical parameter or new parameter theta for school effects
in another block. This block is optional but sometimes necessary for complex model building. The model block describes the model; that
includes priors and likelihoods. The model specification starts in order with top level
of prior, then secondary levels of prior and so on and likelihood at the end.
Here eta is generated from standard normal and y is generated from normal with mean theta
and standard deviation sigma. Note that, we can generate data in a vectorised
form. We do not need a for loop to do so. Therefore, we can generate data faster.
All the Stan data, parameter and model blocks need to be inserted in R as a string in a
block. To do so, we can save them in another file
with extension .stan. we first set the directory where the stan
model file is saved and then run the stanc command to read the model statements.
We then use stan_model command to translate them into C++ commands. Depending on complexity,
this may take some time to complete the compilation. Next we need to prepare data. The data has
to be in a list format in R. In this list, we have J y and sigma; all the variables we
specified in the data block previously. Once this is complete we can generate monte
carlo posterior sample using sampling command. Alternatively, if we do not want to save the
model blocks separately in an external file, we can save them in R as strings. Then stanc
command is not required. We directly move on to stan_model command with model_code argument.
Rest is the same. We can also directly sample from posterior
distribution by using stan command. This stan function is a wrapper for translating, compiling
and sampling altogether. Additional arguments such as warmup and thin
can be used to do burn-in and thining. To examine the model fit, we can print the
parameters and specified quantiles. After fitting, we can summarize the fit and
plot them. It is also possible to use traceplot function.
It is also possible to extract samples from the fit.
If you prefer to use coda package for summaries, it is possible to convert Rstan object to
coda object as well. Then you can summarize the coda objects.
The links of the codes used in this video are provided in the description of the video.

8 thoughts on “Stan tutorial for beginners in ~6 mins: Bayesian Data Analysis Software”

1. Ehsan Karim says:

#bayesian   #stan   #beginners   #mcmc   #hmc   #r   #rstan   #jags   #bugs   #winbugs   #openbugs   #statistics

2. meshack amimo says:

Prof,
This is a beautiful new r package for Bayesian analysis.

May God keep you longer on this planet, n that way, fewer people will fear mathematics n statistics.

3. Bob Carpenter says:

This is great.  Thanks so much for adding this.  Let us know on the stan-users mailing list if you put up more tutorials.

4. Daniel-Ermias Woldemariam says:

Thanks, that's a great intro. I normally use JAGS but now having issues with memory and speed. I have problem how to code categorical predictors with more than two categories in the data, model blocks. Do you have any suggested sources?  I can send you part of my JAGS code, thanks

This is Amazing, i am new to bayesian statistics. I feel it is good. Your video gives me hope

6. Martin Romero_martinez says:

Thanks, it is a ver helpful video

7. Danilo Muniz says:

Wow, huge help! Thank you Ehsan!