How to look like a statistician: a developer’s guide to probabilistic programming


– And we’re back at fsharpConf.
– Hello.
– And I’m here with Evelina.
– Hello.
– Awesome. I’m real excited that you were able to make it here to Redmond in person.
– Thank you. It’s exciting to be in the Channel 9 Studios so thanks for having me.
– Yes. Yeah, we’re really excited about your talk. Last time we talked you lived in Cambridge and I heard you moved to London.
– Yes, because I started working in the Alan Turing Institute which is the British national institute for data science.
– That sounds awesome. So, you’re a data scientist?
– Yes, my official title is a data scientist.
– It sounds really interesting. Are there any cool projects that you’re going to work on?
– Next month we are starting a really cool project with the National Air Traffic Control in the UK where we will be basically automating parts of their training system and putting AI agents into that. So, that’s going to be a lot of fun.
– That sounds really cool.
– So, be careful if you fly into the UK.
– And your talk title sounds really cool, “How to Look Like a Statistician.”
– Do you want to look like a statistician?
– I do. I mean, yeah.
– I think you’ll switch from the enterprise usage of F# to more cool uses for functional programming.
– It sounds cool. So, I’ll let you take it away.
– Thank you.
– Yeah, let’s listen to your talk.
– So, my title is, “How to Look Like a Statistician: A Developer’s Guide to
Probabilistic Programming.” So, if you ask someone if they want
to look like a statistician, well, am I a statistician? That’s a question. As I mentioned I work as a data
scientist in the Alan Turing Institute and if I ask someone,
do I look like a statistician? They probably imagine someone like this. This is actually a famous
statistician, Sir David Cox. He worked on survival models,
but if you also search what statisticians look like, do you want to look
like a statistician? The Internet vision of a statistician
is someone like this and this is a stock photo
of a statistician and I really like this photo
because the guy is wearing a lab coat. Statisticians don’t wear a lab coat. And he’s actually writing on
the screen the other way around. So, hopefully after this talk you
will know how to look like this guy. So, I’ll be talking about
probabilistic programming. So, what is actually
probabilistic programming? You have probably heard
about probabilities, probably, and probabilistic programming
is about creating probabilistic models which is a huge area
of machine learning. So, what are probabilistic models? I’ll be using a very nice use case
where I actually used probabilistic models and the data set
that I played with and that’s
Stack Overflow Developer Survey Results from 2017 and the data
scientists in Stack Overflow are really nice and let others
download the data from the Stack Overflow Developer
Survey. So, I looked at the data
and when they published the data there were a lot of
articles in the news. These were the titles and it got
picked up by major media like BBC. And they ran articles about
programmers who use spaces are paid more. I really like the quotes here,
by the way, “are paid more.” So, I thought, okay,
does it make sense that programmers who use spaces
make more money than those who use tabs? So, I looked at the data and this is a salary
distribution when you look at the data. And there was a field for people
to fill in their salary and this is the distribution. You can see it looks slightly strange. You can see that there are
two bumps in the distribution. One in the beginning
and one in the middle. So, I thought, “Okay, there is
something strange happening.” Who are the people
who report very low salaries? So, I looked at the people who report
their annual salary lower than $3,000 which comes to about $250 per month. That’s not a huge salary. Most developers who reported
this salary came from India which, unfortunately, may be true
because developers in India are probably not paid that well. But the second country that
reported very low salaries is actually Poland and then Russia and then among the first
10 countries there’s also Germany. And then I thought, “Okay, there is something
really strange happening there,” because if developers in Germany report a salary of $250
per month that’s not much. That’s really, really low. So, I decided to look at the data in more detail and this is the distribution
from different countries, from France, India,
and the United Kingdom. You can see that these distributions
have a nice, big bump in the middle. And then I plotted some
of the suspicious countries. So, this is the salary distribution
in central and eastern Europe coming from Germany,
Poland, and Russia. You can see that these
distributions they look different. For example, the Polish one there
are two nice, big bumps. And in the Russian, that’s the green one,
the first bump is even larger than the second bump. And for Germany there is one small bump
and then a big bump in the middle. So, what is happening there? Are there groups of developers
that are paid nothing and then a lot of developers
that are paid a normal salary? Well, what’s wrong? Well, my suspicion was that people
didn’t read the question properly. So, this is the actual question as it
was presented in the survey and it asked: Blah, blah, blah, blah, blah. And even though annual is there
highlighted, it’s underlined, I thought, “Okay, maybe people just report
their monthly salary because people just don’t
read questions properly, especially in eastern
Europe probably.” I am actually from the Czech Republic
and my background knowledge is that when people negotiate their salary and talk about how much they are paid they never talk about
their annual salaries. They always talk only about their monthly salary. So, they don’t talk about yearly salaries. And I asked my friends in Poland
and they confirmed the same thing, people don’t ever talk
about their annual salary they just ask about
their monthly salaries. I thought, “Okay, people just looked
at the question, saw the word “salary” and reported their monthly one, probably.” And I will use probabilistic
programming in F# to show you how you can
actually quantify that and find out what’s the real salary and how many people
can’t read questions. So, this is the salary
distribution in Poland. And as I said there are two bumps. They are almost equally sized. So, what can I use to model
the data like this? This is my theory that I will
try to validate using the data. And I will use something that statisticians
call “mixture distributions.” What is a mixture distribution? Well, it’s a distribution that’s formed
by a mixture of multiple normal or standard distributions. So, this is an example of two
Gaussians or normal distributions. You can see that they each have one bump
and they are centered on different values and if we do a mixture we will just multiply them
by a certain weight and sum them together. So, now this is my mixture distribution
where I mixed the two Gaussians with equal weight. I can also change the weights. For example, now the distribution
to the right has larger weight and the distribution on the left has smaller weight and they are about the same height right now and I can change it as well. I can give one distribution a very small weight
and the other distribution a very large weight. So, you can see it, I can use this
to model my two bump distribution that I had for the developers’
salaries in Poland. The only thing that I need to find are the parameters
of the distributions and the weights. So, now we are okay. That’s a lot of statistics. What is this whole thing about? Well, I will talk about how we can use this
in the framework of probabilistic programming. And in probabilistic programming the main
thing is that probability distributions become first class citizens
in the language. Later on I will show you how functional programming
actually creates a very nice framework for this and how we can basically put
in probability distributions and treat them as first class citizens, as if they were our variables. But first we need to look at the
mixture distributions a bit more formally. So, this is how a statistician would
write a mixture distribution. There are many Greek letters. So, what does this mean? I don’t want to go through the Greek letters and see what are the mus
and sigmas and things like that. If I want to look at it
from a developer’s perspective I want to see something
that I can understand and the most straightforward way
to interpret something with probability distributions is through sampling. And what is sampling actually? Well, let’s look at a very famous problem
called the “Monty Hall Problem.” The Monty Hall Problem actually comes
from a TV competition called, “Let’s Make a Deal,” and there are three doors
in the competition and people came there
dressed in crazy ways and they competed
to actually win a car. A nice, old car like this. So, they had three doors
and the question is, so the car, if you want to win, is behind one of the doors and two other doors have something
that doesn’t have any value and traditionally this is a goat. So, in the game you pick one of
the doors and now Monty Hall, whose name is the name
of the challenge and who was the presenter
of the competition, asks, “Okay, so what do you want to do?” I will open another door. So, Monty Hall opens another door
and shows a goat. And now they ask you, “Do you
want to change your selection? Do you want to keep your selection? The door you selected, do you think
the car is behind there or do you want to switch
to the other door, the unopened one?” And normally the probability
that the car is behind the door that you picked originally is one-third because there is equal probability
and after opening the door, do you want to switch or not? If you haven’t seen this challenge
before then you might think, “Okay, it doesn’t matter because
the probability is still one-third.” Well, we will check that. So, after you change
or not change the door Monty Hall opens the other doors
and shows another goat and a car. So, how can we do this
using probabilistic programming? Well, first I will start
with normal sampling. This is my F# code to actually do
a simulation from the Monty Hall problem. So, we’ll just create some types. Type for a door that can be either a goat
or a car and the game is just some list of doors and my strategy is to either stay
with the door I picked originally or to switch to the other door. So, these are my helper functions
to generate a game. And I’ll just run this
and explain it a bit later. So, this is my very simple,
straightforward function to play the game. So, I will just generate some state
of the game, pick a door, and then the host opens
the other door that I didn’t pick and if I decided to switch then I will just choose the other door that the host didn’t choose and I will choose one of them randomly. And then depending on my strategy
I will just say, “Okay, did I win or not?” The actual code is not
that important in this case. So, let’s have a look at the probabilities. So, the probability of winning if I stay
with the current selection, if I tried the game 10 times, is 40% and the probability if I switch
is actually 100% which is weird. Wow. What if I increase the number of samples to 100? Now, my probability of winning if I stay
with my original selection is 37% and the probability
if I switched the door is 72%. What if we increase the number
of samples even more? Let’s say to 10,000. Let’s have a look. Okay, and now the probability of winning
if I stay with my original selection is 34% almost and if I switch it’s 66%. So, we can see that it’s getting more
towards one-third and two-thirds. And as I increase the number of samples
it gets more and more precise. And this is something called Monte Carlo sampling and it’s a very common framework
in estimating distributions. When you want to see the value of a probability
distribution you just sample from it maybe using a very
straightforward sampling like I was using here or a more complicated scheme. And after sampling you just pick
a large enough number of samples and this will give you a good estimate of
the distributions and probability values. So, why should I use functional programming
in this and where does it come into it? Well, the sampling that I was showing
you here was very straightforward. The problem is very straightforward. We can also represent it in a much more interesting
way using computation expressions in F#. So, here I have my door type
that can be either a goat or a car and now I created a new type
for a Monty Hall value which gives me a door
and the probability value for that door. So, initially if I do my original selection
the probabilities are all equal so both goats and cars get a probability of one-third. And after that I will get the different values
after I change or not change my selection. Now, I’ll create another type called “distribution”
that will be a sequence of the Monty Hall values. And here are some helper functions
for creating uniform distributions. And now here comes the interesting
part with computation expressions. So, here I created another type
which will be probabilistic computation. I will call it the probabilistic computation
and it can be either a sample from the distribution or it can return a value. So, this looks quite complicated. So, what does it actually do or what
do I want to use it for? Here I use it to create my
probabilistic computation builder which is a computation expression
and I will basically just use the builder to wrap my values in that
probabilistic computation type. And my goal with this is to basically just
record what kind of selections I am doing in the game so that when I use my
probabilistic computation builder or when I create
a computation expression and I actually run it on my data it will record
what I am doing in the computation, what kind of distributions
I’m going through because distributions, remember, are the sequences
of Monty Hall values and which door I’m selecting. So, let’s look at the rest of the Monty
Hall with computation expressions. So, here I created the probabilistic
computation called “Prob” and if you have never worked with
computation expressions, for example, Async is a computation expression. So, I will use it almost the same way
as you would use in Async, I would just call Prob with [inaudible] and now you can see that my
program simplified greatly because now my stay probability
or stay scenario where I just pick an initial door
and don’t change my selection will basically just create
the initial door distribution which is a uniform distribution
over two goats and one car and then I’ll just return my selection and if it contains a car then I won,
and if it contains a goat then I lost. And my switch strategy is slightly more –
slightly longer, but still much more readable than my original simulation code. So, first I will just pick my initial door
and then if I decided to switch then I will just look at the initial door and if it contains a car,
if my original selection was a car then I won a goat so I will return a goat
and if I originally selected goat then Monty Hall opened another door
which contained another goat so that means that when I switch I won a car. And the interesting thing here
is that the initial door that’s here, although I created a uniform
distribution now my initial door basically just represents
a sample from the distribution. You can see that its type is door
which is either a goat or a car. And my switch door is, again, either a goat or a car. And here I am basically doing pattern
match on the initial door which is a sample from the uniform distribution. So, the cool thing here is that I am
working with probability distributions, but I can refer directly to samples
that I don’t know the value of. So, let’s actually run this. I have to run everything. And the only thing I have to do is actually
wrap it in my computation expression which is the prob keyword in this case. So, what do I do with it now? As I said, my Sample and Return types are there basically just to record what I’m doing
and how the computation is processing the values. So, here I will basically just enumerate
all of my options together with their probabilities and I added some code
that will print what I am doing. And here is another helper function to actually
just compute the final probabilities. So, if I stay, let’s have a look
at what the code went through. So, the code actually went through
all the three different options that I had in my original distribution. So, I could either select a car with probability
33%, or I could select a goat with probability 33%, or I could select the other goat
with the same probability. And now my final result is a car with probability
33% or a goat with probability approximately 67%. So, here I got the values
of the whole distribution. And what if I switch? So, this is what the computation went through. So, in the first option, first I selected
a car with probability 33% and after Monty Hall
opened another door containing a goat I decided to switch,
so I switched to the other goat. So, my first option was the car with 33% probability,
but after that with probability one I got a goat. In the other scenario here
my first selection was a goat. Then after Monty Hall opened another door
I got – I decided to switch so I switched to the car
in the other option. So, again, 33% and then with probability one the car. And in the last scenario, again,
I selected a goat in the first place, Monty Hall opened the other door,
showed the other goat, and then I decided to switch
and I won a car with a probability 100%. So, you can see that my probabilistic computation, the computation expression, actually recorded
what traces I made during the game. If I go back to the
probabilistic expression here, first it recorded what was my initial door selection and what was my second selection
in the switch door. And then I could just go through the whole computation and all the probability values
that it assigned to different scenarios, and compose them, do something clever with them. In this case the probability
[inaudible] straightforward, but in a more general case
or more complicated cases the probabilities would be more complicated and then I would have to do
something clever with them. But in this case it allowed me
to just summarize them directly. So, let’s go back to our mixture distributions. I showed you this equation
which looked fairly complicated. How does it look if we do it in more normal ways
or if we talk about it in a normal voice? So, a mixture distribution of a salary
is equal to the probability that someone read the question correctly and then multiply it
by the actual salary that they reported
or the other scenario is that someone made a mistake, they didn’t read the question properly
and then multiply it by 1/12 of the salary. And this will give me the probability
of a reported salary in general. Now we can see that I have just two
unknown things in this case. So, I have the probability that someone
can read a question properly and the value of their annual salary. So, what can I do with this? As I said, we have just two
unknown parameters. This equation it’s really easy
to compute if we know the values. If we knew the annual salary and if we knew
the probability that someone in Poland can read a question properly then we can compute the probability
that any salary gets reported. How do we actually write that down
as a probabilistic computation? This is my pseudo code in F#. I really want to make probability distributions
first class citizens in my language. So, this is how I would love to write it. So, I would like to just create a salary
that will be a Gaussian with some kind of mean and variance with as yet unknown parameters. And then mistake will be just a probability
distribution called a Bernoulli distribution which is just a statistical fancy way
of saying a coin toss distribution. So, just a distribution between two values. So, what’s the probability
that someone makes a mistake? And then my observed value will be
just if they made a mistake then it will be 1/12
of the annual salary and if they didn’t make a mistake
then I would just report the actual salary. So, how do we do this
with computation expressions? So, here I have some helpers first. Let’s evaluate them. And now my value is not a goat or a car,
now my unknown values are floating point numbers. And I have two distributions. So, instead of the Monty Hall values
which gave me just discrete values of goat or a car together with
their associated probabilities. I have two distributions. One will be a Gaussian with a mean and variance, and the other will be
a Bernoulli probability distribution. And my probabilistic computation
type looks exactly the same, exactly the same as I had
in my Monty Hall problem. So, again, I have a sample or I have a return value,
and the sample basically just records the distribution that I’m going through and the value and returns our probabilistic computation. And I will use this to create the probabilistic
computation builder, my computation expression. So, this is exactly the same thing I had
in the Monty Hall problem before. And, again, the goal is just to record
what I’m doing with the probability distributions because I can do something
fancy with them later on. This is my model for the whole problem. So, again, it’s wrapped in the prob
keyword in this case with [inaudible]. And the code looks almost the same
as I had on my slide. So, first, I will take the yearly salary
which is a Gaussian with some mean and some variance. And here you can see that its type is the value
which is just my wrapper around float. So, even though it’s – I’m assigning it
to Gaussian which is a probability distribution it’s looking like just a sample from the distribution,
just like one value taken from the distribution. Now, my Bernoulli distribution for a mistake,
the probability that someone made a mistake is, again, another value. And what I’m doing here is I’m just checking
if someone made a mistake, if the sample
from the Bernoulli distribution is one then I will return the salary divided
by 12, the monthly salary. Otherwise, I will just return their annual salary. Again, you can see that this is very readable. You can see exactly what’s going on in there. And the cool thing right now is that
in the background I can do anything. I can do some clever things
or I can even do just sampling because I know how the structure
of the computation looks like and what the computation
is actually doing behind there. So, what I will get out of this is basically samples
of salaries and probability if someone made a mistake and if they made a mistake
then I will report one thing. If they didn’t, I will report the other thing. So, what can I do with it? Well, as I said, the mixture distributions and the
whole setting of the problem is fairly straightforward. So, how do we get the actual parameters
of the salaries like the salary mean, salary variance,
and the Bernoulli distribution, the probability that someone made a mistake? How do we actually get the values
of the parameters? Well, this is the complicated part. You actually get a PhD out of this. These are actual slides from doing something
similar for a specific model in my PhD in Cambridge. So, this stuff is really complicated
and because this is just a talk on how to do probabilistic programming the cool thing about
probabilistic programming is that you don’t need
to know any of this. You really don’t need to know any of
these equations on how to construct them. The only thing if you are doing
probabilistic programming is you have to know how to specify
the problem and this is it. You just have to pick probability distributions
and the cool thing is it all happens in the background and you let the author of the library or whatever you are using to
figure out this complicated stuff. So, I’ll be talking about the world’s
slowest probabilistic inference engine that I created for this talk. So, the algorithm that I used is
called complete enumeration and usually you will find it
in any machine learning textbook in [inaudible] under the first chapter
on how not to do things. So, this is my distribution of salaries
in Poland and what I will do, I will try different parameter values, different annual salaries, different means, different variances
and different probabilities that someone made a mistake. So, these are just three examples here. And what I’ll do is I’ll discretize them because sometimes working with continuous
distributions is not that easy. So, I will just discretize them. That means I will create bins
and calculate how much mass is in each bin. And after this I’ll just compute the difference between
the distribution that was observed, meaning the distribution that was
reported in the Stack Overflow survey and my theoretical distribution that I got from
certain values of the means and probabilities. And then after comparing them
I will find basically just – I’ll just pick the closest distribution, the one that looks most like
the data that were reported. So, let’s have a look at the demo, the world’s
slowest probabilistic inference engine. So, here are my helper functions
on how to discretize the values. And here I will just basically
go through the distributions. For example, I know that for a Gaussian
most weight is concentrated within three standard
deviations of the mean. So, I will just take my Gaussian distribution,
compute the standard deviation, and then basically just iterate
through all the values between the mean minus
three standard deviations and mean plus three
standard deviations. And I’ll basically vary all parameters. As I said, I’ll just try
different values for the means, different values for the probability
that someone made a mistake. And as I said, I’ll just work
with discretized probabilities. The code is not very interesting
though, to be honest. And here is the interesting thing. Here I’m actually going through
the probabilistic computation, the computation expressions that wrapped
my choices in the sample or return types. So, here I’ll just basically take
the sample of my distribution and for all the different parameters,
discretize everything and then just enumerate
all the possible values. And I’ll just compute the histograms
for all the possible distributions. I’m going through this code fairly quickly,
but the important thing with probabilistic programming is that you really don’t need
to write this kind of code. The only thing you are interested
in writing is the actual model and then let the engine behind that
figure out how to compute everything. And this is my code to actually
compute the histograms, to discretize probability distributions, and now my code to pick
the most likely distributions. And I will apply it to Stack Overflow data. I have the data as a .csv file
so I’ll just use the CSV type provider and basically just filter out
the data that belong to Poland, and just the developers
that reported their salary. Now, I got just 317 values
which is not that much, and the average reported salary
is almost $21,000 per year. And the maximum reported salary,
one lucky guy in Poland is basically paid $110,000
a month or a year, sorry. So, now actually I will run my slowest
probabilistic inference engine and find the most likely distribution that created the values
that I saw in the histogram. And it ran fairly quickly. So, actually the distribution that created
the values had a mean of $28,400 per year and the interesting thing is that my probability distribution for the mistake is 0.25. That means that 25% of developers
in Poland, I’m sorry, can’t read questions properly. I’m sorry to all my friends in Poland,
but 25% of people just don’t read questions
properly in Poland. So, this is my estimated distribution. This is how it looks
when I actually plot it. So, you can see that it’s an estimated
two-bump distribution. One is the distribution for the people
who actually made a mistake and the other is for the people
who didn’t make a mistake. And as I said, the cool thing
about probabilistic programming is that you don’t really have to care
about how it’s done in the background. In the real world there are two types
of probabilistic programming languages. One type is more procedural where, for example,
in Stan you specify your model in a very similar way that I specified it here in F#. And then it gets compiled into C++
and then you run the compiled program. And then the other type of probabilistic
language is where, for example, Anglican which is a nice functional probabilistic programming language in Clojure where, again, you specify the program
in a very similar way that I used and then they just do very clever things
with it in the background either sampling or what you can use this kind of structure for when you actually create the program and report what’s happening there
and how the computation is progressing. You can differentiate the program
and then take the derivative of the program and do some cool stuff with that. But as I said, the cool thing is
you don’t have to care about that. That’s something for the authors of the probabilistic
programming language or library to figure out. So, this is the end of my talk. You can ping me on Twitter @evelgab
and I also have a blog at evelinag.com. And if you are interested in probabilistic programming
you probably use computation expressions because that’s a very nice way to actually hide
all the complexities that’s behind there and just take the computation
and do cool stuff with it. Thank you. – Hello.
– Hello, Phillip. – Hi. I am here to facilitate Q&A. – Thank you.
– I think my mic is on. It should be good.
Let’s go here. Let’s see if we got some questions. Well, it wasn’t really a question,
but we did get a statement on Twitter saying they’re halfway
through this great talk about probabilistic
programming using F# and some pictures
of empty pizza boxes. – That’s great. That’s great to hear that people
are actually enjoying that. – So, maybe –
– I suppose that’s people from Europe who are relaxing somewhere
with a beer in their hand and pizza. – Yeah, maybe there’s something
that can be done in a future version of this talk about probability
of how much pizza you think you’re going to eat
over a period of time and if you’ve maybe misread
the question or something like that. – Yeah, yeah, yeah. Or if people reported their pizza
consumption in kilograms or ounces. – Yes, and then the survey respondent
just takes those as raw numbers and says, “Ah, well this is the amount
of pizza, unitless if you will.” So, there were quite
a few insights here. I think my favorite was definitely that people
are probably not reading the question correctly when they’re reporting salary information.
– Yes. Well, that’s actually an interesting
part of being a data scientist because sometimes
you have to deal with people just not behaving the way
you expect them to. – Mm-hm.
– Because the people come into play as well and sometimes you just
look at the raw data and draw some conclusions from there
and then it doesn’t make sense. – Interesting. So, one thing that I really liked was – I really liked representing
this complex stuff in the form of a computation expression because I took calculus
and differential equations, but that was a long
time ago in college. I can’t really read that anymore,
but I can read computation expressions. So, I’m kind of curious: what are
some things that go well,
and some things that may not go so well, if you want to try to model this with computation
expressions or something else in F#? This looked great, but I’m curious
if there are other things where it’s like, oh, maybe a different approach
in the language might be better. I guess I don’t really know
what that would look like though. – Well, the cool thing with this kind of expression is that it basically just records
what you are doing there. Because some people, for example,
like monads and things like that. My computation
expression here is not a monad, because the types don’t really match, and the only thing it’s doing is basically just recording
what we are doing in the computation. And then you can take that and do anything with that.
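A minimal F# sketch of that idea, not the builder from the talk: the types `Dist` and `Step`, the `ModelBuilder` class, and the `draw` operation are hypothetical names chosen for illustration. The computation expression only records each step as data, and two separate functions then interpret the same record in different ways.

```fsharp
// Hypothetical types: a recorded model is plain data, not a computed probability.
type Dist =
    | Gaussian of mean: float * variance: float
    | Uniform of low: float * high: float

type Step = { Name: string; Distribution: Dist }

// The builder only appends steps; it performs no inference itself.
type ModelBuilder() =
    member _.Yield(_: unit) : Step list = []

    [<CustomOperation("draw")>]
    member _.Draw(steps: Step list, name: string, dist: Dist) : Step list =
        steps @ [ { Name = name; Distribution = dist } ]

let model = ModelBuilder()

// Writing the model just builds a list of steps.
let recorded =
    model {
        draw "salary" (Gaussian(50.0, 100.0))
        draw "bonus" (Uniform(0.0, 10.0))
    }

// Interpreter 1: render the recorded steps as text.
let describe (steps: Step list) =
    steps
    |> List.map (fun s ->
        match s.Distribution with
        | Gaussian(m, v) -> sprintf "%s ~ Gaussian(%g, %g)" s.Name m v
        | Uniform(lo, hi) -> sprintf "%s ~ Uniform(%g, %g)" s.Name lo hi)

// Interpreter 2: compute each variable's prior mean from the same record.
let priorMeans (steps: Step list) =
    steps
    |> List.map (fun s ->
        match s.Distribution with
        | Gaussian(m, _) -> s.Name, m
        | Uniform(lo, hi) -> s.Name, (lo + hi) / 2.0)
```

Here `describe recorded` returns the model as lines of text and `priorMeans recorded` returns the prior mean of each variable. An inference engine would be a third interpreter over the same list.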
– Right. So, your goal is not necessarily composing this
and this and this and this or something like that, just more of like a nice way
to represent the work and then do something with
that representation later? – Yes.
– Okay, I see. – I know there are some papers on basically
using monads to represent probability distributions, but that’s usually not how probabilistic
computation or probabilistic languages work, because they want to do multiple
different cool things in the background, whereas there are monads, for example, that use composition to compute
the resulting probabilities. But that’s not how you do it in practice
because that gets very complex very quickly. So, normally you just record that and then use
some kind of optimization engine in the background. – Okay, okay, cool. Yeah, that’s great. So, we have a question there. Why not use R for statistics? – Why not use R for statistics? Well, that’s a fair question. The thing is, for example, there are not that many
probabilistic languages in R either. The thing you can do in R, for example,
if you have a more complex problem than I had here is you basically specify
your program in something like the Stan language, and then you compile it from R, and then you can use
the compiled model from R again. But this doesn’t actually
give you much advantage. The thing is, well, R is not
very efficient, to be honest. So, usually whenever you have
anything efficient in R it’s using some other inference
engine in the background. But from R mostly people, if they use
probabilistic programming they use Stan which, as I said, you don’t actually specify
even the model in R, you specify it in the Stan language. – I see.
– Where, basically, the model specification looks
very much like this. You specify some probability distributions,
and then you operate with their values in some way. – Would you say that R for something
like this might be useful to prototype something really quickly in R? – Yes, you can do that, but it depends on what you are doing.
– I see. Do you think it might just be a lot of work
to translate the thing that you built into something that’s actually going to run decently? – Yes. Yes, that’s always the question. But with something like this here
I chose, for example, a Gaussian distribution because that’s usually the simplest to use. You can use different distributions,
for example, a Gamma distribution, but even the Gaussian
works pretty well. So, as I said, this is a very slow inference
engine that I had in the background. So, this is more for demo purposes.
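One reason a simple engine can be slow, sketched under assumptions: the talk does not say which inference algorithm the demo used, so the importance sampling below is a stand-in, and `standardNormal`, `gaussianPdf`, `posteriorMean`, and all the numbers are hypothetical. It estimates the mean of some observations under a Gaussian prior by drawing many candidates from the prior and weighting each by how well it explains the data. Most candidates get a weight close to zero, which is why this brute-force style needs many samples.

```fsharp
open System

// Box-Muller transform: one standard normal draw from two uniform draws.
let standardNormal (rng: Random) =
    let u1 = 1.0 - rng.NextDouble()   // lies in (0, 1], so log u1 is finite
    let u2 = rng.NextDouble()
    sqrt (-2.0 * log u1) * cos (2.0 * Math.PI * u2)

// Gaussian density, used as the likelihood of a single observation.
let gaussianPdf (mean: float) (sd: float) (x: float) =
    let z = (x - mean) / sd
    exp (-0.5 * z * z) / (sd * sqrt (2.0 * Math.PI))

// Importance sampling (an assumed algorithm, not the talk's engine):
// draw candidate means from the prior, weight each by the data likelihood,
// and return the weighted average as the posterior mean estimate.
let posteriorMean (rng: Random) (priorMean: float, priorSd: float) (noiseSd: float) (data: float list) (n: int) =
    let samples =
        List.init n (fun _ ->
            let mu = priorMean + priorSd * standardNormal rng
            let weight = data |> List.fold (fun w x -> w * gaussianPdf mu noiseSd x) 1.0
            mu, weight)
    let total = samples |> List.sumBy snd
    (samples |> List.sumBy (fun (mu, w) -> mu * w)) / total

// Made-up observations centred on 5.0, with a wide Gaussian prior around 0.
let data = [ 4.8; 5.1; 5.0; 5.2; 4.9 ]
let estimate = posteriorMean (Random(42)) (0.0, 10.0) 1.0 data 20000
```

With a prior this wide, only a small fraction of the 20000 candidates land near the data, so most of the work is wasted. Dedicated engines avoid that by proposing candidates more cleverly.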
– I see. So, there are ways that you can write it
to make it much more efficient. – Yeah.
– Okay, cool. The code might not look as nice. – Yes.
– That’s how it always goes. – I just thought this is a very cool way to represent probabilistic computation. – Yeah, I really like it. I think that’s fantastic. So, what are some – oh,
it looks like we’re out of time here. I was going to ask one more question.
– Oh no. No, I want to talk about statistics. – Evelina, I guess you can sort
of mention where you are on GitHub, Twitter,
that sort of stuff. – Yes, just ping me at @evelgab on Twitter
or just find me on Twitter or on GitHub, anywhere. I think I’m the only person in the world
with my name so just Google me. – Very easy to find.
Excellent. So, next up we’re going to
have Ody from Lagos, Nigeria speaking with Tomas
about how he came into F# from just learning the language
and kind of transitioned over the course of a year from being an absolute beginner
to making a contribution to the compiler. So, that should be a very
interesting discussion. – Thank you.
– Thanks.
