Exploring Binomial and Poisson Distributions in Actuarial Science using R.
Actuarial models depend on particular assumptions on the probability distribution of X (our quantity of interest).
When X represents the claim amount or the length of life of an individual, we expect X to have a distribution on ℝ+, whereas when X represents the claim number, we deal with distribution on ℕ.
In R, each probability distribution is implemented by a set of four functions, let's say for example we have Poisson distribution then:
dpois computes the density function or the mass probability function,
ppois computes the cumulative distribution function,
qpois computes the quantile function, and
rpois is a random number generator.
In this post, we will cover two of the discrete distributions - Binomial and Poisson.
Binomial Distribution
The Binomial distribution is the sum of n independent and identical Bernoulli(p) trials. A Bernoulli trial is an experiment that has only two possible outcomes – “success” and “failure”.
A random variable X which is binomially distributed with parameters n and p, is written as X ~ Bin(n, p) and its probability distribution function is given as:
Here n denotes the number of trials of an experiment and p is the probability of success.
For example, we can model the number of sixes obtained when a fair die is thrown 5 times as a binomial distribution with parameters n = 5 and p = 1/6.
Here X denotes the number of sixes obtained when a fair die is thrown 5 times and can be written as X ~ Bin(5, 1/6).
In R, the Binomial distribution is represented as dbinom, pbinom, qbinom and rbinom.
Let's work with an example, suppose that a group of 10 people has been infected by a serious disease and the survival probability for the disease is 65%. Then let X be the number of survivors which can be modelled as a Binomial random variable with parameters n = 10 and p = 0.65, ie, X ~ Bin(10, 0.65).
We will first plot the probability distribution of X.
First, create a vector of values from 0 to 10. Here we choose x values up to 10 because n = 10 and for any value of x > 10 the probability would be not defined.
> x = 0:10
We will compute the probabilities of a binomial distribution for each value of x using dbinom() function.
> densbinom <- dbinom(x, size = 10, prob = 0.65)
Plotting the probability distribution function using the barplot() function.
> barplot(height = densbinom, names.arg = x, ylim = c(0, 0.30), xlab = "Number of survivors", ylab = "Probability", main = "Probability distribution of Bin(10,0.65)")
Looking at the graph we can see that the distribution is concentrated around x values of 6 and 7. That's because the mean of the distribution is 6.5 (= np). We can see that the distribution is a bit negatively skewed. Also, the mode of the distribution is at x = 7. Now, what will happen if the probability parameter p will be 0.5 instead of 0.65? The distribution will then be symmetrical and can be approximated as a normal distribution (given that p is close to 0.5).
Let's take a look at how the distribution will change for different values of p (the probability of success) and q (the probability of failure = 1-p).
In the above binomial distribution, X ~ Bin(10, 0.65), the graph of the probability distribution was a bit negatively skewed because p (= 0.65) was greater than q
(= 0.35). As the probability of success, p, gets larger than the probability of failure, q, the distribution tends to be more shifted to the right and hence more negatively skewed. Similarly, when p gets smaller than q the distribution tends to be shifted to the left and hence positively skewed. However, as mentioned before when p gets close to q (ie value of p tends to 0.5), the distribution becomes symmetrical.
Let's answer some questions!
(a) What is the probability that exactly 6 out of a group of 10 people who have been infected by a serious disease will survive?
Solution: The probability that exactly 6 people will survive is P(X = 6). This can be computed using dbinom() function in R.
> dbinom(x = 6, size = 10, prob = 0.65)
[1] 0.2376685
Thus, the required probability is 0.2376685.
(b) What is the probability that at least 8 people will survive?
Solution: The probability that at least 8 people will survive is P(X >= 8) = P(X > 7). This can be computed using pbinom() function with argument lower.tail = FALSE because the pbinom() function computes the cumulative probabilities P(X <= q). So using lower.tail argument will help compute P(X > q) probabilities.
> pbinom(q = 7, size = 10, prob = 0.65, lower.tail = FALSE)
[1] 0.2616074
Thus, the required probability is 0.2616074.
Poisson Distribution
The Poisson distribution models the number of events that occur one after another in a specified interval of time. The Poisson distribution assumes that the number of events that occur in separate time intervals is independent of one another. The parameter of the Poisson distribution λ, is the event rate, the rate at which the events occur.
For example, the number of visitors on a blog every day, the number of car accidents in a week, the number of customers arriving at the store in an hour, etc.
A random variable X which is Poisson distributed with parameter λ is written as X ~ Poi(λ) and its probability distribution function is given as:
In Actuarial/Insurance, the Poisson distribution is often used to model the number of claims that an insurance company receives per unit of time. It is also used to model the number of motor accidents per unit time.
In R, the Poisson distribution is represented as dpois, ppois, qpois, and rpois.
Let's work with an example, suppose that the claims on a home insurance policy occur at a rate of 0.6 per month. Then let X be the number of home insurance claims the company receives in a month which can then be modelled as a Poisson random variable with parameter λ = 0.6, ie, X ~ Poi(0.6).
We will first plot the probability distribution of X ~ Poi(0.6).
First, create a vector of values from 0 to 5. Here we choose this range because the Poisson distribution is defined for x = 0, 1, 2,.... and the purpose of choosing values up to 5 is that the probability for x>5 is almost negligible (look at the probabilities computed using dpois below).
> x = 0:5
We will compute the probabilities of a Poisson distribution for each value of x using dpois() function.
> denspois <- dpois(x, lambda = 0.6)
> denspois
[1] 0.5488116361 0.3292869817 0.0987860945 0.0197572189 0.0029635828 [6] 0.0003556299
Plotting the probability mass function using barplot() function.
> barplot(denspois, names.arg = x, lab = "Number of claims", ylab = "Probability", main = "Probability Distribution of Poisson(0.6)")
We can see that the chance of getting more than 1 claim in a month is very low. Well, this can be seen from the mean itself which is equal to the parameter λ = 0.6 for the Poisson distribution. The mean of the distribution is just 0.6 claims in a month which is why the probability of getting more than 1 claim is very low. Although, if the parameter of the distribution ie the mean number of claims in a month was greater than 0.6, say 2, then there would be a higher chance of getting more than 1 claim, and the distribution was then also be shifted to the right. Also to note here that the Poisson distribution is positively skewed.
Let's answer some questions!
(a) What is the probability that the company receives exactly 3 claims in a month?
Solution: The probability that the company receives exactly 3 claims in a month is P(X = 3). This can be computed using dpois() function in R.
> dpois(x = 3, lambda = 0.6)
[1] 0.01975722
Thus, the required probability is 0.01975722.
(b) What is the probability that the company receives at most 10 claims in a year?
Solution: The number of claims in a month, X, has a Poi(0.6) distribution; therefore the number of claims in a year, Y, has a Poi(7.2) distribution. To understand this consider Y = X1 + X2 +.......+ X12, where Xi is the number of claims in the month i (= 1, 2,....., 12) and Xi ~ Poi(0.6).
This is because the Poisson distribution assumes that the number of events that occur in separate time intervals is independent of one another. This is the reason Y can be represented as the sum of Xi's.
Hence, Y follows Poisson distribution with parameter λ = 12*0.6 = 7.2.
Therefore, the probability that the company receives at most 10 claims in a year is P(Y <= 10). This can be computed using ppois() function in R. The ppois() function computes the cumulative probabilities P(X <= q).
> ppois(q = 10, lambda = 7.2)
[1] 0.8866766
Thus, the required probability is 0.8866766.
See the power of R! Suppose you have calculated this using pen and paper then this would have taken a lot of work.
The relation between the Binomial and Poisson distribution
The Poisson distribution provides a very good approximation to the binomial distribution when n is large and p is small – typically when n = 100 or more and p = 0.05 or less. The parameter λ of the Poisson distribution is approximated as λ = np. The approximation depends only on the product np and the individual values of n and p are irrelevant. This is because when we deal with a large number of trials (large n) and the events that occur are "rare" (small p), then the distribution of the number of events that occurs depends only on the expected number (np).
Consider an example, suppose that in a group of 100 policyholders, the probability of a policyholder dying in the next month is 0.04. Then let X be the number of deaths in the next month and it can be modelled as a Binomial random variable with parameters n = 100 and p = 0.04, ie, X ~ Bin(100, 0.04).
Quick revision - Plot the PDF of the distribution to get an overall picture of the distribution. Note: Try different ranges for x and y axes to get a good picture of the graph. Use xlim() and ylim() arguments in the barplot function to set the ranges of x and y axes. The distribution should look like this.
The mean of the distribution is 4 (= np = 100*0.04) which means on average, 4 people will die in the next month and that is why the distribution is concentrated around x = 4.
Let's answer some questions!
(a) What is the probability that exactly 3 out of a group of 100 policyholders will die in the next month?
Solution: The probability that exactly 3 people will die in the next month is P(X = 3). This can be computed using dbinom() function in R.
> dbinom(x = 3, size = 100, prob = 0.04)
[1] 0.1973329
Note here, since n (=100) is large and p (=0.04) is very small, we can approximate the Binomial distribution with a Poisson distribution and the parameter of the Poisson distribution would then be computed as λ = np = 100*0.04 = 4.
Thus, X ~ Poi(4).
Let's again compute the probability of 3 people dying in the next month, this time using the Poisson approximation.
> dpois(x = 3, lambda = 4)
[1] 0.1953668
Look how close are both the probabilities. The Poisson distribution has provided a good approximation to the Binomial distribution. In today's world, computer programs (like R) are easily available to quickly and easily compute the binomial probabilities for large values of n but still, the Poisson approximation is widely used in practice when calculating binomial probabilities by hand.
Useful Resource: https://towardsdatascience.com/poisson-distribution-intuition-and-derivation-1059aeab90d
Thank you for reading the post. We hope you enjoyed it! Next, we’ll post about the other distributions like exponential, gamma, normal, lognormal, and beta. In the meantime, do you know any other actuarial applications of these distributions discussed above? Do let us know in the comments, if you enjoyed reading this post, we would be very grateful if you’d help it spread by sharing it with your friends and network. Thank you!
—Abhishek Goel and Bhavyadeep Gupta
We both are currently actuarial students and the above blog is to the best of our knowledge. We do not claim to be the experts in the field. We are just sharing our learning with others and this is an effort to help other students learn R. Your feedback and suggestions are welcomed.
Comments