Statistics Toolbox

**Negative Binomial Distribution**

The following sections provide an overview of the negative binomial distribution.

- Background of the Negative Binomial Distribution
- Definition of the Negative Binomial Distribution
- Parameter Estimation for the Negative Binomial Distribution
- Example and Plot of the Negative Binomial Distribution

**Background of the Negative Binomial Distribution**

In its simplest form, the negative binomial distribution models the number of failures observed before a specified number of successes is reached in a series of independent, identical trials. Equivalently, up to a shift, it models the total number of trials required to reach a specified number of successes, which motivates its name as the "inverse" of the binomial distribution: the binomial fixes the number of trials and counts successes, whereas the negative binomial fixes the number of successes and counts trials. Its parameters are the probability of success in a single trial, $p$, and the number of successes, $r$. A special case of the negative binomial distribution, when $r = 1$, is the geometric distribution, which models the number of failures before the first success. (The general integer-$r$ case is also known as the Pascal distribution.)
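The geometric special case can be checked numerically. The following is a minimal sketch, assuming the Statistics Toolbox functions `nbinpdf` and `geopdf` are available; the value of `p` and the evaluation range are arbitrary illustrative choices, not taken from the text above.

```matlab
% Sketch: with the second argument (r) set to 1, the negative binomial pdf
% should coincide with the geometric pdf for the same p.
p = 0.3;                 % illustrative single-trial probability (assumed value)
x = 0:10;                % counts at which both pdfs are evaluated
yNB  = nbinpdf(x,1,p);   % negative binomial with r = 1
yGeo = geopdf(x,p);      % geometric distribution
maxDiff = max(abs(yNB - yGeo))   % expected to be at floating-point round-off level
```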

More generally, the parameter $r$ can take on non-integer values. This form of the negative binomial has no interpretation in terms of repeated trials but, like the Poisson distribution, it is useful in modelling count data. It is more general than the Poisson because its variance is greater than its mean, which often makes it suitable for count data that do not meet the assumptions of the Poisson distribution. In the limit, as $r$ increases to infinity with the mean held fixed, the negative binomial distribution approaches the Poisson distribution.
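The variance claim and the Poisson limit can be made concrete with the standard moment formulas. This is a short worked sketch, assuming the parameterization used by `nbinpdf` and `nbinstat` (a pdf proportional to $p^r (1-p)^x$); the symbol $\mu$ is introduced here only for illustration.

```latex
\begin{aligned}
\mathrm{E}[X] &= \frac{r(1-p)}{p}, \qquad
\mathrm{Var}[X] = \frac{r(1-p)}{p^{2}} = \frac{\mathrm{E}[X]}{p} > \mathrm{E}[X]
\quad (0 < p < 1), \\
\text{with } \mu &= \mathrm{E}[X] \text{ held fixed:} \quad
p = \frac{r}{r+\mu}, \qquad
\mathrm{Var}[X] = \mu\left(1 + \frac{\mu}{r}\right) \;\longrightarrow\; \mu
\quad \text{as } r \to \infty .
\end{aligned}
```

The variance therefore exceeds the mean by the factor $1/p$, and the excess vanishes as $r$ grows, which matches the equal mean and variance of the Poisson distribution.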

**Definition of the Negative Binomial Distribution**

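When the parameter $r$ is an integer, the negative binomial pdf is

$$f(x \mid r, p) = \binom{r+x-1}{x}\, p^{r} q^{x}, \qquad x = 0, 1, 2, \ldots$$

where $q = 1 - p$. When $r$ is non-integer, the binomial coefficient in the definition of the pdf is replaced by the equivalent expression

$$\frac{\Gamma(r+x)}{\Gamma(r)\,\Gamma(x+1)}$$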
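For non-integer r, the pdf can be evaluated by hand from the gamma-function form Γ(r+x) / (Γ(r) Γ(x+1)) multiplied by p^r (1-p)^x. The following is a minimal sketch comparing that manual evaluation with `nbinpdf`; the parameter values are arbitrary illustrative choices, and working on the log scale with `gammaln` is a design choice made here to avoid overflow, not something prescribed by the text.

```matlab
% Sketch: evaluate the negative binomial pdf for non-integer r using the
% gamma-function form of the coefficient, then compare with nbinpdf.
r = 2.5;  p = 0.4;       % illustrative parameter values (assumed)
x = 0:8;
% log of Gamma(r+x) / (Gamma(r) * Gamma(x+1)), computed stably with gammaln
logCoef  = gammaln(r + x) - gammaln(r) - gammaln(x + 1);
yManual  = exp(logCoef + r*log(p) + x*log(1 - p));
yToolbox = nbinpdf(x,r,p);
maxDiff  = max(abs(yManual - yToolbox))   % expected to be at round-off level
```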

**Parameter Estimation for the Negative Binomial Distribution**

Suppose you are collecting data on the number of auto accidents on a busy highway, and would like to be able to model the number of accidents per day. Because these are count data, and because there are a very large number of cars and a small probability of an accident for any specific car, you might think to use the Poisson distribution. However, the probability of having an accident is likely to vary from day to day as the weather and amount of traffic change, and so the assumptions needed for the Poisson distribution are not met. In particular, the variance of this type of count data sometimes exceeds the mean by a large amount. The data below exhibit this effect: most days have few or no accidents, and a few days have a large number.

    accident = [2 3 4 2 3 1 12 8 14 31 23 1 10 7 0];
    mean(accident)
    ans =
        8.0667
    var(accident)
    ans =
       79.352

The negative binomial distribution is more general than the Poisson, and is often suitable for count data when the Poisson is not. The function `nbinfit` returns the maximum likelihood estimates (MLEs) and confidence intervals for the parameters of the negative binomial distribution. Here are the results from fitting the accident data above:
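A call along the following lines produces the estimates. This is a sketch rather than a reproduction of the original listing: the name `phat` follows the plotting code later in this section, while `pci` is a name assumed here for the confidence-interval output. No numeric output is shown because it depends on running the fit.

```matlab
accident = [2 3 4 2 3 1 12 8 14 31 23 1 10 7 0];
% phat(1) is the estimate of r and phat(2) the estimate of p;
% pci holds the lower and upper confidence bounds, one column per parameter.
[phat,pci] = nbinfit(accident)
```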

In this case it is difficult to give the individual parameters a physical interpretation. However, the estimated parameters can be used in a model for the number of daily accidents. For example, a plot of the estimated cumulative distribution function shows that while there is an estimated 10% chance of no accidents on a given day, there is also about a 10% chance that there will be 20 or more accidents.

    plot(0:50,nbincdf(0:50,phat(1),phat(2)),'.-');
    xlabel('Accidents per Day')
    ylabel('Cumulative Probability')
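The two probabilities quoted above can also be computed directly instead of being read off the plot. The following is a minimal sketch, assuming `phat` comes from `nbinfit` on the accident data; the names `pNone` and `p20plus` are introduced here for illustration only.

```matlab
accident = [2 3 4 2 3 1 12 8 14 31 23 1 10 7 0];
phat = nbinfit(accident);                     % phat = [r_hat p_hat]
pNone   = nbinpdf(0,phat(1),phat(2))          % estimated P(no accidents in a day)
p20plus = 1 - nbincdf(19,phat(1),phat(2))     % estimated P(20 or more accidents)
```

Both values should come out near 0.1 if the fit matches the description in the text.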

**Example and Plot of the Negative Binomial Distribution**

The negative binomial distribution can take on a variety of shapes ranging from very skewed to nearly symmetric. This example plots the probability function for different values of `r`, the desired number of successes: 0.1, 1, 3, and 6, with `p` fixed at 0.5.

    x = 0:10;
    plot(x,nbinpdf(x,.1,.5),'s-', ...
         x,nbinpdf(x,1,.5),'o-', ...
         x,nbinpdf(x,3,.5),'d-', ...
         x,nbinpdf(x,6,.5),'^-');
    legend({'r = .1' 'r = 1' 'r = 3' 'r = 6'})
    xlabel('x')
    ylabel('f(x|r,p)')
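The change in shape can be quantified as well as plotted. The following is a rough sketch that estimates the skewness for each value of `r` by summing over a truncated support; the truncation point of 200 and the variable names are assumptions made here for illustration, chosen so that the neglected tail mass is negligible when `p` is 0.5.

```matlab
rVals = [0.1 1 3 6];  p = 0.5;
x = 0:200;                        % truncated support (assumed wide enough here)
skew = zeros(size(rVals));
for k = 1:numel(rVals)
    f = nbinpdf(x,rVals(k),p);
    m = sum(x .* f);              % mean
    v = sum((x - m).^2 .* f);     % variance
    skew(k) = sum((x - m).^3 .* f) / v^1.5;   % standardized third moment
end
skew                              % should decrease as r increases
```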
