Background

…or, how to “do sums” with random components.

Estigator thinks of components - the tasks making up a project, or the costs making up a budget, for example - as random variables. In Estigator, they are independent (e.g. time taken to do task 1 does not affect task 2), and discrete (makes implementation easier).

The extract below is from Sums of Independent Random Variables - Chapter 7 of “Introduction to Probability” by Grinstead and Snell of Dartmouth College. This work was made redistributable under the GNU Free Documentation License and is available for download. Highlights have been added.

Suppose $X$ and $Y$ are two independent discrete random variables with distribution functions $m_1(x)$ and $m_2(x)$. Let $Z = X + Y$. We would like to determine the distribution function $m_3(x)$ of $Z$. To do this, it is enough to determine the probability that $Z$ takes on the value $z$, where $z$ is an arbitrary integer. The $Z = z$ if and only if $Y = z - k$. So the event $Z = z$ is the union of the pairwise disjoint events

$(X = k)$ and $(Y=z-k)$,

where $k$ runs over the integers. Since these events are pairwise disjoint, we have

$$P(Z=z) = \sum_{k=-\infty}^\infty P(X=k) \cdot P(Y=z-k)$$

Thus, we have found the distribution function of the random variable $Z$. This leads to the following definition.

Definition 7.1 Let $X$ and $Y$ be two independent integer-valued random variables, with distribution functions $m_1(x)$ and $m_2(x)$ respectively. Then the convolution of $m_1(x)$ and $m_2(x)$ is the distribution function $m_3=m_1 * m_2$ given by:

$$m_3(j) = \sum_{k} m_1(k) \cdot m_2(j-k)$$

for $j=...,-2,-1,0,1,2,...$ The function $m_3(x)$ is the distribution function of the random variable $Z=X+Y$.

It is easy to see that the convolution operation is communtative, and it is straightforward to show that it is also associative.

The highlighted formula - the discrete version of the convolution formula - can also be found on Wikipedia, which itself cites Susan Holmes, Sums of Random Variables: Statistics 11. Stanford (link).

This is the formula that lies at the heart of Estigator.

Estigator’s implementation

Suppose $S_n = X_1 + X_2 + … + X_n$ is the sum of $n$ independent discrete random variables. Let $m_k$ (defined on the integers) be the common distribution function of the random variable $X_k$. In Estigator, then, the user provides all the information for $m_k$, and the app’s job is to calculate $S_n$. The distribution function of $S_1$ is $m_1$ (think of the very simple case of a single-component project). Since we can write

$S_n = S_{n-1} + X_n$.

and we know the distribution function of $X_n$ is $m_n$, we can find the distribution function of $S_n$ by induction.

In other words, the distribution function of the sum of a series of random variables can be found by calculating the convolution of the distribution functions of the first two, then “convoluting” the result with the next variable’s distribution function, and so on until all the components have been “convoluted together”. Since convolution is commutative, it doesn’t matter what order we work in.

Estigator home