## The Galton-Watson Process – Part II

4 Probability of Extinction

The probability that the process is extinct by the ${n}$-th generation is given by

$\displaystyle e_n = P(X_n = 0) \ \ \ \ \ (9)$

The process starts with exactly one parent (i.e. ${X_0 = 1}$), who is the ancestor of all future children, so ${P(X_0 = 1) = 1}$. Because ${X_0}$ cannot take any other value, ${P(X_0 = i) = 0}$ for ${i \ne 1}$. The probability generating function of the total number of people in generation ${n}$ (i.e. ${X_n}$) is given by:

$\displaystyle G_n(s) = \sum_{i=0}^{\infty} P(X_n=i)s^i \ \ \ \ \ (10)$

Substituting the initial probabilities for ${n = 0}$ into this definition, we see that

$\displaystyle G_0(s) = P(X_0=0)s^0 + P(X_0=1)s^1 + P(X_0=2)s^2 + P(X_0=3)s^3 + \dots \ \ \ \ \ (11)$

$\displaystyle \phantom{G_0(s)} = 0s^0 + 1s^1 + 0s^2 + 0s^3 + \dots$

$\displaystyle \phantom{G_0(s)} = s$

Thus we get the boundary condition ${G_0 (s) = s}$. If we define ${G(s)}$ analogously as the probability generating function of the number of children, ${C}$, in a single family of the lineage, we have:

$\displaystyle G(s) = \sum_{k=0}^{\infty} P(C=k)s^k \ \ \ \ \ (12)$
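As a small illustration of definition (12), the sketch below evaluates ${G(s)}$ for an offspring distribution with finitely many possible family sizes. The function name `pgf` and the example probabilities are assumptions made for illustration only; they do not come from the text.

```python
def pgf(probs, s):
    """Evaluate G(s) = sum_k P(C = k) * s**k.

    probs[k] is P(C = k); the list is assumed to be finite and to sum to 1.
    """
    return sum(p * s**k for k, p in enumerate(probs))


# Hypothetical offspring distribution: a family has 0, 1 or 2 children.
offspring = [0.25, 0.5, 0.25]

print(pgf(offspring, 1.0))  # G(1): the probabilities summed
print(pgf(offspring, 0.0))  # G(0): only the k = 0 term survives, i.e. P(C = 0)
```

Evaluating at ${s = 0}$ picks out ${P(C = 0)}$, which is the same mechanism that links ${G_n(0)}$ to ${e_n}$ in (9) and (10).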

A relation between ${G(s)}$ and ${G_n(s)}$ is already visible in the earliest generations. The first generation consists exactly of the children of the single original parent, so ${X_1}$ has the same distribution as ${C}$, and the probability generating function of the first generation, ${G_1(s)}$, coincides with that of the children. Thus, ${G_1(s) = G(s)}$.

A later generation may consist of several families, one for each parent in the previous generation, and the number of children may differ from family to family. To keep track of this, we write ${C_i}$ for the number of children in family ${i}$. The total number of children across all families is then the size of the entire generation, ${X_n}$. In other words:

$\displaystyle X_n = C_1 + C_2 + \dots + C_{X_{n-1}} \ \ \ \ \ (13)$
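The random sum in (13) can be sketched as a short simulation. This is a minimal sketch, assuming a finite offspring distribution given as a list of probabilities; the names `next_generation` and `simulate` and the example distribution are hypothetical.

```python
import random


def next_generation(x_prev, probs, rng):
    """Draw X_n = C_1 + ... + C_{x_prev}, each C_i i.i.d. with P(C = k) = probs[k]."""
    if x_prev == 0:
        return 0  # no parents, so no children
    family_sizes = rng.choices(range(len(probs)), weights=probs, k=x_prev)
    return sum(family_sizes)


def simulate(n, probs, seed=0):
    """Return [X_0, X_1, ..., X_n] for one run, starting from X_0 = 1."""
    rng = random.Random(seed)
    x = 1
    history = [x]
    for _ in range(n):
        x = next_generation(x, probs, rng)
        history.append(x)
    return history


# Hypothetical offspring distribution: 0, 1 or 2 children per family.
print(simulate(10, [0.25, 0.5, 0.25]))
```

Once a generation has size zero, every later generation is empty as well, which is the extinction event whose probability ${e_n}$ measures.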

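Since the ${C_i}$ are independent and identically distributed random variables, ${X_n}$ is the sum of a random number of independent and identically distributed random variables. Given ${X_{n-1} = j}$, the sum of ${j}$ independent copies of ${C}$ has probability generating function ${G(s)^j}$, so ${G_n(s) = \sum_{j=0}^{\infty} P(X_{n-1}=j) \, G(s)^j}$. Comparing with (10), this is ${G_{n-1}}$ evaluated at ${G(s)}$, which gives the relation: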

$\displaystyle G_n (s) = G_{n-1} \left( G(s) \right) \ \ \ \ \ (14)$
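Relation (14), together with the boundary condition ${G_0(s) = s}$, is enough to evaluate ${G_n(s)}$ numerically. The sketch below does this recursively and then reads off ${e_n = P(X_n = 0) = G_n(0)}$, which follows from (9) and (10). The function names and the offspring distribution are assumptions chosen for illustration.

```python
def pgf(probs, s):
    """G(s) = sum_k P(C = k) * s**k for a finite offspring distribution."""
    return sum(p * s**k for k, p in enumerate(probs))


def pgf_generation(n, probs, s):
    """Evaluate G_n(s) using G_n(s) = G_{n-1}(G(s)) with G_0(s) = s."""
    if n == 0:
        return s
    return pgf_generation(n - 1, probs, pgf(probs, s))


# Hypothetical offspring distribution: 0, 1 or 2 children per family.
offspring = [0.25, 0.5, 0.25]

# e_n = G_n(0) is the probability of extinction by generation n.
for n in range(6):
    print(n, pgf_generation(n, offspring, 0.0))
```

For ${n = 1}$ the recursion returns ${G(0) = P(C = 0)}$, consistent with ${G_1(s) = G(s)}$.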