## The Galton Watson Process – Part I

What are the chances of you having the last name you currently have, over that of some other last name? How many children did your ancestors have, on average, in order to ensure that your surname didn’t become extinct, but instead continued to be passed down from generation to generation?

This is a review of previous work that has been done in order to answer, in part, these questions. I hope that this will be used as an educational source to teach others in an easy to digest manner. From my perspective, I get the opportunity to learn about a new topic while working in a new programming language, in this case Python. I chose Python specifically to gain experience with straightforward plotting capabilities (from matlibplot). This will come at the cost of reduced performance over using traditional scientific programming languages such as C++ or Fortran. I will do my best to use the numpy library for some performance boosts to the code where possible.

My contribution will be to translate the analytics into a readily understandable mode and to develop a numerical Monte Carlo simulation software which will compare the results to the well-established analytical model. The process that is explored here is called the Galton-Watson process which describes the manner in which a surname is passed down from generation to generation.

1 Galton-Watson Process

Let us propose the problem of finding the probability of extinction for a lineage of people, where we start in the 0th generation with 1 male parent. In the first generation there are $0,1,2,3,\dots$ possible male offspring where there are distinct probabilities associated with having a certain number of kids. If there are $k$ children in the first generation then in the second generation there will be $C_1 + C_2 + \dots + C_k$ offspring where $C_1, C_2, \dots , C_k$ are independent random variables each with identical distribution probabilities. The situation described here is a branching stochastic process known as the Galton-Watson process.

Figure 1: A particular scenario showing the first 4 generations of the growth of the family of ‘Awesome’.

Specifically, the assumptions here are that the family sizes are independent of each other and can take on values from $0$ to $\infty$. Additionally the families are all assumed to have identical distribution probabilities and that only males pass the name from generation to generation. The probability that the number of children, $C$, born by a single parent is equal to some number $k$ is written as $P(C=k)$ where $k=0,1,2,\dots,\infty$. For all values of $k$ when the probabilities are discrete this is known as a probability mass function. Furthermore, we can use the number of children born at each generation as input to determine

$X_n \, =$ number of children born at time $n$           (1)
(i.e. the total size of the $n$th generation)

From here we can visualize a single possible path that $X_n$ might take which is shown in Fig.1. Suppose someone is named Baron Awesome and he is the first of the lineage of Awesomes. He then goes on to have 3 children all of which have the surname `Awesome’ as well. These children then in turn have some finite integer number of children of their own. In this case, if the trend continues it will give rise to exponential growth of a populace of Mr. and Ms. Awesome. To go any further we will have to assume some type of distribution of children each parent might have. In other words, it is pretty likely that a couple might have one or two kids, but highly unlikely someone will have 1000 children. In order to introduce this consideration into our model we turn to the use of probability generating functions.