While browsing some of the content on slashdot.org I came across a link to John Cook’s blog, which discussed the differences between scientists and programmers when it comes to developing code. The original article can be found here.
The key statement in his blog entry is:
“Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision. Scientists need to understand that prototype code may need a complete rewrite before it can be used in production.”
As a computational materials scientist extending my competency in software engineering, I fully agree with this statement. All too often, when graduate students, postdocs, and scientists leave their positions and pass their code on to colleagues, a tremendous gap in knowledge results. This is because understanding and using the software is closely tied to the person who developed it. Thus, significant one-on-one training is required to ensure that a new user fully understands the code and can make use of it.
What are the chances of you having the last name you currently have, rather than some other last name? How many children did your ancestors have, on average, to ensure that your surname didn’t become extinct, but instead continued to be passed down from generation to generation?
This is a review of previous work that has been done to answer, in part, these questions. I hope that it will be used as an educational source to teach others in an easy-to-digest manner. From my perspective, I get the opportunity to learn about a new topic while working in a new programming language, in this case Python. I chose Python specifically to gain experience with its straightforward plotting capabilities (from matplotlib). This will come at the cost of reduced performance compared with traditional scientific programming languages such as C++ or Fortran. I will do my best to use the numpy library for some performance boosts to the code where possible.
My contribution will be to translate the analytics into a readily understandable form and to develop a numerical Monte Carlo simulation whose results will be compared to the well-established analytical model. The process explored here is called the Galton-Watson process, which describes the manner in which a surname is passed down from generation to generation.
1 Galton-Watson Process
Let us propose the problem of finding the probability of extinction for a lineage of people, where we start in the 0th generation with 1 male parent. In the first generation there are $Z_1$ possible male offspring, where there is a distinct probability $p_k$ associated with having $k$ children, $k = 0, 1, 2, \dots$ If there are $Z_1$ children in the first generation, then in the second generation there will be $Z_2 = X_1 + X_2 + \dots + X_{Z_1}$ offspring, where $X_1, X_2, \dots, X_{Z_1}$ are independent random variables, each following the same distribution $p_k$. The situation described here is a branching stochastic process known as the Galton-Watson process.
Figure 1: A particular scenario showing the first 4 generations of the growth of the family of ‘Awesome’.
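To make the branching process described above concrete, here is a minimal sketch of how a Monte Carlo estimate of the extinction probability might be compared with the analytical value. It is not the final simulation code of this post. The function names, the cap on the population size, and the example offspring probabilities are all assumptions made purely for illustration. The analytical part relies on the standard result that iterating the generating function $G(s) = \sum_k p_k s^k$ from $s = 0$ for $n$ steps gives the probability that the line is extinct by generation $n$.

```python
import numpy as np


def simulate_extinction(offspring_probs, n_generations=30, n_trials=2000,
                        cap=1000, seed=0):
    """Monte Carlo estimate of P(lineage extinct within n_generations).

    offspring_probs[k] is the probability that one father has k sons.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(offspring_probs, dtype=float)
    kids = np.arange(len(probs))  # possible numbers of sons: 0, 1, 2, ...
    extinct = 0
    for _ in range(n_trials):
        z = 1  # generation 0: a single male ancestor
        for _ in range(n_generations):
            # Count how many of the z fathers have 0, 1, 2, ... sons,
            # then sum up the sons to get the size of the next generation.
            counts = rng.multinomial(z, probs)
            z = int(counts @ kids)
            if z == 0:
                extinct += 1
                break
            if z > cap:
                # Assumption: a line this large is treated as surviving,
                # which keeps the run time bounded.
                break
    return extinct / n_trials


def analytic_extinction(offspring_probs, n_generations=30):
    """P(extinct by generation n) from iterating G(s) = sum_k p_k s^k."""
    probs = np.asarray(offspring_probs, dtype=float)
    k = np.arange(len(probs))
    s = 0.0
    for _ in range(n_generations):
        s = float(np.sum(probs * s**k))
    return s


if __name__ == "__main__":
    # Hypothetical distribution: 25% no sons, 25% one son, 50% two sons.
    p = [0.25, 0.25, 0.5]
    print("Monte Carlo:", simulate_extinction(p))
    print("Analytical: ", analytic_extinction(p))
```

For the hypothetical distribution used here, the equation $s = G(s)$ has the roots $s = 0.5$ and $s = 1$, so the long-run extinction probability is 0.5, and both estimates should land close to that value. Drawing a whole generation at once with a multinomial sample, rather than looping over individual fathers, is one way to let numpy do most of the work.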