## Science vs. Software

While browsing slashdot.org, I came across a link to John Cook’s blog, which discussed how scientists and programmers differ in the way they develop code. The original article can be found here.

The key statement in his blog entry is

“Programmers need to understand that sometimes a program really only needs to run once, on one set of input, with expert supervision. Scientists need to understand that prototype code may need a complete rewrite before it can be used in production.”

As a computational materials scientist extending my competency in software engineering, I fully agree with this statement. All too often, when graduate students, postdocs, and scientists leave their positions and pass their code on to colleagues, a tremendous gap in knowledge results. This is because the comprehension and use of the software are intricately linked to the person who developed it. Significant one-on-one training is therefore required to ensure that a new user fully understands the code and can make use of it.

Information about the formats, inputs, outputs, assumptions, and constraints of the code is often poorly documented, if it is documented at all. As a result, the scientist receiving the code may find it necessary to completely rewrite the program in order to understand it. This is extremely inefficient: they are essentially re-solving a (sometimes very complex) problem while holding someone else’s solution in their hands!

In my opinion, this time would be better spent on finding solutions to unresolved issues. Often, new problems can be solved by making simple modifications to preexisting code. This may even apply to prototype code originally developed to be run only once. Unfortunately, because of the way scientific code is generally written, making even slight tweaks requires a full understanding of all of its details.

As computational engineering moves forward, this issue must be addressed more carefully.  In the classroom, emphasis must be placed on teaching the fundamentals of code re-usability and methods of transferring electronic intellectual property, such as code, from one person to the next. Projects that require students to work in teams are commonplace, but colleges should take this one step further and encourage different teams to collaborate.

For example, we may consider the task of solving the wave equation:

$\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u$

where $t$ is time, $c$ is a constant, and $u$ is a scalar field which represents the wave. In this example, Team 1 might be in charge of developing software for calculating the Laplacian, $\nabla^2 u$. In doing this they will likely have to consider issues such as the dimensionality of the derivatives used as input. Because it is unknown whether the problem is 1D, 2D, or 3D, they will be forced to spend more time planning how to develop their code for consistency and usability. Team 2 might be in charge of input and output. This team would have the task of creating standardized formats, which may address constraints such as hard drive space. A third team could then be in charge of numerically implementing a scheme to integrate $u$ in time, and may have to work with Team 1 to solve issues of numerical stability. Team 4 could develop quantitative tools to analyze the results, for instance by taking the Fourier transform of $u$ to determine the frequencies, given some initial condition and boundary condition. To achieve this, Team 4 would have to work closely with Team 2. The results of Teams 1 through 4 could then be checked by a fifth team in charge of visualizing an analytical solution to the same (or a similar) problem. This might require deriving the analytical solution in some limit, and then programming with OpenGL or some other graphics library to produce 2D or 3D pictures and plots.
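To make the division of labor concrete, here is a minimal sketch of how the pieces from Team 1 and Team 3 might fit together. Everything in it is an assumption made for illustration: the function names (`laplacian_1d`, `step_leapfrog`, `simulate`) are hypothetical, the problem is restricted to 1D with periodic boundaries, the wave starts at rest, and the time integrator is a simple explicit leapfrog scheme. A real Team 1 would still have to generalize the Laplacian to 2D and 3D.

```python
import math


def laplacian_1d(u, dx):
    """Team 1 (hypothetical): second-order central difference, periodic boundaries."""
    n = len(u)
    return [(u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n]) / dx**2
            for i in range(n)]


def step_leapfrog(u_prev, u_curr, c, dt, dx, laplacian=laplacian_1d):
    """Team 3 (hypothetical): one explicit step of u_tt = c^2 * laplacian(u).

    The Laplacian is passed in as a function, so Team 3 depends only on
    its call signature, not on how Team 1 implemented it.
    """
    lap = laplacian(u_curr, dx)
    return [2.0 * uc - up + (c * dt) ** 2 * l
            for up, uc, l in zip(u_prev, u_curr, lap)]


def simulate(u0, c, dt, dx, n_steps, laplacian=laplacian_1d):
    """Advance u0 by n_steps time steps, assuming zero initial velocity."""
    if n_steps == 0:
        return list(u0)
    # First step from a Taylor expansion with du/dt = 0 at t = 0.
    lap = laplacian(u0, dx)
    u_prev = list(u0)
    u_curr = [u + 0.5 * (c * dt) ** 2 * l for u, l in zip(u0, lap)]
    for _ in range(n_steps - 1):
        u_prev, u_curr = u_curr, step_leapfrog(u_prev, u_curr, c, dt, dx, laplacian)
    return u_curr


if __name__ == "__main__":
    n = 100
    dx = 1.0 / n
    # Standing wave on a periodic domain of length 1.
    u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(n)]
    # c * dt / dx = 0.5, which keeps this explicit scheme stable.
    u = simulate(u0, c=1.0, dt=0.005, dx=dx, n_steps=200)
    print(max(abs(a - b) for a, b in zip(u, u0)))
```

The stability issue mentioned above shows up here as the ratio $c\,\Delta t / \Delta x$: for this particular scheme it must not exceed 1, which is exactly the kind of constraint Teams 1 and 3 would need to agree on and document.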

Some of the intellectual tools necessary to achieve compartmentalization of the tasks lie in the foundations of object-oriented programming. For example, abstraction allows programmers to hide parts of the data and implementation of programs, thereby creating a black box. This is a crucial difference from the paradigm of traditional scientific programming, where the code is completely transparent. The use of self-contained objects will help to increase re-usability of code among different people. However, making code extensible requires more than simply using object-oriented principles such as inheritance. Proper documentation and use of best practices (e.g. specific naming rules for variables, appropriate commenting, and consistent code logic) are necessary to allow other programmers to quickly comprehend the code.
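As a rough illustration of the black-box idea, the sketch below defines an abstract interface for the Laplacian from the wave equation example and two interchangeable implementations behind it. The class and function names (`Laplacian`, `PeriodicLaplacian1D`, `FixedEndsLaplacian1D`, `acceleration`) are hypothetical, and the 1D finite-difference stencil is just one assumed choice among many.

```python
from abc import ABC, abstractmethod


class Laplacian(ABC):
    """The documented contract: callers see only apply(), never the stencil."""

    @abstractmethod
    def apply(self, u, dx):
        """Return the discrete Laplacian of the list u with grid spacing dx."""


class PeriodicLaplacian1D(Laplacian):
    """Central difference where the grid wraps around at both ends."""

    def apply(self, u, dx):
        n = len(u)
        return [(u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n]) / dx**2
                for i in range(n)]


class FixedEndsLaplacian1D(Laplacian):
    """Central difference where u is taken to be zero outside the grid."""

    def apply(self, u, dx):
        n = len(u)
        padded = [0.0] + list(u) + [0.0]
        return [(padded[i] - 2.0 * padded[i + 1] + padded[i + 2]) / dx**2
                for i in range(n)]


def acceleration(u, c, dx, laplacian):
    """Right-hand side of the wave equation; works with any Laplacian object."""
    return [c**2 * value for value in laplacian.apply(u, dx)]


if __name__ == "__main__":
    u = [1.0, 0.0, 0.0, 0.0]
    print(acceleration(u, c=1.0, dx=1.0, laplacian=PeriodicLaplacian1D()))
    print(acceleration(u, c=1.0, dx=1.0, laplacian=FixedEndsLaplacian1D()))
```

The caller of `acceleration` can swap boundary conditions without reading either implementation, but only because the docstrings state what each class assumes. The abstraction hides the details; the documentation is what makes the hidden details safe to ignore.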

Coming from a scientific (and not a programming) background, I was never formally taught any of these principles.  However, my own experiences and discussions with other people in the same field point to the necessity of filling in these gaps as computational engineering continues to grow as a discipline.