# Running big jobs without user interaction

Many sophisticated calculations take computers hours, days, weeks, or even longer to solve. For example, here is a script that estimates percolation curves (the fraction of random lattices through which an open path crosses from one side to the other) using Monte Carlo sampling. Save this file as perc.py.

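```python
__author__ = 'Timothy Reluga <treluga@psu.edu>'
__date__ = '2016.05.27'
__license__ = 'For personal use only, with author approval'

import sys
import numpy
import matplotlib.pyplot as plt

def find_bottom(A, i0=None):
    """Return how far (in columns) an open path starting in column 0 gets.

    Sites with A[i, j] == 1 block the path.  If i0 is None, every starting
    row is tried; a return value of n means the lattice percolates.
    """
    n = A.shape[1]
    if i0 is None:
        return max([find_bottom(A, i) for i in range(n)])
    # Each site is visited at most once, so the recursive search below can
    # nest up to about n*n calls deep; raise Python's default limit (1000).
    sys.setrecursionlimit(max(sys.getrecursionlimit(), n*n + 100))
    B = 0*A  # B[i, j] == 1 once site (i, j) has been visited

    def explore(i, j):
        """Depth-first search from site (i, j)."""
        if j >= n:
            return n           # stepped past the last column
        if B[i, j] == 1:
            return 0           # already visited
        B[i, j] = 1
        if A[i, j] == 1:
            return j           # blocked site
        k = j
        k = max(k, explore(i, j+1))      # next column
        if k == n:
            return k
        if j > 0:
            k = max(k, explore(i, j-1))  # previous column
            if k == n:
                return k
        if i > 0:
            k = max(k, explore(i-1, j))  # previous row
            if k == n:
                return k
        if i+1 < n:
            k = max(k, explore(i+1, j))  # next row
            if k == n:
                return k
        return k

    return explore(i0, 0)

def prob_perc(N=1000, p=0.5, n=40):
    """Estimate the fraction of random n-by-n lattices that percolate.

    N lattices are sampled; each site is 1 (occupied) with probability p.
    """
    d = [find_bottom(numpy.floor(numpy.random.rand(n, n) + p))
         for i in range(N)]
    return float(d.count(n)) / float(N)

def main():
    numsamples = 5000
    p_vals = numpy.linspace(0.3, 0.6, 30)
    n_set = [5, 10, 20, 50, 100]
    for n in n_set:
        result = numpy.array([prob_perc(N=numsamples, p=p, n=n) for p in p_vals])
        plt.plot(p_vals, result, '-')
    plt.legend(["N = %d" % n for n in n_set], loc='upper right')
    plt.xlabel('Probability a site is occupied ($p$)', fontsize=18)
    plt.ylabel('Fraction of lattices percolating through $h(p,N)$', fontsize=18)
    plt.show()

if __name__ == '__main__':
    main()
```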

### Saving data and figures

First, notice that as written this script is very slow, even though it is designed for interactive use. To test it quickly, greatly reduce the value of numsamples (this makes the figure less accurate, but much faster to produce).
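Why does a small numsamples make the figure inaccurate? Each point on a curve is the fraction of numsamples random lattices that percolate, so, treating each lattice as an independent yes/no trial, its statistical error is roughly the binomial standard error sqrt(h(1 - h)/numsamples), where h is the true percolation probability. The helper below is our own illustrative addition, not part of perc.py.

```python
import math

def mc_standard_error(h, numsamples):
    """Approximate standard error of a Monte Carlo estimate of a probability.

    h is the (estimated) fraction of percolating lattices and numsamples is
    the number of independent lattices used to estimate it (binomial model).
    """
    return math.sqrt(h * (1.0 - h) / numsamples)

# The error is largest at h = 0.5, so compare sample sizes there.
for numsamples in (10, 100, 5000):
    print(numsamples, round(mc_standard_error(0.5, numsamples), 4))
```

Because the error shrinks like 1/sqrt(numsamples), a run with 100 times fewer samples is about 10 times noisier, which is why a quick test figure looks rough.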

Next, make the script better suited to a background calculation with the following changes.

• Change the script so that main finishes all of the calculations before any plotting starts. You can do this by storing each result array in a dictionary, keyed by the value of n for which it was computed.

• Use the numpy function vstack to combine all of the results into a single array, and save this array to a text file for future use with the function savetxt.

• Move the plotting commands to a new script that loads the text file you created above with loadtxt, draws the figure, and saves it with savefig.

• When running a script in the background, it is very useful to be able to specify parameter values from the command line. The Python module sys has a variable called argv, which is a list of the command-line arguments of a Python script executed from the shell (sys.argv[0] is the script's own name, so the first real argument is sys.argv[1]). Use sys.argv to change perc.py so that it takes numsamples as an integer command-line argument. For example, we would like

```console
$ python perc.py 10
```

to run with numsamples = 10.
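One possible way to make the first, second, and fourth changes is sketched below. The helper names parse_numsamples, results_to_array, and save_results, the file name perc_data.txt, the default of 5000 samples, and the file layout (p_vals in the first row, one row per n, and the n values recorded in the header line that savetxt writes) are our own assumptions, not the only reasonable choices.

```python
import sys
import numpy

def parse_numsamples(argv=None, default=5000):
    """Return numsamples from the first command-line argument, if given.

    "python perc.py 10" gives 10; with no argument, the default is used.
    """
    if argv is None:
        argv = sys.argv
    if len(argv) > 1:
        return int(argv[1])
    return default

def results_to_array(results, p_vals, n_set):
    """Stack p_vals and one row of results per lattice size into one array.

    results maps each n in n_set to its array of percolation fractions.
    Row 0 of the output is p_vals; row k is results[n_set[k - 1]].
    """
    return numpy.vstack([p_vals] + [results[n] for n in n_set])

def save_results(results, p_vals, n_set, filename='perc_data.txt'):
    """Write the stacked array with savetxt; the header line lists n_set."""
    data = results_to_array(results, p_vals, n_set)
    header = ' '.join('%d' % n for n in n_set)
    numpy.savetxt(filename, data, header=header)

# In perc.py, main() could then compute everything before saving it
# (prob_perc is the function already defined in perc.py):
#
# def main():
#     numsamples = parse_numsamples()
#     p_vals = numpy.linspace(0.3, 0.6, 30)
#     n_set = [5, 10, 20, 50, 100]
#     results = {}
#     for n in n_set:
#         results[n] = numpy.array([prob_perc(N=numsamples, p=p, n=n)
#                                   for p in p_vals])
#     save_results(results, p_vals, n_set)
```

Because savetxt prefixes the header with "# ", loadtxt skips that line when the file is read back, while a plotting script can still parse it to recover the n values.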

Now you can change your plot easily, for example to fix an axis label, without recalculating all of your data.
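A separate plotting script could look something like this sketch. It assumes the data file was written by numpy.savetxt with a header line listing the n values (for example "# 5 10 20 50 100"), p_vals in the first row, and one row of results per n; the names load_results and make_plot and the file names perc_data.txt and perc_plot.png are hypothetical. The optional title argument lets you add a title without rerunning perc.py.

```python
import numpy
import matplotlib
matplotlib.use('Agg')  # draw to files only; no display is needed
import matplotlib.pyplot as plt

def load_results(filename):
    """Return (n_set, p_vals, curves) from a file written by numpy.savetxt."""
    with open(filename) as f:
        # The header line looks like "# 5 10 20 50 100"; loadtxt skips it.
        n_set = [int(s) for s in f.readline().lstrip('#').split()]
    data = numpy.loadtxt(filename, ndmin=2)
    return n_set, data[0], data[1:]

def make_plot(filename='perc_data.txt', figname='perc_plot.png', title=None):
    """Redraw the percolation curves from saved data and save the figure."""
    n_set, p_vals, curves = load_results(filename)
    plt.figure()
    for curve in curves:
        plt.plot(p_vals, curve, '-')
    plt.legend(['N = %d' % n for n in n_set], loc='upper right')
    plt.xlabel('Probability a site is occupied ($p$)', fontsize=18)
    plt.ylabel('Fraction of lattices percolating through $h(p,N)$', fontsize=18)
    if title is not None:
        plt.title(title)
    plt.savefig(figname)
    plt.close()
```

For example, the plotting script could end with make_plot(title='Percolation estimates'), and you can rerun it as often as you like while the data file stays unchanged.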

The other advantage is that you can run the calculation in the background, on your own computer or on another one, and simply retrieve the data file when it is done to explore the result. Try this in the terminal by changing (cd) to the directory containing perc.py and running

```console
$ python perc.py 3
```

Using the time shell command, run perc.py with a few small values in place of ? below, and use those timings to estimate how large numsamples has to be for the program to take about 8 minutes to run.

```console
$ time python perc.py ?
```
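Because each sample is an independent lattice, the run time should grow roughly linearly with numsamples, plus a fixed start-up cost for Python and its imports. Under that assumption (check it against your own timings), a straight-line fit through a few small timed runs gives an estimate. The function name estimate_numsamples and the example timings are made up for illustration; substitute the "real" times that time reports.

```python
import numpy

def estimate_numsamples(samples, seconds, target_seconds=8 * 60):
    """Extrapolate numsamples for a target run time from small timed runs.

    Fits seconds ~= startup + per_sample * numsamples by least squares and
    solves for the numsamples that would take target_seconds.
    """
    per_sample, startup = numpy.polyfit(samples, seconds, 1)
    return int(round((target_seconds - startup) / per_sample))

# Hypothetical timings for runs with numsamples = 10, 20, and 40.
print(estimate_numsamples([10, 20, 40], [3.0, 5.0, 9.0]))
```

Fitting a line instead of scaling a single run keeps the start-up cost from inflating the per-sample estimate.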

Then run perc.py with that value of numsamples, adding the & character at the end of the command so that the job runs in the terminal's background. Once it completes, plot the resulting data and inspect the figure: the curves should be much smoother than they were with small numbers of samples.

Add a title to the figure without rerunning perc.py.

### Shell scripts for batch runs

Another advantage of running a script from the terminal is that you can configure each run from the command line and keep a shell script that documents your configuration, without having to change the code every time.

1. Modify perc.py so that it takes two arguments, the first being numsamples and the second being a value for n. Use the %d string-formatting trick to create a unique file name based on the parameters you pass in, and save the resulting data to that file.

2. Write a shell script that does the calculations for each value of n in the set {2, 4, 8, 16, 32}.
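For step 1, the argument handling and file naming in perc.py might look like the sketch below. The helper names parse_args and data_filename and the file-name pattern perc_<numsamples>_<n>.txt are hypothetical choices; any pattern works as long as each combination of parameters gets its own file.

```python
import sys

def parse_args(argv=None):
    """Return (numsamples, n) from the command line: perc.py numsamples n."""
    if argv is None:
        argv = sys.argv
    if len(argv) != 3:
        raise SystemExit('usage: python perc.py numsamples n')
    return int(argv[1]), int(argv[2])

def data_filename(numsamples, n):
    """Build a file name that is unique for each (numsamples, n) pair."""
    return 'perc_%d_%d.txt' % (numsamples, n)
```

In perc.py, main could then start with numsamples, n = parse_args() and pass data_filename(numsamples, n) to savetxt, so runs with different parameters never overwrite each other.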

One very powerful trick the shell can do is run several copies of your program at once, so that all of your CPU cores are in use (Python does NOT do this automatically).

1. Find out how many processor cores your workstation has. Store this number in a shell variable called processors.

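2. Store an appropriate value in a shell variable called numsamples, then run the following command. The -n 1 option makes xargs start a separate perc.py run for each value of n, and -P lets up to $processors of those runs execute at the same time.

```sh
echo "2 4 8 16 32" | xargs -n 1 -P "$processors" python perc.py "$numsamples"
```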
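If you ever want the same fan-out from Python instead of xargs, the standard library can do it. The sketch below is an alternative we are adding, not part of the exercise: it uses os.cpu_count to find the number of cores and a concurrent.futures.ThreadPoolExecutor to keep that many perc.py processes running at once. Threads are enough here because each one only waits on a separate operating-system process. The function names are hypothetical, and the sketch assumes perc.py takes numsamples and n as its two arguments.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def perc_commands(numsamples, n_values, script='perc.py'):
    """Build one command line per lattice size, as xargs -n 1 would."""
    return [[sys.executable, script, str(numsamples), str(n)] for n in n_values]

def run_parallel(commands, processors=None):
    """Run the commands, at most `processors` at a time; return exit codes."""
    if processors is None:
        processors = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=processors) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

For example, run_parallel(perc_commands(1000, [2, 4, 8, 16, 32])) mirrors the xargs command; the returned exit codes come back in the same order as the commands.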

If you open a separate terminal window, you can monitor your memory and CPU usage with top.

## Tracking and sharing your work

Another very helpful tool for writing software is a version control system, which tracks the changes you make to a program and can be used to recover old versions and to merge in changes that other people make. Earlier systems included CVS and then Subversion, but today Git is widely popular.