For many sophisticated calculations, it can take a computer hours, days, weeks, or even longer to compute a solution. For example, here is a script that estimates percolation-probability curves using Monte Carlo integration. Save this file as `perc.py`.

```
__author__ = 'Timothy Reluga <treluga@psu.edu>'
__date__ = '2016.05.27'
__copyright__ = 'Copyright (c) 2016'
__license__ = 'For personnel use only, with author approval'

import sys
import numpy
import matplotlib.pyplot as plt

# the flood-fill below recurses once per site along a path, so raise
# the recursion limit to handle the larger lattices
sys.setrecursionlimit(100000)


def find_bottom(A, i0=None):
    """Greatest depth reached searching from row i0 (all rows if None)."""
    n = A.shape[1]
    if i0 is None:
        return max(find_bottom(A, i) for i in range(n))
    B = 0 * A  # marks sites already visited

    def search(i, j):
        if j >= n:
            return n
        if B[i, j] == 1:  # already visited
            return 0
        B[i, j] = 1
        if A[i, j] == 1:  # the search stops at this site
            return j
        k = j
        k = max(k, search(i, j + 1))
        if k == n:
            return k
        if j > 0:
            k = max(k, search(i, j - 1))
            if k == n:
                return k
        if i > 0:
            k = max(k, search(i - 1, j))
            if k == n:
                return k
        if i + 1 < n:
            k = max(k, search(i + 1, j))
            if k == n:
                return k
        return k

    return search(i0, 0)


def prob_perc(N=1000, p=0.5, n=40):
    """Estimate the fraction of random n-by-n lattices that percolate."""
    d = [find_bottom(numpy.floor(numpy.random.rand(n, n) + p))
         for _ in range(N)]
    return float(d.count(n)) / float(N)


def main():
    numsamples = 5000
    p_vals = numpy.linspace(0.3, 0.6, 30)
    n_set = [5, 10, 20, 50, 100]
    for n in n_set:
        result = numpy.array([prob_perc(N=numsamples, p=p, n=n)
                              for p in p_vals])
        plt.plot(p_vals, result, '-')
    plt.legend(["N = %d" % n for n in n_set], loc='upper right')
    plt.xlabel('Probability a site is occupied ($p$)', fontsize=18)
    plt.ylabel('Fraction of lattices percolating through $h(p,N)$', fontsize=18)
    plt.show()


main()
```

First, note that this script is very slow, even though it is designed for interactive use. We can test it more quickly by greatly reducing the value of `numsamples` (even though this will make the figure inaccurate).

Now, let's make the script work better as a background calculation with a few changes.

Change the script so that all the calculations in `main` are done before any plotting starts. We can do this by storing each `result` array in a dictionary under the corresponding value of `n` for which it was created. Use the `numpy` function `vstack` to combine all of the results into a single array, and save this array to a text file for future use with the function `savetxt`. Move the plotting commands to a new script that loads the text file you created above using `loadtxt`, draws the figure, and saves it using `savefig`.

When running a script in the background, it is very useful to be able to specify the values of parameters from the command line. The python module `sys` has a variable called `argv`, which is a list of all the command-line arguments for a python script executed from the shell. Use `sys.argv` to change `perc.py` so that it takes in `numsamples` as an integer command-line argument. For example, we would like

`$ python perc.py 10`

to run with `numsamples = 10`.
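Putting these steps together, a restructured `perc.py` might look like the sketch below. To keep the example self-contained, `prob_perc` is stubbed out with a stand-in; in your script you would keep the real Monte Carlo function above. The file name `perc_results.txt` is just an illustrative choice:

```
import sys
import numpy

def prob_perc(N=1000, p=0.5, n=40):
    # Stand-in for the real Monte Carlo estimator, so this sketch
    # runs on its own; keep the actual function in your script.
    return p

def main(numsamples):
    p_vals = numpy.linspace(0.3, 0.6, 30)
    n_set = [5, 10, 20, 50, 100]
    results = {}  # one result array per lattice size n
    for n in n_set:
        results[n] = numpy.array([prob_perc(N=numsamples, p=p, n=n)
                                  for p in p_vals])
    # stack everything into a single array: one row of p values,
    # then one row of results per n, and save it for the plotting script
    data = numpy.vstack([p_vals] + [results[n] for n in n_set])
    numpy.savetxt('perc_results.txt', data)

if __name__ == '__main__' and len(sys.argv) > 1:
    # read numsamples from the command line: $ python perc.py 10
    main(int(sys.argv[1]))
```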

Now, you can change your plot easily without having to recalculate all your data just to change an axis label, for example.
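For instance, the plotting script only needs the saved text file. Here is one possible sketch; the `Agg` backend is used so the figure can be drawn without a display, and the file names and row layout are assumptions matching the save step described above:

```
import os
import numpy
import matplotlib
matplotlib.use('Agg')  # render off-screen, since we only save the figure
import matplotlib.pyplot as plt

def plot_results(infile='perc_results.txt', outfile='perc.png'):
    # first row holds the p values, the remaining rows one curve per n
    data = numpy.loadtxt(infile)
    p_vals, curves = data[0], data[1:]
    for result in curves:
        plt.plot(p_vals, result, '-')
    plt.legend(['N = %d' % n for n in [5, 10, 20, 50, 100]],
               loc='upper right')
    plt.xlabel('Probability a site is occupied ($p$)', fontsize=18)
    plt.ylabel('Fraction of lattices percolating through $h(p,N)$',
               fontsize=18)
    plt.savefig(outfile)

if __name__ == '__main__' and os.path.exists('perc_results.txt'):
    plot_results()
```

Editing an axis label or the title here and rerunning this script takes seconds, no matter how long the data took to compute.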

The other advantage is that you can run your calculation in the background on your computer or another computer and just retrieve the data file when you are done to explore your results. Try this in the terminal by `cd`'ing to the right directory and running

`$ python perc.py 3`

Estimate how large `numsamples` has to be for your program to take 8 minutes to run, using the `time` shell command and filling in small values for `?` below.

`$ time python perc.py ?`

Then run `perc.py` for 8 minutes. Use the `&` character to make the job run in the terminal's background. Once the execution completes, plot the result data and inspect your resulting figure -- notice that the curves will be much smoother than they were for small numbers of samples.

Add a title to the figure without rerunning `perc.py`.

Now, one of the advantages of running a script from the terminal is that you can configure it and have a script documenting your configuration, without having to update the code every time.

Modify `perc.py` so that it now takes two arguments, the first being `numsamples` and the second being a value for `n`. Use the `%d` string trick to create a unique file name based on the parameters you are passing in, and save the resulting data to this file. Write a shell script that will do the calculations for values of `n` in the set {2, 4, 8, 16, 32}.
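Such a shell script might look like the following sketch. The script name `run_perc.sh` and the value of `numsamples` are illustrative, and it assumes `perc.py` has been modified as above to take two arguments:

```
#!/bin/sh
# run_perc.sh -- run perc.py once for each lattice size in the set
numsamples=1000    # choose a value appropriate for your machine
for n in 2 4 8 16 32
do
    python perc.py "$numsamples" "$n" || echo "perc.py failed for n=$n"
done
```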

Now, one of the very powerful tricks the shell can do is parallelize your code so that it uses all of your CPU cores (python DOES NOT do this automatically).

Find out how many processor cores your workstation has. Store this in a shell variable called `processors`. Then run the following command for an appropriate value of `numsamples`.

`echo "2 4 8 16 32" | xargs -n 1 -P $processors python perc.py $numsamples`

If you open up a separate terminal window, you can monitor your memory and CPU usage using `top`.
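One common way to set the `processors` variable on Linux is with `nproc`; on macOS, `sysctl -n hw.ncpu` plays the same role:

```
# count the CPU cores on this machine (Linux)
processors=$(nproc)
echo "this machine has $processors cores"
```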

One of the other very helpful things we sometimes use when writing software is a version control system, which tracks the changes you make to a program and can be used to recover old versions and merge in changes that other people might make. Initially we had `cvs`, then `subversion`, but today we have `git`, which is widely popular.

- GitHub
- Bitbucket will let you have private git repositories
- Nature's recommended data repositories