Compute cluster

You want to run sweet.py two hundred times, possibly each time with different parameters. You could write a script that ssh-es into a bunch of machines, opens some screens, and runs your code. Luckily, password-less ssh doesn't work and controlling screen from a script is not that easy, so the compute cluster it is.

Let's put together a submission script for the compute cluster:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -N SweetJob
# One core on one node; the -N name shows up in qstat and in the output file names.

python2.7 /dfs/scratch0/tommy/sweet.py

We saved that script as /dfs/scratch0/tommy/run_sweet.sh. Now we can submit it as an array job of 200 tasks with a single command (make sure to log in to ilhead1 first):

~$ ssh tommy@ilhead1
tommy@ilhead1:~$ qsub -t 1-200 /dfs/scratch0/tommy/run_sweet.sh
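
Every task runs the exact same script, so to actually vary the parameters you can use the array index. On Torque (whose qsub takes the -t flag used above) each task gets it in the $PBS_ARRAYID environment variable; other PBS flavors call it PBS_ARRAY_INDEX. A sketch, assuming sweet.py is written to accept the index as a command-line argument:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -N SweetJob

# $PBS_ARRAYID is 1..200 here; sweet.py can map it to its parameters.
python2.7 /dfs/scratch0/tommy/sweet.py $PBS_ARRAYID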

You can check on your job from time to time with qstat:

qstat -a
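
By default the whole array shows up as a single entry. On Torque you can expand it into one line per task, and kill the job if something went wrong (12345 below is a placeholder for the job id that qstat prints):

qstat -t -a
qdel 12345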

You'll get a bunch of *.o* and *.e* files in the directory you submitted the job from; they contain the standard output and standard error of each task.
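
With 200 tasks that's 400 files, so scan them in bulk rather than opening them one by one. A sketch, assuming the default naming based on the -N job name above:

grep -l . SweetJob.e*    # list the tasks that wrote anything to stderr
tail -n 5 SweetJob.o*    # peek at the tail of every task's output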

Pro tip

Submit the job from /dfs/... and put all your code and dependencies there too; /dfs is the shared filesystem that every compute node can see, so it's the one place where all 200 tasks are guaranteed to find your files.
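
Concretely, something like this before submitting (paths as used above):

cp sweet.py run_sweet.sh /dfs/scratch0/tommy/
cd /dfs/scratch0/tommy
qsub -t 1-200 run_sweet.sh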

Read more about the cluster here: InfolabClusterCompute