
InfolabClusterComputeHowtoJobArray

Here is a hypothetical scenario... you have one program that is neither multi-threaded nor aware of multiple cores. You have to run that program about a thousand times, with different input parameters and different input data. And luckily... the results of a single run are independent of all the other results. This HOWTO describes how one might run such a scenario on the Infolab Compute Cluster.

We presume that you know your qsub basics. If that is not the case, please see InfolabClusterComputeHowtoSingle and InfolabClusterComputeHowtoVariables first.

The submission script

We'll tackle this one the other way around and create the submission script first. You can download the script here: JobArray.qsub.sh

   #!/bin/bash
   #PBS -N JobArray
   #PBS -l nodes=1:ppn=1
   #PBS -l walltime=00:01:00

   /usr/bin/python2.7 $HOME/tutorial/JobArray/JobArray.py $PBS_ARRAYID

The only special thing here is that we pass the array id (that is, the number of this job within the job array) on to our Python script.
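For array instance 2, for example, the scheduler sets PBS_ARRAYID to 2, so the last line of the script effectively runs:

/usr/bin/python2.7 $HOME/tutorial/JobArray/JobArray.py 2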

The program

We are again using the same simple Python script that sleeps for a while and then prints its start and end times, the host, the user and the arguments it was called with. You can download the script here: JobArray.py

   #!/usr/bin/python2.7

   import socket, datetime, time, getpass, sys

   arrayid = sys.argv[1]

   # We're just using a simple list here, but you can
   # easily imagine this being read from a file or something similar ...
   arguments = [
     [ "myarg1-0", "myarg2-0", "myarg3-0" ],
     [ "myarg1-1", "myarg2-1", "myarg3-1" ],
     [ "myarg1-2", "myarg2-2", "myarg3-2" ],
     [ "myarg1-3", "myarg2-3", "myarg3-3" ]
   ]

   start = datetime.datetime.now()
   hostname = socket.gethostname().split('.')[0]
   username = getpass.getuser()
   time.sleep(10)
   end = datetime.datetime.now()

   dfmt = "%Y-%m-%d %H:%M:%S"
   print "Started: %s Finished: %s Host: %s User: %s" % (start.strftime(dfmt), end.strftime(dfmt), hostname, username)
   print "My arguments:"
   print arguments[int(arrayid)]

The only twist is that we read the actual arguments from a list provided in the script itself. This could easily be replaced by reading them from a CSV file or some other, neater argument store.
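For example, here is a minimal sketch of how the script could read its argument sets from a CSV file instead of a hard-coded list (the file name arguments.csv and its layout, one argument set per row, are just assumptions for the example):

   import csv, sys

   arrayid = int(sys.argv[1])

   # arguments.csv is a hypothetical file with one argument set per row, e.g.
   #   myarg1-0,myarg2-0,myarg3-0
   with open('arguments.csv') as f:
       arguments = [row for row in csv.reader(f)]

   print arguments[arrayid]

The rest of the script would stay exactly the same.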

Submit the job

Nothing left to do but submit the job to the cluster with qsub:

qsub -V -t 0-3 $HOME/tutorial/JobArray/JobArray.qsub.sh

There are a few things to note about the -t argument. It specifies that the job should be run as a job array, and it also specifies the array ids that our instances will get. When we run the command above, we'll get instances 0, 1, 2 and 3 respectively. We could also specify those as a comma-delimited list. The following command does the same thing as the previous one:

qsub -V -t 0,1,2,3 $HOME/tutorial/JobArray/JobArray.qsub.sh

We could also make up our own non-sequential ids:

qsub -V -t 111,211,311,411 $HOME/tutorial/JobArray/JobArray.qsub.sh
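One caveat with made-up ids like these: the example script above indexes its four-element arguments list directly with the array id, so an id like 111 would raise an IndexError. With non-sequential ids you would have to map the ids to argument sets yourself, for instance with a dictionary (a sketch using the ids from the command above):

   # Map each custom array id to its argument set
   arguments = {
       111: [ "myarg1-0", "myarg2-0", "myarg3-0" ],
       211: [ "myarg1-1", "myarg2-1", "myarg3-1" ],
       311: [ "myarg1-2", "myarg2-2", "myarg3-2" ],
       411: [ "myarg1-3", "myarg2-3", "myarg3-3" ]
   }
   print arguments[int(arrayid)]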

Anyhow, if our jobs ran successfully, we should be able to see the results in the output files. In our case:

~/ $ cat *.o*
Started: 2012-10-16 21:01:12 Finished: 2012-10-16 21:01:22 Host: iln28 User: akrevl
My arguments:
['myarg1-0', 'myarg2-0', 'myarg3-0']
Started: 2012-10-16 21:01:12 Finished: 2012-10-16 21:01:22 Host: iln28 User: akrevl
My arguments:
['myarg1-1', 'myarg2-1', 'myarg3-1']
Started: 2012-10-16 21:01:12 Finished: 2012-10-16 21:01:22 Host: iln28 User: akrevl
My arguments:
['myarg1-2', 'myarg2-2', 'myarg3-2']
Started: 2012-10-16 21:01:13 Finished: 2012-10-16 21:01:23 Host: iln28 User: akrevl
My arguments:
['myarg1-3', 'myarg2-3', 'myarg3-3']

So we successfully ran four instances of our script with four different sets of arguments. Of course, this is only one way of doing things... but it works.
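As a closing aside: if you really do have on the order of a thousand parameter combinations, as in the scenario at the top, you probably don't want to write them out by hand. Here is a small sketch (the parameter names and values are made up) that generates all combinations of a few parameter lists with itertools.product and writes them to a CSV file, which a script like the one above could then index by array id:

   #!/usr/bin/python2.7
   import csv, itertools

   # Hypothetical parameter values -- replace with your own
   alphas   = [ "0.1", "0.5", "0.9" ]
   betas    = [ "10", "100" ]
   datasets = [ "set-a", "set-b" ]

   with open('arguments.csv', 'wb') as f:
       writer = csv.writer(f)
       for combo in itertools.product(alphas, betas, datasets):
           writer.writerow(combo)

That gives 12 rows in this toy example, so you would submit with -t 0-11; with your real parameter lists the range simply grows accordingly.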