Diff for "InfolabClusterCompute"

Differences between revisions 11 and 16 (spanning 5 versions)
Revision 11 as of 2012-10-16 21:57:02
Size: 8351
Editor: akrevl
Comment:
Revision 16 as of 2012-10-17 00:39:29
Size: 11280
Editor: akrevl
Comment:
Line 99: Line 99:
=== Parallel jobs ===

You may specify that you want your job to run on multiple cores and multiple nodes with the following directive:

{{{
#PBS -l nodes=node_no:ppn=core_no
}}}

In the example above the ''node_no'' represents the number of nodes (physical servers) that you are requesting and the ''core_no'' represents the number of cores that you would like to use on each of the nodes. If you would like to use 6 cores on a single node you could do it with the following directive:

{{{
#PBS -l nodes=1:ppn=6
}}}

Here is another example requesting two nodes with 32 cores each:

{{{
#PBS -l nodes=2:ppn=32
}}}

{{{#!wiki tip
'''Parallel jobs'''

Please bear in mind that qsub will not do anything to make your job parallel. That is why you should only request more than one core if your program is multi-core or multi-thread capable. If your program is not written in a parallel manner, it will only run on a single core and your 32-core reservation will just waste system resources that others could use.

'''Number of requested cores'''

Please do not make requests that the cluster is not able to handle. If you submit a job with the directive ''-l nodes=1:ppn=128'' this job will actually never run on the current configuration of the cluster as we do not have nodes with 128 cores. Please consult the cluster's hardware capabilities before using this directive.
}}}

=== Running time ===

This directive lets you specify a maximum walltime (sum of CPU time and wait time) that can be used by your job. This may be useful in a situation where you know your job should run no longer than 2 hours and if it runs longer then something went wrong. You can specify such a limit with the following directive:

{{{
#PBS -l walltime=02:00:00
}}}

You do not have to specify a maximum walltime. In that case your job will run indefinitely (unless the cluster crashes), and it may be interrupted by shorter running jobs.

{{{#!wiki tip
'''Wall time format'''

You should always specify the wall time in HH:MM:SS format. If you were to write ''walltime=120:00'' your program would get killed after 2 hours of work as the setting is read as 120 minutes, 0 seconds.
}}}
Line 100: Line 146:

In the following example we do not actually call a binary of our own; we just run a few standard commands and exit. Since the submission script is nothing more than a regular shell script, the example should print the name of the host it is running on to our standard output file.
Line 131: Line 179:


== Queues ==

There is only one queue available on the compute cluster at the moment. This is bound to change once the cluster is used more heavily and we can make better sense of what is needed.

The default queue is called '''test''' and it allows up to 35,000 jobs to be queued and up to 1,200 jobs to run simultaneously.

== HOWTOs / Tutorials ==

 * [[InfolabClusterComputeHowtoSingle|How to]] run a single core job on the cluster

Infolab Compute Cluster

Access

To submit the jobs to the compute cluster you need to log in to the submission node ilhead1.stanford.edu. Use your CS credentials to log in.

ssh your_cs_id@ilhead1.stanford.edu

Job scheduling

All jobs are submitted with the Torque resource manager and are scheduled by the MAUI scheduler. Please do not log in to the nodes directly and run jobs from there.

Torque used to be called PBS, so if you see any resources talking about the PBS resource manager those more or less apply to Torque as well. Also please excuse us if we use PBS and Torque interchangeably.

Qsub

qsub is the main command that submits your job to the cluster. The command uses the following syntax:

qsub -V script_file

So if I have a script called myjob.sh that I would like to run on the cluster I can do so by executing the following:

qsub -V myjob.sh

script_file should be a text file

The script_file should contain the name and the path to your executable file and extra instructions that tell the resource manager how to run your job. Don't worry, we'll talk more about those later.

script_file must not be binary/executable file

Never use qsub to submit a binary executable to the resource manager. The submission itself will succeed, but the node that the job is assigned to will fail to execute it with a "Cannot execute a binary file" error.
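If you want to guard against submitting a binary by accident, a check along the following lines could be run before qsub. This is only a sketch: is_text_file and safe_qsub are hypothetical helper names, and the test (no NUL bytes in the file) is a rough heuristic for "this is a text file", not something the resource manager does for you.

```shell
# Hypothetical helper: succeeds if the file contains no NUL bytes,
# which is a rough heuristic for "this is a text file".
is_text_file() {
    local total
    local stripped
    total=$(wc -c < "$1")
    stripped=$(tr -d '\000' < "$1" | wc -c)
    [ "$total" -eq "$stripped" ]
}

# Hypothetical wrapper: only hands files that look like text to qsub.
safe_qsub() {
    if is_text_file "$1"; then
        qsub -V "$1"
    else
        echo "Refusing to submit $1: it looks like a binary file" >&2
        return 1
    fi
}
```

With these two functions defined, safe_qsub myjob.sh behaves like qsub -V myjob.sh for a shell script and refuses to submit a compiled program.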

Resource manager directives

These directives tell the resource manager how to run your job. All of the directives start with a pound character (#) immediately followed by the keyword PBS:

#PBS -directive options

Name

This directive tells the resource manager which name to use for your job. If you do not specify it, the name of your submission script will be used.

#PBS -N InfolabClusterTutorial

Standard output

Since you never know which server your program will run on once you submit it to the cluster, the resource manager will deposit the standard output and standard error streams in a set of files in the directory from which you submitted your script.

By default the resource manager will redirect all standard output of a job to a file named jobname.ojobid. So if you submitted myjob.sh and the resource manager assigned it the ID 711, the standard output will be saved to myjob.sh.o711. If you provided the Name directive discussed in the previous section, then your default standard output will be saved to InfolabClusterTutorial.o711.

You can override this behavior by using the -o directive:

#PBS -o /dfs/rulk/0/mydir/myjob.out

This will save all the standard output to the file /dfs/rulk/0/mydir/myjob.out. Please note that the file will be overwritten if you run the job more than once.

Error output

By default the resource manager will redirect everything a job writes to standard error to a file named jobname.ejobid. So if you submitted myjob.sh and the resource manager assigned it the ID 711, the standard error will be saved to myjob.sh.e711. If you provided the Name directive discussed in one of the previous sections, then your default standard error stream will be saved to InfolabClusterTutorial.e711.

You can override this behavior by using the -e directive:

#PBS -e /dfs/rulk/0/mydir/myjob.error

This will save the standard error stream to the file /dfs/rulk/0/mydir/myjob.error. Please note that the file will be overwritten if you run the job more than once.
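The default naming rule for both streams can be sketched as a small shell function. This is only an illustration of the jobname.ojobid and jobname.ejobid pattern described above; default_stream_file is a hypothetical helper, not a Torque command.

```shell
# Hypothetical helper that builds the default file name for a job stream.
#   $1: job name (the script name, or the name given with #PBS -N)
#   $2: job ID assigned by the resource manager
#   $3: stream letter, o for standard output or e for standard error
default_stream_file() {
    echo "$1.$3$2"
}

default_stream_file myjob.sh 711 o                 # prints myjob.sh.o711
default_stream_file InfolabClusterTutorial 711 e   # prints InfolabClusterTutorial.e711
```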

Mail directive

This directive tells the resource manager to send you an e-mail when your job is started and when it is finished. In a cluster environment your job may not start immediately, as it depends on the other jobs that are currently in the cluster's queues. The following will send you an e-mail both when the job starts executing and when it finishes.

#PBS -m be

Please note that the e-mail will not be delivered to your main CS account, but rather to the local mail queue on the submission node (you could set up forwarding, but that should be a topic of another wiki page).

Parallel jobs

You may specify that you want your job to run on multiple cores and multiple nodes with the following directive:

#PBS -l nodes=node_no:ppn=core_no

In the example above the node_no represents the number of nodes (physical servers) that you are requesting and the core_no represents the number of cores that you would like to use on each of the nodes. If you would like to use 6 cores on a single node you could do it with the following directive:

#PBS -l nodes=1:ppn=6

Here is another example requesting two nodes with 32 cores each:

#PBS -l nodes=2:ppn=32

Parallel jobs

Please bear in mind that qsub will not do anything to make your job parallel. That is why you should only request more than one core if your program is multi-core or multi-thread capable. If your program is not written in a parallel manner, it will only run on a single core and your 32-core reservation will just waste system resources that others could use.

Number of requested cores

Please do not make requests that the cluster is not able to handle. If you submit a job with the directive -l nodes=1:ppn=128 this job will actually never run on the current configuration of the cluster as we do not have nodes with 128 cores. Please consult the cluster's hardware capabilities before using this directive.
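As a rough sketch of what "making use of the reserved cores" can look like inside a submission script, the functions below count the lines in the file named by $PBS_NODEFILE (one line per allocated core) and use xargs -P to keep that many copies of a command running at once. The names allocated_cores and run_on_allocated_cores are hypothetical, and the sketch assumes a single-node request such as nodes=1:ppn=6, since xargs only starts processes on the node where the script itself runs.

```shell
# Count the cores allocated to this job: one non-empty line per
# allocated core in the file named by $PBS_NODEFILE.
allocated_cores() {
    grep -c . "$PBS_NODEFILE"
}

# Hypothetical helper: read one argument per line from standard input and
# run the given command once per argument, with at most as many copies
# running at the same time as there are allocated cores.
run_on_allocated_cores() {
    xargs -n 1 -P "$(allocated_cores)" "$@"
}

# Example use inside a submission script (process_chunk and inputs.txt are
# placeholders for your own program and your own list of inputs):
#   run_on_allocated_cores $HOME/process_chunk < $HOME/inputs.txt
```

For jobs that span several nodes this is not enough; you would need something like MPI to start processes on the other nodes.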

Running time

This directive lets you specify a maximum walltime (sum of CPU time and wait time) that can be used by your job. This may be useful in a situation where you know your job should run no longer than 2 hours and if it runs longer then something went wrong. You can specify such a limit with the following directive:

#PBS -l walltime=02:00:00

You do not have to specify a maximum walltime. In that case your job will run indefinitely (unless the cluster crashes), and it may be interrupted by shorter running jobs.

Wall time format

You should always specify the wall time in HH:MM:SS format. If you were to write walltime=120:00 your program would get killed after 2 hours of work as the setting is read as 120 minutes, 0 seconds.
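If you think of your limit in minutes, a small helper like the one below can produce the HH:MM:SS form for you. It is only a sketch; to_walltime is a hypothetical function name and it expects a whole number of minutes without leading zeros.

```shell
# Hypothetical helper: convert a number of minutes to HH:MM:SS.
to_walltime() {
    printf '%02d:%02d:00\n' $(($1 / 60)) $(($1 % 60))
}

to_walltime 120   # prints 02:00:00
to_walltime 70    # prints 01:10:00
```

The printed value is what goes after walltime= in the directive, for example #PBS -l walltime=02:00:00.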

An example submission script

In the following example we do not actually call a binary of our own; we just run a few standard commands and exit. Since the submission script is nothing more than a regular shell script, the example should print the name of the host it is running on to our standard output file.

#PBS -N my_job_name
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:10:00

echo "I am running on:"
hostname
sleep 20

Paths

You should always use your home directory (which is on the AFS filesystem) or one of the filesystems mounted under /dfs for your scripts, your programs and the datafiles needed for your job. You should also make sure to always use a full (absolute) path specification.

This means that using ./myjob to run your program from a submission script in your home directory is a bad idea. You should call it like this:

/afs/cs.stanford.edu/u/your_csid/myjob

You can save yourself some typing by using environment variables. You could use $HOME/myjob in the example above. If you decide to use environment variables, make sure that you run qsub with the -V parameter as we are showing you throughout this tutorial. The -V parameter makes sure that the environment variables are available to the submission script.
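If you are unsure what the absolute path of a program is, a helper like the following can work it out on the submission node before you write it into your script. This is a minimal sketch: abs_path is a hypothetical function name, and it assumes that the directory containing the file exists.

```shell
# Hypothetical helper: print the absolute path of a file given a
# relative or absolute path to it.
abs_path() {
    local dir
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    echo "$dir/$(basename "$1")"
}

# Example: run from the directory that contains myjob
#   abs_path ./myjob
```

The printed path can then be used in the submission script in place of ./myjob.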

What is mounted under /dfs

  • /dfs/hulk/0 points to /lfs/hulk/0 on hulk.stanford.edu

  • /dfs/rulk/0 points to /lfs/rulk/0 on rulk.stanford.edu

  • /dfs/rocky/0 points to /lfs/rocky/0 on rocky.stanford.edu

  • /u points to /afs/cs.stanford.edu/u and contains user home directories

Queues

There is only one queue available on the compute cluster at the moment. This is bound to change once the cluster is used more heavily and we can make better sense of what is needed.

The default queue is called test and it allows up to 35,000 jobs to be queued and up to 1,200 jobs to run simultaneously.

HOWTOs / Tutorials

  • How to run a single core job on the cluster


  • #PBS -l nodes=1:ppn=1: specifies that I would like my job to run on a single node (nodes) and on a single core (ppn)
  • #PBS -l walltime=01:10:00: specifies the amount of real time I anticipate that my script will need to finish. Please note that the scheduler will terminate my script if it does not finish in time.

For a more comprehensive list of resources that you can specify with #PBS -l see here: http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml. Note, however, that there is currently only one queue and that it does not have very many parameters set.

Make sure that your job uses data from:

  • Your CS home directory (whatever is under /afs/cs.stanford.edu/u/your_csid on hulk, rocky and snapx; please note that user home directories are not yet available under /u/your_csid on snapx)
  • Network mounted directories from rocky and hulk:
    • /dfs/hulk/0
    • /dfs/rocky/0

Here is an example of a bit more complex script to run an MPI job (copied from http://csc.cnsi.ucsb.edu/docs/running-jobs-torque):

#PBS -l nodes=2:ppn=4

# Make sure that we are in the same subdirectory as where the qsub command 
# is issued. 
cd $PBS_O_WORKDIR 

#  make a list of allocated nodes(cores)
cat $PBS_NODEFILE > nodes

# How many cores total do we have?
NO_OF_CORES=`cat $PBS_NODEFILE | egrep -v '^#'\|'^$' | wc -l | awk '{print $1}'`
NODE_LIST=`cat $PBS_NODEFILE `

# Just for kicks, see which nodes we got.
echo $NODE_LIST

# Run the executable. *DO NOT PUT* a '&' at the end!!
mpirun -np $NO_OF_CORES -machinefile nodes ./pi3 >& log 

Submitting your job

Now that your job is prepared you have to submit it to the resource manager. Use qsub to submit your jobs:

qsub myjob.sh

Make sure you run qsub from your CS home directory or from a network mounted filesystem (see above). Once the job is finished output data will wait for you in the same directory and there will be two additional files that end in e<job#> and o<job#>. These two are stderr and stdout, respectively.

Check the status of your job

You can check what is happening with your job with the qstat command:

qstat jobid

jobid is the number that the resource manager assigned to your job (the first number qsub will output after you successfully submit a job).
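To avoid copying the number by hand, you can capture what qsub prints and cut it down to the leading number. The sketch below assumes that qsub prints an identifier of the form number.servername; the value 711.ilhead1.stanford.edu is only an illustration, and job_number is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the part of a job identifier that comes
# before the first dot, i.e. the job number.
job_number() {
    echo "${1%%.*}"
}

job_number 711.ilhead1.stanford.edu   # prints 711

# Example use on the submission node:
#   jobid=$(job_number "$(qsub -V myjob.sh)")
#   qstat "$jobid"
```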

Other useful commands

  • qdel job_id: deletes your job
  • qstat -q: lists all queues
  • qstat -a: lists all jobs
  • qstat -au userid: lists all jobs submitted by userid
  • pbsnodes: lists the status of all the compute nodes

More coming soon...