Please note that the cluster is still in the testing phase. Things might break. Please join the mailing list and report any glitches that you come across.
- 2 head nodes
- 2 development nodes
- 36 compute nodes:
  - 1152 cores
  - 2.25 TB RAM
  - Each node:
    - 2x AMD Opteron 6276 (Interlagos) @ 2.3-3.2 GHz, 16 cores/CPU, AMD64, VT
    - 64 GB RAM
    - 2 TB local HDD
- TORQUE resource manager
- MAUI scheduler
- Planned: Hadoop
There is a mailing list for anyone interested in the current status and configuration of the cluster:
List address: firstname.lastname@example.org
Manage your subscription: https://mailman.stanford.edu/mailman/listinfo/ilcluster
Login to the headnode
You can submit jobs to the cluster from the head node: snapx.Stanford.EDU (sorry about the name; the head node will be renamed to ilh soon). First, SSH to the head node:
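A minimal sketch, with your_csid standing in for your CSID:

```shell
ssh your_csid@snapx.stanford.edu
```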
Use your CSID username and password to log in to the head node.
Preparing your job
Preparing your job for the scheduler is as simple as adding a few comments to a script that runs your program. Here is an example:
```shell
#!/bin/bash
#PBS -N my_job_name
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:10:00

echo "I am running on:"
hostname
sleep 20
```
The comment lines that start with #PBS let you set scheduler options:
- #PBS -N: lets you specify a friendly job name
- #PBS -l nodes=1:ppn=1: requests a single node (nodes) with a single core per node (ppn) for your job
- #PBS -l walltime=01:10:00: specifies the amount of wall-clock time you expect your script to need. Note that the scheduler will terminate the job if it does not finish in time.
For a more comprehensive list of resources that you can specify with #PBS -l, see http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml. Note, however, that there is currently only one queue, without very many parameters set.
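As an illustration (the job name and resource numbers here are made up), a job that wants two nodes with eight cores on each for two hours could combine these options like so:

```shell
#PBS -N two_node_job
#PBS -l nodes=2:ppn=8
#PBS -l walltime=02:00:00
```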
Make sure that your job uses data from:
- Your CS home directory (whatever is under /afs/cs.stanford.edu/u/your_csid on hulk, rocky, and snapx; note that user home directories are not yet available under /u/your_csid on snapx)
- Network-mounted directories from rocky and hulk (see the mount points below)
Submitting your job
Now that your job is prepared you have to submit it to the resource manager. Use qsub to submit your jobs:
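For example, if the script above were saved as my_job.sh (a hypothetical filename), it could be submitted with:

```shell
qsub my_job.sh
```

qsub prints the id of the newly created job (of the form <job#>.<server>); you will need this number to identify your job later.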
Make sure you run qsub from your CS home directory or from a network-mounted filesystem (see above). Once the job is finished, its output data will be waiting in the same directory, along with two additional files whose names end in e<job#> and o<job#>; these contain stderr and stdout, respectively.
Check the status of your job
You can check what is happening with your job with the qstat command:
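For example (with your_csid again standing in for your CSID):

```shell
qstat              # list all jobs in the queue
qstat -u your_csid # list only your own jobs
```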
The compute cluster is running the TORQUE resource manager with the MAUI scheduler. On all nodes, Hulk's filesystem is mounted at /dfs/hulk/0 and Rocky's filesystem is mounted at /dfs/rocky/0.

More coming soon...