
QuickStart

Infrastructure

The InfolabComputeServersStats page lists our big-memory servers. These are first come, first served (nobody is managing the resources). If you are on a tight deadline or you feel that somebody is hogging a machine, talk to them or to your local sysadmin.

You can access these machines by logging in via ssh with your CSID. Here's an example:

~$ ssh tommy@madmax5
tommy@madmax5's password: 
Last login: Sun Oct  4 10:54:11 2015 from whale.stanford.edu
tommy@madmax5:~$ 

Using Windows? Use PuTTY or Cygwin.

Which server to pick?

The one that's feeling lonely (shows zero or little utilization). As a general rule of thumb, Chris's students should go for the raidersX machines and Jure's students should go for the madmaxX machines.
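Not sure who's feeling lonely? Here's a quick check (just a sketch; madmax5 is only an example host, and uptime/nproc/who are standard Linux tools):

~$ ssh tommy@madmax5 'uptime; nproc; who'
# If the load average reported by uptime is well below the core count reported
# by nproc, the machine is mostly idle and a good pick.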

How do I set up passwordless/key based ssh?

You do not. It's a limitation of our configuration: you need to supply your password on login. Of course, you can be resourceful and do:

~$ ssh tommy@madmax5
tommy@madmax5's password: 
Last login: Tue Oct  6 00:52:52 2015 from whale.stanford.edu
tommy@madmax5:~$ ssh madmax3
Last login: Fri Oct  2 09:05:37 2015 from whale.stanford.edu
tommy@madmax3:~$ 

Did you notice that we never got asked for a password when logging into madmax3? Magic, huh? Or it could just be Kerberos.

But I hate typing in my password all the time!

No problem. Install Kerberos on your workstation. Here are some installation notes: KerberosMac | KerberosWindows.
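Once Kerberos is installed, the usual flow (a sketch; the realm below is an assumption, so use whatever the installation notes for your platform say) is to grab a ticket once and let ssh use it:

# Get a Kerberos ticket (the realm is an assumption; check the install notes):
~$ kinit tommy@stanford.edu
# Then ssh with GSSAPI authentication, and no password prompt is needed:
~$ ssh -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes tommy@madmax5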

Can I login from home?

Depends where home is. Logins over ssh should work from almost any network on the Stanford campus (see footnote 1). If you're not on campus, you have three options.

1) whale

Log in via ssh to whale.stanford.edu first. Whale is on one of our networks, so you can ssh to other machines from there. Here's an example session.

~$ ssh tommy@whale.stanford.edu
tommy@whale.stanford.edu's password: 
Last login: Sun Oct  4 10:53:42 2015 from c-76-111-212-54.hsd1.ca.comcast.net
tommy@whale:~$ ssh madmax3
Last login: Tue Oct  6 01:17:08 2015 from madmax5.stanford.edu
tommy@madmax3:~$ hostname
madmax3.stanford.edu
tommy@madmax3:~$ exit
logout
Connection to madmax3 closed.
tommy@whale:~$ exit
logout
Connection to whale.stanford.edu closed.

Hey... and this has the added benefit that you only type your password once. Yay!
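If you go through whale a lot, you can teach ssh to do the hop for you. Here is a sketch of a possible ~/.ssh/config on your own machine (the host aliases are made up, ProxyJump needs a reasonably recent OpenSSH, and you'll still be asked for your password at each hop unless Kerberos is set up):

# ~/.ssh/config (sketch; adjust user names and hosts to taste)
Host whale
    HostName whale.stanford.edu
    User tommy

Host madmax3w
    HostName madmax3.stanford.edu
    User tommy
    ProxyJump whale

After that, ssh madmax3w takes you through whale automatically.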

2) Stanford VPN

Set up the Stanford VPN connection, establish the connection, and you're done.

3) Infolab VPN

Check out the instructions on the VPN page. The benefit of this VPN service is that it looks like regular https traffic... so it should work from most hotels, airports, etc.

Storage options

Storage option      | Mount point     | Good for                                             | Speed                       | Backed up?
--------------------|-----------------|------------------------------------------------------|-----------------------------|-----------
Your home directory | /afs/cs/u/tommy | Stuff that matters, e.g. results, code               | Not really                  | Daily
Local hard disk     | /lfs/local/0    | Temporary files, intermediate results                | Around 150 MB/s             | No
Network storage     | /dfs/scratchX   | Datasets, things you need accessible across multiple servers | Up to 450 MB/s, but shared! | No

It is not common, but a server can have multiple local volumes (think of it as having multiple disks), so check whether there is an /lfs/local/1 if you're running out of space.
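A quick way to see what's mounted and how full it is (a sketch; df is standard and the paths match the table above):

# Run this on the server itself to check free space on each storage option:
~$ df -h ~ /lfs/local/* /dfs/scratch*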

Long running sessions

So... You have some sweet Python code that takes 2 days to run. Madmax5 is feeling lonely and you figure you'll just run your sweet.py there. Easy, right?

~$ ssh tommy@madmax5
tommy@madmax5:~$ python2.7 sweet.py 
Starting sweet pie...
1 minutes...
2 minutes...
^Z
[1]+  Stopped                 python2.7 sweet.py
tommy@madmax5:~$ bg 1
[1]+ python2.7 sweet.py &
tommy@madmax5:~$ exit
Connection to madmax5 closed.

This should be all good when you log back in to madmax5, right? Not really... Even if the job survives, you can't re-attach to it, so there's no easy way to see what it is up to<<Footnote(Not to mention that the sweet process has no parent now and might eventually just get killed.)>>. That's where screen & tmux come in. What?! Think of the two as virtual terminals... You know how you can have multiple tabs open in some applications? Think of screen & tmux as tabs for your ssh session. Here's a quick session:

~$ ssh tommy@madmax5
# Let's start a screen session (open a new tabbed thingie)
tommy@madmax5:~$ screen -S myScreen
# Run something in the first tab
tommy@madmax5:~$ uptime
 14:20:32 up 145 days,  8:48, 10 users,  load average: 33.03, 33.07, 33.08
# Create a new tab by pressing Ctrl+A, C (C is for create... and also for cookie)
tommy@madmax5:~$ python2.7 sweet.py
Starting sweet pie...
1 minutes...
2 minutes...
# Switch between the tabs by pressing Ctrl+A, N (for next) or Ctrl+A, P (for previous) or Ctrl+A, " (that brings up a list)
# Want to "minimize" screen and come back later? Press Ctrl+A, D (to detach)

Great, now we have tabs. What's so good about them? They stay open even after you log out. How do you get back to them?

~$ ssh tommy@madmax5
# Bring back the session we detached
tommy@madmax5:~$ screen -x myScreen
...
10 minutes...
11 minutes...
12 minutes...
13 minutes...
14 minutes...
15 minutes...

I lost my permissions when I re-attached !@#$@#$&^!$!$!!

It has to do with Kerberos... and since somebody else already wrote a guide on it, here's a link: ScreenKerberos. Same thing applies to tmux.
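If you prefer tmux, the equivalent workflow looks like this (a sketch; the session name is made up, and the default prefix is Ctrl+B instead of screen's Ctrl+A):

# Start a named tmux session
tommy@madmax5:~$ tmux new -s mySession
# New window: Ctrl+B, C   Next/previous window: Ctrl+B, N / Ctrl+B, P
# Detach: Ctrl+B, D
# Re-attach later:
tommy@madmax5:~$ tmux attach -t mySession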

Compute cluster

You want to run sweet.py two hundred times, possibly with different parameters. You could make a script that would ssh into a bunch of machines, open some screens, and run some code. But luckily, passwordless ssh doesn't work, and controlling screen from a script is not that easy.

Let's put together a submission script for the compute cluster:

#PBS -l nodes=1:ppn=1
#PBS -N SweetJob

python2.7 /dfs/scratch0/tommy/sweet.py

We saved that script to /dfs/scratch0/tommy/run_sweet.sh. Now we can run it 200 times on the cluster with a single command (make sure to log in to ilhead1 first):

~$ ssh tommy@ilhead1
tommy@ilhead1:~$ qsub -t 1-200 /dfs/scratch0/tommy/run_sweet.sh
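If each of the 200 tasks should get its own parameter, the array index is available inside the job. On Torque-style schedulers it usually shows up as the PBS_ARRAYID environment variable (that's an assumption about our setup; check which variable our scheduler exports), so a variant of run_sweet.sh could look like this (the --param flag is just illustrative):

#PBS -l nodes=1:ppn=1
#PBS -N SweetJob

# PBS_ARRAYID runs 1..200 for a job submitted with "qsub -t 1-200"
python2.7 /dfs/scratch0/tommy/sweet.py --param "$PBS_ARRAYID"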

You can check on your job from time to time with qstat:

qstat -a

You'll get a bunch of *.o* and *.e* files in the directory you submitted the job from. These contain standard output and standard error output for all the tasks.
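To skim through them (a sketch; the exact file names depend on the job id the scheduler assigns, but with -N SweetJob they should start with SweetJob):

~$ ls SweetJob.o* SweetJob.e*
~$ tail -n 5 SweetJob.o*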

Pro tip

Submit the job from /dfs/... and put all your code and dependencies there too.

Read more about the cluster here: InfolabClusterCompute

Hadoop cluster

What's all the hubbub? It's distributed: the data blocks are spread across multiple nodes, and the resource manager/scheduler/whateveryouwanttocallit is aware of the data locations. Which means we can "grep" through a 50TB dataset in about half an hour. Cool, right?

How do I get access?

You'll need a CSID and a home directory on HDFS. You probably already have your CSID (if you don't, congrats on reading this far anyway). Your sysadmin can take care of the home directory (if you ask nicely).

Where do I? How do I?

These are the nodes that have the Hadoop packages installed:

madmax
madmax2
madmax3
madmax4
madmax5

Here's how you list the contents of your HDFS home directory:

hadoop fs -ls /user/tommy

Here's how you submit a job to the cluster:

hadoop jar <jarfile> <param1> <param2> ...
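A few more hadoop fs commands you'll use all the time (a sketch; the paths are just examples):

# Copy a local file into your HDFS home directory
hadoop fs -put /lfs/local/0/dataset.tsv /user/tommy/
# Copy results back out of HDFS
hadoop fs -get /user/tommy/output /lfs/local/0/
# See how much space a directory takes up
hadoop fs -du /user/tommy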

More Hadoop info

Q&A

Who is Tommy?

Tommy is his name, placeholding is his game. His friends are Bill Oddie, Private Tentpeg, and Airman Snuffy. Tommy likes to secretly volunteer for this wiki. The agencies involved can neither confirm nor deny that Corporal Schumuckatelli is involved in this matter.

Footnotes

  1. Since security was tightened, your wireless device might fall onto a network that is considered the public internet... which means that you'll appear to be off campus as far as our servers are concerned.