Diff for "QuickStart"

Differences between revisions 36 and 37
Revision 36 as of 2015-10-16 21:32:40
Size: 6586
Editor: akrevl
Comment:
Revision 37 as of 2015-10-16 21:53:10
Size: 7324
Editor: akrevl
Comment:
Infrastructure

The InfolabComputeServersStats page lists our big-memory servers. These are first come, first served (nobody is managing the resources). If you are on a tight deadline or you feel that somebody is hogging a machine, talk to them or to your local sysadmin.

You can access these machines by logging in via ssh with your CSID. Here's an example:

~$ ssh tommy@madmax5
tommy@madmax5's password: 
Last login: Sun Oct  4 10:54:11 2015 from whale.stanford.edu
tommy@madmax5:~$ 

Using Windows? Use PuTTY or Cygwin.

Which server to pick?

The one that's feeling lonely (shows zero or little utilization). As a general rule of thumb, Chris' students should go for raidersX machines and Jure's students should go for madmaxX machines.
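One rough way to put a number on "lonely" is the load average divided by the core count. The sketch below is only an illustration: the function names and the 0.5 threshold are assumptions, not a policy of this wiki, and it has to be run on the server you are considering.

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / float(cores)


def looks_lonely(load_avg, cores, threshold=0.5):
    """True when the load per core is below the threshold (0.5 is an arbitrary choice)."""
    return load_per_core(load_avg, cores) < threshold


if __name__ == "__main__":
    # The first value of os.getloadavg() is the 1-minute load that `uptime` prints.
    one_minute_load = os.getloadavg()[0]
    cores = os.cpu_count() or 1
    print("load per core: %.2f" % load_per_core(one_minute_load, cores))
    print("lonely" if looks_lonely(one_minute_load, cores) else "busy")
```

A high load average on a machine with many cores can still leave room, which is why the sketch normalizes by core count instead of looking at the raw number.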

How do I set up passwordless/key based ssh?

You do not. It's a limitation of our configuration. You need to supply your password on login. Of course you can be resourceful and do:

~$ ssh tommy@madmax5
tommy@madmax5's password: 
Last login: Tue Oct  6 00:52:52 2015 from whale.stanford.edu
tommy@madmax5:~$ ssh madmax3
Last login: Fri Oct  2 09:05:37 2015 from whale.stanford.edu
tommy@madmax3:~$ 

Did you notice that we never got asked for a password when logging into madmax3? Magic, huh? It could be just Kerberos.

But I hate typing in my password all the time!

No problem. Install Kerberos on your workstation. Here are some installation notes: KerberosMac | KerberosWindows.

Can I log in from home?

Depends where home is. Logins over ssh should work from almost any network on the Stanford campus (see footnote 1). If you're not on campus, you have three options.

1) whale

Log in via ssh to whale.stanford.edu first. Whale is on one of our networks so you can ssh to other machines from there. Here's an example session.

~$ ssh tommy@whale.stanford.edu
tommy@whale.stanford.edu's password: 
Last login: Sun Oct  4 10:53:42 2015 from c-76-111-212-54.hsd1.ca.comcast.net
tommy@whale:~$ ssh madmax3
Last login: Tue Oct  6 01:17:08 2015 from madmax5.stanford.edu
tommy@madmax3:~$ hostname
madmax3.stanford.edu
tommy@madmax3:~$ exit
logout
Connection to madmax3 closed.
tommy@whale:~$ exit
logout
Connection to whale.stanford.edu closed.

Hey... and this has an added benefit of only typing your password once. Yay!

2) Stanford VPN

Set up the Stanford VPN connection, establish the connection and you're done.

3) Infolab VPN

Check out the instructions on the VPN page. The benefit of this VPN service is that it looks like regular https traffic... so it should work from most hotels, airports, etc.

Storage options

|| Storage option || Mount point || Good for || Speed || Backed up? ||
|| Your home directory || /afs/cs/u/tommy || Stuff that matters, e.g. results, code || Not really || Daily ||
|| Local hard disk || /lfs/local/0 || Temporary files, intermediate results || Around 150 MB/s || No ||
|| Network storage || /dfs/scratchX || Datasets, things you need accessible across multiple servers || Up to 450 MB/s, but shared! || No ||

It is not common but a server could have multiple local volumes (think of it as having multiple disks) so check if there is an /lfs/local/1 if you're running out of space.
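If you would rather not check each local volume by hand, something like the following sketch could pick the roomiest one. The function names and the glob pattern for /lfs/local/N are assumptions made for illustration; adjust them to whatever the server actually has.

```python
import glob
import os


def free_bytes(path):
    """Free space available to a non-root user on the filesystem holding path."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def roomiest_volume(candidates):
    """Return the existing directory with the most free space, or None."""
    existing = [path for path in candidates if os.path.isdir(path)]
    if not existing:
        return None
    return max(existing, key=free_bytes)


if __name__ == "__main__":
    # Assumed layout: /lfs/local/0, /lfs/local/1, ... as described above.
    volumes = sorted(glob.glob("/lfs/local/[0-9]*"))
    print(roomiest_volume(volumes))
```

On a machine without any /lfs/local volumes the script simply prints None.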

Long running sessions

So... You have some sweet Python code that takes 2 days to run. Madmax5 is feeling lonely and you figure you'd just run your sweet.py there. Easy, right?

~$ ssh tommy@madmax5
tommy@madmax5:~$ python2.7 sweet.py 
Starting sweet pie...
1 minutes...
2 minutes...
^Z
[1]+  Stopped                 python2.7 sweet.py
tommy@madmax5:~$ bg 1
[1]+ python2.7 sweet.py &
tommy@madmax5:~$ exit
Connection to madmax5 closed.

This should be all good when you log back in to madmax5, right? Not really... Even if the job survives, you can't really re-attach to it, so there's no easy way to see what the job is up to (not to mention that the sweet.py process has no parent now and might eventually just get killed). That's where screen & tmux come in. What?! Think of the two as virtual terminals... You know how you can have multiple tabs open in some applications? Think of screen & tmux as tabs for your ssh session. Here's a quick session:

~$ ssh tommy@madmax5
# Let's start a screen session (open a new tabbed thingie)
tommy@madmax5:~$ screen -S myScreen
# Run something in the first tab
tommy@madmax5:~$ uptime
 14:20:32 up 145 days,  8:48, 10 users,  load average: 33.03, 33.07, 33.08
# Create a new tab by pressing Ctrl+A, C (C is for create... and also for cookie)
tommy@madmax5:~$ python2.7 sweet.py
Starting sweet pie...
1 minutes...
2 minutes...
# Switch between the tabs by pressing Ctrl+A, N (for next) or Ctrl+A, P (for previous) or Ctrl+A, " (that brings up a list)
# Want to "minimize" screen and come back later? Press Ctrl+A, D (to detach)

Great, now we have tabs. What's so good about them? They stay open even after you log out. How do you get back to them?

~$ ssh tommy@madmax5
# Bring back the session we detached
tommy@madmax5:~$ screen -x myScreen
...
10 minutes...
11 minutes...
12 minutes...
13 minutes...
14 minutes...
15 minutes...
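If you want to try the sessions above without a real two-day job, a stand-in for sweet.py might look like the sketch below. It is not the actual sweet.py (which this page never shows); the function names and the seconds_per_minute knob are made up for the example.

```python
import sys
import time


def progress_line(minute):
    """Format one progress line the way the sessions above show it."""
    return "%d minutes..." % minute


def run(total_minutes, seconds_per_minute=60):
    """Print a banner, then one line per elapsed minute; return the line count."""
    print("Starting sweet pie...")
    sys.stdout.flush()
    for minute in range(1, total_minutes + 1):
        time.sleep(seconds_per_minute)
        print(progress_line(minute))
        # Flushing matters mostly when output is redirected to a file.
        sys.stdout.flush()
    return total_minutes


# Quick demo: three "minutes" with no waiting in between.
run(3, seconds_per_minute=0)
```

Replacing the last line with run(2 * 24 * 60) would mimic the two-day run, which is exactly the kind of job you would want inside screen or tmux.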

I lost my permissions when I re-attached !@#$@#$&^!$!$!!

It has to do with Kerberos... and since somebody else already wrote a guide on it, here's a link: ScreenKerberos. Same thing applies to tmux.

Compute cluster

You want to run sweet.py two hundred times, possibly with different parameters. You could make a script that would ssh into a bunch of machines, open some screens and run some code. But luckily password-less ssh doesn't work and controlling screen from a script is not that easy.

Let's put together a submission script for the compute cluster:

#PBS -l nodes=1:ppn=1
#PBS -N SweetJob

python2.7 /dfs/scratch0/tommy/sweet.py

We saved that script to /dfs/scratch0/tommy/run_sweet.sh. Now we can run a job with 200 of these on the cluster with a single command:

qsub -t 1-200 /dfs/scratch0/tommy/run_sweet.sh
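For the "possibly with different parameters" part, each task needs to know which of the 200 it is. The sketch below assumes a Torque-style scheduler that exports the task number in the PBS_ARRAYID environment variable when a job is submitted with -t; check InfolabClusterCompute for what this cluster actually sets. The parameter grid (rates and seeds) and the function names are purely hypothetical.

```python
import os


def task_index(environ=None, default=1):
    """Return the array task index, or default when not run as an array task."""
    environ = os.environ if environ is None else environ
    # PBS_ARRAYID is an assumption about the scheduler, see the note above.
    return int(environ.get("PBS_ARRAYID", default))


def pick_parameters(index, rates, seeds):
    """Map a 1-based task index onto one (rate, seed) pair of the grid."""
    if not 1 <= index <= len(rates) * len(seeds):
        raise ValueError("task index %d is outside the grid" % index)
    rate_pos, seed_pos = divmod(index - 1, len(seeds))
    return rates[rate_pos], seeds[seed_pos]


if __name__ == "__main__":
    rates = [step / 100.0 for step in range(1, 21)]  # 20 hypothetical values
    seeds = list(range(10))                          # 10 hypothetical seeds, 200 tasks total
    index = task_index()
    rate, seed = pick_parameters(index, rates, seeds)
    print("task %d: rate=%.2f seed=%d" % (index, rate, seed))
```

Deriving the parameters from the index keeps run_sweet.sh identical for all 200 tasks, so a single qsub call is still enough.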

You can check on your job from time to time with qstat:

qstat -a

You'll get a bunch of *.o* and *.e* files in the directory you submitted the job from. These contain standard output and standard error output for all the tasks.
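With 200 tasks, opening every file by hand gets old quickly. A small sketch like the one below could list the standard error files that are not empty, which are usually the ones worth a look. It assumes the files are named after the job (SweetJob, from the #PBS -N line) followed by .e and some job identifier; the exact naming on this cluster is an assumption, and the function name is made up.

```python
import glob
import os


def noisy_error_files(directory, job_name="SweetJob"):
    """Return the non-empty standard error files of job_name, sorted by name."""
    pattern = os.path.join(directory, job_name + ".e*")
    return sorted(
        path for path in glob.glob(pattern) if os.path.getsize(path) > 0
    )


if __name__ == "__main__":
    # Run from the directory you submitted the job from.
    for path in noisy_error_files(os.getcwd()):
        print(path)
```

A non-empty error file does not always mean the task failed (warnings land there too), so treat the list as a starting point.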

Read more here: InfolabClusterCompute

Hadoop cluster

Q&A

Who is Tommy?

Tommy is his name, placeholding is his game. His friends are Bill Oddie, Private Tentpeg, and Airman Snuffy. Tommy likes to secretly volunteer for this wiki. The agencies involved can neither confirm nor deny that Corporal Schumuckatelli is involved in this matter.

Footnotes

  1. Since security was tightened, your wireless device might land on a network that is treated as the public internet, which means you'll appear to be off campus as far as our servers are concerned.