Login: ssh username@hpc.oit.uci.edu
Help: cat /data/help/cheat-sheet.txt
Guidebook on how to use the BioCluster created by Kevin Thornton:
Example to run jobs on the cluster:
Batch jobs are jobs that contain all of the necessary information and instructions inside a script. You create the script with your favorite editor (like emacs) and then submit it to the scheduler to run.
Some jobs can run for days, weeks, or longer so batch is the way to go for such work. Once you submit a job to the scheduler, you can log off and come back at a later time and check on the results.
Serial batch jobs are usually the simplest to use. A serial job uses only one core, which also makes it the slowest kind of job.
Consider the following serial job script available from the HPC demo account.
    $ cat ~demo/serial.sh
    #!/bin/bash
    #$ -N TEST
    #$ -q free64
    #$ -m beas
    date > out
Grid Engine Directive | What It Does |
---|---|
#!/bin/bash | Shell used to run the script (the bash shell) |
#$ -N TEST | Sets the job name to TEST. Anything written to standard out goes to a file named TEST.o<jobid>, and errors (if any occurred) go to TEST.e<jobid> |
#$ -q free64 | Requests the free64 queue |
#$ -m beas | Sends you email on job status: (b)egin, (e)nd, (a)bort, (s)uspend |
The first line #!/bin/bash is the shell to use. Grid Engine (GE) directives start with #$. GE directives are needed in order to tell the scheduler what queue to use, how many cores to use, whether to send email or not, etc.
The last line in our serial.sh script is the program to run. In this example it is the date command, with its output written to a file named out.
date > out
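To make the split between directives and the actual command concrete, here is a minimal sketch that recreates the demo script in a scratch directory and lists only the lines the scheduler reads as directives. The scratch directory from mktemp and the variable names are assumptions for illustration; the script contents are the same as the demo script shown earlier.

```shell
#!/bin/bash
# Recreate the demo script in a scratch directory (an assumption made here
# so that nothing in your working directory is overwritten).
workdir=$(mktemp -d)
cat > "$workdir/serial.sh" <<'EOF'
#!/bin/bash
#$ -N TEST
#$ -q free64
#$ -m beas
date > out
EOF

# List only the Grid Engine directives: lines that begin with "#$".
directives=$(grep '^#\$' "$workdir/serial.sh")
echo "$directives"
```

Everything else in the file (here, only the date line) is ordinary shell that runs on the compute node.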
Now that we have a basic understanding, let's run our first serial batch job on the HPC cluster. First create a test directory, change into it, copy the demo serial.sh script there, and submit the job.
From your HPC account, do the following:
    $ mkdir serial-test
    $ cd serial-test
    $ cp ~demo/serial.sh .
    $ qsub serial.sh
    $ qstat -u $USER
After you submit the job (qsub), GE will respond with a job ID:
Your job 1961 ("TEST") has been submitted
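If you want to reuse the job ID in later commands, one option is to pull it out of that message. The sketch below assumes the message always has the form shown above; job_id_from_qsub is a hypothetical helper name, not part of Grid Engine.

```shell
#!/bin/bash
# Hypothetical helper: read qsub's confirmation message on stdin and print
# the numeric job ID, which is the third whitespace-separated field.
job_id_from_qsub() {
    awk '/^Your job [0-9]+ / { print $3 }'
}

# Usage on the cluster (not run here):
#   jobid=$(qsub serial.sh | job_id_from_qsub)
#   echo "Submitted job $jobid"
```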
and qstat will display something similar to this:
    job-ID  prior    name  user     state  submit/start  queue  slots
    1961    0.00000  TEST  jfarran  qw     08/16/2012           1
The state of our job is qw, or queue wait, meaning the job is sitting in the queue waiting for a compute node. The slots column shows the core count, which is 1 (the default).
When we run qstat -u $USER again a few seconds later, we see:
    job-ID  prior    name  user     state  submit/start  queue                slots
    1961    0.50659  TEST  jfarran  r      08/16/2012    free64@compute-7-11  1
The scheduler found compute-7-11 in the free64 queue with 1 core (slot) available and started our job #1961 on it. The job state changed from queue wait (qw) to running (r).
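The state column can also be read by a script rather than by eye. This is a rough sketch that assumes the column layout shown in the qstat listings above, with the state in the fifth column; job_state is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: given a job ID as $1 and qstat output on stdin,
# print that job's state column (for example qw or r).
# Prints nothing if the job is not listed.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Usage on the cluster (not run here):
#   qstat -u "$USER" | job_state 1961
```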
Once you submit your job (qsub), things happen rather quickly, so you may need to run qstat several times in quick succession to catch your job. Alternatively, open a new window and run: watch -d "qstat -u $USER"
Once the job completes you will get an email notification and the qstat output will be empty.
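Because an empty qstat listing signals that the job is done, a script can wait on that condition instead of you checking by hand. The following is only a sketch: wait_until_empty is a hypothetical helper, and it assumes that the only jobs listed for your user are the ones you are waiting on.

```shell
#!/bin/bash
# Hypothetical helper: rerun the given command every $1 seconds until it
# prints nothing, then return.
wait_until_empty() {
    local interval=$1
    shift
    while [ -n "$("$@")" ]; do
        sleep "$interval"
    done
}

# Usage on the cluster (not run here):
#   wait_until_empty 10 qstat -u "$USER"
#   cat out
```

A polling interval of several seconds or more keeps the extra load on the scheduler small.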
Now do an ls and you will see the following files:
out serial.sh
serial.sh is the batch script we submitted, and out holds the output of the date command. To see the output, type:
$ cat out