Users Guide
$Revision: 1.5 $
This document outlines everything a user needs to know about getting priority access to resources and submitting urgent jobs. Reading the Work Flow section first is recommended. Please note that this version uses Globus Toolkit Distinguished Names (DNs) as the primary user identifier. If your site requires other mechanisms, contact the SPRUCE Team for further information.
Introduction
Special PRiority and Urgent Computing Environment
The SPRUCE portal functions as a one-stop shop for obtaining priority access
and monitoring usage. Users are provided with
Portal
The SPRUCE portal provides a single point of administration and authorization for urgent computing across an entire Grid. This section details how to obtain priority access and how to manage the users who can submit urgent jobs. Step-by-step screenshots are provided for convenience.
The main Portal Home Page provides functionality to manage
your tokens and user information. Built upon AJAX technologies and
web services,
Right-Of-Way Token
A
The SPRUCE Right-of-Way Token has a 16-digit number printed on it, as shown below. This number is used as the login to the portal.
SPRUCE Right-of-Way Token
In an emergency, the person holding the token must activate it from the Web portal. Once activated, a token has a finite lifetime, typically between 4 and 24 hours, although this varies from token to token. After activation, the user has a window lasting for the token lifetime in which to submit urgent computing jobs. When the time runs out, another token must be spent. If no jobs, or only standard-priority jobs, are submitted during the window, nothing happens and the token simply expires.
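The activation window described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the portal's implementation; the function names and the sample activation time are hypothetical.

```python
from datetime import datetime, timedelta


def token_expiry(activated_at: datetime, lifetime_hours: float) -> datetime:
    """Moment at which the urgent-submission window closes."""
    return activated_at + timedelta(hours=lifetime_hours)


def time_remaining(activated_at: datetime, lifetime_hours: float, now: datetime) -> timedelta:
    """Time left in the window, never negative."""
    return max(token_expiry(activated_at, lifetime_hours) - now, timedelta(0))


def can_submit_urgent(activated_at: datetime, lifetime_hours: float, now: datetime) -> bool:
    """True while 'now' falls inside the window opened by activation."""
    return activated_at <= now < token_expiry(activated_at, lifetime_hours)


# Hypothetical token: activated at 10:00 with a 4-hour lifetime.
activated = datetime(2007, 2, 9, 10, 0)
print(time_remaining(activated, 4, datetime(2007, 2, 9, 12, 30)))    # 1:30:00
print(can_submit_urgent(activated, 4, datetime(2007, 2, 9, 14, 0)))  # False
```

Once the window has closed, the remaining time is reported as zero rather than as a negative value.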
If you would like to request tokens for your project, please contact the SPRUCE Team.
Token Info
If you wish to see information related to a particular token, manage
the users associated with it, or check the remaining time, you need
to log in using the 16-digit token number.
On login, all relevant information about the token is displayed, such as its status, lifetime, maximum urgency level, expiration date, associated resources, and any users already associated with it.
Activating Tokens
Depending on whether the token has been activated, the available option is
either 'checktime' or 'activate'. If you have an unactivated token,
you can turn it on by clicking the 'activate' option. A
A view of the token status is returned, where you can see that it is now 'Activated'.
Adding Users to Token
The user identities can be added to the token
Once the token has been activated, the team members whose identities are on the token can submit emergency jobs. The form of identification in the current distribution is the Globus Toolkit Distinguished Name (DN). The PI handling the token needs to find the DN for each team member. This information is found in the grid-mapfile of any site. A typical command line to find it is:
grep user-name-of-member /etc/grid-security/grid-mapfile
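The same lookup can be sketched in Python. This is a minimal illustration that assumes the usual grid-mapfile layout of a double-quoted DN followed by one or more comma-separated local account names; the function name and the sample entries are hypothetical. Unlike grep, it matches the account name exactly rather than as a substring.

```python
import shlex


def find_dns(mapfile_text: str, username: str) -> list[str]:
    """Return every DN mapped to the given local account name."""
    dns = []
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # handles the double-quoted DN
        if len(parts) < 2:
            continue  # malformed entry: no account name
        dn, accounts = parts[0], parts[1].split(",")
        if username in accounts:
            dns.append(dn)
    return dns


# Hypothetical grid-mapfile contents, for illustration only.
SAMPLE_MAPFILE = """
# example entries
"/C=US/O=Example Grid/OU=People/CN=Jane Doe" jdoe
"/C=US/O=Example Grid/OU=People/CN=John Roe" jroe,shared
"""

print(find_dns(SAMPLE_MAPFILE, "jdoe"))
```

On a real system the text would be read from /etc/grid-security/grid-mapfile, which may require appropriate read permission.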
If you have any trouble identifying a DN, please contact the SPRUCE Team. The DN of each user, along with their real name and contact email address, should be entered in the interface. All of these fields are required. Any number of users can be added per token, and each of them has permission to submit urgent jobs for as long as the token is alive.
On completion, the user then shows up in the list of users associated with this token. Note that it was empty earlier in this example (previous screenshot).
Removing Users
If a user is no longer needed for job submissions, they can be removed from the active users list of a given token. Also, because user information cannot be modified, a user whose details were entered incorrectly must be removed and added again.
The list of users associated with a token has a 'remove' button beside each name. Click it and confirm that you want to remove that user.
On completion, the user is removed from the list of users associated with this token.
Check Time
Once the token has been activated, the time remaining for submissions starts counting down. Clicking the 'checktime' button on an activated token shows the time left and how fresh this information is.
User Info
Any user who wants to know whether they have any active tokens, along with further details, can log in to the portal using their email address and DN. The user does not need to know the token number.
On login, all pertinent information is displayed about tokens that are active or that are not yet activated but still unexpired. Details include maximum urgency level, lifetime, and resources.
Job Submission
We currently support two forms of job submission: the globusrun command provided by the Globus Toolkit, or direct command-line submission akin to qsub, llsubmit, or bsub, depending on your local resource manager. The idea is to support both distributed Grids running Globus and traditional supercomputers.
Globus
The current software is compatible with the pre-WS version of Globus
Toolkit 4.0.1. Depending on the site
chosen, you need to identify the contact information
and job manager name. The typical name of the job manager
is
When submitting an urgent computing job, the user needs
to specify an additional RSL parameter called
UC/ANL resource manager contact: tg-grid1.uc.teragrid.org/jobmanager-spruce urgency=yellow/orange/red
Example resource manager contact and job manager for UC/ANL TG resource
An example RSL job file is given below. Please note that some parameters such as host_xcount may vary depending on the site you wish to submit to.
+
(&(resourceManagerContact = tg-grid1.uc.teragrid.org/jobmanager-spruce)
  (executable = $(HOME)/spruce/demo/mpihello)
  (jobType = mpi)
  (directory = $(HOME)/spruce/demo/)
  (host_types = ia64-compute)
  (host_xcount = 30)
  (urgency = red)
  (stdout = $(HOME)/spruce/demo/stdout)
  (stderr = $(HOME)/spruce/demo/stderr)
)
Example RSL submission file to use with 'globusrun'
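When RSL files are generated by a script, the urgency relation can be added and checked programmatically. The following Python sketch is only an illustration: the function name, the absolute paths, and the assumption that no attribute value needs quoting are all hypothetical, and only the three urgency levels come from this guide.

```python
VALID_URGENCIES = ("yellow", "orange", "red")


def build_urgent_rsl(contact: str, attributes: dict, urgency: str) -> str:
    """Assemble a multi-request RSL string that carries an urgency relation."""
    if urgency not in VALID_URGENCIES:
        raise ValueError(f"urgency must be one of {VALID_URGENCIES}, got {urgency!r}")
    relations = [("resourceManagerContact", contact)]
    relations.extend(attributes.items())
    relations.append(("urgency", urgency))
    # One relation per line; values are assumed not to need RSL quoting.
    body = "".join(f"\n  ({name} = {value})" for name, value in relations)
    return "+\n(&" + body + "\n)"


rsl = build_urgent_rsl(
    "tg-grid1.uc.teragrid.org/jobmanager-spruce",
    {
        "executable": "/home/your_name/spruce/demo/mpihello",  # hypothetical path
        "jobType": "mpi",
    },
    "red",
)
print(rsl)
```

Rejecting an unknown urgency level before submission avoids a round trip to the job manager for a request that cannot be honored.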
If the user does not have a valid token activated at the SPRUCE portal,
the job submission will be aborted and the gram_log will contain an
error message pertinent to the situation. Otherwise, the job is
submitted successfully and by doing a
User does not have a valid token:
> globusrun -o -f globus_test.rsl
> more gram_job_mgr_some_number.log
........
2/9 10:46:04 JMI: while return_buf = No Valid Token found for user = your_name, aborting urgent job submission
........

User has a valid token with the policy of getting next in queue position:
> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
238316.tg-master g40.batch        user1            00:00:00 R dque
238337.tg-master ...ps.10x2.randh user1            0        Q dque
238349.tg-master STDIN            user2            00:00:00 R dque
238350.tg-master ...ps.10x2.randh user1            0        Q dque
238353.tg-master job              user3            0        R dque
238354.tg-master job              user3            0        Q dque
238355.tg-master job              user3            0        Q dque
238356.tg-master job              user3            0        Q dque
238357.tg-master job              user3            0        Q dque
238358.tg-master job              user4            0        Q dque
238359.tg-master job              user4            0        Q dque
238360.tg-master job              user4            0        Q dque
238361.tg-master STDIN            your_name        0        R spruce
Example job run with and without active token
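To confirm from a script that an urgent job landed in the 'spruce' queue, the qstat listing above can be parsed. The Python sketch below is a rough illustration that assumes the six-column layout shown in the example and job names without spaces; the function name and the shortened sample listing are hypothetical.

```python
def jobs_in_queue(qstat_output: str, queue: str) -> list[dict]:
    """Return the jobs from a qstat listing that sit in the given queue."""
    jobs = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows have six columns and an id such as "238361.tg-master";
        # this skips the header and the dashed separator row.
        if len(fields) != 6 or not fields[0].split(".")[0].isdigit():
            continue
        job_id, name, user, _time_use, state, job_queue = fields
        if job_queue == queue:
            jobs.append({"id": job_id, "name": name, "user": user, "state": state})
    return jobs


SAMPLE_QSTAT = """\
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
238316.tg-master g40.batch        user1            00:00:00 R dque
238337.tg-master ...ps.10x2.randh user1            0        Q dque
238361.tg-master STDIN            your_name        0        R spruce
"""

for job in jobs_in_queue(SAMPLE_QSTAT, "spruce"):
    print(job["id"], job["user"], job["state"])
```

In practice the listing would come from running qstat on the resource; the column layout can differ between resource managers, so the field count check may need adjusting.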
The Wrapper: 'spruce_sub'
In order to make the system compatible with traditional supercomputers,
and with users who wish to use direct command-line job submission tools rather
than Globus, SPRUCE provides a wrapper command called
The 'spruce_sub' wrapper works exactly the same way as local job scripts,
with an additional
Usage: spruce_sub [urgency=yellow/orange/red] job_script
Command line usage of the 'spruce_sub' command
$TG_COMMUNITY is a
TeraGrid-wide standard, so there should not be any problems accessing
the script. If running from a non-TG local resource, please contact
your administrator about the location of the script.
The job script can remain exactly the same as your original version.
Nothing needs to be changed in it; the urgency is indicated
on the command line. If the user does not have a valid token activated
at the SPRUCE portal, the job submission will be aborted and an
error message pertinent to the situation will be displayed. Otherwise,
by doing a
Example PBS script - Any generic job submission script
#!/bin/csh
# Running in C shell
#PBS -N spruce_job                    # Name of the job
#PBS -l nodes=4:ia64-compute:ppn=1    # Number and type of nodes
#PBS -l walltime=0:10:00              # Maximum wall clock run time
#PBS -o out                           # Standard output
#PBS -e err                           # Standard error
#PBS -V                               # Ship environment variables
mpirun -np 4 $HOME/spruce/demo/mpihello    # Executable

User does not have a valid token:
> $TG_COMMUNITY/spruce/spruce-sub urgency=red helloworld.pbs
No Valid Token found for user:your_name, aborting urgent job submission
>

User has a valid token with the policy of getting next in queue position:
> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
238316.tg-master g40.batch        user1            00:00:00 R dque
238337.tg-master ...ps.10x2.randh user1            0        Q dque
238349.tg-master STDIN            user2            00:00:00 R dque
238350.tg-master ...ps.10x2.randh user1            0        Q dque
238353.tg-master job              user3            0        R dque
238354.tg-master job              user3            0        Q dque
238355.tg-master job              user3            0        Q dque
238358.tg-master job              user4            0        Q dque
238359.tg-master job              user4            0        Q dque
238361.tg-master spruce_job       your_name        0        R spruce
Example job run using 'spruce_sub' on the TG resources
Troubleshooting
Please contact the SPRUCE Team with any questions or problems using the SPRUCE software.