Title: Introductory Practical
Subtitle: OMII GridSAM
Tutor: Stephen Crouch, Steven Newhouse
Authors: OMII-UK

The OMII Introductory GridSAM Practical

Overview

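This practical covers two aspects of submitting jobs to the GridSAM OMII server: submitting and monitoring a job with no file staging, and submitting and monitoring a job with file staging.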

The aim of GridSAM is to provide a Web Service for submitting and monitoring jobs managed by a variety of Distributed Resource Managers (DRMs). The modular design allows third parties to provide submission and file-transfer plug-ins to GridSAM.

Moreover, the job management API used by the GridSAM web service can be embedded into a grid application that requires job submission and monitoring capabilities.

GridSAM installs on top of the WS-Security (authentication) layer (provided by the OMII WS container) and enables users to execute jobs on the OMII server.  GridSAM has its own client that runs on top of the OMII client and enables you to run jobs and monitor them.

Job Submission/Monitoring: No File Staging

Objective: submit a trivial job to GridSAM and monitor its progress.

This simply involves submitting a single, trivial job to GridSAM that has no inputs or outputs.

Look at <omii_client_home>/gridsam/data/examples/sleep.jsdl:

<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
  <JobDescription>
   <Application>
    <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
     <Executable>/bin/sleep</Executable> 
     <Argument>5</Argument> 
    </POSIXApplication>
   </Application>
  </JobDescription>
</JobDefinition>

The JSDL XML contains various declarative elements that describe the job.  In this case, it describes a simple execution of the POSIX application /bin/sleep with 5 (seconds) as an argument.  We'll cover the structure of JSDL in more detail later.
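
To see how the elements fit together, it can help to generate the same description with a different argument. The following is a minimal Bash sketch, not part of GridSAM: the function name make_sleep_jsdl is hypothetical, and the element names and namespaces are simply copied from sleep.jsdl above.

```shell
# Hypothetical helper (not part of GridSAM): print a JSDL document that runs
# /bin/sleep for the given number of seconds (default 5).
# The structure is copied from sleep.jsdl; only the Argument value varies.
make_sleep_jsdl() {
  local seconds="${1:-5}"
  cat <<EOF
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
  <JobDescription>
   <Application>
    <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
     <Executable>/bin/sleep</Executable>
     <Argument>${seconds}</Argument>
    </POSIXApplication>
   </Application>
  </JobDescription>
</JobDefinition>
EOF
}
```

For example, make_sleep_jsdl 10 > ../data/examples/sleep10.jsdl would write a ten-second variant (sleep10.jsdl is an illustrative file name, not one shipped with the client).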

Submit Trivial Job to GridSAM Server

Type:

cd <omii_client_home>/gridsam/bin

Submit to GridSAM server (one line):

./gridsam-submit -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j ../data/examples/sleep.jsdl

A unique job ID is returned which should look something like the following:

urn:gridsam:18ce6dda0bf0fd73010bf5ea1d490001

This globally unique identifier allows you to reference your job when performing activities on it, such as monitoring its progress.
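
If you want to reuse the job ID in a script, it can be picked out of the submission output. This is a sketch under assumptions: the function name extract_job_id is hypothetical, and it assumes that IDs always have the form shown above ("urn:gridsam:" followed by hexadecimal digits) and that gridsam-submit prints the ID on standard output.

```shell
# Hypothetical helper: read text on stdin and print the first GridSAM job ID
# found in it. Assumes IDs look like "urn:gridsam:" followed by hex digits,
# as in the example above.
extract_job_id() {
  grep -Eo 'urn:gridsam:[0-9a-f]+' | head -n 1
}

# Possible use (assumes the ID is written to standard output):
#   job_id="$(./gridsam-submit -s "$service_url" -j ../data/examples/sleep.jsdl | extract_job_id)"
```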

Monitoring the Trivial Job

Monitor job until completion:

./gridsam-status -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j <unique_job_id>

The progress of the job is indicated by its current state.  Its state can be one of the following:

  • Pending: waiting to be executed

  • Staging-in: input data (if it exists) is being moved from source data locations so it can be used by the job

  • Staged-in: all input data (if it exists) is ready for the job

  • Active: the job is currently being executed

  • Executed: the job has finished execution

  • Staging-out: output data (if it exists) from the job is being moved to the target data locations designated for output

  • Staged-out: all output data (if it exists) has been successfully moved to their target locations

  • Done: job is complete

Following the job execution, you should see something like the following:

Job Progress: pending -> staging-in -> staged-in -> active -> executed ->
 staging-out -> staged-out -> done

--- pending - 2006-06-21 10:27:08.0 ---
job is being scheduled
--- staging-in - 2006-06-21 10:27:09.0 ---
staging files...
--- staged-in - 2006-06-21 10:27:09.0 ---
no file needs to be staged in
--- active - 2006-06-21 10:27:09.0 ---
'/bin/sleep 5' is being forked
--- executed - 2006-06-21 10:27:14.0 ---
'/bin/sleep 5' completed with exit code 0
--- staging-out - 2006-06-21 10:27:14.0 ---
staging files out...
--- staged-out - 2006-06-21 10:27:14.0 ---
no file needs to be staged out
--- done - 2006-06-21 10:27:14.0 ---
Job completed

--------------
Job Properties
--------------
urn:gridsam:exitcode=0

We can see that the job has successfully traversed all its stages and is complete.  If you're quick, you may catch your job in mid-execution and observe it in one of its intermediary stages!
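
Rather than re-running gridsam-status by hand, the check can be repeated in a loop. The sketch below is illustrative only: latest_state and wait_until_done are hypothetical names, and the parsing assumes the stage headers always look like the "--- done - 2006-06-21 10:27:14.0 ---" lines shown above.

```shell
# Hypothetical helper: read gridsam-status text output on stdin and print the
# most recent state, taken from header lines of the form "--- <state> - <time> ---".
latest_state() {
  sed -n 's/^--- \([a-z-]*\) - .*/\1/p' | tail -n 1
}

# Hypothetical helper: poll until the job reports "done".
# $1 = service URL, $2 = job ID, $3 = maximum number of attempts (default 60).
# Only the states listed above are known to this sketch, so a job that never
# reaches "done" makes the loop give up and return 1.
wait_until_done() {
  local url="$1" job_id="$2" attempts="${3:-60}" state
  while [ "$attempts" -gt 0 ]; do
    state="$(./gridsam-status -s "$url" -j "$job_id" | latest_state)"
    echo "current state: ${state:-unknown}"
    [ "$state" = "done" ] && return 0
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}
```

Run from <omii_client_home>/gridsam/bin, wait_until_done "http://<server>:<port>/gridsam/services/gridsam?wsdl" "<unique_job_id>" would print the state every two seconds until the job is done.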

If you specify the -x parameter in the gridsam-status command, you will get the same information in XML format, which is useful if the output is to be parsed automatically:

<gridsam:JobStatus xmlns:gridsam="http://www.icenigrid.org/service/gridsam">
    <gridsam:Stage>
        <gridsam:State>pending</gridsam:State>
        <gridsam:Description>job is being scheduled</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:08+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>staging-in</gridsam:State>
        <gridsam:Description>staging files...</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:09+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>staged-in</gridsam:State>
        <gridsam:Description>no file needs to be staged in</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:09+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>active</gridsam:State>
        <gridsam:Description>'/bin/sleep 5' is being forked</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:09+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>executed</gridsam:State>
        <gridsam:Description>'/bin/sleep 5' completed with exit code 0</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>staging-out</gridsam:State>
        <gridsam:Description>staging files out...</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>staged-out</gridsam:State>
        <gridsam:Description>no file needs to be staged out</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Stage>
        <gridsam:State>done</gridsam:State>
        <gridsam:Description>Job completed</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
    </gridsam:Stage>
    <gridsam:Property name="urn:gridsam:exitcode"><![CDATA[0]]></gridsam:Property>
</gridsam:JobStatus>
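
As a rough illustration of automated parsing, the states and the exit code can be pulled out of this XML with sed. This is a line-oriented sketch with hypothetical function names (xml_states, xml_exit_code); it assumes each element sits on its own line as in the listing above, and a proper XML parser would be more robust.

```shell
# Hypothetical helper: read "gridsam-status -x" output on stdin and print each
# recorded state, one per line, in the order they appear.
xml_states() {
  sed -n 's/.*<gridsam:State>\([^<]*\)<\/gridsam:State>.*/\1/p'
}

# Hypothetical helper: print the value of the urn:gridsam:exitcode property,
# which is wrapped in a CDATA section in the listing above.
xml_exit_code() {
  sed -n 's/.*name="urn:gridsam:exitcode"><!\[CDATA\[\([^]]*\)]]>.*/\1/p'
}
```

For example, piping the status output through xml_states | tail -n 1 gives the current state of the job.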

Job Submission/Monitoring: With File Staging

Objective: submit a simple job with data input and output requirements and monitor its progress.

This job is a little more complicated.  It involves 'staging in' (pulling in) input files so they can be used by the job, and 'staging out' (pushing out) an output file.

As can be seen from the architectural diagram, we'll need a mechanism for storing our input and output files on the client (your) machine; here this is an FTP server.  When the job executes on the GridSAM server, the server will contact the FTP server to download the input files and upload the output file when necessary.

Job Submission/Monitoring: With File Staging - Application

Have a look at <omii_client_home>/gridsam/data/examples/remotecat-staging.jsdl:

<JobDefinition>
    <JobDescription>
        <JobIdentification>
            <JobName>cat job</JobName>
            <Description>cat job description</Description>
            <JobAnnotation>no annotation</JobAnnotation>
            <JobProject>gridsam project</JobProject>
        </JobIdentification>
        <Application>
            <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
                <Executable>bin/concat</Executable> 
                <Argument>dir2/subdir1/file2.txt</Argument> 
                <Output>stdout.txt</Output> 
                <Error>stderr.txt</Error> 
                <Environment name="FIRST_INPUT">dir1/file1.txt</Environment> 
            </POSIXApplication>
        </Application>
    </JobDescription>
    …
</JobDefinition>

The JobIdentification section simply describes, at an abstract level, the nature of the job in terms of high-level string fields.  These can be anything you like.

The Application section describes the actual execution parameters for the job.  Within the Application block, we can see that we are executing a POSIXApplication-style application.  This could be anything that can be executed on a POSIX (e.g. Linux) command line: a shell script, platform-specific code such as an executable resident on the machine, or platform-independent code such as Java (as long as Java is installed on the server, of course!).  The Executable and Argument parameters are self-explanatory, but we can also see that the standard output (Output) and standard error (Error) can optionally be captured in files, and that environment variables (in the POSIX sense) can optionally be set.

Taking this into account, the JSDL above will execute 'concat' with the argument 'dir2/subdir1/file2.txt', and also set the environment variable 'FIRST_INPUT' to 'dir1/file1.txt'.  These two file references are virtual names, and are concretely defined in the DataStaging elements covered in the next section (elided as … in the listing above).

Job Submission/Monitoring: With File Staging - Staging

Edit the <omii_client_home>/gridsam/data/examples/remotecat-staging.jsdl file, and change the four URI entries (one in each DataStaging element) to look like this, where <host> refers to your machine name:

<DataStaging>
  <FileName>bin/concat</FileName> 
  <CreationFlag>overwrite</CreationFlag> 
  <Source>
    <URI>ftp://<host>:55521/concat.sh</URI> 
  </Source>
</DataStaging>

<DataStaging>
  <FileName>dir1/file1.txt</FileName> 
  <CreationFlag>overwrite</CreationFlag> 
  <Source>
    <URI>ftp://<host>:55521/input1.txt</URI> 
  </Source>
</DataStaging>

<DataStaging>
  <FileName>dir2/subdir1/file2.txt</FileName> 
  <CreationFlag>overwrite</CreationFlag> 
  <Source>
    <URI>ftp://<host>:55521/input2.txt</URI> 
  </Source>
</DataStaging>

<DataStaging>
  <FileName>stdout.txt</FileName> 
  <CreationFlag>overwrite</CreationFlag> 
  <DeleteOnTermination>true</DeleteOnTermination> 
  <Target>
    <URI>ftp://<host>:55521/output.txt</URI> 
  </Target>
</DataStaging> 

A DataStaging element can serve two purposes: staging data in prior to job execution and staging data out following job execution.  There can be as many of each as required, and the GridSAM server will deal with the mechanics of staging on the job's behalf automatically.  By specifying a Source parameter we designate an input URI, and by specifying a Target parameter (as for stdout.txt above) we designate an output URI.  An input URI can be one of the following types:

  • ftp://

  • http://

  • webdav://

  • sftp:// - secure ftp

  • gsiftp:// - this allows Globus credentials to be retrieved from a MyProxy server for authentication to a Globus Security Infrastructure (GSI) ftp server

In the above JSDL, we can see that we are also staging the executable concat.sh as input; it is resident in the <omii_client_home>/gridsam/data/examples directory.
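
Since the four elements differ only in file name, direction and URI, they could also be generated. The following Bash sketch is an illustration only: the function name data_staging is hypothetical, it always uses the overwrite creation flag, it omits DeleteOnTermination, and it checks Source URIs against the list of input types given above.

```shell
# Hypothetical helper: print one DataStaging element.
# $1 = file name as seen by the job
# $2 = "Source" (stage in) or "Target" (stage out)
# $3 = URI to stage from or to
data_staging() {
  local file_name="$1" direction="$2" uri="$3"
  case "$direction" in
    Source|Target) ;;
    *) echo "direction must be Source or Target" >&2; return 1 ;;
  esac
  # Only input URIs are checked, because the list above describes input types.
  if [ "$direction" = "Source" ]; then
    case "$uri" in
      ftp://*|http://*|webdav://*|sftp://*|gsiftp://*) ;;
      *) echo "unsupported input URI: $uri" >&2; return 1 ;;
    esac
  fi
  cat <<EOF
<DataStaging>
  <FileName>${file_name}</FileName>
  <CreationFlag>overwrite</CreationFlag>
  <${direction}>
    <URI>${uri}</URI>
  </${direction}>
</DataStaging>
EOF
}
```

For example, data_staging dir1/file1.txt Source "ftp://<host>:55521/input1.txt" prints the second element shown above.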

Create two text files in <omii_client_home>/gridsam/data/examples (input1.txt and input2.txt) that contain any strings you like.  These will be your input files.

File Staging: Set up FTP Server

An FTP server is required on the client to act as a data staging area for the input and output.  Fortunately, the GridSAM client supplies a simple anonymous one for test purposes which we can use.

Generally, you would set up a dedicated data staging area for input and output files, but here we'll simply use the <omii_client_home>/gridsam/data/examples directory.

In the <omii_client_home>/gridsam/bin directory type:

./gridsam-ftp-server -d ../data/examples -p 55521

You should see something like the following:

2006-06-21 13:32:48,737 WARN [GridSAMFTPServer] (main:) ../data/examples/ is exposed
 through FTP at ftp://anonymous@152.78.237.90:55521/
2006-06-21 13:32:48,741 WARN [GridSAMFTPServer] (main:) Please make sure you understand 
the security implication of using anonymous FTP for file staging.
FtpServer.server.config.root.dir = ../data/examples/
FtpServer.server.config.data = /home/omii/.gridsam/ftp-158099398
FtpServer.server.config.server.host = 152.78.237.90
FtpServer.server.config.port = 55521
Started FTP

Your FTP server is now ready on port 55521.

Submit File Staging Job to GridSAM Server

In another prompt, type: 

cd <omii_client_home>/gridsam/bin

Submit to GridSAM server (one line):

./gridsam-submit -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j ../data/examples/remotecat-staging.jsdl

Again, a unique job ID is returned.

Monitoring the File Staging Job

You can monitor the job until completion:

./gridsam-status -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j <unique_job_id> 

If you're quick enough, you can observe the data staging activities in action!  When the job is complete, check the <omii_client_home>/gridsam/data/examples directory.  There you should find the file output.txt, which contains the contents of input1.txt and input2.txt concatenated.
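
The result can also be checked mechanically. This sketch uses a hypothetical function name, verify_concat, and assumes that the concatenation order is input1.txt followed by input2.txt; if concat.sh uses the opposite order, swap the two file names.

```shell
# Hypothetical check: succeed (exit status 0) if output.txt in the given
# directory equals input1.txt followed by input2.txt. The order is an assumption.
# $1 = directory holding the three files (default: ../data/examples)
verify_concat() {
  local dir="${1:-../data/examples}"
  cat "$dir/input1.txt" "$dir/input2.txt" | cmp -s - "$dir/output.txt"
}
```

Run from <omii_client_home>/gridsam/bin, verify_concat && echo "output matches" prints the message only when the files agree.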

Exercises

  1. Create a JSDL file in the <omii_client_home>/gridsam/data/examples directory for a job that takes an input file containing occurrences of the string 'hello' and passes it to /bin/sed, so that sed replaces all occurrences of 'hello' with 'goodbye'.  You may have to specify the appropriate sed command in a separate input file.

  2. Create a client script (e.g. in Bash) that is able to:

    • Submit the above job to GridSAM

    • Take the output from the above job and submit this as input to another GridSAM job which replaces all occurrences of the letter 'e' with the letter 'z'.
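
Before wrapping sed in a JSDL job, it may help to try the substitution locally. The sketch below shows one way of keeping the sed command in a separate file and applying it with sed -f; the function and file names are hypothetical, and it deliberately stops short of the JSDL and GridSAM parts of the exercises.

```shell
# Hypothetical helper: write a one-line sed script that replaces every
# occurrence of $2 with $3. Assumes neither string contains "/" or other
# characters that are special to sed.
# $1 = script file to create
write_sed_script() {
  printf 's/%s/%s/g\n' "$2" "$3" > "$1"
}

# Hypothetical helper: apply a sed script file to an input file and print
# the result on standard output.
# $1 = script file, $2 = input file
apply_sed_script() {
  sed -f "$1" "$2"
}
```

For example, write_sed_script replace.sed hello goodbye followed by apply_sed_script replace.sed input.txt prints the input with each 'hello' replaced by 'goodbye'.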

 
