Overview

This practical will cover two aspects of submitting jobs to the GridSAM OMII server. The aim of GridSAM is to provide a Web Service for submitting and monitoring jobs managed by a variety of Distributed Resource Managers (DRM). The modular design allows third parties to provide submission and file-transfer plug-ins to GridSAM.

Job Submission/Monitoring: No File Staging

Objective: submit a trivial job to GridSAM and monitor its progress.

This simply involves submitting a single, trivial job to GridSAM that has no inputs or outputs.
Look at <omii_client_home>/gridsam/data/examples/sleep.jsdl:

    <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
      <JobDescription>
        <Application>
          <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
            <Executable>/bin/sleep</Executable>
            <Argument>5</Argument>
          </POSIXApplication>
        </Application>
      </JobDescription>
    </JobDefinition>

The JSDL XML contains various declarative elements that describe the job. In this case, it describes a simple execution of the POSIX application /bin/sleep with 5 (seconds) as an argument. We'll cover the structure of JSDL in more detail later.

Submit Trivial Job to GridSAM Server

Type:

    cd <omii_client_home>/gridsam/bin

Submit to GridSAM server (one line):

    ./gridsam-submit -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j ../data/examples/sleep.jsdl

A unique job ID is returned, which should look something like the following:

    urn:gridsam:18ce6dda0bf0fd73010bf5ea1d490001

This globally unique identifier allows you to reference your job when performing activities on it, such as monitoring its progress.

Monitoring the Trivial Job

Monitor job until completion:

    ./gridsam-status -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j <unique_job_id>

The progress of the job is indicated by its current state (for example pending, active or done).
Following the job execution, you should see something like the following:

    Job Progress: pending -> staging-in -> staged-in -> active -> executed -> staging-out -> staged-out -> done
    --- pending - 2006-06-21 10:27:08.0 --- job is being scheduled
    --- staging-in - 2006-06-21 10:27:09.0 --- staging files...
    --- staged-in - 2006-06-21 10:27:09.0 --- no file needs to be staged in
    --- active - 2006-06-21 10:27:09.0 --- '/bin/sleep 5' is being forked
    --- executed - 2006-06-21 10:27:14.0 --- '/bin/sleep 5' completed with exit code 0
    --- staging-out - 2006-06-21 10:27:14.0 --- staging files out...
    --- staged-out - 2006-06-21 10:27:14.0 --- no file needs to be staged out
    --- done - 2006-06-21 10:27:14.0 --- Job completed
    --------------
    Job Properties
    --------------
    urn:gridsam:exitcode=0

We can see that the job has successfully traversed all its stages and is complete. If you're quick, you may catch your job in mid-execution and observe it in one of its intermediate stages!

If you specify the -x parameter in the gridsam-status command, you will get the same information in XML format, which is useful if the output needs to be parsed automatically:

    <gridsam:JobStatus xmlns:gridsam="http://www.icenigrid.org/service/gridsam">
      <gridsam:Stage>
        <gridsam:State>pending</gridsam:State>
        <gridsam:Description>job is being scheduled</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:08+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>staging-in</gridsam:State>
        <gridsam:Description>staging files...</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:09+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>staged-in</gridsam:State>
        <gridsam:Description>no file needs to be staged in</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:09+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>active</gridsam:State>
        <gridsam:Description>'/bin/sleep 5' is being forked</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:09+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>executed</gridsam:State>
        <gridsam:Description>'/bin/sleep 5' completed with exit code 0</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>staging-out</gridsam:State>
        <gridsam:Description>staging files out...</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>staged-out</gridsam:State>
        <gridsam:Description>no file needs to be staged out</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Stage>
        <gridsam:State>done</gridsam:State>
        <gridsam:Description>Job completed</gridsam:Description>
        <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
      </gridsam:Stage>
      <gridsam:Property name="urn:gridsam:exitcode"><![CDATA[0]]></gridsam:Property>
    </gridsam:JobStatus>

Job Submission/Monitoring: With File Staging

Objective: submit a simple job with data input and output requirements and monitor its progress.

This job is a little more complicated. It involves 'staging in' (pulling in) input files so they can be used by the job, and 'staging out' (pushing out) an output file. As can be seen from the architectural diagram, we'll need a mechanism for storing our input and output files on the client (your) machine; here that mechanism is an FTP server. When the job executes on the GridSAM server, the server will contact the FTP server to fetch the input files and to deliver the output file when necessary.
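As an illustration of the automated parsing mentioned for the -x option of gridsam-status, here is a minimal Python sketch that reads a status document of the shape shown in the previous section. The function name parse_job_status is hypothetical (it is not part of GridSAM), the sample is a shortened copy of the output above, and the sketch assumes the real output always follows that same structure.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the sample status output in this practical.
NS = {"gridsam": "http://www.icenigrid.org/service/gridsam"}

# Shortened copy of the sample status document (first and last stage only).
SAMPLE_STATUS = """\
<gridsam:JobStatus xmlns:gridsam="http://www.icenigrid.org/service/gridsam">
  <gridsam:Stage>
    <gridsam:State>pending</gridsam:State>
    <gridsam:Description>job is being scheduled</gridsam:Description>
    <gridsam:Time>2006-06-21T10:27:08+01:00</gridsam:Time>
  </gridsam:Stage>
  <gridsam:Stage>
    <gridsam:State>done</gridsam:State>
    <gridsam:Description>Job completed</gridsam:Description>
    <gridsam:Time>2006-06-21T10:27:14+01:00</gridsam:Time>
  </gridsam:Stage>
  <gridsam:Property name="urn:gridsam:exitcode"><![CDATA[0]]></gridsam:Property>
</gridsam:JobStatus>
"""


def parse_job_status(xml_text):
    """Hypothetical helper: return (stages, properties) from a status document.

    stages is a list of (state, time, description) tuples in document order;
    properties maps each Property name to its text value.
    """
    root = ET.fromstring(xml_text)
    stages = [
        (
            stage.findtext("gridsam:State", namespaces=NS),
            stage.findtext("gridsam:Time", namespaces=NS),
            stage.findtext("gridsam:Description", namespaces=NS),
        )
        for stage in root.findall("gridsam:Stage", NS)
    ]
    properties = {
        prop.get("name"): (prop.text or "")
        for prop in root.findall("gridsam:Property", NS)
    }
    return stages, properties


stages, properties = parse_job_status(SAMPLE_STATUS)
current_state = stages[-1][0]  # the last Stage listed is the most recent one
```

In this sketch the most recent state is simply taken to be the last Stage element, which matches the ordering of the sample output but is an assumption about GridSAM's output in general.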
Job Submission/Monitoring: With File Staging - Application

Have a look at <omii_client_home>/gridsam/data/examples/remotecat-staging.jsdl:

    <JobDefinition>
      <JobDescription>
        <JobIdentification>
          <JobName>cat job</JobName>
          <Description>cat job description</Description>
          <JobAnnotation>no annotation</JobAnnotation>
          <JobProject>gridsam project</JobProject>
        </JobIdentification>
        <Application>
          <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
            <Executable>bin/concat</Executable>
            <Argument>dir2/subdir1/file2.txt</Argument>
            <Output>stdout.txt</Output>
            <Error>stderr.txt</Error>
            <Environment name="FIRST_INPUT">dir1/file1.txt</Environment>
          </POSIXApplication>
        </Application>
      </JobDescription>
      …
    </JobDefinition>

The JobIdentification section simply describes, at an abstract level, the nature of the job in terms of high-level string fields. These can be anything you like.

The Application section describes the actual execution parameters for the job. Within the Application block, we can see that we are executing a POSIXApplication style application. This could be anything that can be executed on a POSIX (e.g. Linux) command line: a shell script, platform-specific code such as an executable resident on the machine, or platform-independent code such as Java (as long as Java is installed on the server, of course!).

The Executable and Argument parameters are self-explanatory, but we can also see that the standard output (Output) and standard error (Error) can optionally be captured in files, and that environment variables (in a POSIX sense) can optionally be set.

Taking this into account, the JSDL above will execute 'bin/concat' with the argument 'dir2/subdir1/file2.txt', and also set the environment variable 'FIRST_INPUT' to 'dir1/file1.txt'. These two file references are virtual names, and are concretely defined in the next section (the part elided as … in the listing above).
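To make the reading of the POSIXApplication block concrete, the following Python sketch extracts the executable, arguments, output files and environment variables from a JSDL fragment. It is an illustration only: the helper name describe_posix_application is hypothetical, and the sample text is a trimmed copy of the listing above with the JobDefinition namespace taken from sleep.jsdl (an assumption, since the listing above shows no namespace on that element).

```python
import xml.etree.ElementTree as ET

# Namespace of the POSIXApplication element, as it appears in the JSDL listings.
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/06/jsdl-posix"

# Trimmed copy of the Application section of remotecat-staging.jsdl.
SAMPLE_JSDL = """\
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
  <JobDescription>
    <Application>
      <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
        <Executable>bin/concat</Executable>
        <Argument>dir2/subdir1/file2.txt</Argument>
        <Output>stdout.txt</Output>
        <Error>stderr.txt</Error>
        <Environment name="FIRST_INPUT">dir1/file1.txt</Environment>
      </POSIXApplication>
    </Application>
  </JobDescription>
</JobDefinition>
"""


def posix_tag(name):
    """Build the fully qualified ElementTree tag for a jsdl-posix element."""
    return "{%s}%s" % (POSIX_NS, name)


def describe_posix_application(jsdl_text):
    """Hypothetical helper: summarise the POSIXApplication block of a JSDL document."""
    root = ET.fromstring(jsdl_text)
    # Search the whole tree so the lookup does not depend on the outer namespace.
    app = next(root.iter(posix_tag("POSIXApplication")), None)
    if app is None:
        raise ValueError("no POSIXApplication element found")
    return {
        "executable": app.findtext(posix_tag("Executable")),
        "arguments": [arg.text for arg in app.findall(posix_tag("Argument"))],
        "stdout": app.findtext(posix_tag("Output")),
        "stderr": app.findtext(posix_tag("Error")),
        "environment": {
            env.get("name"): env.text
            for env in app.findall(posix_tag("Environment"))
        },
    }


job = describe_posix_application(SAMPLE_JSDL)
```

Here job["executable"] and job["environment"] hold the same virtual file names discussed above; resolving them to real files is the role of the staging section.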
Job Submission/Monitoring: With File Staging - Staging

Edit the <omii_client_home>/gridsam/data/examples/remotecat-staging.jsdl file, and change the four URI entries to look like this (where <host> refers to your machine name):

    <DataStaging>
      <FileName>bin/concat</FileName>
      <CreationFlag>overwrite</CreationFlag>
      <Source>
        <URI>ftp://<host>:55521/concat.sh</URI>
      </Source>
    </DataStaging>
    <DataStaging>
      <FileName>dir1/file1.txt</FileName>
      <CreationFlag>overwrite</CreationFlag>
      <Source>
        <URI>ftp://<host>:55521/input1.txt</URI>
      </Source>
    </DataStaging>
    <DataStaging>
      <FileName>dir2/subdir1/file2.txt</FileName>
      <CreationFlag>overwrite</CreationFlag>
      <Source>
        <URI>ftp://<host>:55521/input2.txt</URI>
      </Source>
    </DataStaging>
    <DataStaging>
      <FileName>stdout.txt</FileName>
      <CreationFlag>overwrite</CreationFlag>
      <DeleteOnTermination>true</DeleteOnTermination>
      <Target>
        <URI>ftp://<host>:55521/output.txt</URI>
      </Target>
    </DataStaging>

A DataStaging element can serve two purposes: staging data in prior to job execution, and staging data out following job execution. There can be as many of each as required, and the GridSAM server will deal with the mechanics of staging automatically on the job's behalf. We can see that by specifying a Source parameter we are designating an input URI, while a Target parameter (as used for stdout.txt) designates an output URI. An input URI can be one of the following types:
In the above JSDL, we can see that we are also staging in the executable concat.sh, which resides in the gridsam/data/examples directory.

Create two text files in <omii_client_home>/gridsam/data/examples (input1.txt and input2.txt) that contain any strings you like. These will be your input files.

File Staging: Set up FTP Server

An FTP server is required on the client to act as a data staging area for the input and output. Fortunately, the GridSAM client supplies a simple anonymous one for test purposes, which we can use. Generally, you need to set up a dedicated data staging area for input/output file staging, but here we'll use the <omii_client_home>/gridsam/data/examples directory.

In the <omii_client_home>/gridsam/bin directory type:

    ./gridsam-ftp-server -d ../data/examples -p 55521

You should see something like the following:

    2006-06-21 13:32:48,737 WARN [GridSAMFTPServer] (main:) ../data/examples/ is exposed through FTP at ftp://anonymous@152.78.237.90:55521/
    2006-06-21 13:32:48,741 WARN [GridSAMFTPServer] (main:) Please make sure you understand the security implication of using anonymous FTP for file staging.
    FtpServer.server.config.root.dir = ../data/examples/
    FtpServer.server.config.data = /home/omii/.gridsam/ftp-158099398
    FtpServer.server.config.server.host = 152.78.237.90
    FtpServer.server.config.port = 55521
    Started FTP

Your FTP server is now ready on port 55521.

Submit File Staging Job to GridSAM Server

In another prompt, type:

    cd <omii_client_home>/gridsam/bin

Submit to GridSAM server (one line):

    ./gridsam-submit -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j ../data/examples/remotecat-staging.jsdl

Again, a unique job ID is returned.

Monitoring the File Staging Job

You can monitor the job until completion:

    ./gridsam-status -s http://<server>:<port>/gridsam/services/gridsam?wsdl -j <unique_job_id>

If you're quick enough, you can observe the data staging activities in action!

When complete, check the gridsam/data/examples directory. There you should find the file output.txt, which contains the contents of input1.txt and input2.txt concatenated.

Exercises