Preparing Jobs

This part describes how you can prepare new jobs or modify existing jobs.

Overview

A UNICORE job is modelled as a directed acyclic graph (DAG) of tasks and sub-jobs; a sub-job may in turn contain further tasks and sub-jobs. At the user level, a task cannot be divided into smaller execution parts. All tasks of a job or sub-job are executed on the same Virtual Site (Vsite). However, a sub-job (that is, its tasks) may be executed on a different Vsite than the main job.

By default, tasks and sub-jobs are executed in no particular order, and in parallel if the resources are available on the target execution systems. To establish a temporal order, explicit dependencies have to be defined when the job is constructed. The UNICORE NJS server then delays the execution of a dependent action until its predecessor action has completed successfully.
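The effect of explicit dependencies can be illustrated with a small sketch. It is not UNICORE code: the action names are invented, and Python's standard graphlib module stands in for the scheduling that the NJS performs. Actions that end up in the same wave have no dependency on each other and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each key is an action, each value is the set of
# predecessor actions that must complete before the action may start.
dependencies = {
    "import_input": set(),
    "compile": {"import_input"},
    "run_simulation": {"compile"},
    "render_plots": {"run_simulation"},
    "export_results": {"run_simulation"},
}


def execution_waves(deps):
    """Group actions into waves of mutually independent actions."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as completed successfully
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

In this example the last wave contains both render_plots and export_results, because each depends only on run_simulation and not on the other.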

Each job runs in a dedicated temporary file space on the execution system, called the Job Directory or Uspace. The Uspace exists only during the execution of the job. Any files needed by the application have to be imported into the Uspace, either from permanent storage on the execution system (Xspace), from the local computer the client is running on (Nspace), or from storage servers (e.g. archive systems). Conversely, files that must survive the end of the job have to be exported explicitly. If data from one task is needed in another sub-job, an explicit data transfer has to be specified in the job construction, because the sub-job will run in a different Uspace.
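The life cycle of a Uspace can be mimicked locally with a temporary directory. The following sketch only illustrates the import, execute, export pattern described above; the function name run_in_uspace and its parameters are hypothetical and do not correspond to any UNICORE interface.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_uspace(imports, task, exports, export_dir):
    """Run a task in a temporary job directory and keep only exported files.

    imports:    paths of files to copy into the job directory
    task:       callable that receives the job directory as a Path
    exports:    file names inside the job directory that must survive
    export_dir: permanent directory that receives the exported files
    """
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="uspace_") as tmp:
        uspace = Path(tmp)
        for src in imports:  # explicit import into the job directory
            shutil.copy(src, uspace / Path(src).name)
        task(uspace)  # the task sees only the job directory
        for name in exports:  # explicit export before the directory vanishes
            shutil.copy(uspace / name, export_dir / name)
    # At this point the job directory and all non-exported files are deleted.
    return uspace
```

A file that the task writes but that is not listed in exports is lost when the function returns, which mirrors the rule that results must be exported explicitly.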

UNICORE jobs are prepared in an abstract, seamless form: the platform-dependent commands and parameters for resource requirements, data transfer, etc. are replaced by a platform-independent representation. The UNICORE NJS server translates this abstract definition into a concrete sequence of commands and options suitable for the selected execution platform. The end-user does not have to know the platform-specific details, and a UNICORE job can easily be re-targeted to a different site or execution system.

The end-user specifies on which system a job should run. It is possible to specify different execution systems for different sub-jobs, thus creating a job that runs on multiple systems, or even on multiple sites. Data transfer is handled transparently by the UNICORE system, provided that the end-user has specified which files are needed by each task/job group and which are produced.

For each task, the end-user can specify the resources required to run this task; supported resources include CPU or node count, computing time, and memory size. The resource model is designed to be extensible, so that other required software or hardware resources can be defined, provided by a site and requested by a task. Each UNICORE site defines the resources made available by each of its systems, and the client checks whether the required resources are actually available.
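A check of this kind can be sketched as a comparison between the resources a task requests and the ranges a system offers. The resource names, the (minimum, maximum) representation and the function check_resources are assumptions made for this illustration; the actual client may represent and check resources differently.

```python
def check_resources(requested, offered):
    """Return a list of problems; an empty list means the request fits.

    requested: resource name -> amount the task asks for
    offered:   resource name -> (minimum, maximum) the system provides
    """
    problems = []
    for name, amount in requested.items():
        if name not in offered:
            problems.append(f"{name}: not offered by this system")
            continue
        low, high = offered[name]
        if not low <= amount <= high:
            problems.append(f"{name}: {amount} outside allowed range {low}..{high}")
    return problems


# Hypothetical resource description of one system and one task request.
offered = {"nodes": (1, 64), "cpu_time_s": (60, 86400), "memory_mb": (128, 16384)}
request = {"nodes": 128, "cpu_time_s": 3600, "gpu_count": 1}

problems = check_resources(request, offered)
for problem in problems:
    print(problem)
```

Because resources are looked up by name, a site could offer additional resource types without any change to the checking logic, which reflects the extensibility mentioned above.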

UNICORE jobs can be saved to disk on the machine running the client and loaded later to be modified or submitted again. Both a proprietary binary format and an open XML-based job store format are supported.

When the end-user is satisfied with the constructed job, they can submit it to the target Vsite. The abstract job that is sent from the Client to a Vsite is constructed in the Client as an Abstract Job Object (AJO) in the form of a serialized Java object. The target server replies with an acknowledgement, after which the job status can be checked with the monitoring functions.

A UNICORE (sub-)job is executed on behalf of a UNICORE user account. Whereas the user has one common UNICORE user certificate, the accounts will be different at the Vsites of distinct Usites. The accounts are defined by the mapping in the UNICORE User Data Base (UUDB) at the server site.
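Conceptually, the mapping behaves like a lookup table per Usite that translates the certificate identity into a local account. The sketch below is an illustration only: the table layout, the site names, the distinguished name and the logins are invented, and the real UUDB format is not shown here.

```python
# Hypothetical UUDB contents: one table per Usite, mapping the distinguished
# name of the user certificate to the local account used at that site.
UUDB = {
    "usite-a": {"CN=Jane Doe,O=Example Org,C=DE": "jdoe"},
    "usite-b": {"CN=Jane Doe,O=Example Org,C=DE": "u1042"},
}


def resolve_account(usite, certificate_dn):
    """Look up the local account for a certificate at one Usite."""
    try:
        return UUDB[usite][certificate_dn]
    except KeyError:
        raise PermissionError(
            f"no account mapped for {certificate_dn!r} at {usite!r}"
        ) from None


dn = "CN=Jane Doe,O=Example Org,C=DE"
print(resolve_account("usite-a", dn))
print(resolve_account("usite-b", dn))
```

The same certificate resolves to a different account at each site, and a certificate without an entry is rejected.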

The interaction between a UNICORE client and its server is transaction-based and asynchronous. This means that a job is submitted to a UNICORE server and only the receipt of the job is acknowledged (transaction). The client does not wait for the completion of the job. The idea behind this is to provide better support for mobile users and slow connections.
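The submit-then-poll pattern can be sketched with an in-memory stand-in for the server. The class FakeServer, its methods and the status names QUEUED, RUNNING and DONE are hypothetical; they only illustrate that submission returns as soon as receipt is acknowledged and that status is obtained by separate, later requests.

```python
import itertools


class FakeServer:
    """In-memory stand-in for a server that only acknowledges receipt."""

    _ORDER = ["QUEUED", "RUNNING", "DONE"]

    def __init__(self):
        self._ids = itertools.count(1)
        self._status = {}

    def submit(self, job):
        # Acknowledge receipt immediately; the job itself runs later.
        job_id = next(self._ids)
        self._status[job_id] = "QUEUED"
        return job_id

    def advance(self, job_id):
        # Simulates progress on the server side, independent of the client.
        current = self._status[job_id]
        if current != "DONE":
            self._status[job_id] = self._ORDER[self._ORDER.index(current) + 1]

    def status(self, job_id):
        return self._status[job_id]


server = FakeServer()
job_id = server.submit({"name": "demo"})  # returns at once with an identifier
first = server.status(job_id)             # the client could disconnect here
server.advance(job_id)
server.advance(job_id)
later = server.status(job_id)             # a separate request made later
print(job_id, first, later)
```

Since the client keeps only the job identifier, it can go offline between submission and the later status request without affecting the job.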
