Summary Exercise

In this final exercise you will use all the tools you have learned so far. The idea behind the exercise is to simulate the classical Producer-Consumer problem on the Grid. Producers are jobs that run the RegularExplorer Java program we already encountered in the previous exercise; Consumers are jobs that execute gnuplot to turn the produced data into a PostScript file. The original RegularExplorer was modified (as explained in the optional GFAL section) to take its input (ShapesData) from files on SEs, save the sampled data file into a Storage Element of the Grid, and register it in the File Catalog. Jobs running the gnuplot Consumer fetch a sample data file from a Storage Element and use gnuplot to convert it into a PostScript file.

How will the Consumers know which sample set is already available, and where it can be found? To answer this question we will use the AMGA metadata server. After a Producer has generated, stored, and registered a sample file, it creates a new entry inside a pre-created AMGA collection with three attributes: LFN (the LogicalFileName of the generated file), Analyzed, and MagicNumber. The last two are used by a Consumer to understand whether a sample data file has already been taken and converted by another Consumer. The Analyzed flag is initially set to 0 by the Producer and is set to 1 by the Consumer that performs the conversion for that input file. The MagicNumber attribute is used to avoid race conditions among several concurrent Consumers that try to consume the same data while writing it. More details on how to handle race conditions are given later in this page.

Download here the package with all the necessary files. Here are all the steps needed to implement this final exercise.

Set up an AMGA collection

First, we need to prepare an AMGA collection that will be used to synchronize the Producer and Consumer jobs:

```
createdir /ischia06/<your_account_name>/explorer
```

Now it is time to create a proper schema for the just-created collection. As suggested in the introduction, we need three attributes, LFN, Analyzed, and MagicNumber, of type varchar(255), int, and int respectively:

```
addattr /ischia06/tcaland/explorer LFN varchar(255)
addattr /ischia06/tcaland/explorer Analyzed int
addattr /ischia06/tcaland/explorer MagicNumber int
listattr /ischia06/tcaland/explorer
```

You also need a sequence to name the entries added by the Producers:

```
sequence_create seq1 /ischia06/tcaland/explorer
dir /ischia06/tcaland/explorer
```

Prepare the Producers script and JDL

We need a bash script for the Producers that, once on the WN, prepares the environment so the JVM can find the right libraries, runs RegularExplorer, and finally creates an entry in the AMGA explorer collection, filling the LFN attribute with the LogicalFileName assigned to the RegularExplorer output file and setting the Analyzed attribute to 0. Here is the beginning of the startRegExp.sh file (the full script is in the downloaded package):

```sh
#!/bin/sh
### Script to run reg explorer
echo "################### START ####################"
echo "#### ARGUMENTS ####"
echo "Usage: startRegExp ..."
```

Don't forget to replace, in the final part of the above script, the collection in which the AMGA entry is created.

To submit a bunch of these scripts, we can again make use of a DAG job, as we did here. So again, let's prepare a file containing a list of arguments, one line per DAG node, and launch dag_gener.sh to create the main DAG job and all the JDL nodes.

Warning: the argument list format is now different, because we have two more parameters: the destination SE and the LFN to assign to the generated file. You need to escape every slash in the LFN path with a backslash to make the script work. Here is an example of a valid arguments.list:

```
arguments=0 0 10 10 2000 aliserv6.ct.infn.it lfn:\/grid\/gilda\/scardaci\/ischia_amga_1.dat
arguments=10 10 20 20 2000 opteron.gs.unina.it lfn:\/grid\/gilda\/scardaci\/ischia_amga_2.dat
arguments=-10 -10 10 10 4000 gildase.oact.inaf.it lfn:\/grid\/gilda\/scardaci\/ischia_amga_3.dat
```

Due to the usage of the GFAL APIs inside the RegularExplorer classes, only SRM Storage Elements should be used.
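Typing the backslash-escaped LFNs by hand is error-prone, so a small helper can generate the arguments.list for you. The sketch below is not part of the exercise package: the escape_lfn helper is ours, and it reuses the SE names and LFN paths from the example above with a single fixed set of sampling parameters.

```shell
#!/bin/sh
# Sketch: generate an arguments.list for dag_gener.sh, escaping the
# slashes in each LFN with a backslash as the script requires.
# SE hostnames and LFN paths are taken from the example above.

escape_lfn() {
    # turn lfn:/grid/... into lfn:\/grid\/...
    echo "$1" | sed 's/\//\\\//g'
}

: > arguments.list   # start from an empty file
i=1
for se in aliserv6.ct.infn.it opteron.gs.unina.it gildase.oact.inaf.it; do
    lfn="lfn:/grid/gilda/scardaci/ischia_amga_$i.dat"
    echo "arguments=0 0 10 10 2000 $se `escape_lfn $lfn`" >> arguments.list
    i=`expr $i + 1`
done
```

If you need different sampling parameters per node, as in the example list, edit the generated file afterwards or extend the loop accordingly.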
Start dag_gener.sh:

```
$ dag_gener 3
$ ls
expl1.jdl  expl2.jdl  expl3.jdl  father_job.jdl
```

Here, as an example, the content of expl1.jdl:

```
Type = "Job";
JobType = "Normal";
Executable = "startRegExp.sh";
Environment = {"CLASSPATH=./gfal.jar:./regexp.jar:./",
               "LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH",
               "LCG_GFAL_VO=gilda",
               "LCG_RFIO_TYPE=dpm"};
Arguments = "0 0 10 10 2000 aliserv6.ct.infn.it lfn:/grid/gilda/scardaci/ischia_amga_1.dat";
StdOutput = "sample.out";
StdError = "sample.err";
InputSandbox = {"mdclient.config","regexp.jar","startRegExp.sh","gfal.jar","libGFalFile.so"};
OutputSandbox = {"sample.err","sample.out"};
Requirements = Member("GLITE-3_0_0",other.GlueHostApplicationSoftwareRunTimeEnvironment);
```

N.B.: you will also need an mdclient.config file. Copy your $HOME/.mdclient.config to the directory where you are working and call it mdclient.config.

And here is the content of father_job.jdl:

```
[
  type = "dag";
  max_nodes_running = 3;
  nodes = [
    explorer1 = [ file = "expl1.jdl"; ];
    explorer2 = [ file = "expl2.jdl"; ];
    explorer3 = [ file = "expl3.jdl"; ];
    dependencies = {}
  ];
]
```

If everything went fine so far, you can submit the main DAG job.

Prepare the Consumers script and JDL

Now that we have created jobs that sample our surface and save their output on a Grid Storage Element, it is time to submit the Consumers, which will convert the RegularExplorer output data to PostScript format with gnuplot. Here is startGnuPlot.sh:

```sh
#!/bin/sh
### Script to run gnuplot
echo "################### START ####################"
AMGA_HOME=/ischia06/scardaci/explorer  ### Replace scardaci with your own AMGA home collection
chmod 755 ./gnuplot

# Look for an entry that has not been analyzed yet
ID=`mdcli selectattr $AMGA_HOME:FILE \'$AMGA_HOME:Analyzed = 0\' | head -1`
if [ -n "$ID" ]; then
  echo "Found Data"
else
  echo "Data not found..."
  exit 1
fi

# Claim the entry: set Analyzed=1 together with our own random
# MagicNumber, but only if Analyzed is still 0
MAGIC=`echo $RANDOM`
mdcli updateattr $AMGA_HOME/$ID Analyzed 1 MagicNumber $MAGIC \'Analyzed = 0\'

# Read MagicNumber back: if it is ours, we won the race
MAGIC2=`mdcli selectattr $AMGA_HOME:MagicNumber \'$AMGA_HOME:FILE = ${ID}\'`
if [ "$MAGIC" = "$MAGIC2" ]; then
  LFN=`mdcli selectattr $AMGA_HOME:LFN \'$AMGA_HOME:FILE = ${ID}\'`
else
  echo "Entry already processed by someone"
  exit 1
fi

# Fetch the sample file and convert it to PostScript
echo "lcg-cp -v --vo gilda $LFN file:$PWD/temp.dat"
lcg-cp -v --vo gilda $LFN file:$PWD/temp.dat
touch script.gnu
echo "set terminal postscript" > script.gnu
echo "set output \"output.ps\"" >> script.gnu
echo "splot \"temp.dat\"" >> script.gnu
./gnuplot script.gnu
###########################################################################
echo "END PROGRAM."
```

(Note the string comparison `[ "$MAGIC" = "$MAGIC2" ]`: the spaces around `=` are required, otherwise the test always succeeds.)

Some details on what this script is doing: it first asks AMGA for an entry whose Analyzed attribute is still 0; it then tries to claim that entry by setting Analyzed to 1 together with a random MagicNumber, using the condition 'Analyzed = 0' so that the update can succeed only once; it reads MagicNumber back to check that its own value is the one stored (if another Consumer got there first, the two values differ and the script exits); finally it copies the file identified by the LFN attribute to the worker node with lcg-cp and runs gnuplot on it.
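The heart of the script is this claim-then-verify step: conditionally write Analyzed=1 plus a random MagicNumber, then read the MagicNumber back to see whether you were the writer. The same logic can be tried locally without AMGA; in the sketch below a plain file named entry stands in for the AMGA record and the claim() helper is ours, not part of the exercise. Note that a local file update is not atomic the way AMGA's server-side conditional update is, so this only mirrors the logic, not the real concurrency guarantee.

```shell
#!/bin/sh
# Local simulation of the Consumer's race-condition guard.
# The file "entry" plays the role of the AMGA record and holds
# "analyzed magic". claim() mimics the conditional updateattr
# followed by the read-back check of MagicNumber.

echo "0 -" > entry   # Analyzed=0, no MagicNumber yet

claim() {
    magic=$1
    # conditional update: succeed only if Analyzed is still 0
    read analyzed old < entry
    if [ "$analyzed" = "0" ]; then
        echo "1 $magic" > entry
    fi
    # read back and compare, exactly as startGnuPlot.sh does
    read analyzed stored < entry
    if [ "$magic" = "$stored" ]; then
        echo "consumer $magic wins"
    else
        echo "consumer $magic loses"
    fi
}

claim 111   # first consumer claims the entry and wins
claim 222   # second consumer finds Analyzed=1 and backs off
```

Running it prints "consumer 111 wins" followed by "consumer 222 loses": only the consumer whose MagicNumber survives the read-back goes on to process the file.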
Let's create a JDL for this script:

```
$ cat GnuPlotterJob.jdl
Type = "Job";
JobType = "Normal";
Executable = "startGnuPlot.sh";
StdOutput = "sample.out";
StdError = "sample.err";
InputSandbox = {"gnuplot","mdclient.config","startGnuPlot.sh"};
OutputSandbox = {"output.ps","sample.err","sample.out"};
Requirements = Member("GLITE-3_0_0",other.GlueHostApplicationSoftwareRunTimeEnvironment);
```

Now it is up to you to submit the Consumer jobs: you can again produce a DAG job, or, more easily, use a bash for loop to submit n jobs at the same time. After some Consumers have finished their job, retrieve their output and open output.ps with a PostScript viewer.
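The for loop mentioned above can look like the following sketch. The actual submission command depends on your middleware version (edg-job-submit, glite-job-submit, ...), so it is only an assumption here and is kept behind a $RUN prefix that defaults to echo, letting you dry-run the loop before submitting for real.

```shell
#!/bin/sh
# Sketch: submit n Consumer jobs in one go.
# RUN defaults to "echo" (dry run); unset it or set RUN="" once you
# have checked the commands it would execute.
RUN=${RUN:-echo}

submit_all() {
    n=$1
    i=1
    while [ $i -le $n ]; do
        # the submission command name is an assumption, adapt it
        $RUN glite-job-submit GnuPlotterJob.jdl
        i=`expr $i + 1`
    done
}

submit_all 5
```

With RUN=echo the loop just prints the five submission commands; replacing echo with nothing makes it submit five Consumer jobs back to back.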