Title: AMGA Metadata Management Commands
Section: Advanced Practical
Tutor: Tony Calanducci, Diego Scardaci
Authors: Tony Calanducci, Diego Scardaci, Valeria Ardizzone

Introduction on AMGA

AMGA is the gLite service for handling metadata and simple databases on the Grid. In this section, you will learn how to interact with the AMGA server, define your own collection and its schema, populate the collection with some entries, and then run queries. Finally, some additional information is provided on how to secure your metadata.

Using the AMGA clients: mdclient and mdcli

AMGA provides several ways to interact with the AMGA master server: interactive and "one-shot" command-line tools, and Python and Java APIs. For the purposes of our exercise, we will introduce the interactive client (mdclient) and the "one-shot" client (mdcli), which lets you issue the same commands available in mdclient, but one at a time. mdcli is useful, for example, in scripts running on a Worker Node (WN) during job execution.

Both mdclient and mdcli share the same configuration file: mdclient.config. The clients first look for this file in the current directory; if it is not found there, they look for $HOME/.mdclient.config; and if that is missing too, they read the default $GLITE_LOCATION/etc/mdclient.config. During the exercises you will use $HOME/.mdclient.config when you work with AMGA from your workstation, and an mdclient.config in the current directory during job execution.
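To check which configuration file would be picked up in a given directory, the search order described above can be reproduced with a few lines of shell. This is only a sketch of that order, not the clients' actual code; the function name find_mdclient_config is made up for this example.

```shell
#!/bin/sh
# Print the first mdclient.config found, following the documented search order:
# current directory, then $HOME, then $GLITE_LOCATION/etc.
# find_mdclient_config is a hypothetical helper name, not part of AMGA.
find_mdclient_config() {
    set -- "./mdclient.config" "$HOME/.mdclient.config"
    # Only consider the gLite default when GLITE_LOCATION is actually set.
    if [ -n "${GLITE_LOCATION:-}" ]; then
        set -- "$@" "$GLITE_LOCATION/etc/mdclient.config"
    fi
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

find_mdclient_config || echo "no mdclient.config found" >&2
```

Running it from your home directory and again from a job's working directory shows quickly whether a local mdclient.config is shadowing the one in $HOME.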

Each of you has an account on the GILDA AMGA server (amga.ct.infn.it). Your AMGA username is the same as your login name on the local workstations. So, before starting one of the clients, make sure the Login parameter in your mdclient.config is set correctly. Moreover, ensure that Host is set to amga.ct.infn.it, UseSSL to require, and AuthenticateWithCertificate and UseGridProxy to 1. For example, my $HOME/.mdclient.config contains, among other settings:

Login = tcaland
Host = amga.ct.infn.it
UseSSL = require
AuthenticateWithCertificate = 1
UseGridProxy = 1

Before starting mdclient, make sure you have a valid proxy (check it with voms-proxy-info). If everything is set up correctly, you should get something like this:

$ mdclient 
Connecting to amga.ct.infn.it:8822...
ARDA Metadata Server 1.2.4
Query> whoami
>> tcaland
Query> 

Once logged in, you can take a look at the available commands and get help on each of them with the help command. Commands are grouped by topic.

Query> help
>> help [topic]
>> Displays help on a command or a topic.
>> Valid topics are: help metadata metadata-optional directory replication 
constraints entry group acl index schema sequence user view site 
replicas ticket commands
Query>

Commands and collection paths can be auto-completed using the TAB key.

TO DO: Create a proxy, edit your $HOME/.mdclient.config properly and try to log in to the AMGA server. Have a look at the help section.

Creation of a collection

AMGA entries are contained in collections. You can think of a collection as a database table, while an entry with its attributes can be thought of as a row of that table with its fields.
Collections are hierarchically organized: there is a root (/) collection, and each collection can contain other collections.

We have created a collection readable and writable by all the participants of the school: /ischia06. Create a subcollection with your name, surname or account name under /ischia06 with the following commands:

Query> cd /ischia06
Query> pwd
>> /ischia06/
Query> createdir tcaland
Query> dir
>> /ischia06/scardaci
>> collection
>> /ischia06/tcaland
>> collection
Query>

AMGA supports both relative and absolute collection paths, so you could have achieved the same result with createdir /ischia06/tcaland. This collection will be referred to later as your AMGA home collection.

TO DO: Create your AMGA home collection under /ischia06.

Schema creation

Now it is time to define the schema of your collection: you will define one or more attributes that can later be filled in, queried and updated for each entry you create in this collection.

To add a new attribute to a collection, we use the addattr command, which has the following syntax:

  • addattr dir attr_name attr_type

The following attribute types are available: int, float, varchar(n), timestamp, text, numeric(p,s). However, if you don't care about portability (that is, being able to move your data to another AMGA metadata server that uses a different DB backend), you can use any datatype supported by the DB backend of the AMGA server. Our server uses PostgreSQL 7.3. I have tried, for example, the MAC address type and GIS datatypes (multipolygon).

Suppose we want to create metadata for some movie files that we saved on Storage Elements and registered in the File Catalog. We can add details about the contents of these files so that we can later run queries to look for a specific movie. All this is possible thanks to a metadata service. To keep things simple, we will add the following attributes to each entry (in our example, each entry will represent a Grid file containing a movie trailer):

  • Title - varchar(200)
  • Runtime - int
  • Cast - varchar(200)
  • LFN - varchar(255)

Let's create a trailers subcollection under our AMGA home dir with createdir and then use addattr to add the previous attributes:

Query> createdir /ischia06/tcaland/trailers
Query> addattr /ischia06/tcaland/trailers Title varchar(200)
Query> cd /ischia06/tcaland/trailers
Query> pwd
>> /ischia06/tcaland/trailers/
Query> addattr . Runtime int
Query> addattr . Cast varchar(200)
Query> addattr . LFN varchar(255)
Query> listattr .
>> Title
>> varchar(200)
>> Runtime
>> int
>> Cast
>> varchar(200)
>> LFN
>> varchar(255)
Query>

With listattr path we can inspect the schema of a collection, as we did in the previous example. You can also remove an attribute from a collection (only if no entry is using that attribute) with:

  • removeattr dir attr_name

TO DO: Create the trailers collection under /ischia06/<your_home_collection> and add the attributes shown above to it.

Collection population

Now that we have a collection and we have defined its schema, we will start to add entries and fill their attributes with proper values.

Each entry in a collection needs a name. In RDBMS terms, you can think of it as the primary key. One easy way to obtain numeric, auto-incrementing entry names is to use an AMGA sequence. For each collection it is possible to define one or more sequences, each with its own starting value and increment. We just need a sequence that starts at 1 and increments by 1. To create a sequence, use the following syntax:

  • sequence_create name dir [increment] [start value]

And for our example, we will use:

Query> sequence_create seq1 /ischia06/tcaland/trailers
Query> dir
>> /ischia06/tcaland/trailers/seq1
>> sequence
Query>

The default behaviour for a sequence is to start at 1 and increment by 1.

To get the next sequence number, use:

  • sequence_next sequence

We will use:

Query> sequence_next /ischia06/tcaland/trailers/seq1
>> 1

So we can use the responses of sequence_next to give a name to our entries.

To add a new entry, we use the addentry command:

  • addentry entry_name (attribute_name value)+

Suppose that we have already uploaded some movie trailers as Grid files; let's create their entries in the trailers collection.

Query> pwd
>> /ischia06/tcaland/trailers/
Query> addentry 1 Title 'Matrix' Runtime 110 Cast 'Keanu Reeves' 
			  LFN 'lfn:/grid/gilda/tony/matrix.mov'
Query> addentry 2 Title 'Notting Hill' Runtime 99 Cast 'Julia Roberts, Hugh Grant' 
			  LFN 'lfn:/grid/gilda/tony/notting.mov'
Query> addentry 3 Title 'Anger Management' Runtime 120 Cast 'Adam Sandler, Jack Nicholson' 
			  LFN 'lfn:/grid/gilda/scardaci/anger.mov'
Query> 

WARNING: There is no direct relation between files in the File Catalog directories and entries in AMGA collections. These associations are created by the user. In this example we refer to the File Catalog through the LFN attribute, but you can find other ways (for example, you can use the GUID of a Grid file as the entry name). This also means that you can use metadata for items other than files. For example, you could create entries that provide additional information about each submitted job (using, for example, the JobID as the entry name).

Other useful commands for entry management:

  • setattr entry (attribute value)+

used to set/change one or more attributes of an entry to the given values. Example: setattr /ischia06/tcaland/trailers/1 Runtime 108

  • getattr pattern (attribute)+

returns the entry name and all the requested attributes for every entry matching the given pattern. For example:

Query> getattr /ischia06/tcaland/trailers/ Title LFN
>> 1
>> Matrix
>> lfn:/grid/gilda/tony/matrix.mov
>> 2
>> Notting Hill
>> lfn:/grid/gilda/tony/notting.mov
>> 3
>> Anger Management
>> lfn:/grid/gilda/scardaci/anger.mov
Query> 

And finally, to remove an entry, you can use:

  • rm pattern

that removes all the entries matching the given pattern.

TO DO: Create a sequence and populate the trailers collection with some entries. Take a look at the values of the entries' attributes.

Making queries

AMGA provides an SQL-like language to express queries, although with a limited number of operators. Joins between collections are also possible.

The most commonly used query command is selectattr. Its syntax is similar to that of the SQL SELECT statement:

  • selectattr attr... condition

It returns the values of the given attributes for all entries matching the condition.

attr... is a space-separated list of attributes in the following format: collection_name:attribute_name (e.g. /ischia06/tcaland/trailers:Title)

condition represents one or more conditions used to select entries. You can use comparison operators (<, >, =), logical operators (and, or), other operators (like, limit), etc. There is also support for aggregate operators (count) and for a group clause. For a list of all the supported operators, take a look at the AMGA user manual (http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/downloads/amga-manual_1_2_3.pdf).

Here are some sample uses of selectattr:

Query> selectattr /ischia06/tcaland/trailers:Title .:LFN 'like(Title, "Anger%")' 
>> Anger Management
>> lfn:/grid/gilda/scardaci/anger.mov
Query> selectattr /ischia06/tcaland/trailers:Title .:LFN .:Cast 'like(Cast, "%ndle%") and Runtime > 80'
>> Anger Management
>> lfn:/grid/gilda/scardaci/anger.mov
>> Adam Sandler, Jack Nicholson

TO DO: Try out the selectattr command with some conditions.

Update attributes

It is often useful to update the attributes of entries that satisfy a certain condition. Instead of using a sequence of selectattr and setattr commands, AMGA provides the updateattr command, which performs the two operations atomically. The syntax of updateattr is the following:

  • updateattr pattern attr expression [attr expression]... condition

Here is a sample usage of updateattr:

Query> getattr /ischia06/tcaland/trailers/ Runtime
>> 2
>> 99
>> 1
>> 111
>> 3
>> 121
Query> updateattr /ischia06/tcaland/trailers/ Runtime Runtime+1 'Runtime > 100'
Query> getattr /ischia06/tcaland/trailers/ Runtime
>> 2
>> 99
>> 1
>> 112
>> 3
>> 122

 

mdcli usage

Until now we have used the mdclient tool to access the AMGA metadata server from the User Interface, but what if we want to populate a collection while a job is running on a Worker Node? Or what if a running job needs data that is the result of a query with certain conditions? mdcli helps in these situations. mdcli is the AMGA command-line tool that runs a single AMGA command and then returns to the shell. This makes it well suited for use inside a shell script, where the output of a query can be easily parsed and used by the next command in the script.

As already said, all the commands available in mdclient can also be used with mdcli. The only thing to take care of when using mdcli is quoting: if you need single quotes because some AMGA command requires them, you should escape them with a backslash (\) character.
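To see why the backslashes matter, it helps to look at what the shell actually hands over to a command. The sketch below does not call mdcli at all: show_args is a hypothetical stand-in that prints each argument it receives in brackets, and the LFN value is simply reused from the trailers example.

```shell
#!/bin/sh
# show_args is a made-up helper: it prints every argument it receives,
# one per line, so we can see what a command such as mdcli would be given.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

LFN=lfn:/grid/gilda/tony/matrix.mov

# Escaped quotes reach the command as literal characters,
# and ${LFN} is still expanded by the shell.
show_args LFN \'${LFN}\'

# Unescaped single quotes are consumed by the shell and block the
# expansion, so the command receives the text ${LFN} itself.
show_args LFN '${LFN}'
```

The first call prints ['lfn:/grid/gilda/tony/matrix.mov'] as its second argument, with the quotes intact; the second prints [${LFN}]. Note that an unquoted ${LFN} containing spaces would be split by the shell into several arguments.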

Here are some examples of escaping:

ID=`mdcli sequence_next /ischia06/scardaci/seq1`
mdcli addentry /ischia06/scardaci/${ID} LFN \'${LFN}\' Analyzed 0
LFN=`mdcli selectattr $AMGA_HOME:LFN \'$AMGA_HOME:FILE = ${ID}\'`
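Since a listing comes back as one value per line (entry name first, then each requested attribute), a script usually has to regroup the lines into records before using them. The following is a minimal sketch of that step; records_to_tsv is a hypothetical helper, and the sample input is copied from the getattr listing shown earlier rather than produced by a live server. It is an assumption that mdcli prints its results in the same layout; a leading ">> " is stripped in case it is present.

```shell
#!/bin/sh
# records_to_tsv is a made-up helper: it groups a one-value-per-line listing
# into tab-separated rows. $1 is the number of lines per record
# (the entry name plus the number of requested attributes).
records_to_tsv() {
    awk -v n="$1" '
        { sub(/^>> /, "") }    # drop the interactive-client prefix, if any
        { printf "%s%s", $0, (NR % n == 0) ? "\n" : "\t" }
    '
}

# Sample input: entry name, Title and LFN for two entries.
records_to_tsv 3 <<'EOF'
1
Matrix
lfn:/grid/gilda/tony/matrix.mov
2
Notting Hill
lfn:/grid/gilda/tony/notting.mov
EOF
```

In a job script the here-document would be replaced by a pipe from the real command, for example mdcli getattr /ischia06/tcaland/trailers/ Title LFN piped into records_to_tsv 3, after which each row can be split on tabs with cut or read.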

TO DO: Use mdcli to get the listing of your trailers collection and save it to a file on your workstation.

Secure access to metadata

Sometimes it can be useful to restrict access to your metadata to a certain group of users or to a given list of users. AMGA supports authorization at the collection level and even at the entry level. As you have noticed, each of you has an account with which to be authenticated by the AMGA server, and you also had to have a valid proxy. This is because each account is bound to a user certificate, so only the user who owns that certificate can log in successfully. AMGA also has the concept of groups and of ACLs based on groups. For example, all your AMGA user accounts are part of the ischia:users group. Let's take a look at the available commands that deal with permissions and groups:

  • whoami. Prints the name of the current user
  • user_listcred user. Lists all possible credentials of a user. Only root can inspect other users' credentials.
  • grp_member. Lists the groups a user belongs to
  • acl_show collectionpath. Shows all access controls for a directory
  • grp_create groupname. Creates a new group. The group created will have the following name: owner:groupName
  • grp_adduser groupname user. Adds a user to a group
  • grp_show groupname. Lists the members of a group
  • acl_add directory group rights. Adds a new access control to a directory granting a group certain rights
  • acl_remove directory group. Removes an access control for a group from a directory

Here are some examples:

Query> whoami
>> tcaland
Query> user_listcred tcaland
>> 1f86e515aa56dfb8b75ed7daaf63f52e97b8678
>> 'C = IT, O = GILDA, OU = Personal Certificate, L = INAF Catania, CN = Tony Calanducci, emailAddress = tony@calanducci.it', 'C = IT, O = GILDA, OU = Personal Certificate, L = INFN Catania, CN = Tony Calanducci, emailAddress = tony.calanducci@ct.infn.it', 'C = IT, O = GILDA, OU = Personal Certificate, L = ISCHIA, CN = ISCHIA15, emailAddress = roberto.barbera@ct.infn.it'
Query> grp_member
>> ischia:users
Query> acl_show /ischia06
>> root rwx
>> gilda:users rx
>> ischia:users rwx
Query> acl_show /ischia06/tcaland
>> tcaland rwx
>> gilda:users rx
>> ischia:users rwx
Query> grp_create myfriends
Query> grp_adduser myfriends tcaland
Query> grp_adduser myfriends scardaci
Query> grp_adduser myfriends oliva
Query> grp_show myfriends
>> tcaland
>> scardaci
>> oliva
Query> grp_member
>> ischia:users
>> tcaland:myfriends
Query> acl_add /ischia06/tcaland myfriends rx
Query> acl_show /ischia06/tcaland
>> tcaland rwx
>> gilda:users rx
>> ischia:users rwx
>> tcaland:myfriends rx
Query> acl_remove /ischia06/tcaland gilda:users
Query> acl_show /ischia06/tcaland
>> tcaland rwx
>> ischia:users rwx
>> tcaland:myfriends rx

TO DO: Try to create a group with the members of your group. Then allow this group to read (r) and enter (x) your collection. Then remove ischia:users from your collection and ask someone who is not part of your group to access your collection: they should get a permission denied error.
