GPIR

Introduction

GPIR is the Grid Portal Information Repository, a Web Service that works in combination with the GPIR Portlet. GPIR meets the data persistence needs of portals and other potential clients. Most fundamentally, it provides a place to store data about your grid that is readily accessible to a portal application. This includes both dynamic data and "human-centric" data (such as where a resource is located or whom to call for support):

  • "dynamic" machine-oriented data is updated via a Web Service called the "GPIRIngester", and data reads are performed via the "GPIRQuery" Web Service.
  • "human-centric" data is managed through the GPIR Administration client, which is a standard web application accessed via a browser.

The OGCE portal release by default points to the TeraGrid GPIR service, but you can also configure your own service.

Download

Date              Download Link
March 31, 2008    gpir-service.tar.gz

SVN Access

The source code is available from the OGCE SourceForge SVN:

svn co https://ogce.svn.sourceforge.net/svnroot/ogce/ogce-services-incubator/gpir-service

Install GPIR

Required Software Prerequisites

The following software packages are prerequisites to using the GPIR Service:

  • Java 1.5.
  • Maven 2 Build System.
  • Tomcat 5.0.xx or 5.5.x. Please download and install the latest version of either series.

Note: If you are using Tomcat 5.5.x, you will need to also download and install the compatibility patch (labeled jakarta-tomcat-5.5.x-compat).

We recommend setting up a dedicated system account to run the GPIR service software. To install the service, you will need write permissions in your Tomcat and GPIR service directories, so you should install both as your dedicated service user. The other required software may be installed as any user.

Environment Settings

In order to use the software required to support the GPIR Service, you must first set up your environment. You will need to set the JAVA_HOME and MAVEN_HOME environment variables and ensure that the respective binaries are in your PATH. Additionally, you will need to set the CATALINA_HOME environment variable appropriately. The following is an example environment script.

### for sh-based shells
export JAVA_HOME=/usr/local/dev/jdk1.5.0_14
export MAVEN_HOME=/opt/maven/maven-2.0.7
export CATALINA_HOME=/opt/tomcat/jakarta-tomcat-5.5.12
export PATH="${JAVA_HOME}/bin:${MAVEN_HOME}/bin:${PATH}"

Installation

After installing the prerequisite software and setting up your environment, please download the GPIR source distribution and unpack it.

GPIR can use either a PostgreSQL database or a HypersonicSQL database. To configure GPIR for a specific database type, you must edit the properties section of pom.xml.

By default, the GPIR package is configured to use a HypersonicSQL database. To use an existing installation of PostgreSQL instead, you must comment out the HypersonicSQL configuration and modify the PostgreSQL configuration. For more information, please read the database configuration section.

To install the service, use Maven to install GPIR and the associated database tables:

cd gpir-service
# initialize a HypersonicSQL database
mvn -P init.hypersonic install
# OR initialize a PostgreSQL database
mvn -P init.pgsql install
$CATALINA_HOME/bin/startup.sh

The '-P init.hypersonic' or '-P init.pgsql' argument initializes the database to its default state. If you need to re-deploy the service without re-initializing the database, omit this argument. IMPORTANT: Reinitializing the database drops all the database tables, removes the existing data, and installs new tables. Only do this when installing GPIR for the first time on a new database. If you are connecting to an existing GPIR database, make sure to omit the argument.

The web service URL of the GPIR service is http://localhost:8080/gpir/webservices. You may point your browser to this URL to see a list of the service methods available.

The administrative client is located at http://localhost:8080/gpir. For more information please visit the Administrative client documentation.

Database Configuration

By default, GPIR comes with a pre-configured HypersonicSQL database that runs "In-process" inside the Tomcat web container. This is not the best configuration for a production service. Below are some alternative configurations that will perform better and be more reliable.

Configure HypersonicSQL

This configuration is best suited to the case where you have already added data to the default HypersonicSQL in-process database that is included with a standard GridPort installation and don't want to migrate the data to another database server. It takes advantage of the fact that both 'In-Process' and 'Server' modes of HypersonicSQL use the same database directory structure, which makes migration simple.

First, you will need access to the hsqldb-version.jar file. This file can be found in the $CATALINA_HOME/webapps/gpir/WEB-INF/lib directory inside your gpir webapp. You will need to specify the location of this file in a later step (NOTE: You do not need to copy it anywhere).

It is recommended that you copy the current GPIR database located in $CATALINA_HOME/webapps/gpir/WEB-INF/db to another appropriate location outside of the gpir webapp. This will prevent the database from being overwritten or deleted if at any point you need to upgrade GPIR. Once you've copied the directory, check to see that there's not a gpir.lck file present and if there is, remove it.

Use the following single command to start the database server (NOTE: the command spans two lines here so that it will fit):

localhost> java -cp [path to hsqldb.jar] org.hsqldb.Server -database.0 \
        [path to database dir]/gpir -dbname.0 gpir

The "gpir" part of the path in the -database.0 argument is important because it tells the server to use all files in that directory that start with the prefix "gpir.". This will be enough to test your database configuration, but you will most likely want to consult the HypersonicSQL documentation on Running Hsqldb as a System Daemon (http://hsqldb.org/doc/guide/ch03.html#N1075E).

You will also need to modify the gpir.db.url property in the pom.xml file in the GPIR source directory.

    <gpir.db.url>jdbc:hsqldb:hsql://localhost:9001/gpir</gpir.db.url>

Redeploy GPIR in order for the changes to take effect.

mvn install

GPIR is now configured to use the new Server instance rather than the In-process database. Restart Tomcat to run GPIR with the new configuration.

Configure PostgreSQL

Assuming you have installed and configured a PostgreSQL database according to the documentation at http://www.postgresql.org, you will need to configure GPIR to use it as its data source. Open the pom.xml file in the GPIR source directory in your favorite editor. We have provided a boilerplate PostgreSQL configuration that you can customize for your own setup. First, comment out the HypersonicSQL configuration and uncomment the PostgreSQL configuration. Your properties section will look like this:


    <!-- NOTE: ONLY ONE SHOULD BE UNCOMMENTED -->
    <!-- Properties for Hypersonic SQL -->
    <!--
    <gpir.db.type>hypersonicsql</gpir.db.type>
    <gpir.db.driver>org.hsqldb.jdbcDriver</gpir.db.driver>
    <gpir.db.url>jdbc:hsqldb:file:${env.CATALINA_HOME}/webapps/gpir/WEB-INF/db/gpir</gpir.db.url>
    <gpir.db.username>sa</gpir.db.username>
    <gpir.db.password></gpir.db.password>
    <gpir.hibernate.dialect>net.sf.hibernate.dialect.HSQLDialect</gpir.hibernate.dialect>
    <database.artifactId>hsqldb</database.artifactId>
    <database.groupId>hsqldb</database.groupId>
    <database.version>1.7.3.3</database.version>
    -->

    <!--Properties for PostgreSQL -->
    <gpir.db.type>postgresql</gpir.db.type>
    <gpir.db.driver>org.postgresql.Driver</gpir.db.driver>
    <gpir.db.url>jdbc:postgresql://localhost:5432/GPIR</gpir.db.url>
    <gpir.db.username></gpir.db.username>
    <gpir.db.password></gpir.db.password>
    <gpir.hibernate.dialect>net.sf.hibernate.dialect.PostgreSQLDialect</gpir.hibernate.dialect>
    <database.artifactId>postgresql</database.artifactId>
    <database.groupId>postgresql</database.groupId>
    <database.version>7.4.1-jdbc3</database.version>

Depending on where you installed PostgreSQL (local or remote relative to where the GPIR service is running) you will probably need to modify the gpir.db.url property accordingly with the correct hostname and port number (NOTE: 5432 is the default PostgreSQL port number). You will also need to modify the gpir.db.username and gpir.db.password properties with your PostgreSQL username and password.
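
The gpir.db.url value follows the usual PostgreSQL JDBC form, jdbc:postgresql://host:port/database. As a minimal sketch of how the three parts fit together, the hypothetical helper below (GpirDbUrl is an illustrative name, not part of GPIR) assembles the string you would paste into pom.xml. The remote hostname in main is a placeholder.

```java
// Hypothetical helper, not part of the GPIR distribution: builds the value
// for the gpir.db.url property from a host, a port, and a database name.
public class GpirDbUrl {

    // 5432 is the default PostgreSQL port, as noted in the text above.
    public static final int DEFAULT_PORT = 5432;

    public static String postgresUrl(String host, int port, String database) {
        if (host == null || host.length() == 0) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Local database, matching the boilerplate configuration in pom.xml.
        System.out.println(postgresUrl("localhost", DEFAULT_PORT, "GPIR"));
        // Remote database; dbhost.example.edu is a placeholder hostname.
        System.out.println(postgresUrl("dbhost.example.edu", DEFAULT_PORT, "GPIR"));
    }
}
```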

If you are deploying to an existing GPIR PostgreSQL database in which the tables and data have already been created, install GPIR as follows:

mvn install

If you are installing to a clean installation of PostgreSQL that doesn't already have a GPIR database installed, deploy GPIR with the -P init.pgsql argument to initialize the database (IMPORTANT: this will remove any existing GPIR tables).

mvn -P init.pgsql install

Administrative Client

The administration client is used to update the "human-centric" data that is not managed via the web services. It has three main sections: general administration, VO administration, and resource administration.

The admin client URL is as follows:

http://<hostname>:<port>/gpir

To log in to the administration client using the default username and password, use:

username: tomcat
password: tomcat

To change the default username, password and role for the admin client, open $CATALINA_HOME/conf/tomcat-users.xml in your favorite text editor and add the following lines (as an example):

	<role rolename="gpiradmin"/>
	<user username="gpir" password="gpir" roles="gpiradmin"/>

Your tomcat-users.xml file should look something like this:

	<tomcat-users>
  		<role rolename="tomcat"/>
  		<role rolename="role1"/>
  		<role rolename="gpiradmin"/>
  		<user username="gpir" password="gpir" roles="gpiradmin"/>
  		<user username="tomcat" password="tomcat" roles="tomcat"/>
  		<user username="both" password="tomcat" roles="tomcat,role1"/>
  		<user username="role1" password="tomcat" roles="role1"/>
	</tomcat-users>

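You will also need to change the role-name values of the auth-constraint and security-role elements (these elements belong to the web.xml deployment descriptor of the gpir webapp) so that they match the new role: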

    <auth-constraint>
       <role-name>gpiradmin</role-name>
    </auth-constraint>
    ...
    <security-role>
       <role-name>gpiradmin</role-name>
    </security-role>

Web Services

The GPIR web services are made up of GPIRQuery, which is used for reading information from GPIR, and GPIRIngester, which is used for writing information to it. Both services rely on XML documents that follow the GPIR schemas.

GPIR Query

The GPIRQuery web service exposes the following two methods:

  1. getQueryByResource - This queries a single GPIR resource. It accepts two parameters: the first (strQuery) is a string naming the particular query being requested, and the second (strHostname) is the exact full hostname of the resource being queried (e.g. myhost.tacc.utexas.edu).
  2. getQueryByVo - This query also accepts the strQuery parameter, but its second parameter is a string representing the exact name, as registered in GPIR, of the VO whose resources you wish to retrieve.

The former returns data for a single resource while the latter returns data for multiple resources that are in a VO. The strQuery parameter is limited to the query names listed below.

  • load - an XML document corresponding to the loads.xsd schema. This document includes data for all load types that exist in GPIR.
  • jobs - an XML document corresponding to the jobs.xsd schema. This document includes data on jobs that exist in the resource's queuing system.
  • motd - "message of the day" data according to motd.xsd.
  • nodes - node status information for a resource or set of resources according to nodes.xsd.
  • nws - point-to-point Network Weather Service data for the resources in a VO when used with the getQueryByVo method. With getQueryByResource, it is available only through an overloaded form of the method that accepts a third parameter naming the VO from which you wish to obtain NWS data for the requested resource. It follows nws.xsd.
  • downtime - downtime data according to downtime.xsd.
  • queues - queue data for a resource or a VO according to queues.xsd.
  • status - a simple bit indicating resource status (i.e. "up" or "down") according to status.xsd.

The queries above are "simple" queries that return a specific type of resource or VO data. The queries below are more complex compound queries that can be convenient when writing portals that display a broad range of data on a single page.

  1. static - returns all of the "human-centric" data for a resource or resources according to static.xsd.
  2. summary - a broad range of summary data for a resource or VO according to summary.xsd.
  3. stats - resource statistics (e.g. number of nodes) using the stats.xsd schema.
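
Because strQuery accepts only the names listed above, a client may find it convenient to check the value before issuing a SOAP call. The following is a minimal sketch of such a check. The GpirQueryTypes class and its method names are hypothetical and are not part of the GPIR distribution, and the service itself remains the authority on which queries it accepts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical client-side helper, not part of GPIR: checks a strQuery value
// against the query names documented above.
public class GpirQueryTypes {

    // "Simple" queries: one specific type of resource or VO data.
    private static final Set<String> SIMPLE = names(
            "load", "jobs", "motd", "nodes", "nws", "downtime", "queues", "status");

    // Compound queries: broader data for pages that show many things at once.
    private static final Set<String> COMPOUND = names("static", "summary", "stats");

    private static Set<String> names(String... values) {
        return Collections.unmodifiableSet(new LinkedHashSet<String>(Arrays.asList(values)));
    }

    public static boolean isSimple(String strQuery) {
        return SIMPLE.contains(strQuery);
    }

    public static boolean isCompound(String strQuery) {
        return COMPOUND.contains(strQuery);
    }

    public static boolean isValid(String strQuery) {
        return isSimple(strQuery) || isCompound(strQuery);
    }

    public static void main(String[] args) {
        // "weather" is deliberately not a documented query name.
        for (String q : new String[] {"load", "summary", "weather"}) {
            System.out.println(q + " -> " + (isValid(q) ? "accepted" : "rejected"));
        }
    }
}
```

Matching is case-sensitive here, which is an assumption; the source does not say whether the service tolerates other capitalizations.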

Here is a sample query client:

import java.net.*;
import java.util.*;
import org.apache.soap.*;
import org.apache.soap.rpc.*;

public class QueryClient {

    // The Web service URL, NAME and METHOD
    private final static String WSURL = "http://myhost:8080/gpir/webservices";
    private final static String WS_NAME = "GPIRQuery";
    private final static String WS_METHOD = "getQueryByVo";

    public static void runWS(String strQuery, String strVO) throws Exception {
        URL url = new URL(WSURL);

        // Build SOAP call
        Call call = new Call();
        String encodingStyleURI = Constants.NS_URI_SOAP_ENC;
        call.setEncodingStyleURI(encodingStyleURI);
        call.setTargetObjectURI(WS_NAME);
        call.setMethodName(WS_METHOD);

        // Build parameter list: the query name, then the VO name
        Vector params = new Vector();
        params.addElement(new Parameter("strQuery", String.class, strQuery, null));
        params.addElement(new Parameter("strVO", String.class, strVO, null));
        call.setParams(params);

        // Run method call, retrieve response
        Response resp = call.invoke(url, "");

        // Report error or print the returned XML document
        if (resp.generatedFault()) {
            Fault fault = resp.getFault();
            System.out.println("ERROR call failed: ");
            System.out.println("  Fault Code   = " + fault.getFaultCode());
            System.out.println("  Fault String = " + fault.getFaultString());
        } else {
            Parameter result = resp.getReturnValue();
            System.out.println(result.getValue());
        }
    }

    // Usage: java QueryClient <strQuery> <strVO>
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java QueryClient <strQuery> <strVO>");
            System.exit(1);
        }
        runWS(args[0], args[1]);
    }
}

GPIR Ingest

The GPIRIngester is in essence a mirror image of the GPIRQuery. A single ingest method accepts as its one argument an XML document corresponding to one of the following schemas:

  • status
  • load
  • nodes
  • jobs
  • nws
  • motd

The ingester method distinguishes between ingester types by examining the value of the root element of the XML document. If this value does not correspond to one of the root element values as specified in the schemas it will return an error.
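
A client can apply the same root-element test locally before calling the ingest method, so that a malformed document is caught without a round trip to the service. The sketch below uses only the standard Java XML parser. The IngestRootCheck class is hypothetical, and the set of accepted root names is an ASSUMPTION that each root element is named after its schema; consult the actual GPIR schemas for the real root element names.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pre-flight check, not part of GPIR.
public class IngestRootCheck {

    // ASSUMPTION: root element names equal the schema names listed above.
    // The GPIR schemas, not this list, define the real root element names.
    private static final Set<String> ASSUMED_ROOTS = new HashSet<String>(
            Arrays.asList("status", "load", "nodes", "jobs", "nws", "motd"));

    // Parses the document and returns the name of its root element.
    public static String rootElementName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML document", e);
        }
    }

    // True if the root element is one of the assumed ingest types.
    public static boolean looksIngestible(String xml) {
        return ASSUMED_ROOTS.contains(rootElementName(xml));
    }

    public static void main(String[] args) {
        // The element content here is illustrative, not schema-valid.
        String doc = "<motd><line>System maintenance tonight</line></motd>";
        System.out.println(rootElementName(doc) + " ingestible: " + looksIngestible(doc));
    }
}
```

This checks only the root element name, not schema validity; the service may still reject a document whose contents do not conform to the schema.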

Here is a sample Ingester Client:

import java.net.*;
import java.util.*;
import java.lang.*;

import org.apache.soap.*;
import org.apache.soap.rpc.*;

public class IngesterClient {
    // The Web service URL, NAME and METHOD
    private final static String WSURL = "http://myhost:8080/gpir/webservices";
    private final static String WS_NAME = "GPIRIngester";
    private final static String WS_METHOD = "ingest";

    public static void runWS(String strXML) throws Exception {
        URL url = new URL(WSURL);
        //Build SOAP Call
        Call call = new Call();
        String encodingStyleURI = Constants.NS_URI_SOAP_ENC;
        call.setEncodingStyleURI(encodingStyleURI);
        call.setTargetObjectURI(WS_NAME);
        call.setMethodName(WS_METHOD);

        //Build Parameter list for method call
        Vector params = new Vector();
        params.addElement(new Parameter("strXMLSource", String.class, strXML, null));
        call.setParams(params);

        //Run method call, retrieve response
        Response resp = call.invoke(url, "");

        //Report error or result string
        if (resp.generatedFault()) {
            Fault fault = resp.getFault();
            System.out.println("ERROR call failed: ");
            System.out.println("  Fault Code   = " + fault.getFaultCode());
            System.out.println("  Fault String = " + fault.getFaultString());
        } else {
            Parameter result = resp.getReturnValue();
            System.out.println(result.getValue());
        }
    }
}

Providers

The GPIR providers are responsible for gathering data about resources, putting it into XML format, and ingesting it into GPIR. There are two categories of providers: central and remote. Central providers can be run on any server, including the GridPort server. Remote providers need to be installed and deployed on all end resources that will be monitored by GPIR. Sample providers that gather different types of data (e.g. jobs, load, motd) for several different resource managers (LSF, PBS, LoadLeveler, Condor) are included in the GridPort distribution.

Configuration & Installation

In order to install the remote GPIR providers, please carry out the following steps:

  1. Set up the remote resource as a client in GPIR
    GPIR authenticates a provider that tries to ingest data by that provider's IP address. Thus, the resource on which a remote provider runs must be added as a client in GPIR. You can add clients in GPIR by using the GPIR Admin Client. For more details, please see the documentation on the GPIR Admin Client.
  2. Distribute and unpack the providers
    Bundle the provider code, transfer the provider bundle to each remote resource for which you wish to gather information, and decompress the provider bundle.
          	> cd gpir-<version>/src
          	> tar cvfz providers.tar.gz providers/
          	> scp providers.tar.gz provideruser@myremotehost.edu:~/.
          	> ssh provideruser@myremotehost.edu
          	> tar xvfz providers.tar.gz
    
  3. Set up the provider user's environment
    If the remote provider will gather data from the resource's resource manager, the provider user's environment must be aware of the resource manager's commands. If the provider user does not have this environment configured by default, please add the configuration to the provider user's shell environment by editing this user's $HOME/.profile file (the provider runs as an sh shell command via cron).
  4. Install SOAP::Lite
    If your remote resource does not have the SOAP::Lite Perl module, you will need to install it on the remote resource as the provider user. The SOAP::Lite bundle can be found in providers/perl/soap. In the same directory, there is also a simple shell script that will install the bundle. Make sure to execute this script from the providers/perl/soap directory. (Note: you may need to modify the script to account for differences between the tar commands on various resources.)
          	# as provideruser on the remote resource
          	> cd ~/providers/perl/soap
          	> ./install.soap.sh
    
  5. (LSF resources) Install the lsf_showq command
    If your remote resource uses LSF as its resource manager, the provider scripts will by default interact with the lsf_showq command, a non-default LSF command developed at TACC. This command is included in the GPIR provider distribution, and an installation script is provided.
          	> cd ~/providers/perl/lsf_showq
          	> ./install.sh
    
  6. Configure the provider
    Next, you will need to set a few configuration values for the remote provider by editing the parameters in providers/perl/conf/providers.conf. These parameters include the hostname, GPIR servers (and ports), administrator email addresses, and the resource manager specification. The following configuration uses the sample data gathering modules that interact with a PBS resource manager. If your resource uses a different resource manager, please point the module paths to the appropriate data gathering module for that resource manager (see the providers/perl/src/modules directory for a full list of the example modules provided). For more details on the configuration parameters, see the documentation in the configuration file.
          	# as provideruser on the remote resource
          	> vi ~/providers/perl/conf/providers.conf
    
          	...
    
          	### RESOURCE INFO
          	hostname=myremotehost.edu
    
          	### MODULE PATHS
          	motd.module=../modules/motd.pl
          	load.module=../modules/load.pbs.pl
          	jobs.module=../modules/jobs.pbs.pl
          	jobs.condor.module=<path-to-load-module>
          	pcgrid.condor.module=<path-to-load-module>
    
          	### CONDOR INFO
          	central.manager=centralmanager.edu
          	pool.name=<pool-name>
          	pool.description=<pool-description>
          	pool.state=<enabled || disabled>
    
          	
          	### GPIR INFO
          	gpir.contact=gpirserver.edu:8080
    
          	### ADMIN INFO
          	admin.email=portaladmin@myorg.edu
    
  7. Test the providers from the command line.
    Before automating the execution of the provider scripts, please run the providers from the command line to ensure they are functioning correctly. To run the provider, as the provider user on the remote resource, from within the providers/perl/src/core directory, run the provider's main.pl script with the appropriate arguments. It is a good idea to first test the providers without ingesting the data into GPIR (by passing the -n command-line parameter to main.pl). Note: If you installed the Perl SOAP::Lite module that is bundled with the providers, using the main.pl script to ingest the data will not work. However, main.pl is still useful for testing data gathering and XML formatting without ingesting into GPIR.
          	
          	# as provideruser on the remote resource
          	> cd ~/providers/perl/src/core
          	> ./main.pl
          	Usage: ./main.pl -f <function> [-d] [-n]
          	<function> = motd | jobs | load | nodes
          	[-d] = print debug info
          	[-n] = do not ingest data in GPIR
    
          	### first test without ingesting
          	> ./main.pl -f motd -d -n
          	> ./main.pl -f load -d -n
          	> ./main.pl -f jobs -d -n
          	> ./main.pl -f jobs.condor -d -n
          	> ./main.pl -f pcgrid.condor -d -n
    
          	### then test with ingesting
          	> cd ~/providers/perl
          	> ./run.sh motd
          	> ./run.sh load
          	> ./run.sh jobs
          	> ./run.sh jobs.condor
          	> ./run.sh pcgrid.condor
    
  8. Install the provider in the crontab.
    In order to automate the execution of the provider, we recommend running it from cron on the remote resource under the provider user's account. Sample cron entries for various resource types have been included in the providers/perl/cron directory. The sample entries gather motd information every half hour and gather load and jobs information every fifteen minutes. The sample entries also specify that the output and error of the provider execution be written to files in the providers/perl/logs directory.
          	# as provideruser on the remote resource
          	> crontab ~/providers/perl/cron/compute.crontab
    

Customizing The Providers (Creating Module Scripts)

The remote GPIR providers are architected to separate the various responsibilities they carry out into modular scripts. The tasks carried out can be listed as follows:

  1. Gather data of interest from the resource manager (or other sources).
  2. Format the data into GPIR-schema-based XML.
  3. Ingest the data into GPIR via SOAP.

Due to the great variety of resource managers, and slight differences in the formats of their commands' outputs, Step 1 may have to be customized somewhat to fit the environment of your particular resource. However, as long as Step 1 produces a consistently formatted output, you should not have to modify any provider code that deals with Steps 2-3.

Step 1, the gathering of resource data, is encapsulated in the Perl scripts in the providers/perl/src/modules directory. The provider's infrastructure code will execute these scripts, take the resulting output, and pass it to the XML formatting code in providers/perl/src/xml_formatters. Thus, if you wish to include a customized data gathering mechanism for your resource, simply create an executable script (in the programming language of your choice) that gathers the data and writes it to STDOUT in the format expected for the particular function (i.e. motd, load, jobs).

The existing xml formatters expect a well-defined format to be written to STDOUT. A description of the format for each supported function follows:

  • motd - The motd XML formatter simply expects the MOTD contents to be printed to STDOUT line-by-line. Thus, the given provider simply makes a call to 'cat motd'. It is unlikely this provider will need to be tweaked, unless perhaps your resource's motd file is not in the standard location (/etc/motd).
  • load - The load XML formatter simply expects the ratio of CPUs used to total CPUs available, expressed as a whole number percentage (e.g. 65). This number should be printed to STDOUT on a single line.
  • jobs - The jobs XML formatter expects the appropriate module to print one job per line. Each line will contain a comma-delimited list of the job's attributes, with no spaces between commas and attributes. For a detailed list of the order of the attributes, please refer to the providers/perl/src/xml_formatters/jobs.pl script.
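
The load and jobs formats described above are simple enough to sketch in a few lines. The example below is a hedged illustration in Java (modules may be written in any language). The ModuleOutput class is hypothetical, rounding to the nearest whole percent is an assumption (the source only asks for a whole number), and the job attributes shown are placeholders: the real attribute order is defined in providers/perl/src/xml_formatters/jobs.pl.

```java
// Hypothetical sketch of the STDOUT formats a custom module would emit.
// Not part of the GPIR provider distribution.
public class ModuleOutput {

    // load: ratio of CPUs used to CPUs available as a whole-number percentage.
    // ASSUMPTION: the ratio is rounded to the nearest whole percent.
    public static String loadLine(int cpusUsed, int cpusTotal) {
        if (cpusTotal <= 0 || cpusUsed < 0 || cpusUsed > cpusTotal) {
            throw new IllegalArgumentException("invalid CPU counts");
        }
        return String.valueOf(Math.round(100.0 * cpusUsed / cpusTotal));
    }

    // jobs: one job per line, attributes comma-delimited, with no spaces
    // between commas and attributes.
    public static String jobLine(String... attributes) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < attributes.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(attributes[i].trim());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // A load module prints a single line.
        System.out.println(loadLine(13, 20));
        // A jobs module prints one line per job. These attribute values and
        // their order are placeholders; see jobs.pl for the real order.
        System.out.println(jobLine("101", "alice", "running"));
        System.out.println(jobLine("102", "bob", "queued"));
    }
}
```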

Extending Provider Functionality

The remote provider is separated into three groups of functionality: core, modules, and xml formatters. The core scripts are responsible for the provider's configuration, main logic, and GPIR web service interaction. The modules are responsible for gathering a specific type of data about the resource. The xml formatters convert data obtained by the modules into a format that the GPIR web service expects. In order to use the remote provider to gather another type of information from the resource, please do the following:

  1. Add a module to gather data
    You will need to add a module to the remote providers to gather the particular data of interest. This module must be an executable script (in the programming language of your choice) that acquires the data of interest and prints it to STDOUT in the format that its XML formatter expects. Different modules will print data in different formats, depending on the type of data being gathered. Modules are located in the providers/perl/src/modules directory and use the following naming convention:
          	
          	<function>.<resource_manager>.pl
    
  2. Add an xml formatter to format the module's data for GPIR
    The XML formatter will convert the data printed by the module into a format that the GPIR web service expects. Specifically, this format is an XML string that conforms to one of the accepted GPIR schemas. Please refer to the documentation on GPIR schemas for more information. XML formatters are located in the providers/perl/src/xml_formatters directory and use the following naming convention:
          	
          	<function>.pl
    


  3. Add the new functionality to the provider's core configuration
    Finally, you will need to modify the remote provider's core configuration in order for your new functionality to be executed by the provider. To do so, you will need to edit providers/perl/src/core/config.pl. Simply add the function to the VALID_FUNCTIONS list (line 3).
          	# as provideruser on the remote resource
          	vi ~/providers/perl/src/core/config.pl
          	...
          	
          	# line 3
          	@VALID_FUNCTIONS = ('motd', 'jobs', 'jobs.condor', 'load', 'pcgrid.condor', 'nodes');
    

    The new function will be accepted as a parameter of the -f flag when executing the providers/perl/src/core/main.pl script.
