Resource Prediction Service
The Resource Prediction Service (RPS) predicts an optimal set of resources for running your application. The RPS is a limited version of the Fault Tolerance and Recovery Service (FTR) that is used in the LEAD (Linked Environments for Atmospheric Discovery) and the VGrADS (Virtual Grid Application Development Software) projects.
How to install the Resource Prediction Service (RPS)
Prerequisites and Overview
The RPS service is included in the OGCE Axis Services suite. The easiest way to get started is to download the entire package or check it out of our SVN repository. In addition to the Web service, RPS includes an agent service that runs as a process separate from the Web service. The following software is required:
- Java 1.5 or later
- Maven 2.0.7 or later
- Apache Ant 1.7 or later
- MySQL 5.x or later
Setting up a MySQL database
Before you begin, make sure you have a MySQL database and a MySQL user with privileges to create, update and delete tables in that database. To create the database, follow these steps.
- If you are starting with a fresh MySQL installation, you need to set the MySQL root password. To do this, run the mysqladmin command as root:
  sudo mysqladmin password "root_password"
- Log in to MySQL as root:
  mysql -u root -p
- You can list the current databases with the "show databases" command. To create a database and grant the required privileges, run:
  create database rps_dev;
  grant all on rps_dev.* to 'rps_dev'@'%' identified by 'rps_dev_password';
  On MySQL 8.0 and later, GRANT no longer creates accounts or accepts IDENTIFIED BY. Create the user first with "create user 'rps_dev'@'%' identified by 'rps_dev_password';" and then run "grant all on rps_dev.* to 'rps_dev'@'%';".
- Now try to log in to the 'rps_dev' database as the user 'rps_dev' with the password 'rps_dev_password':
  $ mysql -u rps_dev -p -D rps_dev
  Enter password:
  If you can log in successfully, run the following commands to check that you have the required privileges:
  create table foo (col1 int, col2 int);
  show tables;
  insert into foo values(1, 2);
  select * from foo;
  update foo set col2=3 where col1=1;
  select * from foo;
  drop table foo;
  show tables;
Install the RPS service
Download the OGCE Axis Services suite, unpack it, and run the following command in the ogce-axis-services directory:
mvn clean install
This builds and installs everything.
If you want to rebuild just this service after initial installation, use
mvn clean install -f rps/pom.xml
Maven's -o (offline) option can also speed up subsequent builds, because Maven then does not contact remote repositories.
Start the RPS Agent
The RPS agent is a program that collects information from the QBETS (Queue Bounds Estimation from Time Series) and NWS (Network Weather Service) services. The RPS service needs this information to predict the optimal set of resources for running your application. To run the agent:
- From the rps-agent subdirectory, start the RPS Agent by running the command "./rpsAgent.sh ./rps.properties"
- You should see the log messages in the var/service.log file.
- You should see five tables created in your MySQL database (BQP_TABLE, COMPUTE_RESOURCES_TABLE, NWS_TABLE, PERF_TABLE and QUEUE_INFO_TABLE)
- After a few minutes the COMPUTE_RESOURCES_TABLE, NWS_TABLE and QUEUE_INFO_TABLE should start getting populated with some values.
- The BQP_TABLE and PERF_TABLE remain empty until you insert performance models for your application(s) into the PERF_TABLE. You can add performance models either through the Web service interface (see README.html) or directly with MySQL commands.
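To confirm that the agent has created its tables, you can compare the output of SHOW TABLES with the five names listed above. The following is a minimal Python sketch, assuming you first save the table list with the mysql client (-N omits the column header and -B prints one name per line). The file names tables.txt and check_tables.py and the function missing_tables are illustrative only; they are not part of RPS.

```python
import sys

# The five tables the RPS agent is expected to create (names from this page).
EXPECTED_TABLES = {
    "BQP_TABLE",
    "COMPUTE_RESOURCES_TABLE",
    "NWS_TABLE",
    "PERF_TABLE",
    "QUEUE_INFO_TABLE",
}


def missing_tables(show_tables_output: str) -> set:
    """Return the expected tables that do not appear in SHOW TABLES output.

    The input is assumed to hold one table name per line, as printed by
    `mysql -N -B -e "SHOW TABLES"`. Names are compared case-insensitively
    because table-name case handling depends on the MySQL configuration.
    """
    present = {
        line.strip().upper()
        for line in show_tables_output.splitlines()
        if line.strip()
    }
    return EXPECTED_TABLES - present


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 check_tables.py tables.txt
    with open(sys.argv[1]) as f:
        missing = missing_tables(f.read())
    print("missing tables:", ", ".join(sorted(missing)) if missing else "none")
```

For example:
mysql -u rps_dev -p -D rps_dev -N -B -e "SHOW TABLES" > tables.txt
python3 check_tables.py tables.txt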
The RPS agent is compiled along with everything else by the top-level Maven build. You can also recompile it by running
mvn clean install -f rps-agent/pom.xml
Alternatively, you can cd to rps-agent and run the command "ant" with no arguments.
How to use the Resource Prediction Service
Web Clients
RPS Web client examples are included in the GTLAB project. For debugging, you can use the command-line tools described below instead; the Web client examples duplicate these command-line functions:
- List resources
- Add resources
- Delete resources
Using the command-line client to access the RPS web service
The RPS source code has a sample command-line client to access the RPS web service. Run these commands in the rps-agent directory. Output and errors will go to var/client.log.
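If you script repeated calls, a thin wrapper around client.sh can build the argument list and return what each call appends to var/client.log, since the client writes its output there. This is a Python sketch under stated assumptions: it is run from the directory that contains rps-agent, the service is at the default EPR used on this page, and the names client_args and run_client are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed defaults; change the host and port to match your Tomcat server.
DEFAULT_EPR = "http://localhost:8080/axis2/services/Rps"
AGENT_DIR = Path("rps-agent")


def client_args(operation, params=(), epr=DEFAULT_EPR):
    """Build the ./client.sh argument list used by the commands on this page."""
    return ["./client.sh", "--epr", epr, "--operation", operation,
            "--params", *[str(p) for p in params]]


def run_client(operation, params=(), epr=DEFAULT_EPR, agent_dir=AGENT_DIR):
    """Run client.sh in agent_dir and return the text it appended to var/client.log."""
    log = Path(agent_dir) / "var" / "client.log"
    start = log.stat().st_size if log.exists() else 0
    # check=True raises on a non-zero exit status; service errors may
    # still only be reported inside the log.
    subprocess.run(client_args(operation, params, epr), cwd=agent_dir, check=True)
    if not log.exists():
        return ""
    with log.open("rb") as f:
        # If the log shrank during the call (truncated or rotated), read it from the start.
        f.seek(start if log.stat().st_size >= start else 0)
        return f.read().decode("utf-8", errors="replace")
```

For example, run_client("listResources") issues the same command as the listResources invocation that follows.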
How to invoke the "listResources" operation?
The listResources operation returns a list of resources for which the RPS service has queue wait time information as well as network latency and bandwidth information. This list is the "universe" of resources that the RPS service considers.
To invoke the listResources operation, run the command
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation listResources --params
Check whether your Apache Tomcat server uses port 8080 or 9090, and make sure you use the matching port in the --epr URL of your RPS commands.
The results of the invocation are written to var/client.log and should look like this:
INFO [main] (RpsClient.java:244) - Got list of resources from service...
INFO [main] (RpsClient.java:247) - kittyhawk
INFO [main] (RpsClient.java:247) - cobalt
INFO [main] (RpsClient.java:247) - queenbee
INFO [main] (RpsClient.java:247) - abe
INFO [main] (RpsClient.java:247) - eldorado
INFO [main] (RpsClient.java:247) - ornlteragrid
INFO [main] (RpsClient.java:247) - bigben
INFO [main] (RpsClient.java:247) - ucteragrid
INFO [main] (RpsClient.java:247) - sdscteragrid
INFO [main] (RpsClient.java:247) - bigred
INFO [main] (RpsClient.java:247) - uctg-spruce
INFO [main] (RpsClient.java:247) - mayhem
INFO [main] (RpsClient.java:247) - ucsbeuca
INFO [main] (RpsClient.java:247) - utkeuca
INFO [main] (RpsClient.java:247) - ec2
INFO [main] (RpsClient.java:247) - uheuca
INFO [main] (RpsClient.java:247) - rencieuca
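To use the resource list in a script, you can extract the names from var/client.log. Here is a minimal Python sketch, assuming the log format shown above (one resource per INFO line after the "Got list of resources from service..." message); parse_resources is an illustrative name. It keeps only the most recent listing and stops at the first line that is not a single-word log message, which is a heuristic based on the sample output rather than a documented format guarantee.

```python
import re

HEADER = "Got list of resources from service"
# One resource per line, e.g. "INFO [main] (RpsClient.java:247) - kittyhawk".
ITEM = re.compile(r"\(RpsClient\.java:\d+\) - (\S+)\s*$")


def parse_resources(log_text: str) -> list:
    """Return the resource names from the most recent listResources result."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if HEADER in line]
    if not starts:
        return []
    names = []
    for line in lines[starts[-1] + 1:]:
        match = ITEM.search(line)
        if not match:
            break  # the first non-matching line ends the listing
        names.append(match.group(1))
    return names
```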
How to invoke the "addOrUpdateAppPerfModels" operation?
A performance model for an application is the amount of time the application takes to execute on a given number of CPUs on a given resource. An application is identified by a namespace, a name and a version number. To add or update a performance model, run:
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation addOrUpdateAppPerfModels --params app_namespace app_name app_version resource_name cpus wall_time
For example:
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation addOrUpdateAppPerfModels --params http://www.renci.org WRF V3.0.1 bigred 1024 3600
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation addOrUpdateAppPerfModels --params http://www.renci.org WRF V3.0.1 bigred 512 5000
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation addOrUpdateAppPerfModels --params http://www.renci.org WRF V3.0.1 bigred 256 8000
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation addOrUpdateAppPerfModels --params http://www.renci.org WRF V3.0.1 ornlteragrid 128 7200
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation addOrUpdateAppPerfModels --params http://www.renci.org WRF V3.0.1 sdscteragrid 512 5000
The above performance models are added to the MySQL database. However, it takes a few minutes for the service to gather enough information to find the optimal resources for running your application. After that, an invocation of the findOptimalResources operation on the RPS Web service should return results within seconds.
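Rather than typing one command per model, you can keep the models in a small table and generate the commands. The Python sketch below encodes the WRF example above; PerfModel and upload are hypothetical names, the EPR is the default used on this page, and the wall_time values are passed through unchanged because this page does not state their unit.

```python
import subprocess
from dataclasses import dataclass

DEFAULT_EPR = "http://localhost:8080/axis2/services/Rps"  # adjust host and port


@dataclass(frozen=True)
class PerfModel:
    app_namespace: str
    app_name: str
    app_version: str
    resource_name: str
    cpus: int
    wall_time: int  # passed through as given; unit not stated on this page

    def client_args(self, epr=DEFAULT_EPR):
        """Arguments for one addOrUpdateAppPerfModels call, in the documented order."""
        return ["./client.sh", "--epr", epr,
                "--operation", "addOrUpdateAppPerfModels", "--params",
                self.app_namespace, self.app_name, self.app_version,
                self.resource_name, str(self.cpus), str(self.wall_time)]


# The WRF V3.0.1 models from the example above: (resource, cpus, wall_time).
WRF_MODELS = [
    PerfModel("http://www.renci.org", "WRF", "V3.0.1", resource, cpus, wall)
    for resource, cpus, wall in [
        ("bigred", 1024, 3600),
        ("bigred", 512, 5000),
        ("bigred", 256, 8000),
        ("ornlteragrid", 128, 7200),
        ("sdscteragrid", 512, 5000),
    ]
]


def upload(models, epr=DEFAULT_EPR, agent_dir="rps-agent"):
    """Invoke client.sh once per model from the rps-agent directory."""
    for model in models:
        # Raises CalledProcessError on a non-zero exit; check var/client.log for service errors.
        subprocess.run(model.client_args(epr), cwd=agent_dir, check=True)
```

Calling upload(WRF_MODELS) runs the same five commands as the example above.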
How to invoke the "findOptimalResources" operation?
The findOptimalResources operation returns a set of optimal resources (optimized over data transfer time, queue wait time and compute time) for running your application.
First, wait a few minutes after adding performance models for an application to the database (either through the Web service as described above or directly with MySQL commands), then run the command:
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation findOptimalResources --params app_namespace app_name app_version --inputData data_source_1 data_size_1 data_source_2 data_size_2 ... data_source_n data_size_n
For example:
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation findOptimalResources --params http://www.renci.org WRF V3.0.1 --inputData bigred 1024000000 sdscteragrid 1024000000
If your application does not need to stage any input data files to the machine on which it will run, you can omit the --inputData option and its arguments from the above command. For example:
./client.sh --epr http://localhost:8080/axis2/services/Rps --operation findOptimalResources --params http://www.renci.org WRF V3.0.1
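The --inputData arguments are (data_source, data_size) pairs, so building the command from a list avoids mismatched pairs. A minimal Python sketch, assuming the default EPR used on this page; find_optimal_args is a hypothetical helper, and sizes are passed through unchanged because this page does not state their unit.

```python
DEFAULT_EPR = "http://localhost:8080/axis2/services/Rps"  # adjust host and port


def find_optimal_args(app_namespace, app_name, app_version,
                      input_data=(), epr=DEFAULT_EPR):
    """Build a findOptimalResources command line.

    input_data is a sequence of (data_source, data_size) pairs. When it is
    empty, --inputData is omitted, as for applications that stage no input.
    """
    args = ["./client.sh", "--epr", epr, "--operation", "findOptimalResources",
            "--params", app_namespace, app_name, app_version]
    if input_data:
        args.append("--inputData")
        for source, size in input_data:
            args += [source, str(size)]
    return args
```

The resulting list can be passed to subprocess.run(..., cwd="rps-agent"), or printed as a shell command with shlex.join.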
The results of the above invocation are written to var/client.log. The result is an array whose entries contain:
1. The rank of the resource (a lower rank means a more optimal resource)
2. The resource name
3. The queue name
4. The number of CPUs to use on that resource and queue
5. The expected time to transfer the input data from the data source(s) given with --inputData to that computational resource
6. The expected queue wait time on that resource and queue, based on the number of CPUs and the wall time from the performance model
7. The expected wall time
For example:
INFO [main] (RpsClient.java:190) - Got list of optimal resources...
INFO [main] (RpsClient.java:194) - Rank: 0, Resource name: sdscteragrid, queue name: dque, num cpus:512, exp data trans time: 2.234362650223991E9, exp queue wait time: 25934.0, exp wall time: 5000.0
INFO [main] (RpsClient.java:194) - Rank: 1, Resource name: bigred, queue name: DEBUG, num cpus:1024, exp data trans time: 2.2444635513588905E9, exp queue wait time: 0.0, exp wall time: 3600.0
INFO [main] (RpsClient.java:194) - Rank: 2, Resource name: ornlteragrid, queue name: dque, num cpus:512, exp data trans time: 3.770845237085046E9, exp queue wait time: 1.0, exp wall time: 5000.0
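To post-process the ranking, you can parse these log lines into records. A minimal Python sketch, assuming the exact message format shown above; Candidate and parse_optimal are hypothetical names. It reads only the text after the most recent "Got list of optimal resources" message and keeps the time values in whatever units the service reports, since this page does not state them.

```python
import re
from dataclasses import dataclass

HEADER = "Got list of optimal resources"
NUM = r"[0-9.Ee+-]+"  # plain or scientific notation, e.g. 2.234362650223991E9
LINE = re.compile(
    r"Rank: (?P<rank>\d+), Resource name: (?P<resource>[^,]+), "
    r"queue name: (?P<queue>[^,]+), num cpus:\s*(?P<cpus>\d+), "
    rf"exp data trans time: (?P<transfer>{NUM}), "
    rf"exp queue wait time: (?P<wait>{NUM}), "
    rf"exp wall time: (?P<wall>{NUM})"
)


@dataclass
class Candidate:
    rank: int
    resource: str
    queue: str
    cpus: int
    data_transfer_time: float
    queue_wait_time: float
    wall_time: float


def parse_optimal(log_text: str) -> list:
    """Return the candidates from the most recent findOptimalResources result, by rank."""
    start = log_text.rfind(HEADER)
    text = log_text[start:] if start >= 0 else log_text
    candidates = [
        Candidate(int(m["rank"]), m["resource"], m["queue"], int(m["cpus"]),
                  float(m["transfer"]), float(m["wait"]), float(m["wall"]))
        for m in LINE.finditer(text)
    ]
    return sorted(candidates, key=lambda c: c.rank)
```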
