The Xamin Guide v1.1

January 9, 2013

Introduction

This document describes the Xamin system used to access astronomical data at the High Energy Astrophysics Science Archive Research Center at NASA's Goddard Space Flight Center.

What is Xamin?

Xamin is the primary interface to the HEASARC's catalog and archive systems. Xamin supports simple and complex table queries, monitoring of tables, and downloads of HEASARC archival data. Xamin also links to tables and data at other sites where appropriate.

Xamin can be used through a web interface designed as a Rich Internet Applicaiton (RIA) and also through a command line interface. All facilities are available through both interfaces.

Documentation Overview

The are a number of documents describing aspects of the Xamin system. While there is some overlap between these documents they are intended for different purposes and audiences. Documents include:
The Xamin Guide
This document itself provides a general overview of the Xamin system with particular emphasis on how the system is built and how it can be used. Users interested in an overview of the general capabilities may find this document more accessible than the detailed interface guides. This is the primary document for administrators since it describes in detail the steps for installing and maintaining the Xamin system. Developers may find this document less useful since it discusses the code in only fairly general terms, but the code overview may be a helpful starting point.
This web guide -- designed to be used from the Xamin Web environment -- describes the Web interface in detail.
The Xamin CLI User's Guide
This document describes the use of the Xamin CLI interface. The Web interface is based upon the CLI and the CLI document spends a bit more time on the details and consequences of certain settings, so this document may be useful for both Web and batch users.
The Xamin Plotting User's Guide
This document describes the powerful plotting capabilies. Users may also refer to the STILTS Plotting documentation since Xamin's plot interface is built on top of the STILTS.
The Xamin requirements document.
This document was developed early in the Xamin process to clearly define features required for the final system. The latest version is available on the HEASARC Developmentwiki but requires an account to view. This document is of interest primarily to developers and testers to assure that the implemented systems meets the needs of our community.
The Xamin Design document
This document is of particular interest to developers who may wish to use or extend the Xamin code. It provides details of actual modules used and their relationships.

Current implementations

Xamin is currently implemented on the operations machine of the HEASARC cluster as
    http://heasarc.gsfc.nasa.gov/xamin
        
The Xamin web application contains a number of end-point URLs. This web application includes not only the run-time code, but full documentation and code for the installation and maintenance of Xamin.

Migrating from Browse to Xamin

The Xamin interface replaces the Browse web interface. Xamin supports all of the query functionality in Browse and provides many new capabilities, however this new system uses a single Web page where users can progressively expand capabilities rather than the many distinct Web interfaces that Browse had employed. This section describes how some of the tasks accomplished in the old system are done in Xamin.
Querying all HEASARC holdings
This works essentially the same as before. Just enter a position and/or time and and start a query. You get back a table with the number of hits for each HEASARC table. Click on the table icon to query a specific table with the specified constraints.
Querying a specific table
In Browse if you wanted to query a specific table you could go to the index page or first select a mission and then the specific table. In Xamin you enter the table in the Tables Explorer quick entry box at the top of the form, or you can browse the the table entries in the Tables Explorer hierarchies. You can also search for tables using a keyword search of their documentation. Any of these can be used to select a table.
Querying all tables from a given mission
This was the default mode for Browse when the user checked a mission box. In Xamin, to select all of the tables for that mission just open the Tables Explorer/mission hierarchy and click on the add icon to add all of the tables for that mission. Then use Find matches in... Selected Tables to search all of the tables in the mission for the selected time or positions. You can add tables from other missions, or delete some of the tables before querying if you like. You can then query the selected tables for position or time constraints.
How do I do a query by parameters?
Enter the table name the use the Parameters Explorer to add parameter constraints. You can add a constraint for any parameter by clicking next to it. Note that you can use fairly general SQL expressions in the constraints and these can involve table parameters.
How do I do cross-correlations?
If you have 2-4 tables selected, you can do a correlation of them just by clicking on Query to...Correlate. Spatial cross-match constraints will be added automatically, but you can override or modify these to meet your needs.
Can I search the metadata for tables as I could in the keyword entry interface?
Yes. Use the Tables Explorer keyword search.
How do I monitor a mission for new observations of the target I'm interested in?
Xamin has a broad capability to check if the results for any query have changed. First get a HEASARC services account. Login during an Xamin session and run a query that finds observations you are interested in. Use Query to...Save to save this in your account. Bring up the summary of your saved tables with theSession/User Tables--Info menu. If you click on the Frequency column next to your saved table you should get a pane that allows you to ask to be informed whenever the result of this query change.

System Elements

Interfaces

Users may access all Xamin capabilities from either a Web-based GUI or through a CLI interface that can be embedded in scripts.

Web Interface

The Web interface is built on top of the ExtJS library to provide a very powerful but compact interface. Users can query, correlate, plot and download data all in the same page. A single box entry to the GUI is also available. It allows users to specify targets, formats, tables and such in a single entry box and starts up the Web interface after having sorted all of these out

CLI Interface

The CLI interface uses a simple keyword driven interface to build and run queries. The Web interface is built on top of the CLI interface.

Queries

Xamin supports a broad range of queries. Users can query single tables, they can ask what tables have data at a given location or time, or they can combine information from multiple tables in a variety of different ways.

Single Table Queries

Queries of single tables can involve standard positional or temporal constraints. Simple or complex constraints can be levied against any parameter in the table. Output fields can be defined as columns or expressions. If a user simply wishes to browse a table it can be queried without any constraints.

Discovery queries

Discovery queries query master tables for positions and/or times that the user has specified and find science tables that have information for those positions or intervals. Users can query the entire HEASARC holdings or they can restrict the query to a specific list of tables of interest. Only HEASARC tables are indexed in discovery queries.

Correlations

Correlations include any query that involves multiple tables. Most correlations involve checking for spatial adjacency, but correlations can be made on time or other criteria. In many database system tables reference one another through the use of keys. E.g., a set of object detections may refer to an observation table through an observation id. Xamin provides special support for positional correlations.

All tables can take part in correlations. However remote tables must be downloaded to the HEASARC before the correlation can take place. So in practice correlations of remote resources can only be done if the tables are of modest size or if constraints that can be passed to the remote site can restrict the size to a reasonable number. It may be impractical to query remote tables where the size of the downloaded data is more than ~10,000 rows.

In the web interface spatial correlation constraints are assumed and created for the user -- they can be deleted if not desired. In the command line version the OFFSET keyword is used to specify positional correlations, but the user must include this explicitly.

Temporal cross-correlations can be made as simple constraints on the time columns of the two tables. At the HEASARC times are stored in MJD's, so to ask for observations within a day of one another we can add a constraint of the format

     b.time-1 .. b.time+1
        
in the Parameters explorer for the field a.time. In the command line interface that might be specified as
     constraint='a.time between b.time-1 and b.time+1'
        
where the quotes are to ensure that the shell treats the value as a single string.

While any number of tables can be correlated in principle, queries tend to become very slow when large numbers of tables are involved. The web interface imposes a maximum of four tables in a correlation, no limit is specified for the command line. This limit can be gotten around if necessary by breaking up the correlation into a series of correlations and saving intermediate results as tables.

Anti-correlations
Anticorrelations are a special class of correlation where the user wishes to find rows in one table or tables that do not have corresponding data in another table, e.g., objects that have not been observed. Only a single anticorrelation table is permitted in a correlation. The results of an anticorrelation are defined as follows: Manifestly a user may not request an output field in an anticorrelation that requires any column in the anticorrelating table. The same is true of any group constraint.

Note that Xamin does not do careful analysis of the input constraints to determine if pieces of an input constraint can be included in the second query. E.g., compare the constraint

     constraint='d.exposure>1000 and c.exposure>1000'
        
versus
     constraint=d.exposure>1000,c.exposure>1000
        
From Xamin's perspective in the first example we have a single constraint. If d is the anticorrelation table, then in the first case Xamin will not include any constraint on the exposure from table c when it runs the second part of the anticorrelation. In the second case, since the constraint on table c is specified independently this constraint will be included in the second query.
Autocorrelations
Autocorrelations involve a correlation of a table with itself. This can be useful in looking for neighbors or duplicates. The structure of the Xamin metadata table often suggests using autocorrelations to get results. To perform an autocorrelation simply specify the table multiple times. The table prefixes allow you to specify which instance of the table you are looking at.

When doing an autocorrelation it is usually necessary to make sure that you don't return a row matching itself. A simple way to ensure that is to add a constraint of the form

       constraint='a.id != b.id'
        
in the query. In the Xamin web interface just enter != b.id next to the entry for a.id. Of course the appropriate field may be named something other than id.

Data Products

Data products are any information not in the table being queried that is somehow linked to the table as a whole or to a specific row or rows in the table.

There are two fundamental types of products:

  1. Mission archival products represent the data holdings in the HEASARC archive. These are the mission data from more than two dozen NASA and non-NASA missions. For many Xamin users querying observation catalogs is just a step in finding and locating the datasets of interest to their research. These products and be retrieved and downloaded using Xamin.
  2. Link products are URLs that point to data that are related to a specific row, but are not part of the HEASARC archive proper. We distinguish several types of link products.
By default, the Web interface only shows archival products but the user can request the other types. In the CLI interface users must explicitly request data product records.

Downloading archive data

In the web environment the user selects data products in a grid query results page. Data products can be selected for all rows in the table, or for individual rows. Users can open the data products for a given row and look within the data product hierarchy selecting only the data products of interest.

As data products are collected they are added to a shopping cart. The shopping cart can contain products from multiple selections and from multiple queries. Users can also download individual data products directly (or in the case of directories open a web link to the directory). Link products cannot be added to the shopping cart -- they generally involve data that is intended to be viewed in browser or which are found at remote sites.

At any point in an Xamin session users can download the data in their shopping cart. They can download individual products one by one from the shopping cart, but if there are more than two or three products users may prefer to collect the data products together. A tar file of all of the products in the cart can be created. This tar file is created dynamically so the download should start almost immediately.

Whenever a significant amount of data is to be downloaded we recommend that a data download script be created rather than directly tarring the data. Scripts can be created which use either the WGET or CURL commands both of which are widely available. Scripts are much more easily restarted and allow the users complete control of the data download process. The script can be displayed and copied via striping or it can be downloaded explicitly.

In the command line environment a query can be used to build a download script to download archival products or to provide a list of links to link products.

Setting Up Data Products

Xamin data products use two kinds of metadata: Data product information is relatively stable and batch procedures have been developed for interpreting XML descriptions of the products and updating the database. A metadata file describing each mission's data products is included in the full Xamin distribution.

User Accounts

Xamin uses HEASARC services login accounts. Users may request any number of accounts, but must provide a valid E-mail. Users must respond to a message sent to them by E-mail before the account is activated.

Capabilities

A user can store any number of user tables in their account. These tables can be used exactly as any other table in the system users can correlate the user tables with other local and remote tables. User tables are only available when a user has logged into the corresponding account. For the command line the USER and PASSWORD keywords are used to ensure appropriate access privileges.

On-line users can also save the current query configuration at any time into a named configuration. This configuration can be restored at any later time. Note that the configuration only affects the Query window, not any of the other Xamin windows. If a configuration is saved with the name default, then auser can automatically restore this configuraition in subsequent logins. To change the default configuration the current default must first be deleted.

Output formats

A variety of output formats are supported in Xamin.

ASCII

Two ASCII formats are supported. The Aligned format pads the output fields to align columns and headers. The Streaming format does no padding. Messages are shown after query results in the Aligned format, but are shown as they are processed in the Streaming format.

VOTable

The VOTable format is an XML format supported by the Virtual Observatory. Product extensions are used within the HEASARC to describe products associated with table rows.

VO Protocols

ConeSearch and TAP formats provide data using the VO ConeSearch and TAP protocols. These are pure VOTable formats which satisfy the requirements for these protocols (e.g., the type of UCD used, and the syntax of the INFO elements included).

Excel

Data may be downloaded as Excel spreadsheets in the .xls format.

FITS

Data may be downloaded as FITS binary tables.

Table Sources

Xamin allows users to specify many different kinds of tables to query.

Local Tables

These are stored in the Postgres database and include the familiar HEASARC tables, the metainfo metadata tables and a few master tables used for index queries.

User Tables

Each user may have their own persistent tables. User tables may be purged if use becomes excessive.

Uploaded Tables

In the command line interface user may query temporarily uploaded tables without requiring any user account or they can upload tables into a user account. Only the second option is currently available from the Web.

External Tables

Users may query tables at Vizier and Virtual Observatory sites. The syntax for querying these tables is exactly the same (except for the table names used) and in principle users can do all of the same kinds of queries. However in practice downloads of large number of rows from a remote site may be impractical so that users should try to ensure that no more than ~10,000 rows are required to be downloaded from a remote site.

Sample Queries

This section briefly describes some of the kinds of queries that you can do with Xamin. The web and CLI user�s guides illustrate the specific steps needed to do a few of these queries. The discussion here focuses on what you can do rather than how to do it.

Discovery Queries.

You can use discovery queries to find out if the HEASARC has information at given positions and/or times. They can be made against all HEASARC tables, or against a specified list of two or more tables. The results of a discovery query are a list of the tables that have information at the requested positions/times including the number of rows in each table that match the criteria. The list includes links that actually do the query for each table. Only positional and temporal constraints may be specified in a discovery query. When you do a discovery query you get back a table which provides links to the queries of the individual tables.

Examples:

Simple Single Table Queries

If you just want to query a given table for information on specific targets or times, then enter the position or time constraints just as with a discovery query, and also specify the table you are interested in. You can also view a table without specifying any constraints. A single table query of an observation is often the first step in downloading data associated with the selected observations.

Examples:

Complex Single Table Queries

You can do a lot more with single table queries than positional or temporal constraints. You can put constraints on any parameter, decide exactly which columns to display -- including columns that you define�and set the order columns are displayed in. You can specify a given sort order for the output. You can even do simple statistics on a table.

Examples:

Cross-correlations

Cross-correlations allow you to combine information from multiple tables. This can include tables of your own that you have uploaded or generated in previous queries. Typically cross-correlations use a positional constraint where we match results from the two tables according to positional coincidence. However cross-correlations can also be done based on time or any other characteristics of the table. All of the options available for single table queries are available for cross-correlations too. One special kind of correlation is an anticorrelation, looking for data in one table or tables which does not have matching data in another. Data products can be linked to the results of the cross-correlation whenever any of the source have data products.

Examples:

External Queries

External tables are queried exactly like native HEASARC tables. Access to these remote tables may be slower than for local tables, but users can query by position, add constraints and join tables. Simple positional and parameter constraints may be applied remotely. More complex constraints require downloading of data to the HEASARC database. Queries that would require downloading more than of order 10,000 rows may not be practical. External queries of VO image and spectral data services may have an associated data product.

Examples:

Monitors

Users can check for new observations and for other changes to the database. Any local query can be saved and the user can ask that the query be repeated occasionally and that they be informed of the changes. This feature can also be used to keep a table up to date.

Examples:

Upload Queries

Users can query their own tables and use these tables in correlations with HEASARC and external tables. Tables can be uploaded in CSV, VOTable and TDAT formats. Any string may be used as the separator in the CSV format, not just commas. The data are uploaded into an image of the input table and a few system columns may be added if positional columns are recognized (by UCD or name). Users may also upload a list of targets where each line of the input is assumed to be a target. Some lines may be target names and others coordinates. Coordinates may in turn be in mixed sexagesimal and decimal formats. However all coordinates must be in the same coordinate system. A list of targets is uploaded into a table with three rows: the original input string and the J2000 RA and Dec corresponding to the position. The derived coordinates can be null if an unresolveable target name was found. While source lists can be queried as tables in their own right they are usually correlated with some other table.

Examples:

Installation

This section describes the steps in creating the complete Xamin data access system. The discussion of each step includes both the general issues that need to be resolved by an implementor in their system, and the specific choices made at the HEASARC.

This section describes a complete installation of the Xamin server. Users who only wish to be able to use the Xamin from their CLI environment need only download the very small users.jar file from the Xamin distribution directory (e.g. http://heasarc.gsfc.nasa.gov/xamin/distrib/users.jar).

Installing and populating the database.

Xamin is designed to use a PostgreSQL database. Some PostgreSQL specific features are used in the software to populate the system and in the query system. The keyword search interface uses PostgreSQL features that paralleled in some other systems (e.g., SQLServer), but may not be generally available. Postgres specific features of the system are called out

Postgres is freely available at:

   http://postgresql.org
        

While one click install downloads are available, these generally require superuser privileges on the destination machines..

Installing the database

This section gives a brief overview for installing Postgres. Much more information is available at the PostgreSQL site.

The tar file should be unpacked with

      > tar zxf postgresql-X.Y.Z.tar.gz

        
this creates a subdirectory of the current directory of postgresql-8.4.2. An INSTALL file with detailed directions for installing Postgres is available in this directory. The path used for the HEASARC installation is noted below.

Within the postgres-ql-X.Y.Z directory

    > ./configure --prefix=desired_postgres_directory
        
e.g.,
    > ./configure --prefix=/local/db/postgres
        

Then execute:

     make
     make install
        
to compile and install Postgres in the desired location.

The current Postgres operational machine is installed on the DBMS1 machine.

The Postgres server is ideally run under a special user account but may be run under a normal account. The special account can be created using normal system procedures.

Initial configuration.

To initialize the database first create the directory in which the database data is to be stored. This directory should be owned by the same user as will run the server.

    > mkdir directory_for_data
        
then initialize the database using
    > initdb -D directory_for data
        
where initdb is in the bin subdirectory of your Postgres installation.

The output from the initdb command indicates how one can start the database server. This command should be put in the startup scripts for your database machine (but is not currently for the HEASARC).

Once the database has been initialized some small amount of customization of the configuration files is desirable. These are found in the data directory mentioned above.

Postgres accounts

At least two and preferably three Postgres accounts should be set up to manage the Xamin database. A superuser, postgres, account will be needed to set up the user accounts and to create the database in which the Xamin system will run.

A second account, e.g., xamin, may be defined as the owner of the xamin database with all associated privileges, or this can be done under the postgres account. Using a separate xamin account is preferable since this account need have no privileges outside of the Xamin database. The account will be used in many of the internal operations scripts, but is only invoked by external user actions when new 'user accounts' are created and destroyed and the associated schemas need to be created or destroyed in the Xamin database.

The webuser account is the account that external users normally invoke when accessing Xamin. The webuser account has the privileges to read all non-system tables and to create and update tables in user schemas.

The names and passwords of these accounts are needed in the [heasarc|ops|dev].settings files found in the Xamin WEB-INF area. Generally the ops/dev settings files will use the xamin account which is also the privileged account in the heasarc.settings. The webuser account is used as the non-privileged account in the heasarc.settings.

To create a user in Postgres use the CREATE ROLE SQL command or the createuser script provided in the Xamin user area. E.g.,

                create role xamin with login with password encrypted 'password';
                create role webuser with login with password encrypted 'another';
            

Preparing for data ingest.

Before starting data ingest, an installer will need to download the the needed jar files. Jar files are modified zip files which are used to store Java classes. However they can also be used to store other kinds of files including data files used for running programs. The Xamin jar includes a number of utility programs (mostly written in Perl) that need to be unpacked to be used. The Xamin war file is created in building a full distribution of Xamin. This war file (a jar file containing the complete Web application) is currently created using NetBeans. To unpack the war, copy it to the directory in which you want to unpack it and

   >  jar xf xamin.war
        
Unfortunately JAR files do not seem to preserve execute permissions for scripts properly. One can simply set the execute flag for the scripts in the operations area, or use the Permissions.jar (which should be included in the Xamin distribution) with the executables.list file in the WEB-INF/classes directory. E.g., the command (see below for the definition of the logical names)
0
   > java -cp $XAMIN_JARDIR/Permissions.jar permissions.PermissionRestorer $XAMIN_ROOT/.. < $XAMIN_ROOT/../executables.list
        
will restore appropriate permissions.

Initializing the database.

The scripts for initializing and populating the HEASARC database are included with the Xamin Web service. The service should be installed as described below. The scripts described below can all be found in the operations area of the installed service (typically something like .../xamin/WEB-INF/classes/ops). .

The initialize command in the .../ops/ingest directory will perform many of the tasks noted below: defining the HEASARC database, populating it with HEASARC tables, creating the table documentation and master tables, setting up the areas for supporting VO TAP queries, and ingesting data products. More details on these steps are given below.

Many of the environment variables used in operations scripts are defined in the .../ops/daily/cronenv.sh file which is used to initialize them before they are run. The settings in this script should be carefully examined and may need to be updated. The environment variables include:

APP_ROOT
Defines the base directory of the overall Xamin web application.
XAMIN_ROOT
Defines the base directory of the Xamin operations scripts typically, $APP_ROOT/WEB-INF/classes/ops.
XAMIN_JARDIR
The location of the JARS used within Xamin, typically, $APP_ROOT/WEB-INF/lib.
LOG_ROOT
A directory containing subdirectories in which which system commands outputs are placed, notably the system operations logs. This does not log user requests to Xamin.
POSTGRES_ROOT
The root of a Postgres installation. This is mostly used to find Postgres executables, e.g., $POSTGRES_ROOT/bin/psql
XAMIN_LOGDIR
The directory in which system actions are logged, usualy $LOG_ROOT/logs
XAMIN_BACKUPS
The directory in which user backups are placed, usually $LOG_ROOT/bck
XAMIN_TMPDIR
A directory in which system commands can create temporary files, usually $LOG_ROOT/tmp
PGHOST
The computer where the Postgres database is to be found
PGUSER
The user account under which requests are to be made. Normally this will be the postgres or xamin user for system operations. This will be set in the settings file for external Xamin access.
PGDATABASE
The database to queried. Normally this will be the heasarc database.
TDAT_DIR
The directory (on the TDAT_HOST) where TDAT files are stored.
TDAT_HOST
The host machine on which TDAT files are stored.
HEASARC_SETTINGS
A pointer to the settings file to be user. The settings files are generally used to convey information to Java programs. Operations scripts normally use the dev.settings or ops.settings (depending upon whether this is a development or operational installation), while the web application generally uses heasarc.settings. All of these files are included in the $APP_ROOT/WEB-INF directory.
Creating the database

Postgres allows for multiple databases to be supported by a single Postgres installation. Each database can have multiple schemas and each schema multiple tables. Cross-database queries are not supported, but cross-schema queries are. All tables used within Xamin should be placed within a single database which may now be created either by using the psql command (the CLI interpreter for the database), or using the createdb command. E.g., using psql we might have:

    > psql
    prompt=# create database heasarc;
    CREATE DATABASE
    prompt=#
        
We place all tables into the heasarc database. Note that if we had run the previous command as the postgres user and have a separate xamin Postgres account we may want to
    prompt=# alter database heasarc owner to xamin;
        
to change the ownership of the HEASARC database to the xamin Postgres account.
Metadata and feedback table creation

The metadata table, metainfo, needs to be created before any other tables can be read into the system. The

    buildmeta.csh
        
command sets this up. If a feedback table is to be supported in the system this can be created using the
    buildfeedback.csh
        
command.
Copying tables
An initial ingest of HEASARC tables may be done using the [run]fullcopy command. The runfullcopy (in ops/ingest) sets up logs and such for the fullcopy command. Just
   > runfullcopy
        
will copy all HEASARC tables into the system using the current values of the TDAT files in the HEASARC's http://heasarc.gsfc.nasa.gov/FTP/heasarc/dbase/tdat_files directory. As discussed in operations below, this script is also used to update the database with new and updated tables.

The runfullcopy command uses other scripts in the directory. It calls fullcopy after setting up logs. This calls copyit for each matching TDAT file. The real workhouse inside copyit is pgcopy.pl which parses the TDAT files and creates the SQL to ingest the table. This is then sent to the Postgres psql to be executed.

The procedure populates both the tables themselves and the metadata regarding the tables. However it does not populate the data products related to a table.

In addition to populating the science tables, the runfullcopy command updates the documentation tables within Xamin. Xamin keeps a copy of all HEASARC and Vizier table documentation in the database. These can either be fully refreshed or updated depending upon whether an argument is specified in the command. The updateDocs.pl script is used by runfullcopy to find and ingest documentation.

Creating and updating the documentation table
The Tables Explorer keyword search capability uses the tabledocs table. This is also used in defining the Vizier table structure. This table uses the Postgres TSVECTOR column to provide keyword searchable representation of HEASARC tables. The table_docs_create.sql command creates the table. The loadDocs.pl script extracts both HEASARC and VizieR documentation. HEASARC documentation is the HEASARC table HTML documents. VizieR documentation is the online documentation for each table. Before indexing them in the TSVECTOR column, loadDocs.pl strips these documents of their structure elements (i.e., HTML keywords).

This resulting SQL file should be run and then the fixdocs.sql script should be run. This last script adds in a bit of HEASARC metadata to the HEASARC descriptions, populates the TSQUERY column in the tabledocs table and gets rid of descriptions of HEASARC tables that are not present in the database.

Ingesting HEASARC product metadata.

The metadata for HEASARC products is ingested using the updateProducts command in ops/products, specifying a directory with the product definition files for each mission. E.g., as the code is distributed if the user makes the directory specified above the default directory then the command

   updateProducts missions
        
will ingest the data products for the missions (Found in ops/products).

The HEASARC has a products file for each mission and special products files for table links, external links, and bibliographic links that should be individually processed.

Building master tables
Several master tables are used to support index queries where the user specifies a position and/or time and finds lists of tables that have data in the specified locations and/or intervals. These master tables are built using the tools in the ops/master directory. The createMasters.pl script does an initial creation of the master tables. Run the
    buildFullMasters.pl
        
script to populate the master tables: pos_big, pos_small, pos_time_big, pos_time_small and time. This process can take a couple of hours.

Master tables are views of two underlying tables a _prime table and a _bck. Creating the master tables is fairly expensive because we create a clustered index on fairly large tables. We do not wish to recreate them and regenerate the index each time we update a table. After buildFullMasters.pl completes, the _bck tables are empty. As tables in the database are updated, part of the update process is to add the position/time information from those tables into the corresponding _bck tables. Another table keeps track of the tables that have been updated since the last time we ran buildFullMasters.pl. When a user makes a subsequent query of the master tables, we look in the _bck table for information about science tables that have been updated, and in the _prime table for information about unchanged tables. Since only a small fraction of the tables get updates, the _bck table is much smaller than the _prime table. Thus even though it is not as efficiently indexed, master table queries can run quickly.

Periodically we run buildFullMasters.pl to bring the data from the _bck tables into the primary tables. Since this takes a couple of hours, the approach is to build up the new versions of the master tables using temporary table names. Only when the new master tables are complete do we drop the previous version and rename the new tables to the standard names. This takes seconds.

Preparing for VO Access
The Virtual Observatory Table Access Protocol (TAP) interface requires some support structures within the Postgres environment. The TAP_SCHEMA schema is used to provide standard metadata to TAP clients. A TAP_ASYNC schema is used ot support asynchronous requests from data from Xamin. Tables holding the results from asynchoronous requests are stored there along with tables that describe the status of requests. TAP requires support for Astronomical Data Query Language (ADQL) queries. ADQL defines extensions to SQL that make it easier to specify spatial constraints. A set of functions and types that enable the support for these constraints is defined. This ADQL specific functionality is defined in terms of another spatial library pgSphere which must first be added to Postgres. A pgSphere implementation is found in the ops/vo/pgsphere directory and includes a Makefile and instructions for installation. [Note: As of Postgres Version 9, the installation script for pgSphere fails to compile since the LANGUAGE attribute in the CREATE FUNCTION calls is case sensitive. References to LANGUAGE 'c' should be replaced with LANGUAGE 'C' and similarly with 'sql'.] After pgSphere has been installed the ops/vo/tap.csh file can be used to load the SQL statements that define the TAP-specific schemata and the ADQL functions.

The heasarc.settings file.

The heasarc.settings file is used to set up the defaults for the Xamin system. The file is normally found in the WEB-INF subdirectory of the Xamin web application. Below we provide an annotated version. Other installations will need to adjust the values to meet their requirements. End users should generally not need to worry about it. It is not used on the client side of the Xamin batch interface.


An example heasarc.settings file.

# The default database
# This will be used as the suffix in the jdbc_xxxx settings
database=postgres

# The JDBC parameters for Postgres

# This is probably the same for everyone.
jdbc_class_postgres=org.postgresql.Driver

# The host and database name will need to be updated.
# The port is the Postgres default port.
jdbc_url_postgres=jdbc:postgresql://db.host:5432/postgres
jdbc_user_postgres=dbuser
jdbc_pwd_postgres=password

# Nominal hosts versus actual host names.  We want to
# be sure to translate any hostname we get into a valid
# host.  So if we get heasarc we want the full path
# heasarc.gsfc.nasa.gov.  If running on a load balancer
# then we want to translate the actual server name
# that we are running on, to the name that we want to
# publish.
heaweb1=heasarc.gsfc.nasa.gov
heaweb2=heasarc.gsfc.nasa.gov
heaweb3=heasarc.gsfc.nasa.gov
heaweb4=heasarc.gsfc.nasa.gov
# What to translate the host "localhost" to.
localhost=heasarc.gsfc.nasa.gov

# These are settings that the user is not allowed to change.
fixedSettings=database,jdbc_class_postgres,jdbc_url_postgres,jdbc_user_postgres

# The User account for non-privileged access
standard_user=dbuser


# Is there forwarding to a TOMCAT server.  This happens only
# for the nominal released and operational versions, not development versions.
# When we are constructing a local URL from the elements available in the
# server environment we may find that the servlet is running on port 6080,
# but we want to publish the original port 80 for the Apache server.
ForwardedPort=80,6080

# The current version of Xamin.
# The first digit is incremented with major releases.
# The second digit is incremented when there is a significant change to user functionality.
# Letters are appended to show bug/documentation fixes that do not appreciably
# change end user functionality.
Version=1.3g

# The following two fields are place holders that we fill in during the installation
# procedure with appropriate versions for the HDB and Xamin projects that are used
# in building the Xamin web application.
HDBVersion=0
XaminVersion=0

# This is replaced by the current date in the installation procedure.
ReleaseDate=2011-01-01


# This is a relative or absolute URL to service that should be invoked
# to request a user login.  Note that Xamin itself does not manage user logins.
# While a user is logged in, all Xamin requests use the login service
# to validate that the indicated user is logged in to the current session.
LoginServlet=/srvlogin

# This is a pointer to the location of the HEASARC hera service.
HeraServer=herasrv.gsfc.nasa.gov
HeraPort=60000
DftHeraName=AnonHera

# These are the E-mail addresses for administrators who will receive notifications
# when Xamin users submit feedback.
feedbackRecipients=user1@xxx.gov,user2@xxx.gov

# These provide a skeleton for the structure of download requests.
wgetcommand="wget -q -nH -r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks "
curlcommand="curl -s --remote-name "

# User download requests using the TAR download feature are constrained
# to the tree below.
DownloadPath=/FTP
        
Smaller versions of the settings files ops.settings and dev.settings are used by the operations scripts. These include the Postgres settings that should be used by the operations scripts (which will typically be different from that used in Xamin user queries).

Preparing the Web server.

The current Xamin framework has been tested within only the TOMCAT environment though there are no known reasons why other servlet containers could not be used. TOMCAT is available as a download from the Apache Web site for a variety of machines.

Customizing TOMCAT

Relatively little customization of the TOMCAT environment has been done, but other installations may wish to tweak things. To enable Web logs similar to those produced by the Apache Web server something like the following XML is required to be added to the Engine table of the server.xml file.
<Valve className="org.apache.catalina.valves.AccessLogValve"
      directory="logs" prefix="localhost_access_log." suffix=".log"
      pattern="common" resolveHosts="false"/>
        
The following entries in the VirtualHost information in the Apache server configuration files enables the TOMCAT server to run under the Apache server with any URL that begins with /xamin processed by TOMCAT and all other URLs handled directly by Apache.
   <VirtualHost>
        ....
        ProxyPass        /xamin http://localhost:nnnn/xamin
        ProxyPassReverse /xamin http://localhost:nnnn/xamin
        ....
   </VirtualHost>
        
The first line says that when the Apache server sees the request it should delegate to another web server running at port nnnn. The second line tells Apache that when it get back a response from the remote port it should scrub references in the headers to the actual server and replace them with references to the Apache server.

Setting up the Web applications

Xamin is intended to run in a servlet environment and at the HEASARC is developed using TOMCAT. To install Xamin in TOMCAT simply copy the xamin.war file to the TOMCAT webapps directory and restart TOMCAT. This should automatically install the system. Installers may need to update the heasarc.settings file in the webapps/xamin directory.

One disadvantage to the preceding approach is that the directory webapps/xamin directory that is created will be owned by whatever user is running the TOMCAT server. This may make it inconvenient to update the application. Installers can create the webapps/xamin directory manually and the install the application from the war file with
     mkdir webapps/xamin
     chdir webapps/xamin
     jar   xf some/path/to/xamin.war
        
where the user is assumed to start in directory in which TOMCAT has been installed. The web application may need to be restarted within TOMCAT.

The Xamin war file contains the full Xamin distribution including operational scripts. Scripts are generally found under the xamin/WEB-INF/classes/ops directory. A site may wish to move these operational scripts to other directories, however the WEB-INF area and its subdirectories are invisible to the outside world so this area can be used without revealing operational details to the public.

Setting up the CLI/Batch interface

Since direct access to the database is not available outside of GSFC the Xamin batch mode gets around this by providing a query proxy to the user. The user need download only a very small executable Jar file that contains two classes. The classes in this jar take the user's input arguments and convert them in to an HTTP POST request to the Xamin web server. If there are file arguments (i.e., UPLOAD) then these files are read and appended to the POST. If the user has specified an OUTPUT argument, then the output is written to the designated file.

The server side of the batch interface is a thin layer which initiates a servlet that runs the user specified query and returns the results. The CLI servlet is very similar to the servlet used in processing Web requests except for the handling of user authentication. If a user wishes to access an Xamin account through the batch interface, both the user name password must be supplied in the command arguments. These are authenticated using the HEASARC login service.

To execute the users JAR file the user should enter:

       java -jar user.jar arg1 arg2 ...
        
For example
      java -jar user.jar table=rosmaster,ascamaster offset=a:b:10
        
will run a positional cross-correlation of the rosmaster and ascamaster tables with an maximum allowed offset of 10 arc minutes.

Code Overview

This section describes the relationship of elements of the code used within Xamin.

The Query Library

Query processing is performed using the HEASARC Database (HDB) library written in Java. Users invoke this library through a very thin servlet layer. The CLI interface calls this directly while the Web client does it internally in XMLHTTPRequests.

This sections describes the general flow of control in processing a query with a view to showing how the various classes work together.

Startup

The primary class for query processing is the Controller class. For CLI processing this is initiated using the main() method, while in the Web environment a Controller instance is instantiated and its exec method is called.

The first order of business for the Controller is to set up three global structures. The Connecter is used to get connections to the underlying database or databases. The Settings are used to manage the system and user settings. The MessageHandler is used to convey information about processing to the user. Each of these is available in a global context. To allow for robust access even in an environment where there may be many simultaneous queries, these three globals use a static ThreadLocal variables to ensure that each instance of a query provides an independent instance to the query processing.

The Connecter allows for reuse of a single connection to the underlying database or when multiple connections are desired to get as many connection as needed. During shutdown it ensures that all connections are closed.

The Settings are essentially is just a Map from keywords to string values. Settings are set by system set up files, user arguments and also by program operations.

The Message Handler provides a standard way to send messages back to the user. Initially it is set to simply echo whatever is sent on the standard error output. Once the query processing has progressed to a stage where the desired output format has been established, a message handler compatible with that format will be used.

Help and User account processing

After establishing the Settings the Controller begins to analyze the request the user has made. There are a few non-query requests that are handled first.

If the user has requested help documentation the Help class is used to statisfy the request. If a USER setting has been specified, the the UserAccount class is used to ensure that the password specified is correct (or that the user has consistently forgone password protection on the account).

Next there is a check to see if this is an account processing directive. A user may create an account or modify it using the functions in the AccountManager or view or delete tables using the TableManager.

Setting up the query

After verifying that the request is not one of the non-query requests, the Controller.runQuery method starts setting up the query environment. The first step is determining the output. A StandardDisplay object is created which will handle the communications with the user. A FormatFactory which creates the WriterElements appropriate for the selected format is created and the WriterElement are plugged into the StandardDisplay. WriterElements separate out tasks for writing. Notably there are WriterElements for writing table information and distinct elements (but associated through the FormatFactory) for writing out the data products associated with the element. The global MessageHandler is updated so that messages will be incorporated appropriately into the output, e.g., if the output is FITS, the messages may be rendered as FITS comments.

If the user has requested information about the setup environment, (showArguments or ShowSettings), then these are handled by sending the appropriate messages.

Finding the tables to be queried

The first step in building the query is determining which tables are used in the query. Tables can be either either local (e.g., already present in the database), or uploaded. Uploaded tables will be added to the database temporarily for the duration of the query and then deleted afterwards. If there is a user account associated with the query, then the uploaded table can be persisted.

The Controller looks for a TableFinder class to use. Normally this will be the StandardTableFinder, but in principle this can be overriden by the user. This is flexibility is built into many of the key operational classes.

The TableFinder looks to the TABLE keyword to find any existing classes that are to be used in the query. It creates a QueryBuilder class that will be used to store information about the query and add Tables to it that represent each of the user selected tables. A MetadataInfo object is created that has all of the required metadata for the system by querying the system metadata table. The metadata for tables not participating in the query is not included in the MetadataInfo object.

Next come the uploaded tables. These are specified using the UPLOAD. Format specific classes (TDat, CSV, VOTable or SourceList) implement the Table interface for the input data. A generic TableCopier class then copies this table into underlying database as a temporary table. Metadata for the uploaded tables is added into the MetadataInfo object.

Building the Query

Once the tables that are to used in the query are found, work begins on setting up all of the SQL clauses in the query using a QueryConstructor. User specified constraints are added to the Where clause. There may be explicit constraints using the CONSTRAINT keyword, or are converted to explicit constraint using the user specified time or positional constraints. Constraints use ParsedExpressions which identify the underlying columns used in each constraint. This allows us to deal with the aliases that tables may have in the query. Generally the first table is aliased to 'a' the second to 'b' and so forth. A TableAliaser manages the mapping between tables and aliases.

Positional constraints involve substantial work. User specified positions may be in a variety of coordinate systems and may involve source names. the SkyView SourceCoordinates class is used to handle this. DistanceCalculator classes use different ways to specify a maximum radius constraint for a query. This is used both for point sources and cross-correlations.

The output fields for the query may be defaulted or may be explicitly specified.

As each of the user inputs is processed the QueryBuilder object is updated appropriately. E.g., position, time, and explicit constraints are added in separate steps. Any requested sort order is saved. If the user has requested a grouping clause that can be added.

Running the Query

After all of the user inputs are processed, the QueryBuilder generates the full SQL for the query. This SQL is first tested using the for validity and to give an estimate of the time the query will take using Postgres' explain function. If there are no problems, then the QueryBuilder returns a Query object which is executed to return a Table.

If the user has requested that the results of a query be saved as a user table, then two SQL statements are actually executed. The first statement uses a select into user.xxx ... to run the query and save the result as a user table. The second query is just select * from user.xxx and displays the results of the table that has just been created.

Displaying the results

Up to this point there have been no results shown to the user other than possible messages. Now the Display object's write method takes the Table returned by executing the query and calls methods for each of the WriterElements that have been plugged into it. Each writer element has methods to be invoked at the beginning of processing, for each row and after processing. E.g., if we are asking for data products, then there will be at least two WriterElements called for each row. The first will write whatever is desired regarding the actual table results, the second will look to see if there are any products associated with the row and if so will display them in some format appropriate fashion.

The Web Client

The Xamin Web client is primarily built in JavaScript and uses the Sencha ExtJS JavaScript library in building the GUI. The interfaces envisages a user treating the Web page as a virtual console, opening and closing panes as they process requests.

When the web client starts up, it first queries the Xamin server to get the current list of available tables. It processes any user arguments and then displays the primary Query pane to the user. Users can select tables and and add constraints to queries. Depending upon what the user has done various capabilities are enabled. Eventually the user submits a query. The users inputs are processed, and a query, in the same format a user may use in the CLI interface, is sent using an XMLHTTPRequest to the Xamin server. The results are processed in the web client and displayed to the user in a new pane.

The following are the major elements of the JavaScript interface:

xaminStart
Initialize the interface and sets up the framework for the overall window.
xaminQuery
Creates and displays the primary query window. Processes initial arguments to Xamin and extracts the current tables from the Xamin server.
xaminActions
Displays and updates buttons that allow users to initiate queries or clear entered information.
xaminError
Handle errors and messages to be sent to the user.
xaminFeedback
Allow users to send feedback directly to the Xamin system.
xaminGridResults
Display results and data products.
xaminInfo
Display table documentation and other information to the user.
xaminMenu
Render and update menu of options for user.
xaminQI
Process a query request and initiate a query.
xaminSamp
Broadcast results using the VO SAMP protocol.
xaminSession
Initiate and manage user sessions.
xaminShpCart
Manage selected data products and initiate product retrievals.
xaminSpecCon
Support parameter requests and constraints in queries.
xaminStdCon
Support position and time constraints in queries.
xaminTree
Support multiple methods for users to select tables in Xamin.
plotfuncs
Provide support for generating plots in Xamin.
threeaxis
Display and manage plot window.

Xamin Servlets and JSPs

Both the CLI and Web libraries use servlets to initiate requests. These are primarily written in Java with some small JSP files. A key function of the servlets is to transform the input arguments (which may include file uploads) into the key/value pairs understood by the clients. The servlet layer also addresses user security issues. A number of utilities are provided as servlets for use by the web client. The VO TAP interface requires substantial coding in the servlet layer for its implementation. Key elements of the servlet layer include:
Key Servlet Packages and Classes
heasarc.xaminThe package containing the primary servlets for both CLI and web requests
QueryServletThe main class for processing web client query requests. It parses the inputs and returns the results as needed
CLIServlet The main class for processing cli queries
TableParams Get the parameters for a given table (or tables)
BatchEmul Emulate Browse batch queries
heasarc.xamin.nodeClasses for building nodes in the table selection trees including both HEASARC and Vizier tables
heasarc.voSupport Cone and TAP queries using VO protocols
TapAsyncSupport TAP Asynchronous queries
Vosi*Support TAP VOSI capabilities
ADQLServletSynchronous ADQL query
heasarc.stiltsInvoke STILTS library for plotting
heasarc.sources Invoke coordinate converter to see if token is source name
heasarc.downloads Support download requests
Key JSP files
xamin.jsp The primary file for the Web interface
getDoc.jsp Get the documentation for a table
delete/addUser.jsp Delete or add a user as requested by the login service

The Metadata Table

A single metadata table, metainfo, is used to store all metadata for the Xamin system. When a user account is created and extension metainfo table is created in the schema for the user and the join of this user table -- which contains metadata specific to user tables -- and the system table used. The metainfo table uses a generic structure for representing metadata rather than providing a specific structured tables. This allows for great flexibility in the growth of the metadata at the cost of some complexity in the queries and size of the metadata.

To find information in a table we generally need to know the table, row and column we are interested in. Each row of the metadata table is essentially a table, row, and column identifier followed by the information that belongs there.

There are some additional sources of information that are used in Xamin.

Bibliographic information is stored in a the zzbib table but zzbib is not a special metadata table, it is queryable just as any other table. The only special feature of this table is that it is used in the When clauses in a number of conditional products -- but using the same syntax as other conditional links.

Structure of the Metainfo table.

The metainfo table comprises five columns:
type
The type column is used to specify the kind of data being described. It is analogous to the table name in a conventional representation. The query select distinct type from metainfo is the equivalent of determining what tables are present. Commonly used types include mission, table, product, linkfield, parameter.
name
The name column is used to specify which element of a given type is being looked at. It corresponds to a row identifier in a table view. E.g., if we are providing metadata on tables, then the type field might be 'table' and the name field would be the name of the table. Some care is needed to ensure uniqueness of the name within a type. E.g., it would not enough to simply use the name of the parameter, the name should include the table name and parameter name to ensure uniqueness within the entire system.
relation
The relation can be thought of as the column identifier. It describes what the relationship the current piece of information has to the element being described. A table might have a 'description' or an 'author' or any number of characteristics. A parameter may have a 'unit' or an 'ucd'. Some relationships involve other kinds of data. E.g., a table might 'include' a parameter or 'link' to a product.
value
The actual content is presented in this row. E.g., for a table description it will be some text describing the table. For a product template it might be a string that can be used to find files associated with the given product. The value is a text string which in some cases may encode a numeric value.
vorder
Unlike a conventional table, the metainfo structure allows multiple entries with the same type, name and relation. This is often very desirable, e.g., for a table to link to many parameters. In cases where the order of these relationships is important the vorder keyword can be used to specify this ordering. For scalar values or where the order is not significant it can be left null. This integer field is the only non-text field in the metadata.

Current types and relations in the metadata

Type (table)Relation (column)Description
 
document The document type stores information about document used somewhere in the system. Currently the only information for a document is its URL, but author, update time, format, ... may be added in the future
document located The URL of a document
 
linkfield A linkfield describes the mapping between the values available in a given table, and the variables used in a product template.
linkfield field The variable name used in a product template description
linkfield value The SQL used to to extract the template from a row of the table
 
mission The mission type is currently mostly a placeholder, but may include information about instruments, PI's, wavelength coverage and more in the future
mission end_time The start of mission science operations
mission start_time The end of mission science operations
 
parameter The parameter type gives the characteristics for a particular parameter of a given table.
parameter description A short text description of the parameter
parameter format The format in which the parameter should be displayed
parameter maximum The maximum value for the parameter in the table
parameter minimum The minimum value for the parameter in the table
parameter note A comment on the parameter
parameter order The order in which the parameter is to be displayed. If positive then the parameter is a default/standard parameter. If negative, then when all parameters are displayed the order should be the absolute value of this parameter, except that values of 0 come last.
parameter otype The type of the parameter in the Browse system
parameter ucd The uniform content descriptor for the parameter
parameter unit The unit[s] of the parameter
 
product The product type is used to describe a data product which may be an archival product, bibliographic reference, table link or generic external product.
 
product description A short text description of the product
product includes A link to a child product
product links User to link the generic products
product match A template string used to find files or URLs for a product given values in a row
product type The general type of the product (e.g., FITS, ASCII, HTML, ...)
 
table This type describes the characteristics of a particular table in the database.
table author The author of the table
table bibcode The bibcode for a paper describing the table
table containedBy A mission which includes this table (may be repeated)
table contains A link to a parameter entry
table defaultSearchRadius The default radius for position queries of this table
table describedBy A document describing the table
table description A short text description of the table
table end_time The column name for the column containing the end time for a table
table includesTypes Object types described by this table
table j2000coordinates A pair of columns giving the J2000 coordinates for the row
table lastUpdated The date when this table was last updated
table priority A numeric priority associated with the table
table regime The regime of the table (may be repeated)
table start_time A column name for the start time associated with each row
table tableType The kind of table (observation, object, ...)
 
tablelink A tablelink maps a table to product. It indicates the linkfield mappings that are to be used in finding the products for a given row.
tablelink linkfield A linkfield used in this link from a table to a product
tablelink product The name of the product being linked to
tablelink table The name of the table being linked from
tablelink when A conditional expression indicating if this link should be made for the current row
 
user User entries are only found in the user metainfo tables, not in the primary metainfo table. Exactly one user name should be found in each user metainfo table.
user email The E-mail address of the user.
user password An encrypted version of the user password (if supplied).
user status The status of the account: active/inactive/validating'

The Master Tables

A set of master tables is used to store time and position information so that users can quickly find all tables that match specified positional or temporal constraints. These tables are found in the master_table schema. A total of 11 tables and 5 views are used.
NameTable/ViewDescription
pos_big_primeTableWeekly positions where the position error is > 1 degree.
pos_small_primeTableWeekly positions where the position error is < 1 degree.
pos_big_bckTableIncremental positions where the position error is > 1 degree.
pos_small_bckTableIncremental positions where the position error is < 1 degree.
pos_bigViewCurrent positions where the position error is > 1 degree.
pos_smallViewCurrent positions where the position error is < 1 degree.
pos_time_big_primeTableWeekly positions/times where the position error is > 1 degree.
pos_time_small_primeTableWeekly positions/times where the position error is < 1 degree.
pos_time_big_bckTableIncremental positions/times where the position error is > 1 degree.
pos_time_small_bckTableIncremental positions/times where the position error is < 1 degree.
pos_time_bigViewCurrent positions/times where the position error is > 1 degree.
pos_time_smallViewCurrent positions/times where the position error is < 1 degree.
time_primeTableWeekly times
time_bckTableIncremental times
timeViewCurrent times
update_listTableTables that have been updated since weekly cleanup

Position and Time Tables

There are two kinds of information stored in the master tables: positional and temporal. The pos tables are used to support position-only discovery queries. The time tables are used to support temporal-only queries. The pos_time tables are used to support queries where the user specifies both time and position constraints.

Position tables

Positional information is split into two tables, big and small. This split is needed to enable efficient queries when the user wishes to use the table-specific (or occasionally row specific) default search radius in the query. If we query the table using the default search radius constraint we get a constraint that looks like dec between XX-dsr and XX+dsr where XX is the declination of the requested center. In our database, this constraint cannot take advantage of the clustered index on declination. However if we know that the default search radius is always less than say 1 degree we can put in the constraint dec between XX-dsr and XX+dsr and dec between XX-1 and XX+1. The additional constraint using a constant range allows us to make effective use of our index. However some HEASARC tables (e.g., INTSCW) have very large default search radii, greater than 20 degrees. To make effective use of the dec index, then cannot search them with this technique. Fortunately there are relatively few such tables.

So to make positional queries with default search radii efficient we separate the tables with relatively small default search radii and index them in the pos_small master tables. Those with large search radii are placed in pos_big. When we want to find all of the tables we run two queries (as a union). The query of the pos_small tables includes a constraint using the maximum default search radius that can take advantage of the dec index. The query of the pos_big tables does not include this additional constraint, but this table is much smaller since only a very few tables have such large error bars.

Time Tables

Time information is kept in separate tables for similar reasons. A typical time query asks for information that spans a given instant or interval. It would be very efficient if we could do add simply add a constraint that the row be within some constant time of the users requested time. E.g., we might use time between TT - 10 and TT+10 if we knew that no row spanned an interval more than 10 days. However the HEASARC database contains tables where the time interval for a given row is years (typically when the row describes a series of observations, e.g., in XTEINDEX). So correct time searches can't use an index on time as efficiently as we might wish. By extracting the time information into a separate table we are able to search a much smaller table and the query runs in an acceptable time even though it makes limited use of the indexes.

Position/Time tables

The pos_time tables accommodate queries that specify both temporal and positional constraints. These are the smallest tables in number of rows. These are kept distinct from the purely temporal tables to maximize the efficiency of the time-only queries. HEASARC tables require five columns for positions, so the time table rows are much smaller.

Primary and backup tables

All of the underlying tables are paired as prime and bck Periodically (currently weekly) the master tables are regenerated from the tables in the HEASARC database. All current information from all HEASARC tables is then available in the prime tables. The bck and update_list tables are emptied. During the week, some tables may be updated. Whenever a table is updated, any rows associated with that table in any of the bck tables is deleted and then appropriate position/time rows are placed in the appropriate bck tables. The deletion is done first to accommodate the common case that a table is updated multiple times during the week. The name of the table being updated is added to the update_list.

Current views

Five views are provided that combine the prime and bck tables such that the user sees only the most recent information for each table. E.g., the views look something like
          select AA from xx_prime
             where table_name not in (select table_name from update_list) and BB
            union
          select AA from xx_bck where BB
        
By referring to the views, the users of the master tables are isolated from the update process.

Indexing

To support efficient queries the master tables are clustered (have a primary index) on declination where available or time. Clustering substantially increases the responsiveness of the query. Indexes are also provided on the table_name field.

Operations

The following table describes some of the routine operations required for Xamin at the HEASRC. Many of these tasks are automatically initiated as cron jobs. Cron jobs use the /www/server/xamin/ area for links to non-Xamin executables (e.g., PostgreSQL commands), temporary storage, backups and logging. The outputs for many of the cron jobs are logged in /www/server/xamin/ops/logs. The environment for cron jobs is set in the xamin/WEB-INF/classes/ops/daily/cronenv.sh which is sourceed before the cron jobs are executed.

The scripts are given with respect to xamin/WEB-INF/classes/ops. Logs are generally kept in /www/server/xamin/ops/logs.

Process Frequency Initiation Script Description Log
Update tables Every 12 hours Automatic ingest/runfullcopy Update HEASARC tables in the Postgres database. Also update HEASARC and Vizier documentation. xamin.ingest.DATE
Update documentation table Every 12 hours Automatic Included in runfullcopy Update the table documents table. See above
Update master tables As needed Auto Included in runfullcopy When tables are updated add position/time information to master _bck files. See above
Update bibliography tables As needed Automatic Included in runfullcopy Add entries to bibliography tables. As far as Xamin is concerned this is just a table update, but the table is maintained by the HEASARC. This happens automatically whenever runfullcopy sees the zzbib.tdat file updated. See above
Clean master tables Weekly Automatic master/buildFullMasters.pl Put all master table information into _prime tables. fullMasters.DATE
TAP async user tables/jobs cleanup Daily Automatic daily/cleanasync.pl Clean up all information on jobs in the TAP_ASYNC schema that are past their destruction time. cleanup.log.DATE
User usage review Weekly Automatic ops/daily/sizeReport Review disk usage of all users. This will also such TAP asynchronous and master table usage. The output of this cron is not logged so it is sent as a mail message to the cron initiator. Cron user e-mail
Run table monitor Daily Automatic daily/monitor.pl See if the queries that have created user tables would now give different results and update/notify tables/users as requested. monitor.log.DATE
HTTP logs Daily? Auto TBD Copy TOMCAT http logs -- may want to join with Apache logs TBD
User backup Daily Automatic daily/userbck.pl Backup Xamin user account information. The HEASARC's TDAT files serve as a backup for the science tables so a backup of those is not required. userbck.log.DATE
Testing Hourly Automatic tests/batchTest.pl Run tests that show that system is operational. At least one test (comprising multiple queries) is run each hour. Every table is queried at least once each day. autochecks.log
Update tables test Daily Automatic tests/makeTables.pl Since the tables in the Xamin system can change we need to keep the tables test up to date. This updates tests/tables.tests. Not logged
Statistics Quarterly Manual TBD Compute statistics including at least:
  • Number of query requests
  • Frequency given tables were requested
  • Total number of rows downloaded.
  • Table statistics aggregated by mission
  • Download requests per mission
  • Download volumes per mission
NA
Compare Xamin and Browse Weekly Automatic ingest/compareXaminBrowse Ensure that the tables seen in Xamin and Browse are consistent. Cron user e-mail
Delete tables from Xamin As needed Manual ingest/deleteTable Delete a table that is no longer used in Xamin. delete.TABLE
Add user schema As requested by srvlogin Automatic xamin/jsp/createUser.jsp Add a schema for a new user account. This script is invoked by the HEASARC login service and is no externally invokable Not logged
Delete user schema As requested by srvlogin Automatic xamin/jsp/deleteUser.jsp Deleta a schema for user account. This script is invoked by the HEASARC login service and is not externally invokable. Not logged
Add/update data products As needed Manual products/updateProducts When new products are defined, the mission products definition file in products/mission/MISSION.xml must be created or updated. This script reads this file and updates the metadata table to define the products. Not logged

Schedule

This table summarizes the current schedule for Xamin cron events. Times are Eastern time.
Time Event
00:50 Test: basic tests
01:50 Test: infrastructure tests
02:30 Clean asynchronous
02:50 Test: basic tests
03:00 User backups
03:10 Daily monitor
03:30 Tu Build full masters
03:50 Test: infrastructure tests
04:50 Test: basic tests
05:00 Table ingest
05:50 Test: infrastructure tests
06:50 Test: basic tests
07:50 Test: infrastructure tests
08:50 Test: basic tests
09:50 Test: infrastructure tests
10:50 Test: basic tests
11:50 Test: infrastructure tests
12:50 Test: basic tests
13:50 Test: infrastructure tests
14:50 Test: basic tests
15:50 Test: infrastructure tests
16:50 Test: basic tests
17:00 Table ingest
17:50 Test: infrastructure tests
18:50 Test: basic tests
19:50 Test: infrastructure tests
20:50 Test: basic tests
21:00 Update tables test
21:00 Th Generate user size report
21:50 Test: infrastructure tests
22:00 Compare Xamin and Browse
22:10 Test: table tests
22:50 Test: basic tests
23:50 Test: infrastructure tests

Acknowledgements and licenses

Included software

STILTS

The STILTS library developed by Mark Taylor is the basis of all plotting functionality provided by Xamin. A number of libraries are called by STILTS and are described in the STILTS documentation.

ExtJS

The ExtJS JavaScript library is used extensively in the Web interface. While the HEASARC has a commercial license for this product, it is also available under GPL.

pgSphere

The pgSphere provides support for spherical geometry within Postgres. This library is available at its home site. No license is specified in the distribution. .

nom.tam.fits

The nom.tam.fits library is used for writing FITS output.

com.ice.tar

This library provides several classes to support writing TAR files from Java. It includes the statement:
This work was authored by Timothy Gerard Endres, time@gjt.org.
This work has been placed into the public domain.
You are free to use this work in any way you wish.