The HEASARC Database System
HEASARC Catalog Organization and Metadata
The HEASARC uses a standard relational database for its catalog data and metadata. A relational database is essentially just a set of tables. Each table consists of a number of rows and columns. The columns of the table may be of different types -- strings, integers, floating point numbers -- but each row has the same structure. Some cells in the table may not have a value defined; these hold the special marker value 'null'.
In addition to its contents, each table has associated metadata that describes it, e.g., a name, information about who is authorized to access it, and indexes that allow more efficient searches on the table.
The HEASARC database recognizes three kinds of tables. Metadata tables, whose names always begin with 'ZZ', describe tables or archive data. They give information about a table: the names of its columns, any special meanings associated with the table, and the archive data products associated with it. The underlying database system also has metadata tables, but to ensure that the HEASARC software is portable from vendor to vendor, only the HEASARC metadata tables are referred to in our software. All of the metadata tables combined form the HEASARC metabase.
Local user catalogs are the tables with information that users may wish to extract. Most of these can be categorized as object tables, which describe specific objects in the sky, or observation tables, which describe observations by a given satellite or instrument. There are some tables of atomic data and tables of proposal abstracts as well. All of these are stored within the same relational database system as the metadata tables. For historical reasons, the names of most local catalogs begin with 'heasarc_'.
Remote user catalogs are used in the same way as local catalogs, but reside in other database systems. These may include data in the VizieR system at the CDS, or databases accessed through Virtual Observatory protocols. Metadata tables may have information about remote catalogs, but it is usually much less complete than for local catalogs. There may also be remote tables that the HEASARC software discovers dynamically when making queries, so that there is no evidence of their existence in the HEASARC metadata tables. Missing metadata for remote catalogs is gathered dynamically during the query process.
Metabase Schema
This section discusses the metadata that is used by the HEASARC software. The metadata for remote catalogs is not necessarily stored in these tables, but when gathered dynamically has much the same structure.
Currently, the HEASARC DBMS utilizes the following metadata tables:
- ZZGEN
- describes the overall characteristics of all of the tables that can be directly accessed through our software. This includes all metadata and local catalog tables and some remote catalogs. Some remote catalogs, which can be discovered dynamically during a query session, are not included in ZZGEN. ZZGEN contains non-discipline-specific information about each table. As such, it typically duplicates information held in database system-specific tables, but provides it to the HEASARC software in a system-independent way, since different relational database systems store this information in very different ways. Each table has one entry in ZZGEN.
- ZZEXT
- describes domain-specific extensions to ZZGEN. This is where metadata describing elements of specific interest to astronomers would normally be placed, e.g., which columns are RA and DEC, how large the default search radius for a cone search is, and which column contains the start time of the observation. A single table may have many entries in ZZEXT; each gives a single special characteristic for that table. The overall characteristics of the table are the concatenation of its single ZZGEN entry and all of its ZZEXT entries.
- ZZPAR
- describes the parameters of the table. This information is usually also available in a system-specific table, but gathered here to provide a standard way to access it. There will normally be one entry in ZZPAR for each column of each table.
- ZZLINK
- describes links between tables. It shows how, given an entry in one table, one or more entries in another table may be linked to it. There may be zero or many ZZLINK entries for a given table.
- ZZWORDS
- lists the keywords pertinent to each publicly visible table. The keywords are listed roughly in order of relevance, are separated by spaces, and are strictly all lowercase.
- ZZDPSETS
- describes the data products associated with a given table. Each entry describes a specific data product set for a given table. Tables have entries in ZZDPSETS if and only if they have data products.
- ZZDP
- describes the data products available in the HEASARC archive. Each data product is described as a URL so that it need not be physically present at the HEASARC. A data product tag is associated with each URL. In principle many tags can be associated with a given URL, but this is currently discouraged. Note that a data product set described by ZZDPSETS will often comprise multiple data products.
Note: The ZZDPTYPES and ZZREL tables were defined and used in earlier incarnations of the HEASARC Database System, but they are no longer used.
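As a rough illustration of how the single ZZGEN entry and the ZZEXT entries for a table combine into its overall characteristics, here is a minimal sketch using an in-memory SQLite database. The ZZEXT column names (table_name, parameter_name, parameter_value) follow the shortcut example later in this document; the ZZGEN `description` column, the parameter names `ra_column`, `dec_column` and `default_radius`, and all sample values are assumptions made purely for illustration and are not the actual HEASARC schema.

```python
import sqlite3

# Illustrative stand-ins for two metabase tables (not the real schema).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE zzgen (table_name TEXT PRIMARY KEY, description TEXT);
CREATE TABLE zzext (table_name TEXT, parameter_name TEXT, parameter_value TEXT);
INSERT INTO zzgen VALUES ('heasarc_rosmaster', 'hypothetical description');
INSERT INTO zzext VALUES ('heasarc_rosmaster', 'ra_column', 'ra');
INSERT INTO zzext VALUES ('heasarc_rosmaster', 'dec_column', 'dec');
INSERT INTO zzext VALUES ('heasarc_rosmaster', 'default_radius', '60');
""")


def table_characteristics(conn, table_name):
    """Combine the one ZZGEN row with every ZZEXT row for a table."""
    gen = conn.execute(
        "SELECT * FROM zzgen WHERE table_name = ?", (table_name,)
    ).fetchone()
    if gen is None:
        raise KeyError(table_name)
    info = dict(gen)  # generic, system-independent attributes
    ext_rows = conn.execute(
        "SELECT parameter_name, parameter_value FROM zzext WHERE table_name = ?",
        (table_name,),
    )
    # Each ZZEXT row contributes one special characteristic.
    info["ext"] = {r["parameter_name"]: r["parameter_value"] for r in ext_rows}
    return info


info = table_characteristics(conn, "heasarc_rosmaster")
print(info["ext"]["ra_column"])
```

Keeping the ZZEXT values in a separate dictionary avoids name clashes between generic attributes and discipline-specific ones; a real implementation may well merge them differently.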
Metabase Details
This section contains the detailed specification of the names, formats and use of columns in each of the metadata tables.
ZZGEN: contains the generic information to describe tables available for access.
ZZPAR: contains a list of the parameters available for each table.
ZZEXT: contains the discipline-specific information.
ZZLINK: contains links between tables.
ZZWORDS: contains pertinent keywords describing tables.
ZZDPSETS: contains the following information: (1) the existence of an entry for a given table_name indicates that that table has data products that are available for access, (2) definitions of generic sets (or categories) of data products, and (3) how to construct the data products tags for a given set.
ZZDP: contains the generic information to describe data products that are available for access.
Special Values in ZZEXT
Entries in ZZEXT are often called "virtual parameters," since they are used to extend the table information stored in ZZGEN with table-specific metadata. The existence and values of these virtual parameters in ZZEXT control how tables are treated by the HEASARC Browse system.
ZZEXT and ZZDP URL Shortcuts
Entries in the ZZEXT table with table_name='zzdp' are used as shortcuts in building the URLs stored in the dp_url field.
Although the dp_url field in ZZDP is currently limited to 80 characters, URLs can be much longer, with no fixed upper bound. In order to support arbitrarily long URLs, to conserve storage space, and to increase the speed of data products queries, it was decided to use variables which are looked up in ZZEXT and repeatedly expanded to construct the full URL. Such shortcuts use a syntax like "${shortcut}". For a given "${shortcut}", the ZZEXT table is queried for the parameter_value matching the table_name "zzdp" and the parameter_name "shortcut". This parameter value then replaces "${shortcut}" in the dp_url entry.
For example, suppose ZZEXT has the following entry:
(table_name, parameter_name, parameter_value) = 'zzdp', 'missionbase', 'ftp://heasarc.gsfc.nasa.gov/mission/data'
In a row in ZZDP with
dp_url='${missionbase}/obs/k95432.fits.gz'
${missionbase} is recognized as a shortcut and replaced by the value in the ZZEXT table, so the full URL is:
ftp://heasarc.gsfc.nasa.gov/mission/data/obs/k95432.fits.gz
Since shortcuts may be defined in terms of other shortcuts, such expansions are done until there are no more shortcuts left to expand in the URL. Circular shortcut references are detected and disallowed.
For example, "${heasarc}" might be "http://heasarc.gsfc.nasa.gov", "${rosat}" might be "${heasarc}/FTP/rosat/data", and "${rosat_pspc}" might be "${rosat}/pspc/processed_data".
Shortcuts have the added advantage that, if the relative location of some number of data products has to be changed, it is much easier to change a single entry in ZZEXT than to change thousands of rows in ZZDP.
Note that these shortcuts only apply to the ZZEXT entries for the ZZDP table.
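The repeated expansion and the rejection of circular references described above can be sketched as follows. This is only an illustration, not the HEASARC implementation: the `shortcuts` dictionary stands in for the ZZEXT rows with table_name='zzdp', the function name `expand_url` is made up, and raising an error for an undefined shortcut is an assumption about behavior the text does not specify.

```python
import re

# Matches ${name} where name is a run of letters, digits or underscores.
SHORTCUT = re.compile(r"\$\{(\w+)\}")


def expand_url(dp_url, shortcuts):
    """Expand ${name} shortcuts until none remain; reject circular references."""

    def expand(text, active):
        def substitute(match):
            name = match.group(1)
            if name in active:
                # The shortcut is already being expanded further up the chain.
                raise ValueError(f"circular shortcut reference: {name}")
            if name not in shortcuts:
                raise KeyError(f"undefined shortcut: {name}")
            # Expand the replacement too, since it may contain shortcuts.
            return expand(shortcuts[name], active | {name})

        return SHORTCUT.sub(substitute, text)

    return expand(dp_url, frozenset())


# Stand-in for ZZEXT rows with table_name='zzdp' (values from the text above).
shortcuts = {
    "heasarc": "http://heasarc.gsfc.nasa.gov",
    "rosat": "${heasarc}/FTP/rosat/data",
    "rosat_pspc": "${rosat}/pspc/processed_data",
}

print(expand_url("${rosat_pspc}/example.fits", shortcuts))
```

Tracking the set of shortcuts currently being expanded, rather than counting iterations, lets the same shortcut appear several times in one URL while still catching a genuine cycle.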
The Data Products Layer
The HEASARC's Data Products Layer links a given row in a database table to the data products associated with that row. It utilizes the ZZDP and ZZDPSETS metabase tables.
Example of How the Browse Web Interface Uses the Data Products Layer
Suppose the ROSAT catalog, HEASARC_ROSMASTER, contained the following five rows:
seq_id | instrument | ra | dec | name |
---|---|---|---|---|
RF150003 | PSPC | 225.341508 | 66.405361 | H1504+65 |
RF150007 | PSPC | 216.643574 | 1.512456 | PG1426+015 |
RF150015 | PSPC | 219.479943 | 64.504469 | HD129333 |
RH100192 | HRI | 325.652292 | 38.089434 | XRT/HRI THERM CYG |
RH100193 | HRI | 85.021314 | -69.764432 | DRACO CLOUD |
When a user runs a search in Browse that returns the above rows in HEASARC_ROSMASTER, Browse displays them with checkboxes to the left of each of the sequence IDs. Suppose the user checks the second and fourth sequence IDs in the list and then chooses to preview or download the data products associated with those observations. For a given data products set (or "category" in Browse Web Interface terminology), the software looks up the tag_format in ZZDPSETS for that set and the table HEASARC_ROSMASTER. Suppose the tag_format field for some set says:
rosat.@{instrument}.@{seq_id}.*
The software then fills in the information from the HEASARC_ROSMASTER table into the tag format. So, for the second row, seq_id is RF150007 and instrument is PSPC. The resulting data products tag that is constructed from this information (and after converting to all lowercase) is:
rosat.pspc.rf150007.*
Similarly, for the fourth row, seq_id is RH100192 and instrument is HRI, so the constructed data products tag becomes:
rosat.hri.rh100192.*
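The substitution step can be sketched in a few lines. This is a minimal illustration assuming that the only special syntax in a tag_format is @{column}; the function name `build_tag` and the dictionary representation of a table row are hypothetical.

```python
import re

# Matches @{column} placeholders in a ZZDPSETS tag_format.
FIELD = re.compile(r"@\{(\w+)\}")


def build_tag(tag_format, row):
    """Fill @{column} placeholders from a table row, then lowercase the tag."""
    filled = FIELD.sub(lambda m: str(row[m.group(1)]), tag_format)
    return filled.lower()


# The fourth row of the HEASARC_ROSMASTER example above.
row = {"seq_id": "RH100192", "instrument": "HRI"}
print(build_tag("rosat.@{instrument}.@{seq_id}.*", row))
```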
Browse then queries the HEASARC Data Products Layer (specifically the ZZDP table) to get the data product information for that tag. The asterisk ("*") is interpreted as a wildcard, so the result is a list of tags. The ZZDP table returns the matching tag(s), types, formats, and URLs. For example, say the table ZZDP looks like the following:
dp_tag | dp_type | dp_format | dp_url |
---|---|---|---|
rosat.hri.rh100192.aspect.1 | ASPECT | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.aspect.2 | ASPECT | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.events | EVENTS | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.lc.1 | LIGHTCURVE | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.plot.1 | PLOT | FITS | http://heasarc.gsfc... |
rosat.hri.rh100192.plot.2 | PLOT | GIF | http://heasarc.gsfc... |
rosat.hri.rh100192.image.1 | IMAGE | JPEG | http://heasarc.gsfc... |
rosat.hri.rh100192.image.2 | IMAGE | GIF | http://heasarc.gsfc... |
Browse then formats the above URLs into HTML links to each data product. It also uses the above information to package multiple data products into a Unix tar file for the user to download together as a convenience.
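The wildcard lookup against ZZDP can be approximated as below. This sketch assumes that "*" is the only wildcard character and that it matches any run of characters; the sample rows and their example.org URLs are invented for illustration, and a production system would more likely push the match into the database query.

```python
import re


def match_tags(pattern, rows):
    """Return the ZZDP-like rows whose dp_tag matches a '*' wildcard pattern."""
    # Escape everything except '*', which becomes "any run of characters".
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    return [row for row in rows if regex.fullmatch(row["dp_tag"])]


# Invented sample rows; the URLs are placeholders, not real archive paths.
zzdp_rows = [
    {"dp_tag": "rosat.hri.rh100192.events", "dp_type": "EVENTS",
     "dp_format": "FITS", "dp_url": "http://example.org/rh100192_events.fits"},
    {"dp_tag": "rosat.hri.rh100192.image.1", "dp_type": "IMAGE",
     "dp_format": "GIF", "dp_url": "http://example.org/rh100192_image.gif"},
    {"dp_tag": "rosat.pspc.rf150007.events", "dp_type": "EVENTS",
     "dp_format": "FITS", "dp_url": "http://example.org/rf150007_events.fits"},
]

for row in match_tags("rosat.hri.rh100192.*", zzdp_rows):
    print(row["dp_tag"], row["dp_url"])
```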
User Tables
Browse dynamically queries the metadata tables to determine available tables and the columns they contain. The following guidelines are commonly adopted in building tables. Tables are normally added to the HEASARC by creating a TDAT file, an ASCII representation of the table, and then using the HDBingest command to bulk copy the table into the database.
The primary right ascension and declination columns (as indicated by the ZZEXT fields) should use J2000 coordinates. While other coordinate systems may be used within Browse for cone-searches, positional cross-correlations are not feasible when the base coordinate systems are different. If a table is supplied with coordinates in a different system, then new columns should be added to the table in J2000 coordinates and these new columns should be made the primary positional fields. Conventionally, the primary position fields have the names 'ra' and 'dec'.
Times and dates should be stored as Modified Julian Day (MJD) numbers. Dates may be specified using integer values, while real numbers should be used for finer-grained times. Double precision values allow a time resolution of a few microseconds, which is usually enough for catalog data.
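To make the precision statement concrete, the sketch below converts a UTC timestamp to MJD and estimates the spacing between adjacent double precision values at that MJD. It is an illustration only: the function names are made up, leap seconds are ignored, and `math.ulp` requires Python 3.9 or later.

```python
import math
from datetime import datetime, timezone

# MJD 0 corresponds to 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)
SECONDS_PER_DAY = 86400.0


def to_mjd(moment):
    """Convert a timezone-aware datetime to a fractional MJD (no leap seconds)."""
    delta = moment - MJD_EPOCH
    day_fraction = (delta.seconds + delta.microseconds / 1e6) / SECONDS_PER_DAY
    return delta.days + day_fraction


def mjd_resolution_seconds(mjd):
    """Spacing between adjacent doubles near this MJD, expressed in seconds."""
    return math.ulp(mjd) * SECONDS_PER_DAY


mjd = to_mjd(datetime(2000, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
print(mjd, mjd_resolution_seconds(mjd))
```

For present-day MJD values the spacing comes out on the order of a microseconds, in line with the guideline above.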
A class field should normally be used only for information on object classes using the HEASARC standard set of source classes.
The table priority should be used to highlight the key tables for a given mission, which may have priorities of 2 or 3. Tables made redundant by newer versions should be given priorities of 8 or 9. Typical object tables are usually given a priority of 5.
The default search radius should normally reflect either the size of the observation or the positional uncertainty of the source position.