The HEASARC Database System

HEASARC Catalog Organization and Metadata

The HEASARC uses a standard relational database for its catalog data and metadata. A relational database is essentially just a set of tables. Each table consists of a number of rows and columns. The columns of the table may be of different types -- strings, integers, floating point numbers -- but each row has the same structure. Some cells in the table may not have a value defined; these have the special marker value 'null'.

In addition to its contents, each table has associated metadata that describes it, e.g., a name, information about who is authorized to access it, and indexes that allow more efficient searches on the table.

The HEASARC database recognizes three kinds of tables. Metadata tables, whose names always begin with 'ZZ', describe tables or archive data. They give information about the table: the names of the columns, any special meanings that are associated with the table, and the archive data products associated with the table. The underlying database system also has metadata tables, but to ensure that the HEASARC software is portable from vendor to vendor, only the HEASARC metadata tables are referred to in our software. All of the metadata tables combined form the HEASARC metabase.

Local user catalogs are the tables with information that users may wish to extract. Most of these can be categorized as object tables, which describe specific objects in the sky, or observation tables, which describe observations by a given satellite or instrument. There are some tables of atomic data and tables of proposal abstracts as well. All of these are stored within the same relational database system as the metadata tables. For historical reasons, the names of most local catalogs begin with 'heasarc_'.

Remote user catalogs are used in the same way as local catalogs, but reside in other database systems. These may include data in the VizieR system at the CDS, or databases accessed through Virtual Observatory protocols. Metadata tables may have information about remote catalogs, but it is usually much less complete than for local catalogs. There may also be remote tables that the HEASARC software discovers dynamically when making queries, so that there is no evidence of their existence in the HEASARC metadata tables. Missing metadata for remote catalogs is gathered dynamically during the query process.

Metabase Schema

This section discusses the metadata that is used by the HEASARC software. The metadata for remote catalogs is not necessarily stored in these tables, but when gathered dynamically has much the same structure.

Currently, the HEASARC DBMS utilizes the following metadata tables:

ZZGEN
describes the overall characteristics of all of the tables that can be directly accessed through our software. This includes all metadata and local catalog tables and some remote catalogs. There are remote catalogs which can be discovered dynamically during a query session that are not included in ZZGEN. ZZGEN contains non-discipline-specific information about the table. As such it typically duplicates information that is included in database system-specific tables, but provides it to the HEASARC software systems in a system-independent way. Different relational database systems store the information in very different ways. Each table will have one entry in ZZGEN.
ZZEXT
describes domain-specific extensions to ZZGEN. This is where metadata describing elements of specific interest to astronomers would normally be placed, e.g., which columns are RA and DEC, how large the default search radius for a cone search is, and which column contains the start time of the observation. A single table may have many entries in ZZEXT, each of which gives a single special characteristic for that table. The overall characteristics of the table are the concatenation of its single ZZGEN entry and all of its ZZEXT entries.
ZZPAR
describes the parameters of the table. This information is usually also available in a system-specific table, but gathered here to provide a standard way to access it. There will normally be one entry in ZZPAR for each column of each table.
ZZLINK
describes links between tables. It shows how given an entry in one table, one or more entries in another table may be linked to it. There may be 0 or many ZZLINK entries for a given table.
ZZWORDS
lists the keywords pertinent to each publicly visible table. The keywords are listed roughly in order of relevance, are separated by spaces, and are strictly all lowercase.
ZZDPSETS
describes the data products associated with a given table. Each entry describes a specific data product set for a given table. Tables have entries in ZZDPSETS if and only if they have data products.
ZZDP
describes the data products available in the HEASARC archive. Each data product is described as a URL so that it need not be physically present at the HEASARC. A data product tag is associated with each URL. In principle many tags can be associated with a given URL, but this is currently discouraged. Note that a data product set described by ZZDPSETS will often comprise multiple data products.

Note: The ZZDPTYPES and ZZREL tables were defined and used in earlier incarnations of the HEASARC Database System, but they are no longer used.

Metabase Details

This section contains the detailed specification of the names, formats and use of columns in each of the metadata tables.

ZZGEN: contains the generic information to describe tables available for access.
table_name char20 The short name of the table.
table_location char80 An identifier of the database system where the table is stored.
table_description char80 A short description of the contents of the table
create_date char19 The date the table was created. Unlike other dates in the HEASARC database, the creation and modification dates in ZZGEN are given as an ASCII string in the form YYYY-MM-DD HH:MM:SS. Elsewhere dates should be given as modified Julian day numbers.
modify_date char19 The last date the table was modified.
table_rows int2 The number of rows in the table
 
ZZPAR: contains a list of the parameters available for each table.
table_name char20 This field contains the table being described.
parameter_name char24 This field contains the parameter being described.
parameter_description char80 This field contains a short description of the parameter.
parameter_comment char80 This field contains additional information pertaining to the parameter.
parameter_format char80 The basic type and display format for the parameter given as a string of the form 'format:display' where format gives the type and length in bytes of the data, e.g., int1, int2, int4, float4, float8, char22, and the display is a printf display code without the initial '%'. E.g., float8:10.3f would indicate a double-precision floating-point value that should be displayed in a field 10 characters long to a precision of 3 decimal places. The display portion is used to ensure that excess precision is not given for a variable. The colon and display precision may be omitted.
parameter_unit char80 The unit of the parameter. These should generally be given using the HEASARC standard unit strings, e.g., 'ct/cm^2/s'. Times are generally expected to be given in Modified Julian Days and should be given the unit 'mjd'.
parameter_ucd char120 The Unified Content Descriptor (UCD) of the parameter. These should follow the latest IVOA recommendations for UCDs.
parameter_is_index char1 A suggestion to the underlying database or ingest software that an index should be created on this field.
parameter_minval char80 The minimum value of the column within the table. Nulls are not considered. This is a string even if the underlying column is not.
parameter_maxval char80 The maximum value of the column within the table. Nulls are not considered. This is a string even if the underlying column is not.
parameter_default int4 If non-zero, gives the ordering for the display of the parameters. Parameters with this set to zero are displayed in an undefined order after the parameters for which there are non-zero values.
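
To illustrate the 'format:display' convention described for parameter_format, the following Python sketch splits such a string and applies the display code to a value. The function names parse_parameter_format and render are hypothetical and are not part of the HEASARC software; the sketch assumes the type is always a lowercase name followed by a byte count, as in the examples above.

```python
import re

def parse_parameter_format(spec):
    """Split a 'format:display' string into (base_type, n_bytes, display).

    display is None when the colon and display code are omitted.
    """
    fmt, _, display = spec.partition(":")
    match = re.fullmatch(r"([a-z]+)(\d+)", fmt.strip())
    if match is None:
        raise ValueError(f"unrecognized format: {spec!r}")
    base_type, n_bytes = match.group(1), int(match.group(2))
    return base_type, n_bytes, (display.strip() or None)

def render(value, spec):
    """Format a value with the printf display code, if one is given."""
    _, _, display = parse_parameter_format(spec)
    if display is None:
        return str(value)
    # The stored display code lacks the leading '%', so add it back.
    return ("%" + display) % value

print(parse_parameter_format("float8:10.3f"))   # ('float', 8, '10.3f')
print(repr(render(225.341508, "float8:10.3f"))) # '   225.342'
```

Keeping the display code separate from the storage type lets the same stored value be shown without excess precision, which is the purpose stated for the display portion.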
 
ZZEXT: contains the discipline-specific information.
table_name char20 The name of the table for which the extension information is being given.
parameter_name char24 The name of the 'virtual column' which is being added to ZZGEN for this table. See the section below for details about the significance of certain values.
parameter_value char80 The value to be given to the 'virtual column'. Note that this is a string even for numeric values. By convention, if the value begins with the character '@', it refers to a column in the table. E.g., if parameter_name='default_search_radius' and parameter_value='@error', this is interpreted as saying that the default search radius is whatever value is stored in the 'error' column of each row in the table.
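
The '@' convention for parameter_value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name resolve_virtual_parameter and the sample row are hypothetical, and a row is modeled as a plain dictionary keyed by column name.

```python
def resolve_virtual_parameter(parameter_value, row):
    """Return the effective value of a ZZEXT entry for one table row.

    A value starting with '@' names a column of the row; anything else
    is a literal and is returned as the string stored in ZZEXT.
    """
    if parameter_value.startswith("@"):
        return row[parameter_value[1:]]
    return parameter_value

# Hypothetical row with an 'error' column.
row = {"name": "H1504+65", "error": 0.5}

print(resolve_virtual_parameter("@error", row))  # value taken from the row
print(resolve_virtual_parameter("60", row))      # literal, still a string
```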
 
ZZLINK: contains links between tables
table_name char20 The table being linked from.
link_table_name char20 The table being linked to.
link_priority int2 When multiple links are being displayed, this specifies the order in which they should be displayed.
link_symbol char255 A suggested anchor string to use for displaying the link. Normally this should be either a short string, or an <IMG> link to a small icon.
link_criterion char255 The criterion through which the link is defined, often the SQL in the where clause describing the link. E.g., in a link from the ROSMASTER table to the WGACAT table this might be the string "ror=heasarc_rosmaster.ror" which indicates that a given row in the ROSMASTER table should link to all of the rows in the WGACAT table which have the same ror field. A special syntax is also available for linking using a cone search. E.g., if we wish to link the ROSMASTER table to all RASSFSC objects within 1 degree (60') of the center of the field of view, the criterion may be written as: "-cone:heasarc_rosmaster.ra,heasarc_rosmaster.dec,60". In both these cases the table_name would be 'heasarc_rosmaster', but the link tables would be 'heasarc_wgacat' and 'heasarc_rassfsc' respectively.
link_description char255 A short description of the link.
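
The two forms of link_criterion described above can be told apart mechanically. The Python sketch below is a hypothetical helper, not HEASARC code; it assumes the cone form always has exactly three comma-separated fields and, following the "1 degree (60')" example, that the radius is in arcminutes.

```python
def parse_link_criterion(criterion):
    """Classify a ZZLINK link_criterion string.

    Returns ("cone", ra_expr, dec_expr, radius_arcmin) for the '-cone:'
    syntax, or ("where", criterion) for an ordinary SQL condition.
    """
    prefix = "-cone:"
    if criterion.startswith(prefix):
        ra_expr, dec_expr, radius = criterion[len(prefix):].split(",")
        return ("cone", ra_expr.strip(), dec_expr.strip(), float(radius))
    return ("where", criterion)

print(parse_link_criterion("ror=heasarc_rosmaster.ror"))
print(parse_link_criterion("-cone:heasarc_rosmaster.ra,heasarc_rosmaster.dec,60"))
```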
 
ZZWORDS: contains pertinent keywords describing tables
table_name char40 The name of the table to which the keywords pertain. The table name is wider here to include VizieR tables.
words char3900 Space-separated list of keywords for the specified table in rough order of relevance.
 
ZZDPSETS: contains the following information: (1) the existence of an entry for a given table_name indicates that that table has data products that are available for access, (2) definitions of generic sets (or categories) of data products, and (3) how to construct the data products tags for a given set.
table_name char20 The table with which a data set is associated. The same data product may be associated with multiple tables but a separate entry is required for each table.
set_name char25 A short name for this data product set. It will be used as a label when the user is selecting data products.
tag_format char500 A comma-separated list describing how to construct the tags for the set. Anything inside "@{" and "}" is considered to be a column name in that table. For example, "rosat.hri.@{seq_id}.events" means each tag or set of tags is constructed starting with "rosat.hri.", then the value in the 'seq_id' column for that row, followed by ".events". More than one tag can be formed from the tag_format if the tags are separated by commas. For example,
tag_format = rosat.@{instrument}.@{seq_id}.lc,rosat.ao*.cover
set_description char80 A longer, user-understandable description for this data product set.
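
The tag_format substitution rule can be sketched as follows. The function name build_tags is hypothetical, a row is modeled as a dictionary keyed by column name, and the result is lower-cased because dp_tag values are expected to be all lowercase.

```python
import re

def build_tags(tag_format, row):
    """Expand a ZZDPSETS tag_format for one row into a list of tags."""
    def substitute(match):
        # match.group(1) is the column name found between "@{" and "}".
        return str(row[match.group(1)])

    tags = []
    for template in tag_format.split(","):
        tag = re.sub(r"@\{(\w+)\}", substitute, template.strip())
        tags.append(tag.lower())
    return tags

row = {"seq_id": "RH100192", "instrument": "HRI"}
print(build_tags("rosat.@{instrument}.@{seq_id}.lc,rosat.ao*.cover", row))
# ['rosat.hri.rh100192.lc', 'rosat.ao*.cover']
```

Any asterisk in the template is left untouched here; it only acquires meaning when the tag is later matched against ZZDP.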
 
ZZDP: contains the generic information to describe data products that are available for access.
dp_tag char45 The data product tag, typically an abstract identifier for the file, directory, or remote URL. The HEASARC convention for the dp_tag values associated with a given observation and data product is "mission.instrument.observation_id.type.unique_id". However, there are many ancillary data products like images, cover pages, abstracts, etc. which apply to an entire mission or an instrument of such a mission rather than a specific observation, and there cannot be a strict naming convention for those types of data products. However, the guideline is that those tags should start with "mission.instrument.*" or just "mission.*" as appropriate. No HEASARC naming convention for multi-mission data products has yet been defined. The dp_tag value should always be all lowercase letters.
dp_type char35 A short description of the kind of data product. Typical data product types are Image, Plot, Lightcurve, Events, Spectrum, and Telemetry, but many others are in use. The dp_type value should be mixed case.
dp_format char25 The uncompressed format of the data product. (In general, most data products are compressed to save storage space.) The dp_format value should be all uppercase. Typically, the HEASARC uses one of the following:
FITS A FITS file (usually compressed).
GIF, JPEG, PNG, PS, HTML Quick-look data.
HTTP A link to a remote Web page.
TAR A tar (Unix archive) file.
DIRECTORY A directory in a hierarchy. This implicitly refers to the contents of this directory.
ASCII Human-readable (often "quick look") text data.
BINARY Program data not in FITS format.
dp_level int2 Currently unused. This field is intended for the heretofore rare occasions when it is desirable to archive old versions of data products in addition to the current version. The dp_level is a kind of reverse version number in which 0 is the most recent version, 1 would be the previous (older) version, 2 would be older still, etc. While this may appear clumsy at first, it makes it vastly easier to query for a list of the latest data product versions.
dp_url char80 The URL that points to the data. This typically uses "shortcuts" described below in the discussion of ZZEXT parameters.

 

Special Values in ZZEXT

Entries in ZZEXT are often called "virtual parameters," since they are used to extend the table information stored in ZZGEN with table-specific metadata. The existence and values of these virtual parameters in ZZEXT control how tables are treated by the HEASARC Browse system.
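
One way to picture how virtual parameters extend ZZGEN is as a merge of a table's single ZZGEN row with all of its ZZEXT rows. The Python sketch below is illustrative only: the function name table_characteristics and all sample values are hypothetical, and rows are modeled as dictionaries keyed by column name.

```python
def table_characteristics(zzgen_row, zzext_rows):
    """Merge one ZZGEN row with that table's ZZEXT rows into one dict."""
    merged = dict(zzgen_row)
    for ext in zzext_rows:
        # Only ZZEXT rows for the same table contribute virtual columns.
        if ext["table_name"] == zzgen_row["table_name"]:
            merged[ext["parameter_name"]] = ext["parameter_value"]
    return merged

# Hypothetical sample rows.
zzgen_row = {"table_name": "heasarc_rosmaster", "table_rows": 5}
zzext_rows = [
    {"table_name": "heasarc_rosmaster",
     "parameter_name": "default_search_radius", "parameter_value": "60"},
    {"table_name": "zzdp",
     "parameter_name": "heasarc", "parameter_value": "http://heasarc.gsfc.nasa.gov"},
]

print(table_characteristics(zzgen_row, zzext_rows))
```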

ZZEXT and ZZDP URL Shortcuts

Entries in the ZZEXT table with table_name='zzdp' are used as shortcuts in building the URLs stored in the dp_url field.

Although the dp_url field in ZZDP is currently limited to 80 characters, URLs can be much longer, indeed indeterminately longer. In order to support arbitrarily long URLs, to conserve storage space, and to increase the speed of data products queries, it was decided to use variables which would be looked up in ZZEXT and repeatedly expanded to construct the full URL. Such shortcuts use a syntax like "${shortcut}". For a given "${shortcut}" the ZZEXT table is queried for the parameter_value matching the table_name "zzdp" and the parameter_name "shortcut". This parameter value then replaces "${shortcut}" in the dp_url entry.

E.g., suppose ZZEXT has the following entry

(table_name, parameter_name, parameter_value) =
    'zzdp', 'missionbase', 'ftp://heasarc.gsfc.nasa.gov/mission/data'

In a row in ZZDP with

dp_url='${missionbase}/obs/k95432.fits.gz'

the ${missionbase} shortcut is recognized as a shortcut and replaced by the value in the ZZEXT table. I.e., the full URL is:

      ftp://heasarc.gsfc.nasa.gov/mission/data/obs/k95432.fits.gz

Since shortcuts may be defined in terms of other shortcuts, such expansions are done until there are no more shortcuts left to expand in the URL. Circular shortcut references are detected and disallowed.

For example, "${heasarc}" might be "http://heasarc.gsfc.nasa.gov", "${rosat}" might be "${heasarc}/FTP/rosat/data", and "${rosat_pspc}" might be "${rosat}/pspc/processed_data".
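
The repeated expansion, including the rejection of circular references, can be sketched in Python as follows. This is a minimal illustration, not the HEASARC implementation: the function name expand_url is hypothetical, a dictionary stands in for the ZZEXT rows with table_name 'zzdp', and the file name in the usage line is invented.

```python
import re

SHORTCUT = re.compile(r"\$\{(\w+)\}")

def expand_url(dp_url, shortcuts):
    """Expand ${name} shortcuts repeatedly until none remain.

    shortcuts maps parameter_name to parameter_value. A shortcut that
    refers back to itself, directly or indirectly, raises ValueError.
    """
    def expand(text, active):
        def replace(match):
            name = match.group(1)
            if name in active:
                raise ValueError(f"circular shortcut reference: {name}")
            # Expand the shortcut's own value before substituting it.
            return expand(shortcuts[name], active | {name})
        return SHORTCUT.sub(replace, text)

    return expand(dp_url, frozenset())

shortcuts = {
    "heasarc": "http://heasarc.gsfc.nasa.gov",
    "rosat": "${heasarc}/FTP/rosat/data",
    "rosat_pspc": "${rosat}/pspc/processed_data",
}
print(expand_url("${rosat_pspc}/example.fits.gz", shortcuts))
# http://heasarc.gsfc.nasa.gov/FTP/rosat/data/pspc/processed_data/example.fits.gz
```

Tracking the set of shortcuts currently being expanded is one simple way to detect cycles; an undefined shortcut surfaces as a KeyError in this sketch.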

Shortcuts also have the added advantage that, if the relative location of some number of data products has to be changed, it is much easier to change a single entry in ZZEXT than to change thousands of rows in ZZDP.

Note that these shortcuts only apply to the ZZEXT entries for the ZZDP table.

The Data Products Layer

The HEASARC's Data Products Layer implements the linking of a given row in a database table to the data products associated with that row. It utilizes the ZZDP and ZZDPSETS metabase tables.

Example of How the Browse Web Interface Uses the Data Products Layer

Suppose the ROSAT catalog, HEASARC_ROSMASTER, contained the following five rows:

seq_id    instrument  ra          dec         name
RF150003  PSPC        225.341508   66.405361  H1504+65
RF150007  PSPC        216.643574    1.512456  PG1426+015
RF150015  PSPC        219.479943   64.504469  HD129333
RH100192  HRI         325.652292   38.089434  XRT/HRI THERM CYG
RH100193  HRI          85.021314  -69.764432  DRACO CLOUD

When a user goes into Browse and does a search which displays the above rows in HEASARC_ROSMASTER, Browse displays the above with checkboxes to the left of each of the sequence IDs. Suppose the user checks the second and fourth sequence IDs in the list. The user then chooses to preview or download the data products associated with those observations. For a given data products set (or "category" in Browse Web Interface terminology), the software looks up the tag_format in ZZDPSETS for that set and the table HEASARC_ROSMASTER. Suppose the tag_format field for some set says:

rosat.@{instrument}.@{seq_id}.*

The software then fills in the information from the HEASARC_ROSMASTER table into the tag format. So, for the second row, seq_id is RF150007 and instrument is PSPC. The resulting data products tag that is constructed from this information (and after converting to all lowercase) is:

rosat.pspc.rf150007.*

Similarly, for the fourth row, seq_id is RH100192 and instrument is HRI, so the constructed data products tag becomes:

rosat.hri.rh100192.*

Browse then queries the HEASARC Data Products Layer (specifically the ZZDP table) to get the data product information for that tag. The asterisk ("*") is interpreted as a wildcard, so the result is a list of tags. The ZZDP table returns the matching tag(s), types, formats, and URLs. For example, say the table ZZDP looks like the following:

dp_tag                       dp_type     dp_format  dp_url
rosat.hri.rh100192.aspect.1  ASPECT      FITS       http://heasarc.gsfc...
rosat.hri.rh100192.aspect.2  ASPECT      FITS       http://heasarc.gsfc...
rosat.hri.rh100192.events    EVENTS      FITS       http://heasarc.gsfc...
rosat.hri.rh100192.lc.1      LIGHTCURVE  FITS       http://heasarc.gsfc...
rosat.hri.rh100192.plot.1    PLOT        FITS       http://heasarc.gsfc...
rosat.hri.rh100192.plot.2    PLOT        GIF        http://heasarc.gsfc...
rosat.hri.rh100192.image.1   IMAGE       JPEG       http://heasarc.gsfc...
rosat.hri.rh100192.image.2   IMAGE       GIF        http://heasarc.gsfc...

Browse then formats the above URLs into HTML links to each data product. It also uses the above information to package multiple data products into a Unix tar file for the user to download together as a convenience.
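
The wildcard lookup step can be sketched in Python using shell-style pattern matching. This is an assumption-laden illustration: the real lookup is a database query, the function name find_products and the sample rows are hypothetical, and dp_url is omitted from the rows. Note that fnmatchcase also treats '?' and '[' as special characters, which the sketch assumes never occur in tags.

```python
from fnmatch import fnmatchcase

# Hypothetical ZZDP rows (dp_url omitted for brevity).
ZZDP_ROWS = [
    {"dp_tag": "rosat.hri.rh100192.events", "dp_type": "EVENTS", "dp_format": "FITS"},
    {"dp_tag": "rosat.hri.rh100192.image.1", "dp_type": "IMAGE", "dp_format": "JPEG"},
    {"dp_tag": "rosat.pspc.rf150007.events", "dp_type": "EVENTS", "dp_format": "FITS"},
]

def find_products(tag_pattern, rows):
    """Return the rows whose dp_tag matches a pattern containing '*'."""
    return [row for row in rows if fnmatchcase(row["dp_tag"], tag_pattern)]

matches = find_products("rosat.hri.rh100192.*", ZZDP_ROWS)
print([row["dp_tag"] for row in matches])
# ['rosat.hri.rh100192.events', 'rosat.hri.rh100192.image.1']
```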

User Tables

Browse dynamically queries the metadata tables to determine available tables and the columns they contain. The following guidelines are commonly adopted in building tables. Tables are normally added to the HEASARC by creating a TDAT file, an ASCII representation of the table, and then using the HDBingest command to bulk copy the table into the database.

The primary right ascension and declination columns (as indicated by the ZZEXT fields) should use J2000 coordinates. While other coordinate systems may be used within Browse for cone-searches, positional cross-correlations are not feasible when the base coordinate systems are different. If a table is supplied with coordinates in a different system, then new columns should be added to the table in J2000 coordinates and these new columns should be made the primary positional fields. Conventionally, the primary position fields have the names 'ra' and 'dec'.

Times and dates should be stored as Modified Julian Day (MJD) numbers. Dates may be specified using integer values, while real numbers should be used for finer-grained times. Double-precision values allow time resolution of a few microseconds, which is usually enough for catalog data.
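
As a worked check of the time resolution claim, the Python sketch below converts a 'YYYY-MM-DD HH:MM:SS' string to MJD and measures the spacing between adjacent double-precision values at that MJD. The function name to_mjd is hypothetical, the input string is assumed to be in UTC, leap seconds are ignored, and math.ulp requires Python 3.9 or later.

```python
import math
from datetime import datetime, timezone

# MJD 0 corresponds to 1858-11-17 00:00:00.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)

def to_mjd(timestamp):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string (assumed UTC) to MJD."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    moment = moment.replace(tzinfo=timezone.utc)
    return (moment - MJD_EPOCH).total_seconds() / 86400.0

mjd = to_mjd("2000-01-01 12:00:00")
print(mjd)  # 51544.5

# Gap between this double and the next representable one, in microseconds.
spacing_us = math.ulp(mjd) * 86400.0 * 1e6
print(spacing_us)  # roughly 0.6
```

The spacing comes out below one microsecond for present-day MJD values, so once rounding in intermediate arithmetic is allowed for, a resolution of a few microseconds is a reasonable working figure.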

A class field should normally be used only for information on object classes using the HEASARC standard set of source classes.

The table priority should be used to highlight the key tables for a given mission which may have priorities of 2 or 3. Tables made redundant by newer versions should be given priorities of 8 or 9. Typical object tables usually are given a priority of 5.

The default search radius should normally reflect either the size of the observation or the positional uncertainty of the source position.


Documentation prepared by the HEASARC Database Group

Last modified: Monday, 09-Oct-2006 20:01:16 EDT
