The HEASARC Database System

Introduction

Background

The High Energy Astrophysics Science Archive Research Center (HEASARC) at NASA Goddard Space Flight Center is a multi-mission archive facility supporting the high-energy astrophysics community around the world. Through a variety of user interfaces, the HEASARC provides access to many different types of data and information, including proposals and grants tracking information, astronomical catalogs, observation logs, and images.

Previously, most HEASARC information was housed in a home-grown database system, based on that used for EXOSAT. As the HEASARC holdings continued to grow, certain disadvantages of this system began to manifest themselves.

As in any home-grown system, continuity of maintenance became a problem. Adding functionality in the form of enriched metadata, to allow more complex or multi-mission data searching, was difficult. Further, the older database software linked file names and locations directly to the metadata, so that changes at the lowest levels of the system could have an impact throughout.

In addition, other information at the HEASARC and affiliated organizations was kept in separate databases, on different platforms running commercial relational database management system software. There was no well-defined way to communicate between these information repositories. This inconvenienced both users, who had to deal with several disparate systems, and developers, who had to create similar functions redundantly for the separate systems. It also caused problems when data and information were transferred between, for example, the HEASARC and the processing facility or the deep archive.

Therefore, the database development efforts at the HEASARC had two main goals: to migrate to a standard commercial database management system, and to facilitate the exchange of information between database systems.

Maintaining and Enhancing Services

As an operational facility, the HEASARC had the obligation to avoid disrupting the level of service it provided to its users. This implied that current user interfaces, or enhanced versions thereof, must continue to be available. The HEASARC currently provides both a command-line interface and a variety of services accessible via the World-Wide Web.

In addition, many HEASARC users have become accustomed to accessing the data files and tables directly via File Transfer Protocol (FTP) and Structured Query Language (SQL) respectively. This type of access must continue to be supported as well.

The HEASARC has to support considerable heterogeneity, both in the types of information within its database and in communication among components of the HEASARC systems and with other organizations. To do so as flexibly as possible, it became clear that a facility was required not only to describe data files via entries in tables (the catalog level), but also to describe the catalog-level tables themselves, so that information could be exchanged about which tables and attributes are available for search. We call this level in the information hierarchy "meta-information," or the "metabase."

The meta-information design is intended to be as generic as possible. Because the astronomical content of the HEASARC database resides in the catalog-level tables, the meta-information itself only describes these tables and the parameters available in them. In principle, it could therefore describe information from any discipline. In practice, it at least facilitates the use of data across astrophysics missions that might arrange their information differently.
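
As a rough illustration of the idea, and not a description of the actual HEASARC schema, the sketch below builds a tiny metabase in an in-memory SQLite database. One table describes the catalog-level tables and another describes their parameters, including which parameter plays a standard search role. The names `meta_tables`, `meta_params`, the `role` column, and the two mission tables are all invented for this example.

```python
import sqlite3


def build_demo_metabase():
    """Create an in-memory database holding two hypothetical metabase tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE meta_tables (
            table_name  TEXT PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE meta_params (
            table_name TEXT,
            param_name TEXT,
            role       TEXT,  -- search role such as 'ra', 'dec', 'time', 'name'
            units      TEXT
        );
    """)
    # Two made-up catalog-level tables that name their columns differently.
    conn.executemany("INSERT INTO meta_tables VALUES (?, ?)", [
        ("mission_a_log", "Observation log for a made-up mission A"),
        ("mission_b_cat", "Source catalog for a made-up mission B"),
    ])
    conn.executemany("INSERT INTO meta_params VALUES (?, ?, ?, ?)", [
        ("mission_a_log", "ra_deg", "ra", "degree"),
        ("mission_a_log", "dec_deg", "dec", "degree"),
        ("mission_a_log", "start", "time", "MJD"),
        ("mission_b_cat", "alpha", "ra", "degree"),
        ("mission_b_cat", "delta", "dec", "degree"),
        ("mission_b_cat", "src", "name", None),
    ])
    return conn


def columns_for_role(conn, role):
    """Map each described table to the column that plays the given search role."""
    rows = conn.execute(
        "SELECT table_name, param_name FROM meta_params "
        "WHERE role = ? ORDER BY table_name",
        (role,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_demo_metabase()
    # Discover which column to use for a right-ascension search in each table.
    print(columns_for_role(conn, "ra"))
```

Nothing in `columns_for_role` is specific to astronomy: a search tool only needs to ask the metabase which column fills a given role, which is what lets tables from missions with different layouts be searched in a uniform way.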

Historical Requirements

The requirements analysis was undertaken as the first step in the development of a multi-mission database management system for the HEASARC. It was used as a vehicle for discussion and as a springboard for the design effort. It was not intended as a formal specification document, nor is it continually updated to reflect the evolution of our thinking during the prototyping and implementation phases of the project. It is included here as a useful summary of our original goals.

  1. The HEASARC Database System shall be developed so as not to preclude migration of tables between different relational database management systems (e.g. Ingres, Oracle, Sybase).

  2. The HEASARC Database System shall be accessible via the following user interfaces, at a minimum:

    • a Web interface
    • a command-line interface
    • an e-mail interface

  3. The HEASARC Database System shall have a data dictionary or system metabase including at a minimum for each table or view:

    • definition
    • access privileges
    • creator
    • creation date
    • fields to use for standard coordinate, time, class and name searches
    • units and other conventions used to store the data, such as epoch of the coordinates

  4. The HEASARC Database System shall be queryable using SQL.

  5. The HEASARC Database System shall include an application programmers' interface (API) for FORTRAN and C calls.

  6. The HEASARC Database System shall accommodate the following types of information, at a minimum:

    • proposals and grants tracking information
    • bibliographic information
    • mission and observation status information
    • astronomical catalogs
    • meta-information describing the database contents
    • granule ID, file location, format and type information pointing to science data
    • usage statistics for archive and information products

  7. The HEASARC Database System shall be able to accommodate information concerning multiple types of science data files, to include among others:

    • telemetry
    • multiple product file types
    • auxiliary (e.g. calibration)
    • screened
    • unscreened
    • raw (FITS-converted telemetry)

  8. The HEASARC Database System shall allow the identification and retrieval of data files or granules individually and/or in groups as defined by the mission.

  9. The HEASARC Database System shall be developed on a schedule such that it is tested and ready to accommodate XTE (i.e. first ingest of accepted proposals expected late March/early April 1995), with migration of additional data on a schedule to be determined.

  10. The HEASARC Database System shall include a core set of generic attributes based to the extent possible on the FITS standard, and consistent table structures wherever feasible, in order to facilitate cross-mission data search by attribute.

  11. The HEASARC Database System shall accommodate the information held currently in existing HEASARC DB tables, and shall support data from a technically unbounded number of active and inactive missions, subject to resource limitations.

  12. The HEASARC Database System shall accommodate the frequent updating of database contents associated with the support of active missions.

  13. The HEASARC Database System shall be capable of interfacing as a node of the Astrophysics Data System.

  14. The HEASARC Database System shall be capable of providing secure access for proprietary information.

  15. The HEASARC Database System shall have tools to check the consistency of the database tables and the data archives.

  16. The HEASARC Database System shall have tools to check the validity of joins between database tables.

  17. The HEASARC Database System shall be designed to accommodate the distribution of the data archives among multiple servers in a manner transparent to the user.
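
Requirement 15 above asks for tools that check the consistency of the database tables against the data archives. The following is a minimal sketch of what such a check might look like, assuming that a table stores file locations as paths relative to an archive root directory. The function name `check_consistency` and this storage convention are assumptions made for illustration, not a description of the HEASARC tools.

```python
from pathlib import Path


def check_consistency(recorded, archive_root):
    """Compare file paths recorded in a table with the files under archive_root.

    recorded: iterable of POSIX-style paths relative to archive_root,
              as they might be stored in a database table (an assumption).
    Returns (missing, untracked): recorded paths with no file on disk, and
    files on disk that no table row points to, both as sorted lists.
    """
    root = Path(archive_root)
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    recorded = set(recorded)
    return sorted(recorded - on_disk), sorted(on_disk - recorded)
```

A call such as `check_consistency(paths_from_table, "/path/to/archive")` would return two lists: the first flags table rows that point at files no longer present, and the second flags files that were never catalogued.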

Status

The migration of the HEASARC database to a commercial RDBMS (first INGRES, and later Sybase) was initially completed in 1995, when the HEASARC's World-Wide Web database interface, Browse, was released. The overall architecture and implementation have continued to evolve since then, however.


Documentation prepared by the HEASARC Database Group

Last modified: Tuesday, 13-Jul-2004 14:21:27 EDT