NOTICE:

This Legacy journal article was published in Volume 1, May 1992, and has not been updated since publication. Please use the search facility above to find regularly-updated information about this topic elsewhere on the HEASARC site.

An Introduction to the HEASARC

N. E. White

HEASARC


1. Overview

The High Energy Astrophysics Science Archive Research Center, HEASARC, was created by NASA in 1990 as a site for X-ray and Gamma-ray archival research. The motivation for the HEASARC is to provide a multi-mission archive for the high energy data from the ROSAT, GRO, BBXRT, Astro-D, and XTE missions that coexists with archival data from past missions such as Einstein, HEAO 1, HEAO 3, OSO 8, SAS 2 and 3, Uhuru, and Vela 5B. Data from non-US missions, e.g., EXOSAT and Ginga, will also be made available as international agreements allow. The total data volume will be of the order of 1,000 gigabytes by 1995 and the aim is to make these data available on-line for immediate access as well as by bulk distribution.

The HEASARC is located at the Goddard Space Flight Center and is a collaboration between Goddard's Laboratory for High Energy Astrophysics, LHEA, and the NSSDC. The LHEA is responsible for the science content of the archive; the NSSDC is responsible for the data archive management. The HEASARC data holding will consist of data from past, concurrent and future missions. The NSSDC contribution is outlined in the next article by Jim Green, the NSSDC director. This article will concentrate on the LHEA HEASARC activities (which currently constitute the bulk of the new funding).

2. Terms of Reference

The terms of reference of the HEASARC are to:

  • maintain and disseminate data from previous and concurrent high energy astrophysics missions,
  • provide software and data analysis support for these datasets,
  • maintain the necessary scientific and technical expertise for the processing and interpretation of the data holding,
  • develop and maintain tools for combining data from several missions and for multi-dataset analysis,
  • develop and maintain catalogs of observations and ancillary information for data holdings relevant to that wavelength band,
  • coordinate data, software and media standards with other parts of NASA's Astrophysics Data System, including other
    multi-mission centers.

3. Organization

The LHEA part of the HEASARC is under the Office for Guest Investigator Programs, OGIP, within the LHEA. The OGIP also administers the Compton Gamma-Ray Observatory Science Support Center (CGRO SSC), and the Guest Observer Facilities for ROSAT, Astro-D and XTE (Figure 1). The objective of the OGIP is to provide uniform guest observer support for these missions. The HEASARC forms a central pillar within the organization, in that it provides the connecting thread between the various science support facilities. At the end of each of the various projects the HEASARC will be the final resting place for the archive and the associated expertise in its analysis.

In setting up the HEASARC there was much concern that it should not take on project responsibilities, and the respective roles of the projects and the HEASARC have been clearly separated.

The project Data Centers are responsible for:

  • archive creation
  • delivering all non-proprietary data to HEASARC in FITS format
  • providing science expertise to support archival research while project funding is maintained

The HEASARC provides:

  • multi-mission high energy astrophysics archival access
  • FITS format standards
  • FITS software: i/o and table manipulation tools

At the end of project funding, the HEASARC takes over the science expertise, probably by transferring a few data center staff to the HEASARC.

For existing data sets (e.g. HEAO 1 and 2) the HEASARC will work to make the data available in a multi-mission framework. In cases where the original project is still well-funded, the above rules will apply. For those projects where there is no longer any project support, the HEASARC will directly apply resources to make the data available.

Figure 1: LHEA Organization

4. Requirements

Before discussing how the HEASARC will organize its data holding, it is worthwhile to consider the motivations for archival research. There are four distinct categories:

(i) historical studies,
(ii) theoretical follow-up,
(iii) surveys, and
(iv) assurance.

Historical studies are the most obvious archival activity. An observer discovers a new phenomenon, or is studying one previously known, and needs to check earlier data to, e.g., independently confirm its existence and/or track the long-term variability. These studies can be perhaps the most difficult, since they will involve combining and/or comparing data sets from different telescopes. The major issues here are gaining easy access to the data, and cross-instrument calibration. A related activity is to use archival data as part of a justification to propose to use a new telescope.

Theoretical follow-up is the need to test new models against existing data. In many cases, the interpretation of a phenomenon can take many years, with theoreticians repeatedly building models and testing them against the data. Theoreticians currently have to work closely with the original investigator to test their models, or make eyeball fits to published data. Here the major issue is that the theoretician does not have a detailed understanding of the instrument characteristics or analysis techniques. He or she simply wants a data product and the associated calibration to test against the model in a clearly-described, easy-to-read data format.

Surveys provide the opportunity to combine many observations of a single class of object (e.g., AGN) made by many different investigators using the same telescope and instrument. The current principal investigator approach to allocating observation time means that large uniform samples of particular object types are rarely available to a single observer. Only after the data enter the public domain can a survey of the properties of a particular class be made. The main issue here is ensuring that a user can access a sample of all objects of a particular class.

Assurance is the ability to guarantee both that an observation is analyzed (and, if appropriate, published) and that unjustified repeat observations are not made. Observation time on satellites is very limited (and expensive). Making the data available after some fixed time ensures that all interested parties in the field get access to that data and that it is eventually looked at. The issue here is that in many cases an observation may never be published because the result is not sufficiently noteworthy. It is essential to provide a simple overview of the main results of the observation to avoid unnecessary repeated analysis of the raw data.

The four motivations described above place the following requirements on the HEASARC:

  • multi-mission analysis
  • hierarchical archive structure
  • quick-look capability to assess the value of the data
  • vendor-independent data formats

5. Data Analysis

(i) The HEASARC Dilemma

For every mission the data flow is identical. First, the raw data undergoes some form of data reduction to produce data products -- usually a photon list, an image, a spectrum, and/or a lightcurve. These are then analyzed to produce some results which are then, hopefully, published. While the sequence of events is much the same for each mission, the dilemma facing the HEASARC is that every mission to date has produced a data set in a different format, with a different set of analysis software. This makes the long-term support and distribution of a multi-mission archive problematic, since every mission is a special case. In addition, combining datasets from different missions is non-trivial.
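The reduction step from a photon list to a lightcurve can be illustrated with a minimal Python sketch. The function name `bin_lightcurve`, its parameters, and the arrival times are assumptions made for illustration only; no mission's actual reduction software is being reproduced here.

```python
from collections import Counter

def bin_lightcurve(photon_times, t_start, t_stop, bin_width):
    """Bin photon arrival times (seconds) into (bin centre, count rate) pairs."""
    n_bins = int((t_stop - t_start) // bin_width)
    counts = Counter()
    for t in photon_times:
        index = int((t - t_start) // bin_width)
        if 0 <= index < n_bins:  # ignore photons outside the requested interval
            counts[index] += 1
    return [
        (t_start + (i + 0.5) * bin_width, counts[i] / bin_width)
        for i in range(n_bins)
    ]

# Illustrative arrival times only; not real data.
photons = [0.2, 0.7, 1.1, 1.4, 1.9, 3.5]
lightcurve = bin_lightcurve(photons, t_start=0.0, t_stop=4.0, bin_width=2.0)
```

Nothing in this function depends on which mission recorded the photons, which is the sense in which the sequence of events is much the same for every mission.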

Mission-specific formats tend to be used throughout the data processing chain. The raw telemetry data in many cases involves preprocessing and packing of the data on the spacecraft so as to maximize the information transmitted to the ground. There may be multiple telemetry and onboard computer modes which can add to the complexity of reducing the data. The data products produced by the data reduction software are more generic, e.g., a lightcurve is a time and a count rate. However, even for data products, missions typically generate their own formats. A notable exception is that images have recently begun to be distributed in FITS format. The results of each mission are also sometimes kept in mission-specific or vendor-dependent formats, e.g., an INGRES DBMS table.

Access to the data is typically limited to the data processing system produced by the project. These tend to be monolithic systems that are not optimal for long-term maintenance or general distribution and use by the community.

Specific problems with data processing systems are:

  • they are custom built for each mission, even though the underlying functions are the same
  • there is a failure to modularize and isolate the mission-dependent functions
  • calibrations and methodology are embedded in the code
  • the code is vendor-dependent (e.g., operating system, compiler, DBMS)

The last point is particularly problematic. In the long term it makes maintenance of the data processing system difficult. The code must repeatedly be ported to new hardware and software platforms as technology evolves. With so many different missions in the HEASARC archive this could be a never-ending and expensive task.

In addition, the user community is becoming increasingly demanding. They require access to the original raw data, and also want to reduce it from within a familiar analysis environment, e.g., IRAF, IDL or XANADU.

(ii) The HEASARC solution

The root of the problem is that each mission produces data in different formats. Many of the data reduction and analysis functions are basically the same; the driving factor is decoding the different data telemetry and any mission-specific data product formats. Up to now there has been little, if any, re-use of software between missions. The HEASARC solution is to isolate this function by reformatting the data to a single standard structure. This should be self-describing so that the user need only look in the header to be able to read the file. The FITS standard provides such a capability.
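What "self-describing" means in practice can be sketched in a few lines of Python. A FITS header is a sequence of 80-character keyword records ("cards") stored in 2880-byte blocks and terminated by an END card. The helper names `make_card` and `read_header` are hypothetical, and the parser is deliberately simplified: it treats every value as a plain string and would mishandle quoted strings containing a slash.

```python
CARD_LENGTH = 80      # each header record ("card") is 80 ASCII characters
BLOCK_LENGTH = 2880   # FITS files are organised in 2880-byte blocks

def make_card(keyword, value):
    """Format one simplified 'KEYWORD = value' card, padded to 80 characters."""
    return f"{keyword:<8}= {value:>20}".ljust(CARD_LENGTH)

def read_header(raw):
    """Return a dict of keyword -> value string, stopping at the END card."""
    header = {}
    for offset in range(0, len(raw), CARD_LENGTH):
        card = raw[offset:offset + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Drop any trailing '/ comment' and surrounding blanks.
            header[keyword] = card[10:].split("/")[0].strip()
    return header

# Build a minimal header in memory instead of reading a real file.
cards = [
    make_card("SIMPLE", "T"),
    make_card("BITPIX", 8),
    make_card("NAXIS", 0),
    "END".ljust(CARD_LENGTH),
]
raw_header = "".join(cards).encode("ascii").ljust(BLOCK_LENGTH)

header = read_header(raw_header)
```

A reader written this way needs no prior knowledge of the mission: everything required to interpret the file is carried in the keywords.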

FITS is an IAU and NASA standard for distributing data, and there are FITS readers within all the popular data analysis environments, e.g., IDL, IRAF, and MIDAS. The recent adoption of the binary table FITS standard, which allows the byte structure of each column to be defined in the header, has been a real breakthrough. It allows compact table structures to be defined which can mirror the underlying table structures in most data analysis systems such as MIDAS or IRAF STSDAS tables.
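The way a binary table header defines the byte structure of each column can be sketched as follows. In the binary table convention each column carries a TFORM code (for example B for an unsigned byte, I for a 16-bit integer, J for a 32-bit integer, E and D for single and double precision floats) and the data are stored big-endian. The sketch below maps a subset of those codes onto Python's `struct` module; the function names and column names are illustrative assumptions, and repeat counts such as `2E` are not handled.

```python
import struct

# Subset of binary-table TFORM codes mapped to Python struct codes.
TFORM_TO_STRUCT = {"B": "B", "I": "h", "J": "i", "E": "f", "D": "d"}

def row_format(tforms):
    """Build a big-endian struct format from a list of TFORM codes."""
    return ">" + "".join(TFORM_TO_STRUCT[code] for code in tforms)

def decode_rows(data, names, tforms):
    """Unpack consecutive fixed-width rows into dicts keyed by column name."""
    fmt = row_format(tforms)
    width = struct.calcsize(fmt)
    return [
        dict(zip(names, struct.unpack_from(fmt, data, offset)))
        for offset in range(0, len(data), width)
    ]

# Column description as it might be read from header keywords (hypothetical).
names = ["TIME", "CHANNEL"]
tforms = ["D", "I"]

# Two illustrative rows packed by hand; not real data.
table_bytes = struct.pack(">dh", 10.5, 42) + struct.pack(">dh", 11.0, 7)
rows = decode_rows(table_bytes, names, tforms)
```

Because the row layout is derived entirely from the column descriptions, the same decoding routine serves any table, whatever its origin.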

The HEASARC will distribute all useful data as FITS binary tables, including the telemetry. While at first sight it may seem a formidable problem to reformat a complex telemetry stream containing science and housekeeping data, it is actually simpler than having to build from scratch a data reduction and analysis system. Reformatting the data forces an isolation of the mission-specific function of decoding the telemetry. The subsequent data reduction tasks will have both mission-specific and mission-independent functions. By reformatting the telemetry, it is simple to recycle the mission-independent functions.
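The separation described here can be sketched in Python under stated assumptions. The two telemetry packings below are invented for illustration (they do not correspond to any real mission), as are all function names. Only the decoders know the packing; the downstream function works on a common list of (time, channel) events.

```python
import struct
from collections import Counter

def decode_mission_a(telemetry):
    """Hypothetical packing: 32-bit millisecond tick, then 8-bit channel."""
    return [(ticks / 1000.0, channel)
            for ticks, channel in struct.iter_unpack(">IB", telemetry)]

def decode_mission_b(telemetry):
    """Hypothetical packing: 16-bit channel, then 64-bit time in seconds."""
    return [(time, channel)
            for channel, time in struct.iter_unpack(">Hd", telemetry)]

def channel_spectrum(events):
    """Mission-independent: count events per channel."""
    return Counter(channel for _time, channel in events)

# Illustrative telemetry only; not real data.
telemetry_a = struct.pack(">IB", 1500, 3) + struct.pack(">IB", 2500, 3)
telemetry_b = struct.pack(">Hd", 5, 0.25)

spectrum_a = channel_spectrum(decode_mission_a(telemetry_a))
spectrum_b = channel_spectrum(decode_mission_b(telemetry_b))
```

Once the decoders have run, `channel_spectrum` is reused unchanged for both streams, which is the recycling of mission-independent functions that reformatting is meant to make possible.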

To implement this plan, the HEASARC is taking the following steps. First, the data reduction system for the next high energy astrophysics mission, Astro-D, will be constructed so that it forms the basis for a multi-mission infrastructure. The Astro-D telemetry will be reformatted to FITS and all of the mission-dependent and mission-independent parts will be isolated (see the article on Astro-D by Day, Arnaud and White). Second, the HEASARC has begun to reformat existing telemetry and data products from past missions such as Einstein, HEAO 1, and EXOSAT. The experience gained and the FITS file structures defined can be fed into future missions such as XTE.

To enable both the HEASARC and future missions to reformat to FITS, the HEASARC is providing a portable FORTRAN 77 subroutine library to write and read FITS files. This package, called FITSIO, was released earlier this year and has already proven extremely popular (see the following article by Bill Pence). The HEASARC is also defining mission-independent FITS file structures for spectra, lightcurves, and photon lists. These will allow data products to be distributed transparently between different analysis packages. In particular, the HEASARC is working with the ROSAT Data Center to define a set of "rationalized" FITS files for the ROSAT archive. These rationalized files will differ from the current files in that the structure and keywords will have a multi-mission flavor.

The HEASARC will not force the community to use one data analysis environment. Instead it will adopt a policy of ensuring that any HEASARC-produced data reduction tasks are distributed in ANSI standard code, with the input and output only operating on FITS files. In addition all parameter checking and binding will be isolated, so that these packages can be interfaced to the user's favorite analysis environment.

To facilitate this approach a Data Selector is being produced by the HEASARC, in collaboration with the Astro-D project, to allow Boolean selections from FITS tables. This data selector will form the basis of a multi-mission data reduction system, and will be very similar in concept to the MIDAS and STSDAS table systems. The major advantage of the HEASARC data selector is that it will directly operate on FITS tables, making the system fully portable. It will be written in strict FORTRAN 77. The software will isolate the parameter input and validation from the kernel that actually does the task. This will allow the selector to run under different analysis environments. The first version will be built to run under both IRAF, using the FORTRAN interface, and XANADU. It will be a trivial matter for other developers to integrate the selector into their own analysis environments, so long as their environments have an isolated parameter interface.
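The split between parameter handling and the kernel can be sketched in Python. This is not the HEASARC selector itself, which is to be written in FORTRAN 77; the function names, the expression syntax, and the table rows are all assumptions, and only clauses joined by AND are supported.

```python
import operator

OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
             ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def select_rows(rows, conditions):
    """Kernel: keep rows satisfying every (column, comparison, value) condition."""
    return [row for row in rows
            if all(compare(row[column], value)
                   for column, compare, value in conditions)]

def parse_conditions(text, column_names):
    """Parameter layer: validate 'COLUMN OP VALUE' clauses joined by AND."""
    conditions = []
    for clause in text.split(" AND "):
        column, symbol, value = clause.split()
        if column not in column_names:
            raise ValueError(f"unknown column: {column}")
        if symbol not in OPERATORS:
            raise ValueError(f"unknown operator: {symbol}")
        conditions.append((column, OPERATORS[symbol], float(value)))
    return conditions

# Illustrative table rows only; not real data.
rows = [
    {"TIME": 1.0, "CHANNEL": 12},
    {"TIME": 2.0, "CHANNEL": 40},
    {"TIME": 3.0, "CHANNEL": 55},
]
conditions = parse_conditions("TIME > 1.5 AND CHANNEL < 50", {"TIME", "CHANNEL"})
selected = select_rows(rows, conditions)
```

Here `select_rows` never sees user input directly, so a different environment could supply its own replacement for `parse_conditions` and reuse the kernel as it stands.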

The remaining mission-dependent part of any data reduction system is the calibration data. The HEASARC is defining standard formats for distributing calibrations. Like the data itself, calibrations can be divided into raw data, such as a detector energy resolution function or a telescope point spread function, and calibration products, such as a detector response matrix or an exposure map. The HEASARC will encourage future developers to externally define all calibration information so that it can be accessed by any data reduction or analysis system.
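As an illustration of why externally defined calibration is useful, the following Python sketch folds a model spectrum through a detector response matrix supplied as data. The function name, the matrix dimensions, and every numerical value are hypothetical; a real response matrix would be read from a calibration file, not typed into the program.

```python
def fold_model(response, model_flux):
    """Predicted counts per channel: sum over energy bins of response * flux."""
    return [sum(r * f for r, f in zip(row, model_flux)) for row in response]

# Hypothetical 3-channel by 2-energy-bin response; values are illustrative.
response = [
    [0.5, 0.0],
    [0.5, 0.25],
    [0.0, 0.75],
]
model_flux = [4.0, 8.0]  # illustrative model flux per energy bin
predicted = fold_model(response, model_flux)
```

Since the calibration enters only as an argument, the same code serves any instrument whose response is available in an agreed format.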

6. Data Distribution

Data distribution can be done via on-line access, and by mass distribution, e.g., via CD-ROM. The HEASARC will provide both methods of access to its data holding. There will be regular distributions of data on CD-ROMs, primarily of data products and catalogs from each mission. The first CD-ROM will contain Einstein SSS spectra and lightcurves and is now close to completion. In addition there will be remote on-line access to the data.

On-line services such as SIMBAD, NED, IUE, EXOSAT, and Einline are well known and work well at delivering the data quickly to the user. The disadvantage of these various services is that each one has a different user interface with which the user must become familiar. NASA's Astrophysics Data System, ADS, uses a client-server approach to allow remote queries of databases. The archive sites retain control of the archive contents but will rely on a common user interface provided by the central organization. The HEASARC is currently testing its connection to ADS, and should be a fully-functional node by April 1992. Currently ADS only provides the capability to query single catalogs, and can only be a supplement to the more traditional remote login services. Further information about ADS can be obtained from IPAC by contacting Mary Wittman at mew@ipac.caltech.edu.

In addition to ADS the HEASARC provides an on-line service to allow remote login to the HEASARC data holding and to data analysis software. The emphasis will be on browsing of the data, such that a user can make a quick-look assessment of its worth before exporting it, or part of it, to his or her home site. Rather than invent yet another on-line system, the HEASARC has adopted an existing system, the one developed for the EXOSAT mission by the European Space Agency, ESA. The advantage of this system is that it provides the capability to not only access the data, but also to display and analyze it remotely.

At the heart of the system is the BROWSE program, a command-driven environment that allows a user to search one or more database tables by coordinates, name, object class, or any other valid parameter combination. The user can then display the selected data, or run analysis software on it. This service is available now and is described in a following article by Kathy Rhode.
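A search by coordinates and object class of the kind described can be sketched in Python. This is not the BROWSE implementation or its command syntax; the function names, the column names, and the catalog entries are placeholders invented for illustration. The separation is computed with the haversine formula.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    half_dra = (ra2 - ra1) / 2.0
    half_ddec = (dec2 - dec1) / 2.0
    a = (math.sin(half_ddec) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(half_dra) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))

def cone_search(table, ra, dec, radius):
    """Return rows whose position lies within `radius` degrees of (ra, dec)."""
    return [row for row in table
            if angular_separation(ra, dec, row["RA"], row["DEC"]) <= radius]

# Placeholder catalog entries; not real sources.
catalog = [
    {"NAME": "SRC_A", "RA": 10.0, "DEC": 20.0, "CLASS": "AGN"},
    {"NAME": "SRC_B", "RA": 10.1, "DEC": 20.1, "CLASS": "STAR"},
    {"NAME": "SRC_C", "RA": 200.0, "DEC": -45.0, "CLASS": "AGN"},
]
matches = cone_search(catalog, ra=10.0, dec=20.0, radius=0.5)
agn_matches = [row["NAME"] for row in matches if row["CLASS"] == "AGN"]
```

Combining the positional search with a class filter, as in the last line, corresponds to searching on a combination of parameters.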



Last modified: Monday, 19-Jun-2006 11:40:53 EDT