Requirements to archive data at the HEASARC

This document lists general guidelines for archiving data at the HEASARC, whether the data come from missions or experiment PIs or are high-level products derived from a particular study. These guidelines ensure and maintain the multi-mission approach of the HEASARC archive as described in the HEASARC charter. Project-specific needs are agreed on a case-by-case basis.

The general guidelines for data archiving are described below as separate elements that apply, in whole or in part, to missions or to products:

Requirements for Missions

The HEASARC's general policy is that archival data, to be effective, must include documentation, software and calibration data in addition to the data themselves. The lack of any of these components prevents the full exploitation of the archival data. NASA Astrophysics projects usually produce a Project Data Management Plan (PDMP) that describes how their data will be analyzed and archived. Projects are encouraged to work with the HEASARC in writing their PDMP.
Before the mission/experiment is submitted to a specific Explorers or ROSES call (or similar), the HEASARC may provide, if the call requires it, a letter of support/acknowledgement stating that the HEASARC has been identified as the mission data archive. After the mission is selected, the HEASARC encourages missions to discuss and document, by the end of Phase A, a general agreement that lists the HEASARC services necessary to support the mission and the mission's responsibilities and deliveries. After the mission is approved, the agreed services are detailed in a technical document that is typically ready by the Critical Design Review. The technical agreement includes some or all of the elements necessary to archive the data.

Requirements for Data Products and Software Routines

The HEASARC archives the results of scientific research, which may include catalogs, specialized data products and/or algorithms that may be used by a larger community.
The HEASARC accepts these products if they are relevant to the scope of the HEASARC archive and add value to an existing mission archive.
Catalogs/databases and data products may result from research on a specific object class (for example AGN, LMXRB) and may include properties derived using data from a single mission, from multiple missions and/or from other observatories. Data products may include spectra, lightcurves, images or other products. The format of these products should follow the HEASARC guidelines for data and databases. If the research leading to these results comes from an ADP or ATD proposal, the HEASARC may provide, if the call requires it, a letter of support/acknowledgement stating that the HEASARC has been identified as the mission data archive.
The HEASARC also accepts community software such as new spectral models or new algorithms applicable to multi-mission analysis. This contributed software may be included in a future HEASARC software distribution, and contributors are encouraged to follow the HEASARC standards for software development. However, if the software is related to a specific mission, the HEASARC encourages the PIs to first contact the science center of the mission for which the software is relevant.



Archive Common Elements

Data Delivery

The HEASARC may receive the data either at the end of mission operations as the final mission archive site, or during the mission operations phase as the primary mission archive site.

To use the HEASARC as the primary archive during the mission operations phase, the project should contact the HEASARC to establish the details of the data delivery and the archive structure. All data delivered to the HEASARC are made public as soon as they are archived, unless the mission requires a proprietary period, in which case the HEASARC will store the data in a protected format.

Data can be delivered to the HEASARC using different methods. The HEASARC has adopted a data transfer protocol, DTS, originally developed by the XMM consortium. DTS uses the secure FTP protocol and requires the DTS software to be installed at the site that initiates the transfer. This is the preferred method for missions that use the HEASARC as an active archive during the operations phase.
Alternatively, data may be delivered by "scp" copy or FTP copy from the remote sites. The HEASARC, however, does not offer a local staging area for external users to deliver their data.

Data Format

NASA has mandated the archiving of astrophysical data in FITS format. Following this mandate, the HEASARC has adopted and promoted FITS as the standard format for all levels of data, from the basic reformatted telemetry to higher-level products such as lightcurves, spectra or images. To help projects provide data in FITS format, the HEASARC has developed FITS standards for headers and data structures that describe most high energy astrophysical data. These HEASARC FITS conventions cover specific keyword settings as well as full header and data structures (see also the template examples).

Data delivered to the HEASARC should comply with these existing standards. If these standards are not appropriate for a particular data set, projects are encouraged to work with the HEASARC personnel to define headers and data structures suitable for their data. Using standard headers and data structures makes it possible to use existing software to manipulate the FITS files and, where suitable, the analysis packages available at the HEASARC. In the past, this has proven effective in reducing the costs associated with data analysis software. Simple FITS wrappers of the raw data are discouraged for both science data and calibration data files. The HEASARC will also accept gif, jpeg, png and/or ps files as quick-view or preview versions of the FITS data products.
As a general policy, the HEASARC does not archive the original telemetry, which is in general not in FITS format.

As a general policy, data that are not in FITS format, and for which insufficient software or documentation exists, are not suitable to be archived at the HEASARC. Projects that use the HEASARC as their archive should not assume that the HEASARC will reformat non-FITS data into FITS. The project should make the HEASARC aware of its plans and agree upon its PDMP with the HEASARC well before the mission is active. The HEASARC might reformat non-FITS data into FITS, depending on the available software, documentation and HEASARC resources. However, this would be an exception and not the rule.

Archive structure and Filenames

The HEASARC organizes the archive into directories, each dedicated to a specific data type. These directories may contain subdirectories to identify specific observations and/or specific data products. This avoids having too many files (> 2000) within a single directory, which slows down directory listings.
Within each data type, every file has a unique name that describes the file in no more than 35 characters and follows a clear pattern.
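
The limits above can be checked before a delivery with a short script. The following is a minimal sketch, not a HEASARC tool: the function name check_archive_layout is hypothetical, and it assumes that the first component of each relative path names the data type.

```python
from collections import Counter
from pathlib import PurePosixPath

MAX_NAME_LENGTH = 35       # filename length limit quoted in the text
MAX_FILES_PER_DIR = 2000   # per-directory file count quoted in the text


def check_archive_layout(paths):
    """Return a list of problems found in a list of relative archive paths.

    Assumption: the first path component is the data-type directory.
    """
    problems = []
    files_per_dir = Counter()
    names_per_type = Counter()

    for raw in paths:
        path = PurePosixPath(raw)
        files_per_dir[str(path.parent)] += 1
        names_per_type[(path.parts[0], path.name)] += 1
        if len(path.name) > MAX_NAME_LENGTH:
            problems.append(f"name too long: {raw}")

    # Flag directories holding more files than the listing-friendly limit.
    for directory, count in files_per_dir.items():
        if count > MAX_FILES_PER_DIR:
            problems.append(f"too many files ({count}): {directory}")

    # Flag file names that are reused within the same data type.
    for (data_type, name), count in names_per_type.items():
        if count > 1:
            problems.append(f"duplicate name in {data_type}: {name}")

    return problems
```

An empty list means that none of the three checks failed; the "clear pattern" requirement is mission-specific and is not checked here.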

Calibration

Any data archived at the HEASARC need to be accompanied by their relevant calibration data. Like the science data, calibration data should be delivered in FITS format. High-level calibration data, e.g. response matrices, should conform to the standard HEASARC format. Lower-level calibration data, used for example by the reduction software, can be stored in a mission-specific FITS format. The HEASARC documents the format and the usage of the delivered calibration data and uses a standard filename convention (see an example for the Swift XRT calibration file).

At the HEASARC, calibration data are stored in the HEASARC CALibration DataBase (CALDB). CALDB may be accessed with existing tools that query the database and return the appropriate data file. This method makes it possible to write multi-mission science algorithms separately from the calibration data. The FITS files for the calibration database must contain the appropriate keywords so that the data can be accessed via the calibration database tools. The calibration data delivered to CALDB may be accompanied by a document that describes how the calibration data were obtained, their validity and a general assessment of their quality, and that highlights differences from previous deliveries of the same calibration file.

Software

The HEASARC provides a suite of programs suitable for manipulating FITS files and also multi-mission software to analyze high-level products that are in the HEASARC standard format. These programs are part of the HEAsoft software package. HEAsoft also includes many mission specific tasks to deal with the specifics of the experiment calibration or the screening of the archival data. The HEASARC requires the archiving of any mission specific software or scripts that may be needed to reduce the data in the archive. If several levels of FITS data for a specific mission are archived at the HEASARC, scripts, programs and/or recipes used to screen and derive higher data levels should be delivered along with the data.

The HEASARC encourages missions to use the HEAsoft infrastructure and to add their specific packages to HEAsoft. To secure a long-lasting lifetime for the software, HEASARC encourages and promotes: software portability, to ensure software operability on many (of the popular) operating systems; modular software, with each program dedicated to a specific task rather than trying to do 'everything' in one program; clean interfaces, e.g. ones not dependent on commercial database systems or databases in general. The HEASARC does not support software built on commercial packages (e.g. IDL). To assist developers to meet these standards, the HEASARC distributes libraries to read and write FITS files, and to implement parameter file interfaces (e.g. FITSIO, XPI).

Just as for data, HEASARC can receive, and ingest in the HEAsoft package, the mission specific software, either at the end of mission operations as part of their final archive, or during the mission operations phase. During the mission operation phase, the mission specific software can be included as part of the HEAsoft software distribution. However, the project will retain the responsibility for the software maintenance and update. The details of the software delivery during operations and the turn-over of the responsibility for the software maintenance should be agreed to by the HEASARC personnel.

The HEASARC has standards to build software routines and provides tutorials on how to build an 'Ftool' as well as guidelines for interfacing with Python.

Documentation

The HEASARC requires the delivery of every level of useful documentation that is relevant to the usage of the archival data. The documentation should include satellite and instrument descriptions, data format descriptions, software documentation and calibration documents. Any important events that occurred during the mission lifetime and that have relevance to the archival data should also be documented. When possible, the documentation should be provided in electronic form. The HEASARC may provide standard web pages dedicated to the missions to post vital mission information. The web pages may be placed on-line to support the mission during mission operations or after the mission is completed. While information may be provided in HTML format, documents such as user guides on data analysis or general calibration assessments are better provided as static documents (e.g. docx or pdf).

Database

The final archive of an experiment or a science research project also includes catalogs. These catalogs may be contained in database tables that have various types of information, e.g. a source catalog as final product of an experiment, or timelines and observing logs. These catalogs are ingested into the HEASARC database system.

Database tables are also used, in the HEASARC database system, to access and retrieve the data from the archive. For this type of usage, the table must contain a field that uniquely identifies a dataset or a file located in the archive. The HEASARC standard for database tables is the TDAT format, a plain ASCII file in which the fields are pipe-delimited, accompanied by an ASCII header that describes the data type of the fields and other table characteristics. Other formats, such as FITS, are also acceptable as long as the data are fully described. The table has to be self-contained, with one field selected as the unique key of the table. Here is an example of a TDAT table that contains seven fields: source name, ra, dec, time of the observation, flux, observation id, unique key.

The header is:

field[source_name] = char24   // Source Name
field[ra] = float8:.4f_degree  // Right Ascension
field[dec] = float8:.4f_degree  // Declination
field[observation_date] = int4_mjd  // Observation Date
field[flux] = float8:8.3f_microJy  // Average Flux
field[sequence_number] = int4:9d  // Observation ID
field[unique_key] = int4:9d  // Unique Key

The data fields are values separated by a pipe character:


field1 | field2 | field3 |....

where each field holds a single value: field1 is the source name, field2 is the right ascension and
field3 is the declination.
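
To illustrate how the header and the pipe-delimited rows fit together, here is a minimal sketch of a reader for the simplified layout shown above. It is not the HEASARC ingest software, and a complete TDAT file may carry further header information that is not handled here. The function names, the sample row values, the treatment of a trailing pipe and the mapping of empty fields to None are all assumptions made for illustration.

```python
import re

# Matches lines such as: field[ra] = float8:.4f_degree  // Right Ascension
FIELD_RE = re.compile(r"^field\[(\w+)\]\s*=\s*(\S+)\s*(?://\s*(.*))?$")


def parse_header(lines):
    """Return a list of (name, type_spec, comment) tuples from header lines."""
    fields = []
    for line in lines:
        match = FIELD_RE.match(line.strip())
        if match:
            name, type_spec, comment = match.groups()
            fields.append((name, type_spec, (comment or "").strip()))
    return fields


def parse_row(line, fields):
    """Split one pipe-delimited row and convert values by declared type."""
    values = [value.strip() for value in line.split("|")]
    # Assumption: a single trailing pipe at the end of the row is tolerated.
    if len(values) == len(fields) + 1 and values[-1] == "":
        values.pop()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    row = {}
    for (name, type_spec, _comment), text in zip(fields, values):
        if text == "":
            row[name] = None            # assumption: empty field means no value
        elif type_spec.startswith("int"):
            row[name] = int(text)
        elif type_spec.startswith("float"):
            row[name] = float(text)
        else:
            row[name] = text            # char fields stay as strings
    return row


def key_is_unique(rows, key):
    """Check that the chosen key field has a distinct value in every row."""
    seen = [row[key] for row in rows]
    return len(seen) == len(set(seen))


HEADER = """\
field[source_name] = char24   // Source Name
field[ra] = float8:.4f_degree  // Right Ascension
field[dec] = float8:.4f_degree  // Declination
field[observation_date] = int4_mjd  // Observation Date
field[unique_key] = int4:9d
"""

fields = parse_header(HEADER.splitlines())
# Invented sample rows, for illustration only.
rows = [
    parse_row("3C 273|187.2779|2.0524|55000|1", fields),
    parse_row("Cyg X-1|299.5903|35.2016|55010|2|", fields),
]
```

Only the leading type name (char, int, float) is interpreted in this sketch; the display format and unit that follow it in the header are ignored.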

If the HEASARC is the main archive during mission operations, the project and the HEASARC should agree on a schedule for updating all mission-specific tables that will be available through the HEASARC on-line system interfaces Browse and Xamin.



Last modified: Saturday, 26-Jun-2021 11:45:41 EDT