Requirements and Guidelines for Archiving Data at the HEASARC
The HEASARC's general policy is that, for archival data to be effective, the archive must include,
in addition to the data, documentation, software, and calibration data. The lack of any of these
components prevents the full exploitation of the archival data.
NASA Astrophysics projects usually produce a Project Data Management Plan (PDMP) that
describes how their data will be analyzed and archived.
At the proposal phase, the project has to contact the HEASARC soon after the specific
Explorers or ROSES opportunity is released to discuss its archive plan, to be included
in the PDMP, and the associated cost.
The HEASARC requires that requests for the archive plan and cost estimate arrive three weeks
before the estimate is needed and no later than six weeks before the proposal deadline.
The request has to be submitted using the
Archiving Data to the HEASARC form.
A letter of support/acknowledgement stating that the HEASARC has been identified as the archive site
for the project, to be included with the proposal submission, will be provided by the HEASARC
no later than two weeks prior to the proposal deadline, and only if the request arrives no later than six weeks
before the proposal deadline. The letter may include other elements as requested by the
relevant opportunity.
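The proposal-phase deadlines described above can be derived from the proposal due date. The following is a minimal sketch of that arithmetic in Python; the function name and the example deadline are hypothetical and are not part of any HEASARC tool.

```python
from datetime import date, timedelta

def heasarc_proposal_milestones(proposal_deadline: date) -> dict:
    """Return the latest dates implied by the six-week and two-week rules."""
    return {
        # Archive and cost-estimate request: no later than six weeks before the deadline.
        "latest_request": proposal_deadline - timedelta(weeks=6),
        # Letter of support: provided no later than two weeks before the deadline.
        "latest_letter": proposal_deadline - timedelta(weeks=2),
    }

# Hypothetical proposal deadline, for illustration only.
milestones = heasarc_proposal_milestones(date(2025, 3, 27))
for name, day in milestones.items():
    print(name, day.isoformat())
```

The rule that the request should also arrive three weeks before the estimate is needed depends on a project-specific date and is not modeled here.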
After the mission is selected, the HEASARC requires, by the end of Phase B,
a general concurrence on the mission archive needs, captured in a document that lists the
HEASARC services or special
services necessary to support the mission, the mission deliverables and the delivery timeline.
After the mission is approved, the agreed services are detailed
in a technical document to be ready by the Critical Design Review or the gateway identified before
implementation.
The technical agreement includes some or all of the elements
necessary to archive the data. This is a mandatory step to archive data at the HEASARC
and/or to use the HEASARC as the archive during mission operations.
The HEASARC archives results of scientific research that may include catalogs, specialized
data products and/or algorithms that may be used by a larger community.
The HEASARC accepts these products if they are relevant to the scope of the HEASARC archive
and add value to existing mission archives.
Catalogs/databases and data products may result from research on a specific
object class (for example AGN, LMXRB) and may include properties derived using data
from a single mission or multiple missions and/or other observatories.
Data products may include spectra, light curves, images or other products.
The format of these products should follow the HEASARC guidelines for
data and databases.
The HEASARC also accepts community software such as new spectral models or new algorithms applicable
to multi-mission analysis. This contributed software may be included in future HEASARC software
distributions, and users are encouraged to make use of the standards developed for
software.
However, if the software is related to a specific mission, the HEASARC encourages
the PIs to first contact the mission science center for which the software is relevant.
If the research leading to these new data results or new software algorithms comes from an
ADP or ATD proposal, proposers must contact the HEASARC no later than four weeks before
the proposal deadline using the Archiving Data to the HEASARC form
to discuss their archive planning and cost estimate and to request a letter of support/acknowledgement
that the HEASARC has been identified as the archive site. After the proposal has been accepted, the PIs must contact
the HEASARC within three months to establish details on the data and the timeline of delivery.
The guidelines to archive data at the HEASARC include common elements
that are applicable, all or in part, either to data from missions or experiment PIs or
to high-level products derived from a particular study.
These guidelines are meant to ensure and maintain the multi-mission approach
of the HEASARC archive as described in the HEASARC charter. Project-specific needs
have to be agreed upon and documented case by case. The common elements include:
Data Delivery
The HEASARC may receive the data either at the end of mission operations
as the final mission archive site, or during the mission operations phase
as the primary mission archive site.
To use the HEASARC as the primary archive during the mission operations phase,
the project should agree with the HEASARC to establish the details of the data
delivery and the archive structure. All data delivered to the HEASARC are made
public as soon as they are archived, unless the mission requires a proprietary
period, in which case the HEASARC will store the data in a protected format.
Data can be delivered to the HEASARC using different methods.
The HEASARC has adopted a data transfer protocol, DTS,
originally developed by the XMM consortium. This uses the secure FTP protocol and requires
the DTS software to be installed on the site that initiates the transfer.
This is the preferred method for missions that use the HEASARC as an active archive
during the operations phase, since the data are automatically placed in the public domain (within 3-4 hours).
Alternative deliveries may use "scp" or FTP copy from the remote
sites. The HEASARC, however, does not offer a local staging area for external users
to deliver their data.
Data are placed in the public domain as soon as they are delivered to the HEASARC, provided that the data format,
directory structure and filenames follow the agreements.
For missions that use the HEASARC as the main archive during operations, the delivery method must be tested
in advance. For mission or science results provided as a single delivery, data will be online within two weeks.
Data Format
NASA has mandated the archiving of astrophysical data in FITS format.
Following this mandate, the HEASARC has adopted and promoted FITS as the standard
format for all levels of data, from the basic reformatted telemetry
to higher-level products such as light curves, spectra or images.
To help projects provide data in FITS format, the HEASARC has developed FITS
standards for headers and data structures that describe most high-energy
astrophysical data. These
HEASARC FITS conventions cover specific keyword settings as well as full
header and data structures (see also
template examples).
Data delivered to the HEASARC should comply with these existing standards. Details of
keyword settings or file structure depend on the specific mission telemetry and/or on how the
data are divided into files. If these standards are not appropriate for a particular data set,
HEASARC personnel help to define headers and data structures suitable for the data. The definition
of the baseline data format must start as soon as sufficient details of the telemetry are known, so that
all details are completed by the start of the implementation phase.
Using standard headers and data structure facilitates the usage of
existing software to manipulate FITS files and, if suitable, of analysis
packages available at the HEASARC. In the past, this has been proven
to be effective in reducing the costs associated with data analysis software.
Simple FITS wrappers of the raw data are discouraged for both science data and
calibration data files. HEASARC will also accept gif, jpeg, png and/or ps files
as quick-view or preview versions of the FITS data products.
As a general policy, the HEASARC does not archive the original telemetry, which
is in general not in FITS format.
As a general policy, data that are not in FITS format, and for which insufficient
software or documentation exist, are not suitable to be archived at the HEASARC.
Projects that use the HEASARC as their archive should not assume that the HEASARC
will reformat non-FITS data into FITS. The project should make
the HEASARC aware of its plans and agree upon its PDMP with the HEASARC
at the mission proposal stage. The HEASARC might reformat non-FITS data
into FITS, depending on the available software, documentation and
HEASARC resources. However, this would be the exception rather than the rule.
The HEASARC provides tools (fverify, ftverify and fchecksum) to verify that files are correctly
written in FITS and to check file integrity.
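Before running the HEASARC tools, a delivery script can apply a quick structural pre-check. The sketch below is not a substitute for the verification tools named above; it relies only on two facts from the FITS standard (files are made of 2880-byte blocks, and the primary header starts with the SIMPLE keyword). The function name quick_fits_precheck is hypothetical, and the check applies to uncompressed files only.

```python
import os
import tempfile

FITS_BLOCK = 2880  # FITS files are organized in 2880-byte blocks

def quick_fits_precheck(path):
    """Return a list of problems found; an empty list means the pre-check passed."""
    problems = []
    size = os.path.getsize(path)
    if size == 0 or size % FITS_BLOCK != 0:
        problems.append(f"size {size} is not a positive multiple of {FITS_BLOCK}")
    with open(path, "rb") as handle:
        first_card = handle.read(80)  # header cards are 80 characters long
    if not first_card.startswith(b"SIMPLE  ="):
        problems.append("first header card is not SIMPLE")
    return problems

# Build a tiny header-only FITS file to exercise the check.
cards = [
    "SIMPLE  =" + "T".rjust(21),
    "BITPIX  =" + "8".rjust(21),
    "NAXIS   =" + "0".rjust(21),
    "END",
]
header = "".join(card.ljust(80) for card in cards).ljust(FITS_BLOCK)
with tempfile.NamedTemporaryFile(suffix=".fits", delete=False) as tmp:
    tmp.write(header.encode("ascii"))
    demo_path = tmp.name

demo_problems = quick_fits_precheck(demo_path)
print(demo_problems)
os.remove(demo_path)
```

A file that passes this pre-check can still fail full verification, so the dedicated tools remain the reference.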
Data Levels and Data size
Mission data may include different levels of processing. The HEASARC archives all science
data levels, additional housekeeping, and orbital information in FITS format,
and also requires the archiving of software and calibration information
to ensure accessibility and usability of all data levels.
Data levels are defined as follows: 1) Level 1 data include the telemetry translated into FITS and
additional calibrated information with no data selection applied; 2) Level 2 data are derived
from Level 1 via screening for time intervals or other parameters to retain data where the instrument
operates nominally; 3) Level 3 data are high-level products derived from Level 2.
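The step from Level 1 to Level 2 can be illustrated with a time-interval screening. The sketch below uses plain Python lists in place of FITS tables; the event times and good time intervals are invented, and treating each interval as start-inclusive and stop-exclusive is an assumption made for the example.

```python
def screen_events(event_times, good_intervals):
    """Keep only events whose time falls inside one of the good time intervals."""
    kept = []
    for time in event_times:
        # Assumed convention: start is inclusive, stop is exclusive.
        if any(start <= time < stop for start, stop in good_intervals):
            kept.append(time)
    return kept

# Invented Level 1 event times (seconds) and good time intervals.
level1_times = [10.0, 55.5, 120.2, 199.9, 250.0, 310.7]
good_time_intervals = [(50.0, 200.0), (300.0, 400.0)]

level2_times = screen_events(level1_times, good_time_intervals)
print(level2_times)
```

Real screening usually combines time intervals with other criteria (the "other parameters" mentioned above), which would add further conditions to the same filter.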
There is no specific limit on the size of the archive for a given dataset.
However, the expected data size for the entire dataset must be evaluated during mission
implementation (Phase C). The HEASARC policy is to have the disk space in place by the time of
the final end-to-end test prior to launch.
Archive Structure and Filenames
The HEASARC organizes the archive into directories, each dedicated to
a specific data type. These directories may contain subdirectories
to identify specific observations and/or specific data products.
This avoids having too many files (> 2000) within a single directory,
which slows down directory listing.
Data deliveries from a mission have to follow the agreed archive structure.
An example of a data structure is:
data/
    obs/
        000101/
            unfiltered/
            products/
        000201/
        000301/
        ...
    trend/
        type1/
            filetype11
            filetype12
            …
        type2/
        type3/
        ...
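The limit on the number of files per directory mentioned above can be checked before a delivery. The following is a minimal sketch, assuming the delivery is staged on a local disk; the function name crowded_directories is hypothetical, and the demonstration lowers the limit to 2 so that a small temporary tree triggers it.

```python
import os
import tempfile

MAX_FILES_PER_DIR = 2000  # limit quoted in the text above

def crowded_directories(root, limit=MAX_FILES_PER_DIR):
    """Return (directory, file_count) pairs for directories holding more than `limit` files."""
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > limit:
            crowded.append((dirpath, len(filenames)))
    return crowded

# Small demonstration on a temporary tree, with the limit lowered to 2.
with tempfile.TemporaryDirectory() as root:
    products_dir = os.path.join(root, "data", "obs", "000101", "products")
    os.makedirs(products_dir)
    for index in range(3):
        open(os.path.join(products_dir, f"file{index}.evt"), "w").close()
    demo_report = [(os.path.relpath(path, root), count)
                   for path, count in crowded_directories(root, limit=2)]

print(demo_report)
```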
The files within each data type must have a unique filename of no more
than 35 characters, with a clear pattern that describes the file.
The filename may be constructed from unique identifiers for the
different components that make the file unique.
These may include mission name, instrument, data mode, observation tag and others.
An example of a filename for the HaloSat mission is:
hsYYYYZZ_sYY.ext.gz
where YYYYZZ is the observation number, YY is the detector and ext is the extension, set to evt
to identify an event file or pi to identify a spectral file.
Filenames and directory names must be constructed using English alphabet letters or numerical [0-9] characters.
Symbols are not allowed, with the exception of the underscore (in both filenames and directory names)
and the dot (".") in filenames to indicate a filename extension.
Filenames and directory names must not mix cases, i.e. alphabetical letters in
a filename or directory name are either all lower case or all upper case, and case must
be consistent between filename and directory name, e.g. if the directory name is lower case
the filename must be lower case as well.
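The naming rules above lend themselves to an automated check. The sketch below encodes them as stated in the text (length limit, allowed characters, no mixed case, consistent case between directory and filename); the function names and the sample names passed to them are hypothetical.

```python
import re

MAX_FILENAME_LENGTH = 35  # limit quoted in the text
FILENAME_CHARS = re.compile(r"[A-Za-z0-9_.]+")  # dot is allowed in filenames only
DIRNAME_CHARS = re.compile(r"[A-Za-z0-9_]+")

def letter_case(name):
    """Return 'lower', 'upper', 'mixed', or None if the name has no letters."""
    letters = [char for char in name if char.isalpha()]
    if not letters:
        return None
    if all(char.islower() for char in letters):
        return "lower"
    if all(char.isupper() for char in letters):
        return "upper"
    return "mixed"

def check_names(dirname, filename):
    """Return a list of rule violations for one directory name and one filename."""
    problems = []
    if len(filename) > MAX_FILENAME_LENGTH:
        problems.append("filename is longer than 35 characters")
    if not FILENAME_CHARS.fullmatch(filename):
        problems.append("filename has characters other than letters, digits, '_' or '.'")
    if not DIRNAME_CHARS.fullmatch(dirname):
        problems.append("directory name has characters other than letters, digits or '_'")
    cases = {letter_case(dirname), letter_case(filename)} - {None}
    if "mixed" in cases:
        problems.append("upper and lower case are mixed")
    elif len(cases) == 2:
        problems.append("directory name and filename use different cases")
    return problems

# Hypothetical names: the first pair follows the rules, the second does not.
print(check_names("000101", "hs000101_s14.evt.gz"))
print(check_names("Products", "hs000101-s14.evt"))
```

Uniqueness of filenames within a data type is not covered here, since it requires looking at the whole delivery rather than one name at a time.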
Calibration
Any data archived at the HEASARC need to be accompanied by their relevant
calibration data. Like the science data, calibration data should
be delivered in FITS format.
High-level calibration data, e.g., response matrices, should
conform to the standard HEASARC format.
Lower-level calibration data, used for example by the reduction
software, can be stored in a mission-specific FITS format.
The HEASARC documents the format and the usage of the delivered calibration data
and uses a standard filename convention
(see an example for the
Swift XRT Calibration file).
At the HEASARC, calibration data are stored in the HEASARC CALibration DataBase (CALDB).
CALDB may be accessed with existing tools that query the database and return
the appropriate data file. This method therefore allows multi-mission
science algorithms to be written separately from the calibration data.
The FITS files for the calibration database must contain appropriate
keywords to access the data via the calibration database tools.
The calibration data delivered to CALDB may be accompanied by a document describing
how the calibration data were obtained, their validity and a general assessment
of their quality, and highlighting differences from previous deliveries of the
same calibration file.
Software
The HEASARC provides a suite of programs suitable for manipulating
FITS files and also multi-mission software to analyze high-level products
that are in the HEASARC standard format. These programs are part of the HEAsoft
software package. HEAsoft also includes many mission-specific tasks
to deal with the specifics of the experiment calibration or the screening
of the archival data.
The HEASARC requires the archiving of any mission-specific software or scripts
that may be needed to reduce the data in the archive.
If several levels of FITS data for a specific mission are archived at the
HEASARC, scripts, programs and/or recipes used to screen and derive higher data
levels should be delivered along with the data.
The HEASARC encourages missions to use the HEAsoft infrastructure and to add
their specific packages to HEAsoft. To secure a long lifetime for the
software, the HEASARC encourages and promotes: software portability, to ensure
that software runs on many of the popular operating systems; modular
software, with each program dedicated to a specific task rather than trying to
do 'everything' in one program; clean interfaces, e.g., ones not dependent on
commercial database systems or databases in general. The HEASARC currently supports
software written in C, C++, Fortran, Fortran90 and Perl. Python support is planned, and
a Python interface is already provided for existing software.
The HEASARC does not support software built on commercial packages (e.g., IDL).
To assist developers in meeting these standards, the HEASARC distributes libraries
to read and write FITS files, and to implement parameter file interfaces
(e.g., FITSIO, XPI).
The HEASARC currently supports the operating systems widely used by the community, such as
different flavors of Linux and macOS.
Just as for data, the HEASARC can receive, and ingest into the HEAsoft package, the mission-specific
software, either at the end of mission operations as part of the
final archive, or during the mission operations phase.
During the mission operations phase, the mission-specific software can be
included as part of the HEAsoft software distribution. However, the project
will retain the responsibility for software maintenance and updates.
The details of the software delivery during operations and the turnover
of the responsibility for software maintenance should be agreed upon with
HEASARC personnel.
The HEASARC has standards for building software routines and provides
tutorials on how to build an
'Ftool' as well as guidelines for interfacing with Python.
Documentation
The HEASARC requires the delivery of every level of useful documentation
that is relevant to the usage of the archival data. The documentation
should include satellite and instrument descriptions, data format
descriptions, software documentation and calibration documents. Any
important events that occurred during the mission lifetime that have
relevance to the archival data should also be documented. When possible,
the documentation should be provided in electronic form.
The HEASARC may provide standard web pages dedicated to the missions to post vital
mission information. The web pages may be placed online to support the mission during
mission operations or after the mission is completed.
While information may be provided in HTML format, documents such as
user guides on data analysis or general calibration assessments
are better provided as static documents (i.e., as docx or pdf).
Database
The final archive of an experiment or a science research project also includes catalogs.
These catalogs may be contained in database tables that have various types
of information, e.g., a source catalog as final product of an experiment, or
timelines and observing logs. These catalogs are ingested into the
HEASARC database system and made available in the Virtual Observatory services.
Database tables are also used, in the HEASARC database system, to access and
retrieve the data from the archive. For this type of usage, the table must
contain a field that uniquely identifies a dataset, or a file located in the
archive.
The HEASARC standard for database tables is the TDAT format,
a plain ASCII file in which the various fields are pipe-delimited, accompanied by an
ASCII header that describes the data type of the fields and other table characteristics.
Other formats, such as FITS, are also acceptable as long as the data are fully described.
The table has to be self-contained, with one field selected as the unique key of the table.
Here is an example of a TDAT table that contains seven fields: source name, ra, dec, time of the observation,
flux, observation id, unique key.
The header is:
field[source_name] = char24 // Source Name
field[ra] = float8:.4f_degree // Right Ascension
field[dec] = float8:.4f_degree // Declination
field[observation_date] = int4_mjd // Observation Date
field[flux] = float8:8.3f_microJy // Average Flux
field[sequence_number] = int4:9d // Observation ID
field[unique_key] = int4:9d // Unique Key
The data fields are values separated by a pipe:
field1 | field2 | field3 |....
where each field holds a single value: field1 is the source name, field2 is the right ascension and
field3 is the declination.
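The header and data layout described above can be read with a few lines of code. The following is a minimal sketch, not a full TDAT reader: it handles only field definition lines of the form shown above and pipe-delimited rows without a trailing delimiter. The shortened four-field header, the data rows and the function names are invented for illustration.

```python
import re

# Matches lines such as: field[ra] = float8:.4f_degree // Right Ascension
FIELD_LINE = re.compile(r"field\[(\w+)\]\s*=\s*(\S+)")

# Shortened header (four of the fields above) and invented data rows.
header_text = """\
field[source_name] = char24 // Source Name
field[ra] = float8:.4f_degree // Right Ascension
field[dec] = float8:.4f_degree // Declination
field[unique_key] = int4:9d // Unique Key
"""
data_text = """\
SRC_A | 10.6847 | 41.2690 | 1
SRC_B | 83.6331 | 22.0145 | 2
"""

def parse_header(text):
    """Return an ordered list of (field_name, type_spec) pairs."""
    return FIELD_LINE.findall(text)

def convert(value, type_spec):
    """Convert a text value according to the leading part of its type spec."""
    if type_spec.startswith("float"):
        return float(value)
    if type_spec.startswith("int"):
        return int(value)
    return value  # char fields stay as text

def parse_row(line, fields):
    """Split one pipe-delimited row and map the values onto the field names."""
    values = [value.strip() for value in line.split("|")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, found {len(values)}")
    return {name: convert(value, spec) for (name, spec), value in zip(fields, values)}

fields = parse_header(header_text)
rows = [parse_row(line, fields) for line in data_text.splitlines() if line.strip()]

# The table needs one field acting as a unique key; check that it really is unique.
keys = [row["unique_key"] for row in rows]
keys_are_unique = len(keys) == len(set(keys))
print(rows[0], keys_are_unique)
```

Checking the unique key before delivery is a cheap way to catch duplicated rows, since the key is what ties a table entry to a dataset in the archive.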
If the HEASARC is the main archive during mission operations, the project
and the HEASARC should agree on a schedule for updating all mission-specific
tables that will be available through the HEASARC online system interfaces Browse and Xamin.