NOTICE:

This Legacy journal article was published in Volume 7, June 1998, and has not been updated since publication. Please use the search facility above to find regularly-updated information about this topic elsewhere on the HEASARC site.

The DSRI Spectrum Röntgen Gamma (SRG) Data Processing Software
M. K. Barfoed Danish Space Research Institute

1. The SRG satellite
The SRG satellite is one of the major X-ray satellites of this decade. It is being built under the leadership of the Russian Space Research Institute (IKI) in cooperation with institutes from many European countries and the USA. The scheduled launch date is the end of 1999.
No strong centralized computing approach has been attempted for this project, but forums exist for discussing common approaches, notably the Technical Implementation Committee (TIC) group. In general, institutes process their own data and make programs available to IKI for processing the IKI share of data from their instruments. The Center for Astrophysics in Cambridge, Massachusetts will set up an overall archive for the satellite data to which it obtains rights.
2. The DSRI SRG involvement
The Danish Space Research Institute (DSRI) participates in the SRG project by building the two SODART (SOviet DAnish Röntgen Telescope) X-ray telescopes, four detectors (two high energy proportional counters and two low energy proportional counters), and an objective crystal spectrometer called the BRAGG panel, which selects very distinct wavelengths. We have a certain share of the data coming from our detectors, and will set up a system for both processing and archiving the DSRI share of data, plus - to the extent wanted - other observers' share of data from our detectors. We will also set up a system for processing requests for observations.
2.1 DSRI Data Processing
2.1.1 Basic Philosophy
Our software project is to some extent a 'discount' project - at least compared to other software projects such as those for XMM and AXAF. This is not necessarily a weakness, but it has had a profound influence on our approach to building the processing system. Keywords have been (and are) simplicity in design, using other people's ideas and methods, using existing software to the extent possible and maintaining modest ambitions. We have had only limited success in trying to involve outside resources in programming, due to limited project management resources.
2.1.2 Main Goals
The main goals of our software system are:
- To process incoming raw satellite data, plus the necessary ancillary data, into output data in a form suitable for further scientific analysis. Note that the standard processing does not initially include any scientific analysis; if resources allow, we will eventually build scientific analysis into it.
- To be able to do instrument health analysis and quick look analysis at IKI, where the ground station will be placed, and a more in-depth health analysis at DSRI.
- To archive our share of the data.
By 'suitable for scientific analysis' we mean that instrument signatures have been removed from the output data, and that the data are in a format that can be read by existing analysis software (ftools, IRAF/PROS, XANADU).
We may get up to 320 MB of raw data per day (the theoretical maximum, set by the size of the on-board disks of the 4 detectors). This amount is expected to multiply by at least a factor of π (about 3) through the system, so a maximum load of 1 GB per day is realistic.
We aim for a minimal load on personnel resources for the daily processing of data (2 operators).
2.1.3 Hardware Platform
We have two types of computers in our processing configuration: PCs, used as EGSE (Electronic Ground System Equipment) machines, and SUN servers, used for pipeline processing, archiving etc. of data. We expect to acquire one large multi-processor SUN server for handling the daily processing load. We will also acquire a smaller SUN server for the quick look analysis, plus a number of SUN servers for daily analysis at our premises by scientists, both those employed here and those visiting.
Data will be written to CDs and to some extent to 8 mm tapes. CDs have been chosen because CD drives are in widespread use.
2.1.4 Software
Given the main hardware platform, the obvious operating system to choose is Solaris 2, though we will consider updating this if and when SUN announces their next operating system update (which is likely to be some kind of Java-based system). Other software issues can be summarized as follows:

(i) C and FORTRAN have been chosen as programming languages. We considered going directly to C++, but felt that we lacked sufficiently broad expertise, and preferred to stick to the better known languages.

(ii) We use Perl and the standard Bourne shell for scripts.

(iii) We generally do not make GUIs (graphical user interfaces), though some applications may get one later in the project.

(iv) We will, however, offer HTML interfaces for submitting observation requests.
2.1.5 Formats
All (or nearly all) formats used by the processing system are based on the FITS standard. This choice was made because of the self-documenting nature of FITS. With the Binary Extension option, FITS has also become useful for almost any kind of data. We have followed the development in the High Energy FITS community closely, and intend to implement indexing schemes and other significant improvements to the standards as they emerge.
Besides the qualities of the format itself, software for handling FITS data is available in the community - notably the FITSIO package from HEASARC, which we use when dealing with FITS files.
On a higher level, we have used a format very close to the RDF (Rationalized Data Format) used by the ROSAT mission for event data. We have two RDF files. The Basic RDF file holds event data, GTI information, and other information with direct relevance to the event data. The ancillary RDF file holds attitude data, clock-correlation data, detailed housekeeping (HK) data and other data that have been used in constructing the event data. We refer to this format overall as the DSRI RDF format, and hope no misunderstandings will emerge from this!
For calibration data we use the HEASARC defined formats, where applicable. We also use a local version of the CALDB for archiving calibration data. Formats for time-lines, aspect-solution, orbit-solution and so on have been defined by IKI in a binary FITS format.
For database tables, used for instance for archiving and log purposes, we have adopted the RDB format (described in UNIX Relational Database Management from Prentice Hall, 1988, by Rod Manis et al.). RDB is discussed further below.
2.2 Logical Model
The logical model depicts the relationship between different data types and how they are processed to form new data types.
Figure 1: Schematic view of the logical model, at a high level, for the main bulk of the processing activities.
2.2.1 Data types
Data types are divided into logical types and format types: there may be many logical data types that all use the same type of format. As an example, uncorrected and corrected event files share the same format type, but belong to different logical types. Data are stored according to their logical data-type (plus some other parameters).
2.3 Processing Model
Processing may be seen as the pipe-lining of different input data to form different output data. This processing has been made as automatic as we have dared, thus minimizing operator load. The central tool for running the automated pipe-line is the DSRI-built tool called the dsrishell.
Figure 2: The figure shows, on one side, the dsrishell executing the programs that are specified within a number of profiles and interacting with the file-system-based processing database. On the other side, a single program handles data as specified by the dsrishell, interacts with the file-id database and the version database, and logs activity to the log area.
2.3.1 The dsrishell
The dsrishell controls the execution of each individual program, on the basis of:
1: The purpose of the processing.
2: Unprocessed Available Data.
One may define a number of pipelines, each with a given purpose (one could be to assemble all data pertaining to one observation into a basic RDF file). Each Pipeline consists of a number of Profiles, each of which describes a sequence of Programs, so the levels are Pipeline - Profile - Program.
Below is a sample profile:
```
#---------------------------------------------------------------
#    DSRI-software
#    Subsystem:         dsri-sys
#    Directory:         cmd
#    Type:              dsri
#    Program-name:  	rff.prof
#    Description:       Profile for handling raw FITS fast data.
#--------------------------------------------------------------
#    HISTORY:
#________________________________________________________
#       970603 ! morten! created.
#---------------------------------------------------------------
#              SCCS-version string
#---------------------------------------------------------------
# sccsversion="@(#): rff.prof 1.1 06/20/97 DSRI"
#---------------------------------------------------------------
{
profile:     rff.prof
mode:     auto
num:      $RFFNUM 
}
# List variables.
# PROCSEQ is a built-in variable, set to the
# processing sequence number of this execution
# of the profile.
{
RFFNUM=		'pdb_filenum.pl -p rff2ev.x -s $PROCSEQ'
}
{
name:		rff2ev.x
type:		inputfile
command:	rff2ev.x -i $INPUTFILE -s $PROCSEQ
} 
{
name:		pdb_delete.pl
type:		file
command:	pdb_delete.pl -i $INPUTFILE -s $PROCSEQ
} 
{
name:		sleep
type:		file
command:	sleep 10 
} 
 
```
This profile illustrates two main elements of the system: the ability to define internal variables (here RFFNUM) that may be used to guide the processing (here, to determine how many available files are suited for analysis by the rff2ev.x program), and the listing of programs to run.
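As a rough illustration of how such a profile could be read by a tool, the following Python sketch splits the text into its brace-delimited blocks. It is not the dsrishell's own parser, which is not described here. The function name `parse_profile`, and the rule that every block is a set of `key: value` or `NAME=value` lines, are assumptions drawn only from the sample above.

```python
import re

# Shortened copy of the sample profile shown above.
SAMPLE = """\
{
profile:     rff.prof
mode:     auto
num:      $RFFNUM
}
{
RFFNUM=   'pdb_filenum.pl -p rff2ev.x -s $PROCSEQ'
}
{
name:     rff2ev.x
type:     inputfile
command:  rff2ev.x -i $INPUTFILE -s $PROCSEQ
}
{
name:     sleep
type:     file
command:  sleep 10
}
"""


def parse_profile(text):
    """Return one dict per { ... } block, in file order (hypothetical helper)."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "{":
            current = {}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            # Assumed syntax: "key: value" entries or "NAME=value" variables.
            match = re.match(r"^(\w+)\s*[:=]\s*(.*)$", line)
            if match:
                current[match.group(1)] = match.group(2).strip()
    return blocks


blocks = parse_profile(SAMPLE)
# Blocks that carry a "command" entry are the programs to run, in order.
programs = [block["name"] for block in blocks if "command" in block]
print(programs)
```

Keeping the blocks in file order matters in this sketch because the order of the program blocks is taken to be the order of execution.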
The pipeline may be made to loop according to the amount of input data or infinitely. One may also create parallel running pipelines by branching (e.g. one branch handling HK data and another science data). Notice also that more than one dsrishell may be running at any time.
The dsrishell logs execution to a processing log. Programs log errors and debugging messages to individual log files.
Processing configuration and parameter files are also kept, as part of the processing log. All this is done on a per session basis.
The dsrishell interacts with the processing database. It reserves available files for processing and releases them when processing is done. The processing database is based on the normal UNIX file system, using a directory structure that reflects the existing data types, and some parameters related to data.
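One common way to implement reserve and release on a plain UNIX file system is to move a file between directories with an atomic rename. The Python sketch below assumes this technique. The text does not say how the dsrishell actually does it, and the directory names `available`, `reserved` and `done`, as well as the function names, are hypothetical.

```python
import os
import tempfile


def reserve(available_dir, reserved_dir, filename):
    """Claim a file by moving it out of the 'available' directory.

    A rename within one file system is atomic on POSIX, so if two shells
    race for the same file only one rename succeeds; the other gets False.
    """
    try:
        os.rename(os.path.join(available_dir, filename),
                  os.path.join(reserved_dir, filename))
    except FileNotFoundError:
        return False
    return True


def release(reserved_dir, done_dir, filename):
    """Hand a processed file back by moving it to the 'done' directory."""
    os.rename(os.path.join(reserved_dir, filename),
              os.path.join(done_dir, filename))


# Demonstration in a throw-away directory tree: one data type ("rff"),
# with one sub-directory per processing state (an assumed layout).
root = tempfile.mkdtemp()
dirs = {state: os.path.join(root, "rff", state)
        for state in ("available", "reserved", "done")}
for path in dirs.values():
    os.makedirs(path)

name = "ft.00000.sod"
open(os.path.join(dirs["available"], name), "w").close()

first = reserve(dirs["available"], dirs["reserved"], name)   # this shell wins
second = reserve(dirs["available"], dirs["reserved"], name)  # already taken
release(dirs["reserved"], dirs["done"], name)
```

The appeal of this design is that it needs no lock server: the directory a file sits in is its processing state, which fits a database built directly on the file system.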
2.3.2 Database tools
Besides the purely file-system-based processing database, the other system databases rely on ASCII tables called RDB tables. These are well defined, and we manipulate them with the Starbase routines written at CFA (Center for Astrophysics) in Cambridge, Massachusetts. We chose RDB because acquiring a professional database management system would have been too hard on our sparse manpower resources, given the particular expertise and maintenance that such a system requires.
2.4 Some particulars
A number of particulars have been elaborated over a period of time. Among these are the following.
2.4.1 File-id
Each file handled by the system obtains a so-called file-id. This id is appended to the filename, and every file is entered in the file-id table together with some information about it (notably the type of data that the file holds). The file-id table is an ASCII table in the RDB format.
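A minimal Python sketch of this bookkeeping follows. The id pattern (a letter followed by eight digits, appended after a dot) is inferred from the sample file names in section 2.4.2, and the column names are hypothetical. The table layout follows the usual RDB convention of a tab-separated header line, a line of dashes, and one line per row. The real tables are manipulated with the Starbase tools, not with code like this.

```python
def with_file_id(filename, counter, prefix="a"):
    """Append a file-id such as 'a00000075' to a filename (assumed pattern)."""
    return f"{filename}.{prefix}{counter:08d}"


def rdb_table(columns, rows):
    """Render rows as an RDB-style ASCII table.

    Line 1 holds the column names, line 2 a dashed rule per column,
    and each later line one row; fields are separated by tabs.
    """
    lines = ["\t".join(columns),
             "\t".join("-" * len(column) for column in columns)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical columns; the article only says that the type of data is stored.
columns = ["file_id", "filename", "datatype"]
stored_name = with_file_id("auto0052.fhl", 75)
table = rdb_table(columns, [["a00000075", stored_name, "raw"]])
print(table)
```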
2.4.2 Versions
To be able to trace the origins of a given file, both in terms of data sets used as input to generate the file and programs used for the processing, we have set up a version database. This holds a version description for each file, from which the full history of the file may be extracted. A sample version description looks like this:
```
EVEG0.279.LEA-QM.a00000098:ft.00000.sod.0.8.a00000083:auto0052.fhl.a00000075:auto0052.fhl>srgrec.x@1.9@971016>r2rff.x@2.0@971016>rff2ev.x@1.3@971016
```
This rather cryptic-looking line may - with the help of a small tool - be deciphered into the following:
```
The file 'EVEG0.279.LEA-QM.a00000098' was created from inputfile(s) 
   'ft.00000.sod.0.8.a00000083'
by the program 'rff2ev.x@1.3@971016'
The file 'ft.00000.sod.0.8.a00000083' was created from inputfile(s) 
   'auto0052.fhl.a00000075'
by the program 'r2rff.x@2.0@971016'

The file 'auto0052.fhl.a00000075' was created from inputfile(s) 
   'auto0052.fhl'
by the program 'srgrec.x@1.9@971016'

The file auto0052.fhl has no version 
```
We may thus see which versions of which programs were used when generating the file, which also makes it possible to find out which files were generated with, e.g., a faulty program. In the above example, the usage of IDs is also clear, as is the indication of when the processing was done.
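The deciphering step can be sketched in a few lines of Python. The grammar used here is inferred from the single sample above and is an assumption: file names are separated by `:` with the newest file first, programs by `>` with the oldest first, and each program is written as `name@version@date`. The function name `decode_version` is hypothetical, and the sketch handles only a simple chain with one input file per step.

```python
def decode_version(description):
    """Turn a version description into a list of processing steps.

    Each step is a dict naming the file, the input it was created from,
    and the program (with version and date) that created it.
    """
    # Drop any line wraps introduced when the description was printed.
    text = "".join(description.split())
    files_part, *programs = text.split(">")
    files = files_part.split(":")
    if len(programs) != len(files) - 1:
        raise ValueError("expected exactly one program per derived file")
    steps = []
    for i in range(len(files) - 1):
        # Files run newest to oldest, programs oldest to newest,
        # so the newest file pairs with the last program.
        name, version, date = programs[len(programs) - 1 - i].split("@")
        steps.append({"file": files[i], "input": files[i + 1],
                      "program": name, "version": version, "date": date})
    return steps


SAMPLE = """EVEG0.279.LEA-
QM.a00000098:ft.00000.sod.0.8.a00000083:auto0052.fhl.a00000075:auto0052.fhl>srgrec.x@1.9@971016>r2rff.x
@2.0@971016>rff2ev.x@1.3@971016"""

steps = decode_version(SAMPLE)
for step in steps:
    print(f"The file '{step['file']}' was created from '{step['input']}' "
          f"by the program '{step['program']}@{step['version']}@{step['date']}'")
```

With the steps in this form, finding every file touched by a faulty program version reduces to filtering on the `program` and `version` fields.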
2.4.3 Error-handling
As indicated in the description of the dsrishell, all programs log errors to different outputs. For a normal program, the outputs are threefold: a particular program log, the processing-sequence log and a global error log. Log messages are essentially divided into debugging messages, warnings (non-fatal errors) and fatal errors. Fatal errors are logged to all three logs, warnings to the program log and the processing-sequence log, and debugging messages to the program log only. The program log exists on a per processing sequence basis.
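The routing rule amounts to a small lookup table from severity to destinations, sketched below in Python. The names `LOG_ROUTES` and `log_message` are hypothetical, in-memory lists stand in for the three log files, and the non-fatal category is labelled `warning` here.

```python
# Which logs receive a message of a given severity.
LOG_ROUTES = {
    "fatal":   ("program", "sequence", "global"),  # all three logs
    "warning": ("program", "sequence"),            # non-fatal errors
    "debug":   ("program",),                       # program log only
}


def log_message(logs, severity, message):
    """Append a message to every log that the severity is routed to."""
    for destination in LOG_ROUTES[severity]:
        logs[destination].append(f"{severity.upper()}: {message}")


# Stand-ins for the program log, the processing-sequence log
# and the global error log.
logs = {"program": [], "sequence": [], "global": []}
log_message(logs, "debug", "opened input file")
log_message(logs, "warning", "missing HK record, continuing")
log_message(logs, "fatal", "cannot read input file")
```

The global log then holds only fatal errors, so an operator can watch a single file to see whether any processing sequence has failed.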
2.5 Requests for Observation
On July 16th, 1997, the first AO (Announcement of Opportunity) for the so-called Danish core program took place. We used it for testing the software that receives and handles Observation Requests. This software has been developed by CFA, in Cambridge, Massachusetts. We have had to make some enhancements. The system has three main parts: an HTML interface, which may be used for filling out all necessary information for applying for observation time; a mail part, by which mails generated by using the HTML interface are deciphered; and an RDB-based database part, where data pertaining to the Observation Request are stored in different relevant tables. The main identifier is the Observation Request ID, which will stick to the observation and its included sub-observations throughout the process - until data are received.
3. Conclusions
The relatively few resources allocated for the implementation of our data processing have influenced the goals set for our processing system. The main goal is to have a running, thoroughly tested, minimal system ready for launch. As resources become available, we will enhance that system with further science analysis as part of the standard analysis, implement GUIs where applicable (and useful), provide access to archive data, configure profiles for the pipeline, etc. Without a well-working international community of High Energy Astrophysics data centers, a good deal of standards and standard software, and a general willingness in this community to be helpful, things would have been very difficult.


Last modified: Monday, 19-Jun-2006 11:40:52 EDT
