
NOTICE:

This Legacy journal article was published in Volume 7, June 1998, and has not been updated since publication. Please use the search facility above to find regularly-updated information about this topic elsewhere on the HEASARC site.

The XMM Data Model
C. Page (Leicester University)

The XMM X-ray observatory is scheduled for launch by the European Space Agency in August 1999. The science analysis software is being developed jointly by the XMM Science Operations Centre (SOC) based at ESTEC, and the XMM Survey Science Centre (SSC) which is a consortium of research establishments in France, Germany, and the UK, led by the University of Leicester.

Structure of the XMM Data Model
The XMM project is using a relatively simple data model: this can easily be represented by a FITS file, but other data formats are by no means ruled out. The description below is given using a generic terminology, but the FITS equivalent, where different, is shown in parentheses.

Data are organized in the form of data sets. A data set (file) is a collection of blocks. A block (header-data-unit) is either an array or a table; each block has a name. An array (primary image or image extension) is an N-dimensional array (N up to 4) of scalars, all of the same data type. The data types supported are: 8-bit logical, 8-bit, 16-bit, or 32-bit integer, and 32-bit and 64-bit reals.

A table (binary or ASCII) is a collection of columns, where all the columns in a table have the same number of rows. A column is a vector of a particular data type: it may be a character string, or a scalar or N-dimensional array (N up to 4) of 8-bit logical, 8-bit, 16-bit or 32-bit integer, or 32-bit or 64-bit real. Each column has a name; numerical columns may also have physical units attached. A value in a column is called a cell (field).

Datasets, tables, arrays, and columns may have attributes attached to them. An attribute (FITS keyword) has a name and value, and optionally physical units and a comment string. History records are also supported.
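As an illustration only, the model described above can be sketched as a handful of C++ types (assuming C++17 for std::variant). Every type and member name here is hypothetical and is not taken from the DAL; a single real type stands in for the full list of supported cell types, and history records are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// All names are hypothetical: this mirrors the data model in the text,
// not the actual DAL classes.

struct Attribute {                 // FITS keyword
    std::string name;
    std::string value;
    std::string units;             // optional, may be empty
    std::string comment;           // optional, may be empty
};

struct Column {
    std::string name;
    std::string units;             // only meaningful for numerical columns
    std::vector<double> cells;     // one real type stands in for all cell types
    std::vector<Attribute> attributes;
};

struct Table {                     // binary or ASCII table
    std::vector<Column> columns;
    std::vector<Attribute> attributes;

    std::size_t rows() const {
        return columns.empty() ? 0 : columns.front().cells.size();
    }
    // The model requires every column to have the same number of rows.
    bool consistent() const {
        for (const Column& c : columns)
            if (c.cells.size() != rows()) return false;
        return true;
    }
};

struct Array {                     // primary image or image extension
    std::vector<std::size_t> shape;
    std::vector<double> values;
    std::vector<Attribute> attributes;

    // Product of the extents along each dimension.
    std::size_t elementCount() const {
        assert(shape.size() <= 4);  // the model allows at most 4 dimensions
        std::size_t n = 1;
        for (std::size_t extent : shape) n *= extent;
        return n;
    }
};

struct Block {                     // header-data-unit
    std::string name;
    std::variant<Array, Table> content;  // a block is either an array or a table
};

struct DataSet {                   // file
    std::vector<Block> blocks;
    std::vector<Attribute> attributes;

    // Look a block up by name; returns nullptr when no block matches.
    const Block* find(const std::string& blockName) const {
        for (const Block& b : blocks)
            if (b.name == blockName) return &b;
        return nullptr;
    }
};
```

A data set holding an 'events' table would then be a DataSet with one Block whose content is a Table, and find("events") would return a pointer to that block.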

This simple data model appears to be adequate to represent all the data structures needed in our astronomical data analysis; it follows quite naturally from the data structures which have been devised for ROSAT and ASCA, and should also be compatible with those being developed for AXAF and INTEGRAL. Some of the latter include support for more advanced features such as variable-length fields in tables, data sub-space and filtering syntax, dataset grouping, and indexing structures. At present the XMM data model does not have these built in (there are specific tools to take care of data filtering), but they may be the subject of future enhancements.

Where the XMM data model differs more notably is in its implementation, which is designed to avoid various problems experienced in past missions.
Data Processing Issues
A sequence of quite complex data reduction processes is needed to turn raw telemetry data from an X-ray telescope into useful scientific products such as images, spectra, and light-curves. The strategy adopted by most recent missions (including RXTE, ASCA, and BeppoSAX) is to split the data reduction process into several relatively independent programs, with FITS files used to transfer data from each program to the next. This modularity simplifies software development and testing, while allowing maximum flexibility of use. It often turns out that some stages can be handled by existing programs, such as FTOOLS, which further reduces the need to develop new software. This approach has many attractions, but is not without its drawbacks.

Firstly, it is very I/O intensive, since each stage writes a FITS file to disc, which the next stage has to read back in. I/O operations are relatively slow: a random seek on a modern disc drive takes around 10 ms, but this has to be compared to random access to a memory location, which takes only about 50 ns. This ratio, which in the worst case can be around 200,000:1, makes it highly desirable to avoid I/O whenever possible, and especially to avoid re-reading anything that could have been saved in memory.
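The ratio quoted above follows directly from the two timings. The short C++ sketch below works through the arithmetic; the constant and function names are illustrative only, and the million-access workload in the note that follows is an assumed example rather than a measurement.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the text, both expressed in nanoseconds:
// about 10 ms per random disc seek, about 50 ns per random memory access.
constexpr std::int64_t kDiscSeekNs = 10'000'000;
constexpr std::int64_t kMemoryAccessNs = 50;

// Worst-case cost ratio between one disc seek and one memory access.
constexpr std::int64_t accessRatio() {
    return kDiscSeekNs / kMemoryAccessNs;
}

// Total time in nanoseconds for `count` random accesses at `costNs` each.
std::int64_t totalNs(std::int64_t count, std::int64_t costNs) {
    assert(count >= 0 && costNs >= 0);
    return count * costNs;
}

static_assert(accessRatio() == 200'000, "10 ms / 50 ns");
```

For an assumed one million random accesses, totalNs gives 10^13 ns (10,000 s, or nearly three hours) at disc-seek cost against 5 x 10^7 ns (0.05 s) at memory cost.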
Secondly, the FITS file structure was designed for data interchange, and not for I/O efficiency. FITS data are stored in blocks of 2880 bytes, a length poorly matched to current disc sector sizes. Tables contain numbers which are generally not aligned on whole-word boundaries, nor are successive elements of a column stored contiguously. Even in binary tables the metadata are stored as text strings in the FITS header records and have to be converted to and from binary form. All these increase the cost of reading and writing FITS files.
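Both layout points can be made concrete with a little arithmetic. The C++ sketch below pads a byte count up to whole 2880-byte FITS blocks and computes where a cell falls in a table stored row by row. The names are hypothetical, and offsets are measured from the start of the table data rather than from the start of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

constexpr std::size_t kFitsBlock = 2880;   // FITS block length in bytes

// Round a byte count up to a whole number of FITS blocks.
std::size_t paddedSize(std::size_t nbytes) {
    return (nbytes + kFitsBlock - 1) / kFitsBlock * kFitsBlock;
}

// A table stored in row order: each row holds one cell from every column,
// packed back to back with no alignment padding.
struct RowLayout {
    std::vector<std::size_t> widths;       // bytes per cell, one entry per column

    std::size_t rowBytes() const {
        return std::accumulate(widths.begin(), widths.end(), std::size_t{0});
    }

    // Byte offset of the cell at (row, col) from the start of the table data.
    std::size_t cellOffset(std::size_t row, std::size_t col) const {
        assert(col < widths.size());
        const std::size_t withinRow = std::accumulate(
            widths.begin(),
            widths.begin() + static_cast<std::ptrdiff_t>(col),
            std::size_t{0});
        return row * rowBytes() + withinRow;
    }
};
```

For an assumed table with a 4-byte real, a 2-byte integer and an 8-byte real per row, each row occupies 14 bytes. The 8-byte real starts at offset 6 in row 0 and offset 20 in row 1, neither a multiple of 8, and successive elements of that column sit 14 bytes apart rather than contiguously.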

Thirdly, the FITSIO library is rather complicated, because it has to handle FITS files in all their glory. As a result, applications built directly upon FITSIO are somewhat more complicated than really necessary, given the simplicity of the basic data model. These complications could be hidden from the application programmer by using a data access layer based directly on the data model concepts.
The XMM Data Access Layer
In an attempt to overcome these problems, the SOC designed and implemented a data access layer (DAL) which sits between the applications software and CFITSIO. This interface can be called from programs written in Fortran90 and takes advantage of its powerful array-handling facilities. The XMM project decided to adopt Fortran90 as its applications programming language because of its compatibility with existing Fortran77 code, but the DAL itself is coded in C++ and can also be called from programs written in C++.

The program fragment below shows the simplification possible in application code. It demonstrates how to access a data set with a table called 'events' that contains a column 'x'.

    type(DataSetT) :: set
    type(ArrayT) :: arr
    real(kind=real32), dimension(:), pointer :: x

    ! Open the data set, then locate the table and the column within it
    set = dataSet("test.dat",READ)
    tab = table(set,"events",READ)
    xCol = column(tab,"x",READ)

    ! Point x at the column data and print the first 100 values
    x => real32Data(xCol)
    write(*,*) x(1:100)

    ! Release the handles in reverse order of acquisition
    call release(xCol)
    call release(tab)
    call release(set)

An important feature of the DAL is that it makes maximum use of central memory and encourages the processing of data column-by-column rather than row-by-row. This can be handled efficiently when the required data can all be held in memory, a situation which we expect to be true in most cases, now that memory is relatively cheap. In cases when the overheads of handling the FITS structure are considered excessive, the DAL can also read and write files in its own internal format, which is essentially a memory dump of its internal data structures. This is only intended for use as an interface between two tasks where the intermediate file does not need to be retained for further use.

It is also possible to eliminate I/O altogether between two tasks: the top-level routines can be linked into a single executable and a dataset transferred from one to the other merely by passing a dataset pointer across. It is worth pointing out that the XMM DAL is based on CFITSIO, which can now handle data transfers to/from Unix pipes. This represents a further way of avoiding physical I/O.
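The idea of handing a dataset from one task to the next through a pointer can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the DAL's mechanism: EventSet, shiftTask, countInRange and runPipeline are hypothetical names, and a shared pointer to two in-memory columns stands in for a DAL dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical in-memory dataset: each column is one contiguous vector.
struct EventSet {
    std::vector<float> x;
    std::vector<float> y;
};
using EventSetPtr = std::shared_ptr<EventSet>;

// Task 1: shift the whole x column in place, working column-wise.
EventSetPtr shiftTask(EventSetPtr set, float dx) {
    assert(set);
    for (float& value : set->x) value += dx;
    return set;                    // the same object is handed on, not a copy
}

// Task 2: count the events whose x lies in the closed interval [lo, hi].
std::size_t countInRange(const EventSetPtr& set, float lo, float hi) {
    assert(set);
    std::size_t count = 0;
    for (float value : set->x)
        if (value >= lo && value <= hi) ++count;
    return count;
}

// Both tasks linked into one executable: the dataset passes from the first
// to the second by pointer, with no intermediate file written or read.
std::size_t runPipeline(EventSetPtr set, float dx, float lo, float hi) {
    return countInRange(shiftTask(std::move(set), dx), lo, hi);
}
```

Because shiftTask returns the pointer it was given, the second task sees the modified column directly; the only cost of the hand-over is copying a pointer.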
The XMM DAL in Practice
The SSC Consortium has been using pre-release versions of the DAL over the last few months, and many applications have already been successfully built with it and tested. As a result we now have some confidence in the basic DAL infrastructure, and in the ability of our programmers to adapt to the programming style which it requires. However we have not yet had time to evaluate the more advanced options such as use of the internal file format, nor the direct transfers of datasets from one task to another via memory pointers. The interactive data analysis software and infrastructure will, of course, be available to all XMM users: the first external release is scheduled for early 1999, with a further release just before launch.
Potential users will want to know which platforms are supported. The official ESA policy is that the XMM software will be supported only on Solaris, but we are doing our best to ensure that it will also run on other Unix systems such as PC/Linux and Alpha/Digital-Unix (indeed Linux is being used extensively for software development). Although Fortran90 and C++ can each be used to build very portable software, connections between the two are still rather compiler-specific. At present the software depends upon the GNU C++ compiler and NAG (or NAGACE) Fortran compilers, but some experiments have been carried out using other compiler combinations. In the longer term it is worth noting that one of the important features expected to appear in Fortran-2000 is a standard way of interfacing Fortran with C and other languages.

We are also considering the best ways of coping with large data files on systems with little memory. Much of the simplicity of the current software arises from the fact that data are accessed column by column, whereas FITS tables are stored in row order. This works well when there is enough physical memory available; otherwise the performance drops sharply. One solution would be to make use of an iterator mechanism, some support for which is already present in the DAL. Essentially this requires the application to include a subroutine which processes a single row of data: this routine is called repeatedly by the DAL. The DAL transparently handles the reading or writing of data in chunks, the size of which is matched to the amount of memory available (and may consist of an entire column). We have not yet had time to exercise the iterator mechanism in realistic applications, so cannot yet comment on whether the performance gain is worthwhile, given that there is some additional coding effort.
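The general shape of such an iterator can be sketched in C++ as below. This is an assumption-laden illustration, not the DAL's or CFITSIO's interface: iterateColumn and RowFn are hypothetical names, and an in-memory vector stands in for a column that a real layer would read from and write to disc one chunk at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Application-supplied routine that processes a single row.
using RowFn = std::function<void(std::size_t row, double& cell)>;

// Walk a column in chunks of at most `chunkRows` rows, calling `work` once
// per row. Returns the number of chunks processed.
std::size_t iterateColumn(std::vector<double>& column,
                          std::size_t chunkRows,
                          const RowFn& work) {
    assert(chunkRows > 0);
    std::size_t chunks = 0;
    for (std::size_t start = 0; start < column.size(); start += chunkRows) {
        const std::size_t stop = std::min(start + chunkRows, column.size());
        // A real access layer would load rows [start, stop) from disc here.
        for (std::size_t row = start; row < stop; ++row) {
            work(row, column[row]);
        }
        // ... and write any modified rows back before moving on.
        ++chunks;
    }
    return chunks;
}
```

With a ten-row column and chunkRows set to 4, the callback runs ten times across three chunks; setting chunkRows to the column length or more processes the entire column as a single chunk. The application code is the same in both cases, which is the point of the mechanism.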
It is interesting to note, however, that an iterator mechanism has also been introduced in the latest version of CFITSIO, and we would be pleased to hear of experiences of its use.

[Clive Page's e-mail address is cgp@star.le.ac.uk]