NOTICE:

This Legacy journal article was published in Volume 7, June 1998, and has not been updated since publication.
The HEASARC CFITSIO Data Interface Software

W. Pence (HEASARC)


  1. Introduction

    CFITSIO is the software library developed at the HEASARC to provide a simple and efficient programming interface for reading and writing FITS (Flexible Image Transport System) format data files. CFITSIO currently supports all the features in the official definition of the FITS format. The HEASARC is committed to maintaining CFITSIO in the future to support changes to the FITS standard or new FITS conventions that are widely supported within the astrophysics community.

    A Fortran-77 implementation of the FITSIO library was first developed in 1991, followed by the ANSI-C CFITSIO library in 1996. Recently, a set of Fortran-callable wrapper routines (developed by HEASARC programmer Peter Wilson) has been added to CFITSIO to replace the Fortran FITSIO library. Because these wrapper routines are more efficient and provide new features, the old FITSIO library is no longer supported, and Fortran programmers should switch to CFITSIO instead. The Fortran-callable wrappers in CFITSIO have exactly the same calling sequences and functionality as the FITSIO subroutines, so any existing application program can be linked to CFITSIO instead of FITSIO without modification.

    One of the highest priorities in the design of CFITSIO has been to maximize data throughput, so that data files can be read and written as efficiently as possible. As a result, CFITSIO's performance is often limited only by the I/O rate of the underlying media (e.g., magnetic disk). The actual performance depends on the computer platform and the type of FITS file (image or table), but as a rough guide, CFITSIO can read or write disk files on current-generation workstations at 5 MB/s or more. Substantially faster rates can be achieved with the new CFITSIO features that read or write FITS files in memory instead of on magnetic disk. More detailed measurements of CFITSIO performance on a variety of workstations and file storage media are available from the FITSIO home page.

    The latest version of CFITSIO, as well as an HTML version of the User's Guide and other FITS-related information, can be obtained from the FITSIO home page at:

    http://heasarc.gsfc.nasa.gov/fitsio

    Many significant improvements have been made to CFITSIO recently, so users are encouraged to upgrade to the latest version, v1.42, if they are running an older version or are still using the now-obsolete Fortran FITSIO library. This version has been extensively tested and will remain the official release until it is replaced by v2.0 (soon to be available as a beta release) in late 1998.

  2. Recent Enhancements to CFITSIO

    This section describes some of the recent enhancements to CFITSIO that are available in the current v1.42 release:

    • CFITSIO can now directly read FITS files that have been compressed with the gzip (.gz) or Unix compress (.Z) algorithms. This is especially useful for reading FITS files from on-line data archives or CD-ROMs, which are often in compressed format. Specifying the compression suffix is optional: if one attempts to open a file called 'myfile.fits' and no file with that name exists, CFITSIO will automatically try to open 'myfile.fits.gz' or 'myfile.fits.Z' instead.

    • FITS files may be read from the stdin stream or written to the stdout stream, so FITS files can be piped in memory between tasks in an analysis pipeline. Specifying '-' (a minus sign) as the input file name causes CFITSIO to read the file from stdin; similarly, specifying '-' as the output file name causes the file to be written to stdout. For example, given two tasks that each take an input and an output FITS file, the output file from task1 can be piped to the input of task2 with the following Unix command line:

      > task1 inputfile.fits '-' | task2 '-' outputfile.fits

      where the vertical bar is the Unix pipe operator. This speeds up the overall pipeline because the intermediate FITS file is passed in memory, rather than being written to and then read back from slower magnetic disk.

    • A new CFITSIO function called the 'iterator' provides a powerful way to execute an arbitrary user-supplied 'work' function that operates on rows of data in FITS tables or on pixels in FITS images. Rather than explicitly reading and writing the FITS images or table columns, one calls the iterator routine, passing it the user's work function along with a list of all the table columns or image arrays that are to be passed to that function. The CFITSIO iterator then does all the work of allocating memory for the data arrays, reading the input data from the FITS file, passing the data to the work function, and writing any output data back to the FITS file after the work function returns. Because it is often more efficient to process only a subset of the table rows at a time, the iterator can determine the optimum amount of data to pass in each iteration, calling the work function repeatedly until the entire table has been processed.

      For many applications, this single CFITSIO iterator function can effectively replace all the other CFITSIO routines for reading or writing data in FITS images or tables. Using the iterator has several important advantages over the traditional method of reading and writing FITS data files:

      • It cleanly separates the data I/O from the routine that operates on the data, which leads to a more modular, 'object-oriented' programming style.

      • It simplifies the application program by eliminating the need to allocate memory for the data arrays and by eliminating most of the calls to CFITSIO routines that explicitly read and write the data.

      • It ensures that the data are processed as efficiently as possible. This is especially important for tabular data, since the iterator calculates the most efficient number of table rows to pass to the user's work function on each iteration.

        The iterator concept is well known in object-oriented languages such as C++, but may be unfamiliar to C and Fortran programmers accustomed to a more procedural style. A new section in the User's Guide and several example programs are included with CFITSIO to help users learn how to use the iterator function.

    • The CFITSIO User's Guide has been rewritten to make it easier and faster to learn about and use the CFITSIO routines. The most commonly used routines have been grouped into a single shorter section for easy reference.
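    The compressed-file fallback described in the first item of this list can be sketched in C as follows. This is a minimal illustration under stated assumptions, not CFITSIO source code: the function name open_with_fallback is hypothetical, and the sketch only locates and opens the first existing candidate ('name', then 'name.gz', then 'name.Z'); the decompression step itself is omitted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not CFITSIO source: try 'name' as given, then with
 * the ".gz" and ".Z" suffixes, and return the first file that opens.
 * The name that was actually opened is copied into 'found'. */
FILE *open_with_fallback(const char *name, char *found, size_t foundlen)
{
    static const char *const suffixes[] = { "", ".gz", ".Z" };
    size_t i;

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t need = strlen(name) + strlen(suffixes[i]) + 1;
        FILE *fp;

        if (need > foundlen)
            continue;                 /* candidate name would not fit */
        strcpy(found, name);
        strcat(found, suffixes[i]);
        fp = fopen(found, "rb");
        if (fp != NULL)
            return fp;                /* caller still has to decompress */
    }
    if (foundlen > 0)
        found[0] = '\0';              /* nothing could be opened */
    return NULL;
}
```

    A caller that receives a name ending in '.gz' or '.Z' would then pass the stream to a decompressor before parsing the FITS headers.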
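    The '-' convention for piping FITS files, described in the second item of this list, can be illustrated with a small helper that maps the name '-' to the standard streams. The names open_fits_stream and close_fits_stream are hypothetical and are not part of the CFITSIO interface. Because a pipe cannot be rewound, a real reader would probably have to copy the incoming stream into memory before moving between HDUs; that step is not shown.

```c
#include <stdio.h>

/* Hypothetical helper, not the CFITSIO API: the file name "-" selects
 * stdin for reading or stdout for writing; any other name is opened as
 * an ordinary disk file. */
FILE *open_fits_stream(const char *name, const char *mode)
{
    if (name[0] == '-' && name[1] == '\0')
        return (mode[0] == 'r') ? stdin : stdout;
    return fopen(name, mode);
}

/* Release a stream from open_fits_stream without closing the standard
 * streams, which belong to the shell that set up the pipe. */
int close_fits_stream(FILE *fp)
{
    if (fp == stdout)
        return fflush(fp);            /* push buffered output into the pipe */
    if (fp == stdin)
        return 0;
    return fclose(fp);
}
```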
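    The iterator idea from the list above (the library owns the loop and the memory, while the user supplies only a work function) can be sketched without CFITSIO at all. In this stand-alone illustration the column is an in-memory array, and iterate_column, work_fn, and sum_rows are hypothetical names; the real CFITSIO iterator interface is described in the User's Guide and differs from this sketch. The fixed default chunk size is an arbitrary stand-in for the optimum row count that the CFITSIO iterator computes itself.

```c
/* Hypothetical work-function type: receives the 1-based number of the
 * first row in the chunk, the number of rows, a pointer to that slice
 * of the column, and an opaque user pointer. Return 0 to continue. */
typedef int (*work_fn)(long firstrow, long nrows, const double *col,
                       void *user);

/* Arbitrary stand-in for the chunk size the real iterator would choose. */
#define DEFAULT_ROWS_PER_LOOP 1000L

/* Toy iterator over an in-memory column: the "library" side owns the
 * loop and calls 'work' on successive chunks until all rows are done.
 * Returns the number of calls made, or -1 if the work function fails. */
long iterate_column(const double *col, long nrows, long rows_per_loop,
                    work_fn work, void *user)
{
    long first, calls = 0;

    if (rows_per_loop <= 0)
        rows_per_loop = DEFAULT_ROWS_PER_LOOP;
    for (first = 0; first < nrows; first += rows_per_loop) {
        long n = nrows - first;
        if (n > rows_per_loop)
            n = rows_per_loop;
        if (work(first + 1, n, col + first, user) != 0)
            return -1;
        calls++;
    }
    return calls;
}

/* Example work function: add every value in the chunk to *user. */
int sum_rows(long firstrow, long nrows, const double *col, void *user)
{
    double *total = user;
    long i;

    (void)firstrow;                   /* not needed for a plain sum */
    for (i = 0; i < nrows; i++)
        *total += col[i];
    return 0;
}
```

    For a 10-row column processed 3 rows at a time, iterate_column calls sum_rows four times (3 + 3 + 3 + 1 rows). The work function never allocates memory or performs I/O, which is the separation of concerns the list describes.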

  3. Future Plans

    Development of the next version of CFITSIO, v2.0, is now well underway, and it should be available as a beta release in mid-1998. Despite all the planned changes described below, the new version will remain backward compatible, so existing software can be linked to the new CFITSIO library without any modification to the source code. What follows is a provisional list of some of the major features that will be introduced with this release; users should check the FITSIO home page for the most current information.

    • The low-level routines in CFITSIO that open, close, read, and write files will be replaced by a set of 'plug-in' drivers, one for each supported medium or file type. There will be a driver for FITS files on magnetic disk, as well as drivers for compressed files, files in memory, files on magnetic tape, etc. This change in itself will be largely transparent to CFITSIO users; however, it makes many of the other new features described below much easier to implement. This driver concept was originally conceived and developed for use in CFITSIO by members of the INTEGRAL Science Data Centre (ISDC), namely Jurek Borkowski, Bruce O'Neel, and Don Jennings.

    • Based on this driver concept, new I/O drivers will be provided to read FITS files over the network using the FTP and HTTP protocols. Users need only supply the full ftp or http URL of the input file; CFITSIO will then establish the network connection to the appropriate machine and read the FITS file. Similarly, a new 'ROOT' I/O driver will be installed that allows FITS files to be written as well as read over the network. These new drivers were written by Bruce O'Neel (ISDC).

    • Another I/O driver, being provided by Jurek Borkowski (ISDC), will read and write FITS files in shared memory. In certain environments this should provide much faster data I/O than writing the FITS files to magnetic disk.

    • A new set of routines to support hierarchical groups of data files will be added to CFITSIO. These are being added specifically to help support the INTEGRAL mission, which will typically generate hundreds of data files for each observation (see the "INTEGRAL Data Model" article elsewhere in this volume), but these routines should also be generally useful for other applications that must deal with a large number of related files at one time.

    • The input FITS file specification syntax in CFITSIO is being greatly expanded to support a number of new features similar to those planned for the ASC, XMM, and DAL Data Models, as described in companion articles in this issue. First, the HDU to be opened within a FITS file may be specified, enclosed in square brackets, either by its positional number within the file or by the values of the EXTNAME, EXTVER, and XTENSION keywords in that HDU. For example, entering 'myfile.fits[events,2]' as the name of the input FITS file will cause CFITSIO to open the FITS file called 'myfile.fits' and then move to the extension that has the keywords EXTNAME = 'EVENTS' and EXTVER = 2.

    • The input file syntax will also support on-the-fly filtering of table rows, by entering a boolean selection expression after the file name. For example, entering:

      'myfile.fits[events][pha > 1 && pha < 6]'

      as the name of the file to be opened will cause CFITSIO to create a temporary table that is similar to the EVENTS table extension in myfile.fits but contains only those rows whose PHA value is greater than 1 and less than 6 (i.e., two to five for integer PHA values). Arbitrarily complex boolean expressions may be entered using a C-like syntax. For readers who are familiar with the ftools package, this puts all the power of the 'fselect' task directly within CFITSIO itself, and in many cases it will eliminate the need to generate intermediate FITS tables containing only the selected rows. This is currently implemented in CFITSIO by creating a temporary FITS file (in memory if possible, otherwise on magnetic disk) that contains only the selected rows; this temporary file is then opened and passed to the application program.

    • On-the-fly histogram binning will also be supported and will generate a temporary FITS image by binning the specified column(s) of a FITS table. For example,

      'myfile.fits[events][pha > 1 && pha < 6][bin (X,Y) = 4]'

      will select the rows from the EVENTS table that have PHA in the range two to five and then create a 2-D histogram from the values in the X and Y columns of the table, using a pixel size of 4 in both dimensions. The default range of each axis is taken from the TLMINn and TLMAXn keywords.

    • A number of other filtering options are possible and will probably be added to CFITSIO as time permits, such as filtering on an input Good Time Interval (GTI) file or on a spatial region file.

    • The new I/O driver concept in CFITSIO (described above) has opened up many new possibilities for CFITSIO in the future. For example, new compression schemes that are optimized for tabular data could be implemented, or indices on large tables could be supported to more quickly locate the desired rows in the table. CFITSIO could also be expanded in a major new direction to support other similar data formats besides FITS, such as the IRAF QPOE and IMH formats. While none of these projects are currently planned for v2.0, they could be added in later versions.
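    The plug-in driver idea in the v2.0 plans can be pictured as a table of function pointers selected by the prefix of the file name. The following C sketch is hypothetical: the io_driver structure, the 'mem://' prefix, and the stub open routine are illustrative assumptions, not the actual CFITSIO driver interface, and a real driver would also supply read, write, and close callbacks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver record: a name prefix plus the driver's callbacks.
 * Only 'open' is shown; read, write, and close would follow the same
 * pattern in a real design. */
typedef struct {
    const char *prefix;     /* e.g. "ftp://"; "" matches any name */
    const char *name;
    int (*open)(const char *url);
} io_driver;

/* Placeholder open routine shared by every driver in this sketch. */
static int stub_open(const char *url)
{
    (void)url;
    return 0;
}

/* Order matters: the catch-all disk-file entry must come last. */
static const io_driver drivers[] = {
    { "ftp://",  "ftp",    stub_open },
    { "http://", "http",   stub_open },
    { "mem://",  "memory", stub_open },   /* hypothetical prefix */
    { "",        "file",   stub_open },   /* default: magnetic disk */
};

/* Return the first driver whose prefix matches the start of 'url'. */
const io_driver *find_driver(const char *url)
{
    size_t i;

    for (i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        size_t len = strlen(drivers[i].prefix);
        if (strncmp(url, drivers[i].prefix, len) == 0)
            return &drivers[i];
    }
    return NULL;            /* unreachable while "" is in the table */
}
```

    With this arrangement, an ftp URL selects the FTP driver while a plain name like 'myfile.fits' falls through to the disk-file driver, so supporting a new medium means adding a table entry rather than changing the higher-level routines.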
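    A simplified parser for the bracketed HDU syntax described above (for example 'myfile.fits[events,2]') might look like the sketch below. It is an illustration only: parse_hdu_spec and hdu_spec are hypothetical names, the XTENSION field and any trailing filter expressions are ignored, and numbering the primary HDU as 0 is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified parse result for "file[extname,extver]" or
 * "file[n]"; this is not CFITSIO's own parser. */
typedef struct {
    char file[256];
    char extname[72];   /* empty when the HDU is selected by number */
    int  extver;        /* 0 when no version is given */
    int  hdunum;        /* -1 when the HDU is selected by name */
} hdu_spec;

/* Returns 0 on success, -1 on a malformed or over-long specification. */
int parse_hdu_spec(const char *in, hdu_spec *out)
{
    const char *lb = strchr(in, '[');
    const char *rb;
    char body[80], *end, *comma;
    size_t flen, blen;
    long num;

    memset(out, 0, sizeof *out);
    out->hdunum = -1;
    flen = lb ? (size_t)(lb - in) : strlen(in);
    if (flen >= sizeof out->file)
        return -1;
    memcpy(out->file, in, flen);      /* memset already supplied the '\0' */
    if (lb == NULL) {
        out->hdunum = 0;              /* assumption: default to primary HDU */
        return 0;
    }

    rb = strchr(lb, ']');
    if (rb == NULL)
        return -1;
    blen = (size_t)(rb - lb - 1);
    if (blen == 0 || blen >= sizeof body)
        return -1;
    memcpy(body, lb + 1, blen);
    body[blen] = '\0';

    num = strtol(body, &end, 10);
    if (*end == '\0') {               /* purely numeric: HDU position */
        out->hdunum = (int)num;
        return 0;
    }
    comma = strchr(body, ',');
    if (comma != NULL) {              /* "name,version" form */
        *comma = '\0';
        out->extver = atoi(comma + 1);
    }
    if (strlen(body) >= sizeof out->extname)
        return -1;
    strcpy(out->extname, body);
    return 0;
}
```

    Matching the parsed name against each HDU's EXTNAME keyword is left out; since the example pairs 'events' with EXTNAME = 'EVENTS', that comparison presumably ignores case.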
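    Conceptually, the row filter '[pha > 1 && pha < 6]' described above evaluates a boolean expression for every row and keeps the rows for which it is true. The sketch below hard-codes that one expression as a C predicate; filter_rows and pha_selected are hypothetical names, and parsing arbitrary expression strings (which the text says CFITSIO supports) is deliberately omitted.

```c
#include <stddef.h>

/* Hypothetical predicate equivalent to the expression "pha > 1 && pha < 6";
 * a real implementation would compile the user's expression string. */
int pha_selected(long pha)
{
    return pha > 1 && pha < 6;
}

/* Store the 0-based indices of rows that satisfy 'pred' in 'keep' (which
 * must have room for nrows entries) and return how many rows passed.
 * These rows play the role of the temporary filtered table. */
size_t filter_rows(const long *col, size_t nrows, int (*pred)(long),
                   size_t *keep)
{
    size_t i, n = 0;

    for (i = 0; i < nrows; i++)
        if (pred(col[i]))
            keep[n++] = i;
    return n;
}
```

    For PHA values 0 through 7, the rows with values 2, 3, 4, and 5 are kept, matching the 'two to five' range quoted above.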
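    The arithmetic behind the '[bin (X,Y) = 4]' example above can be written out explicitly. This sketch assumes integer pixel coordinates and an axis range [TLMINn, TLMAXn]; with TLMIN = 1, TLMAX = 16, and a bin size of 4 it yields 4 bins per axis, covering pixels 1-4, 5-8, 9-12, and 13-16. The functions nbins and bin_2d are hypothetical, and CFITSIO's actual edge and rounding conventions may differ.

```c
#include <stddef.h>

/* Number of bins needed to cover integer values tlmin..tlmax inclusive
 * with bins of width 'binsize' (assumed convention). */
long nbins(long tlmin, long tlmax, long binsize)
{
    return (tlmax - tlmin) / binsize + 1;
}

/* Accumulate event counts into 'img', a row-major image of ny rows by
 * nx columns that the caller has zeroed. Events outside the TLMIN/TLMAX
 * range on either axis are skipped. */
void bin_2d(const long *x, const long *y, size_t nev,
            long xmin, long xmax, long ymin, long ymax, long binsize,
            long *img)
{
    long nx = nbins(xmin, xmax, binsize);
    size_t k;

    for (k = 0; k < nev; k++) {
        long i, j;

        if (x[k] < xmin || x[k] > xmax || y[k] < ymin || y[k] > ymax)
            continue;
        i = (x[k] - xmin) / binsize;  /* column index of the image pixel */
        j = (y[k] - ymin) / binsize;  /* row index of the image pixel */
        img[j * nx + i]++;
    }
}
```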
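    Of the possible future filters mentioned above, GTI filtering is the simplest to picture: an event is kept only if its time falls inside one of the good time intervals. The sketch below assumes half-open intervals [start, stop) and a plain linear search; the gti type and in_gti function are hypothetical, and the text does not specify how CFITSIO would implement this option.

```c
#include <stddef.h>

/* Hypothetical good time interval, assumed half-open: [start, stop). */
typedef struct {
    double start;
    double stop;
} gti;

/* Return 1 if time 't' lies inside any of the 'ngti' intervals, else 0. */
int in_gti(double t, const gti *g, size_t ngti)
{
    size_t i;

    for (i = 0; i < ngti; i++)
        if (t >= g[i].start && t < g[i].stop)
            return 1;
    return 0;
}
```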

  4. Summary and Conclusions

    As can be seen from the above examples, CFITSIO is undergoing extensive development that will provide many new features in the near future. These developments are occurring so rapidly that it is difficult to anticipate what will be available more than a few months in advance. This in turn makes it difficult for other developers of data interfaces that use CFITSIO to assess how the evolution of CFITSIO might affect their own development plans. Currently there is a certain amount of healthy competition between the various data interface developers as each tries to provide the most powerful and convenient features for the end user. Eventually, it is hoped and expected, based on previous cooperative developments within the high energy astrophysics community, that the best features from the differing data interfaces will emerge as common standards that will provide a uniform data interface to users, regardless of which data analysis system is being used.

    Last modified: Monday, 19-Jun-2006 11:40:52 EDT