CFITSIO and Data Compression

CFITSIO supports data compression on 2 different levels:
  • External file compression
  • Internal tiled-image compression
The imcopy example program that is included in the CFITSIO distribution and is also available in the FITS tools suite of utility programs can be used to compress any existing FITS image into the tile-compressed format, or to uncompress a tile-compressed image into a normal FITS image. This program can be used to experiment with the various compression options on existing images.

External File Compression

In the first type of data compression, the entire FITS file may be externally compressed with the gzip or Unix compress algorithm, producing a *.gz or *.Z file, respectively. When reading compressed files of this type, CFITSIO first uncompresses the entire file into memory before performing the requested read operations. Output files can be directly written in the gzip compressed format if the user-specified filename ends with `.gz'. In this case, CFITSIO initially writes the uncompressed file in memory and then compresses it and writes it to disk when the FITS file is closed, thus saving user disk space. Read and write access to these compressed FITS files is generally quite fast; the main limitation is that there must be enough available memory (or swap space) to hold the entire uncompressed FITS file.

Internal Tiled-Image Compression

CFITSIO also supports the tiled image compression format in which the image is divided into a grid of rectangular tiles, and each tile of pixels is individually compressed. The compressed tiles are stored in rows of a variable length array column in a FITS binary table, but CFITSIO recognizes that the binary table extension contains an image and treats it as if it were an IMAGE extension. This tile-compressed format is especially well suited for compressing very large images because a) the FITS header keywords remain uncompressed for rapid read access, and because b) it is possible to extract and uncompress sections of the image without having to uncompress the entire image. This format is also much more effective in compressing floating point images (using a lossy compression algorithm) than simply compressing the image using gzip or compress.

A detailed description of this compressed image format can be found in the Registry of FITS Conventions.

The N-dimensional FITS image can be divided into any desired rectangular grid of compression tiles. By default the tiles are chosen to correspond to the rows of the image, each containing NAXIS1 pixels. For example, a 800 x 800 x 4 pixel data cube would be divided in to 3200 tiles containing 800 pixels each by default. Alternatively, this data cube could be divided into 256 tiles that are each 100 X 100 X 1 pixels in size, or 4 tiles containing 800 x 800 X 1 pixels, or a single tile containing the entire data cube. Note that the image dimensions are not required to be an integer multiple of the tile dimensions, so, for example, this data cube could also be divided into 250 X 200 pixel tiles, in which case the last tile in each row would only contain 50 X 200 pixels.

Tile Compression Algorithms

Currently, 4 image compression algorithms are supported: Rice, GZIP, HCompress, and PLIO. Rice, GZIP, and HCompress are general purpose algorithms that can be used to compress almost any image. The PLIO algorithm, on the other hand, is more specialized and was developed for use in IRAF to store pixel data quality masks. It is designed to only work on images containing positive integers with values up to about 2**24. Other image compression algorithms may be supported in the future.

The supported image compression algorithms are all 'loss-less' when applied to integer FITS images; the pixel values are preserved exactly with no loss of information during the compression and uncompression process. Floating point FITS images (which have BITPIX = -32 or -64) are first quantized into scaled integer pixel values before being compressed. This technique produces much higher compression factors than simply using GZIP to compress the image, but it also means that the original floating value pixel values may not be precisely returned when the image is uncompressed. When done properly, this only discards the 'noise' from the floating point values without losing any significant information. The amount of noise that is discarded can be controlled by the 'noise_bits' compression parameter.

Using Tile-Compressed Images

No special action is required to read tile-compressed images because all the CFITSIO routines that read normal uncompressed FITS images can also read images in the tile-compressed format; CFITSIO essentially treats the binary table that contains the compressed tiles as if it were an IMAGE extension.

When creating (writing) a new image with CFITSIO, a normal uncompressed FITS primary array or IMAGE extension will be written unless the tile-compressed format has been specified at the programmer level (see the CFITSIO User's Reference Guide) or at run time by the user as follows:

When specifying the name of the output FITS file to be created at run time, the user can indicate that images should be written in tile-compressed format by enclosing the compression parameters in square brackets following the root disk file name. Here are some examples of the extended file name syntax for specifying tile-compressed output images:

    myfile.fit[compress]  - use the default compression algorithm (Rice)
                            and the default tile size (row by row)

    myfile.fit[compress GZIP] - use the specified compression algorithm;
    myfile.fit[compress Rice]    (only the first letter of the algorithm 
    myfile.fit[compress PLIO]     name is required)
    myfile.fit[compress HCOMPRESS]
    
    myfile.fit[compress R 100,100]   - use 100 x 100 pixel tile size
    myfile.fit[compress R 100,100;2] - as above, and use noisebits = 2


Return to the FITSIO home page.


HEASARC Home | Observatories | Archive | Calibration | Software | Tools | Students/Teachers/Public

Last modified: Tuesday, 10-Nov-2015 15:26:24 EST

HEASARC Staff Scientist Position - Applications are now being accepted for a Staff Scientist with significant experience and interest in the technical aspects of astrophysics research, to work in the High Energy Astrophysics Science Archive Research Center (HEASARC) at NASA Goddard Space Flight Center (GSFC) in Greenbelt, MD. Refer to the AAS Job register for full details.