next up previous contents FITSIO Home
Next: 5.7 ASCII and Binary Up: 5. Basic CFITSIO Interface Previous: 5.5 Primary Array or   Contents

5.6 Image Compression

CFITSIO transparently supports the 2 methods of image compression described below.

1) The entire FITS file may be externally compressed with the gzip or Unix compress utility programs, producing a *.gz or *.Z file, respectively. When reading compressed files of this type, CFITSIO first uncompresses the entire file into memory before performing the requested read operations. Output files can be directly written in the gzip compressed format if the user-specified filename ends with `.gz'. In this case, CFITSIO initially writes the uncompressed file in memory and then compresses it and writes it to disk when the FITS file is closed, thus saving user disk space. Read and write access to these compressed FITS files is generally quite fast since all the I/O is performed in memory; the main limitation with this technique is that there must be enough available memory (or swap space) to hold the entire uncompressed FITS file.

2) CFITSIO also supports the FITS tiled image compression convention in which the image is subdivided into a grid of rectangular tiles, and each tile of pixels is individually compressed. The details of this FITS compression convention are described at the FITS Support Office web site at http://fits.gsfc.nasa.gov/fits_registry.html Basically, the compressed image tiles are stored in rows of a variable length array column in a FITS binary table, however CFITSIO recognizes that this binary table extension contains an image and treats it as if it were an IMAGE extension. This tile-compressed format is especially well suited for compressing very large images because a) the FITS header keywords remain uncompressed for rapid read access, and because b) it is possible to extract and uncompress sections of the image without having to uncompress the entire image. This format is also much more effective in compressing floating point images than simply compressing the image using gzip or compress because it approximates the floating point values with scaled integers which can then be compressed more efficiently.

Currently CFITSIO supports 3 general purpose compression algorithms plus one other special-purpose compression technique that is designed for data masks with positive integer pixel values. The 3 general purpose algorithms are GZIP, Rice, and HCOMPRESS, and the special purpose algorithm is the IRAF pixel list compression technique (PLIO). In principle, any number of other compression algorithms could also be supported by the FITS tiled image compression convention.

The FITS image can be subdivided into any desired rectangular grid of compression tiles. With the GZIP, Rice, and PLIO algorithms, the default is to take each row of the image as a tile. The HCOMPRESS algorithm is inherently 2-dimensional in nature, so the default in this case is to take 16 rows of the image per tile. In most cases it makes little difference what tiling pattern is used, so the default tiles are usually adequate. In the case of very small images, it could be more efficient to compress the whole image as a single tile. Note that the image dimensions are not required to be an integer multiple of the tile dimensions; if not, then the tiles at the edges of the image will be smaller than the other tiles.

The 4 supported image compression algorithms are all 'loss-less' when applied to integer FITS images; the pixel values are preserved exactly with no loss of information during the compression and uncompression process. In addition, the HCOMPRESS algorithm supports a 'lossy' compression mode that will produce larger amount of image compression. This is achieved by specifying a non-zero value for the HCOMPRESS ``scale'' parameter. Since the amount of compression that is achieved depends directly on the RMS noise in the image, it is usually more convention to specify the HCOMPRESS scale factor relative to the RMS noise. Setting s = 2.5 means use a scale factor that is 2.5 times the calculated RMS noise in the image tile. In some cases it may be desirable to specify the exact scaling to be used, instead of specifying it relative to the calculated noise value. This may be done by specifying the negative of desired scale value (typically in the range -2 to -100).

Very high compression factors (of 100 or more) can be achieved by using large HCOMPRESS scale values, however, this can produce undesirable ``blocky'' artifacts in the compressed image. A variation of the HCOMPRESS algorithm (called HSCOMPRESS) can be used in this case to apply a small amount of smoothing of the image when it is uncompressed to help cover up these artifacts. This smoothing is purely cosmetic and does not cause any significant change to the image pixel values.

Floating point FITS images (which have BITPIX = -32 or -64) usually contain too much ``noise'' in the least significant bits of the mantissa of the pixel values to be effectively compressed with any lossless algorithm. Consequently, floating point images are first quantized into scaled integer pixel values (and thus throwing away much of the noise) before being compressed with the specified algorithm (either GZIP, Rice, or HCOMPRESS). This technique produces much higher compression factors than simply using the GZIP utility to externally compress the whole FITS file, but it also means that the original floating value pixel values are not exactly preserved. When done properly, this integer scaling technique will only discard the insignificant noise while still preserving all the real information in the image. The amount of precision that is retained in the pixel values is controlled by the "quantization level" parameter, q. Larger values of q will result in compressed images whose pixels more closely match the floating point pixel values, but at the same time the amount of compression that is achieved will be reduced. Users should experiment with different values for this parameter to determine the optimal value that preserves all the useful information in the image, without needlessly preserving all the ``noise'' which will hurt the compression efficiency.

The default value for the quantization scale factor is 16., which means that scaled integer pixel values will be quantized such that the difference between adjacent integer values will be 1/16th of the noise level in the image background. CFITSIO uses an optimized algorithm to accurately estimate the noise in the image. As an example, if the RMS noise in the background pixels of an image = 32.0, then the spacing between adjacent scaled integer pixel values will equal 2.0 by default. Note that the RMS noise is independently calculated for each tile of the image, so the resulting integer scaling factor may fluctuate slightly for each tile. In some cases it may be desirable to specify the exact quantization level to be used, instead of specifying it relative to the calculated noise value. This may be done by specifying the negative of desired quantization level for the value of q. In the previous example, one could specify q = -2.0 so that the quantized integer levels differ by 2.0. Larger negative values for q means that the levels are more coarsely spaced, and will produce higher compression factors.

There are 2 methods for specifying all the parameters needed to write a FITS image in the tile compressed format. The parameters may either be specified at run time as part of the file name of the output compressed FITS file, or the writing program may call a set of helper CFITSIO subroutines that are provided for specifying the parameter values, as described below:

1) At run time, when specifying the name of the output FITS file to be created, the user can indicate that images should be written in tile-compressed format by enclosing the compression parameters in square brackets following the root disk file name in the following format:

    [compress NAME T1,T2; q QLEVEL, s HSCALE]
where

    NAME   = algorithm name:  GZIP, Rice, HCOMPRESS, HSCOMPRSS or PLIO
             may be abbreviated to the first letter (or HS for HSCOMPRESS)
    T1,T2  = tile dimension (e.g. 100,100 for square tiles 100 pixels wide)
    QLEVEL = quantization level for floating point FITS images
    HSCALE = HCOMPRESS scale factor; default = 0 which is lossless.

Here are a few examples of this extended syntax:

    myfile.fit[compress]    - use the default compression algorithm (Rice)
                              and the default tile size (row by row)

    myfile.fit[compress GZIP] - use the specified compression algorithm;
    myfile.fit[compress Rice]     only the first letter of the algorithm
    myfile.fit[compress PLIO]     name is required.
    myfile.fit[compress HCOMP]

    myfile.fit[compress R 100,100]   - use Rice and 100 x 100 pixel tiles

    myfile.fit[compress R; q 10.0] - quantization level = (RMS-noise) / 10.
    myfile.fit[compress HS; s 2.0]  -  HSCOMPRESS (with smoothing)
                                          and scale = 2.0 * RMS-noise

2) Before calling the CFITSIO routine to write the image header keywords (e.g., fits_create_image) the programmer can call the routines described below to specify the compression algorithm and the tiling pattern that is to be used. There are routines for specifying the various compression parameters and similar routines to return the current values of the parameters:

  int fits_set_compression_type(fitsfile *fptr, int comptype, int *status)
  int fits_set_tile_dim(fitsfile *fptr, int ndim, long *tilesize, int *status)
  int fits_set_quantize_level(fitsfile *fptr, float qlevel, int *status)
  int fits_set_hcomp_scale(fitsfile *fptr, float scale, int *status)
  int fits_set_hcomp_smooth(fitsfile *fptr, int smooth, int *status)
              Set smooth = 1 to apply smoothing when uncompressing the image

  int fits_get_compression_type(fitsfile *fptr, int *comptype, int *status)
  int fits_get_tile_dim(fitsfile *fptr, int ndim, long *tilesize, int *status)
  int fits_get_quantize_level(fitsfile *fptr, float *level, int *status)
  int fits_get_hcomp_scale(fitsfile *fptr, float *scale, int *status)
  int fits_get_hcomp_smooth(fitsfile *fptr, int *smooth, int *status)
4 symbolic constants are defined for use as the value of the `comptype' parameter: GZIP_1, RICE_1, HCOMPRESS_1 or PLIO_1. Entering NULL for comptype will turn off the tile-compression and cause normal FITS images to be written.

No special action is required by software when read tile-compressed images because all the CFITSIO routines that read normal uncompressed FITS images also transparently read images in the tile-compressed format; CFITSIO essentially treats the binary table that contains the compressed tiles as if it were an IMAGE extension.

The following 2 routines are available for compressing or or decompressing an image:

  int fits_img_compress(fitsfile *infptr, fitsfile *outfptr, int *status);
  int fits_img_decompress (fitsfile *infptr, fitsfile *outfptr, int *status);
Before calling the compression routine, the compression parameters must first be defined in one of the 2 way described in the previous paragraphs. There is also a routine to determine if the current HDU contains a tile compressed image (it returns 1 or 0):

  int fits_is_compressed_image(fitsfile *fptr, int *status);
A small example program called 'imcopy' is included with CFITSIO that can be used to compress (or uncompress) any FITS image. This program can be used to experiment with the various compression options on existing FITS images as shown in these examples:

1)  imcopy infile.fit 'outfile.fit[compress]'

       This will use the default compression algorithm (Rice) and the
       default tile size (row by row)

2)  imcopy infile.fit 'outfile.fit[compress GZIP]'

       This will use the GZIP compression algorithm and the default
       tile size (row by row).  The allowed compression algorithms are
       Rice, GZIP, and PLIO.  Only the first letter of the algorithm
       name needs to be specified.

3)  imcopy infile.fit 'outfile.fit[compress G 100,100]'

       This will use the GZIP compression algorithm and 100 X 100 pixel
       tiles.

4)  imcopy infile.fit 'outfile.fit[compress R 100,100; q 10.0]'

       This will use the Rice compression algorithm, 100 X 100 pixel
       tiles, and quantization level = RMSnoise / 10.0 (assuming the
       input image has a floating point data type).

5)  imcopy infile.fit outfile.fit

       If the input file is in tile-compressed format, then it will be
       uncompressed to the output file.  Otherwise, it simply copies
       the input image to the output image.

6)  imcopy 'infile.fit[1001:1500,2001:2500]'  outfile.fit

       This extracts a 500 X 500 pixel section of the much larger
       input image (which may be in tile-compressed format).  The
       output is a normal uncompressed FITS image.

7)  imcopy 'infile.fit[1001:1500,2001:2500]'  outfile.fit.gz

       Same as above, except the output file is externally compressed
       using the gzip algorithm.


next up previous contents FITSIO Home
Next: 5.7 ASCII and Binary Up: 5. Basic CFITSIO Interface Previous: 5.5 Primary Array or   Contents