CFITSIO now transparently supports 2 methods of image compression:
1) The entire FITS file may be externally compressed with the gzip or Unix compress algorithm, producing a *.gz or *.Z file, respectively. When reading compressed files of this type, CFITSIO first uncompresses the entire file into memory before performing the requested read operations. Output files can be directly written in the gzip compressed format if the user-specified filename ends with `.gz'. In this case, CFITSIO initially writes the uncompressed file in memory and then compresses it and writes it to disk when the FITS file is closed, thus saving user disk space. Read and write access to these compressed FITS files is generally quite fast; the main limitation is that there must be enough available memory (or swap space) to hold the entire uncompressed FITS file.
2) CFITSIO also supports a newer image compression format in which the image is divided into a grid of rectangular tiles, and each tile of pixels is individually compressed. The compressed tiles are stored in rows of a variable length array column in a FITS binary table, but CFITSIO recognizes that the binary table extension contains an image and treats it as if it were an IMAGE extension. This tile-compressed format is especially well suited for compressing very large images because a) the FITS header keywords remain uncompressed for rapid read access, and because b) it is possible to extract and uncompress sections of the image without having to uncompress the entire image. This format is also much more effective in compressing floating point images (using a lossy compression algorithm) than simply compressing the image using gzip or compress.
A detailed description of this format is available at:
http://heasarc.gsfc.nasa.gov/docs/software/fitsio/
compression/compress_image.html
The N-dimensional FITS image can be divided into any desired rectangular grid of compression tiles. By default the tiles are chosen to correspond to the rows of the image, each containing NAXIS1 pixels, (except when using HCOMPRESS, where the default is to compress the entire image as a single big tile). For example, a 800 x 800 x 4 pixel data cube would be divided in to 3200 tiles containing 800 pixels each by default. Alternatively, this data cube could be divided into 256 tiles that are each 100 X 100 X 1 pixels in size, or 4 tiles containing 800 x 800 X 1 pixels, or a single tile containing the entire data cube. Note that the image dimensions are not required to be an integer multiple of the tile dimensions, so, for example, this data cube could also be divided into 250 X 200 pixel tiles, in which case the last tile in each row would only contain 50 X 200 pixels.
Currently, 4 image compression algorithms are supported: Rice, GZIP, HCOMPRESS, and PLIO. Rice and GZIP are general purpose algorithms that can be used to compress almost any image. HCOMPRESS is very effective for astronomical images. The PLIO algorithm is more specialized and was developed for use in the IRAF data analysis system to store pixel data quality masks. It is designed to only work on images containing positive integers with values up to about 2**24. Other image compression algorithms may be supported in the future.
The 4 supported image compression algorithms are all 'loss-less' when applied to integer FITS images; the pixel values are preserved exactly with no loss of information during the compression and uncompression process. The HCOMPRESS algorithm can also be used in a 'lossy' mode to achieve a greater degree of image compression (by specifying a 'scale' parameter value greater than 1). To set the scale parameter effectively, one must know the RMS noise in the image pixel values; The fits_rms_float and fits_rms_short routines in CFITSIO are provided to estimate the RMS value in an image.
Floating point FITS images (which have BITPIX = -32 or -64) are first quantized into scaled integer pixel values before being compressed. This technique produces much higher compression factors than, for example, simply using GZIP to compress the whole FITS file image, but it also means that the original floating value pixel values may not be precisely returned when the image is uncompressed. When done properly, this only discards the 'noise' from the floating point values without losing any significant information. The amount of noise that is discarded can be controlled by the 'noise_bits' compression parameter.
No special action is required to read tile-compressed images because all the CFITSIO routines that read normal uncompressed FITS images can also read images in the tile-compressed format; CFITSIO essentially treats the binary table that contains the compressed tiles as if it were an IMAGE extension.
When creating (writing) a new image with CFITSIO, a normal uncompressed FITS primary array or IMAGE extension will be written unless the tile-compressed format has been specified in 1 of 2 possible ways:
1) At run time, when specifying the name of the output FITS file to be created at run time, the user can indicate that images should be written in tile-compressed format by enclosing the compression parameters in square brackets following the root disk file name. The `imcopy' example program that included with the CFITSIO distribution can be used for this purpose to compress or uncompress images. Here are some examples of the extended file name syntax for specifying tile-compressed output images:
myfile.fit[compress] - use the default compression algorithm (Rice)
and the default tile size (row by row)
myfile.fit[compress GZIP] - use the specified compression algorithm;
myfile.fit[compress Rice] only the first letter of the algorithm
myfile.fit[compress PLIO] name is required.
myfile.fit[compress HCOMP]
myfile.fit[compress R 100,100] - use Rice, GZIP, or HCOMPRESS compression
myfile.fit[compress G 100,100] and 100 x 100 pixel tile size
myfile.fit[compress H 100,100]
myfile.fit[compress R 100,100;2] - Rice or GZIP, use noisebits = 2
myfile.fit[compress G 100,100;2] when compressing floating point image
myfile.fit[compress H; 25 1] - HCOMPRESS, with scale = 25 and smooth = 1
2) Before calling the CFITSIO routine to write the image header keywords (e.g., fits_create_image) the programmer can call the routines described below to specify the compression algorithm and the tiling pattern that is to be used. There are routines for specifying the various compression parameters and similar routines to return the current values of the parameters:
int fits_set_compression_type(fitsfile *fptr, int comptype, int *status) int fits_set_tile_dim(fitsfile *fptr, int ndim, long *tilesize, int *status) int fits_set_noise_bits(fitsfile *fptr, int noisebits, int *status) int fits_set_hcomp_scale(fitsfile *fptr, int scale, int *status) int fits_set_hcomp_smooth(fitsfile *fptr, int smooth, int *status) int fits_get_compression_type(fitsfile *fptr, int *comptype, int *status) int fits_get_tile_dim(fitsfile *fptr, int ndim, long *tilesize, int *status) int fits_get_noise_bits(fitsfile *fptr, int *noisebits, int *status) int fits_get_hcomp_scale(fitsfile *fptr, int *scale, int *status) int fits_get_hcomp_smooth(fitsfile *fptr, int *smooth, int *status)4 symbolic constants are defined for use as the value of the `comptype' parameter: GZIP_1, RICE_1, HCOMPRESS_1 or PLIO_1. Entering NULL for comptype will turn off the tile-compression and cause normal FITS images to be written.
The HCOMPRESS algorithm has 2 parameters. The first is the scale factor that controls the amount of compression. The default value of scale = 0 (or 1) results in lossless compression. Values of scale greater than 1 will produce larger amounts of 'lossy' compression. As a rule of thumb setting scale to about 2 times the RMS noise in the sky background of typical astronomical image will compress the image by about a factor of 10, with minimal loss of significant information. Larger values of scale will produce greater compression, but with more loss of information. The fits_rms_float and fits_rms_short routines can be used to calculate the RMS value in an image.
The second HCOMPRESS parameter controls whether to slightly smooth the image after decompressing it, if it was 'lossy' compressed (with smooth > 1). Smoothing helps to remove some of the 'blockiness' in the image, especially if it was highly compressed. The default is to not smooth the image (smooth = 0), but setting smooth to 1 (or any non-zero value) will cause the image to be smoothed.
The 'noisebits' parameter is only relevant when compressing floating point images (with BITPIX = 32 or 64). The default value is 4. Decreasing the value of noisebits will improve the overall compression efficiency at the expense of losing more information.
The following 2 routines are available for compressing or or decompressing an image:
int fits_img_compress(fitsfile *infptr, fitsfile *outfptr, int *status); int fits_img_decompress (fitsfile *infptr, fitsfile *outfptr, int *status);Before calling the compression routine, the compression parameters must first be defined in one of the 2 way described in the previous paragraphs. There is also a routine to determine if the current HDU contains a tile compressed image (it returns 1 or 0):
int fits_is_compressed_image(fitsfile *fptr, int *status);A small example program called 'imcopy' is included with CFITSIO that can be used to compress (or uncompress) any FITS image. This program can be used to experiment with the various compression options on existing FITS images as shown in these examples:
1) imcopy infile.fit 'outfile.fit[compress]'
This will use the default compression algorithm (Rice) and the
default tile size (row by row)
2) imcopy infile.fit 'outfile.fit[compress GZIP]'
This will use the GZIP compression algorithm and the default
tile size (row by row). The allowed compression algorithms are
Rice, GZIP, and PLIO. Only the first letter of the algorithm
name needs to be specified.
3) imcopy infile.fit 'outfile.fit[compress G 100,100]'
This will use the GZIP compression algorithm and 100 X 100 pixel
tiles.
4) imcopy infile.fit 'outfile.fit[compress R 100,100; 4]'
This will use the Rice compression algorithm, 100 X 100 pixel
tiles, and noise_bits = 4 (assuming the input image has a
floating point data type). Decreasing the value of noisebits
will improve the overall compression efficiency at the expense
of losing more information.
5) imcopy infile.fit outfile.fit
If the input file is in tile-compressed format, then it will be
uncompressed to the output file. Otherwise, it simply copies
the input image to the output image.
6) imcopy 'infile.fit[1001:1500,2001:2500]' outfile.fit
This extracts a 500 X 500 pixel section of the much larger
input image (which may be in tile-compressed format). The
output is a normal uncompressed FITS image.
7) imcopy 'infile.fit[1001:1500,2001:2500]' outfile.fit.gz
Same as above, except the output file is externally compressed
using the gzip algorithm.