

Specifications for Storing Compressed Images in FITS Binary Tables


Richard L. White, STScI
Perry Greenfield, STScI
William Pence, NASA/GSFC
Doug Tody, NOAO


October 21, 1999

1. General Description

This document describes a convention for compressing n-dimensional images and storing the resulting byte stream in a variable-length column in a FITS binary table. The general file structure outlined here is independent of the specific data compression algorithm that is used. The implementation details for several commonly used compression algorithms are described in the appendixes of this document.

The general principle used in this convention is to first divide the n-dimensional image into a rectangular grid of subimages or 'tiles'. Each tile is then compressed as a contiguous block of data, and the resulting compressed byte stream is stored in a row of a variable-length column in a FITS binary table. Dividing the image into tiles generally makes it possible to extract and uncompress subsections of the image without having to uncompress the whole image. The default tiling pattern treats each row of a 2-dimensional image (or higher dimensional cube) as a tile, so that each tile contains NAXIS1 pixels. Any other rectangular tiling pattern may be defined using the ZTILEn keywords that are described below. For relatively small images it may be sufficient to compress the entire image as a single tile, resulting in an output binary table with 1 row. For 3-dimensional data cubes, it may be advantageous to treat each plane of the cube as a separate tile if application software typically needs to access the cube on a plane-by-plane basis.
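The tiling arithmetic described above can be sketched as follows. This is only an illustration: the function names tile_grid and table_rows are hypothetical, and the sketch assumes that tiles along an image edge may be smaller than the nominal tile size when an axis length is not an exact multiple of it, which the text above does not state.

```python
import math
from typing import Optional, Sequence, Tuple


def tile_grid(naxes: Sequence[int],
              ztile: Optional[Sequence[int]] = None) -> Tuple[int, ...]:
    """Return the number of tiles along each axis of the image.

    naxes holds NAXIS1, NAXIS2, ...; ztile holds ZTILE1, ZTILE2, ...
    When ztile is omitted, the default pattern is used: each image row
    is one tile, i.e. (NAXIS1, 1, 1, ...).
    """
    if ztile is None:
        ztile = [naxes[0]] + [1] * (len(naxes) - 1)
    if len(ztile) != len(naxes):
        raise ValueError("one tile size is needed per image axis")
    # Assumption: a partial tile at the edge still counts as one tile,
    # hence the integer ceiling division.
    return tuple((n + t - 1) // t for n, t in zip(naxes, ztile))


def table_rows(naxes: Sequence[int],
               ztile: Optional[Sequence[int]] = None) -> int:
    """One binary table row per tile, so rows = product of the grid."""
    return math.prod(tile_grid(naxes, ztile))


if __name__ == "__main__":
    print(table_rows([100, 50]))                   # default row-by-row tiling
    print(table_rows([100, 50], [100, 50]))        # whole image as one tile
    print(table_rows([64, 64, 10], [64, 64, 1]))   # one tile per cube plane
```

With the default pattern the table has one row per image row; choosing ZTILEn equal to NAXISn for every axis collapses the table to a single row.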

2. Keywords

The following keywords are defined by this convention for use in the header of the FITS binary table extension to describe the structure of the compressed image.

3. Columns

The following columns in the FITS binary table are defined by this convention. The order of the columns in the table is not significant. The column names (given by the TTYPEn keyword) are shown here in upper case letters, but the case is not significant.

4. Appendix A: Quantization algorithm

[description of the noise estimation and quantization algorithm goes here]. This algorithm is specifically used to quantize floating point images prior to compressing them with the Rice algorithm (see below); however, the same quantization algorithm could be used equally well with other integer compression algorithms.

5. Appendix B: Rice algorithm

[description of the Rice decoding algorithm goes here. ]

6. Appendix C: IRAF PLIO algorithm

The IRAF PLIO (Pixel List I/O) algorithm was developed to store image masks in a compressed form. The performance of this encoding is very good for typical masks consisting of isolated high or low values or extended regions at the same level. The worst-case performance occurs when successive pixels have different values. Even in this case the encoding requires only one word (16 bits) per mask pixel, provided that either the intensity change between successive pixels usually fits in 12 bits or the mask represents a zero-floored step function of constant height. The worst case cannot exceed npix*2 words, provided the mask depth is 24 bits or less.

A good compromise between storage efficiency and efficiency of runtime access, while keeping things simple, is achieved if the compressed line lists are maintained as variable-length arrays of type short integer (16 bits per list element), regardless of the mask depth. A line list consists of a series of simple instructions which are executed in sequence to reconstruct a line of the mask. Each 16-bit instruction consists of the sign bit (not used at present), a three-bit opcode, and twelve bits of data:

        +--+-----------+-----------------------------+
        |16|15       13|12                          1|
        +--+-----------+-----------------------------+
        |  |   opcode  |            data             |
        +--+-----------+-----------------------------+

The significance of the data depends upon the instruction. The instructions currently implemented are summarized in the table below.

        Instruction   Opcode    Description

        ZN            00        Output N zeros
        HN            04        Output N high values
        PN            05        Output N-1 zeros plus one high value
        SH            01        Set high value, absolute
        IH,DH         02,03     Increment or decrement high value
        IS,DS         06,07     Like IH-DH, plus output one high value
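The word layout and opcode table above can be illustrated with a small sketch that packs and unpacks a single instruction word. The names OPCODE_NAMES, pack_instruction and unpack_instruction are hypothetical and chosen only for this example. The sketch assumes bit 1 in the diagram is the least significant bit, so the data field occupies the low 12 bits and the opcode the next 3 bits.

```python
from typing import Tuple

# Opcode values as listed in the instruction table (0 through 7).
OPCODE_NAMES = {
    0: "ZN", 1: "SH", 2: "IH", 3: "DH",
    4: "HN", 5: "PN", 6: "IS", 7: "DS",
}


def pack_instruction(opcode: int, data: int) -> int:
    """Build one 16-bit instruction word; the sign bit is left clear."""
    if not 0 <= opcode <= 7:
        raise ValueError("opcode must fit in 3 bits")
    if not 0 <= data <= 0xFFF:
        raise ValueError("data must fit in 12 bits")
    return (opcode << 12) | data


def unpack_instruction(word: int) -> Tuple[str, int]:
    """Split a 16-bit word into (instruction name, 12-bit data value)."""
    word &= 0xFFFF                 # tolerate signed 16-bit input
    opcode = (word >> 12) & 0x7    # bits 15..13 in the diagram
    data = word & 0xFFF            # bits 12..1 in the diagram
    return OPCODE_NAMES[opcode], data


if __name__ == "__main__":
    word = pack_instruction(4, 10)       # HN with N = 10
    print(hex(word), unpack_instruction(word))
```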

In order to reconstruct a mask line, the application executing these instructions is required to keep track of two values, the current high value and the current position in the output line. The detailed operation of each instruction is as follows:

ZN
Zero the next N (=data) output pixels.

HN
Set the next N output pixels to the current high value.

PN
Zero the next N-1 output pixels, and set pixel N to the current high value.

SH
Set the high value (absolute rather than incremental), taking the high 15 bits from the next word in the instruction stream, and the low 12 bits from the current data value.

IH,DH
Increment (IH) or decrement (DH) the current high value by the data value. The current position is not affected.

IS,DS
Increment (IS) or decrement (DS) the current high value by the data value, and step, i.e., output one high value.

The high value is assumed to be set to 1 at the beginning of a line, hence the IH,DH and IS,DS instructions are not normally needed for boolean masks. If the length of a line segment of constant value, or the difference between two successive high values, exceeds 4095 (the largest value that fits in the 12-bit data field), then multiple instructions are required to describe the segment or intensity change.
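The instruction semantics above can be put together in a minimal decoding sketch. It is an illustration rather than the IRAF implementation: the function name decode_line is hypothetical, the input is assumed to contain only instruction words (any additional words that a real line list may carry are not handled), and the output is simply the pixels produced, with no padding to a fixed line length.

```python
from typing import List, Sequence

# Opcode values from the instruction table.
ZN, SH, IH, DH, HN, PN, IS, DS = range(8)


def decode_line(words: Sequence[int]) -> List[int]:
    """Execute a sequence of 16-bit instructions and return the pixels."""
    out: List[int] = []
    high = 1                        # high value is 1 at the start of a line
    i = 0
    while i < len(words):
        word = words[i] & 0xFFFF
        opcode = (word >> 12) & 0x7
        data = word & 0xFFF
        i += 1
        if opcode == ZN:            # N zeros
            out.extend([0] * data)
        elif opcode == HN:          # N high values
            out.extend([high] * data)
        elif opcode == PN:          # N-1 zeros, then one high value
            out.extend([0] * (data - 1))
            out.append(high)
        elif opcode == SH:          # absolute set, uses the next word
            if i >= len(words):
                raise ValueError("SH instruction is missing its second word")
            high = ((words[i] & 0x7FFF) << 12) | data
            i += 1
        elif opcode == IH:          # adjust high value, position unchanged
            high += data
        elif opcode == DH:
            high -= data
        elif opcode == IS:          # adjust high value, output one pixel
            high += data
            out.append(high)
        else:                       # DS
            high -= data
            out.append(high)
    return out


if __name__ == "__main__":
    stream = [
        (ZN << 12) | 3,     # three zeros
        (HN << 12) | 2,     # two pixels at the current high value
        (PN << 12) | 3,     # two zeros and one high value
        (IS << 12) | 4,     # raise the high value by 4 and output it
        (SH << 12) | 5, 1,  # set high value from two words
        (HN << 12) | 1,     # one pixel at the new high value
    ]
    print(decode_line(stream))
```

A run longer than the 12-bit data field allows would simply appear in the stream as several consecutive ZN or HN instructions, which this loop handles without special treatment.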

7. Appendix D: HCompress algorithm

[description of the HCompress decoding algorithm goes here. ]

