HFWG Recommendation R14:

    TSORTKEY - A Convention for Specifying the Sort Order of a FITS Table

    (Approved: 1994 Nov 16)

    A) Introduction

    Tables of data that are stored in FITS ASCII or binary table extensions are usually sorted on one or more columns of the table, and it is often advantageous for users of the table to know what sorting has been performed. Certain data processing operations can only be performed, or can be performed much more efficiently, if the table has been sorted in a particular way (e.g. in order of increasing time as opposed to increasing Right Ascension on the sky) so there is a need to be able to specify how any particular table has been sorted. To fill this need, a convention for using a newly defined TSORTKEY keyword to specify the sort order in a FITS table is defined below.

    B) Definition of the TSORTKEY Keyword Convention

    The TSORTKEY keyword is reserved within this convention to indicate the order in which the rows in a FITS ASCII or binary table extension have been sorted. The value of the TSORTKEY keyword is a character string which lists the name (as given by the TTYPEn keyword) of the primary sort column, optionally followed by the names of any secondary sort column(s). The presence of this keyword indicates that the rows in the table have been sorted first by the values in the primary sort column; any rows that have the same value in the primary column have been further sorted by the values in the secondary sort column, and so on for all the specified columns. If more than one column is specified by TSORTKEY then the names must be separated by commas. One or more spaces are also allowed between a comma and the following column name.

    By default, columns are sorted in ascending order, but a minus sign may precede the column name to indicate that the rows are sorted in descending order.
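
    The syntax defined in this section can be read with a few lines of Python. The following is a minimal sketch, not part of the convention: the function name parse_tsortkey is hypothetical, and stripping blanks on both sides of each name is a lenient assumption (the convention itself only mentions spaces after the comma).

```python
def parse_tsortkey(value):
    """Split a TSORTKEY value into (column term, descending) pairs."""
    terms = []
    for raw in value.split(","):
        term = raw.strip()              # drop blanks after the comma and trailing padding
        descending = term.startswith("-")
        if descending:
            term = term[1:].strip()     # a leading minus sign marks descending order
        if not term:
            raise ValueError(f"empty column name in {value!r}")
        terms.append((term, descending))
    return terms


if __name__ == "__main__":
    print(parse_tsortkey("X,Y "))
    print(parse_tsortkey("-TIME "))
```

    The first pair in the returned list is the primary sort column; later pairs are the secondary sort columns in order of precedence.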

    C) Definition of the Sort Order for the Various Datatypes

    In order to avoid any ambiguity, the definition of the ascending sort order for all the possible FITS datatypes is given below:

    • Integer or floating point columns are always sorted in numerical order. In ASCII table extensions this means the numerical value of the field is used, not its ASCII text representation.

    • Complex datatype ('C' or 'M') columns are first sorted in numerical order of the real component (the first of the pair of numbers). Any rows that have the same real value are then further sorted in numerical order of the imaginary component (the second value in the pair).

    • In bit datatype ('X') columns, the zero or 'unset' bits will appear first in sorted order followed by the one or 'set' bits.

    • In logical datatype ('L') columns, the false values (F) will appear first followed by the true (T) values.

    • Character ('A') columns are sorted in order of the ASCII collating sequence of the characters. By default the entire string in the ASCII field is used to determine the sorted order. In other words the table is first sorted in order of the first character in the field, then rows that have the same first character are further sorted in order of the second character and so on until all the characters have been used. The vector subset notation described below may be used to specify that a table has been sorted on a substring of characters within the table field.

    • Any null or undefined elements in a sort column will appear after all the defined values when the table is sorted in ascending order.
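
    The rules above for scalar values can be expressed as a single Python sort key. This is an illustrative sketch under stated assumptions: the name ascending_key is hypothetical, None stands in for a null or undefined element, bit values are held as 0 or 1, logical values as Python booleans, and numeric fields from an ASCII table have already been converted to numbers.

```python
def ascending_key(value):
    """Return a sort key that orders one column value in ascending order."""
    if value is None:
        # Assumption: None represents a null element; it sorts after all defined values.
        return (1, 0)
    if isinstance(value, complex):
        # Real component first, then imaginary component.
        return (0, (value.real, value.imag))
    # Numbers, bits (0/1), logicals (False < True) and ASCII strings
    # already compare in the required order.
    return (0, value)


if __name__ == "__main__":
    print(sorted([3, None, 1, 2], key=ascending_key))
    print(sorted([2 + 1j, 1 + 5j, 1 - 2j], key=ascending_key))
    # ASCII table fields: convert to a number first so '9' precedes '10'.
    print(sorted(["10", "9"], key=lambda text: ascending_key(int(text))))
```

    Python compares ASCII strings by character code, which matches the ASCII collating sequence, so upper case letters sort before lower case letters.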

    D) Conventions for Sorting Vector Columns

    Vector columns are by default sorted on the value of every element within the vector. The rows are first sorted in order of the value of the first element of the vector, then rows that have the same first value are further sorted by the value of the second element, and so on. If the vectors do not all have the same length (i.e. the column contains ASCII NUL terminated character strings or uses variable length array descriptors) then the shorter of two otherwise identical vectors shall appear first when sorted in ascending order (e.g., the 4-character string 'FORM' shall occur before the 6-character string 'FORMAT').
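
    This default ordering is the same element-by-element comparison that Python applies to lists and strings, including the rule that the shorter of two otherwise identical sequences comes first. The short sketch below uses made-up sample vectors purely for illustration.

```python
# Made-up vector values, one list per table row.
rows = [[2, 1, 0], [1, 9], [1, 9, 0], [1, 3, 7]]

# Lists compare element by element; [1, 9] precedes [1, 9, 0] because it is shorter.
ordered = sorted(rows)

# Strings behave the same way, treated as vectors of characters.
names = sorted(["FORMAT", "FORM"])

if __name__ == "__main__":
    print(ordered)
    print(names)
```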

    If a table has been sorted based only on the value of a single element in the vector, then this may be indicated by including the vector element number (starting with 1 for the first element) in parentheses after the column name in the TSORTKEY keyword value. For example:

    TSORTKEY= 'ARRAY(4)'
    indicates that the table has been sorted on the value of the 4th element in the vector field. If a table has been sorted in turn on several different elements within the vector, then each element should be listed in the TSORTKEY value, as in
    TSORTKEY= 'ARRAY(2), ARRAY(3), ARRAY(4)'
    to indicate that the table is sorted first by the 2nd element, then by the 3rd element, and finally by the 4th element. A shorthand notation may be used to indicate that a table has been sorted on a set of adjacent elements in a vector by listing the first and last elements separated by a colon (:). The preceding example may thus be rewritten using this shorthand notation as:
    TSORTKEY= 'ARRAY(2:4)'
    This shorthand notation is most useful in conjunction with character string columns (in either ASCII or binary table extensions) to indicate that the table has been sorted on a substring within the vector of characters in the field. (Note that under this convention a character string field in an ASCII table extension is regarded as a vector of single characters, the same as an ASCII character column in a binary table extension.)
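
    The element notation maps directly onto a Python slice, since element numbers start at 1 and the range includes both end points. The sketch below is one possible reading of the notation; the pattern and the name split_element_range are hypothetical, and the term is assumed to have had any leading minus sign removed already.

```python
import re

# Hypothetical pattern: a column name, then an optional "(n)" or "(n:m)" suffix.
_ELEMENTS = re.compile(r"^([^,()]+)(?:\((\d+)(?::(\d+))?\))?$")


def split_element_range(term):
    """Return (column name, slice or None) for one unsigned sort term."""
    match = _ELEMENTS.match(term.strip())
    if match is None:
        raise ValueError(f"unrecognised sort term: {term!r}")
    name, first, last = match.groups()
    if first is None:
        return name, None               # no suffix: the whole field is the sort key
    first = int(first)
    last = int(last) if last is not None else first
    if first < 1 or last < first:
        raise ValueError(f"bad element range in {term!r}")
    # 1-based inclusive range -> 0-based half-open Python slice.
    return name, slice(first - 1, last)


if __name__ == "__main__":
    name, elements = split_element_range("ARRAY(2:4)")
    print(name, [10, 20, 30, 40, 50][elements])
```

    Sorting on the sliced value gives the same order as sorting on each listed element in turn, because the slice is itself compared element by element.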

    E) Restrictions on Column Names Under this Convention

    The FITS format definition does not require that the TTYPEn keyword be present in FITS tables, and when the keyword is present any ASCII text characters may be included in the value string. FITS tables that use this sorting convention are required, however, to use the TTYPEn keyword to assign a unique name to every sorting column. In addition, certain punctuation characters must not be used in the column name, to avoid confusion with the syntax used within this convention. Specifically, the minus sign must not be used as the first character of a column name, and the comma and the open and close parenthesis characters must not be used anywhere in the name of a column that is used to sort the table. It is also strongly recommended that column names not contain any embedded blank characters; instead the underscore character should be used to link separate words in a column name together into a single string (e.g., use 'OBJECT_NAME' not 'OBJECT NAME').
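
    A table writer could check candidate sort column names against these restrictions before writing TSORTKEY. The following sketch is illustrative only: the name sort_name_problems and the message wording are hypothetical, and the uniqueness requirement is not checked because it depends on the full set of TTYPEn values in the table.

```python
def sort_name_problems(name):
    """List the ways a column name conflicts with the TSORTKEY syntax."""
    problems = []
    if name.startswith("-"):
        problems.append("begins with a minus sign")
    for ch in ",()":
        if ch in name:
            problems.append(f"contains {ch!r}")
    if " " in name.strip():
        # Embedded blanks are strongly discouraged rather than prohibited.
        problems.append("contains an embedded blank (discouraged, not forbidden)")
    return problems


if __name__ == "__main__":
    for candidate in ("OBJECT_NAME", "OBJECT NAME", "-FLUX", "F(X)"):
        print(candidate, sort_name_problems(candidate))
```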

    F) Examples of the TSORTKEY Keyword Usage

    The following examples illustrate the typical usage of the TSORTKEY keyword:

    • TSORTKEY= 'X '
      This table is sorted in ascending value of the X column.
    • TSORTKEY= 'X,Y '
      This table is sorted in ascending value of the X column. Rows that have the same value of X have been further sorted in ascending value of the Y column.
    • TSORTKEY= '-TIME '
      This table is sorted in descending value of the TIME column.
    • TTYPE1 = 'SPECTRUM'
      TFORM1 = '10J '
      TSORTKEY= 'SPECTRUM(1)'
      This binary table is sorted in ascending value of the first element of the SPECTRUM vector column.
    • TTYPE1 = 'NAME '
      TFORM1 = '20A '
      TSORTKEY= 'NAME '
      This binary table is sorted in ascending order on all 20 characters of the NAME field. This is equivalent to specifying TSORTKEY= 'NAME(1:20)'.
    • TTYPE1 = 'OBJECT '
      TFORM1 = 'A20 '
      TSORTKEY= '-OBJECT(2:4)'
      This ASCII table is sorted in reverse order on the 2nd through 4th characters (elements) of the OBJECT string column (which can be considered to be a vector of character elements).
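
    A reader can confirm that a table really is in the order its TSORTKEY claims by comparing each pair of adjacent rows. The sketch below is one way to do this and makes several assumptions: rows are plain Python dictionaries keyed by column name, no sort column contains null values, the name is_sorted is hypothetical, and the sort terms are written out by hand as (column, descending, elements) tuples rather than parsed from the keyword value. The sample rows are invented for illustration.

```python
def is_sorted(rows, terms):
    """Check rows against sort terms given as (column, descending, elements).

    elements is a slice selecting vector elements or characters, or None
    to use the whole field.
    """
    for earlier, later in zip(rows, rows[1:]):
        for name, descending, elements in terms:
            a, b = earlier[name], later[name]
            if elements is not None:
                a, b = a[elements], b[elements]
            if a == b:
                continue        # tie: defer to the next sort column
            if (a > b) != descending:
                return False    # this pair is out of order
            break               # this pair is correctly ordered
    return True


if __name__ == "__main__":
    # TSORTKEY= 'X,Y '
    xy_rows = [{"X": 1, "Y": 2}, {"X": 1, "Y": 3}, {"X": 2, "Y": 0}]
    print(is_sorted(xy_rows, [("X", False, None), ("Y", False, None)]))

    # TSORTKEY= '-OBJECT(2:4)' (characters 2 through 4, descending)
    obj_rows = [{"OBJECT": "AZZZ"}, {"OBJECT": "BMMM"}, {"OBJECT": "CAAA"}]
    print(is_sorted(obj_rows, [("OBJECT", True, slice(1, 4))]))
```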