HFWG Recommendation R14:
TSORTKEY - A Convention for Specifying the Sort Order of a FITS Table
(Approved: 1994 Nov 16)
A) Introduction
Tables of data that are stored in FITS ASCII or binary table extensions
are usually sorted on one or more columns of the table, and it is often
advantageous for users of the table to know what sorting has been
performed. Certain data processing operations can only be performed,
or can be performed much more efficiently, if the table has been sorted
in a particular way (e.g. in order of increasing time as opposed to
increasing Right Ascension on the sky), so there is a need to be able to
specify how any particular table has been sorted. To fill this need, a
convention for using a newly defined TSORTKEY keyword to specify the
sort order in a FITS table is defined below.
B) Definition of the TSORTKEY Keyword Convention
The TSORTKEY keyword is reserved within this convention to indicate the
order in which the rows in a FITS ASCII or binary table extension have
been sorted. The value of the TSORTKEY keyword is a character string
which lists the name (as given by the TTYPEn keyword) of the primary
sort column, optionally followed by the names of any secondary sort
column(s). The presence of this keyword indicates that the rows in the
table have been sorted first by the values in the primary sort column;
any rows that have the same value in the primary column have been
further sorted by the values in the secondary sort column and so on for
all the specified columns. If more than one column is specified by
TSORTKEY, then the names must be separated by a comma. One or more
spaces are also allowed between the comma and the following column
name.
By default, columns are sorted in ascending order, but a minus sign may
precede the column name to indicate that the rows are sorted in descending
order.
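The nested ordering rule above can be illustrated with a short Python sketch. It assumes, purely for illustration, that table rows are held as Python dicts keyed by column name, that the TSORTKEY value contains no vector subscripts, and that no null values occur; the names parse_simple_tsortkey and sort_rows are hypothetical and not part of any FITS library.

```python
from operator import itemgetter


def parse_simple_tsortkey(value):
    """Split a TSORTKEY value into (column_name, descending) pairs.

    Hypothetical helper: handles plain column names with an optional
    leading minus sign, but not vector subscripts.
    """
    keys = []
    for term in value.split(","):
        # Blanks after a comma, and trailing blanks in the FITS string, are ignored.
        term = term.strip()
        if not term:
            raise ValueError("empty column name in TSORTKEY value")
        descending = term.startswith("-")
        keys.append((term[1:] if descending else term, descending))
    return keys


def sort_rows(rows, tsortkey):
    """Return a new list of rows (assumed to be dicts) in the declared order."""
    result = list(rows)
    # list.sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the primary key last gives the nested order.
    for name, descending in reversed(parse_simple_tsortkey(tsortkey)):
        result.sort(key=itemgetter(name), reverse=descending)
    return result


rows = [{"X": 2, "Y": 1}, {"X": 1, "Y": 5}, {"X": 2, "Y": 0}]
print(sort_rows(rows, "X, -Y"))
# [{'X': 1, 'Y': 5}, {'X': 2, 'Y': 1}, {'X': 2, 'Y': 0}]
```

With these assumptions, sort_rows(rows, 'X, -Y') orders rows by ascending X and, among rows with equal X, by descending Y. Applying the keys from least to most significant relies on the stability of Python's sort.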
C) Definition of the Sort Order for the Various Datatypes
In order to avoid any ambiguity, the definition of the ascending sort order
for all the possible FITS datatypes is given below:
- Integer or floating point columns are always sorted in numerical order.
In ASCII table extensions this means that they are not sorted by the
ASCII character strings that represent the values.
- Complex datatype ('C' or 'M') columns are first sorted in numerical
order of the real component (the first of the pair of numbers). Any
rows that have the same real value are then further sorted in numerical
order of the imaginary component (the second value in the pair).
- In bit datatype ('X') columns, the zero or 'unset' bits will appear
first in sorted order followed by the one or 'set' bits.
- In logical datatype ('L') columns, the false values (F) will appear
first, followed by the true (T) values.
- Character ('A') columns are sorted in order of the ASCII collating
sequence of the characters. By default the entire string in the ASCII
field is used to determine the sorted order. In other words the table is
first sorted in order of the first character in the field, then rows that
have the same first character are further sorted in order of the
second character and so on until all the characters have been used.
The vector subset notation described below may be used to specify that
a table has been sorted on a substring of characters within the table
field.
- Any null or undefined elements in a sort column will appear after all
the defined values when the table is sorted in ascending order.
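One way to make these datatype rules concrete is a key function whose natural Python ordering reproduces the ascending order listed above. This is a sketch under stated assumptions: ascending_key is a hypothetical name, the datatype is passed as the TFORMn letter code, numeric values are assumed to be already decoded (from their ASCII representation, in an ASCII table) into Python numbers, and None is assumed to represent a null or undefined element.

```python
def ascending_key(value, tform_code):
    """Map one cell value to a key that sorts in the ascending order above.

    Hypothetical helper: tform_code is the datatype letter from TFORMn,
    and None is assumed to represent a null or undefined element.
    """
    if value is None:
        # (1,) compares greater than every (0, ...) key, so nulls sort last.
        return (1,)
    if tform_code in ("C", "M"):
        # Complex: real part first, imaginary part breaks ties.
        return (0, value.real, value.imag)
    if tform_code in ("L", "X"):
        # Logical and bit: False/0 (unset) before True/1 (set).
        return (0, int(bool(value)))
    # Character ('A') values: str compares by code point, which is the ASCII
    # collating sequence for ASCII text. Numeric values: assumed to be already
    # decoded to int/float, so they compare numerically rather than as text.
    return (0, value)


values = [3 + 1j, None, 1 + 5j, 1 + 2j]
print(sorted(values, key=lambda v: ascending_key(v, "C")))
# [(1+2j), (1+5j), (3+1j), None]
```

Only ascending order is modelled here; the placement of nulls in descending order is not addressed by the rules above.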
D) Conventions for Sorting Vector Columns
Vector columns are by default sorted on the value of every element
within the vector. The rows are first sorted in order of the value of
the first element of the vector, then rows that have the same first value
are further sorted by the value of the second element, and so on. If
the vectors do not all have the same length (i.e., the column contains
ASCII NUL-terminated character strings or uses variable-length array
descriptors), then, when one vector is identical to the leading elements
of a longer one, the shorter vector shall appear first when sorted in
ascending order (e.g., the 4-character string 'FORM' shall occur before
the 6-character string 'FORMAT').
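Python's built-in comparison of sequences already follows this rule: elements are compared pairwise from the first, and when one sequence is a prefix of the other, the shorter one sorts first. The sketch below assumes vectors are available as Python sequences and that NUL-terminated character fields are read as str; vector_key and char_field_key are hypothetical helper names.

```python
def vector_key(vector):
    """Ascending sort key for a (possibly variable-length) vector cell."""
    # Tuples compare lexicographically, and a proper prefix sorts first.
    return tuple(vector)


def char_field_key(raw):
    """Ascending sort key for a character field that may be NUL-terminated."""
    # Everything from the first NUL onward is treated as padding, not text.
    return raw.split("\x00", 1)[0]


vectors = [[1, 2, 3], [1, 2], [0, 9, 9, 9]]
print(sorted(vectors, key=vector_key))  # [[0, 9, 9, 9], [1, 2], [1, 2, 3]]
print(sorted(["FORMAT", "FORM\x00\x00"], key=char_field_key))
```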
If a table has been sorted based only on the value of a single element in
the vector, then this may be indicated by including the vector element
number (starting with 1 for the first element) in parentheses after the
column name in the TSORTKEY keyword value. For example:
TSORTKEY= 'ARRAY(4)'
indicates that the table has been sorted on the value of the 4th
element in the vector field. If a table has been sorted in turn on
several different elements within the vector, then each element should
be listed in the TSORTKEY value, as in
TSORTKEY= 'ARRAY(2), ARRAY(3), ARRAY(4)'
to indicate that the table is sorted first by the 2nd element, then by
the 3rd element, and finally by the 4th element. A shorthand notation
may be used to indicate that a table has been sorted on a set of
adjacent elements in a vector by listing the first and last elements
separated by a colon (:). The preceding example may thus be rewritten
using this shorthand notation as:
TSORTKEY= 'ARRAY(2:4)'
This shorthand notation is most useful in conjunction with character
string columns (in either ASCII or binary table extensions) to indicate
that the table has been sorted on a substring within the vector of
characters in the field. (Note that under this convention a character
string field in an ASCII table extension is regarded as a vector of
single characters, the same as in an ASCII character column in a
binary table extension).
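The element and range notation can be expanded mechanically into a flat list of sort keys, after which 'ARRAY(2:4)' and 'ARRAY(2), ARRAY(3), ARRAY(4)' become identical. The Python sketch below is one possible parser, not a reference implementation; the regular expression, the function name parse_tsortkey, and the choice to apply a leading minus sign to every element of a range are assumptions made for illustration.

```python
import re

# One TSORTKEY term: optional '-', a column name, and an optional 1-based
# element subscript "(n)" or range "(m:n)". This pattern is an assumption.
_TERM = re.compile(r"^(-?)([^(),]+?)\s*(?:\((\d+)(?::(\d+))?\))?$")


def parse_tsortkey(value):
    """Expand a TSORTKEY value into (column, element, descending) triples.

    element is a 1-based vector index, or None when the whole value
    (scalar or entire vector) is the sort key.
    """
    keys = []
    for term in value.split(","):
        match = _TERM.match(term.strip())
        if match is None:
            raise ValueError(f"malformed TSORTKEY term: {term!r}")
        sign, name, first, last = match.groups()
        descending = sign == "-"
        if first is None:
            keys.append((name, None, descending))
            continue
        start = int(first)
        stop = int(last) if last is not None else start
        if start < 1 or stop < start:
            raise ValueError(f"bad element range in TSORTKEY term: {term!r}")
        # Assumption: a minus sign applies to every element of the range.
        for element in range(start, stop + 1):
            keys.append((name, element, descending))
    return keys


print(parse_tsortkey("-OBJECT(2:4)"))
# [('OBJECT', 2, True), ('OBJECT', 3, True), ('OBJECT', 4, True)]
```

The element numbers stay 1-based, as in the convention, so code that indexes a Python sequence must subtract one.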
E) Restrictions on Column Names Under this Convention
The FITS format definition does not require that the TTYPEn keyword be
present in FITS tables, and when the keyword is present any ASCII text
characters may be included in the value string. FITS tables that use
this sorting convention are required, however, to use the TTYPEn keyword
to assign a unique name to every sorting column. In addition, certain
punctuation characters must not be used in the column name to avoid
confusion with the syntax used within this convention. Specifically,
the minus sign must not be used as the first character of a column
name, and the comma and the open and close parenthesis characters must
not be used anywhere in the name of a column that is used to sort the
table. It is also strongly recommended that the column name should not
contain any embedded blank characters, and instead the underscore
character should be used to link separate words in a column name
together into a single string (e.g., use 'OBJECT_NAME', not
'OBJECT NAME').
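A table writer might check these restrictions before emitting TSORTKEY. The sketch below is hypothetical (check_sort_column_name is not part of any FITS library); it treats the minus-sign, comma and parenthesis rules as errors and embedded blanks as a warning, mirroring the "must" versus "strongly recommended" wording above. Trailing blanks are stripped on the assumption that, as in FITS string values generally, they are not significant. It does not check that names are unique.

```python
def check_sort_column_name(name):
    """Return a list of problems with a column name used in TSORTKEY."""
    problems = []
    # Assumption: trailing blanks in the TTYPEn value are not significant.
    stripped = name.rstrip()
    if not stripped:
        problems.append("error: TTYPEn must give the sort column a name")
    if stripped.startswith("-"):
        problems.append("error: name must not begin with '-'")
    for ch in ",()":
        if ch in stripped:
            problems.append(f"error: name must not contain {ch!r}")
    if " " in stripped:
        # Only strongly discouraged, so reported as a warning.
        problems.append("warning: embedded blank; prefer '_' (e.g. OBJECT_NAME)")
    return problems


print(check_sort_column_name("OBJECT_NAME"))  # []
print(check_sort_column_name("OBJECT NAME"))
```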
F) Examples of the TSORTKEY Keyword Usage
The following examples illustrate typical usage of the TSORTKEY keyword:
- TSORTKEY= 'X '
  This table is sorted in ascending value of the X column.
- TSORTKEY= 'X,Y '
  This table is sorted in ascending value of the X column. Rows that have
  the same value of X have been further sorted in ascending value of the
  Y column.
- TSORTKEY= '-TIME '
  This table is sorted in descending value of the TIME column.
- TTYPE1  = 'SPECTRUM'
  TFORM1  = '10J '
  TSORTKEY= 'SPECTRUM(1)'
  This binary table is sorted in ascending value of the first element of
  the SPECTRUM vector column.
- TTYPE1  = 'NAME '
  TFORM1  = '20A '
  TSORTKEY= 'NAME '
  This binary table is sorted in ascending order on all 20 characters of
  the NAME field. This is equivalent to specifying TSORTKEY= 'NAME(1:20)'.
- TTYPE1  = 'OBJECT '
  TFORM1  = 'A20 '
  TSORTKEY= '-OBJECT(2:4)'
  This ASCII table is sorted in descending order on the 2nd through 4th
  characters (elements) of the OBJECT string column (which can be
  considered to be a vector of character elements).
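A reader of a table can confirm that a declared sort order actually holds by comparing each pair of adjacent rows. The Python sketch below assumes rows are dicts in a list, that the TSORTKEY value has already been expanded into (column, element, descending) triples with 1-based element numbers (element None meaning the whole value), that every vector is long enough for the referenced elements, and that there are no null values; is_sorted_by is a hypothetical name.

```python
def is_sorted_by(rows, keys):
    """Check that a list of rows respects (column, element, descending) keys.

    element is a 1-based vector index, or None to compare whole values.
    """
    for prev, curr in zip(rows, rows[1:]):
        for column, element, descending in keys:
            a, b = prev[column], curr[column]
            if element is not None:
                a, b = a[element - 1], b[element - 1]
            if a == b:
                continue  # tie on this key: the next key decides
            if (a > b) != descending:
                return False  # this adjacent pair is out of order
            break  # this key ordered the pair correctly
    return True


# The '-OBJECT(2:4)' example: characters 2 to 4 in descending order.
object_keys = [("OBJECT", 2, True), ("OBJECT", 3, True), ("OBJECT", 4, True)]
rows = [{"OBJECT": "AZZA"}, {"OBJECT": "BZYB"}, {"OBJECT": "AAAA"}]
print(is_sorted_by(rows, object_keys))        # True
print(is_sorted_by(rows[::-1], object_keys))  # False
```

Note that character 1 plays no part in this check, so 'BZYB' may legitimately follow 'AZZA'.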