NAME

ftmergesort - Sort the rows in a very large FITS table

USAGE

ftmergesort infile[ext][filters] outfile columns

DESCRIPTION

ftmergesort is designed to sort very large files, i.e., tables that are too large to be held in computer memory at one time or that would exhaust swap space. For smaller input files, use the task ftsort.

ftmergesort creates a sorted copy of the input table in which the rows are sorted in ascending or descending order based on the values in a specified column or set of columns. If more than one column is specified, then rows that have the same value in the first column are sorted by the value in the second column, and so on for any further specified columns. Precede a column name with a minus sign to sort on that column in descending order.
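
The multi-column ordering rule described above can be sketched in Python. This illustrates the ordering only and is not the ftmergesort implementation. The helper names parse_columns and sort_rows are hypothetical, rows are modeled as dictionaries rather than FITS table rows, and only column names (not column numbers) are handled.

```python
def parse_columns(spec):
    """Split a spec such as "time,-pha" into (name, descending) pairs."""
    keys = []
    for item in spec.split(","):
        item = item.strip()
        descending = item.startswith("-")
        name = item[1:] if descending else item
        keys.append((name, descending))
    return keys


def sort_rows(rows, spec):
    """Return a new list of dict rows ordered according to the column spec."""
    result = list(rows)
    # Python's sort is stable, so sorting on the last key first and on the
    # first key last produces the multi-column ordering.
    for name, descending in reversed(parse_columns(spec)):
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


rows = [
    {"time": 2, "pha": 10},
    {"time": 1, "pha": 30},
    {"time": 1, "pha": 50},
]
# Ascending in "time"; ties broken by descending "pha".
print(sort_rows(rows, "time,-pha"))
```

Here the two rows with time = 1 are ordered by descending pha, and the row with time = 2 comes last.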

Internally, ftmergesort creates a series of intermediate partial output files, each of which is sorted. These intermediate files are then merged into the final output using mergesort. The sorting algorithms used by ftsort (heap, shell, insert) are available for the intermediate stage, but mergesort is always used for the merging phase.
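
The two-phase approach described above (sorted intermediate files, then a merge) can be sketched in Python as a generic external merge sort. This is a minimal sketch under stated assumptions and does not reproduce the ftmergesort code: the function name external_sort and the chunk_size parameter are hypothetical, the data are plain integers stored one per line in text files rather than FITS rows, and each chunk is sorted with Python's built-in sort rather than the heap, shell, or insert methods.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Sort integers using sorted runs on disk followed by a k-way merge."""
    run_paths = []
    chunk = []

    def flush():
        # Sort one chunk in memory and write it as an intermediate file.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(f"{value}\n" for value in chunk)
        run_paths.append(path)
        chunk.clear()

    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    handles = [open(path) for path in run_paths]
    try:
        # Each stream reads its sorted run lazily; heapq.merge combines them.
        streams = [(int(line) for line in handle) for handle in handles]
        # A real tool would stream the merged rows to the output file;
        # the result is collected in a list here only for brevity.
        return list(heapq.merge(*streams))
    finally:
        for handle in handles:
            handle.close()
        for path in run_paths:
            os.remove(path)


print(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))
```

Only one chunk is held in memory during the first phase, and the merge phase needs only the current element of each run.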

ftmergesort may use a significant amount of temporary disk space. Users should be prepared for it to need double the size of the original file. ftmergesort uses a heuristic to determine how many intermediate files to create. If only one file is needed, then the operation is equivalent to ftsort.
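
A pre-flight check against the "double the original file size" guideline can be sketched in Python. This is only an illustration: the function names are hypothetical, and the scratch_dir argument is an assumption, since this help file does not state where the temporary files are written.

```python
import os
import shutil


def has_room_for_sort(input_size_bytes, free_bytes, factor=2.0):
    """Return True if the free space covers factor times the input size."""
    return free_bytes >= factor * input_size_bytes


def check_scratch_space(infile, scratch_dir="."):
    """Compare free space in scratch_dir with twice the size of infile."""
    size = os.path.getsize(infile)
    # Assumption: scratch_dir is wherever the temporary files will live.
    free = shutil.disk_usage(scratch_dir).free
    return has_room_for_sort(size, free)
```

For example, check_scratch_space("events.fits") would return False when less than twice the size of the (hypothetical) file events.fits is free in the current directory.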

WARNINGS

Using any CFITSIO on-the-fly expressions will prevent the task from functioning as expected. These include: specifying a compressed file; using a colfilter calculator expression; using a rowfilter selection expression.

The input file should be an uncompressed file, and any filtering or column operations must have already been applied.

PARAMETERS

infile [filename]
Input file name and optional extension name or number enclosed in square brackets of the table to be sorted (e.g., 'file.fits[events]'). If an explicit extension is not specified, then the first 'interesting' table in the input file will be sorted, i.e., the first table extension that is not a GTI (Good Time Interval) extension. Additional table filters (such as row or column filters) should NOT be appended to the file name, as noted above.

outfile [filename]
Output file name for the sorted file. Precede it with an exclamation point, ! (or \! on the Unix command line), to overwrite a preexisting file with the same name, or set the clobber parameter to YES.

columns [string list]
A comma-separated list of the column names (or numbers) on which to sort the table. To sort in reverse order (from largest to smallest), put a minus sign in front of the column name. If more than one column is specified, then rows that have the same value in the first column are sorted by the value in the second column, and so on for any further specified columns.

(method = "heap") [string]
Sorting algorithm to be used for the intermediate sorting. The final merge will always use mergesort. Supported algorithms are the "heap" (N log N), "shell" (N**1.5), and "insert" (N**2) sorts. The shell sort gives better performance with midsize data sets. The heap sort gives the best speed when dealing with large random data sets. The insertion sort works best when the data set is very nearly sorted, e.g., with only one value out of place.
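
Why the insertion sort suits nearly sorted data can be seen by counting element shifts in a textbook insertion sort. The Python sketch below is a generic illustration, not the ftsort or ftmergesort code, and the function name insertion_sort_shifts is hypothetical.

```python
def insertion_sort_shifts(values):
    """Sort a copy of values with insertion sort; return (sorted, shifts)."""
    data = list(values)
    shifts = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Move larger elements one slot right until current fits.
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            shifts += 1
            j -= 1
        data[j + 1] = current
    return data, shifts


nearly_sorted = [1, 2, 3, 5, 4, 6, 7, 8]   # one adjacent pair out of place
reversed_input = [8, 7, 6, 5, 4, 3, 2, 1]  # worst case for insertion sort

print(insertion_sort_shifts(nearly_sorted)[1])   # prints 1
print(insertion_sort_shifts(reversed_input)[1])  # prints 28
```

The number of shifts equals the number of out-of-order pairs in the input, so the work grows as N**2 for reversed input but stays close to N for nearly sorted input.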

(memory = YES) [boolean]
Ignored by ftmergesort, but present for drop-in compatibility with ftsort. The partial sorts are always done with memory=YES.

(unique = NO) [boolean]
Flag used to determine if rows with identical sort keys should be purged, keeping only one unique row. Columns not included in the sort are not tested for uniqueness.
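
The meaning of unique = YES can be sketched in Python: rows are compared on the sort columns only, so rows that differ in other columns are still treated as duplicates. This is an illustration rather than the task's implementation. The function name unique_by_keys is hypothetical, and keeping the first row of each group is an assumption, since this help file does not say which of the identical-key rows is kept.

```python
from itertools import groupby


def unique_by_keys(sorted_rows, key_columns):
    """Keep one row per distinct combination of the sort-key columns."""
    def sort_key(row):
        return tuple(row[name] for name in key_columns)

    # groupby only groups adjacent rows, so the input must already be
    # sorted on key_columns. Assumption: the first row of a group is kept.
    return [next(group) for _, group in groupby(sorted_rows, key=sort_key)]


rows = [
    {"time": 1, "pha": 10},
    {"time": 1, "pha": 20},  # same sort key as the row above, different pha
    {"time": 2, "pha": 30},
]
print(unique_by_keys(rows, ["time"]))
```

With "time" as the only sort column, the second row is dropped even though its pha value differs from the first.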

(copyall = YES) [boolean]
If copyall = YES (the default) then all other HDUs in the input file will also be copied, without modification, to the output file. If copyall = NO, then only the single table HDU specified by infile will be copied to the output file along with the required null primary array.

(clobber = NO) [boolean]
If outfile already exists, then setting 'clobber = yes' will cause it to be overwritten.

(startrow = 0) [integer]
Starting row number to sort, or 0 to use first row. Rows before startrow are not copied to the output.

(nrows = 0) [integer]
Number of rows to sort, or 0 to use all rows from startrow to the end of the file. Rows after startrow+nrows are not copied to the output.

(chatter = 1) [integer, 0 - 5]
Controls the amount of informative text written to standard output. Setting chatter = 5 will produce detailed diagnostic output; otherwise, this task normally does not write any output.

(history = NO) [boolean]
If history = YES, then a set of HISTORY keywords will be written to the header of the sorted HDU to record the values of all the ftmergesort task parameters that were used to produce the output file.

EXAMPLES

See ftsort for examples. ftmergesort is a drop-in replacement for ftsort.

SEE ALSO

ftsort

LAST MODIFIED

Apr 2019