Care must be taken when designing software to achieve the best possible performance when processing the FITS data files. The following paragraphs describe some strategies that may be used to improve the processing speed of software that uses FITSIO.
1. When dealing with a FITS primary array or IMAGE extension, it is more efficient to read or write large chunks of the image at a time. When reading or writing large chunks of contiguous data in the FITS file (at least 3 FITS blocks = 8640 bytes) FITSIO bypasses the internal buffers that it uses for small pieces of data (e.g., when reading FITS keywords). This is more efficient because the data are not copied to the intermediate buffer.
2. When dealing with FITS tables, the most important efficiency factor in the software design is to read or write the data in the FITS file in a single pass through the file. An example of poor program design would be to read a large, 3-column table by sequentially reading the entire first column, then going back to read the 2nd column, and finally the 3rd column; this requires 3 passes through the file, which could triple the execution time of an I/O-limited program. For small tables this is not important, but when reading multi-megabyte tables these inefficiencies can become significant. The more efficient procedure in this case is to read or write only as many rows of the table as will fit into the available internal I/O buffers, then access all the necessary columns of data within that range of rows. After the program is completely finished with the data in those rows, it can move on to the next range of rows that will fit in the buffers, continuing in this way until the entire file has been processed. Accessing all the columns of a table in parallel rather than sequentially ensures that each block of the FITS file is read or written only once.
The optimal number of rows to read or write at one time in a given table depends on the width of the table row, on the number of I/O buffers that have been allocated in FITSIO, and also on the number of other FITS files that are open at the same time (since one I/O buffer is always reserved for each open FITS file). Fortunately, a FITSIO routine is available that will return the optimal number of rows for a given table: call ftgrsz(unit, nrows, status). It is not critical to use exactly the value of nrows returned by this routine, as long as one does not exceed it. Using a very small value, however, can also lead to poor performance because of the overhead from the larger number of subroutine calls.
The optimal number of rows returned by ftgrsz is valid only as long as the application program is only reading or writing data in the specified table. Any other calls to access data in the table header or in any other FITS file would cause additional blocks of data to be loaded into the I/O buffers displacing data from the original table, and should be avoided during the critical period while the table is being read or written.
Occasionally it is necessary to simultaneously access more than one FITS table, for example when transferring values from an input table to an output table. In cases like this, one should call ftgrsz to get the optimal number of rows for each table separately, then reduce the number of rows proportionally. For example, if the optimal number of rows is 3600 in the input table and 1400 in the output table, then these values should be cut in half to 1800 and 700, respectively, if both tables are going to be accessed at the same time.
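The row-chunking procedure described in item 2 can be sketched in a few lines. The following Python fragment only illustrates the arithmetic and makes no FITSIO calls. The helper names row_chunks and share_buffers are hypothetical, the table size of 10000 rows is invented for the example, and the chunk size of 3600 stands in for the value that ftgrsz would return. Dividing each optimal row count by the number of tables in use is an assumption that generalizes the two-table example above.

```python
def row_chunks(total_rows, chunk_rows):
    """Yield (first_row, n_rows) pairs, 1-based, that cover the table once."""
    if chunk_rows < 1:
        raise ValueError("chunk_rows must be at least 1")
    first = 1
    while first <= total_rows:
        n_rows = min(chunk_rows, total_rows - first + 1)
        yield first, n_rows
        first += n_rows


def share_buffers(optimal_rows):
    """Scale down each table's optimal row count when several tables
    are accessed at the same time (assumption: divide by the table count)."""
    n_tables = len(optimal_rows)
    return [max(1, rows // n_tables) for rows in optimal_rows]


# Hypothetical table of 10000 rows; 3600 plays the role of the ftgrsz result.
chunks = list(row_chunks(10000, 3600))

# Within each chunk, a program would read ALL needed columns for rows
# first_row .. first_row + n_rows - 1 before moving to the next chunk.
for first_row, n_rows in chunks:
    print(f"process rows {first_row} to {first_row + n_rows - 1}")

# Input and output tables open together: 3600 and 1400 become 1800 and 700.
shared = share_buffers([3600, 1400])
print(shared)
```

Any chunk size up to the optimal value keeps the single-pass property; only the number of iterations changes.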
3. Always use binary table extensions rather than ASCII table extensions for better efficiency when dealing with tabular data. The I/O to ASCII tables is slower because of the overhead in formatting or parsing the ASCII data fields, and because ASCII tables are about twice as large as binary tables with the same information content.
4. Design software so that it reads the FITS header keywords in the same order in which they occur in the file. When reading keywords, FITSIO searches forward starting from the position of the last keyword that was read. If it reaches the end of the header without finding the keyword, it then goes back to the start of the header and continues the search down to the position where it started. In practice, as long as the entire FITS header can fit at one time in the available internal I/O buffers, the header keyword access will be very fast and it makes little difference in which order the keywords are accessed.
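The effect of keyword order in item 4 can be illustrated with a toy model of the wrap-around search. This Python sketch is not FITSIO code: the function cards_examined and the six-keyword header are hypothetical, and the model assumes that each search begins at the card immediately following the last keyword that was read.

```python
def cards_examined(header, requests):
    """Count header cards inspected by a forward search that wraps around.

    Toy model only; assumes the next search starts just after the
    keyword found by the previous one.
    """
    n_cards = len(header)
    pos = 0      # index at which the next search begins
    total = 0    # number of cards inspected so far
    for name in requests:
        for step in range(n_cards):
            idx = (pos + step) % n_cards
            total += 1
            if header[idx] == name:
                pos = (idx + 1) % n_cards
                break
    return total


header = ["SIMPLE", "BITPIX", "NAXIS", "NAXIS1", "NAXIS2", "OBJECT"]

in_order = cards_examined(header, header)
reverse_order = cards_examined(header, list(reversed(header)))
print(in_order, reverse_order)
```

In this model, reading the keywords in file order inspects one card per request, while reading them in reverse order inspects most of the header for every request.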
5. Avoid the use of scaling (by using the BSCALE and BZERO or TSCAL and TZERO keywords) in FITS files since the scaling operations add to the processing time needed to read or write the data. In some cases it may be more efficient to temporarily turn off the scaling (using ftpscl or fttscl) and then read or write the raw unscaled values in the FITS file.
6. Avoid using the 'implicit datatype conversion' capability in FITSIO. For instance, when reading a FITS image with BITPIX = -32 (32-bit floating point pixels), read the data into a single precision floating point data array in the program. Forcing FITSIO to convert the data to a different datatype can significantly slow the program.
7. Design FITS binary tables so that every column is aligned on a computer word boundary and so that each row is a whole number of computer words in length. Accessing non-aligned words can be slower on some machines. This is usually not a problem when using FITSIO to read or write the FITS files, but other FITS readers and writers could be affected. In practice, this means that double precision columns should start at a multiple of 8 bytes within the row, single precision floating point columns and integer columns should start at a multiple of 4 bytes, and short integer columns should start at a multiple of 2 bytes. If necessary, the row length should be padded out by adding a dummy column of the appropriate width or by adjusting the width of an existing column so that the row length is also a whole number of words. For example, if a binary table contains a '1B', a '1E', and a '1D' column, then the optimum design would place the '1D' column first in the table followed by the '1E' and then the '1B' column. Since the row length is then 8 + 4 + 1 = 13 bytes, one should add another dummy column with a '3A' datatype to make the length a multiple of the double precision word length (16 bytes). Alternatively, one could change the last column from '1B' to '4B'. This will ensure that all the data values are optimally aligned.
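The alignment arithmetic in item 7 can be checked with a short calculation. The Python sketch below is illustrative only and makes no FITSIO calls. The helper names parse_tform and layout are hypothetical, only a subset of the binary table type codes is covered, and ordering the columns by decreasing element width is one simple strategy that satisfies the alignment rule, not the only one.

```python
# Byte width of one element for a subset of binary table TFORM type codes.
TYPE_WIDTH = {"A": 1, "B": 1, "I": 2, "J": 4, "E": 4, "D": 8}


def parse_tform(tform):
    """Split a simple TFORM value such as '1D' or '3A' into (repeat, code)."""
    code = tform[-1]
    repeat = int(tform[:-1]) if tform[:-1] else 1
    return repeat, code


def layout(tforms):
    """Order columns widest element first; return the order, the byte
    offset of each column, the row length, and the padding needed."""
    cols = sorted(tforms,
                  key=lambda t: TYPE_WIDTH[parse_tform(t)[1]],
                  reverse=True)
    offsets = []
    pos = 0
    for tform in cols:
        repeat, code = parse_tform(tform)
        offsets.append(pos)
        pos += repeat * TYPE_WIDTH[code]
    # Pad the row to a multiple of the widest element in the table.
    word = max(TYPE_WIDTH[parse_tform(t)[1]] for t in cols)
    pad = (-pos) % word
    return cols, offsets, pos, pad


cols, offsets, row_len, pad = layout(["1B", "1E", "1D"])
print(cols, offsets, row_len, pad)

# Every column starts on a multiple of its own element width.
aligned = all(off % TYPE_WIDTH[parse_tform(t)[1]] == 0
              for t, off in zip(cols, offsets))
print(aligned)
```

For the example in the text, this gives a 13-byte row with 3 bytes of padding required, which matches the suggested '3A' dummy column.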
8. Where feasible, design FITS binary tables so that the columns of data are written as a contiguous set of bytes, rather than as single elements in multiple rows. For example, it is much faster to access the data in a table that contains a single row and 2 columns with TFORM keywords equal to '1000E' and '1000J', than it is to access the same amount of data in a table with 1000 rows which has columns with the TFORM keywords equal to '1E' and '1J'. In the former case the 1000 floating point values in the first column are all written in a contiguous block of the file which can be read or written quickly, whereas in the second case each floating point value in the first column is interleaved with the integer value in the second column of the same row so FITSIO has to explicitly move to the position of each element to be read or written.
9. Avoid the use of variable length vector columns in binary tables, since any reading or writing of these data requires that FITSIO first look up or compute the starting address of each row of data in the heap.
10. When copying data from one FITS table to another, it is faster to transfer the raw bytes instead of reading then writing each column of the table. The FITSIO subroutines FTGTBS and FTPTBS (for ASCII tables), and FTGTBB and FTPTBB (for binary tables) will perform low-level reads or writes of any contiguous range of bytes in a table extension. These routines can be used to read or write a whole row (or multiple rows) of a table with a single subroutine call. These routines are fast because they bypass all the usual data scaling, error checking and machine dependent data conversion that is normally done by FITSIO, and they allow the program to write the data to the output file in exactly the same byte order. For these same reasons, use of these routines can be somewhat risky because no validation or machine dependent conversion is performed by these routines. In general these routines are only recommended for optimizing critical pieces of code and should only be used by programmers who thoroughly understand the internal byte structure of the FITS tables they are reading or writing.
11. Finally, external factors such as the type of magnetic disk controller (SCSI or IDE), the size of the disk cache, the average seek speed of the disk, the amount of disk fragmentation, and the amount of RAM available on the system can all have a significant impact on overall I/O efficiency. For critical applications, a system administrator should review the proposed system hardware to identify any potential I/O bottlenecks.