FCALC (Jul98) ftools.futils FCALC (Jul98)
NAME
fcalc -- Calculates values for a column using an arithmetic
expression.
USAGE
fcalc infile[ext#] outfile clname expr
DESCRIPTION
This task creates a new table using an arithmetic expression
applied to the values stored in the input table's columns.
Variables in the arithmetic expression represent columns in the
input file. The named column for the results will be created
unless it already exists. If the column named to store the results
of the expression already exists, its values will be overwritten.
Using the hidden parameters, fcalc will copy the primary extension,
all extensions, all columns in the extension where the arithmetic
expression is applied, or operate on only a subset of rows.
General Expression Syntax
The expression can be an arbitrarily complex series of operations
performed on constants, keyword values, and column data taken from
the specified FITS TABLE extension.
Keyword and column data are referenced by name. Any string of
characters not surrounded by quotes (ie, a constant string) or
followed by an open parentheses (ie, a function name) will be
initially interpreted as a column name and its contents for the
current row inserted into the expression. If no such column
exists, a keyword of that name will be searched for and its value
used, if found. To force the name to be interpreted as a keyword
(in case there is both a column and keyword with the same name),
precede the keyword name with a single pound sign, '#', as in
'#NAXIS2'. Due to the generalities of FITS column and keyword
names, if the column or keyword name contains a space or a
character which might appear as an arithmetic term then inclose the
name in '$' characters as in $MAX PHA$ or #$MAX-PHA$. Names are
case insensitive.
To access a table entry in a row other than the current one, follow
the column's name with a row offset within curly braces. For
example, 'PHA{-3}' will evaluate to the value of column PHA, 3 rows
above the row currently being processed. One cannot specify an
absolute row number, only a relative offset. Rows that fall outside
the table will be treated as undefined, or NULLs.
Boolean operators can be used in the expression in either their
Fortran or C forms. The following boolean operators are available:
"equal" .eq. .EQ. == "not equal" .ne. .NE. !=
"less than" .lt. .LT. < "less than/equal" .le. .LE. <= =<
"greater than" .gt. .GT. > "greater than/equal" .ge. .GE. >= =>
"or" .or. .OR. || "and" .and. .AND. &&
"negation" .not. .NOT. ! "approx. equal(1e-7)" ~
Note that the exclamation point, '!', is a special UNIX character,
so if it is used on the command line rather than entered at a task
prompt, it must be preceded by a backslash to force the UNIX shell
to ignore it.
The expression may also include arithmetic operators and functions.
Trigonometric functions use radians, not degrees. The following
arithmetic operators and functions can be used in the expression
(function names are case insensitive). A null value will be
returned in case of illegal operations such as divide by zero,
sqrt(negative) log(negative), log10(negative), arccos(.gt. 1),
arcsin(.gt. 1).
"addition" + "subtraction" -
"multiplication" * "division" /
"negation" - "exponentiation" ** ^
"absolute value" abs(x) "cosine" cos(x)
"sine" sin(x) "tangent" tan(x)
"arc cosine" arccos(x) "arc sine" arcsin(x)
"arc tangent" arctan(x) "arc tangent" arctan2(y,x)
"hyperbolic cos" cosh(x) "hyperbolic sin" sinh(x)
"hyperbolic tan" tanh(x) "round to nearest int" round(x)
"round down to int" floor(x) "round up to int" ceil(x)
"exponential" exp(x) "square root" sqrt(x)
"natural log" log(x) "common log" log10(x)
"modulus" x % y "random # [0.0,1.0)" random()
"random Gausian" randomn() "random Poisson" randomp(x)
"minimum" min(x,y) "maximum" max(x,y)
"cumulative sum" accum(x) "sequential difference" seqdiff(x)
"if-then-else" b?x:y
"angular separation" angsep(ra1,dec1,ra2,de2) (all in degrees)
Three different random number functions are provided: random(),
with no arguments, produces a uniform random deviate between 0 and
1; randomn(), also with no arguments, produces a normal (Gaussian)
random deviate with zero mean and unit standard deviation;
randomp(x) produces a Poisson random deviate whose expected number
of counts is X. X may be any positive real number of expected
counts, including fractional values, but the return value is an
integer.
An alternate syntax for the min and max functions has only a single
argument which should be a vector value (see below). The result
will be the minimum/maximum element contained within the vector.
The accum(x) function forms the cumulative sum of x, element by
element. Vector columns are supported simply by performing the
summation process through all the values. Null values are treated
as 0. The seqdiff(x) function forms the sequential difference of
x, element by element. The first value of seqdiff is the first
value of x. A single null value in x causes a pair of nulls in the
output. The seqdiff and accum functions are functional inverses,
i.e., seqdiff(accum(x)) == x as long as no null values are present.
The angsep function computes the angular separation in degrees
between 2 celestial positions, where the first 2 parameters give
the RA-like and Dec-like coordinates (in decimal degrees) of the
first position, and the 3rd and 4th parameters give the coordinates
of the second position.
In the if-then-else expression, "b?x:y", b is an explicit boolean
value or expression. There is no automatic type conversion from
numeric to boolean values, so one needs to use "iVal!=0" instead of
merely "iVal" as the boolean argument. x and y can be any scalar
data type (including string).
The following type casting operators are available, where the
inclosing parentheses are required and taken from the C language
usage. Also, the integer to real casts values to double precision:
"real to integer" (int) x (INT) x
"integer to real" (float) i (FLOAT) i
In addition, several constants are built in for use in numerical
expressions:
#pi 3.1415... #e 2.7182...
#deg #pi/180 #row current row number
#null undefined value #snull undefined string
A string constant must be enclosed in quotes as in 'Crab'. The
"null" constants are useful for conditionally setting table values
to a NULL, or undefined, value (eg., "col1==-99 ? #NULL : col1").
There is also a function for testing if two values are close to each
other, i.e., if they are "near" each other to within a user
specified tolerance. The arguments, value_1 and value_2 can be
integer or real and represent the two values who's proximity is
being tested to be within the specified tolerance, also an integer
or real:
near(value_1, value_2, tolerance)
When a NULL, or undefined, value is encountered in the FITS table,
the expression will evaluate to NULL unless the undefined value is
not actually required for evaluation, eg. "TRUE .or. NULL"
evaluates to TRUE. The following two functions allow some NULL
detection and handling:
"a null value?" ISNULL(x)
"define a value for null" DEFNULL(x,y)
The former returns a boolean value of TRUE if the argument x is
NULL. The later "defines" a value to be substituted for NULL
values; it returns the value of x if x is not NULL, otherwise it
returns the value of y.
Bit Masks
Bit masks can be used to select out rows from bit columns (TFORMn =
#X) in FITS files. To represent the mask, binary, octal, and hex
formats are allowed:
binary: b0110xx1010000101xxxx0001
octal: o720x1 -> (b111010000xxx001)
hex: h0FxD -> (b00001111xxxx1101)
In all the representations, an x or X is allowed in the mask as a
wild card. Note that the x represents a different number of wild
card bits in each representation. All representations are case
insensitive.
To construct the boolean expression using the mask as the boolean
equal operator described above on a bit table column. For example,
if you had a 7 bit column named flags in a FITS table and wanted
all rows having the bit pattern 0010011, the selection expression
would be:
flags == b0010011
or
flags .eq. b10011
It is also possible to test if a range of bits is less than, less
than equal, greater than and greater than equal to a particular
boolean value:
flags <= bxxx010xx
flags .gt. bxxx100xx
flags .le. b1xxxxxxx
Notice the use of the x bit value to limit the range of bits being
compared.
It is not necessary to specify the leading (most significant) zero
(0) bits in the mask, as shown in the second expression above.
Bit wise AND, OR and NOT operations are also possible on two or more
bit fields using the '&'(AND), '|'(OR), and the '!'(NOT) operators.
All of these operators result in a bit field which can then be used
with the equal operator. For example:
(!flags) == b1101100
(flags & b1000001) == bx000001
Bit fields can be appended as well using the '+' operator. Strings
can be concatenated this way, too.
Vector Columns
Vector columns can also be used in building the expression. No
special syntax is required if one wants to operate on all elements
of the vector. Simply use the column name as for a scalar column.
Vector columns can be freely intermixed with scalar columns or
constants in virtually all expressions. The result will be of the
same dimension as the vector. Two vectors in an expression, though,
need to have the same number of elements and have the same
dimensions. The only places a vector column cannot be used (for
now, anyway) are the SAO region functions and the NEAR boolean
function.
Arithmetic and logical operations are all performed on an element by
element basis. Comparing two vector columns, eg "COL1 == COL2",
thus results in another vector of boolean values indicating which
elements of the two vectors are equal.
Eight functions are available that operate on a vector and return a
scalar result:
"minimum" MIN(V) "maximum" MAX(V)
"average" AVERAGE(V) "median" MEDIAN(V)
"sumation" SUM(V) "standard deviation" STDDEV(V)
"# of values" NELEM(V) "# of non-null values" NVALID(V)
where V represents the name of a vector column or a manually
constructed vector using curly brackets as described below. The
first 6 of these functions ignore any null values in the vector
when computing the result. The STDDEV() function computes the
sample standard deviation, i.e. it is proportional to
1/SQRT(N-1) instead of 1/SQRT(N), where N is NVALID(V).
The SUM function literally sums all the elements in x,
returning a scalar value. If V is a boolean vector, SUM
returns the number of TRUE elements. The NELEM function
returns the number of elements in vector V whereas NVALID
return the number of non-null elements in the vector.
(NELEM also operates on bit and string columns, returning
their column widths.) As an example, to test whether all
elements of two vectors satisfy a given logical comparison,
one can use the expression
SUM( COL1 > COL2 ) == NELEM( COL1 )
which will return TRUE if all elements of COL1 are greater than
their corresponding elements in COL2.
To specify a single element of a vector, give the column name
followed by a comma-separated list of coordinates enclosed in
square brackets. For example, if a vector column named PHAS exists
in the table as a one dimensional, 256 component list of numbers
from which you wanted to select the 57th component for use in the
expression, then PHAS[57] would do the trick. Higher dimensional
arrays of data may appear in a column. But in order to interpret
them, the TDIMn keyword must appear in the header. Assuming that a
(4,4,4,4) array is packed into each row of a column named ARRAY4D,
the (1,2,3,4) component element of each row is accessed by
ARRAY4D[1,2,3,4]. Arrays up to dimension 5 are currently
supported. Each vector index can itself be an expression, although
it must evaluate to an integer value within the bounds of the
vector. Vector columns which contain spaces or arithmetic operators
must have their names enclosed in "$" characters as with
$ARRAY-4D$[1,2,3,4].
A more C-like syntax for specifying vector indices is also
available. The element used in the preceding example alternatively
could be specified with the syntax ARRAY4D[4][3][2][1]. Note the
reverse order of indices (as in C), as well as the fact that the
values are still ones-based (as in Fortran -- adopted to avoid
ambiguity for 1D vectors). With this syntax, one does not need to
specify all of the indices. To extract a 3D slice of this 4D
array, use ARRAY4D[4].
Variable-length vector columns are not supported.
Vectors can be manually constructed within the expression using a
comma-separated list of elements surrounded by curly braces ('{}').
For example, '{1,3,6,1}' is a 4-element vector containing the values
1, 3, 6, and 1. The vector can contain only boolean, integer, and
real values (or expressions). The elements will be promoted to the
highest datatype present. Any elements which are themselves
vectors, will be expanded out with each of its elements becoming an
element in the constructed vector.
Good Time Interval Filtering
A common filtering method involves selecting rows which have a time
value which lies within what is called a Good Time Interval or GTI.
The time intervals are defined in a separate FITS table extension
which contains 2 columns giving the start and stop time of each
good interval. The filtering operation accepts only those rows of
the input table which have an associated time which falls within
one of the time intervals defined in the GTI extension. A high
level function, gtifilter(a,b,c,d), is available which evaluates
each row of the input table and returns TRUE or FALSE depending
whether the row is inside or outside the good time interval. The
syntax is
gtifilter( [ "gtifile" [, expr [, "STARTCOL", "STOPCOL" ] ] ] )
or
gtifilter( [ 'gtifile' [, expr [, 'STARTCOL', 'STOPCOL' ] ] ] )
where each "[]" demarks optional parameters. Note that the quotes
around the gtifile and START/STOP column are required. Either
single or double quotes may be used. In cases where this
expression is entered on the Unix command line, enclose the entire
expression in double quotes, and then use single quotes within the
expression to enclose the 'gtifile' and other terms. It is also
usually possible to do the reverse, and enclose the whole
expression in single quotes and then use double quotes within the
expression. The gtifile, if specified, can be blank ("") which
will mean to use the first extension with the name "*GTI*" in the
current file, a plain extension specifier (eg, "+2", "[2]", or
"[STDGTI]") which will be used to select an extension in the
current file, or a regular filename with or without an extension
specifier which in the latter case will mean to use the first
extension with an extension name "*GTI*". Expr can be any
arithmetic expression, including simply the time column name. A
vector time expression will produce a vector boolean result.
STARTCOL and STOPCOL are the names of the START/STOP columns in the
GTI extension. If one of them is specified, they both must be.
In its simplest form, no parameters need to be provided -- default
values will be used. The expression "gtifilter()" is equivalent to
gtifilter( "", TIME, "*START*", "*STOP*" )
This will search the current file for a GTI extension, filter the
TIME column in the current table, using START/STOP times taken from
columns in the GTI extension with names containing the strings
"START" and "STOP". The wildcards ('*') allow slight variations in
naming conventions such as "TSTART" or "STARTTIME". The same
default values apply for unspecified parameters when the first one
or two parameters are specified. The function automatically
searches for TIMEZERO/I/F keywords in the current and GTI
extensions, applying a relative time offset, if necessary.
Spatial Region Filtering
Another common filtering method selects rows based on whether the
spatial position associated with each row is located within a given
2-dimensional region. The syntax for this high-level filter is
regfilter( "regfilename" [ , Xexpr, Yexpr [ , "wcs cols" ] ] )
where each "[]" demarks optional parameters. The region file
name is required and must be enclosed in quotes. The remaining
parameters are optional. The region file is an ASCII text file
which contains a list of one or more geometric shapes (circle,
ellipse, box, etc.) which defines a region on the celestial
sphere or an area within a particular 2D image. The region
file is typically generated using an image display program such
as fv/POW (distribute by the HEASARC), or ds9 (distributed by
the Smithsonian Astrophysical Observatory). Users should refer
to the documentation provided with these programs for more
details on the syntax used in the region files.
In its simpliest form, (e.g., regfilter("region.reg") ) the
coordinates in the default 'X' and 'Y' columns will be used to
determine if each row is inside or outside the area specified in
the region file. Alternate position column names, or
expressions, may be entered if needed, as in
regfilter("region.reg", XPOS, YPOS)
Region filtering can be applied most unambiguously if the
positions in the region file and in the table to be filtered
are both give in terms of absolute celestial coordinate units.
In this case the locations and sizes of the geometric shapes in
the region file are specified in angular units on the sky
(e.g., positions given in R.A. and Dec. and sizes in
arcseconds or arcminutes). Similarly, each row of the filtered
table will have a celestial coordinate associated with it.
This association is usually implemented using a set of
so-called 'World Coordinate System' (or WCS) FITS keywords that
define the coordinate transformation that must be applied to
the values in the 'X' and 'Y' columns to calculate the
coordinate.
Alternatively, one can perform spatial filtering using unitless
'pixel' coordinates for the regions and row positions. In this
case the user must be careful to ensure that the positions in
the 2 files are self-consistent. A typical problem is that the
region file may be generated using a binned image, but the
unbinned coordinates are given in the event table. The ROSAT
events files, for example, have X and Y pixel coordinates that
range from 1 - 15360. These coordinates are typically binned
by a factor of 32 to produce a 480x480 pixel image. If one
then uses a region file generated from this image (in image
pixel units) to filter the ROSAT events file, then the X and Y
column values must be converted to corresponding pixel units as
in:
regfilter("rosat.reg", X/32.+.5, Y/32.+.5)
Note that this binning conversion is not necessary if the region
file is specified using celestial coordinate units instead of
pixel units because CFITSIO is then able to directly compare the
celestial coordinate of each row in the table with the celestial
coordinates in the region file without having to know anything
about how the image may have been binned.
The last "wcs cols" parameter should rarely be needed. If
supplied, this string contains the names of the 2 columns
(space or comma separated) which have the associated WCS
keywords. If not supplied, the filter will scan the X and Y
expressions for column names. If only one is found in each
expression, those columns will be used, otherwise an error will
be returned.
The region shapes supported are (names are case insensitive):
Point ( X1, Y1 ) <- One pixel square region
Line ( X1, Y1, X2, Y2 ) <- One pixel wide region
Polygon ( X1, Y1, X2, Y2, ... ) <- Rest are interiors with
Rectangle ( X1, Y1, X2, Y2, A ) | boundaries considered
Box ( Xc, Yc, Wdth, Hght, A ) V within the region
Diamond ( Xc, Yc, Wdth, Hght, A )
Circle ( Xc, Yc, R )
Annulus ( Xc, Yc, Rin, Rout )
Ellipse ( Xc, Yc, Rx, Ry, A )
Elliptannulus ( Xc, Yc, Rinx, Riny, Routx, Routy, Ain, Aout )
Sector ( Xc, Yc, Amin, Amax )
where (Xc,Yc) is the coordinate of the shape's center; (X#,Y#) are
the coordinates of the shape's edges; Rxxx are the shapes' various
Radii or semimajor/minor axes; and Axxx are the angles of rotation
(or bounding angles for Sector) in degrees. For rotated shapes, the
rotation angle can be left off, indicating no rotation. Common
alternate names for the regions can also be used: rotbox <-> box;
rotrectangle <-> rectangle; (rot)rhombus <-> (rot)diamond; and pie
<-> sector. When a shape's name is preceded by a minus sign, '-',
the defined region is instead the area *outside* its boundary (ie,
the region is inverted). All the shapes within a single region
file are OR'd together to create the region, and the order is
significant. The overall way of looking at region files is that
if the first region is an excluded region then a dummy included
region of the whole detector is inserted in the front. Then
each region specification as it is processed overrides any
selections inside of that region specified by previous regions.
Another way of thinking about this is that if a previous
excluded region is completely inside of a subsequent included
region the excluded region is ignored.
The positional coordinates may be given either in pixel units,
decimal degrees or hh:mm:ss.s, dd:mm:ss.s units. The shape sizes
may be given in pixels, degrees, arcminutes, or arcseconds. Look
at examples of region file produced by fv/POW or ds9 for further
details of the region file format.
There are three functions that are primarily for use with SAO region
files and the FSAOI task, but they can be used directly. They
return a boolean true or false depending on whether a two
dimensional point is in the region or not:
"point in a circular region"
circle(xcntr,ycntr,radius,Xcolumn,Ycolumn)
"point in an elliptical region"
ellipse(xcntr,ycntr,xhlf_wdth,yhlf_wdth,rotation,Xcolumn,Ycolumn)
"point in a rectangular region"
box(xcntr,ycntr,xfll_wdth,yfll_wdth,rotation,Xcolumn,Ycolumn)
where
(xcntr,ycntr) are the (x,y) position of the center of the region
(xhlf_wdth,yhlf_wdth) are the (x,y) half widths of the region
(xfll_wdth,yfll_wdth) are the (x,y) full widths of the region
(radius) is half the diameter of the circle
(rotation) is the angle(degrees) that the region is rotated with
respect to (xcntr,ycntr)
(Xcoord,Ycoord) are the (x,y) coordinates to test, usually column
names
NOTE: each parameter can itself be an expression, not merely a
column name or constant.
PARAMETERS
infile [filename]
A file name (including optional extension number in square
brackets) of the FITS table to be searched.
outfile [filename]
Name of the output file that will contain the modified copy of
the input file. To overwrite a preexisting file with the same
name, prefix the name with an exclamation point '!' (or '\!' on
the Unix command line), or set the 'clobber' parameter = YES.
clname [string]
Name of the column to contain the calculated values. If the
column does not exist a new one will be created. If the column
exists its values will be overwritten.
expr [string]
The expression used to calculate the output column's values.
The expression can evaluate to any of the following data types:
boolean, integer, double, string, or bit. Fcalc will create
(if necessary) the column with the appropriate type to hold the
results. A text file containing the expression can be
specified by preceding the filename with '@', such as
'@file.txt'. The expression in the file can be arbitrarily
complex and extend over multiple lines of the file. Lines that
begin with 2 slash characters ('//') will be ignored and may be
used to add comments to the file.
(rowrange="-") [string]
A comma-separated list of row ranges which should be used. By
default, all the rows of a table are used, but one could
operate on only one or more row ranges written in the form
start-end.
(tform="") [string]
Optional TFORM keyword to use for a new column. If blank or
'-', a new column will be created based on the expression's
result type. If a single character TFORM is provided instead
of a full TFORM specification, an appropriate repeat value will
be inserted (for binary tables) to reflect the vector length of
the result. (For ASCII tables, default column widths will be
appended to the TFORM column type.) If the output column
already exists, this parameter is ignored.
(anull="") [string]
The ascii string to represent NULL values in the output column
of an ascii table.
(inull=0) [integer]
The integer value to represent NULL values in an output integer
column in a binary table.
(copycol=yes) [boolean]
If true, all columns in the input table will be copied to the
output table, otherwise only the column named by the column
parameter will appear in the output table.
(histkw=yes) [boolean]
If true, history records will be added to the output file
primary header indicating that fcalc created the file, and a
history records of the expression used will be added to the
extension header.
(copyall=yes) [boolean]
If true, all other extensions, including the primary array are
copied to the output file.
(clobber=no) [boolean]
If true, and existing file with the same name as the specified
output file will be overwritten.
EXAMPLES
1. Calculate the values for a column named AREA in the first
extension of a the area.fits file using the columns X and Y found
in the first extension of the input.fits file using the expression
"X*Y":
ft> fcalc input.fits area.fits AREA "X*Y"
2. Calculate the dot product of two 2-D vector columns, (A.B) of the
input file named twovec.fits and store the result in the DOT column
of the output file dot.fits:
ft> fcalc twovec.fits dot.fits DOT "A[1]*B[1] + A[2]*B[2]"
3. Shift all the values in the ORIGIN column of the input file,
in.fits by 5.50 and store the result in the same ORIGIN column of
the output file out.fits:
ft> fcalc in.fits out.fits ORIGIN "ORIGIN + 5.50"
4. Normalize a spectrum in the vector column SPEC by its average
value:
ft> fcalc spec.fits out.fits SPEC
"SPEC/( SUM(SPEC) / NELEM(SPEC) )"
5. Count the number of pixels in a scanline, held in vector column
SCAN, with intensities above a value given by a detection-threshhold
keyword, DETECT, putting the result in a new column CNT:
ft> fcalc scan.fits out.fits CNT "SUM( SCAN>DETECT )"
BUGS
SEE ALSO
ftcalc, ftselect, ftcopy, maketime. fv, the interactive FITS file
editor, can also be used to perform calculations on FITS columns.