Generate some `completeness' histograms

XMM-Newton Science Analysis System

eimsimreduce (eimsim-2.4) [xmmsas_20190531_1155-18.0.0]

This function may be run on its own by calling the script with both entrystage and finalstage set to `completeness'.

The main aim of this function is to produce a histogram which shows the fraction of the simulated sources which have been detected, as a function of the flux of the simulated source. One expects this to be close to 1 in the bright limit, but to fall to zero towards the faint end. The flux at which the detected fraction falls to about 1/2 can be considered the sensitivity of the detection technique which was employed. See figure 3 for an example of a plot of cumulative completeness.
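As a rough illustration of how a sensitivity figure might be read off such a histogram, the sketch below finds the flux at which the detected fraction first rises through 1/2, interpolating linearly in $\log_{10}$ of the flux. The function name `half_completeness_flux` and the input values are hypothetical; this is not part of eimsimreduce, only one plausible way to post-process its output.

```python
import math

def half_completeness_flux(flux, completeness, level=0.5):
    """Return the flux at which completeness first rises through `level`.

    flux         -- bin fluxes in ascending order
    completeness -- detected fraction in each bin
    Interpolates linearly in log10(flux) between the two bracketing bins
    and returns None if the curve never rises through the level.
    """
    for i in range(1, len(flux)):
        c0, c1 = completeness[i - 1], completeness[i]
        if c0 < level <= c1:
            x0, x1 = math.log10(flux[i - 1]), math.log10(flux[i])
            t = (level - c0) / (c1 - c0)
            return 10.0 ** (x0 + t * (x1 - x0))
    return None

# Made-up values for illustration only, not output of the task.
flux = [1e-15, 1e-14, 1e-13, 1e-12]
comp = [0.0, 0.25, 0.75, 1.0]
print(half_completeness_flux(flux, comp))
```

Interpolating in log space matches the equal-width $\log_{10}$ binning of the histogram; with noisy data one might prefer to fit a smooth curve instead.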

Note that any sensitivity figure obtained in this fashion represents an average across the entire mosaiced field of view. Typically the exposure and detected background flux vary greatly over such a mosaic. If precise sensitivity figures are desired it would probably be better to use artificial exposure and background templates, in which the pixel values for each instrument and energy band are either a positive constant or 0. The non-zero area would also need to be the same shape and extent for each instrument.

**Figure 3:** Cumulative completeness (100 runs, 2xmm detection).

The first step performed by the present function is to make a histogram of the occurrence of simulated sources as a function of $\log_{10}($SIM_FLUX$)$. All the available lists of simulated sources are harvested in this step. The columns created are

LOG10_SF_LO: $\log_{10}$ of the lower edge of the bin.
LOG10_SF_HI: $\log_{10}$ of the upper edge of the bin. The bins all have equal widths in $\log_{10}$ space.
N_SIM: The total number of `detectable' simulated sources (ie, those for which INV_SENSY $> 0$) whose fluxes fall within the bin.
NET_FLUX: Sum of SIM_FLUX within the histogram bin.
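The binning step can be sketched as follows. This is a minimal illustration, not the task's actual code: the function name `sim_flux_histogram`, its arguments and the bin range are hypothetical, and `detectable' is assumed here to mean INV_SENSY greater than zero.

```python
import math

def sim_flux_histogram(sim_flux, inv_sensy, log_lo, log_hi, n_bins):
    """Bin simulated sources in equal-width log10(SIM_FLUX) bins.

    Returns one dict per bin, in ascending flux order, holding the
    four columns LOG10_SF_LO, LOG10_SF_HI, N_SIM and NET_FLUX.
    """
    width = (log_hi - log_lo) / n_bins
    rows = [
        {
            "LOG10_SF_LO": log_lo + i * width,
            "LOG10_SF_HI": log_lo + (i + 1) * width,
            "N_SIM": 0,
            "NET_FLUX": 0.0,
        }
        for i in range(n_bins)
    ]
    for flux, sens in zip(sim_flux, inv_sensy):
        # Assumption: a source counts as detectable only if INV_SENSY > 0.
        if sens <= 0 or flux <= 0:
            continue
        i = math.floor((math.log10(flux) - log_lo) / width)
        if 0 <= i < n_bins:  # sources outside the bin range are ignored
            rows[i]["N_SIM"] += 1
            rows[i]["NET_FLUX"] += flux
    return rows
```

In practice the `sim_flux` and `inv_sensy` lists would be concatenated from all the harvested lists of simulated sources before binning.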

Some possibly useful additional columns are next calculated from these:

N_SIM_ERR: Square root of N_SIM.
N_SIM_INT: Reverse-cumulative total of N_SIM (ie, sum of N_SIM in this plus all brighter bins).
DENS_SIM: Average sky-density (in deg $^{-2}$ ) of (detectable) sim sources. This is N_SIM divided by SKY_AREA divided by the number of source lists which were merged. SKY_AREA, which is read from the keyword of that name in the lists of detected sources, is the area in square degrees of the non-zero parts of the mosaiced maps of reciprocal sensitivity, which were constructed as part of eimsimprep.
DENS_SIM_ERR: Error in DENS_SIM.
DENS_SIM_INT: Reverse-cumulative total of DENS_SIM.
FLUX_DENS_INT: Forward-cumulative total of NET_FLUX, divided by the sky area.
SIM_FLUX_LO: SIM_FLUX at the lower edge of the bin (10.0**LOG10_SF_LO). This should be used as the $x$-value when plotting any `reverse-cumulative' quantity on the $y$-axis.
SIM_FLUX_MID: SIM_FLUX at the middle of the bin. This should be used as the $x$-value when plotting any `differential' quantity on the $y$-axis.
SIM_FLUX_HI: SIM_FLUX at the upper edge of the bin (10.0**LOG10_SF_HI). This should be used as the $x$-value when plotting any `forward-cumulative' quantity on the $y$-axis.
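The additional columns listed above can be derived from the four basic ones roughly as sketched below. The function name `add_derived_columns` is hypothetical. Two details are assumptions rather than statements from this page: DENS_SIM_ERR is taken to be N_SIM_ERR scaled in the same way as DENS_SIM, and the bin middle is taken in $\log_{10}$ space. FLUX_DENS_INT is divided by the sky area only, following the description above.

```python
import math

def add_derived_columns(rows, sky_area, n_lists):
    """Add the derived columns to histogram rows (ascending flux order).

    rows     -- list of dicts with LOG10_SF_LO, LOG10_SF_HI, N_SIM, NET_FLUX
    sky_area -- SKY_AREA keyword value, in square degrees
    n_lists  -- number of source lists which were merged
    """
    norm = sky_area * n_lists
    for row in rows:
        lo, hi = row["LOG10_SF_LO"], row["LOG10_SF_HI"]
        row["N_SIM_ERR"] = math.sqrt(row["N_SIM"])
        row["DENS_SIM"] = row["N_SIM"] / norm
        row["DENS_SIM_ERR"] = row["N_SIM_ERR"] / norm  # assumed scaling
        row["SIM_FLUX_LO"] = 10.0 ** lo
        row["SIM_FLUX_MID"] = 10.0 ** (0.5 * (lo + hi))  # assumed log midpoint
        row["SIM_FLUX_HI"] = 10.0 ** hi

    # Reverse-cumulative totals: this bin plus all brighter bins.
    n_int, dens_int = 0, 0.0
    for row in reversed(rows):
        n_int += row["N_SIM"]
        dens_int += row["DENS_SIM"]
        row["N_SIM_INT"] = n_int
        row["DENS_SIM_INT"] = dens_int

    # Forward-cumulative total: this bin plus all fainter bins.
    flux_int = 0.0
    for row in rows:
        flux_int += row["NET_FLUX"]
        row["FLUX_DENS_INT"] = flux_int / sky_area
    return rows
```

The two cumulative loops run in opposite directions, which is why the reverse-cumulative columns pair with SIM_FLUX_LO and the forward-cumulative column with SIM_FLUX_HI.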

Now it is time to tally up the detections. However, we now need to make a distinction between detections which are likely to be `genuine' and those which are not. `Genuine' is a somewhat slippery concept in the present application, but we do have a quantity which we can use to get a handle on it, namely the probability MATCH_PNULL that the match between a detection and its matching simulated source could have occurred by chance. We define a cutoff value of MATCH_PNULL and declare that all those detections for which MATCH_PNULL falls below the cutoff are genuine, and the others not. The cutoff is under user control via the parameter probcutoff of eimsimreduce. Detected sources for which SIM_INV_SENSY = 0 are also screened out at this stage.

The situation is actually a little more complicated: although we may be fairly confident that genuine detections have small values of MATCH_PNULL, spurious detections have values which are evenly spread between 0 and 1. This means that our initial tally of detections with MATCH_PNULL below the cutoff $P_\mathrm{cut}$ comprises not the total number $R$ of reliable detections, out of a total $A$ of all detections, but $R^\prime = R + P_\mathrm{cut} \times (A - R)$ - ie there are some black sheep among the white. $R$ is thus calculated from $R^\prime$ as

$\begin{displaymath} R = \frac{R^\prime - P_\mathrm{cut} A}{1 - P_\mathrm{cut}} \end{displaymath}$

and

$\begin{displaymath} \sigma_R^2 = \frac{R^\prime + P_\mathrm{cut}^2 A}{(1 - P_\mathrm{cut})^2}. \end{displaymath}$
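The two formulas above translate directly into code. The sketch below uses a hypothetical function name, `true_detections`; its arguments correspond to $R^\prime$, $A$ and $P_\mathrm{cut}$, and the numbers in the example are made up.

```python
import math

def true_detections(n_below_cut, n_all, p_cut):
    """Estimate the number R of genuine detections and its uncertainty.

    n_below_cut -- R', detections with MATCH_PNULL below the cutoff
    n_all       -- A, all detections in the bin
    p_cut       -- P_cut, the MATCH_PNULL cutoff (0 <= p_cut < 1)
    Returns (R, sigma_R).
    """
    if not 0.0 <= p_cut < 1.0:
        raise ValueError("p_cut must satisfy 0 <= p_cut < 1")
    r = (n_below_cut - p_cut * n_all) / (1.0 - p_cut)
    variance = (n_below_cut + p_cut ** 2 * n_all) / (1.0 - p_cut) ** 2
    return r, math.sqrt(variance)

# Example: 100 detections in a bin, 82 of them below a cutoff of 0.1.
r, sigma_r = true_detections(82, 100, 0.1)
print(r, sigma_r)
```

As a consistency check, the example gives $R$ close to 80, and substituting back yields $R + P_\mathrm{cut} \times (A - R) = 80 + 0.1 \times 20 = 82$, the original $R^\prime$. Note that $R$ can come out slightly negative in sparsely populated bins.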

The next batch of columns to be calculated are as follows:

N_ALL_DET: All detections (same bin edges as for the simulated sources).
N_ALL_DET_ERR: Square root of N_ALL_DET.
N_ALL_DET_INT: Reverse-cumulative total of N_ALL_DET.
N_TRUE_DET: `Genuine' detections, as given by the formula above.
N_TRUE_DET_ERR: Uncertainty in N_TRUE_DET, as given by the formula above.
N_TRUE_DET_INT: Reverse-cumulative total of N_TRUE_DET.
DENS_DET: Average sky-density (in deg $^{-2}$ ) of reliable detections, derived in the same fashion as column DENS_SIM.
DENS_DET_ERR: Uncertainty in DENS_DET.
DENS_DET_INT: Reverse-cumulative total of DENS_DET.

The desired result is then calculated and expressed in the final three columns:

COMP_RATIO: DENS_DET/DENS_SIM.
COMP_RATIO_ERR: Error in COMP_RATIO.
COMP_RATIO_INT: DENS_DET_INT/DENS_SIM_INT.
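A sketch of the ratio calculation for a single bin follows. The formula used by the task for COMP_RATIO_ERR is not given here, so the error below is an assumption: ordinary first-order propagation for a ratio of two quantities treated as independent. The function name `completeness_ratio` is likewise hypothetical.

```python
import math

def completeness_ratio(dens_det, dens_det_err, dens_sim, dens_sim_err):
    """Return (COMP_RATIO, COMP_RATIO_ERR) for one histogram bin.

    The ratio is DENS_DET / DENS_SIM. The error is an ASSUMED
    first-order propagation, treating the two densities as independent.
    Bins with no simulated sources give (nan, nan).
    """
    if dens_sim <= 0:
        return math.nan, math.nan
    ratio = dens_det / dens_sim
    err = math.hypot(
        dens_det_err / dens_sim,                  # term from DENS_DET
        dens_det * dens_sim_err / dens_sim ** 2,  # term from DENS_SIM
    )
    return ratio, err

# Made-up densities in deg^-2, for illustration only.
print(completeness_ratio(8.0, 2.0, 16.0, 4.0))
```

Since the detections are matched to the simulated sources, the two densities are not truly independent, so this estimate should be read as indicative only.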

A last function of this task is to append to the output dataset a table named THEORY, which is a version of the SRCSPECS table of the sim source specification template designed to make it easy to compare the theoretical logN-logS of the simulated sources with the actual logN-logS. You can do this for example using the ftool fv. First plot DENS_SIM_INT against SIM_FLUX_LO; then overlay this with a second plot, of THEORY columns DENSITY against FLUX; then change the axes scales to log-log. The two curves can then be compared directly.

PLEASE NOTE if you do this that the real distribution will very often appear not to match the theoretical logN-logS very well at the bright end of the scale. Such deviations appear more significant than they really are, because the brain expects the values in adjacent flux bins to be statistically independent, which is not true of a cumulative plot. A comparison of differential plots is often much more satisfying.

Subsections

What's the difference between DETEC_PNULL and MATCH_PNULL?

XMM-Newton SOC/SSC -- 2019-06-02