Understanding Crowding

Understanding Crowding in TESS data

Welcome everyone to our TESS Lightkurve tutorial!


Rebekah Hounsell - Support scientist for TESS in the NASA GSFC GI Office.

Download the notebook

If you would like to download a copy of this notebook you can do so by clicking the link below


Learning goals

In this tutorial, we will teach the user about crowding in the TESS data products and how to correct for it.

This tutorial assumes a basic knowledge of python and astronomy, and will walk the user through several of the concepts outlined below:

  1. Downloading and comparing LightCurve Object data
  2. Examining a TargetPixel File (TPF) for crowding
  3. Creating a light curve from a TPF
  4. Removing the effects of scattered light and noise
  5. Removing the effects of crowding

This tutorial is designed for users that have previous experience with Lightkurve.


This tutorial requires the use of specific packages: - Lightkurve to work with TESS data (v2.0.1) - Matplotlib for plotting. - Numpy for manipulating the data.

First time users

If you are not that experienced with Python, or cannot download Lightkurve, you can run this notebook as a Colab notebook. Colaboratory allows users to write and execute Python in your browser with zero configuration required.

All you need is a Google account and to copy and paste in the following command at the top of your colab notebook:

!pip install git+https://github.com/lightkurve/lightkurve.git --quiet

This downloads the Lightkurve package.

import numpy as np
import lightkurve as lk
import matplotlib.pyplot as plt
from lightkurve.correctors import RegressionCorrector, DesignMatrix
from lightkurve.correctors import PLDCorrector
import warnings

Introduction into crowding

TESS photometry, while high-cadence and of high quality, does suffer from crowding issues. A single TESS pixel corresponds to 21 arcseconds (~0.35 arc min) on sky and the TESS Pixel Response Function (PRF) is very large compared to the pixel. A target of interest may therefore be contaminated by any number of neighboring objects, and it is important that the light from these other objects be accounted for and removed.

For exoplanets, if this excess flux is not removed, it can cause a decrease in the apparent planet transit depth and lead to a systematic underestimation of the planet radii.

LightCurve Objects have been corrected for this crowding via the data processing pipeline developed by the Science Processing Operations Center (SPOC). A description of this correction and its application is provided in Section 2.3.11 of this paper. The correction however is applied only to the PDCSAP flux and not the SAP flux.

The crowding correction applied focuses on two parameters: - The crowding metric: This reflects what fraction of the flux in the aperture is due to the target itself, not the nearby light sources. - The flux fraction: Similar to excess flux leaking into the aperture, a fraction of the PRF of the target may not be captured in it. To account for this missing fraction, the flux fraction is computed.

If PDCSAP flux is not available, the user may apply the corrections outlined below to remove not only the instrumental noise, but any additional crowding effect.

1. Downloading and comparing LightCurve Object data

In this tutorial we will be examining at the binary star system WR21a. Let’s first see if there are any LightCurve objects avalible for download.

search_lk = lk.search_lightcurve("WR21a")
SearchResult containing 4 data products.
0TESS Sector 092019QLP18004645701670.0
1TESS Sector 102019QLP18004645701670.0
2TESS Sector 362021SPOC1204645701670.0
3TESS Sector 372021SPOC1204645701670.0

OK, great! There are data for multiple sectors. For this tutorial, we will examining data from Sector 36 which is provided by the SPOC. We can download this via the following functions,

lc36 = search_lk[2].download()
<matplotlib.axes._subplots.AxesSubplot at 0x7fc6e008eed0>

This very clearly shows the transit of the system. The flux displayed is the PDCSAP flux which has been fully corrected. Let’s see what the SAP flux looks like in comparison. We can do this by specifying the column in the plot function.

ax = lc36.scatter(normalize=False, label='PDCSAP')
lc36.scatter(ax=ax, column='sap_flux', normalize=False, color='red', label='SAP')
plt.xlim(xmin=2290, xmax=2293)
(2290, 2293)

The above plot clearly indicates the significant difference in amplitude between the PDCSAP and SAP flux. The plot illustrates how much of a difference the flux fraction and crowding metric can change the absolute magnitudes of the light curves.

Now we plot the normalized PDCSAP and SAP flux to show that the transit depth has changed.

ax = lc36.scatter(normalize=True, label='PDCSAP')
lc36.scatter(ax=ax, column='sap_flux', normalize=True, color='red', label='SAP')
plt.xlim(xmin=2290, xmax=2293)
(2290, 2293)

The transit depth of the SAP flux is not the same as the PDCSAP. This difference is a good indication that the data suffers from crowding.

2. Examining a Target Pixel File for crowding

To fully appreciate how crowded our object is, we can examine its surroundings via downloading and plotting the associated TargetPixel File (TPF).

tpf = lk.search_targetpixelfile('WR21a', sector=36).download(quality_bitmask='hard')

Note for this tutorial we are specifing the quality_bitmask to be hard. This is to ensure that only good quality data is downloaded. We can now plot our TPF and display the aperture mask that has been defined by the SPOC for our object of interest.

<matplotlib.axes._subplots.AxesSubplot at 0x7fc702669590>

The above plot indicates that we are focusing on the right object, but that it is indeed crowded by a much brighter star and as such, we need to remove the contaminating flux.

3. Creating a light curve from a TPF

To do this, we must first create the light curve of the object using the default mask, as shown below.

tpf_lc = tpf.to_lightcurve(aperture_mask=tpf.pipeline_mask)
<matplotlib.axes._subplots.AxesSubplot at 0x7fc730838450>

A quick glance at this light curve indicates that there are also long term trends/noise that need to be removed before we can correct for crowding.

4. Removing the effects of scattered light and noise

As indicated above, the SAP light curve is effected by scattered light and noise. We must remove this before correcting for crowding. To do this, we can used one of Lighkurves built in corrector functions, in this case Pixel Level Decorrelation (PLD).

The inputs required are the TPF, the aperture, and the number of principal componants.

pld = PLDCorrector(tpf, aperture_mask=tpf.pipeline_mask)
pltAxis = pld.diagnose()
pltAxis[0].set_ylim(ymin=6000, ymax=8000)
pltAxis[1].set_ylim(ymin=6000, ymax=8000)
pltAxis[2].set_ylim(ymin=6000, ymax=8000)

pld_lc = pld.correct(pca_components=5, aperture_mask=tpf.pipeline_mask)

Lets compare this corrected light curve with our previous light curves.

ax = lc36.scatter(normalize=True, label='PDCSAP')
tpf_lc.scatter(ax=ax, normalize=True, color='red', label='SAP ORIGINAL')
pld_lc.scatter(ax=ax, normalize=True, color='green', label='PLD SAP')
plt.xlim(xmin=2290, xmax=2293)
(2290, 2293)

The depth of the PLD SAP light curve still does not match that of the PDCSAP, but there is some improvement in the light curve overall. Let’s now apply the crowding correction to the pld_lc.

5. Removing the effects of crowding

To do this, we must first pull out the two parameters we need to calculate the correction - CROWDSAP and FLFRCSAP

CROWDSAP = tpf.hdu[1].header['CROWDSAP']
FLFRCSAP = tpf.hdu[1].header['FLFRCSAP']

From the above values, we see that the aperture contains only 71% of the object’s flux, and an extra 21% of the flux in the aperture is due to other objects.

To correct for the crowding and missing flux, we must first calculate the median flux of our time series. Note that we want only data that is of a high quality, which is why we originally set our quality_mask as hard.

median_flux = np.median(pld_lc.flux.value)

The excess flux in the aperture is then calculated as (1-CROWDSAP) times the median flux

excess_flux = (1-CROWDSAP)*median_flux

This excess flux must then be subtracted from the time series data

flux_removed = pld_lc.flux.value  - excess_flux

This residual flux, however, does not account for the flux of our object outside of the aperture, as such there is one more correction to apply - FLFRCSA.

flux_corr = flux_removed/FLFRCSAP

The uncertainties on this flux are also now altered to be

flux_err_corr = pld_lc.flux_err.value/FLFRCSAP

We can now convert this into a LightCurve Object again via the following

lc_corr = lk.LightCurve(time=tpf.time.value, flux=flux_corr, flux_err=flux_err_corr)

Let’s plot and compare to our previous light curves.

ax = lc36.scatter(normalize=True, label='PDCSAP')
tpf_lc.scatter(ax=ax, normalize=True, color='red', label='SAP ORIGINAL', alpha=0.5)
pld_lc.scatter(ax=ax, normalize=True, color='green', label='PLD SAP', alpha=0.5)
lc_corr.scatter(ax=ax, normalize=True, color='blue', label='PLD SAP CORR', alpha=0.5)
plt.xlim(xmin=2290, xmax=2293)
(2290, 2293)

The corrected light curve is now significantly closer to that of the PLDSAP light curve. There are still some minor descrpancies, but these are are realated primarily to the removal of noise. Adjustments in the noise removal procedure applied to the SAP light curve can further improve this reduction.

Let’s try another method - the CBV corrector.


from lightkurve.correctors import CBVCorrector
cbvCorrector = CBVCorrector(tpf_lc)
[TESS CBVs, Sector.Camera.CCD : 36.3.1, CBVType : SingleScale, nCBVS : 16,
 TESS CBVs, Sector.Camera.CCD : 36.3.1, CBVType.Band: MultiScale.1, nCBVs : 8,
 TESS CBVs, Sector.Camera.CCD : 36.3.1, CBVType.Band: MultiScale.2, nCBVs : 8,
 TESS CBVs, Sector.Camera.CCD : 36.3.1, CBVType.Band: MultiScale.3, nCBVs : 8,
 TESS CBVs, Sector.Camera.CCD : 36.3.1, CBVType : Spike, nCBVS : 6]
# Select which CBVs to use in the correction
cbv_type = ['SingleScale', 'Spike']
cbv_indices = [np.arange(1,9), 'ALL']
# Perform the correction
cbvCorrector.correct_gaussian_prior(cbv_type=cbv_type, cbv_indices=cbv_indices, alpha=1e-4)

Let’s check to see if we have over or underfit the data.

with warnings.catch_warnings():
    # ignore "RuntimeWarning"
    warnings.simplefilter("ignore", RuntimeWarning)
    cbvCorrector.goodness_metric_scan_plot(cbv_type=cbv_type, cbv_indices=cbv_indices);

We might be slightly overfitting, so let’s adjust our alpha.

cbvCorrector.correct_gaussian_prior(cbv_type=cbv_type, cbv_indices=cbv_indices, alpha=1e-1)

Now we can apply the crowding corrections.

# Perform the FF and CM corrections
median_flux = np.median(cbvCorrector.corrected_lc.flux.value)
excess_flux = (1-CROWDSAP)*median_flux
flux_removed = cbvCorrector.corrected_lc.flux.value  - excess_flux
flux_corr = flux_removed/FLFRCSAP
flux_err_corr = cbvCorrector.corrected_lc.flux_err.value/FLFRCSAP
lc_cbv_corr = lk.LightCurve(time=tpf.time.value, flux=flux_corr, flux_err=flux_err_corr)

We can now compare to the PDCSAP and SAP light curves.

ax = tpf_lc.scatter(normalize=True, color='red', label='SAP ORIGINAL')
lc36.scatter(ax=ax, normalize=True, label='PDCSAP')
lc_cbv_corr.scatter(ax=ax, normalize=True, color='cyan', label='CBV Corrected', alpha=0.5)
plt.xlim(xmin=2290, xmax=2293);

Finally, we can compare our PLD and CBV corrected light curves.

ax = lc36.scatter(normalize=True, label='PDCSAP')
lc_cbv_corr.scatter(ax=ax, normalize=True, color='cyan', label='CBV Corrected')
lc_corr.scatter(ax=ax, normalize=True, color='blue', label='PLD SAP CORR', alpha = 0.5)
plt.title('Comparing PLD correction to CBV Correction')
plt.xlim(xmin=2290, xmax=2293);

The CBV light curve might be a better match to the PDCSAP.