Accessing HEASARC and LAMBDA data in the Cloud

Introduction

Since 2023, the Year of Open Science, HEASARC data have been available in the cloud as part of NASA's Open Science Initiative and in collaboration with the Amazon Web Services (AWS) Open Data project. This effort is motivated by the need to make these data more accessible to the broader community and to enable the kind of science that requires the significant resources of cloud computing.

HEASARC and LAMBDA data are hosted on AWS and registered in its Open Data Registry in two buckets called "nasa-heasarc" and "nasa-lambda". Below, and in a tutorial notebook, we show several ways to access them:

  • in Python, using tools like Astropy,
  • using our new lightweight Hark search and download tool,
  • direct access using HTTPS or the AWS command line interface (CLI).
The file locations on S3 can then be used with cloud-compatible client software, such as the Astropy-affiliated packages Astroquery and PyVO, to provide seamless access to data in the cloud. Our Xamin data portal offers results in various formats, including a list of cloud URIs.

NASA Astrophysics, including HEASARC, is building cloud analysis capabilities with the Fornax Initiative. See details ....

Pythonic Data Access Tutorial

We have updated our Astroquery module in a number of ways, including providing cloud access as described on its documentation page. In addition, we have a quick tutorial on accessing HEASARC or LAMBDA data in the cloud using more direct tools such as PyVO and boto3. You can download the Python notebook or view it rendered as HTML.

Some software, such as Astropy's FITS I/O routines, can read data directly from the S3 bucket, with options to read only a subset of a FITS file. Tools based on cfitsio, such as HEASoft, can also read any file from a URL. See below.
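As a rough illustration of reading from the bucket with Astropy, the sketch below opens a FITS file by its location and pulls only a header and a small cutout from an image HDU. It assumes a recent Astropy (5.2 or later) with the optional fsspec and s3fs packages installed for S3 access. The function name read_header_and_cutout and its arguments are hypothetical, chosen for this example only.

```python
import numpy as np
from astropy.io import fits


def read_header_and_cutout(uri, ext=0, rows=slice(0, 10), cols=slice(0, 10)):
    """Return (header, cutout) for one 2-D image HDU of a local or S3-hosted FITS file."""
    # Anonymous access only applies to s3:// URIs; local paths need no extra options.
    kwargs = {"fsspec_kwargs": {"anon": True}} if uri.startswith("s3://") else {}
    with fits.open(uri, **kwargs) as hdul:
        hdu = hdul[ext]
        header = hdu.header.copy()
        # .section reads just the requested pixels rather than the full array;
        # np.array() copies them so they remain valid after the file is closed.
        cutout = np.array(hdu.section[rows, cols])
    return header, cutout
```

For an uncompressed image in the bucket, the call would take an S3 URI of the form s3://nasa-heasarc/<path to image>.fits. Partial reads are most effective on uncompressed files; a gzip-compressed file generally has to be fetched and decompressed in full.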

Note that some HEASoft tools that rely on knowing the directory structure of an input dataset might require you to copy the data out of the S3 object store and into a file system it can access.

Tools

The HEASARC now provides a standalone tool, hark, that can be used to search for data and download it directly from AWS. We recommend using this tool for fast access to data in the cloud, especially if you are downloading large amounts of data or running parallel pipelines, because it does not depend on the speed of the HEASARC servers.

See the hark documentation for details.

Direct Bucket Access

These data can currently be accessed by using the HEASARC or LAMBDA web tools to browse the archive and retrieve a list of observations or files to download, or by doing the same with one of our APIs. (See our archive pages for the HEASARC options or the LAMBDA data portal.) If the given tool does not return cloud URIs, they can be inferred from the on-premises URL: simply replace the beginning of the traditional access URL with the AWS S3 bucket address. For example, a Chandra image located at

https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/5/4475/primary/acisf04475N004_full_img2.fits.gz

can also be found in the "nasa-heasarc" bucket at

s3://nasa-heasarc/chandra/data/byobsid/5/4475/primary/acisf04475N004_full_img2.fits.gz
or
https://nasa-heasarc.s3.amazonaws.com/chandra/data/byobsid/5/4475/primary/acisf04475N004_full_img2.fits.gz
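The HTTPS form of the address can be fetched with any HTTP client. As a minimal sketch using only the Python standard library, the hypothetical helper download below streams a single file to disk; its name and arguments are assumptions made for this example.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest_dir="."):
    """Stream the file at `url` into `dest_dir` and return the local path."""
    # Name the local file after the last component of the URL.
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so that large files are never held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Calling download("https://nasa-heasarc.s3.amazonaws.com/chandra/data/byobsid/5/4475/primary/acisf04475N004_full_img2.fits.gz") would save the Chandra image above into the current directory.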

For LAMBDA data, URLs can be turned into URIs in the same way using the bucket name "nasa-lambda". Note that for WMAP there is one small change to the path, from "map" to "wmap", to make clear that it is the mission name. For example,

https://lambda.gsfc.nasa.gov/data/map/dr5/skymaps/9yr/smoothed/wmap_band_smth_iqumap_r9_9yr_K_v5.fits

can also be found at

s3://nasa-lambda/wmap/dr5/skymaps/9yr/smoothed/wmap_band_smth_iqumap_r9_9yr_K_v5.fits
or
https://nasa-lambda.s3.amazonaws.com/wmap/dr5/skymaps/9yr/smoothed/wmap_band_smth_iqumap_r9_9yr_K_v5.fits
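The substitution described above can be sketched as a small helper. This is only an illustration: the function name to_cloud_locations is hypothetical, and the two URL prefixes are inferred from the examples on this page, so other access URLs may need different handling.

```python
# On-premises URL prefixes, taken from the two examples above (an assumption).
HEASARC_PREFIX = "https://heasarc.gsfc.nasa.gov/FTP/"
LAMBDA_PREFIX = "https://lambda.gsfc.nasa.gov/data/"


def to_cloud_locations(url):
    """Map an on-premises HEASARC or LAMBDA URL to (s3_uri, https_url)."""
    if url.startswith(HEASARC_PREFIX):
        bucket, key = "nasa-heasarc", url[len(HEASARC_PREFIX):]
    elif url.startswith(LAMBDA_PREFIX):
        bucket, key = "nasa-lambda", url[len(LAMBDA_PREFIX):]
        # WMAP data live under "wmap/" in the bucket rather than "map/".
        if key.startswith("map/"):
            key = "wmap/" + key[len("map/"):]
    else:
        raise ValueError(f"not a recognized HEASARC or LAMBDA URL: {url}")
    return f"s3://{bucket}/{key}", f"https://{bucket}.s3.amazonaws.com/{key}"
```

Applied to the WMAP sky map URL above, this returns the s3://nasa-lambda/wmap/... URI and its https://nasa-lambda.s3.amazonaws.com/wmap/... equivalent.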

For bulk data access, e.g. to download a directory and its contents, you will need to use the AWS CLI. For example, to list the contents of a directory on AWS:

aws s3 ls s3://nasa-heasarc/swift/data/obs/ --no-sign-request

and to download a directory:

aws s3 cp --recursive --no-sign-request s3://nasa-heasarc/swift/data/obs/2025_03/00014197056/ my_local_directory/00014197056
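The same bulk download can be approximated from Python with boto3. The sketch below is not an official HEASARC tool: the function names local_target and download_prefix are hypothetical, and it assumes boto3 is installed. Unsigned requests play the role of the --no-sign-request flag.

```python
from pathlib import Path


def local_target(key, prefix, dest_dir):
    """Local path for an S3 key, preserving the directory layout below `prefix`."""
    return Path(dest_dir) / key[len(prefix):].lstrip("/")


def download_prefix(bucket, prefix, dest_dir):
    """Download every object under `prefix` without AWS credentials."""
    # Imported here so that local_target() stays usable without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # An unsigned client is the boto3 counterpart of --no-sign-request.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    downloaded = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "directory" placeholder objects
            target = local_target(key, prefix, dest_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded
```

Calling download_prefix("nasa-heasarc", "swift/data/obs/2025_03/00014197056/", "my_local_directory/00014197056") would mirror the aws s3 cp command above.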

Thanks to Amazon's Open Data project, these data are free to access from anywhere, not subject to cloud data egress costs. As described on HEASARC's data policy web page, these data are available freely for your use.

Datasets

Data are synchronized on a weekly basis. Please let us know if you would benefit from a higher cadence for a particular dataset. Currently, we also trigger a sync of the fermi/data/gbm/triggers/ directory as soon as the data come in, for close to real-time access. The datasets currently available include:

  • High-energy astrophysics datasets
    • Ariel5
    • ASCA
    • BBXRT
    • BeppoSAX
    • CALDB
    • Chandra
    • Compton
    • Copernicus
    • COS-B
    • DXS
    • EXOSAT
    • Fermi (lat/weekly/{photon,spacecraft,1s_spacecraft,extended,diffuse} and gbm/{triggers,bursts}/)
    • Ginga
    • HaloSat
    • HEAO-1
    • Hitomi
    • IXPE
    • NICER
    • NuSTAR
    • OSO-8
    • ROSAT
    • SAS-2
    • SRG-eROSITA
    • Suzaku
    • Swift
    • VELA 5B
    • WASS
    • XQC
    • Rossi XTE
    • XMM-Newton
  • CMB datasets
    • WMAP
    • COBE

Please also see the HEASARC and LAMBDA entries in the AWS Open Data Registry.

Caveats

We have selected which datasets to host in the cloud, leaving out data that we do not believe will be useful to access this way, such as older mission data in non-standard file formats. We also keep the nasa-heasarc bucket in sync with the on-premises archive on a best-effort basis for ongoing missions. The most recent data products may therefore be available only from the HEASARC on-premises archive for a few days until the next sync.