HEASARC@SciServer User Guide

Current build "HEASARCv6.31.1": HEASoft v6.31.1, Ciao 4.15, XMM SAS v20, and Fermitools 2.2.0.

Here's an eleven minute video tutorial

Getting started
File systems
Files, groups and sharing
Moving files in and out
Software
Test an example notebook
Data discovery with HEASARC tools
Batch jobs
Notes

If you encounter any issues running code or accessing the data, please contact the HEASARC help desk. If your issue is not related to the HEASARC setup specifically, then you can email the SciServer helpdesk. Click on your username at the top right of your dashboard and select Help from the dropdown menu.

I. Getting started

Create your own account on https://www.sciserver.org/.

Then go to "SciServer -> Compute" (note: not "Compute Jobs", which is the batch service). It will list your containers, and if this is the first time you've looked, there won't be any.

Note the grid symbol on the top menu bar that allows you to get around. Compute is the second item down. The first is the Home dashboard, where you'll find your files and groups, which we'll get to later.

Click "Create container". You'll be presented with some options. First, enter a name for your container. (This can be done repeatedly, so don't worry too much.) Then ignore the Domain. Third is the Compute Image your container will be based on. Select HEASARCv6.30.1 (or other versions) from the drop-down menu.

Then further down, your User Volumes are shown (more later). Below that are the Data Volumes where you should select the HEASARC data volume. This will make our data and software areas accessible from within your container.

When you've created it, you'll see its status in the list.

If the container isn't already running, there's a green arrow to start it. The red x lets you delete it. If the green arrow is instead a red square, that means it's already running, and you can stop it if you wish. Note that if you close your browser without stopping the container, it'll still be running in the same state in which you left it.

If you click on the name of the container, a new tab will open with a Jupyter Lab interface. There are four conda environments, one for each of the provided packages. The default environment is (heasoft), which is configured to run HEASoft and heasoftpy.

From here, you can either start a Jupyter Notebook, a Console, or just run a Terminal. You can also navigate to one of our Jupyter notebooks (more below) or upload your own and run it here.

II. File systems

There are a couple of things to know about the file systems available to you within the container. By default, you'll have areas called "temporary" and "persistent", which are what they sound like. The persistent area keeps its contents if you stop the container and go back to it later, and it is visible from other containers you define. It has a limit of 10GB total.

These will be under your HOME directory:

/home/idies/

which contains

$HOME/workspace/Storage/<username>/persistent/
$HOME/workspace/Temporary/<username>/scratch/

If you choose to define your own user volume (see below) and mount it to this container, it will also appear here, as will any that are shared with you by other users under the corresponding username:

$HOME/workspace/Storage/<username>/<my_user_volume>/
$HOME/workspace/Storage/<anotherUser>/<their_user_volume>/

So Storage is where you should put your work if you want it to survive outside the containers. If you want to share files, create a user volume rather than stashing them in persistent, which nobody else can see. You should also back up your work yourself; see Miscellaneous below.

If you chose to mount the HEASARC data volume (when creating the container), it will be found at the same level as your Storage and Temporary areas, e.g., you'll see

$HOME/workspace/headata/FTP/

The FTP area will contain all of the HEASARC data holdings exactly as they are organized on our own FTP site. Our compute image also puts a link to this area at /FTP (though this link will be broken if you forget to mount the volume).
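As a quick sanity check of the layout described above, the following minimal Python sketch reports which of the expected areas are present in a container. The function name `describe_workspace` and the placeholder username are our own illustrative choices, not part of the HEASARC setup; the paths are the ones listed in this section.

```python
from pathlib import Path


def describe_workspace(username, home="/home/idies"):
    """Return a dict mapping each expected area to True if it exists."""
    workspace = Path(home) / "workspace"
    areas = {
        "persistent": workspace / "Storage" / username / "persistent",
        "scratch": workspace / "Temporary" / username / "scratch",
        # Only present if the HEASARC data volume was mounted.
        "heasarc_data": workspace / "headata" / "FTP",
    }
    return {name: path.is_dir() for name, path in areas.items()}


if __name__ == "__main__":
    # "myname" is a placeholder; substitute your SciServer username.
    for name, present in describe_workspace("myname").items():
        print(f"{name:>12}: {'found' if present else 'MISSING'}")
```

If `heasarc_data` is reported as missing, the most likely cause is that the HEASARC data volume was not selected when the container was created.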

The default software installation is not on that volume but built within the compute container under

/opt/heasoft

The image is non-trivial to update but will be kept up to date with major software releases. The data volume, however, also has a software area that is ours to update as needed with development software builds and extra stuff as required.

Please note that the word 'volume' is used in two different contexts: your "user volume" that you create and share yourself and the system's volumes such as the HEASARC's data volume, which you can also choose to mount. One may prefer to name the "user volumes" with the word "folder" or something to avoid confusion, because they are mounted in different places within the container.

III. Files, groups and sharing

In the Home dashboard, you'll find your files and groups.

Under Files, you can create your own "User volumes" and see what "Data volumes" are accessible (you cannot create your own). You can also browse files through this interface.

Under Groups, you can create groups. The HEASARC software user group is one example. For each group, it shows you the "Shared Files" (e.g., "user volumes"), the "Shared Data Volumes" (data volumes), and "Shared Compute Images". The HEASARC software user group has access to several different things. The Image itself is the instance of Linux plus all the required libraries and software builds ready to go. The headata volume is where the HEASARC's data holdings will be.

You can create your own groups, add your own shared files/volumes, and invite your collaborators to join your group. This way, you and your group can share data that are not available to anybody else.

IV. Moving files in and out

In the Jupyter Lab interface, the file navigator on the left side lets you download (right click) and upload (button on top) files.

You can also use the shell from within your Jupyter Lab session to 'scp' or 'git pull' from external sites that are publicly visible.

Thirdly, outside of JupyterLab, the SciServer web site has a Files tab where you can manage the contents of your user volumes. (Note that on occasion, within the container's JupyterLab session, I've found that I got a permission denied trying to remove a file that is listed as owned and writable by the "idies" user I'm running under in the container. In this case, I found I could delete it outside the container in this SciServer->Files tab. This may be because the user volume is mounted with a different set of permissions not visible within the container.)

V. Software

HEASoft

The image currently has HEASoft 6.30.1:

$ fversion
08Apr2022_V6.30.1
$ which fversion
/opt/heasoft/x86_64-pc-linux-gnu-libc2.27/bin/fversion

This is the HEASoft installation you get by default (see your .bashrc). This image requires SciServer admins to update. But you'll also see that you have access to

$HOME/workspace/headata/software/

which we can write to ourselves. If you need it, we can install a development version of HEASoft. (But it takes some time.) Then you'd change your HEADAS to point to that version of HEASoft. Or we can add other software you might need if you cannot install it yourself.

CALDB is currently set to the archive's calibration area, which will be kept up to date:

$ echo $CALDB
/home/idies/workspace/headata/FTP/caldb
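From inside a notebook, the same settings can be inspected in Python. This is a minimal sketch: `HEADAS` and `CALDB` are the variables mentioned above, `CONDA_DEFAULT_ENV` is the variable conda sets to the name of the active environment, and the function name `heasarc_environment` is our own.

```python
import os


def heasarc_environment(environ=None):
    """Return the values of the variables of interest (None if unset)."""
    environ = os.environ if environ is None else environ
    names = ("HEADAS", "CALDB", "CONDA_DEFAULT_ENV")
    return {name: environ.get(name) for name in names}


# Print whatever the current kernel sees.
for name, value in heasarc_environment().items():
    print(f"{name} = {value if value is not None else '(not set)'}")
```

If `CALDB` is not set, or points to a path that does not exist, check that the HEASARC data volume is mounted.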

Other mission software is installed in individual conda environments. You can then use conda to activate and de-activate different environments. When you first log in, you will be in the default environment, currently called "(heasoft)". Note that the conda environments are all writeable by the idies user and located in your home directory. This allows you to add software to your Ciao or Fermitools environments as needed. But note that it also allows you to mess up your builds of those packages by mistake, so if this happens, you will need to create a fresh new image.

Ciao

To use the Ciao package, you activate the environment and then run the Ciao initialization script:

$ conda activate ciao

You can now run Ciao tools. To run HEASoft tools, you have to deactivate this environment again.

Fermitools

As for Ciao, to initialize the Fermitools environment:

$ conda activate fermi

You can now run Fermi tools. To run HEASoft tools, you have to deactivate this environment again.

XMM SAS

XMM SAS also now has its own conda environment, which can be activated with:

$ conda activate xmmsas

Calling SAS tasks from within this conda environment is not strictly needed, but it is the recommended way of using those tasks. You can also initialize XMM SAS without using conda by running:

$ sasinit

Note that this may sometimes cause conflicts between different packages, though, and it will work better if you use each software package within its own conda environment.

Additional software

The software environment is a work in progress. You may need additional libraries. E.g., for additional machine-learning packages, you can:

$ pip install scikit-learn
$ pip install umap-learn

Likewise you can "conda install" into one of the environments.

Alternatively, you can upload your own software and install it. If it uses the distutils, you can run:

$ cd /path/to/your/uploaded/code
$ python setup.py install

and it will be installed in your environment. Note that such installs will persist in this container if you stop and restart it, but they will not be there if you create a new container. (Closing your browser and/or logging out of SciServer does not stop your container. When you log back in, your Jupyter Lab session will be exactly as you left it. The container stops when you use your dashboard to stop it as above or if it has been inactive for too long.) Feel free to request additions to the standard environment that may be generally useful.

If the code doesn't have an install script, then you have to do a bit more. On your desktop, you could add the code location to your PYTHONPATH, but on SciServer, it's difficult to change the environment that the JupyterLab session uses. More straightforwardly, you can add your code directory to the path inside each notebook using, e.g.,

import sys
sys.path.insert(0, '/path/to/your/uploaded/code')

in every notebook that depends on that code. Note that if you upload the code into your persistent storage area, it will be available there for all new or restarted containers.
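If a notebook cell is re-run many times, the same directory ends up on the path repeatedly. A small helper, sketched here under our own name `add_code_dir`, avoids that by adding the directory only once.

```python
import sys
from pathlib import Path


def add_code_dir(path):
    """Put path at the front of sys.path once; return the resolved string."""
    resolved = str(Path(path).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Placeholder location; replace with wherever you uploaded your code.
add_code_dir("/path/to/your/uploaded/code")
```

Calling it a second time with the same directory leaves `sys.path` unchanged.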

VI. Test an example notebook

We have several example notebooks in the HEASARC data volume, so if you mount this to your container, then you will find them at:

$ ls ~/workspace/headata/software/cookbooks/sciserver_heasarc
data_access.ipynb demo_rxte_ml.ipynb rxte_example_lightcurves.ipynb rxte_example_spectral.ipynb ...

Make yourself a workspace in your persistent area:

$ mkdir ~/workspace/Storage/<username>/persistent/testing1
$ cd ~/workspace/Storage/<username>/persistent/testing1

Get a copy of one of our notebooks to try out:

$ cp ~/workspace/headata/software/cookbooks/sciserver_heasarc/rxte_example_lightcurves.ipynb .

(You can also do this with the navigation sidebar, where you can right click and copy and paste files.)

Then in the sidebar on the left, navigate to that directory and double-click on the name of the notebook to open it.

Our cookbooks directory also contains other useful notebooks such as the NAVO workshop notebooks describing how to browse and fetch data with VO interfaces, and the HEASARC PyXspec notebooks.

Note: for notebooks that are not developed explicitly for this platform, the default Python kernel will not be set correctly. To change it, click on the current kernel, e.g., "Python 3", at the upper right and switch it to "(heasoft)". Otherwise, the necessary libraries may not be available. Starting with version 6.30.1, the image uses python version 3.8.
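One way to tell whether a notebook is running under a suitable kernel is to check that the libraries it needs can be found. The sketch below is only an illustration: the helper name `missing_modules` is ours, and the module list (`heasoftpy`, `astropy`) is an example you should adapt to your notebook.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python executable:", sys.executable)
print("Missing modules:", missing_modules(["heasoftpy", "astropy"]))
```

If the list of missing modules is not empty, switching the kernel as described above is the first thing to try.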

We hope to expand this into a large set of executable analysis threads written by us and our instrument teams. We also encourage our users who have useful notebooks to share them with the community by sending them to us to place in the contrib area.

VII. Data discovery with HEASARC tools

The usual ways of discovering HEASARC data with our Browse and Xamin tools have not yet been integrated seamlessly into SciServer. For Xamin, when you have a list of files or observations you are interested in, there is now an option to generate a simple list starting with the path /FTP that you can then copy-paste into SciServer as needed. Those paths will work on SciServer within your container when you use the HEASARC image and mount the HEASARC data volume. For W3Browse, you can get a download script and then edit that script into a simple file list starting /FTP. (Option pending.)
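Editing a download script into a file list can be done with a few lines of Python. This is a sketch under the assumption that each relevant line of the script contains a URL with `/FTP/` in it; the example lines are hypothetical and the function name `ftp_paths` is ours.

```python
def ftp_paths(lines):
    """Return the '/FTP/...' part of every token that contains one."""
    paths = []
    for line in lines:
        for token in line.split():
            start = token.find("/FTP/")
            if start != -1:
                # Drop any quotes that surrounded the URL in the script.
                paths.append(token[start:].strip("'\""))
    return paths


# Hypothetical script lines, for illustration only.
script = [
    "#!/bin/sh",
    "wget -q https://heasarc.gsfc.nasa.gov/FTP/rxte/data/archive/AO1/",
]
for path in ftp_paths(script):
    print(path)
```

The resulting paths are usable inside the container only when the HEASARC data volume is mounted.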

Alternatively, we invite you to explore the Python possibilities. There's a data access notebook in the cookbooks directory with a few examples, one of which is using PyVO. This is a powerful new way to explore not just HEASARC data but data from any VO-compliant archive. See NAVO's collection of notebook tutorials for generic use cases. In the RXTE notebook mentioned above is an example of how to get a list of observations and construct a file list with knowledge of the RXTE archive structure. We hope to provide some Python wrappers for making this more straightforward.
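To give a flavor of the PyVO approach, here is a minimal sketch of a cone search through a TAP service. Several things in it are assumptions rather than facts from this guide: the service URL, the table name `xtemaster`, and the column names `ra` and `dec`. The data access notebook remains the reference for what actually works on this platform.

```python
def cone_query(table, ra_deg, dec_deg, radius_deg, top=10):
    """Build an ADQL cone-search query (assumes columns named ra and dec)."""
    return (
        f"SELECT TOP {top} * FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


def run_query(adql, url="https://heasarc.gsfc.nasa.gov/xamin/vo/tap"):
    """Send the query to a TAP service; the default URL is an assumption."""
    # Imported here so that cone_query works even where pyvo is not installed.
    from pyvo.dal import TAPService

    return TAPService(url).search(adql).to_table()


# Build (but do not send) a query around a position given in degrees.
print(cone_query("xtemaster", 83.63, 22.01, 0.1, top=5))
```

Calling `run_query` on the printed string should return an Astropy table if the assumed service, table, and columns are correct.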

VIII. Batch

You can submit a notebook that you have tested interactively to the batch for processing. The batch service is called Compute Jobs (while Compute is interactive) under the main menus. There is one thing to be aware of, though: we are using a non-standard Python kernel. Set up your Jupyter notebook to use the correct "(heasoft)" kernel before submitting it to the batch.

Submitting to the batch starts with a process a bit like setting up a new container. You have to select the compute image and the volumes that you want to mount.

IX. Notes

Help

If you encounter any issues running code or accessing the data, please contact the HEASARC help desk. If your issue is not related to the HEASARC setup specifically, then you can email the SciServer helpdesk. Click on your username at the top right of your dashboard and select Help from the dropdown menu.

If you could use some additional software that you cannot install yourself, you can also ask at the HEASARC help desk if we can install it for you.

Limits

A summary of users' limitations on SciServer:

Disk space (persistent): 10 GB total, including your 'persistent' area and all of your user volumes (not including user volumes shared with you).
Disk space (temporary): unlimited for 72 hours from file creation; not private.
Number of containers: you may only have three containers defined at a time (whether running or not).
Lifetime of containers: 90 days beyond last access; i.e., if unused for 90 days, they may be removed.
Batch (asynchronous) time: HEASARC users are currently limited to the "Small Jobs Domain" of 1 hour.
Batch (asynchronous) memory: HEASARC users are currently limited to the "Small Jobs Domain" of 32 GB of RAM.
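To see how close you are to the persistent disk limit, you can total the file sizes under your Storage area from Python. This is a rough sketch: it assumes that the 10 GB limit means 10 x 1000^3 bytes, which may not be exactly how SciServer counts, and the function names are ours.

```python
import os

LIMIT_BYTES = 10 * 1000**3  # assumed meaning of "10 GB"


def directory_size(root):
    """Sum the sizes in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # do not count link targets
                total += os.path.getsize(path)
    return total


def fraction_used(root, limit_bytes=LIMIT_BYTES):
    """Return the fraction of the limit taken up by files below root."""
    return directory_size(root) / limit_bytes
```

Since the limit covers your persistent area and all of your own user volumes, point it at `~/workspace/Storage/<username>` (expanded with `os.path.expanduser`) rather than at `persistent` alone.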

User Contributions

One of the benefits of SciServer is how easy it makes it to share data, code, results, etc. among collaborators. But you can also contribute them to the community of HEASARC@SciServer users. If you have things that you think would be generally useful, place them in a user volume that you can then share with us (e.g., share with user tjaffe), and we'll take a look at whether it would be appropriate to include it on the HEASARC volume for all to use. We also have a notebooks repository on GitHub to which you can contribute through submitting issues to request changes, pull requests to contribute, etc. Likewise for the PyXspec Jupyter notebooks repo.

Chandra analysis

Note that one cannot use some Ciao tools with the archive itself as the input path, since they expect to be able to write to the data directories. For example,

$ chandra_repro /FTP/chandra/data/byobsid/5/9805/ test/chandra_out

will result in an error about the read-only file system. You will have to copy the input data directory to your own workspace (temporary or persistent as appropriate; see above).
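Copying an observation into your workspace can be done from a notebook as well as from the shell. The sketch below uses a helper name of our own, `stage_observation`. It also adds write permission for the owner on the copies, on the assumption that files copied from the archive may otherwise keep read-only permission bits.

```python
import shutil
import stat
from pathlib import Path


def stage_observation(archive_dir, work_root):
    """Copy an archive directory into work_root and make the copy writable."""
    src = Path(archive_dir)
    dest = Path(work_root) / src.name
    if dest.exists():
        return dest  # already staged; leave it alone
    shutil.copytree(src, dest)
    for path in [dest, *dest.rglob("*")]:
        if not path.is_symlink():
            # Add owner write permission to every copied file and directory.
            path.chmod(path.stat().st_mode | stat.S_IWUSR)
    return dest
```

For the example above, `stage_observation("/FTP/chandra/data/byobsid/5/9805/", work_root)` would create `work_root/9805`, which can then be given to the Ciao tool as its input directory.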

XMM analysis

As with Chandra analysis, you must copy the input data from the main archive into your own workspace, as SAS expects to write to those directories. This is under development, so please send us any issues.

Mission-Specific Guides

NICER

Miscellaneous

In our experience, if you leave the browser window open a long time, SciServer stays connected, though it starts to pop up little windows with apparently minor access errors. But if you start it on VPN and the VPN connection closes, then you lose your access to it. (Note that your running containers are still running and your running notebooks are in the same state you left them; you just have to reconnect to the Jupyter session.)

Back up your work frequently, e.g., daily! You could crash your image, so anything in temporary is gone. Your persistent area and defined user volumes should be backed up, but don't rely on that alone. I tend to work in a Jupyter notebook, which you can easily download at the end of every day with a couple of clicks. If there are a lot of products, maybe make a tar file each day and download that. Keep your code in a repo on GitHub and commit and push changes regularly. We welcome suggestions for improving this given that it's not possible to rsync *into* SciServer, and you may not be able to connect into an institute behind a firewall (unless you set up ssh tunnels...)
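The daily tar file can be produced from a notebook with the standard library. This is a sketch with names of our own choosing; write the archive somewhere outside the directory being backed up so that it does not include itself.

```python
import tarfile
from datetime import date
from pathlib import Path


def make_backup(source_dir, out_dir):
    """Write <name>_<YYYYMMDD>.tar.gz of source_dir into out_dir."""
    source = Path(source_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archive = out / f"{source.name}_{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(str(source), arcname=source.name)
    return archive
```

The returned file can then be downloaded with a right click in the JupyterLab file navigator.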

It's convenient to make your own .bashrc and store it in your persistent storage area. Then when you create a new container and have the default setup, you can source your personal one as soon as you open your terminal to get your preferred setup.

If you wish to change your default shell in a given container to tcsh, for example, you must add the following line

c.NotebookApp.terminado_settings = {'shell_command':['/bin/tcsh']}

to the ~/.jupyter/jupyter_notebook_config.py file. You then need to re-start your container, but once done, this setting persists in that container. (New containers will not have this, though.)

Known Issues

Applications requiring X11 cannot be used through the console of JupyterLab. This means ds9, fv, and some functions of ximage cannot be used on SciServer. This is a feature not a bug, i.e., it will not change in the foreseeable future. The Astropy FITS module and Numpy can manipulate FITS headers and data as fv does, and there are a number of image plotting packages. We are working on a notebook describing how to use native Python applications like aplpy for ds9 tasks.

The tool ximage, however, can be used without a display and is part of some analysis scripts such as nuproducts.

Some tools have trouble with input strings for filenames that include a long path. (You'll see a buffer overflow and/or segmentation fault error.) If this happens, you'll have to make shorter paths using soft links.
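Such a soft link can be made with `ln -s` in the terminal or, as sketched here, from Python. The helper name `short_link` is ours, and the link location is up to you; a short name in your working directory is usually enough.

```python
import os
from pathlib import Path


def short_link(long_path, link_name):
    """Create (or refresh) a symbolic link link_name pointing at long_path."""
    link = Path(link_name)
    target = Path(long_path).resolve()
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier run
    link.symlink_to(target)
    return link
```

You would then pass the short link, rather than the full archive path, to the tool that has trouble with long file names.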

If you see this:

then close all of your SciServer windows and start from the Dashboard again. If that doesn't work, stop and restart your container.

If you see this:

when you are trying to start a container, and if trying again repeatedly does not help, then try deleting the container and re-creating it.

National Aeronautics and Space Administration

Goddard Space Flight Center

Sciences and Exploration