NOTICE:

This Legacy journal article was published in Volume 3, May 1993, and has not been updated since publication. Please use the search facility above to find regularly-updated information about this topic elsewhere on the HEASARC site.

The HEASARC's Newly Consolidated
Anonymous FTP Account

Steve Drake and Bruce O'Neel (HEASARC)

The HEASARC has a new DecSystem 5000 computer that will act as its main data server. Its Internet node name is heasarc.gsfc.nasa.gov, corresponding to the (present) IP address 128.183.8.233; since the IP address may change in the future, it is better to use the node name whenever possible. This machine, referred to as legacy throughout this article, is destined to become the HEASARC's main interface with the user community. The HEASARC On-line Service is currently being ported to this ultrix machine. In July 1993, legacy will replace NDADSA as the gateway to the HEASARC data archive; from that point on, NDADS will be used only as a data archive facility. We have already established an anonymous ftp account on legacy, and this article discusses how to use this facility, the data that are presently available, and the data that will be available in the near future. Note that all anonymous ftp accounts intended for public access on other OGIP machines, such as rosserv, will be phased out over the course of the next year, and their resident data files moved over to legacy. In the interim there will be some duplication, in that data may be available on both legacy and other computers.

Note that legacy may eventually also become a DECNET node, so that the files in its anonymous ftp account can be copied by users on other DECNET nodes in the standard (DECNET) way.

What is an anonymous ftp account?

The File Transfer Protocol (ftp) is a utility allowing the rapid transfer of information in the form of files between a source computer (in our case, legacy) and the user's home computer. An anonymous ftp account is one that anyone can access by logging in as anonymous (or as ftp). (The user should remember that unix and ultrix machines are case-sensitive.) The user is then prompted to give his or her e-mail address (e.g., jones@node.dept.inst.edu) as a `password', and is then admitted to the account. In general, both the source and the user's computers have to be on the Internet. Further information on ftp can be found in the useful summary of public software given by E. D. Feigelson and F. Murtagh in PASP, vol. 104, p. 574 (1992).

The legacy anonymous ftp account

The particular type of ftp server installed on legacy is a friendly or verbose one that automatically types out welcome headers when you initially log on and when you cd into a sub-directory for the first time. If you experience difficulties when you access the anonymous ftp account on legacy (e.g., your session gets frozen), it might be because the ftp program on your own computer has trouble communicating with such a verbose ftp server. If you suspect this is happening, you can suppress all the automatic welcome screens simply by placing a minus sign (-) in front of your password when you log in. Another useful feature of the legacy ftp server is that it automatically logs all commands, so that statistics on the usage of our system and the relative demand for the various data and software products can be readily compiled.

To access the legacy anonymous ftp account from your computer (which, of course, has to have its own ftp program), you type:

> ftp heasarc.gsfc.nasa.gov

You will get the ftp prompt. Follow the log-in instructions, giving anonymous as the username and your e-mail address as the password. You will then be greeted by a `Welcome' screen, which scrolls out automatically. It presents some basic information on the account and may include additional comments on changes or updates to its status or to the status of some of the archival datasets. You are now free to explore the contents of the ftp account using unix-like commands such as

> cd sub_dir

to change directory to the sub-directory called sub_dir, or

> cd ..

to move back up one level in the directory structure, or

> pwd

to find out what the present (working) directory is. The entire suite of ftp commands is listed using ?. To find out more about any individual command, e.g., get, type help get. To find out the contents of the present directory, type ls or dir. The latter command is somewhat more informative than a straight ls command. The result of typing dir in the top level directory of the legacy anonymous ftp account is shown below.

heasarc.gsfc.NASA.GOV> dir
<Opening ASCII mode data connection for /bin/ls.
total 23
drwxrwxr-x  6 415      340           512 Mar 15 16:38 .caldb
-rw-r--r--  1 root     345            17 Jan 28 15:42 .login
-rw-r--r--  1 3T       345          1324 Mar 11 09:25 .message
-rw-r--r--  1 377      345          1906 Mar 11 09:28 README
drwxrwxr-x  2 root     345           512 Feb  9 11:41 ariel5
drwxrwxr-x  3 root     305           512 Mar 17 10:00 asca
drwxrwxr-x 12 227      335           512 Feb 25 15:40 bbxrt
drwxrwxr-x  2 ftp      system        512 Mar  1 15:23 bin
drwxrwxr-x  2 root     345           512 Feb  9 09:51 compton
drwxrwxr-x  2 root     345           512 Feb  9 11:42 cosb
drwxrwxr-x  2 ftp      system        512 Oct  7 11:31 dev
drwxrwxr-x  2 root     345           512 Feb  8 16:32 documents
drwxrwxr-x  5 root     310           512 Feb 26 09:58 einstein
drwxrwxr-x  2 ftp      system        512 Oct  7 11:23 etc
drwxrwxr-x  2 root     345           512 Feb  9 11:42 exosat
drwxrwxr-x  3 root     360           512 Feb 13 08:54 ginga
drwxrwxr-x  2 root     345           512 Feb  8 16:32 retrieve
drwxrwxr-x 10 root     rosat         512 Feb 16 16:50 rosat
drwxrwxr-x  2 ftp      system        512 Oct  7 11:30 shlib
drwxrwxr-x  4 ftp      system        512 Feb 13 08:10 software
drwxrwxr-x  2 root     345           512 Feb  9 11:42 vela5b
<Transfer complete.

The column on the far right gives the file or sub-directory name. If the first character on a line is a "d", then the entry is a sub-directory. If not, it is a file. Another useful datum for any entry is the size in bytes: this is given in column 5 of the "dir" listing (just before the date and time that the entry was last modified).
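The reading of a "dir" line described above can be sketched in a few lines of Python. This is only an illustration: the function name parse_dir_line and the Entry record are our own inventions, and the sketch assumes every entry has the nine whitespace-separated fields seen in the listing above.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """The three items of a listing line that the text singles out."""
    is_dir: bool
    size: int
    name: str


def parse_dir_line(line: str) -> Entry:
    """Split one line of a unix-style long listing into its useful parts."""
    # Assumed layout: mode, links, owner, group, size, month, day, time, name.
    # maxsplit=8 keeps the name (the far-right column) in one piece.
    fields = line.split(None, 8)
    if len(fields) < 9:
        raise ValueError(f"not a listing entry: {line!r}")
    return Entry(is_dir=fields[0].startswith("d"),  # leading "d" = sub-directory
                 size=int(fields[4]),               # column 5 = size in bytes
                 name=fields[8].strip())


sample = "drwxrwxr-x 10 root     rosat         512 Feb 16 16:50 rosat"
print(parse_dir_line(sample))
```

Lines such as "total 23" do not have nine fields and are rejected with a ValueError, so a caller would normally skip them.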

As can be seen from this particular example, most of the entries in the top directory are actually sub-directories. Two of the entries that are actual files are .message (which contains the welcome message that appears after log-in) and README (which is essentially an expanded description). If the user is in some doubt as to what he or she wants, or what the contents of a given directory or directory tree are, the user should type

> get README

which will copy the README file back to the user's own computer, where it can be typed out and/or printed out. There should be a README and a .message file in most of the top-level directories and subdirectories. The only exceptions are directories used by ftp for its own purposes, such as bin, dev, etc, and shlib. For the bottom-level subdirectories like "rosat/data/pspc/images/fits", the README and .message files in the parent directory "rosat/data/pspc/images" tell the user what the contents and formats of the files in the "/fits" and "/ps" subdirectories are.
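For users who prefer to script the session, the log-in and "get README" steps might look like the following Python sketch, which uses the standard ftplib module. The helper names (anonymous_password, fetch_readme, main) are hypothetical, the e-mail address is the placeholder used earlier in this article, and whether the server still accepts connections at this node name is not something the sketch can promise.

```python
from ftplib import FTP

HOST = "heasarc.gsfc.nasa.gov"  # node name quoted in this article


def anonymous_password(email, quiet=False):
    """Return the e-mail 'password'; a leading '-' asks the server to
    suppress its welcome screens, as described in the text."""
    return "-" + email if quiet else email


def fetch_readme(ftp, email, sub_dir=None, quiet=False):
    """Log in as anonymous, optionally cd, and return (directory, README lines).

    `ftp` is an ftplib.FTP instance (or any object with the same methods).
    """
    ftp.login("anonymous", anonymous_password(email, quiet))
    if sub_dir:
        ftp.cwd(sub_dir)                        # same as "cd sub_dir"
    lines = []
    ftp.retrlines("RETR README", lines.append)  # ASCII-mode "get README"
    return ftp.pwd(), lines


def main():
    """Not called automatically, because it needs network access."""
    with FTP(HOST) as ftp:
        where, text = fetch_readme(ftp, "jones@node.dept.inst.edu", quiet=True)
        print(where)
        print("\n".join(text))
```

Calling main() by hand runs the whole session. Passing the connection in as an argument keeps fetch_readme easy to try out against a stand-in object before touching the real server.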

The basic structure of the legacy anonymous ftp account is shown in Figure 1. The name of each top-level directory describes its function.

Figure 1.  legacy anonymous ftp basic structure.
                               top-level
     ______________________________|______________________________
     |            |           |           |          |           |
 software     documents   retrieve    `mission'   .caldb      `other'

`other': where `other' stands for one of several system sub-directories (bin, dev, etc, shlib) that will normally be of no interest to the general user.

software: for general software packages such as XSPEC;

documents: for general documents like Users Guides, PROS cookbook, etc.;

retrieve: where people who have used the BROWSE facility on legacy will go to find the data products that they have previously extracted, and then ftp them back to their own machine;

`mission': where `mission' stands for one of the following missions for which we will have data available: rosat, einstein, exosat, bbxrt, compton, asca, ariel5, cosb, ginga, and vela5b;

.caldb: the calibration database. For practical reasons, we have set up a distinct directory tree for calibration data. The easiest way for a user to access calibration data is in the tree for that specific mission, but some users are interested in obtaining specific calibration data and might prefer to go straight to the .caldb directory. (Note: either way the user is accessing the same physical calibration files).

Each `mission' directory has what is hopefully a well-defined, well-explained (in the README files), logical structure. Since we are creating the legacy ftp account both out of new databases and those presently resident in other ftp accounts, and also since some projects have their own individual quirks and peccadillos, it is difficult to enforce a completely uniform `standard' format on all the missions. Figure 2 is an example of a mission directory tree.

Figure 2. ROSAT example of mission directory tree

                                     rosat
     _________________________________|__________________________________
     |       |       |          |         |       |         |           |
calib_data  doc   problems   nra_info   data  timelines  software  publications

For each subdirectory for which it is relevant, the next level down is split into instrument-specific subdirectories. This is only of relevance for multiple-instrument missions like Einstein and ROSAT. Thus, for the data subdirectory, the following (or logically equivalent) hierarchical structure is normally followed: mission level, instrument level, data-type level, and data-format level.

Figure 3. ROSAT example of subdirectory hierarchy

                                     data
                            ___________|___________
                            |                     |
 instrument               pspc                   hri
                   _________|_________   _________|_________
                   |     |     |     |   |     |     |     |
 data-type      images                images
                ___|___               ___|___
                |     |               |     |
 data-format   ps    fits            ps    fits

Again, we have attempted to use self-explanatory key-words for the various data-types such as "images", "spectra", "rates", and "events", but there may be some deviations from these standards.
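The four-level hierarchy (mission, instrument, data-type, data-format) can be turned into a directory name mechanically. The sketch below is only illustrative: the function data_path is our own, and, as noted above, individual missions may deviate from the pattern.

```python
# Data-type key-words quoted in the text; other values may occur.
DATA_TYPES = ("images", "spectra", "rates", "events")


def data_path(mission, instrument, data_type, data_format=None):
    """Build a path of the form mission/data/instrument/data-type[/data-format]."""
    parts = [mission, "data", instrument, data_type]
    if data_format:                 # the lowest level is not always present
        parts.append(data_format)
    return "/".join(parts)


print(data_path("rosat", "pspc", "images", "fits"))
print(data_path("einstein", "sss", "spectra"))
```

The two calls produce rosat/data/pspc/images/fits and einstein/data/sss/spectra, both of which are directories referred to elsewhere in this article.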

Data formats

Much of the raw data and data products are made available in the form of FITS (Flexible Image Transport System) files. William Pence has discussed our implementation of this strategy in Legacy, 1, 14. As part of this activity, the HEASARC has been developing the FTOOLS software package that consists of a generic set of utilities with which FITS files can be manipulated: this software package is available in the legacy anonymous ftp account (check the sub-directory software/ftools). Note that when ftp-ing a FITS file, the user should first type "binary" so that the utility is configured to send a binary file.

Several data formats are available through this anonymous ftp account, as follows.

(i) Many of the text files (such as the README files) are in plain ASCII. These can be directly ftp-ed using the default ASCII mode.

(ii) Much of the data are available as PostScript (PS) plot files. The user needs to have a printer that supports PostScript for these to be of any use. These (plain ASCII) files can be copied over to the user's computer using the ftp get command, and then a hard copy can be made following the usual PostScript plot procedures. For example, in the area rosat/data/pspc/images/ps, the user can find PostScript files to create grey-scale images of ROSAT PSPC observations.

(iii) Much of the data are available as (unix-) COMPRESSED files. These are indicated by the suffix .Z at the end of the file name. In general, a user needs to have a computer with a unix operating system, in order to easily UNCOMPRESS .Z files, although many non-unix machines now have special utility programs that can translate .Z files. Whenever a COMPRESSed file is to be copied, remember to set the transmission mode to binary, by typing binary. Compressing a file helps to speed up the ftp-ing of large amounts of data, which is why it is such a nice feature. The legacy ftp software can also compress and decompress files "on the fly". For instance, suppose there is a long ASCII file named "Bob" that you want to get, then if you type:

> get Bob.Z

ftp will first COMPRESS Bob, and then send it to your computer. Conversely, if you cannot UNCOMPRESS .Z files on your home computer, and you want to copy a file in a legacy directory called Bill.Z, then simply type:

> get Bill

and ftp will first UNCOMPRESS Bill.Z, and then send you the resultant file.

(iv) Some of the data files are available as "tar" files. These are files containing a group of separate sub-files that have been collected together using the unix tar utility. Again, for these to be of use, the user has to have the facility to handle "tar" files on his or her home computer.

(v) Some of the files (generally documentation) are in TeX or LaTeX format. Since these are simply ASCII files with embedded TeX commands, they can always be printed out (after being ftp-ed to the user's home computer) as plain text files.

The above formats are not, of course, exclusive. For example, in the directory "bbxrt/tar/by_observation", the user will find files such as "bbxrt.n4151o.tar.Z" that are compressed (.Z) tar (.tar) files. Perusal of the README file for bbxrt reveals that each such ".tar.Z" file contains a set of individual files that are all themselves in FITS format.
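The rules of thumb in this section (which transfer mode to set, what to ask for to get on-the-fly conversion, and how to unpack a combined file) can be summarized in a short Python sketch. The function names are hypothetical, the suffix list is an assumption based on the file names quoted in this article, and treating ".tar" files as binary is our own addition rather than something stated above.

```python
# Suffixes assumed to need "binary" mode; everything else is sent as ASCII.
BINARY_SUFFIXES = (".Z", ".fits", ".tar")


def transfer_mode(filename):
    """Return 'binary' or 'ascii' for a file about to be copied with get."""
    return "binary" if filename.endswith(BINARY_SUFFIXES) else "ascii"


def on_the_fly_name(stored_name):
    """Name to request so that the server converts the file while sending it."""
    if stored_name.endswith(".Z"):
        return stored_name[:-2]   # ask for Bill to receive Bill.Z uncompressed
    return stored_name + ".Z"     # ask for Bob.Z to receive Bob compressed


def unpack_steps(filename):
    """List, outermost first, the utilities needed to unpack a file."""
    steps = []
    name = filename
    if name.endswith(".Z"):
        steps.append("uncompress")
        name = name[:-2]
    if name.endswith(".tar"):
        steps.append("tar")
    return steps


print(transfer_mode("bbxrt.n4151o.tar.Z"), unpack_steps("bbxrt.n4151o.tar.Z"))
```

For the BBXRT example, the sketch reports that the file should be copied in binary mode and then passed through uncompress and tar, in that order, before the FITS files inside it can be read.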

What data do we presently have in the legacy anonymous ftp account?

Since this is a sensitive function of time, no attempt will be made to give a comprehensive list of the datasets in the anonymous ftp account. In a future issue of Legacy, when the format becomes more stable, a more complete description of the contents of the account will be given. The best way to find out what is in the anonymous ftp account now is to ftp to it and check it out for yourself. Some of the highlights are:

(i) The Broad-Band X-Ray Telescope (BBXRT) Archive: See the articles by A. P. Smale in Legacy, 2, 17 and in this issue of Legacy for further details.

(ii) The ROSAT Archive: See the article by M. F. Corcoran, M. Duesterhaus, and K.L. Rhode in Legacy, 2, 9 for further details.

(iii) The Ariel-5 and Vela 5B All-Sky Monitor Databases: See the article by L. Whitlock, J. Lochner, and K. L. Rhode in Legacy, 2, 25 for further details.

(iv) The Ginga Large Area Counter (LAC) summary data files: See the article by B. Perry in Legacy, 1, 30 for further details.

(v) The Compton Observatory Archive: See the article by T. McGlynn et al. in Legacy, 2, 4 for further details.

(vi) Einstein data products from its various instruments (IPC, HRI, SSS, FPCS, MPC, and IPC-Slew) have been created from all the SAO and HEASARC Einstein CD-ROMs.

(vii) The COS-B Archive: This contains FITS files of the 65 pointed observations and associated calibration data for the gamma-ray observatory COS-B, and will be described in more detail in an article by P. Barrett in the next issue of Legacy.

How to identify which data you really want

This is often the hardest part of using an anonymous ftp account, and essentially involves knowing which file or files contain the particular observation in which the user is interested. The easiest way to discuss this is by use of examples, but the reader should be aware that these examples are mission-specific.

A later issue of Legacy will discuss how to use the anonymous ftp account in conjunction with the BROWSE software that is scheduled to be installed on the legacy machine in mid-1993, and to become publicly available shortly thereafter. For the present article, two alternate methods will be discussed:

(1) Using BROWSE to determine the existence of useful data, and then ftp-ing to legacy;

(2) Doing it all via ftp.

Example 1: Getting SSS Files

Suppose I want to examine Einstein Solid State Spectrometer (SSS) spectra of 3C 120. Using method (1), I first access the XRAY account on NDADSA, and BROWSE SSS. I do a search on the co-ordinates of this object (the sc command) and am informed that the SSS made 6 observations of this object. I examine the spectra in BROWSE using XSPEC, and decide that I want all of the spectra on my own machine for more detailed analysis. I do a "dall" on each spectrum, and find that they have (root) file names like "sc120a", "sc120b", etc., at least on NDADSA. Notice that these filenames bear some resemblance to the target in this example. This is not always the case in other databases. Having found the filenames, I now ftp to the legacy computer, log in as anonymous, and cd to the directory einstein/data/sss/spectra, and am immediately discouraged because ls sc120* finds no such files in this area. This is because there are some file name inconsistencies between the databases on NDADSA and legacy (which will be resolved when BROWSE moves to legacy). In this case, it turns out that the initial "s" in the SSS file name has been omitted in the version on legacy: thus, if I do ls c120* I will at once find the files that I was looking for.

Using method (2) is somewhat less contorted since it only involves accessing legacy, but it does involve me hunting around a bit more in the einstein directory tree. Thus, I do the usual anonymous ftp into legacy, and cd to einstein/doc/sss where I find lots of files with useful-sounding names like sss.cat. I get 1 or 2 (or mget all) of these files to my own computer. I peruse sss.cat and find that it has a listing of all the sss observations as shown below:


             (yy.dd)  (sec) (rate)            (hh mm ) (o  '  ")
Name         Time     Expos  Count  Ice start RA(1950) DEC(1950)  File name
-------------+-------+-----+-------+---------+--------+---------+---------
1H 0240+621  79.032   7864   0.21     1.83    02 41 01  62 15 27  Q0241
1H 0334+098  79.233   2785   0.97     0.81    03 35 57  09 48 32  O335096A
1H 0334+098  79.234   8028   0.96     0.82    03 35 57  09 48 32  O335096B
2A 0430-615  79.207   7208   0.28     0.60    04 30 36 -61 32 60  O430615A
2A 0430-615  79.208   2621   0.29     0.98    04 30 36 -61 32 60  O430615B
2A 0430-615  79.208   4341   0.26     1.00    04 30 36 -61 32 60  O43061
2A 0922-317  79.135    409  -0.01     1.59    09 22 00 -31 42 00  X0922M
2A 1219+305  78.341   1884   0.28     3.06    12 18 52  30 27 14  H1219A
2A 1219+305  79.156   5406   0.39     1.21    12 18 52  30 27 14  H1219B
3C 120       79.049   7700   0.61     1.43    04 30 31  05 14 59  C120A
3C 120       79.070   7782   0.47     1.63    04 30 31  05 14 59  C120B
3C 120       79.232   8273   0.43     0.69    04 30 31  05 14 59  C120C
3C 120       79.233   5570   0.44     0.76    04 30 31  05 14 59  C120D
3C 120       79.247   1228   0.81     0.63    04 30 31  05 14 59  C120E
3C 120       79.248   4341   0.88     0.67    04 30 31  05 14 59  C120F

and thus I again determine that the SSS made 6 observations of 3C 120, and, this time, I get the precise file names that correspond to them. I cd to einstein/data/sss/spectra and type

> binary
> mget c120*

and the files will be sent to my own computer.
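Searching a catalogue like sss.cat for a target can also be done with a few lines of Python once the file is on the home computer. The sketch below is illustrative only: the function files_for is our own, the sample text is a three-row excerpt of the listing above, and the parsing assumes that columns are separated by two or more spaces while the blanks inside a name or a coordinate are single.

```python
import re

# Excerpt of the sss.cat listing shown above (header plus three rows).
CATALOG = """\
Name         Time     Expos  Count  Ice start RA(1950) DEC(1950)  File name
-------------+-------+-----+-------+---------+--------+---------+---------
2A 1219+305  79.156   5406   0.39     1.21    12 18 52  30 27 14  H1219B
3C 120       79.049   7700   0.61     1.43    04 30 31  05 14 59  C120A
3C 120       79.070   7782   0.47     1.63    04 30 31  05 14 59  C120B
"""


def files_for(target, catalog_text):
    """Return the 'File name' entries of every row whose Name equals target."""
    found = []
    for line in catalog_text.splitlines():
        # Columns are assumed to be separated by runs of 2 or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if fields[0] == target:
            found.append(fields[-1])
    return found


print(files_for("3C 120", CATALOG))
```

The catalogue lists the names in upper case (C120A, ...), whereas the mget c120* command above suggests that the files on legacy are stored in lower case, so a call to lower() may be needed before building the ftp request. That is a guess based on this one example.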

Example 2: Getting ROSAT Images

Now, suppose that I want to find out whether there is a ROSAT PSPC image of the X-ray source 1H0551-819 available in the ROSAT Public Archive. Using method (1), I BROWSE the relevant database (ROSUSPSPC; the equivalent one for the HRI data is ROSUSHRI) inside the XRAY account on NDADSA. A search by name (always risky) or a search at its coordinates (the safest technique) gives the following result:

ROSUSPSP_PUBLIC_DEC > sc 
R.A. (2000 d/f= 12 29 40.69 or 187.420): 6 12 44 
Dec  (2000 d/f= 24 31 14.16 or  24.521): -81 50 06 
Radius arcmin (outer inner d/f=   60.00    0.00):

          1
    File Name  Expos   RA(2000) DEC(2000)      Name      Public date Public?
               (sec)  (hh mm s) (o  '  ")                   (y.d)    (YE/NO)
   ---------+-------+---------+----------+----------------+----------+------
 1 RP300026    10123  06 12 44  -81 50 06  1H0551-819      93.016      YE

i.e., there is one observation of this object already in the public archive: the relevant file name (root) is RP300026. I now do the anonymous ftp to legacy, cd to rosat/data/pspc/images/fits, and type >ls rp300026* and I find that, sure enough, there are the 4 files that I want:

rp300026_im1.fits       rp300026_im3.fits 
rp300026_im2.fits       rp300026_mex.fits

and I proceed to mget them safely home.
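The sc prompt above shows its default position both in sexagesimal form and in decimal degrees. For readers who keep coordinates in only one of the two forms, the conversion can be sketched as follows; the function names are our own, and the sign of the declination is passed separately so that declinations between 0 and -1 degrees are handled correctly.

```python
def ra_to_degrees(hours, minutes, seconds):
    """Right ascension in hours, minutes, seconds -> decimal degrees."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_to_degrees(sign, degrees, arcmin, arcsec):
    """Declination -> decimal degrees; sign is +1 (north) or -1 (south)."""
    return sign * (degrees + arcmin / 60.0 + arcsec / 3600.0)


# Default position offered by the prompt: 12 29 40.69, 24 31 14.16
print(ra_to_degrees(12, 29, 40.69), dec_to_degrees(+1, 24, 31, 14.16))

# Position of 1H0551-819 entered in the example: 6 12 44, -81 50 06
print(ra_to_degrees(6, 12, 44), dec_to_degrees(-1, 81, 50, 6))
```

To three decimal places the first pair agrees with the 187.420 and 24.521 displayed by the prompt, which is a useful check that the conventions match.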

Using method (2), I directly connect via anonymous ftp to legacy. I check my back issues of the HEASARC Journal, Legacy, and find the article describing the ROSAT Public Data Archive by Corcoran et al. in Number 2 on page 9. This fine article tells me where to find the lists of public ROSAT Position Sensitive Proportional Counter (PSPC) and High Resolution Imager (HRI) data. One caveat: because of our recent attempt to standardize the individual databases, the paths given by Corcoran et al. are obsolete. Thus, the lists of PSPC data (ppublic_data.pos, ppublic_data.seq, and ppublic_data.date for the listings sorted by position, sequence number, and public release date, respectively) can now be found in rosat/data/pspc/doc, while the similar lists of HRI data (hpublic_data.pos, hpublic_data.seq, and hpublic_data.date) are now in rosat/data/hri/doc. Since, in this example, I am interested in a PSPC observation, I cd to rosat/data/pspc/doc, get the file ppublic_data.pos, examine it (e.g., by searching on the coordinates or, more riskily, on the name), and find:

300026|GSFC| |930116  |  6 12  44.000| -81 50  6.000|10123|1H0551-819|BUCKLEY

The first column of this table is the ROSAT ROR number: for an unfiltered, US-processed PSPC observation, the filename (kernel) is the ROR number with the initial prefix "rp". Thus the filename for the observation in which I am interested is rp300026.... I now cd to rosat/data/pspc/images/fits as before, and type (using an initial as well as a final wild card "*" just to be on the safe side):

>ls *300026*

and I find the same 4 files that I found using method (1) which I can now mget.
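The step from a ppublic_data.pos entry to a file name can be written down as a short Python sketch. The function names are hypothetical, and the sketch assumes, as in the entry quoted above, that fields are separated by "|" and that the ROR number comes first.

```python
from fnmatch import fnmatchcase

# Entry quoted above from ppublic_data.pos
ENTRY = "300026|GSFC| |930116  |  6 12  44.000| -81 50  6.000|10123|1H0551-819|BUCKLEY"


def pspc_root(entry):
    """File name kernel for an unfiltered, US-processed PSPC observation."""
    ror = entry.split("|")[0].strip()
    return "rp" + ror


def matching(names, entry):
    """Select names with the pattern *ROR* (wild card at both ends)."""
    pattern = "*" + entry.split("|")[0].strip() + "*"
    return [n for n in names if fnmatchcase(n, pattern)]


listing = ["rp300026_im1.fits", "rp300026_mex.fits", "rp200020_im1.fits"]
print(pspc_root(ENTRY), matching(listing, ENTRY))
```

The third name in the sample listing is invented purely to show a non-matching file; it does not refer to a real observation.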

If there are no files for that particular ROR presently available, check the public release date for the observation (the date field, 930116, in the above entry from the ppublic... file); if it is less than 2 to 4 weeks ago, the data have probably not yet been transferred to legacy, and the user should wait a week or two before re-checking the archive.
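The waiting rule can be checked with a little date arithmetic. In the sketch below the function names are our own, the public release date is taken to be the fourth "|"-separated field of the entry (930116) and read as yymmdd, and the 28-day lag is simply the upper end of the "2 to 4 weeks" quoted above. The yymmdd reading is an assumption, although it agrees with the 93.016 (year.day) public date shown by BROWSE for the same observation.

```python
from datetime import date, datetime

ENTRY = "300026|GSFC| |930116  |  6 12  44.000| -81 50  6.000|10123|1H0551-819|BUCKLEY"


def release_date(entry):
    """Public release date, assuming field 4 holds yymmdd."""
    field = entry.split("|")[3].strip()
    return datetime.strptime(field, "%y%m%d").date()


def probably_transferred(entry, today, lag_days=28):
    """True once at least lag_days have passed since the public release date."""
    return (today - release_date(entry)).days >= lag_days


print(release_date(ENTRY))
print(probably_transferred(ENTRY, date(1993, 1, 30)))
print(probably_transferred(ENTRY, date(1993, 3, 1)))
```

A False result only means that it is too early to expect the files; it does not say that they are definitely absent.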

Because of the sheer volume of the low-level ROSAT data products (of order 100 Gigabytes per year), the full ROSAT data archive presently resides on the NSSDC computer NDADSA. A user who, after looking at the image of 1H0551-819, is intrigued enough to want to do a full analysis of this observation should refer to Section 2.2 of the Corcoran et al. article for a description of how to request the complete dataset for a given observation from NSSDC.

Conclusions and caveats

This system is still being developed, so we will give periodic updates on the status of the data archives and the software. For example, we hope shortly to support the GOPHER facility, which (if your home computer also has this utility) makes an anonymous ftp account much more user-friendly by allowing the user to examine the contents of a file or files without having to transfer them to his or her own computer. We will alert users in the Welcome Message to the anonymous ftp account whenever we make major changes and/or enhancements to it, such as making GOPHER available. We welcome comments and suggestions to improve this service: they should be sent to drake@lheavx.gsfc.nasa.gov (Internet) or LHEAVX::DRAKE (DECNET).

Finally, a few words of caution: be careful of wild cards as some of the directories may contain many, many files comprising hundreds of Megabytes of data. It is not prudent to do "mget *" on directories. If you need entire data archives, contact the HEASARC and we will discuss the most efficient way of getting such large data volumes to your home site.
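One way to follow this advice is to add up the sizes in a "dir" listing before issuing an mget. The sketch below is illustrative only: the function total_bytes is our own, and it assumes the nine-field unix-style listing format shown earlier in this article.

```python
from fnmatch import fnmatchcase


def total_bytes(listing_lines, pattern="*"):
    """Sum the sizes of the plain files in a dir listing that match pattern."""
    total = 0
    for line in listing_lines:
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[0].startswith("d"):
            continue  # skip "total N", blank lines, and sub-directories
        if fnmatchcase(fields[8].strip(), pattern):
            total += int(fields[4])  # column 5 is the size in bytes
    return total


LISTING = [
    "total 23",
    "-rw-r--r--  1 root     345            17 Jan 28 15:42 .login",
    "-rw-r--r--  1 377      345          1906 Mar 11 09:28 README",
    "drwxrwxr-x 10 root     rosat         512 Feb 16 16:50 rosat",
]
print(total_bytes(LISTING), total_bytes(LISTING, "README"))
```

With Python's ftplib, the lines of a listing can be collected by passing a list's append method as the callback to FTP.retrlines("LIST", ...); the total then shows how much data a given wild card would bring home.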