NOTICE:

This Legacy journal article was published in Volume 7, June 1998, and has not been updated since publication. Please use the search facility above to find regularly-updated information about this topic elsewhere on the HEASARC site.
The HEASARC CDROM Production Facility

P. Newman (HEASARC)


  1. Introduction

    The agreement between NASA and the Italian Space Agency (ASI) calls for the exchange of archived public BeppoSAX data at the Science Data Center (SDC) in Rome, Italy for selected public data sets archived at the HEASARC. To facilitate this agreement, it was initially decided that the medium for the exchange would be CDROM, with the possibility of using digital video disks (DVD) in the future when the technology had developed further.

    In accordance with this agreement, the HEASARC recently made seven copies of a CD set of selected event files from the ROSAT PSPC and HRI archives. Each copy consisted of 36 CDs, for a total of 252 CDs. These CDs were created and verified over a two-week period in March 1998. The HEASARC copy can be found at ftp:/heasarc.gsfc.nasa.gov/FTP/.rosatcd along with the previously published ROSAT CDs. Additional CD copies of selected files from the ASCA archive are planned.

    Once the BeppoSAX SDC agreement has been fulfilled, there are other potential uses for the HEASARC's CD facility. Among the applications presently being considered are an interface to W3Browse and a mechanism for distributing datasets to the PIs of Guest Observer programs (to replace the traditional method of data distribution by 8-mm tape).

  2. Commercial Off-the-Shelf (COTS) Hardware and Software

    In the fall of 1997, the HEASARC procured two systems. The first was a JVC 600 CDROM Library with 8X read-only CD drives, managed by HyperROM software from Tracer Technologies, Inc. The second was a CD-Studio from Young Minds, Inc. that included a Kodak PCD Writer 600, a Kodak Disc Transporter, a Rimage CD printer and MakeDisc software. The CD-Studio hardware can produce up to 75 full-capacity identical CDs in approximately 16 hours, as the Writer 600 is a 6X speed CD recorder. Since the HEASARC will be copying selected files from the archive, which are stored on magneto-optical jukeboxes, the realized throughput is expected to be much lower than this figure.
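    As a rough check on the quoted throughput, the arithmetic can be sketched as follows. The 74-minute playing time of a full-capacity disc at 1X speed is an assumption that is not stated in this article, and the function names are purely illustrative.

```python
# Rough throughput check for the CD-Studio figures quoted above.
# ASSUMPTION (not from the article): a full-capacity disc takes 74 minutes
# to write at 1X speed.
FULL_DISC_MINUTES_AT_1X = 74.0


def burn_minutes(speed_factor):
    """Minutes needed to write one full disc at the given speed multiple."""
    return FULL_DISC_MINUTES_AT_1X / speed_factor


def minutes_per_disc(total_discs, total_hours):
    """Average wall-clock minutes per disc for a batch of identical discs."""
    return total_hours * 60.0 / total_discs


if __name__ == "__main__":
    print(f"burn time at 6X: {burn_minutes(6):.1f} min")
    print(f"average per disc (75 discs, 16 h): {minutes_per_disc(75, 16):.1f} min")
```

    Under the stated assumption, both figures come out at roughly 12 to 13 minutes per disc, which suggests that the quoted 16 hours is dominated by writing time rather than by staging or handling.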

    The CD-Studio is a special-purpose PC with firmware that is designed to stage data as a CD image to an internal disk in the unit. It connects to a SCSI port of a PC or workstation and appears to the system as an Exabyte tape device or a CDROM device. A CD image is staged to the unit by the MakeDisc software, which addresses the tape device. After a CD image has been staged to the unit, it can be mounted as a CD device on the workstation. This allows the integrity of the data to be examined before it is committed to the CD media. The Writer 600 and the Disc Transporter are connected to the SCSI port on the CD-Studio, and the Rimage CD printer is connected to its parallel port. The serial port of the CD-Studio is connected to the serial port of the workstation to monitor the operational status.

    The CDROM Library is capable of serving up to 600 CDROM images. The HEASARC exchanged 3 Tracer Technologies Magnavault licenses from older magneto-optical libraries for one HyperROM license to manage the CDROM library. There are a number of benefits to running the HyperROM software as opposed to the Magnavault software. Most Magnavault filesystems are mounted read-write and span multiple volumes; a typical HEASARC magneto-optical filesystem is 20 GB and spans 10 platter sides. Mounting a fully populated magneto-optical library takes in excess of 2 hours, as an inventory of all filesystem media must take place before the filesystem can be mounted. HyperROM filesystems, by contrast, are read-only and contain only one piece of media per filesystem, so no inventory is needed before they can be mounted. HyperROM also caches the entire directory hierarchy of each CDROM image in the optical library. As a result, HyperROM filesystems are accessible immediately upon reboot, and NFS clients can still count on the aggregate filesystem mount, which allows many CDs to be mounted under a single mountpoint. (ftp:/heasarc.gsfc.nasa.gov/FTP/.rosatcd is an example of such an aggregate mount.)

  3. Delivery and Installation

    The CDROM Library was delivered and installed in November 1997 on a DEC Personal Workstation 433au. It arrived more than 6 weeks late and with some superficial cosmetic damage, but was found to be in perfect working order. The installation of the HyperROM software and the CDROM Library proceeded without incident.

    The CD-Studio was delivered in late December 1997 and installed on another DEC Personal Workstation 433au running Digital UNIX 4.0C. The integration of the CD-Studio equipment with the other CDROM hardware was quite difficult. It was found that the alignment of the Writer 600, the Disc Transporter and the CD printer was critical. Unfortunately, a CD had to be burned for every attempt to calibrate the printer, and the content of the printed label was barely adequate for HEASARC needs. Approximately 50 CDs were used to integrate and test all of the components of the CD-Studio.

    Due to a question of compatibility between the Tracer software and the Young Minds, Inc. software, the HEASARC initially dedicated two separate servers, one for the jukebox and one for the CD-Studio. In January, the decision was made to combine the CD-Studio and the JVC CDROM Library on one server, because no compatibility issues had been found and combining the resources was actually advantageous.

  4. Automating the CD-Studio

    The largest group of customers for the CD-Studio hardware has been in the financial industries. Typically, CDs are created daily for distribution to financial analysts and bankers. In this application, a GUI is used to burn 50 copies of the same CD. The HEASARC's application, by contrast, does not require many copies of one CD image; if a new CD publication is the goal, it is much more cost-effective to send a master CD image to a contractor. The HEASARC's needs are to copy selected mission files within the archive for distribution to sites overseas, and to quickly write requested data to a medium that can be delivered to an investigator. Such an investigator might have a poor network connection to the HEASARC archive, or the requested data set may be so large that a network download would be a futile waste of bandwidth and time.

    The HEASARC used three steps to automate this process:

    • Select and stage ALL the data for the desired CD set. It is best to create a basic directory structure for each CD that contains symbolic links to the actual data; otherwise the data is moved twice. The HEASARC ROSAT production run for the SDC used a database dump of all the public observations available in the archive as its input file. An HTML file was created with links back to the directory location in the HEASARC archive of each sequence contained on the CD. This facilitates access to products associated with the sequence that were not selected for the CD. The HTML file also contains some basic database information about the sequence. A pipe-delimited database dump with the CD volume name is also created to aid ingest into a local SDC database. This stage also had to calculate precisely which sequences would fit on a 650 MB CDROM: the first sequence to exceed the 650 MB limit is moved to the next CD volume. The first data-integrity step is also taken here, as each file selected for the CD image is checksummed and logged. For ROSAT, perl scripts were written to accomplish this task.

    • Run MakeDisc against the staged directory structure. Data will be moved to the internal hard disk in the CD-Studio to create the CD image that will be burned on a CDROM. A C shell script that was used during the test phase was modified slightly to handle the HEASARC ROSAT production run. After the CDROM image has been staged, the CD-Studio is mounted as a CDROM device. All of the data is checksummed and compared to the log file to ensure data integrity. A cut command is then issued for each CD that is required; this burns the data to a blank CDROM. Each cut command requires approximately 15 minutes: 12 minutes to burn the CD and another 3-4 minutes to move the media to the CD label printer and then to the output bin, and to load the next blank CD into the recorder.

    • Remove the CDs and put each set in the CDROM library to check the integrity by checksumming all files and comparing them to the checksums recorded in the logs. Additional perl scripts were written to handle this integrity check.
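    The staging and verification logic in the steps above can be sketched roughly as follows. This is a minimal Python illustration, not the perl and C shell scripts used in the production run. The function names, the flat directory layout, the log format, the choice of MD5 as the checksum, and the treatment of 650 MB as 650 x 1024 x 1024 bytes with no allowance for filesystem overhead are all assumptions made for the example.

```python
import hashlib
import os

# ASSUMPTION: 650 MB is taken as 650 MiB, ignoring CD filesystem overhead.
CD_CAPACITY_BYTES = 650 * 1024 * 1024


def file_checksum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file (MD5 is an assumed choice)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assign_volumes(sequences, capacity=CD_CAPACITY_BYTES):
    """Group (name, size_in_bytes) pairs into CD volumes, in input order.

    The first sequence that would push a volume over capacity is moved
    to the next volume.
    """
    volumes, current, used = [], [], 0
    for name, size in sequences:
        if size > capacity:
            raise ValueError(f"{name} does not fit on a single volume")
        if current and used + size > capacity:
            volumes.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        volumes.append(current)
    return volumes


def stage_volume(volume_dir, source_files, log_path):
    """Symlink each source file into volume_dir and log its checksum.

    Hypothetical flat layout: every file is linked by its base name.
    """
    os.makedirs(volume_dir, exist_ok=True)
    with open(log_path, "w") as log:
        for source in source_files:
            name = os.path.basename(source)
            os.symlink(os.path.abspath(source), os.path.join(volume_dir, name))
            log.write(f"{file_checksum(source)}  {name}\n")


def verify_volume(mount_dir, log_path):
    """Return names of logged files that are missing or differ under mount_dir."""
    mismatches = []
    with open(log_path) as log:
        for line in log:
            expected, name = line.rstrip("\n").split("  ", 1)
            path = os.path.join(mount_dir, name)
            if not os.path.isfile(path) or file_checksum(path) != expected:
                mismatches.append(name)
    return mismatches
```

    In this sketch, the same verify_volume check would be run twice against the same log: once on the staged image mounted from the CD-Studio, and again on each finished disc once it is loaded in the CDROM library. An empty result indicates that every logged file matched.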

  5. Conclusions

    The HEASARC CDROM production facility is still in its infancy. For RXTE and future missions, it may be difficult to fit even a single observation within the 650 MB limit of a CDROM. Much of the scripting software and many of the Web interfaces still need to be developed, but they should be portable to a DVD production facility when that technology becomes available.



    Last modified: Monday, 19-Jun-2006 11:40:52 EDT