NOTICE:
This Legacy journal article was published in Volume 5, November 1994, and has not been
updated since publication. Please use the search facility above to find regularly-updated information about
this topic elsewhere on the HEASARC site.
SkyView as an Archetype of Archival
Systems of the Future
T.A. McGlynn (HEASARC/USRA), N.E. White (HEASARC), &
K.A. Scollick (CSC)
Abstract
We discuss the SkyView virtual observatory and how it represents a new
approach to the archiving of astronomical information. This approach is
essential if astronomers are to effectively use the vast information resources
that are now coming online.
SkyView takes all-sky or large area surveys in wavelengths ranging from
radio to gamma-rays and provides the data in a convenient, easy-to-use form.
Astronomers need not concern themselves with the geometric issues of
projections, coordinate systems, and resampling -- these are addressed
automatically and are largely invisible to the user. Rather than simply giving
users a copy of observations as they were taken (the conventional archive
approach), SkyView transforms these data into a form which allows
astronomers to immediately begin addressing the astronomical questions that
interest them.
Since its introduction, SkyView has been very successful. In its first
year we anticipate users will come in from over 10,000 distinct addresses, and
users of the preliminary version of the system already retrieve about 100
images per day. With the wide availability of Mosaic to the public and the
simplicity of the SkyView interface, it is used not only by professional
astronomers but also by interested members of the public.
In the next decade the availability of astronomical data to astronomers will
grow at an unprecedented rate. New sources are coming on line, e.g., the
terabytes of the Sloan Sky Survey. Simultaneously, the availability of
increased network connectivity and cheap distribution by CD makes existing
resources far easier to access. To deal with this exciting but potentially
bewildering array of information, our community must begin to build interfaces
of a new kind, with an intrinsic understanding of basic astronomy. The current
generation of archival interfaces, itself a technology only a decade old, is
based on a paradigm of atomic units of information, like books in a library.
In the era we are entering, new ideas which recognize the malleability of
digital information are essential. The next generation of archives will enable
users to retrieve information in forms directly relevant to their immediate
research goals, rather than requiring a series of tedious and often mechanical
steps before the users can begin to do astronomy.
Introduction
Public archives of astronomical information have undergone enormous changes in
the past decade and this rate of change is certain to continue in the future.
It was only about a decade ago that the first electronically accessible
archives became available (e.g., at the International Ultraviolet Explorer
Observatory at GSFC). Prior to that, services like the National Space Science
Data Center (NSSDC) supported (and still support) retrieval of requested
information by shipping out tapes and other physical media.
Since the first electronic archives came online, they have become increasingly
sophisticated. Starting from little more than tables of contents, archives are
now indexed by database systems such as HST StarView, which uses a
relational database with hundreds of different tables.
The ability to distribute data electronically has also grown enormously. The
steady growth of network capacity has culminated in the last year in the
explosion of the World Wide Web, which has made available more resources than
any one person could possibly deal with. Data centers now can distribute data
using standard mail or FTP protocols or can bring up client-server models which
use some internal distribution protocol.
These later developments have not obviated the need for the simpler archives.
Just as Fortran did not replace assembly language and 4GLs are not replacing
Fortran and C but supplementing them, this growth in the sophistication of
archives has led to archives of many levels of sophistication appropriate to
particular purposes. In this paper we discuss a general classification of
existing archive systems into three categories and discuss a new kind of
archive that is just beginning to become available. We use SkyView
as an example of these fourth generation archives.
What is SkyView?
SkyView is a network service to allow astronomers to make a virtual
observation of the sky using existing all-sky and large area surveys. If an
astronomer wishes to make an observation of M31 in the infrared, he or she asks
the system for an image of a particular region of the sky and specifies the
kind of data wanted. For example, the IRAS data is distributed on CD-ROM in
B1950 coordinates, but a user may wish the data in J2000. Similarly the user
may want a different scale than the default, or perhaps wishes to view some
large region of the sky which requires mosaicking several of the distribution
images together. SkyView addresses these and other geometric issues and
immediately gives the user the needed data.
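
The geometric work described above can be illustrated with a minimal sketch. It assumes a gnomonic (tangent-plane) projection and nearest-neighbour resampling; the article does not say which projections or interpolation schemes SkyView itself uses, and the names `Grid`, `sky_to_plane`, `plane_to_sky`, and `resample` are hypothetical. For brevity the sketch also ignores the coordinate-system change (e.g., B1950 to J2000) and the usual convention that right ascension increases to the left.

```python
import math


def sky_to_plane(ra, dec, ra0, dec0):
    """Gnomonic projection of (ra, dec) about the tangent point (ra0, dec0).

    All angles are in radians; returns tangent-plane offsets (xi, eta).
    """
    d_ra = ra - ra0
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(d_ra))
    if cos_c <= 0.0:
        raise ValueError("point is 90 degrees or more from the tangent point")
    xi = math.cos(dec) * math.sin(d_ra) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(d_ra)) / cos_c
    return xi, eta


def plane_to_sky(xi, eta, ra0, dec0):
    """Inverse gnomonic projection: tangent-plane offsets back to (ra, dec)."""
    along = math.cos(dec0) - eta * math.sin(dec0)
    ra = ra0 + math.atan2(xi, along)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0),
                     math.hypot(xi, along))
    return ra, dec


class Grid:
    """Hypothetical image grid: tangent point, pixel scale (radians), size."""

    def __init__(self, ra0, dec0, scale, nx, ny):
        self.ra0, self.dec0, self.scale = ra0, dec0, scale
        self.nx, self.ny = nx, ny

    def pixel_to_sky(self, col, row):
        # The tangent point sits at the centre of the grid.
        xi = (col - (self.nx - 1) / 2) * self.scale
        eta = (row - (self.ny - 1) / 2) * self.scale
        return plane_to_sky(xi, eta, self.ra0, self.dec0)

    def sky_to_pixel(self, ra, dec):
        xi, eta = sky_to_plane(ra, dec, self.ra0, self.dec0)
        return (xi / self.scale + (self.nx - 1) / 2,
                eta / self.scale + (self.ny - 1) / 2)


def resample(src, src_grid, out_grid, blank=None):
    """Nearest-neighbour resampling of src (a list of rows) onto out_grid."""
    out = []
    for row in range(out_grid.ny):
        out_row = []
        for col in range(out_grid.nx):
            # Map each output pixel to the sky, then back into the source.
            ra, dec = out_grid.pixel_to_sky(col, row)
            x, y = src_grid.sky_to_pixel(ra, dec)
            c, r = round(x), round(y)
            inside = 0 <= c < src_grid.nx and 0 <= r < src_grid.ny
            out_row.append(src[r][c] if inside else blank)
        out.append(out_row)
    return out
```

Working backwards from each output pixel to the source image means every output pixel receives exactly one value, which is why a change of scale, centre, or image size needs no special handling.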
In many cases the astronomer would be perfectly capable of doing the
manipulations that SkyView performs on the data. However, having to do
these -- and having to deal with a different set of manipulations for every
type of data in a multi-wavelength investigation -- sets up a serious barrier
for astronomers in using this information. SkyView lets the astronomer
get a quick look at the situation immediately.
SkyView allows the astronomer to view the data and can also create FITS
files which the astronomer can use for further analysis. The system has
extensive capabilities for manipulating the image and color tables, for
overlaying images, for contour mapping, and for performing overlays of
astronomical catalog sources. While these are very useful, the heart of the
system and what distinguishes SkyView is its geometry engine.
The Types of Archives
We propose to classify archives into four categories which represent increasing
levels of sophistication of the interface and abstraction of users from the
data.
Level 1: Archive as Ordered Files
If the purpose of an archive is to provide some means of recovering
information, then a random collection of files should not be classified as an
archive. The lowest level of archive requires that some order is imposed upon
files. A collection of telemetry tapes taken in time order, perhaps one tape
per day, would represent this kind of simple archive. Even at this level there
is some meta-information required for the archive, e.g., what is the format of
the files, and what is the sense of ordering of the data. Many more advanced
archives may be viewed as containing sets of level 1 archives.
Level 2: Archive with File Index
Beyond the simplest level, archives provide a mechanism which mediates between
the data and the user's requests. The simplest mechanism is a file index.
This is just a static list of the files included in the archive. Users can
search this list and choose a set of files to retrieve. Many archives are of
this form. The typical anonymous FTP archive uses the directory hierarchy to
provide the file index. Many missions provide an observation index which
includes direct pointers to the observation data. In using those elements
alone, one has a level 2 archive.
Level 3: Archive with Database
The typical level of archive access that astronomers now demand is what we term
a level 3 archive. While a level 2 archive has a static index of its contents,
a level 3 archive has a database system which allows users to make queries
about the contents of the archive. Thus users can pose a question in terms
they understand, e.g., "what observations have you made of stars brighter than
B = 5?", and the system will respond. Once the number of files maintained in
the archive gets large and the number of different types of data multiplies, it
becomes very difficult for many users to find data using a static index.
Typically two distinct elements are now present in the user's interaction with
the archive: a series of queries of the archive database, and a separate
retrieval process.
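
The difference between a static index and a database query can be sketched as follows, using an in-memory SQLite table. The table layout, the function names, and the sample rows are all invented for illustration; no real archive schema is implied.

```python
import sqlite3

# Invented rows for illustration only: (filename, target, B magnitude).
SAMPLE_ROWS = [
    ("obs0001.fits", "star_a", 3.2),
    ("obs0002.fits", "star_b", 7.9),
    ("obs0003.fits", "star_c", 4.6),
]

# Level 2: a static index is a fixed listing that the user scans by hand.
FILE_INDEX = [filename for filename, _target, _b_mag in SAMPLE_ROWS]


def build_catalog(rows):
    """Level 3: load the same information into a queryable database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (filename TEXT, target TEXT, b_mag REAL)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def files_brighter_than(conn, b_limit):
    """Query step: a smaller magnitude means a brighter star."""
    cursor = conn.execute(
        "SELECT filename FROM observations WHERE b_mag < ? ORDER BY filename",
        (b_limit,),
    )
    return [row[0] for row in cursor]


def retrieve(filenames, index):
    """Retrieval step: a stand-in that only confirms the files are archived."""
    return [name for name in filenames if name in index]
```

Calling `files_brighter_than(build_catalog(SAMPLE_ROWS), 5.0)` and passing the result to `retrieve` mirrors the two distinct steps described above: a query against the database, then a separate retrieval of whole files.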
Level 4: Archive with Data Service
In moving from level 2 to level 3, the user's interaction with the archive
catalog goes from dealing directly with the index to dealing with the index
through a database intermediary which interprets the user's astronomical
requirements. As we move to a level 4 archive a similar intermediary is
established between the user and the archive data itself. In level 1-3
archives the system provides the user with data which is atomic -- unchangeable
and indivisible. At these levels we may envisage the archive as a library which
lends out books but is loath to rip out individual pages. A level 4 archive
recognizes the intrinsic malleability of digital data and can extract elements
from the various archive files for processing prior to delivery to the user.
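
As a rough sketch of what "tearing out pages" might look like in practice, the following extracts a sub-region from an archived image and optionally coarsens it before delivery. The function names and the choice of block averaging are assumptions made for illustration, not a description of any existing service; images are represented as plain lists of rows.

```python
def extract_region(image, row0, col0, height, width):
    """Cut a height x width sub-image out of an archived image."""
    if (row0 < 0 or col0 < 0
            or row0 + height > len(image)
            or col0 + width > len(image[0])):
        raise ValueError("requested region falls outside the archived image")
    return [row[col0:col0 + width] for row in image[row0:row0 + height]]


def rebin(image, factor):
    """Average factor x factor blocks of pixels to coarsen the pixel scale."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of the rebin factor")
    out = []
    for r in range(0, height, factor):
        out_row = []
        for c in range(0, width, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def serve_request(archive, filename, row0, col0, height, width, factor=1):
    """Level 4 behaviour: process the stored file, deliver only the product."""
    region = extract_region(archive[filename], row0, col0, height, width)
    return rebin(region, factor) if factor > 1 else region
```

A level 1-3 archive would hand back `archive[filename]` unchanged; here the stored file is only the raw material for the product the user asked for.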
SkyView as a Level 4 archive
The element that distinguishes SkyView as a level 4 archive is that it
generates its products for the user dynamically. When a user requests an image
the system determines the parameters of the request and extracts and
manipulates data from the existing all-sky surveys. Then it creates an output
product to the user specifications.
Several things are key to making it possible to have a fourth-level archive.
First, there must be some agreement within the community on the scope and type
of data manipulations possible. If there is no way to predict the kinds of
manipulations that would be useful, the system may have very limited appeal.
SkyView deals with clearly defined geometric transforms. While there
are a number of coordinate projections and coordinate systems, this number is
manageable.
Similarly, it is very important that there be some way of presenting the data
to the user in a fashion that he or she can be expected to understand. The
universal adoption of FITS formats by the astronomical community makes this
possible for SkyView. The current draft World Coordinate System
proposal, which SkyView uses, addresses precisely the same geometric
issues. The existence of these standard data formats greatly enhances the
usefulness of the fourth generation archive by making its data products
immediately usable in community software. The alternative is for the archive
to be able to generate data in a variety of formats. SkyView does this
for its image data, which can be generated as GIF, JPEG, TIFF, etc., but
this is obviously more work.
Another essential element for the fourth-level archive is the ability to
distribute and display information directly to the user. One can imagine
systems where users' requests are responded to non-interactively. But in such
a system an essential coupling between the user and the archive is lost, just
as there is a difference between written and oral communication between people.
In the future we may envisage the relation between the archive and user not as
a set of commands and responses but as a dialog where the archive begins to
anticipate the requests and sensibilities of the user.
The character of interaction with a fourth-level archive is different from that with
a third-level archive. The separation that used to be present between querying
and data retrieval begins to blur. Since a fourth-level archive must generate
a data product dynamically, the response to a query is not just a listing but
at least a sample of what the data product looks like. In SkyView, the
user immediately gets back the requested image on the screen. There is still a
separate step to retrieve FITS or image files, but the differentiation between
catalog data and archive data is less meaningful.
One test capability we have implemented in SkyView is an attempt to create an
all-sky mosaic of ROSAT pointed observations. With this mosaic, users do not
need to individually add a set of observations -- the data product is provided
ready for use. A user selects a position and a minute or two later the ROSAT
image, or a blank field if there has been no observation, is returned. By
providing this kind of value-added product, a fourth-level archive can enhance
the value of a third-level system. Once the user sees the data, perhaps
discovering that the object of interest is seen in the field of view, he or she
is motivated to work with the original atomic observations to do the very best
science possible. With a third-level archive alone, the process is much more
cumbersome: Search the catalog, extract the needed observations, add the
observations, view the result. If a user is unfamiliar with the data, it can be
days before he or she can determine whether there is anything interesting in the field, a
formidable barrier to getting started with a new kind of data.
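
The "add the observations" step that such a mosaic spares the user might look roughly like the sketch below. It assumes each pointed observation has already been placed on a common pixel grid and that overlapping observations are combined as summed counts divided by summed exposure. That weighting scheme, the dictionary keys, and the function name are illustrative assumptions; the article does not describe how the SkyView ROSAT mosaic is actually built.

```python
def mosaic(observations, ny, nx):
    """Co-add observations onto an ny x nx grid as a count-rate image.

    Each observation is a dict with hypothetical keys:
      "row0", "col0" -- position of its first pixel on the mosaic grid
      "counts"       -- list of rows of counts
      "exposure"     -- exposure time, applied uniformly to all its pixels
    Pixels that no observation covers are returned as None (a blank field).
    """
    counts = [[0.0] * nx for _ in range(ny)]
    exposure = [[0.0] * nx for _ in range(ny)]
    for obs in observations:
        for i, row in enumerate(obs["counts"]):
            for j, value in enumerate(row):
                r, c = obs["row0"] + i, obs["col0"] + j
                if 0 <= r < ny and 0 <= c < nx:
                    counts[r][c] += value
                    exposure[r][c] += obs["exposure"]
    return [
        [counts[r][c] / exposure[r][c] if exposure[r][c] > 0 else None
         for c in range(nx)]
        for r in range(ny)
    ]
```

Returning None for uncovered pixels corresponds to the blank field a user receives when no pointed observation exists at the requested position.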
Other Systems
SkyView is not the only fourth-level archive effort underway in
astronomy. Elements of this emerging technology can be seen in a number of
systems. For example, the spectral plotting features of the ESIS system allow
users to retrieve multi-mission data on sources very readily. The quick-look
capabilities that have been built into the CADC HST Starcat and have been
brought up at the Space Telescope Science Institute (STScI) allow users to
browse data. The HEASARC Xobserver system couples an analysis environment very
tightly to the database system which allows users to see and browse the data.
However, in these cases the data products are still typically bound by the
library paradigm.
Elsewhere the EUVE Guest Observer Facility has recently started a service to
provide all-sky products on demand, but the service is not interactive,
requiring waits of several hours for the data products. At STScI there are
projects to develop expert systems to assist in data analysis. While the
emphasis here is on analysis, to the extent that such assistants interact with
archives they may be seen as a fourth-level archive.
The effort underway to provide a data system for the Earth Observing System,
the EOSDIS system, currently envisages many fourth-level archive elements where
the data will be processed at user request. Since the data volumes there are
so enormous, terabytes per day, it behooves the astronomical community to keep
up with the developments there, likely learning as much from the mistakes as
the successes of this system.
The Future
Astronomy and the other physical sciences are seeing an explosion in the amount
of digital data available. New sources of data such as the Sloan Sky Survey
and new NASA missions continually increase the base volume of data, while the
increases in network capacity continuously add to the effective number of
datasets online. This information is leading to a literal embarrassment of
riches where astronomers may not know what, or where, data exists to answer
their questions. Nor can they cope with the varying formats used in different
specialities or at different times.
The only way in which we are going to enable our constituents to deal with this
explosion is to use the comparable increases in capacity of our computers to
develop information systems which provide users with data in forms they can use
immediately. The emergence of third-level archives in the past decade has
enabled archive systems to cope with the very large databases from individual
missions. At the HEASARC, much effort has gone into developing a
discipline-wide third-level archive for high-energy astronomy, but it remains
incomplete and deals with only a small fraction of the astronomical community's
data. SkyView has been developed to address some of the concerns that
have arisen as we have begun to use the resources of the HEASARC and other
facilities, but much more remains to be done.
We feel that it is important to explore new ways in which we present data to
our community. Not only must we provide astronomers with original
observations, we must provide them with the capability of using data from the
multitude of sources transparently. The purpose of archives is not simply to
preserve information; it is to make it useful to the community. We must update
our paradigm of the archive as data library and have archives which tear pages
out of their books and present newly formatted volumes to their users.
Last modified: Monday, 19-Jun-2006 11:40:52 EDT