The rapid advances in Web technologies have made access over the Web a standard mechanism by which astronomical institutions make their data and services available. Many thousands of catalogs, image display services, name resolvers and other services can be accessed through the Web today.
This very abundance of resources can be intimidating to astronomers. Users face three distinct problems when trying to access capabilities over the Web:
a. Users must somehow discover the data sets of interest. No user is likely to know all the places where interesting information is available.
b. Even if a location is known, each Web interface tends to have subtle quirks: one system may use B1950 coordinates and another J2000. The radius may be specified in sexagesimal degrees in one place and in decimal minutes somewhere else. Many other details of the interface may differ in small but crucial ways. If a large number of resources are to be queried, it becomes excruciating for users to have to address all of these differences.
c. Once a user has managed to query a set of resources, he or she faces a daunting task in trying to integrate the results.
The first two problems are much more tractable than the third. Prior efforts to deal with these problems, notably the Astrophysics Data System (ADS), have faltered when addressing integration: they have either imposed an unacceptable burden on the various data providers, or required too large an effort at building some centralized facility.
At the HEASARC we have recently developed an Astrobrowse service which is designed primarily to address the first two issues so that astronomers can easily discover and query information resources on the Web. This effort was inspired by an earlier effort spearheaded by Steve Murray at SAO (see http://niit1.harvard.edu/AstroBrowse/) and is being coordinated with other major archive institutions (see http://hea-www.harvard.edu/adccc/). Using Astrobrowse an astronomer can request images, catalogs and other information from hundreds of resources in a single simple request. Subsequent sections of this paper describe how to use Astrobrowse, its internal architecture, how other sites can build Astrobrowse agents, and finally how we hope to expand this service.
The current HEASARC Astrobrowse service provides access to more than a thousand distinct astronomical resources. Users can ask for information about an object or position in the sky. Object names are internally converted into positions, but we anticipate having a service which uses non-resolved names in the near future (so that one could ask for information on solar system objects). The Web page for the HEASARC's Astrobrowse service is:
Substantial documentation (including a copy of this article) is available from that site. If you have further questions please contact:
Thomas McGlynn (email@example.com) 301-286-7743 or
Christina Williams Heikkila (firstname.lastname@example.org) 301-286-1505
The HEASARC's Astrobrowse service is a user agent. A user tells it that he or she is interested in information of a particular type and the Astrobrowse agent goes all over the Web to collect this information.
There are three steps to using Astrobrowse. First the user selects the kinds of resources that are to be queried, then the user enters a position or target for which information is desired. Finally, after Astrobrowse has retrieved information from the desired resources, the user examines the results.
2.1 Selecting resources
The Astrobrowse home page gives three different options for selecting resources (see Figure 1).
Figure 1: Astrobrowse Home page
Users can select pre-defined sets of resources. Currently only one such pre-defined set is available. This queries about 20 distinct resources in many wavelength bands. When a user asks for a pre-defined set of resources a page is returned which gives the resources to be queried and allows the user to de-select any resources. The user can then go to the next step of specifying the region of interest.
The second mechanism for selecting resources is for users to explicitly choose the resources of interest. Some resources are recursive. Rather than pointing to external resources they point to lower level Astrobrowse pages. E.g., the Vizier service at the CDS has about 700 catalogs available through Astrobrowse. It would be infeasible to have a form listing all of these. Instead the main form has an entry for each of a number of categories of these catalogs. When a user chooses one of these he or she can later choose any catalog from within this category.
The third mechanism allows users to select resources according to keywords. Users can either select from a very limited set of predefined keywords, or do a word search of the database using a Glimpse text search engine. The predefined keywords are undergoing rapid evolution, but currently supported are keywords for the bandpass of the resource, the source of the data (survey observations, heterogeneous observations, or non-observational data), and the kind of information that will be returned (e.g., catalogs or images). The user tells Astrobrowse to find matching resources, and a page of all such resources is returned.
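The predefined-keyword selection can be illustrated with a minimal Python sketch. It is not the HEASARC implementation: the record fields (`bandpass`, `source`, `kind`), the resource names, and the function `select_resources` are all hypothetical, chosen only to mirror the three keyword types listed above. The Glimpse word search is not modeled.

```python
# Hypothetical resource records; field names and values are illustrative only.
RESOURCES = [
    {"name": "xray_survey_images", "bandpass": "x-ray",
     "source": "survey", "kind": "images"},
    {"name": "optical_catalog", "bandpass": "optical",
     "source": "survey", "kind": "catalogs"},
    {"name": "xray_pointed_catalog", "bandpass": "x-ray",
     "source": "heterogeneous", "kind": "catalogs"},
]


def select_resources(resources, **wanted):
    """Return the resources whose keyword fields match every requested value."""
    return [
        r for r in resources
        if all(r.get(key) == value for key, value in wanted.items())
    ]


matches = select_resources(RESOURCES, bandpass="x-ray", kind="catalogs")
print([r["name"] for r in matches])
```

Calling `select_resources` with no keywords returns every record, which corresponds to presenting the full list for the user to prune.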
2.2 Specifying a target
Astrobrowse currently searches for information on a particular region of the sky. A target name may also be used but this will immediately be transformed into a position. The last section of this paper discusses additional capabilities which will be included in future versions of Astrobrowse.
Once users have selected resources to be searched they are presented with a simple form to specify the region of interest (see Figure 2). Users may enter a target name or enter coordinates. The user can select J2000, B1950 or Galactic coordinates for the position entered. The user can also select a search radius. The search radius is not always used but often gives the size of the region to be searched. In some cases the region will be rectangular rather than circular. Once this information has been entered the user should click on the 'Start Search' button to actually begin the query.
Figure 2: A Predefined Resources Form
At this point Astrobrowse will 'explode' the query. A query for information about the requested position will be sent for every selected resource.
2.3 Examining results
[The current implementation of results display is likely to change substantially in later versions of Astrobrowse, but the same basic capabilities should be present.]
As soon as Astrobrowse has finished sending out queries -- but possibly long before all results are returned -- it will return a page to the user split into frames (see Figure 3). The frame on the left is a listing of all of the resources requested. The actual results are to be displayed in a frame on the right.
Figure 3: Results Page
For each resource there will be a status indicator, a resource identifier and two small buttons. The status indicator shows the current status of the request. A request may be either in process, or have completed successfully or unsuccessfully. To update the status of a request click on the 'Update Status' button.
The resource identifier is a hyperlink. When a user clicks on it the results from that request are displayed in the frame on the right. Users can click on this at any time. If the request has not terminated there may be no results or partial results displayed. All links and forms within the results should work appropriately but users should recognize that some sites may be on relatively slow Internet connections.
The user can click on the resource identifiers in any order to compare and view results from the requested resources.
Following the resource identifier are two small buttons. The first displays the results in the entire browser window rather than a single frame. The second deletes a request from the status frame.
2.3.1 Recursive requests
One type of resource that requires special mention is the recursive request. A user may, e.g., have requested all Vizier Photometric catalogs. Rather than querying upwards of 100 resources this request returns a page where the user can select the catalogs of interest. When the user clicks on the Submit button at the bottom of this page, a new request will be added to the status frame.
While the appearance of individual interfaces varies widely from system to system, the widespread adoption of the Common Gateway Interface means that Web forms actually have a common simple protocol. Astrobrowse leverages this common protocol.
From the perspective of a data provider, Web forms simply transmit a request in the form of a set of key=value pairs. For astronomical Web pages a large fraction of these forms support queries by position. Thus these forms have some set of key=value pairs which are used to describe the position (and similarly for the radius). All that an Astrobrowse agent needs to know is the syntax of these key=value pairs.
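As a minimal sketch of what such a request looks like on the wire, the Python fragment below encodes a set of key=value pairs into a GET query string using the standard library. The host name, script path, and key names (`ra`, `dec`, `radius`) are hypothetical; each real resource defines its own.

```python
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Append key=value pairs to base_url as a GET query string."""
    return base_url + "?" + urlencode(params)


# Hypothetical resource: the URL and the key names are invented for illustration.
url = build_query_url(
    "http://archive.example.org/cgi-bin/search",
    {"ra": "83.633", "dec": "22.014", "radius": "10"},
)
print(url)
```

Forms that use the POST method send the same encoded pairs in the request body instead of the URL, so the agent needs the same information in either case.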
To obtain this syntax, Astrobrowse has access to a database of Web resources using the GLU format (see accompanying article). For each entry there is some limited descriptive information, as well as a description of the exact key names and syntax for positional values. The GLU entry describes the coordinate system, whether separate fields are used for RA and Dec, whether sexagesimal coordinates are used, the separators between hours and minutes, and so forth. Only a few minutes are required to build the entry describing a resource. For most Web queries a HEASARC-developed tool allows one to simply fill out a copy of the form just as if one were making a query. The tool analyzes the output and creates an Astrobrowse entry directly.
The descriptive information is used in the resource selection step in Astrobrowse. When a query is exploded, Astrobrowse uses the syntax information to convert coordinate information to the precise syntax of each of the individual resources. [This can also be done by GLU, but the HEASARC Astrobrowse uses GLU only to maintain its database, not in the query processing.]
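The explosion step can be sketched roughly as follows. This is an illustrative Python sketch, not the HEASARC Perl code, and the dictionary layout of `DESCRIPTIONS` is a hypothetical stand-in for a GLU entry rather than the real GLU format. It covers only the choice between sexagesimal and decimal values, the separator, and whether RA and Dec go in one field or two; conversion between coordinate systems (e.g., J2000 to B1950) is omitted.

```python
def to_sexagesimal(value, sep=" ", places=1):
    """Format a decimal value (hours or degrees) as DD<sep>MM<sep>SS.s."""
    sign = "-" if value < 0 else ""
    # Round once, in integer units of 10**-places seconds, so that the
    # seconds field can never be printed as 60.0.
    scale = 10 ** places
    total = round(abs(value) * 3600 * scale)
    units, frac = divmod(total, scale)
    minutes, seconds = divmod(units, 60)
    whole, minutes = divmod(minutes, 60)
    sec = f"{seconds:02d}" + (f".{frac:0{places}d}" if places else "")
    return f"{sign}{whole:02d}{sep}{minutes:02d}{sep}{sec}"


# Hypothetical per-resource syntax descriptions (not the actual GLU format).
DESCRIPTIONS = [
    {"name": "catalog_a", "sexagesimal": True, "sep": " ", "keys": ("ra", "dec")},
    {"name": "images_b", "sexagesimal": False, "sep": "", "keys": ("position",)},
]


def explode(ra_deg, dec_deg, descriptions):
    """Return, for each resource, the key=value pairs describing the position."""
    queries = {}
    for res in descriptions:
        if res["sexagesimal"]:
            ra = to_sexagesimal(ra_deg / 15.0, res["sep"])  # RA in hours
            dec = to_sexagesimal(dec_deg, res["sep"])
        else:
            ra, dec = f"{ra_deg:.4f}", f"{dec_deg:.4f}"
        if len(res["keys"]) == 1:
            # A single field holds both coordinates.
            queries[res["name"]] = {res["keys"][0]: f"{ra}, {dec}"}
        else:
            queries[res["name"]] = dict(zip(res["keys"], (ra, dec)))
    return queries


for name, params in explode(157.5, -5.25, DESCRIPTIONS).items():
    print(name, params)
```

Each resulting dictionary can then be encoded and submitted to the corresponding resource as an ordinary form request.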
For each resource a simple Web command-line-interface client submits a request and stores the results in a local cache at the HEASARC. The only modification made is to convert the relative URLs in the results to absolute URLs. A status file maintains the status of each request. It is updated whenever the user requests an update or deletes or adds a request (see 2.3.1). Results are stored for several days in the cache before being deleted.
Thus the hyperlinks to the request results are links to the HEASARC's cache of results and should typically be processed rapidly. However, any images will be stored at the remote site, not the HEASARC, i.e., only the HTML file is downloaded to the HEASARC, but not anything that this HTML refers to.
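Because the cached HTML is served from a different host than the one that produced it, relative links would otherwise resolve against the cache rather than the remote site. The Python sketch below shows one way the relative-to-absolute conversion could be done; it is an assumption for illustration, not the actual HEASARC code. The regular expression handles only quoted `href`, `src` and `action` attributes, and the URLs are hypothetical; a production agent would more likely use a real HTML parser.

```python
import re
from urllib.parse import urljoin

# Matches quoted href/src/action attribute values (a deliberately simple pattern).
_LINK_ATTR = re.compile(
    r'''(\b(?:href|src|action)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE
)


def absolutize(html, base_url):
    """Rewrite relative link targets in html as absolute URLs against base_url."""
    def fix(match):
        prefix, quote, target = match.groups()
        return f"{prefix}{quote}{urljoin(base_url, target)}{quote}"
    return _LINK_ATTR.sub(fix, html)


# Hypothetical remote result page and the URL it was fetched from.
page = '<a href="detail.html">more</a> <img src="/img/field.gif">'
base = "http://archive.example.org/cgi-bin/search?ra=83.633"
print(absolutize(page, base))
```

Targets that are already absolute are left unchanged by `urljoin`, so the rewrite is safe to apply to every matched attribute.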
The core of the Astrobrowse system is a database of resources which is maintained using the CDS's GLU software (see accompanying article). GLU allows many institutions to share these resources. By simply registering an interest in Astrobrowse, a site can download this database automatically. If GLU is run continuously, updates elsewhere will be reflected in the local GLU database.
If an institution is also a data provider, it can create GLU descriptions of its resources. These will automatically propagate to any other site which has registered an interest in Astrobrowse and can get included in agents built at those sites.
Since the entries in a GLU database are quite simple, it is not difficult for one site to create descriptions for other sites. E.g., the HEASARC has developed entries for many other institutions. Ideally institutions would maintain the GLU entries for their own resources so that as these resources are modified or expanded (or occasionally deleted) changes can be made to all Astrobrowse agents in a timely way.
Once an institution has access to the database of resources a simple Astrobrowse agent is straightforward to build. The HEASARC Perl implementation is freely available and GLU itself can be used as the Web CLI client if desired. We anticipate that many Astrobrowse agents will be built as the databases of available resources become more extensive.
Sites might specialize in particular kinds of resources, provide interfaces tailored for different levels of expertise, enhance user interfaces, provide more sophisticated integration tools, etc. The Astrobrowse/GLU system is not envisaged as a large/centralized/hierarchical monolith, but as a relatively lightweight tool where a site can be added to the database with literally no effort on the site's part (i.e., if someone else is willing to spend the few minutes to write the GLU entry), and where sites can build their own agents with only a little effort.
The HEASARC's Astrobrowse is being developed in close cooperation with the CDS, ST ScI and other institutions. We hope that over the next several months a number of different Astrobrowse agents will become available to the public. All of these agents should share a common GLU database of resources.
A number of major enhancements are also planned. Users will be able to query Astrobrowse by target name and have the name searched for directly. This allows searches for solar system objects and will permit integration of planetary sciences data within the Astrobrowse umbrella.
More sophisticated meta-data will also be available for Astrobrowse resources. Information about the time of observations, missions, and resolution will be included. Searches of the individual resources by time should also be feasible.
In the longer run, we also hope to address the third of the issues that astronomers face in using results from the Web. We are exploring a substantial collaborative effort among many of the large astronomical archive centers to develop a protocol which will allow users to easily compare results from many different sites.
Last modified: Wednesday, 13-Apr-2011 09:50:21 EDT