Statement of Intent

The member institutions of the Astrophysics Data Executive Committee (ADEC) are committed to supporting the Origins and the Structure & Evolution of the Universe Program by assuring the full interoperability of NASA astrophysics archives. To accomplish this, the ADEC will build and maintain the world's largest and most comprehensive electronically cross-linked, multiwavelength Master Directory of all currently known objects on the celestial sky. This endeavor is challenging, but it is achievable. The Master Directory will initially contain 2-3 billion objects and will act as the primary portal seamlessly linking to and from all of NASA's astrophysics missions, archives, surveys, datasets and individual observations. It will also be securely tied to the peer-reviewed, published literature. Unified access will be provided to all archives through a simple Web interface, with copies of the Master Directory located at a number of geographically distributed mirror sites. Within 18 months of first funding the ADEC will deliver its first public release of this database and its unified portal into NASA's astrophysics holdings.

WHITE PAPER on The Integration and Interoperability of NASA's Astrophysics Data Centers and Services

EXECUTIVE SUMMARY

In support of research into the origins, structure & evolution of the Universe, the ADEC is prepared to rapidly deliver to NASA and its astrophysics community a robust and working solution to archive interoperability. What we are proposing is a comprehensive and all-inclusive solution that will serve stellar, galactic and extragalactic astronomers alike, uniting astronomers and data, covering wavelengths from the radio through the infrared, from the optical to the UV, and out to the x-ray and gamma-ray region of the electromagnetic spectrum.
We are proposing to seamlessly link users, through a unified Web interface, to datasets in the archives, using cataloged objects on the sky to point to observations and datasets from the missions, to properties in the thematic centers, to the refereed scientific literature, and back. Building upon existing strengths and already-developed methodologies, NASA's mission-specific centers (HST, CXC, SIRTF, SOFIA), its wavelength-specific archives (HEASARC, IRSA, LAMBDA, MAST) and its thematic astrophysics centers (ADS, NED) are now uniquely positioned to effectively interoperate and to serve the entire astronomical community in a coherent and easily accessible fashion.

To establish the highest level of interoperability, the member organizations of the ADEC are proposing to build and maintain a MASTER DIRECTORY of *all* celestial objects. Doing so will consolidate access to derivative data (held at the thematic centers, such as NED), provide direct linkage to the literature (through the ADS), and guarantee direct access to the original NASA observations (held at the mission and wavelength-specific centers). This will be accomplished by building an all-inclusive, all-sky database of the 2-3 billion currently known stars, star clusters, nebulae, galaxies, galaxy clusters, radio sources and quasars. In its first incarnation the MASTER DIRECTORY will quickly build upon existing NASA surveys, such as 2MASS (infrared), GSC 2.2 (optical) and WGACAT (x-ray), including all 7 million cross-identified, tagged and linked NED sources, and will progressively grow as it assimilates other NASA surveys, including GALEX (in the ultraviolet) and SWIRE (in the infrared), etc.

A single query to the MASTER DIRECTORY (by any flavor of object name, or around any position on the sky) will first return fundamental (positional and classification) data on the object(s). Simultaneously the user will be automatically linked to each of the original observations and datasets residing at NASA centers.
Connections to the referencing literature will be provided, and access to the data held at thematic centers, object by object, will be made with the click of a button. All of this will be available from a single simple interface. Copies of the MASTER DIRECTORY will reside around the country, but users will see a single unifying portal to all of NASA's holdings wherever either (the data or the user) may physically reside.

ADEC member institutions are prepared to implement this level of interoperability today: DATATAGS and BIBCODES have been developed by the ADEC centers and are in place. Catalog intercomparison software tools have been built by and tested within NASA centers (NED, IRSA). It will not be necessary to build the MASTER DIRECTORY of objects or its main access interface from scratch; there is a vast legacy of expertise, knowledge, prior associations and software tools to be brought to bear on the problem from HEASARC, IRSA, MAST, NED and CDS/SIMBAD. With these we can rapidly get closure on building the two main ingredients for an early public release of the MASTER DIRECTORY: an OBJECT DATABASE at its core, and a UNIVERSAL NAME RESOLVER and its interface with the community.

For searches on the sky, existing commercial database management software, combined with intelligent sky tessellation and optimized object indexing, has been proven to give rapid response, as shown by the already operating services of 2MASS, USNO 2.A, GSC 2.2, APM and other massive astronomical databases. Major components for interoperability are in place, or are rapidly nearing completion (XML output, for instance), and are ready to be linked together -- the MASTER DIRECTORY is the glue. Other advanced aspects of interoperability planned for the future, such as data mining and grid computing, will be dealt with in the longer term by leveraging off of the NSF ITR (in which many of the ADEC members are active participants) and through focused, science-driven proposals to the ADP, etc.
The MASTER DIRECTORY stands as a tangible and feasible solution to an immediate and outstanding problem: accessing NASA's astrophysics data. The MASTER DIRECTORY and UNIVERSAL NAME RESOLVER will also serve as a critical component in the growth and evolution of electronic publishing, intimately linked to the archives and missions. To engage the research community and the professional societies, while simultaneously redirecting certain current tasks from the datacenters, the ADEC will produce and maintain the Web-based authoring tools necessary for the inclusion and validation of DATATAGS, object names and object positions. These same forms will also include automated logging and tracking of acknowledgement of support from NASA missions, archives and funding programs. With these tools in place, all of the objects cited in a given paper, their cross-identifications and their links to the archives and back to the journals will go on-line simultaneously with the actual publication, with no further action needed from the datacenters, authors or publishers.

To fully exploit the research potential of NASA's interoperating science archives a modest but targeted postdoctoral research program is proposed.

Assuming that full funding for this effort is made available at the beginning of FY04, the first public release of the MASTER DIRECTORY will occur in March 2006. Mirror sites of the MASTER DIRECTORY will be deployed across the country shortly thereafter.

TABLE OF CONTENTS

(i) Statement of Intent
(ii) Executive Summary
1. Introduction
2. Context
3. Preamble
4. Charge from the SAWG
5. Wider Context
6. Data Tags
7. Dataset Verification Service
8. Object Tags
9. Catalog Intercomparison Software
10. Building the Master Directory
11. Alerting System and the Level of Alert Schema
12. Hierarchy
13. Inter-Agency and International Cooperation
14. Authoring, Publication and Object-Registry Tools
15. Registry of Software Tools and Services
16. Registry of Services
17. Summary Conclusions
18. Budget
      Optimal
      Minimal
19. Three-Year Timetable, Milestones and Deliverables

WHITE PAPER on The Integration and Interoperability of NASA's Astrophysics Data Centers and Services

INTRODUCTION

Context

On November 18, 2002 the Science Archives Working Group (SAWG) requested that the Astrophysics Data Executive Committee (ADEC) prepare a White Paper detailing how NASA might best respond in a material way to the National Research Council's Decade Report in which the National Virtual Observatory (NVO) was ranked as the number one small project. This document is our considered response to that request.

Charge from the Science Archives Working Group

"The SAWG suggests that it is an appropriate time for the archive centers to INCREASE THEIR INTEROPERABILITY in order to MEET STRATEGIC GOALS and to PREPARE FOR NASA PARTICIPATION in the anticipated VO. In particular, this development of VO-related activities should be considered along the lines of a NASA Project that will support the primary goals of the SEUS and OS roadmaps, in concert with the data that would be collected from these envisioned missions. Project requirements should flow from these considerations, and there should be a well-defined set of DATA STANDARDS, GOALS, MILESTONES, STAFFING REQUIREMENTS and BUDGETS along a three year timetable with a nominal start in FY04. A "White Paper" would be the result of this planning. This is envisioned as a modest NASA-only program of limited scope in which the staffing and budget model should be described for both an OPTIMUM and a MINIMAL program."

THE WIDER CONTEXT

The Decade Report of the National Academy of Sciences highlights a number of distinct criteria (listed below) by which the success of the Virtual Observatory is to be measured. This ADEC White Paper describes a unified solution which directly addresses five of these main items: Data Discovery, Data Access, Linkage to the Literature, Data Federation and Service Access.
The remaining elements (Standards, Toolkits and Visualization) are being independently addressed elsewhere within the wider astrophysics community. While our solution is strictly focused on the immediate interoperability of NASA's archives, the basic structure is sufficiently general that any VO-compliant services and/or archives can, at a later date, be accommodated and served with minimal effort.

_______________________________________________________________________
_______________________________________________________________________

TABLE I.

Item 1: DATA DISCOVERY: (**)
   "Provide capabilities for discovering what data are available to the NVO user, and for easily incorporating new data into the NVO framework."

Item 2: DATA ACCESS: (**)
   "Provide seamless access to globally distributed data, whatever its type (Observational, simulated, images, spectra, catalogs, etc)"

Item 3: LINKAGE TO LITERATURE: (**)
   "Link with existing and future digital libraries and journals."

Item 4: DATA FEDERATION: (*)
   "Provide mechanisms for federating globally distributed data whatever its type."

Item 5: SERVICE ACCESS: (*)
   "Provide seamless access to any online service wherever it is located."

Item 6: ARCHIVING STANDARDS: ()
   "Develop universal standards for archiving future data sets." [NSF/ITR/ADEC]

Item 7: ANALYSIS TOOLKITS: ()
   "Develop analysis toolkits that can be used as is, or extended as needed, which facilitate processing of large datasets, including catalogs, images and simulated data." [NSF-ITR/ADP/AISRP]

Item 8: VISUALIZATION: ()
   "Develop new techniques for visualizing large quantities of data, including catalogs, images or simulated data." [AISRP/ADP]

_______________________________________________________________________
_______________________________________________________________________

Item 1: Data Discovery flows naturally from the Master Directory through the astrophysical objects, their cross-identifications and their associated data tags, to the archives.
Item 2: Data Access is again made possible by the data tags and the associated original datasets, which are themselves already well maintained by NASA's archives.

Item 3: Linkage to the Literature is already provided to our community through the ADEC BIBCODES and the NASA Astrophysics Data System Abstract Service (ADS). This will be an integral part of the Master Directory as it grows.

Item 4: Data Federation is assured by the linkage provided by the existence of a Master Directory, with its objects, cross-identifications and data tags.

Item 5: Access to online services will be addressed through the proposed Service Registry (Section ***).

The remaining three items (Items 6-8), especially those relating to Archiving Standards (Item 6), are being dealt with in parallel by the NSF-sponsored ITR and by representatives from NASA's archive centers directly. Item 7: Analysis Toolkits, and Item 8: Visualization Tools, involve open-ended issues of great importance, but well beyond our mandate. Existing tools, such as ALADIN, OASIS and DS9, serve many visualization needs, as do general-purpose analysis packages; but they do not directly concern us in this document. As reported below (Section ***) the MASTER DIRECTORY is being configured to be used by existing visualization tools and WEB interfaces, while having the flexibility to be used by new ones, as they are developed.

Our challenge is to minimize repetition while simultaneously fostering innovation; to comprehensively serve the common needs of a majority of users while facilitating the highly specialized, but more infrequent, needs of individual researchers and teams. The first supports "normal science"; the second fosters speculative studies and innovation. These are certainly not incompatible aspects of research, but they do require different levels of support, and we explicitly acknowledge them here to be different.
BUILDING THE MASTER DIRECTORY

Celestial objects, be they stars, clusters, galaxies or voids, are the conceptual building blocks of the Universe. And so a MASTER DIRECTORY of astrophysical objects is the keystone in any virtual observatory concept. When panoramic surveys are conducted one of the first tasks always involves source detection, followed by object extraction and object classification. Catalogs contain objects. When pointed observations are made, cameras and spectrographs target, by name and position, these same individual objects. Objects identified in space and time are the currency and the `lingua franca' of astronomical research.

To unify our electronic access to astrophysical data and to the original observations of astronomical objects, all that is necessary is to link names of objects to positions of objects, and to link names to the originating observations of those same objects. We give names to objects to uniquely recall them. We "coordinate", localize and characterize individual objects to find them again. That is precisely what the MASTER DIRECTORY will do for all celestial objects: name them, find them on the sky, and link them to data.

With the MASTER DIRECTORY we will unite all electronic archives containing any form of data pertaining to the sky. The MASTER DIRECTORY will, de facto, also be a master registry service of objects and thereby automatically become a gateway and portal back to the archives, the missions, the observatories, the integrating services, the scientific literature and to the community as a whole.

It is reasonable to expect that all imaging observations of the sky (frames & exposures, inversions & reconstructions, etc.), made from the ground or from space, at radio, IR, optical, UV or x-ray wavelengths, will contain at least one identifiable (astrophysical) source.
It is therefore possible for those same datasets to be uniquely and unambiguously identified, cataloged, linked and recovered by the very objects contained in and extracted from them. Thus the observational datasets can be indexed by their object content. Searching for data on objects at geographically distributed sites becomes trivial: a task reduced to searching for the objects in the MASTER DIRECTORY and following their links back to the originating archive(s).

Two new but practical steps will be taken in order for this seamless linkage to be put into effect:

(1) Each of the archives must extract statistically high-precision and systematically accurate positions for all significantly (3 sigma?) detected objects (extended and point-like) in all of their current and future digital panoramic holdings.

(2) These cataloged objects must then be centrally assimilated, compiled, merged and cross-linked through a massive and all-inclusive registration process which, in combination with the 2 billion previously cataloged stars and galaxies, ultimately defines the core of the MASTER DIRECTORY.

Each object detected by a NASA mission carries with it its heritage in the form of linkage back to its referencing data set, be that an image, a spectrum, an interferogram, a radial velocity, or even, at a higher level of refinement, another catalog or a published paper. Upon submission to the MASTER DIRECTORY the "new" object(s) will be cross-compared in real time with the existing database, at which point a variety of possible actions may be taken:

(1) If the object is submitted with a previously registered name and a self-consistent position, then this observation and its linkage will be added to the MASTER DIRECTORY for that (known) object.

(2) If the names and/or positions are inconsistent with the MASTER DIRECTORY entries, then conflict resolution will be implemented at a higher level.
This may result in an updating of the MASTER DIRECTORY position (to the submitted one), an updating of the proper motion to accommodate the new positional data (for bona fide stars), or a rejection of the input name as an erroneous cross-id and the creation of a new MASTER DIRECTORY entry at the input position, etc. This will require a robust combination of object-intercomparison software and scientific expertise (aka: human intervention and decision-making).

(3) If the name is new but the input position is credibly coincident with a previously known entry in the MASTER DIRECTORY, then the links will be merged and the new name appended to the alias file for this object. The name interpreter will be updated to accept this new description and be enabled to return the corresponding object on demand. All submitted positions will also be archived in case false mergers later have to be disentangled.

Procedures just like this are at the core of the success of NED, for example. The MASTER DIRECTORY builds upon, scales up, and extends the concept as a collaboration between all ADEC members.

DATA TAGS

All members of the ADEC, in cooperation and consultation with the editors of the ApJ, AJ and PASP and members of the University of Chicago Press, have begun to implement a coherent and all-inclusive system of uniquely tagging datasets which can be used to identify NASA observations that have been used in published research papers. Authors will be provided with these identifier tags by the missions and the archives. Authors will then include them with their resulting papers, and the journals will attach these files to the electronic on-line versions of these papers. The archives will also provide an automated, WEB-based service to the authors and editors for data-tag verification. The journal tags, linked to the now universally adopted BIBCODE, will in turn allow missions and archives to track data usage, as judged by publications, in a uniform fashion across all missions.
It will also allow readers to go directly from published papers to original datasets, without learning or directly interacting with archive-specific user interfaces (be they simple, arcane, or otherwise).

SAVINGS: Currently each mission and archive is required by its PDMP to provide independently gathered metrics of data usage, including publications and meaningful citations. This can be extremely time consuming and obviously overlapping in effort (each mission and archive independently "turning the pages"), without any guarantee of completeness or uniformity of success. By the uniform application of the above data-tagging, with the cooperation of the authors, editors and referees, and with the help of the ADEC providing authoring tools (see also Object Tags below) at the time of publication, it will be a simple and very low-cost procedure to automatically track publication frequency and data usage statistics in a meaningful and uniform way for all datasets and all missions at one time.

DATASET VERIFICATION SERVICE

This service is designed to allow users (publishers, authors, and other archives) to confirm the validity and reality of a data tag (aka, dataset identifier). It is being built in response to the request from data centers to publishers that these tags and dataset identifiers be attached to the electronic literature so as both to track datasets used in published research and to provide linkage back to the archives in which the original data reside. Each data center will be responsible for building and maintaining its socket into this tool, but the overall structure, look and feel will be identical from the perspective of the user. This is a well advanced activity, already initiated and acted upon by the ADEC membership. The WEB tool itself has yet to be built, but it will become an integral part of the proposed MASTER DIRECTORY activity.
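As a purely illustrative sketch of the verification service described above (the WEB tool itself has yet to be built), one could imagine each data center registering its own "socket" behind a single common entry point. Everything in the example is an assumption made for illustration: the `CENTER:dataset-id` tag layout, the function names `register_center` and `verify_tag`, and the sample holdings are hypothetical and do not describe the tag syntax actually adopted by the ADEC.

```python
from typing import Callable, Dict

# A "socket" is modelled here as a function that says whether a data
# center holds a given dataset identifier (hypothetical interface).
Verifier = Callable[[str], bool]
_sockets: Dict[str, Verifier] = {}


def register_center(center: str, verifier: Verifier) -> None:
    """Plug one data center's checker into the common service."""
    _sockets[center.upper()] = verifier


def verify_tag(tag: str) -> str:
    """Classify a tag as 'valid', 'unknown-dataset', 'unknown-center' or 'malformed'.

    The 'CENTER:dataset-id' layout is an assumption of this sketch.
    """
    center, sep, dataset_id = tag.partition(":")
    center, dataset_id = center.strip(), dataset_id.strip()
    if not sep or not center or not dataset_id:
        return "malformed"
    verifier = _sockets.get(center.upper())
    if verifier is None:
        return "unknown-center"
    return "valid" if verifier(dataset_id) else "unknown-dataset"


# Made-up holdings standing in for real archive lookups.
_demo_holdings = {"MAST": {"obs-0001", "obs-0002"}, "IRSA": {"scan-17"}}
for _name, _ids in _demo_holdings.items():
    register_center(_name, lambda ds, ids=_ids: ds in ids)

if __name__ == "__main__":
    for example in ("MAST:obs-0001", "IRSA:obs-0001", "XYZ:1", "no-separator"):
        print(example, "->", verify_tag(example))
```

The point of the design is that the caller sees one uniform answer regardless of which center holds the data, while each center remains free to implement its own lookup behind the registered function.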
OBJECT TAGS

Electronically joining objects to papers, and objects to published data, is an essential part of the archival research and development process. Under the present circumstances this requires humans (on the NED and SIMBAD teams, for example) to turn the pages (digital or hardcopy) of every journal, visually scanning every article in search of objects and associated datasets contained in printed papers. This is inefficient and simply cannot keep up with the growth in data and publications. Again, working closely with the Editors of the ApJ, AJ & PASP, the ADEC has initiated a plan to require authors to provide valid lists of names of objects contained in their papers, compiled at the time of publication and electronically searchable.

ADVANTAGES: This will both speed the ingestion of papers into the integrating services of NED, SIMBAD and ADS, and off-load a considerable amount of work currently being undertaken by these centers (so that they may concentrate on value-added services to facilitate new science with archival data). It will also equitably (and reasonably) distribute this basic task back to the authors, who most surely know best what objects are contained in their papers.

CONTRIBUTIONS: To encourage and aid authors in this process the ADEC will provide the publishing community with a Web-based name verification module that will validate name lists provided by the authors at the time of paper submission. During the refereeing process this service will automatically process the submission and will either confirm that a name is valid, request a modification or correction to an existing name, initiate the registration of a proposed new name for a pre-existing object, or register a new name for a new object. In any event, a final list of unique, unambiguous and "MASTER DIRECTORY COMPLIANT" names will follow the paper into publication.
Once published, the paper can be immediately assimilated into the NASA system and made directly accessible through ADS and NED, etc. through its verified object content.

SAVINGS: Currently this process has a latency of about six months and requires 1-2 FTE at NED alone to simply track the extragalactic objects in the current literature. Many more librarians in France do this same task, gathering objects from the literature for entry into SIMBAD.

PRECEDENT: The European journal "Astronomy & Astrophysics" currently has a place-holder in its manuscript macros for the addition of object names. This is run on a purely voluntary basis with little advertising and no active validation. About 25% of the papers submitted contain object listings provided by the authors. The option has been available for about two years now. In the on-line A&A articles, users click on hyperlinks from the object names to query SIMBAD. The benefits of applying this process to the U.S. astrophysics journals (ApJ, AJ, PASP, etc.) with hyperlinks to the NASA MASTER DIRECTORY would surely be a national treasure.

CATALOG INTERCOMPARISON SOFTWARE

The federation of large and disparate datasets and surveys should be done initially by domain experts on behalf of the majority of users who want to rapidly go beyond catalog cross-comparison to do astrophysics. This, of course, does not preclude others from making specialized applications or undertaking independent cross-comparisons with proprietary datasets or in novel ways. But it does advance the field from the unsustainable situation where numerous groups and individuals are needlessly repeating catalog intercomparisons for identical reasons and with virtually indistinguishable results. Most astronomers want to go beyond archiving and cataloging, to astrophysics.
IRSA and NED have jointly developed new high-speed catalog-intercomparison tools that can be deployed locally and delivered to the MASTER DIRECTORY for incorporation into that effort from the beginning. No new tools of this nature will need to be developed under this proposal. These tools will be applied in the construction and maintenance of the MASTER DIRECTORY.

OBJECTS

To access objects in just NASA's main mission archives (CXC, HEASARC, MAST, IRSA, LAMBDA, and SIRTF) requires learning six different interfaces. At least a dozen other services world-wide contain a vast array of other catalogs, each with differing interfaces and user-specific access tools. Having data is not a problem in the 21st Century; locating it and accessing it is.

IRSA is the curator of 2MASS and IRAS data, whose catalogs contain some 100 million stars and 1.5 million galaxies. MAST has recently received from the CADC WFPC2 object catalogs consisting of 18 million distinct sources extracted from 21,000 co-added images (stacks, built up from over 75,000 individual images). MAST also houses the GSC 2.2 catalog, which has approximately 2 billion stars and galaxies with positions, magnitudes, colors and proper motions, complete to a minimum of V = 18 mag, extracted from sky survey plates. At least a dozen other services distributed around the globe contain other catalogs (many with duplicates, and some housing unique holdings), each with differing user interfaces and access tools. Currently NED has over 7 million names linked to 6 million unique objects of extragalactic origin.

One attempt to soft-link many of these holdings has been implemented by NED, and is automatically provided to its users. Upon the return of an object search, links are made by NED-specific software to other archive Web sites where relevant data is predetermined to be in residence. The remote services' data request pages are automatically filled in and the NED user is transferred to their site.
No learning of the remote interface is needed; the NED user only has to hit the request button to query a remote service by object name or by coordinates. MAST offers a similar internal join of its spectral and imaging datasets through its Scrapbook Previewer. In this service MAST links images and spectra from IUE, HUT, STIS, WFPC2, UIT, etc. into a single object-based or position-based search spanning all of those datasets resident locally. Representative images and spectra are displayed with comprehensive links to the entire database.

Each of these individually useful services is tailor-made and consequently fragile, in the following sense: if the external data providers even trivially alter their HTML-based Web services, the hand-made links break and the dependent, remote service will immediately fail.

This proposal will unify the data-discovery and the data-access pathway to and from archives and their individual portals. At the same time we are proposing to make the new system robust to failure (i.e., "WEB-lock" resistant) and yet relatively easy to maintain. This system is premised upon the existence of the MASTER DIRECTORY in combination with the UNIVERSAL NAME RESOLVER. It also requires the active participation of the archives in registering their data holdings by the names and positions of the objects contained in the observational datasets.

NASA FUNDED SMALL RESEARCH PRODUCTS

Many unique datasets have been and continue to be produced as a direct result of NASA-funded archival and exploratory research. These programs include the ADP, LTSA and AISRP, as well as the HST Legacy, Treasury and Archive programs. Indeed many NASA fellows also create extremely valuable datasets in the course of their research. Many of these collections are already in danger of being orphaned and eventually lost. One example is a massive multicolor imaging study of the Large Magellanic Cloud begun by a Hubble Fellow who later obtained an LTSA award to complete the survey.
Other examples include the special processing of dozens of time-series observations of nearby galaxies undertaken by the HST Key Project Team in pursuit of the Hubble constant. Only the Cepheid data were published and archived, but hundreds of thousands of non-variable stars were also recorded in the process. For stellar population studies the data are unique and extremely valuable; they should be archived and made accessible, otherwise duplication of effort will inevitably follow. NASA needs to provide a coherent means of identifying, archiving and curating these smaller but highly specialized, valuable studies by assimilating them into the larger landscape of datasets and surveys. The MASTER DIRECTORY is a natural home to `register' these datasets too, as they enter the public domain, with complete catalog data, images and spectra being archived and curated in the most appropriate mission archive (HEASARC, IRSA and/or MAST).

VISUALIZATION TOOLS

No new visualization tools will be developed for this interoperability initiative. There are already two very fine display and plotting tools available and deployed within NASA's existing archive system. The first is OASIS, which is managed through IRSA, and the second is Aladin, which was created at the CDS. Both have been made available through NED for some time now. Cone searches of the MASTER DIRECTORY can be easily and efficiently viewed through either of these tools, both projected onto the DSS and displayed as simple graphical overlays. Much of the community is already very comfortable with using these tools, so no additional learning will be required in having them available through the MASTER DIRECTORY SERVICE.

ALERTING SYSTEM AND THE LEVEL OF ALERT SCHEMA

Once the MASTER DIRECTORY is sufficiently large and relatively complete (down to variously specified flux levels at particular wavelengths), the system should also be `intelligent' enough to recognize that a new entry may be one of many things.
Newly submitted objects may indeed be new but simply below the detection threshold of previous surveys. The object may be time-variable, periodic, irregular, or truly transient. Some attempt should be made by the MASTER DIRECTORY loading process to characterize any such new source and issue a higher level alert to interested parties periodically and/or at the time of entry, and to add a flag to the object so that these classes of alerts can be cumulatively recalled. Known objects with previous observations inconsistent with the new submission should also trigger an alert of varying severity depending on the nature of the discrepancy.

One of the central concepts of the VO is greater community participation. Such alerts would engage the astronomers who made the original observations in a scientifically useful pursuit that will benefit all users of the NASA MASTER DIRECTORY and the VO.

The MASTER DIRECTORY should be configured in a way that is capable of automatic or semi-automatic triggering on potential science discoveries. Or at the very least it should issue alerts to the community when various levels of discrepancy are detected: for instance, moving objects, luminosity transients, unique events or flux-ratio inconsistencies.

One interesting possibility is that the MASTER DIRECTORY might also be configured to request reprocessing of data held at archive centers. As new objects enter the MASTER DIRECTORY from other surveys or from other wavelengths, it may be appropriate for the MASTER DIRECTORY to request that an aperture measurement be made at a specific position known to be within the field-of-view of one or more previous observations made for other reasons. These re-entrant interrogations of the distributed mission archives through the MASTER DIRECTORY gateway would provide functionality well beyond any interoperability solution conceived to date.
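The registration rules described under BUILDING THE MASTER DIRECTORY, together with the graded alerts discussed here, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used by NED or any other ADEC center: the `Entry` record, the `submit` function, the 2-arcsecond match radius and the alert labels ("none", "low", "medium", "high") are all hypothetical choices made for the example, and a real system would replace the linear scan with an indexed sky search.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entry:
    ra: float                      # right ascension, degrees
    dec: float                     # declination, degrees
    names: List[str] = field(default_factory=list)   # aliases
    links: List[str] = field(default_factory=list)   # dataset references


def separation_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Angular separation on the sphere (haversine formula), in arcseconds."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def submit(directory: List[Entry], name: str, ra: float, dec: float,
           link: str, radius_arcsec: float = 2.0) -> Tuple[str, str]:
    """Apply the submission rules; return (action, alert_level)."""
    by_name = next((e for e in directory if name in e.names), None)
    if by_name is not None:
        if separation_arcsec(by_name.ra, by_name.dec, ra, dec) <= radius_arcsec:
            by_name.links.append(link)      # known name, consistent position
            return ("linked", "none")
        # Known name, inconsistent position: left to human conflict resolution.
        return ("needs-review", "high")
    by_pos = next((e for e in directory
                   if separation_arcsec(e.ra, e.dec, ra, dec) <= radius_arcsec),
                  None)
    if by_pos is not None:
        by_pos.names.append(name)           # new alias for a known object
        by_pos.links.append(link)
        return ("alias-added", "low")
    directory.append(Entry(ra, dec, [name], [link]))   # genuinely new entry
    return ("new-entry", "medium")
```

In this sketch a new entry raises only a moderate alert, since the object may simply lie below the detection threshold of earlier surveys, while a registered name arriving at a discrepant position raises the highest level and is deliberately not resolved automatically.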
A novel but nevertheless natural configuring of the MASTER DIRECTORY would involve a multi-scale approach to the objects it contains. Not all objects are point sources, and many of them resolve into other objects (themselves made up of both point sources and extended objects) at a variety of angular scales. Orion is a constellation, a molecular cloud, an HII region, a cluster, a Trapezium, and an assembly of OB stars, T Tauri stars, Herbig-Haro objects and pre-main-sequence objects. Each has an appropriate scale on which it exists and smaller scales at which it is largely invisible. The MASTER DIRECTORY should be able to deal with this implicit hierarchy. In doing so the MASTER DIRECTORY would engage an entire community that is presently not being adequately served by the archives or thematic centers in any concerted way: the community studying the interstellar medium. Their objects are inherently multi-scale, fractal-like systems, sometimes having no well-defined center but nevertheless located in (and over) space, and in some situations they may be seen only in one direction but found at a variety of velocities. Structures along the line of sight within the Milky Way, LMC, SMC, M31, etc., deserve special attention in the MASTER DIRECTORY. Objects of ill-defined shape and center still need to be identifiable and recoverable in the new electronic age. Many of them already have names and extent, but no simply defined positions. Ideally, in the future a new thematic astrophysics archive center that focuses on Galactic extended objects (diffuse nebulae, molecular clouds, star formation regions, etc.) should be initiated, building upon the MASTER DIRECTORY much as NED does and will continue to do for extragalactic objects.

HERITAGE: Software jointly developed by CADC, MAST and ST-ECF has already been deployed to produce combined images and then extract sources from over 16,000 WFPC2 images.
This will be an on-going effort at MAST and will be generalized to include ACS imaging data in the near future. NED and SIMBAD have been undertaking cross-identification of celestial sources for several decades now. Their procedures are sophisticated and robust. They have individually built very extensive listings of names and cross-identifications for millions of stars, galaxies and nebulae. These will be merged and rationalized within the context of the MASTER DIRECTORY, but no new effort will be undertaken that might duplicate past efforts such as these. Rather, the legacy of the past will be assimilated, centralized and built upon. The ADEC is also in close contact with US publishers and editors of ApJ, AJ and PASP with respect to having pointers to astrophysical objects and archive data sets in published papers. A similar effort in Europe has already resulted in text macros being provided by A&A to its authors for the voluntary entry of names in papers. The MASTER DIRECTORY would build on these two initiatives by uniting the naming conventions and providing validation tools and prepublication macros for easy entry of names into journal publications at the time of refereeing.

HIERARCHY

The celestial sky, as we currently access it, is fundamentally hierarchical in nature. The Milky Way cuts a wide swath across 360 degrees of our field of view. The Magellanic Clouds and M31 are each a few degrees or more in size. Star clusters, HII regions, planetary nebulae and dark nebulae span a range of angular sizes from degrees down to arcseconds. Galaxies have been detected down to fractions of an arcsecond. And all but a handful of stars are basically unresolved to the limits of modern instruments. From the point of view of the database this hierarchy is also important because such objects often have well-defined centers but ill-defined or at least very large extent. Also, as varying scales are probed, objects take on varying degrees of importance and/or reality.
On the scale of arcseconds does M33 exist as an object in its own right or is it only a nominal backdrop to its components? The answer is both. In order to preserve this human-based way of coping with a wide range of scales, the names that have been applied to objects can be preserved by a multi-scale approach to building the MASTER DIRECTORY. A cascade of angular sizes will be built into the database. On the largest scales will reside familiar objects and their outlines. On smaller scales one will find those components that make up the next rung down in the hierarchy. Stars and high-redshift radio sources will be at the bottom of the hierarchy. There is a finite number of square degrees on the celestial sphere. Indeed, if we were to limit ourselves to a ground-based resolution of one arcsec then the confusion limit would be reached when the number of sources averaged over the sky exceeded 535 billion, a large but not intractable number of entries to be tracked in a modern computer, if not today, then in just a few short years. In principle, we could simply divide the sky up into one arcsec pixels and assign objects and parts of objects to each of the independent lines of sight. Alternatively we can employ an adaptive tessellation (binning) scheme that is both hierarchical and as fine-grained as is dictated by the sky itself at various locations. However, the number of sources known today (a few billion) is some two orders of magnitude less than 500 billion.

INTERNATIONAL AND INTER-AGENCY COOPERATION

The most extensive and the most positionally accurate cataloging of stars and galaxies, covering the entire sky, is the recently revised and released USNO B.1. It contains high-precision positions, proper motions, photographic magnitudes and star/galaxy discriminators for 2.2 billion objects beyond the Solar System. NASA and the Navy have had a long and profitable collaboration for many years now in tying together USNO and 2MASS, object by object, position by position.
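As a check on the 535 billion figure quoted in the HIERARCHY discussion, the number of one-arcsecond cells on the full sky can be worked out directly. The short Python sketch below does that arithmetic. The function name and the 2.5 billion object count (the midpoint of the 2-3 billion objects planned for the first release) are illustrative choices made here, not part of any ADEC software.

```python
import math

# The full sphere subtends 4*pi steradians; one steradian is (180/pi)^2 square degrees.
SQ_DEG_PER_STERADIAN = (180.0 / math.pi) ** 2
ARCSEC_PER_DEG = 3600.0


def sky_cells(resolution_arcsec: float) -> float:
    """Number of independent square cells of the given side needed to tile the sky."""
    sky_sq_deg = 4.0 * math.pi * SQ_DEG_PER_STERADIAN      # about 41,253 square degrees
    sky_sq_arcsec = sky_sq_deg * ARCSEC_PER_DEG ** 2
    return sky_sq_arcsec / resolution_arcsec ** 2


if __name__ == "__main__":
    cells = sky_cells(1.0)
    known_objects = 2.5e9  # assumed midpoint of the 2-3 billion objects cited in this paper
    print(f"one-arcsec cells on the sky: {cells:.3e}")
    print(f"cells per known object:      {cells / known_objects:.0f}")
```

The result is about 5.35 x 10^11 cells, roughly two hundred for every object expected in the first release, which is why a uniform one-arcsecond grid would be mostly empty and an adaptive, hierarchical binning is the more natural choice.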
The culmination of the NASA-Navy collaboration will be the MASTER DIRECTORY. SIMBAD/CDS (France) and NED have had a close, extremely productive and mutually beneficial collaboration for over 15 years now, exchanging data, cross-identifications, bibliographic references, ideas, algorithms, software and even personnel. It was a direct result of this early collaboration that the BIBCODE was created by NED and SIMBAD archivists, and later successfully adopted, extended and widely deployed as a universal standard, most notably by NASA's ADS, and by the astronomical publishing community as a whole. In the course of operating their stellar (SIMBAD) and extragalactic (NED) databases, both groups also devised rather sophisticated name-resolver software for use in their respective public interfaces. By combining the long legacy of experience and expertise that these two groups have provided, the MASTER DIRECTORY will be able to provide, very quickly and very comprehensively, access to the entire known contents of the sky, not only by position-based (cone) searches, but also by any variation of names and designations for both stars and composite objects, be they galactic or extragalactic, resolved or not resolved. No longer will a user be required to check both NED and SIMBAD for the resolution of a name into its aliases, cross-ids and/or position on the sky. The MASTER DIRECTORY will be in a position to provide that functionality in a broad and comprehensive way, and to hand users off to NED and SIMBAD (and elsewhere) if the requested object(s) reside there, so that they can focus on science queries and analysis that go well beyond the basic step of source cross-identification. For decades both NED and SIMBAD employees have been independently scanning the printed literature for references to previously known objects and for announcements and compilations of newly discovered astronomical objects.
Finding these new stars, galaxies and quasars in individual papers is a time-consuming and inefficient activity. Authors know the contents of their papers better than any archivists or librarians. The ADEC has therefore already begun negotiations with North American astronomical journal editors to have authors include, within their electronically submitted papers, a machine-readable listing of the unambiguous objects, names and coordinates (if not already in the MASTER DIRECTORY at the time of writing) contained in the paper. All parties agree that this will only work if the incremental load on the research community and the publishers is small. In the first instance it will be voluntary, and largely unvalidated, as is currently the case for a similar system already in place for Astronomy & Astrophysics: A European Journal. However, as the MASTER DIRECTORY evolves, it makes sense for its staff to provide the journals and their community with authoring tools that will pre-validate lists of names and/or coordinates of "new" objects before publication. In this way the true object content of each paper will be checked and confirmed as part of the refereeing and publication process; the papers will not have to be subsequently found and scrutinized by NED or SIMBAD after printing; and the assimilation of objects, papers and bibliographic pointers will be simultaneous with (or even in advance of) the actual publication, instead of lagging by the better part of a year, as is currently the case in many instances. Increased speed and increased accuracy in tying NED, SIMBAD and the MASTER DIRECTORY to the published literature will come with great savings to those same services, by passing the task to the authors once easy-to-use validation software is available for the authors, referees and journals to use at the time of submission. Exactly the same logic and methodology applies to the DATATAGS now being made visible to authors and publishers by NASA's mission/data centers.
Upon receipt of data from on-going missions or from the corresponding wavelength archives, users will be supplied with a DATATAG unique to that archive and that observation. Placed in the publication(s) arising from the analysis and usage of that dataset, those DATATAGS will serve at least three unique and useful purposes: (1) They will allow other researchers to immediately access the exact datasets used in the publication [DATATAGS will link users to datasets]. (2) Missions and archives will be able to compile comprehensive and robust statistics on the use of their datasets in leading to published results [DATATAGS will link data products to research publications]. (3) With the linkage to and from the literature it will be possible to direct archive users from individual observations to all papers arising from the use of that data [DATATAGS will link observations to the relevant literature].

A REGISTRY OF SOFTWARE TOOLS

In these formative stages in the federation of NASA's astrophysics mission and archival data centers, we as a group have chosen to concentrate on interoperability issues that affect the largest number of users undertaking common and often repetitive tasks. Our aim is to enable the process of data discovery (Section XX: The MASTER DIRECTORY), streamline data recovery (Section XX: DATATAGS and Object Identifiers), and facilitate the process of multiwavelength data integration (Section XX: Data Models and Toolkits), paving the way to more speculative future applications that hover on the `bleeding edge' of technology, such as grid computing and multi-dimensional data mining. Each mission and each wavelength-oriented community has its own culture, terminology and tools tailored to its datasets and astrophysical goals. We have no intention of changing or manipulating those cultures in their pursuit of excellence.
Nevertheless, it is important for missions to at least know what suites of tools have already been developed (at other archives or by other missions) before they embark on coding up these same tools "anew". We acknowledge that new ideas will arise, and that software must evolve, but we hope that this change will occur in full knowledge, and not in needless ignorance, of what has already been done and what is already freely available. Accordingly, the member organizations of the ADEC are compiling lists of tools, packages and coded algorithms that will be prominently posted, documented and searchable on the WEB in a consolidated fashion, so that any future mission manager, scientist or engineer can readily discover existing software available within the system and make informed decisions on how best to proceed with new software developments. System requirements change, and software and languages evolve; however, many basic functions (coordinate transformations, geometric/distortion correction formulae, source extraction and profile fitting techniques, etc.) persist and have had many years to mature and stabilize. These tools and applications should not be re-invented without serious cause; rather they should be built upon, refined and evolved to meet new situations and novel applications. Programming skills should be reserved for truly new and innovative data-processing and research analysis tools.

SUMMARY CONCLUSIONS

The charter members of NASA's Astrophysics Data Executive Committee are proposing to cooperatively and collectively deploy an immediate and workable solution to NASA's astrophysics archive interoperability. This would provide multi-wavelength data, cross-linking of objects, with distributed user access, and multi-mission, inter-archive communications.
The initiative will build upon existing services, and use robust technology and proven in-house expertise to specifically address the immediate needs for data access and data discovery, as expressed by the scientific research community and independently identified by the data centers themselves. The most novel and challenging aspect of this proposal comes in the construction of a MASTER DIRECTORY with a UNIVERSAL NAME RESOLVER.

The MASTER DIRECTORY will contain names, positions, basic attributes and massive cross-referencing from all known objects in the celestial sky back to all originating datasets and catalogs known to and/or residing at NASA archives (HEASARC, IRSA, LAMBDA & MAST), mission centers (CXC, HST & SIRTF) and integrating services (ADS & NED). The MASTER DIRECTORY will be a continuously updated and validated service uniformly linking, in real time, all current and future NASA datasets and missions with one another and out to the entire scientific community, including the peer-reviewed literature.

The MASTER DIRECTORY, when it first comes on line, will have the names and positions for approximately 2 billion objects, cross-identified and assimilated from many NASA surveys that are already available but currently far-flung and largely uncoordinated. These include: (1) the Guide Star Catalog V2.0 prepared for optical sources by the Hubble Space Telescope Science Institute, (2) the 2MASS all-sky, near-infrared survey residing at IPAC within IRSA, (3) the far-infrared IRAS Faint Source Survey, independently archived at many NASA centers including IRSA, NED, and NSSDC, (4) the ROSAT all-sky X-ray survey and WGACAT, which currently reside both in NED and at the HEASARC, and soon (5) the GALEX all-sky source catalog, covering sources found in the vacuum ultraviolet (MAST and NED).

The MASTER DIRECTORY will have a UNIVERSAL NAME RESOLVER, and within its purview maintain the most comprehensive cross-indexing of names of all known objects in the Universe.
The UNIVERSAL NAME RESOLVER will be capable of growing to assimilate new names as they are registered, and built to understand a wide range of `dialects', shortforms and aliases of existing names as they are found in common usage. The UNIVERSAL NAME RESOLVER will provide an intelligent interface between users and the data, wherever they individually or collectively reside. The MASTER DIRECTORY will coordinate all linking of NASA datasets and observations to published research by building and providing a suite of WEB-based tools to allow authors at the time of paper submission to uniformly and simply (1) cross-identify previously known objects and/or register new objects contained in their papers, (2) provide data tags from their papers and their objects to the original data sets held at NASA archives, and (3) seamlessly acknowledge grants, observing time and services rendered by NASA's missions, all by a simple checklist submission service. Several geographically-distributed copies (mirrors) of the MASTER DIRECTORY will be deployed at the various NASA astrophysics centers, both to accommodate the inevitably high load anticipated for this service, and to assure continuous availability of the interoperability service in the event of any single point of failure. These mirror sites would be automatically updated on a daily basis, and would act as back-ups for each other, and for the MASTER DIRECTORY itself. Having the MASTER DIRECTORY centrally absorb the (now) duplicate efforts in data management, literature tracking, object and catalog cross-referencing, etc, means that the existing ADEC services and archives will be freed to concentrate more of their efforts on providing discipline-specific and wavelength-specific services to their respective communities. The MASTER DIRECTORY will coordinate access to NASA centers, not dictate their content, tools or local presentation. 
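The handling of `dialects', short forms and aliases described above for the UNIVERSAL NAME RESOLVER can be sketched in a few lines. This is only an illustration of the idea, not the name-resolver software operated by NED or SIMBAD: the normalize function, the NameResolver class, the single-entry synonym table and the integer object identifiers are all hypothetical. The sketch assumes that names differ only in letter case, spacing, a spelled-out catalog prefix, or leading zeros in the catalog number.

```python
import re
from typing import Dict, Optional

# Hypothetical table mapping spelled-out catalog prefixes to their short forms.
PREFIX_SYNONYMS = {"MESSIER": "M"}


def normalize(name: str) -> str:
    """Reduce a designation to a canonical lookup key."""
    key = "".join(name.upper().split())              # ignore case and all whitespace
    for long_form, short_form in PREFIX_SYNONYMS.items():
        if key.startswith(long_form):
            key = short_form + key[len(long_form):]
            break
    # Drop leading zeros in simple "letters + digits" names (NGC0224 -> NGC224).
    match = re.fullmatch(r"([A-Z]+)0*(\d+)", key)
    if match:
        key = match.group(1) + match.group(2)
    return key


class NameResolver:
    """Maps any registered alias to a single object identifier."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}

    def register(self, object_id: int, *names: str) -> None:
        """Record one or more names as aliases of the same object."""
        for name in names:
            self._index[normalize(name)] = object_id

    def resolve(self, name: str) -> Optional[int]:
        """Return the object identifier for a name, or None if it is unknown."""
        return self._index.get(normalize(name))


if __name__ == "__main__":
    resolver = NameResolver()
    resolver.register(1, "M31", "NGC 224")
    for query in ("Messier 31", "m 31", "ngc0224", "M 32"):
        print(query, "->", resolver.resolve(query))
```

Coordinate-based designations (for example 2MASS names, where the sign and every digit are significant) pass through normalize unchanged apart from case and spacing, which is why the sketch deliberately avoids stripping punctuation.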
The MASTER DIRECTORY will enhance and facilitate connectivity by focusing its efforts on providing the user community with a single entry point to the sky, if they choose it, but still leaving direct access to the individual centers as a user-selectable option. The MASTER DIRECTORY will be panchromatic, but focussed in its functionality; complex queries will be left to the domain experts at the respective NASA centers and elsewhere. The MASTER DIRECTORY will know names, object types and positions, and it will provide the user with a graphical view of the sky and a direct, rapid linkage to deeper layers of ancillary data and original datasets, external but known to the MASTER DIRECTORY; the bulk of the observations (images, spectra, and derivative data) will continue to be housed outside of the MASTER DIRECTORY itself at the respective NASA centers. The only liens from the MASTER DIRECTORY back on the missions and data centers are that they provide (1) unique and unambiguous data tags for their resident data sets, (2) multi-corner coordinates for their images ("footprints"), and most importantly, (3) lists of objects (unique identifiers, positions, calibrated fluxes and data tags) for all pipeline-detected sources in any and all of their imaging and/or spectroscopic exposures. The object listings will unambiguously link objects on the sky to widely distributed multi-wavelength data in the geographically separated originating NASA archives. Finally, it must be re-iterated and emphasized that because of the singularly comprehensive and panchromatic nature of its holdings, the MASTER DIRECTORY will be the first entity of its kind uniquely poised to automate the subtle and ever elusive process of discovery. The MASTER DIRECTORY will be the first point of contact between new observations (being continuously made at a variety of wavelengths) and the collective knowledge of the past. By its very nature the MASTER DIRECTORY will be unprecedented in the history of astronomy. 
Once populated and deployed as anticipated, the MASTER DIRECTORY will be in a position to (a) see anomalies in new observations, (b) flag variability in known sources, (c) be instantly aware of transient events, (d) distinguish and log entirely new objects, and (e) predict plausible fluxes at other wavelengths for missions yet to be flown. As such the MASTER DIRECTORY should be in a unique position to alert individuals or the community as a whole to the existence of any pre-defined but rare objects, or even to trigger on the arrival of individual examples of entirely new classes of objects as they serendipitously enter the database.

DISTRIBUTION OF TASKS AND RESPONSIBILITIES

Master Directory: ALL
Authoring/Refereeing Tools: ADS (CDS)?
Data Tags and Object Links: ADS, NSSDC, (CDS)?
Catalog Cross Comparison Software: IRSA?
Discovery Alert Software: HEASARC?
Bayesian Source Characterization Algorithm: HEASARC?
Universal Name Resolver: NED, (SIMBAD)?
Database Tessellation: IRSA?
Maintain Extragalactic Object Ingestion: NED?
Galactic Object Ingestion: MAST, GSFC, IRSA, (SIMBAD)?
Extended Object Definition: MAST?
Pipeline Source Extraction: (CADC+ESA/STF), CXC, STScI, SIRTF? coordinated with HEASARC, IRSA and MAST
Author Locator (e-mail) Registration Service: ADS?

TIMETABLE AND BUDGET

YEAR 1:

MASTER DIRECTORY: Assimilate the major object holdings of 2MASS, GSC+USNO and NED into a single, monolithic database of cross-identified and classified objects. [Full-sky coverage and 2 billion objects on-line for testing by end of first fiscal year] MOU with CDS for transfer and mutual future exchange of SIMBAD object directory and cross-identification lists with the MASTER DIRECTORY. Test mirror-site updating procedures locally, and use a hot backup for the MASTER DIRECTORY as it develops.
MISSION ARCHIVES:

ADS: (a) Coordinate the establishment of a common dataset tagging procedure for mission-specific observations and products, and prepare the Web interface tools to be made available to the publishers, authors, referees and editors for the listing, linking and registering of objects and datasets through the peer-reviewed and published literature. (b) Develop tools for reporting literature statistics back to the missions and to the funding agencies, outlining on a periodic basis (or upon request) the usage of datasets, catalogs and mission products.

CADC: Deliver to the MASTER DIRECTORY all sources extracted from the WFPC-2 imaging datasets.

CDS: Aid in the transfer and assimilation of all names and cross-identifications of all celestial objects currently known to their system. Arrange for the orderly future addition of new names and objects registered within their archive.

HEASARC: Deliver to the MASTER DIRECTORY all XMM and ROSAT sources extracted from all-sky survey and WGACAT special processing of pointed observations. Establish tools for the extraction of sources from future Chandra imaging datasets, and coordinate the regular transfer of these source lists to the MASTER DIRECTORY.

IRSA: (a) Deliver the complete 2MASS catalog to the MASTER DIRECTORY and aid in its cross-comparison and merger with GSC 2.2, USNO.2, COSMOS, NED and APM. (b) Deliver two supported versions of the Catalog Cross-Comparison Software to the MASTER DIRECTORY: one for internal use, and another with a separate interface for public use over the Web, for cross-identification of users' lists with the MASTER DIRECTORY.

MAST/CXC/SIRTF: (a) Implement pipeline procedures for extracting sources (minimally: positions, fluxes, and point-source vs extended-object classifications) from future images, and coordinate the software development needed to automatically submit and register these new observations (and their names) within the MASTER DIRECTORY.
(b) Deliver object names and links to originating datasets (data tags) for all pointed observations contained in their current respective databases.

NED: Transfer total object contents and cross-identifications to the MASTER DIRECTORY. Aid in the creation of the Universal Name Resolver to handle all of NED, 2MASS, GSC 2.2 & USNO.V2 (and other names as time permits) as valid names linked to MASTER DIRECTORY holdings.

ALL: Advertise and recruit postdoctoral positions.

YEAR 2:

Implement discovery and alert warning system within the MASTER DIRECTORY.
Ingest first GALEX public release of UV-detected all-sky catalog.
Establish first external mirror site.
First meeting of MASTER DIRECTORY advisory board -- prioritize future catalog assimilation.
ALL: Advertise and recruit postdoctoral positions.

YEAR 3:

Routine operations continue.
Establish second external mirror site.
ALL: Advertise and recruit postdoctoral positions.

TBD BUDGET MATTERS

HARDWARE:

YEAR 1
  Server x 2:                                  $20K
  2TB (RAID) Disk Farm x 2:                    $20K
  DBMS (Informix/Oracle?) x 2:                 $4K
  Other items:                                 $6K
  Total:                                       $50K

YEAR 2
  Server x 2:                                  $20K
  2TB (RAID) Disk Farm x 2:                    $20K
  DBMS (Informix/Oracle?) x 4:                 $8K
  Other items x 4:                             $12K
  Total:                                       $60K

YEAR 3
  Upgrade Server x 2:                          $20K
  Additional 2TB (RAID) Disk Farm x 2:         $20K
  DBMS (Informix/Oracle?) x 4:                 $8K
  Other items x 4:                             $12K
  Total:                                       $60K

HARDWARE SUBTOTAL(1): FY04 = $47K   FY05 = $60K   FY06 = $60K

ADEC PARTICIPATION:

  Baltimore: (MAST + HST) (Datatags, Mirror Host, Pipeline): $100K + overhead/benefits + $5K travel and publications
  Goddard: (HEASARC + LAMBDA) (Datatags, Mirror Host, Pipeline): $100K + overhead/benefits + $5K travel and publications
  Boston: (CXC + ADS) (Name Resolver, Journal Interface, Datatags, Mirror Host, Pipeline): $100K + overhead/benefits + $5K travel and publications
  Pasadena: (IRSA, SIRTF, NED) (Datatags, Mirror Host, Pipeline): $100K + overhead/benefits + $5K travel and publications

ADEC PARTICIPATION SUBTOTAL(2): FY04 = $420K   FY05 = $420K   FY06 = $420K

MASTER DIRECTORY (Location TBD):

  Years 1 through 3
    Director:                                                        $100K/yr
    DBMS Programmer:                                                 $70K/yr
    System Administrator (0.5 FTE):                                  $40K/yr
    Interface Programmer:                                            $60K/yr
  Year 2 and 3
    Scientist Programmer:                                            $80K/yr (begins Year 2)
  Cognizant Scientist: Galactic Object Cross-Comparison (GODDARD):   $60K/yr
  Cognizant Scientist: Extragalactic Cross-Comparison (IPAC):        $60K/yr
  Cognizant Scientist: Extended Objects & Hierarchy (MAST):          $60K/yr
  Travel and publications:                                           $10K

MASTER DIRECTORY ACTIVITIES SUBTOTAL(3): FY04 = $460K   FY05 = $540K   FY06 = $540K

POSTDOC PROGRAM:

  FY04: 1 postdoc = $50K/yr
  FY05: 2 postdocs = $100K/yr
  FY06: 3 postdocs = $150K/yr

SUBTOTAL(4): FY04 = $50K   FY05 = $100K   FY06 = $150K

                 FY04      FY05      FY06
SUBTOTAL(1):     $50K      $60K      $60K
SUBTOTAL(2):     $400K     $400K     $400K
SUBTOTAL(3):     $460K     $540K     $540K
SUBTOTAL(4):     $50K      $100K     $150K
OVERHEAD(2-4):   $910K     $1000K    $1000K
TOTAL:           $1870K    $2100K    $2150K

GLOSSARY OF TERMS AND ACRONYMS

ADEC - Astrophysics Data Executive Committee
ADS - Astrophysics Data System
CADC - Canadian Astronomical Data Centre
CDS - Centre de Données astronomiques de Strasbourg
CXC - Chandra X-Ray Observatory Center
GALEX - Galaxy Evolution Explorer
GSC - Guide Star Catalog
HEASARC - High Energy Astrophysics Science Archive Research Center
IRSA - Infrared Science Archive
MAST - Multi-Mission Archive at Space Telescope
NED - NASA/IPAC Extragalactic Database
NSF ITR - National Science Foundation - Information Technology Research
SAWG - Science Archive Working Group
SDT - Science Definition Team
SIMBAD - Sets of Identifications and Basic Astronomical Data
SWIRE - SIRTF Wide-Area Infrared Explorer
WGACAT - White-Giommi-Angelini Catalog of ROSAT Point Sources
2MASS - 2 Micron All-Sky Survey

APPENDIX A

SCIENCE DEFINITION TEAM REPORT

The National Virtual Observatory (NVO) Science Definition Team (SDT) was formed in June 2001, as a bi-agency (NASA and NSF) advisory body, in response to the recommendation by the NAS Astronomy and Astrophysics Decadal Survey Report, "Astronomy and Astrophysics in the New Millennium". The Survey recommends the creation of the NVO as the highest priority in its "small projects" category.

The Charter of the SDT was to:
(1) Define and formulate a joint NASA/NSF initiative to pursue the NVO goals.
(2) Solicit input from the U.S. astronomy community, and incorporate it in the NVO definition documents and recommendations for further actions.
(3) Serve as liaison to broader space science, computer science, and statistics communities for the NVO initiative, and as liaison with the similar efforts in Europe, looking forward towards a truly Global Virtual Observatory.

The SDT was requested to deliver the following:
(1) A summary of the typical and major scientific drivers for the NVO, with implications for the technical requirements.
(2) An overall architecture, framework, or frame of reference for the NVO.
(3) A recommended roadmap for proceeding further.

Their report is available at http://www.caltech.edu/~george/vo/nvosdtjune02aas.pdf