Managing Metadata

Olympus DISS® software helps organize geospatial data

By Marius Necsoiu, PhD

Electronic information technology has become a crucial component in research, business and government. We have come to rely on the ability to retrieve information quickly, accurately and easily. Because of this, many enterprises no longer treat their information repositories only as tools for decision-making. The knowledge base itself has become an increasingly important corporate asset.


Dr. Marius Necsoiu is a research scientist in the Center for Nuclear Waste Regulatory Analyses (CNWRA) at SwRI. He has extensive experience in geographic information systems and satellite remote sensing, with expertise in earth sciences as well as natural hazards assessment.


Scientists and engineers at the Center for Nuclear Waste Regulatory Analyses (CNWRA), located at Southwest Research Institute (SwRI), faced the difficult task of managing more than 15 years of information and building a knowledge-based system that would make these data quick and easy to access and search. The CNWRA was established in October 1987 as a federally funded research and development center to provide the U.S. Nuclear Regulatory Commission (NRC) with an independent assessment capability related to licensing a proposed underground geologic repository for high-level radioactive waste. The repository is intended to safely house 70,000 tons of high-level radioactive waste and isolate it from the environment for at least 10,000 years. The CNWRA provides geoscience expertise in structural geology, volcanism, seismology, hydrology, geomorphology, geophysics, general geology and physical geography, all of which require extensive use of maps and other spatial and analytical data.

The CNWRA defined the requirements for a data management system that would give its staff full access to stored and archived data. A new, integrated approach to data management was needed to support interdisciplinary collaboration. CNWRA and NRC researchers needed quick and easy access to comprehensive, spatially referenced data such as maps and data points. The system also had to support extensive data handling, analysis and map preparation with minimal training.

The Olympus Data Information Sharing System (DISS®) is an intranet, web-based system for sharing geographic data and information. It provides access to geographic data, including the original source material (Necsoiu, 2001). The system uses established data standards and can output geographic data in multiple formats.


This map, built atop a satellite photograph of the dam and spillway of Canyon Lake, near San Antonio, illustrates how information can be gathered from multiple databases and overlaid to produce a single, composite visual document.


Background

Flexibility was a key requirement. The system had to ingest a variety of data formats, including raster (aerial and satellite images), vector (maps or illustrations) and tabular data. Because collecting data is expensive and time-consuming, SwRI also wanted to maximize its investment: the system needed to be "self-enriching." It also had to be simple and user friendly, ideally as simple as accessing the Internet, so that researchers could focus on analysis or algorithm development rather than on format conversion.
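As an illustration of the ingestion requirement, the Python sketch below sorts incoming files into the three broad categories named above by file extension. The extension table and the `categorize` function are hypothetical, chosen for illustration; they are not taken from Olympus DISS.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the three broad data
# categories named in the text. Illustrative only, not the Olympus DISS
# configuration.
CATEGORY_BY_EXTENSION = {
    ".tif": "raster",   # TIFF aerial or satellite image
    ".tiff": "raster",
    ".jpg": "raster",   # JPEG image
    ".img": "raster",   # ERDAS IMAGINE image
    ".sid": "raster",   # MrSID image
    ".shp": "vector",   # ESRI Shapefile
    ".csv": "tabular",  # comma-separated table
    ".dbf": "tabular",  # dBASE table
}


def categorize(path: str) -> str:
    """Return 'raster', 'vector', 'tabular' or 'unknown' for a file path."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".TIF" -> ".tif"
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")


if __name__ == "__main__":
    for name in ["canyon_lake.TIF", "roads.shp", "wells.csv", "notes.txt"]:
        print(name, "->", categorize(name))
```

Files in an unknown category are reported rather than rejected, so a real system could route them to a person for review instead of silently dropping them.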

The new system accesses data over the intranet and, if necessary, combines it with data downloaded from the web. Olympus DISS also provides a secure way to control how and to whom data and services are delivered. Each researcher can control whether their data holdings are accessible only to a limited group of people or to the whole data-sharing community, which allows proprietary data to be stored safely. The system also scales easily to meet changing demands.
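A minimal sketch of the per-researcher sharing control described above, assuming a simple model in which each dataset has an owner, a community-wide sharing flag and an optional list of additional users. The `Dataset` class and its fields are hypothetical, not part of Olympus DISS.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset entry whose owner controls who can see it."""
    title: str
    owner: str
    community: bool = False  # True: shared with the whole data-sharing community
    allowed_users: set[str] = field(default_factory=set)  # extra readers if restricted

    def can_access(self, user: str) -> bool:
        # The owner always has access; anyone else needs community-wide
        # sharing or a place on the owner's access list.
        return user == self.owner or self.community or user in self.allowed_users


survey = Dataset("Proprietary site survey", owner="alice", allowed_users={"bob"})
print(survey.can_access("bob"), survey.can_access("carol"))  # True False
survey.community = True  # the owner opens the dataset to everyone
print(survey.can_access("carol"))  # True
```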

Change is the essence of geospatial data in a networked environment. Such data draws its spatial descriptors, organization and relationships from remote sensing, mapping, surveying and global positioning systems. Once created, data is accessible almost instantaneously through a network and can be used for many kinds of spatial analyses. Information can then be reused and analyzed for a new situation or retransmitted to another user.


A schematic of the CNWRA's Olympus DISS architecture shows how input data is compiled into a metadata repository, then processed and finally made accessible for viewing.


How does one track changes? A simple answer is to associate a metadata file with each geographic dataset. Metadata, or "data about data," collects descriptive information about a dataset in a single record, covering its content, quality, condition and origin as well as other characteristics.
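The idea of one metadata record per dataset can be sketched as a small data structure. The `MetadataRecord` class, its field names and the sample values are hypothetical placeholders; a real system would follow a published standard such as the FGDC content standard rather than these ad hoc fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MetadataRecord:
    """Hypothetical, simplified metadata record for one geographic dataset.

    The fields follow the characteristics named in the text (content,
    quality, condition, origin); they are not FGDC element names.
    """
    dataset_path: str    # location of the geodata the record describes
    content: str         # what the dataset shows
    quality: str         # e.g. accuracy or known limitations
    condition: str       # e.g. "complete" or "in progress"
    origin: str          # who produced the data, and from what source
    last_modified: date  # when the dataset last changed

    def changed_since(self, when: date) -> bool:
        """True if the dataset changed after `when`, so older copies may be stale."""
        return self.last_modified > when


# Placeholder values for illustration only.
record = MetadataRecord(
    dataset_path="geodata/canyon_lake.tif",
    content="Satellite image of the Canyon Lake dam and spillway",
    quality="Not yet assessed",
    condition="in progress",
    origin="Placeholder originator",
    last_modified=date(2002, 11, 4),
)
print(record.changed_since(date(2002, 1, 1)))  # True
```

Keeping a modification date in the record is one simple way to answer the question of tracking changes: a user can compare it with the date of a local copy.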

For the Olympus geo-information infrastructure, spatial metadata records are essential. Metadata lets search and retrieval mechanisms identify the data that match a user's selection criteria. It also allows a user to understand the content of a downloadable geographic dataset and evaluate its usefulness.

The level of detail required in metadata depends on its purpose. Data managers need very specific information on data format, internal structures and data definitions. Users generally need a kind of "catalogue" that tells them where to find certain data, how to use it and who originated it. The Olympus architecture accommodates both purposes: it can ingest both complete and incomplete records, and incomplete records can be updated later as more information becomes available.
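One plausible way to handle incomplete records is to fill in only the fields that are still empty when new information arrives. The sketch below assumes records are plain dictionaries in which `None` marks a missing value; the functions `update_record` and `missing_fields` and the sample values are hypothetical.

```python
def update_record(record: dict, new_info: dict) -> dict:
    """Return a copy of `record` with its missing fields filled from `new_info`.

    A field counts as missing if it is absent or None. Fields that already
    have a value are left unchanged (an assumed policy for illustration).
    """
    updated = dict(record)  # copy, so the original record is not modified
    for key, value in new_info.items():
        if updated.get(key) is None:
            updated[key] = value
    return updated


def missing_fields(record: dict, required: list[str]) -> list[str]:
    """List the required fields that the record does not yet provide."""
    return [name for name in required if record.get(name) is None]


REQUIRED = ["title", "originator", "abstract"]
partial = {"title": "Canyon Lake dam", "originator": None}
print(missing_fields(partial, REQUIRED))  # ['originator', 'abstract']
completed = update_record(
    partial, {"originator": "Example Lab", "abstract": "Dam and spillway imagery"}
)
print(missing_fields(completed, REQUIRED))  # []
```

Whether later information should overwrite existing values is a policy choice; filling gaps only is the conservative option, because it never discards what a researcher has already entered.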


This image shows how data is gathered from discrete sources, converted into geometric representations, and "layered" onto a composite image; in this case, the Canyon Lake map seen above.


Metadata Production

The information needed to create metadata is often readily available when the data are collected, so a small investment of time at the beginning of a project can save significant expense later. Data producers and users cannot afford to be without documented data: the potential cost of duplicated or redundant data generation commonly outweighs the initial expense of documenting data. Recently developed metadata standards, such as the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC, 1998), provide a systematic way to collect metadata.
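As a sketch of how a standard makes metadata checkable, the function below reports which top-level sections of the FGDC content standard an XML record contains and which mandatory sections are missing. It assumes the short element names commonly used for CSDGM XML (`idinfo`, `metainfo` and so on); `check_sections` itself is a hypothetical helper, not part of any FGDC or USGS tool.

```python
import xml.etree.ElementTree as ET

# Top-level sections of an FGDC CSDGM record (FGDC-STD-001-1998),
# keyed by the short XML element names commonly used for CSDGM metadata.
SECTIONS = {
    "idinfo": "Identification Information",
    "dataqual": "Data Quality Information",
    "spdoinfo": "Spatial Data Organization Information",
    "spref": "Spatial Reference Information",
    "eainfo": "Entity and Attribute Information",
    "distinfo": "Distribution Information",
    "metainfo": "Metadata Reference Information",
}
# Identification and Metadata Reference are mandatory in the standard;
# the other sections are "mandatory if applicable".
MANDATORY = {"idinfo", "metainfo"}


def check_sections(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (sections present, mandatory sections missing) for one record."""
    root = ET.fromstring(xml_text)
    present = [tag for tag in SECTIONS if root.find(tag) is not None]
    missing = sorted(MANDATORY - set(present))
    return present, missing


sample = "<metadata><idinfo/><distinfo/></metadata>"
print(check_sections(sample))  # (['idinfo', 'distinfo'], ['metainfo'])
```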

Providing metadata is typically very labor-intensive and expensive. For the Olympus DISS, however, all CNWRA technical staff can contribute to generating metadata records. In principle, each researcher can process and organize data and create searchable keywords while developing the geodata, or shortly afterward. Distributing the workload in this way also spreads the cost in money and time across the staff.


With the Olympus DISS™ graphical user interface, one can customize a database search within a range of spatial and temporal parameters. This illustration involves the use of online geodata.

Distributed data and information sharing systems have been used successfully at other research facilities, but the CNWRA system distinguishes itself through its flexible design, simplicity and minimal cost.

Olympus DISS provides the organization and search capabilities needed for an intranet, web-based system. Because Olympus uses established data standards, such as the FGDC metadata standard, it provides a flexible basis for future work with the data.

The system software is centralized and combines several components: ArcCatalog™, a commercial off-the-shelf product from ESRI Inc.; two public-domain packages, the metadata parser from the U.S. Geological Survey (USGS) and the Isite information system from the Center for Networked Information Discovery and Retrieval (CNIDR®); and Harvester, a component developed at CNWRA.

Applications

Olympus DISS provides search and retrieval mechanisms for querying a metadata database that contains a record for each geographic dataset, or geodata. Each metadata file describes its geodata in the same way a card in a library card catalog describes a book.

Olympus can create and "ingest" metadata from a variety of geodata formats: ERDAS IMAGINE, TIFF, MrSID, JPEG, ERDAS 7.5 LAN, ERDAS Raw, ESRI GRID Stack File, ESRI Shapefile and ESRI Arc/Info coverage. Once the metadata and the associated geodata are created, the user places them in a designated repository area. Each day, the Olympus system "harvests" the metadata in the repository and automatically builds an index and a relational database of metadata information. Olympus provides geographic (spatial), keyword and temporal search and retrieval for the repository of geographic data. Searches are run through a web-based graphical user interface consisting of a login page, a search page, and results and metadata pages.
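The daily harvest step can be sketched with Python's standard library: scan a repository folder for metadata files, extract each title, time period and theme keywords, and load them into relational tables that support keyword and temporal queries. This is an assumption-laden illustration, not how Harvester or Isite work internally. It assumes FGDC-style XML files ending in `.xml`, uses SQLite as a stand-in for the relational database, and compares dates as YYYYMMDD strings; `harvest` and `search` are hypothetical names.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Element paths inside an FGDC CSDGM record, relative to the <metadata> root.
TITLE = "idinfo/citation/citeinfo/title"
BEGIN = "idinfo/timeperd/timeinfo/rngdates/begdate"
END = "idinfo/timeperd/timeinfo/rngdates/enddate"
SINGLE = "idinfo/timeperd/timeinfo/sngdate/caldate"
THEME_KEYS = "idinfo/keywords/theme/themekey"


def harvest(repository: Path, db: sqlite3.Connection) -> int:
    """Index every *.xml metadata file in `repository`; return how many were read."""
    db.execute("CREATE TABLE IF NOT EXISTS records "
               "(path TEXT PRIMARY KEY, title TEXT, begdate TEXT, enddate TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS keywords (keyword TEXT, path TEXT)")
    count = 0
    for xml_file in sorted(repository.glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        path = str(xml_file)
        title = root.findtext(TITLE, default="")
        # Use a date range if present, otherwise a single date for both ends.
        begdate = root.findtext(BEGIN) or root.findtext(SINGLE)
        enddate = root.findtext(END) or begdate
        # Re-harvesting a file replaces its earlier entries.
        db.execute("DELETE FROM keywords WHERE path = ?", (path,))
        db.execute("INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)",
                   (path, title, begdate, enddate))
        for key in root.iterfind(THEME_KEYS):
            if key.text and key.text.strip():
                db.execute("INSERT INTO keywords VALUES (?, ?)",
                           (key.text.strip().lower(), path))
        count += 1
    db.commit()
    return count


def search(db: sqlite3.Connection, keyword: str,
           start: Optional[str] = None, end: Optional[str] = None) -> list[str]:
    """Titles with a matching theme keyword whose time period overlaps [start, end]."""
    sql = ("SELECT DISTINCT r.title FROM records r "
           "JOIN keywords k ON k.path = r.path WHERE k.keyword = ?")
    params = [keyword.lower()]
    if start is not None:
        sql += " AND r.enddate >= ?"  # period must not end before `start`
        params.append(start)
    if end is not None:
        sql += " AND r.begdate <= ?"  # period must not begin after `end`
        params.append(end)
    return [row[0] for row in db.execute(sql + " ORDER BY r.title", params)]
```

For example, `search(db, "hydrology", start="20010101")` would return the titles of hydrology datasets whose time period ends on or after 1 January 2001. Running the harvest on a schedule, rather than on every upload, keeps contributors' workflow simple: they only drop files into the repository.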

Spatial searches offer three ways to enter a query: type geographic coordinates into the text fields provided; draw a box around the desired location on a map of either the United States or the world; or select, from a dropdown menu, the U.S. state to search. When a map is used, the selection tool can be customized by choosing its color and style from two dropdown menus; the available styles are Point, Point (Compressed X and Y) and XY Plane. In every method, the query is expressed as geographic coordinates that are matched against the values in the database.
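Whichever input method is used, the query reduces to geographic coordinates. A common way to match them, sketched here as an assumption rather than the Olympus implementation, is to test whether the query rectangle overlaps each dataset's bounding box. The field names echo the FGDC bounding-coordinate elements; the catalog entries use approximate, illustrative coordinates, and `spatial_search` is a hypothetical name.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Bounds in decimal degrees, named after the FGDC bounding-coordinate
    elements (westbc, eastbc, northbc, southbc)."""
    west: float
    east: float
    north: float
    south: float


def intersects(a: BoundingBox, b: BoundingBox) -> bool:
    """True if the boxes overlap; boxes crossing the 180th meridian are not handled."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def spatial_search(query: BoundingBox, catalog: dict[str, BoundingBox]) -> list[str]:
    """Titles of catalog entries whose bounding box overlaps the query box."""
    return [title for title, box in catalog.items() if intersects(query, box)]


catalog = {
    "Canyon Lake dam and spillway": BoundingBox(-98.30, -98.20, 29.90, 29.85),
    "Southern Nevada example": BoundingBox(-116.6, -116.3, 37.0, 36.7),
}
# A box drawn around south-central Texas.
query = BoundingBox(west=-99.0, east=-97.5, north=30.5, south=29.0)
print(spatial_search(query, catalog))  # ['Canyon Lake dam and spillway']
```

A state selected from the dropdown menu could be handled the same way by looking up that state's bounding box and passing it as the query.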

The results page lists the term(s) queried, the number of matching records found, the number of records currently displayed, and the titles of the matching datasets with links to the metadata that describes them. From this page, the user can open the metadata page for each geodataset; each metadata page explains how to download the data.
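The counts on the results page imply simple paging over the matching records. A minimal sketch, assuming a fixed number of results per page; `ResultsPage` and `results_page` are hypothetical names, and the links to metadata pages are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ResultsPage:
    """What one results page reports, following the description in the text."""
    query: str
    total: int          # number of matching records found
    first: int          # 1-based position of the first record shown (0 if none)
    last: int           # 1-based position of the last record shown (0 if none)
    titles: list[str]   # titles shown on this page

    def summary(self) -> str:
        return f'"{self.query}": records {self.first}-{self.last} of {self.total}'


def results_page(query: str, matches: list[str], page: int = 1,
                 per_page: int = 10) -> ResultsPage:
    """Build results page number `page` (1-based) from the full list of matches."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    start = (page - 1) * per_page
    shown = matches[start:start + per_page]
    first = start + 1 if shown else 0
    last = start + len(shown) if shown else 0
    return ResultsPage(query, len(matches), first, last, shown)


matches = [f"Dataset {i}" for i in range(1, 26)]
print(results_page("hydrology", matches, page=3).summary())
# prints: "hydrology": records 21-25 of 25
```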

Recently, a new feature was added that allows storing and retrieving non-spatial data such as photos, graphs and engineering drawings.


This example shows how a representative entry on the Olympus DISS™ "Search Results" page links to its corresponding metadata page.

Impact on Future Applications

Olympus DISS users need only a standard desktop computer with an intranet connection and a web browser. The system contains more than 1,500 records and continues to grow. Many of these records describe legacy data that can be quickly retrieved for use in multiple projects.

The CNWRA expects to benefit from using the Olympus DISS as a central repository for all CNWRA geospatial data, which allows better data management. Research machines are best used for research and analysis, not for long-term data storage. The system provides better quality control, reduces redundant data and offers a fast, convenient way to access geospatial data. Backups of research machines also take less time and space, because only current research data must be handled. The system has an open architecture that can be configured to interface with other data management systems that use the Z39.50 information retrieval protocol standard and the FGDC metadata standard. It can easily be implemented for data sharing across a company, at minimal cost.

Comments about this article? Contact Necsoiu at (210) 522-5541, or marius.necsoiu@swri.org.

Bibliography
Necsoiu, Marius. "Capability Development: Olympus CNWRA's Distributed Data and Information Sharing System" (White Paper). June 26, 2001.
Federal Geographic Data Committee. FGDC-STD-001-1998. Content Standard for Digital Geospatial Metadata (revised June 1998). Washington, D.C.: Federal Geographic Data Committee.

Acknowledgments
The author acknowledges the contributions of the Olympus team: Brandi Winfrey for web design and interface coding issues and Katherine Murphy for metadata records. Dr. Larry McKague and the Geology and Geophysics team provided helpful critiques and constructive reviews of the manuscript. Special thanks to Dr. Wesley Patrick and Dr. Budhi Sagar for their support in developing this system.

Published in the Spring 2003 issue of Technology Today®, published by Southwest Research Institute. For more information, contact Joe Fohn.
