Investigation Into Using Data Mining and Data Visualization to Improve the Intelligent Transportation System Planning Process, 10-9225
Inclusive Dates: 10/01/00 - 09/30/01
Background - Within the software industry, a field called "data mining" is emerging as the next logical step beyond online analytical processing (OLAP) for querying data warehouses. In contrast to OLAP, the goal of which is to inquire about known relationships among data, the objective of data mining is to seek out relationships among data that have so far remained undiscovered. The potential value of uncovering such previously unknown relationships is manifold, especially for those involved in planning decisions. The discovery of relationships among data may reveal, for example, a correlation among certain events and conditions. Knowledge of these relationships may then be used to gain insights into the conditions under which certain events occur and to provide information for modifying those conditions, if it is desired to alter the likelihood of those events occurring. Data mining is, therefore, a method suited to planners who must base decisions on massive amounts of heterogeneous raw data from which it is difficult to gain insight.
A specific domain for which data mining is appropriate is advanced traffic management systems (ATMSs) because these systems generate a tremendous amount of traffic data. These data, which typically include lane speed and occupancy values, can provide a transportation planner with significant insight into traffic patterns because the data are collected often (typically every 10 to 20 seconds) and are time stamped. Traffic engineers have historically used rubber tubes to count traffic. These tubes provide only the number of cars over a fixed period of time (typically measured in days). ATMSs, by contrast, provide much more detail (e.g., data at the lane level and in increments as small as 20 seconds). The volume of data collected is enormous, and the typical traffic engineer is not trained to process data at this scale. For example, each day San Antonio TransGuide generates more than 10 megabytes of data, and it is estimated that the Houston TranStar system could generate more than 3 gigabytes of data daily.
Approach - This project developed techniques to apply data-mining principles to intelligent transportation system (ITS) data and to allow those data to be visualized. The technical approach was to design and implement a basic system with three fundamental capabilities: data identification, data mining, and data visualization. The longer range objective is to augment, refine, and tailor these capabilities as needs for additional functionality are identified. This approach is consistent with the principles of divide-and-conquer and back-to-basics, in which comprehensive performance measurement systems are broken into small manageable tasks that can later be integrated into a larger system, starting with measures whose meanings are relatively transparent. The current ITS data sets to which data-mining methods have been applied include real-time traffic data acquired from TransGuide. Specifically, these data include:
In addition to the ITS data available from TransGuide, weather data were also made available for mining because weather conditions can directly affect traffic conditions such as speed, flow, and congestion, as well as events such as incidents. For this reason, hourly local weather conditions at the San Antonio International Airport were archived and used as the basis for examining the effects of weather on roadways in close proximity to the airport. ITS and weather data were then merged into a single data resource that can be mined.
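The merging step described above can be illustrated with a minimal sketch. It assumes each traffic record carries a timestamp and that each weather observation is stamped on the hour; the field names (time, speed_mph, occupancy_pct, condition) and the sample values are hypothetical and are not taken from the TransGuide or airport archives.

```python
from datetime import datetime


def hour_key(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def merge_with_weather(traffic, weather):
    """Attach the matching hourly weather observation to each traffic record.

    traffic: list of dicts with a 'time' key (hypothetical layout)
    weather: list of dicts with 'time' (on the hour) and 'condition'
    """
    by_hour = {hour_key(w["time"]): w for w in weather}
    merged = []
    for rec in traffic:
        obs = by_hour.get(hour_key(rec["time"]))
        row = dict(rec)  # copy so the input records are left unchanged
        row["condition"] = obs["condition"] if obs else None
        merged.append(row)
    return merged


# Illustrative values only, not real measurements.
traffic = [
    {"time": datetime(2001, 5, 1, 7, 59, 40), "speed_mph": 58, "occupancy_pct": 9},
    {"time": datetime(2001, 5, 1, 8, 0, 0), "speed_mph": 41, "occupancy_pct": 22},
]
weather = [
    {"time": datetime(2001, 5, 1, 7), "condition": "clear"},
    {"time": datetime(2001, 5, 1, 8), "condition": "rain"},
]
merged = merge_with_weather(traffic, weather)
```

Keying both data sets on the hour is one simple way to reconcile readings taken every few seconds with hourly observations; traffic records with no matching weather hour are kept, with the condition left empty.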
In a general sense, the first step in data mining should always include a rough analysis of the data sets of interest using a traditional query tool so that some intuitive insight can be gained prior to applying more advanced techniques. The use of such query tools is not, however, the focus of this work. Rather, the focus for mining ITS data is to use other (non-query-based) methods and to understand the results through the use of data visualization techniques. Once relationships among various types of data are understood, insights are sought that are useful in achieving objectives relating to planning, operations, maintenance, and cost/benefit analyses.
Visualization can be a powerful tool for gaining useful insights into massive amounts of data. For this reason, ITS and weather data are rendered using various visualization methodologies. These visualizations are based on the fundamental premise that the most useful insights will be gained by examining ITS data as they are related across space and time. For this reason, an underlying graphical user interface that facilitates spatio-temporal mining of ITS data was designed and implemented. The top illustration shows a screen snapshot of the visualization tool developed, while the bottom illustration is an example of the output produced by the visualization tool.
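As a rough illustration of examining ITS data across space and time, the sketch below averages speed readings into cells indexed by detector location and time bin, which is the kind of grid a space-time display could be drawn from. It is not the tool developed in this project; the record layout (location, time, speed_mph) and the 15-minute bin width are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def space_time_grid(records, bin_minutes=15):
    """Return mean speed per (location, time-bin start) cell.

    bin_minutes is assumed to divide 60 evenly so bins align with the hour.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [speed total, reading count]
    for rec in records:
        t = rec["time"]
        minute = (t.minute // bin_minutes) * bin_minutes
        bin_start = t.replace(minute=minute, second=0, microsecond=0)
        cell = sums[(rec["location"], bin_start)]
        cell[0] += rec["speed_mph"]
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Illustrative values only, not real measurements.
records = [
    {"location": "A", "time": datetime(2001, 5, 1, 7, 0, 20), "speed_mph": 60},
    {"location": "A", "time": datetime(2001, 5, 1, 7, 0, 40), "speed_mph": 50},
    {"location": "A", "time": datetime(2001, 5, 1, 7, 16, 0), "speed_mph": 30},
    {"location": "B", "time": datetime(2001, 5, 1, 7, 5, 0), "speed_mph": 45},
]
grid = space_time_grid(records)
```

Each cell of the resulting grid could then be colored by mean speed, with locations on one axis and time bins on the other.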
Accomplishments - The project was completed, and promotional activities are ongoing. The objectives for this project were:
All these objectives were fulfilled on schedule and within budget. When the internal research proposal was submitted, it was stated that the success of the project would be determined by whether the methods employed make it possible to discover relationships among ITS data and to provide insights into these relationships that would be useful for planning purposes. With respect to the planning process, the project has met this criterion for success. Within the larger context of the ITS community, however, other demonstrable benefits have also been realized and may be promoted as well. Data mining has substantial potential for measuring the performance of ITSs because it can provide insights to planners, to operations and maintenance personnel, and to managers who need to produce cost/benefit analyses. The prototype system developed for mining ITS data, with its strong emphasis on visualization, gives direct feedback for these purposes. The system developed to date can visualize many types of ITS data across space and time and can relate various combinations of speed, incidents, variable message signs, weather, and other types of data of interest within the ITS community. The system has been designed with the intention of further augmentation and tailoring, so that it can be used to provide additional insight into the relationships among other types of ITS data that may be required for specific purposes.