Information Exploration Shootout


Which data exploration tools are better? Which work on what kind of data sets?

Are certain combinations better?

How do data analysts or explorers discover patterns in data?

The Information Exploration Shootout is a project to help answer these questions. This project is open for participation to all researchers in the commercial, government, and academic sectors.

The IES project description explains the objective of this project in more detail. If you are interested in exploring or analyzing the datasets, please consider registering by sending an e-mail to ivpr@ivpr.cs.uml.edu. Please provide the following information: name, e-mail address, company name, URL, city, state, ZIP code, country, telephone number, fax number, and any comments that you may have. We respect your privacy and promise to use your e-mail address only for notifications about bugs or changes related to these datasets.

Datasets are available through our download page. Upon completion of analysis of the dataset(s), please consider sharing your results with others by e-mailing them to ivpr@ivpr.cs.uml.edu.

This Shootout was supported by The MITRE Corporation, the National Institute of Standards and Technology, and the Institute for Visualization and Perception Research.


Project Description

Over the past year many users have requested more serious comparative evaluations of the various data exploration techniques: analysis, knowledge discovery and data mining, statistics and grand tours, database tools, visualization, or combinations thereof.

Many now recognize that mining large databases and documents for information and knowledge will have the next fundamental impact on database systems, knowledge discovery, and visualization. This is considered an important area for major cost savings and potential revenue, and it has immediate applications in decision systems, intelligence, information management, business, and communication, in the form of both on-line services and the World Wide Web. Data mining now draws from fields including databases, statistics, information technology, data visualization, and artificial intelligence, especially machine learning and knowledge-based systems. There is a clear sense that, to achieve the next increase in knowledge exploitation, individual data exploration approaches must work together.

There have been promising developments. In 1995 a "shootout" was held for the statistical community [1]. The knowledge discovery in databases (KDD) community has meanwhile made numerous datasets publicly available for timing "benchmarks" [2]. There has not, however, been any comparative evaluation of techniques across domains, and definitely none permitting hybrid approaches.

How does one discover information and knowledge in datasets (e.g., databases, archives, document collections, television news reports, the Web)? What process do analysts and other data explorers use in discovering non-trivial patterns? How do, or should, knowledge discovery, statistics, and visualization work together to support the human exploration process [3, 4, 5]? What are the procedures for using visualization and analytic agents, in context with the human operator, to achieve timely, computationally responsive discoveries in data?

There is now a plethora of techniques to explore data. They range from purely statistical approaches to neural networks, machine learning, and knowledge discovery run as batch processes. Other approaches are integrated, combining applied perception (e.g., glyphs) with interactive grand tours, or purely geometric, such as parallel coordinates, which involve little mathematics and rely more on human participation. Questions abound. Which techniques are better? Which work on what kind of datasets? Are certain combinations better?

The Information Exploration Shootout project has identified several datasets and is making them publicly available for exploration and discovery. Analysts are asked to send us the results of their exploration, including the discoveries they make and the process they used in making them.


Dataset Description

A dataset has been identified and selected to be made publicly available for exploration and discovery. The released dataset, the Network Intrusion dataset, consists of generated network intrusion attempts and a baseline dataset with no intrusions. There were four different intrusions over a period of time, and these were tracked in separate datasets. Information explorers are to discover these intrusions. Explorers are encouraged to submit and share their results with the data mining community.


References

[1] Hoaglin and Velleman (1995), A Critical Look at Some Analyses of Major League Baseball Salaries, American Statistician, Vol. 49, No. 3, pp. 277-285.

[2] Piatetsky-Shapiro, G., Editor, The Knowledge Discovery Mine

[3] Lee, J.P. and Grinstein, G. (1994), Editors, Proceedings of the IEEE Workshop on Issues on the Integration of Databases and Visualization, Springer-Verlag Lecture Notes in Computer Science, Vol. 871.

[4] Grinstein, G., Wierse, A. and Lang, U. (1996), Proceedings of the second IEEE Workshop on Issues on the Integration of Databases and Visualization, Springer-Verlag Lecture Notes in Computer Science.

[5] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996), Editors, Advances in Knowledge Discovery and Data Mining, AAAI Press.

[6] Grinstein, G. and Levkowitz, H. (1995), Editors, Perceptual Issues in Visualization, IFIP Computer Graphics Series, Springer-Verlag.



(3-5-98) Results of the Network Intrusion Shootout Contest are now available. This was a very difficult task, and we are still interested in hearing about other approaches and results.


Submit Results

Submit results via e-mail to ivpr@ivpr.cs.uml.edu.


Copyright ©1996-2001 by the Institute for Visualization and Perception Research. All rights reserved.
Please send comments and/or questions to Dr. Georges Grinstein.

Last update: April 4, 2001