Abstract
The rapid growth in the volume of scientific data sets has led to the development of distributed data information systems for Earth system science. Such a system should help users locate data sets, provide preliminary research results quickly, and support data delivery at users' request. At George Mason University, we have been developing a data information system with both search and analysis components. The system supports three phases of data access: phase one for meta-data search, phase two for on-line data analysis, and phase three for data ordering. For large volumes of data, searching on meta-data alone is not adequate. Scientists often need to search for data based on actual data values. This is a particular kind of data mining, which searches for data sets based on data content.
In this chapter, we first describe the system architecture. We then develop the concept of a data pyramid model and propose a histogram clustering technique for content-based searches. We use the model and the related technique to answer content-based queries approximately but efficiently. Finally, we describe our prototypes that integrate content-based searches into a data information system.
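The idea of answering a content-based query approximately from summaries, rather than from the raw data, can be illustrated with a minimal sketch. This is not the chapter's histogram clustering technique or its data pyramid model. It only assumes that each region (here a hypothetical grid cell) is summarized by a histogram over fixed bin edges, and that values are spread uniformly within each bin. All names (`build_histogram`, `estimate_fraction`, `content_search`) and the sample values are invented for illustration.

```python
from bisect import bisect_right


def build_histogram(values, edges):
    """Count values per bin. Bins are [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the histogram range are ignored in this sketch
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def estimate_fraction(counts, edges, lo, hi):
    """Estimate the fraction of values in [lo, hi] from the histogram alone.

    Assumption: values are uniformly distributed within each bin, so a bin
    that is partly covered by the query contributes proportionally.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    hits = 0.0
    for i, c in enumerate(counts):
        left, right = edges[i], edges[i + 1]
        overlap = min(hi, right) - max(lo, left)
        if overlap > 0:
            hits += c * overlap / (right - left)
    return hits / total


def content_search(histograms, edges, lo, hi, min_fraction):
    """Return the cells whose estimated fraction of values in [lo, hi] reaches min_fraction."""
    return sorted(
        cell
        for cell, counts in histograms.items()
        if estimate_fraction(counts, edges, lo, hi) >= min_fraction
    )


# Hypothetical data: two cells, each with a short series of values.
edges = [0, 10, 20, 30, 40]
raw = {
    "cell_A": [5, 12, 15, 18, 25, 33],
    "cell_B": [1, 2, 3, 35, 38, 39],
}
histograms = {cell: build_histogram(vals, edges) for cell, vals in raw.items()}

# Which cells have at least half of their values between 10 and 30?
matches = content_search(histograms, edges, lo=10, hi=30, min_fraction=0.5)
```

When the query range lines up with bin edges, as above, the estimate is exact. When it cuts through bins, the answer is only approximate: for `cell_A` and the range 15 to 25, the histogram estimate is 1/3 while the true fraction is 1/2. That gap is the price of answering from a compact summary instead of scanning the data.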
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Yang, R., Kafatos, M., Yang, KS., Wang, X.S. (2001). Data Mining in Integrated Data Access and Data Analysis Systems. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_11
DOI: https://doi.org/10.1007/978-1-4615-1733-7_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-0114-7
Online ISBN: 978-1-4615-1733-7
eBook Packages: Springer Book Archive