
Data Mining in Integrated Data Access and Data Analysis Systems

  • Chapter
Data Mining for Scientific and Engineering Applications

Part of the book series: Massive Computing ((MACO,volume 2))


Abstract

The rapid increase in the volume of scientific data sets has led to the development of distributed data information systems for Earth system science. Such a system should help users locate data sets, obtain preliminary research results quickly, and receive data deliveries on request. At George Mason University, we have been developing a data information system with both search and analysis components. The system supports three phases of data access: phase one for meta-data search, phase two for on-line data analysis, and phase three for data ordering. For large volumes of data, searching on meta-data alone is not adequate, because scientists often need to search for data based on actual data values. This is a particular kind of data mining, which searches for data sets based on data content.

In this chapter, we first describe the system architecture. We then develop the concept of a data pyramid model and propose a histogram clustering technique for content-based searches. We use the model and the related technique to answer content-based queries approximately but efficiently. We also describe our prototypes, which integrate content-based searches into a data information system.
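
To make the idea of answering a content-based query approximately from histograms concrete, here is a minimal Python sketch. It is not the chapter's implementation: the abstract gives no algorithmic detail, so the function names (`build_histogram`, `estimate_fraction`, `search_cells`), the fixed bin edges, the assumption that values are spread uniformly within each bin, and the single-level layout (no pyramid, no clustering of histograms) are all illustrative assumptions.

```python
from bisect import bisect_right


def build_histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); out-of-range values are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def estimate_fraction(counts, edges, lo, hi):
    """Estimate the fraction of values in [lo, hi) from the histogram alone.

    Assumption (hypothetical): values are uniformly spread inside each bin,
    so a partially covered bin contributes in proportion to the overlap.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    hits = 0.0
    for i, count in enumerate(counts):
        left, right = edges[i], edges[i + 1]
        overlap = max(0.0, min(hi, right) - max(lo, left))
        hits += count * overlap / (right - left)
    return hits / total


def search_cells(cell_histograms, edges, lo, hi, min_fraction):
    """Return the cells whose estimated fraction of values in [lo, hi) meets the threshold."""
    return [
        cell
        for cell, counts in cell_histograms.items()
        if estimate_fraction(counts, edges, lo, hi) >= min_fraction
    ]


# Hypothetical example: two spatial cells summarised by three-bin histograms.
edges = [0, 10, 20, 30]
cell_histograms = {
    "A": build_histogram([1, 2, 3, 15], edges),
    "B": build_histogram([21, 22, 25, 29], edges),
}
matches = search_cells(cell_histograms, edges, lo=20, hi=30, min_fraction=0.5)
```

The query touches only the per-cell summaries, never the raw values, which is what makes the answer fast but approximate: a bin that is only partly covered by the query range yields an estimate rather than an exact count.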




Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Yang, R., Kafatos, M., Yang, KS., Wang, X.S. (2001). Data Mining in Integrated Data Access and Data Analysis Systems. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_11


  • DOI: https://doi.org/10.1007/978-1-4615-1733-7_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-0114-7

  • Online ISBN: 978-1-4615-1733-7

  • eBook Packages: Springer Book Archive
