skip to main content
research-article

Querying and mining of time series data: experimental comparison of representations and distance measures

Published:01 August 2008Publication History
Skip Abstract Section

Abstract

The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

References

  1. Additional Experiment Results for Representation and Similarity Measures of Time Series. http://www.ece.northwestern.edu/~hdi117/tsim.htm.Google ScholarGoogle Scholar
  2. R. T. Ng (2006), Note of Caution. http://www.cs.ubc.ca/~rng/psdepository/chebyReport2.pdf.Google ScholarGoogle Scholar
  3. H. André-Jönsson and D. Z. Badal. Using signature files for querying time-series data. In PKDD, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  4. J. Aßfalg, H.-P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. Similarity search on time series based on threshold queries. In EDBT, 2006.Google ScholarGoogle Scholar
  5. D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD Workshop, 1994.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Y. Cai and R. T. Ng. Indexing spatio-temporal trajectories with chebyshev polynomials. In SIGMOD Conference, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Cardle. Automated motion editing. In Technical Report, Computer Laboratory, University of Cambridge, UK, 2004.Google ScholarGoogle Scholar
  8. L. Chen and R. T. Ng. On the marriage of lp-norms and edit distance. In VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. Chen, M. T. Özsu, and V. Oria. Using multi-scale histograms to answer pattern existence and shape match queries. In SSDBM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Q. Chen, L. Chen, X. Lian, Y. Liu, and J. X. Yu. Indexable PLA for Efficient Similarity Search. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. H. Tung. SpADe: On Shape-based Pattern Detection in Streaming Time Series. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  13. C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast Subsequence Matching in Time-Series Databases. In SIGMOD Conference, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. E. Frentzos, K. Gratsias, and Y. Theodoridis. Index-based most similar trajectory search. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  15. P. Geurts. Pattern Extraction for Time Series Classification. In PKDD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. Geurts. Contributions to Decision Tree Induction: bias/variance tradeoff and time series classification. PhD thesis, University of Lige, Belgium, May 2002.Google ScholarGoogle Scholar
  17. Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, CA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. I. Karydis, A. Nanopoulos, A. N. Papadopoulos, and Y. Manolopoulos. Evaluation of similarity searching methods for music data in P2P networks. IJBIDM, 1(2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. K. Kawagoe and T. Ueda. A Similarity Search Method of Time Series Data with Combination of Fourier and Wavelet Transforms. In TIME, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. Keogh, X. Xi, L. Wei, and C. Ratanamahatana. The UCR Time Series dataset. In http://www.cs.ucr.edu/~eamonn/time_series_data/, 2006.Google ScholarGoogle Scholar
  21. E. J. Keogh. Exact Indexing of Dynamic Time Warping. In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. J. Keogh. A Decade of Progress in Indexing and Mining Large Time Series Databases. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In SIGMOD Conference, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowl. Inf. Syst., 3(3), 2001.Google ScholarGoogle Scholar
  25. E. J. Keogh and S. Kasetty. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. Data Min. Knowl. Discov., 7(4), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. E. J. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowl. Inf. Syst., 7(3), 2005.Google ScholarGoogle Scholar
  27. S.-W. Kim, S. Park, and W. W. Chu. An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases. In ICDE, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In SIGMOD Conference, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Lin, E. J. Keogh, L. Wei, and S. Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov., 15(2), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. D. Morse and J. M. Patel. An efficient and accurate method for evaluating time series similarity. In SIGMOD Conference, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Pang-Ning Tan and Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison-Wesley, Reading, MA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. pong Chan and A. W.-C. Fu. Efficient Time Series Matching by Wavelets. In ICDE, 1999.Google ScholarGoogle Scholar
  34. I. Popivanov and R. J. Miller. Similarity Search Over Time-Series Data Using Wavelets. In ICDE, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  35. C. A. Ratanamahatana and E. J. Keogh. Three myths about dynamic time warping data mining. In SDM, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  36. Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.Google ScholarGoogle Scholar
  37. S. Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Min. Knowl. Discov., 1(3), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. Steinbach, P.-N. Tan, V. Kumar, S. A. Klooster, and C. Potter. Discovery of climate indices using clustering. In KDD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In ICDE, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. J. Keogh. Indexing Multidimensional Time-Series. VLDB J., 15(1), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y.-L. Wu, D. Agrawal, and A. E. Abbadi. A Comparison of DFT and DWT based Similarity Search in Time-Series Databases. In CIKM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. X. Xi, E. J. Keogh, C. R. Shelton, L. Wei, and C. A. Ratanamahatana. Fast time series classification using numerosity reduction. In ICML, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. B.-K. Yi and C. Faloutsos. Fast Time Sequence Indexing for Arbitrary Lp Norms. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In ICDE. IEEE Computer Society, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Y. Zhu and D. Shasha. Warping Indexes with Envelope Transforms for Query by Humming. In SIGMOD Conference, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Querying and mining of time series data: experimental comparison of representations and distance measures

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader