Abstract
The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.
- Additional Experiment Results for Representation and Similarity Measures of Time Series. http://www.ece.northwestern.edu/~hdi117/tsim.htm.Google Scholar
- R. T. Ng (2006), Note of Caution. http://www.cs.ubc.ca/~rng/psdepository/chebyReport2.pdf.Google Scholar
- H. André-Jönsson and D. Z. Badal. Using signature files for querying time-series data. In PKDD, 1997.Google ScholarCross Ref
- J. Aßfalg, H.-P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. Similarity search on time series based on threshold queries. In EDBT, 2006.Google Scholar
- D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD Workshop, 1994.Google ScholarDigital Library
- Y. Cai and R. T. Ng. Indexing spatio-temporal trajectories with chebyshev polynomials. In SIGMOD Conference, 2004. Google ScholarDigital Library
- M. Cardle. Automated motion editing. In Technical Report, Computer Laboratory, University of Cambridge, UK, 2004.Google Scholar
- L. Chen and R. T. Ng. On the marriage of lp-norms and edit distance. In VLDB, 2004. Google ScholarDigital Library
- L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, 2005. Google ScholarDigital Library
- L. Chen, M. T. Özsu, and V. Oria. Using multi-scale histograms to answer pattern existence and shape match queries. In SSDBM, 2005. Google ScholarDigital Library
- Q. Chen, L. Chen, X. Lian, Y. Liu, and J. X. Yu. Indexable PLA for Efficient Similarity Search. In VLDB, 2007. Google ScholarDigital Library
- Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. H. Tung. SpADe: On Shape-based Pattern Detection in Streaming Time Series. In ICDE, 2007.Google ScholarCross Ref
- C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast Subsequence Matching in Time-Series Databases. In SIGMOD Conference, 1994. Google ScholarDigital Library
- E. Frentzos, K. Gratsias, and Y. Theodoridis. Index-based most similar trajectory search. In ICDE, 2007.Google ScholarCross Ref
- P. Geurts. Pattern Extraction for Time Series Classification. In PKDD, 2001. Google ScholarDigital Library
- P. Geurts. Contributions to Decision Tree Induction: bias/variance tradeoff and time series classification. PhD thesis, University of Lige, Belgium, May 2002.Google Scholar
- Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, CA, 2005. Google ScholarDigital Library
- I. Karydis, A. Nanopoulos, A. N. Papadopoulos, and Y. Manolopoulos. Evaluation of similarity searching methods for music data in P2P networks. IJBIDM, 1(2), 2005. Google ScholarDigital Library
- K. Kawagoe and T. Ueda. A Similarity Search Method of Time Series Data with Combination of Fourier and Wavelet Transforms. In TIME, 2002. Google ScholarDigital Library
- E. Keogh, X. Xi, L. Wei, and C. Ratanamahatana. The UCR Time Series dataset. In http://www.cs.ucr.edu/~eamonn/time_series_data/, 2006.Google Scholar
- E. J. Keogh. Exact Indexing of Dynamic Time Warping. In VLDB, 2002. Google ScholarDigital Library
- E. J. Keogh. A Decade of Progress in Indexing and Mining Large Time Series Databases. In VLDB, 2006. Google ScholarDigital Library
- E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In SIGMOD Conference, 2001. Google ScholarDigital Library
- E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowl. Inf. Syst., 3(3), 2001.Google Scholar
- E. J. Keogh and S. Kasetty. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. Data Min. Knowl. Discov., 7(4), 2003. Google ScholarDigital Library
- E. J. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowl. Inf. Syst., 7(3), 2005.Google Scholar
- S.-W. Kim, S. Park, and W. W. Chu. An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases. In ICDE, 2001. Google ScholarDigital Library
- R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, 1995. Google ScholarDigital Library
- F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In SIGMOD Conference, 1997. Google ScholarDigital Library
- J. Lin, E. J. Keogh, L. Wei, and S. Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov., 15(2), 2007. Google ScholarDigital Library
- M. D. Morse and J. M. Patel. An efficient and accurate method for evaluating time series similarity. In SIGMOD Conference, 2007. Google ScholarDigital Library
- Pang-Ning Tan and Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison-Wesley, Reading, MA, 2005. Google ScholarDigital Library
- K. pong Chan and A. W.-C. Fu. Efficient Time Series Matching by Wavelets. In ICDE, 1999.Google Scholar
- I. Popivanov and R. J. Miller. Similarity Search Over Time-Series Data Using Wavelets. In ICDE, 2002.Google ScholarCross Ref
- C. A. Ratanamahatana and E. J. Keogh. Three myths about dynamic time warping data mining. In SDM, 2005.Google ScholarCross Ref
- Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.Google Scholar
- S. Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Min. Knowl. Discov., 1(3), 1997. Google ScholarDigital Library
- M. Steinbach, P.-N. Tan, V. Kumar, S. A. Klooster, and C. Potter. Discovery of climate indices using clustering. In KDD, 2003. Google ScholarDigital Library
- M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In ICDE, 2002. Google ScholarDigital Library
- M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. J. Keogh. Indexing Multidimensional Time-Series. VLDB J., 15(1), 2006. Google ScholarDigital Library
- Y.-L. Wu, D. Agrawal, and A. E. Abbadi. A Comparison of DFT and DWT based Similarity Search in Time-Series Databases. In CIKM, 2000. Google ScholarDigital Library
- X. Xi, E. J. Keogh, C. R. Shelton, L. Wei, and C. A. Ratanamahatana. Fast time series classification using numerosity reduction. In ICML, 2006. Google ScholarDigital Library
- B.-K. Yi and C. Faloutsos. Fast Time Sequence Indexing for Arbitrary Lp Norms. In VLDB, 2000. Google ScholarDigital Library
- B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In ICDE. IEEE Computer Society, 1998. Google ScholarDigital Library
- Y. Zhu and D. Shasha. Warping Indexes with Envelope Transforms for Query by Humming. In SIGMOD Conference, 2003. Google ScholarDigital Library
Index Terms
- Querying and mining of time series data: experimental comparison of representations and distance measures
Recommendations
Experimental comparison of representation methods and distance measures for time series data
The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for ...
A decade of progress in indexing and mining large time series databases
VLDB '06: Proceedings of the 32nd international conference on Very large data basesTime series data is ubiquitous; large volumes of time series data are routinely created in scientific, industrial, entertainment, medical and biological domains. Examples include gene expression data, electrocardiograms, electroencephalograms, gait ...
Bounded similarity querying for time-series data
We define the problem of bounded similarity querying in time-series databases, which generalizes earlier notions of similarity querying. Given a (sub)sequence S, a query sequence Q, lower and upper bounds on shifting and scaling parameters, and a ...
Comments