Classification of time series by shapelet transformation

Hills, Jon; Lines, Jason; Baranauskas, Edgaras; Mapp, James; Bagnall, Anthony

doi:10.1007/s10618-013-0322-1

Classification of time series by shapelet transformation

Published: 18 May 2013

Volume 28, pages 851–881, (2014)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Jon Hills¹,
Jason Lines¹,
Edgaras Baranauskas¹,
James Mapp¹ &
…
Anthony Bagnall¹

9003 Accesses
347 Citations
7 Altmetric
Explore all metrics

Abstract

Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best \(k\) shapelets, which are used to produce a transformed dataset, where each of the \(k\) features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselves.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Bagnall A, Hills J, Lines J (2012) Shapelet based time series classification. http://www.uea.ac.uk/computing/machine-learning/Shapelets. Accessed 14 May 2013
Bagnall A, Davis L, Hills J, Lines J (2012) Transformation based ensembles for time series classification. Proceedings of the twelfth SIAM conference on data mining (SDM)
Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. Proceedings of the eleventh SIAM conference on data mining (SDM)
Bober M (2001) Mpeg-7 visual shape descriptors. IEEE Trans Circ Syst Video Technol 11(6):716–719
Article Google Scholar
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Article MATH Google Scholar
Buza K (2011) Fusion methods for time-series classification. Ph.D. thesis, University of Hildesheim, Germany
Campana S, Casselman J (1993) Stock discrimination using otolith shape analysis. Can J Fish Aquat Sci 50(5):1062–1083
Article Google Scholar
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
MATH Google Scholar
Davis LM, Theobald B-J, Lines J, Toms A, Bagnall A (2012) On the segmentation and classification of hand radiographs. Int J Neural Syst 22(5):1250020
Google Scholar
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
MATH MathSciNet Google Scholar
Deng H, Runger G, Tuv E, Vladimir M (2011) A time series forest for classification and feature extraction. Tech. rep., Arizona State University
De Vries D, Grimes C, Prager M (2002) Using otolith shape analysis to distinguish eastern gulf of mexico and atlantic ocean stocks of king mackerel. Fish Res 57(1):51–62
Article Google Scholar
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552
Google Scholar
Duarte-Neto P, Lessa R, Stosic B, Morize E (2008) The use of sagittal otoliths in discriminating stocks of common dolphinfish (coryphaena hippurus) off northeastern brazil using multishape descriptors. ICES J Mar Sci J du Conseil 65(7):1144–1152
Article Google Scholar
Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29(2–3):131–163
Article MATH Google Scholar
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18
Article Google Scholar
Hartmann B, Link N (2010) Gesture recognition with inertial sensors and optimized dtw prototypes. Systems man and cybernetics (SMC), 2010 IEEE international conference on. IEEE pp 2102–2109
He Q, Dong Z, Zhuang F, Shi Z (2012) Fast Time Series Classification based on infrequent shapelets. Machine learning and applications (ICMLA), 2012 11th international conference on. IEEE, pp 215–219
Hoare C (1962) Quicksort. Comput J 5(1):10–16
Article MATH MathSciNet Google Scholar
Image Processing and Informatics Lab, University of Southern California, The Digital Hand Atlas Database System. http://www.ipilab.org/BAAweb
Janacek G, Bagnall A, Powell M (2005) A likelihood ratio distance measure for the similarity between the fourier transform of time series. Proceedings of the Ninth Pacific-Asia Conference on knowledge discovery and data mining (PAKDD)
Jeong Y, Jeong M, Omitaomu O (2010) Weighted dynamic time warping for time series classification. Pattern Recognit 44:2231–2240
Article Google Scholar
Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. Proceedings of the thirteenth SIAM conference on data mining (SDM)
Kruskal W (1952) A nonparametric test for the several sample problem. Ann Math Stat 23(4):525–540
Article MATH MathSciNet Google Scholar
Latecki L, Lakamper R, Eckhardt T (2000) Shape descriptors for non-rigid shapes with a single closed contour. Computer vision and pattern recognition, 2000. Proceedings. IEEE conference on vol. 1. IEEE, pp 424–429
Lines J, Bagnall A (2012) Alternative quality measures for time series shapelets. Intelligent data engineering and automated learning (IDEAL). Lect Notes Comput Sci 7435:475–483
Article Google Scholar
Lines J, Davis L, Hills J, Bagnall A (2012) A shapelet transform for time series classification. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 289–297
Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill, New York
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1154–1162
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 262–270
Rakthanmanon T, Keogh E (2013) Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets. Proceedings of the thirteenth SIAM conference on data mining (SDM)
Rodriguez J, Alonso C (2005) Support vector machines of interval-based features for time series classification. Knowl-Based Syst 18:171–178
Google Scholar
Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. Pattern Anal Mach Intell IEEE Trans 28(10):1619–1630
Article Google Scholar
Shannon C, Weaver W, Blahut R, Hajek B (1949) The mathematical theory of communication, vol 117. University of Illinois press, Urbana
MATH Google Scholar
Sivakumar P et al. (2012) Human gait recognition and classification using time series shapelets. International conference on advances in computing and communications (ICACC)
Stransky C (2005) Geographic variation of golden redfish (sebastes marinus) and deep-sea redfish (s. mentella) in the north atlantic based on otolith shape analysis. ICES J Mar Sci J du Conseil 62(8):1691–1698
Article Google Scholar
Wu Y, Agrawal D, Abbadi AE (2000) A comparison of dft and dwt based similarity search in time-series databases. Proceedings of the ninth international conference on information and knowledge management (ACM CIKM)
Xing Z, Pei J, Yu P (2012) Early classification on time series. Knowl Inform syst 31(1):105–127
Article Google Scholar
Xing Z, Pei J, Yu P, Wang K (2011) Extracting interpretable features for early classification on time series. Proceedings of the eleventh SIAM conference on data mining (SDM)
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 947–956
Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1):149–182
Article MATH MathSciNet Google Scholar
Zakaria J, Mueen A, Keogh E (2012) Clustering time series using unsupervised-shapelets. Data mining (ICDM), 2012 IEEE 12th international conference. pp. 785–794

Download references

Author information

Authors and Affiliations

University of East Anglia, Norwich, NR4 7TJ, UK
Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp & Anthony Bagnall

Authors

Jon Hills
View author publications
You can also search for this author in PubMed Google Scholar
Jason Lines
View author publications
You can also search for this author in PubMed Google Scholar
Edgaras Baranauskas
View author publications
You can also search for this author in PubMed Google Scholar
James Mapp
View author publications
You can also search for this author in PubMed Google Scholar
Anthony Bagnall
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jon Hills.

Additional information

Responsible editor: Eamonn Keogh.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hills, J., Lines, J., Baranauskas, E. et al. Classification of time series by shapelet transformation. Data Min Knowl Disc 28, 851–881 (2014). https://doi.org/10.1007/s10618-013-0322-1

Download citation

Received: 07 February 2013
Accepted: 06 May 2013
Published: 18 May 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s10618-013-0322-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Classification of time series by shapelet transformation

Abstract

Access this article

Similar content being viewed by others

Binary Shapelet Transform for Multiclass Time Series Classification

Binary Shapelet Transform for Multiclass Time Series Classification

A Hybrid Approach to Time Series Classification with Shapelets

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Classification of time series by shapelet transformation

Abstract

Access this article

Similar content being viewed by others

Binary Shapelet Transform for Multiclass Time Series Classification

Binary Shapelet Transform for Multiclass Time Series Classification

A Hybrid Approach to Time Series Classification with Shapelets

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation