DOI: 10.1145/1143844.1143891

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber

Published: 25 June 2006

ABSTRACT

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
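The abstract only summarises the method; the core of CTC is a forward (alpha) dynamic-programming recursion that sums the probability of every frame-level alignment mapping to a given labelling, with a special blank symbol separating repeated labels. Below is a minimal, illustrative sketch of that recursion in plain Python. The function name, the use of `None` as the blank, and the toy per-frame distributions are all assumptions for illustration, not the paper's notation or code.

```python
BLANK = None  # hypothetical stand-in for CTC's blank symbol

def ctc_label_probability(softmax_frames, labels):
    """Total probability of `labels`, summed over all CTC alignments.

    softmax_frames: one dict per frame mapping symbol -> probability.
    labels: the target label sequence, without blanks.
    """
    # Extend the labelling with blanks: blank, l1, blank, l2, ..., blank.
    ext = [BLANK]
    for l in labels:
        ext += [l, BLANK]
    S = len(ext)

    # Initialisation: a path may start with a blank or the first label.
    alpha = [0.0] * S
    alpha[0] = softmax_frames[0][BLANK]
    if S > 1:
        alpha[1] = softmax_frames[0][ext[1]]

    # Recursion over the remaining frames.
    for frame in softmax_frames[1:]:
        prev = alpha
        alpha = [0.0] * S
        for s in range(S):
            total = prev[s]                # stay on the same extended symbol
            if s >= 1:
                total += prev[s - 1]       # advance by one symbol
            # The blank may be skipped between two *different* labels.
            if s >= 2 and ext[s] is not BLANK and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * frame[ext[s]]

    # A path may end on the last label or the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)

# Two frames, one-symbol target 'a': the alignments collapsing to "a"
# are (a, a), (blank, a) and (a, blank).
frames = [{BLANK: 0.4, 'a': 0.6}, {BLANK: 0.3, 'a': 0.7}]
p = ctc_label_probability(frames, ['a'])   # ≈ 0.6*0.7 + 0.4*0.7 + 0.6*0.3
```

In practice the same recursion is run in log space for numerical stability, and its gradients with respect to the softmax outputs are what train the RNN.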


Published in

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Copyright © 2006 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rate

ICML '06: 140 of 548 submissions, 26%
