Materials and methods
Results
# | Term | Number of occurrences | Number of final occurrences
---|---|---|---
1 | AdaBoost | 3 | 3 |
2 | Artificial neural networks (ANNs) | 104 | 53 |
3 | Autoencoder | 4 | 4 |
4 | Classification | 105 | 62 |
5 | Clustering | 22 | 12 |
6 | Convolutional neural networks (CNNs) | 93 | 74 |
7 | Decision trees | 5 | 5 |
8 | Dimensionality reduction | 3 | 3 |
9 | Dynamic time warping (DTW) | 1 | 1 |
10 | Ensemble learning | 15 | 11 |
11 | Feed forward neural network | 2 | 2 |
12 | Fully convolutional networks | 14 | 11 |
13 | Gated recurrent units (GRUs) | 1 | 1 |
14 | Generative Adversarial Networks (GAN) | 11 | 7 |
15 | Gradient boosting | 3 | 2 |
16 | Hidden Markov models (HMMs) | 6 | 6 |
17 | Imitation learning | 3 | 3 |
18 | Instance segmentation | 10 | 10 |
19 | JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset | 26 | 18 |
20 | k-nearest neighbors (kNN) | 6 | 6 |
21 | Kernel | 3 | 3 |
22 | Lasso regression | 1 | 1 |
23 | Logistic regression | 38 | 16 |
24 | Long short-term memory (LSTM) | 18 | 18 |
25 | Multilayer perceptrons | 4 | 4 |
26 | Object detection | 15 | 14 |
27 | Principal component analysis | 5 | 5 |
28 | Random Forests | 21 | 15 |
29 | Recurrent neural networks (RNNs) | 10 | 8 |
30 | Regression | 95 | 49 |
31 | Reinforcement learning | 7 | 7 |
32 | Representational learning | 18 | 15 |
33 | Ridge regression | 1 | 1 |
34 | Semantic segmentation | 11 | 11 |
35 | Supervised learning | 35 | 24 |
36 | Support vector machines (SVM) | 26 | 21 |
37 | Transition state clustering (TSC) | 1 | 1 |
38 | Unsupervised learning | 12 | 12 |
# | Term | Garrow et al. [15] | Chang et al. [21] | Chen et al. [22] | Egert et al. [9] | Hashimoto et al. [23] | Ma et al. [24] | Zhou et al. [14] | Tanzi et al. [25] | Anteby et al. [17] | Van Amsterdam [26]
---|---|---|---|---|---|---|---|---|---|---|---
1 | AdaBoost | Y | |||||||||
2 | Artificial neural networks (ANNs) | Y | Y | Y | Y | Y | |||||
3 | Autoencoder | Y | Y | ||||||||
4 | Classification | Y | Y | Y | Y | Y | Y | Y | |||
5 | Clustering | Y | Y | Y | |||||||
6 | Convolutional neural networks (CNNs) | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
7 | Decision trees | Y | Y | Y | |||||||
8 | Dimensionality reduction | ||||||||||
9 | Dynamic time warping (DTW) | Y | Y | ||||||||
10 | Ensemble learning | ||||||||||
11 | Feed forward neural network | ||||||||||
12 | Fully convolutional networks | Y | |||||||||
13 | Gated recurrent units (GRUs) | Y | Y | ||||||||
14 | Generative Adversarial Networks (GAN) | Y | Y | ||||||||
15 | Gradient boosting | ||||||||||
16 | Hidden Markov models (HMMs) | Y | Y | Y | Y | ||||||
17 | Imitation learning | Y | |||||||||
18 | Instance segmentation | ||||||||||
19 | JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset | Y | Y | Y | |||||||
20 | k-nearest neighbors (kNN) | Y | Y | ||||||||
21 | Kernel | Y | Y | Y | |||||||
22 | Lasso regression | ||||||||||
23 | Logistic regression | Y | Y | Y | |||||||
24 | Long short-term memory (LSTM) | Y | Y | Y | Y | Y | |||||
25 | Multilayer perceptrons | ||||||||||
26 | Object detection | Y | Y | ||||||||
27 | Principal component analysis | Y | Y | Y | |||||||
28 | Random Forests | Y | Y | Y | Y | Y | Y | Y | |||
29 | Recurrent neural networks (RNNs) | Y | Y | Y | Y | Y | Y | Y | |||
30 | Regression | Y | Y | Y | |||||||
31 | Reinforcement learning | Y | Y | Y | Y | ||||||
32 | Representational learning | Y | |||||||||
33 | Ridge regression | ||||||||||
34 | Semantic segmentation | Y | |||||||||
35 | Supervised learning | Y | Y | Y | Y | ||||||
36 | Support vector machines (SVM) | Y | Y | Y | Y | Y | Y | Y | |||
37 | Transition state clustering (TSC) | Y | Y | ||||||||
38 | Unsupervised learning | Y | Y | Y |
Term | Definition
---|---
Activation function | A non-linear function applied to the output of a neuron's linear function. The resulting composition is non-linear and allows neural networks to approximate complex phenomena. The most commonly used activation functions are the hyperbolic tangent (tanh), the sigmoid, and the rectified linear unit (ReLU) [27]
AdaBoost | Short for Adaptive Boosting, an ensemble method aggregating several weak learners into a strong one. It first trains a model, e.g. a decision tree for classification, to make predictions on the training set. It then increases the relative weight of the misclassified training instances, trains a second model with the updated weights, and makes predictions on the training set again; this time the model predicts better. The process is repeated for each additional model [27] (see the AdaBoost sketch after this glossary)
Annotation | The process of drawing the contours of the objects inside an image |
Area Under the Curve | The area under the receiver operating characteristic (ROC) curve. It returns a value ranging from zero to one [28] and is used for binary classification
Artificial neural networks (ANNs) | Inspired by biological neurons, these algorithms consist of several layers, each of which includes neurons. Neurons between layers are connected by parameters called weights |
Autoencoder | A neural network composed of an encoder and a decoder; it learns to compress its input into a lower-dimensional representation and to reconstruct the input from it
Backpropagation | An algorithm used to train ANNs. Batches of instances traverse the layers of the ANN until an output is computed (forward pass). The difference between the predicted output and the real output is then computed; this error is called the loss. The algorithm next measures how much each intermediate layer, and each connected neuron within it, contributed to the error, working backward until it reaches the input layer (backward pass). During the backward pass, the gradients of the error through all the connections in the ANN are computed. The algorithm then adjusts the weights of the ANN through gradient descent. Forward and backward passes alternate until the end of training [29, 30]
Batch normalization | A technique that normalizes the inputs of a layer over each training mini-batch (zero mean and unit variance, followed by a learned scale and shift). It stabilizes and accelerates the training of deep networks
Bounding box | A box enclosing an object in an image. It is used for computer vision tasks, like detection and segmentation |
Classification | A typical supervised learning task to predict a target class (a discrete value) [29] |
Clustering | This task aims to identify similar instances and assign them to clusters, i.e., groups of similar instances. It belongs to unsupervised learning
Convolutional neural networks (CNNs) | A type of neural network made up of convolutional layers, typically used for the analysis of images. In a convolutional layer a small matrix (called a filter or kernel) slides over a larger matrix (e.g. an image, which can be described as a 2D matrix or tensor of pixels). The convolution is performed by multiplying the filter pixelwise with the overlapping portion of the image and summing the result [32, 33] (see the convolution sketch after this glossary)
Cost function | A function evaluating the model. It computes the difference between the predicted and actual value. Some of the most popular cost functions are: mean absolute error and the mean squared error (for regression), and binary cross entropy and categorical cross entropy (for classification) |
Cross-validation | A method to evaluate the generalization of models. The most common type is k-fold cross-validation, where the data are split into k parts of equal size, called folds. A first model is trained using the first fold as the test set and the remaining k − 1 folds as the training set, and its accuracy is evaluated on the first fold. A second model is then built using the second fold as the test set and the remaining folds as the training set, and its accuracy is evaluated on the second fold. The process is repeated for all k folds, yielding one accuracy value per fold [28] (see the cross-validation sketch after this glossary)
Decision trees | These ML models use a hierarchy of if/else questions leading to a decision. The purpose is to reach the right answer by asking the minimum number of if/else questions [28]. Decision trees look for the best test at each node. They are used for both classification and regression tasks, and they tend to overfit the data
Decoder | A neural network decompressing a representational vector back to the original domain [34] |
Deep learning (DL) | A subfield of machine learning based on artificial neural networks [32] |
Dimensionality reduction | A machine learning technique to significantly reduce the number of features. It is especially useful when the number of features is so high that some problems initially seem unsolvable [29]. Dimensionality reduction makes such problems tractable: ML algorithms can be applied afterwards without the risk of running out of computing resources. One of the most popular examples is principal component analysis [35]
Dropout | A regularization technique for ANNs in which, at every training step, each neuron (excluding output neurons) has a probability of being temporarily ignored. It reduces overfitting by preventing neurons from co-adapting
Dynamic time warping (DTW) | A technique to dynamically compare time series when the time indices of the compared data points do not sync up perfectly. The time series are "warped", or matched together, based on their similarities at each time point [15] (see the DTW sketch after this glossary)
Encoder | It is a network compressing high-dimensional input data into a lower-dimensional representational vector [32] |
Ensemble | The process of learning from an aggregation of models [29]. For instance, random forests are an ensemble of decision trees |
Exploding gradient | A phenomenon in which the gradient of the cost function with respect to each parameter becomes so large that the weights receive very large updates. As a result, training diverges [29]
False negative (FN) | The number of positive instances which are incorrectly classified as negative [29]
False positive (FP) | The number of negative instances which are incorrectly classified as positive [29]
Features | Independent variables acting as input to the model. In images, the features are the values of the color channels (e.g. RGB) of each pixel
Federated learning | A method to train an AI model across multiple decentralized devices or servers holding local data, without exchanging the data themselves
Feed forward neural network | A type of ANN where the signal flows in only one direction, from the input to the output [29]
Floating-point operations per second (FLOPS) | A measure of computer performance for computations with floating-point numbers
Fully connected layers | In fully connected layers each neuron of one layer is connected to all neurons of the next layer, as in multilayer perceptrons
Fully convolutional networks | A neural network consisting only of convolutional layers [29], in contrast with conventional CNNs, which include both convolutional and fully connected layers
Gated Recurrent Unit (GRU) | A simplified variant of the LSTM cell whose gating is reduced to two gates (an update gate and a reset gate). Like LSTMs, GRUs mitigate the vanishing gradient and short-memory issues of plain RNNs
Generative Adversarial Imitation Learning (GAIL) | A GAN-based learning method to imitate experts' behavior. The discriminator learns to distinguish generated performances from expert demonstrations, whereas the generator attempts to mimic the expert to fool the discriminator into believing that its performance is an expert demonstration [38]
Generative Adversarial Network (GAN) | A type of ANN composed of two competing networks, called the generator and the discriminator. The generator takes a random distribution as input and outputs some data, e.g. an image. The discriminator takes as input either a fake image from the generator or a real image from the training set and must guess whether it is real or fake [29, 39]
Gradient boosting | An ensemble model. Like AdaBoost, it sequentially corrects its predecessors; however, gradient boosting fits each new model to the residual errors made by the previous one [29]
Gradient descent | A popular algorithm to tune parameters in order to minimize a cost function. Gradient descent measures the gradient of the cost function with respect to a parameter vector and moves in the direction of descending gradient. Once the gradient is zero, a minimum of the cost function has been reached [29] (see the gradient descent sketch after this glossary)
Graphics processing unit (GPU) | A chip for parallel computation, which results in a performance boost for tasks requiring intensive workloads. For this reason, GPUs are used to accelerate AI tasks, for which they are much faster than a central processing unit (CPU)
Grid search | A method to adjust the hyperparameters of supervised models for the best generalization performance [28] |
Hidden Markov models (HMMs) | A statistical tool that models a system as a Markov process, which is a system existing in a series of distinct states, with transitions between them occurring at random intervals. In a HMM the states of the model are not directly observable [40] |
Hyperparameters | Parameters which are not estimated from the data; they are set beforehand and control how the model parameters are learned
Imitation learning | Also called "learning from demonstration", it enables robots to autonomously perform new tasks [14]
Instance segmentation | A computer vision task to predict object instances using segmentation masks; unlike semantic segmentation, it distinguishes individual instances of the same class
JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) | A publicly available RAS dataset collected through a collaboration between the Johns Hopkins University (JHU) and Intuitive Surgical, Inc. (ISI) [41] |
K-means clustering | A clustering algorithm splitting a set of samples into k groups by minimizing the variation within each cluster [42]
K-nearest neighbors (k-NN) | A simple ML algorithm considering the k closest points to the point of interest. For classification tasks, the occurrences of the classes of the neighbors are counted and the most frequent class is assigned to the prediction [28]. For regression tasks, the prediction is the average value of the neighbors [28] (see the k-NN sketch after this glossary)
Kernel | In ML a kernel is a function capable of computing the product of two vectors in a (possibly high-dimensional) feature space. There are different types of kernels: linear, polynomial, Gaussian RBF, and sigmoid [29]. In CNNs a kernel is a small matrix sliding over a larger one (e.g. an image); it is also called a filter
Lasso regression | A type of regression that regularizes linear regression. It is also called L1 regularization. It forces some feature weights to be exactly zero, which means that some features are ignored by the model [28] (see the regularization sketch after this glossary)
Latent space | A low-dimensional space onto which high-dimensional data are mapped. It is used for representational learning
Layer | Neurons in ANNs are grouped in layers. The first layer is called the input layer and the last the output layer. Neurons of one layer are connected to the neurons of the preceding and subsequent layers. There are different types of layers: dense (or fully connected), convolutional, deconvolutional, pooling, and recurrent
Linear regression | A linear model making predictions by computing a weighted sum of the input features plus a bias term (also called the intercept) [29]
Logistic regression | An algorithm used for binary classification. It computes the probability that an instance belongs to a class [29]. If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class (called the positive class); otherwise, it predicts that it belongs to the negative class [29] (see the logistic regression sketch after this glossary)
Long Short-Term Memory (LSTM) | A type of RNN specialized in remembering information for a long period of time and not suffering from the vanishing gradient and short memory issues of RNNs [43] |
Machine learning (ML) | ML, a subfield of AI, is the field of study that gives computers the ability to learn without being explicitly programmed [29] |
Model parameters | They are parameters which can be estimated from the data |
Multilayer Perceptron (MLP) | An MLP consists of layers of fully connected neurons. The first layer is called the input layer, the last the output layer, and the internal ones hidden layers [29]
Natural language processing (NLP) | A computer science field focused on helping computers to understand human language |
Object detection | A computer vision task consisting of both localizing and classifying objects in an image
Optimizer | An algorithm used to tune the value of the parameters (i.e., the weights) of an ANN to minimize the cost function |
Overfitting | A common behavior of ML models that perform well on the training data but poorly on unseen data, i.e., the test data [29]
Padding | A trick used in CNNs to obtain a layer with the same width and height as the previous layer. It is achieved by adding zeros around the inputs [29]
Perceptron | One of the simplest ANNs where each input has a weight [29] |
Precision | The accuracy of positive predictions [29] |
Precision recall curve | A curve plotting precision against recall for different probability thresholds. It is used for binary classification
Principal component analysis (PCA) | A technique for dimensionality reduction. PCA first finds the axis (direction) that accounts for the largest amount of variance in the data. The second axis is orthogonal to the first and accounts for the largest amount of the remaining variance, and so on [29] (see the PCA sketch after this glossary)
Random Forests | A type of ensemble ML model consisting of decision trees. Random forests can be used for both classification and regression
Recall | The ratio of positive instances that are correctly classified as positive; also called sensitivity or true positive rate (see the metrics sketch after this glossary)
Receiver operating characteristic (ROC) curve | A curve displaying true positive rate (recall) versus false positive rate [29]. It is used for binary classification |
Recurrent Neural Network (RNN) | A network similar to a feedforward network but with connections pointing backward. RNNs are made up of layers of recurrent neurons, which receive an input, compute an output, and feed the output back to themselves. RNNs have two limitations: vanishing gradients and a limited memory. Both drawbacks can be solved by LSTMs [29]
Region Proposal Network (RPN) | A fully convolutional network taking an image as input and outputting bounding boxes and objectness scores (i.e., whether an object is present in an image or not). The RPN is an essential component of Faster R-CNN for object detection [44]
Region of interest (RoI) | An area of an image which may contain an object |
Regression | A task to predict a continuous numeric value [29] |
Regularization | A technique to constrain a model to reduce overfitting [29] |
Reinforcement learning | A type of learning in which an agent interacts with an environment, observes its state, performs actions, and receives rewards or penalties in return. The agent learns a strategy, called a policy, that maximizes the cumulative reward over time
Representational learning | A type of learning where the samples are modeled in a low dimensional latent space instead of the original high dimensional space [32] |
Ridge regression | A type of regression that regularizes linear regression. It is also called L2 regularization. It forces the feature weights to be close to zero, thus minimizing their effect on the result (see the regularization sketch after this glossary)
Semantic segmentation | A computer vision task whose goal is to label each pixel of an image with a class. It differs from instance segmentation in that it does not distinguish instances of the same class
Sensitivity | Another term for recall [29] |
Specificity | Equal to the true negative rate, i.e., the ratio of negative instances which are correctly classified as negative [29]
Spectral clustering | A clustering method that uses the eigenvectors of a matrix derived from the data [28]
Stride | In CNNs the kernel (filter) slides over an image. The stride defines the number of pixels the filter moves horizontally and vertically at each step
Supervised learning | A type of learning where the training set includes the desired solution, called label [29] |
Support Vector Machine (SVM) | A model computing a decision boundary that separates classes (for classification). For regression, SVM tries to fit as many instances as possible within a margin while limiting margin violations
Tensor | A multidimensional array or matrix, commonly used in DL |
Tensor Processing Unit (TPU) | A chip specifically designed to process tensors. For such workloads it can be faster than a GPU
Testing set | The part of the data used to assess how the model performs on unseen (new) data [28]
Training set | The part of the data used to build the model [28]
Transfer learning | A method in which a pretrained model developed for one task is reused as the starting point for another task
Transition state clustering (TSC) | An unsupervised algorithm exploiting repeated demonstrations of a task by clustering segment endpoints across demonstrations [45]. TSC complements any motion-based segmentation algorithm by identifying candidate transitions, clustering them by kinematic similarity, and then correlating the kinematic clusters with the available sensory and temporal features [45]
True negative (TN) | The number of negative instances which are correctly classified as negative [29]
True positive (TP) | The number of positive instances which are correctly classified as positive [29]
True positive rate | Another term for recall [29] |
Underfitting | A behavior of models which perform poorly on both the training data and the test data. It typically occurs when the model is too simple. Possible solutions include selecting a more complex algorithm, using better features, or reducing regularization [29]
Unsupervised learning | A type of learning where the training set is unlabeled, and the system tries to learn without a teacher. An example of unsupervised learning is clustering |
Validation set | The part of the data used to select the hyperparameters of the model [28]
Vanishing gradient | A phenomenon in which, during the training of ANNs, the gradient of the cost function with respect to each parameter becomes so small that the weights barely change. As a result, training does not converge [29]
Visual odometry | A technique to localize a robot by using only a stream of images acquired from a single or multiple cameras attached to the robot [46] |
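To make the convolution described in the "Convolutional neural networks (CNNs)", "Kernel", "Stride", and "Padding" entries concrete, the following is a minimal NumPy sketch; the function name `conv2d` and the example matrices are illustrative, not part of any cited work.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`, multiplying pixelwise and summing."""
    if padding:
        image = np.pad(image, padding)           # zeros added around the input
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1     # output height
    ow = (image.shape[1] - kw) // stride + 1     # output width
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # pixelwise product, then sum
    return out

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge = np.array([[1, 0, -1]] * 3, dtype=float)   # simple vertical-edge filter
print(conv2d(img, edge, stride=1, padding=1).shape)  # (5, 5): padding preserves size
```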
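The "Gradient descent" and "Cost function" entries can be illustrated with a minimal sketch: gradient descent minimizing a mean-squared-error cost for a one-feature linear regression. The synthetic data, the learning rate, and the iteration count are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, size=100)  # true weight 3.0, bias 0.5

w, b, lr = 0.0, 0.0, 0.1                          # initial parameters, learning rate
for _ in range(500):
    err = (w * X + b) - y                         # prediction error per instance
    grad_w = 2 * np.mean(err * X)                 # d(MSE)/dw
    grad_b = 2 * np.mean(err)                     # d(MSE)/db
    w -= lr * grad_w                              # step along the descending gradient
    b -= lr * grad_b
print(round(w, 2), round(b, 2))                   # close to 3.0 and 0.5
```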
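A minimal k-fold cross-validation sketch for the "Cross-validation" entry, assuming scikit-learn is available; the iris dataset and logistic regression stand in for any dataset and model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)  # 5 folds -> 5 accuracy values
print(scores.round(3), scores.mean().round(3))
```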
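A minimal sketch for the "AdaBoost" entry, assuming scikit-learn ≥ 1.2 (where the weak learner is passed via the `estimator` parameter); the synthetic dataset is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Each new stump concentrates on the instances its predecessors misclassified.
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=50,
    random_state=0,
).fit(X, y)
print(clf.score(X, y))  # training accuracy of the aggregated strong learner
```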
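A self-contained dynamic-programming sketch for the "Dynamic time warping (DTW)" entry; the function name `dtw_distance` is illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW between 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)       # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # local mismatch between points
            # extend the cheapest of: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# 0.0: the second series is the first one "warped" by a repeated sample
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))
```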
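A minimal sketch of the 50% decision threshold described in the "Logistic regression" entry, assuming scikit-learn; the synthetic data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=1)
clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X[:5])[:, 1]  # estimated P(positive class) per instance
pred = (proba >= 0.5).astype(int)       # the 50% threshold from the definition
print(proba.round(2), pred)
```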
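A minimal sketch for the "K-nearest neighbors (k-NN)" entry, assuming scikit-learn; the iris dataset and k = 5 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Prediction = majority vote among the 5 closest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(knn.score(X_te, y_te))
```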
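A minimal sketch for the "Principal component analysis (PCA)" and "Dimensionality reduction" entries, assuming scikit-learn; the synthetic data, in which one direction is given most of the variance, are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 0] *= 5.0                      # give one direction much more variance

pca = PCA(n_components=2).fit(X)
X2 = pca.transform(X)               # 10 features reduced to 2 components
print(X2.shape, pca.explained_variance_ratio_.round(2))  # first axis dominates
```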
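A minimal sketch contrasting the "Lasso regression" (L1) and "Ridge regression" (L2) entries, assuming scikit-learn; the synthetic data, in which only the first feature matters, and the alpha values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, size=100)  # only feature 0 is informative

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2: weights shrink toward zero
lasso = Lasso(alpha=0.1).fit(X, y)  # L1: irrelevant weights become exactly zero
print(np.round(ols.coef_, 2))
print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))     # zeros everywhere except the first coefficient
```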
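Finally, a minimal sketch tying together the "Precision", "Recall", "Specificity", and ROC-related entries from the four confusion-matrix counts; the function name and example counts are illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive the common binary-classification metrics from raw counts."""
    precision = tp / (tp + fp)      # accuracy of positive predictions
    recall = tp / (tp + fn)         # true positive rate / sensitivity
    specificity = tn / (tn + fp)    # true negative rate
    fpr = fp / (fp + tn)            # x-axis of the ROC curve (1 - specificity)
    return precision, recall, specificity, fpr

print(binary_metrics(tp=40, fp=10, fn=20, tn=30))  # ≈ (0.8, 0.67, 0.75, 0.25)
```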