Background
Methods
VDAART cohort
Characteristics | Healthy (n = 249) | Asthmatic (n = 83) |
---|---|---|
Gender | ||
Male | 128 | 33 |
Female | 121 | 50 |
Race | ||
Asian | 20 | 2 |
Black, African American | 100 | 42 |
Native Hawaiian | 2 | 1 |
White | 86 | 30 |
Others | 40 | 6 |
Mother’s age (year) | 28.11 \(\pm\) 5.66 | 27.15 \(\pm\) 6.09 |
Mother’s gestational age at enrollment (days) | 96.51 \(\pm\) 18.78 | 101.11 \(\pm\) 19.76 |
Vitamin D blood level (ng/ml) at enrollment | 24.42 \(\pm\) 10.46 | 23.00 \(\pm\) 9.99 |
Site name | ||
Boston Medical Center | 45 | 25 |
Kaiser Permanente Southern California Region | 104 | 23 |
Washington University at Saint Louis | 100 | 35 |
Prediction methods and performance evaluation
Method | Description | Refs. |
---|---|---|
Linear models | ||
LR | Logistic Regression models the probability that an object belongs to a class by expressing the log-odds of the class as a linear combination of the features | [42] |
LRCV | Logistic Regression with built-in cross-validation support to find the optimal parameters | [42] |
LR-VAE | Logistic Regression on features reduced with a VAE (Variational Autoencoder) | |
LRCV-VAE | Logistic Regression with built-in cross-validation support to find the optimal parameters, on features reduced with a VAE | |
Nearest neighbors | ||
KNN | The k-nearest neighbors algorithm assigns an object to the class most common among its k nearest neighbors | [45] |
Support vector machine | ||
SVC | C-Support Vector Classification performs classification by constructing a set of hyperplanes in a high-dimensional space | [46] |
Ensemble methods | ||
AdaBoost | AdaBoost is an iterative procedure that approximates the Bayes classifier by combining many weak classifiers | |
GTB | Gradient Tree Boosting consecutively fits new models to obtain an increasingly accurate estimate of the response variable | |
RF | Random forest is an ensemble classifier that constructs many decision trees; the final prediction is the class selected by most trees | [51] |
Bagging | Bagging generates multiple versions of a predictor and aggregates them into a single predictor | [52] |
Ensemble | Aggregates the predictions of all other classifiers. The continuous probability that a subject is asthmatic is the average of the probabilities from the 15 methods; a subject is predicted asthmatic if at least 7 of the methods predict it as asthmatic | |
Decision trees | ||
DecisionTree | Decision Trees predict the response value by learning simple decision rules inferred from the data features | [53] |
ERT | An extremely randomized tree classifier is a tree-based ensemble method that strongly randomizes both the attribute and cut-point choices | [54] |
Naïve Bayes | ||
BernoulliNB | Implements Naïve Bayes training and classification for data distributed according to multivariate Bernoulli distributions | [55] |
GaussianNB | Implements Naïve Bayes training and classification for data distributed according to multivariate Gaussian distributions | [56] |
Neural networks | ||
MLP | A Multi-layer Perceptron is a fully connected feedforward neural network with at least three layers | [57] |
MOGONET | MOGONET is a multi-omics data analysis framework for classification tasks utilizing graph convolutional networks | [14] |
Tabnet | TabNet uses a canonical deep neural network architecture for tabular data with built-in interpretability | [21] |
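The Ensemble row above combines the base classifiers in two ways: an averaged probability and a majority-vote hard call (at least 7 of the 15 methods). A minimal sketch of that aggregation rule in scikit-learn, using a small subset of the listed classifiers on synthetic stand-in data (the data, the classifier subset, and all hyperparameters here are illustrative assumptions, not the paper's actual pipeline):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy data standing in for the VDAART features (hypothetical).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

# A subset of the base classifiers from the table (sketch, not the full 15).
classifiers = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    RandomForestClassifier(random_state=0),
    AdaBoostClassifier(random_state=0),
    BaggingClassifier(random_state=0),
]

probas, labels = [], []
for clf in classifiers:
    clf.fit(X_train, y_train)
    probas.append(clf.predict_proba(X_test)[:, 1])  # P(class 1), i.e. "asthmatic"
    labels.append(clf.predict(X_test))

# Continuous score: average of the per-method probabilities.
ensemble_proba = np.mean(probas, axis=0)
# Hard call: asthmatic if a majority of methods predict asthmatic
# (the paper uses >= 7 of 15; here, a majority of the 5 methods shown).
ensemble_pred = (np.sum(labels, axis=0) >= len(classifiers) // 2 + 1).astype(int)
```

With 15 classifiers, replacing the majority threshold with 7 reproduces the rule stated in the table.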