2. Principles and algorithms for bagging and boosting: Random Forests and AdaBoost
Bagging and boosting are ensemble methods in machine learning that combine multiple models to achieve better predictive performance than any single model alone. Bagging primarily reduces variance, while boosting primarily reduces bias. Two popular algorithms built on these methods are Random Forests (bagging) and AdaBoost (boosting).
Overview
- Decrease Variance - Bagging
- Decrease Bias - Boosting
- Improve Predictions - Stacking
Bagging: Random Forests
Bagging (Bootstrap Aggregating) involves training multiple models independently and then combining their predictions. Each model is trained on a different subset of data, created by sampling with replacement (bootstrapping) from the original dataset.
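To make this concrete, here is a minimal from-scratch sketch of bagging with decision trees; the dataset, the number of estimators, and the use of scikit-learn trees are illustrative assumptions, not part of the method itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset; any tabular classification data would do.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
trees = []

for _ in range(n_estimators):
    # Bootstrap: draw a sample of the same size, with replacement, from the original data.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregate: majority vote across the independently trained trees.
all_preds = np.stack([t.predict(X) for t in trees])  # shape: (n_estimators, n_samples)
majority_vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
```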
Pros
- Reduces variance: by averaging predictions from multiple models
- Reduces overfitting: by using multiple models trained on different subsets of the data
Cons
- Increases complexity: training, storing, and combining multiple models is more computationally expensive than using a single model
- Limited impact on bias: if the models in the ensemble are biased, bagging may not improve performance.
Random Forests
Random Forests is a bagging algorithm that constructs a multitude of decision trees at training time and outputs the mode of the classes predicted by the individual trees (classification) or their mean prediction (regression).
- Each decision tree is built on a different subset of the original data.
- At each node in the tree, only a random subset of features is considered for splitting.
- The final prediction is an average (for regression) or majority vote (for classification) across all trees.
Random Forests improve predictive accuracy and control overfitting by introducing randomness into the ensemble. The variance of the model is reduced without increasing the bias, which usually leads to better generalization.
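One way to see this effect is to compare a single decision tree with a forest of trees under cross-validation; the dataset below is an illustrative assumption, but the typical pattern is a higher and more stable score for the forest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A single deep tree: low bias but high variance across resamples of the data.
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

# A forest of such trees: averaging reduces variance without raising the bias much.
forest_scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5)

print(f"single tree  : mean={tree_scores.mean():.3f}, std={tree_scores.std():.3f}")
print(f"random forest: mean={forest_scores.mean():.3f}, std={forest_scores.std():.3f}")
```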
Hyperparameters
- Number of trees
- Size of each bootstrap sample
- Number of features considered at each split
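In scikit-learn's RandomForestClassifier, these hyperparameters correspond roughly to n_estimators, max_samples, and max_features; the values below are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the ensemble
    max_samples=0.8,      # size of each bootstrap sample, as a fraction of the training set
    max_features="sqrt",  # number of features considered at each split
    bootstrap=True,       # sample with replacement, as in plain bagging
    random_state=0,
)
forest.fit(X, y)
```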
Boosting: AdaBoost
Boosting is an ensemble method that combines several weak learners to form a strong learner. A weak learner classifies with accuracy slightly better than random guessing, while a strong learner has a low error rate. Boosting algorithms train weak learners in sequence, each trying to correct its predecessor.
- Weak learner: a model that performs only slightly better than random guessing. Examples include decision stumps, naive Bayes, and k-nearest neighbors.
AdaBoost
AdaBoost, short for "Adaptive Boosting", is one of the first and most popular boosting algorithms.
- AdaBoost starts by assigning equal weights to all training samples and fits a weak classifier that minimizes the weighted error rate.
- After evaluating the first learner, AdaBoost increases the weights of the misclassified samples, so that these samples will make up a larger part of the next classifier's training set.
- The process is repeated, each time assigning higher weights to the misclassified samples.
The final prediction is a weighted vote (in classification) or weighted sum (in regression) of the predictions made by the individual learners.
AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of the instances misclassified by previous classifiers. The key idea is to set the weight of each classifier and reweight the training samples at every iteration so that hard-to-classify observations receive increasing attention.
AdaBoost is often used with decision stumps. Decision stumps are like the trees in a Random Forest, but not "fully grown": they have a single decision node and two leaves. AdaBoost builds an ensemble of such stumps rather than a forest of deep trees.
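In scikit-learn terms, a decision stump is just a tree constrained to depth one; the snippet below is a small illustrative sketch (the dataset and parameters are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A decision stump: one decision node, two leaves.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)

# On its own it is a weak learner; its cross-validated accuracy is typically
# above chance but well below that of a fully grown tree or an ensemble.
print(cross_val_score(stump, X, y, cv=5).mean())
```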
Algorithm
- Initialize Weights: Begin with equal weights for each data point.
- Build a Weak Learner: Train a basic model on the data.
- Compute Error: Calculate the total error of this weak learner, which is the sum of weights of misclassified points.
- Compute Learner's Importance: Calculate the importance of the current learner based on its error rate; lower error rates get higher importance.
- Update Weights: Adjust weights of data points; increase for misclassified points, decrease for correctly classified ones.
- Repeat: Repeat steps 2-5 for a set number of iterations, or until the error rate doesn't improve.
- Form the Final Classifier: Create the final model by taking a weighted vote of the weak classifiers.
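Putting these steps together, the following is a minimal from-scratch sketch of discrete (binary) AdaBoost with decision stumps; the dataset, the number of rounds, and the label encoding in {-1, +1} are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary dataset with labels mapped to {-1, +1}.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)

n_rounds = 50
n = len(X)
weights = np.full(n, 1.0 / n)          # step 1: equal sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Step 2: fit a weak learner (a stump) on the weighted data.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Step 3: weighted error = sum of weights of misclassified points.
    err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)

    # Step 4: learner importance, alpha = 0.5 * ln((1 - err) / err); lower error -> higher alpha.
    alpha = 0.5 * np.log((1 - err) / err)

    # Step 5: raise weights of misclassified points, lower correct ones, then renormalize.
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Step 7: final classifier = sign of the alpha-weighted vote of all weak learners.
def predict(X_new):
    votes = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(votes)

print((predict(X) == y).mean())  # training accuracy of the boosted ensemble
```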
Pros
- Versatility: AdaBoost can be used with many different weak learners and works across a wide range of practical classification problems.
- Complex Classification: By repeatedly reweighting the training data, it focuses on hard-to-classify points in complex classification problems.
- Accuracy: By iteratively training weak classifiers and combining their results, AdaBoost often achieves high predictive accuracy.
Cons
- Overfitting Risk: Prone to overfitting, especially on noisy datasets, although this can be controlled with regularization such as shrinking the learning rate or limiting the number of boosting rounds (see the sketch after this list).
- Selection and Tuning Complexity: Choosing appropriate weak classifiers and tuning hyperparameters require expertise, potentially making implementation more complex.
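As a rough illustration of the overfitting point above, scikit-learn's AdaBoostClassifier exposes a shrinkage parameter (learning_rate) and a cap on the number of rounds (n_estimators); the dataset and values below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# flip_y injects label noise, the setting where AdaBoost is most prone to overfit.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)

# Shrinking each learner's contribution (learning_rate < 1) and capping the number of
# boosting rounds (n_estimators) are the usual levers against overfitting.
model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())
```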