1. Model Evaluation: Loss Functions, Test Errors, Evaluation of Regression and Classification
Loss Function¶
A Loss Function is a method for evaluating how well your algorithm models your dataset. If your predictions deviate too much from the actual results, the loss function outputs a large number. Loss functions allow you to measure how well your approach performs and by how much your predictions 'miss' the mark.
- Suppose we have a target variable Y and a vector of features (inputs) X.
- The model predicts the value of Y through \hat{Y} ≡ \hat{Y} (X), a function of X.
- The discrepancy or deviation of the predicted value of Y from its true value is quantified by the loss function, denoted by L.
Regression tasks¶
Squared Error Loss¶
This loss function calculates the square of the difference between the actual and predicted values.
- L(Y, \hat{Y} ) = (Y − \hat{Y} )^2
Absolute Error Loss¶
This loss function calculates the absolute difference between the actual and predicted values.
- L(Y, \hat{Y} ) = |Y − \hat{Y} |
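As a quick illustration of the two regression losses above, here is a minimal sketch in Python using NumPy; the function names squared_error and absolute_error are illustrative, not taken from any library.

```python
import numpy as np

def squared_error(Y, Y_hat):
    """Squared error loss for a single prediction: (Y - Y_hat)^2."""
    return (Y - Y_hat) ** 2

def absolute_error(Y, Y_hat):
    """Absolute error loss for a single prediction: |Y - Y_hat|."""
    return np.abs(Y - Y_hat)

# A prediction that misses the target by 2 units
print(squared_error(5.0, 3.0))   # 4.0
print(absolute_error(5.0, 3.0))  # 2.0
```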
Classification tasks¶
For binary classification tasks, we consider the binary probability as:
- \hat{p}(x)=\hat{P}(Y=1|X=x)
Binary Cross-Entropy Loss¶
The binary cross-entropy loss is commonly used for models which output probabilities. It is the go-to loss function for binary classification problems.
- L (Y, \hat{p}(x)) = −Y log(\hat{p}(x)) − (1 − Y ) log(1 − \hat{p}(x))
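Below is a minimal sketch of the per-sample binary cross-entropy in Python with NumPy; the function name binary_cross_entropy and the constant eps are illustrative choices, with clipping added only to keep the logarithms finite.

```python
import numpy as np

def binary_cross_entropy(Y, p_hat, eps=1e-12):
    """Binary cross-entropy for a single prediction, where p_hat estimates P(Y=1 | X=x).
    The probability is clipped away from 0 and 1 to keep the logarithms finite."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    return -Y * np.log(p_hat) - (1 - Y) * np.log(1 - p_hat)

# A confident correct prediction yields a small loss; a confident wrong one a large loss
print(binary_cross_entropy(1, 0.9))  # ≈ 0.105
print(binary_cross_entropy(1, 0.1))  # ≈ 2.303
```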
Training Loss¶
During the training of a model, we aim to minimize the objective L, termed the training loss. This training loss is the average of the loss function over a training set consisting of N pairs (Y_i, x_i): L = \frac{1}{N} \sum^{N}_{i=1} L(Y_i, \hat{Y}(x_i))
Often referred to as the training error and denoted by \overline{err}_{train}, this training loss provides an estimate of how well the model is performing on the training dataset. It's a measure of the error when the model is predicting outcomes for the data it has been trained on.
- In regression tasks, employing a squared error loss function leads to the minimization of the mean squared error (MSE).
- In binary classification tasks, the usage of binary cross-entropy loss leads to the minimization of the binary cross-entropy.
Regression tasks¶
Mean Squared Error (MSE)¶
The mean squared error (MSE) loss calculates the mean of the square differences between the actual and predicted values.
- L = \frac{1}{N} \sum^{N}_{i=1} (Y_i - \hat{Y}_i)^2
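A minimal NumPy sketch of this training loss (the function name mse_loss is illustrative):

```python
import numpy as np

def mse_loss(Y_true, Y_pred):
    """Training loss with squared error: the mean of (Y_i - Y_hat_i)^2 over the training set."""
    Y_true, Y_pred = np.asarray(Y_true, dtype=float), np.asarray(Y_pred, dtype=float)
    return np.mean((Y_true - Y_pred) ** 2)

print(mse_loss([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # (0.25 + 0.0 + 4.0) / 3 ≈ 1.417
```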
Classification tasks¶
Average Binary Cross-Entropy Loss¶
This is the average of the binary cross-entropy losses across all instances in the dataset.
- L = \frac{1}{N} \sum^{N}_{i=1} \left[ -Y_i \log(\hat{p}(x_i)) - (1 - Y_i) \log(1 - \hat{p}(x_i)) \right]
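And the corresponding sketch for the averaged binary cross-entropy (again with an illustrative function name and a small clipping constant to avoid log(0)):

```python
import numpy as np

def bce_loss(Y_true, p_hat, eps=1e-12):
    """Training loss with binary cross-entropy: the average of the per-sample losses."""
    Y_true = np.asarray(Y_true, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    return np.mean(-Y_true * np.log(p_hat) - (1.0 - Y_true) * np.log(1.0 - p_hat))

print(bce_loss([1, 0, 1], [0.8, 0.2, 0.6]))  # ≈ 0.319
```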
In the training phase, with fixed hyper-parameters, the aim is to find the parameter values of the model that minimize the training error. In linear regression, these values can often be found explicitly, but for many models optimization methods such as gradient descent are used to find a minimum, which is typically only a local one. This also usually requires choosing a differentiable loss function.
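To make the optimization step concrete, below is a minimal sketch of gradient descent minimizing the training MSE of a simple linear model \hat{Y} = wx + b on synthetic data; the data, learning rate, and iteration count are illustrative assumptions, not a recipe for real problems.

```python
import numpy as np

# Hypothetical synthetic data: a noisy linear relationship Y ≈ 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
Y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=100)

# Gradient descent on the training MSE of the model Y_hat = w*x + b
w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    Y_hat = w * x + b
    grad_w = np.mean(2.0 * (Y_hat - Y) * x)  # dL/dw for L = mean((Y - Y_hat)^2)
    grad_b = np.mean(2.0 * (Y_hat - Y))      # dL/db
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should end up close to the true values 2 and 1
```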
Once the model is trained, it is crucial to analyze its generalization performance, or how well it performs on unseen data.
Evaluation of Regression & Classification¶
Different measures can be used to evaluate the performance of regression and classification models:
- Mean Squared Error (MSE): MSE calculates the average squared differences between the actual and predicted values. It is sensitive to large errors and outliers.
- \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2
- Root Mean Squared Error (RMSE): RMSE calculates the square root of the MSE, providing an interpretable measure that has the same units as the output. It is also sensitive to large errors and outliers.
- \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}
- Root Mean Squared Logarithmic Error (RMSLE): RMSLE is a variation of RMSE that calculates the square root of the mean squared differences of the logarithm of the actual and predicted values. It is less sensitive to outliers and focuses on relative differences.
- \text{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(Y_i+1) - \log(\hat{Y}_i+1))^2}
These metrics enable a more nuanced understanding of model performance, allowing you to gauge how well your model generalizes to unseen data. Note that the choice of measure depends on the nature of your task and the specific problem you are trying to solve.
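For completeness, here is a minimal NumPy sketch of the three metrics above (the function names are illustrative; RMSLE assumes non-negative targets and predictions):

```python
import numpy as np

def mse(Y_true, Y_pred):
    return np.mean((Y_true - Y_pred) ** 2)

def rmse(Y_true, Y_pred):
    return np.sqrt(mse(Y_true, Y_pred))

def rmsle(Y_true, Y_pred):
    # log1p(z) = log(z + 1); valid only for non-negative targets and predictions
    return np.sqrt(np.mean((np.log1p(Y_true) - np.log1p(Y_pred)) ** 2))

Y_true = np.array([10.0, 100.0, 1000.0])
Y_pred = np.array([12.0, 120.0, 1200.0])
print(mse(Y_true, Y_pred), rmse(Y_true, Y_pred), rmsle(Y_true, Y_pred))
# Every prediction is off by 20 % of its target: RMSLE scores the three errors
# almost identically, while MSE and RMSE are dominated by the largest target.
```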