In this post we continue our discussion of feature selection algorithms. In part 2, we introduce the feature selection algorithm Boruta, covering:
What is Boruta? Why Boruta?
Illustrations of the mechanism of Boruta: the use of permutation and the binomial distribution
Boruta is a feature selection method that is statistically grounded and robust. Its key characteristic is that it combines a permutation method with the binomial distribution to determine a feature’s “usefulness”.
Boruta was originally invented by two Polish researchers at the University of Warsaw: Miron Kursa and Witold Rudnicki.
Below are some of the characteristics that distinguish Boruta from stepwise forward feature selection:
Unlike stepwise forward feature selection, the method is not a greedy algorithm, and thus does not depend on the selection process’s “current state”.
Moreover, Boruta does not make features compete among themselves. Instead, the features’ competitors are the randomized versions of themselves, which are called “shadow features”.
Also, compared to stepwise forward feature selection, Boruta does not aim at finding a minimal subset of features, but finding all relevant features that are robust in predicting the response. Therefore, Boruta helps us better understand the true predictive power of the features.
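In practice you rarely need to implement Boruta yourself: the open-source BorutaPy package implements this all-relevant selection on top of scikit-learn. Below is a minimal sketch of running it; the synthetic dataset and the random-forest settings are illustrative assumptions, not prescribed choices.

```python
# A minimal sketch of running Boruta via the open-source BorutaPy package
# (pip install Boruta). The synthetic dataset and the random-forest settings
# below are illustrative assumptions, not prescribed choices.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from boruta import BorutaPy

# Synthetic data: 10 features, of which only 5 actually drive the response.
X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       noise=10.0, random_state=42)

rf = RandomForestRegressor(n_jobs=-1, max_depth=5, random_state=42)
selector = BorutaPy(rf, n_estimators='auto', alpha=0.01, max_iter=100,
                    random_state=42)
selector.fit(X, y)  # BorutaPy expects numpy arrays

print("confirmed features:", np.where(selector.support_)[0])
print("tentative features:", np.where(selector.support_weak_)[0])
```

Here support_ flags the confirmed (strong) features and support_weak_ the tentative ones, which maps directly onto the acceptance and undecided regions we will see below.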
The following illustrates the mechanism of Boruta:
Suppose we have 3 features: age, height and weight, and we want to use them to predict income.
The Use of the Permutation Method
1. As the first step, Boruta doubles the number of features by making a shadow copy of each of them: each shadow column contains the original feature’s values randomly shuffled, which destroys any real relationship with the response.
2. Boruta then uses an estimator (e.g., XGBoost) to fit these 6 features (3 original + 3 shadow) to the response (income in this example).
3. Boruta calculates the importance of all 6 features.
4. On each trial run, if an original feature’s importance is higher than the highest shadow feature’s importance, it is counted as a “success” (a “hit”).
The central idea of this mechanism is that a feature is useful only if it is capable of doing better than the best randomized feature.
In this one run, we see that age and height scored a higher feature importance (%) than the randomized versions of themselves (Age 39% vs Shadow_age 11%; Height 19% vs Shadow_height 14%). Therefore, in this one trial, the two features age and height scored a “hit” while the feature weight did not.
Suppose we did 20 trial runs with different versions of the randomized features (that is, at each trial the features are shuffled differently to form the shadows) and got the following results:
“Age” beat the best shadow feature 20 out of 20 times, while “height” did so only 4 out of 20 times and “weight” 0 out of 20 times.
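To make these trial runs concrete, here is a from-scratch sketch of the mechanism. It is an illustration rather than the authors’ implementation: a random forest stands in for XGBoost, and the synthetic age/height/weight-to-income data (with weight deliberately made uninformative) is an assumption for the example.

```python
# A from-scratch sketch of Boruta-style trial runs (illustrative only).
# A random forest stands in for XGBoost; the synthetic data is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 65, n)
height = rng.normal(170, 10, n)
weight = rng.normal(70, 12, n)
# Income depends strongly on age, weakly on height, and not at all on weight.
income = 900 * age + 60 * height + rng.normal(0, 4000, n)

X = np.column_stack([age, height, weight])
names = ["age", "height", "weight"]

hits = np.zeros(3, dtype=int)
for trial in range(20):
    # Step 1: shadow features = each original column, independently shuffled.
    shadows = np.apply_along_axis(rng.permutation, 0, X)
    # Steps 2-3: fit an estimator on all 6 features and read off importances.
    model = RandomForestRegressor(n_estimators=100, random_state=trial)
    imp = model.fit(np.hstack([X, shadows]), income).feature_importances_
    # Step 4: a "hit" means beating the best shadow feature in this trial.
    hits += imp[:3] > imp[3:].max()

for name, h in zip(names, hits):
    print(f"{name}: {h}/20 hits")
```

On data like this, age should score a hit in nearly every trial while weight should score almost none, reproducing the kind of 20/4/0 pattern above.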
Now, with the above results, you may ask: how do we then make the decision on which features to keep and which to drop?
At this point, Boruta will use the binomial distribution to help us decide.
The Use of the Binomial Distribution
Think of each run as an independent trial in which the probability of scoring a “success” purely by chance is 50%, like tossing a fair coin. The number of successes over 20 trials then follows a binomial distribution, Binomial(n = 20, p = 0.5), like this:
If we set our probability threshold (alpha in statistics) at 0.01, meaning each tail contains 0.5% of the distribution, we then get the red rejection region, the green acceptance region (strong features), and the blue undecided region (tentative features), as shown in the above graph.
In this example, Boruta suggests dropping weight, keeping age, and leaving height up to us. The binomial distribution tells us it is very unlikely, by random chance alone, for “age” to beat the best shadow 20 out of 20 times (so age is genuinely informative) or for “weight” to beat it 0 out of 20 times (so weight is genuinely uninformative), while height’s 4 hits out of 20 fall in the undecided region.
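We can verify these cutoffs numerically. The sketch below uses scipy to query the Binomial(20, 0.5) distribution; the region boundaries are computed from alpha rather than hard-coded, and the hit counts are the ones from our example.

```python
# Deriving the decision regions from the binomial distribution (a sketch;
# scipy is used here for convenience).
from scipy.stats import binom

n_trials, alpha = 20, 0.01
dist = binom(n_trials, 0.5)

# Smallest hit count inside the acceptance (green) region: P(X >= k) <= alpha/2.
accept_from = min(k for k in range(n_trials + 1) if dist.sf(k - 1) <= alpha / 2)
# Largest hit count inside the rejection (red) region: P(X <= k) <= alpha/2.
reject_upto = max(k for k in range(n_trials + 1) if dist.cdf(k) <= alpha / 2)

for name, h in [("age", 20), ("height", 4), ("weight", 0)]:
    if h >= accept_from:
        verdict = "keep (strong)"
    elif h <= reject_upto:
        verdict = "drop"
    else:
        verdict = "undecided"
    print(f"{name}: {h}/20 hits -> {verdict}")
```

This is exactly the decision rule Boruta applies, trial after trial, to every feature.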