In this post we will discuss two hyperparameter tuning algorithms: Grid Search and Bayesian Optimization with the Tree-structured Parzen Estimator (TPE).
In part 1, we will cover the following topics:
What is Grid Search?
What are the downsides of Grid Search?
Grid Search is a simple and straightforward hyperparameter tuning method that is often taught in introductory machine learning classes.
As the name suggests, grid search evaluates model performance over a user-defined grid. A grid is simply the set of all possible combinations of the user-specified values for each hyperparameter.
For example, suppose your model has two hyperparameters A and B, and you would like to test the values A = [10, 11] and B = [0.1, 0.2]. The grid combinations are then [10, 0.1], [10, 0.2], [11, 0.1], and [11, 0.2].
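As a quick sketch, such a grid can be enumerated with a few lines of Python; the two lists below are just the placeholder values A and B from the example above:

```python
from itertools import product

# Placeholder hyperparameter values from the example above
set_A = [10, 11]
set_B = [0.1, 0.2]

# The grid is simply the Cartesian product of the candidate values
grid = list(product(set_A, set_B))
print(grid)  # [(10, 0.1), (10, 0.2), (11, 0.1), (11, 0.2)]
```

With more hyperparameters, the same product is taken over every candidate list, which is exactly why the grid size multiplies so quickly.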
Grid search is very simple to understand, execute, and parallelize, but it has two major downsides:
Grid search is very computationally expensive. If a model has many hyperparameters and one has no prior knowledge of which ones to tune or what values to test, the computational cost grows exponentially with each hyperparameter added to the tuning process. For example, testing 10 values each for two hyperparameters A and B requires 10^2 = 100 combinations; with four hyperparameters A, B, C, and D, it becomes 10^4 = 10,000 combinations. Unless one has vast computational resources for large-scale parallelization, tuning on a large dataset can take a horrifyingly long time (see the code sketch after the illustration below).
Even if one is willing to spend the computational resources, there is no guarantee that the search will yield good results. For hyperparameters with a wide, continuous range, the optimal combination may lie between or beyond the user-defined grid points. In practice, the user often needs to run multiple rounds over different ranges to narrow down the search space. The illustration below shows how the optimal solution can be missed because of the restriction to a grid structure.
The optimal combination lies between the grid intersections, at the dot located inside the bottom square.
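To make the computational cost concrete, here is a minimal sketch of a grid search using scikit-learn's GridSearchCV. The dataset, the random-forest model, and the candidate values are illustrative assumptions rather than anything from the example above, but they show how quickly the number of model fits multiplies:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# 3 x 3 x 2 = 18 hyperparameter combinations
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,        # 5-fold cross-validation -> 18 * 5 = 90 model fits
    n_jobs=-1,   # embarrassingly parallel: every fit is independent
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Adding a fourth hyperparameter with, say, ten candidate values would multiply those 90 fits by ten, which is exactly the exponential blow-up described above.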
The grid illustration above probably also reminds many readers of an alternative method called Random Search. Since that is also a rather beginner-level hyperparameter tuning method and is not the focus of this post, we will skip its details and go straight to the Bayesian Optimization hyperparameter tuning method in part 2.