Saturday, February 18, 2017

Points to consider while building a Machine Learning model

The objective of this post is to list some pointers to keep in mind while building a Machine Learning model.

  • Always start with the simplest of models. You can increase the complexity if the performance of a simple model is inadequate.
  • Understand your dataset first. 
  • Build a baseline model before building any prediction model (a minimal sketch follows this list). I will expand on this further in another post.
  • Complex models tend to overfit and simpler models tend to underfit. It is your job to find a balance between these two.
  • High bias and low variance - A property of simpler models. Suggests underfitting.
  • High variance and low bias - A property of complex models. Suggests overfitting.
  • In any Machine Learning model, beware if the number of parameters is greater than the number of training examples: such a model can easily overfit. Consider a simpler model with fewer parameters, reduce the number of hidden layers, or make any other change that brings the parameter count down.
  • Always normalize the inputs. Neural Networks train more reliably when the inputs are on a small, consistent scale (for example, scaled to [0, 1] or standardized to zero mean and unit variance); inputs with very large values can cause exploding gradients, i.e. weight updates by very large numbers (see the scaling sketch after this list).
  • Regularization is very important. This is one reason to prefer an XGBoost model over a Random Forest: XGBoost has built-in L1/L2 regularization on its tree weights (see the XGBoost sketch after this list).
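
As a quick illustration of the baseline idea from the list above, here is a minimal sketch using scikit-learn's DummyClassifier on toy data; the synthetic dataset and the "most frequent class" strategy are stand-ins, not recommendations for your specific problem.

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import train_test_split

    # Toy data standing in for your real dataset.
    X, y = make_classification(n_samples=1000, n_features=20,
                               weights=[0.8, 0.2], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Baseline that always predicts the most frequent class;
    # any real prediction model should comfortably beat this score.
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)
    print("Baseline accuracy:", baseline.score(X_test, y_test))

A model that cannot beat a trivial baseline like this is not adding value, no matter how good its absolute score looks.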
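
For input normalization, a minimal sketch with scikit-learn scalers; the toy matrix is a placeholder for your own features, and whether you standardize or scale to [0, 1] depends on the model.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Toy features on very different scales, standing in for real inputs.
    X = np.array([[1000.0, 0.5],
                  [2000.0, 0.1],
                  [1500.0, 0.9]])

    # In a real pipeline, fit the scaler on training data only,
    # then reuse the fitted scaler on validation/test data.
    X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
    X_01 = MinMaxScaler().fit_transform(X)     # each feature squashed into [0, 1]

    print(X_std)
    print(X_01)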
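
On the regularization point, a hedged sketch of an XGBoost classifier with its regularization knobs made explicit: reg_lambda is the L2 penalty, reg_alpha the L1 penalty, and max_depth / learning_rate also limit model complexity. The values below are placeholders to tune, not recommendations.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # reg_lambda (L2) and reg_alpha (L1) penalise large leaf weights;
    # max_depth and learning_rate also keep the ensemble from overfitting.
    model = XGBClassifier(
        n_estimators=200,
        max_depth=4,
        learning_rate=0.1,
        reg_lambda=1.0,
        reg_alpha=0.0,
    )
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))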
Some terms to keep in mind:
  • Stratified Sampling - When the class distribution of the data is heavily skewed, the practice of picking samples such that the resulting training (and test) data keeps the class proportions you need (a sketch follows below).
  • Bootstrapping - Resampling the data with replacement to create multiple training sets, then fitting and evaluating the model on each one to estimate how much its performance varies (a sketch follows below).
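
A minimal sketch of stratified sampling with scikit-learn's train_test_split: the stratify argument keeps the class proportions of an imbalanced label the same in both splits. The toy data is a placeholder for your own.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Imbalanced toy labels: roughly 90% class 0, 10% class 1.
    X, y = make_classification(n_samples=1000, n_features=20,
                               weights=[0.9, 0.1], random_state=42)

    # stratify=y preserves the class ratio in both the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    print("Train class ratio:", np.bincount(y_train) / len(y_train))
    print("Test class ratio: ", np.bincount(y_test) / len(y_test))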
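
And a minimal sketch of bootstrapping: the data is resampled with replacement several times and the model is refit on each resample, so the spread of the scores gives a feel for how stable the model is. The logistic regression and the evaluation on the full dataset are simplifications for illustration only.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import resample

    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    scores = []
    for seed in range(20):
        # Bootstrap sample: same size as the original data, drawn with replacement.
        X_boot, y_boot = resample(X, y, replace=True, random_state=seed)
        model = LogisticRegression(max_iter=1000).fit(X_boot, y_boot)
        scores.append(model.score(X, y))  # simplistic: score on the full dataset

    print("Mean accuracy:", np.mean(scores), "| std dev:", np.std(scores))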
