Wednesday, February 22, 2017

Neural Networks over Random Forest

Many works published in academia show that a neural network can capture underlying patterns more effectively than traditional statistical models. Industry has embraced this as well, with more and more applications leveraging neural networks.

I am not going to reiterate the various reasons why a neural network is better than a statistical model, as I would not be able to do justice to that. Instead, I am going to demonstrate with a simple model how a neural network can outperform a statistical model.

I recently came across a very simple problem on Kaggle: Kobe Bryant Shot Selection. It is one of the simplest problems for any beginner data scientist to cut their teeth on.

I went through a lot of solutions for this problem, which were unsurprisingly filled with Random Forest and XGBoost classifiers. This was expected: XGBoost models have featured in a large share of winning Kaggle solutions, and a decision tree is the most intuitive way to model this particular problem.

I tried many variations of these and was able to climb up to rank 240 using XGBoost models. But these models relied heavily on extensive feature engineering. Coming from a non-sports background, I found it rather difficult to identify the features relevant to the problem. I had to read blogs and other solutions to understand that certain features, like "Is it the last 5 minutes of the game?" or "Is it a home or away match?", are very important for predicting the outcome. This dependence on domain knowledge is a real handicap for someone who lacks it.
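To show how little code such features take once you know to look for them, here is a hypothetical sketch. The column names (`minutes_remaining`, `seconds_remaining`, `matchup`) and the '@'-means-away convention are assumptions about the raw data, not guaranteed to match the actual Kaggle files:

```python
def engineer_features(shot):
    """Derive domain-driven binary features from one raw shot record (a dict).

    Assumed fields: minutes_remaining, seconds_remaining, and a matchup
    string where '@' indicates an away game (hypothetical conventions).
    """
    total_seconds = shot["minutes_remaining"] * 60 + shot["seconds_remaining"]
    return {
        # Is it the last 5 minutes of the game?
        "last_5_min": int(total_seconds <= 5 * 60),
        # Home game if the matchup string does not contain '@'.
        "home_game": int("@" not in shot["matchup"]),
    }

shot = {"minutes_remaining": 2, "seconds_remaining": 30, "matchup": "LAL @ BOS"}
features = engineer_features(shot)  # {'last_5_min': 1, 'home_game': 0}
```

The hard part is not writing these two lines; it is knowing that these are the two features worth writing.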

The next model I built was a simple feed-forward neural network with one hidden layer. The input dimension was 197, the hidden layer dimension was 30, and the output was a single sigmoid neuron, optimized on binary cross-entropy.
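To make the architecture concrete, here is a minimal numpy sketch of one forward pass through that shape (197 → 30 → 1 sigmoid) and the binary cross-entropy loss. The weights are random placeholders for illustration; the actual model was presumably trained with a framework such as Keras:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized weights for a 197 -> 30 -> 1 network (illustrative only).
W1 = rng.normal(0.0, 0.1, (197, 30))  # input -> hidden
b1 = np.zeros(30)
W2 = rng.normal(0.0, 0.1, (30, 1))    # hidden -> single output neuron
b2 = np.zeros(1)

def forward(x):
    """One forward pass: sigmoid hidden layer, sigmoid output probability."""
    h = sigmoid(x @ W1 + b1)
    return sigmoid(h @ W2 + b2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """The loss the network is optimized on; eps guards against log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

x = rng.normal(size=(4, 197))                 # a batch of 4 fake feature vectors
p = forward(x)                                # predicted shot-made probabilities
loss = binary_cross_entropy(np.array([[1.0], [0.0], [1.0], [0.0]]), p)
```

Training would then update `W1, b1, W2, b2` by gradient descent on this loss.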



Such a simple model, with practically no feature engineering at all, was able to put me in 25th position on the public leaderboard.

This leads me to one of the main advantages of neural networks over statistical models: you need far less domain knowledge to build a competitive model. Although the winning solution implements an XGBoost model, I am sure it involves a lot of feature engineering, which is a time-consuming task.

Some of the points to keep in mind while building a neural network are:

  1. Always normalize the inputs. Neural networks train most reliably when inputs are on a similar, small scale (for example, between 0 and 1). Inputs with large values can produce large gradients and correspondingly large, unstable weight updates (exploding gradients), which slows or prevents convergence.
  2. Keep the number of parameters in the model significantly smaller than the number of training examples. If the parameter count is greater, the model is likely to overfit.
  3. Choose output neurons and a loss that align with the evaluation metric. For example, this competition was evaluated on log loss, so it makes sense to use a sigmoid output trained on binary cross-entropy.
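Point 1 takes only a few lines, for example with min-max scaling. This is a sketch; a real pipeline would compute the minima and maxima on the training set only and reuse them for the test set:

```python
import numpy as np

def min_max_scale(X, eps=1e-12):
    """Scale each column of X into the [0, 1] range.

    eps avoids division by zero for constant columns. In practice, fit
    lo/hi on the training data and apply them unchanged to new data.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo + eps)

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])
X_scaled = min_max_scale(X)  # every value now lies in [0, 1]
```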
