Machine Learning and Data Science: Neural Networks over Random Forest

Wednesday, February 22, 2017

Neural Networks over Random Forest

Many published academic works argue that neural networks can capture the underlying patterns in data more effectively than statistical models. Industry has embraced this as well, with more and more applications built on neural networks.

I am not going to reiterate the various reasons why a neural network is better than statistical models, as I would not be able to do them justice. Instead, I am going to use a simple model to demonstrate how a neural network can outperform statistical models.

I recently came across a very simple problem on Kaggle: Kobe Bryant Shot Selection. It is one of the simplest problems for a beginner data scientist to cut their teeth on.

I went through many of the published solutions for this problem, and, unsurprisingly, they were dominated by Random Forests and XGBoost classifiers. This was to be expected: XGBoost models have a strong track record on Kaggle, reportedly winning around 70% of competitions, and a decision tree is arguably the most intuitive way to model this particular problem.

I tried many variations of these models and was able to climb up to rank 240 using XGBoost. However, these models relied heavily on extensive feature engineering. Coming from a non-sports background, I found it rather difficult to identify which features would be relevant to the problem. I had to read blogs and other solutions to learn that features such as "Is it the last 5 minutes of the game?" or "Is it a home or away match?" are very important for predicting the outcome. This need for domain knowledge is a real handicap for anyone who lacks it.
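To make the kind of hand-crafted features mentioned above concrete, here is a minimal sketch of how they might be derived from a single shot record. The column names `period`, `minutes_remaining`, `seconds_remaining` and `matchup` are assumed to follow the Kaggle CSV, and the helper functions are hypothetical illustrations, not code from any published solution.

```python
# Hypothetical hand-crafted features of the kind described above.
# Assumed columns: period, minutes_remaining, seconds_remaining, matchup.

def seconds_left_in_period(shot):
    """Seconds remaining on the clock in the current period."""
    return shot["minutes_remaining"] * 60 + shot["seconds_remaining"]

def is_last_five_minutes(shot):
    """True for shots in the final 5 minutes of the 4th quarter or in overtime.

    Assumption: periods 1-4 are quarters and 5+ are (5-minute) overtime periods.
    """
    return shot["period"] >= 4 and seconds_left_in_period(shot) <= 5 * 60

def is_home_game(shot):
    """Assumes matchup strings look like 'LAL vs. POR' (home) or 'LAL @ POR' (away)."""
    return "@" not in shot["matchup"]

def engineer(shot):
    """Return the two binary features for one shot record."""
    return {
        "last_5_min": int(is_last_five_minutes(shot)),
        "home": int(is_home_game(shot)),
    }

shots = [
    {"period": 4, "minutes_remaining": 2, "seconds_remaining": 30, "matchup": "LAL vs. POR"},
    {"period": 2, "minutes_remaining": 1, "seconds_remaining": 10, "matchup": "LAL @ POR"},
]
for s in shots:
    print(engineer(s))
# {'last_5_min': 1, 'home': 1}
# {'last_5_min': 0, 'home': 0}
```

Each such rule encodes a piece of basketball knowledge, which is exactly the part that is hard to come up with without a sports background.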

The next model I built was a simple feed-forward neural network with one hidden layer. The input dimension was 197, the hidden layer had 30 units, and the output was a single sigmoid neuron, trained to minimize binary cross-entropy.
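An architecture like the one above can be sketched in plain NumPy. This is not the code used for the submission: the hidden-layer activation (ReLU here), the weight initialization, full-batch gradient descent with a learning rate of 0.1, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np

# Assumptions (not from the post): ReLU hidden layer, He init for W1,
# LeCun-style init for W2, full-batch gradient descent, random stand-in data.
N_IN, N_HIDDEN = 197, 30
rng = np.random.default_rng(0)

def init_params(n_in=N_IN, n_hidden=N_HIDDEN):
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def count_params(p):
    return sum(v.size for v in p.values())

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(p, X):
    h_pre = X @ p["W1"] + p["b1"]
    h = np.maximum(h_pre, 0.0)                       # hidden layer (ReLU)
    y_hat = sigmoid(h @ p["W2"] + p["b2"]).ravel()   # one sigmoid output per row
    return h_pre, h, y_hat

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def train_step(p, X, y, lr=0.1):
    """One gradient-descent step; returns the loss measured before the update."""
    h_pre, h, y_hat = forward(p, X)
    # For sigmoid + BCE, the gradient w.r.t. the output logit is (y_hat - y) / n.
    d_logit = ((y_hat - y) / X.shape[0])[:, None]    # shape (n, 1)
    d_h = (d_logit @ p["W2"].T) * (h_pre > 0)        # backprop through ReLU
    grads = {
        "W2": h.T @ d_logit, "b2": d_logit.sum(axis=0),
        "W1": X.T @ d_h, "b1": d_h.sum(axis=0),
    }
    for name, g in grads.items():
        p[name] -= lr * g
    return binary_cross_entropy(y, y_hat)

# Stand-in data: 1,000 rows of 197 features already scaled to [0, 1].
X = rng.random((1000, N_IN))
score = X @ rng.normal(size=N_IN)
y = (score > np.median(score)).astype(float)

params = init_params()
print(count_params(params))   # 5971
losses = [train_step(params, X, y) for _ in range(300)]
print(f"BCE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

With 197 inputs and 30 hidden units, the network has 5,971 trainable weights and biases (197·30 + 30 in the hidden layer, 30 + 1 in the output). In practice a framework such as Keras or PyTorch would handle the gradients; the hand-written backward pass is here only to show what "one hidden layer, sigmoid output, binary cross-entropy" amounts to.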

Such a simple model, with practically no feature engineering at all, put me in 25th position on the public leaderboard.

This leads me to one of the main advantages of neural networks over statistical models: you need far less domain knowledge to build a competitive model. Although the winning solution uses an XGBoost model, I suspect it involves a lot of feature engineering, which is a time-consuming task.

Some of the points to keep in mind while building a neural network are:

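Always normalize the inputs. Neural networks train most reliably when features are on small, comparable scales, for example rescaled to [0, 1] or standardized to zero mean and unit variance. Large, unscaled inputs produce large activations and gradients, which can cause oversized, unstable weight updates (exploding gradients) and slow down or derail training.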
As a rule of thumb, the number of model parameters should be well below the number of training examples. A model with more parameters than examples is at much greater risk of overfitting unless it is regularized.
Use output neurons that match the objective of the problem. In this case, the model was evaluated on log loss, which scores predicted probabilities, so a sigmoid output neuron trained with binary cross-entropy (the same formula as log loss) makes sense.
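The checklist above can be illustrated with a few small NumPy helpers. These are hypothetical sketches under stated assumptions: scaling statistics are computed on the training split only and then reused on any other split, and the parameter count assumes fully connected layers with one bias per unit.

```python
import numpy as np

def fit_min_max(X_train):
    """Return a function that rescales each column to [0, 1] using training min/max."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns: avoid division by zero
    return lambda X: (X - lo) / span

def fit_standardize(X_train):
    """Return a function that maps each column to zero mean and unit variance."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    return lambda X: (X - mu) / sd

def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

def log_loss(y_true, p, eps=1e-15):
    """Mean binary log loss, identical to binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

X_train = np.array([[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]])
scale = fit_min_max(X_train)
print(scale(X_train))                                        # each column now spans [0, 1]
print(dense_param_count([197, 30, 1]))                       # 5971
print(log_loss(np.array([1.0, 0.0]), np.array([0.5, 0.5])))  # ln 2, about 0.693
```

For a 197-30-1 network, `dense_param_count` gives 5,971, a number that can be compared directly with the count of labelled shots in the training set. The log-loss example also shows why a sigmoid output fits this metric: an uninformative prediction of 0.5 costs ln 2 per shot, and only calibrated probabilities do better.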
