There has been many works published in the academia that prove that a neural network has a higher capability of capturing the underlying pattern much more effectively than statistical models. This fact is embraced by the industry as well where more and more applications leverage the use of Neural Networks.
I am not going to reiterate on the various reasons why a Neural Network is better than statistical models, as I would not be able to do justice to that. I am going to demonstrate using a simple model how a neural network outperforms statistical model.
I recently came across a very simple problem on Kaggle, the Kobe's Shot Selection - Kobe Bryant Shot Selection. It is one of the simplest problem for any beginner data scientist to cut his teeth on.
I went through a lot of solutions for this problem, which were unsurprisingly filled with Random Forests and XGBoost Classifiers. This was as expected, as XGBoost models are a proven winner in a 70% of Kaggle competitions and also, a Decision Tree would be the most intuitive model to model this particular problem.
I tried many variations of the same and was able to climb upto rank 240 using the XGBoost models. But these models relied heavily on extensive feature engineering. Me, personally, being from a non sports background, it was rather difficult to identify these features which would be relevant to the problem. I had to read blogs and other solutions to understand that certain features like "Is it the last 5 mins of the game?" or "Is it a home or away match?", are very important in order to predict the outcome. This requirement of domain knowledge is a shot in the knee for someone with limited domain knowledge.
The next model I built, was a simple Feed Forward Neural Network with one hidden layer. The input dimension was 197, the hidden layer dimension was 30 and the output was a single sigmoid neuron, optimizing on binary cross entropy.
Such a simple model with practically no feature engineering at all was able to put me in the 25th position on the public leaderboard, Public Leaderboard Ranking.
This leads me to one of the main advantages of neural networks over statistical models. You do not need any domain knowledge in order to build a model. Although the winning solution implements an XGBoost model, I am sure it involves a lot of feature engineering, which is a time consuming task.
Some of the points to keep in mind while building a neural network are:
I am not going to reiterate on the various reasons why a Neural Network is better than statistical models, as I would not be able to do justice to that. I am going to demonstrate using a simple model how a neural network outperforms statistical model.
I recently came across a very simple problem on Kaggle, the Kobe's Shot Selection - Kobe Bryant Shot Selection. It is one of the simplest problem for any beginner data scientist to cut his teeth on.
I went through a lot of solutions for this problem, which were unsurprisingly filled with Random Forests and XGBoost Classifiers. This was as expected, as XGBoost models are a proven winner in a 70% of Kaggle competitions and also, a Decision Tree would be the most intuitive model to model this particular problem.
I tried many variations of the same and was able to climb upto rank 240 using the XGBoost models. But these models relied heavily on extensive feature engineering. Me, personally, being from a non sports background, it was rather difficult to identify these features which would be relevant to the problem. I had to read blogs and other solutions to understand that certain features like "Is it the last 5 mins of the game?" or "Is it a home or away match?", are very important in order to predict the outcome. This requirement of domain knowledge is a shot in the knee for someone with limited domain knowledge.
The next model I built, was a simple Feed Forward Neural Network with one hidden layer. The input dimension was 197, the hidden layer dimension was 30 and the output was a single sigmoid neuron, optimizing on binary cross entropy.
Such a simple model with practically no feature engineering at all was able to put me in the 25th position on the public leaderboard, Public Leaderboard Ranking.
This leads me to one of the main advantages of neural networks over statistical models. You do not need any domain knowledge in order to build a model. Although the winning solution implements an XGBoost model, I am sure it involves a lot of feature engineering, which is a time consuming task.
Some of the points to keep in mind while building a neural network are:
- Always normalize the inputs. Neural Networks are optimized for working on numbers between 0 and 1. Any number greater than 1 leads to explosive gradient descent, which involves weight updates by large numbers.
- The number of parameters of the model should be significantly less than the number of training examples. If it is greater, it will lead to over fitting.
- Use neurons which align to the objective of the problem. For example, in this case, the model was being evaluated on the logloss and hence it makes sense to use sigmoid neurons.
This is an awesome post.Really very informative and creative contents. These concept is a good way to enhance the knowledge.I like it and help me to development very well.Thank you for this brief explanation and very nice information.Well, got a good knowledge.
ReplyDeleteData science training in Marathahalli|
Data science training in Bangalore|
Hadoop Training in Marathahalli|
Hadoop Training in Bangalore|
Hi,
ReplyDeleteGood job & thank you very much for the new information, i learned something new. Very well written. It was sooo good to read and usefull to improve knowledge. Who want to learn this information most helpful. One who wanted to learn this technology IT employees will always suggest you take python training in bangalore. Because python course in Bangalore is one of the best that one can do while choosing the course.
thank you sharing this blog, it is very useful understanding the machine learning.
ReplyDeleteMachine learning training bangalore
The article is really useful to the viewers. systematically it is solving the queries.
ReplyDeleteData Science Training Course In Chennai | Data Science Training Course In Anna Nagar | Data Science Training Course In OMR | Data Science Training Course In Porur | Data Science Training Course In Tambaram | Data Science Training Course In Velachery
Easy to understand.Thanks for giving such an awesome post.
ReplyDeleteJava training in Chennai
Java training in Bangalore
Java training in Hyderabad
Java Training in Coimbatore
Java Online Training
Really awesome blog!!! I finally found great post here.I really enjoyed reading this article. Nice article on data science . Thanks for sharing your innovative ideas to our vision. your writing style is simply awesome with useful information. Very informative, Excellent work! I will get back here.
ReplyDeleteData Science Course
Data Science Training in Chennai
Data Science Training in Velachery
Data Science Training in Tambaram
Data Science Training in Porur
Data Science Training in Omr
Data Science Training in Annanagar
Thanks for a very informative blog!
ReplyDeletedata science training in chennai
ccna training in chennai
iot training in chennai
cyber security training in chennai
ethical hacking training in chennai
Wow! Such an amazing and helpful post this is. I really really love it. It's so good and so awesome. I am just amazed. I hope that you continue to do your work like this in the future also
ReplyDeleteSEO Cheltenham
SEO Agency Gloucester
Best SEO Company Gloucester
Best Web Design Company Cheltenham
Join the Best Machine Learning Course in Hyderabad program at AI Patasala to become an early leader on this booming platform.
ReplyDeleteAI Patasala Machine Learning Course
Good site! I really love how it is simple on my eyes and the data are well written. I am wondering how I might be notified whenever a new post has been made. I have subscribed to your RSS which must do the trick! Have a great day! data science from scratch
ReplyDeletePleasant data, important and incredible structure, as offer great stuff with smart thoughts and ideas, loads of extraordinary data and motivation, the two of which I need, because of offer such an accommodating data here.
ReplyDeletebusiness analytics training in hyderabad
Wow! Such an amazing and helpful post this is. This clear the topic of random forest and neural network. Thanks for sharing this.
ReplyDeleteSEO Cheltenham
Web Design Cheltenham
Digital Marketing Agency Cheltenham