This is a site for data science aspirants: people who are passionate about data science and who want to start a career in it from the beginning.

Monday, 7 November 2016

Job of a Data Scientist

By 22:52
What do data scientists do? If you have this question, here are the answers. A data scientist uses past data to make predictions and to answer questions that cannot be answered by ordinary techniques.
Data scientists have their own style and steps for solving problems and answering questions. Let us look at what exactly a data scientist does.


Define the question
Every field has problems to solve, and data science is no different. When you have a question to answer, you first need to define it: what challenges are you facing, and which part of the problem do you have to solve? It is not a good idea to travel without knowing your destination, so define your question before you try to answer it.

Define the ideal data set
After defining the question, it is time to define the ideal data set for answering it. A professional data scientist often relies on intuition here, because the data you choose plays a major role in the outcome of your results. So define the data set you are going to work with.

Obtain the data
Once we have defined our data set, we have to obtain it. Data is growing larger day by day and has become very cheap. We can get data from many sources: data created by humans through their actions (for example, on social media), data created by industries, data created by machines such as ATMs, and many more. You should be clear about where you will obtain your ideal data set.

Clean the data
We cannot start making jewelry from raw gold. We need to clean it and extract the pure gold, and only then can we start making beautiful jewelry. The same applies to the data you have obtained. You cannot proceed without cleaning it: raw data contains material that you don't need, so you must extract the data that you actually need to process.
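To make the cleaning step concrete, here is a minimal sketch in plain Python. The records, the field names (`customer`, `amount`) and the `clean` helper are all made up for illustration; real cleaning rules depend entirely on your own data set.

```python
# Hypothetical raw records: some rows are incomplete, duplicated, or malformed.
raw_rows = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": ""},          # missing amount
    {"customer": "Alice", "amount": "120.50"},  # duplicate once whitespace is trimmed
    {"customer": "Carol", "amount": "abc"},     # amount is not a number
    {"customer": "Dave", "amount": "75"},
]


def clean(rows):
    """Keep only complete, numeric, non-duplicate rows (illustrative rules)."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row.get("customer", "").strip()
        amount_text = row.get("amount", "").strip()
        if not name or not amount_text:
            continue  # drop rows with missing fields
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        key = (name, amount)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Only the rows for Alice and Dave survive here: the rest is the "impurity" that has to be removed before any analysis.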

Exploratory data analysis
An exploratory data analysis builds on a descriptive analysis by searching for discoveries, trends, correlations, or relationships between the measurements of multiple variables to generate ideas or hypotheses.
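As a small illustration of looking for a relationship between two variables, the sketch below computes a Pearson correlation coefficient in plain Python. The `pearson` helper and the two sample lists are assumptions made up for this example, not part of any particular library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up measurements of two variables.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

r = pearson(hours_studied, exam_score)
print(round(r, 3))
```

A value close to 1 suggests a strong positive relationship. That is a hypothesis worth testing further, not a proof of cause and effect.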

Statistical prediction/modeling
Statistical modeling is the traditional way to analyse a problem. You build a statistical model or prediction, that is, an intuition about what is going to happen in the next sample you might take.
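One of the simplest statistical models is a straight line fitted by least squares. The sketch below uses made-up monthly sales figures and a hypothetical `fit_line` helper to predict the next sample. It is only an illustration of the idea, not a recommended modeling workflow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up past data: month number and sales in that month.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# Predict the next sample we might take (month 6).
prediction = slope * 6 + intercept
print(prediction)
```

Real data is rarely this tidy, so in practice you would also check how well the line fits before trusting the prediction.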

Interpret results
Interpret your results and challenge them. Then synthesize them and write them up in reproducible ways that can be shared with other people.


Data products
Finally, we're going to talk about distributing results through things like interactive graphics, through write-ups and presentations, and through interactive apps built on top of R or Python, whichever you are comfortable with.




Sunday, 6 November 2016

Data Scientist vs Data Analyst: A common question for every beginner

By 18:33
With the immense amount of data created every day from different sources, the data industry is booming, and data science has become an interesting topic. I have seen many people ask how the different fields differ, and in particular: what is the difference between a data scientist and a data analyst?

Before discussing the differences, let us discuss data. What is data? Data is information or knowledge; for example, this article itself is data. Next we should ask who generates data. Data is mainly created by three sources: humans, industries, and machines.
Humans create data through their actions on social media and in many other places, organisations create data as well, and finally data is created by machines.

Now it is time to talk about the data scientist and the data analyst. First, the data analyst: in the picture below, the data analyst is pointing to a representation. He breaks a large problem into small pieces so that it is easier to understand, and he presents solutions through visualizations such as bar charts and pie charts, based on what has happened so far. A data scientist, by contrast, looks at the problem from a business point of view and does predictive analysis to find out what is going to happen in the future.

Let us look at the skill set and team structure of a data scientist. In the following picture we can see that a data scientist has a team made up of data analysts, software engineers, and domain skill experts such as a Java, R, or Scala programmer, whereas a data analyst has a team of software engineers and a data warehousing specialist. A data analyst and a data scientist have different roles under them, as you can see in the following picture.
The technical skills of a data scientist and of a data analyst are given below.

So we have discussed the major differences between a data scientist and a data analyst, and the skill sets of each field.








Saturday, 5 November 2016

Prerequisites for a data scientist

By 23:12
Want to become a data scientist? Then you must know the prerequisites before you start cranking. Because data scientist is the sexiest job of the 21st century, many people want to start a career as one. You might have heard that starting a career in data science requires expert skills in various domains, but the truth is that anyone with good basic knowledge of maths, statistics, and programming, plus communication skills, can start a career in data science.
It is best to know the basics of mathematical concepts such as linear algebra, calculus, probability, and statistics; these are a must for learning data science. If you are passionate about data science, these things are easy to learn.
Once you have the basic knowledge discussed above, you should also have a basic knowledge of the following tools:


Hadoop: It is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. No data scientist can escape learning this tool, because data scientists have to work with huge data sets, which is very difficult with ordinary storage systems.
Hive: It allows SQL-like queries (HiveQL) on data sets stored in a Hadoop cluster. In other words, Hadoop by itself does not support everything, so it needs supporting tools too.
Mahout: It aims to build an environment for quickly creating scalable, performant machine learning applications. Machine learning is a trending technology that is very helpful in many industries, and such applications can be created using Mahout. Your knowledge of linear algebra and calculus plays a major role in machine learning.
Spark: It is a fast and general engine for large-scale data processing. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells.
Storm: It is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
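The "simple programming models" mentioned for Hadoop usually means MapReduce. To give a feel for it, here is a single-machine sketch of a word count in plain Python. It does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample lines are made up for illustration. On a real cluster, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Made-up input; on a cluster each line could live on a different machine.
lines = ["data science needs data", "big data needs tools"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

The same split into a map step and a reduce step is what lets the work be spread over thousands of machines.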
So we have discussed all the major prerequisites for a data scientist. If you are passionate about data science, all these tools and skills are easy to learn, and then you can work with huge data to make predictions.


Here is my YouTube channel. Please follow and subscribe; I'll bring you more video tutorials and articles to help you learn data science on your own from the internet, by showing you the right material to learn from.

Thanks, guys. See you in the next article.




Friday, 4 November 2016

The Modern data scientist

By 08:55
Data scientists are in very high demand, and there is not enough talent to fill the jobs. Do you know why? Because the sexiest job of the 21st century requires a mixture of skills from different domains: multidisciplinary skills at the intersection of mathematics, linear algebra, statistics, computer science, data visualization, and business. Finding a data scientist is very hard, and finding people who understand who a data scientist is, is equally hard. “Being a data scientist is not only about data crunching. It’s about understanding the business challenge, finding out predictions of the future by modeling the data, and communicating their findings to the business. Simply put, you should be good at playing with data and passionate about huge data. Someone who has only learned the skills needed for data science should not yet call themselves a data scientist, since the title implies doctorate-level expertise in working with huge data. But that doesn't mean you cannot become a data scientist: if you are passionate about data science, things will automatically come into your hands.”
Jean-Paul Isson, Monster Worldwide, Inc. says, "It is very likely that you will not be able to hire a data science soloist who can solve all your data problems." The modern data team should be equipped with the skill set presented. In the picture above we can see the skills of the modern data scientist.