my net house


Monthly Archives: December 2016

Top view of Data Science and machine learning

As a data scientist, you should not limit yourself to the training you happen to have received. You have to think further ahead about the many things around you. Think of yourself as a great thinker: how many variables are possible in the system, and how many of those variables can really affect the system, and at which level?

Let’s look at the top three questions:

What is a Data Scientist?

Why does it appear to be such a hard job?

What skill sets does a data scientist need?

The first thing to keep in mind is big data. Organizations collect a lot of data from various sources (mostly clients and their activities), but that data is HUGE and they often have no idea how to organize it. A data scientist’s job is to find meaning in that data.

What skills should one look for in a data scientist?

To fill a data scientist’s job, one can hire a PhD, a mathematician or a statistician, or one can grow a data scientist within the organization.

What is the fundamental job of a data scientist?

A data scientist is someone who makes new discoveries from data.

That’s what scientists do: a scientist first makes a hypothesis and then investigates that hypothesis under various conditions. A data scientist does the same thing, just with data!

Data scientists look for meaning and knowledge in data, and they do that in a couple of different ways.

Visualization of Data:

One way is data visualization. Data scientists visualize data in various forms and look for meaning in it; this overlaps with what business intelligence or a data analyst might do.

Using Algorithms:

Another way is to use advanced algorithms that run through the data looking for meaning. These include machine-learning algorithms such as neural networks, SVMs, regression algorithms or k-means. There are dozens of such algorithms that search data for meaning, and they are one of the fundamental tools of a data scientist.

So to use those algorithms, a data scientist must have knowledge of mathematics, statistics and computer science.

So how does a data scientist’s work get started?

A data scientist is given a large dataset (or maybe a small one) along with a question.

Something like: which customers are likely to return?

Or: which customers are likely to buy on weekends?

Or: how many families buy sweets/fruits during festivals? (You can also find the income range of those families.) That is a classic statistics problem.

Now it is the data scientist’s job to run various algorithms on the data and look for the answer. Here is one simple thing to think about: how and why would a specific algorithm work on this data? If we have a basic or general knowledge of algorithms, we can tell whether a given algorithm can really answer such a question.

So data scientists go through various algorithms until they find some pattern in the data that answers the question.

The same applies to any trading strategy: in our research, we have to find out why a specific algorithm would work. Another example question is how a data scientist can improve the recommendations of a recommendation engine.

Netflix held a competition in which it would pay a million dollars to anyone who could improve its recommendation algorithm by 10%.

A team of data scientists actually came up with an algorithm that did that.

So, again, we can say that data scientists are people who answer questions, using data, or a combination of data and algorithms, to answer them.

When you have a large dataset with many categories and columns, you have to rely on various algorithms, so fundamental knowledge of those algorithms is what a data scientist needs.

What is a data scientist not?

There are various myths about data scientists. A data scientist is not simply a Java programmer who knows Hadoop. Many people bill themselves as data scientists because they have such technical skills, but they are not data scientists unless they know data-discovery techniques and how algorithms work on that data!!

What is the difference between a data scientist and a data analyst?

We should also not confuse a data analyst or business analyst with a data scientist. A data analyst creates reports, graphs and dashboards based on data, and those reports reflect the analyst’s own judgment of what is “important” to show or consider.

A data scientist is the one who hypothesizes what is important and then tries to prove that hypothesis.

It is great for one person to have both programming skills and business domain knowledge, but the most important thing is a fundamental knowledge of algorithms, mathematics and statistics. That is one reason it is a bit difficult to find a data scientist: the job needs a unique combination of skills.

Python for text processing

Python is about ‘programming like a hacker’. While writing your code, if you keep things in mind like reference counting, type checking, data manipulation, using stacks, managing variables, avoiding unnecessary lists, and using fewer and fewer explicit “for” loops, you can get great-looking code that runs fast and uses fewer CPU resources.

Slower than C:

Yes, Python is slower than C, but you really need to ask yourself what “fast” means and what you really want to do. There are several ways to write Fibonacci in Python. The most popular one uses a ‘for loop’, only because most programmers coming from a C background use lots and lots of for loops for iteration. Python has for loops as well, but if you can avoid them by using the internal loops provided by Python data structures and libraries like NumPy for array handling, you will have a win-win situation most of the time. 🙂

Now let’s go through some Python tricks that are super cool if you spend most of your working time manipulating, filtering, extracting and parsing data.

Python has many built-in text processing methods:

>>> m = ['i am amazing in all the ways I should have']

>>> m[0]

'i am amazing in all the ways I should have'

>>> m[0].split()

['i', 'am', 'amazing', 'in', 'all', 'the', 'ways', 'I', 'should', 'have']

>>> n = m[0].split()

>>> n[2:]

['amazing', 'in', 'all', 'the', 'ways', 'I', 'should', 'have']

>>> n[0:2]

['i', 'am']

>>> n[-2]

'should'

>>> n[:-2]

['i', 'am', 'amazing', 'in', 'all', 'the', 'ways', 'I']

>>> n[::-2]

['have', 'I', 'the', 'in', 'am']

Those examples use list indexing and slicing to manipulate strings. Yeah, no for loops.
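The same loop-free style extends beyond slicing. As a small sketch using an example sentence of my own, the block below chains a few standard str methods (strip, lower, split, join and replace) to clean and reshape text.

```python
# A few built-in str methods that transform text without explicit loops.
# The sentence is an illustrative example, padded with spaces on purpose.
sentence = "  I am amazing in all the ways I should have  "

cleaned = sentence.strip()                       # drop leading/trailing whitespace
lowered = cleaned.lower()                        # normalize case
words = lowered.split()                          # split on runs of whitespace
rejoined = "-".join(words)                       # glue the words back with a separator
replaced = cleaned.replace("amazing", "great")   # substring substitution

print(words)
print(rejoined)    # i-am-amazing-in-all-the-ways-i-should-have
print(replaced)    # I am great in all the ways I should have
```

Because split() with no argument splits on any run of whitespace, the padding and double spaces never produce empty strings in the word list.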

Interesting parts of the collections module:

Now let’s talk about collections.

Counter is my personal favorite.

It helps when you have to go through ‘BIG’ lists and see how often each item actually occurs:

>>> from collections import Counter

>>> Counter(range(10))

Counter({0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1})

>>> just_list_again = Counter(range(10))

>>> just_list_again_is_dict = just_list_again  # same object; a Counter is a dict subclass

>>> just_list_again_is_dict[1]

1

>>> just_list_again_is_dict[2]

1

>>> just_list_again_is_dict[3]

1

>>> just_list_again_is_dict['3']  # missing keys return 0 instead of raising KeyError

0

Some other Counter methods:

>>> c1=Counter('abraakadabraaaaa')

>>> c1

Counter({'a': 10, 'r': 2, 'b': 2, 'k': 1, 'd': 1})

>>> c1.most_common(4)

[('a', 10), ('r', 2), ('b', 2), ('k', 1)]

>>> c1['b']

2

>>> c1['b'] # works as a dictionary

2

>>> c1['k'] # works as a dictionary

1

>>> type(c1)

<class 'collections.Counter'>

>>> c1['b'] = 20

>>> c1.most_common(4)

[('b', 20), ('a', 10), ('r', 2), ('k', 1)]

>>> c1['b'] += 20

>>> c1.most_common(4)

[('b', 40), ('a', 10), ('r', 2), ('k', 1)]

Arithmetic and unary operations:

>>> from collections import Counter

>>> c1=Counter('hello hihi hoo')

>>> +c1

Counter({'h': 4, 'o': 3, ' ': 2, 'i': 2, 'l': 2, 'e': 1})

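>>> -c1  # unary minus keeps only counts that end up positive (Python 3.3+)

Counter()

>>> c1['x']  # missing keys count as zero

0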

Counter is like a dictionary, but it also keeps a count of every item you are looking for, so you can easily plot those counts on graphs.
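The session above only shows the unary operators. As a hedged sketch with two made-up snippets of text standing in for large documents, the block below counts words, then combines the counters with the binary operators +, -, & and |. The final zip step only prepares label/count pairs; which plotting library you hand them to is up to you.

```python
from collections import Counter

# Two hypothetical snippets of text standing in for large documents.
monday = "the cat sat on the mat and the dog sat on the log"
tuesday = "the dog ran and the cat sat"

mon = Counter(monday.split())   # word -> number of occurrences
tue = Counter(tuesday.split())

print(mon.most_common(1))       # [('the', 4)]
print(mon['elephant'])          # 0: absent words count as zero, no KeyError

# Binary arithmetic works key by key.
both_days = mon + tue              # add counts
only_more_on_monday = mon - tue    # subtract, keeping only positive results
overlap = mon & tue                # minimum of each count
either = mon | tue                 # maximum of each count

# Labels and counts as two tuples, e.g. for the x and y values of a bar chart.
labels, counts = zip(*both_days.most_common(2))
print(labels, counts)           # ('the', 'sat') (6, 3)
```

Subtraction and intersection silently drop words whose result is zero or negative, which is usually what you want when comparing word frequencies.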


OrderedDict: it keeps your chunks of data in a meaningful order.

>>> from collections import OrderedDict
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> new_d = OrderedDict(sorted(d.items()))
>>> new_d
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> for key in new_d:
...     print(key, new_d[key])
apple 4
banana 3
orange 2
pear 1
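Sorting by key is only one option. The sketch below, reusing the same example dictionary, orders entries by value instead and then uses two OrderedDict-specific methods, move_to_end and popitem(last=False). Since Python 3.7 plain dicts also keep insertion order, so these reordering methods are the main reason to still reach for OrderedDict.

```python
from collections import OrderedDict

d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

# Order by value (largest count first) instead of by key.
by_count = OrderedDict(sorted(d.items(), key=lambda item: item[1], reverse=True))
print(list(by_count))            # ['apple', 'banana', 'orange', 'pear']

# move_to_end repositions an existing key; last=False moves it to the front.
by_count.move_to_end('pear', last=False)
print(list(by_count))            # ['pear', 'apple', 'banana', 'orange']

# popitem(last=False) removes entries from the front (FIFO order).
first_key, first_value = by_count.popitem(last=False)
print(first_key, first_value)    # pear 1
```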


namedtuple: Think of it this way. You need to save each line of your CSV into a list of lines, but you also need to take care of memory, and you want to access each field of a line by name, as you would with a dictionary. This comes up whenever you fetch lines from Excel or CSV documents in a data-processing environment.

from collections import namedtuple

# The primitive approach: fields accessed by position
lat_lng = (37.78, -122.40)
print('The latitude is %f' % lat_lng[0])
print('The longitude is %f' % lat_lng[1])

# The glorious namedtuple: fields accessed by name
LatLng = namedtuple('LatLng', ['latitude', 'longitude'])
lat_lng = LatLng(37.78, -122.40)
print('The latitude is %f' % lat_lng.latitude)
print('The longitude is %f' % lat_lng.longitude)
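To connect namedtuple back to the CSV scenario described above, here is a minimal sketch. It assumes a small in-memory CSV (the cities and coordinates are illustrative) and builds the record type from the header row. namedtuple instances carry no per-instance __dict__, so they stay about as compact as plain tuples while still giving name-based access.

```python
import csv
import io
from collections import namedtuple

# Hypothetical CSV content; a real file would be opened with open(path, newline="").
raw = io.StringIO("city,latitude,longitude\nSan Francisco,37.78,-122.40\nParis,48.86,2.35\n")

reader = csv.reader(raw)
header = next(reader)                   # ['city', 'latitude', 'longitude']
Row = namedtuple('Row', header)         # field names come from the CSV header

rows = [Row._make(line) for line in reader]   # one lightweight tuple per line

for row in rows:
    # Fields are accessed by name; CSV values arrive as strings, so convert.
    print(row.city, float(row.latitude), float(row.longitude))

print(rows[0]._asdict())                # dict view of a row when you need one
```

Note that the header must contain valid Python identifiers for this to work; namedtuple also accepts rename=True to replace invalid names automatically.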


ChainMap: it is a container of containers. Yes, that’s really true. 🙂

You need Python 3.3 or later to try this code.

>>> from collections import ChainMap

>>> a1 = {'m':2,'n':20,'r':490}

>>> a2 = {'m':34,'n':32,'z':90}

>>> chain = ChainMap(a1,a2)

>>> chain

ChainMap({'n': 20, 'm': 2, 'r': 490}, {'n': 32, 'm': 34, 'z': 90})

>>> chain['n']

20

# Note: ChainMap does not merge the dictionaries; it chains them, and a lookup returns the value from the first mapping that has the key.

>>> new_chain = ChainMap({'a':22,'n':27},chain)

>>> new_chain['a']

22

>>> new_chain['n']

27
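A common use of that lookup order is layered settings. The sketch below uses hypothetical setting names: overrides sit in front of defaults, writes only touch the first mapping, and new_child() pushes a fresh layer on top without modifying the existing ones.

```python
from collections import ChainMap

# Hypothetical settings layers: overrides sit in front of defaults.
defaults = {'color': 'red', 'user': 'guest', 'debug': False}
overrides = {'user': 'admin'}

settings = ChainMap(overrides, defaults)
print(settings['user'])     # admin (found in the first mapping)
print(settings['color'])    # red (falls through to defaults)

# Writes and deletes only touch the first mapping.
settings['debug'] = True
print(overrides)            # {'user': 'admin', 'debug': True}
print(defaults['debug'])    # False, unchanged

# new_child() returns a new ChainMap with an extra mapping in front.
local = settings.new_child({'color': 'blue'})
print(local['color'], settings['color'])   # blue red
```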



You can also write comprehensions for dictionaries and sets.

>>> m = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

>>> m

{'d': 4, 'a': 1, 'b': 2, 'c': 3}

>>> {v: k for k, v in m.items()}

{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
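The session above only shows a dictionary comprehension. Here is a small sketch, with an illustrative word list, of a set comprehension and of a dictionary comprehension that includes a filter condition.

```python
words = ['Apple', 'banana', 'apple', 'Cherry', 'banana']

# Set comprehension: braces without key: value pairs; duplicates collapse.
unique_lower = {w.lower() for w in words}
print(sorted(unique_lower))      # ['apple', 'banana', 'cherry']

# Dict comprehension with a condition: word -> length, only for words longer than 5.
lengths = {w.lower(): len(w) for w in words if len(w) > 5}
print(lengths)                   # {'banana': 6, 'cherry': 6}
```

Sets have no guaranteed order, which is why the first result is passed through sorted() before printing.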

startswith and endswith methods for string processing:

startswith, endswith: all things have a start and an end. Often we need to test the start or the end of a string, and we use the startswith and endswith methods for that.

phrase = "cat, dog and bird"

# See if the phrase starts with these strings.
if phrase.startswith("cat"):
    print("Starts with 'cat'")

if phrase.startswith("cat, dog"):
    print("Starts with 'cat, dog'")

# It does not start with this string.
if not phrase.startswith("elephant"):
    print("Does not start with 'elephant'")
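The example above only exercises startswith. The sketch below shows endswith on the same phrase, the tuple form that both methods accept (True if any of the given strings matches), and a typical filtering use; the file names are hypothetical.

```python
phrase = "cat, dog and bird"

print(phrase.endswith("bird"))             # True
print(phrase.endswith("dog"))              # False

# Both methods also accept a tuple: True if any of the strings matches.
print(phrase.startswith(("dog", "cat")))   # True

# Typical use: filter file names by extension without a regex.
files = ["report.csv", "notes.txt", "data.CSV", "image.png"]
csv_files = [f for f in files if f.lower().endswith(".csv")]
print(csv_files)                           # ['report.csv', 'data.CSV']
```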



map and imap as built-in functions for iteration:

In Python 3, map returns a lazy iterator, much like a generator expression, which helps save a lot of memory. In Python 2, map builds and returns a full list, so for lazy mapping in Python 2 you can use the ‘itertools’ module, where the function is named imap (from itertools import imap).

>>> m = lambda x: x*x
>>> print(m)
<function <lambda> at 0x7f61acf9a9b0>
>>> print(m(3))
9

# A lambda is a small anonymous function that returns the value of its single expression.

>>> my_sequence = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
>>> print(list(map(m, my_sequence)))
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]

# So the square is applied to each element without writing any explicit loop or if.
# In Python 3, map returns a lazy iterator, which is why it is wrapped in list() before printing.
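As a small stand-alone sketch of map's two companions (not taken from the notebook mentioned below), here is filter and functools.reduce applied to the same 1 to 20 sequence, plus a demonstration that map really is lazy in Python 3.

```python
from functools import reduce

square = lambda x: x * x
my_sequence = list(range(1, 21))

# In Python 3, map and filter return lazy iterators; list() forces evaluation.
squares = list(map(square, my_sequence))
evens = list(filter(lambda x: x % 2 == 0, my_sequence))

# reduce lives in functools in Python 3; it folds a sequence into one value.
total_of_squares = reduce(lambda acc, x: acc + x, squares, 0)

print(evens)              # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(total_of_squares)   # 2870

# Laziness: values are computed only as the iterator is consumed.
lazy = map(square, my_sequence)
print(next(lazy), next(lazy))   # 1 4
```

For a plain sum, the built-in sum(squares) is clearer; reduce is shown here only to illustrate the folding pattern.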

For more on map, reduce and filter, you can fetch the following Jupyter notebook from my GitHub:


How to learn Fast!!

Learning is divine, and sometimes the process of reaching that divine feeling can be stressful. Most of the time we do things at work or in life to make life more interesting and rich (in terms of knowledge and security), but we tell ourselves that learning is DAMN difficult and that it takes too much time to learn any new skill. That is the biggest lie going around, and it is really making us slower and less progressive in life.

All we really need is to take care of a few things or hurdles that come up in our situation, and to learn a few tactics or build a unique attitude toward those tactics. Then we will be able to overcome such obstacles and be good at anything in just 20 hours.

Learning new skills improves your life:

As regular people, we have things to do day to day to earn some bucks and pay rent. Even if you are a student, you have various tasks like playing, conversations over coffee, movies, skipping classes and so on. So it does not matter whether you are in professional life or not: you have no time at all! But one thing we all know is that learning a new skill can improve our life, and not just your own life but also the lives of the people around you!

So the first thing is to believe in this idea: ‘Learning a new skill improves life.’ It will give a lot of fuel to your feelings and will motivate you constantly at each hurdle you hit with the specific skill you are going to learn soon.

Set your time for each day, or just take 4-5 days off from everything: As I said, it only takes 20 hours to learn anything or to be good at anything, so you have to manage those 20 hours. If you are going to give 20 hours to each project or skill you want to learn, you definitely have to set time aside.

It will work out something like this:

1 hour each day:

20 days to complete 20 hours of learning.

2 hours each day:

10 days to learn anything or to be good at anything.

4 hours each day and BOOM! Just ‘five days’ to learn anything!

One thing you have to understand carefully: if you are giving 4 hours of your day to a specific task, make sure you give yourself a challenge, or a set of challenges, that you have to complete in that time. Don’t make the challenges too hard or too easy, just the right amount, and that right amount depends entirely on your capacity for memorizing, reading and doing practical work related to the skill. The split between research and practice may vary by skill. If you want to learn swimming, you will spend about 0.5 hours reading and the other 3.5 hours in the swimming pool. If you are learning programming, it is good to give 1 hour to reading about some programming basics, and the rest will be consumed by solving a challenge. Let me be clear about one thing: solving a programming challenge does not mean searching Google for a solution. 😀

If you are going to learn machine learning or how to model a system, you should split your time 50-50 between reading and practice.

By following the approach above, you don’t have to wait forever to learn or do anything new.

Perfection is the enemy of Good:

There is another line of research saying you have to spend 10,000 hours if you really want to learn anything, which is also true in another sense: that research is based on the experiences of people who are GODs in their fields.

So you don’t have to be a GOD, or perfect, to learn a new skill and enjoy the returns that come from it. For example, if you want to learn how to play football, you just have to read some rules, get a football, find a ground or park nearby and kick the ball around; maybe after 7-8 days you will find some friends or a team to join you. That part will be easy, but it will really take 10,000 hours if you want to compete against Ronaldo or Messi.

And once you get good enough, you enjoy doing it, and that leads to more practice and further perfection of the skill.

Make your decision and set a target performance level:

First decide what you actually want to do with the skill, or what you actually want to learn. There will be many tasks you want to do in your life, but you really have to write them down in some way. The other important thing is setting a target performance level: how much you want to gain from a specific skill. If you want to learn programming, tell yourself whether you are learning because you want to build your business website, because you want to get a job at a company, or because you want to get a job at Google.

It is always great to dream BIG, but small stepping stones matter. So if you want to get a job at Google as a programmer, it is good to work on your personal projects first and then move on to some professional paid work; after that, you will be able to guide yourself on where to go from that point.

In other words, once you reach baseline proficiency in something, it sucks less, and that baseline level inspires you to learn more about the skill.

Deconstruction of the skill:

Most of the time, a skill is made up of various sub-skills. When you set out to learn a new skill, a quick study of it can tell you how many subsets that particular skill set has. Some subsets are easy to pick up, but some are really difficult to understand, so this deconstruction helps you see which easy sub-skills you can learn first, as well as which subsets you need or want to learn.

For example, if you want to learn algorithmic trading, you don’t need to start with marketing strategies, macroeconomics, the policies and factors that affect Wall Street, the internal structure of Wall Street, quantitative analytics, machine learning, or the various machine-learning techniques used for research while constructing an algorithm or strategy. To start, you just have to learn the strategies that are currently used by most traders and give good returns. There are not more than 10 or so of them, so for algorithmic trading you first have to know those strategies and which market conditions call for which strategy; then you will be able to get better results from your trading.

In this process, you will find that 2-3 sub-skills repeat over and over again, which helps you learn and do things much faster and saves you a lot of time and energy, which is most important.

Research vs. Practice:

When we have to learn a skill, we procrastinate! At some level procrastination is really a good thing, because in the back of your brain you unconsciously process and think about that skill, but too much of it will kill your focus, so the right amount of procrastination is great for you. When we learn anything new, we read, research and discuss. But limiting ourselves to research mode will kill our productivity.

The human brain loves to do research, but we need to switch constantly between reading/researching and doing.

For example, if you are really going to learn programming, you can either read 5-6 books first, or try to write a function that fetches your friends’ birthday information from Facebook and lets you know if anyone’s birthday falls in the current month.

Practice makes perfect, but how should you practice?

There are various things you have to know about practice. Whatever new work you are doing or learning, do it just before you go to sleep, and after sleeping, try it out as the first thing in the morning. Studies suggest that during sleep your brain turns your small practice sessions into stronger neural structures for passing messages, and those make your mind stronger and faster at grasping new skills.

The method above works for both cognitive and motor skills.

Removing the barriers: (a general approach)

Sometimes the barriers are just environmental distractions. Make a list of the distractions that really come into your world when you try to learn a new skill, and turn them off: your phone or chat, sounds coming from outside, your TV, and last but not least, TURN OFF YOUR INTERNET.

If you want to learn how to play the harmonica, just put it somewhere in front of you! This is behavioral psychology: it makes sure that, rather than getting distracted by some other shiny object, you see the thing you want to do or learn first.

It is like keeping the things you want to learn or do on your computer desktop, but that does not mean your desktop should be overfilled, because that also kills your productivity and your system’s speed.

Studies have also observed that listening to vocal music while reading, working or even programming affects your ability to be productive, but listening to non-vocal music or some jazz may not only help increase your productivity but also improve your state of mind.

Commit to practicing for at least 20 hours!

For more information, please refer to the following video:
