my net house


Effective Machine Learning for Quantitative Finance

Sometimes we believe that Computer programs are magic, Softwares work like magic in the machine, You click on various buttons on Your PC and Boom! Real interest or real things are really far away from all this stuff. All you have to do is handle large databases, parse those as per requirement and do lots of UI tricks as well as many Exceptional Handling techniques those lead you to a great product or may be the one a real Human can handle! Sometimes it feel that it is also True in the case of Machine-Learning and Data-Science. Selecting some predicates and just applying most popular model might not be enough because we are developing a model and learning with it, So it makes sense to grow with it and get your models grow as well.

Let’s do some discussions with ourselves and learn how Good models could lead to better Trading-Strategies.

Why we use ML models?:

ML Models are used because it saves us to write 100’s or may be 1000’s of lines of code.That is mathematics, one equation can save knowledge of worth hundreds of pages in it. (Remember Binomial Expressions for calculation of Probability? )

Machine learning solutions are cost effective and it takes really less amount of time to implement those models rather than designing an expert system which takes large amount of time and might come with various error pron results on the other hand a simple ML model can be built in few hours(A bit complicated and little bit smart) and years can be used to implement and make it more reliable towards decision making process.

How Machine learning models can learn more Effectively?:

When we are really into model building process there is one thing we might get confused and that is which model to use, really I guess that is really a question one should look for in real. There are not just hundreds but thousands of algorithms one can look for and each year those algorithms get developed more and more with different set of features but here real thing again comes out as intensive question that really which model to use?

Model you want to use is really dependent on your requirement and type of features you believe could be helpful to predict the outcomes. While choosing any algorithm one should take care of three things: Representation, Evaluation, Optimization. Representation describes the features one need to pick for predictions and your algorithm must be able to handle that feature space. This space is also called hypothesis space.

Evaluation plays an important role as well. When you are figuring out which Algorithm you should use or which one can come up with better results, All you need to do is use some kind of Evaluation function for your algorithm/model/classifier and such Evaluation function could be internal as well as external, It all depends on your choice of Algorithm/Model.

Optimization is one another thing that really plays very important role in each aspect of building a model. There are various factors we consider such like ‘Predictability score of model’, ‘Learning rate’, ‘Your model’s memory consumption’ and so on. And there are various factors those are included while doing optimization parts which are something like how we handle all the search criteria.

How Magically Machine-Learning Algorithms do predictions?

One thing we have to understand carefully and that is Approximation/Generalization. My most favorite line about mathematics is : ‘Mathematics for Engineers is all about Finding an approximation Solution and That approximation solution is just enough to put rover on Mars’. An approximation solution leads almost close to various predictions. Splitting your data into train and test set is the best practice and one should never underestimate train test because based on these results of train and test set we are able to initiate the process of optimization. After trying various optimizations on our model which are like choosing the type of search either we are going to use ‘Beam Search’ or ‘Greedy Search’ or how test score improves. Notice that generalization being the goal has an interesting consequence for machine learning. Unlike in most other optimization problems, we don’t have access to the function we want to optimize!  So all your Machine learning model does is just finds a common relation between various predicates by experimenting thousands/millions/billions times on your Training data and that experimentation process to find common patterns those may or may not be true for future-use are really based on such Experimentations/Hypothesis/Generalization.

Why Machine learning and Predictions are called BIG-Data?:

Let’s take an example that will explain how much data is good data. Suppose you want to apply for a job so chance of getting a job will be more if you will be able to apply for various positions. So more data is better than clever algorithm because each time machine will be having various cases to understand the classifications but on the other hand BIG data also causes the usage for heavy computing power and Programs like Apache-Spark or Hadoop are industry standard Examples those can be used to process and understand big data in understandable form with fastest way possible.

How many models/Algorithms I must learn?

There would be very simple answer and that is learn as many as you can and apply as many you can. Ensemble learning is known as very popular technique where multiple Algorithms can be used to achieve better results for example in Random-Forest A set of Decision-Trees is being used that means each model/sub-model or each decision tree has given a particular amount of weight and after various trainings random best weights of different Decision trees are being used to predict the outcomes. To understand the process of Ensemble learning one has to look into three things those are most important: 1. Bagging 2. Boosting and 3. Stacking

Bagging: we simply generate random variations of the training set by re-sampling, learn a classifier on each, and combine the results by voting. This works because it greatly reduces variance while only slightly increasing bias.

Boosting: In boosting, training examples have weights, and these are varied so that each new classifier focuses on the examples the previous ones tended to get wrong.

Stacking: Stacking, the outputs of individual classifiers become the inputs of a “higher-level” learner that figures out how best to combine them.

Combining it all becomes: Split training sets into random variations, Apply algorithm on each variance, give a rank(highest – lowest) to each[Bagging] , Now the algorithms/Models having lowest ranks will be improved(How improved?-look for optimization section) [Boosting], Now we have each individual model with weights/ranks and outputs as each have learned from different set of variances Let’s Stack them all – In stacking you have just to suppose that for higher level Model each individual lower level model acts as training set.  Here we could have some biasness regarding the sub-models having greater scores those we assigned in Bagging but higher level Model/Algorithm also considers the decisions of Low-Ranked individual model so that reduces the effect of Biasness.

Which are better Models? – Simple or Complicated?:

When it comes to modeling a model or developing a model always it is assumed that simple models are performed better and generate less errors, Yes that is true in various cases but one must not think biased and that is how we profile our models. Simple models have less error and mush faster than complicated one but that does not imply all the time that simple models should be developed for each problem and one should never approach complicated models. Complicated models exist because those also work and provide good results as well Ensemble learning is good approach that has made this concept completely valid.

How to know that which model/Algorithm should apply?: 

Few weeks ago I read an article about how one can come up with the solutions/results/models/algorithms those can solve all the problems of the world- and for sure answer is AI(Artificial intelligence) But are we really there yet? there will be very long debate on this section but real truth is or at-least I believe could be is that one must present or visualize the relationships between two or more variables of data so one could be able to understand that if the relationship is linear or non-liner  Or moreover one also need to understand how such relations effect the actual outcome of the predictor. For example there could be perfect linear relationship between close prices of two stocks and Linear-Regression could be better apply-able for that than Logistic or any other. This thing simply conveys that all models don’t solve all the problems so difficult thing could be how to find out which model will be able to solve which problem in better way. For that a top overview of many models must be required.

If it is correlated that means it will always be true/predictable:

That is really a good assumption but what would be the need of Data-Scientist  if it would be true all the time? Correlation does not imply causation and that means it must not be the only reason to construct a model, When building model there are various Hypothesis As a Data-Scientist need to propose and come with the ones those which come out as real relationship or better predictions when applied on actual life. In stocks we can come up with correlation that news directly effect the stock prices and constructing a strategy that uses NLP(Natural Language Processing) to generate buy/sell calls we might have come up with Negative returns and investors might lost their money. because There might be a correlation between news and stock prices but that does not mean it is the only factor that must be considered to build a model that runs trading strategy based on NLP, so at the end we have various things to consider like how much Stock-volume is present in the company, What are the rankings of stock in SP500 or how much money is available in books of company or how that particular stock is performing in the past decades of time.

Some Final thoughts:

When you think about ML models and how those can create happiness in your life, Here happiness means how you can come up with models those generate best predictions and always come up with great predictions there is only one thing to remember which is ‘Rule of Evolution‘. You can sit on one technology/Design for rest of your life OR you can grow! Thinking about Machine-Learning or building models those are related to Machine-learning is Learning. Machine is desperate for learning and learning is only possible by doing lots of stuff(Experimentation) getting good results keeping those models for further improvements and neglecting those are not Good enough to use but also give those a try may be sooner or later.   

Preprocessing your Data and Why?

Preprocessing plays very important role in Data Science. When are going to feed data to Algorithm there are various things to take care of like normalization, label-encoding, scaling and so on.

A really manual method to remove outliers from your data is as follows:

**Your data should be in between median-2*STD and median+2*STD

STD = Standard Deviation

In the above formula we are considering median as measurement of central tendency by just assuming that it is better than mean but in real life it may not be accurate formula to work on.

In this post we will discuss about various data preprocessing Techniques used in Data-science and Machine-learning those may help to increase the prediction rate, Sometimes you can live without preprocessing but sometimes it is good to do Preprocessing on your data. This ‘Sometime’ depends on your understanding of work you do.


The real Health of your Data describes the real wealth of your Model, That means your data is the most important part that is the main reason it always take much more time like 70-80% while preparing your data for better use, well not really 70-80% If you are Python Ninja 😀 🙂

Let’s Understand Data-Preprocessing in Richard man’s Style.(What is Richard Man’s Style?)

Hack the source!!

Data Scaling or Standardization:


Sigma represents Standard-Deviation and Mu represents Mean.

It is always great to have less biased data with low variance(Spread-out) but Why?

Think of activation function in neural network while performing Forward-propagation, What our activation function does is convert each input between the range of Zero to One(0 to 1) so that would minimize/scale the range of data but in other Algorithms like Regression or Classification we don’t have that automatic-Scaling facility so what we do is apply manual scaling methods.(Too bad I have to write one more function before training my Data :D) One thing also we should remember that ‘Your neural network will be much faster if you feed it with normal data ‘

So decreasing the spread-Out/variance we can achieve better predications because it is easy for system/Algorithm to find patterns into Smaller area. here is small portion of wikkipedia article about feature scaling you might find interesting:

For example, the majority of classifiers calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.    

from sklearn import preprocessing
import numpy as np
frag_mented_array = np.array([2,78,12,-25,22,89])
defrag_array = preprocessing.scale(frag_mented_array)

/home/metal-machine/anaconda2/lib/python2.7/site-packages/sklearn/utils/ DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
wwarnings.warn(msg, _DataConversionWarning)

array([-0.67832556, 1.18502658, -0.43314765, -1.34030593, -0.18796973,

**Please never underestimate that warning in real life. Try to look into advance-section of numpy array that will teach you how you can assign ‘types’ to Numpy arrays!

Now the point arises should I scale my Training data or Testing data as well, Well answer is Both. Look out for StandardScaler class in Scikit_learn:

There are some other useful features as well:
MinMaxScaler – Scale data between range (0,1)
MaxAbsScaler – Scale data between range (-1,1)

One question here arises that when I should use Normal Standardization or MinMaxScaler or MaxAbsScalar?

Answer could be really tricky or ‘not in the plain form’, It just depend on the Requirements or your model/ML-Algorithm that you are going to apply on your your data.

We can also understand it this way as well that how decreasing ‘Euclidean distance’ does effect the performance of your model or not.



Normalization is the process of finding unit normals for individual groups,
Remember one thing carefully: Scaled data will always have zero mean and Unit Variance, achieving that is the real cause behind scaling or standardization. Scaling is kind of personal choice that how and at what limits you want to scale your data but when it comes to Normalization you have to figure-out an external standard.

Normalizing can either mean applying a transformation so that you transformed data is roughly normally distributed. Normalizing in scikit-learn refers to rescaling each observation (row) to have a length of 1 (called a unit norm in linear algebra).


It is really simple and effective process to think of. It converts your data into binary for that is 0 and 1. which means boolean values, In the algorithms those are used in Stock’s-Trading are heavily based on binarization where we use 0 if stock’s predicted price will go down and 1 if price will go up or vice-versa. but important thing to note down is how Binarization is important to make final decisions as well for Algorithm to predict values in the form of 0 and 1.


Removing Biasness from Your Data:

Biased data leads your model towards one form of real universe and your model will only understand that biasness in your data and learn/make predictions only based on that biasness(That is the reason we must use Hypothesis-Testing and Cross-Validation before running our model into production) One thing that comes on my mind as an example of biasness is as follows:

Suppose you picked Apple Price stock from period 01-01-14 ti 01-01-15 and let’s assume in that period Apple’s Stock were going up Everyday/Every-Month/Every-Quarter so after training your model with that particular time period data your model will predict apple’s future price as higher than present price because that became nature of model after learning from Biased data.

This is just an Stupid example to tell readers that how Biased does effect your data.

Survivorship Biased data:

A historical database of stock prices that does not include stocks that have disappeared due to bankruptcies, de-listings, mergers, or acquisitions suffer from the so-called survivorship bias, because only “survivors” of those often unpleasant events remain in the database. (The same term can be applied to mutual fund or hedge fund databases that do not include funds that went out of business.)

Backtesting a strategy using data with survivorship bias can be dan-
gerous because it may inflate the historical performance of the strat-

Disclaimer: This is not full proof post about data preprocessing there are so many other things to know like PCA(Principle component analysis) or Gradient Descent for more understanding of Machine-learning operations while applying on your data.

Important Julia Packages

  1. Julia Ipython

Julia is able to run very well on you Ipython notebook Environment. After all, All you have to do is Data-Science and Machine-Learning. 🙂


1.1 Open Julia Prompt(At Ubuntu it works like typing ‘julia’ command in your Terminal)

1.2 run command > Pkg.add(“IJulia”) # it will do almost all the work.

2. DataFrames: Whenever you have to read lot of files in Excel-Style Julia DataFrames Package is good to go.


3. Arduino:

A Julia Package for interacting with Arduino.

4. Neural Network Implementation of Julia

5. Visualizing and Plotting in Julia:

6. Reading and writing CSV files in Julia

7. DataClusting in Julia:

For more Large number of Packages, Please refer following link:

Note*: You can also run most of the Shell commands in Julia environment as well. 🙂

things and things

Things those need to be understood in many ways.

  1. Various important parts of Statistics and implementation
  2. Hypothesis Testing
  3. Probability Distributions and Importance
  4. AIC and BIC
  5. Baysian models
  6. Some black Magics of OOPS

Let’s have fun with Correlation:-O

Ok have some fun first. 😀

Whenever you will read any post or paper related to Machine-Learning or Data-Science you will get word ‘Correlation’ many times and how it’s value is important in your model Building.

A simple definition of Correlation: A mutual relationship or connection between two or more things. (that’s layman’s definition and It should be enough most of the times 😉 )

Coefficient of Correlation  is just an integer, From which we understand how two or more things

are related to each-other. As we discussed Coefficient of Correlation is an integer so it could be +ve or -ve and value of correlation decides how two data-sets effect each other.

Following two images tell lot about Correlation and it’s Value.

Coefficient of Correlation between range -0.5 to +0.5 is not that valuable by why and how we calculate correlation?

What is Covariance ?

Now if you still feel that something is really missing we should talk about Variance:

Let’s Remove Co from Covariance.

Variance is Measurement of randomness. So How you would calculate Variance of Data?

Give me data:

Data = [4,5,6,7,12,20]

I will find means and subtract it from each individual- Isn’t that Mean-Deviation ? 😀 OMG!

Have a look at the following Picture:


Let’s wait for stuff like:

Coefficient of determination, Probable Error and interpretation.

Hacker’s Guide to Quantitative Trading(Quantopian Python) Part 2

Quantopain Provides required API functions,Data,Helpful-community as well as batteries included Web-based Dashboard to play with Algorithmic-Trading, Create Your own trading Strategies, and launch your Trading model in live Market.

Here I will only talk about code and how it should be written to create your own Trading Strategy.

There are basically Two methods.

initialize() and handle_data()

initialize act as initializer for various variables. same as __init__ method in Python.

Now what kind of variables we have to declare in initialize() function is dependent on your strategy. we can select limited number of stocks,days,type of trading,variables required for Algorithms.

A very simple example of initialize() code could look like as follows:

def initialize(context): # consider context just as 'self' in Python

   context.stocks = [sid(24),sid(46632)] # sid stands for stock_id

initialize() also contains the stuff that can be used many times or all the times in our Trading Algorithm:

1. A counter that keeps track of how many minutes in the current day we’ve got.

2. A counter that keeps track of our current date.

3. A list that stores the securities that we want to use in our algorithm.

Whatever variables that you define here will remain persistent (meaning that they’ll exist) but will be mutable. So that means that if you initialize context.count as 0 in initialize, you can always change it later in handle_data().

A Simple Example of handle_data():

def handle_data(context,data):

   for stock in context.stocks:

        if stock in data:


Momentum Strategy:(Common Trading Strategy)

In this strategy we consider Moving average price of stock as an important factor to make decision to put a security price in Long or Short.

Here is simple explanation of momentum Strategy:

● If the current price is greater than the moving average, long the security

● If the current price is less than the moving average, short the security

Now we will use Quantopian API to implement this strategy for Trading. instead, our algorithm here is going to be a little more sophisticated. We’re going to look at two moving averages: the 50 day moving average and the 200 day moving average.

David Edwards writes that “the idea is that stocks with similar 50 & 200 day moving averages are more likely to be fairly valued and the algorithm will avoid some of the wild swings that plague momentum strategies. The 50/200 day crossover is also a very common signal, so stocks might be more likely to continue in the direction of the 50day MA because a lot of investors enter and exit positions at that threshold.”

The decision-making behind Moving-average is as follows:

● If the 50 day moving averages is greater than the 200 day moving average, long the security/stock.

● If the 50 day moving average is less than the 200 day moving average, short the security/stock

So now Let’s make a Trading Bot!

1. First we have to create our initialize() function:

def initialize(context):


”’Set universe is inbuilt function by Quantopian which provide us the stocks with-in required universe. Here we have selected stocks those we have DollarVolumeUniverse with 99.5% and 100% as our floor and ceiling. This means that we’ll be selecting the top 99.5 ~ 100% stocks of our universe with the highest dollar*volume scores.

Please read the comments in the code.

   context.stocks_to_long = 5

   context.stocks_to_short = 5
   context.rebalance_date = None # we will get today's date then we will keep positions active for 10 days here

   context.rebalance_days = 10 # it is just an assumption now for 10 days or finer value

Now we have defined required __init__ parameters in initiliaze() let’s move to


def handle_data():

   if context.rebalance_date!=None: # if rebalnce date is not null then set next_date for changing the position of algorithm

       next_date = context.rebalance_date + timedelta(days=context.rebalnce_days) # next_date should be that days away from rebalnce_date

   if context.rebalance_date==None or next_date==get_datetime(): # if today is that day after 10 days when we market long/short for out stock

       context.rebalnce_date = get_datetime() # set rebalnce_date for today so next_date will be set to again 10 days ahead from rebalnce_date

       historical_data = history(200,'1d',price)

Get historical data of all stocks initilized in initiliaze() function, ‘1d’= 1 day,200=days,’price’=we are only fetching price details because that is only required for our strategy, may be for some strategy volume of stock could be more beneficial

  past_50days_mean = historical_data.tail(50).mean()

  past_200days_mean = historical_data.mean()

  diff = past_50days_mean/past_200days_mean-1

# if diff>0 we will long if diff<1 we will short

   buys = diff[diff>0]

   sells = diff[diff<0]   

# here we will get list of securities/stocks whose moving average will be

# greater as well as less than 0

   buys.sort() # sorting buys list why? - getting top securities from top- more is better
   sells.sort(ascending=False) # reverse sorting sells list - getting top seurities from bottom, less is better because we are selling agiainst market
   buys = buys.iloc[:buy_length] if buy_weight !=0 else None # buy_length = number of securities we want to purchase , 
   sells = sells.iloc[:short_length] if short_weight !=0 else None # short_length = number of securities we want to short

Now here we have buys and sells are two lists!! (remember carefully) all the decisions are going to be made based on these two lists

We can also implement risk factors in out Trading Strategy. Let’s implement minimum form of Risk-Factor, 0.02% of last_traded_price that means if security is going to much lower than that then we will exit.

We will go through each security in our data/universe and those who will satisfy condition of ‘buys’ and ‘sells’ list will be bought/sold.

# if security exists in our sells data

   for sym in data:

       if sells is not None and sym in sells.index:




# here stop_price is the price of security in real-time+change happend in stops

# order_target_price is inbuilt function.

   # if security exists in our buy data

   elif buys is not None and sym in buys.index:'Long:%s'%sym.symbol)




The `order_target_percent` method allows you to order a % target of your portfolio in that security. So this means that if 0% of your total portfolio belongs in AAPL and you order 30%, it will order 30%. But if you had 25% already and you tried ordering 30%, it will order 5%.

You can order using three different special order methods if you don’t want a normal market order:

#`stop_price`: Creates a stop order

#`limit_price`: Creates a limit order

#`StopLimitOrder`: Creates a stop­limit order

How Trading Differentiates from Gambling:

Most of times when you find that you are able to get good returns from your capital you try to beat the market, Beating the market means most of the traders tried to earn much more than fine earnings are being returned by the market for your stock, Such beating the market process can be done by various actions like reversing the momentum or looking for bad happenings in the market(which is also called finding the shit!)Some people are really good at this kung-fu but as you are just budding trader and you have only limited money of yours, So here one important thing should be remembered, “”Protect your capital””. – That’s what most of the Big banks do and if they will hire you as their Quant or Trading-Execution person they will expect same from you. Big banks have billions of dollars that they don’t want to loose but definitely want to used that money to get good returns from market.

So they follow one simple rule for most of the times.

Guaranteed returns even if those are low.

[Make sure returns should be positive after subtracting various costs like brokerage,leverage etc, Because getting positive returns by neglecting market costs is far easy but such strategies should not be used with real money.]

So the real key is think like a computer programmer at first place, something like it should work at first place, so first thing to make sure is getting returns even low but stable returns by calculating various risk-factors.

I am quoting some of informative things from SentDex Tutorial:

Most individual traders are trading on account sizes of somewhere between maybe $25,000 and $100,000 USD, so their motives are to hopefully increase that account size as much as possible, so this person is more likely to take part in High Risk High Yield (HRHY).

Most people who use HRHY strategies, tend to ignore the HR (High Risk) part, focusing on the HY (High Yield).

The same is common with gamblers,even over astronomical odds with things like the lottery.

In other words, always ask yourself – what’s about the market that makes my strategy work? Because, at the end of the day, algorithmic trading is more about trading than about algorithm.

Hacker’s Guide to Quantitative Trading(Algorithmic Trading) Part 1

Quant-ita-tive Analytics:

Word Quant is derived from Quantitative Analytics. In present system finacial market is very heterogeneous in nature,  With coming of many private banks in financial sector
now most the private banks also invest their account holder’s fund into the stock market as a safe trading,  with very low but almost guaranteed returns(money they earn from trading
There was an exception of this ‘safe investment’ term used by banks when whole economy crashed and  millions of people suicided,lost their homes,lost their jobs,lost their lives
as well as most the small countries like in Europe like greece,____,____, were almost finished.

Main reason of this whole disaster happened in digital age was  due to ‘not looking into data properly’.

Now we can also use term Quantitative Analytics is process where a
person/Mathematician/Expert/Data Scientist/Computer-Programmer(with domain specific knowledge)  parses,analyses and develops results from MBs , GBs or sometimes TBs of data to produce results those results are based on history of trading.  Such results helps BIG banks or big investors in the Financial market to build trading Strategies with the target to gain maximum profit from trading equities or at-lest to play safely with their money which results low but assured returns.

Statistical science Plays an important role in study of Quantitative Analytics:

Statistical Science deals with every aspect of our life. Before going further into Statistical science we have to understand the meaning of ‘Statistics’.
According to Dictionary of Statistical terms:

“”Numerical data relating to an aggregate of individuals or the science of
collecting,analyzing and interpretation such data.””

There are various Statistical methods those are used to analyze the data which is Quantitative(HUGE with number of variables or  SMALL with number of variables and Quant’s job is to analyze how those variables are correlated) in nature for example: using grouping,measures of location:average-mean,median,mode,,
measures of spread-range,variance and standard deviation,skew,identifying patterns, univariate and multivariate analysis, comparing groups,choosing right test for data.

Word Quant comes from:
Word Quant come from the the person who use Mathematical formulas and machine learning techniques to predict output of stock market. There are various other profiles as well,  Like Quant Developer- Trading strategy written in steps(not programming) is provided to programmer with domain specific  knowledge of financial system.
it is job of Quant developer to convert Trading Strategy into  Live Trading System(System that buy or sells stocks,options,futures to get profit from market).

Mathematician and Quant:
The relationship between Quant and Mathematician is quite close or in some sense
it can be stated that Quant is the person having fun in real life with Mathematics. Quant calculate  various factors while implementation an algorithm/equation in real-time trading system. with showing results to other people in the form of  graphs rather than complicated equations of significances.

So in some sense Quant uses serious Statistics to get sense out of data and tell  people why his/her trading strategy is better to produce profit from the financial market.  As a Quant developer only job is to write code for equal
Algorithms those are being used but as a Quantitative Analytics we can assume one has to have  various skills of statistics which help to make sense out of Stock-Trading-Data.

Process(!steps) of doing Quantitative analytics:

Take Care of Data: At first as a analyst you should not get yourself lost in the data.

Frequency Distribution: Find frequencies of occurring values against one variable.
Histograms are best for Frequency Analysis.

Find Descriptive Statistics: We can use Central tendencies(mean,median,mode) and
dispersion(range,standard deviation,variance) to learn more about data.

Comparing means:Need to perform various tests on your Data like T-Tests.

Cross Tabulation: Cross tabulation tells what are the relations among different variables.

Correlations: Find relationship between two variables

** Never mix Correlation and Cross Tabulation between your thoughts.

Trading Strategy/Algorithm:
When Traders buy/sell stock/options/future in trading market based on various calculations to make decisions, Combination of such  decisions is called Trading Strategy. Strategy can be built applied without using any Programming Language. In Algorithmic trading A Quant+developer come up with self-running computer program(Trading Strategy/Quantitative model) that is built based upon various trading decisions to automate
whole process of buying/selling stocks.

Tips and Tricks while building Quantitative models:

* Important things about Your Trading model is it should be good on Back-testing(Make sure back-testers will be different for stocks as well as for futures) but as well as it should be as good on forward testing(paper trading).

* You need Separate Model for Separate Trading. Trading model(Strategy) working at Bitcoin Market will not be beneficial for Stocks.
* You should run Strategy as an experimental Project for various Days to get data from results, Read the data and refine the Strategy.

* Every Strategy is sort of sensitive to various risk factors.

* Think about **risks** , Think about Eliminations.
* Run Multiple models at same time, Some will loose some will win.

* One strategy is not enough , Strategies loose it’s Efficiencies after certain period of time.

* Back-testing is not always true. Never try to create model that matches 100% to your back-testing because that would be over-fitting rather than try to create generalized/simple models which will be more effective to predict abrupt changes in the live-trading.

* What actually happens is strategies are catching strategies.

* Right now every Strategy need a human to operate.

* If we know what kind of shields we have given our model we will get to know that either such kind of things those are coming in-News can effect
our strategy.

* Only Human is not well enough to do trading by it’s own. We must use trading strategies to come out as great trading with profitable returns.
* Diversification of Strategies are as important as required.

* Momentum trade’s behave is somewhat in loss and sometimes very ”good” Profit, Because on Momentum we try to find what’s hot in market and that hot could go so high as well as so low.

* A trading model may not take consideration of Earth-Quake, some kind of Govt. fall into any country but humans can.

* Now using the Sentiment Data analysis tools we can incorporate news while building a strategy but for most of the backtesters does not contain that news data to check performance of strategy. So if you say news data record will increase the chances of working that might not be true all the time.

* Sometimes using a news data as input can cause overwrite of entire model because ‘news are not true always’ 🙂

* Keep reading new Ideas on Blog Posts, Look for interesting Books.

* Learn how to interpret the live market.

* As in the programming we say it mostly does not work at the first time, that is also true with trading strategies. 🙂

* Back-test is at-least good not to give -ve results, Highest the Sharp ratio great is the significance of strategy.

* Data you use to build your own strategy is equally important as important as the factors you are considering to build your model.

* You could come to situations where you feel either you need to stop usage of perticular strategy or modify it.
* Back-test is null Hypothesis.We assume that at this specific case our model is accurate.
* Always concentrate on Sharp Ratios.

* Quantitative Trading is suitable for technically anybody. Quantitative Trading could be slow or Fast, One does not need to be a Math Ninja or Computer Wizard to do Quantitative Trading.

* It is always good to start with Simple Strategies present in Public Domains, implement those, run tests, tune,Play and get rich, 🙂

* It’s better if you go with Open-Source Languages because those can very easily turn into live trading systems.(Python or C++)
* Choosing a standard Language is always a great idea because wide range of library support is there to build things! 🙂 😀 * (Python)

Top view of DataScience and machine learning

As a Data Scientist you have or you should not be limited yourself with only the training you got around, you have to think far and more ahead in terms of various things around, think of yourself as an amazing or great thinker like how many variables are possible to run the system or how many variables can really effect the system at which level.

Let’s Look at top three Questions:

What is a Data Scientist?

Why does it appear to be such a hard job?

What are the Skill Sets a Data Scientist Need?

First thing we have to remember or taken care is big data. Organizations have lot of data those are collecting from various sources(mostly clients and activities) but that data is HUGE and no idea how to manage that in particular order. A Data Scientist’s job is to find meaning in that data.

What kind of skills one should look for dataScientist?

For a Data Scientist’s job one need to hire a PhD,Mathematician or Statistician for the job or one can also grow a Data Scientist with-in the organization.

What is the fundamental Job of Data Scientist?

Data Scientist is the one who found new discoveries from the data.

That’s what Scientist do. As a scientist first make a hypothesis and then try to investigate that Hypothesis under various conditions now in the case of Data Scientist they just do it with Data!

Data Scientist look for meaning,Knowlede in data and they do that in couple of different ways.

Visualization of Data:

For example one is Data-Visualization. Data Scientist visualize data in various forms and look for the meaning in the data, That’s what we can say business intelligence or Data Analyst might do.

Using Algorithms:

Advanced Algorithms those actually run through the data by looking all the meaning. Such algorithms are like Machine-learning Algorithms, Neural-Network, SVM, Regression Algorithm or K-means. There are dozens of algorithms and those run through data Looking for the meaning that is one of the fundamental tool of Data Scientist.

So to use those Algorithms Data Scientist must have knowledge of Mathematics,Statistics and Computer Science.

So how Data Scientist’s work is being Started or Done?

A data Scientist is given a large Data-Set or may be small Data-Set with a question.

Something like what customers are likely to return?

Or What customers are likely to buy in weekends?

or How many families buy sweets/fruits on festivals.(You can find the income range of families)-that is classic statics problem.

Now it is Data Scientist job to run various Algorithms on data and look for the answer. Here is simple thing one must think about, It is like how or why specific algorithm would work out on data. If we have basic or general level knowledge of algorithms then we can identify what this algorithm would really answer such question.

“”””””””So Data Scientist go through various algorithms until they can find some pattern in data to answer the questions””””””

Same thing is applicable with any trading strategy, we have to look for in our research that’s why such specific algorithm would work out Or other question is that how one Data Scientist can improve the recommendations of recommendation engine.

Netflix came with competition that Netflix would pay million dollars to one who would just improve their Recommendation Algorithm by 10%.

Five Data Scientist actually came up with that Algorithm that would do that.

So again we can say that Data Scientists are people who answer questions and they are using data to answer those questions Or they are using the combination of Data or algorithms to answer those questions.

When you have large dataset with various categories and columns then you have to rely on various algorithms so fundamanetal knowledge of such algorithms is what DataScientist should be aware of.

What a Data Scientist is not?

There are various myths about Data Scientist, is not a Java Programmer who knows Hadoop, many people are billing themselves as Data Scientists as they have such technical skills they are not data Scientist unless they don’t know data-discovery techniques and how algorithms would work on that data!!

What is the difference between a A data Scientist and Data Analyst?

Now we should also not confuse a Data-Analyts or busniess analyst with Data Scientist, Data Analyst is the one who create reports,graphs,dashboards based on data. Those reports based on their own knowledge that what they think is “important” to show or consider .

Data Scientist is the one who Hypothesis what is important and then try to prove that Hypothesis.

Now It is great for one person to have both of skills for programming as well as for Business domain knowledge. but most important is fundamental knowledge of ‘Algorithms,mathematics and statistics’. That is the one reason it is bit difficult to find a A Data Scientist because it needs some unique Skills.

Python for text processing

Python is more about ‘Programming like Hacker’ while writing your code if you keep things in mind like reference counting, type-checking, data manipulation, using stacks, managing variables,eliminating usage of lists, using less and less “for” loops could really warm up your code for great looking code as well as less usage of CPU-resources with great Speed.

Slower than C:

Yes Python is slower than C but you really need to ask yourself that what is fast or what you really want to do. There are several methods to write Fibonacci in Python. Most popular is one using ‘for loop’ only because most of the programmers coming from C background uses lots and lots of for loops for iteration. Python has for loops as well but if you really can avoid for loop by using internal-loops provided by Python Data Structures and Numpy like libraries for array handling You will have Win-Win situation most of the times. 🙂

Now let’s go with some Python tricks those are Super cool if you are the one who manipulates,Filter,Extract,parse data most of the time in your job.

Python has many inbuilt methods text processing methods:

>>> m = ['i am amazing in all the ways I should have']

>>> m[0]

'i am amazing in all the ways I should have'

>>> m[0].split()

['i', 'am', 'amazing', 'in', 'all', 'the', 'ways', 'I', 'should', 'have']

>>> n = m[0].split()

>>> n[2:]

['amazing', 'in', 'all', 'the', 'ways', 'I', 'should', 'have']

>>> n[0:2]

['i', 'am']

>>> n[-2]



>>> n[:-2]

['i', 'am', 'amazing', 'in', 'all', 'the', 'ways', 'I']

>>> n[::-2]

['have', 'I', 'the', 'in', 'am']

Those are uses of lists to do string manipulation. Yeah no for loops.

Interesting portions of Collections module:

Now let’s talk about collections.

Counter is just my personal favorite.

When you have to go through ‘BIG’ lists and see what are actually occurrences:

from collections import Counter

>>> Counter(xrange(10))

Counter({0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1})

>>> just_list_again = Counter(xrange(10))

>>> just_list_again_is_dict = just_list_again

>>> just_list_again_is_dict[1]


>>> just_list_again_is_dict[2]


>>> just_list_again_is_dict[3]


>>> just_list_again_is_dict['3']


Some other methods using counter:


Counter({'a': 10, 'r': 2, 'b': 2, 'k': 1, 'd': 1})

>>> c1=Counter('abraakadabraaaaa')

>>> c1.most_common(4)

[('a', 10), ('r', 2), ('b', 2), ('k', 1)]

>>> c1['b']


>>> c1['b'] # work as dictionary


>>> c1['k'] # work as dictionary


>>> type(c1)

<class 'collections.Counter'>

>>> c1['b'] = 20

>>> c1.most_common(4)

[('b', 20), ('a', 10), ('r', 2), ('k', 1)]

>>> c1['b'] += 20

>>> c1.most_common(4)

[('b', 40), ('a', 10), ('r', 2), ('k', 1)]

>>> c1.most_common(4)

[('b', 20), ('a', 10), ('r', 2), ('k', 1)]

Aithematic and uniary operations:

>>> from collections import Counter

>>> c1=Counter('hello hihi hoo')

>>> +c1

Counter({'h': 4, 'o': 3, ' ': 2, 'i': 2, 'l': 2, 'e': 1})

>>> -c1


>>> c1['x']


Counter is like a dictionary but it also considers the counting important of all the content you are looking for. So you can plot the stuff on Graphs.


it makes your chunks of data into meaningful manner.

>>> from collections import OrderedDict
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> new_d = OrderedDict(sorted(d.items()))
>>> new_d
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> for key in new_d:
...     print (key, new_d[key])
apple 4
banana 3
orange 2
pear 1


Think it the way you need to save each line of your CSV into list of lines but along with that you also need to take care of not just the memory but as well as You should be able to store each line as dictionary data structure so if you are fetching lines from Excel or CSV document which comes in place when you work at Data-Processing environment.

# The primitive approach
lat_lng = (37.78, -122.40)
print 'The latitude is %f' % lat_lng[0]
print 'The longitude is %f' % lat_lng[1]

# The glorious namedtuple
LatLng = namedtuple('LatLng', ['latitude', 'longitude'])
lat_lng = LatLng(37.78, -122.40)
print 'The latitude is %f' % lat_lng.latitude
print 'The longitude is %f' % lat_lng.longitude


It is Container of Containers: Yes that’s really true. 🙂

You better be above Python3.3 to try this code.

>>> from collections import ChainMap

>>> a1 = {'m':2,'n':20,'r':490}

>>> a2 = {'m':34,'n':32,'z':90}

>>> chain = ChainMap(a1,a2)

>>> chain

ChainMap({'n': 20, 'm': 2, 'r': 490}, {'n': 32, 'm': 34, 'z': 90})

>>> chain['n']


# let me make sure one thing, It does not combines the dictionaries instead chain them.

>>> new_chain = ChainMap({'a':22,'n':27},chain)

>>> new_chain['a']


>>> new_chain['n']



You can also do comprehensions with dictionaries or sets as well.

>>> m = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

>>> m

{'d': 4, 'a': 1, 'b': 2, 'c': 3}

>>> {v: k for k, v in m.items()}

{1: 'a', 2: 'b', 3: 'c', 4: 'd'}

StartsWith and EndsWith methods for String Processing:

Startswith, endswith. All things have a start and an end. Often we need to test the starts and ends of strings. We use the startswith and endswith methods.

phrase = "cat, dog and bird"

# See if the phrase starts with these strings.
if phrase.startswith("cat"):

if phrase.startswith("cat, dog"):

# It does not start with this string.
if not phrase.startswith("elephant"):



Map and IMap as inbuilt functions for iteration:

map is rebuilt in Python3 using generators expressions under the hood which helps to save lot of memory but in Python2 map uses dictionary like expressions so you can use ‘itertools’ module in python2 and in itertools the name of map function is changed to imap.(from itertools import imap)

>>>m = lambda x:x*x
>>>print m
 at 0x7f61acf9a9b0>
>>>print m(3)

# now as we understand lamda returns the values of expressions for various functions as well, one just have to look
# for various other stuff when you really takes care of other things

>>>my_sequence = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
>>>print map(m,my_sequence)

#so square is applied on each element without using any loop or if.

For more on map,reduce and filter you can fetch following jupyter notebook from my Github:


How to learn Fast!!

Learning is divine and sometimes process to reach that divine feeling could be stressful. Most of the times we do things at work or at life to make life more interesting and rich(in terms of knowledge and security) but in real manner Learning is DAMN! difficult and it takes to much time to learn any new skill. That is biggest lie running around us which is really making us slow and less progressive in our life.

All we really need is have to take care of few things/hurdles those come in our situations and learn few tactics or build up unique attitude towards those Tactics so we will be able to overcome such stuff and be good at anything just in 20 hours.

Learning new skills improves your life:

We as regular people have stuff to do in day to day life to earn some bucks and pay rent, even if you are a student or anything you have various tasks like Playing,conversations over coffee,movies,class-bunks etc so does not matter either you are in professional life or non-professional you have no time at all! But one thing we all know that learning a new skill can improve our life, it is just not to improve your life but even the people around you!

So First thing is believe in this thought/idea/saying ‘Learning a new skill improves life.’ It will give lot of fuel to your feelings and will motivate you constantly at each hurdle you get for specific skill that you are going to learn any time soon.

Set your time for each day or just take 4-5 days off from Everything: As I specifically told you that it only takes just 20 hours to learn anything or to be good at anything so you have to manage those 20 hours. if you are going to give 20 hours to each of your project /skill you want to learn/do you have to set time for sure.

it will work out something like this:

1 hour each day–>1 day to complete 1 hour learning:

20 days to complete 20 days learning

2 hours each day:

10 days to learn anything or to be good at anything.

4 hours each day and BOOM! –> Just ‘Five days’ to learn anything!

One thing you have to understand carefully, If you are giving 4 hours of your daily life to specific task/work you have to make sure that you have to give yourself a challenge or set of challenges those you have to complete in time. Don’t make the challenges too hard or too soft, Just the right amount and that right amount will entirely be dependent on your capacity of memorizing/reading and doing practical work related to that skill. Amount of doing research and practical may vary based on the skill, If you want to learn swiming then you will spent like 0.5 hour for reading and other 3.5 into swimming pool, If you are learning about programming it is good to give 1 hour for reading about some basics of programmingand rest will be consumed by solving a challenge. Let me make sure one thing solving a programming challange does not mean looking at Google to find a solution. 😀

If you are going to learn about Machine-learning or modeling a system you must give like 50-50 %age of time to each of the task.

By following above approach you don’t have to like wait for ever do learn/do anything new or something else.

Perfection is the enemy of Good:

There is one another research which is you have to spend 10,000 hours if you really want to learn anything, Which is also true in another sense. This research is based on the events/learnings of people who are GODs in their fields.

So you don’t have to be GOD or Just perfect to learn a new skill and enjoy returns come from that skill, For example if you want to learn how to play football you just have to read some rules,Get a football,find a ground/park around and kick your football with your legs, may be after 7-8 days you will be able to find some friends or team or others to join you. so that will be easy but it will take to really 10,000 hours of you want to compete against Ronaldo or Messy.

And when you get good enough and you enjoy doing it and that leads to more practice/perfection of that skill.

Make your decision and set target performance level:

You have to decide first what you actually want to do with that skill or what you actually want to learn, There will me many tasks you want to do in your life but you really have to write those in some manner. Now other important thing is setting a Target Performance level, How much you want to gain from specific skill. If you want to learn programming you have to tell yourself that you are learning because you want to make your business website or you are learning to code because you want to get job at company or you want to learn to code because you want to get job at Google.

It is always great to dream BIG but having a small stepping stones does matter, so If you want to get job at Google as a programmer it is always good to work on your personal project first then move to some professional paid work, after that you will get the ability to guide yourself like where to go from that point.

in other words once you get most baseline proficiency in something it sucks less that baseline level give you inspiration to learn more about that skill.

Deconstruction of skill:

Most of the time you see a skill is subset of various things, When you care about learning a new skill a quick study about that skill can tell you that how many other subsets are there for that particular skill-set, Some subsets are good to go with but some are really difficult to understand, So this deconstruction can help you to understand which easy skills you can learn first as well as what about of subsets you need/want to learn.

For example if you want to learn about Algorithmic trading you don’t need to learn first about marketing strategies, macro economics, policies/factors effects wall-street,internal structure of wall-street,quantitative analytics,machine-learning as well various machine learning techniques those are used for research purpose while construction of a algorithm/strategy, But in real for starting you just have to learn about those strategies which are currently being used by most of the traders and gives good returns, that number of strategies is not more than 10 or something so at first for algorithmic trading you have to know about those strategies and know for what conditions which strategy should be applied on stock market so you will b able to get better results of your trading.

In this process you will found that most of the 2-3 sub-skills repeat over and over again which help you to learn/do things much faster and that save your lot of time and energy which is mot important.

Research VS Practice:

When we have to learn any skill procrastinate! At some level procrastination is really good thing because back in your brain you unconsciously process/think about that skill, but if it is too much it will kill your focus as well, so right amount of procrastination is great for you. When we learn anything new we read-research-discuss. But rather than just limiting ourself into research mode will kill our productivity as well.

Human brain loves to do research but we need to switch between things constantly which is read/research and do.

For example if you really are going to learn programming either you can read 5-6 books first then try to write a function which will fetch birthday information of your friend from Facebook and let you know if anyone’s birthday falls in present month.

Practices makes you perfect But how to practice?:

There are various things you have to know about practice, Make sure whatever new work you are doing/learning, Do it just before you are going to sleep and after sleeping try it out as first thing in the morning, Study shows that in sleep your brain turns your small practice into good neuron structures for passing of various messages, such messages makes your mind more strong and fast to react towards grasping of new skill-sets.

Above method works for both either it is Cogitative or Motor skill.

Removing the barriers: (a general approach)

Sometimes those barriers are just environmental distractions. You have to make a list of distractions those really comes into your world when you try learn a new skill, Those distractions should be turned off like your phone or Chat, some sound coming from outside, Turning off your TV Or at last but not least TURN OFF YOUR INTERNET.

If you want to learn how to play Harmonica you just have to put it some=where in front of you! this is behavior psychology, that just make sure rather than getting distracted from any other shiny object you have to see that thing you want to do/learn first.

It is something like keep those things on your Computer Desktop those you want to learn/do but that does not mean your desktop should be overfilled with things because that also kills your productivity and you system’s speed.

It is also observed from studies that if you listen vocal-music while doing/reading something or even programming effects your ability to be more productive,but if you listen non-vocal music or some jazz it will not just help you to increase your productivity but also help you to improve your mind state.

Commit to practice for at-least 20 hours!

For more information please refer following video:

%d bloggers like this: