my net house

WAHEGURU….!

Category Archives: Uncategorized

OOPS and More OOPS in Python

Have to work on the following things and look more into this whole stuff.

https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/

https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

http://www.pythonforbeginners.com/super/working-python-super-function


Concurrency in Python, or the Natural Way of Life (not yet a completed POST)

There are various ways to think about computing: multiprocessing, asynchronous execution, multi-threading, as well as "parallel processing". If I were to talk about the theoretical side, I would say we have to distribute one particular task into various parts so that multiple resources are available to the system to run things. In other words, multiprocessing is more the programmer's way of understanding the flow of a process, and the rules from theory do not guarantee that giving a process multiple resources will make it FAST! It could make it FAT! as well.

Now let me start with a very simple example, taking the following function as a use case:

# Function that fetches a single URL and returns the response body
import requests

def get_response(url):
    """returns response for URL"""
    response = requests.get(url, verify=False)
    return response.text

The function above is simple enough: it takes one URL and returns the response. But if I have to pass multiple URLs, and I want the GET request to each URL to be fired at the same time, then that would be an asynchronous process, not multiprocessing, because in multiprocessing the threads/processes need to communicate with each other, whereas asynchronous threads don't communicate. (In Python this matters because Python uses process-based multiprocessing, not thread-based; you can do thread-based multiprocessing in Python, but then you are on your OWN 😀 😛 Hail GIL (Mogambo/Hitler).)

So the above function will be used like this, as usual:

from multiprocessing import Pool

URL_list = []  # fill this with the URLs to fetch
pool = Pool(processes=20)
resp_pool = pool.map(get_response, URL_list)
pool.terminate()
pool.join()

One thing you have to understand very carefully: the GIL does no harm for I/O-bound operations, but when it comes to non-I/O-bound (CPU-bound) operations in Python, you have NumPy, SciPy, Pandas and Cython, where one can really release the GIL and take full advantage of the code.
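Because the GIL is released during blocking I/O, plain threads already overlap the waits for I/O-bound work, no multiprocessing needed. A minimal sketch with the standard library (the sleep is a stand-in for a real network call like requests.get):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(url):
    """Stand-in for an I/O-bound call such as requests.get(url)."""
    time.sleep(0.2)          # the GIL is released while waiting
    return len(url)

urls = ["https://example.com"] * 5

start = time.time()
with ThreadPoolExecutor(max_workers=5) as ex:
    sizes = list(ex.map(fake_io, urls))
elapsed = time.time() - start

print(sizes)
print(elapsed < 0.9)  # five 0.2s waits overlap: ~0.2s total, not ~1.0s
```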

How to release the GIL using Cython: https://lbolla.info/blog/2013/12/23/python-threads-cython-gil
One can also look at some interesting details about the GIL here: http://www.dabeaz.com/python/NewGIL.pdf

Intel also provides a Python distribution that helps get speedups in Python, but it is mainly helpful for machine-learning and data-science work.

http://www.techenablement.com/orders-magnitude-performance-intel-distribution-python/ (seems worth giving it a try)

Now there is one important thing you need to take care of when you are releasing the GIL in Python.

You can also scratch your head many times just by reading/watching this one interesting presentation: http://www.dabeaz.com/python/UnderstandingGIL.pdf

Numba is also out there, but make one thing sure: use such tools only when your operation is CPU-bound, not I/O-bound, because as stated above, I/O-bound operations don't care about the GIL.

You will also find out that the GIL is not just Python's problem:

https://www.jstorimer.com/blogs/workingwithcode/8085491-nobody-understands-the-gil

I/O Bound:

The I/O bound state has been identified as a problem in computing almost since its inception. The Von Neumann architecture, which is employed by many computing devices, is based on a logically separate central processor unit which requests data from main memory, processes it and writes back the results. Since data must be moved between the CPU and memory along a bus which has a limited data transfer rate, there exists a condition that is known as the Von Neumann bottleneck. Put simply, this means that the data bandwidth between the CPU and memory tends to limit the overall speed of computation. In terms of the actual technology that makes up a computer, the Von Neumann Bottleneck predicts that it is easier to make the CPU perform calculations faster than it is to supply it with data at the necessary rate for this to be possible.

In simple terms: the CPU is fast and memory is slow.
https://en.wikipedia.org/wiki/I/O_bound

Let’s make things more precise:
Sync: Blocking operations.
Async: Non blocking operations.
Concurrency: Making progress together.
Parallelism: Making progress in parallel.
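A toy asyncio sketch of these definitions (pure illustration, no real network): the awaits below are non-blocking, so the three fake "requests" make progress together on a single thread.

```python
import asyncio

async def fake_request(i):
    await asyncio.sleep(0.1)   # non-blocking: the event loop runs other tasks meanwhile
    return i * i

async def main():
    # gather() drives the coroutines concurrently, not one after another
    return await asyncio.gather(*(fake_request(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # [0, 1, 4]
```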

Now the question arises: do we need all those things together?
http://docs.python-guide.org/en/latest/scenarios/speed/
https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
https://github.com/dask/dask (I just found that Dask is much more advanced and promising; one should not ignore it at all!!)
http://dask.pydata.org/en/latest/dataframe-performance.html

async: https://hackernoon.com/asyncio-for-the-working-python-developer-5c468e6e2e8e
https://stackoverflow.com/questions/8533318/python-multiprocessing-pool-when-to-use-apply-apply-async-or-map
https://github.com/pyparallel/pyparallel

One minute read of The One Minute Manager

Get out more results in less time.

Autocratic vs Democratic:
Autocratic managers are result-oriented and democratic managers are happiness-oriented, so we
need to be One Minute Managers. 🙂

1. One minute Goal Setting:

Everyone should know the goals of the company.
People must know what their roles in the company are.
Goals must not be more than 250 words.
Always review your goals.

2. One minute Praising:

Give True Feedback.
Always praise immediately.
Share happiness and encourage your people.

3. One minute reprimand:

Point out people's mistakes immediately.
Tell people how you feel about it.
Point out the mistake, but don't criticize.
Be on the side of your people.

Conclusion:
Look for the good things in beginners and the bad things in the experienced.
Share what you learn.
We don't manage people, we manage behaviors.
Love your people and make sure they are also loving you back.
Define your problem grammatically. (What is happening, and what you want to happen.)

Lessons Learned from life

Complete basics that just went out of my mind. No idea how they slipped away. 😦

Work-Life Balance.

You can’t be successful in one day.

People around you are always telling you how to do it; either you ignore it or take it to the next level.

Better late than never.

Never leave your day job (even if it is cutting grass).

Don’t try to be OVER-SMART.

Never consume any Addictive substance.

Learn to respect your personal space as well as others.

Learn to turn off your mind from consistence thinking of things.

Love your work. Work is a never-ending process; don't take so much pressure to complete it or start the next one.

Have a group of friends outside work.

Nobody is slowing you down Except you.

Learn to say sorry, please, thanks, welcome.

Help others but respect your time and Energy.

Break the pattern of your life.

Be hungry, be foolish – Stop believing that.

Sikhism has a different way of living life. (Either believe in that, or live with sorrows.)

If you want to earn more, be crazy, get exploited and create a big hole inside you, that is your choice as well. 🙂

Law of a Wealthy Life

A wealthy life does not just mean having lots of money in the bank; it is much more about creating various things in your society, or running various engines that work in such a manner that you are really able to make things happen in your life instantly. One thing you must remember: if you really want to do it fast, do it well. 🙂

Speed of implementation

Respect your time (don't waste it on social media and such).

Go to bed early and get up early. Although I am writing this post so Late.:P 😦 😉

Important Julia Packages

  1. Julia Ipython

Julia runs very well in your IPython notebook environment. After all, all you have to do is data science and machine learning. 🙂


1.1 Open the Julia prompt (on Ubuntu, just type the 'julia' command in your terminal).

1.2 Run the command > Pkg.add("IJulia") # it will do almost all the work.

2. DataFrames: Whenever you have to read lots of files in Excel style, the Julia DataFrames package is good to go.

Pkg.add("DataFrames")

3. Arduino:

A Julia Package for interacting with Arduino.

https://github.com/rennis250/Arduino.jl

4. Neural Network Implementation of Julia

https://github.com/compressed/BackpropNeuralNet.jl

5. Visualizing and Plotting in Julia:

https://github.com/bokeh/Bokeh.jl

6. Reading and writing CSV files in Julia

https://github.com/JuliaData/CSV.jl

7. Data Clustering in Julia:

https://github.com/JuliaStats/Clustering.jl

For a much larger number of packages, please refer to the following link:

http://pkg.julialang.org/

Note*: You can also run most shell commands in the Julia environment. 🙂

things and things

Things that need to be understood in many ways.

  1. Various important parts of statistics and their implementation
  2. Hypothesis testing
  3. Probability distributions and their importance
  4. AIC and BIC
  5. Bayesian models
  6. Some black magic of OOPS

Hacker’s Guide to Quantitative Trading(Quantopian Python) Part 2

Quantopian provides the required API functions, data, a helpful community, as well as a batteries-included web-based dashboard to play with algorithmic trading, create your own trading strategies, and launch your trading model in the live market.

Here I will only talk about code and how it should be written to create your own Trading Strategy.

There are basically Two methods.

initialize() and handle_data()

initialize() acts as an initializer for various variables, same as the __init__ method in Python.

What kind of variables we have to declare in initialize() depends on your strategy: we can select a limited number of stocks, days, the type of trading, and the variables required for the algorithm.

A very simple example of initialize() code could look like as follows:

def initialize(context): # consider context just as 'self' in Python

   context.stocks = [sid(24),sid(46632)] # sid stands for stock_id

initialize() also contains the stuff that can be used many times or all the times in our Trading Algorithm:

1. A counter that keeps track of how many minutes in the current day we’ve got.

2. A counter that keeps track of our current date.

3. A list that stores the securities that we want to use in our algorithm.

Whatever variables you define here will remain persistent (meaning they'll keep existing) but will be mutable. So if you initialize context.count as 0 in initialize(), you can always change it later in handle_data().
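As a sketch (plain Python stand-in, not the real Quantopian runtime), that persistence looks like this:

```python
# Sketch only: Context here is a stand-in for Quantopian's context object.
class Context(object):
    pass

def initialize(context):
    context.count = 0          # persistent state, like self.count in __init__

def handle_data(context, data):
    context.count += 1         # mutable across calls, the value carries over

context = Context()
initialize(context)
for minute_bar in range(3):    # Quantopian would call handle_data every bar
    handle_data(context, data=None)
print(context.count)  # 3
```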

A Simple Example of handle_data():

def handle_data(context,data):

   for stock in context.stocks:

        if stock in data:

            order(stock,1)

Momentum Strategy (a common trading strategy):

In this strategy we consider the moving average price of a stock as the important factor in deciding whether to put a security in a long or short position.

Here is a simple explanation of the momentum strategy:

● If the current price is greater than the moving average, long the security

● If the current price is less than the moving average, short the security

Now we will use the Quantopian API to implement this strategy for trading. Our algorithm here is going to be a little more sophisticated: we're going to look at two moving averages, the 50-day moving average and the 200-day moving average.

David Edwards writes that “the idea is that stocks with similar 50 & 200 day moving averages are more likely to be fairly valued and the algorithm will avoid some of the wild swings that plague momentum strategies. The 50/200 day crossover is also a very common signal, so stocks might be more likely to continue in the direction of the 50day MA because a lot of investors enter and exit positions at that threshold.”

The decision-making behind Moving-average is as follows:

● If the 50-day moving average is greater than the 200-day moving average, long the security/stock.

● If the 50-day moving average is less than the 200-day moving average, short the security/stock.
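Outside Quantopian, the same 50/200-day decision rule can be sketched with pandas rolling means (synthetic prices below, not real market data):

```python
import numpy as np
import pandas as pd

# synthetic upward-trending price series (placeholder for real price history)
prices = pd.Series(np.linspace(100, 150, 400))

ma50 = prices.rolling(50).mean()    # 50-day moving average
ma200 = prices.rolling(200).mean()  # 200-day moving average

# long where the 50-day MA is above the 200-day MA, short where below
signal = np.where(ma50 > ma200, 'long', 'short')
print(signal[-1])  # a steady uptrend keeps the 50-day MA on top: 'long'
```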

So now Let’s make a Trading Bot!

1. First we have to create our initialize() function:

def initialize(context):

   set_universe(universe.DollarVolumeUniverse(floor_percentile=99.5,ceiling_percentile=100))

set_universe() is an inbuilt Quantopian function which provides us the stocks within the required universe. Here we have selected DollarVolumeUniverse with 99.5 and 100 as our floor and ceiling, which means we'll be selecting the top 99.5~100% of stocks in our universe with the highest dollar*volume scores.

Please read the comments in the code.

   context.stocks_to_long = 5
   context.stocks_to_short = 5
   context.rebalance_date = None # we will get today's date, then keep positions active for 10 days from here
   context.rebalance_days = 10 # just an assumption for now; 10 days, or a finer value


Now that we have defined the required initializer parameters in initialize(), let's move on to handle_data():

def handle_data(context, data):

   if context.rebalance_date != None: # if rebalance_date is not null, set next_date for changing the position of the algorithm

       next_date = context.rebalance_date + timedelta(days=context.rebalance_days) # next_date should be that many days away from rebalance_date

   if context.rebalance_date == None or next_date == get_datetime(): # if today is the day, 10 days after we marked long/short for our stocks

       context.rebalance_date = get_datetime() # set rebalance_date to today, so next_date will again be set 10 days ahead

       historical_data = history(200, '1d', 'price')

This gets the historical data of all stocks initialized in initialize(): '1d' = daily bars, 200 = number of days, 'price' = we fetch only price details because that is all our strategy requires (for some strategies the volume of a stock could be more beneficial).

       past_50days_mean = historical_data.tail(50).mean()
       past_200days_mean = historical_data.mean()
       diff = past_50days_mean/past_200days_mean - 1

       # if diff>0 we will long, if diff<0 we will short
       buys = diff[diff > 0]
       sells = diff[diff < 0]
       # here we get the securities/stocks whose 50-day moving average is
       # greater than, as well as less than, the 200-day one
       buys.sort() # sorting the buys list - why? we take the top securities from the top; more is better
       sells.sort(ascending=False) # reverse-sorting the sells list - taking the top securities from the bottom; less is better because we are selling against the market
       buys = buys.iloc[:buy_length] if buy_weight != 0 else None # buy_length = number of securities we want to purchase
       sells = sells.iloc[:short_length] if short_weight != 0 else None # short_length = number of securities we want to short

Now we have buys and sells as two lists!! (Remember that carefully.) All the decisions are going to be made based on these two lists.

We can also implement risk factors in our trading strategy. Let's implement a minimal form of risk factor, a stop at 0.02 of the last traded price: if the security goes much lower than that, we will exit.

We will go through each security in our data/universe, and those that satisfy the condition of the 'buys' or 'sells' list will be bought/sold.

   for sym in data:

       # if the security exists in our sells data
       if sells is not None and sym in sells.index:

           log.info('SHORT:%s' % sym.symbol)

           order_target_percent(sym, short_weight, stop_price=data[sym].price - stops[sym])

       # if the security exists in our buys data
       elif buys is not None and sym in buys.index:

           log.info('LONG:%s' % sym.symbol)

           order_target_percent(sym, buy_weight, stop_price=data[sym].price - stops[sym])

       else:

           order_target(sym, 0)

# here stop_price is the real-time price of the security minus the stop amount
# order_target_percent is an inbuilt function.


The `order_target_percent` method allows you to order a % target of your portfolio in that security. So this means that if 0% of your total portfolio belongs in AAPL and you order 30%, it will order 30%. But if you had 25% already and you tried ordering 30%, it will order 5%.
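The arithmetic behind that behavior can be sketched in plain Python (an illustration only, not Quantopian's implementation):

```python
def target_percent_delta(current_pct, target_pct):
    """Return the fraction of the portfolio that actually gets ordered."""
    return target_pct - current_pct

# nothing held yet, order the full 30%
print(target_percent_delta(0.00, 0.30))
# already holding 25%, order only the missing 5%
print(target_percent_delta(0.25, 0.30))
```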

You can order using three different special order methods if you don’t want a normal market order:

#`stop_price`: Creates a stop order

#`limit_price`: Creates a limit order

#`StopLimitOrder`: Creates a stop-limit order





How Trading Differs from Gambling:

Most of the time, when you find that you are able to get good returns from your capital, you try to beat the market. Beating the market means trying to earn much more than the fine earnings the market is already returning for your stock. Beating the market can be attempted by various actions, like reversing the momentum or looking for bad happenings in the market (which is also called finding the shit!). Some people are really good at this kung-fu, but as a budding trader with only limited money of your own, one important thing should be remembered: "Protect your capital." That's what most of the big banks do, and if they hire you as their quant or trading-execution person, they will expect the same from you. Big banks have billions of dollars that they don't want to lose, but they definitely want to use that money to get good returns from the market.

So they follow one simple rule most of the time:

Guaranteed returns even if those are low.

[Make sure returns are positive after subtracting various costs like brokerage, leverage, etc. Getting positive returns while neglecting market costs is far easier, but such strategies should not be used with real money.]

So the real key is to think like a computer programmer in the first place: it should work, first of all. The first thing to make sure of is getting returns, even if they are low but stable, by calculating the various risk factors.

I am quoting some informative things from the SentDex tutorial:

Most individual traders are trading on account sizes of somewhere between maybe $25,000 and $100,000 USD, so their motives are to hopefully increase that account size as much as possible, so this person is more likely to take part in High Risk High Yield (HRHY).

Most people who use HRHY strategies, tend to ignore the HR (High Risk) part, focusing on the HY (High Yield).

The same is common with gamblers,even over astronomical odds with things like the lottery.

In other words, always ask yourself: what is it about the market that makes my strategy work? Because at the end of the day, algorithmic trading is more about trading than about the algorithm.

Power of brain relaxation

This is a kind of funny thing that is happening to me these days: I am trying to be as relaxed as possible most of the time, and it is increasing my productivity. I find the people around me much cooler, calmer, happier, more positive, smiling and funny.

relaxation = less stress on the mind = sit free and do nothing except thinking 😀

 

If I am sitting freely most of the time, then how can I be more productive? Maybe I am more productive because my mind loves to have soft reboots after completing small programming tasks. Soft reboots could be anything:

  1. Going to the wash-room and saying hello to a stranger
  2. Looking at pretty girls in the office 😀
  3. Closing your eyes and remembering the time you played with your dog
  4. Thinking about your dreams
  5. Making your dreams possible by reading great quotes on the internet
  6. Thinking about the journey of life
  7. Dreaming of having a great soul-mate, or talking/chatting with her/him if you already have one 😀
  8. Having fun with your colleagues
  9. Drinking coffee
  10. Playing volleyball or any game that is available. Yeahhhh!!!
  11. Writing silly blog posts exactly like this one 😀 😀 😀

 

 

 

A simple script to parse a large file and save it to a NumPy array

A normal approach:


import re
import numpy as np

file_location = 'huge_file_location'
my_regex = re.compile(r'tt\d\d\d\d\d\d\d') # using a compiled regex saves time
a = np.array([]) # just an array to save all the matches
with open(file_location, 'r') as f: # the usual way to open a file
    m = re.findall(my_regex, f.read())
    np_array = np.append(a, m)
print(np_array)
print(np_array.size)
print('unique')
print(np.unique(np_array)) # removing duplicate entries from the array
print(np.unique(np_array).size)
np.save('BIG_ARRAY_LOCATION', np.unique(np_array))

In the above code, f.read() loads one big chunk of string into memory, which is about 8GB in the present situation. Let's fire up generators.

A bit improved version:


def read_in_chunks(file_object, chunk_size=1024*1024):
    while True:
        data = file_object.read(chunk_size) # read a bounded chunk; read() with no size would load the whole file at once
        if not data:
            break
        yield data

import numpy as np
import re

np_array = np.array([])
my_regex = re.compile(r'tt\d\d\d\d\d\d\d')
f = open(file_location)
for piece in read_in_chunks(f):
    m = re.findall(my_regex, piece) # but this is still a bottleneck
    np_array = np.append(np_array, m) # append to the accumulator so earlier matches are kept
print(np_array)
print(np_array.size)
print('unique')
print(np.unique(np_array))
print(np.unique(np_array).size)

A little bit faster code:


file_location = '/home/metal-machine/Desktop/nohup.out'

def read_in_chunks(file_object, chunk_size=1024*1024):
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

import numpy as np
import re

np_array = np.array([])
my_regex = re.compile(r'tt\d\d\d\d\d\d\d')
f = open(file_location)

def iterate_regex():
    '''trying to run an iterator over the matched lists of strings as well'''
    for piece in read_in_chunks(f):
        yield re.findall(my_regex, piece)

for i in iterate_regex():
    np_array = np.append(np_array, i)
print(np_array)
print(np_array.size)
print('unique')
print(np.unique(np_array))
print(np.unique(np_array).size)

But why is the performance still not that good? Hmmm......
Have to look into more things. Please use the required indentation while testing. 😛
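One likely culprit (an assumption worth testing): np.append copies the whole array on every call, so appending in a loop is quadratic. Collecting matches in a Python list and converting once at the end is the usual fix:

```python
import re
import numpy as np

my_regex = re.compile(r'tt\d\d\d\d\d\d\d')
# stand-in for file chunks coming out of read_in_chunks()
chunks = ['tt1234567 junk tt7654321', 'tt1234567 more']

matches = []                     # cheap amortized appends
for piece in chunks:
    matches.extend(my_regex.findall(piece))

np_array = np.unique(np.array(matches))  # one conversion + dedupe at the end
print(np_array)  # ['tt1234567' 'tt7654321']
```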

Look at the CPU usage while running on a Google instance with an 8-core system.

 

(screenshot: cpu-usage.png)
