# my net house

WAHEGURU….!

## Learning Dataframes in Julia

Week4_DataF

### Week 4 – Working with Distributions and DataFrames.¶

In [1]:
# Import the required packages
using Distributions, DataFrames

In [2]:
# Seed the random number generator
srand(1234);

In [3]:
# Question 4: Create the 3 x 30 array named array_1
# 30 rows and 3 columns array
array_1 = [rand(30) rand(30) rand(30)]
size(array_1)
array_1

Out[3]:
30×3 Array{Float64,2}:
0.590845   0.931115   0.643704
0.766797   0.438939   0.401421
0.566237   0.246862   0.525057
0.460085   0.0118196  0.61201
0.794026   0.0460428  0.432577
0.854147   0.496169   0.082207
0.200586   0.732      0.199058
0.298614   0.299058   0.576082
0.246837   0.449182   0.218177
0.579672   0.875096   0.362036
0.648882   0.0462887  0.204728
0.0109059  0.698356   0.932984
0.066423   0.365109   0.827263
⋮
0.0566425  0.404953   0.0396356
0.842714   0.499531   0.79041
0.950498   0.658815   0.431188
0.96467    0.515627   0.137658
0.945775   0.260715   0.60808
0.789904   0.59552    0.255054
0.82116    0.292462   0.498734
0.0341601  0.28858    0.0940369
0.0945445  0.61816    0.52509
0.314926   0.66426    0.265511
0.12781    0.753508   0.110096
0.374187   0.0368842  0.834362
In [4]:
# Question 5: Mean and variance of column 1
mean_column_1 = mean(array_1[:,1])
var_column_1=var(array_1[:,1])
println("mean=",mean_column_1)
println("var=",var_column_1)

mean=0.5014887976938368
var=0.10653465363277906

In [5]:
# Question 5 (continued): Mean and variance of column 2
mean_column_2 = mean(array_1[:,2])
var_column_2=var(array_1[:,2])
println("mean=",mean_column_2)
println("var=",var_column_2)

mean=0.4160447968360426
var=0.06360439983290869

In [6]:
# Question 5 (continued): Mean and variance of column 3
mean_column_3 = mean(array_1[:,3])
var_column_3=var(array_1[:,3])
println("mean=",mean_column_3)
println("var=",var_column_3)

mean=0.4372634519427959
var=0.07568707224628725

In [7]:
# Question 6: Import array_1 into a DataFrame named df
df = DataFrame(array_1)

Out[7]:
x1 x2 x3
1 0.5908446386657102 0.9311151512445586 0.6437042811826996
2 0.7667970365022592 0.43893895933102156 0.40142056533714965
3 0.5662374165061859 0.24686248047491066 0.5250572942486489
4 0.4600853424625171 0.011819583479107054 0.6120098074984683
5 0.7940257103317943 0.046042826396498704 0.43257652982765626
6 0.8541465903790502 0.496168672722459 0.0822070287962946
7 0.20058603493384108 0.7320003814997245 0.19905799020907944
8 0.2986142783434118 0.29905752670238184 0.5760819730593403
9 0.24683718661000897 0.4491821088563024 0.21817706596841413
10 0.5796722333690416 0.8750962647851142 0.3620355262053865
11 0.6488819502093455 0.046288741031345504 0.20472832290217324
12 0.010905889635595356 0.6983555060532487 0.93298350850828
13 0.06642303695533736 0.3651093677271471 0.8272627957034728
14 0.9567533636029237 0.3024777928234499 0.09929915955881308
15 0.646690981531646 0.3725754415996787 0.6342997886044144
16 0.11248587118714015 0.15050782744925795 0.1327153585755645
17 0.2760209506672211 0.14732938279328955 0.7751941503856596
18 0.6516642063795697 0.2834013103457036 0.8692366891234362
19 0.05664246860321187 0.40495283364883794 0.039635617270926904
20 0.8427136165865521 0.49953074411487797 0.7904095314876494
21 0.9504984071553011 0.6588147837334961 0.43118828904466633
22 0.9646697763820897 0.5156272179795256 0.1376583132625555
23 0.9457754052519123 0.26071522632820776 0.6080803126880718
24 0.7899036826169576 0.5955204840509289 0.2550540600167448
25 0.8211604203482923 0.2924615242315285 0.4987340031883092
26 0.03416010848943718 0.2885798506061561 0.09403688346569439
27 0.09454448946400307 0.6181597973815087 0.5250899072103514
28 0.31492622391998415 0.6642598175011505 0.2655109248498748
29 0.12780989889368866 0.7535081177709988 0.11009621399607639
30 0.374186714831074 0.03688418241886171 0.8343616661080064
In [8]:
# check available names and fieldnames in Julia, Python's alternative
f_name =fieldnames(df)
name=names(df)
println(f_name,name)

Symbol[:columns, :colindex]Symbol[:x1, :x2, :x3]

In [9]:
# Accessing different columns of df
df[:x3]

Out[9]:
30-element Array{Float64,1}:
0.643704
0.401421
0.525057
0.61201
0.432577
0.082207
0.199058
0.576082
0.218177
0.362036
0.204728
0.932984
0.827263
⋮
0.0396356
0.79041
0.431188
0.137658
0.60808
0.255054
0.498734
0.0940369
0.52509
0.265511
0.110096
0.834362
In [10]:
# Question 7: Change the names of the columns to Var1, Var2, and Var3
rename!(df,Dict(:x1=>:Var1,:x2=>:Var2,:x3=>:Var))

Out[10]:
Var1 Var2 Var
1 0.5908446386657102 0.9311151512445586 0.6437042811826996
2 0.7667970365022592 0.43893895933102156 0.40142056533714965
3 0.5662374165061859 0.24686248047491066 0.5250572942486489
4 0.4600853424625171 0.011819583479107054 0.6120098074984683
5 0.7940257103317943 0.046042826396498704 0.43257652982765626
6 0.8541465903790502 0.496168672722459 0.0822070287962946
7 0.20058603493384108 0.7320003814997245 0.19905799020907944
8 0.2986142783434118 0.29905752670238184 0.5760819730593403
9 0.24683718661000897 0.4491821088563024 0.21817706596841413
10 0.5796722333690416 0.8750962647851142 0.3620355262053865
11 0.6488819502093455 0.046288741031345504 0.20472832290217324
12 0.010905889635595356 0.6983555060532487 0.93298350850828
13 0.06642303695533736 0.3651093677271471 0.8272627957034728
14 0.9567533636029237 0.3024777928234499 0.09929915955881308
15 0.646690981531646 0.3725754415996787 0.6342997886044144
16 0.11248587118714015 0.15050782744925795 0.1327153585755645
17 0.2760209506672211 0.14732938279328955 0.7751941503856596
18 0.6516642063795697 0.2834013103457036 0.8692366891234362
19 0.05664246860321187 0.40495283364883794 0.039635617270926904
20 0.8427136165865521 0.49953074411487797 0.7904095314876494
21 0.9504984071553011 0.6588147837334961 0.43118828904466633
22 0.9646697763820897 0.5156272179795256 0.1376583132625555
23 0.9457754052519123 0.26071522632820776 0.6080803126880718
24 0.7899036826169576 0.5955204840509289 0.2550540600167448
25 0.8211604203482923 0.2924615242315285 0.4987340031883092
26 0.03416010848943718 0.2885798506061561 0.09403688346569439
27 0.09454448946400307 0.6181597973815087 0.5250899072103514
28 0.31492622391998415 0.6642598175011505 0.2655109248498748
29 0.12780989889368866 0.7535081177709988 0.11009621399607639
30 0.374186714831074 0.03688418241886171 0.8343616661080064
In [11]:
### we can also tail function see last required entries
tail(df,20)

Out[11]:
Var1 Var2 Var
1 0.6488819502093455 0.046288741031345504 0.20472832290217324
2 0.010905889635595356 0.6983555060532487 0.93298350850828
3 0.06642303695533736 0.3651093677271471 0.8272627957034728
4 0.9567533636029237 0.3024777928234499 0.09929915955881308
5 0.646690981531646 0.3725754415996787 0.6342997886044144
6 0.11248587118714015 0.15050782744925795 0.1327153585755645
7 0.2760209506672211 0.14732938279328955 0.7751941503856596
8 0.6516642063795697 0.2834013103457036 0.8692366891234362
9 0.05664246860321187 0.40495283364883794 0.039635617270926904
10 0.8427136165865521 0.49953074411487797 0.7904095314876494
11 0.9504984071553011 0.6588147837334961 0.43118828904466633
12 0.9646697763820897 0.5156272179795256 0.1376583132625555
13 0.9457754052519123 0.26071522632820776 0.6080803126880718
14 0.7899036826169576 0.5955204840509289 0.2550540600167448
15 0.8211604203482923 0.2924615242315285 0.4987340031883092
16 0.03416010848943718 0.2885798506061561 0.09403688346569439
17 0.09454448946400307 0.6181597973815087 0.5250899072103514
18 0.31492622391998415 0.6642598175011505 0.2655109248498748
19 0.12780989889368866 0.7535081177709988 0.11009621399607639
20 0.374186714831074 0.03688418241886171 0.8343616661080064
In [12]:
# Creatring Second DataFrame
df2=DataFrame(tail(df,20))

Out[12]:
Var1 Var2 Var
1 0.6488819502093455 0.046288741031345504 0.20472832290217324
2 0.010905889635595356 0.6983555060532487 0.93298350850828
3 0.06642303695533736 0.3651093677271471 0.8272627957034728
4 0.9567533636029237 0.3024777928234499 0.09929915955881308
5 0.646690981531646 0.3725754415996787 0.6342997886044144
6 0.11248587118714015 0.15050782744925795 0.1327153585755645
7 0.2760209506672211 0.14732938279328955 0.7751941503856596
8 0.6516642063795697 0.2834013103457036 0.8692366891234362
9 0.05664246860321187 0.40495283364883794 0.039635617270926904
10 0.8427136165865521 0.49953074411487797 0.7904095314876494
11 0.9504984071553011 0.6588147837334961 0.43118828904466633
12 0.9646697763820897 0.5156272179795256 0.1376583132625555
13 0.9457754052519123 0.26071522632820776 0.6080803126880718
14 0.7899036826169576 0.5955204840509289 0.2550540600167448
15 0.8211604203482923 0.2924615242315285 0.4987340031883092
16 0.03416010848943718 0.2885798506061561 0.09403688346569439
17 0.09454448946400307 0.6181597973815087 0.5250899072103514
18 0.31492622391998415 0.6642598175011505 0.2655109248498748
19 0.12780989889368866 0.7535081177709988 0.11009621399607639
20 0.374186714831074 0.03688418241886171 0.8343616661080064
In [13]:
# Question 9: Calculate simple descriptive statistics of all the columns in df2 using the describe() function
describe(df2)

Var1
Summary Stats:
Mean:           0.484341
Minimum:        0.010906
1st Quartile:   0.108001
Median:         0.510439
3rd Quartile:   0.826549
Maximum:        0.964670
Length:         20
Type:           Float64

Var2
Summary Stats:
Mean:           0.397753
Minimum:        0.036884
1st Quartile:   0.277730
Median:         0.368842
3rd Quartile:   0.601180
Maximum:        0.753508
Length:         20
Type:           Float64

Var
Summary Stats:
Mean:           0.453279
Minimum:        0.039636
1st Quartile:   0.136423
Median:         0.464961
3rd Quartile:   0.778998
Maximum:        0.932984
Length:         20
Type:           Float64


In [14]:
# Question 10: Add a column to df2 named Cat1 to df2 consisting of randomly selecting either the strings GroupA or GroupB
df2 = hcat(df2, rand(["GroupA","GroupB"],20))
rename!(df2,Dict(:x1=>:Cat1))

Out[14]:
Var1 Var2 Var Cat1
1 0.6488819502093455 0.046288741031345504 0.20472832290217324 GroupB
2 0.010905889635595356 0.6983555060532487 0.93298350850828 GroupB
3 0.06642303695533736 0.3651093677271471 0.8272627957034728 GroupA
4 0.9567533636029237 0.3024777928234499 0.09929915955881308 GroupA
5 0.646690981531646 0.3725754415996787 0.6342997886044144 GroupA
6 0.11248587118714015 0.15050782744925795 0.1327153585755645 GroupA
7 0.2760209506672211 0.14732938279328955 0.7751941503856596 GroupB
8 0.6516642063795697 0.2834013103457036 0.8692366891234362 GroupB
9 0.05664246860321187 0.40495283364883794 0.039635617270926904 GroupB
10 0.8427136165865521 0.49953074411487797 0.7904095314876494 GroupB
11 0.9504984071553011 0.6588147837334961 0.43118828904466633 GroupA
12 0.9646697763820897 0.5156272179795256 0.1376583132625555 GroupB
13 0.9457754052519123 0.26071522632820776 0.6080803126880718 GroupA
14 0.7899036826169576 0.5955204840509289 0.2550540600167448 GroupB
15 0.8211604203482923 0.2924615242315285 0.4987340031883092 GroupA
16 0.03416010848943718 0.2885798506061561 0.09403688346569439 GroupB
17 0.09454448946400307 0.6181597973815087 0.5250899072103514 GroupB
18 0.31492622391998415 0.6642598175011505 0.2655109248498748 GroupA
19 0.12780989889368866 0.7535081177709988 0.11009621399607639 GroupA
20 0.374186714831074 0.03688418241886171 0.8343616661080064 GroupA
In [15]:
# Question 11: Create a new DataFrame named df3
df3 = DataFrame(A=1:20,B=21:40,C=41:60)

Out[15]:
A B C
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 10 30 50
11 11 31 51
12 12 32 52
13 13 33 53
14 14 34 54
15 15 35 55
16 16 36 56
17 17 37 57
18 18 38 58
19 19 39 59
20 20 40 60
In [16]:
# Question 12: Change indicated values to empty entries
#In a code cells below, change the values in df3 of the following cells to NA: row 10, column 1, row 15, column 2 and row #19, column 3
df3[10,1] = NA
df3[15,2] = NA
df3[19,3] = NA
df3

Out[16]:
A B C
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 NA 30 50
11 11 31 51
12 12 32 52
13 13 33 53
14 14 34 54
15 15 NA 55
16 16 36 56
17 17 37 57
18 18 38 58
19 19 39 NA
20 20 40 60
In [17]:
# Question 13: Create DataFrame df4 that contains no rows with NaN (NA) values
df4 = completecases!(df3)

Out[17]:
A B C
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 11 31 51
11 12 32 52
12 13 33 53
13 14 34 54
14 16 36 56
15 17 37 57
16 18 38 58
17 20 40 60

## Some Plugs-Plays with Julia Programing

Week3_PR_Template

# Title: Week 3 – Fitting a Curve¶

In [17]:
# Initilization of Plots Package
using Plots
pyplot()

Out[17]:
Plots.PyPlotBackend()

### Reading data from given Sample file¶

In [18]:
data_tofit = readdlm("Week3_PR_Data.dat", '\t', header=true)
typeof(data_tofit)

Out[18]:
Tuple{Array{Float64,2},Array{AbstractString,2}}

### Using For loop to print data in array¶

In [19]:
new_array=data_tofit[1]
for i in 1:size(new_array)[1]
println(new_array[i,:])
end

[0.501309, -0.977698]
[1.52801, 0.527711]
[1.70012, 1.71152]
[1.99249, 1.891]
[2.70608, -0.463428]
[2.99493, -0.443567]
[3.49185, -1.27518]
[3.50119, -0.6905]
[4.45992, -5.51613]
[4.93697, -6.0017]
[5.02329, -8.36417]
[5.04234, -7.92448]
[5.50739, -10.7748]
[5.56867, -10.9172]


### Scatter plot¶

In [20]:
# Create the arrays x and y, assigning x the first column of data_tofit and y the second column
x,y = new_array[:,1],new_array[:,2]
scatter(x,y)

Out[20]:

### Creating parabfit() one-liner function¶

In [21]:
# Create a function called parabfit, with x as the argument, returning a*x^2 + b*x + c
parabfit(x)=a*x^2 + b*x + c

Out[21]:
parabfit (generic function with 1 method)

### Ploting against Default values of a,b and c¶

In [22]:
a = 1
b = 1
c = 1

plot(parabfit,-2,2)

Out[22]:

### Ploting using different range for parabfit()¶

In [23]:
# Create variables a, b and c, assigning each the value 1
a = 1
b = 1
c = 1

# Plot the function parabfit, for x values between -5 and 5
plot(parabfit,-5,5)

Out[23]:
In [24]:
# More plot!() tries.
a,b,c = 1,1,1
scatter(x_axis,y_axis)
plot!(parabfit,-5,5)

UndefVarError: x_axis not defined

Stacktrace:
[1] include_string(::String, ::String) at ./loading.jl:515

Optimize parameters a, b and c such that it fits the data points more concisely.

1. Parbola should be downwards that detarmines cofficient a must be negative.
2. As from the data points value of cofficient c should be close to zero.
3. Cofficient b determines the values of y axis that must be possitive.
In [25]:
# More plot!() tries.
a,b,c = -1,2,3
scatter(x,y)
plot!(parabfit,-5,5)

Out[25]:
In [26]:
# More plot!() tries.
a,b,c = -1,0.1,2
scatter(x_axis,y_axis)
plot!(parabfit,-5,5)

UndefVarError: x_axis not defined

Stacktrace:
[1] include_string(::String, ::String) at ./loading.jl:515
In [27]:
# More plot!() tries.
a,b,c = -1,0.8,3
scatter(x,y)
plot!(parabfit,-5,5)

Out[27]:
In [28]:
# More plot!() tries.
a,b,c = -0.9,2.7,0.05
scatter(x,y)
plot!(parabfit,-5,5)

Out[28]:

### Optimising variable c¶

In [29]:
a,b = 1,1
plot(scatter(x,y,alpha=0.5))
c=0
plot!(parabfit,-5,5)
c = -1
plot!(parabfit,-5,5)
c = -2
plot!(parabfit,-5,5)
c = -3
plot!(parabfit,-5,5)
c = -4
plot!(parabfit,-5,5)
c = -5
plot!(parabfit,-5,5)
c = 2
plot!(parabfit,-5,5)

Out[29]:

### Optimising Variable a¶

In [31]:
c,b = 1,1
plot(scatter(x,y,alpha=0.5))
a=0
plot!(parabfit,0,5)
a = -1
plot!(parabfit,0,5)
a = -2
plot!(parabfit,0,5)
a = -3
plot!(parabfit,0,5)
a = -4
plot!(parabfit,0,5)
a = -5
plot!(parabfit,0,5)
a = 2
plot!(parabfit,0,5)

Out[31]:
In [37]:
#Locating final value for a
c,b = 3,1
plot(scatter(x,y,alpha=0.5))
a = -1
plot!(parabfit,0,5)

Out[37]:

### Optimising for b¶

In [53]:
c,a = 2,-1
plot(scatter(x,y,alpha=0.5))
b=0
plot!(parabfit,0,5)
b = 1
plot!(parabfit,0,5)
b = 2
plot!(parabfit,0,5)
b = 3
plot!(parabfit,0,5)
b = 4
plot!(parabfit,0,5)
b = 5
plot!(parabfit,0,5)
b = -1
plot!(parabfit,0,5)

Out[53]:
In [57]:
# plotting for b=4
c,a = 1,-1
plot(scatter(x,y,alpha=0.5))
b = 3
plot!(parabfit,0,8)

Out[57]:

### final Values of a,b and c¶

In [65]:
# plotting for b=4
c,a,b = 1,-1,3
plot(scatter(x,y,alpha=0.5))
plot!(parabfit,0,5)

Out[65]:

To optimize values of a,b,c we had to plot one variable many times to find out one variable’s occurrence at different levels
of scale. By changing the range of parabola function it was more easy to come up with more accurate values of a,b and c

In [ ]:

## OOPS and More OOPS in Python

Have to work on following these two things and have to look more into this whole stuff.

https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/

https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

http://www.pythonforbeginners.com/super/working-python-super-function

## Concurrency in Python or Natural way of life(Not yet completed POST)

There are various ways one can think about computing , Multiprocessing, Asynchronous, Multi-threading as well as “Parallel Processing” If I would talk about theoratical things I Would say we have to distribute our one particular task in various forms so multiple resources should be available for system to run things or in other way we can say multiprocessing is more of Programmer’s way of understanding the Flow of precess and sometimes rules according to theory does not assure that if one is providing multiple resources to process it will be FAST! it could be FAT! also.

Now let me start with very simple Example by taking following function as use case:

# Function that run multiple tasks def get_response(url): “””returns response for URL ””” response = requests.get((url),verify=False) return response.text

Now above function is simple enough that is getting one URL and returning response but if I have to pass multiple URLs but I want that get request to each URL should be fired at same time then That would be Asynchronous process not multiprocessing because in Multiprocessing Threads/Processes needs to communicate with each other but on the other hand in case of Asynchrounous threads don’t communicate(in Python because Python uses Process based multiprocessing not Thread Based although you can do thread-based multiprocessing in Python but then you are on your OWN 😀 😛 Hail GIL (Mogambo/Hitler)).

So above function will be like this as usual:

from multiprocessing import Pool pool = Pool(processes=20) resp_pool = pool.map(get_response,tasks) URL_list = [] resp_pool = _pool.map(get_response,tasks) pool.terminate() pool.join() 

One thing you have to understand very carefully and that is GIL does not harm for i/o bound operations but yes when it comes to non-i/o bound operations in python You have Numpy,Scipy,Pandas,Cython where one can really release GIL and take full advantage of the code.

How to release GIL using Cython: https://lbolla.info/blog/2013/12/23/python-threads-cython-gil
Although one can look for interesting features about GIL: http://www.dabeaz.com/python/NewGIL.pdf

Intel has also provided Python Distribution that is helpful get speedups in Python but that would only be helpful for Machine-learning and Data-Science work.

http://www.techenablement.com/orders-magnitude-performance-intel-distribution-python/(Seems like worth to give it a Try:::)

Now there is one important thing you must need to care about when you are releasing GIL in Python.

Although Numba is also there but make one thing for sure Use such tools only when your Operation is CPU bound not I/O bound because as I have stated that I/O bound operations don’t care about GIL.

Although you will find out that GIL is not just Python’s Problem:

https://www.jstorimer.com/blogs/workingwithcode/8085491-nobody-understands-the-gil

I/O Bound:

The I/O bound state has been identified as a problem in computing almost since its inception. The Von Neumann architecture, which is employed by many computing devices, is based on a logically separate central processor unit which requests data from main memory,[clarification needed] processes it and writes back the results. Since data must be moved between the CPU and memory along a bus which has a limited data transfer rate, there exists a condition that is known as the Von Neumann bottleneck. Put simply, this means that the data bandwidth between the CPU and memory tends to limit the overall speed of computation. In terms of the actual technology that makes up a computer, the Von Neumann Bottleneck predicts that it is easier to make the CPU perform calculations faster than it is to supply it with data at the necessary rate for this to be possible.

In simple cases CPU is Faster and Memory is Slower.
https://en.wikipedia.org/wiki/I/O_bound

Let’s make things more precise:
Sync: Blocking operations.
Async: Non blocking operations.
Concurrency: Making progress together.
Parallelism: Making progress in parallel.

Now Questions arises that do we need all those things together:
http://docs.python-guide.org/en/latest/scenarios/speed/
https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html

## One mintue read to one minute Manager

### Get out more results in less time.

Autocratic VS Democratic:
Autocratic are result oriented and Democratic are happiness Oriented So we
need to be one minute Managers. 🙂

1. One minute Goal Setting:

Everyone should be knowing the goals of the company.
People must know what their roles are in the company.
Goals must not be more than 250 words.

2. One minute Praising:

Give True Feedback.
Always praise immediately.
Share happiness and encourage your people.

3. One minute reprimand:

Immediately point people out for their mistakes.
Tell people how you feel about it.
Point out mistake but don’t criticize.
Be on the side of your people.

conclusion:
Look for the good things in the beginners and bad things in the experienced.
Share what you learn.
We don’t manage people, We manage behaviors.
Love your people and make sure they are also loving you back.
Define your problem grammatically. (What is happening and What you want to be happen.)

## Lessons Learned from life

Complete Basics those just went out of my mind, No idea how those gone away. 😦

Work-Life Balance.

You can’t be successful in one day.

All the time people around you tell how to do it, Either you ignore  it or take to next level.

Better late than never.

Never Leave your Day job(even if it is cutting grass ).

Don’t try to be OVER-SMART.

Learn to respect your personal space as well as others.

Learn to turn off your mind from consistence thinking of things.

Love your work—-Work is never Ending process, Don’t take so much pressure to complete it or start next one.

Have a group of friends outside work.

Nobody is slowing you down Except you.

Learn to say sorry, please, thanks, welcome.

Help others but respect your time and Energy.

Break the pattern of your life.

Be hungry, be foolish – Stop believing that.

Sikhism has different way of living life.(Either believe in that or Live with sorrows.)

If you want to earn more, Be-crazy, Get-exploited and create a big hole inside you, that is your choice as well. 🙂

## Law of Wealthy LIfe

Wealthy life does not just mean to having lots of money in the bank but it is much more like creating various things in your society or running various engines those work in the manner that you are really able to make things happen in your life instantly, One thing you must remember or know carefully and that is If you really want to do it fast,  do it well. 🙂

Speed of implementation

Respect your time(Don’t waste on social media and stuff)

Go to bed early and get up early. Although I am writing this post so Late.:P 😦 😉

## Important Julia Packages

1. Julia Ipython

Julia is able to run very well on you Ipython notebook Environment. After all, All you have to do is Data-Science and Machine-Learning. 🙂

1.1 Open Julia Prompt(At Ubuntu it works like typing ‘julia’ command in your Terminal)

1.2 run command > Pkg.add(“IJulia”) # it will do almost all the work.

2. DataFrames: Whenever you have to read lot of files in Excel-Style Julia DataFrames Package is good to go.

Pkg.add("DataFrames")

3. Arduino:

A Julia Package for interacting with Arduino.

https://github.com/rennis250/Arduino.jl

4. Neural Network Implementation of Julia

https://github.com/compressed/BackpropNeuralNet.jl

5. Visualizing and Plotting in Julia:

https://github.com/bokeh/Bokeh.jl

6. Reading and writing CSV files in Julia

7. DataClusting in Julia:

https://github.com/JuliaStats/Clustering.jl

http://pkg.julialang.org/

Note*: You can also run most of the Shell commands in Julia environment as well. 🙂

## things and things

Things those need to be understood in many ways.

1. Various important parts of Statistics and implementation
2. Hypothesis Testing
3. Probability Distributions and Importance
4. AIC and BIC
5. Baysian models
6. Some black Magics of OOPS

## Hacker’s Guide to Quantitative Trading(Quantopian Python) Part 2

Here I will only talk about code and how it should be written to create your own Trading Strategy.

There are basically Two methods.

initialize() and handle_data()

initialize act as initializer for various variables. same as __init__ method in Python.

Now what kind of variables we have to declare in initialize() function is dependent on your strategy. we can select limited number of stocks,days,type of trading,variables required for Algorithms.

A very simple example of initialize() code could look like as follows:

def initialize(context): # consider context just as 'self' in Python

context.stocks = [sid(24),sid(46632)] # sid stands for stock_id


initialize() also contains the stuff that can be used many times or all the times in our Trading Algorithm:

1. A counter that keeps track of how many minutes in the current day we’ve got.

2. A counter that keeps track of our current date.

3. A list that stores the securities that we want to use in our algorithm.

Whatever variables that you define here will remain persistent (meaning that they’ll exist) but will be mutable. So that means that if you initialize context.count as 0 in initialize, you can always change it later in handle_data().

A Simple Example of handle_data():

def handle_data(context,data):

for stock in context.stocks:

if stock in data:

order(stock,1)


In this strategy we consider Moving average price of stock as an important factor to make decision to put a security price in Long or Short.

Here is simple explanation of momentum Strategy:

● If the current price is greater than the moving average, long the security

● If the current price is less than the moving average, short the security

Now we will use Quantopian API to implement this strategy for Trading. instead, our algorithm here is going to be a little more sophisticated. We’re going to look at two moving averages: the 50 day moving average and the 200 day moving average.

David Edwards writes that “the idea is that stocks with similar 50 & 200 day moving averages are more likely to be fairly valued and the algorithm will avoid some of the wild swings that plague momentum strategies. The 50/200 day crossover is also a very common signal, so stocks might be more likely to continue in the direction of the 50day MA because a lot of investors enter and exit positions at that threshold.”

The decision-making behind Moving-average is as follows:

● If the 50 day moving averages is greater than the 200 day moving average, long the security/stock.

● If the 50 day moving average is less than the 200 day moving average, short the security/stock

So now Let’s make a Trading Bot!

1. First we have to create our initialize() function:

def initialize(context):

set_universe(universe.DollarVolumeUniverse(floor_percentile=99.5,ceiling_percentile=100))


”’Set universe is inbuilt function by Quantopian which provide us the stocks with-in required universe. Here we have selected stocks those we have DollarVolumeUniverse with 99.5% and 100% as our floor and ceiling. This means that we’ll be selecting the top 99.5 ~ 100% stocks of our universe with the highest dollar*volume scores.

   context.stocks_to_long = 5

context.stocks_to_short = 5
context.rebalance_date = None # we will get today's date then we will keep positions active for 10 days here

context.rebalance_days = 10 # it is just an assumption now for 10 days or finer value



Now we have defined required __init__ parameters in initiliaze() let’s move to

handle_data()

def handle_data():

if context.rebalance_date!=None: # if rebalnce date is not null then set next_date for changing the position of algorithm

next_date = context.rebalance_date + timedelta(days=context.rebalnce_days) # next_date should be that days away from rebalnce_date

if context.rebalance_date==None or next_date==get_datetime(): # if today is that day after 10 days when we market long/short for out stock

context.rebalnce_date = get_datetime() # set rebalnce_date for today so next_date will be set to again 10 days ahead from rebalnce_date

historical_data = history(200,'1d',price)


Get historical data of all stocks initilized in initiliaze() function, ‘1d’= 1 day,200=days,’price’=we are only fetching price details because that is only required for our strategy, may be for some strategy volume of stock could be more beneficial

  past_50days_mean = historical_data.tail(50).mean()

past_200days_mean = historical_data.mean()

diff = past_50days_mean/past_200days_mean-1

# if diff>0 we will long if diff<1 we will short

sells = diff[diff<0]

# here we will get list of securities/stocks whose moving average will be

# greater as well as less than 0

buys.sort() # sorting buys list why? - getting top securities from top- more is better
sells.sort(ascending=False) # reverse sorting sells list - getting top seurities from bottom, less is better because we are selling agiainst market
sells = sells.iloc[:short_length] if short_weight !=0 else None # short_length = number of securities we want to short


Now here we have buys and sells are two lists!! (remember carefully) all the decisions are going to be made based on these two lists

We can also implement risk factors in out Trading Strategy. Let’s implement minimum form of Risk-Factor, 0.02% of last_traded_price that means if security is going to much lower than that then we will exit.

We will go through each security in our data/universe and those who will satisfy condition of ‘buys’ and ‘sells’ list will be bought/sold.

# if security exists in our sells data

for sym in data:

if sells is not None and sym in sells.index:

log.info('SHORT:%s'%sym.symbol)

order_target_price(sym,short_weight.stop_price=data[sym].price_stops[sym])

# here stop_price is the price of security in real-time+change happend in stops

# order_target_price is inbuilt function.

# if security exists in our buy data

log.info('Long:%s'%sym.symbol)

else:

order_target(sym,0)



The order_target_percent method allows you to order a % target of your portfolio in that security. So this means that if 0% of your total portfolio belongs in AAPL and you order 30%, it will order 30%. But if you had 25% already and you tried ordering 30%, it will order 5%.

You can order using three different special order methods if you don’t want a normal market order:

#stop_price: Creates a stop order

#limit_price: Creates a limit order

#StopLimitOrder: Creates a stop­limit order



Most of times when you find that you are able to get good returns from your capital you try to beat the market, Beating the market means most of the traders tried to earn much more than fine earnings are being returned by the market for your stock, Such beating the market process can be done by various actions like reversing the momentum or looking for bad happenings in the market(which is also called finding the shit!)Some people are really good at this kung-fu but as you are just budding trader and you have only limited money of yours, So here one important thing should be remembered, “”Protect your capital””. – That’s what most of the Big banks do and if they will hire you as their Quant or Trading-Execution person they will expect same from you. Big banks have billions of dollars that they don’t want to loose but definitely want to used that money to get good returns from market.

So they follow one simple rule for most of the times.

Guaranteed returns even if those are low.

[Make sure returns should be positive after subtracting various costs like brokerage,leverage etc, Because getting positive returns by neglecting market costs is far easy but such strategies should not be used with real money.]

So the real key is think like a computer programmer at first place, something like it should work at first place, so first thing to make sure is getting returns even low but stable returns by calculating various risk-factors.

I am quoting some of informative things from SentDex Tutorial:

Most individual traders are trading on account sizes of somewhere between maybe $25,000 and$100,000 USD, so their motives are to hopefully increase that account size as much as possible, so this person is more likely to take part in High Risk High Yield (HRHY).

Most people who use HRHY strategies, tend to ignore the HR (High Risk) part, focusing on the HY (High Yield).

The same is common with gamblers,even over astronomical odds with things like the lottery.