
WAHEGURU….!

Engineer’s Guide to Julia Programming


Finally the moment has come when I can say that I can be productive and my solutions can be parallel, optimizable, customizable and, last but not least, glue-able. Those are the fantastic features I believe one can rely on while learning any new programming language and developing a high-quality AI/ML software solution.

Why?

Julia solves the two-language problem.

Important disclaimer for newbies: I am a Pythonista by choice, and over the last few years I have developed projects using Python and its sister technologies to provide solutions related to:

Automation (Python scripting)

Web development (Django, Flask, Sanic, Tornado)

Data analysis (SageMath, SymPy, ParaView, spreadsheets, Matplotlib, NumPy, SciPy, scikit-learn)

Quantitative analysis (Quantopian.com)

3D modeling (FreeCAD, BIM, IFC), and cluster computing (Rocks Cluster).

Now I just wanted a tool that would let me write pure mathematical expressions (using the actual symbols, not variable names) and write machine-learning/AI/deep-learning code where I sit at the core layer of abstraction, unlike with TensorFlow, PyTorch, or NumPy/Pandas. I am not against these libraries; they have helped me "soooo" much over the years. But I have little idea what is happening under the hood, and I may never be able to change the working internals of NumPy/Pandas/Cython or anything else related to scientific Python, simply because there is a large amount of Fortran/C++ (or Pascal-like) number-crunching code underneath.

The stuff an engineer needs to perform for various kinds of jobs in the Julia programming language can be described as follows:

Solving a Simple Mathematical Equation in Julia:

A = randn(4,4)
x = rand(4)
b = A*x
x̂ = A\b # solve for x̂; the x-hat symbol is typed as x\hat<TAB>
println(A)
println(x)
println(x̂)
@show norm(A*x̂ - b)

Doing Matrix Operations in Julia:

A = randn(4,4) |> w -> w + w' # pipe A through w -> w + w' to symmetrize it
println(A)
λ = eigmax(A); # have you checked the lambda?
@show det(A - λ*I)

Performing Integration:

Performing integration might be one of the most important day-to-day tasks if you work on problems related to modeling and you design solutions using a CAS (Computer Algebra System) like Matlab or SageMath. But designing a solution in a CAS and then finding ways to take it to production is "LOts of WoRk", the kind I assume only comes with either experience or lots of extra brain cells. 😉 Here Julia plays its important role: solving the two-language problem.

# Integrating the Lorenz equations
using ODE
using PyPlot

# define the Lorenz equations
function f(t, x)
    σ = 10
    β = 8/3
    ρ = 28
    [σ*(x[2]-x[1]); x[1]*(ρ-x[3]); x[1]*x[2] - β*x[3]]
end

# run f once
f(0, [0; 0; 0])

# integrate
t = 0:0.01:20.0
x₀ = [0.1; 0.0; 0.0]
t,x = ode45(f, x₀, t)

x = hcat(x...)' # rearrange storage of x

# Side note: what is ... doing in Julia? (Remember *args and **kwargs in Python?)
# It "splats" a collection into individual arguments; see the Python sketch after this block.
# For more see: goo.gl/mTmeR7

# plot
plot3D(x[:,1], x[:,2], x[:,3], "b-")
xlabel("x")
ylabel("y")
zlabel("z")
xlim(-25,25)
ylim(-25,25)
zlim(0,60);
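
Since the side note in the code above compares Julia's splat operator (...) to Python's argument unpacking, here is a minimal Python sketch of the same idea (toy functions, purely for illustration):

# Python's * unpacks a sequence into positional arguments,
# much like hcat(x...) splats the array x into hcat's arguments in Julia.
def add3(a, b, c):
    return a + b + c

args = [1, 2, 3]
print(add3(*args))    # 6, same as add3(1, 2, 3)

def show_all(*args, **kwargs):
    # *args collects extra positional arguments, **kwargs extra keyword arguments
    print(args, kwargs)

show_all(1, 2, x=3)   # prints: (1, 2) {'x': 3}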

Really Interesting Dynamic Type System:

This is one of the most interesting parts of Julia for me, and its GREAT type system is where the fun is. You know why? Because it knows how long that bone is and how much calcium is in it:

### Built-in numeric types

Julia’s built-in numeric types include a wide range of
1. integers: Int16, Int32, Int64 (and unsigned ints), and arbitrary-precision BigInts
2. floating-points: Float16, Float32, Float64, and arbitrary-precision BigFloats
3. rationals using the integer types
4. complex numbers formed from above
5. vectors, matrices, linear algebra on above

OK, let's have the fun!

I encourage you to run the following code in a Jupyter notebook running the Julia kernel.

π

typeof(π) # it will return Irrational. Because pi is an irrational number! 😉

Let's hack Julia's type system at a much deeper level! (Because it is much more than classes.)

What else do we need to know about it?

Define a new parametric type in Julia:

type vector_3d{T<:Integer}
    x::T
    y::T
    z::T
end

# x, y and z are the fields, roughly the data members of a C++ class.

v = vector_3d(25, 25, 25) # construct an instance; T is inferred as Int
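
For readers coming from Python, the closest analogue is a generic class with typing.Generic; a minimal sketch for comparison only (note that Python's type parameters are checked statically, not enforced at runtime like Julia's):

# A parametric container in Python; T loosely mirrors the T in vector_3d{T<:Integer}.
from typing import Generic, TypeVar

T = TypeVar("T", bound=int)  # bound=int plays the role of T<:Integer

class Vector3D(Generic[T]):
    def __init__(self, x: T, y: T, z: T) -> None:
        self.x, self.y, self.z = x, y, z

v = Vector3D(25, 25, 25)  # a type checker infers T as int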

Let's just make types more interesting (and immutable):

immutable GF{P,T<:Integer} <: Number
    data::T
    function GF(x::Integer)
        return new(mod(x, P)) # store x modulo P, a Galois-field element
    end
end

Deep Learning and Machine Learning in Julia:

In the real eye, Julia is built for writing "mathematical functions" in native language syntax. If you want to do linear regression, rather than installing a new library and calling its built-in linear function (which could be written in C, C++ or Fortran, or be more-or-less optimized Cython magic), Julia provides fast built-in methods so you can write your own linear regression as easily as in Python and as fast as in C++/Fortran.
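
For comparison, here is the NumPy version of the kind of from-scratch linear regression this paragraph talks about; in Julia the equivalent solve is the backslash one-liner from the first example. A minimal sketch with made-up data:

# Ordinary least squares "by hand": find beta minimizing ||y - X @ beta||.
import numpy as np

rng = np.random.RandomState(0)
X = np.column_stack([np.ones(100), rng.randn(100)])  # intercept + one feature
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.randn(100) # noisy line y = 2 - 3x

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares solve
print(beta_hat)  # close to [2.0, -3.0]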

Available Machine-Learning Packages in Julia:

Scikit-Learn in Julia:

ScikitLearn.jl implements the popular scikit-learn interface and algorithms in Julia. It supports both models from the Julia ecosystem and those of the scikit-learn library (via PyCall.jl).

https://github.com/cstjean/ScikitLearn.jl

Text Analysis in Julia:

The basic unit of text analysis is a document. The TextAnalysis package allows one to work with documents stored in a variety of formats:

  • FileDocument: A document represented using a plain text file on disk
  • StringDocument: A document represented using a UTF8 String stored in RAM
  • TokenDocument: A document represented as a sequence of UTF8 tokens
  • NGramDocument: A document represented as a bag of n-grams, which are UTF8 n-grams that map to counts

https://github.com/JuliaText/TextAnalysis.jl

Machine-Learning Package named MachineLearning:

The MachineLearning package represents the very beginnings of an attempt to consolidate common machine learning algorithms written in pure Julia and presenting a consistent API. Initially, the package will be targeted towards the machine learning practitioner, working with a dataset that fits in memory on a single machine. Longer term, I hope this will both target much larger datasets and be valuable for state of the art machine learning research as well.

https://github.com/benhamner/MachineLearning.jl

Deep Learning in Julia:

Mocha is a Deep Learning framework for Julia, inspired by the C++ framework Caffe. Efficient implementations of general stochastic gradient solvers and common layers in Mocha can be used to train deep / shallow (convolutional) neural networks, with (optional) unsupervised pre-training via (stacked) auto-encoders. Some highlights:

https://github.com/pluskid/Mocha.jl

Deep Learning with Automatic Differentiation (what is automatic differentiation?):

Knet (pronounced “kay-net”) is the Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU operation and automatic differentiation using dynamic computational graphs for models defined in plain Julia. This document is a tutorial introduction to Knet. Check out the full documentation and Examples for more information. If you need help or would like to request a feature, please consider joining the knet-users mailing list. If you find a bug, please open a GitHub issue. If you would like to contribute to Knet development, check out the knet-dev mailing list and Tips for developers.

https://github.com/denizyuret/Knet.jl

More resources on Julia programming:

http://online.kitp.ucsb.edu/online/transturb17/gibson/

https://julialang.org/blog/

Feel free to clap, and have fun with Julia. Stay connected.


Programmer’s or Entrepreneur’s life Guide(How to live!)

I just completed the audio version of Soft Skills: The Software Developer's Life Manual, and as usual after completing new skills/books/courses, I follow up with a blog post. I always feel that you can never absorb a whole book by reading it once; you need to re-learn things over time, because sometimes to learn really new things you also have to forget some of the old stuff. That is one of the most important things I have learned in the years since college. So writing a blog post is one thing I do to enhance/validate/preserve new learnings.

When I first saw John Z. Sonmez on YouTube, my first thought was, "He is so fit, is he really a software engineer?" 🙂 How wrong I was.

You can get connected with John here.

 

OK, here comes the validation of knowledge about the book mentioned above. You can also use this blog post to get excited about the book and add it to your reading list. I am sure you will feel there is something common-but-fresh about this book, just as I feel right now that I may read/listen to this book at least once a year.

Be a Specialist: A special skill is like the Brahmanda Astra.

It really does not matter how curious you are about new technologies; there must be one skill at which you are so good that you can place yourself in the top 1% of your area/geo-location/company/industry, and while doing a good amount of work with that skill, make sure you produce quality over quantity. Alongside that special skill, you should know the sister technologies of that particular skill. There is one more important thing to remember: "Don't get religious about any particular technology."

Learn to Sell yourself High Enough.

Selling yourself simply means "be so great that nobody can ignore you," but that does not mean you must write code and build great software while sitting 18-20 hours a day in your basement. You must have a good offline and online circle of professionals and merchants in your life. You should have a good number of blog posts about the skills you have learned over time, alongside the skills you are learning now, so that you stay connected with the outside world and people know what you are really doing in life.

Salary negotiation: it's always good to negotiate, because your employer knows that "it's good to bargain."

When it comes to salary negotiation, you must know how much you are worth, as well as how much money you need beyond your living expenses. No doubt, if you have reached the salary-negotiation stage, you already know roughly what the employer can offer, but it never hurts to push that moment into a bit more negotiation. Now a question may arise in your mind: how exactly do I do that? Believe me, I cannot explain it as well as John describes it in the book, so read the book. If you are a fan of audiobooks, let me know and I will send you the audio version, though you will only get it free if it is your first book on Audible.com.

Investment and saving: let your money work for you, and you just sit around and watch it grow. 🙂

It is really easy to do, actually. You just need to understand how much you are able to save and how much you can really do with that saving. Confusing? Money sitting in a bank has no value unless you can make more with it than the bank does. And how is that possible? There is a wide range of options one can go for, like options trading, futures trading, real-estate investment, making/investing in your own software product, and many other things. 🙂

For more information about stocks and futures, follow the stories shared on Medium.


Body and mental fitness: the way you look, the way your brain does a lot of work for you. 😉

 

You need a really good workout routine, unless you are really not interested in growing your brain over time. Yes, that is true: you have to pump your blood almost every day, and you also have to burn fat. So start running and going to the gym, because that is how you not only keep your body healthy but also sharpen your mind over time.

Diet and Nutrition: You are what you consume.

There is not much to say about diet and eating habits, but make sure you eat fewer calories through the day, because you have to sit for most of it. 🙂

Power of Persistence: Hang in there!

Things work as long as you stay there to make them work. Sometimes it takes a while for results to show up; at some point you have to wait longer and let things fall into place.

Special One: How money Works.

I never expected John would also talk about money and capitalism. To understand this, you have to learn a lot of stuff, and I may not be the person to explain in this short post how money works and how you should take it into account. The simple thing about money is that banks create lots of virtualized clusters to generate/initiate the flow of money among different countries and between different industries.

Learning Dataframes in Julia


Week 4 – Working with Distributions and DataFrames.

In [1]:
# Import the required packages
using Distributions, DataFrames
In [2]:
# Seed the random number generator
srand(1234);
In [3]:
# Question 4: Create the 30 x 3 array named array_1
# (30 rows and 3 columns)
array_1 = [rand(30) rand(30) rand(30)]
size(array_1)
array_1
Out[3]:
30×3 Array{Float64,2}:
 0.590845   0.931115   0.643704 
 0.766797   0.438939   0.401421 
 0.566237   0.246862   0.525057 
 0.460085   0.0118196  0.61201  
 0.794026   0.0460428  0.432577 
 0.854147   0.496169   0.082207 
 0.200586   0.732      0.199058 
 0.298614   0.299058   0.576082 
 0.246837   0.449182   0.218177 
 0.579672   0.875096   0.362036 
 0.648882   0.0462887  0.204728 
 0.0109059  0.698356   0.932984 
 0.066423   0.365109   0.827263 
 ⋮                              
 0.0566425  0.404953   0.0396356
 0.842714   0.499531   0.79041  
 0.950498   0.658815   0.431188 
 0.96467    0.515627   0.137658 
 0.945775   0.260715   0.60808  
 0.789904   0.59552    0.255054 
 0.82116    0.292462   0.498734 
 0.0341601  0.28858    0.0940369
 0.0945445  0.61816    0.52509  
 0.314926   0.66426    0.265511 
 0.12781    0.753508   0.110096 
 0.374187   0.0368842  0.834362
In [4]:
# Question 5: Mean and variance of column 1
mean_column_1 = mean(array_1[:,1])
var_column_1=var(array_1[:,1])
println("mean=",mean_column_1)
println("var=",var_column_1)
mean=0.5014887976938368
var=0.10653465363277906
In [5]:
# Question 5 (continued): Mean and variance of column 2
mean_column_2 = mean(array_1[:,2])
var_column_2=var(array_1[:,2])
println("mean=",mean_column_2)
println("var=",var_column_2)
mean=0.4160447968360426
var=0.06360439983290869
In [6]:
# Question 5 (continued): Mean and variance of column 3
mean_column_3 = mean(array_1[:,3])
var_column_3=var(array_1[:,3])
println("mean=",mean_column_3)
println("var=",var_column_3)
mean=0.4372634519427959
var=0.07568707224628725
In [7]:
# Question 6: Import array_1 into a DataFrame named df
df = DataFrame(array_1)
Out[7]:
x1 x2 x3
1 0.5908446386657102 0.9311151512445586 0.6437042811826996
2 0.7667970365022592 0.43893895933102156 0.40142056533714965
3 0.5662374165061859 0.24686248047491066 0.5250572942486489
4 0.4600853424625171 0.011819583479107054 0.6120098074984683
5 0.7940257103317943 0.046042826396498704 0.43257652982765626
6 0.8541465903790502 0.496168672722459 0.0822070287962946
7 0.20058603493384108 0.7320003814997245 0.19905799020907944
8 0.2986142783434118 0.29905752670238184 0.5760819730593403
9 0.24683718661000897 0.4491821088563024 0.21817706596841413
10 0.5796722333690416 0.8750962647851142 0.3620355262053865
11 0.6488819502093455 0.046288741031345504 0.20472832290217324
12 0.010905889635595356 0.6983555060532487 0.93298350850828
13 0.06642303695533736 0.3651093677271471 0.8272627957034728
14 0.9567533636029237 0.3024777928234499 0.09929915955881308
15 0.646690981531646 0.3725754415996787 0.6342997886044144
16 0.11248587118714015 0.15050782744925795 0.1327153585755645
17 0.2760209506672211 0.14732938279328955 0.7751941503856596
18 0.6516642063795697 0.2834013103457036 0.8692366891234362
19 0.05664246860321187 0.40495283364883794 0.039635617270926904
20 0.8427136165865521 0.49953074411487797 0.7904095314876494
21 0.9504984071553011 0.6588147837334961 0.43118828904466633
22 0.9646697763820897 0.5156272179795256 0.1376583132625555
23 0.9457754052519123 0.26071522632820776 0.6080803126880718
24 0.7899036826169576 0.5955204840509289 0.2550540600167448
25 0.8211604203482923 0.2924615242315285 0.4987340031883092
26 0.03416010848943718 0.2885798506061561 0.09403688346569439
27 0.09454448946400307 0.6181597973815087 0.5250899072103514
28 0.31492622391998415 0.6642598175011505 0.2655109248498748
29 0.12780989889368866 0.7535081177709988 0.11009621399607639
30 0.374186714831074 0.03688418241886171 0.8343616661080064
In [8]:
# check the available names and fieldnames in Julia (the alternative to Python's dir())
f_name =fieldnames(df)
name=names(df)
println(f_name,name)
Symbol[:columns, :colindex]Symbol[:x1, :x2, :x3]
In [9]:
# Accessing different columns of df
df[:x3]
Out[9]:
30-element Array{Float64,1}:
 0.643704 
 0.401421 
 0.525057 
 0.61201  
 0.432577 
 0.082207 
 0.199058 
 0.576082 
 0.218177 
 0.362036 
 0.204728 
 0.932984 
 0.827263 
 ⋮        
 0.0396356
 0.79041  
 0.431188 
 0.137658 
 0.60808  
 0.255054 
 0.498734 
 0.0940369
 0.52509  
 0.265511 
 0.110096 
 0.834362
In [10]:
# Question 7: Change the names of the columns to Var1, Var2, and Var3
rename!(df,Dict(:x1=>:Var1,:x2=>:Var2,:x3=>:Var3))
Out[10]:
Var1 Var2 Var3
1 0.5908446386657102 0.9311151512445586 0.6437042811826996
2 0.7667970365022592 0.43893895933102156 0.40142056533714965
3 0.5662374165061859 0.24686248047491066 0.5250572942486489
4 0.4600853424625171 0.011819583479107054 0.6120098074984683
5 0.7940257103317943 0.046042826396498704 0.43257652982765626
6 0.8541465903790502 0.496168672722459 0.0822070287962946
7 0.20058603493384108 0.7320003814997245 0.19905799020907944
8 0.2986142783434118 0.29905752670238184 0.5760819730593403
9 0.24683718661000897 0.4491821088563024 0.21817706596841413
10 0.5796722333690416 0.8750962647851142 0.3620355262053865
11 0.6488819502093455 0.046288741031345504 0.20472832290217324
12 0.010905889635595356 0.6983555060532487 0.93298350850828
13 0.06642303695533736 0.3651093677271471 0.8272627957034728
14 0.9567533636029237 0.3024777928234499 0.09929915955881308
15 0.646690981531646 0.3725754415996787 0.6342997886044144
16 0.11248587118714015 0.15050782744925795 0.1327153585755645
17 0.2760209506672211 0.14732938279328955 0.7751941503856596
18 0.6516642063795697 0.2834013103457036 0.8692366891234362
19 0.05664246860321187 0.40495283364883794 0.039635617270926904
20 0.8427136165865521 0.49953074411487797 0.7904095314876494
21 0.9504984071553011 0.6588147837334961 0.43118828904466633
22 0.9646697763820897 0.5156272179795256 0.1376583132625555
23 0.9457754052519123 0.26071522632820776 0.6080803126880718
24 0.7899036826169576 0.5955204840509289 0.2550540600167448
25 0.8211604203482923 0.2924615242315285 0.4987340031883092
26 0.03416010848943718 0.2885798506061561 0.09403688346569439
27 0.09454448946400307 0.6181597973815087 0.5250899072103514
28 0.31492622391998415 0.6642598175011505 0.2655109248498748
29 0.12780989889368866 0.7535081177709988 0.11009621399607639
30 0.374186714831074 0.03688418241886171 0.8343616661080064
In [11]:
# we can also use the tail() function to see the last n entries
tail(df,20)
Out[11]:
Var1 Var2 Var3
1 0.6488819502093455 0.046288741031345504 0.20472832290217324
2 0.010905889635595356 0.6983555060532487 0.93298350850828
3 0.06642303695533736 0.3651093677271471 0.8272627957034728
4 0.9567533636029237 0.3024777928234499 0.09929915955881308
5 0.646690981531646 0.3725754415996787 0.6342997886044144
6 0.11248587118714015 0.15050782744925795 0.1327153585755645
7 0.2760209506672211 0.14732938279328955 0.7751941503856596
8 0.6516642063795697 0.2834013103457036 0.8692366891234362
9 0.05664246860321187 0.40495283364883794 0.039635617270926904
10 0.8427136165865521 0.49953074411487797 0.7904095314876494
11 0.9504984071553011 0.6588147837334961 0.43118828904466633
12 0.9646697763820897 0.5156272179795256 0.1376583132625555
13 0.9457754052519123 0.26071522632820776 0.6080803126880718
14 0.7899036826169576 0.5955204840509289 0.2550540600167448
15 0.8211604203482923 0.2924615242315285 0.4987340031883092
16 0.03416010848943718 0.2885798506061561 0.09403688346569439
17 0.09454448946400307 0.6181597973815087 0.5250899072103514
18 0.31492622391998415 0.6642598175011505 0.2655109248498748
19 0.12780989889368866 0.7535081177709988 0.11009621399607639
20 0.374186714831074 0.03688418241886171 0.8343616661080064
In [12]:
# Creating a second DataFrame
df2=DataFrame(tail(df,20))
Out[12]:
Var1 Var2 Var3
1 0.6488819502093455 0.046288741031345504 0.20472832290217324
2 0.010905889635595356 0.6983555060532487 0.93298350850828
3 0.06642303695533736 0.3651093677271471 0.8272627957034728
4 0.9567533636029237 0.3024777928234499 0.09929915955881308
5 0.646690981531646 0.3725754415996787 0.6342997886044144
6 0.11248587118714015 0.15050782744925795 0.1327153585755645
7 0.2760209506672211 0.14732938279328955 0.7751941503856596
8 0.6516642063795697 0.2834013103457036 0.8692366891234362
9 0.05664246860321187 0.40495283364883794 0.039635617270926904
10 0.8427136165865521 0.49953074411487797 0.7904095314876494
11 0.9504984071553011 0.6588147837334961 0.43118828904466633
12 0.9646697763820897 0.5156272179795256 0.1376583132625555
13 0.9457754052519123 0.26071522632820776 0.6080803126880718
14 0.7899036826169576 0.5955204840509289 0.2550540600167448
15 0.8211604203482923 0.2924615242315285 0.4987340031883092
16 0.03416010848943718 0.2885798506061561 0.09403688346569439
17 0.09454448946400307 0.6181597973815087 0.5250899072103514
18 0.31492622391998415 0.6642598175011505 0.2655109248498748
19 0.12780989889368866 0.7535081177709988 0.11009621399607639
20 0.374186714831074 0.03688418241886171 0.8343616661080064
In [13]:
# Question 9: Calculate simple descriptive statistics of all the columns in df2 using the describe() function
describe(df2)
Var1
Summary Stats:
Mean:           0.484341
Minimum:        0.010906
1st Quartile:   0.108001
Median:         0.510439
3rd Quartile:   0.826549
Maximum:        0.964670
Length:         20
Type:           Float64

Var2
Summary Stats:
Mean:           0.397753
Minimum:        0.036884
1st Quartile:   0.277730
Median:         0.368842
3rd Quartile:   0.601180
Maximum:        0.753508
Length:         20
Type:           Float64

Var3
Summary Stats:
Mean:           0.453279
Minimum:        0.039636
1st Quartile:   0.136423
Median:         0.464961
3rd Quartile:   0.778998
Maximum:        0.932984
Length:         20
Type:           Float64

In [14]:
# Question 10: Add a column named Cat1 to df2, consisting of randomly selected strings GroupA or GroupB
df2 = hcat(df2, rand(["GroupA","GroupB"],20))
rename!(df2,Dict(:x1=>:Cat1))
Out[14]:
Var1 Var2 Var3 Cat1
1 0.6488819502093455 0.046288741031345504 0.20472832290217324 GroupB
2 0.010905889635595356 0.6983555060532487 0.93298350850828 GroupB
3 0.06642303695533736 0.3651093677271471 0.8272627957034728 GroupA
4 0.9567533636029237 0.3024777928234499 0.09929915955881308 GroupA
5 0.646690981531646 0.3725754415996787 0.6342997886044144 GroupA
6 0.11248587118714015 0.15050782744925795 0.1327153585755645 GroupA
7 0.2760209506672211 0.14732938279328955 0.7751941503856596 GroupB
8 0.6516642063795697 0.2834013103457036 0.8692366891234362 GroupB
9 0.05664246860321187 0.40495283364883794 0.039635617270926904 GroupB
10 0.8427136165865521 0.49953074411487797 0.7904095314876494 GroupB
11 0.9504984071553011 0.6588147837334961 0.43118828904466633 GroupA
12 0.9646697763820897 0.5156272179795256 0.1376583132625555 GroupB
13 0.9457754052519123 0.26071522632820776 0.6080803126880718 GroupA
14 0.7899036826169576 0.5955204840509289 0.2550540600167448 GroupB
15 0.8211604203482923 0.2924615242315285 0.4987340031883092 GroupA
16 0.03416010848943718 0.2885798506061561 0.09403688346569439 GroupB
17 0.09454448946400307 0.6181597973815087 0.5250899072103514 GroupB
18 0.31492622391998415 0.6642598175011505 0.2655109248498748 GroupA
19 0.12780989889368866 0.7535081177709988 0.11009621399607639 GroupA
20 0.374186714831074 0.03688418241886171 0.8343616661080064 GroupA
In [15]:
# Question 11: Create a new DataFrame named df3
df3 = DataFrame(A=1:20,B=21:40,C=41:60)
Out[15]:
A B C
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 10 30 50
11 11 31 51
12 12 32 52
13 13 33 53
14 14 34 54
15 15 35 55
16 16 36 56
17 17 37 57
18 18 38 58
19 19 39 59
20 20 40 60
In [16]:
# Question 12: Change indicated values to empty entries
# In the code cell below, change the following values in df3 to NA: row 10 column 1, row 15 column 2, and row 19 column 3
df3[10,1] = NA
df3[15,2] = NA 
df3[19,3] = NA
df3
Out[16]:
A B C
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 NA 30 50
11 11 31 51
12 12 32 52
13 13 33 53
14 14 34 54
15 15 NA 55
16 16 36 56
17 17 37 57
18 18 38 58
19 19 39 NA
20 20 40 60
In [17]:
# Question 13: Create DataFrame df4 that contains no rows with NaN (NA) values
df4 = completecases!(df3)
Out[17]:
A B C
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 11 31 51
11 12 32 52
12 13 33 53
13 14 34 54
14 16 36 56
15 17 37 57
16 18 38 58
17 20 40 60

 

 

Some Plug-and-Plays with Julia Programming





Title: Week 3 – Fitting a Curve

In [17]:
# Initialization of the Plots package
using Plots
pyplot()
Out[17]:
Plots.PyPlotBackend()

Reading data from the given sample file

In [18]:
data_tofit = readdlm("Week3_PR_Data.dat", '\t', header=true)
typeof(data_tofit)
Out[18]:
Tuple{Array{Float64,2},Array{AbstractString,2}}

Using a for loop to print the data in the array

In [19]:
new_array=data_tofit[1]
for i in 1:size(new_array)[1]
    println(new_array[i,:])
end
[0.501309, -0.977698]
[1.52801, 0.527711]
[1.70012, 1.71152]
[1.99249, 1.891]
[2.70608, -0.463428]
[2.99493, -0.443567]
[3.49185, -1.27518]
[3.50119, -0.6905]
[4.45992, -5.51613]
[4.93697, -6.0017]
[5.02329, -8.36417]
[5.04234, -7.92448]
[5.50739, -10.7748]
[5.56867, -10.9172]

Scatter plot

In [20]:
# Create the arrays x and y, assigning x the first column of data_tofit and y the second column
x,y = new_array[:,1],new_array[:,2]
scatter(x,y)
Out[20]:

Creating parabfit() one-liner function

In [21]:
# Create a function called parabfit, with x as the argument, returning a*x^2 + b*x + c
parabfit(x)=a*x^2 + b*x + c
Out[21]:
parabfit (generic function with 1 method)

Plotting with default values of a, b and c

In [22]:
a = 1
b = 1
c = 1

plot(parabfit,-2,2)
Out[22]:

Plotting parabfit() over a different range

In [23]:
# Create variables a, b and c, assigning each the value 1
a = 1
b = 1
c = 1

# Plot the function parabfit, for x values between -5 and 5 
plot(parabfit,-5,5)
Out[23]:
In [24]:
# More plot!() tries.
a,b,c = 1,1,1
scatter(x,y)
plot!(parabfit,-5,5)
Out[24]:

Optimize the parameters a, b and c so that the curve fits the data points more closely:

  1. The parabola should open downwards, so coefficient a must be negative.
  2. From the data points, coefficient c should be close to zero.
  3. Coefficient b scales the y-axis values and must be positive.
In [25]:
# More plot!() tries.
a,b,c = -1,2,3
scatter(x,y)
plot!(parabfit,-5,5)
Out[25]:
In [26]:
# More plot!() tries.
a,b,c = -1,0.1,2
scatter(x,y)
plot!(parabfit,-5,5)
Out[26]:
In [27]:
# More plot!() tries.
a,b,c = -1,0.8,3
scatter(x,y)
plot!(parabfit,-5,5)
Out[27]:
In [28]:
# More plot!() tries.
a,b,c = -0.9,2.7,0.05
scatter(x,y)
plot!(parabfit,-5,5)
Out[28]:

Optimising each variable separately

Optimising variable c

In [29]:
a,b = 1,1
plot(scatter(x,y,alpha=0.5))
c=0
plot!(parabfit,-5,5)
c = -1
plot!(parabfit,-5,5)
c = -2
plot!(parabfit,-5,5)
c = -3
plot!(parabfit,-5,5)
c = -4
plot!(parabfit,-5,5)
c = -5
plot!(parabfit,-5,5)
c = 2
plot!(parabfit,-5,5)
Out[29]:

Optimising Variable a

In [31]:
c,b = 1,1
plot(scatter(x,y,alpha=0.5))
a=0
plot!(parabfit,0,5)
a = -1
plot!(parabfit,0,5)
a = -2
plot!(parabfit,0,5)
a = -3
plot!(parabfit,0,5)
a = -4
plot!(parabfit,0,5)
a = -5
plot!(parabfit,0,5)
a = 2
plot!(parabfit,0,5)
Out[31]:
In [37]:
#Locating final value for a
c,b = 3,1
plot(scatter(x,y,alpha=0.5))
a = -1
plot!(parabfit,0,5)
Out[37]:

Optimising for b

In [53]:
c,a = 2,-1
plot(scatter(x,y,alpha=0.5))
b=0
plot!(parabfit,0,5)
b = 1
plot!(parabfit,0,5)
b = 2
plot!(parabfit,0,5)
b = 3
plot!(parabfit,0,5)
b = 4
plot!(parabfit,0,5)
b = 5
plot!(parabfit,0,5)
b = -1
plot!(parabfit,0,5)
Out[53]:
In [57]:
# plotting for b=3
c,a = 1,-1
plot(scatter(x,y,alpha=0.5))
b = 3
plot!(parabfit,0,8)
Out[57]:

Final values of a, b and c

In [65]:
# plotting with the final values of a, b and c
c,a,b = 1,-1,3
plot(scatter(x,y,alpha=0.5))
plot!(parabfit,0,5)
Out[65]:

To optimise the values of a, b and c, we had to plot each variable many times to see its effect at different levels of scale. By changing the plotting range of the parabola function it became easier to come up with more accurate values of a, b and c.
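
For contrast, a least-squares fit recovers a, b and c in one step instead of tuning by eye. A minimal Python/NumPy sketch, assuming the same data file used above:

# Fit y ~ a*x^2 + b*x + c by least squares instead of manual tuning.
import numpy as np

# assumes the same tab-delimited data file with one header row
data = np.loadtxt("Week3_PR_Data.dat", skiprows=1)
x, y = data[:, 0], data[:, 1]

a, b, c = np.polyfit(x, y, deg=2)  # coefficients, highest degree first
print(a, b, c)  # compare with the hand-tuned a=-1, b=3, c=1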


 

 

Asynchronous recipes in Python

“Concurrency” is not “parallelism”; maybe it is even better. If you don't work on data science, data processing, machine learning or other CPU-intensive operations, you will probably find that you don't need parallelism, but you do need concurrency!

  1. A simple example: training a machine-learning model is CPU-intensive (or you can use a GPU).
  2. To make various predictions from one model with many different input parameters, in order to find the best result, you need concurrency! (See the sketch below.)
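
As a sketch of that second item: evaluating one already-trained model over a grid of inputs is the same function applied to many arguments, so a pool maps cleanly onto it. A minimal sketch with a made-up predict function (the real scoring function would come from your model):

# Evaluate one "model" concurrently over many parameter sets and keep the best.
from concurrent.futures import ProcessPoolExecutor

def predict(params):
    # stand-in for a real model's scoring function (hypothetical)
    x, y = params
    return -(x - 3) ** 2 - (y + 1) ** 2

if __name__ == "__main__":
    grid = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(predict, grid))
    best = max(zip(scores, grid))
    print("best score %s at params %s" % best)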

 

There are so many ways one can hack into Python and do cool stuff, whether it is CPU-intensive or just plain I/O. One thing you have to believe is that Python does support multiprocessing as well as multithreading, but for various reasons, when you are doing CPU-intensive tasks, you have to stay away from threading in Python. Use NumPy, Cython, Jython or anything you like, or write C++ code and glue it with Python.

The number of worker threads will usually equal the number of cores you have. If your processor has hyperthreading, the number of usable hardware threads doubles.
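
A quick way to check this on your machine (os.cpu_count() reports logical CPUs, i.e. hyperthreads included):

# Logical CPU count: physical cores x hyperthreads per core.
import os
import multiprocessing

print(os.cpu_count())               # logical CPUs (hyperthreads included)
print(multiprocessing.cpu_count())  # same number, older spelling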

We are processing chunks and chunks of data, and the real common scenario is: if you have I/O-bound tasks, use threads in Python; if you have CPU-bound tasks, use processes. I have worked on various Python projects where performance was an issue at some level, and at those times I always went to things like NumPy, Pandas, Cython or Numba, not plain Python.

Let's come to the point, and the point is: what are those recipes I can use?

Using concurrent.futures (the futures module is also back-ported to Python 2.x):

Suppose you have to call multiple URLs at the same time using the same method. That is what concurrency actually is: applying the same method to different operations. We can do it with either a ThreadPool or a ProcessPool.


# Using a process pool
import requests
from concurrent.futures import ProcessPoolExecutor, as_completed

def health_check1(urls_list):
    pool = ProcessPoolExecutor(len(urls_list))
    futures = [pool.submit(requests.get, url, verify=False) for url in urls_list]
    results = [r.result() for r in as_completed(futures)]  # when all operations are done
    return results  # a Python list of all the results

Using a ThreadPool is no different:


# Using a thread pool
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

def just_func(urls_list):
    pool = ThreadPoolExecutor(len(urls_list))
    futures = [pool.submit(requests.get, url, verify=False) for url in urls_list]
    results = [r.result() for r in as_completed(futures)]  # when all operations are done
    return results  # a Python list of all the results

In the above code, 'urls_list' is just a list of tasks that are similar and can be processed using the same kind of function.

On the other side, using it with the `with` context manager is not much different either. In this example I will use ProcessPoolExecutor's built-in map function.


import concurrent.futures

def just_func(url_list):
    with concurrent.futures.ProcessPoolExecutor(max_workers=len(url_list)) as executor:
        result = executor.map(get_response, url_list)
    return [i for i in result]

Using multiprocessing: (multiprocessing is the standard-library package for process-based parallelism in Python.)

*In multiprocessing, the difference between map and apply_async is that map blocks and returns results in the order of the task list passed to it, while apply_async returns immediately with handles whose results you fetch later; see the sketch below.
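
A minimal sketch of that difference (toy square function, just for illustration):

# map vs apply_async: map blocks and preserves input order;
# apply_async returns AsyncResult handles immediately.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        ordered = pool.map(square, [1, 2, 3, 4])  # blocks; [1, 4, 9, 16]
        handles = [pool.apply_async(square, (n,)) for n in [1, 2, 3, 4]]
        later = [h.get() for h in handles]        # fetch each result when ready
    print(ordered, later)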


# Function that runs over multiple tasks
import requests

def get_response(url):
    """returns the response text for a URL"""
    response = requests.get(url, verify=False)
    return response.text

Now, the above function is simple enough: it takes one URL and returns the response. But if I have to pass multiple URLs, and I want the GET request for each URL to fire at the same time, that is asynchronous behavior rather than classic multiprocessing, because in multiprocessing the threads/processes need to communicate with each other, while asynchronous tasks don't. (In Python, multiprocessing is process-based, not thread-based; you can do thread-based parallelism in Python, but then you are on your OWN 😀 😛 Hail GIL (Mogambo/Hitler).)

So the above function can be driven like this:

from multiprocessing import Pool

url_list = ["http://example.com"]  # your URLs here
pool = Pool(processes=20)
resp_pool = pool.map(get_response, url_list)
pool.close()  # close the pool and wait for the workers to finish
pool.join()

This is an interesting link to read while going deeper into multiprocessing in Python (it is process-based parallelism):
http://sebastianraschka.com/Articles/2014_multiprocessing.html

Using Gevent: Gevent is a concurrency library based around libev. It provides a clean API for a variety of concurrency and network related tasks.


import gevent
import random

def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(random.randint(0, 2) * 0.001)
    print('Task %s done' % pid)

def asynchronous():
    threads = [gevent.spawn(task, i) for i in range(10)]
    gevent.joinall(threads)

print('Asynchronous:')
asynchronous()

If you have to call asynchronously but want to collect the results in a synchronous fashion:

import gevent.monkey
gevent.monkey.patch_socket()  # patch the socket module for cooperative I/O

import gevent
import urllib.request
import json

def fetch(pid):
    response = urllib.request.urlopen('http://json-time.appspot.com/time.json')
    result = response.read()
    json_result = json.loads(result)
    datetime = json_result['datetime']

    print('Process %s: %s' % (pid, datetime))
    return datetime

def asynchronous():
    threads = []
    for i in range(1, 10):
        threads.append(gevent.spawn(fetch, i))
    gevent.joinall(threads)

print('Asynchronous:')
asynchronous()

Assigning Jobs in Queue:

import gevent
from gevent.queue import Queue

tasks = Queue()

def worker(n):
    while not tasks.empty():
        task = tasks.get()
        print('Worker %s got task %s' % (n, task))
        gevent.sleep(1)

    print('Quitting time!')

def boss():
    for i in range(1, 25):
        tasks.put_nowait(i)

gevent.spawn(boss).join()

gevent.joinall([
    gevent.spawn(worker, 'steve'),
    gevent.spawn(worker, 'john'),
    gevent.spawn(worker, 'nancy'),
])

When you have to manage different groups of asynchronous tasks:

import gevent
from gevent.pool import Group

def talk(msg):
    for i in range(3):
        print(msg)

g1 = gevent.spawn(talk, 'bar')
g2 = gevent.spawn(talk, 'foo')
g3 = gevent.spawn(talk, 'fizz')

group = Group()
group.add(g1)
group.add(g2)
group.join()

group.add(g3)
group.join()

As with the multiprocessing library, you can also use a Pool to map various operations:


import gevent
from gevent.pool import Pool

pool = Pool(2)

def hello_from(n):
    print('Size of pool %s' % len(pool))

pool.map(hello_from, range(3))

Using asyncio:

Now let's talk about concurrency again! There is already a lot of automation going on inside asyncio or gevent, but as programmers we have to understand how to break one large task into small chunks of subtasks, so that when we write the code we can see which tasks can work independently.

import time
import asyncio

start = time.time()

def tic():
    return 'at %1.1f seconds' % (time.time() - start)

async def gr1():
    # Sleeps for two seconds, but we don't want to stick around waiting...
    print('gr1 started work: {}'.format(tic()))
    await asyncio.sleep(2)
    print('gr1 ended work: {}'.format(tic()))

async def gr2():
    # Sleeps for two seconds, but we don't want to stick around waiting...
    print('gr2 started work: {}'.format(tic()))
    await asyncio.sleep(2)
    print('gr2 ended work: {}'.format(tic()))

async def gr3():
    print("Let's do some stuff while the coroutines are blocked, {}".format(tic()))
    await asyncio.sleep(1)
    print("Done!")

ioloop = asyncio.get_event_loop()
tasks = [
    ioloop.create_task(gr1()),
    ioloop.create_task(gr2()),
    ioloop.create_task(gr3()),
]
ioloop.run_until_complete(asyncio.wait(tasks))
ioloop.close()

 

Now in the above code, gr1 and gr2 take some time to return anything (it could be any kind of I/O operation), so the event loop jumps to gr3 in the meantime, and it runs until all three tasks are complete.

Please have a closer look at the await keyword in the above code. It is the point where the interpreter can switch from one task to another; you can think of it as a pause/resume point for the function. If you have worked with yield or yield from in Python 2 and Python 3, you will recognize that await is the same kind of suspension point for a coroutine; a tiny sketch follows.
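
A tiny sketch of that parallel between generators and coroutines (illustrative only):

# yield and await both mark points where a function suspends and resumes.
def gen():
    print('before')
    x = yield        # the generator suspends here until .send()
    print('after', x)

g = gen()
next(g)              # runs to the yield, prints 'before'
try:
    g.send(42)       # resumes; prints 'after 42'
except StopIteration:
    pass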

There is one more library, aiohttp, which is used to handle otherwise-blocking HTTP requests with asyncio.


import time
import asyncio
import aiohttp

URL = 'https://api.github.com/events'
MAX_CLIENTS = 3

async def fetch_async(pid):
    print('Fetch async process {} started'.format(pid))
    # use a ClientSession; the old aiohttp.request helper changed across versions
    async with aiohttp.ClientSession() as session:
        async with session.get(URL) as response:
            await response.read()
    return response

async def asynchronous():
    start = time.time()
    tasks = [asyncio.ensure_future(
        fetch_async(i)) for i in range(1, MAX_CLIENTS + 1)]
    await asyncio.wait(tasks)
    print("Process took: {:.2f} seconds".format(time.time() - start))

print('Asynchronous:')
ioloop = asyncio.get_event_loop()
ioloop.run_until_complete(asynchronous())
ioloop.close()

In all the above examples we have just scratched the surface of the concurrency world, but in reality there is much more to look into, because real-world problems are more complex and intensive. There are various other options in asyncio, like handling exceptions within futures, creating future wrappers for normal tasks, and applying a timeout if a task takes longer than it should so you can do something else instead (see the sketch below).
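
For instance, the timeout case just mentioned can be handled with asyncio.wait_for, which cancels the task and raises TimeoutError if it overruns; a minimal sketch:

# Bound a slow coroutine with a timeout and fall back to other work.
import asyncio

async def slow():
    await asyncio.sleep(5)
    return 'finished'

async def main():
    try:
        result = await asyncio.wait_for(slow(), timeout=1.0)
    except asyncio.TimeoutError:
        result = 'timed out, doing something else instead'
    print(result)

asyncio.get_event_loop().run_until_complete(main())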

I got a lot of inspiration while learning about concurrent programming in Python from the following sources:

https://hackernoon.com/asyncio-for-the-working-python-developer-5c468e6e2e8e
http://www.gevent.org/
https://www.binpress.com/tutorial/simple-python-parallelism/121
http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html

Run Flask in Parallel using ThreadPoolExecutor


from flask import Flask
from time import sleep
from concurrent.futures import ThreadPoolExecutor

# DOCS https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor
executor = ThreadPoolExecutor(2)

app = Flask(__name__)

@app.route('/jobs')
def run_jobs():
    executor.submit(some_long_task1)
    executor.submit(some_long_task2, 'hello', 123)
    return 'Two jobs were launched in the background!'

def some_long_task1():
    print("Task #1 started!")
    sleep(10)
    print("Task #1 is done!")

def some_long_task2(arg1, arg2):
    print("Task #2 started with args: %s %s!" % (arg1, arg2))
    sleep(5)
    print("Task #2 is done!")

if __name__ == '__main__':
    app.run()


Concurrency in Python, or the Natural Way of Life (not yet a completed post)

There are various ways one can think about computing: multiprocessing, asynchronous, multithreading, as well as "parallel processing". Theoretically speaking, we distribute one particular task in various forms so that multiple resources are available to the system to run things. In other words, multiprocessing is more about the programmer's way of understanding the flow of the process, and the theory does not guarantee that giving a process multiple resources will make it FAST! It could make it FAT! as well.

Now let me start with a very simple example, taking the following function as a use case:

# Function that runs over multiple tasks
import requests

def get_response(url):
    """returns the response text for a URL"""
    response = requests.get(url, verify=False)
    return response.text

Now, the above function is simple enough: it takes one URL and returns the response. But if I have to pass multiple URLs, and I want the GET request for each URL to fire at the same time, that is asynchronous behavior rather than classic multiprocessing, because in multiprocessing the threads/processes need to communicate with each other, while asynchronous tasks don't. (In Python, multiprocessing is process-based, not thread-based; you can do thread-based parallelism in Python, but then you are on your OWN 😀 😛 Hail GIL (Mogambo/Hitler).)

So the above function can be driven like this:

from multiprocessing import Pool

url_list = ["http://example.com"]  # your URLs here
pool = Pool(processes=20)
resp_pool = pool.map(get_response, url_list)
pool.close()  # close the pool and wait for the workers to finish
pool.join()

One thing you have to understand very carefully: the GIL does not hurt I/O-bound operations, but when it comes to CPU-bound operations in Python, you have NumPy, SciPy, Pandas and Cython, where one can really release the GIL and take full advantage of the code; a small sketch of that follows.
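
A small sketch of that idea: large NumPy operations release the GIL inside their native calls, so running them from a thread pool can actually keep several cores busy, which plain Python bytecode in threads cannot (timings are machine-dependent; this is illustrative only):

# NumPy releases the GIL inside big linear-algebra calls,
# so a thread pool can keep several cores busy.
import time
import numpy as np
from concurrent.futures import ThreadPoolExecutor

a = np.random.rand(1500, 1500)

def work(_):
    return (a @ a).sum()   # the GIL is released inside the matmul

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(4)))
print('4 matmuls in threads took %.2fs' % (time.time() - start))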

How to release GIL using Cython: https://lbolla.info/blog/2013/12/23/python-threads-cython-gil
Although one can look for interesting features about GIL: http://www.dabeaz.com/python/NewGIL.pdf

Intel also provides a Python distribution that helps get speedups in Python, but that is mainly helpful for machine-learning and data-science work.

http://www.techenablement.com/orders-magnitude-performance-intel-distribution-python/ (seems worth giving it a try)

There is one more important thing you need to care about when you are releasing the GIL in Python.

You can also scratch your head many times just by reading/watching this one interesting presentation: http://www.dabeaz.com/python/UnderstandingGIL.pdf

Numba is also out there, but make sure to use such tools only when your operation is CPU-bound, not I/O-bound, because as stated above, I/O-bound operations don't care about the GIL.

You will also find that the GIL is not just Python's problem:

https://www.jstorimer.com/blogs/workingwithcode/8085491-nobody-understands-the-gil

I/O Bound:

The I/O bound state has been identified as a problem in computing almost since its inception. The Von Neumann architecture, which is employed by many computing devices, is based on a logically separate central processor unit which requests data from main memory, processes it and writes back the results. Since data must be moved between the CPU and memory along a bus which has a limited data transfer rate, there exists a condition that is known as the Von Neumann bottleneck. Put simply, this means that the data bandwidth between the CPU and memory tends to limit the overall speed of computation. In terms of the actual technology that makes up a computer, the Von Neumann bottleneck predicts that it is easier to make the CPU perform calculations faster than it is to supply it with data at the necessary rate for this to be possible.

In simple terms: the CPU is fast and memory is slow.
https://en.wikipedia.org/wiki/I/O_bound

Let’s make things more precise:
Sync: Blocking operations.
Async: Non blocking operations.
Concurrency: Making progress together.
Parallelism: Making progress in parallel.

Now the question arises: do we need all of those things together?
http://docs.python-guide.org/en/latest/scenarios/speed/
https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
https://github.com/dask/dask (Although I just found that Dask is much more advanced and promising, and one should not ignore it at all!!)
http://dask.pydata.org/en/latest/dataframe-performance.html

async: https://hackernoon.com/asyncio-for-the-working-python-developer-5c468e6e2e8e
https://stackoverflow.com/questions/8533318/python-multiprocessing-pool-when-to-use-apply-apply-async-or-map
https://github.com/pyparallel/pyparallel

Running Multiprocessing in a Flask App (Let's Spawn!) Hell Yeah

OK, it took a long time, but finally, yes finally, I am able to do process-based multiprocessing in Python, and even in Flask. 🙂 Oh yeah! There are various recipes for multiprocessing in Python, but here you get to enjoy it with Flask.

:D

from multiprocessing import Pool
from flask import Flask
import ast
import pandas as pd
import requests

app = Flask(__name__)
_pool = None

# the URLs the health check will hit (fill in your own)
tasks = ["http://example.com/status"]

# Function that runs over multiple tasks
def get_response(x):
    """returns the response text for a URL"""
    m = requests.get(x, verify=False)
    return m.text

@app.route('/call-me/')
def health_check():
    """renders a pandas DataFrame as HTML for health-check services"""
    resp_pool = _pool.map(get_response, tasks)
    table_frame = pd.DataFrame([ast.literal_eval(resp) for resp in resp_pool])
    return table_frame.to_html()

if __name__ == '__main__':
    _pool = Pool(processes=12)  # this is the important part: create the pool in the main process
    try:
        # insert production server deployment code
        app.run(use_reloader=True)
    except KeyboardInterrupt:
        _pool.close()
        _pool.join()

 

A one-minute read of The One Minute Manager

Get more results out in less time.

Autocratic vs. Democratic:
Autocratic managers are results-oriented and democratic managers are happiness-oriented, so we
need to be one-minute managers. 🙂

1. One minute goal setting:

Everyone should know the goals of the company.
People must know what their roles are in the company.
A goal must not be more than 250 words.
Always review your goals.

2. One minute praising:

Give true feedback.
Always praise immediately.
Share happiness and encourage your people.

3. One minute reprimand:

Immediately point out people's mistakes.
Tell people how you feel about it.
Point out the mistake but don't criticize the person.
Be on the side of your people.

Conclusion:
Look for the good things in beginners and the bad things in the experienced.
Share what you learn.
We don't manage people; we manage behaviors.
Love your people and make sure they love you back.
Define your problem grammatically. (What is happening, and what do you want to happen?)
