my net house


Tag Archives: DataScience machine-learing

TPOT Python Example to Build Pipeline for AAPL

This is  just first Quick and Fast Post.

TPOT Research  Paper:

import datetime
import numpy as np
import pandas as pd
import sklearn
from pandas_datareader import data as read_data
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split

apple_data = read_data.get_data_yahoo("AAPL")
df = pd.DataFrame(index=apple_data.index)
df['multiple_day_returns'] =  df['price'].pct_change(3)
df['rolling_mean'] = df['daily_returns'].rolling(window = 4,center=False).mean()

df['time_lagged'] = df['price']-df['price'].shift(-2)

df['direction'] = np.sign(df['daily_returns'])
Y = df['direction']

X_train, X_test, y_train, y_test = train_test_split(X,Y,train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=50, population_size=50, verbosity=2), y_train)
print(tpot.score(X_test, y_test))

The Python file It returned: Which is real Code one can use to Create Trading Strategy. TPOT helped to Selected Algorithms and Value of It’s features. right now we have only provided ‘price’,’daily_returns’,’multiple_day_returns’,’rolling_mean’ to predict Target. One can use multiple features and implement as per the requirement.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# NOTE: Make sure that the class is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1).values
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'].values, random_state=42)

# Score on the training set was:1.0
exported_pipeline = GradientBoostingClassifier(learning_rate=0.5, max_depth=7, max_features=0.7500000000000001, min_samples_leaf=11, min_samples_split=12, n_estimators=100, subsample=0.7500000000000001), training_target)
results = exported_pipeline.predict(testing_features)


Let’s have fun with Correlation:-O

Ok have some fun first. 😀

Whenever you will read any post or paper related to Machine-Learning or Data-Science you will get word ‘Correlation’ many times and how it’s value is important in your model Building.

A simple definition of Correlation: A mutual relationship or connection between two or more things. (that’s layman’s definition and It should be enough most of the times 😉 )

Coefficient of Correlation  is just an integer, From which we understand how two or more things

are related to each-other. As we discussed Coefficient of Correlation is an integer so it could be +ve or -ve and value of correlation decides how two data-sets effect each other.

Following two images tell lot about Correlation and it’s Value.

Coefficient of Correlation between range -0.5 to +0.5 is not that valuable by why and how we calculate correlation?

What is Covariance ?

Now if you still feel that something is really missing we should talk about Variance:

Let’s Remove Co from Covariance.

Variance is Measurement of randomness. So How you would calculate Variance of Data?

Give me data:

Data = [4,5,6,7,12,20]

I will find means and subtract it from each individual- Isn’t that Mean-Deviation ? 😀 OMG!

Have a look at the following Picture:


Let’s wait for stuff like:

Coefficient of determination, Probable Error and interpretation.

%d bloggers like this: