This is just a quick first post.
TPOT Research Paper: https://arxiv.org/pdf/1702.01780.pdf
```python
import datetime
import numpy as np
import pandas as pd
import sklearn
from pandas_datareader import data as read_data
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split

# Pull historical AAPL prices and build a few simple features
apple_data = read_data.get_data_yahoo("AAPL")
df = pd.DataFrame(index=apple_data.index)
df['price'] = apple_data.Open
df['daily_returns'] = df['price'].pct_change().fillna(0.0001)
df['multiple_day_returns'] = df['price'].pct_change(3)
df['rolling_mean'] = df['daily_returns'].rolling(window=4, center=False).mean()
df['time_lagged'] = df['price'] - df['price'].shift(-2)

# Target: the sign of the daily return (up, down, or flat)
df['direction'] = np.sign(df['daily_returns'])

Y = df['direction']
X = df[['price', 'daily_returns', 'multiple_day_returns', 'rolling_mean']].fillna(0.0001)
X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.75, test_size=0.25)

# Let TPOT search for a pipeline, then export the best one as a Python file
tpot = TPOTClassifier(generations=50, population_size=50, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_aapl_pipeline.py')
```
The Python file it returned is below. It is real code one can use to create a trading strategy: TPOT selected the algorithm and the values of its hyperparameters. Right now we have only provided 'price', 'daily_returns', 'multiple_day_returns', and 'rolling_mean' to predict the target. One can add more features and adapt it as required.
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# NOTE: Make sure that the class is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1).values
training_features, testing_features, training_target, testing_target = \
    train_test_split(features, tpot_data['target'].values, random_state=42)

# Score on the training set was: 1.0
exported_pipeline = GradientBoostingClassifier(learning_rate=0.5, max_depth=7,
                                               max_features=0.7500000000000001,
                                               min_samples_leaf=11,
                                               min_samples_split=12,
                                               n_estimators=100,
                                               subsample=0.7500000000000001)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
```
OK, let's have some fun first. 😀
Whenever you read a post or paper on machine learning or data science, you will come across the word 'correlation' many times, along with how important its value is when building your model.
A simple definition of correlation: a mutual relationship or connection between two or more things. (That's the layman's definition, and it should be enough most of the time. 😉)
The coefficient of correlation is just a number (between -1 and +1, not an integer) from which we understand how two or more things are related to each other. It can be positive or negative, and its value tells us how strongly two data sets affect each other.
The following two images tell a lot about correlation and its value.
A coefficient of correlation in the range -0.5 to +0.5 is not considered that strong. But why, and how do we calculate correlation?
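Before answering that, here is a minimal sketch (using NumPy, with two made-up example arrays) of how the Pearson coefficient of correlation can be computed by hand and checked against NumPy's built-in `corrcoef`:

```python
import numpy as np

# Two made-up data sets, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson correlation = covariance(x, y) / (std(x) * std(y))
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r = cov_xy / (x.std() * y.std())

print(r)                         # manual calculation
print(np.corrcoef(x, y)[0, 1])   # NumPy's built-in, should match
```

Dividing the covariance by both standard deviations is what squeezes the result into the -1 to +1 range.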
What is Covariance ?
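In short, covariance measures how two variables move together: the average product of their deviations from their means. A minimal sketch, again with made-up numbers:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Population covariance: mean of the products of deviations from the mean
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # → 5.0

# NumPy's cov() uses the sample formula (divides by n-1) by default,
# so pass bias=True to get the population version
print(np.cov(x, y, bias=True)[0, 1])  # → 5.0
```

A positive value means the two variables tend to rise and fall together; a negative value means one tends to rise when the other falls.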
Now, if you still feel that something is missing, we should talk about variance.
Let's remove the 'Co' from Covariance.
Variance is a measurement of randomness (spread). So how would you calculate the variance of data?
Give me data:
Data = [4,5,6,7,12,20]
I will find the mean and subtract it from each individual value. Isn't that just the mean deviation? 😀 OMG!
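Almost! Using the data above, here is a minimal sketch of the difference: mean deviation averages the absolute deviations, while variance averages the squared deviations (squaring keeps them from cancelling out and punishes large deviations more):

```python
import numpy as np

data = np.array([4, 5, 6, 7, 12, 20], dtype=float)
mean = data.mean()  # → 9.0

deviations = data - mean

# Mean (absolute) deviation: average of |x - mean|
mean_deviation = np.mean(np.abs(deviations))

# Variance: average of (x - mean) ** 2
variance = np.mean(deviations ** 2)

print(mean, mean_deviation, variance)
```

So variance is just one step beyond the mean deviation you already know.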
Have a look at the following picture:
Let's save the following for a later post: coefficient of determination, probable error, and their interpretation.