PySpark Learning and Exploration

Topics covered in this section.

  1. Loading data in Spark (JSON, CSV and more).
  2. Defining a custom schema in PySpark.
  3. Loading a Spark DataFrame as a SQL table.
  4. Running SQL queries in Spark.
  5. Filtering data, handling missing data and dealing with datetime (time-series) data in Spark.
  6. [Final-Project] Write a streaming API in Spark!

Course Link:

https://www.skillshare.com/classes/Big-data-analysis-with-Apache-spark-PySpark-Python/175297556/projects?via=watch-history

In [6]:
import pyspark
from pyspark.sql import SparkSession, Row
In [7]:
spark = SparkSession.builder.getOrCreate()
In [8]:
spark
Out[8]:

SparkSession - in-memory
SparkContext
Version: v3.2.0
Master: local[*]
AppName: pyspark-shell
In [9]:
df = spark.createDataFrame([Row(1,2,3),Row(1,2,3),Row(1,2,3)])
In [10]:
df
Out[10]:
DataFrame[_1: bigint, _2: bigint, _3: bigint]
In [11]:
df.show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|  1|  2|  3|
|  1|  2|  3|
|  1|  2|  3|
+---+---+---+
In [15]:
df2 = spark.createDataFrame([Row(a=1,b=2.0,c="stinrg"),Row(a=1,b=2.0,c="stinrg"),Row(a=1,b=2.0,c="stinrg")])
In [17]:
df2
Out[17]:
DataFrame[a: bigint, b: double, c: string]
In [19]:
df2.show() # All Spark DataFrames are immutable!!
+---+---+------+
|  a|  b|     c|
+---+---+------+
|  1|2.0|stinrg|
|  1|2.0|stinrg|
|  1|2.0|stinrg|
+---+---+------+
In [21]:
from datetime import date, datetime
df3 = spark.createDataFrame([
    (1, 2., 'string1', date(2000, 1, 1), datetime(2000, 1, 1, 12, 0)),
    (2, 3., 'string2', date(2000, 2, 1), datetime(2000, 1, 2, 12, 0)),
    (3, 4., 'string3', date(2000, 3, 1), datetime(2000, 1, 3, 12, 0))
], schema='a long, b double, c string, d date, e timestamp')
df3
Out[21]:
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]
In [22]:
df3.show()
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
|  3|4.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+
In [23]:
df3.printSchema()
root
 |-- a: long (nullable = true)
 |-- b: double (nullable = true)
 |-- c: string (nullable = true)
 |-- d: date (nullable = true)
 |-- e: timestamp (nullable = true)
In [30]:
df3.select("a","b","c").describe().show()
+-------+---+---+-------+
|summary|  a|  b|      c|
+-------+---+---+-------+
|  count|  3|  3|      3|
|   mean|2.0|3.0|   null|
| stddev|1.0|1.0|   null|
|    min|  1|2.0|string1|
|    max|  3|4.0|string3|
+-------+---+---+-------+
In [31]:
df3.filter(df3.a==3).show()
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  3|4.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+
In [32]:
df = spark.createDataFrame([
    ['red', 'banana', 1, 10], ['blue', 'banana', 2, 20], ['red', 'carrot', 3, 30],
    ['blue', 'grape', 4, 40], ['red', 'carrot', 5, 50], ['black', 'carrot', 6, 60],
    ['red', 'banana', 7, 70], ['red', 'grape', 8, 80]], schema=['color', 'fruit', 'v1', 'v2'])
df.show()
+-----+------+---+---+
|color| fruit| v1| v2|
+-----+------+---+---+
|  red|banana|  1| 10|
| blue|banana|  2| 20|
|  red|carrot|  3| 30|
| blue| grape|  4| 40|
|  red|carrot|  5| 50|
|black|carrot|  6| 60|
|  red|banana|  7| 70|
|  red| grape|  8| 80|
+-----+------+---+---+
In [35]:
df.groupby('color').avg().show()
+-----+-------+-------+
|color|avg(v1)|avg(v2)|
+-----+-------+-------+
|  red|    4.8|   48.0|
| blue|    3.0|   30.0|
|black|    6.0|   60.0|
+-----+-------+-------+
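
Topics 3 and 4 from the list at the top (loading a DataFrame as SQL and running SQL queries) are not shown in the cells above. A minimal sketch, assuming the same `spark` session is passed in; the function name `average_by_color` and the view name `produce` are made up for illustration:

```python
def average_by_color(spark):
    """Register a small DataFrame as a temporary SQL view and query it.

    `spark` is an existing SparkSession (for example the one created above).
    """
    df = spark.createDataFrame(
        [["red", "banana", 1, 10], ["blue", "banana", 2, 20], ["red", "carrot", 3, 30]],
        schema=["color", "fruit", "v1", "v2"],
    )
    # Expose the DataFrame to Spark SQL under a (hypothetical) view name.
    df.createOrReplaceTempView("produce")
    # Same kind of aggregation as df.groupby('color').avg(), written as SQL.
    return spark.sql(
        "SELECT color, AVG(v1) AS avg_v1, AVG(v2) AS avg_v2 "
        "FROM produce GROUP BY color"
    )
```

Calling `average_by_color(spark).show()` in the notebook would print the grouped averages as a table.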

Quantum Machine Learning

Coursera Guided Project

  • There is a special version of TensorFlow called TensorFlow Quantum!!

PennyLane

  1. Quantum computations are represented as quantum node (QNode) objects in PennyLane!
  2. Quantum nodes can be created using the qnode decorator.
  3. Each wire in PennyLane represents a qubit.

PennyLane Quantum Functions!!

PennyLane quantum functions (quantum functions are a restricted subset of Python functions)

A quantum function consists of a set of quantum operations or a set of templates.

** Templates are very specific to PennyLane: https://pennylane.readthedocs.io/en/stable/introduction/templates.html

A quantum function MUST return a “measurement function” applied to a “qubit observable” or a “continuous-variable observable”.

Measurement function: https://pennylane.readthedocs.io/en/stable/introduction/measurements.html
Qubit observable: https://pennylane.readthedocs.io/en/stable/introduction/operations.html#intro-ref-ops-qobs
Continuous-variable observable: https://pennylane.readthedocs.io/en/stable/introduction/operations.html#continuous-variable-cv-operations

Define a Device

import pennylane as qml

dev1 = qml.device("default.qubit", wires=1, shots=100)
# "default.qubit" is the default simulator device; wires defines the number of subsystems represented by the device.
# shots=100 defines how many times the circuit is evaluated to estimate expectation values.
# Older PennyLane releases also accepted analytic=True (simulators only), which made the device
# compute expectations and variances analytically; newer releases use shots=None for that instead.

** We can also label the wires by name, e.g. wires=['wire1', 'wire2'].

Qnodes:

QNodes: QNodes provide the interface between quantum computations and machine-learning libraries.

QNode = QUANTUM function/circuit + DEVICE
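
A minimal sketch of "QNode = quantum function + device", assuming a recent PennyLane release. The helper name `build_rotation_qnode` is made up for illustration, and PennyLane is imported inside the function only so the snippet can be loaded without it:

```python
def build_rotation_qnode():
    """Return a QNode that rotates one qubit about the X axis and measures <Z>."""
    import pennylane as qml  # lazy import: only needed when the QNode is built

    # The device: a single-qubit simulator.
    dev = qml.device("default.qubit", wires=1)

    # The decorator binds the quantum function below to the device.
    @qml.qnode(dev)
    def circuit(theta):
        qml.RX(theta, wires=0)            # a quantum operation (gate)
        return qml.expval(qml.PauliZ(0))  # measurement applied to a qubit observable

    return circuit
```

For this circuit the expectation value is cos(theta), so `build_rotation_qnode()(0.0)` should come out close to 1.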

** PennyLane also defines collections of QNodes.

  1. We can have collections of QNodes (different devices, different plugins etc.) and we can use the
    qml.map() function to map different kinds of parameters across different QNodes!!
  2. QNode collections: a set of QNodes that share the same function signature and can be evaluated independently.

Templates: (specifically PennyLane templates)

PennyLane provides a growing number of templates of variational quantum circuit architectures that can be
easily used to build, evaluate and train more complex models!

  1. Embedding templates: convert/encode input features into the required quantum state of the circuit.
  2. Layer templates: provide a sequence of trainable gates that is repeated, like the layers in a neural network.
  3. State preparations: convert a given state into a sequence of gates that prepares that state.
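
A hedged sketch of how an embedding template and a layer template might be combined in one QNode. It assumes a recent PennyLane release where `AngleEmbedding` and `BasicEntanglerLayers` are available at the top level (older releases exposed them under `qml.templates`); the helper name `build_template_qnode` is made up for illustration:

```python
def build_template_qnode(n_wires=2):
    """Return a QNode built from an embedding template plus a layer template."""
    import pennylane as qml  # lazy import: only needed when the QNode is built

    dev = qml.device("default.qubit", wires=n_wires)

    @qml.qnode(dev)
    def circuit(features, weights):
        # Embedding template: encode classical features as rotation angles.
        qml.AngleEmbedding(features, wires=range(n_wires))
        # Layer template: trainable rotations followed by entangling gates;
        # weights is expected to have shape (n_layers, n_wires).
        qml.BasicEntanglerLayers(weights, wires=range(n_wires))
        return qml.expval(qml.PauliZ(0))

    return circuit
```

The returned circuit takes a feature vector of length `n_wires` and a weight array with one row per layer.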

Need to watch the following videos about state preparation:
Quantum states, qubits and measurements!
https://www.youtube.com/watch?v=NZD9APb7ZtY
https://www.youtube.com/watch?v=SlZoTjkPy7o
https://www.youtube.com/watch?v=9MpSQglnqI0

Quantum-Operations:

The following quantum operations are provided by PennyLane.

  1. Quantum_gates: Google and write about it with Examples.
  2. Quantum_observables: Google and write about it with Examples.
  3. Quantum_State-preparations: Google and write about it with Examples.
  4. Quantum_measurements: Google and write about it with Examples.

Pandas Technical Analysis
https://github.com/twopirllc/pandas-ta

Nothing is better than this!
https://towardsdatascience.com/technical-analysis-library-to-financial-datasets-with-pandas-python-4b2b390d3543

[Interview-Experience] and Why I failed it? :(

I assumed that I was very good at programming, a kind of ninja who could slaughter every code challenge they would throw at me, but HOW WRONG I WAS!

I have taken a few lessons from all this, and I expect they will help me go for better job offers in the coming months.

*** [News-Flash] I really need to work with more focus on my work! (There is no excuse in that!!)

*** [Just Stop being Lazy] There is no time or option of being lazy anymore.

***[Just Stop being Awesome] Now you really need to stop being Awesome in your life.

***[Test-Cases] When someone throws a pet project at you, the first thing you need to do is RUN THE UNIT TEST CASES!!

Never think or say that “TEST CASES ARE JUST TEST CASES!!”

[** Business-logic] Software has no value if it does not serve the business, and a programmer has no value if he/she does not understand the business case. Understanding the business case is not the only priority, but it is a must-have priority for both the business and the programmer!!

Pennylane Tutorials Need to be Explored.

Basic Qubit rotation:

https://pennylane.ai/qml/demos/tutorial_qubit_rotation.html

Quantum Gradients with BackPropagation.

https://pennylane.ai/qml/demos/tutorial_backprop.html

Plugins and Hybrid Device.

https://pennylane.ai/qml/demos/tutorial_plugins_hybrid.html

Turning quantum nodes into Keras layers!

https://pennylane.ai/qml/demos/tutorial_qnn_module_tf.html

Algos and DS from the Python Cookbook (David Beazley)

Question: You have an N-element tuple or sequence that you have to unpack into N variables.
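
One possible answer, using plain assignment; the sample data is made up for illustration:

```python
# Unpacking works for any iterable as long as the counts match.
point = (4, 5)
x, y = point

# Nested structures can be unpacked in the same statement.
record = ["ACME", 50, 91.1, (2012, 12, 21)]
name, shares, price, (year, month, day) = record
```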

Question: You have to unpack an N-element tuple or sequence into a collection of fewer than N variables (basically, overcome “too many values to unpack”).
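
A sketch using star expressions, which collect the "extra" values into a list; the sample data is made up:

```python
record = ("Dave", "dave@example.com", "773-555-1212", "847-555-1212")
# Everything after the first two fields lands in phone_numbers.
name, email, *phone_numbers = record

# The starred name can also sit in the middle.
first, *middle, last = [10, 8, 7, 1, 9, 5]
```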

Question: You have to keep a limited history of the last few items seen during iteration or some other kind of processing (basically a version of the Linux tail command).
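
One way to do this is a `collections.deque` with `maxlen`, which silently drops the oldest item when full. The helper name `tail` is made up for illustration:

```python
from collections import deque


def tail(items, n=3):
    """Return the last n items of any iterable, like the Linux tail command."""
    # A bounded deque keeps only the most recent n items.
    return list(deque(items, maxlen=n))
```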

Question: Find the largest or smallest elements in a sequence: the top 3, the top 10, the smallest 3 or the smallest 10.
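
A sketch with the standard `heapq` module; the numbers are made up:

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
top3 = heapq.nlargest(3, nums)      # three largest, biggest first
bottom3 = heapq.nsmallest(3, nums)  # three smallest, smallest first
```

For a single largest or smallest item, plain `max()` and `min()` are simpler.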

Question: Create a priority queue that sorts items by their assigned priority and always returns the item with the highest priority.
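
A minimal sketch built on `heapq`; the class is illustrative, not a library API. The priority is negated because `heapq` is a min-heap, and a running index breaks ties in insertion order:

```python
import heapq


class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._index = 0  # tie-breaker so equal priorities pop in insertion order

    def push(self, item, priority):
        # Negate the priority so the highest priority is popped first.
        heapq.heappush(self._heap, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._heap)[-1]
```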

Question: Create a dictionary that maps keys to more than one value, aka multidict.
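
A sketch using `collections.defaultdict`, which creates the container for a new key automatically; the keys and values are made up:

```python
from collections import defaultdict

multi = defaultdict(list)  # use set instead of list to drop duplicates
multi["a"].append(1)
multi["a"].append(2)
multi["b"].append(4)
```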

Question: You want to create a dictionary, and you also want to control the order of items when
iterating or serializing.
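
A sketch with `collections.OrderedDict`, which keeps insertion order when iterating or serializing. Regular dicts also preserve insertion order since Python 3.7, so this matters mostly for older code:

```python
import json
from collections import OrderedDict

d = OrderedDict()
d["foo"] = 1
d["bar"] = 2
d["spam"] = 3

keys_in_order = list(d)     # iteration follows insertion order
serialized = json.dumps(d)  # so does JSON serialization
```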

Question: You want to perform various calculations (sorting, min value, max value) on dict data.
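
One approach is to invert keys and values with `zip()` so that comparisons run on the values; the prices are made up:

```python
prices = {"ACME": 45.23, "AAPL": 612.78, "IBM": 205.55, "HPQ": 37.20, "FB": 10.75}

# zip() yields (value, key) pairs, so min/max compare by price first.
cheapest = min(zip(prices.values(), prices.keys()))
priciest = max(zip(prices.values(), prices.keys()))

# Keys sorted by their value.
names_by_price = sorted(prices, key=prices.get)
```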

Question: You have two dictionaries and want to find out what they might have in common (same
keys, same values, etc.).
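
A sketch using the set operations supported by dictionary key and item views; the data is made up:

```python
a = {"x": 1, "y": 2, "z": 3}
b = {"w": 10, "x": 11, "y": 2}

common_keys = a.keys() & b.keys()     # keys present in both
only_in_a = a.keys() - b.keys()       # keys in a that are not in b
common_items = a.items() & b.items()  # identical (key, value) pairs
```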

Question: Remove duplicate values from a sequence but preserve its order.
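
A sketch using a generator and a set of already-seen values. The optional `key` argument is an illustrative extension for unhashable items such as dicts:

```python
def dedupe(items, key=None):
    """Yield items in their original order, skipping ones already seen."""
    seen = set()
    for item in items:
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            yield item
```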

Question:

Your program has become an unreadable mess of hardcoded slice indices and you want
to clean it up.
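
One fix is to give the slices names with the built-in `slice()`. The fixed-width record layout below is made up for illustration:

```python
line = "ACME 100 51.25"

# Named slices replace magic numbers such as line[5:8].
SHARES = slice(5, 8)
PRICE = slice(9, 14)

cost = int(line[SHARES]) * float(line[PRICE])
```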

Question:

You have a sequence of items, and you’d like to determine the most frequently occurring
items in the sequence.
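
A sketch with `collections.Counter`; the word list is made up:

```python
from collections import Counter

words = "look into my eyes look into my eyes the eyes the eyes the eyes".split()
counts = Counter(words)
top_two = counts.most_common(2)  # list of (item, count) pairs
```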

Question:

You have a list of dictionaries and you would like to sort the entries according to one
or more of the dictionary values.
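
A sketch using `operator.itemgetter` as the sort key; the rows are made up:

```python
from operator import itemgetter

rows = [
    {"fname": "Brian", "uid": 1003},
    {"fname": "David", "uid": 1002},
    {"fname": "John", "uid": 1001},
    {"fname": "Big", "uid": 1004},
]

rows_by_uid = sorted(rows, key=itemgetter("uid"))
rows_by_fname = sorted(rows, key=itemgetter("fname"))
```

Passing several field names, as in `itemgetter("fname", "uid")`, sorts on more than one value.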

Question: Sort objects of a class that does not support comparison operations!

You want to sort objects of the same class, but they don’t natively support comparison
operations.
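
A sketch using `operator.attrgetter` as the sort key, so the class itself needs no comparison methods. The `User` class is made up for illustration:

```python
from operator import attrgetter


class User:
    def __init__(self, user_id):
        self.user_id = user_id


users = [User(23), User(3), User(99)]
# sorted() compares the extracted attribute, not the objects themselves.
users_sorted = sorted(users, key=attrgetter("user_id"))
```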

Question: Implement a group-by in Python.

You have a sequence of dictionaries or instances and you want to iterate over the data
in groups based on the value of a particular field, such as date.
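
A sketch with `itertools.groupby`. It only groups consecutive items, so the data is sorted by the grouping field first; the rows are made up:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"address": "5412 N CLARK", "date": "07/01/2012"},
    {"address": "5148 N CLARK", "date": "07/04/2012"},
    {"address": "5800 E 58TH", "date": "07/02/2012"},
    {"address": "2122 N CLARK", "date": "07/01/2012"},
]

# Sort first: groupby() only merges adjacent items with equal keys.
rows.sort(key=itemgetter("date"))
grouped = {
    date: [r["address"] for r in items]
    for date, items in groupby(rows, key=itemgetter("date"))
}
```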

Question: Filter the elements of a sequence.

You have data inside of a sequence, and need to extract values or reduce the sequence
using some criteria.
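
A sketch using list comprehensions, first to drop values and then to replace them; the numbers are made up:

```python
values = [1, 4, -5, 10, -7, 2, 3, -1]

positives = [n for n in values if n > 0]         # keep only matching items
clipped = [n if n > 0 else 0 for n in values]    # replace instead of dropping
```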

Question: Extract a subset of a dictionary.

You want to make a dictionary that is a subset of another dictionary.
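
A sketch using dictionary comprehensions, filtering once by value and once by key; the data is made up:

```python
prices = {"ACME": 45.23, "AAPL": 612.78, "IBM": 205.55, "HPQ": 37.20, "FB": 10.75}

# Subset by value.
expensive = {name: price for name, price in prices.items() if price > 200}

# Subset by key.
tech_names = {"AAPL", "IBM", "MSFT"}
tech = {name: price for name, price in prices.items() if name in tech_names}
```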

Question:

You have code that accesses list or tuple elements by position, but this makes the code
somewhat difficult to read at times. You’d also like to be less dependent on position in
the structure, by accessing the elements by name.
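
A sketch with `collections.namedtuple`, which keeps tuple behaviour but adds field names; the `Stock` type is made up for illustration:

```python
from collections import namedtuple

Stock = namedtuple("Stock", ["name", "shares", "price"])

s = Stock("ACME", 4, 2.5)
cost = s.shares * s.price  # access by name instead of s[1] * s[2]
name, shares, price = s    # still unpacks like a plain tuple
```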

Question:

You need to execute a reduction function (e.g., sum() , min() , max() ), but first need to
transform or filter the data.
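
A sketch that passes a generator expression straight to the reduction function, which avoids building a temporary list; the data is made up:

```python
nums = [1, 2, 3, 4, 5]
sum_of_squares = sum(x * x for x in nums)

portfolio = [
    {"name": "GOOG", "shares": 50},
    {"name": "YHOO", "shares": 75},
    {"name": "SCOX", "shares": 65},
]
min_shares = min(s["shares"] for s in portfolio)
```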

Question:

You have multiple dictionaries or mappings that you want to logically combine into a
single mapping to perform certain operations, such as looking up values or checking
for the existence of keys.
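
A sketch with `collections.ChainMap`, which looks keys up in each mapping in turn without copying the data; the dictionaries are made up:

```python
from collections import ChainMap

a = {"x": 1, "z": 3}
b = {"y": 2, "z": 4}

combined = ChainMap(a, b)  # lookups try a first, then b
```

For a duplicated key such as "z", the value from the first mapping wins.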

UPSC Books (Cont.)

  1. Indian Polity by Laxmikanth
  2. A Brief History of Modern India, Rajiv Ahir
  3. NCERT (6th to 10th social science and science)
  4. India After Gandhi or India Since Independence (main events in the book)
  5. Certificate Physical and Human Geography (part 1, part 2 weather and climate)
  6. Atlas: Oxford or BlackSwan
  7. Indian Economy by Sanjiv Verma or Ramesh Singh
  8. Environment and ecology, 20 videos at Unacademy
  9. Economic Survey, latest edition (make lots and lots of notes, very very important!!)
  10. Lexicon book
  11. The Hindu daily
  12. Yozna mantri
  13. Previous years' question papers

Tasks Related to PennyLane Work

  1. Understand device operations and observables: pennylane/devices/default_qubit.py
  2. Continuous-variable quantum operations: pennylane/ops/cv.py
  3. Quantum channels: pennylane/ops/channel.py
  4. Discrete-variable quantum operations: pennylane/ops/qubit
  5. Quantum tapes: pennylane/tape
  6. Quantum gradient transforms: pennylane/gradients
  7. Quantum optimizers: pennylane/optimize
  8. QNode, device and quantum-tape transforms: pennylane/transforms