This notebook is a technical exercise I worked on as part of an interview for a crypto trading firm. The exercises involve building simplified interfaces to parse order book data and calculate various quantities and derived properties.
I find technical interviews that involve live coding exercises to be really useful and really stressful. Live coding definitely triggers “performance anxiety” for me (when I was a kid I really hated presenting or playing an instrument in front of anyone, even teachers) - I felt self-councious and in this case I needed to use VSCode instead of my familiar Vim+Tmux setup. Consequently progress was really slow and bumpy. Tragically, I got really muddled trying to parse some json after the initial API request!
Whilst thinking about the problem and what code to write, I also needed to think about the following:
- How to communicate my thoughts
- How to optimize my output for an interview context (it’s not a real problem involving tests, edge cases, scalability or performance considerations)
- How to use VSCode (and a mouse or trackpad).
These additional considerations resulted in me writing code that got the job done.. but slowly. The code wasn’t great.
I was curious how much faster I’d be (and how much easier the exercise would seem) if I treated the exercise as a “take home” exercise instead of “live coding”. Here are the results - it took a couple of hours, the ideas flowed more easily, and I could remember methods more accurately. I wish I didn’t (still) get so self conscious!
Despite this, I think live coding interviews are a great way of assessing me - they get under my skin and show my worst sides as well as my best. If I’m going to work in a high paced and demanding role then the interview should also have those elements.
import requests
import operator
import pprint
import pandas as pd
import time
pp = pprint.PrettyPrinter(indent=4)
def get_data(url):
r = requests.get(url)
data = r.json()
return data.get('result')
#get_data("https://api.kraken.com/0/public/Depth?pair=ethusd")
2. Add some functionality that takes a side (bid or ask) and a price p as arguments, and returns the total volume available in the order book at p.¶
def calc_volume(ob_side, price):
data = get_data("https://api.kraken.com/0/public/Depth?pair=ethusd")
# side should be either "asks" or "bids"
side = data["XETHZUSD"][ob_side]
clean_data = []
for i, j in enumerate(side):
price_data = float(side[i][0])
volume = float(side[i][1])
clean_data.append((price_data, volume))
# find all the items where first item is less than or equal to price
index = 0
volume = 0
operate = {
'bids': operator.ge,
'asks': operator.le,
}
op = operate[ob_side]
for i, j in enumerate(clean_data):
if op(clean_data[i][0], price):
index += 1
volume += clean_data[i][1]
else:
break
return volume
calc_volume("bids", 2470)
calc_volume("asks", 2480)
3. Now add some (or modifies existing) functionality that takes a percentage and a side as arguments, and returns the volume available between the best price for that side, and that price +/- the percentage.¶
def slippage(ob_side, percent):
data = get_data("https://api.kraken.com/0/public/Depth?pair=ethusd")
# side should be either "asks" or "bids"
side = data["XETHZUSD"][ob_side]
clean_data = []
for i, j in enumerate(side):
price_data = float(side[i][0])
volume = float(side[i][1])
clean_data.append((price_data, volume))
# find all the items where first item is less than or equal to price
index = 0
volume = 0
starting_price = clean_data[0][0]
# pprint.pprint(starting_price)
if side == 'asks':
percent *= -1
op = operator.add if side == "ask" else operator.sub
limit_price = op(starting_price, (starting_price * percent))
# pp.pprint(f"limit_price: {limit_price}")
operate = {
'bids': operator.ge,
'asks': operator.le,
}
op = operate[ob_side]
for i, j in enumerate(clean_data):
if op(clean_data[i][0], limit_price):
index += 1
volume += clean_data[i][1]
else:
break
# pp.pprint(volume)
return volume
slippage("bids", 0.01)
4. Can you visualize the order book?¶
asks = get_data("https://api.kraken.com/0/public/Depth?pair=ethusd")['XETHZUSD']['asks']
bids = get_data("https://api.kraken.com/0/public/Depth?pair=ethusd")['XETHZUSD']['bids']
asks = pd.DataFrame(asks)
asks.columns = ['price', 'volume', 'timestamp']
asks['volume'] = asks['volume'].astype(float)
asks['cumulative'] = asks['volume'].cumsum()
asks.head()
bids = pd.DataFrame(bids)
bids.columns = ['price', 'volume', 'timestamp']
bids['volume'] = bids['volume'].astype(float)
bids['cumulative'] = bids['volume'].cumsum()
bids.head()
ob = pd.concat([bids[["price","cumulative"]], asks[["price","cumulative"]]], ignore_index=False)
ob.sort_values(by="price", inplace=True)
ob.set_index("price", inplace=True)
ax = ob.plot.bar(figsize=(14,10))
xticks = ax.xaxis.get_ticklocs()
xticklabels = [l.get_text() for l in ax.xaxis.get_ticklabels()]
ax.xaxis.set_ticks(xticks[::5]);
ax.xaxis.set_ticklabels(xticklabels[::5]);
Part 2: Time-series¶
Build an abstraction that takes the following arguments:
- A pair
- A unit of time (t) that can be either “milliseconds”, “seconds”, “minutes” or “hours”.
- A number of time units (n)
Given these arguments, it should fetch and store the order book for the pair every n t. For example: if t = “second” and n = 3, it should fetch the order book every second for a total of 3 times.
def time_series(pair, tunit, n):
base_url = f"https://api.kraken.com/0/public/Depth?pair={pair}"
pair_mod = "X"+pair[0:3].upper()+"Z"+pair[3:].upper()
results = {}
timeunits = {
"s": 1,
"m": 60*1,
"ms": 0.001,
}
realtime = timeunits[tunit]
for i in range(n):
print(f"getting snapshot {i+1} of {n}")
r = requests.get(base_url)
data = r.json()
print(hash(frozenset(r)))
results[i] = data.get('result')[pair_mod]
time.sleep(n*realtime) # had to modify by multiplying by n so that api is more likely to return different data set.
print("done!")
return results
results = time_series("ethusd", "s", 3)
We see that the hash of the api response is sometimes the same, so we would expect there to be no volume change between these identical snapshots.
#results
Now extend your code to compare the order books you’ve fetched to each other. Given the name of a pair, n, t and a price p, determine how much the total volume available at p has changed between each “frame”, and between the first and last “frame”.¶
# redefine "calc_volume" function from above to make data an input param
def calc_volume(data, ob_side, price):
# side should be either "asks" or "bids"
side = data[ob_side]
clean_data = []
for i, j in enumerate(side):
price_data = float(side[i][0])
volume = float(side[i][1])
clean_data.append((price_data, volume))
# find all the items where first item is less than or equal to price
index = 0
volume = 0
operate = {
'bids': operator.ge,
'asks': operator.le,
}
op = operate[ob_side]
for i, j in enumerate(clean_data):
if op(clean_data[i][0], price):
index += 1
volume += clean_data[i][1]
else:
break
return volume
def snapshot_compare(results, ob_side, price):
snap_vol = []
for key in results:
result = calc_volume(results[key], ob_side, price)
snap_vol.append(result)
total_change = round(snap_vol[-1] - snap_vol[0])
incremental_changes = [round(snap_vol[i+1] - snap_vol[i]) for i, j in enumerate(snap_vol) if i < len(snap_vol)-1]
print(f"vol in each snapshot: {snap_vol}")
print(f"incremental: {incremental_changes}")
print(f"total (diff between last and first): {total_change}")
results = time_series("ethusd", "s", 3)
#results
snapshot_compare(results, "bids", 2470)
Finally, can you think of an efficient way to get a “diff” of two frames? For example, can you think of a way to determine at which price the volume changed the most?¶
Quick idea:
- for each side of each order book, group the orders into buckets with a certain width, e.g. if the lowest ask is 2450, group all the asks from 2450 to 2451, then from 2451 to 2452, etc. Do this for both sides of each order book you collect.
- you then have multiple aggregated order books with the same indexes.
- compare the volumes at the same index across each orderbook
Additional considerations:
- need to force the lowest ask or max bid to be the same in each order book snapshot.
- need to deal with large spreads
- comparisons where one side has 0 volume should be marked as suspicious and investigated further to iterate a better solution.
- need to present negative volumes changes correctly