Comparing the daily mean and median value of on-chain transactions gives an indication of the extent of organic blockchain use. Comparing the mean-median ratio (MMR) across different Bitcoin forks shows that Bitcoin (BTC) is the least influenced by “whales”.
This analysis was prepared for Coin Metrics as part of their recruitment process. It is a short demonstration of my thought process. The additional steps required to develop this into a useful analysis are also discussed.
CoinMetrics Case Study¶
Objective - to evaluate skills and abilities in multiple ways:
- Importing data
- Wrangling data
- Exploring data
- Analysis
- Modeling
- Communicating results
Provide:
- A written explanation of how to approach the problem
- Present the beginning phases of implementation using coin metrics data
Of the four options made available in the case study, option 3 was chosen
Advocating for CoinMetric’s data¶
Produce quality research that is of value to potential clients (doesn’t have to be complete) with a particular focus on network data
Initial ideas¶
My first rough ideas were:
Compare different Bitcoin based chains, (BTC, BCH, LTC, BSV) to test the influence of whales and compare this to their respective (evolving) claims to be a store of value (SoV) and/or alternative to cash.
Develop and expand some of the research by Willy Woo. I find his research to be outstanding. In particular I think the following metrics merit further investigation:
- days destroyed
- hodl waves
- thermo cap
- average cap
Tracking the number of twitter followers of various crypto-twitter thought leaders and celebrities to test the hypothesis that “an increase in follower numbers shows that new retail investors are entering crypto-markets, and an increase in price is expected soon”
Thought leaders / crypto celebrities could be further grouped by what types of coins they speak about most - smart contracts, DeFi, privacy coins, etc.
Weibo could be analysed as well as Twitter to understand Chinese markets, Korean twitter could be analysed for the Korean retail market,etc.
I have an existing side project which has the goal of using a recurrent neural net using an LSTM architecture to predict BTC price movements. The app (model, stored data, data pipeline, visualization of results) will run autonomously on Google Cloud Platform. Candle data is consumed from CoinAPI.io and stored in BigQuery.
Technical indicators will be calculated and used as additional factors to the model. Sentiment analysis from news outlets (Bloomberg, FT) would be added later.
The model would be written using TensorFlow, and the BigQuery tables names would use BQ’s date format capabilites. This would make the project faster and cheaper.
Idea 1 seemed like a sensible option. Ideas 3 and 4 are interesting and worth investigating, but not possible within the scope of this exercise:
Testing the influence of whales¶
and “normal users” on BTC and 4 BTC forks, and discussing results in the context of each chain’s claimed technical advantages and use cases as e.g. a store of value or alternative to cash
This will be achieved by comparing daily mean USD transaction value to daily median USD transaction value. This is done by calculating the mean-median ratio of transaction value (MMR).
Hypothesis: If a chain has a much smaller median transaction size than mean transaction size, then on chain activity is dominated not by regular users making normal daily transactions, but by whales moving large amounts of currency to artificially inflate usage metrics.
This could contradict claims that a blockchain has an active user base that the blockchain is meeting user needs. We assume that:
If a blockchain is functioning as digital cash, then most of its transactions would be small. e.g less than 100 USD. It should be noted that 100 USD is not a particularly small amount even in western countries and due to a blockchains borderless nature, it is even futher above a noraml ‘day-to-day’ transaction amount in large parts of the world.
Conversely, if a blockchain has relatively little organic use by normal users then whales (users with large holdings) will make up a large proportion of on-chain activity and would have average transaction sizes much larger than a day-to-day transaction. An untested guess at a “whale threshold” could be 100,000USD.
Where the ratio of mean to median transaction value is relatively high, we have an environment where the mean value is much higher than the median value, which shows that daily total value transacted is dominated by a few relatively large transactions, rather than many small value transactions. This would imply that whales dominate the blockchain (and likely market behavior) rather than members of the general public or retail investors.
Chains:¶
The chains that will be analysed here are all forks of BTC. They are:
- BTC
- BCH
- BSV
- LTC
- DOGE
Fields¶
using the coinmetrics api, the following metrics will be used:
TxTfrValMeanUSD
The sum USD value of native units transferred divided by the count of transfers (i.e., the mean “size” in USD of a transfer) that interval.
TxTfrValMedUSD
The median USD value transferred per transfer (i.e., the median “size” in USD of a transfer) that interval.
from IPython.display import HTML
HTML('''<script>
code_show=false;
function code_toggle() {
if (code_show){
$('div.input').show();
$('div.output_prompt').show();
} else {
$('div.input').hide();
$('div.output_prompt').hide();
}
code_show =! code_show
}
$( document ).ready(code_toggle);
</script>
<font> This analysis was made using Python. You can toggle the code visibility by clicking <a href="javascript:code_toggle()">here</a>.</font>
''')
# import and setup
import requests
import json
import pandas as pd
import plotly.graph_objs as go
import chart_studio.plotly as py
import plotly
from plotly.offline import init_notebook_mode
plotly.offline.init_notebook_mode()
init_notebook_mode(connected=True)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import warnings
warnings.simplefilter('ignore')
def get_metricdata(asset_id, payload):
url = f'https://community-api.coinmetrics.io/v2/assets/{asset_id}/metricdata'
response = requests.get(
url=url,
params=payload
)
if response.status_code == 200:
# print(f'{asset_id} - success!')
return json.loads(response.content.decode('utf-8'))
else:
print(f'status_code: {response.status_code}')
return None
payload = {
'metrics': 'PriceUSD,'+
'TxTfrValMeanUSD,'+
'TxTfrValMedUSD,'+
'TxTfrValUSD',
'start': '2016-01-01',
}
# PriceUSD and TxTfrValUSD are not utilised yet. Because the work needs to be expanded in order to be complete, I will keep them here for now.
asset_list = ['btc', 'ltc', 'bch', 'bsv', 'doge']
data = {}
for asset in asset_list:
data[asset] = get_metricdata(asset, payload)
dataframes = {}
cols = ['PriceUSD', 'TxTfrValMeanUSD', 'TxTfrValMedUSD', 'TxTfrValUSD']
for asset in data.keys():
values = [ each['values'] for each in data[asset]['metricData']['series']]
index = [ each['time'] for each in data[asset]['metricData']['series']]
df = pd.DataFrame.from_records(values, columns = cols)
df.index = pd.to_datetime(index, infer_datetime_format=True).date
for col in df.columns:
df[col] = df[col].astype(float)
# create new fields
df['TxCount'] = df.TxTfrValUSD / df.TxTfrValMeanUSD
df['MeanMedianRatio'] = df.TxTfrValMeanUSD / df.TxTfrValMedUSD
dataframes[asset] = df
# take a look at the wrangled data:
dataframes['bsv'].sample(5)
dataframes['btc'].sample(5)
dataframes['doge'].sample(5)
Compare daily mean and median USD transaction value for BTC since January 2016¶
btc_mean = go.Scatter(
x=dataframes['btc'].index,
y=dataframes['btc'].TxTfrValMeanUSD,
name='BTC mean',
#marker=dict(color='#ffcdd2')
)
btc_median = go.Scatter(
x=dataframes['btc'].index,
y=dataframes['btc'].TxTfrValMedUSD,
name='BTC median'
#marker=dict(color='#ffcdd2')
)
data = [btc_mean, btc_median]
layout = go.Layout(
title="BTC median and mean transaction values by day",
xaxis=dict(title='Date'),
yaxis=dict(title='USD value'),
yaxis_type='log'
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='btc-mean-median-tnsx-values')
The chart above shows that:
- the daily mean transaction value is higher than the daily median transaction value.
- The two averages are correlated
- From
2016 to present, the mean is approximately 2 orders of magnitude higher than the
median. This relationship appears to be consistent across the previous 4 years.
Note: During the last 4 years, the USD value of 1 BTC has increased from ~400USD
to currently ~10000USD. The impact of the changing USD price of the coins on
mean and median should be investigated. This could be easily achieved using the
TxTfrValUSD
andPriceUSD
metrics. We could also then calculate number of daily transactions.
Plot the ratio of daily mean to median USD transaction values for each asset since January 2016¶
def scatter_plot(asset, name):
return go.Scatter(
x=dataframes[asset].index,
y=dataframes[asset].MeanMedianRatio,
name=name
)
data = [scatter_plot(asset, asset) for asset in dataframes.keys()]
layout = go.Layout(
title="Ratio of daily mean to median transaction value",
xaxis=dict(title='Date'),
yaxis=dict(title='Ratio'),
yaxis_type='log'
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='MMR-btc-ltc-doge-bch-bsv')
Note: Except for BTC, the time series above are very “choppy”. If this chart were to be shown to clients I would consider smoothing the time series by using e.g. 7 day moving average. However, this might obscure some features of the data so for the initial data exploration I will not apply smoothing.
Conclusions¶
The chart above shows that BTC has the lowest ratio of mean to median daily transaction value. This suggests that compared to the other blockchains in this analysis Bitcoin:
- has relatively strong organic use.
- is less influenced by whales.
- MMR has lower day-to-day volatility.
Point 3 suggests a wide and regular user base and total daily transaction volumes should be analysed across the 5 chains to futher strengthen or rebutt these conclusions.
Using this ratio as a proxy to measure organic use, the chain with the second most organic use is Litecoin.
Since the start of 2019, the influence of whales on the Dogecoin network has been decreasing.
Of the two contentious hard forks, Bitcoin Cash shows two distinct phases with different characteristics in each:
From its inception in August 2017 to November 2018, the influence of whales increased at a steady rate. At the coins genesis, there appears to have been a large organic user base transacting daily, bringing the median transaction value to within 50 - 100x the mean daily transaction value. This was lower than Bitcoin’s, which had a much more consistent but higher MMR of 120 - 200.
After November 10 2018, the ratio increases from an average of approximately 500 to approximately 10,000. This is a stark and abrupt change in the daily ratio, and suggests that either
- organic use drastically decreased, or
- BCH very suddenly started being used to facilitate very large value transfers by relatively few users.
As of January 2019, Dogecoin appears to have more widespread organic use than either BCH or BSV, despite its status as a “joke” blockchain. However DOGE has had a higher MMR than BTC or LTC in 2019.
Next Steps¶
This brief investigation was developed over the course of an afternoon, in line with the project brief recommending only 4 hours of work. In order to be applied in a commercial context, this analyis should be expanded and tested in at least the following ways:
Test if the central assumption of this analysis is true. Possible approaches could include:
- Removing exchange outflows from the data. Could this be done using known exchange addresses (exchanges aggregate organic retail investor behavior)
- Quantifying the influence of “change” transactions - in aggregate this should be nil for day-to-day “cash” transactions, but for whales moving the entire balance of an address there would be no “change” amount. Depending how the metric is calculated this may or may not be significant.
- For BTC and LTC, are the lightning networks distorting the results by hiding organic low value transactions?
- For BTC, is the liquid sidechain hiding the activity of whales to the extent that it is not the “healthiest” of the 5 blockchains analysed?
Can we infer where the whales and “normal” users live, by analyzing the time of transactions? People are much more likely to make a transaction at midday than midnight, and we could use this to investigate geographic clustering. e.g. Is BTC a “western” chain, whilst BCH has more organic use in Asia?
An analysis of daily transaction volume (in USD terms) would be essential to this analysis. It would provide a context in which to interpret the significance of differences between each chain and differences bewtween time frames.
Similarly, comparing the hash power dedicated to mining new blocks on each chain would indicate commercial interests, and abrupt changes in hash power could possibly be correlated with changes in mean-median ratio (MMR).