bitcoin
Analysis of the mean and median value of transactions on 5 Blockchains
September 2019

Comparing the daily mean and median value of on-chain transactions gives an indication of the extent of organic blockchain use. Comparing the mean-median ratio (MMR) across different Bitcoin forks shows that Bitcoin (BTC) is the least influenced by “whales”.

This analysis was prepared for Coin Metrics as part of their recruitment process. It is a short demonstration of my thought process. The additional steps required to develop this into a useful analysis are also discussed.

CoinMetrics Case Study

Objective - to evaluate skills and abilities in multiple ways:

  1. Importing data
  2. Wrangling data
  3. Exploring data
  4. Analysis
  5. Modeling
  6. Communicating results

Provide:

  1. A written explanation of how to approach the problem
  2. Present the beginning phases of implementation using coin metrics data

Of the four options made available in the case study, option 3 was chosen

Advocating for CoinMetric’s data

Produce quality research that is of value to potential clients (doesn’t have to be complete) with a particular focus on network data

Initial ideas

My first rough ideas were:

  1. Compare different Bitcoin based chains, (BTC, BCH, LTC, BSV) to test the influence of whales and compare this to their respective (evolving) claims to be a store of value (SoV) and/or alternative to cash.

  2. Develop and expand some of the research by Willy Woo. I find his research to be outstanding. In particular I think the following metrics merit further investigation:

    1. days destroyed
    2. hodl waves
    3. thermo cap
    4. average cap
  3. Tracking the number of twitter followers of various crypto-twitter thought leaders and celebrities to test the hypothesis that “an increase in follower numbers shows that new retail investors are entering crypto-markets, and an increase in price is expected soon”

    Thought leaders / crypto celebrities could be further grouped by what types of coins they speak about most - smart contracts, DeFi, privacy coins, etc.

    Weibo could be analysed as well as Twitter to understand Chinese markets, Korean twitter could be analysed for the Korean retail market,etc.

  4. I have an existing side project which has the goal of using a recurrent neural net using an LSTM architecture to predict BTC price movements. The app (model, stored data, data pipeline, visualization of results) will run autonomously on Google Cloud Platform. Candle data is consumed from CoinAPI.io and stored in BigQuery.

    Technical indicators will be calculated and used as additional factors to the model. Sentiment analysis from news outlets (Bloomberg, FT) would be added later.

    The model would be written using TensorFlow, and the BigQuery tables names would use BQ’s date format capabilites. This would make the project faster and cheaper.

Idea 1 seemed like a sensible option. Ideas 3 and 4 are interesting and worth investigating, but not possible within the scope of this exercise:

Testing the influence of whales

and “normal users” on BTC and 4 BTC forks, and discussing results in the context of each chain’s claimed technical advantages and use cases as e.g. a store of value or alternative to cash

This will be achieved by comparing daily mean USD transaction value to daily median USD transaction value. This is done by calculating the mean-median ratio of transaction value (MMR).

Hypothesis: If a chain has a much smaller median transaction size than mean transaction size, then on chain activity is dominated not by regular users making normal daily transactions, but by whales moving large amounts of currency to artificially inflate usage metrics.

This could contradict claims that a blockchain has an active user base that the blockchain is meeting user needs. We assume that:

  1. If a blockchain is functioning as digital cash, then most of its transactions would be small. e.g less than 100 USD. It should be noted that 100 USD is not a particularly small amount even in western countries and due to a blockchains borderless nature, it is even futher above a noraml ‘day-to-day’ transaction amount in large parts of the world.

  2. Conversely, if a blockchain has relatively little organic use by normal users then whales (users with large holdings) will make up a large proportion of on-chain activity and would have average transaction sizes much larger than a day-to-day transaction. An untested guess at a “whale threshold” could be 100,000USD.

  3. Where the ratio of mean to median transaction value is relatively high, we have an environment where the mean value is much higher than the median value, which shows that daily total value transacted is dominated by a few relatively large transactions, rather than many small value transactions. This would imply that whales dominate the blockchain (and likely market behavior) rather than members of the general public or retail investors.

Chains:

The chains that will be analysed here are all forks of BTC. They are:

  • BTC
  • BCH
  • BSV
  • LTC
  • DOGE

Fields

using the coinmetrics api, the following metrics will be used:

  1. TxTfrValMeanUSD

    The sum USD value of native units transferred divided by the count of transfers (i.e., the mean “size” in USD of a transfer) that interval.

  1. TxTfrValMedUSD

    The median USD value transferred per transfer (i.e., the median “size” in USD of a transfer) that interval.

In [1]:
from IPython.display import HTML
HTML('''<script>
code_show=false; 
function code_toggle() {
 if (code_show){
 $('div.input').show();
 $('div.output_prompt').show();
 } else {
 $('div.input').hide();
 $('div.output_prompt').hide();
 }
 code_show =! code_show
} 
$( document ).ready(code_toggle);
</script>

<font> This analysis was made using Python. You can toggle the code visibility by clicking <a href="javascript:code_toggle()">here</a>.</font>
''')
Out[1]:
This analysis was made using Python. You can toggle the code visibility by clicking here.
In [2]:
# import and setup
import requests
import json

import pandas as pd
import plotly.graph_objs as go
import chart_studio.plotly as py
import plotly
from plotly.offline import init_notebook_mode
plotly.offline.init_notebook_mode()
init_notebook_mode(connected=True)

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import warnings
warnings.simplefilter('ignore')
In [3]:
def get_metricdata(asset_id, payload):
    url = f'https://community-api.coinmetrics.io/v2/assets/{asset_id}/metricdata'
    response = requests.get(
        url=url,
        params=payload
    )
    
    if response.status_code == 200:
        # print(f'{asset_id} - success!')
        return json.loads(response.content.decode('utf-8'))
    else:
        print(f'status_code: {response.status_code}')
        return None
In [4]:
payload = {
    'metrics':  'PriceUSD,'+
                'TxTfrValMeanUSD,'+
                'TxTfrValMedUSD,'+
                'TxTfrValUSD',
    'start': '2016-01-01',
}
# PriceUSD and TxTfrValUSD are not utilised yet. Because the work needs to be expanded in order to be complete, I will keep them here for now.

asset_list = ['btc', 'ltc', 'bch', 'bsv', 'doge']
data = {}
for asset in asset_list:
    data[asset] = get_metricdata(asset, payload)
In [5]:
dataframes = {}
cols = ['PriceUSD', 'TxTfrValMeanUSD', 'TxTfrValMedUSD', 'TxTfrValUSD']
for asset in data.keys():
    values = [ each['values'] for each in data[asset]['metricData']['series']]
    index = [ each['time'] for each in data[asset]['metricData']['series']]
    
    df = pd.DataFrame.from_records(values, columns = cols)
    df.index = pd.to_datetime(index, infer_datetime_format=True).date
    
    for col in df.columns:
        df[col] = df[col].astype(float)
    
    # create new fields
    df['TxCount'] = df.TxTfrValUSD / df.TxTfrValMeanUSD
    df['MeanMedianRatio'] = df.TxTfrValMeanUSD / df.TxTfrValMedUSD
    
    dataframes[asset] = df
In [6]:
# take a look at the wrangled data:
dataframes['bsv'].sample(5)
dataframes['btc'].sample(5)
dataframes['doge'].sample(5)
Out[6]:
PriceUSD TxTfrValMeanUSD TxTfrValMedUSD TxTfrValUSD TxCount MeanMedianRatio
2019-07-14 131.077983 1198.889057 0.879414 1.194010e+08 99593.000016 1363.281771
2019-05-29 193.318255 11679.327829 39.889202 5.360111e+08 45894.000000 292.794223
2019-03-28 63.633667 4749.332321 2.107293 6.684210e+07 14074.000000 2253.759666
2018-12-06 104.833018 47784.404159 25.002851 4.480266e+08 9376.000000 1911.158216
2019-08-20 145.397527 403.290224 1.017783 4.721077e+07 117063.999986 396.243941
Out[6]:
PriceUSD TxTfrValMeanUSD TxTfrValMedUSD TxTfrValUSD TxCount MeanMedianRatio
2018-09-17 6259.378141 6375.290054 63.354108 3.529992e+09 553699.0 100.629466
2018-03-08 9334.330053 13564.972440 115.393462 6.573545e+09 484597.0 117.554082
2018-09-18 6341.494595 6596.250810 65.237872 3.774480e+09 572216.0 101.110760
2017-10-18 5577.949350 12797.567565 68.015674 1.002400e+10 783274.0 188.156154
2018-01-21 11392.308625 13715.375223 161.642277 8.149868e+09 594214.0 84.850173
Out[6]:
PriceUSD TxTfrValMeanUSD TxTfrValMedUSD TxTfrValUSD TxCount MeanMedianRatio
2018-02-17 0.007081 2667.709156 1.382312 1.623514e+08 60858.0 1929.889309
2019-02-26 0.001952 353.988959 0.807753 2.240715e+07 63299.0 438.239326
2019-03-22 0.002017 200.227768 0.443784 1.414069e+07 70623.0 451.182435
2018-08-14 0.002247 184.665415 0.516715 1.255374e+07 67981.0 357.383769
2017-04-09 0.000385 235.941139 0.234514 9.458172e+06 40087.0 1006.084710

Compare daily mean and median USD transaction value for BTC since January 2016

In [7]:
btc_mean = go.Scatter(
    x=dataframes['btc'].index,
    y=dataframes['btc'].TxTfrValMeanUSD,
    name='BTC mean',
    #marker=dict(color='#ffcdd2')
)
btc_median = go.Scatter(
    x=dataframes['btc'].index,
    y=dataframes['btc'].TxTfrValMedUSD,
    name='BTC median'
    #marker=dict(color='#ffcdd2')
)

data = [btc_mean, btc_median]
layout = go.Layout(
    title="BTC median and mean transaction values by day",
    xaxis=dict(title='Date'),
    yaxis=dict(title='USD value'),
    yaxis_type='log'
)

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='btc-mean-median-tnsx-values')
Out[7]:

The chart above shows that:

  1. the daily mean transaction value is higher than the daily median transaction value.
  2. The two averages are correlated
  3. From 2016 to present, the mean is approximately 2 orders of magnitude higher than the median. This relationship appears to be consistent across the previous 4 years. Note: During the last 4 years, the USD value of 1 BTC has increased from ~400USD to currently ~10000USD. The impact of the changing USD price of the coins on mean and median should be investigated. This could be easily achieved using the TxTfrValUSD and PriceUSD metrics. We could also then calculate number of daily transactions.

Plot the ratio of daily mean to median USD transaction values for each asset since January 2016

In [8]:
def scatter_plot(asset, name):
    return go.Scatter(
        x=dataframes[asset].index,
        y=dataframes[asset].MeanMedianRatio,
        name=name
    )
        
data = [scatter_plot(asset, asset) for asset in dataframes.keys()]
layout = go.Layout(
    title="Ratio of daily mean to median transaction value",
    xaxis=dict(title='Date'),
    yaxis=dict(title='Ratio'),
    yaxis_type='log'
)

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='MMR-btc-ltc-doge-bch-bsv')
Out[8]:

Note: Except for BTC, the time series above are very “choppy”. If this chart were to be shown to clients I would consider smoothing the time series by using e.g. 7 day moving average. However, this might obscure some features of the data so for the initial data exploration I will not apply smoothing.

Conclusions

The chart above shows that BTC has the lowest ratio of mean to median daily transaction value. This suggests that compared to the other blockchains in this analysis Bitcoin:

  1. has relatively strong organic use.
  2. is less influenced by whales.
  3. MMR has lower day-to-day volatility.

Point 3 suggests a wide and regular user base and total daily transaction volumes should be analysed across the 5 chains to futher strengthen or rebutt these conclusions.

Using this ratio as a proxy to measure organic use, the chain with the second most organic use is Litecoin.

Since the start of 2019, the influence of whales on the Dogecoin network has been decreasing.

Of the two contentious hard forks, Bitcoin Cash shows two distinct phases with different characteristics in each:

From its inception in August 2017 to November 2018, the influence of whales increased at a steady rate. At the coins genesis, there appears to have been a large organic user base transacting daily, bringing the median transaction value to within 50 - 100x the mean daily transaction value. This was lower than Bitcoin’s, which had a much more consistent but higher MMR of 120 - 200.

After November 10 2018, the ratio increases from an average of approximately 500 to approximately 10,000. This is a stark and abrupt change in the daily ratio, and suggests that either

  • organic use drastically decreased, or
  • BCH very suddenly started being used to facilitate very large value transfers by relatively few users.

As of January 2019, Dogecoin appears to have more widespread organic use than either BCH or BSV, despite its status as a “joke” blockchain. However DOGE has had a higher MMR than BTC or LTC in 2019.

Next Steps

This brief investigation was developed over the course of an afternoon, in line with the project brief recommending only 4 hours of work. In order to be applied in a commercial context, this analyis should be expanded and tested in at least the following ways:

  1. Test if the central assumption of this analysis is true. Possible approaches could include:

    1. Removing exchange outflows from the data. Could this be done using known exchange addresses (exchanges aggregate organic retail investor behavior)
    2. Quantifying the influence of “change” transactions - in aggregate this should be nil for day-to-day “cash” transactions, but for whales moving the entire balance of an address there would be no “change” amount. Depending how the metric is calculated this may or may not be significant.
    3. For BTC and LTC, are the lightning networks distorting the results by hiding organic low value transactions?
    4. For BTC, is the liquid sidechain hiding the activity of whales to the extent that it is not the “healthiest” of the 5 blockchains analysed?
  2. Can we infer where the whales and “normal” users live, by analyzing the time of transactions? People are much more likely to make a transaction at midday than midnight, and we could use this to investigate geographic clustering. e.g. Is BTC a “western” chain, whilst BCH has more organic use in Asia?

  3. An analysis of daily transaction volume (in USD terms) would be essential to this analysis. It would provide a context in which to interpret the significance of differences between each chain and differences bewtween time frames.

  4. Similarly, comparing the hash power dedicated to mining new blocks on each chain would indicate commercial interests, and abrupt changes in hash power could possibly be correlated with changes in mean-median ratio (MMR).