An Introduction to MinaExplorer's BigQuery Public Dataset

Gareth Davies
7 min readOct 15, 2021

--

Mina is a succinct blockchain, which means while we can verify the chain's current state using a zero-knowledge proof, the prior history is not retained. So, if we want to analyze the chain's history (for example, to see individual transactions), we need to obtain it from an archive node.

The official implementation of an archive node stores its data to a Postgres database, which typically requires the use of recursive queries to determine the canonical chain information.

See this post on how to set up and configure an archive node for redundancy.

Those who want to query historical data without running their own archive nodes (and associated Mina node(s)) can use historical services like Figment's DataHub or MinaExplorer's archive GraphQL API. However, neither offer simple aggregation features to answer questions such as "how many blocks were produced in the last 24 hours". While obtaining this information via scripting is possible, directly querying a database with a SQL query is more accessible and efficient.

MinaExplorer has published its custom archive node dataset to Google BigQuery as a public dataset to resolve this issue. Google BigQuery is a cloud-based big data analytics web service for processing very large data sets. Data is replicated from MinaExplorer's database (which stores GraphQL subscriptions) with a small latency of no more than a few minutes.

The schema of the BigQuery dataset matches that of the MinaExplorer GraphQL API. Notably, snarks and user transactions (aka user commands), which are also nested in blocks data, are separated into their own tables for easier querying.

Making your first query

While you can use any GUI that supports BigQuery, such as PopSQL, Tableau, or DataGrip, we will use the BigQuery console to execute our queries for this article. Access the console via https://console.cloud.google.com/bigquery, create a new project if required, and add the data source by selecting Add Data -> Pin a project -> Enter project name and enter minaexplorer.

Pin the project for easy access and to view the schema

While it is not necessary to pin the project, this will allow you to browse the table’s schemas easily.

Browse the schema of the blocks table

To create our first query, click Compose New Query in the top right of the console, and enter the following to determine the number of blocks in the database, and click Run.

SELECT
canonical,
COUNT(statehash) as total
FROM
minaexplorer.archive.blocks
GROUP BY
canonical

If all goes well, you should see output similar to the below (your result will differ).

Number of blocks in the database, grouped by canonical status

This query highlights that the database stores all blocks seen. If you are only interested in the canonical chain, filter all of your queries WHERE canonical = true to only return canonical blocks/snarks/transactions.

While MinaExplorer pays to host the dataset, any queries you run against the dataset are charged against your own personal billing account. There is a generous free tier of 1TB of data processing to get started.

The remainder of the article will provide some sample queries to demonstrate the use of the dataset.

Querying Block Data

Each block has a corresponding datetime field (UTC), which we can use to, for example, group canonical blocks by day.

SELECT datetime_trunc(datetime, DAY) as day,
COUNT(statehash) as total_blocks
FROM minaexplorer.archive.blocks
WHERE canonical = true
GROUP BY day
ORDER BY total_blocks DESC

This query returns the following result, which we could additionally visualize in Data Studio by clicking on the Explore Data link.

Number of canonical blocks produced per day

Querying Transaction Data

For transactions, as well as filtering for the canonical status of the transaction, you should also check that the transaction did not fail. You can do this by adding a WHERE failurereason IS NULL filter as this field is only populated if the transaction fails. Let's find how many transactions have failed and how many of each type:

SELECT failurereason,
COUNT(id) as total
FROM minaexplorer.archive.transactions
WHERE canonical = true
GROUP BY failurereason
Transactions can fail and should filter for failurereason IS NULL

There are currently two failure reasons in the database Amount_insufficient_to_create_account where the receiving account is not in the ledger, and the amount sent does not cover the account creation fee, and Receiver_not_present which occurs when delegating to an account, not in the ledger.

You can also query user transactions via the blocks table, which is stored as nested JSON under blocks->transactions->usercommands. However, using the transactions table for transaction querying is likely simpler, joining on the block statehash if you require additional block data.

Let's find the largest transactions (in MINA) that occurred over the last 7 days.

SELECT amount / 1000000000 as sent_amount,
`from`,
`to`,
dateTime,
blockheight
FROM minaexplorer.archive.transactions
WHERE canonical = true
AND failurereason IS NULL
AND datetime >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -7 DAY)
ORDER BY amount DESC
Largest transactions over the last 7 days in MINA

The following gist retrieves the last 100 transactions for a public key, both sent and received, with the associated transaction information.

Querying SNARK Data

One major advantage of the MinaExplorer data over the Postgres archive node is the ability to extract individual SNARK information (these are combined to a single fee transfer in the archive node). As with the transaction data, SNARK data is also available, stored as nested JSON, in the blocks table.

Let's get summary data for all SNARKs included in the canonical chain:

SELECT
COUNT(fee) as total_snarks,
MIN(fee) / 1000000000 as minimum_fee,
MAX(fee) / 1000000000 as maximum_fee,
AVG(fee) / 1000000000 as average_fee
FROM
minaexplorer.archive.snarks
WHERE
canonical = true
Maximum, minimum, and average SNARK fees in the canonical chain

Yes, this block really did have a SNARK included for 700 MINA.

You can use this data to help choose appropriate SNARK fees, for example, by filtering for just the last 24 hours by adding AND datetime >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR) to the WHERE clause. Or via the previous 100 blocks (this query includes a subquery to first determine the latest height):

SELECT COUNT(fee)            as total_snarks,
MIN(fee) / 1000000000 as minimum_fee,
MAX(fee) / 1000000000 as maximum_fee,
AVG(fee) / 1000000000 as average_fee,
FROM minaexplorer.archive.snarks as s,
(
SELECT blockheight
FROM minaexplorer.archive.blocks
WHERE canonical = true
ORDER BY blockheight DESC
LIMIT 1
) as b
WHERE canonical = true
AND s.blockheight >= (b.blockheight - 100)

Querying Account Data

The following query to determine the number of accounts in the ledger demonstrates how accounts are added to the ledger. Either via:

  • Included in the Genesis ledger (ledger hash jx7buQVWFLsXTtzRgSxbYcT8EYLS8KCZbLrfDcJxMtyy4thw2Ee).
  • Received a transaction with an amount high enough to cover the ledger creation fee.
  • Was included via a fee transfer (snark work or coinbase receiver) with a high enough amount to cover the ledger creation fee.
Accounts in ledger

The result of this query determines when the account was first added to the ledger so we can visualize the data grouped by date to produce the following chart (thanks to davidg#1063 for help in generating this query).

Number of accounts in the MINA ledger

We can calculate the balance of any account at any block height with the following query. Replace the variable ${public_key} with the address of the account that you wish to calculate.

Calculate the balance of any public key. Will return -1 if the address is not in the ledger.

Balances are comprised of:

  • Any balance in the Genesis ledger (or a 1 MINA ledger creation fee).
  • The amount of any outgoing transactions that succeded.
  • Any fees paid for any outgoing transactions.
  • The amount of any incoming transactions received.
  • Any coinbase rewards received.
  • Any fee transfers received (via block production or snark work).

If the account is not in the ledger, the following query will return -1.

Finally, we can determine the vesting slot of any timed account with the following query (replace the variable{public_key}).

Calculate the unvested slot for any timed public key

Foundation Delegation Analysis

The Foundation delegation program requires calculating the amount to return to the Foundation each epoch from the balance of the pool and the number of blocks produced. While scheduled emails now provide this information, we can independently verify the values in the email using a public key and epoch data using the following query.

We can also confirm the amount sent between the epoch deadlines (change the variables ${public_key} and ${epoch} to match the address and epoch, you wish to check against the emails received.

SELECT SUM(
CAST(json_extract_scalar(h, '$.amount') as NUMERIC)
) / 1000000000 as amount
FROM minaexplorer.archive.blocks
LEFT JOIN unnest(json_extract_array(transactions.usercommands)) as h
WHERE canonical = true
AND json_extract_scalar(h, '$.to') = "${public_key}"
AND protocolstate.consensusstate.slotsincegenesis >= (${epoch} * 7140) + 3501
AND protocolstate.consensusstate.slotsincegenesis <= ((${epoch} + 1) * 7140) + 3500

--

--

Gareth Davies

Technical writer, data wrangler and (former) full stack dev