An Introduction to MinaExplorer's BigQuery Public Dataset
Mina is a succinct blockchain, which means while we can verify the chain's current state using a zero-knowledge proof, the prior history is not retained. So, if we want to analyze the chain's history (for example, to see individual transactions), we need to obtain it from an archive node.
The official implementation of an archive node stores its data to a Postgres database, which typically requires the use of recursive queries to determine the canonical chain information.
See this post on how to set up and configure an archive node for redundancy.
Those who want to query historical data without running their own archive nodes (and associated Mina node(s)) can use historical services like Figment's DataHub or MinaExplorer's archive GraphQL API. However, neither offer simple aggregation features to answer questions such as "how many blocks were produced in the last 24 hours". While obtaining this information via scripting is possible, directly querying a database with a SQL query is more accessible and efficient.
MinaExplorer has published its custom archive node dataset to Google BigQuery as a public dataset to resolve this issue. Google BigQuery is a cloud-based big data analytics web service for processing very large data sets. Data is replicated from MinaExplorer's database (which stores GraphQL subscriptions) with a small latency of no more than a few minutes.
The schema of the BigQuery dataset matches that of the MinaExplorer GraphQL API. Notably, snarks and user transactions (aka user commands), which are also nested in blocks data, are separated into their own tables for easier querying.
Making your first query
While you can use any GUI that supports BigQuery, such as PopSQL, Tableau, or DataGrip, we will use the BigQuery console to execute our queries for this article. Access the console via https://console.cloud.google.com/bigquery, create a new project if required, and add the data source by selecting Add Data -> Pin a project -> Enter project name and enter minaexplorer.
While it is not necessary to pin the project, this will allow you to browse the table’s schemas easily.
To create our first query, click Compose New Query in the top right of the console, and enter the following to determine the number of blocks in the database, and click Run.
SELECT
canonical,
COUNT(statehash) as total
FROM
minaexplorer.archive.blocks
GROUP BY
canonical
If all goes well, you should see output similar to the below (your result will differ).
This query highlights that the database stores all blocks seen. If you are only interested in the canonical chain, filter all of your queries WHERE canonical = true
to only return canonical blocks/snarks/transactions.
While MinaExplorer pays to host the dataset, any queries you run against the dataset are charged against your own personal billing account. There is a generous free tier of 1TB of data processing to get started.
The remainder of the article will provide some sample queries to demonstrate the use of the dataset.
Querying Block Data
Each block has a corresponding datetime
field (UTC), which we can use to, for example, group canonical blocks by day.
SELECT datetime_trunc(datetime, DAY) as day,
COUNT(statehash) as total_blocks
FROM minaexplorer.archive.blocks
WHERE canonical = true
GROUP BY day
ORDER BY total_blocks DESC
This query returns the following result, which we could additionally visualize in Data Studio by clicking on the Explore Data link.
Querying Transaction Data
For transactions, as well as filtering for the canonical status of the transaction, you should also check that the transaction did not fail. You can do this by adding a WHERE failurereason IS NULL
filter as this field is only populated if the transaction fails. Let's find how many transactions have failed and how many of each type:
SELECT failurereason,
COUNT(id) as total
FROM minaexplorer.archive.transactions
WHERE canonical = true
GROUP BY failurereason
There are currently two failure reasons in the database Amount_insufficient_to_create_account
where the receiving account is not in the ledger, and the amount sent does not cover the account creation fee, and Receiver_not_present
which occurs when delegating to an account, not in the ledger.
You can also query user transactions via the blocks table, which is stored as nested JSON under blocks->transactions->usercommands. However, using the transactions table for transaction querying is likely simpler, joining on the block statehash if you require additional block data.
Let's find the largest transactions (in MINA) that occurred over the last 7 days.
SELECT amount / 1000000000 as sent_amount,
`from`,
`to`,
dateTime,
blockheight
FROM minaexplorer.archive.transactions
WHERE canonical = true
AND failurereason IS NULL
AND datetime >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -7 DAY)
ORDER BY amount DESC
The following gist retrieves the last 100 transactions for a public key, both sent and received, with the associated transaction information.
Querying SNARK Data
One major advantage of the MinaExplorer data over the Postgres archive node is the ability to extract individual SNARK information (these are combined to a single fee transfer in the archive node). As with the transaction data, SNARK data is also available, stored as nested JSON, in the blocks table.
Let's get summary data for all SNARKs included in the canonical chain:
SELECT
COUNT(fee) as total_snarks,
MIN(fee) / 1000000000 as minimum_fee,
MAX(fee) / 1000000000 as maximum_fee,
AVG(fee) / 1000000000 as average_fee
FROM
minaexplorer.archive.snarks
WHERE
canonical = true
Yes, this block really did have a SNARK included for 700 MINA.
You can use this data to help choose appropriate SNARK fees, for example, by filtering for just the last 24 hours by adding AND datetime >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR)
to the WHERE
clause. Or via the previous 100 blocks (this query includes a subquery to first determine the latest height):
SELECT COUNT(fee) as total_snarks,
MIN(fee) / 1000000000 as minimum_fee,
MAX(fee) / 1000000000 as maximum_fee,
AVG(fee) / 1000000000 as average_fee,
FROM minaexplorer.archive.snarks as s,
(
SELECT blockheight
FROM minaexplorer.archive.blocks
WHERE canonical = true
ORDER BY blockheight DESC
LIMIT 1
) as b
WHERE canonical = true
AND s.blockheight >= (b.blockheight - 100)
Querying Account Data
The following query to determine the number of accounts in the ledger demonstrates how accounts are added to the ledger. Either via:
- Included in the Genesis ledger (ledger hash
jx7buQVWFLsXTtzRgSxbYcT8EYLS8KCZbLrfDcJxMtyy4thw2Ee
). - Received a transaction with an amount high enough to cover the ledger creation fee.
- Was included via a fee transfer (snark work or coinbase receiver) with a high enough amount to cover the ledger creation fee.
The result of this query determines when the account was first added to the ledger so we can visualize the data grouped by date to produce the following chart (thanks to davidg#1063 for help in generating this query).
We can calculate the balance of any account at any block height with the following query. Replace the variable ${public_key}
with the address of the account that you wish to calculate.
Balances are comprised of:
- Any balance in the Genesis ledger (or a 1 MINA ledger creation fee).
- The amount of any outgoing transactions that succeded.
- Any fees paid for any outgoing transactions.
- The amount of any incoming transactions received.
- Any coinbase rewards received.
- Any fee transfers received (via block production or snark work).
If the account is not in the ledger, the following query will return -1.
Finally, we can determine the vesting slot of any timed account with the following query (replace the variable{public_key}
).
Foundation Delegation Analysis
The Foundation delegation program requires calculating the amount to return to the Foundation each epoch from the balance of the pool and the number of blocks produced. While scheduled emails now provide this information, we can independently verify the values in the email using a public key and epoch data using the following query.
We can also confirm the amount sent between the epoch deadlines (change the variables ${public_key}
and ${epoch}
to match the address and epoch, you wish to check against the emails received.
SELECT SUM(
CAST(json_extract_scalar(h, '$.amount') as NUMERIC)
) / 1000000000 as amount
FROM minaexplorer.archive.blocks
LEFT JOIN unnest(json_extract_array(transactions.usercommands)) as h
WHERE canonical = true
AND json_extract_scalar(h, '$.to') = "${public_key}"
AND protocolstate.consensusstate.slotsincegenesis >= (${epoch} * 7140) + 3501
AND protocolstate.consensusstate.slotsincegenesis <= ((${epoch} + 1) * 7140) + 3500