Tracking Down a Tendermint Consensus Failure

After a little bit of digging, we realized the blockchain had stopped at height 190258. Unfortunately, we had been ignoring the alerts channel the whole time. When we dug into the logs, the first thing they showed was a panic tied to a Tendermint consensus failure: “Wrong Block.Header.AppHash”.

The purpose of this check is to make sure every node computes the same application state. We want reproducible software that gets the same answer everywhere, so that a later audit will get the same answer too. That means all nodes must have the same state. If one node does not match, it drops out of the consensus process, but as long as over two-thirds of the validators (by voting power) are in sync, the blockchain continues functioning. (This is basically cryptographically verified event sourcing.)
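To make the idea concrete, here is a minimal sketch of replaying the same transactions on two nodes and comparing a hash of the resulting state. It is a toy model, not Tendermint or our framework: the `apply_tx`, `app_hash`, and `replay` names, the "set key to value" transaction format, and the choice of SHA-256 over sorted key/value pairs are all assumptions made for illustration.

```python
import hashlib


def apply_tx(state, tx):
    """Apply one hypothetical 'set key to value' transaction."""
    key, value = tx
    new_state = dict(state)
    new_state[key] = value
    return new_state


def app_hash(state):
    """Hash the state deterministically: sorted keys, length-prefixed fields."""
    h = hashlib.sha256()
    for key in sorted(state):
        for part in (key, state[key]):
            h.update(len(part).to_bytes(4, "big"))
            h.update(part)
    return h.hexdigest()


def replay(txs):
    """Rebuild the state from an empty store and return its hash."""
    state = {}
    for tx in txs:
        state = apply_tx(state, tx)
    return app_hash(state)


txs = [(b"alice", b"10"), (b"bob", b"5"), (b"alice", b"7")]

node_a = replay(txs)
node_b = replay(txs)
# A node that ends up with different contents (here: it skipped the last tx).
diverged = replay(txs[:-1])

print(node_a == node_b)    # same transactions, same hash
print(node_a == diverged)  # different contents, different hash
```

In this toy model the hash depends only on the final contents of the store. A real Merkle tree store can be stricter than that.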

However, the process of tracking down and fixing the problem was rather complicated, and I’d like to illuminate the way for others who might come across it. In our case, we had four nodes. One had dropped out several thousand blocks earlier, and a second dropped out at the point where we found the error, breaking the two-thirds threshold and halting the chain.
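The arithmetic behind the halt is simple. As a sketch, assuming each of the four nodes carries equal voting power (the `has_quorum` helper is hypothetical, not a Tendermint function), the chain needs strictly more than two-thirds of the total power to keep committing blocks:

```python
def has_quorum(online_power: int, total_power: int) -> bool:
    """True if online power is strictly more than 2/3 of the total.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return 3 * online_power > 2 * total_power


TOTAL_NODES = 4  # assumption: one unit of voting power per node

for online in (4, 3, 2):
    status = "continues" if has_quorum(online, TOTAL_NODES) else "halts"
    print(f"{online} of {TOTAL_NODES} nodes in sync: chain {status}")
```

With three of four nodes in sync the chain keeps going, which is why losing the first node went unnoticed. Losing the second leaves exactly half, which is below the threshold.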

My first thought was some sort of nondeterminism in our code. So the first thing we did was copy the raw iavl/leveldb databases from the ABCI apps, to isolate what was different between those nodes. Once I had the complete app states, showing two different app hashes from the same set of transactions, I began to look for tools to visualize what was in the stores.

There was nothing.

I figured that some value must be different in the databases that have different hashes. And once I figured out which value was different, I could get closer to figuring out what code was wrong. So I ran some code on the two databases that printed out all the keys and all their values, as well as the root hash. And when I ran a diff on the output, all the keys and values were the same, but the root hash was different.

That was weird.
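The comparison described above can be sketched roughly as follows. This is not the code I ran against the real stores: it assumes the key/value pairs have already been exported from each database into a plain dict, and the `diff_dumps` helper and the sample data are hypothetical.

```python
def diff_dumps(a: dict, b: dict):
    """Compare two key/value dumps.

    Returns (keys only in a, keys only in b, keys whose values differ).
    """
    only_a = sorted(k for k in a if k not in b)
    only_b = sorted(k for k in b if k not in a)
    changed = sorted(k for k in a if k in b and a[k] != b[k])
    return only_a, only_b, changed


# Hypothetical dumps from two nodes that disagree on the app hash.
node_1 = {b"acct/alice": b"7", b"acct/bob": b"5"}
node_2 = {b"acct/bob": b"5", b"acct/alice": b"7"}

only_1, only_2, changed = diff_dumps(node_1, node_2)
print(only_1, only_2, changed)  # all three lists are empty: contents match
```

An empty diff like this rules out a wrong value, so whatever makes the root hashes differ has to live somewhere other than the stored keys and values.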

Then I remembered that iavl is a form of self-balancing binary tree, so the hash depends not only on the contents, but also on the shape of the tree. And I remembered discussions with Jae about how a different insert order could lead to different shapes, and thus different hashes.
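Here is a small sketch of that effect. It is a simplified AVL tree with a Merkle-style hash per node, not the real iavl implementation (which, among other differences, keeps values only in leaf nodes). Every name in it is an assumption made for illustration. The same four key/value pairs are inserted in two different orders:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: bytes
    value: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))
    return n


def rotate_right(n):
    pivot = n.left
    n.left, pivot.right = pivot.right, n
    update(n)
    return update(pivot)


def rotate_left(n):
    pivot = n.right
    n.right, pivot.left = pivot.left, n
    update(n)
    return update(pivot)


def insert(n, key, value):
    """Standard AVL insert: descend, then rebalance on the way back up."""
    if n is None:
        return Node(key, value)
    if key == n.key:
        n.value = value
        return n
    if key < n.key:
        n.left = insert(n.left, key, value)
    else:
        n.right = insert(n.right, key, value)
    update(n)
    balance = height(n.left) - height(n.right)
    if balance > 1:
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)
        return rotate_right(n)
    if balance < -1:
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)
        return rotate_left(n)
    return n


def node_hash(n):
    """Hash of a node covers its key, its value, and both child hashes."""
    if n is None:
        return b"\x00" * 32
    h = hashlib.sha256()
    for part in (n.key, n.value):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    h.update(node_hash(n.left))
    h.update(node_hash(n.right))
    return h.digest()


def items(n):
    """In-order traversal: the logical contents, independent of shape."""
    return [] if n is None else items(n.left) + [(n.key, n.value)] + items(n.right)


def build(pairs):
    root = None
    for key, value in pairs:
        root = insert(root, key, value)
    return root


pairs = [(b"a", b"1"), (b"b", b"2"), (b"c", b"3"), (b"d", b"4")]
forward = build(pairs)
backward = build(reversed(pairs))

print(items(forward) == items(backward))          # same keys and values
print(node_hash(forward) == node_hash(backward))  # but the root hashes differ
print(forward.key, backward.key)                  # the two trees have different roots
```

Both trees are valid, balanced AVL trees holding identical data. They simply ended up with different shapes, and because each node's hash covers its children, the shape leaks into the root hash.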

So, I had to dig into that more, and that is how I found this bug in my framework.

But I will leave that for the next part, which covers not just the cause of our inconsistencies, but also how you can track down this kind of bug in a Merkle tree, and how Merkle trees work in general.
