Merkle Trees and Online Voting
I don't quite know how it happened, but I ended up on the topic of Merkle trees the other day. It was probably inevitable given how much I was reading blockchain related stuff[1]. Either way, I ended up discovering the method that chains of hashes are created.
I don't think I need to explain what Merkle trees are here. That's a different topic. But in it's most basic form, Merkle trees work like this:
- Take some content. Hash it.
- Take some more content. Hash each of it.
- Combine hashes two at a time. Hash the combined values.
- Repeat the process until you are down to just two more hashes. Combine them, hash them, call it a day.
Now that I'm done butchering the explanation of it I'll just share this diagram that explains it better than me.
Honestly, I'm more interested in the application of this and I'm going to share what I learnt about how it is applied in the real world. Specifically in git. Some of the details are going to be inaccurate, but what mattered to me was understanding how data is broken into various chunks when we are not talking about breaking up a large file into smaller chunks.
- Git first takes each file, adds some information to the contents and gets the hash for it
- It then combines all that information with the commit info, file modes, file names, etc and gets a hash.
- This isn't 100% true. I decided to mention this part separately. It also adds the previous commit hash into the information.
Why is the previous commit hash so important? This is what makes it a tree. A chain. If every commit hash depends on the previous hash in some form, that means that it's impossible to change any part of the chain without changing everything else that comes after it. This is pretty awesome because it prevents anyone from messing around with data without it being noticed immediately. Trying to sneak in a commit into the early history? Good luck. No one needs to scan the entire history each minute of the day to notice it; the discrepancy will be visible in the latest commit hash itself. I think[2].
How is this related to voting?
When I first had questions about practical uses of the blockchain, the only one that actually made sense to me was voting. This isn't to say that there aren't any other uses for the blockchain. I just found voting to be the most practical. The way it was presented to me was:
- A competition requires verified voting.
- No matter what comes from the server counting the votes, it's turtles down — you have to verify the software the verifies the software that verifies the software that...
- A blockchain ensures that all transactions are verified by everyone before it even gets inserted to the chain.
This made perfect sense to me. This was one case where the technology itself could definitely prevent fraud[3].
But blockchains cost a lot of energy due to their hash generation requiring significant computing power. And it was probably during a discussion here that I stumbled upon Merkle trees.
And this brings me to an idea I'm mulling about in my head. It's really not something I'm thinking of as a viable business, just an experiment.
An open source Merkle tree based online voting software
Here's what I'd like to try. A software which works in the following way.
- It can be used to organize elections. This means registering voters, creating ballots. The works
- Individuals vote using their accounts created for the elections. Obviously some thought is given to how individuals only once.
- Each vote is recorded into a Merkle tree.
- When recording the vote, personal information of the voter is disassociated from the vote. The fact that a person voted can be stored as a separate piece of information (also part of the chain)
- With each vote that is recorded, the voter receives a token that is auto generated and stored as part of the vote information. This token can be used later to verify one's vote.
- The merkle tree can be continuously queried so that it can be mirrored by others.
- That said, information that the merkle tree points to will have to remain hidden. Why? Vote tallies are never revealed till the end to avoid bias in voting.
- So after the election is complete, the information is revealed. All information can then be verified by independent parties to ensure that there has been no tampering.
This is a very insecure system for a country election. But a small company? Universities? I think this could be a pretty nice experiment honestly.
Final note: Probably worthless
Of course, the people this would be targeted at probably don't care for the kind of "security" that a solution like this offers. That said, self hosted election software seems to be under served. The options that I could find so far are:
None of these seem to really serve the purpose I'm thinking of just in terms of creating ballots and registering voters and what not. Lime survey is close but just from its name, we can tell what it's intended use case is.
Given that that is the case it's probably worth trying this out at some point.
There's been a lot of chatter around NFT's these days. Nvidia has also started adding support to prevent its cards being used for mining. That's a fascinating story on its own. ↩︎
Git seems to verify the chain in ways I don't fully understand. I successfully altered the middle of a git commit history — I made the 3rd git commit change its parent to be the 1st git commit. Running
git log
showed the 2nd git was now gone. No errors were thrown until I rangit log --stat 3rdCommitHash
. Fascinating. ↩︎This argument breaks down when it comes to critical elections such as those run by the state. At that point, attacks on blockchain tech (or any tech) becomes commercially viable to scale up. ↩︎
Posted on March 19 2021 by Adnan Issadeen