We're also storing a similar hash tree in SirixDB[1] to verify whether a subtree has changed (a hash is stored in each node; an insert/update/delete triggers a recomputation of the hash in all ancestor nodes). However, during updates only the pointers to neighbour nodes and the node content are used to recompute the ancestor hashes, rather than all children. Furthermore, checksums of child page fragments are stored in parent references.
The hashes are used for fast diff calculations along with an optional DeweyID node scheme to quickly get diffs in a resource in preorder or to check if a subtree has changed without fetching all ancestor nodes (due to the stored DeweyIDs).
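To illustrate the general idea (not SirixDB's actual implementation), here is a minimal Python sketch of a tree in which every node caches a hash of its subtree and an update rehashes only the path to the root. The `Node` class and its methods are hypothetical, and unlike the scheme described above it recomputes each hash from all child hashes, which is the simpler textbook variant.

```python
import hashlib


class Node:
    """Hypothetical tree node that caches a hash covering its whole subtree."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []
        self.hash = b""
        if parent is not None:
            parent.children.append(self)
        self._rehash_upwards()

    def _compute(self):
        # Hash the node's own content followed by the cached child hashes.
        h = hashlib.sha256()
        h.update(repr(self.value).encode("utf-8"))
        for child in self.children:
            h.update(child.hash)
        return h.digest()[:16]  # keep 128 bits

    def _rehash_upwards(self):
        # Only this node and its ancestors are touched; siblings keep their hashes.
        node = self
        while node is not None:
            node.hash = node._compute()
            node = node.parent

    def update(self, value):
        self.value = value
        self._rehash_upwards()


root = Node("root")
a = Node("a", root)
b = Node("b", root)
c = Node("c", a)
```

Comparing the stored hash of a subtree root between two revisions is then enough to tell whether anything below it changed, without visiting the descendants.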
For instance you can simply check for updates of a node in a JSON structure with the following query:
    let $node := jn:doc('mycol.jn','mydoc.jn')=>fieldName[[1]]
    let $result := for $node-in-rev in jn:all-times($node)
                   return
                     if ((not(exists(jn:previous($node-in-rev))))
                          or (sdb:hash($node-in-rev) ne sdb:hash(jn:previous($node-in-rev)))) then
                       $node-in-rev
                     else
                       ()
    return [
      for $jsonItem in $result
      return { "node": $jsonItem, "revision": sdb:revision($jsonItem) }
    ]
It iterates through all revisions of a node and returns the node and the revision (and we could also add the author) when it has been updated.
To make this even easier I could write a native Java function for use in the query.
We do something similar to accelerate set operations over the indices of our in-memory immutable binary triple store.
Since our indices are sets, we only have to compute the hashes for the leaves and can then combine them via XOR in our inner nodes.
Due to the self inverse nature of XOR we can then efficiently maintain them over the operations.
Normally this would be a big no-no, but since we are using sets, we have the invariant that hashes will never be the same for two different leaves. (Under the assumption of no hash collisions, which is justified for the 128-bit SipHash that we use.)
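A rough Python sketch of the XOR idea, under some loud assumptions: `HashedSet` and `leaf_hash` are made-up names, the set is flat rather than a tree of inner nodes, and 128-bit BLAKE2b from the standard library stands in for the 128-bit SipHash mentioned above. It shows why XOR is order independent and why removal is just a second XOR.

```python
import hashlib


def leaf_hash(item: bytes) -> int:
    """128-bit leaf hash (BLAKE2b here, as a stand-in for SipHash)."""
    return int.from_bytes(hashlib.blake2b(item, digest_size=16).digest(), "big")


class HashedSet:
    """Set that maintains the XOR of its members' hashes incrementally."""

    def __init__(self, items=()):
        self.items = set()
        self.digest = 0
        for item in items:
            self.add(item)

    def add(self, item: bytes) -> None:
        # The membership check matters: XOR-ing the same leaf twice cancels it.
        if item not in self.items:
            self.items.add(item)
            self.digest ^= leaf_hash(item)

    def remove(self, item: bytes) -> None:
        # XOR is its own inverse, so removal reuses the same operation.
        if item in self.items:
            self.items.remove(item)
            self.digest ^= leaf_hash(item)


left = HashedSet([b"x", b"y"])
right = HashedSet([b"y", b"x"])
```

With a multiset, two copies of the same element would cancel out to zero, which is the reason XOR is usually avoided as a combiner; the set invariant rules that case out. For disjoint sets the digest of the union is simply the XOR of the two digests.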
I'm using a rolling hash with SHA-256 but only take 128 bits, as it should be enough to avoid collisions. Leaf node hashes are computed on demand, only inner node hashes are stored persistently.
Can you elaborate a bit more? You mean bitset indexes?
I knew that Go had a checksum database but not that it had a tamper-proof log. This seems like something that other package management systems should do as well.
True to an extent--meaning mostly true in practice.
You can mutate the Git data if you choose to, using commands like "filter-branch". "filter-branch" isn't used frequently, since it causes issues with every up/down-stream replica if the data has been pushed/pulled, but it is possible. But even some commonly used commands like "amend", "rebase", and "squash" cause limited data mutations which are broadly considered appropriate and useful.
How often do packages refer to their dependencies using git hashes? Very seldom in my experience, and for good reason. (Unless you're using git submodules, which is not the usual way.)
Go's checksum server does something different by making sure that the names used in module files refer to the same things for different people. Also, it works even if some packages don't come from git repos.
> By using a tamper-evident to store compliance records, you can keep them in one place and simplify presenting them to an auditor. You can cryptographically prove they haven't been tampered with.
Is this realistically the case? Won't most auditors instead say "Microsoft Access with a password is fine. But your homegrown cryptographic black box we can't trust."?
As others have pointed out: very little is different. Depending on if you include consensus in your use of "blockchain", there may be no difference whatsoever.
Currently it's a massively popular buzzword that frequently means very little in reality. It's just being tacked onto everything whether or not it makes sense because e.g. the Long Island Iced Tea company switched their name to "Long Blockchain" and immediately had massive stock gains: https://arstechnica.com/tech-policy/2017/12/iced-tea-company... . Even in the cases where it is technically accurate, it's rarely a good or useful idea... but it helps get funding.
So in practice, "blockchain" currently means "you will magically get rich". Technically it's almost always a Merkle tree, or a less efficient structure (e.g. Bitcoin's core "chain" is basically a linear version, which is dramatically less efficient to verify to the root for any given block... because it doesn't need that quality. Though it also uses Merkle trees within each block).
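To make the efficiency point concrete, here is a small Python sketch of a Merkle tree with an inclusion proof. All function names are invented for illustration, and padding odd levels by duplicating the last node is a simplification rather than any particular system's rule. The point is only that a proof for one entry needs about log2(n) hashes, whereas a purely linear chain has to be walked block by block.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return every level of the tree, from leaf hashes up to the root."""
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels with a copy of the last node
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Collect the sibling hash on each level along the path to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # node was paired with its own copy
        proof.append((level[sibling], index % 2 == 0))  # (hash, sibling_is_right)
        index //= 2
    return proof


def verify(leaf, proof, root):
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_right in proof:
        if sibling_is_right:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
    return node == root


entries = [b"entry-%d" % i for i in range(1000)]
levels = merkle_levels(entries)
root = levels[-1][0]
proof = inclusion_proof(levels, 777)
```

For 1000 entries the proof holds 10 sibling hashes, so a verifier needs 10 hash computations plus the leaf hash instead of replaying hundreds of blocks.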
A blockchain provides a guarantee of (statistical, eventual) write availability, since anyone can mine new blocks (eventually). A verifiable log only accepts new entries from a limited, generally closed set of writers (usually just one), and you can only add new entries by going through one of those writers. (Blockchains also tend to have better read availability due to being more widely distributed, but there's no reason why you couldn't distribute a verifiable log that widely, it just tends not to happen in practice.)
This makes a verifiable log unsuitable for financial purposes, since an adversary with lead pipe capabilities (or corrupt judge capabilities as the case may be) can block undesirable transactions at a single (or few) point of failure, whereas against a blockchain they'd have to target anyone with significant compute capacity.
On the other hand, verifiable logs don't need proof of work/stake/etc to limit hostile forking, so if the log is describing things rather than acting as a ground source of truth (so you can just ignore it if it starts refusing writes), it's much more efficient than a blockchain.
None of those things are features of blockchains, they are all features of cryptocurrencies.
"Blockchain" refers to any use of a chain of hashes to prove that a block of data, and any data that came before it, has not been mutated.
The timekeeper service that takes out classified ads in the New York Times with hash blocks for the documents they timestamped, to prove they were timestamped before a certain date, is a blockchain, because they include the previous day's hash.
Git is a blockchain.
It's just hashes all the way down.
The random entropy driver in Linux is technically a blockchain, given how it works.
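The "chain of hashes" definition used in this comment fits in a few lines. The following is only a sketch with hypothetical helper names (`append_block`, `verify_chain`): each block's hash covers the previous block's hash, so changing any earlier payload changes every hash after it.

```python
import hashlib

GENESIS = b"\x00" * 32  # placeholder "previous hash" for the first block


def append_block(chain, payload: bytes) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = hashlib.sha256(prev + payload).digest()
    chain.append({"prev": prev, "payload": payload, "hash": digest})


def verify_chain(chain, expected_head=None) -> bool:
    """Recompute every link; optionally compare the head to a published hash."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if hashlib.sha256(prev + block["payload"]).digest() != block["hash"]:
            return False
        prev = block["hash"]
    return expected_head is None or prev == expected_head


chain = []
for payload in (b"monday", b"tuesday", b"wednesday"):
    append_block(chain, payload)
published_head = chain[-1]["hash"]
```

The chain alone only shows internal consistency. Someone who rewrites every block can produce a new consistent chain, which is why the head hash has to be published somewhere the writer cannot change later, such as the newspaper ad in the example above.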
There is no token/currency associated with it, and it's made for a single specific purpose only, compared to most blockchains which try to run programs, manage wallets, etc. It also doesn't have a decentralized consensus algorithm, nor is there a gossip network to inform about new transactions that are to be added to the log. It's centralized by design.
Which of these differences you see as important to the blockchain / non-blockchain distinction depends on the precise definition of blockchain that you want to use.
[1] https://sirix.io
Our query engine/compiler is and can be used by other data stores as well:
http://brackit.io
Pretty sure he is referring to https://immudb.io.
This is a common pattern now.
Similarly if one says B tree or LSM tree that doesn't mean they are also talking about a consensus algorithm.
That being said, even the CT website uses the term "ledger" to explain what the logs are: https://certificate.transparency.dev/howctworks/