DAT Protocol

If we were able to develop a decentralised social network and bring P2P into the mainstream, we would many privacy and censorship concerns that are associated with today's centralised 'social' networks. One readily-available method is provided by an application layer protocol known as ‘DAT’, and there are reasons to believe it’s likely to succeed where previous ideas failed: Since DAT works entirely at the application layer, and is implemented using Node.js, there’s very little effort or learning curve involved for developers and users of DAT applications. Web applications can be extended to support it, if the demand is there, using already published libraries that are extensively documented.
For those who aren’t developers, there is a working browser that anyone, without technical skills or knowledge, can use to browse and publish sites on the DAT Web.

What is DAT?

DAT started life with the scientific community, which had a need for a more effective method of distributing, tracking and versioning data. In the conventional Web, data objects are moved, Web pages are deleted and domains expire – this is referred to as ‘content drift’. We’ve all come across an example of this in the form of ‘dead links’. When using a hyperlink to reference a data object or Web page, there is no guarantee that link would be valid at some point in the future. DAT was proposed as a solution to this.
But what does this have to do with censorship and privacy, you’re probably asking? The answer to this question is in how data is distributed, discovered and encrypted.

Merkle Trees, Hashing Algorithms and Public Key Encryption

The DAT protocol is essentially a real-world implementation of the Merkle Tree data structure, with the BLAKE2b and Ed25519 algorithms for identification, encryption and verification (other docs state that SHA256 is used as the hashing algorithm). It’s not necessary to understand this concept in order to develop DAT applications, since there are already libraries for implementing this, but for the curious, I reccommend reading Tara Vancil’s explanation first before moving on to the whitepaper.

An important point Vancil made was DAT is about the addressing and discovery of data objects, not the addressing of servers hosting those objects. Data objects are not bound to IP addresses or domains either. Each data object has its own address, and that address is determined by its cryptographic hash value – a file’s hash digest will be static, regardless of where it’s hosted. This is important, because we’re accustomed to thinking of the Internet/Web in terms of the client/server model, and proposed solutions for privacy and anti-censorship typically try to deal with the problem of decentralised host discovery in a peer-to-peer (P2P) network.

Some form of data structure is required to make the data objects addressable and to enable their integrity to be verified. A DAT peer-to-peer network uses Merkle Trees for this, where all ‘leaves’ and nodes contain the hash values of the data objects they represent, and the root node contains the hash digest of all its child nodes. In other words, as the whitepaper puts it, ‘each non-leaf node is the hash of all child nodes‘.
Not only does this provide a way of verifying the integrity of the data objects – the root node’s digest will change if there’s any modification to a data object represented in the tree – it provides the means to an efficient lookup system, as the root hash digest becomes the identifier for a dataset.

Obviously, this means clients would need to fetch the root node’s value for a given dataset from a trusted source, which might be one of many designated lookup peers on the network. If the client wanted a given data object, it wouldn’t need to fetch everything referenced under the root node, but just the root node value, the parent node of the requested objects, and the hash values of the other parent nodes.

Addressing, References and Security

Now, let’s get into the more specific aspects of how Merkle Trees are implemented in the context of DAT. All the ‘leaf’ nodes in the DAT Merkle Tree contain a BLAKE2b or SHA256 (depending on the docs being read) hash digest of the referenced object. All parent nodes contain the hash digest and a cryptographic signature. The signature is generated by creating Ed25519 keys for each parent node and using them to sign the hash digest.

When sharing a locally-created site in the Beaker browser, or viewing one already shared on the network, you might notice the URI following ‘DAT://’ is a long hexadecimal string. This is actually the Ed25519 public key of the archive containing the referenced object being shared, and it’s used to encrypt and decrypt the content. The corresponding private key is required to write changes to the DAT archive. The public key is, in turn, hashed to generate a discovery key, which is used to find the data objects. This ensures no third-party can determine the public key of a private data object that hasn’t been publicly shared.

Beaker

The Beaker browser looks very much like the standard Firefox browser on the surface, and it can be used to browse both DAT:// and HTTP:// addresses. As we can see, DAT sites are rendered just as well as those on the conventional Web. The only problem is that, as with Tor and I2P, sites are hosted on machines that aren’t online 24/7, so many of them are unreachable at a given time.

From the Welcome dialogue, we can get straight to setting up a personal Web site dor publishing on the DAT Web. A default index page, script.js and styles.css are included ready for us to customise. In addition, Beaker allows us to share the contents of an arbitrary directory on the machine it’s running on.

Previously-created sites are available under the ‘Library‘ tab in the main menu. Sites that aleady exist will be listed under the 'Your archives' section, and can be modified and/or published.

What happens to a published site when the local machine is offline? There is a method to keep a site accessible, by somehow getting another person or machine to ‘seed’ the data. This is a short-hand way of saying another person could fetch a copy of the site and re-share it over the network. Seeding happens automatically as a user is actively browsing a DAT site.

The Node.js Modules

Several Node.js modules provide libraries that developers can use to implement DAT features in their applications.

In conjunction with Electron.js and Node.js, the above modules can be used to develop a DAT-enabled desktop application, of which Beaker is just one example.

Node Discovery

Two components are used for this: discovery-channel and discovery-swarm. The discovery-channel component searches BitTorrent, DNS and Multicast DNS servers for peers, and advertises the address/port of the local node. Therefore, it is based on the bittorrent-dht and dns-discovery modules. Using discovery-channel, the client can join channels, terminate sessions, call handlers on session initiation and fetch a list of relevant channels. The network-swarm module uses discovery-channel to connect with DAT peers and control the session.

Using HashBase to Host a DAT Site

One of the main problems in making a site public on a P2P network is its accessibility depends on a running server process on the initial host, until the files are seeded by other peers on the network. Fortunately, some of DAT’s developers operate the HashBase hosting service, which we can use to make our sites persistent.

A site can be developed and uploaded to HashBase very easily from within Beaker’s integrated editor. Mine is a very basic site using Bootstrap.css. When done, review and publish the changes, so the finished article is displayed in the browser at the unique address.

Next, register an account with HashBase.io. There isn’t much in the way of account configuration here (it essentially provides a certain amount of storage associated with an account) so we can get straight to uploading the site’s files.

My site wasn’t accessible after my first attempt, and I assumed it just required time for the changes to propagate on the network. It turned out that dat.json needed to be modified in order to enable HashBase to host it – this isn’t mentioned on the Hashbase site, but instead in their documentation on GitHub. I learned the following lines must be added to dat.json:

{
"title": "SAPPHIRE-DAT"
"dir": "./.hashbase"
"brandname": "Hashbase"
"hostame": "hashbase.local"
"port": "8080"
"csrf": "true"
"bandwidthLimit":
"up": "1mb"
"down": "1mb"
}
This was enough to make the site reachable at dat://sapphire-dat.hashbase.io after re-uploading the archive. Reviewing and publishing changes in Beaker might cause dat.json to be reverted to its default, so it might be worth copying the above into another file called 'template.json'.

There are other recommended configuration options that might be important if we want to host something more than a static site. If you want to enable HTTPS, HashBase can sort the certificate provisioning if the following lines are also added:

"letsencrypt":
"debug": "false"
"agreeTos": "true"
"email": "myname@example.net"
Hashbase uses JSON Web Tokens to manage sessions. You absolutely must replace the secret with a random string before deployment.
"sessions":
"algorithm": "HS256" "secret": "RANDOM-STRING-HERE"
"expiresIn": "1h"