The World Computer’s Hard Drive: Swarm


Photo by Sharon McCutcheon

This article is part of an on-going series on decentralized file storage for science introduced in Rich in Data, Poor in Wisdom: Science Needs a Decentralized Data Commons.

Swarm is a permissionless, censorship-resistant, peer-to-peer storage and communication system that scales automatically and is sustained by economic incentives. However, as of this writing, it has not been fully implemented. Having grown out of the Ethereum project, it is intended to complete the goal of creating an entirely decentralized world computer on which any program could be written or executed and any data could be stored by anyone with an internet connection. If Ethereum is the world computer’s CPU, Swarm is “the world computer’s storage and communication,” though its emphasis is storage (source). Here we review the basics of Swarm, its incentive system, and its access control system.

Swarm Network Basics

The Swarm network consists of nodes which store and transfer chunks, which are files or parts of files. Every node and every chunk has an address. Nodes and chunks with similar addresses are grouped into neighborhoods. Every node in a neighborhood connects to every other node in the neighborhood. Nodes also connect to some nodes outside their neighborhoods, but the further apart two nodes’ addresses are, the less likely they are to be connected.
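One way to make "similar addresses" concrete is to count how many leading bits two addresses share. The sketch below assumes that measure; the function names and the `depth` threshold are hypothetical and are not taken from the Swarm codebase.

```python
def proximity_order(a: bytes, b: bytes) -> int:
    """Count the leading bits shared by two equal-length addresses."""
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        if diff:
            # 8 - bit_length() is the number of leading zero bits in this byte.
            return i * 8 + (8 - diff.bit_length())
    return len(a) * 8  # identical addresses share every bit


def in_neighborhood(node: bytes, other: bytes, depth: int) -> bool:
    """Assumption: two addresses are neighbors if they share >= depth leading bits."""
    return proximity_order(node, other) >= depth
```

Under this assumption, a chunk whose address shares a long prefix with a node's address falls in that node's neighborhood, and the longer the shared prefix, the "closer" the two are.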

Representation of a neighborhood (center) within the Swarm network. Source: Swarm whitepaper.

All nodes in a neighborhood are incentivized (via postage lotteries, discussed below) to store the chunks whose addresses are in the neighborhood. Every chunk is 4 KB or less. Any data upload larger than 4 KB is split into multiple chunks linked together in a hash tree, and a manifest entry is created which contains a reference to the hash tree’s root node. Every chunk is either a content-addressed chunk, whose address is a hash of the chunk’s data, or a single-owner chunk. A single-owner chunk is the same as a content-addressed chunk, except that it also includes an ID and a cryptographic signature attesting to the chunk’s integrity.
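The chunking scheme can be sketched roughly as follows. This is a simplified illustration, not Swarm's actual chunker: SHA-256 stands in for Swarm's chunk hash, and the 32-byte reference size, the `BRANCHES` constant, and the function names are assumptions made for the example.

```python
import hashlib
from typing import Dict, Tuple

CHUNK_SIZE = 4096             # 4 KB chunk limit described above
BRANCHES = CHUNK_SIZE // 32   # assumed: 32-byte references packed into one chunk


def chunk_address(payload: bytes) -> bytes:
    # Stand-in hash (assumption): SHA-256, not Swarm's own chunk hash.
    return hashlib.sha256(payload).digest()


def build_hash_tree(data: bytes) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Split data into chunks; return (root reference, address -> chunk store)."""
    store: Dict[bytes, bytes] = {}

    def put(payload: bytes) -> bytes:
        addr = chunk_address(payload)
        store[addr] = payload
        return addr

    # Leaf level: raw data chunks of at most CHUNK_SIZE bytes.
    level = [put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]
    if not level:
        level = [put(b"")]
    # Upper levels: chunks whose payload is a run of child references.
    while len(level) > 1:
        level = [put(b"".join(level[i:i + BRANCHES]))
                 for i in range(0, len(level), BRANCHES)]
    return level[0], store
```

The returned root reference plays the role of the value a manifest entry would point to; identical chunks hash to the same address and are stored only once.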

Incentive System

Swarm’s incentive system accounts for both storage and bandwidth sharing.

Storage Incentives

Swarm’s storage incentives encourage nodes to store data for at least as long as the payment for that data covers its storage. The most important construct for these incentives is the postage stamp (source). A postage stamp is a proof of payment submitted alongside a data upload, and the proof is submitted through Ethereum smart contracts. This payment covers the recurring storage costs of the nodes that store the data and the cost of forwarding the data to its neighborhood. By requiring a node to commit to the payments associated with a request before sending it, Swarm discourages nodes from flooding the storage network with useless data and requests.

Swarm incentivizes long-term storage with postage lotteries, which occur every few blocks. In a postage lottery, storage nodes in a neighborhood offer to store chunks for certain prices, and the nodes that offer the lowest prices win the lottery and are compensated. To win, however, a storage node must also submit a successful proof of storage (a simple binary Merkle tree proof) for every chunk for which it is responsible. Once a lottery round is complete, all the nodes in the neighborhood are required to store chunks at the lowest price. Postage lotteries, which are funded by postage stamps, encourage storage nodes to store all data in their neighborhood and to offer the lowest prices.
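For readers unfamiliar with binary Merkle tree proofs, here is a minimal generic sketch. It is not Swarm's proof-of-storage format: SHA-256 is a stand-in hash, duplicating the last node on odd-sized levels is one common convention chosen here for simplicity, and all function names are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in hash (assumption)


def _parent_level(level: List[bytes]) -> List[bytes]:
    if len(level) % 2:
        level = level + [level[-1]]  # pair the last node with itself
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]  # expects at least one leaf
    while len(level) > 1:
        level = _parent_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[bytes]:
    """Collect the sibling hash at each level on the path from leaf to root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        proof.append(level[sibling] if sibling < len(level) else level[index])
        level = _parent_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, index: int, proof: List[bytes], root: bytes) -> bool:
    node = h(leaf)
    for sibling in proof:
        # Even positions are left children, odd positions are right children.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

The point of such a proof is that a verifier holding only the root can check that a prover really has a given piece of data, using a number of hashes that grows logarithmically with the number of leaves.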

While postage stamps and lotteries encourage good behavior, Swarm has a couple of mechanisms to discourage bad behavior. To prevent nodes from simply collecting initial fees and then deleting data, Swarm requires storage nodes to submit a security deposit before selling storage. If a node fails a proof, it loses its security deposit. When a node stops selling storage, its security deposit is returned. Swarm also has a sort of litigation process, allowing for more interactive incentivization. If a storage node fails to serve a chunk to a requestor, the requestor can submit a security deposit and challenge the storage node. The storage node must then prove that either it or another node possesses the chunk. If the storage node fails to do so, it loses its security deposit; if it refutes the challenge, the challenger loses the deposit it submitted.

Bandwidth Sharing Incentives

On the bandwidth sharing front, nodes must be rewarded for routing valuable data in a timely fashion but punished for malicious behavior, such as spamming. At a high level, Swarm’s bandwidth sharing incentive system consists of five mechanisms.

First, intermediate nodes on a route are rewarded only when the data reaches its destination. A piece of content must travel multiple hops (i.e., across multiple nodes) before it reaches its requestor. By ensuring such nodes only get paid if the network successfully serves the request, Swarm incentivizes intermediate nodes to forward data and requests.

Second, nodes closer to the requestor are paid more for a given chunk than nodes further away, creating a chunk marketplace. When I request a chunk, the request is forwarded by many nodes to a node storing the chunk. The storer node then sells the chunk to a node that is closer to me but still too far to transfer it directly to me. That node then sells the chunk to a node even closer to me. This buying and selling repeats until the chunk reaches me, at which point I pay the node who directly sent it to me. In this way, each intermediate node along a route profits from the difference between the buy-price and sell-price of a requested chunk, so nodes have an incentive to forward chunks.
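A small worked example may help with the buy/sell arithmetic. The prices and the function name below are purely illustrative and do not reflect Swarm's actual pricing.

```python
from typing import List


def route_profits(sell_prices: List[int]) -> List[int]:
    """sell_prices[k] is the price at which hop k sells the chunk onward.

    Hop 0 is the storer; the last hop sells directly to the requestor.
    Each hop's profit is its sell price minus what it paid the previous hop.
    """
    profits = []
    paid = 0  # the storer buys from no one
    for price in sell_prices:
        profits.append(price - paid)
        paid = price
    return profits


# Hypothetical prices in arbitrary units: storer sells at 1, next hop at 3, last hop at 6.
profits = route_profits([1, 3, 6])
# The requestor pays only the final price, which equals the sum of all profits.
```

With these made-up numbers the three hops earn 1, 2, and 3 units respectively, and the requestor pays 6 in total.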

Third, every request has a time to live (TTL). If the request is serviced within the TTL, the requesting node must pay for the servicing of the chunk. Requesting nodes are thus disincentivized from sending large numbers of requests in attempts to have the requests serviced quickly.

Fourth, nodes can sell any chunk they store. They are thus incentivized to cache popular chunks. This turns Swarm into not just a static storage network, but a distribution network that scales with demand.

Fifth, sanctions are placed on nodes who request non-existent chunks too frequently. This disincentivizes nodes from requesting non-existent chunks.

Swarm Accounting Protocol (SWAP)

The Swarm Accounting Protocol (SWAP) is also important for bandwidth sharing on Swarm, but it is more than a mere incentive. With SWAP, nodes create a bandwidth-based credit system by recording how much bandwidth their peers are consuming. Nodes can cash in their credit for BZZ tokens on Ethereum. If a node uses too much bandwidth without contributing, other nodes can stop transferring chunks to and from the node until it settles its bandwidth-debt in BZZ. This credit system allows, for example, some nodes to upload for free by first routing chunks until earning enough credit to pay for the upload. Also, by adding another layer of reputation, SWAP guards the network from potential free-riding nodes.
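The credit system can be pictured as a per-peer ledger. The following is a toy model under stated assumptions: the class, its method names, and the `debt_limit` threshold are hypothetical, and settlement is reduced to a single number rather than an on-chain BZZ transfer.

```python
class PeerLedger:
    """Tracks the bandwidth balance with one peer, in arbitrary units."""

    def __init__(self, debt_limit: int) -> None:
        self.debt_limit = debt_limit  # assumed threshold before service stops
        self.balance = 0              # > 0: the peer owes us; < 0: we owe the peer

    def served(self, units: int) -> None:
        """We transferred `units` of data for the peer."""
        self.balance += units

    def received(self, units: int) -> None:
        """The peer transferred `units` of data for us."""
        self.balance -= units

    def will_serve(self) -> bool:
        """Keep serving only while the peer's debt is under the limit."""
        return self.balance < self.debt_limit

    def settle(self, units_paid: int) -> None:
        """The peer pays off debt (in the real protocol, with BZZ)."""
        self.balance -= units_paid
```

In this model, a node that first routes chunks for a peer builds up negative balance on that peer's ledger, which it can then spend on its own uploads before any payment is needed.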

Access Control and Encryption

Swarm intends to provide a truly peer-to-peer access control system for storage. Currently, with Web2 systems (such as Google and Amazon), access control is ultimately governed by the companies who control the products. For example, with Google Docs, a user cannot stop Google from accessing a document, no matter what privacy settings the user has; additionally, anyone with access to the servers on which the document is stored can access it. Instead of storing data in locations secured by trusted parties, Swarm uses layers of encryption to restrict access. All access-controlled data is encrypted before upload, so the sensitive data is never really stored on Swarm. This redesign of access control requires new definitions for read access and write access. Read access on Swarm is the ability to decrypt a file, and write access is the ability to change the reference in a file’s manifest entry. The next paragraph clarifies these definitions with a discussion of the manifest entry.

Recall that a manifest entry is created for a file if the file is greater than 4 KB. In the case of storing an encrypted file, the manifest entry stores three things: a reference to the encrypted data, a decryption key, and information about how to get an access key (which cannot be distributed through Swarm). Not only is the data itself encrypted, but the reference and decryption key are also encrypted. They are encrypted together and can be decrypted by the access key. Thus, to retrieve and decrypt an encrypted file on Swarm, a node must

  1. retrieve the manifest entry,
  2. get the access key,
  3. use the access key to decrypt both the reference to the file’s root and its decryption key,
  4. use the reference to retrieve the encrypted file, and
  5. decrypt the file.
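The five steps above can be sketched end to end with a toy model. Everything here is an assumption made for illustration: the XOR keystream is a stand-in cipher that is not secure, SHA-256 stands in for Swarm's hashing, the file is stored as a single blob rather than a chunk tree, and the manifest layout and function names are hypothetical.

```python
import hashlib
from typing import Dict


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR data with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def upload_encrypted(store: Dict[bytes, bytes], plaintext: bytes,
                     file_key: bytes, access_key: bytes) -> dict:
    """Encrypt before upload and return a manifest entry (file_key must be 32 bytes)."""
    ciphertext = keystream_xor(file_key, plaintext)
    reference = hashlib.sha256(ciphertext).digest()
    store[reference] = ciphertext
    # The reference and decryption key are encrypted together under the access key.
    sealed = keystream_xor(access_key, reference + file_key)
    return {"sealed": sealed, "key_hint": "access key is obtained outside Swarm"}


def retrieve(store: Dict[bytes, bytes], manifest: dict, access_key: bytes) -> bytes:
    # Steps 1-2: the caller already holds the manifest entry and the access key.
    opened = keystream_xor(access_key, manifest["sealed"])  # step 3
    reference, file_key = opened[:32], opened[32:]
    ciphertext = store[reference]                           # step 4
    return keystream_xor(file_key, ciphertext)              # step 5
```

Note that the store only ever holds ciphertext; a node without the access key cannot even recover the reference needed to locate the encrypted file.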

The graph below represents the manifest entry, where an ellipse indicates encryption.

[Figure: diagram of the manifest entry]

Conclusion

The Swarm team aspires to create “the world computer’s storage and communication,” a peer-to-peer storage and communication protocol that complements Ethereum smart contracts. Swarm’s base layer consists of nodes storing and transferring chunks of data. The incentive system encourages nodes to both store data and share bandwidth, discourages misuse of these resources, and creates a bandwidth-based credit system (SWAP) which allows bandwidth debt to be settled in BZZ tokens on Ethereum. Swarm uses layers of encryption to achieve decentralized access control. Even though Swarm’s feature set is not complete (as of this writing), Swarm has a buzzing ecosystem: developers are creating apps such as decentralized databases, email systems, and meme generators.

Join the Decentralized Open Science Movement

Does the idea of a free, open internet of science strike a chord with you? Consider joining the Opscientia community to learn, connect, and collaborate with others building a commons for co-discovery.

Articles in This Series

  1. Decentralized Content Networks for a Permanent Science Data Commons: IPFS
  2. Engineering Incentives for Data Storage as a Commodity: Filecoin
  3. A Permanent Web of Linked Data: Arweave
  4. Peer-to-Peer Storage without a Blockchain: Storj
  5. One of the First Decentralized Cloud Storage Platforms: Sia
  6. The World Computer’s Hard Drive: Swarm
  7. Open, Free, and Automated Pipelines for Permanently Archiving Massive Scientific Datasets
  8. Coral: A Decentralized and Autonomous Knowledge Commons

References

Trón, V. (2020, November 17). Book of Swarm: Storage and Communication Infrastructure for a Self-sovereign Digital Society. Retrieved December 9, 2021, from https://docs.ethswarm.org/the-book-of-swarm.pdf

Swarm. (2021, June 13). Swarm: Storage and Communication Infrastructure for a Self-sovereign Digital Society. Retrieved December 9, 2021, from https://www.ethswarm.org/swarm-whitepaper.pdf

Prahalad, B. (2018, January 7). Merkle proofs Explained. Crypto-0-nite, Medium. Retrieved December 2, 2021.