Photo by @ehmitrich
This article is part of an ongoing series on decentralized file storage for science, introduced in Rich in Data, Poor in Wisdom: Science Needs a Decentralized Data Commons.
Storj is a blockchain-less decentralized data storage platform built and run by Storj Labs. It competes most directly with Amazon S3. Storj is unique among today’s decentralized storage networks in that its design encourages delegating some responsibilities to trusted parties, while the other networks assume entirely anonymous peer-to-peer interaction. Here we cover Storj’s main selling points, the types of actors on Storj, how files are stored, how Storj ensures data privacy, and Storj’s approach to payments.
Storj’s main selling points are its S3 compatibility, decentralization, speed, pricing, and focus on privacy and security. Storj is advertised as a decentralized alternative to S3. The Storj API includes some of the same methods as S3, so a lot of code that works with S3 will work with Storj (source). Because S3 is so popular, this makes the transition to Storj’s decentralized cloud storage relatively easy for many applications. Storj Labs argues that its decentralized storage offers advantages over the centralized storage offered by S3, primarily lower prices, higher speed, and no single point of failure. Storj also emphasizes data privacy and security, which it accomplishes through encryption, though a technical user could fairly easily encrypt their own files to achieve the same level of privacy and security on a different decentralized storage solution.
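To make the S3 compatibility claim concrete, here is a minimal sketch of how existing S3 code might be pointed at Storj. It assumes you have access to an S3-compatible Storj endpoint and credentials for it; the endpoint URL, credentials, bucket, and file names below are placeholders, and the helper names `s3_client_kwargs` and `make_client` are ours. Only the standard `boto3.client` and `upload_file` calls are used.

```python
def s3_client_kwargs(endpoint_url, access_key, secret_key):
    """Build the keyword arguments passed to boto3.client("s3", ...)."""
    return {
        "endpoint_url": endpoint_url,  # the only S3-specific setting that changes
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_client(endpoint_url, access_key, secret_key):
    """Create an S3 client that talks to the given S3-compatible endpoint."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    return boto3.client("s3", **s3_client_kwargs(endpoint_url, access_key, secret_key))


# Hypothetical usage; every value here is a placeholder.
# client = make_client("https://gateway.example.com", "ACCESS_KEY", "SECRET_KEY")
# client.upload_file("results.csv", "my-bucket", "results.csv")
```

If the endpoint is S3-compatible, the rest of an application's S3 code can in principle stay unchanged, which is the portability argument made above.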
Storj has four types of actors: clients, storage nodes, uplinks, and satellites. Clients are users in need of cloud storage services. Storage nodes store data and help discover other nodes. An uplink is “any application or service that implements libuplink,” a library that allows direct interaction with storage nodes and satellites. Satellites handle metadata, payments, the storage node reputation system, and user accounts. Storj expects few users to run their own satellites and expects most users to have accounts with satellites run by trusted third parties, such as Storj Labs.
How is a file or object stored on Storj? Most of the work happens before the file leaves the client’s machine. First, the file is associated with a bucket via metadata. Next, it is split into pieces called segments. Each segment is encrypted and then divided into smaller units called stripes. Erasure shares are then generated from the stripes; erasure shares enable erasure coding, which achieves storage redundancy without using as much disk space as simple replication. According to the Storj whitepaper, the uplink then asks a satellite which storage nodes to use and uploads the erasure-coded data directly to those nodes, so the file contents themselves do not pass through the satellite. The satellite records the metadata, mainly which storage nodes hold which parts of which files.
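The client-side steps can be sketched in a few lines of Python. This is a toy illustration, not Storj's implementation: the XOR keystream stands in for real authenticated encryption, a single XOR parity share stands in for the Reed-Solomon erasure code, and the sizes and function names are assumptions chosen to keep the example small.

```python
import hashlib


def chunks(data, size):
    """Split bytes into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_encrypt(data, key):
    """XOR data with a SHA-256-derived keystream. Illustrative only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def erasure_shares(stripe, k):
    """Split a stripe into k data shares and append one XOR parity share."""
    share_size = -(-len(stripe) // k)  # ceiling division
    padded = stripe.ljust(share_size * k, b"\0")
    data_shares = chunks(padded, share_size)
    return data_shares + [xor_blocks(data_shares)]


def recover_share(shares, missing):
    """Rebuild the share at index `missing` from all the other shares."""
    return xor_blocks([s for i, s in enumerate(shares) if i != missing])


def prepare_upload(data, key, segment_size=64, stripe_size=16, k=4):
    """Return, per segment, the list of erasure-share groups (one per stripe)."""
    layout = []
    for index, segment in enumerate(chunks(data, segment_size)):
        segment_key = key + index.to_bytes(8, "big")  # distinct keystream per segment
        encrypted = toy_encrypt(segment, segment_key)
        layout.append([erasure_shares(s, k) for s in chunks(encrypted, stripe_size)])
    return layout
```

With `k=4`, each stripe becomes five shares, any one of which can be lost and rebuilt with `recover_share`. That costs 1.25 times the original size, whereas surviving one loss through plain replication costs 2 times. A real erasure code tolerates many more losses at a similarly modest overhead.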
Storj uses encryption and decentralization to keep users’ data private and secure. As mentioned above, data is encrypted before it even leaves the user’s local machine. Every file is “encrypted with a unique key.” This allows users to grant access to other users on a file-by-file basis. Plus, if an attacker acquired any single key, they would only gain access to a single file. On the decentralization front, storing different pieces of files on different storage nodes reduces the risks associated with centralized storage solutions, such as “changes to the company’s roadmap that could result in the product becoming less useful.” Any access control, however, is managed by satellites. This leaves encrypted data vulnerable to satellite operators.
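One way to see why per-file keys limit the damage of a leaked key is to derive each file's key from a single root secret. The sketch below uses HMAC-SHA256 for the derivation; that choice, and the function name `file_key`, are our assumptions for illustration rather than a description of Storj's actual key scheme.

```python
import hashlib
import hmac


def file_key(root_secret, bucket, path):
    """Derive a 32-byte key that is unique to one file in one bucket."""
    label = f"{bucket}/{path}".encode()
    return hmac.new(root_secret, label, hashlib.sha256).digest()


# Hypothetical usage: sharing key_a grants access to one file only.
root = b"client-root-secret"
key_a = file_key(root, "lab-data", "run-001.csv")
key_b = file_key(root, "lab-data", "run-002.csv")
```

Because HMAC is one-way, someone holding `key_a` cannot work backwards to `root` or sideways to `key_b`, so access can be handed out one file at a time.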
For Storj to be useful at all, storage nodes must be incentivized to store what they agree to store. In Storj’s reputation system, storage nodes are held accountable by satellites. In addition to keeping track of how well different storage nodes perform (in areas such as speed), satellites also issue daily challenges to storage nodes. When challenged, a storage node must prove that it is storing all the data it is supposed to store and that it is “not susceptible to hardware failure or malintent.” To run this check, the satellite requests erasure shares from the different nodes responsible for storing the same data, attempts to reconstruct the original data from those shares (using the Berlekamp-Welch algorithm), and uses the result to determine which nodes are storing the correct data. This proof of retrievability is sent only to the challenging satellite. If a storage node fails a proof, the node is excluded from the network. Outside of these random challenges, storage nodes “are paid with the assumption that they are faithfully storing all data.”
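The idea behind such an audit can be shown with a tiny Reed-Solomon-style code over a prime field. The sketch below is deliberately naive: instead of the Berlekamp-Welch algorithm it tries every subset of `k` shares, interpolates a polynomial through each, and keeps the one that the most shares agree with. The result is the same whenever at most `(n - k) / 2` of the `n` shares are wrong, but the search is far too slow for real parameters. The field size and function names are assumptions made for illustration.

```python
from itertools import combinations

P = 257  # small prime field, large enough to hold one byte per symbol


def encode(message, n):
    """Treat message symbols as polynomial coefficients; share i is f(i + 1)."""
    return [
        sum(c * pow(x, j, P) for j, c in enumerate(message)) % P
        for x in range(1, n + 1)
    ]


def interpolate(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange, mod P)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def audit(shares, k):
    """Return indices of nodes whose share disagrees with the best-fit polynomial."""
    points = [(i + 1, y) for i, y in enumerate(shares)]
    best_bad = None
    for subset in combinations(points, k):
        bad = [i for i, (x, y) in enumerate(points) if interpolate(subset, x) != y]
        if best_bad is None or len(bad) < len(best_bad):
            best_bad = bad
    return best_bad


# Hypothetical audit: six nodes each hold one share of a two-symbol message.
shares = encode([42, 7], n=6)
tampered = list(shares)
tampered[3] = (tampered[3] + 1) % P  # node 3 returns wrong data
```

Here `audit(shares, 2)` reports no failing nodes, while `audit(tampered, 2)` singles out node 3, which is the kind of result a satellite could feed into its reputation system.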
In Storj, clients pay satellites, which in turn pay storage nodes. Storage nodes are paid for retrieval and storage services, while satellites charge clients for their own services: storing metadata, issuing challenges to storage nodes, routing data, and repairing data. Satellites can accept multiple forms of payment (such as credit card or cryptocurrency), but storage nodes are always paid in the Ethereum-based STORJ token. Satellites do all the accounting and pay storage nodes once a month. If a storage node frequently goes offline, a satellite might hold payments in escrow until the storage node sufficiently demonstrates its reliability.
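As a rough illustration of the satellite's monthly accounting, the sketch below computes what each storage node earned for storage and retrieval and holds part of it in escrow for nodes flagged as unreliable. The rates, the escrow fraction, and all names are hypothetical; they are not Storj's actual prices or rules.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    node_id: str
    stored_gb_months: float  # storage service provided this month
    egress_gb: float         # data served to clients (retrieval)
    reliable: bool           # satellite's view of the node's uptime


def monthly_payouts(usages, storage_rate, egress_rate, escrow_fraction):
    """Return, per node, the amount paid now and the amount held in escrow."""
    payouts = {}
    for usage in usages:
        earned = usage.stored_gb_months * storage_rate + usage.egress_gb * egress_rate
        held = 0.0 if usage.reliable else earned * escrow_fraction
        payouts[usage.node_id] = {"paid": earned - held, "held": held}
    return payouts


# Hypothetical month with made-up rates (token units per GB).
usages = [
    NodeUsage("node-a", stored_gb_months=100, egress_gb=10, reliable=True),
    NodeUsage("node-b", stored_gb_months=100, egress_gb=10, reliable=False),
]
result = monthly_payouts(usages, storage_rate=1.5, egress_rate=2.0, escrow_fraction=0.5)
```

Both nodes earn the same amount here, but the unreliable one receives only half of it immediately, with the rest held back until it demonstrates its reliability.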
- Storj website. Available at https://www.storj.io.
- “Storj: A Decentralized Cloud Storage Network Framework.” Storj Labs. Available at https://www.storj.io/storjv3.pdf.
Storj is a decentralized storage solution designed to replace current object storage services, such as Amazon S3. The platform consists of storage nodes, satellites, and uplinks that together serve clients. All files are encrypted before upload, and Storj uses erasure coding to provide redundancy. Satellites maintain a reputation system for storage nodes and issue challenges to ensure nodes store what they agree to store. The proofs of retrievability behind these challenges make use of erasure coding and the Berlekamp-Welch algorithm. All payments from clients go through satellites, which keep some revenue for services such as issuing challenges but also pay storage nodes for storage services. Storj is a good choice for applications that need to port quickly from centralized object storage services, such as S3, to a decentralized option.
Does the idea of a free, open internet of science strike a chord with you? Consider joining the Opscientia community to learn, connect, and collaborate with others building a commons for co-discovery.
Articles in This Series
- Decentralized Content Networks for a Permanent Science Data Commons: IPFS
- Engineering Incentives for Data Storage as a Commodity: Filecoin
- A Permanent Web of Linked Data: Arweave
- Peer-to-Peer Storage without a Blockchain: Storj
- One of the First Decentralized Cloud Storage Platforms: Sia
- The World Computer’s Hard Drive: Swarm
- Open, Free, and Automated Pipelines for Permanently Archiving Massive Scientific Datasets
- Coral: A Decentralized and Autonomous Knowledge Commons
Storj Labs. (2018, October 30). Storj: A Decentralized Cloud Storage Network Framework. Retrieved December 10, 2021, from https://www.storj.io/storjv3.pdf
Erasure code. (2021, November 12). Retrieved December 12, 2021, from https://en.wikipedia.org/wiki/Erasure_code
Berlekamp-Welch algorithm. (2021, October 9). Retrieved December 12, 2021, from https://en.wikipedia.org/wiki/Berlekamp–Welch_algorithm