Paper Insights - The Eternal Tussle: Exploring the Role of Centralization in IPFS

Paper Link

I'd like to delve into a technology I have significant research experience with: the InterPlanetary File System (IPFS). While the original IPFS white paper, "IPFS - Content Addressed, Versioned, P2P File System", was influential, it was never peer reviewed. Therefore, I'll focus on a related paper presented at the 2024 USENIX Symposium on Networked Systems Design and Implementation (NSDI), a prestigious venue in networked and distributed systems research. This paper was also discussed in Stanford's CS244b class.

For background, I recommend reading my "Paper Insights - Dynamo: Amazon's Highly Available Key-value Store", where I discuss consistent hashing and its techniques.

Now, let's explore some fundamental concepts of IPFS.

Decentralized Web

Traditional websites, akin to toll-free 1-800 numbers, rely solely on the website owner to bear the costs of hosting and computation.

In contrast, the decentralized web leverages a distributed network of nodes. This means that the computational burden, and therefore the associated cost, is shared among numerous participants, regardless of the website's traffic load.

The decentralized web is popularly called Web3.

Decentralized File System

A decentralized file system represents each file block by its cryptographic hash. This hash uniquely identifies the block's content. Subsequently, these hashes are mapped to locations on a decentralized network of servers.
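One practical consequence of naming a block by its hash is that a reader can check a block received from any server against the name it asked for. The sketch below illustrates this, assuming SHA-256 as the hash function; the helper names `block_hash` and `verify_block` are hypothetical and not part of any IPFS API.

```python
import hashlib


def block_hash(block: bytes) -> str:
    """Return the hex SHA-256 digest used here as the block's name (assumed hash)."""
    return hashlib.sha256(block).hexdigest()


def verify_block(expected_hash: str, block: bytes) -> bool:
    """Check that a block fetched from an untrusted server matches its name."""
    return block_hash(block) == expected_hash


# The name is derived purely from the content, so identical content
# always gets the same name, regardless of which server stores it.
name = block_hash(b"hello")
print(verify_block(name, b"hello"))   # untampered block
print(verify_block(name, b"hellO"))   # altered block
```

Because the check depends only on the content, the reader does not need to trust the server that supplied the block.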

InterPlanetary File System (IPFS)

IPFS is a decentralized storage and distribution system that:

  1. Divides files into blocks.
  2. Generates a unique Content Identifier (CID) for each block using cryptographic hashing.
  3. Stores the mapping of CIDs to their locations on a Distributed Hash Table (DHT).
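The three steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the block size is a tiny hypothetical value, a plain SHA-256 hex digest stands in for a real CID (actual CIDs also encode the hash function and content codec), and an in-memory dictionary stands in for the DHT.

```python
import hashlib
from typing import Dict, List, Set

BLOCK_SIZE = 4  # hypothetical, tiny value chosen only to keep the example readable


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Step 1: divide the file into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def toy_cid(block: bytes) -> str:
    """Step 2: derive an identifier from the block's content.

    Assumption: a bare SHA-256 hex digest, not a real IPFS CID.
    """
    return hashlib.sha256(block).hexdigest()


def add_file(data: bytes, dht: Dict[str, Set[str]], peer_id: str) -> List[str]:
    """Step 3: record, for every block, that `peer_id` can serve it."""
    cids = []
    for block in split_into_blocks(data):
        cid = toy_cid(block)
        dht.setdefault(cid, set()).add(peer_id)
        cids.append(cid)
    return cids


dht: Dict[str, Set[str]] = {}          # stand-in for the distributed hash table
cids = add_file(b"hello world", dht, "peer-A")
print(len(cids))                        # number of blocks the file was split into
```

Two files that share a block produce the same identifier for it, so the toy DHT ends up with a single entry listing both peers.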

IPFS DHT Entries

Two key entries within the IPFS DHT are:

  • Provider Records: Map CIDs to the PeerIDs of nodes that possess the corresponding blocks.
  • Peer Records: Map PeerIDs to their respective Multiaddresses. Multiaddresses enable flexible p2p communication across various protocols (e.g., IPv4, IPv6, HTTP).
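To make the two record types concrete, here is a minimal in-memory sketch. The class `ToyDHT` and its method names are hypothetical, the identifiers `cid-example` and `peer-A` are placeholders rather than real CIDs or PeerIDs, and a real DHT spreads these records across many nodes instead of holding them in one object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyDHT:
    # Provider records: CID -> PeerIDs of nodes that hold the block.
    provider_records: Dict[str, Set[str]] = field(default_factory=dict)
    # Peer records: PeerID -> Multiaddresses at which the peer can be reached.
    peer_records: Dict[str, List[str]] = field(default_factory=dict)

    def provide(self, cid: str, peer_id: str) -> None:
        """Announce that `peer_id` can serve the block named `cid`."""
        self.provider_records.setdefault(cid, set()).add(peer_id)

    def put_peer(self, peer_id: str, multiaddrs: List[str]) -> None:
        """Publish the addresses at which `peer_id` is reachable."""
        self.peer_records[peer_id] = list(multiaddrs)

    def find_providers(self, cid: str) -> Set[str]:
        return set(self.provider_records.get(cid, set()))

    def find_peer(self, peer_id: str) -> List[str]:
        return list(self.peer_records.get(peer_id, []))


dht = ToyDHT()
dht.provide("cid-example", "peer-A")
# One peer may advertise several transports; these are documentation addresses.
dht.put_peer("peer-A", ["/ip4/192.0.2.10/tcp/4001", "/ip6/2001:db8::1/tcp/4001"])
print(dht.find_providers("cid-example"))
print(dht.find_peer("peer-A"))
```

Keeping the two mappings separate means a peer can change its addresses without republishing a provider record for every block it holds.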

IPFS Lookup

To retrieve a file:

  1. Local Peer Search: IPFS first attempts to locate the desired CID among connected peers using the BitSwap protocol (an opportunistic p2p exchange protocol).
  2. DHT Lookup: If none of the connected peers has the CID, the DHT is queried to find the PeerIDs associated with the CID.
  3. Peer Discovery: Using the DHT, the Multiaddresses of the identified peers are retrieved.
  4. Data Retrieval: Finally, the BitSwap protocol is employed to request the actual block data from the located peers.
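The four lookup steps can be sketched as a single function. This is a simplified model, not the actual IPFS implementation: plain dictionaries stand in for connected peers and DHT records, and the `fetch` callback is a hypothetical stand-in for a BitSwap request over the network.

```python
from typing import Callable, Dict, List, Optional


def retrieve_block(
    cid: str,
    connected_peers: Dict[str, Dict[str, bytes]],   # PeerID -> blocks it holds
    provider_records: Dict[str, List[str]],         # CID -> PeerIDs
    peer_records: Dict[str, List[str]],             # PeerID -> Multiaddresses
    fetch: Callable[[str, List[str], str], Optional[bytes]],
) -> Optional[bytes]:
    # 1. Local peer search: ask the peers we are already connected to.
    for blocks in connected_peers.values():
        if cid in blocks:
            return blocks[cid]

    # 2. DHT lookup: find the PeerIDs that advertise this CID.
    for peer_id in provider_records.get(cid, []):
        # 3. Peer discovery: resolve the PeerID to its Multiaddresses.
        addrs = peer_records.get(peer_id, [])
        if not addrs:
            continue  # provider is known but currently unreachable
        # 4. Data retrieval: request the block from the located peer.
        block = fetch(peer_id, addrs, cid)
        if block is not None:
            return block
    return None


# Toy network: peer-A is connected, peer-B is reachable only via the DHT.
connected = {"peer-A": {"cid-1": b"local"}}
providers = {"cid-2": ["peer-B"]}
peers = {"peer-B": ["/ip4/192.0.2.10/tcp/4001"]}
remote = {"peer-B": {"cid-2": b"remote"}}


def toy_fetch(peer_id: str, addrs: List[str], cid: str) -> Optional[bytes]:
    """Hypothetical transport: reads from an in-memory table instead of a socket."""
    return remote.get(peer_id, {}).get(cid)


print(retrieve_block("cid-1", connected, providers, peers, toy_fetch))
print(retrieve_block("cid-2", connected, providers, peers, toy_fetch))
```

Trying connected peers first avoids the two DHT round trips (provider record, then peer record) whenever a neighbor already has the block.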

Kademlia Hash

Kademlia is a distributed hash table for decentralized p2p computer networks, designed by Petar Maymounkov and David Mazières in 2002. It was popularized by trackerless BitTorrent and is now used by IPFS for its DHT.

Kademlia arranges the nodes into a binary tree where the leaf nodes store
