BitTorrent clients for fun and profit

This is a work in progress.

…well, not really for profit. The world doesn’t need another BitTorrent client. But it’s a great project for learning about an interesting peer-to-peer protocol, exploring the networking stack, getting to grips with Wireshark and debugging, and generally learning how things work.

Reading a .torrent

The torrent files you might find somewhere like The Pirate Bay are encoded with bencode (pronounced “B-Encode”, but I’m still gonna say “ben-code”), a terse file format for representing a few simple data structures (strings, ints, dicts and lists).

A bencoded file might look like this:

d4:infol11:hello dave!d3:foo3:bar4:fizz4:buzzei42eee

Tasty. Let’s add newlines for effect:

d
  4:info
  l
    11:hello dave!
    d
      3:foo3:bar
      4:fizz4:buzz
    e
    i42e
  e
e

Elements in a bencoded structure have a letter denoting what they are: d for dicts, l for lists, i for ints. All of those are terminated with a trailing e. Strings are a little different: they start with the length of the string, then a colon, followed by the string itself (e.g. 3:foo).

In the above example, the root of the structure is a dict with one key (“info”) whose value is a list containing a string, a dict and an int.

Writing a decoder for this is an interesting exercise in itself; in the end, I used jackpal/bencode-go.
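If you do fancy the exercise, here’s a rough sketch of what a hand-rolled decoder could look like. This is not how bencode-go works internally; the decode function and its return types are my own invention for illustration, and it skips niceties like rejecting leading zeros or unsorted dict keys.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// decode parses one bencoded value from the front of s and returns the value
// plus whatever input is left over. Values are returned as string, int64,
// []interface{} (list) or map[string]interface{} (dict).
func decode(s string) (interface{}, string, error) {
	if s == "" {
		return nil, "", errors.New("bencode: unexpected end of input")
	}
	switch {
	case s[0] == 'i': // integer: i<digits>e
		end := strings.IndexByte(s, 'e')
		if end < 0 {
			return nil, "", errors.New("bencode: unterminated integer")
		}
		n, err := strconv.ParseInt(s[1:end], 10, 64)
		return n, s[end+1:], err
	case s[0] == 'l': // list: l<values>e
		list := []interface{}{}
		rest := s[1:]
		for rest != "" && rest[0] != 'e' {
			var v interface{}
			var err error
			if v, rest, err = decode(rest); err != nil {
				return nil, "", err
			}
			list = append(list, v)
		}
		if rest == "" {
			return nil, "", errors.New("bencode: unterminated list")
		}
		return list, rest[1:], nil
	case s[0] == 'd': // dict: d<key><value>...e, keys are always strings
		dict := map[string]interface{}{}
		rest := s[1:]
		for rest != "" && rest[0] != 'e' {
			var k, v interface{}
			var err error
			if k, rest, err = decode(rest); err != nil {
				return nil, "", err
			}
			key, ok := k.(string)
			if !ok {
				return nil, "", errors.New("bencode: dict key is not a string")
			}
			if v, rest, err = decode(rest); err != nil {
				return nil, "", err
			}
			dict[key] = v
		}
		if rest == "" {
			return nil, "", errors.New("bencode: unterminated dict")
		}
		return dict, rest[1:], nil
	case s[0] >= '0' && s[0] <= '9': // string: <length>:<bytes>
		colon := strings.IndexByte(s, ':')
		if colon < 0 {
			return nil, "", errors.New("bencode: missing ':' in string")
		}
		n, err := strconv.Atoi(s[:colon])
		if err != nil || n > len(s)-colon-1 {
			return nil, "", errors.New("bencode: bad string length")
		}
		return s[colon+1 : colon+1+n], s[colon+1+n:], nil
	}
	return nil, "", errors.New("bencode: unknown type prefix")
}
```

Calling decode on the example string from earlier should hand back a map with a single "info" key and an empty remainder. Recursion does most of the work: lists and dicts just keep decoding values until they hit their closing e.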

Here’s what a JSON output of this file might look like (I’m using an Ubuntu image torrent):

{
    "announce": "http://torrent.ubuntu.com:6969/announce",
    "announce-list": [
        [
            "http://torrent.ubuntu.com:6969/announce"
        ],
        [
            "http://ipv6.torrent.ubuntu.com:6969/announce"
        ]
    ],
    "comment": "Ubuntu CD releases.ubuntu.com",
    "creation date": 1524776308,
    "info": {
        "length": 1921843200,
        "name": "ubuntu-18.04-desktop-amd64.iso",
        "piece length": 524288,
        "pieces": "\ufffd <truncated> \ufffd"
    }
}
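A quick sanity check you can run on these numbers. As I understand the spec (BEP 3), "pieces" is a concatenation of 20-byte SHA-1 hashes, one per piece, so its length should match the piece count implied by "length" and "piece length". The helper names below are made up for this sketch.

```go
package main

import "errors"

const hashLen = 20 // a SHA-1 digest is 20 bytes

// pieceCount returns how many pieces a file of the given length splits into,
// rounding up because the last piece is usually shorter than the rest.
func pieceCount(length, pieceLength int64) int64 {
	return (length + pieceLength - 1) / pieceLength
}

// splitPieces cuts the raw "pieces" string into one 20-byte hash per piece.
func splitPieces(pieces string) ([][hashLen]byte, error) {
	if len(pieces)%hashLen != 0 {
		return nil, errors.New("pieces length is not a multiple of 20")
	}
	hashes := make([][hashLen]byte, len(pieces)/hashLen)
	for i := range hashes {
		copy(hashes[i][:], pieces[i*hashLen:(i+1)*hashLen])
	}
	return hashes, nil
}
```

For the Ubuntu torrent above, pieceCount(1921843200, 524288) comes out at 3666 (3665 full pieces plus a final 327,680-byte one), so the "pieces" string should be 3666 × 20 = 73,320 bytes long.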

Talk to the tracker

Next, we need to make a request to the tracker to find some peers to connect to. The tracker URL is in the “announce” field of the metadata.

When first trying this, I got an HTTP 400 and the following error:

you sent me garbage - id not of length 20

Damn. This tracker is so SALTY.

Making a correct request to the tracker requires URL-encoding a bunch of params, including a SHA1 hash of the bencoded “info” section of the torrent metadata. The flow looks something like:

  1. de-bencode torrent metadata
  2. extract the “announce” address
  3. re-bencode the “info” block
  4. generate a random 20-byte peer ID:
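    // newClientID returns a random 20-byte peer ID.
    // Assumes rand is "crypto/rand".
    func newClientID() []byte {
        clientID := make([]byte, 20)
        if _, err := rand.Read(clientID); err != nil {
            panic(err) // no usable randomness, nothing sensible to do
        }
        return clientID
    }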
    
  5. make a GET request to the announce URL with a bunch of query params:
    "info_hash":  <SHA-1 hash of the bencoded "info" block>
    "peer_id":    <unique 20-byte peer ID>
    "port":       <port you're listening for connections on>
    "uploaded":   <number of bytes uploaded>
    "downloaded": <number of bytes downloaded>
    "left":       <number of bytes left to download>
    "compact":    <1 if using compact mode>
    "event":      <one of 'started', 'completed', or 'stopped'>
    
  6. de-bencode the response from the tracker, which contains a list of peers in the torrent swarm.
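Steps 3 to 5 can be sketched roughly like this. It’s a minimal version under a few assumptions: buildAnnounceURL is a made-up helper, bencodedInfo is whatever bytes your encoder produced for the “info” block, and the transfer counters are hard-coded to zero for a first “started” announce. It also sends downloaded, which the tracker spec (BEP 3) lists alongside uploaded and left.

```go
package main

import (
	"crypto/sha1"
	"net/url"
	"strconv"
)

// buildAnnounceURL assembles the tracker GET URL. bencodedInfo is the
// re-bencoded "info" dict (step 3); peerID is the 20-byte ID from step 4.
func buildAnnounceURL(announce string, bencodedInfo, peerID []byte, port int, left int64) (string, error) {
	u, err := url.Parse(announce)
	if err != nil {
		return "", err
	}
	infoHash := sha1.Sum(bencodedInfo) // raw 20 bytes, not a hex string
	q := url.Values{
		"info_hash":  {string(infoHash[:])},
		"peer_id":    {string(peerID)},
		"port":       {strconv.Itoa(port)},
		"uploaded":   {"0"},
		"downloaded": {"0"},
		"left":       {strconv.FormatInt(left, 10)},
		"compact":    {"1"},
		"event":      {"started"},
	}
	u.RawQuery = q.Encode() // percent-encodes the binary values for us
	return u.String(), nil
}
```

The important detail is that info_hash and peer_id go over the wire as raw bytes, percent-encoded, and each must be exactly 20 bytes before encoding. Sending a hex string or a short ID is an easy way to earn yourself an error like the one above.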

Handshaking with peers

Next, we have to handshake with some subset of the peers the tracker has told us about.
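Before we can dial anyone, the peer list needs unpacking. Here’s a sketch assuming the tracker answered in compact mode (BEP 23), where the “peers” value is a byte string with 6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte big-endian port. The Peer type and parseCompactPeers are hypothetical names for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"net"
)

// Peer is a hypothetical type for this sketch: one address from the swarm.
type Peer struct {
	IP   net.IP
	Port uint16
}

// parseCompactPeers unpacks the "peers" value of a compact tracker response:
// 6 bytes per peer, a 4-byte IPv4 address then a 2-byte big-endian port.
func parseCompactPeers(raw []byte) ([]Peer, error) {
	const peerSize = 6
	if len(raw)%peerSize != 0 {
		return nil, errors.New("malformed compact peer list")
	}
	peers := make([]Peer, len(raw)/peerSize)
	for i := range peers {
		chunk := raw[i*peerSize : (i+1)*peerSize]
		peers[i].IP = net.IP(chunk[:4])
		peers[i].Port = binary.BigEndian.Uint16(chunk[4:])
	}
	return peers, nil
}
```

If the tracker ignores compact=1 and sends a list of dicts instead, this won’t apply and you’d read the ip and port keys from each dict.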

We open a TCP connection to the peer, and send the following message:

[Figure: peer handshake message]
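In code, building that message might look something like the sketch below. I’m assuming the standard layout from the BitTorrent spec (BEP 3): one length byte (19), the string “BitTorrent protocol”, 8 reserved bytes, the 20-byte info hash, then our 20-byte peer ID, for 68 bytes in total. buildHandshake is a made-up name.

```go
package main

const protocol = "BitTorrent protocol"

// buildHandshake lays out the 68-byte handshake:
// <1: len(protocol)><19: protocol><8: reserved><20: info hash><20: peer ID>
func buildHandshake(infoHash, peerID [20]byte) []byte {
	buf := make([]byte, 0, 1+len(protocol)+8+20+20)
	buf = append(buf, byte(len(protocol)))
	buf = append(buf, protocol...)
	buf = append(buf, make([]byte, 8)...) // reserved extension bits, all zero
	buf = append(buf, infoHash[:]...)
	buf = append(buf, peerID[:]...)
	return buf
}
```

The peer should reply with a handshake in the same shape; if the info hash in its reply doesn’t match ours, it’s talking about a different torrent and the connection can be dropped.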

To be continued…


