the peer wire protocol defines how peers exchange the contents of a torrent with each other. we start with a client (you in this case) who wants to download a movie. it first connects to a tracker server and asks it for a list of active peers in the network; the tracker answers with each peer’s ip address and port number. a client may connect to one or more tracker servers. it can also connect to multiple peers simultaneously to download the movie (it may happen that none of the peers has the whole movie, only a portion of it). suppose the freeloader (in our case, you) got lucky this time and found a peer that happens to have the complete movie. note that at this point, the client does not know which peer has the complete movie, or whether a given peer has any of it at all. that only becomes clear after the actual message exchange happens between the two peers (we’ll see how that works). for now, the client has established a TCP connection with the peer using the peer’s ip address and port number. a second (two-way) handshake then happens at the application layer, where the client and peer exchange a handshake message of the format: <pstrlen><pstr><reserved><info_hash><peer_id>
the initiator of the connection is expected to transmit its handshake immediately. the recipient may wait for the initiator's handshake if it is capable of serving multiple torrents simultaneously (torrents are uniquely identified by their info_hash). however, the recipient must respond as soon as it sees the info_hash part of the handshake (the peer_id will presumably be sent after the recipient sends its own handshake). the other specs:
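to make the layout concrete, here is a minimal python sketch of packing and parsing that handshake. it assumes the v1 field sizes from the BitTorrent specification: pstrlen is a single byte with value 19, pstr is the string "BitTorrent protocol", reserved is 8 bytes (all zeros here), and info_hash and peer_id are 20 bytes each, 68 bytes in total. the function names build_handshake and parse_handshake are made up for illustration and do not come from any library.

```python
import struct
from typing import Tuple

PSTR = b"BitTorrent protocol"  # v1 protocol identifier, 19 bytes long


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Pack <pstrlen><pstr><reserved><info_hash><peer_id> (68 bytes for v1)."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    reserved = bytes(8)  # all zeros: this sketch advertises no extensions
    return struct.pack("!B", len(PSTR)) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Return (reserved, info_hash, peer_id), or raise ValueError."""
    if not data:
        raise ValueError("empty handshake")
    pstrlen = data[0]
    if len(data) < 1 + pstrlen + 8 + 20 + 20:
        raise ValueError("handshake is truncated")
    if data[1:1 + pstrlen] != PSTR:
        raise ValueError("unexpected protocol string")
    offset = 1 + pstrlen
    reserved = data[offset:offset + 8]
    info_hash = data[offset + 8:offset + 28]
    peer_id = data[offset + 28:offset + 48]
    return reserved, info_hash, peer_id
```

a receiving peer would compare the parsed info_hash against the torrents it is serving and close the connection if none of them match.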
what is info_hash?
every peer participating in BitTorrent has a .torrent metadata file, which holds the list of trackers (announce-list) and an info key, which is a dictionary containing the files and some optional keys. the info dictionary starts from line number 23 and ends at 40 in the below file structure; it is bencoded and then hashed into a 20-byte SHA1 digest (yes yes ik that SHA1 is not secure and all, but this is what the earlier (v1) BitTorrent protocol specification used). for two peers to connect, they must have the same info_hash. if the info_hash does not match, the connection is immediately dropped.
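the bencode-then-SHA1 step can be sketched in a few lines of python. this is a hand-rolled, minimal bencoder written for illustration (integers, byte strings, lists and dictionaries with keys sorted as raw bytes), not a library API; the names bencode and info_hash and the example info dictionary are assumptions, not taken from a real torrent.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder covering the four bencode types."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Normalize keys to bytes, then emit them in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(info: dict) -> bytes:
    """20-byte SHA1 digest of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).digest()


# Hypothetical single-file info dictionary, for illustration only.
example_info = {
    "name": "movie.mp4",
    "length": 943718400,
    "piece length": 262144,
    "pieces": b"\x00" * 20,  # placeholder; real torrents hold one SHA1 per piece
}
digest = info_hash(example_info)
```

one caveat: a real client is usually safer hashing the exact bytes of the info dictionary as they appear in the .torrent file, because re-encoding a parsed dictionary only reproduces the same hash if the original encoding was canonical.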
after both of them agree on the handshake, file transfer takes place. the remaining messages in the protocol take the form of <length prefix><message ID><payload>.
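here is a rough python sketch of that framing. it assumes what the v1 specification says about the fields: the length prefix is a 4-byte big-endian integer counting the message ID plus the payload, the message ID is a single byte, and a length prefix of zero is a keep-alive with no ID and no payload. encode_message and decode_message are illustrative names, not a library API.

```python
import struct


def encode_message(message_id: int, payload: bytes = b"") -> bytes:
    """Frame a message as <length prefix><message ID><payload>."""
    length = 1 + len(payload)  # the ID byte counts toward the length
    return struct.pack("!IB", length, message_id) + payload


def decode_message(buffer: bytes):
    """Try to pull one message off the front of a receive buffer.

    Returns (message_id, payload, remaining_bytes), or None when the
    buffer does not yet hold a complete message. A keep-alive is
    reported with message_id set to None.
    """
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack("!I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    if length == 0:
        return None, b"", buffer[4:]  # keep-alive: no ID, no payload
    message_id = buffer[4]
    payload = buffer[5:4 + length]
    return message_id, payload, buffer[4 + length:]
```

because TCP is a byte stream, a single read may contain half a message or several messages back to back, which is why the decoder hands back the unconsumed bytes instead of assuming one message per read.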
the file (say the movie is 900MB) is partitioned into smaller fixed-size chunks (the spec calls them pieces), except the last chunk, which may be shorter. the first message after the handshake is usually a bitfield message, whose payload tells the client which chunks of the file the peer has (a peer with nothing to share yet may skip it). at this point, the client decides based on that information whether to keep the connection alive or drop it. every connection starts out choked, so before downloading anything the client sends an interested message and waits for the peer to unchoke it. after that, the client sends request messages. each request asks for a block, which is a slice of a chunk (commonly 16KB), so a chunk usually takes several requests, and the number of requests grows with the number of chunks you want to download from a peer. a cancel message withdraws a request that was sent earlier; it does not close the connection (to end a session, a peer simply closes the TCP connection). clients can maintain many tcp connections with many peers at a time, and the chunk download happens in a distributed fashion depending on which parts of the movie each peer has. it may happen that two peers have the same chunk. How does the client decide which peer to request from?
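a small python sketch of the bookkeeping described above: counting chunks, reading a bitfield payload, and building a request. it assumes the v1 conventions that the high bit of the first bitfield byte stands for chunk 0 and that a request (message ID 6) carries three 4-byte big-endian integers: chunk index, byte offset within the chunk, and length. the 256KB chunk size is an assumption for the example, since the source does not state one, and all function names are illustrative.

```python
import struct

REQUEST_ID = 6  # message ID of a request in the v1 specification


def piece_count(total_size: int, piece_length: int) -> int:
    """Number of chunks, counting a shorter final chunk as one (ceiling division)."""
    return (total_size + piece_length - 1) // piece_length


def pieces_from_bitfield(bitfield: bytes, num_pieces: int) -> list:
    """Indices of the chunks a peer claims to have."""
    if len(bitfield) < (num_pieces + 7) // 8:
        raise ValueError("bitfield is too short for this torrent")
    have = []
    for index in range(num_pieces):
        # Bit 7 of byte 0 is chunk 0, bit 6 is chunk 1, and so on.
        if bitfield[index // 8] & (0x80 >> (index % 8)):
            have.append(index)
    return have


def build_request(index: int, begin: int, length: int) -> bytes:
    """Frame a request for `length` bytes at offset `begin` of chunk `index`."""
    payload = struct.pack("!III", index, begin, length)
    return struct.pack("!IB", 1 + len(payload), REQUEST_ID) + payload


# A 900MB movie split into assumed 256KB chunks.
total_pieces = piece_count(900 * 1024 * 1024, 256 * 1024)
# First 16KB block of chunk 0.
first_request = build_request(0, 0, 16 * 1024)
```

a client would typically intersect the indices from each peer's bitfield with the chunks it still needs before deciding where to send its requests.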
the client follows some algorithm to pick the peer (which I won’t be discussing, but the reader can find it in the BitTorrent Specification under the algorithms section). since we have assumed that our peer has the complete movie, the subsequent messages carry the requested chunks from that peer to the client.
nerd notes from the trenches
- TCP choice is intentional: UDP would require reimplementing reliability, ordering, and congestion control. BitTorrent piggybacks on TCP's well-tested machinery. But modern extensions (uTP) use UDP with LEDBAT congestion control to be "nicer" to other traffic.
- No encryption in v1: The protocol is plaintext. ISPs can easily detect BitTorrent by looking for the "BitTorrent protocol" string (the pstr field) at the start of the handshake.
- BitTorrent v2 uses SHA-256: The migration is slow because v1 swarms are enormous and the network effect is real.
- Port forwarding still matters: If you're behind NAT, you need UPnP or manual port forwarding, or you'll be the peer who can connect out to others but whom nobody else can connect to. Your client will show "firewalled" status.