bencoding

2025-10-09 networksC
BEncode Editor was a Windows tool for viewing and editing .torrent files.

Bencoding (pronounced “bee-encoding”) is a fairly simple encoding used by the P2P file-sharing system BitTorrent to store .torrent file metadata. It is not designed to be read by humans, but it allows complex yet loosely structured data to be stored in a platform-independent way.

Bencode uses ASCII characters as delimiters and supports four data types: byte strings, integers, lists, and dictionaries.

.torrent files are bencoded dictionaries. Since each possible value has exactly one valid bencoding (a one-to-one mapping between values and encodings), applications can compare two values by comparing their encoded bytes, without ever decoding them. For BitTorrent, two main types of parser are in use today: streaming parsers, which process data as it arrives from the network, and non-streaming parsers, which load the entire metadata into memory and parse it afterwards.

how bencoding works

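The format as defined in BitTorrent v1.0 uses ASCII characters as delimiters and decimal digits to encode data structures. An integer is written as i<digits>e, a byte string as <length>:<bytes>, a list as l<items>e, and a dictionary as d<key><value>...e, with dictionary keys being byte strings stored in sorted order.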
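To make the delimiter rules concrete, here is a minimal encoder sketch in Python. The function name bencode and the choice to accept Python str values (encoded as UTF-8) are assumptions made for illustration, not part of any particular library.

```python
def bencode(value) -> bytes:
    """Encode an int, bytes/str, list, or dict into bencoded bytes (sketch)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencode has no boolean type.
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        # Integers: i<decimal digits>e
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        # Assumption: text is stored as UTF-8 bytes.
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings: <length>:<raw bytes>
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        # Lists: l<item><item>...e
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionaries: d<key><value>...e, keys are byte strings in sorted order.
        items = [
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        ]
        items.sort(key=lambda pair: pair[0])
        out = [b"d"]
        for key, val in items:
            out.append(bencode(key))
            out.append(bencode(val))
        out.append(b"e")
        return b"".join(out)
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode(42))                  # b'i42e'
print(bencode(b"spam"))             # b'4:spam'
print(bencode([b"spam", 42]))       # b'l4:spami42ee'
print(bencode({"b": 1, "a": 2}))    # b'd1:ai2e1:bi1ee'
```

Because keys are always written in sorted order, two dictionaries with the same contents produce identical bytes regardless of insertion order, which is what makes byte-level comparison of encoded values possible.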

Non-streaming (DOM-style): The parser reads the entire .torrent metadata file into memory and constructs a full hierarchical representation (like a JSON DOM). Each dictionary, list, and string is given its own buffer and becomes a separate in-memory object allocated on the heap. You then traverse this tree to extract the metadata. This approach is simple and ideal for small files or tracker lists, where the full dataset easily fits in memory.
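A DOM-style parser can be sketched as a small recursive decoder that takes the complete input and returns nested Python objects. The names bdecode and _decode_at are hypothetical, and the sketch skips strict checks such as rejecting leading zeros or unsorted keys.

```python
def bdecode(data: bytes):
    """Decode a complete bencoded buffer into nested Python objects (sketch)."""
    value, pos = _decode_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode_at(data: bytes, pos: int):
    """Decode one value starting at pos; return (value, position after it)."""
    if pos >= len(data):
        raise ValueError("unexpected end of data")
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)          # raises ValueError if missing
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode_at(data, pos)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be byte strings")
            val, pos = _decode_at(data, pos)
            result[key] = val
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        if end > len(data):
            raise ValueError("string runs past end of data")
        return data[start:end], end
    raise ValueError(f"unexpected byte {lead!r} at offset {pos}")


# A made-up, torrent-like dictionary used only for illustration.
meta = bdecode(b"d4:infod6:lengthi1024e4:name8:file.txtee")
print(meta[b"info"][b"name"])    # b'file.txt'
```

Once the tree exists, any field can be reached by ordinary indexing in any order, which is the random access that the DOM style pays for with memory.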

Streaming (SAX-style): The parser processes the bencoded data “on the fly” as bytes arrive from the network socket or file. You feed it buffers of bytes; it maintains a small internal parsing stack or state machine and fires callbacks (e.g., on dictionary start/end, key, or value). It doesn’t build an in-memory tree of the bencoded object, so it’s more memory-efficient and suitable for large continuous streams, but it doesn’t allow random access or post-hoc traversal of the data structure. Distributed Hash Table (DHT) traffic can add up to a very large volume of message data; loading everything into memory at once is resource-intensive and, with a large number of updates, might contribute to network congestion, so a SAX-style parser is the better choice in this case.
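The following is a minimal sketch of a SAX-style parser under a few stated assumptions: the class name StreamingBencodeParser, the on_event(kind, payload) callback shape, and the event names are all invented for illustration. For brevity it buffers each string until it is complete and does little validation; a production parser would likely hand out string data in pieces and check more error cases.

```python
class StreamingBencodeParser:
    """Incremental bencode parser that fires events instead of building a tree."""

    def __init__(self, on_event):
        self.on_event = on_event   # called as on_event(kind, payload)
        self.stack = []            # open containers as [kind, expecting_key]
        self.state = "value"       # "value", "int", "strlen", or "str"
        self.buf = bytearray()     # partial token carried across chunks
        self.remaining = 0         # string bytes still to be read

    def feed(self, chunk: bytes) -> None:
        """Consume one chunk; parser state survives between calls."""
        i = 0
        while i < len(chunk):
            if self.state == "str":
                take = min(self.remaining, len(chunk) - i)
                self.buf += chunk[i:i + take]
                i += take
                self.remaining -= take
                if self.remaining == 0:
                    self._finish_string()
                continue
            c = chunk[i:i + 1]
            i += 1
            if self.state == "int":
                if c == b"e":
                    self._finish_scalar("int", int(self.buf.decode("ascii")))
                else:
                    self.buf += c
            elif self.state == "strlen":
                if c == b":":
                    self.remaining = int(self.buf.decode("ascii"))
                    self.buf.clear()
                    self.state = "str"
                    if self.remaining == 0:
                        self._finish_string()
                else:
                    self.buf += c
            elif c == b"i":
                self.state = "int"
            elif c in (b"l", b"d"):
                kind = "list" if c == b"l" else "dict"
                self.stack.append([kind, kind == "dict"])
                self.on_event(kind + "_start", None)
            elif c == b"e":
                if not self.stack:
                    raise ValueError("unexpected 'e' with no open container")
                kind, _ = self.stack.pop()
                self.on_event(kind + "_end", None)
                self._value_done()
            elif c.isdigit():
                self.buf += c
                self.state = "strlen"
            else:
                raise ValueError(f"unexpected byte {c!r}")

    def _finish_string(self) -> None:
        data = bytes(self.buf)
        top = self.stack[-1] if self.stack else None
        if top is not None and top[0] == "dict" and top[1]:
            # First string of a key/value pair inside a dictionary is the key.
            self.buf.clear()
            self.state = "value"
            top[1] = False
            self.on_event("key", data)
        else:
            self._finish_scalar("string", data)

    def _finish_scalar(self, kind, payload) -> None:
        self.buf.clear()
        self.state = "value"
        self.on_event(kind, payload)
        self._value_done()

    def _value_done(self) -> None:
        # After a value inside a dictionary, the next string is a key again.
        if self.stack and self.stack[-1][0] == "dict":
            self.stack[-1][1] = True


events = []
parser = StreamingBencodeParser(lambda kind, payload: events.append((kind, payload)))
data = b"d4:infod6:lengthi1024e4:name8:file.txtee"
for offset in range(0, len(data), 3):      # simulate 3-byte network reads
    parser.feed(data[offset:offset + 3])
```

The only state kept between feed calls is the container stack and the token in progress, so memory use depends on nesting depth and the largest single string rather than on the total size of the stream.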