TCP Checksum: The Fault in the Stars

The TCP checksum is arguably one of the weakest non-cryptographic checksums in use, and yet it remains in place, undisputed. Some edge systems even turn it off for performance reasons and count on application-level checksums for integrity, while other systems, such as gateways and servers, offload it to the network interface card (NIC), mostly because the functionality exists. The question is: does it really serve any purpose today? It does, but very little.

RFC 793:

The checksum field is the 16 bit one’s complement of the one’s complement sum of all 16-bit words in the header and text. If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment. While computing the checksum, the checksum field itself is replaced with zeros.

The algorithm is simplicity itself. A real implementation involves a few other operations, such as constructing the pseudo-header, but the core of TCP checksumming is just the one's complement sum described in the quote above.
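The algorithm part on its own can be sketched in a few lines of Python. This is a minimal illustration rather than production code: the function names are made up for this post, the pseudo-header is left out, and the input is assumed to already have its checksum field zeroed.

```python
def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of the big-endian 16-bit words in `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad on the right; the pad is never transmitted
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # End-around carry: fold anything above 16 bits back into the low bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


# Worked example from RFC 1071: words 0001 f203 f4f5 f6f7.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))  # 0x220d
```

A receiver runs the same sum over the data with the transmitted checksum included; for an intact segment the sum comes out as 0xffff, so the complemented result is zero.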

The Fault in the Stars

The fact of the matter is that, according to Stone and Partridge [1], between 1 in 16 million and 1 in 10 billion packets pass the TCP checksum despite being corrupted, that is, as false positives. With 1500-byte frames on the network, this means roughly one undetected TCP corruption in every 24 GB to 15 TB of data. Although the paper was published in 2000, these figures are no less of a concern today, because data rates have only multiplied since then.

The following are scenarios in which the data gets corrupted while the checksum still matches [2].

Reordering of 2 byte words:
Data:      0x01 0x02 0x03 0x04
Corrupted: 0x03 0x04 0x01 0x02
Inserting zero-valued bytes:
Data:      0x01 0x02 0x03 0x04
Corrupted: 0x01 0x02 0x00 0x00 0x03 0x04
Deleting zero-valued bytes:
Data:      0x01 0x02 0x00 0x00 0x03 0x04
Corrupted: 0x01 0x02 0x03 0x04
Replacing a string of sixteen 0’s with 1’s or 1’s with 0’s:
Data:      0xff 0xff 0xe9 0x1a 0x00 0x00 0x21
Corrupted: 0x00 0x00 0xe9 0x1a 0xff 0xff 0x21
Multiple errors which sum to zero:
Data:      0x01 0x02 0x03 0x04
Corrupted: 0x01 0x03 0x03 0x03
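
The five scenarios above can be checked mechanically. The sketch below is only an illustration: it checksums just the bytes shown, leaves out the TCP header and pseudo-header, and uses a helper name invented for this post.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length input on the right
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# (original, corrupted) pairs copied from the scenarios above, as hex strings.
scenarios = {
    "reordered 2-byte words": ("01020304", "03040102"),
    "inserted zero bytes": ("01020304", "010200000304"),
    "deleted zero bytes": ("010200000304", "01020304"),
    "sixteen 0s and 1s exchanged": ("ffffe91a000021", "0000e91affff21"),
    "errors that sum to zero": ("01020304", "01030303"),
}

for name, (good, bad) in scenarios.items():
    a = internet_checksum(bytes.fromhex(good))
    b = internet_checksum(bytes.fromhex(bad))
    verdict = "collision" if a == b else "detected"
    print(f"{name}: {a:#06x} vs {b:#06x} -> {verdict}")
```

Every line reports a collision: in each pair the corrupted bytes produce the same 16-bit checksum as the original.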

These scenarios produce false positives, which can go unnoticed when the application does not verify integrity itself. For most purposes this does little harm: for example, if a TCP packet carrying an HTTP message (not HTTPS) is corrupted, the HTML or JavaScript is garbled, and refreshing the browser is all that is needed.

TCP-fpc is a TCP false-positive checksum testing tool. It is implemented as a kernel module that modifies a specified percentage of incoming packets as in scenario 1 (reordering of 2-byte words). It is meant for testing systems in which the integrity of data received over the network cannot go unchecked.

Available on GitHub: https://github.com/critindirecx/TCP-fpc
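
To give a feel for that kind of fault injection, here is a small user-space simulation. It is not TCP-fpc's code and makes no claim about how the kernel module works internally: the helpers `swap_first_two_words` and `corrupt_stream`, the choice of which words to swap, and the seeded random generator are all assumptions made for illustration.

```python
import random


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_first_two_words(payload: bytes) -> bytes:
    """Exchange the first two 16-bit words; shorter payloads pass through."""
    if len(payload) < 4:
        return payload
    return payload[2:4] + payload[0:2] + payload[4:]


def corrupt_stream(payloads, percent, seed=0):
    """Hypothetical helper: reorder words in roughly `percent` % of payloads."""
    rng = random.Random(seed)  # seeded so that a run is repeatable
    return [
        swap_first_two_words(p) if rng.random() * 100 < percent else p
        for p in payloads
    ]


payloads = [bytes([i, i + 1, i + 2, i + 3, i + 4]) for i in range(100)]
mangled = corrupt_stream(payloads, percent=10)

changed = sum(a != b for a, b in zip(payloads, mangled))
same_sum = all(
    internet_checksum(a) == internet_checksum(b) for a, b in zip(payloads, mangled)
)
print(f"{changed} of {len(payloads)} payloads corrupted; checksums all match: {same_sum}")
```

A receiver that relies on the TCP checksum alone would accept every one of the mangled payloads, which is exactly the situation such a tool is meant to expose.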

Why Design such a Frail Specification?

The answer lies in "when" the spec was drafted. When RFC 793 was written in 1981, even the fastest computers were far slower and much more expensive than today's low-end mobile phones. For example, the IBM PC was one of the best machines released in 1981. Its configuration is listed below:

IBM Personal Computer (PC)
Model:    5150
Released:    September 1981
Price:    US $1,565 ~ $3,000
CPU:    Intel 8088, 4.77MHz
RAM:    16K, 640K max
Display:    80 X 24 text
Storage:    dual 160KB 5.25-inch disk drives
Ports:    cassette & keyboard only
5 internal expansion slots
OS:    PC-DOS v1.0

Yes, the more intricate and hairy cryptographic checksums perhaps existed only in books and research papers, or were yet to be invented. The TCP checksum algorithm was therefore good enough at the time, and at that scale it was sensible to sacrifice some integrity for performance. But this never changed, even as processors got faster. More complex applications relied on application-level checksumming and continue to do so; some of the modern non-cryptographic hash algorithms in play are murmur, cityhash, and farmhash. And because the probability of the problem is so low, it is happily and optimistically ignored!

References:

[1] Stone, J. and Partridge, C., 2000, August. When the CRC and TCP checksum disagree. In ACM SIGCOMM Computer Communication Review (Vol. 30, No. 4, pp. 309-319). ACM.

[2] Partridge, C., Hughes, J. and Stone, J., 1995, October. Performance of checksums and CRCs over real data. In ACM SIGCOMM Computer Communication Review (Vol. 25, No. 4, pp. 68-76). ACM.
