
How to count the number of differences in large streams quickly?


http://unix.stackexchange.com – I want to count the number of differences (differing bytes) between two large streams (devices or files), e.g. two hard disks, or one hard disk and /dev/zero. The program(s) involved must be fast, keeping up with the streams as they come in (say 1 GB/s, although 0.2 GB/s is probably OK), and they may use at most a few GB of RAM and of temporary files. In particular, no available filesystem is big enough to store the differences being counted. The streams are several TB in size. The count need not (and in fact must not) treat whitespace or line feeds any differently from other characters. The streams are (HowTos)
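
To make the requirement concrete, here is a minimal sketch (not an answer from the original thread) of a chunked comparison that keeps only a running count, so memory use stays at a few MiB however large the streams are. The function name `count_diff_bytes` and the 1 MiB chunk size are assumptions chosen for illustration. It also assumes buffered binary streams whose `read(n)` returns fewer than `n` bytes only at end of stream, and it counts any bytes past the end of the shorter stream as differences. Whether plain Python reaches the throughput asked for is untested; treat this as a statement of the logic rather than a tuned tool.

```python
import io

CHUNK = 1 << 20  # 1 MiB per read; a tuning assumption, not a measured optimum


def count_diff_bytes(a, b, chunk=CHUNK):
    """Count the byte positions at which two binary streams differ.

    Every byte is treated alike (no special handling of whitespace or
    line feeds). Only one chunk per stream is held in memory at a time.
    """
    total = 0
    while True:
        x = a.read(chunk)
        y = b.read(chunk)
        if not x and not y:
            return total
        n = min(len(x), len(y))
        if n:
            # XOR the two chunks as big integers: a zero byte in the
            # result marks a position where the inputs were equal.
            xor = int.from_bytes(x[:n], "little") ^ int.from_bytes(y[:n], "little")
            total += n - xor.to_bytes(n, "little").count(0)
        # Assumption: bytes beyond the end of the shorter stream count
        # as differences.
        total += max(len(x), len(y)) - n


if __name__ == "__main__":
    left = io.BytesIO(b"hello world")
    right = io.BytesIO(b"hellO w0rld!")
    print(count_diff_bytes(left, right))
```

For real devices the streams would be opened in binary mode, for example `open(path_a, "rb")` and `open(path_b, "rb")`, where `path_a` and `path_b` are placeholders for the two devices or files.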