COMPRESSION UTILITY BENCHMARK 2/12/22

I visited Fabrice Bellard's website (Bellard is a brilliant French programmer) for the first time in a while. His very first entry intrigued me: it links to the Large Text Compression Benchmark, a comprehensive comparison of how well various compression algorithms do at compressing a 1-GB XML dump (from 2006) of Wikipedia. (Of course Bellard's own program, nncp, had the best result as of the last update to the site, which was August 2021, though nncp takes over 2 days to get that result, and that's using a GPU.)

So I went through the list and thought I'd try out Mathieu Chartier's mcm entry, since it seemed to have the best combination of speed and compression. I compiled it with MinGW gcc 11 and ran my own benchmark on an input of nearly the same uncompressed size: my Win32/64 package for MinGW gcc 11, which has a tarball size of 1,032,924,160 bytes. The results, along with results from several other standard compression utilities, are below. Indeed, mcm gets the best compression, but not by much over xz. The widely used 7-zip also turns in a very respectable score with a good blend of speed and compression performance. If you are interested in trying mcm on Windows, here is a Win64 mcm .exe file (command-line based).

Program  Flags              Run time (s)  Compressed size (bytes)  Bits per byte  Speed (MB/s)  Compressed/original
mcm      -x11               255.6         94,643,694               0.733          3.85          9.16%
mcm      -h11               235.1         95,544,308               0.740          4.19          9.25%
mcm      -x10               244.5         95,928,885               0.743          4.03          9.29%
mcm      -m11               218.5         96,805,951               0.750          4.51          9.37%
mcm      -h10               222.2         96,837,087               0.750          4.43          9.38%
xz       -9                 264.9         97,335,200               0.754          3.72          9.42%
mcm      -x9                235.8         97,362,670               0.754          4.18          9.43%
mcm      -m10               199.5         98,080,521               0.760          4.94          9.50%
mcm      -h9                217.3         98,276,574               0.761          4.53          9.51%
mcm      -m9                193.1         99,489,105               0.771          5.10          9.63%
mcm      -x8                242.4         103,514,788              0.802          4.06          10.02%
mcm      -h8                211.7         104,457,794              0.809          4.65          10.11%
mcm      -m8                193.7         105,651,443              0.818          5.09          10.23%
mcm      -t11               131.9         108,643,785              0.841          7.47          10.52%
7z       -t7z -mx=9 -ms=on  186.9         108,736,435              0.842          5.27          10.53%
mcm      -t10               130.3         110,483,748              0.856          7.56          10.70%
mcm      -t9                127.5         112,022,467              0.868          7.73          10.85%
mcm      -t8                127.8         120,184,123              0.931          7.71          11.64%
7z       (none)             150.8         169,579,190              1.313          6.53          16.42%
xz       -0                 51.1          307,871,492              2.384          19.29         29.81%
bzip2    --best             84.1          328,167,083              2.542          11.71         31.77%
bzip2    --fast             77.2          340,557,458              2.638          12.75         32.97%
gzip     --best             107.1         354,146,627              2.743          9.20          34.29%
zip      (none)             49.5          363,251,119              2.813          19.88         35.17%
gzip     --fast             19.2          391,682,939              3.034          51.31         37.92%
(Run times are on a Core i9-9900 PC running Windows 11.)
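
For anyone who wants to check the derived columns or add rows of their own, here is a minimal Python sketch of how they appear to follow from the measured size and run time. The function name and dictionary keys are my own illustration, not part of any of the tools above. One assumption worth flagging: the speed column only matches the listed values if 1 MB is taken as 2^20 bytes of uncompressed input, so that is what the sketch uses.

```python
# Uncompressed tarball size quoted above, in bytes.
ORIGINAL_BYTES = 1_032_924_160


def compression_metrics(original_bytes, compressed_bytes, seconds):
    """Derive the table's three computed columns from the raw measurements.

    Assumption: "MB" in the speed column means 2**20 bytes of input.
    """
    fraction = compressed_bytes / original_bytes
    return {
        "percent_of_original": 100 * fraction,   # last column
        "bits_per_byte": 8 * fraction,           # compressed bits per input byte
        "speed_mb_per_s": original_bytes / seconds / 2**20,
    }


# Check against the first row of the table (mcm -x11).
row = compression_metrics(ORIGINAL_BYTES, 94_643_694, 255.6)
print(f"{row['percent_of_original']:.2f}%  "
      f"{row['bits_per_byte']:.3f} bits/byte  "
      f"{row['speed_mb_per_s']:.2f} MB/s")
# prints: 9.16%  0.733 bits/byte  3.85 MB/s
```

The same call with the gzip --fast numbers (391,682,939 bytes in 19.2 s) gives 37.92%, 3.034 bits per byte and 51.31 MB/s, which agrees with the last row.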