So now frost slowly thaws and the winds of blooming spring carry to us tidings of final projects. Our project for the SPO600 course centers around the improvement of a selected checksum or compression open-source package. I chose lrzip, and I hope that I will be able to alter the code in such a way that we can gain a non-negligible improvement in execution time. That, at least, is what I will be aiming for.
So, what even is lrzip?
lrzip – Long Range ZIP or LZMA RZIP
lrzip is a compression program. It’s optimized for the compression of large files and on systems with great amounts of RAM.
lrzip operates in two basic steps, and on a single file.
First, it uses an extended version of rzip to perform a long distance redundancy reduction (ie it encodes duplicate or redundant chunks of data over a large distance). This is done with an LZ77-based algorithm: a sliding window technique is used to keep data in memory in order to find matches, and larger amounts of RAM allow the use of a larger window and thus lead to better compression. Where rzip is limited to a maximum window size of 900MB, lrzip’s use of it scales with available RAM. Duplicate data is replaced with a reference denoting the distance to the original data and the size of the original chunk of data.
Second, after the initial reduction, lrzip compresses the file further by running one of LZMA (default), ZPAQ, LZO, GZIP, and BZIP2 on the reduced data file.
So, I built the package on my x86_64 computer (which required the installation of a few dependencies, but other than that it went okay) and on our AArch64 aarchie server (which, again, needed some dependencies (though some were already installed)). The program ran fine on both systems.
Before we begin tinkering with the code, let’s run some benchmarks on aarchie. We’ll use three files. The first is a text file containing 500MB of text downloaded from Project Gutenberg. The second is a 500MB file composed entirely of random data. The third is a 500MB file composed entirely of zero data.
[atopala@aarchie lrzip-0.621]$ time ./lrzip text Output filename is: text.lrz text - Compression Ratio: 3.918. Average Compression Speed: 1.217MB/s. Total time: 00:06:50.91 real 6m50.924s user 49m2.940s sys 0m9.290s
[atopala@aarchie lrzip-0.621]$ time ./lrzip random Output filename is: random.lrz random - Compression Ratio: 1.000. Average Compression Speed: 6.329MB/s. Total time: 00:01:18.78 real 1m18.796s user 1m34.520s sys 0m2.960s
[atopala@aarchie lrzip-0.621]$ time ./lrzip zero Output filename is: zero.lrz zero - Compression Ratio: 7003.299. Average Compression Speed: 6.024MB/s. Total time: 00:01:22.92 real 1m22.933s user 5m17.620s sys 0m2.490s
The text file took much longer than the others, and it used 23.9% of the memory on aarchie. Compressing the random file required the use of only about 7% of the memory on aarchie. Compressing the zero file, which has a lot more duplicate data to compress, used much more memory, peaking at about 34.5% on aarchie. The text file also took much more user time to compress in comparison to the other files and as compared to the real time, so it was taking greater advantage of parallel processing; the zero file also was compressed using multiple threads. The vast majority of the time was spent in user mode; time spent in the kernel is negligible.
So, the goal is to beat these benchmarks. Maybe a way to recognize incompressible data could speed up the processing of the random file; or maybe a method to detect extremely large runs of identical data quickly could help with the zero file. The important file, though, is the text file, and ideally we would improve the runtime of its compression. Setting the lzma level to 9 (the default is 7) brings the compression ratio up to 4.029 (a minor but not insignificant improvement) but takes twice as long; maybe this could be sped up.
So, apart from improving the algorithm (which, I think, is very unlikely, since lrzip has been receiving regular updates as recently as last year so the algorithm should be pretty close to perfect barring potential oversights) and changing build options (which might not prove fruitful, since the autogen sets the default optimization level to 2, and further optimizations might have a negative impact; but this is probably a good area to work in since I don’t think lrzip has been built with AArch64 in mind) these are potential areas of improvement:
-lrzip uses x86 and x86_64 assembly language files that optimize CRC computation. Maybe we could rewrite the files for AArch64. As of now, “if you don’t have the nasm assembler or have a ppc or other non-x86 system, the standard C CRC routines will be compiled and linked in.” The TODO file suggests that the x86_64 version of the assembly code doesn’t work, so that might need fixing. However, the TODO file also suggests that this is of a low priority.
-the program crashes with the -z flag (ZPAQ optimization) on aarchie, claiming “Illegal instruction” after starting the first thread to compress a piece of data. This occurs on betty as well. It doesn’t happen on x86_64. I am not yet sure of the cause, or if it something that happens on AArch64 systems in general, but if it is a bug on AArch64 then I think this should become my main focus.
I’ll begin with the ZPAQ bug (or what I think might be a bug, at least), and if that doesn’t pan out then I’ll try to edit some of the build options and code to optimize it for AArch64 (since I figure that there may be other people working on lrzip, but most might not have access to an AArch64 machine). If that doesn’t work out either, I’ll try to optimize the C code, and if I can’t do that I’ll work on the (low-priority) assembly language files.