George V. Reilly

Compressing Tar Files in Parallel

TL;DR: use tar -I pigz or tar -I lbzip2 to compress large tar files much more quickly.

I in­ves­ti­gat­ed various ways of com­press­ing a 7GiB tar file.

The built-in --gzip and --bzip2 com­pres­sion methods in GNU tar are single-threaded. If you invoke an external compressor with --use-compress-program, you can get some huge reductions in com­pres­sion time, with slightly worse com­pres­sion ratios.

You can use pigz as a parallel re­place­ment for gzip and lbzip2 as a parallel version of bzip2. Both of them will make heavy use of all the cores in your system, greatly reducing the real time relative to the user time.

Single-threaded com­pres­sion timing: gzip is a lot faster than bzip2:

$ time tar --bzip2 -cf huge-bzip2.tar.bz2 hugedir

real    13m15.352s
user    12m53.972s
sys     0m16.029s

$ time tar --gzip -cf huge-gzip.tar.gz hugedir

real    5m56.489s
user    5m30.271s
sys     0m14.633s

fast parallel com­pres­sion timing: pigz is the clear winner:

$ time tar --use-compress-program='lbzip2 --fast' \
    -cf huge-lbzip2-fast.tar.bz2 hugedir

real    2m35.967s
user    11m38.865s
sys     0m26.981s

$ time tar --use-compress-program='pigz --fast' \
    -cf huge-pigz-fast.tar.gz hugedir

real    0m58.222s
user    3m22.134s
sys     0m17.357s

best parallel com­pres­sion timing: lbzip2 is much quicker than pigz:

$ time tar --use-compress-program='lbzip2 --best' \
    -cf huge-lbzip2-best.tar.bz2 hugedir

real    1m44.365s
user    11m38.277s
sys     0m13.551s

$ time tar --use-compress-program='pigz --best' \
    -cf huge-pigz-best.tar.gz hugedir

real    2m27.694s
user    16m20.441s
sys     0m16.092s

Compressed file sizes: bzip2 family compresses better than gzip family; best is smaller than default com­pres­sion level which is smaller than fast:

$ ls -lSr
-rw-r--r-- 1 user group 2460438578 Feb 22 03:03 huge-lbzip2-best.tar.bz2
-rw-r--r-- 1 user group 2461172874 Feb 22 03:19 huge-bzip2.tar.bz2
-rw-r--r-- 1 user group 2689784220 Feb 22 03:00 huge-lbzip2-fast.tar.bz2
-rw-r--r-- 1 user group 2691286852 Feb 22 03:06 huge-pigz-best.tar.gz
-rw-r--r-- 1 user group 2704591997 Feb 22 03:25 huge-gzip.tar.gz
-rw-r--r-- 1 user group 2950547862 Feb 22 03:01 huge-pigz-fast.tar.gz
-rw-r--r-- 1 user group 7365222400 Feb 22 03:00 huge.tar
blog comments powered by Disqus
Implementing the Tree command in Rust, part 2: Printing Trees »