I quite often need to copy large text files over the network. Usually one would go with gzip, either transparently via scp's compression switch (scp -C) or by compressing the file before pushing it over the wire.
Turns out gzip compresses quite well, but it won't saturate your 100 Mbit line if you do something like this:
cat bigfile.txt | gzip -c | ssh email@example.com 'cat | gunzip -d > bigfile.txt'
While this has the same effect as
scp bigfile.txt firstname.lastname@example.org:
it will be helpful to understand the alternatives coming up next.
On a pretty decent machine (Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz) I could get about 60% saturation of the network link. So, what can we do? We could lower the compression level to take load off the CPU and shift it towards the network. We could also use an alternative compression algorithm such as LZO. "lzop" is a free implementation available in most common Linux distributions, so this might be the easiest way to go:
cat bigfile.txt | lzop -c | ssh email@example.com 'cat | lzop -cd > bigfile.txt'
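The other option mentioned above, lowering gzip's compression level, is just a switch on the command line (-1 is the fastest, -9 the slowest, -6 the default). Here is a rough sketch that can be tried locally before touching the network. The helper name gz_size and the generated sample file are made up for illustration, and the numbers will of course differ for your data:

```shell
#!/bin/sh
# Hypothetical helper: print the compressed size in bytes of file $1
# at gzip level $2 (1 = fastest, 9 = best compression).
gz_size() {
    gzip -c "-$2" "$1" | wc -c | tr -d ' '
}

# Generate some compressible sample text to stand in for bigfile.txt.
sample=$(mktemp)
seq 1 200000 > "$sample"

orig=$(wc -c < "$sample" | tr -d ' ')
size1=$(gz_size "$sample" 1)
size6=$(gz_size "$sample" 6)

echo "original: $orig bytes"
echo "gzip -1:  $size1 bytes"
echo "gzip -6:  $size6 bytes (default level)"

rm -f "$sample"

# The same switch slots into the pipeline (user@host is a placeholder):
#   cat bigfile.txt | gzip -1 -c | ssh user@host 'gunzip -c > bigfile.txt'
```

Higher levels usually give smaller output at the cost of more CPU time, so on a CPU-bound transfer a lower level can be the faster choice overall.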
My initial tests showed that LZO's compression ratio is about 20% lower than gzip's with default settings. On the other hand, transfer time was almost cut in half.
So, how do we get 100% network saturation *AND* great compression? Pigz is a parallel gzip implementation, so instead of maxing out only one thread as gzip does, it will use all the available cores and threads your fancy server provides. Downside? No Debian stable repository packages available yet. But on the other hand: it does not even require a configure script, just one header file and two ".c" files. Most probably your remote connection to the server will take longer to refresh the console output than the compilation itself.
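Since pigz writes ordinary gzip streams, the receiving side does not need pigz at all; plain gunzip can decode the data. A small sketch of that idea, falling back to gzip on machines where pigz is not installed (the helper name pick_compressor is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: use pigz when it is installed, otherwise plain gzip.
pick_compressor() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

comp=$(pick_compressor)
echo "compressing with: $comp"

# Compress with whichever tool was picked, decompress with plain gunzip.
roundtrip=$(printf 'hello pipe\n' | "$comp" -c | gunzip -c)
echo "$roundtrip"
```

If you do not want pigz to grab every core, its -p switch limits the number of threads, e.g. pigz -p 4 -c.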
So, install "pv" (emerge, apt-get install, port install or whatever) and have some fun like this:
me@host:/mnt/data1/import$ cat bigfile.txt | pv | pigz -c | ssh me@otherhost 'cat | unpigz > /mnt/data1/bigfile.txt'
1.83GB 0:00:18 [95.3MB/s] [ <=> ]
Some more numbers: gzip gives me 45 MB/s and lzop 60 MB/s.
"pv" stands for "Pipe Viewer"; it is responsible for the nice stats during the transfer. And yes, this *is* a 100 Mbit network connection pushing a text file at ~100 MB/s. Nice, isn't it?
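How can that be? pv sits in front of the compressor, so it counts uncompressed bytes, while the wire only carries the compressed ones. A back-of-the-envelope sketch of the ratio this implies, assuming the link really is saturated and ignoring ssh and TCP overhead (the helper name implied_ratio is made up, and the result is only a rough lower bound):

```shell
#!/bin/sh
# Hypothetical helper: given a link speed in Mbit/s ($1) and the observed
# throughput of uncompressed data in MB/s ($2), print the compression
# ratio the pipeline must be achieving at minimum.
implied_ratio() {
    LC_ALL=C awk -v link="$1" -v seen="$2" 'BEGIN {
        wire = link / 8              # Mbit/s -> MB/s on the wire
        printf "%.1f\n", seen / wire
    }'
}

implied_ratio 100 95.3   # prints 7.6
```

In other words, a 100 Mbit link carries at most 12.5 MB/s, so the text file has to shrink to roughly an eighth of its size for pv to report 95.3 MB/s.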
You can find pigz over here