[Wlug] large file performance on linux

brad noyes maitre at ccs.neu.edu
Wed May 16 12:02:33 EDT 2007


On Wed, May 16, 2007 at 11:44:13AM -0400, Jamie Guinan wrote:
> On Wed, 16 May 2007, brad noyes wrote:
> Hi,
> 
> Do you mean MB/s as in MegaBytes/second?  Lower-case "b" implies 
> "bits" to me, but I suspect you meant bytes.  :)
> 
Yes, your assumption is correct. I never quite understood the capitalization
conventions when it came to computers.

> Anyway, if you know your total data set is going to exceed your system 
> memory, which will largely get used as page cache, you might as well 
> open the output with O_SYNC and write straight out.  Then the big 
> buffer in the writer thread can go away, and you can just queue chunks 
> between the input and writer.
> 
> A good experiment would be to run the writer thread with dummy input 
> data and see what kind of throughput you get.
> 
> You could try ext2 (no journalling), or tweaking some of the 
> journalling options in ext3 (data=journal/ordered/writeback, see "man 
> mount").
> 
Great idea. I forgot about tune2fs and mount options. Thanks.
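
For anyone following along, here is a rough sketch of the dummy-input experiment suggested above: a writer thread that opens the output with O_SYNC and drains a small bounded queue of chunks instead of holding one big buffer. This is only an illustration, not my actual code. The names (writer, run_benchmark), the 1 MiB chunk size, and the queue depth of 4 are all assumptions picked for the example, and there is no error handling for a writer that fails mid-run.

```python
import os
import queue
import tempfile
import threading
import time

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk (assumed; tune to the real input)
O_SYNC = getattr(os, "O_SYNC", 0)  # falls back to 0 where O_SYNC is unavailable


def writer(path, chunks, result):
    """Drain chunks from the queue and write them straight out with O_SYNC."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_SYNC, 0o644)
    total = 0
    try:
        while True:
            chunk = chunks.get()
            if chunk is None:  # sentinel: the producer is done
                break
            view = memoryview(chunk)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                total += n
    finally:
        os.close(fd)
    result["bytes"] = total


def run_benchmark(path, num_chunks=16):
    """Feed dummy chunks to the writer thread; return (bytes_written, MB/s)."""
    chunks = queue.Queue(maxsize=4)  # small bounded queue, not one big buffer
    result = {}
    dummy = os.urandom(CHUNK_SIZE)  # random bytes as stand-in input data
    thread = threading.Thread(target=writer, args=(path, chunks, result))
    start = time.perf_counter()
    thread.start()
    for _ in range(num_chunks):
        chunks.put(dummy)  # blocks when the writer falls behind
    chunks.put(None)
    thread.join()
    elapsed = time.perf_counter() - start
    return result["bytes"], result["bytes"] / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written, rate = run_benchmark(os.path.join(tmp, "dummy.out"))
        print(f"wrote {written} bytes at {rate:.1f} MB/s")
```

To get a meaningful number, point the path at the filesystem under test rather than a temp directory, and raise num_chunks until the total comfortably exceeds system memory.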

> If you have enough CPU bandwidth, and your input stream has enough 
> redundancy, you could gzip it before writing, which might reduce your 
> output bandwidth requirements.
> 
There isn't enough redundancy in the data. It's essentially random data, so it
won't compress well at all.
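
A quick way to check this on a sample is to compare compressed and original sizes. The sketch below uses Python's zlib module (the same DEFLATE compression gzip uses), with os.urandom standing in for the real stream. The helper name compression_ratio and the 1 MiB sample size are just assumptions for the example.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    sample_size = 1 << 20  # 1 MiB sample (assumed size)
    random_sample = os.urandom(sample_size)  # stand-in for the real stream
    redundant_sample = b"abcd" * (sample_size // 4)  # repetitive contrast case
    print(f"random:    {compression_ratio(random_sample):.3f}")
    print(f"redundant: {compression_ratio(redundant_sample):.3f}")
```

A ratio close to 1.0 on a real sample would mean compression costs CPU time without saving any output bandwidth.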

> Hope this helps, keep us posted.
> 
Thanks for the input.

 -- brad

