[Wlug] large file performance on linux

Jeff Moyer jmoyer at redhat.com
Wed May 16 11:38:09 EDT 2007

==> On Wed, 16 May 2007 14:53:16 +0000, brad noyes <maitre at ccs.neu.edu> said:

brad> Hello All,
brad> I am seeing some really slow performance regarding large files on linux. I
brad> write a lot of data points from a light sensor. The stream is about 53 Mb/s and
brad> i need to keep this rate for 7 minutes, that's a total of about 22Gb. I
brad> can sustain 53Mb/s pretty well until the file grows to over 1Gb or so, then
brad> things hit the wall and the writes to the filesystem can't keep up. The writes
brad> go from 20ms in duration to 500ms. I assume the filesystem/operating system 
brad> is caching writes. Do you have any suggestions on how to speed up performance 
brad> on these writes, filesystem options, kernel options, other strategies, etc?

Of course.  Your data set is larger than the page cache, so once the
amount of dirty data hits the write-back threshold, the kernel starts
flushing it to disk and your writes have to wait behind that.  You can
deal with this a few different ways, and I'll throw out the easiest
ways first:
1) Get more memory
2) Get a faster disk

If those are not options, then you can tweak your application by using
AIO and O_DIRECT.  This will allow you to drive your disk queue depths
a bit further and avoid the page cache.  Check the man pages for
io_setup, io_submit, and io_getevents to get started.
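
A minimal sketch of that flow, in case it helps to see the calls in
order.  It opens a file with O_DIRECT, queues a few aligned writes with
io_submit, and reaps them with io_getevents.  The function name
write_blocks_aio, the sys_* wrappers, and the QUEUE_DEPTH, BLOCK_SIZE
and ALIGNMENT values are all made up for illustration, and it calls the
raw syscalls rather than assuming libaio is installed:

```c
#define _GNU_SOURCE                 /* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define QUEUE_DEPTH 4               /* hypothetical: writes kept in flight */
#define BLOCK_SIZE  (64 * 1024)     /* hypothetical: bytes per write */
#define ALIGNMENT   4096            /* a multiple of 512 */

/* glibc has no wrappers for these, so call the raw syscalls. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **cbs)
{ return syscall(SYS_io_submit, ctx, n, cbs); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *tmo)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, tmo); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }

/* Writes QUEUE_DEPTH blocks to path; returns bytes written, or -1. */
long write_blocks_aio(const char *path)
{
    struct iocb cbs[QUEUE_DEPTH];
    struct iocb *ptrs[QUEUE_DEPTH];
    struct io_event events[QUEUE_DEPTH];
    void *bufs[QUEUE_DEPTH] = { NULL };
    aio_context_t ctx = 0;
    long total = 0, submitted, done = 0;
    int i, fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0)
        return -1;
    if (sys_io_setup(QUEUE_DEPTH, &ctx) < 0) {
        close(fd);
        return -1;
    }

    for (i = 0; i < QUEUE_DEPTH; i++) {
        /* O_DIRECT needs an aligned buffer, offset and length. */
        if (posix_memalign(&bufs[i], ALIGNMENT, BLOCK_SIZE) != 0) {
            bufs[i] = NULL;
            total = -1;
            goto out;
        }
        memset(bufs[i], 'A' + i, BLOCK_SIZE);
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_lio_opcode = IOCB_CMD_PWRITE;
        cbs[i].aio_buf = (uint64_t)(uintptr_t)bufs[i];
        cbs[i].aio_nbytes = BLOCK_SIZE;
        cbs[i].aio_offset = (int64_t)i * BLOCK_SIZE;
        ptrs[i] = &cbs[i];
    }

    /* Hand all the writes to the kernel at once. */
    submitted = sys_io_submit(ctx, QUEUE_DEPTH, ptrs);
    if (submitted <= 0) {
        total = -1;
        goto out;
    }

    /* Reap completions until every submitted write has finished. */
    while (done < submitted) {
        long n = sys_io_getevents(ctx, 1, QUEUE_DEPTH, events, NULL);
        if (n <= 0) {
            total = -1;
            goto out;
        }
        for (i = 0; i < n; i++) {
            if (events[i].res < 0) {
                total = -1;
                goto out;
            }
            total += events[i].res;
        }
        done += n;
    }
    /* Treat a partial submit or a short write as a failure. */
    if (total != (long)QUEUE_DEPTH * BLOCK_SIZE)
        total = -1;
out:
    sys_io_destroy(ctx);            /* waits for anything still in flight */
    for (i = 0; i < QUEUE_DEPTH; i++)
        free(bufs[i]);
    close(fd);
    return total;
}
```

A real capture loop would not stop after one batch.  As each write
completes, it would refill that buffer with new sensor data and submit
it again at the next file offset, so the disk queue never drains.  Note
that open() or the writes will fail on file systems that do not support
O_DIRECT, in which case this returns -1.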

brad> Things I have tried:
brad>  - I have tried this on a ext3 file system as well as an xfs filesystem 
brad>    with the same result.

You may not want to use a journalled file system.  If you must,
though, with ext3 you could try running with the data=writeback
mount option.

brad>  - I have also tried spooling over several files (a la multiple volumes) 
brad>    but i see no difference in performance. In fact, i think this actually
brad>    hinders performance a bit.

I'm not sure I fully understand what you mean.  Are you saying you
write to separate physical volumes, and that you don't see any
performance increase from doing so?

brad>  - I keep my own giant memory buffer where all the data is stored and 
brad>    then it is written to disk in a background thread. This helps, but
brad>    i run out of space in the buffer before i finish taking data.

Right, this is exactly what happens in the OS.  ;) Speaking of which,
you don't mention which kernel you are using.  Could you please
provide that information?  There are a few vm tunables that you could
try tweaking, but I really don't think they will help if your data set
is larger than memory.  We can explore that option, though, if you
like.
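
If you want to see where those tunables currently stand, here is a
small sketch that reads them from /proc/sys/vm.  Which tunables matter
is my assumption, not something stated above: dirty_background_ratio
and dirty_ratio are the usual ones that control when write-back starts.
The helper name read_vm_tunable is made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Reads an integer tunable from /proc/sys/vm/<name>.
 * Returns its value, or -1 if the file is missing or unreadable.
 */
long read_vm_tunable(const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;

    if (snprintf(path, sizeof path, "/proc/sys/vm/%s", name)
            >= (int)sizeof path)
        return -1;              /* name too long for the buffer */

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;             /* file did not start with a number */
    fclose(f);
    return value;
}
```

For example, read_vm_tunable("dirty_ratio") returns the percentage of
memory that may be dirty before writers are made to wait.  Changing the
values needs root and a write to the same files.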

For now, my suggestion is to try using AIO with the open flag
O_DIRECT.  This will require you to align your buffers and file
offsets on 512-byte boundaries (and the size of each I/O has to be a
multiple of 512 as well).  If you need any help converting your app,
feel free to contact me off-list.
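
One way to meet the alignment rule is to allocate with posix_memalign
and round every length up to the next multiple of 512.  The sketch
below does just that.  The names DIO_ALIGN, dio_round_up and dio_alloc
are made up for illustration, and 512 is the figure from the paragraph
above; a device with a larger logical block size would need a larger
value:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIO_ALIGN 512   /* alignment from the text; adjust per device */

/* Rounds len up to the next multiple of DIO_ALIGN. */
size_t dio_round_up(size_t len)
{
    return (len + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
}

/*
 * Allocates a zeroed buffer that is DIO_ALIGN-aligned and whose size is
 * a multiple of DIO_ALIGN.  Stores the padded size in *padded_len when
 * that pointer is non-NULL.  Returns NULL on failure; free() the result.
 */
void *dio_alloc(size_t len, size_t *padded_len)
{
    void *buf = NULL;
    size_t padded = dio_round_up(len);

    if (padded == 0)
        padded = DIO_ALIGN;     /* always hand back a usable block */
    if (posix_memalign(&buf, DIO_ALIGN, padded) != 0)
        return NULL;
    memset(buf, 0, padded);     /* padding bytes are written as zeros */
    if (padded_len)
        *padded_len = padded;
    return buf;
}
```

A 1000-byte record, for instance, ends up in a 1024-byte buffer, so
the file grows by 1024 bytes per write.  If the padding matters, the
reader has to know the real record length.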


p.s.  In your head, is Mb Megabit or Megabyte?
