[Wlug] large file performance on linux

brad noyes maitre at ccs.neu.edu
Wed May 16 12:26:53 EDT 2007


On Wed, May 16, 2007 at 11:38:09AM -0400, Jeff Moyer wrote:
> ==> On Wed, 16 May 2007 14:53:16 +0000, brad noyes <maitre at ccs.neu.edu> said:
> 
> brad> Hello All,
> brad> I am seeing some really slow performance regarding large files on linux. I
> brad> write a lot of data points from a light sensor. The stream is about 53 Mb/s and
> brad> i need to keep this rate for 7 minutes, that's a total of about 22Gb. I
> brad> can sustain 53Mb/s pretty well until the file grows to over 1Gb or so, then
> brad> things hit the wall and the writes to the filesystem can't keep up. The writes
> brad> go from 20ms in duration to 500ms. I assume the filesystem/operating system 
> brad> is caching writes. Do you have any suggestions on how to speed up performance 
> brad> on these writes, filesystem options, kernel options, other strategies, etc?
> 
> Of course.  Your data set is larger than the page cache, so when you
> hit the low watermark, it starts write-back.  You can deal with this a
> few different ways, and I'll throw out the easiest ways first:
> 1) Get more memory
> 2) Get a faster disk
> 
Ha :).  I have 12GB of memory, which actually brings me to another question.
How do I alter the per-process memory limit? I can only allocate a memory
buffer of 3GB. I'd like to make use of the other 8GB left in the machine.
If I can double my buffer size, I think I could sustain the 53MB/s for the 7
minutes that I need.
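
As a rough check on that estimate, here is a minimal sketch of the buffer
arithmetic. The 53MB/s rate and the 7 minute duration come from the post; the
40MB/s sustained disk rate is purely an assumed figure for illustration, not
a measurement from this machine, and the function names are hypothetical.

```python
def buffer_needed_mb(in_rate_mb_s, drain_rate_mb_s, seconds):
    """Memory (MB) needed to absorb the backlog when data arrives faster
    than the disk can drain it for the whole acquisition."""
    backlog_rate = max(0.0, in_rate_mb_s - drain_rate_mb_s)
    return backlog_rate * seconds


def min_drain_rate_mb_s(in_rate_mb_s, buffer_mb, seconds):
    """Slowest sustained disk rate (MB/s) that a buffer of the given size
    can cover without overflowing before the acquisition ends."""
    return max(0.0, in_rate_mb_s - buffer_mb / seconds)


IN_RATE = 53.0          # MB/s, from the post
DURATION = 7 * 60       # seconds, from the post
ASSUMED_DRAIN = 40.0    # MB/s, hypothetical sustained write rate

total_mb = IN_RATE * DURATION                                   # whole data set
needed_mb = buffer_needed_mb(IN_RATE, ASSUMED_DRAIN, DURATION)  # backlog to hold

print("total data: %.0f MB" % total_mb)
print("buffer needed at %.0f MB/s drain: %.0f MB" % (ASSUMED_DRAIN, needed_mb))
for buf_gb in (3, 6):
    rate = min_drain_rate_mb_s(IN_RATE, buf_gb * 1024, DURATION)
    print("%d GB buffer needs the disk to sustain %.1f MB/s" % (buf_gb, rate))
```

Under the assumed drain rate a 3GB buffer overflows and a 6GB one does not,
but the real answer depends entirely on what the disk sustains once
write-back starts.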

> If those are not options, then you can tweak your application by using
> AIO and O_DIRECT.  This will allow you to drive your disk queue depths
> a bit further and avoid the page cache.  Check the man pages for
> io_setup, io_submit, and io_getevents to get started.
> 
I'll check out these options and man pages.
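
For reference, a minimal sketch of the O_DIRECT half of that suggestion,
written in Python rather than C for brevity. It does not cover AIO
(io_setup, io_submit, io_getevents). The 4096-byte alignment is an
assumption; the real requirement depends on the device and filesystem. The
function name write_direct and its parameters are hypothetical.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; check what the device/filesystem requires


def write_direct(path, payload, chunk_size=1 << 20):
    """Write payload to path, bypassing the page cache where O_DIRECT is
    supported. Returns (bytes_written, used_direct)."""
    if chunk_size % BLOCK or len(payload) % BLOCK:
        raise ValueError("sizes must be multiples of BLOCK for O_DIRECT")

    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    used_direct = bool(direct)
    try:
        fd = os.open(path, flags | direct, 0o644)
    except OSError:
        # Some filesystems reject O_DIRECT at open time; fall back to
        # ordinary buffered writes so the sketch still runs.
        fd = os.open(path, flags, 0o644)
        used_direct = False

    # Anonymous mmap regions are page-aligned, which gives O_DIRECT the
    # aligned user buffer it needs.
    buf = mmap.mmap(-1, chunk_size)
    view = memoryview(buf)
    written = 0
    try:
        for off in range(0, len(payload), chunk_size):
            n = min(chunk_size, len(payload) - off)
            view[:n] = payload[off:off + n]       # stage into aligned memory
            written += os.write(fd, view[:n])     # short writes not handled
    finally:
        view.release()
        buf.close()
        os.close(fd)
    return written, used_direct
```

A call such as write_direct("data.01", block_of_samples) would be the
synchronous version; driving the disk queue deeper, as suggested above,
needs the AIO calls on top of this.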

> brad> Things I have tried:
> brad>  - I have tried this on a ext3 file system as well as an xfs filesystem 
> brad>    with the same result.
> 
> You may not want to use a journalled file system.  If you must,
> though, with ext3 you could try running with the data=writeback
> option.
> 
Yup. I'll check this option out.

> brad>  - I have also tried spooling over several files (a la multiple volumes) 
> brad>    but i see no difference in performance. In fact, i think this actually
> brad>    hinders performance a bit.
> 
> I'm not sure I fully understand what you mean.  Are you saying you
> write to separate physical volumes, 
>
Not physical volumes, but different files. By the end of the data
acquisition I will end up with the files data.01, data.02, data.03, etc.
Each file is 1GB in size, or whatever I set the limit to be. The reason I did
this is that I thought that as a file grows larger, there are several
layers of indirection in the inode to get to the actual data blocks on disk,
and perhaps that might hinder performance.

> and that you don't see any performance increase from doing so?
> 
Correct. I don't see any improvement, or at least no measurable performance
improvement at the kind of rates I'm dealing with.

> brad>  - I keep my own giant memory buffer where all the data is stored and 
> brad>    then it is written to disk in a background thread. This helps, but
> brad>    i run out of space in the buffer before i finish taking data.
> 
> Right, this is exactly what happens in the OS.  ;) Speaking of which,
> you don't mention which kernel you are using.  Could you please
> provide that information? There are a few vm tunables that you could
> try tweaking, but I really don't think they will help if your data set
> is larger than memory.  We can explore that option, though, if you
> like.
> 
I'm using the 2.6.20 kernel from the Ubuntu source tree. I recompiled it to get
the large memory support, up to 64GB.

I was looking for some tunable vm options in sysctl, but I didn't see much that
made sense to me. If nothing else helps, perhaps I will ask about the vm
options.
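
For orientation, the write-back related vm tunables live under /proc/sys/vm.
Below is a minimal sketch that just reads the current values. The four names
listed are the dirty-page knobs found in 2.6 kernels; which of them (if any)
matter for this workload is not established in this thread. The function name
read_vm_tunables is hypothetical.

```python
import os

VM_DIR = "/proc/sys/vm"

# Dirty-page write-back knobs exposed by 2.6 kernels.
DIRTY_TUNABLES = (
    "dirty_background_ratio",     # % of memory dirty before background write-back
    "dirty_ratio",                # % of memory dirty before writers are throttled
    "dirty_expire_centisecs",     # age at which dirty data becomes eligible
    "dirty_writeback_centisecs",  # how often the write-back thread wakes up
)


def read_vm_tunables(names=DIRTY_TUNABLES, vm_dir=VM_DIR):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print("vm.%s = %s" % (name, value))
```

The same values are visible with sysctl vm.dirty_ratio and friends; the sketch
only reads them and changes nothing.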

> 
> p.s.  In your head, is Mb Megabit or Megabyte?
>
The latter. Jamie already pointed this typo out to me :). Perhaps this time
around my unit abbreviations are correct.

Thanks for your input. I'll keep the list posted.
  -- Brad


