Optimal use of glibc's fseek()
The fseek()
function is the standard (C89 and later)
call for positioning the file position indicator for a stream. At
first glance it seems like a very light-weight function, as it does
no I/O. Its one obvious problem is the use of long
,
not off_t
, for the offset, so on those 64 bit
platforms on which long
is
only 32 bits, one cannot seek by more than +/-2GB.
But fseek()
as implimented in glibc is not at all
light-weight. It suffers from two potentially-serious performance
issues. Sufficiently serious that I have just caused an analysis code
to speed up by a factor of over twenty by combining
large fseek()
s, and replacing small ones
by fread()
s.
fseek()
can imply fread()
In order to optimise the fread()
following
an fseek()
, glibc usually seeks to a multiple of the
page size, and then reads any amount left over. The result of this
read is stored in an internal buffer.
So, if to read a file of fixed-length records, one writes code such as
for i=1 to num_records if (is_wanted(i)) fread(buffer,1,record_length,stream) process(buffer) else fseek(stream,record_length,SEEK_CUR) endif enddo
then it is likely that a read of a fraction of a page-size will occur at the start of every record, even those that are being skipped. If most records are skipped, it may be significantly faster to write
skip_bytes=0 for i=1 to num_records if (is_wanted(i)) if (skip_bytes) fseek(stream,skip_bytes,SEEK_CUR) skip_bytes=0 endif fread(buffer,1,record_length,stream) process(buffer) else skip_bytes+=record_length endif enddo if (skip_bytes) fseek(stream,skip_bytes,SEEK_CUR) skip_bytes=0 endif
Small seeks are slow
Glibc does much buffering. If one attempts to read a small number
of bytes, glibc will read a page or so into an internal buffer, then
return the data requested. If this is followed by another small
read, then no further read is needed, for the data are already in
glibc's internal buffer. So a small read can be very fast, with no
system call being made by glibc. However, fseek()
always results in a system call. In other words
int junk; fread(&junk,1,4,stream);
is likely to be much faster than
fseek(stream,4,SEEK_CUR);
Two demonstrations
Large seeks
#include<stdio.h> #include<stdlib.h> int main(void) { int i; FILE* file; file=fopen("/bin/sh","r"); for(i=0;i<5;i++) fseek(file,10000,SEEK_CUR); fclose(file); return 0; }
A silly piece of code which seeks on the file /bin/sh which is bound to exist and be readable.
$ gcc test.c $ strace ./a.out [...] openat(AT_FDCWD, "/bin/sh", O_RDONLY) = 3 newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=125688, ...}, AT_EMPTY_PATH) = 0 lseek(3, 10000, SEEK_CUR) = 10000 lseek(3, 16384, SEEK_SET) = 16384 read(3, "\363\17\36\372H"..., 3616) = 3616 lseek(3, 28672, SEEK_SET) = 28672 read(3, "1\322H\2155\237"..., 4096) = 4096 lseek(3, 36864, SEEK_SET) = 36864 read(3, "\362\0\0\17\37D"..., 4096) = 4096 lseek(3, 49152, SEEK_SET) = 49152 read(3, "\0H\203x\10\0\17"..., 4096) = 4096 close(3) = 0
The first seek is the expected seek of 10000. The next becomes a seek to 16384 plus a read of 3616 bytes (16384+3616=20000). The next becomes a seek to 28672 plus a read of 4096 bytes, which seems like an overshoot on the target of 30000, although glibc will correctly account for this. It is surprising to see four reads from a code which does no reads whatsoever.
Lesson: avoid calling fseek
after fseek
. After fseek
one should
call fread
(or fwrite
).
Small seeks
#include<stdio.h> #include<stdlib.h> int main(void) { char c; int i; FILE* file; file=fopen("/bin/sh","r"); for(i=0;i<10;i++) fseek(file,4,SEEK_CUR); fread(&c,1,1,file); printf("Byte read was %d\n",(int)c); fclose(file); return 0; }
Again a silly piece of code which seeks on the file /bin/sh which is bound to exist and be readable.
$ gcc test.c $ strace ./a.out [...] openat(AT_FDCWD, "/bin/sh", O_RDONLY) = 3 lseek(3, 4, SEEK_CUR) = 4 lseek(3, 0, SEEK_SET) = 0 read(3, "\177ELF\2\1\1\0", 8) = 8 lseek(3, 0, SEEK_SET) = 0 read(3, "\177ELF\2\1\1\0\0"..., 4096) = 4096 lseek(3, 4096, SEEK_SET) = 4096 lseek(3, 4096, SEEK_SET) = 4096 lseek(3, 4096, SEEK_SET) = 4096 lseek(3, 4096, SEEK_SET) = 4096 lseek(3, 4096, SEEK_SET) = 4096 lseek(3, 4096, SEEK_SET) = 4096 lseek(3, 4096, SEEK_SET) = 4096 write(1, "Byte read was 120\n", 18) = 18
So the first seek of four bytes was issued as a seek of four bytes. The next seek of four bytes resulted in seeking to the beginning of the file, and then reading eight bytes. The next seek of four bytes resulted in seeking back to the beginning of the file, and then reading 4096 bytes. The final seven seeks all resulted in a seek to an offset of 4096 bytes. The actual read in the code of just one byte resulted in no read at all, because that byte could be returned by the read produced by the third seek.
Lesson: very small fseek()
s may be better replaced
by fread()
s to a dummy buffer. Small fread()
s
are well optimised and do not result in more system calls than
necessary.
Glib's code
The relevant code from glibc, from wfileops.c
, is
something like:
/* Try to seek to a block boundary, to improve kernel page management. */ new_offset = offset & ~(_IO_buf_end - _IO_buf_base - 1); delta = offset - new_offset; _IO_SYSSEEK (fp, new_offset, 0); if (delta) _IO_SYSREAD (fp, _IO_buf_base, _IO_buf_end - _IO_buf_base);
but with decent error-checking, and many other issues taken care of too.
Further reading
The above was greatly aided by comments in this Stack Overflow article.