File handling routines


Detailed Description

This group contains various low-level and high-level routines for operating on files.

Data Structures

struct  fid_Mappedfile
 Representation of a memory mapped file.
struct  fid_Filenamebuffer
 Buffer for creating filenames from basename and extension.

Defines

#define fid_MAPPEDFILE_GROWSIZE   ((size_t)524288)
 Number of bytes to append to a file when running out of space.
#define fid_CAST_POINTER(P, T)   ((T *)((void *)(P)))
 Cast pointer into pointer of different type.
#define fid_MAPPEDFILE_IS_WRITABLE(MF)   (((MF)->mmap_prot&PROT_WRITE) != 0)
 Check if the mapped file is writable.
#define fid_MAPPEDFILE_IS_FAKED(MF)   ((MF)->content != NULL && (MF)->fd == -1)
 Check if the mapped file is really a file or not.
#define fid_MAPPEDFILE_CHECKSPACE(MF, N, GROW, ERR, ERRCODE)
 Make sure that the mapped file has space to store N bytes.
#define fid_MAPPEDFILE_APPEND_UNSAFE(MF, TYPE, VAL)
 Append value of given type to end of file.
#define fid_MAPPEDFILE_APPEND_GROW(MF, TYPE, VAL, ERR, ERRCODE)
 Append value of given type to end of file, increase size when needed.
#define fid_MAPPEDFILE_APPEND_TRY(MF, TYPE, VAL, ERR, ERRCODE)
 Append value of given type to end of file if a file is open, increase size when needed.

Functions

int fid_create_online_files (fid_Sequences *seqs, const fid_Alphabet *alpha, const char *basefilename, fid_Tablerequest tables, fid_Uintsize uisize, fid_Error *error)
 Create files according to requested tables.
int fid_file_map (fid_Mappedfile *mfile, const char *filename, int writable, int may_prefetch, fid_Error *error)
 Map existing file into memory.
int fid_file_new (fid_Mappedfile *mfile, const char *filename, fid_Error *error)
 Create new file and map it writable into memory.
int fid_file_allocate (fid_Mappedfile *mfile, const char *filename, size_t size, fid_Error *error)
 Create new file of a given initial size and map it writable into memory.
void fid_file_fake (fid_Mappedfile *mfile, void *block, size_t size)
 Create a fake mapped file from dynamic array data.
int fid_file_grow_by_size (fid_Mappedfile *mfile, size_t size, fid_Error *error)
 Make memory mapped file larger.
int fid_file_ensure_free_space (fid_Mappedfile *mfile, size_t size, fid_Error *error)
 Make sure that a certain number of bytes are free in the file.
int fid_file_write (fid_Mappedfile *mfile, fid_Error *error, const char *fmt,...)
 Write formatted string to mapped file.
int fid_file_dump_to_file (const fid_Mappedfile *mfile, const char *filename, fid_Error *error)
 Write content of a (fake) mapped file to new file.
int fid_file_make_readonly (fid_Mappedfile *mfile, fid_Error *error)
 Make memory mapped file read-only.
void fid_file_cleanup (fid_Mappedfile *mfile)
 Close memory mapped file and remove it from disk.
void fid_file_prefetch (const fid_Mappedfile *mfile, int smart)
 Read whole mapped file to fill the operating system's file cache.
void fid_file_unmap (fid_Mappedfile *mfile)
 Close memory mapped file.
int fid_filenamebuffer_init (fid_Filenamebuffer *fnamebuf, const char *filename, fid_Error *error)
 Allocate some memory to hold the generated filenames.
int fid_filenamebuffer_init_local (fid_Filenamebuffer *fnamebuf, fid_Filenamebuffer **extbuffer, const char *filename, fid_Error *error)
 Allocate file name buffer if necessary.
void fid_filenamebuffer_free (fid_Filenamebuffer *fnamebuf)
 Free file name buffer.
char * fid_filename_create (const char *basefilename, const char *fileext, fid_Error *error)
 Construct filename from base name and extension.

Define Documentation

#define fid_MAPPEDFILE_GROWSIZE   ((size_t)524288)

Number of bytes to append to a file when running out of space.

Definition at line 91 of file fileutils.h.

Referenced by fid_file_ensure_free_space(), and fid_file_new().

#define fid_CAST_POINTER ( P,
T   )     ((T *)((void *)(P)))

Cast pointer into pointer of different type.

With gcc 3.1, a bare cast from a pointer to unsigned char to a pointer to unsigned int yields a warning (cast increases required alignment of target type), hence the macro casts to void * first to silence the compiler in these cases.
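The following is a minimal sketch of how the macro might be used. The helper read_uint_at() is hypothetical and not part of the library. Note that the intermediate cast only suppresses the warning; the caller remains responsible for making sure the address is suitably aligned for the target type.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the definition above. */
#define fid_CAST_POINTER(P, T)   ((T *)((void *)(P)))

/* Hypothetical helper: read an unsigned int stored at a byte offset.
 * Assumes that bytes itself is aligned for unsigned int, so that only
 * the offset needs to be checked. */
static unsigned int read_uint_at(unsigned char *bytes, size_t offset)
{
  assert(offset % sizeof(unsigned int) == 0);
  return *fid_CAST_POINTER(bytes + offset, unsigned int);
}
```
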

Definition at line 100 of file fileutils.h.

Referenced by fid_suffixarray_find_large_lcp().

#define fid_MAPPEDFILE_IS_WRITABLE ( MF   )     (((MF)->mmap_prot&PROT_WRITE) != 0)

Check if the mapped file is writable.

The check tests the PROT_WRITE bit of the fid_Mappedfile::mmap_prot field.

Parameters:
MF A pointer to a fid_Mappedfile structure.
Returns:
True if the file is writable, false otherwise.

Definition at line 111 of file fileutils.h.

Referenced by fid_file_ensure_free_space(), fid_file_grow_by_size(), and fid_file_make_readonly().

#define fid_MAPPEDFILE_IS_FAKED ( MF   )     ((MF)->content != NULL && (MF)->fd == -1)

Check if the mapped file is really a file or not.

Parameters:
MF A pointer to a fid_Mappedfile structure.
Returns:
True if the file is not a real file, but a faked one. This means that the content pointer points to allocated memory that has been obtained via malloc(), not via mmap().

Definition at line 122 of file fileutils.h.

#define fid_MAPPEDFILE_CHECKSPACE ( MF,
N,
GROW,
ERR,
ERRCODE   ) 

Value:

if((MF)->occupied+(N) > (MF)->allocated)\
  {\
    if(fid_file_grow_by_size(MF,GROW,ERR) == -1)\
    {\
      ERRCODE;\
    }\
  }
Make sure that the mapped file has space to store N bytes.

When running out of space, i.e., if the file is too short to store N bytes, the file length will be increased by GROW bytes.

Parameters:
MF A pointer to a fid_Mappedfile structure.
N Requested size.
GROW Number of bytes to add to mapped file if too short.
ERR Pointer to a fid_Error structure.
ERRCODE Code to be executed in case of an error condition.
See also:
This macro uses fid_file_grow_by_size(); consider reading the notes attached to that function.
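A minimal usage sketch follows, showing how ERRCODE can be a statement such as return -1. Everything except the macro body is an assumption: struct demo_file and struct demo_error are hypothetical stand-ins, and the fid_file_grow_by_size() defined here is a realloc()-based placeholder that merely carries the name the macro expects. It is not the library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins holding only the fields the macro touches. */
struct demo_file { unsigned char *content; size_t occupied; size_t allocated; };
struct demo_error { int code; };

/* Placeholder under the name the macro calls; the real function re-maps
 * a file, this one just reallocates a heap block. */
static int fid_file_grow_by_size(struct demo_file *mf, size_t size,
                                 struct demo_error *err)
{
  unsigned char *moved = realloc(mf->content, mf->allocated + size);

  if(moved == NULL)
  {
    err->code = 1;
    return -1;
  }
  mf->content = moved;
  mf->allocated += size;
  return 0;
}

/* Macro body as shown under "Value" above. */
#define fid_MAPPEDFILE_CHECKSPACE(MF, N, GROW, ERR, ERRCODE) \
  if((MF)->occupied+(N) > (MF)->allocated)\
  {\
    if(fid_file_grow_by_size(MF,GROW,ERR) == -1)\
    {\
      ERRCODE;\
    }\
  }

/* Hypothetical caller: append one byte, growing by 16 bytes when full. */
static int demo_put_byte(struct demo_file *mf, unsigned char value,
                         struct demo_error *err)
{
  assert(mf != NULL && err != NULL);
  fid_MAPPEDFILE_CHECKSPACE(mf, 1, 16, err, return -1);
  mf->content[mf->occupied++] = value;
  return 0;
}
```

As the macro body shows, the file grows by GROW bytes exactly once per check, so GROW should be at least N for the space guarantee to hold.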

Definition at line 139 of file fileutils.h.

Referenced by fid_file_write().

#define fid_MAPPEDFILE_APPEND_UNSAFE ( MF,
TYPE,
VAL   ) 

Value:

*fid_CAST_POINTER(&(MF)->content[(MF)->occupied],TYPE)=(TYPE)(VAL);\
  (MF)->occupied+=sizeof(TYPE)
Append value of given type to end of file.

Use this macro only if you are absolutely sure that the file's allocated size is not exceeded by appending the value to it. Use fid_MAPPEDFILE_APPEND_GROW() or fid_MAPPEDFILE_APPEND_TRY() otherwise.

Parameters:
MF A pointer to a fid_Mappedfile structure.
TYPE Type of the value to be stored. This should be a basic C type, no structure.
VAL The value to be stored. It will be cast to TYPE by this macro.
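The sketch below illustrates one way to satisfy the precondition: check the remaining space once, then append in a loop. struct demo_file and demo_append_range() are hypothetical, and the field types are assumed; only the two macro bodies are taken from this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with only the fields the macro touches. */
struct demo_file { unsigned char *content; size_t occupied; size_t allocated; };

/* Macro bodies as shown on this page. */
#define fid_CAST_POINTER(P, T)   ((T *)((void *)(P)))
#define fid_MAPPEDFILE_APPEND_UNSAFE(MF, TYPE, VAL) \
  *fid_CAST_POINTER(&(MF)->content[(MF)->occupied],TYPE)=(TYPE)(VAL);\
  (MF)->occupied+=sizeof(TYPE)

/* Append count consecutive integers starting at first.
 * Returns 0 on success, -1 if the values would not fit. */
static int demo_append_range(struct demo_file *mf, unsigned int first,
                             size_t count)
{
  size_t i;

  /* The write position must stay aligned for unsigned int. */
  assert(mf->occupied % sizeof(unsigned int) == 0);

  /* One space check up front makes the unchecked appends safe. */
  if(count > (mf->allocated - mf->occupied) / sizeof(unsigned int))
    return -1;

  for(i = 0; i < count; ++i)
  {
    /* Braces are required: the macro expands to two statements. */
    fid_MAPPEDFILE_APPEND_UNSAFE(mf, unsigned int, first + i);
  }
  return 0;
}
```
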

Definition at line 161 of file fileutils.h.

#define fid_MAPPEDFILE_APPEND_GROW ( MF,
TYPE,
VAL,
ERR,
ERRCODE   ) 

Value:

Append value of given type to end of file, increase size when needed.

When running out of space, i.e., if the file is too short to store the given value, the file length will be increased by fid_MAPPEDFILE_GROWSIZE bytes.

Parameters:
MF A pointer to a fid_Mappedfile structure.
TYPE Type of the value to be stored. This should be a basic C type, no structure.
VAL The value to be stored. It will be cast to TYPE by this macro.
ERR Pointer to a fid_Error structure.
ERRCODE Code to be executed in case of an error condition.

Definition at line 180 of file fileutils.h.

#define fid_MAPPEDFILE_APPEND_TRY ( MF,
TYPE,
VAL,
ERR,
ERRCODE   ) 

Value:

if((MF)->content != NULL)\
  {\
    fid_MAPPEDFILE_APPEND_GROW(MF,TYPE,VAL,ERR,ERRCODE);\
  }
Append value of given type to end of file, increase size when needed.

When running out of space, i.e., if the file is too short to store the given value, the file length will be increased by fid_MAPPEDFILE_GROWSIZE bytes. The macro does nothing if the file content is NULL, i.e., if no open file is associated with MF, hence this macro is safe to be used at any time.

Parameters:
MF A pointer to a fid_Mappedfile structure.
TYPE Type of the value to be stored. This should be a basic C type, no structure.
VAL The value to be stored. It will be cast to TYPE by this macro.
ERR Pointer to a fid_Error structure.
ERRCODE Code to be executed in case of an error condition.

Definition at line 201 of file fileutils.h.


Function Documentation

int fid_create_online_files ( fid_Sequences *  seqs,
const fid_Alphabet *  alpha,
const char *  basefilename,
fid_Tablerequest  tables,
fid_Uintsize  uisize,
fid_Error *  error 
)

Create files according to requested tables.

The files will be created writable with an initial size of fid_MAPPEDFILE_GROWSIZE. Existing files will be overwritten.

Parameters:
seqs Reference to the parsed sequences stored on disk in binary form.
alpha The alphabet that is used to transform the textual data into binary.
basefilename The base name of the generated tables.
tables Bitvector of requested tables. Any combination of requests is allowed.
uisize The size of integers stored in the files.
error Error messages go here.
Returns:
0 on success, -1 on error.

Definition at line 82 of file createfiles.c.

References fid_Mappedfile::content, fid_Sequences::desfile, fid_file_cleanup(), fid_filenamebuffer_free(), fid_filenamebuffer_init(), fid_sequences_init(), fid_TABLE_DES, fid_TABLE_NONE, fid_TABLE_OIS, fid_TABLE_TIS, fid_Sequences::oisfile, fid_Sequences::sdsfile, fid_Sequences::sspfile, and fid_Sequences::tisfile.

Referenced by fid_sequences_parse_from_memory_to_file().

int fid_file_map ( fid_Mappedfile *  mfile,
const char *  filename,
int  writable,
int  may_prefetch,
fid_Error *  error 
)

Map existing file into memory.

Parameters:
mfile The structure that represents a memory mapped file.
filename The name of the file to be opened.
writable Map file in read-only mode if 0, or writable if not 0.
may_prefetch Prefetch file automatically if non-zero, and if allowed by global settings (controlled via environment variable, see fid_options_parse()).
error Errors go here.
Returns:
0 on success, -1 otherwise.

Definition at line 226 of file fileutils.c.

References fid_file_prefetch(), fid_options_parse(), fid_options_prefetch, and fid_options_smart_prefetch.

Referenced by fid_alphabet_init_from_specfile(), and fid_projectfile_parse_from_file().

int fid_file_new ( fid_Mappedfile *  mfile,
const char *  filename,
fid_Error *  error 
)

Create new file and map it writable into memory.

The initial file size is determined by the compile time constant fid_MAPPEDFILE_GROWSIZE. It must be truncated to the desired size after it has been filled. Note that this is done automatically by functions like fid_file_make_readonly() and fid_file_unmap().

Parameters:
mfile The structure that represents a memory mapped file.
filename The name of the file to be created.
error Errors go here.
Returns:
0 on success, -1 otherwise.

Definition at line 259 of file fileutils.c.

References fid_MAPPEDFILE_GROWSIZE.

Referenced by fid_projectfile_write().

int fid_file_allocate ( fid_Mappedfile *  mfile,
const char *  filename,
size_t  size,
fid_Error *  error 
)

Create new file of a given initial size and map it writable into memory.

The same as fid_file_new(), but the initial file size can be set by the caller.

Parameters:
mfile The structure that represents a memory mapped file.
filename The name of the file to be created.
size Initial size of the new file. This value must be greater than 0.
error Errors go here.
Returns:
0 on success, -1 otherwise.

Definition at line 280 of file fileutils.c.

Referenced by fid_alphabet_write_to_file().

void fid_file_fake ( fid_Mappedfile *  mfile,
void *  block,
size_t  size 
)

Create a fake mapped file from dynamic array data.

A fake mapped file is basically a dynamic array stored as a mapped file. This concept has been introduced to enable parsing sequence data to memory, using dynamic arrays, and using that parsed data just like a mapped file. Please note that most functions that perform low-level operations on mapped files do not, and probably never will, work on fake mapped files.

When fid_file_unmap() is called on a fake mapped file, the pointer block passed to this function will be free()'ed. This is purely for convenience, so that client code needs no special case to distinguish real from fake files.

This function and the internal map_file() are the only initialization functions for fid_Mappedfile structures.

Parameters:
mfile The structure to be initialized.
block A pointer to allocated memory.
size Size of the data pointed to by block, in bytes.

Definition at line 309 of file fileutils.c.

References fid_Mappedfile::allocated, fid_Mappedfile::content, fid_Mappedfile::fd, fid_Mappedfile::filename, fid_Mappedfile::mmap_flags, fid_Mappedfile::mmap_prot, and fid_Mappedfile::occupied.

int fid_file_grow_by_size ( fid_Mappedfile *  mfile,
size_t  size,
fid_Error *  error 
)

Make memory mapped file larger.

This function enlarges the memory mapped file by size more bytes. Note that this requires re-mapping the file and therefore the fid_Mappedfile::content pointer might be changed by this call.

Parameters:
mfile The structure that represents a memory mapped file.
size Number of bytes to append to the file.
error Errors go here.
Note:
Repeated invocation of this function for multiple simultaneously open files may result in heavy fragmentation of virtual address space, and thus unsuccessful calls to mmap() in this or other functions.
Returns:
0 on success, -1 otherwise.
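Because the content pointer may change when the file grows, client code is usually safer storing offsets into the file rather than raw pointers. The sketch below illustrates that idea with hypothetical names (struct demo_file, demo_grow_by_size(), demo_append_bytes()); realloc() stands in for re-mapping, since both may move the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable mapped file. */
struct demo_file { unsigned char *content; size_t occupied; size_t allocated; };

/* Stand-in for growing: realloc() may move the block, just as
 * re-mapping may move the mapping. */
static int demo_grow_by_size(struct demo_file *mf, size_t size)
{
  unsigned char *moved = realloc(mf->content, mf->allocated + size);

  if(moved == NULL)
    return -1;
  mf->content = moved;
  mf->allocated += size;
  return 0;
}

/* Append len bytes and return the offset of the new record,
 * or (size_t)-1 on error. */
static size_t demo_append_bytes(struct demo_file *mf, const void *data,
                                size_t len)
{
  size_t offset;

  assert(mf != NULL && data != NULL);
  offset = mf->occupied;

  if(len > mf->allocated - mf->occupied &&
     demo_grow_by_size(mf, len) == -1)
    return (size_t)-1;

  /* Derive the address only after any growth has happened. */
  memcpy(mf->content + offset, data, len);
  mf->occupied += len;
  return offset;
}
```

Offsets returned by earlier calls remain valid after later growth, whereas pointers computed from the old content value would not.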

Definition at line 340 of file fileutils.c.

References fid_Mappedfile::allocated, fid_Mappedfile::content, fid_Mappedfile::fd, fid_error_throw(), fid_MAPPEDFILE_IS_WRITABLE, fid_Mappedfile::filename, fid_Mappedfile::mmap_flags, fid_Mappedfile::mmap_prot, and fid_Mappedfile::occupied.

Referenced by fid_file_ensure_free_space().

int fid_file_ensure_free_space ( fid_Mappedfile *  mfile,
size_t  size,
fid_Error *  error 
)

Make sure that a certain number of bytes are free in the file.

If there are fewer than size bytes free in the file, make it larger to make sure that at least size bytes will fit into it. Note that this requires re-mapping the file, so the fid_Mappedfile::content pointer may have changed after this call.

Parameters:
mfile The structure that represents a memory mapped file.
size Number of bytes required to be free in the file.
error Error messages go here.
Returns:
0 on success, -1 otherwise.
Note:
Repeated invocation of this function for multiple simultaneously open files may result in heavy fragmentation of virtual address space, and thus unsuccessful calls to mmap() in this or other functions.

Definition at line 417 of file fileutils.c.

References fid_Mappedfile::allocated, fid_Mappedfile::content, fid_error_throw(), fid_file_grow_by_size(), fid_MAPPEDFILE_GROWSIZE, fid_MAPPEDFILE_IS_WRITABLE, fid_Mappedfile::filename, and fid_Mappedfile::occupied.

int fid_file_write ( fid_Mappedfile *  mfile,
fid_Error *  error,
const char *  fmt,
  ... 
)

Write formatted string to mapped file.

This function acts like printf(3) and writes directly to a mapped file. It enlarges the file as necessary, but note that since internally vsnprintf(3) is used, this function has to allocate one byte more than actually necessary because vsnprintf(3) always appends a '\0' character to its output. The fid_Mappedfile::occupied field is updated to not count this extra byte, so it does not need to be adjusted externally.

Additionally, if the file size has to be increased, then it is increased by a bit more than is actually required to decrease the chance of needing more resizes in subsequent calls.

Parameters:
mfile The mapped file that should be written to.
error Error messages go here.
fmt Format string as required by vsnprintf(3), followed by data to be formatted. If this is NULL, then this function does nothing.
Returns:
0 on success, -1 otherwise.
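The bookkeeping around the extra '\0' byte can be sketched as follows. This is not the library's implementation: struct demo_file and demo_write() are hypothetical, a heap buffer replaces the mapped file, and the 64 spare bytes added on growth are an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a writable mapped file. */
struct demo_file { char *content; size_t occupied; size_t allocated; };

/* printf-like append. Returns 0 on success, -1 otherwise. */
static int demo_write(struct demo_file *mf, const char *fmt, ...)
{
  va_list ap;
  int len;
  size_t needed;

  assert(mf != NULL);
  if(fmt == NULL)
    return 0;

  /* First pass: ask vsnprintf() how many characters it would produce. */
  va_start(ap, fmt);
  len = vsnprintf(NULL, 0, fmt, ap);
  va_end(ap);
  if(len < 0)
    return -1;

  /* One byte more than the text itself, for the terminating '\0'. */
  needed = mf->occupied + (size_t)len + 1;
  if(needed > mf->allocated)
  {
    /* Grow by a bit more than required to save later resizes. */
    char *moved = realloc(mf->content, needed + 64);

    if(moved == NULL)
      return -1;
    mf->content = moved;
    mf->allocated = needed + 64;
  }

  /* Second pass: format directly into the free space. */
  va_start(ap, fmt);
  vsnprintf(mf->content + mf->occupied, (size_t)len + 1, fmt, ap);
  va_end(ap);

  /* The '\0' is not counted, so the next write overwrites it. */
  mf->occupied += (size_t)len;
  return 0;
}
```
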

Definition at line 501 of file fileutils.c.

References fid_Mappedfile::content, fid_MAPPEDFILE_CHECKSPACE, and fid_Mappedfile::occupied.

Referenced by fid_projectfile_write().

int fid_file_dump_to_file ( const fid_Mappedfile *  mfile,
const char *  filename,
fid_Error *  error 
)

Write content of a (fake) mapped file to new file.

This function is mainly used for writing fake mapped files to real files. Fake mapped files refer to allocated memory, which can be written to real files using this function. A new file is created by this function, or overwritten if a file of given name existed already, and the data the file structure refers to is written to it.

The destination file is deleted if writing fails, i.e., this function leaves no partial files around. Interrupted system calls are handled correctly.

This function does not restrict the input to fake mapped files. Thus, the effect of applying this function to a real mapped file is that the input file is copied to a new file of the given name.

Parameters:
mfile The structure that represents a (fake) memory mapped file.
filename Name of the output file.
error Errors go here.
Returns:
0 on success, -1 otherwise.
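The two guarantees described above (retrying interrupted system calls, and leaving no partial file behind) follow a common POSIX pattern, sketched here. demo_dump() is a hypothetical function operating on a plain memory block; the file mode 0644 is an assumption, and the library's actual code may differ.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write size bytes from data to a new file called filename.
 * Returns 0 on success, -1 otherwise. */
static int demo_dump(const void *data, size_t size, const char *filename)
{
  const char *pos = data;
  size_t left = size;
  int fd;

  assert(filename != NULL);

  /* Create the file, or overwrite an existing one. */
  fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if(fd == -1)
    return -1;

  while(left > 0)
  {
    ssize_t written = write(fd, pos, left);

    if(written == -1)
    {
      if(errno == EINTR)
        continue;          /* interrupted by a signal: retry */
      close(fd);
      unlink(filename);    /* leave no partial file around */
      return -1;
    }
    /* write() may transfer fewer bytes than requested. */
    pos += written;
    left -= (size_t)written;
  }

  if(close(fd) == -1)
  {
    unlink(filename);
    return -1;
  }
  return 0;
}
```
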

Definition at line 674 of file fileutils.c.

References fid_Mappedfile::content, fid_error_throw(), fid_error_throw_errno(), and fid_Mappedfile::occupied.

int fid_file_make_readonly ( fid_Mappedfile *  mfile,
fid_Error *  error 
)

Make memory mapped file read-only.

This function truncates the file to its real length first (as indicated by fid_Mappedfile::occupied), and then makes the file and memory map read-only. The function will return immediately with no error if mfile is read-only already.

Parameters:
mfile The structure that represents a memory mapped file.
error Errors go here.
Returns:
0 on success, -1 otherwise.

Definition at line 564 of file fileutils.c.

References fid_Mappedfile::allocated, fid_Mappedfile::content, fid_Mappedfile::fd, fid_error_throw(), fid_MAPPEDFILE_IS_WRITABLE, fid_Mappedfile::filename, fid_Mappedfile::mmap_flags, fid_Mappedfile::mmap_prot, and fid_Mappedfile::occupied.

void fid_file_cleanup ( fid_Mappedfile *  mfile  ) 

Close memory mapped file and remove it from disk.

This function comes in handy in error conditions. It closes the associated file and removes it from disk afterwards. It does not work for fake mapped files.

Parameters:
mfile The structure that represents a memory mapped file.

Definition at line 753 of file fileutils.c.

References fid_Mappedfile::allocated, fid_Mappedfile::content, fid_Mappedfile::fd, fid_Mappedfile::filename, and fid_Mappedfile::occupied.

Referenced by fid_create_online_files().

void fid_file_prefetch ( const fid_Mappedfile *  mfile,
int  smart 
)

Read whole mapped file to fill the operating system's file cache.

When searching in an enhanced suffix array, some tables are accessed in a more or less random order, e.g., the sequence data itself, or the inverse suffix array. The operating system must read values from the file if they have not been read before (and are not present in the file cache already), resulting in an extremely expensive seek operation on the hard disk. While reading files sequentially is usually reasonably fast, accessing files in random order can be slower by orders of magnitude. Hence, to reduce the negative effects of slow hard disk seeks, this function can be called to read the whole file in order to fill the operating system's file cache.

Assuming that the operating system always reads and caches whole memory pages when reading single values from a page, this function calls getpagesize() exactly once and reads only the first byte of every page from the file. If getpagesize() returned a non-positive value, every call to this function returns without prefetching (a warning will be printed to stderr in this case).

Parameters:
mfile The mapped file to be prefetched. If mfile or fid_Mappedfile::content is NULL, then this function will do nothing.
smart Try to be smart if non-zero: if, by a probabilistic estimate, at least 90% of the file appears to be in cache already, the file is not read again and the function returns quickly. Otherwise, the whole file is read.
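The page-touching loop described above might look roughly like the sketch below. demo_prefetch() is hypothetical and omits the smart mode and the warning. It uses sysconf(_SC_PAGESIZE), the POSIX way to obtain the page size, in place of getpagesize(), and it returns the number of pages touched purely so that the behavior can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Read the first byte of every page in content[0..size-1].
 * Returns the number of pages touched. */
static size_t demo_prefetch(const unsigned char *content, size_t size)
{
  static long pagesize = 0;          /* 0 means: not queried yet */
  volatile unsigned char sink = 0;   /* keeps the reads from being optimized away */
  size_t pos;
  size_t touched = 0;

  if(pagesize == 0)
    pagesize = sysconf(_SC_PAGESIZE);   /* queried only once */

  /* Do nothing without data or without a usable page size. */
  if(content == NULL || pagesize <= 0)
    return 0;

  for(pos = 0; pos < size; pos += (size_t)pagesize)
  {
    sink = content[pos];
    ++touched;
  }
  (void)sink;

  assert(touched <= size);
  return touched;
}
```
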

Definition at line 869 of file fileutils.c.

References fid_Mappedfile::content, and fid_Mappedfile::occupied.

Referenced by fid_file_map().

void fid_file_unmap ( fid_Mappedfile *  mfile  ) 

Close memory mapped file.

If the file is writable and its allocated size does not match its real size, then it is truncated to its real size before it gets closed. A file of zero length will not be deleted automatically.

For fake mapped files, the fid_Mappedfile::content pointer will be free()'ed.

Parameters:
mfile The structure that represents a memory mapped file.

Definition at line 940 of file fileutils.c.

References fid_Mappedfile::allocated, fid_Mappedfile::content, fid_Mappedfile::fd, fid_Mappedfile::filename, and fid_Mappedfile::occupied.

Referenced by fid_alphabet_init_from_specfile(), fid_alphabet_write_to_file(), fid_projectfile_parse_from_file(), fid_projectfile_write(), fid_sequences_free(), fid_sequences_map(), fid_sequences_parse_from_memory_to_file(), and fid_suffixarray_free().

int fid_filenamebuffer_init ( fid_Filenamebuffer *  fnamebuf,
const char *  filename,
fid_Error *  error 
)

Allocate some memory to hold the generated filenames.

Parameters:
fnamebuf The structure to be initialized.
filename The base file name from which names are generated.
error Error messages go here.
Returns:
0 on success, -1 if out of memory. Results are returned in arguments.

Definition at line 985 of file fileutils.c.

References fid_Filenamebuffer::buffer, fid_Filenamebuffer::bufptr, and fid_OUTOFMEM.

Referenced by fid_create_online_files(), fid_filename_create(), fid_filenamebuffer_init_local(), and fid_suffixarray_load_from_files().

int fid_filenamebuffer_init_local ( fid_Filenamebuffer *  fnamebuf,
fid_Filenamebuffer **  extbuffer,
const char *  filename,
fid_Error *  error 
)

Allocate file name buffer if necessary.

Parameters:
fnamebuf The file name buffer to be possibly initialized.
extbuffer Pointer to a file name buffer pointer. If the pointed-to pointer is NULL, then fnamebuf will be initialized and the pointer is set to fnamebuf, otherwise nothing will happen.
filename The base file name from which names are generated.
error Error messages go here.
Return values:
0 if no memory has been allocated.
1 if a new file name buffer has been allocated.
-1 on error.
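The three return values support a pattern in which a function uses a caller-supplied buffer when one exists and falls back to a local one otherwise. The sketch below models this with hypothetical stand-ins (struct demo_namebuf, demo_namebuf_init(), demo_namebuf_init_local()); the buffer layout is an assumption and does not reflect the real fid_Filenamebuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file name buffer. */
struct demo_namebuf { char *buffer; };

/* Allocate room for the base name plus ".ext" and '\0'. */
static int demo_namebuf_init(struct demo_namebuf *buf, const char *filename)
{
  buf->buffer = malloc(strlen(filename) + 5);
  if(buf->buffer == NULL)
    return -1;
  strcpy(buf->buffer, filename);
  return 0;
}

/* Returns 0 if *ext was usable already, 1 if local was initialized
 * and *ext now points to it, -1 on error. */
static int demo_namebuf_init_local(struct demo_namebuf *local,
                                   struct demo_namebuf **ext,
                                   const char *filename)
{
  assert(local != NULL && ext != NULL);

  if(*ext != NULL)
    return 0;              /* caller supplied a buffer: nothing to do */

  if(demo_namebuf_init(local, filename) == -1)
    return -1;

  *ext = local;
  return 1;                /* local must be freed by the caller later */
}
```

A caller would typically remember whether the result was 1 and free the local buffer only in that case.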

Definition at line 1022 of file fileutils.c.

References fid_filenamebuffer_init().

Referenced by fid_sequences_map().

void fid_filenamebuffer_free ( fid_Filenamebuffer *  fnamebuf  ) 

Free file name buffer.

Parameters:
fnamebuf Structure to be freed.

Definition at line 1045 of file fileutils.c.

References fid_Filenamebuffer::buffer, and fid_Filenamebuffer::bufptr.

Referenced by fid_create_online_files(), fid_sequences_map(), and fid_suffixarray_load_from_files().

char* fid_filename_create ( const char *  basefilename,
const char *  fileext,
fid_Error *  error 
)

Construct filename from base name and extension.

Parameters:
basefilename Base filename.
fileext File name extension, such as "tis". Note that this absolutely must consist of three printable zero-terminated characters, and should not include a leading dot.
error Error messages go here.
Returns:
A pointer to an allocated string, or NULL on error.
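As an illustration of the constraints on fileext, the sketch below builds a name from a base name and a three-character extension. demo_filename_create() is hypothetical, and the assumption that the result has the form "basefilename.ext", with the dot inserted by the function, is not confirmed by this page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build "<base>.<ext>" in newly allocated memory (the dot is an
 * assumed convention). Returns NULL on error; the caller frees
 * the result. */
static char *demo_filename_create(const char *base, const char *ext)
{
  size_t baselen;
  char *name;

  /* The extension must be exactly three characters, without a dot. */
  if(base == NULL || ext == NULL || strlen(ext) != 3 || ext[0] == '.')
    return NULL;

  baselen = strlen(base);
  name = malloc(baselen + 1 + 3 + 1);   /* base + '.' + ext + '\0' */
  if(name == NULL)
    return NULL;

  memcpy(name, base, baselen);
  name[baselen] = '.';
  memcpy(name + baselen + 1, ext, 4);   /* copies the '\0' as well */

  assert(strlen(name) == baselen + 4);
  return name;
}
```
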

Definition at line 1066 of file fileutils.c.

References fid_Filenamebuffer::buffer, fid_Filenamebuffer::bufptr, and fid_filenamebuffer_init().

Referenced by fid_alphabet_write_to_file(), and fid_projectfile_init().


Generated on Wed Jul 8 17:21:16 2009 for Full-text Index Data structure library by  doxygen 1.5.9