BiBiServ - Bielefeld University Bioinformatic Service

libfid - Full-text Index Data structure Library

Short introduction

The Full-text Index Data structure library (libfid) is a portable software library for accessing indexed data through a simple C interface. Among other things, it implements functions for reading indexed data from files and for performing common operations such as fast string matching. Easy alphabet handling for mapping between printable and binary alphabets is integrated from the ground up. Currently, the enhanced suffix array [2] is the only full-text index data structure supported; others might be added later.

Introduction

With the decreasing cost of computer memory, be it RAM or hard disk, and the broad availability of 64-bit CPUs, full-text index data structures such as the enhanced suffix array have become a feasible alternative or complementary solution for searching large data sets. Search algorithms operating on enhanced suffix arrays can often achieve running times that are sublinear in the database size, at the cost of preprocessing the sequence data and storing an index for it on hard disk. For an introduction to suffix arrays in general, see [1]; enhanced suffix arrays are described in detail in [2].
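To make the sublinear search claim concrete, here is a minimal sketch of pattern counting on a plain suffix array in the spirit of [1]. It is not the enhanced suffix array of [2] and does not use the libfid API; the function names build_suffix_array and count_occurrences are hypothetical, and the naive sort-based construction is only suitable for small inputs. Once the array exists, each query costs O(m log n) character comparisons for a pattern of length m, independent of how often the pattern occurs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Text shared with the qsort comparator (qsort takes no context argument). */
static const char *sa_text;

/* Order two suffix start positions by the suffixes they denote. */
static int cmp_suffix(const void *a, const void *b)
{
    size_t i = *(const size_t *)a, j = *(const size_t *)b;
    return strcmp(sa_text + i, sa_text + j);
}

/* Naive construction: sort all suffix start positions of the
 * NUL-terminated text of length n. The caller frees the result. */
size_t *build_suffix_array(const char *text, size_t n)
{
    assert(text != NULL);
    size_t *sa = malloc((n ? n : 1) * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    sa_text = text;
    qsort(sa, n, sizeof *sa, cmp_suffix);
    return sa;
}

/* Binary search for the first suffix whose m-character prefix is
 * >= pat (strict == 0) or > pat (strict != 0). */
static size_t bound(const char *text, const size_t *sa, size_t n,
                    const char *pat, size_t m, int strict)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strncmp(text + sa[mid], pat, m);
        if (c < 0 || (strict && c == 0))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* All suffixes starting with pat form one contiguous interval of the
 * suffix array; its width is the number of occurrences. */
size_t count_occurrences(const char *text, const size_t *sa, size_t n,
                         const char *pat)
{
    size_t m = strlen(pat);
    return bound(text, sa, n, pat, m, 1) - bound(text, sa, n, pat, m, 0);
}
```

The index is built once and then reused for many queries, which is the trade-off described above: preprocessing time and index storage in exchange for fast searches.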

The software library libfid provides data structures for representing enhanced suffix arrays (the only index data structure currently supported) and implements many operations frequently performed on them. Sequence data is generally transformed into a binary representation using freely definable alphabets. The library can process enhanced suffix arrays stored in files as generated by mkESA [3] (the same format as generated by mkvtree from the Vmatch package written by Stefan Kurtz). The library comes with its own enhanced suffix array construction program, which is, however, rather slow and intended to be used by the library's test suite only. For processing real-world data, please consider downloading and using our more advanced program mkESA.
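The idea of transforming sequence data into a binary representation with a freely definable alphabet can be sketched as follows. This is an illustration only, not libfid's actual alphabet interface: the type alphabet_t, the functions alphabet_init, alphabet_encode, and alphabet_decode, and the ALPHA_UNDEF marker are all hypothetical. In this sketch each symbol of the alphabet is assigned its position in the defining string as its binary code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHA_UNDEF 0xFF /* marks characters outside the alphabet */

/* Hypothetical alphabet type: two lookup tables and the symbol count. */
typedef struct {
    unsigned char to_code[256]; /* printable character -> binary code */
    unsigned char to_char[256]; /* binary code -> printable character */
    size_t size;
} alphabet_t;

/* Define an alphabet from a string such as "ACGT"; the code of each
 * symbol is its position. Returns 0 on success, -1 on empty, oversized,
 * or duplicate-containing definitions. */
int alphabet_init(alphabet_t *a, const char *symbols)
{
    assert(a != NULL && symbols != NULL);
    size_t n = strlen(symbols);
    if (n == 0 || n >= ALPHA_UNDEF)
        return -1;
    memset(a->to_code, ALPHA_UNDEF, sizeof a->to_code);
    memset(a->to_char, 0, sizeof a->to_char);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)symbols[i];
        if (a->to_code[c] != ALPHA_UNDEF)
            return -1; /* symbol listed twice */
        a->to_code[c] = (unsigned char)i;
        a->to_char[i] = c;
    }
    a->size = n;
    return 0;
}

/* Map n printable characters to codes; -1 if one is not in the alphabet. */
int alphabet_encode(const alphabet_t *a, const char *in,
                    unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char code = a->to_code[(unsigned char)in[i]];
        if (code == ALPHA_UNDEF)
            return -1;
        out[i] = code;
    }
    return 0;
}

/* Map n codes back to printable characters. */
void alphabet_decode(const alphabet_t *a, const unsigned char *in,
                     char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (char)a->to_char[in[i]];
}
```

With the alphabet "ACGT", the sequence GATTACA would be encoded as the codes 2, 0, 3, 3, 0, 1, 0, and decoding those codes restores the original string. Working on small integer codes rather than printable characters lets index operations use the codes directly as table indices.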

A full library API reference in HTML format is available online, or as a tarball on our download page. If you prefer, you can also build the documentation from the library source code using Doxygen. After running ./configure and make (some files must be generated before Doxygen can do its work), simply run doxygen to build the HTML and LaTeX documentation.

libfid is freely available under the terms of the GNU Lesser General Public License (LGPL). It can be obtained from our download page.

Installation

libfid has been designed to be as lightweight and portable as possible (portable among UNIX-like systems, that is). It has been compiled and installed successfully on various 32-bit and 64-bit platforms using different C compilers, including several Linux distributions, Solaris 8 and later, FreeBSD 4.11 and later, NetBSD 3.1.1 and later, OpenBSD 4.2, and Mac OS X. Use something like

$ ./configure [options] && make && make install

to configure, build, and install the library. Optionally, run make check before installation to run the test suite. All tests should pass without errors, except for the first one, which is skipped unless you have downloaded the extra test suite data and extracted it below the libfid source path. By default, make install installs the library below /usr/local/.

A typical command for configuring libfid using gcc for a 64-bit platform looks like this:

$ ./configure CFLAGS='-O3 -m64'

Read the output of ./configure --help to learn more about configuration options. Before using --enable-memprof, please consult the library documentation on how to use this option (see section "How to use libfid in your programs" on the title page, then the list below "Optionally:").

References:

[1] U. Manber and E.W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935-948, 1993.

[2] M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing Suffix Trees with Enhanced Suffix Arrays. Journal of Discrete Algorithms, 2:53-86, 2004.

[3] R. Homann, D. Fleer, R. Giegerich, and M. Rehmsmeier. mkESA: enhanced suffix array construction tool. Bioinformatics, 25(8):1084-1085, 2009.

Fri Dec 14 12:54:38 2012