The library libfid is published under the terms of the GNU Lesser General Public License Version 2.1 (or any later version), GNU LGPL for short. See file COPYING, shipped as part of the library sources, for the exact terms of licensing. Also included are a few programs, all published under the terms of the GNU General Public License Version 2 (or any later version), GNU GPL for short. See file COPYING, shipped as part of the library sources in directory tools, for the exact terms of licensing of these programs. See http://www.gnu.org/copyleft/ to learn more about these licenses.
With the decreasing cost of computer memory, be it RAM or hard disk, and the broad availability of 64-bit CPUs, a feasible alternative or complementary solution for searching in large data sets is the use of full-text index data structures such as the enhanced suffix array. Search algorithms operating on enhanced suffix arrays can often achieve running times that are sublinear in the database size, at the cost of preprocessing the sequence data and storing an index for it on hard disk. For an introduction to suffix arrays in general, see [1]; enhanced suffix arrays are described in detail in [2].
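To illustrate why an index pays off, here is a minimal C sketch of pattern lookup by binary search over a plain suffix array. It is not libfid code: the function name suffix_array_find and its signature are made up for this example, and the suffix table is assumed to have been built beforehand. Each lookup needs O(m log n) character comparisons for a pattern of length m and a text of length n, and the additional tables of an enhanced suffix array [2] can improve on this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libfid): return the position in text of
 * one occurrence of pattern, or -1 if there is none. suftab must hold the
 * n suffix start positions of the NUL-terminated text in lexicographic
 * order.
 */
static int64_t suffix_array_find(const char *text, const uint32_t *suftab,
                                 size_t n, const char *pattern)
{
    assert(text != NULL && suftab != NULL && pattern != NULL);

    size_t m = strlen(pattern);
    size_t lo = 0, hi = n;  /* candidates live in the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* Compare the pattern with the first m characters of the suffix. */
        int cmp = strncmp(pattern, text + suftab[mid], m);

        if (cmp == 0)
            return (int64_t)suftab[mid];  /* pattern is a prefix of this suffix */
        if (cmp < 0)
            hi = mid;                     /* pattern sorts before this suffix */
        else
            lo = mid + 1;                 /* pattern sorts after this suffix */
    }
    return -1;
}
```

The function reports an arbitrary occurrence; finding all of them would mean locating both ends of the block of suffixes that start with the pattern.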
This software library provides data structures for representing enhanced suffix arrays and implements many operations frequently performed on them. Sequence data is generally transformed into a binary representation using freely definable alphabets. The library expects enhanced suffix arrays to be stored in the format generated by mkvtree from the Vmatch package written by Stefan Kurtz. The slowbuildesa program that comes with libfid can also be used to construct enhanced suffix arrays, but be warned that slowbuildesa implements a naive suffix sorter based on quicksort and simple string comparisons, and thus is much slower than mkvtree. (The primary use of slowbuildesa is for running tests via make check, which is also the reason why it does not get installed by make install.)
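The naive approach mentioned above can be sketched in a few lines of C. This is an illustration only, not the slowbuildesa source: the function name naive_suffix_array is hypothetical, the text is assumed to be a NUL-terminated string, and the C library's qsort stands in for quicksort (the C standard does not prescribe which algorithm qsort uses). Each comparison may scan many characters, which is why this strategy is slow on large or repetitive inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Compare two suffixes of the same text by plain string comparison. */
static int compare_suffixes(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/*
 * Hypothetical helper (not part of libfid): fill suftab[0..n-1] with the
 * start positions of all suffixes of text in lexicographic order, where
 * n = strlen(text). Returns 0 on success, -1 if memory allocation fails.
 */
static int naive_suffix_array(const char *text, uint32_t *suftab)
{
    size_t n = strlen(text);
    assert(n <= UINT32_MAX);  /* positions must fit into 32-bit integers */

    /* One pointer per suffix; allocate at least one element for n == 0. */
    const char **suffixes = malloc((n ? n : 1) * sizeof *suffixes);
    if (suffixes == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i)
        suffixes[i] = text + i;

    qsort(suffixes, n, sizeof *suffixes, compare_suffixes);

    /* Convert the sorted pointers back into start positions. */
    for (size_t i = 0; i < n; ++i)
        suftab[i] = (uint32_t)(suffixes[i] - text);

    free(suffixes);
    return 0;
}
```

For the text "banana", this yields the suffix table 5, 3, 1, 0, 4, 2.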
Please consider using our advanced tool mkesa [3] for enhanced suffix array construction, a vastly improved version of slowbuildesa based on a multithreaded Deep-Shallow [4] implementation. It is typically faster than mkvtree and also more space-efficient. mkesa is available in source code under the terms of the GNU General Public License Version 2 (or any later version), and is distributed as a separate package at http://bibiserv.techfak.uni-bielefeld.de/mkesa/.
./configure [options] && make && make install
Optionally, run make check before installation to run the test suite. All tests should pass with no error, except for the first one, which will be skipped unless you have downloaded the extra test suite data (libfid-testdata.tar.gz) and extracted it below the source path of libfid.
By default, make install will install the library below /usr/local/. This can be changed by passing the appropriate options to configure. Use ./configure --help to learn about the configuration options.
Hint: create a config.site file and let the environment variable CONFIG_SITE point to it. You can put your default compiler flags (optimization, paths, etc.) in there and avoid retyping them each time you invoke configure. All configure scripts generated by GNU Autoconf 2.x honor the CONFIG_SITE variable, so its use is not limited to libfid.
- Include libfid.h in your program when programming in plain C, or libfidxx.h for C++ (support for C++ is, however, somewhat experimental).
- Define DEBUG for compilation and link against the debug version of the library. (A debug version of the library can be built by configuring with --enable-debug.)
- Alternatively, make sure DEBUG is not defined and link against the non-debug version of the library.
Optionally:
- Include libfid.32 (or libfid.64) to get all interfaces with the _32 (or _64) suffix removed. Please note that using the 32-bit interface simply means "working with enhanced suffix arrays represented by 32-bit integers" and has nothing to do with compiling a 32-bit or 64-bit executable.
- In C++, use fid_traits<32> and fid_traits<64> from libfidxx.h to access the 32-bit (or 64-bit) interfaces. Do not include libfid.32 or libfid.64 in this case. Note that support for C++ is experimental, so please don't hesitate to contact the author if you think something is wrong or missing.
- Pass --enable-memprof to the configure script to enable the profiling facility. Note that this implies that your programs must then also be compiled to use the profiling facility.
- You may find the AM_PATH_LIBFID and FID_MEMPROFILING macros handy (see file libfid.m4).
For a minimal source code example, see the code of program exactsearch.c from the test suite (though its command line handling is ugly).
Note that it is important to link against the correct library version, since some data structures are augmented by extra data fields in debug mode. Hence, linking a program compiled without defining the symbol DEBUG against the debug version of libfid will produce a program that is likely to crash or to behave in otherwise "funny" ways.
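The following C sketch shows the kind of mismatch this note warns about. The two structures are hypothetical and not actual libfid types; they only assume, as stated above, that a debug build appends extra fields to a data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as seen by a program compiled WITHOUT DEBUG defined. */
struct example_index_release {
    uint32_t *suftab;
    size_t length;
};

/* The same hypothetical structure as a debug build of a library lays it out. */
struct example_index_debug {
    uint32_t *suftab;
    size_t length;
    const char *allocated_at;  /* extra bookkeeping field, debug builds only */
};

/* Program and library would disagree about the size of the object. */
static_assert(sizeof(struct example_index_debug) >
              sizeof(struct example_index_release),
              "debug layout is larger than release layout");

/* Bytes a debug library could access beyond an object the program allocated. */
static size_t debug_layout_overhead(void)
{
    return sizeof(struct example_index_debug) -
           sizeof(struct example_index_release);
}
```

If the program allocates the smaller structure and the library reads or writes the extra field, the access falls outside the object, which is undefined behavior in C and explains the crashes.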
FID_OPTIONS (see FID_OPTIONS_VARNAME and fid_options_parse()) can be defined to influence how the library behaves in certain situations.
[2] M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing Suffix Trees with Enhanced Suffix Arrays. Journal of Discrete Algorithms, 2:53-86, 2004.
[3] R. Homann, D. Fleer, R. Giegerich, and M. Rehmsmeier. mkESA: enhanced suffix array construction tool. Bioinformatics, 25(8):1084-1085, 2009.
[4] G. Manzini and P. Ferragina. Engineering a Lightweight Suffix Array Construction Algorithm. Algorithmica, 40(1):33-50, 2004.