libfid - Full-text Index Data structure Library

Short introduction

The Full-text Index Data structure library (libfid) is a portable software library for accessing indexed data through a simple C interface. It implements, among others, functions for reading indexed data from files and for performing common operations such as fast string matching. Easy alphabet handling for mapping between printable and binary alphabets is integrated from the ground up. Currently, the enhanced suffix array [2] is the only full-text index data structure supported; others might be added later.

Introduction

With the decreasing cost of computer memory, be it RAM or hard disk, and the broad availability of 64-bit CPUs, the use of full-text index data structures such as the enhanced suffix array has become a feasible alternative, or complementary, solution for searching large data sets. Search algorithms operating on enhanced suffix arrays can often achieve sublinear running times with respect to the database size, at the cost of preprocessing the sequence data and storing an index for it on hard disk. For an introduction to suffix arrays in general, see [1]; enhanced suffix arrays are described in detail in [2].

The software library libfid provides data structures
for representing enhanced suffix arrays (the only index data
structure currently supported), and implements many operations
frequently performed on these. Sequence data is generally
transformed into binary representation using freely definable
alphabets. The library can process enhanced suffix arrays stored
on files as generated by mkESA
[3] (which is the same format as generated by
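
To illustrate why a suffix array index allows fast string matching, here is a minimal sketch in C. It does not use libfid's API: the functions `sa_build` and `sa_count` are hypothetical names introduced only for this example. It builds a plain (not enhanced) suffix array by naive sorting and works directly on printable characters rather than on a binary alphabet. Once the index exists, counting the occurrences of a pattern takes two binary searches over the sorted suffixes, so the number of suffix comparisons grows logarithmically with the text length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Text currently being indexed; read by the qsort comparator below.
 * (Illustration only: a file-scope variable makes sa_build non-reentrant.) */
static const char *sa_text;

/* Order two suffix start positions by the suffixes they denote. */
static int cmp_suffix(const void *a, const void *b)
{
    const size_t i = *(const size_t *)a;
    const size_t j = *(const size_t *)b;
    return strcmp(sa_text + i, sa_text + j);
}

/* Hypothetical helper: build a plain suffix array by naive sorting.
 * Real construction tools use far more efficient algorithms.
 * Returns a malloc'ed array of strlen(text) positions, or NULL. */
size_t *sa_build(const char *text)
{
    assert(text != NULL);
    size_t n = strlen(text);
    size_t *sa = malloc((n ? n : 1) * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    sa_text = text;
    qsort(sa, n, sizeof *sa, cmp_suffix);
    return sa;
}

/* Hypothetical helper: count occurrences of pattern in text using the
 * suffix array sa of length n. All suffixes starting with the pattern
 * form one contiguous interval in sa; two binary searches find its ends. */
size_t sa_count(const char *text, const size_t *sa, size_t n,
                const char *pattern)
{
    assert(text != NULL && sa != NULL && pattern != NULL);
    size_t m = strlen(pattern);
    size_t lo = 0, hi = n;

    /* Left end: first suffix whose m-character prefix is >= pattern. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strncmp(text + sa[mid], pattern, m) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    size_t first = lo;

    /* Right end: first suffix whose m-character prefix is > pattern. */
    hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strncmp(text + sa[mid], pattern, m) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo - first;
}
```

For the text `banana`, the sorted suffixes start at positions 5, 3, 1, 0, 4, 2, and `sa_count` reports two occurrences of `ana`. An enhanced suffix array stores additional tables alongside the suffix array, as described in [2].
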
A full library API reference in HTML format is available
online, or as a tarball on our
download page. If you prefer, you can also build the
documentation from the library source code using Doxygen. After
configuration and running

libfid is freely available under the terms of the GNU Lesser General Public License (LGPL). It can be obtained from our download page.

Installation

libfid has been designed to be as lightweight and portable as possible (portable among UNIX-like systems, that is). It has been compiled and installed successfully on various 32-bit and 64-bit platforms using different C compilers, including several Linux distributions, Solaris 8 and later, FreeBSD 4.11 and later, NetBSD 3.1.1 and later, OpenBSD 4.2, and Mac OS X. Use something like
to configure, build, and install the library. Optionally, run
A typical command for configuring libfid using gcc for a 64-bit platform looks like this:
Read the output of

References:

[1] U. Manber and E.W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935-948, 1993.
[2] M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing Suffix Trees with Enhanced Suffix Arrays. Journal of Discrete Algorithms, 2:53-86, 2004.
[3] R. Homann, D. Fleer, R. Giegerich, and M. Rehmsmeier. mkESA: enhanced suffix array construction tool. Bioinformatics, 25(8):1084-1085, 2009.