BiBiServ2 - libfid

Short introduction

The Full-text Index Data structure library (libfid) is a portable software library for accessing indexed data through a simple C interface. Among other things, it implements functions for reading indexed data from files and for performing common operations such as fast string matching. Alphabet handling for mapping between printable and binary alphabets is integrated from the ground up. Currently, the enhanced suffix array [2] is the only full-text index data structure supported; others might be added later.

Introduction

With the decreasing cost of computer memory, be it RAM or hard disk, and the broad availability of 64-bit CPUs, full-text index data structures such as the enhanced suffix array have become a feasible alternative or complementary solution for searching in large data sets. Search algorithms operating on enhanced suffix arrays can often achieve running times that are sublinear in the database size, at the cost of preprocessing the sequence data and storing an index for it on hard disk. For an introduction to suffix arrays in general, see [1]; enhanced suffix arrays are described in detail in [2].
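To make the sublinear search idea concrete, here is a minimal sketch of pattern counting on a plain (not enhanced) suffix array. It does not use libfid; the function names build_suffix_array and count_occurrences are hypothetical and chosen for illustration only. The construction is deliberately naive, while the search needs only O(m log n) character comparisons for a pattern of length m in a text of length n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Text whose suffixes are being sorted; read by the qsort comparator. */
static const char *sa_text;

static int cmp_suffix(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    return strcmp(sa_text + i, sa_text + j);
}

/* Naive construction for illustration: sort all suffix start positions
 * lexicographically. The caller frees the result. Hypothetical helper,
 * not part of libfid. */
size_t *build_suffix_array(const char *text, size_t n)
{
    assert(text != NULL);
    if (n == 0)
        return NULL;
    size_t *sa = malloc(n * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    sa_text = text;
    qsort(sa, n, sizeof *sa, cmp_suffix);
    return sa;
}

/* Count occurrences of pattern in text with two binary searches over the
 * suffix array: all suffixes starting with pattern form one contiguous
 * interval. Hypothetical helper, not part of libfid. */
size_t count_occurrences(const char *text, const size_t *sa, size_t n,
                         const char *pattern)
{
    size_t m = strlen(pattern);
    size_t lo = 0, hi = n;

    /* Lower bound: first suffix whose length-m prefix is >= pattern. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strncmp(text + sa[mid], pattern, m) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    size_t first = lo;

    /* Upper bound: first suffix whose length-m prefix is > pattern. */
    hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strncmp(text + sa[mid], pattern, m) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo - first;
}
```

The enhanced suffix array adds further tables to the plain suffix array, which this sketch leaves out; for real-world data the index would be built by a dedicated construction program rather than by sorting suffixes with qsort.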

The software library libfid provides data structures for representing enhanced suffix arrays (the only index data structure currently supported) and implements many operations frequently performed on them. Sequence data is generally transformed into a binary representation using freely definable alphabets. The library can process enhanced suffix arrays stored in files as generated by mkESA [3] (the same format as generated by mkvtree from the Vmatch package written by Stefan Kurtz). The library comes with its own enhanced suffix array construction program, which is, however, rather slow and intended to be used by the library's test suite only. Please consider downloading and using our more advanced program mkESA for processing real-world data.
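The transformation between printable and binary alphabets can be pictured with a small lookup-table sketch. This is not the libfid alphabet API; the type alphabet and the functions alphabet_init, alphabet_encode and alphabet_decode are hypothetical names used for illustration, assuming that an alphabet is defined as a list of character classes in which all characters of one class share a binary code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHA_UNDEF 0xFF  /* marks characters outside the alphabet */

/* Hypothetical alphabet type, not taken from libfid. */
typedef struct {
    unsigned char to_code[256];  /* printable character -> binary code */
    char to_char[256];           /* binary code -> representative character */
    size_t size;                 /* number of distinct codes */
} alphabet;

/* Define an alphabet from character classes: every character listed in
 * classes[k] is mapped to code k, and the first character of each class
 * is used as its printable representative when decoding. */
void alphabet_init(alphabet *a, const char *const *classes, size_t nclasses)
{
    assert(a != NULL && classes != NULL && nclasses < ALPHA_UNDEF);
    memset(a->to_code, ALPHA_UNDEF, sizeof a->to_code);
    memset(a->to_char, 0, sizeof a->to_char);
    for (size_t k = 0; k < nclasses; k++) {
        a->to_char[k] = classes[k][0];
        for (const char *p = classes[k]; *p != '\0'; p++)
            a->to_code[(unsigned char)*p] = (unsigned char)k;
    }
    a->size = nclasses;
}

/* Encode n characters into binary codes. Returns 0 on success, or -1 if
 * a character is not part of the alphabet. */
int alphabet_encode(const alphabet *a, const char *in,
                    unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = a->to_code[(unsigned char)in[i]];
        if (c == ALPHA_UNDEF)
            return -1;
        out[i] = c;
    }
    return 0;
}

/* Decode n binary codes back into printable characters.
 * out must have room for n + 1 bytes (terminating NUL included). */
void alphabet_decode(const alphabet *a, const unsigned char *in,
                     char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a->to_char[in[i]];
    out[n] = '\0';
}
```

With the classes "aA", "cC", "gG" and "tT", for instance, the string "GaTc" would be encoded as the codes 2, 0, 3, 1 and decoded back to "gatc". A table-based mapping like this keeps encoding at one array lookup per character, which is one plausible reason to work on binary sequences internally.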

A full library API reference in HTML format is available as a tarball on our download page. If you prefer, you can also build the documentation from the library source code using Doxygen. After configuration and running make (some files must be generated before Doxygen can do its work), simply run doxygen to build HTML and LaTeX documentation.

libfid is freely available under the terms of the GNU General Public License (GPL), version 2 or later. It can be obtained from our download page.