Data Structures | |
struct | fid_ArraySymbol |
An array of symbols, i.e., a sequence of dynamic size. | |
struct | fid_Alphabet |
Definition of an alphabet. | |
Defines | |
#define | fid_SYMFMT "%hhu" |
Format string for printing the numeric value of a fid_Symbol. | |
#define | fid_SEPARATOR ((fid_Symbol)UCHAR_MAX) |
Special symbol: sequence separator. | |
#define | fid_WILDCARD ((fid_Symbol)(UCHAR_MAX-1)) |
Special symbol: wildcard character. | |
#define | fid_UNDEF ((fid_Symbol)(UCHAR_MAX-2)) |
Special symbol: undefined symbol. | |
#define | fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3)) |
Maximum allowed value for a symbol. | |
#define | fid_REGULARSYMBOL(S) ((S) <= fid_UNDEF) |
Check whether symbol S is a sequence symbol or not. | |
#define | fid_SPECIALSYMBOL(S) ((S) > fid_UNDEF) |
The opposite of fid_REGULARSYMBOL(). | |
#define | fid_PRINT_SYMBOL(ALPHA, S) |
Transform binary symbol into its printable form, honoring specials. | |
#define | fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C))) |
Type cast printable character into unsigned array index. | |
Typedefs | |
typedef unsigned char | fid_Symbol |
Use this type to denote a binary transformed sequence symbol. | |
Enumerations | |
enum | fid_Alphabettype { fid_ALPHABET_DNA, fid_ALPHABET_RNA, fid_ALPHABET_DNARNA, fid_ALPHABET_PROTEIN } |
Identifiers of built-in alphabets. | |
Functions | |
int | fid_alphabet_init_from_speclines (fid_Alphabet *alpha, const char *str, size_t len, fid_Error *error) |
Parse alphabet definition and fill alphabet structure. | |
int | fid_alphabet_init_from_specfile (fid_Alphabet *alpha, const char *filename, fid_Error *error) |
Parse alphabet definition file and fill alphabet structure. | |
int | fid_alphabet_init_from_string (fid_Alphabet *alpha, const char *string, size_t length, fid_Error *error) |
Determine alphabet from ASCII text. | |
void | fid_alphabet_init_standard (fid_Alphabet *alpha, fid_Alphabettype type) |
Assign standard alphabet to alphabet structure. | |
int | fid_alphabet_add_wildcard (fid_Alphabet *alpha, char wcchar, fid_Error *error) |
Add wildcard character to alphabet. | |
size_t | fid_alphabet_transform_string (const fid_Alphabet *alpha, const char *string, size_t length, fid_Symbol *transformed, int no_special_symbols) |
Transform string according to given alphabet. | |
size_t | fid_alphabet_transform_string_inplace (const fid_Alphabet *alpha, char *string, size_t length, int no_special_symbols) |
Transform string according to given alphabet. | |
fid_Symbol * | fid_alphabet_transform_string_new (const fid_Alphabet *alpha, const char *string, size_t length, int no_special_symbols, fid_Error *error) |
Transform string according to given alphabet into new buffer. | |
int | fid_alphabet_write_to_file (const fid_Alphabet *alpha, const char *basefilename, fid_Error *error) |
Write textual representation of alphabet to file. | |
void | fid_alphabet_dump (const fid_Alphabet *alpha, FILE *stream) |
Print alphabet to output stream. |
#define fid_SYMFMT "%hhu" |
Format string for printing the numeric value of a fid_Symbol.
Definition at line 48 of file alphabet.h.
Referenced by fid_alphabet_add_wildcard(), fid_alphabet_dump(), and fid_alphabet_init_from_string().
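A minimal sketch of using fid_SYMFMT with the standard printf family. The typedef and macro are local copies of the documented definitions so the block compiles without alphabet.h, and `format_symbol` is a hypothetical helper, not a library function. The `hh` length modifier (C99) tells printf that the promoted int argument holds an unsigned char value.

```c
#include <assert.h>
#include <stdio.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_SYMFMT "%hhu"

/* Hypothetical helper: write the numeric value of sym into buf.
   Returns the snprintf result, i.e. the number of characters needed
   (excluding the terminating '\0'). */
static int format_symbol(char *buf, size_t size, fid_Symbol sym)
{
  assert(buf != NULL || size == 0);
  return snprintf(buf, size, fid_SYMFMT, sym);
}
```

Because fid_SYMFMT is a string literal, it can also be spliced into larger format strings, e.g. `printf("symbol " fid_SYMFMT "\n", sym);`.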
#define fid_SEPARATOR ((fid_Symbol)UCHAR_MAX) |
Special symbol: sequence separator.
Definition at line 58 of file alphabet.h.
Referenced by fid_alphabet_dump(), and fid_sequences_dump_range().
#define fid_WILDCARD ((fid_Symbol)(UCHAR_MAX-1)) |
Special symbol: wildcard character.
Definition at line 63 of file alphabet.h.
Referenced by fid_alphabet_add_wildcard(), fid_alphabet_dump(), fid_alphabet_init_from_speclines(), fid_alphabet_write_to_file(), fid_sequences_compute_distribution(), and fid_suffixarray_compute_distribution().
#define fid_UNDEF ((fid_Symbol)(UCHAR_MAX-2)) |
Special symbol: undefined symbol.
Definition at line 68 of file alphabet.h.
Referenced by fid_alphabet_add_wildcard(), fid_alphabet_dump(), fid_alphabet_init_from_speclines(), and fid_alphabet_init_from_string().
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3)) |
Maximum allowed value for a symbol.
Note that this is not the maximum number of symbols, but the maximum allowed value of a symbol.
Definition at line 76 of file alphabet.h.
Referenced by fid_alphabet_add_wildcard(), fid_alphabet_dump(), fid_alphabet_init_from_string(), fid_alphabet_transform_string(), and fid_alphabet_write_to_file().
#define fid_REGULARSYMBOL | ( | S | ) | ((S) <= fid_UNDEF) |
Check whether symbol S is a sequence symbol or not.
Note that undefined symbols (fid_UNDEF) are considered regular symbols, but wildcards (fid_WILDCARD) and sequence separators (fid_SEPARATOR) are not.
S | A symbol of type fid_Symbol. |
Definition at line 86 of file alphabet.h.
Referenced by fid_suffixarray_find_embedded_interval(), fid_suffixarray_get_intervals(), fid_suffixarray_traverse(), and fid_suffixinterval_lcpvalue().
#define fid_SPECIALSYMBOL | ( | S | ) | ((S) > fid_UNDEF) |
The opposite of fid_REGULARSYMBOL().
Definition at line 91 of file alphabet.h.
Referenced by fid_suffixarray_find_embedded_interval(), fid_suffixarray_get_intervals(), and fid_suffixinterval_lcpvalue().
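Since the special values are packed at the top of the unsigned char range (fid_SYMBOLMAX < fid_UNDEF < fid_WILDCARD < fid_SEPARATOR), a single comparison against fid_UNDEF splits regular from special symbols. The sketch below uses local copies of the documented macros; `count_regular` is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_SEPARATOR ((fid_Symbol)UCHAR_MAX)
#define fid_WILDCARD  ((fid_Symbol)(UCHAR_MAX-1))
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3))
#define fid_REGULARSYMBOL(S) ((S) <= fid_UNDEF)
#define fid_SPECIALSYMBOL(S) ((S) > fid_UNDEF)

/* Hypothetical helper: count the regular symbols in seq[0..len).
   fid_UNDEF counts as regular; fid_WILDCARD and fid_SEPARATOR do not. */
static size_t count_regular(const fid_Symbol *seq, size_t len)
{
  size_t i, n = 0;
  assert(seq != NULL || len == 0);
  for (i = 0; i < len; ++i)
    if (fid_REGULARSYMBOL(seq[i]))
      ++n;
  return n;
}
```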
#define fid_PRINT_SYMBOL | ( | ALPHA, | |||
S | ) |
Value:
((S) == fid_UNDEF \
 ? '~' \
 : ((S) == fid_SEPARATOR \
    ? '|' \
    : (ALPHA)->sym_to_char[(size_t)(S)]))
Use this macro whenever presenting alphabet-encoded sequences to human beings. The given symbol is transformed into its printable form, so that undefined symbols (printed as '~') and sequence separators (printed as '|') are also shown correctly.
ALPHA | A pointer to a fid_Alphabet structure. | |
S | A symbol encoded by, and to be decoded via, alphabet ALPHA. |
Definition at line 141 of file alphabet.h.
Referenced by fid_sequences_dump_range(), fid_suffixarray_dump_intervals(), and fid_suffixinterval_dump().
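A minimal sketch of decoding a symbol sequence for display with fid_PRINT_SYMBOL. The macro only needs a pointer to something with a `sym_to_char` array, so `demo_Alphabet` here is a hypothetical stand-in containing just that member; the real fid_Alphabet has more members, and the exact type of its `sym_to_char` is assumed to be indexable by symbol value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_SEPARATOR ((fid_Symbol)UCHAR_MAX)
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_PRINT_SYMBOL(ALPHA, S) \
  ((S) == fid_UNDEF \
   ? '~' \
   : ((S) == fid_SEPARATOR \
      ? '|' \
      : (ALPHA)->sym_to_char[(size_t)(S)]))

/* Hypothetical stand-in for fid_Alphabet: only the member the macro reads. */
typedef struct {
  char sym_to_char[UCHAR_MAX + 1];
} demo_Alphabet;

/* Decode len symbols into out, which must hold len + 1 chars. */
static void decode_symbols(const demo_Alphabet *alpha,
                           const fid_Symbol *seq, size_t len, char *out)
{
  size_t i;
  assert(alpha != NULL && out != NULL);
  for (i = 0; i < len; ++i)
    out[i] = (char)fid_PRINT_SYMBOL(alpha, seq[i]);
  out[len] = '\0';
}
```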
#define fid_CHAR_AS_INDEX | ( | C | ) | ((size_t)((unsigned char)(C))) |
Type cast printable character into unsigned array index.
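Converting a char directly into a wider unsigned type such as size_t goes wrong on platforms where char is signed: a negative value is sign-extended, producing a huge index. This macro casts to unsigned char first, so every character yields an index in the range 0 to UCHAR_MAX. Use it for accessing fid_Alphabet::char_to_sym.
C | A printable character of type char. |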
Definition at line 159 of file alphabet.h.
Referenced by fid_alphabet_add_wildcard(), fid_alphabet_init_from_speclines(), fid_alphabet_init_from_string(), and fid_alphabet_transform_string().
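A minimal sketch of why the detour through unsigned char matters when indexing a per-character table. `char_to_sym` and `lookup` below are hypothetical and only mimic the shape of fid_Alphabet::char_to_sym (one entry per possible unsigned char value); the macro is a local copy of the documented definition.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

/* Hypothetical table: one entry per possible unsigned char value. */
static fid_Symbol char_to_sym[UCHAR_MAX + 1];

static fid_Symbol lookup(char c)
{
  /* (size_t)c would turn a negative char into a huge index;
     converting to unsigned char first keeps it in [0, UCHAR_MAX]. */
  size_t idx = fid_CHAR_AS_INDEX(c);
  assert(idx <= UCHAR_MAX);
  return char_to_sym[idx];
}
```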
typedef unsigned char fid_Symbol |
Use this type to denote a binary transformed sequence symbol.
This type has been introduced for purely documentary reasons. There is not, and probably never will be, any form of wide-character or Unicode support. It would be safe to use unsigned char all the time, but using fid_Symbol instead makes the code much more readable and understandable.
Definition at line 43 of file alphabet.h.
enum fid_Alphabettype |
Identifiers of built-in alphabets.
An alphabet structure can be initialized to one of the standard alphabets supported by the library by passing one of these identifiers to fid_alphabet_init_standard().
Definition at line 167 of file alphabet.h.
int fid_alphabet_init_from_speclines | ( | fid_Alphabet * | alpha, | |
const char * | str, | |||
size_t | len, | |||
fid_Error * | error | |||
) |
Parse alphabet definition and fill alphabet structure.
The alphabet definition consists of multiple lines, each containing characters to be considered as equal. Thus, each line defines a character class. Symbols are assigned to character classes in increasing order. Lines starting with a '#' character are treated as comments.
alpha | The structure to be filled according to the passed definition. | |
str | Alphabet definition file content. | |
len | Length of str. If 0, the length is determined using strlen(3), so str must then be zero-terminated. | |
error | Error messages go here. |
Definition at line 54 of file alphabet.c.
References fid_Alphabet::char_to_sym, fid_CHAR_AS_INDEX, fid_error_throw(), fid_UNDEF, fid_WILDCARD, fid_Alphabet::num_of_chars, fid_Alphabet::num_of_syms, and fid_Alphabet::sym_to_char.
Referenced by fid_alphabet_init_from_specfile(), and fid_alphabet_init_standard().
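A minimal sketch of the documented spec-line format: one character class per line, '#' lines are comments, and classes receive symbols in increasing order. Everything named `demo_*` is hypothetical, and several details are assumptions rather than documented library behavior: symbols start at 0, empty lines are skipped, the first character of a line becomes its printable representative, and a character listed twice is an error. Wildcard handling is omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3))
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

/* Hypothetical, simplified stand-in for fid_Alphabet. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
  char       sym_to_char[UCHAR_MAX + 1];
  unsigned   num_of_syms;
} demo_Alphabet;

/* Returns 0 on success, -1 on error. */
static int demo_parse_speclines(demo_Alphabet *alpha, const char *str, size_t len)
{
  size_t i = 0;
  assert(alpha != NULL && str != NULL);
  if (len == 0)
    len = strlen(str);
  memset(alpha->char_to_sym, fid_UNDEF, sizeof alpha->char_to_sym);
  alpha->num_of_syms = 0;

  while (i < len) {
    size_t start = i;
    while (i < len && str[i] != '\n')   /* find end of line */
      ++i;
    if (i > start && str[start] != '#') {
      size_t j;
      if (alpha->num_of_syms > fid_SYMBOLMAX)
        return -1;                      /* more classes than symbol values */
      for (j = start; j < i; ++j) {
        size_t idx = fid_CHAR_AS_INDEX(str[j]);
        if (alpha->char_to_sym[idx] != fid_UNDEF)
          return -1;                    /* character listed twice */
        alpha->char_to_sym[idx] = (fid_Symbol)alpha->num_of_syms;
      }
      alpha->sym_to_char[alpha->num_of_syms] = str[start];
      ++alpha->num_of_syms;
    }
    ++i;                                /* skip the '\n' */
  }
  return 0;
}
```

For example, the spec "# DNA\nAa\nCc\nGg\nTtUu\n" maps 'A' and 'a' to 0, and 'T', 't', 'U', 'u' to 3, leaving every other character undefined.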
int fid_alphabet_init_from_specfile | ( | fid_Alphabet * | alpha, | |
const char * | filename, | |||
fid_Error * | error | |||
) |
Parse alphabet definition file and fill alphabet structure.
The alphabet definition is read from file. See fid_alphabet_init_from_speclines() for details.
alpha | The structure to be filled according to the alphabet definition from the file. | |
filename | Name of the file containing an alphabet definition. | |
error | Error messages go here. |
Definition at line 170 of file alphabet.c.
References fid_Mappedfile::content, fid_alphabet_init_from_speclines(), fid_error_throw(), fid_file_map(), fid_file_unmap(), and fid_Mappedfile::occupied.
Referenced by fid_suffixarray_load_from_files().
int fid_alphabet_init_from_string | ( | fid_Alphabet * | alpha, | |
const char * | str, | |||
size_t | len, | |||
fid_Error * | error | |||
) |
Determine alphabet from ASCII text.
Each character occurring in the passed string is added to the alphabet as a regular symbol. No attempt is made to decode any text encoding other than ASCII (such as UTF-8), and this function adds no wildcards.
alpha | The structure to be filled according to the text string. | |
str | Arbitrary text string. | |
len | Length of str. If 0, the length is determined using strlen(3), so str must then be zero-terminated. | |
error | Error messages go here. |
Definition at line 206 of file alphabet.c.
References fid_Alphabet::char_to_sym, fid_CHAR_AS_INDEX, fid_error_throw(), fid_SYMBOLMAX, fid_SYMFMT, fid_UNDEF, fid_Alphabet::num_of_chars, fid_Alphabet::num_of_syms, and fid_Alphabet::sym_to_char.
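A minimal sketch of deriving an alphabet from text: every not-yet-seen character gets the next free symbol, and the function fails once symbol values above fid_SYMBOLMAX would be needed. The `demo_*` names are hypothetical, and assigning symbols in order of first occurrence is an assumption; the real function may order symbols differently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3))
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

/* Hypothetical, simplified stand-in for fid_Alphabet. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
  char       sym_to_char[UCHAR_MAX + 1];
  unsigned   num_of_syms;
} demo_Alphabet;

/* Returns 0 on success, -1 if the text has too many distinct characters. */
static int demo_alphabet_from_text(demo_Alphabet *alpha, const char *str, size_t len)
{
  size_t i;
  assert(alpha != NULL && str != NULL);
  if (len == 0)
    len = strlen(str);
  memset(alpha->char_to_sym, fid_UNDEF, sizeof alpha->char_to_sym);
  alpha->num_of_syms = 0;
  for (i = 0; i < len; ++i) {
    size_t idx = fid_CHAR_AS_INDEX(str[i]);
    if (alpha->char_to_sym[idx] != fid_UNDEF)
      continue;                         /* character already has a symbol */
    if (alpha->num_of_syms > fid_SYMBOLMAX)
      return -1;                        /* no regular symbol value left */
    alpha->char_to_sym[idx] = (fid_Symbol)alpha->num_of_syms;
    alpha->sym_to_char[alpha->num_of_syms] = str[i];
    ++alpha->num_of_syms;
  }
  return 0;
}
```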
void fid_alphabet_init_standard | ( | fid_Alphabet * | alpha, | |
fid_Alphabettype | type | |||
) |
Assign standard alphabet to alphabet structure.
Several commonly used alphabets are defined within this library. The type of the desired alphabet is selected by an alphabet identifier.
alpha | The structure to be filled. | |
type | Identifier of a standard alphabet. |
Definition at line 265 of file alphabet.c.
References fid_ALPHABET_DNA, fid_ALPHABET_DNARNA, fid_alphabet_init_from_speclines(), fid_ALPHABET_PROTEIN, and fid_ALPHABET_RNA.
int fid_alphabet_add_wildcard | ( | fid_Alphabet * | alpha, | |
char | wcchar, | |||
fid_Error * | error | |||
) |
Add wildcard character to alphabet.
This function is particularly useful after initializing an alphabet via fid_alphabet_init_from_string(), which does not add any wildcards itself.
alpha | The alphabet the wildcard should be added to. | |
wcchar | ASCII representation of the wildcard. If wcchar is already mapped to the wildcard symbol by the alphabet, then this function does not change the alphabet. If wcchar is already mapped to some regular symbol by the alphabet, then this function returns an error. Note that wcchar must not be 0. | |
error | Error messages go here. |
Definition at line 297 of file alphabet.c.
References fid_Alphabet::char_to_sym, fid_CHAR_AS_INDEX, fid_error_throw(), fid_SYMBOLMAX, fid_SYMFMT, fid_UNDEF, fid_WILDCARD, fid_Alphabet::num_of_chars, fid_Alphabet::num_of_syms, and fid_Alphabet::sym_to_char.
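A minimal sketch of the three documented rules for wcchar: 0 is rejected, a character already mapped to fid_WILDCARD leaves the alphabet unchanged, and a character already mapped to a regular symbol is an error. `demo_Alphabet` and `demo_add_wildcard` are hypothetical; the sketch only updates the character-to-symbol table and ignores the other fid_Alphabet members the real function touches.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_WILDCARD ((fid_Symbol)(UCHAR_MAX-1))
#define fid_UNDEF    ((fid_Symbol)(UCHAR_MAX-2))
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

/* Hypothetical stand-in holding only the character-to-symbol table. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
} demo_Alphabet;

/* Returns 0 on success (including "already the wildcard"), -1 on error. */
static int demo_add_wildcard(demo_Alphabet *alpha, char wcchar)
{
  fid_Symbol *slot;
  assert(alpha != NULL);
  if (wcchar == '\0')
    return -1;                          /* wcchar must not be 0 */
  slot = &alpha->char_to_sym[fid_CHAR_AS_INDEX(wcchar)];
  if (*slot == fid_WILDCARD)
    return 0;                           /* nothing to change */
  if (*slot != fid_UNDEF)
    return -1;                          /* already a regular symbol */
  *slot = fid_WILDCARD;
  return 0;
}
```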
size_t fid_alphabet_transform_string | ( | const fid_Alphabet * | alpha, | |
const char * | string, | |||
size_t | length, | |||
fid_Symbol * | transformed, | |||
int | no_special_symbols | |||
) |
Transform string according to given alphabet.
alpha | Alphabet used for transformation. | |
string | The input string to be transformed. | |
length | The number of characters in string. If 0, the length of string is determined within this function; in this case, string must be zero-terminated. | |
transformed | Buffer to which the transformed string is written. It must have room for at least as many symbols as string has characters. | |
no_special_symbols | If nonzero, stop the transformation and return an error as soon as a character in string is transformed into a special symbol. |
Definition at line 372 of file alphabet.c.
References fid_Alphabet::char_to_sym, fid_CHAR_AS_INDEX, and fid_SYMBOLMAX.
Referenced by fid_alphabet_transform_string_inplace(), and fid_alphabet_transform_string_new().
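A minimal sketch of the transformation loop: each character is looked up in the character-to-symbol table via fid_CHAR_AS_INDEX. The `demo_*` names are hypothetical. Two points are assumptions, not documented behavior: the sketch treats anything above fid_SYMBOLMAX (undefined, wildcard, separator) as special when no_special_symbols is set, and it signals that error with (size_t)-1, since this page does not say how the real function reports it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3))
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

#define DEMO_TRANSFORM_ERROR ((size_t)-1)   /* hypothetical error marker */

/* Hypothetical stand-in holding only the character-to-symbol table. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
} demo_Alphabet;

/* Returns the number of symbols written, or DEMO_TRANSFORM_ERROR. */
static size_t demo_transform_string(const demo_Alphabet *alpha,
                                    const char *string, size_t length,
                                    fid_Symbol *transformed,
                                    int no_special_symbols)
{
  size_t i;
  assert(alpha != NULL && string != NULL && transformed != NULL);
  if (length == 0)
    length = strlen(string);            /* string must be zero-terminated */
  for (i = 0; i < length; ++i) {
    fid_Symbol sym = alpha->char_to_sym[fid_CHAR_AS_INDEX(string[i])];
    if (no_special_symbols && sym > fid_SYMBOLMAX)
      return DEMO_TRANSFORM_ERROR;      /* undefined, wildcard or separator */
    transformed[i] = sym;
  }
  return length;
}
```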
size_t fid_alphabet_transform_string_inplace | ( | const fid_Alphabet * | alpha, | |
char * | string, | |||
size_t | length, | |||
int | no_special_symbols | |||
) |
Transform string according to given alphabet.
This function replaces the original string by the transformed string in the same buffer.
alpha | Alphabet used for transformation. | |
string | The string to be transformed in place. | |
length | The number of characters in string. If 0, the length of string is determined within this function; in this case, string must be zero-terminated. | |
no_special_symbols | If nonzero, stop the transformation and return an error as soon as a character in string is transformed into a special symbol. |
Definition at line 434 of file alphabet.c.
References fid_alphabet_transform_string().
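In-place transformation is possible because a fid_Symbol occupies exactly one char and each position is read before it is overwritten. The sketch below is hypothetical (`demo_*` names, (size_t)-1 as error marker) and inlines the loop instead of calling the out-of-place function. Note that on error the buffer is left partially transformed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3))
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

/* Hypothetical stand-in holding only the character-to-symbol table. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
} demo_Alphabet;

static size_t demo_transform_inplace(const demo_Alphabet *alpha,
                                     char *string, size_t length,
                                     int no_special_symbols)
{
  /* Accessing a char buffer through an unsigned char pointer is allowed. */
  fid_Symbol *out = (fid_Symbol *)string;
  size_t i;
  assert(alpha != NULL && string != NULL);
  if (length == 0)
    length = strlen(string);
  for (i = 0; i < length; ++i) {
    fid_Symbol sym = alpha->char_to_sym[fid_CHAR_AS_INDEX(string[i])];
    if (no_special_symbols && sym > fid_SYMBOLMAX)
      return (size_t)-1;                /* hypothetical error marker */
    out[i] = sym;                       /* overwrite the byte just read */
  }
  return length;
}
```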
fid_Symbol* fid_alphabet_transform_string_new | ( | const fid_Alphabet * | alpha, | |
const char * | string, | |||
size_t | length, | |||
int | no_special_symbols, | |||
fid_Error * | error | |||
) |
Transform string according to given alphabet into new buffer.
This function allocates a buffer large enough to hold the transformed string and returns that to the caller.
alpha | Alphabet used for transformation. | |
string | The input string to be transformed. | |
length | The number of characters in string. If 0, the length of string is determined within this function; in this case, string must be zero-terminated. | |
no_special_symbols | If nonzero, stop the transformation and return an error as soon as a character in string is transformed into a special symbol. | |
error | Error messages go here. |
A NULL pointer may be returned in the following cases:
- … string was found to be 0.
- … error in this case.
- … error in this case (if possible at all in this condition).
Definition at line 469 of file alphabet.c.
References fid_alphabet_transform_string(), fid_error_throw(), and fid_OUTOFMEM.
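A minimal sketch of the allocate-then-transform pattern, freeing the buffer again if the transformation fails so nothing leaks. The `demo_*` names and the extra `out_len` parameter are hypothetical. Returning NULL for empty input is an assumption loosely based on the partly garbled list of NULL cases above, and the "special" cut-off at fid_SYMBOLMAX is also an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_UNDEF     ((fid_Symbol)(UCHAR_MAX-2))
#define fid_SYMBOLMAX ((fid_Symbol)(UCHAR_MAX-3))
#define fid_CHAR_AS_INDEX(C) ((size_t)((unsigned char)(C)))

/* Hypothetical stand-in holding only the character-to-symbol table. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
} demo_Alphabet;

/* Returns a malloc'd buffer of *out_len symbols, or NULL on error.
   The caller releases the buffer with free(). */
static fid_Symbol *demo_transform_new(const demo_Alphabet *alpha,
                                      const char *string, size_t length,
                                      int no_special_symbols,
                                      size_t *out_len)
{
  fid_Symbol *buf;
  size_t i;
  assert(alpha != NULL && string != NULL && out_len != NULL);
  if (length == 0)
    length = strlen(string);
  if (length == 0)
    return NULL;                        /* nothing to transform */
  buf = malloc(length * sizeof *buf);
  if (buf == NULL)
    return NULL;                        /* out of memory */
  for (i = 0; i < length; ++i) {
    fid_Symbol sym = alpha->char_to_sym[fid_CHAR_AS_INDEX(string[i])];
    if (no_special_symbols && sym > fid_SYMBOLMAX) {
      free(buf);                        /* do not leak on failure */
      return NULL;
    }
    buf[i] = sym;
  }
  *out_len = length;
  return buf;
}
```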
int fid_alphabet_write_to_file | ( | const fid_Alphabet * | alpha, | |
const char * | basefilename, | |||
fid_Error * | error | |||
) |
Write textual representation of alphabet to file.
The function creates a new file and writes an alphabet definition based on alpha to it. The filename is basefilename with the extension "al1" appended.
alpha | Alphabet structure whose content shall be written to the file. | |
basefilename | The base filename of the enhanced suffix array. | |
error | Error messages go here. |
Definition at line 523 of file alphabet.c.
References fid_Mappedfile::allocated, fid_Alphabet::char_to_sym, fid_Mappedfile::content, fid_file_allocate(), fid_file_unmap(), fid_filename_create(), fid_SYMBOLMAX, fid_WILDCARD, fid_Alphabet::num_of_chars, fid_Alphabet::num_of_syms, and fid_Mappedfile::occupied.
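A minimal sketch of the two steps involved: building the output filename and writing the alphabet in the one-class-per-line format that fid_alphabet_init_from_speclines() reads. All `demo_*` names are hypothetical. Joining the extension with a dot ("genome" becomes "genome.al1") and listing characters of a class in ascending code order are assumptions; wildcard handling is omitted, and the sketch writes to a caller-supplied FILE* instead of a mapped file.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_UNDEF ((fid_Symbol)(UCHAR_MAX-2))

/* Hypothetical, simplified stand-in for fid_Alphabet. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
  unsigned   num_of_syms;
} demo_Alphabet;

/* Returns 0 on success, -1 if the name does not fit into buf. */
static int demo_alphabet_filename(char *buf, size_t size, const char *base)
{
  int n = snprintf(buf, size, "%s.al1", base);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write one line per symbol, listing every character mapped to it. */
static int demo_write_speclines(const demo_Alphabet *alpha, FILE *out)
{
  unsigned sym;
  int c;
  assert(alpha != NULL && out != NULL);
  for (sym = 0; sym < alpha->num_of_syms; ++sym) {
    for (c = 1; c <= UCHAR_MAX; ++c) {  /* skip '\0' */
      if (c == '\n' || alpha->char_to_sym[c] != sym)
        continue;
      if (fputc(c, out) == EOF)
        return -1;
    }
    if (fputc('\n', out) == EOF)
      return -1;
  }
  return 0;
}
```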
void fid_alphabet_dump | ( | const fid_Alphabet * | alpha, | |
FILE * | stream | |||
) |
Print alphabet to output stream.
alpha | The alphabet structure to be printed out. | |
stream | An output stream to which the alphabet is printed. If NULL, nothing is printed. |
Definition at line 586 of file alphabet.c.
References fid_Alphabet::char_to_sym, fid_SEPARATOR, fid_SYMBOLMAX, fid_SYMFMT, fid_UNDEF, fid_WILDCARD, fid_Alphabet::num_of_chars, fid_Alphabet::num_of_syms, and fid_Alphabet::sym_to_char.
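A minimal sketch of a dump routine that honors the documented NULL-stream rule and formats symbol values with fid_SYMFMT. `demo_Alphabet` and `demo_dump` are hypothetical, and the "'c' -> n" output layout is invented for illustration; the real fid_alphabet_dump() output format is not described on this page.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Local copies of the documented definitions (not the real header). */
typedef unsigned char fid_Symbol;
#define fid_SYMFMT "%hhu"
#define fid_UNDEF  ((fid_Symbol)(UCHAR_MAX-2))

/* Hypothetical stand-in holding only the character-to-symbol table. */
typedef struct {
  fid_Symbol char_to_sym[UCHAR_MAX + 1];
} demo_Alphabet;

/* Print one line per mapped character; returns the number of lines. */
static size_t demo_dump(const demo_Alphabet *alpha, FILE *stream)
{
  size_t lines = 0;
  int c;
  assert(alpha != NULL);
  if (stream == NULL)
    return 0;                           /* NULL stream: print nothing */
  for (c = 0; c <= UCHAR_MAX; ++c) {
    fid_Symbol sym = alpha->char_to_sym[c];
    if (sym == fid_UNDEF)
      continue;                         /* unmapped character */
    if (isprint(c))
      fprintf(stream, "'%c' -> " fid_SYMFMT "\n", c, sym);
    else
      fprintf(stream, "0x%02X -> " fid_SYMFMT "\n", (unsigned)c, sym);
    ++lines;
  }
  return lines;
}
```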