BiBiServ - Bielefeld University Bioinformatics Server

How to Identify and to Evaluate Diagnostic Sites Computationally

Suppose we are given an alphabet $ \frak{A}$ (including the ``gap letter'') and an indexed family of aligned sequences $ S_i=(a_{i1},\ldots,a_{iN})$, where $ i$ runs over some index set $ I$ (e.g. the numbers from 1 to 91 if we deal with all 91 vertebrate serpin sequences), each consisting, by virtue of the alignment, of exactly $ N$ letters from $ \frak{A}$. For any subclass of these sequences, specified by a subset $ J$ of the index set $ I$ (e.g. the subclass of certified ovalbumin-type sequences, specified by the numbers from 1 to 12), we can form the profile of the subfamily $ S_j$ ($ j\in J$), which we define, for each site $ \nu$ ($ 1\le \nu \le N$), to be the $ \frak{A}$-tuple $ (p^{\nu}_J(a))_{a\in \frak{A}}$ of observed frequencies
$\displaystyle p^{\nu}_J(a)=\frac{\# \{ j\in J\bigm\vert a_{j\nu}=a\}}{\# J}, $
and we can compare this $ \frak{A}$-tuple with the corresponding $ \frak{A}$-tuple defined for the complement $ J':=I-J$ of $ J$ in $ I$, e.g., by forming their $ l_1$-distance
$\displaystyle \Vert p^{\nu}_J,p^{\nu}_{J'}\Vert :=\sum \limits _{a\in \frak{A}}\vert p^{\nu}_J(a)-p^{\nu}_{J'}(a)\vert. $
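The profile and the distance just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `site_profile` and `l1_distance` and the four-sequence toy alignment are hypothetical and not part of the original analysis, sites are numbered from 0 rather than from 1, and both $ J$ and its complement are assumed to be non-empty.

```python
from collections import Counter


def site_profile(sequences, indices, site):
    """Observed letter frequencies at one site over the rows listed in `indices`.

    Letters that do not occur are simply absent (frequency 0).
    Assumes `indices` is non-empty.
    """
    counts = Counter(sequences[j][site] for j in indices)
    total = len(indices)
    return {letter: n / total for letter, n in counts.items()}


def l1_distance(p, q):
    """Sum of |p(a) - q(a)| over every letter occurring in either profile."""
    letters = set(p) | set(q)
    return sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in letters)


# Hypothetical toy alignment: 4 aligned sequences of length 4, '-' is the gap letter.
alignment = ["ACG-", "ACGT", "TCGA", "TCAA"]
J = [0, 1]                                               # the subfamily
J_complement = [i for i in range(len(alignment)) if i not in J]

# One distance per site (0-based site index).
distances = [
    l1_distance(
        site_profile(alignment, J, nu),
        site_profile(alignment, J_complement, nu),
    )
    for nu in range(len(alignment[0]))
]
print(distances)
```

In this toy example the first and last sites separate the two groups completely, the second site is identical in both groups, and the third site separates them only partially.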
Clearly, we always have
$\displaystyle 0\le p^{\nu}_J(a)\le 1 $
and
$\displaystyle \sum \limits _{a\in \frak{A}}p^{\nu}_J(a)=1 $
and hence, by the triangle inequality,
$\displaystyle \Vert p^{\nu}_J,p^{\nu}_{J'}\Vert \le 2 $
with equality if and only if the collection
$\displaystyle A_J(\nu ):=\{ a_{j\nu}\bigm\vert j\in J\} $
of letters occurring at site $ \nu$ within the subfamily specified by $ J$ is disjoint from the collection $ A_{J'}(\nu )$ defined correspondingly with the complement $ J'$ in place of $ J$. Consequently, the site ``$ \nu$'' is a diagnostic site for the subfamily in question if and only if $ \Vert p^{\nu}_J,p^{\nu}_{J'}\Vert =2$ holds: this is equivalent to asserting that membership of an index $ i\in I$ in the subset $ J\subseteq I$ can be checked by considering the $ \nu$-th letter $ a_{i\nu}$ of the sequence $ S_i$. If this letter is in $ A_J(\nu)$, the sequence $ S_i$ belongs to the given subfamily (that is, $ i$ must belong to $ J$); otherwise it belongs to its complement (that is, $ i$ must belong to $ J'$). More generally, the site ``$ \nu$'' is almost diagnostic for $ J$ if $ \Vert p^{\nu}_J,p^{\nu}_{J'}\Vert$ is close to $ 2$.

Using this approach, diagnostic sites have been computed for the six groups of vertebrate serpins suggested by genomic organisation (see diagnostic sites for vertebrate serpins for details).
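The diagnostic-site criterion can also be checked directly through the letter sets $ A_J(\nu)$ and $ A_{J'}(\nu)$, without computing frequencies. The following Python sketch illustrates this under stated assumptions: the helper names `letters_at`, `diagnostic_sites` and `belongs_to_subfamily` and the toy alignment are hypothetical, sites are numbered from 0, and it is not the program that was used for the serpin analysis.

```python
def letters_at(sequences, indices, site):
    """Set of letters occurring at `site` among the rows listed in `indices`."""
    return {sequences[i][site] for i in indices}


def diagnostic_sites(sequences, subfamily):
    """0-based sites where subfamily and complement share no letter."""
    members = set(subfamily)
    complement = [i for i in range(len(sequences)) if i not in members]
    n_sites = len(sequences[0])
    return [
        nu
        for nu in range(n_sites)
        if letters_at(sequences, members, nu).isdisjoint(
            letters_at(sequences, complement, nu)
        )
    ]


def belongs_to_subfamily(sequence, site, subfamily_letters):
    """Membership test at a diagnostic site: inspect a single letter."""
    return sequence[site] in subfamily_letters


# Hypothetical toy alignment; rows 0 and 1 form the subfamily.
alignment = ["ACG-", "ACGT", "TCGA", "TCAA"]
J = [0, 1]

sites = diagnostic_sites(alignment, J)
print(sites)

# At any diagnostic site, one letter decides membership for every sequence.
nu = sites[0]
subfamily_letters = letters_at(alignment, J, nu)
membership = [belongs_to_subfamily(s, nu, subfamily_letters) for s in alignment]
print(membership)
```

To rank almost diagnostic sites one would instead sort all sites by their $ l_1$-distance and inspect those closest to $ 2$.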