:: Research Activities
The BioAlgo group's objectives are:
- the development of efficient and scalable algorithms for high-throughput analysis of biological and genomic data,
in order to help obtain useful indications in a timely manner in a highly competitive setting, in the fields of:
- simple and structured motif identification and extraction
- tandem repeats identification and extraction
- microarray gene expression data analysis
- SNP haplotyping analysis
- metabolic networks analysis
- diseases and gene expression profiling classification
- the development of tools that allow the visualization and analysis of
raw and processed biological data.
Tools available:
- AMIC@ - All MIcroarray Clusterings @ once
an on-line tool for clustering and
visualizing microarray gene expression data, with a wide range of algorithms
- PTRStalkerDB, a database containing all the fuzzy tandem repeats found in UniProtKB/Swiss-Prot using PTRStalker, a tool developed by our group.
- ReHap - Reconstruct Haplotypes
a web application aiming to provide users with a common interface to five algorithms for the parental haplotype reconstruction problem (also known as Haplotyping Assembly Problem).
- TReaDS - Tandem Repeats Discovery Service
a tandem repeat meta search engine that simultaneously
queries multiple publicly available online tools for finding exact, approximate, short and long tandem repeats,
merges the results obtained and returns a report that includes a global view of the results.
|
|
More details...
|
|
Motif Identification and Extraction
|
There is growing evidence that some diseases are related
to the malfunctioning of the transcription factors (TFs) of a gene
rather than to the gene itself. Although many methods for motif finding are
available, most of them can only cope with a few hundred
thousand bases and/or find only "simple" patterns. The
quest for methods capable of finding more complex patterns in large data sets is still open.
We plan to develop new scalable algorithms capable
of detecting complex patterns and faint signals in biological
sequences.
|
Tandem Repeats Identification and Extraction
|
Tandem repeats are multiple duplications of substrings in the DNA
that occur contiguously (or at a short distance) and may involve
mutations, indels and transpositions in the repeated patterns. It
is known that several diseases are linked to an abnormal
proliferation of tandem repeats. Thus the analysis of repeats in
the human genome is an important genetic profiling
technique.
We developed innovative methodologies that are robust, including against noise,
by using filtering techniques based on weak properties of tandem repeats that
can be screened effectively.
We are able to efficiently detect long repeats with high mutation rates in both DNA and protein sequences, with the TRStalker and PTRStalker algorithms respectively.
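
To make the definition above concrete, here is a minimal sketch that finds only exact, contiguous tandem repeats by brute force. It is not the TRStalker or PTRStalker method: it does not handle mutations, indels or transpositions, and the function name and parameters (find_exact_tandem_repeats, min_unit, max_unit, min_copies) are hypothetical names chosen for this illustration.

```python
def find_exact_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=2):
    """Return a list of (start, unit, copies) for exact tandem repeats.

    Illustrative brute-force scan: at each position, try every unit length
    and keep the repeat covering the longest span, then jump past it.
    """
    results = []
    n = len(seq)
    i = 0
    while i < n:
        best = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for this unit length
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * unit_len
                if best is None or span > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best is not None:
            results.append(best)
            i += best[2] * len(best[1])  # skip the repeat just reported
        else:
            i += 1
    return results


if __name__ == "__main__":
    # Toy DNA string containing four contiguous copies of "AC".
    print(find_exact_tandem_repeats("GGACACACACTT"))
```

An exact scan like this misses fuzzy repeats entirely, which is why approximate methods are needed once repeated copies have diverged.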
|
Microarray Gene Expression Data Analysis
|
Microarray technology for profiling gene
expression levels has become a popular tool in modern biological research, finding
applications, for instance, in tissue classification, detection of metabolic
networks, and drug discovery. However,
several obstacles still lie in the way of exploiting the full
potential of these technologies. One issue is, for instance, the
scalability of the data processing software for unsupervised
clustering of gene expression data into groups with homogeneous
expression profiles.
We tackle this problem by developing
a class of clustering algorithms, based on metric space clustering and
stability-based methods for determining the optimal number of
clusters, that have been shown to be highly scalable without losing efficiency.
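
As an illustration of what clustering in a metric space can look like, the sketch below implements farthest-first traversal, a classical heuristic for the k-center problem. It is shown as an example of the general family only and is not claimed to be the group's algorithm. The function names, the Euclidean distance and the toy data are assumptions made for this example, and choosing k by stability-based methods is not covered.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_clustering(points, k, dist=euclidean):
    """Pick k centers by farthest-first traversal and assign points to them.

    Returns (centers, assign): centers holds indices into points, and
    assign[i] is the position in centers of the center closest to point i.
    """
    centers = [0]  # arbitrary first center: the first point
    nearest = [dist(p, points[0]) for p in points]
    assign = [0] * len(points)
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            d = dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign


if __name__ == "__main__":
    # Four toy two-dimensional "profiles" forming two obvious groups.
    toy = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(farthest_first_clustering(toy, 2))
```

Each round costs one pass over the data, so the whole procedure needs about k passes, which is one reason heuristics of this kind scale to large expression data sets.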
|
Analysis of Haplotyping Data of Individuals
|
We developed new heuristic algorithms for the problem of
reconstructing a Single Individual Haplotype from shotgun
sequencing data that are fast, handle gaps well, and are able to
deal with high reading error rates and low fragment coverage.
Initial testing with real human data from the HapMap project
demonstrates the effectiveness of the approach.
|
Analysis of Metabolic Networks
|
Metabolic networks have recently been a focus of activity in
bioinformatics as a formalism that concisely encodes accumulated
knowledge related to basic biological functions. At the same time,
metabolic networks represent a new searchable knowledge base
for detecting long-range interactions and emerging patterns.
We plan to adapt recent techniques for detecting
dense subgraphs in the web graph to metabolic network data.
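
For readers unfamiliar with dense-subgraph detection, the sketch below shows the classical greedy peeling heuristic for the densest-subgraph problem: repeatedly remove a minimum-degree node and remember the densest intermediate graph. It is a stand-in chosen for brevity, not necessarily the web-graph technique referred to above. The function name and the toy edge list are hypothetical, and density is taken as edges divided by nodes.

```python
def densest_subgraph_peeling(edges):
    """Greedy peeling on an undirected graph given as (u, v) pairs.

    Returns (nodes, density) for the densest intermediate subgraph seen,
    where density is the number of edges divided by the number of nodes.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    num_edges = sum(len(nb) for nb in adj.values()) // 2
    best_nodes, best_density = set(adj), 0.0
    while adj:
        density = num_edges / len(adj)
        if density > best_density:
            best_density = density
            best_nodes = set(adj)
        # Remove a node of minimum degree together with its edges.
        u = min(adj, key=lambda x: len(adj[x]))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        del adj[u]
    return best_nodes, best_density


if __name__ == "__main__":
    # Toy network: a 4-clique (a, b, c, d) with a short tail d-e-f.
    toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
                 ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
    print(densest_subgraph_peeling(toy_edges))
```

On the toy network the peeling strips the tail first and reports the 4-clique, the kind of tightly connected group one would hope to find among metabolites or reactions.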
|
Diseases and Gene Expression Profiling Classification
|
Classification is a supervised intelligent data analysis approach.
One of the goals of supervised expression
data analysis is to construct classifiers that assign a predefined
class to a given expression profile and can potentially be used for
diagnostics.
We plan to further improve a new and promising classification technique,
which has the potential for high scalability and high accuracy.
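
To illustrate what assigning a predefined class to an expression profile means, here is a minimal nearest-centroid classifier. It is a textbook baseline and not the technique mentioned above, which this page does not describe. The function names, class labels and toy profiles are all hypothetical.

```python
def train_centroids(profiles, labels):
    """Compute one mean expression profile (centroid) per class label."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for j, value in enumerate(profile):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def classify(profile, centroids):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(profile, centroids[label]))


if __name__ == "__main__":
    # Toy training set: two genes measured in four labelled samples.
    train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
    labels = ["healthy", "healthy", "disease", "disease"]
    model = train_centroids(train, labels)
    print(classify([1.0, 1.2], model), classify([5.1, 4.7], model))
```

Training is a single pass over the labelled profiles and prediction compares against one centroid per class, which makes baselines like this cheap even on large data sets.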
|
|
Styled and maintained by:
M. Elena Renda
|