:: Research Activities

The BioAlgo group's objectives are:
  • the development of efficient and scalable algorithms for the high-throughput analysis of biological and genomic data, in order to provide useful indications in a timely and highly competitive manner, in the fields of:
    • simple and structured motif identification and extraction
    • tandem repeats identification and extraction
    • microarray gene expression data analysis
    • SNP haplotyping analysis
    • metabolic networks analysis
    • diseases and gene expression profiling classification
       
  • the development of tools that allow the visualization and the analysis of raw and processed biological data.
     
    Tools available:
    • AMIC@ - All MIcroarray Clustering @ once
      an online tool for clustering and visualizing microarray gene expression data with a wide range of algorithms.
       
    • PTRStalkerDB, a database containing all the fuzzy tandem repeats found in UniProtKB/Swiss-Prot using PTRStalker, a tool developed by our group.
       
    • ReHap - Reconstruct Haplotypes
      a web application that provides users with a common interface to five algorithms for the parental haplotype reconstruction problem (also known as the Haplotype Assembly Problem).
       
    • TReaDS - Tandem Repeats Discovery Service
      a tandem repeat meta-search engine that simultaneously queries multiple publicly available online tools for finding exact, approximate, short and long tandem repeats, merges their results, and returns a report that includes a global view of the results.



Motif Identification and Extraction
There is growing evidence that some diseases are related to the malfunctioning of a gene's transcription factors (TFs) rather than of the gene itself. Although many motif-finding methods are available, most of them can only cope with a few hundred thousand bases and/or only find "simple" patterns. The quest for methods capable of finding more complex patterns in large data sets is still open.
 
We plan to develop new scalable algorithms capable of detecting complex patterns and faint signals in biological sequences.
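To illustrate why "simple" motif finding does not scale, here is a minimal sketch of exhaustive (l, d)-motif search. It looks for every l-mer that occurs, with at most d mismatches, in all input sequences. This is a generic textbook baseline and not one of the group's algorithms. The function names (`hamming`, `occurs_with_mismatches`, `planted_motifs`) are hypothetical, and the sketch assumes DNA sequences over the alphabet ACGT.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq lies within Hamming distance d of motif."""
    length = len(motif)
    return any(hamming(motif, seq[i:i + length]) <= d
               for i in range(len(seq) - length + 1))


def planted_motifs(seqs: list[str], length: int, d: int) -> list[str]:
    """Exhaustive (l, d)-motif search: every l-mer over ACGT that occurs,
    with at most d mismatches, in every input sequence.

    There are 4**length candidates, and each one scans every sequence,
    so the cost explodes as the motif length and data size grow."""
    candidates = ("".join(p) for p in product("ACGT", repeat=length))
    return [m for m in candidates
            if all(occurs_with_mismatches(m, s, d) for s in seqs)]
```

For example, `planted_motifs(["ACGTTGCA", "TTACGTAA", "GGGACGTC"], 4, 0)` returns `["ACGT"]`. Each extra motif position multiplies the search space by four, which is exactly why such methods stall on large data sets.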

Tandem Repeats Identification and Extraction
Tandem repeats are multiple duplications of substrings in DNA that occur contiguously (or at a short distance) and may involve mutations, indels and transpositions in the repeated patterns. Several diseases are known to be linked to an abnormal proliferation of tandem repeats, so the analysis of repeats in the human genome is an important genetic profiling technique.
 
We developed innovative methods, robust also against noise, that use filtering techniques based on weak properties of tandem repeats that can be screened efficiently. With the TRStalker and PTRStalker algorithms we can efficiently detect long repeats with high mutation rates in DNA and protein sequences, respectively.
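The idea of screening a cheap "weak property" before any expensive alignment can be sketched as follows. The property assumed here is that, inside a tandem repeat of period p, a character usually equals the one p positions later. This filter is only an illustration and is not the actual filter used by TRStalker or PTRStalker. The function name and its parameters (`min_copies`, `min_identity`) are hypothetical.

```python
def candidate_tandem_regions(seq: str, period: int, min_copies: int = 2,
                             min_identity: float = 0.8) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans that may contain a tandem repeat
    of the given period, allowing mismatches.

    A window of (min_copies - 1) * period comparisons seq[i] == seq[i + period]
    is slid along the sequence; windows whose match fraction reaches
    min_identity are reported, and overlapping windows are merged. The spans
    are only candidates: a real tool would verify them by alignment."""
    n = len(seq)
    w = (min_copies - 1) * period        # comparisons covering min_copies copies
    if period <= 0 or w <= 0 or n < w + period:
        return []
    match = [1 if seq[i] == seq[i + period] else 0 for i in range(n - period)]
    spans: list[tuple[int, int]] = []
    score = sum(match[:w])               # matches in the current window
    for start in range(len(match) - w + 1):
        if start > 0:                    # slide: add the new right end, drop the old left end
            score += match[start + w - 1] - match[start - 1]
        if score >= min_identity * w:
            end = start + w + period
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], max(spans[-1][1], end))
            else:
                spans.append((start, end))
    return spans
```

The filter runs in linear time for each period, and a mutation only lowers a window's score instead of breaking the match outright. Because of this, it tolerates noisy copies that an exact-match scan would miss.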

Microarray Gene Expression Data Analysis
Microarray technology for profiling gene expression levels has become a popular tool in modern biological research, with applications in, for instance, tissue classification, detection of metabolic networks, and drug discovery. However, several obstacles still stand in the way of exploiting the full potential of this technology. One such issue is the scalability of the data processing software used for the unsupervised clustering of gene expression data into groups with homogeneous expression profiles.
 
We tackle this problem by developing a class of clustering algorithms based on metric space clustering and on stability-based methods for determining the optimal number of clusters. These algorithms have been shown to be highly scalable without losing efficiency.
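A classic example of metric space clustering is Gonzalez's furthest-point-first (FPF) heuristic for the k-center problem. The sketch below is a generic version of it. Treat it as an illustration of the technique rather than the group's implementation. It assumes that profiles are numeric vectors and that k is given, so the stability-based choice of k is not shown. The function name is hypothetical.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def fpf_clustering(
    points: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = math.dist,
) -> tuple[list[int], list[int]]:
    """Furthest-point-first (Gonzalez) heuristic for metric k-center.

    Returns the indices of the chosen centers and, for every point, the
    position in the centers list of its closest center. Any metric can be
    passed as `dist`; Euclidean distance is only the default."""
    if not points or k <= 0:
        return [], []
    centers = [0]                                   # arbitrary first center
    nearest = [dist(p, points[0]) for p in points]  # distance to closest center so far
    assign = [0] * len(points)
    while len(centers) < min(k, len(points)):
        # The next center is the point furthest from all current centers.
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] == 0:                       # only duplicate points remain
            break
        centers.append(far)
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < nearest[i]:                      # the new center is closer: reassign
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign
```

Each new center costs one pass over the data, so the whole run needs O(nk) distance evaluations. FPF is also a 2-approximation for the k-center objective, which makes it a natural building block when scalability matters.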

Analysis of Haplotyping Data of Individuals
We developed new heuristic algorithms for reconstructing a Single Individual Haplotype from shotgun sequencing data. These algorithms are fast, handle gaps well, and can deal with high read error rates and low fragment coverage. Initial tests with real human data from the HapMap project demonstrate the effectiveness of the approach.
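To make the problem concrete, here is a minimal sketch of one generic heuristic, assuming the usual minimum error correction (MEC) formulation. Each fragment is a string over `0`, `1` and `-`, where `-` marks a gap (uncovered site), and all sites are assumed to be heterozygous and biallelic. The heuristic alternates between splitting the fragments into the two haplotype classes and recomputing the haplotype by majority vote. It is an illustration, not the group's published algorithm, and all names are hypothetical.

```python
GAP = "-"


def mismatches(fragment: str, hap: str) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(f != h for f, h in zip(fragment, hap) if f != GAP)


def complement(hap: str) -> str:
    """The other haplotype, assuming every site is heterozygous."""
    return "".join("1" if a == "0" else "0" for a in hap)


def mec_score(fragments: list[str], hap: str) -> int:
    """Minimum error correction score: errors if each fragment is assigned to
    whichever of hap or its complement it matches better."""
    comp = complement(hap)
    return sum(min(mismatches(f, hap), mismatches(f, comp)) for f in fragments)


def reconstruct_haplotypes(fragments: list[str], max_iter: int = 20) -> tuple[str, str]:
    """Alternate fragment partitioning and per-site majority voting."""
    n = len(fragments[0])
    hap = fragments[0].replace(GAP, "0")    # seed from the first fragment
    for _ in range(max_iter):
        comp = complement(hap)
        votes = [0] * n                     # > 0 favours '1' on hap, < 0 favours '0'
        for f in fragments:
            # Fragments closer to the complement vote with flipped alleles.
            flip = mismatches(f, comp) < mismatches(f, hap)
            for j, a in enumerate(f):
                if a == GAP:                # gaps carry no information
                    continue
                allele_is_one = (a == "1") != flip
                votes[j] += 1 if allele_is_one else -1
        new_hap = "".join("1" if v > 0 else "0" if v < 0 else hap[j]
                          for j, v in enumerate(votes))
        if new_hap == hap:                  # converged
            break
        hap = new_hap
    return hap, complement(hap)
```

Gapped fragments simply abstain at uncovered sites, and a single read error is outvoted when other fragments cover the same site. With low coverage there are fewer voters, which is where more careful heuristics become necessary.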

Analysis of Metabolic Networks
Metabolic networks have recently become a focus of activity in bioinformatics as a formalism that concisely encodes accumulated knowledge about basic biological functions. Moreover, metabolic networks represent a new searchable knowledge base for detecting long-range interactions and emerging patterns.
 
We plan to adapt recent techniques for detecting dense subgraphs in the web graph to metabolic network data.
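One well-known dense-subgraph technique from the graph-mining literature is greedy peeling (Charikar's 2-approximation for the densest subgraph). The method repeatedly deletes a minimum-degree vertex and keeps the intermediate vertex set with the highest density |E|/|V|. The sketch below shows it only as an example of this family of methods. It is not necessarily the technique the group will adapt. Modelling a metabolic network as an undirected graph given by an edge list is also an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable


def densest_subgraph(edges: list[tuple[Hashable, Hashable]]) -> tuple[set, float]:
    """Greedy peeling: remove a minimum-degree vertex at each step and return
    the vertex set of maximum density |E| / |V| seen along the way."""
    adj: dict = defaultdict(set)
    for u, v in edges:
        if u != v:                          # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    alive = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2   # current edge count
    best_set = set(alive)
    best_density = m / len(alive) if alive else 0.0
    while alive:
        # O(n) scan for the minimum degree; a bucket queue would scale better.
        u = min(alive, key=lambda x: len(adj[x]))
        m -= len(adj[u])
        for w in adj[u]:
            adj[w].discard(u)
        adj[u].clear()
        alive.remove(u)
        if alive and m / len(alive) > best_density:
            best_set, best_density = set(alive), m / len(alive)
    return best_set, best_density
```

On a 4-clique with a two-edge tail attached, the peeling first strips the tail and then reports the clique with density 1.5. For web-scale or genome-scale graphs, the linear scan would be replaced by a bucket queue over degrees.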

Diseases and Gene Expression Profiling Classification
Classification is a supervised approach to intelligent data analysis. One of the goals of supervised expression data analysis is to build classifiers that assign a given expression profile to one of a set of predefined classes, for potential use in diagnostics.
 
We plan to further improve a new and promising classification technique, which has the potential for high scalability and high accuracy.
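To show what "assigning a predefined class to an expression profile" means in practice, here is a minimal nearest-centroid classifier. It is a deliberately simple, well-known baseline. It is not the new classification technique mentioned above, which this page does not describe. The function names and class labels in the example are hypothetical, and profiles are assumed to be equal-length numeric vectors.

```python
import math
from collections import defaultdict


def train_centroids(profiles: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Average the training profiles of each class into one centroid."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for x, y in zip(profiles, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids: dict[str, list[float]], profile: list[float]) -> str:
    """Assign the profile to the class with the closest centroid (Euclidean)."""
    return min(centroids, key=lambda y: math.dist(profile, centroids[y]))
```

For example, after training on two "tumor" and two "normal" profiles, `predict` labels a new profile according to which class average it lies closest to. Training is a single pass over the data and prediction costs one distance per class, which illustrates the kind of scalability a diagnostic classifier needs.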
 
Styled and maintained by:
M. Elena Renda