Sequence analysis
Over the past decade we have gained considerable experience in the analysis of nucleotide sequences. Currently, most of our effort is devoted to analyzing the next-generation sequencing (NGS) data produced by the AMC sequencing facility (Roche 454 and ABI SOLiD). We are involved in a wide range of projects, many of which require dedicated methods for analyzing the data. Examples include the determination of T-cell variation, virus discovery (metagenomics), identification of alternative splicing and identification of HIV microRNAs.

We have developed several methods and processing pipelines for (pre-)processing the data:
  • Regrouping (demultiplexing) of sequences based on one or more MIDs (multiplex identifiers)/barcodes, primers or other sequence tags
  • Generating basic summaries of the experiments and counting features, such as exons or genes, per sample
  • Identification of sequences by mapping them to reference sequences with BLAT or BLAST (e.g. against the human genome, multiple species or a custom-made reference database). This generally provides the basis for downstream analyses such as gene structure determination, virus discovery and identification of alternative splicing products.
  • Identification of patterns in the sequences (e.g. MIDs/barcodes, restriction sites, GC content)
  • Sequence assembly of shotgun and/or paired-end reads with the Roche (Newbler) or Celera (CABOG) assembler
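The barcode-based regrouping step can be sketched as follows. This is a minimal Python illustration, not our actual pipeline: it assumes the reads are already available as (id, sequence) pairs, that each barcode sits at the very 5' end of the read, and that the sample names and barcode sequences are hypothetical. Reads that match no barcode, or that match two barcodes equally well, are set aside as unassigned.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def demultiplex(reads: Iterable[Read], barcodes: Dict[str, str],
                max_mismatches: int = 0) -> Dict[str, List[Read]]:
    """Assign each read to a sample by the barcode at its 5' end.

    Assumption: barcodes are at position 0 of the read. The matched barcode
    is trimmed off; unmatched or ambiguous reads go to 'unassigned' untrimmed.
    """
    groups: Dict[str, List[Read]] = defaultdict(list)
    for read_id, seq in reads:
        hits = []
        for sample, bc in barcodes.items():
            prefix = seq[:len(bc)]
            if len(prefix) < len(bc):
                continue  # read shorter than the barcode
            # Hamming distance between read prefix and barcode
            mismatches = sum(a != b for a, b in zip(prefix, bc))
            if mismatches <= max_mismatches:
                hits.append((mismatches, sample, len(bc)))
        if not hits:
            groups["unassigned"].append((read_id, seq))
            continue
        hits.sort()  # fewest mismatches first
        if len(hits) > 1 and hits[0][0] == hits[1][0]:
            groups["unassigned"].append((read_id, seq))  # ambiguous tie
            continue
        _, sample, bc_len = hits[0]
        groups[sample].append((read_id, seq[bc_len:]))
    return dict(groups)
```

Allowing one mismatch (max_mismatches=1) rescues reads with a sequencing error in the tag, at the cost of more ambiguous assignments when the barcodes in a run are similar to each other.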
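Counting features such as exons or genes per sample comes down to an interval-overlap test between mapped reads and annotated features. The sketch below assumes that mapped reads and exons are given as half-open [start, end) intervals in the hypothetical tuple layouts shown; a read that overlaps several exons of one gene is counted once for that gene. For genome-scale data an interval tree or a sorted sweep would replace the linear scan used here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input layouts; coordinates are half-open [start, end).
Exon = Tuple[str, str, str, int, int]  # (gene, exon_id, chrom, start, end)
Hit = Tuple[str, str, int, int]        # (sample, chrom, start, end)


def count_features(hits: Iterable[Hit], exons: List[Exon]):
    """Return per-sample exon counts and per-sample gene counts."""
    exon_counts: Dict[str, Counter] = defaultdict(Counter)
    gene_counts: Dict[str, Counter] = defaultdict(Counter)
    by_chrom = defaultdict(list)  # index exons by reference sequence
    for exon in exons:
        by_chrom[exon[2]].append(exon)
    for sample, chrom, start, end in hits:
        genes_hit = set()
        for gene, exon_id, _, e_start, e_end in by_chrom.get(chrom, []):
            if start < e_end and e_start < end:  # half-open overlap test
                exon_counts[sample][exon_id] += 1
                genes_hit.add(gene)
        for gene in genes_hit:  # one count per gene, even if several exons overlap
            gene_counts[sample][gene] += 1
    return exon_counts, gene_counts
```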
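BLAT writes its alignments in the tab-separated PSL format (21 columns), and a common first step after mapping is to keep only the best-scoring hit per read. The sketch below reads PSL lines, scores each hit in the way UCSC's pslScore does for nucleotide alignments (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert), and applies identity and query-coverage cut-offs. The identity measure here ignores gaps, and the threshold values are illustrative assumptions rather than recommended settings.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class PslHit(NamedTuple):
    q_name: str
    t_name: str
    strand: str
    score: int
    identity: float    # (matches + repMatches) / aligned bases, gaps ignored
    q_coverage: float  # fraction of the query covered by the alignment


def parse_psl(lines: Iterable[str]) -> Iterator[PslHit]:
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip psLayout header lines and blank lines
        matches, mismatches, rep = int(f[0]), int(f[1]), int(f[2])
        q_inserts, t_inserts = int(f[4]), int(f[6])
        q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
        score = matches + rep // 2 - mismatches - q_inserts - t_inserts
        aligned = matches + mismatches + rep
        identity = (matches + rep) / aligned if aligned else 0.0
        yield PslHit(f[9], f[13], f[8], score, identity, (q_end - q_start) / q_size)


def best_hits(hits: Iterable[PslHit], min_identity: float = 0.95,
              min_coverage: float = 0.8) -> Dict[str, PslHit]:
    """Keep the highest-scoring hit per query that passes both filters."""
    best: Dict[str, PslHit] = {}
    for h in hits:
        if h.identity < min_identity or h.q_coverage < min_coverage:
            continue
        if h.q_name not in best or h.score > best[h.q_name].score:
            best[h.q_name] = h
    return best
```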
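Simple sequence patterns such as GC content and restriction sites can be computed directly from the reads. In this sketch GC content is taken over unambiguous bases only (N's are ignored), and a motif is searched on both strands by also scanning for its reverse complement; overlapping occurrences are reported via a regular-expression lookahead. The motifs used in the example are only for illustration.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def gc_content(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases; ambiguous bases are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """0-based forward-strand start positions of a motif on both strands.

    A '-' hit means the reverse complement of the motif starts at that
    position on the forward strand. Overlapping hits are included.
    """
    s, m = seq.upper(), motif.upper()
    hits = [(mt.start(), "+") for mt in re.finditer(f"(?={re.escape(m)})", s)]
    rc = reverse_complement(m)
    if rc != m:  # palindromic sites (e.g. EcoRI GAATTC) would otherwise be reported twice
        hits += [(mt.start(), "-") for mt in re.finditer(f"(?={re.escape(rc)})", s)]
    return sorted(hits)
```

For example, find_motif(read, "GAATTC") reports EcoRI sites, which are palindromic and therefore listed once on the '+' strand.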

For projects that require substantial computing time, we make use of Grid computing.
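Work is typically distributed over a Grid by splitting the input into independent chunks, one per job. The sketch below shows only that splitting step, under the assumption that the input is FASTA and that run time scales roughly with the number of bases; it balances total sequence length across jobs with a greedy largest-first assignment. Writing the chunks to files and submitting them depends on the Grid middleware in use and is not shown.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_for_jobs(records: Iterable[Record], n_jobs: int) -> List[List[Record]]:
    """Greedy balancing: longest sequences first, each to the lightest job so far."""
    jobs: List[List[Record]] = [[] for _ in range(n_jobs)]
    heap = [(0, i) for i in range(n_jobs)]  # (total bases, job index)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(heap)
        jobs[i].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), i))
    return jobs


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA with fixed-width sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```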

