Tag Archives: perl

Filtering Unmapped .sam reads

GenOO’s power shows in how easily it handles alignment files such as .sam files.

For example, say we need to filter all unaligned reads within a certain length range and output their sequences in FASTA format.

This simple script filters and prints unmapped reads, with --min and --max options to control the length range of the reads. It can easily be changed to run other queries.
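To make the filtering logic explicit, here is a minimal sketch in plain Perl. It deliberately does not use GenOO and is not the script from this post: it parses the SAM columns by hand, treats a read as unmapped when flag bit 0x4 is set, and the function name unmapped_to_fasta is made up for illustration. Only the --min and --max option names come from the description above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;

# Write unmapped SAM reads whose length lies in [$min, $max] as FASTA.
# $in and $out are filehandles; $max may be undef for "no upper limit".
# Returns the number of reads written.
sub unmapped_to_fasta {
    my ($in, $out, $min, $max) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        next if $line =~ /^@/;                    # skip SAM header lines
        chomp $line;
        my ($qname, $flag, @rest) = split /\t/, $line;
        next unless defined $flag && @rest >= 8;  # need columns up to SEQ
        next unless $flag & 0x4;                  # bit 0x4: read is unmapped
        my $seq = $rest[7];                       # SEQ is the 10th SAM column
        next if $seq eq '*';                      # sequence not stored
        my $len = length $seq;
        next if $len < $min;
        next if defined $max && $len > $max;
        print {$out} ">$qname\n$seq\n";
        $count++;
    }
    return $count;
}

my ($min, $max) = (0, undef);
GetOptions('min=i' => \$min, 'max=i' => \$max)
    or die "Invalid options\n";

if (my $file = shift @ARGV) {
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    unmapped_to_fasta($in, \*STDOUT, $min, $max);
    close $in;
}
else {
    warn "Usage: $0 [--min N] [--max N] reads.sam > reads.fasta\n";
}
```

Keeping the filter in its own function means the conditions (unmapped, minimum length, maximum length) can be swapped for other queries without touching the option handling.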

Input for perl scripts

Introduction

Reading input for Perl scripts with <>, shift or @ARGV is very limiting. For example, you may be restricted to a single input stream, need the inputs in a specific order, or need every input to be present. Writing documentation and error messages becomes a burden.

A very useful Perl library that I use extensively is Getopt::Long::Descriptive, a simple yet powerful tool for handling script input that provides error messages, input validation, help messages and other useful features. Its manual page (link) explains its usage in detail, so I will just give a couple of easy examples to make some use cases clear.
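As a rough illustration of what an option specification can look like, here is a small sketch. The option names (input, min, max, verbose) and the parse_options wrapper are made up for this example. The wrapper exists only so the parser can be called with an explicit argument list; a real script would usually call describe_options directly and let it read @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long::Descriptive;

# Hypothetical wrapper: parse the given argument list instead of the real @ARGV.
sub parse_options {
    local @ARGV = @_;
    my ($opt, $usage) = describe_options(
        '%c %o',
        [ 'input|i=s', 'input file to read',           { required => 1 } ],
        [ 'min=i',     'minimum read length',          { default  => 0 } ],
        [ 'max=i',     'maximum read length' ],
        [],
        [ 'verbose|v', 'print extra information' ],
        [ 'help|h',    'print usage message and exit', { shortcircuit => 1 } ],
    );
    return ($opt, $usage);
}

# Example call with a fixed argument list; pass @ARGV in a real script.
my ($opt, $usage) = parse_options('--input', 'reads.sam', '--min', 20);

# With --help, print the generated usage text and stop.
print($usage->text), exit if $opt->help;

printf "input=%s min=%d\n", $opt->input, $opt->min;
```

Each option becomes an accessor on the returned object, and the same specification drives the generated help text, so documentation and parsing cannot drift apart.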


CLIP Seq Tools

GitHub link
CPAN link

Summary

CLIPSeqTools is a collection of command line applications used for the analysis of CLIP-Seq (UV cross-linking and immunoprecipitation with high-throughput sequencing) data. It offers a wide range of analyses (e.g. genome read coverage, motif enrichment, relative positioning of reads of two libraries, etc.). The toolbox is primarily aimed at bioinformaticians, but the commands are simple enough for non-experts to use.

GenOO

GenOO: A Modern Perl Framework for High Throughput Sequencing analysis

GitHub link
Full Text Publication (bioRxiv)

Summary

GenOO [jee-noo] is an open-source, object-oriented Perl framework specifically developed for the design of High Throughput Sequencing (HTS) analysis tools. The primary aim of GenOO is to make simple HTS analyses easy and complicated analyses possible. GenOO models biological entities as Perl objects and provides relevant attributes and methods for manipulating high throughput sequencing data. Using GenOO as a core development module reduces the overhead and complexity of managing the data and the biological entities at hand. GenOO has been designed to be flexible and easily extensible, with a modular structure and minimal requirements for external tools and libraries.

Focus

  • Organize biological entities as Perl objects (genomic regions, genes, transcripts, introns/exons, etc.)
  • Organize sequencing entities as Perl objects/attributes (sequencing reads, alignments, etc.)
  • Make I/O from widely used file formats easy (SAM, BED, FASTA, FASTQ)
  • Be consistent and easily extendable

We want to keep this framework focused on the real issues found in sequencing analyses and balance being easily extendable with being focused and efficient.
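To give a feel for what "biological entities as Perl objects" means in practice, here is a toy sketch in plain Perl. The My::Region class, its attributes and its methods are hypothetical and written only for this illustration. They are not GenOO's actual classes or API.

```perl
use strict;
use warnings;

package My::Region;    # hypothetical class, not part of GenOO

sub new {
    my ($class, %args) = @_;
    for my $key (qw(chromosome strand start stop)) {
        die "missing '$key'\n" unless defined $args{$key};
    }
    die "start must not exceed stop\n" if $args{start} > $args{stop};
    return bless { %args }, $class;
}

# Read-only accessors
sub chromosome { $_[0]{chromosome} }
sub strand     { $_[0]{strand} }
sub start      { $_[0]{start} }
sub stop       { $_[0]{stop} }

# Number of positions in the closed interval [start, stop]
sub size { $_[0]{stop} - $_[0]{start} + 1 }

# True when both regions share chromosome and strand and their intervals intersect
sub overlaps {
    my ($self, $other) = @_;
    return 0 if $self->chromosome ne $other->chromosome;
    return 0 if $self->strand ne $other->strand;
    return $self->start <= $other->stop && $other->start <= $self->stop ? 1 : 0;
}

package main;

my $exon = My::Region->new(chromosome => 'chr1', strand => '+', start => 100, stop => 200);
my $read = My::Region->new(chromosome => 'chr1', strand => '+', start => 190, stop => 225);

printf "read size %d, overlaps exon: %s\n",
    $read->size, $read->overlaps($exon) ? 'yes' : 'no';
```

Once reads and exons are both objects that answer the same questions (where am I, how long am I, do I overlap this other thing), analysis code can be written against those methods rather than against raw file columns.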