Category Archives: GenOO

Filtering Unmapped .sam reads

GenOO’s power shows in the ease with which it handles alignment files such as .sam files.

For example, say we need to select all unaligned reads within a certain length range and write their sequences to a .fasta file.

This simple script filters and prints unmapped reads; the --min and --max options control the accepted length range of the .sam reads. It can easily be adapted to other queries.
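As a rough illustration of the filtering logic only, here is a minimal sketch in plain Perl that does not use GenOO at all. It assumes the standard tab-separated .sam layout (read name in column 1, flag in column 2, sequence in column 10, with flag bit 0x4 marking an unmapped read). The function name unmapped_read_as_fasta and the default length bounds are made up for this example and are not part of GenOO.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;

# Inclusive length range for the read sequence (defaults are arbitrary).
my ($min, $max) = (0, 1_000_000);
GetOptions('min=i' => \$min, 'max=i' => \$max)
    or die "Usage: $0 [--min N] [--max N] < reads.sam > unmapped.fasta\n";

# Hypothetical helper: returns a FASTA entry for an unmapped read whose
# length is inside the range, or undef for header lines, mapped reads,
# reads without a stored sequence and reads outside the range.
sub unmapped_read_as_fasta {
    my ($line, $min_len, $max_len) = @_;
    return if $line =~ /^\@/;            # SAM header line
    chomp $line;
    my @fields = split /\t/, $line;
    return if @fields < 11;              # not a complete alignment line
    my ($qname, $flag, $seq) = @fields[0, 1, 9];
    return unless $flag & 0x4;           # bit 0x4 set means "unmapped"
    return if $seq eq '*';               # sequence not stored in the file
    my $len = length $seq;
    return if $len < $min_len or $len > $max_len;
    return ">$qname\n$seq\n";
}

# Read the .sam file from standard input, write FASTA to standard output.
while (my $line = <STDIN>) {
    my $fasta = unmapped_read_as_fasta($line, $min, $max);
    print $fasta if defined $fasta;
}
```

Saved under a name such as filter_unmapped.pl (a hypothetical file name), it could be run as: perl filter_unmapped.pl --min 18 --max 30 < reads.sam > unmapped.fasta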

Opening a Transcript .gtf file with GenOO

Here I will show you how easy it is to parse a .gtf file containing transcript info, using the GenOO TranscriptCollection library. After installing GenOO you can easily open a .gtf file in Perl using the read_collection method.

If you are not sure how the input options work, see here.

Our .gtf file (passed in as $opt->gtf) is parsed into a transcript collection object by the TranscriptCollection Factory. A Factory is simply an object that creates and returns another new object. In this instance, we give it the .gtf file and the ‘GTF’ option (so that the Factory knows which type of parser to use) and we get back $transcript_collection, an object of type TranscriptCollection.
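To make the Factory idea concrete, here is a small sketch of the pattern in plain Perl. It is not GenOO's code: the packages Sketch::Parser::GTF and Sketch::ParserFactory are hypothetical, and the toy read_collection below returns only a list of transcript IDs rather than a real TranscriptCollection. It assumes the usual .gtf layout, with the attributes in the ninth tab-separated column.

```perl
use strict;
use warnings;

# Hypothetical parser class; stands in for a real GTF reader.
package Sketch::Parser::GTF;

sub new {
    my ($class, %args) = @_;
    return bless { file => $args{file} }, $class;
}

# Collect the distinct transcript_id values, in order of first appearance.
sub read_collection {
    my ($self) = @_;
    open(my $fh, '<', $self->{file}) or die "Cannot open $self->{file}: $!";
    my (%seen, @ids);
    while (my $line = <$fh>) {
        next if $line =~ /^#/;           # comment line
        my @fields = split /\t/, $line;
        next if @fields < 9;             # attribute column is missing
        if ($fields[8] =~ /transcript_id "([^"]+)"/) {
            push @ids, $1 unless $seen{$1}++;
        }
    }
    close $fh;
    return \@ids;
}

# Hypothetical factory: maps a type string to the class that can parse it.
package Sketch::ParserFactory;

my %parser_class_for = (GTF => 'Sketch::Parser::GTF');

sub create {
    my ($class, $type, $args) = @_;
    my $parser_class = $parser_class_for{$type}
        or die "Unknown parser type: $type";
    return $parser_class->new(%{$args});
}

package main;

# Only runs when a .gtf path is given on the command line.
if (@ARGV) {
    my $ids = Sketch::ParserFactory->create('GTF', { file => $ARGV[0] })->read_collection;
    print "$_\n" for @{$ids};
}
```

The caller names only the file type; the factory decides which class to instantiate, so supporting another format means adding one entry to the lookup table.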

We can use this TranscriptCollection to perform various actions. The simplest one is looping through all members of the collection (i.e. all transcripts) and performing an operation on each of them. There are two ways to do this.

Simple loop

All the methods that can be used on a Transcript object (such as $transcript) can be found here.

foreach_record_do

The method “foreach_record_do” performs an action on every record of the TranscriptCollection without an explicit loop in the calling code.
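The difference between the two styles can be sketched with a toy collection class in plain Perl. This is not GenOO's implementation: Sketch::Collection and its all_records method are hypothetical, the records are plain strings rather than Transcript objects, and the callback method is merely given the same name as the GenOO method discussed here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a collection class.
package Sketch::Collection;

sub new {
    my ($class, @records) = @_;
    return bless { records => [@records] }, $class;
}

# Return all stored records as a list.
sub all_records {
    my ($self) = @_;
    return @{ $self->{records} };
}

# Call the supplied code reference once per record, in stored order.
sub foreach_record_do {
    my ($self, $block) = @_;
    $block->($_) for @{ $self->{records} };
    return;
}

package main;

my $collection = Sketch::Collection->new('tr1', 'tr2', 'tr3');

# Style 1: an explicit loop over the records.
my @from_loop;
foreach my $record ($collection->all_records) {
    push @from_loop, uc $record;
}

# Style 2: hand the per-record action to the collection as an anonymous sub.
my @from_callback;
$collection->foreach_record_do(sub {
    my ($record) = @_;
    push @from_callback, uc $record;
});

print "@from_callback\n";
```

Both styles visit the same records in the same order; with the callback, the iteration details stay inside the collection and the caller supplies only the action.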

 

In this way we can reduce clutter in our code and perform easy operations on TranscriptCollections.