Quick FASTQ File Parsing Via Memory Mapping In C/C++


I recently needed to quickly parse 8 GiB+ .fastq text files to calculate a simple statistic over genomic data. My initial “pfastqcount” implementation in Ruby worked, but with many files to process it took longer than I had hoped and consumed an alarming amount of CPU. I ended up reimplementing pfastqcount as a command-line program in C that takes one or more .fastq files, memory-maps each one, and computes the statistic. Simply moving the algorithm from an interpreted language down to plain C significantly sped up processing and reduced CPU usage. If you are a bioinformatician who needs to implement a FASTQ processing algorithm in C, I encourage you to fork the project and use it as a template. The project is Apache 2.0 licensed for your convenience and publicly available on GitHub.
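The post doesn't reproduce the pfastqcount source or name its statistic, so the following is only a minimal sketch of the general technique, not the project's code. It assumes POSIX `mmap`, a 64-bit address space (an 8 GiB file won't fit in a 32-bit mapping), LF line endings, and the common four-line FASTQ layout (`@header`, sequence, `+` separator, quality) with no wrapped sequence lines. The names `fastq_stats`, `fastq_scan`, and `fastq_scan_file` are hypothetical, as are the two example statistics (record count and total bases).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example statistics; not necessarily what pfastqcount computes. */
typedef struct {
    size_t records; /* number of four-line FASTQ records */
    size_t bases;   /* total length of all sequence lines */
} fastq_stats;

/* Scan a buffer of FASTQ text (assumed four-line records, LF endings).
   Returns 0 on success, -1 on a truncated or malformed record. */
int fastq_scan(const char *buf, size_t len, fastq_stats *out)
{
    size_t pos = 0;
    out->records = 0;
    out->bases = 0;
    while (pos < len) {
        const char *line[4];
        size_t line_len[4];
        for (int i = 0; i < 4; i++) {
            if (pos >= len)
                return -1; /* record ends early */
            const char *start = buf + pos;
            const char *nl = memchr(start, '\n', len - pos);
            size_t n = nl ? (size_t)(nl - start) : len - pos;
            line[i] = start;
            line_len[i] = n;
            pos += n + (nl ? 1 : 0); /* step past the newline, if any */
        }
        /* Header must start with '@', separator with '+', and the
           quality string must be as long as the sequence. */
        if (line_len[0] == 0 || line[0][0] != '@' ||
            line_len[2] == 0 || line[2][0] != '+' ||
            line_len[3] != line_len[1])
            return -1;
        out->records++;
        out->bases += line_len[1];
    }
    return 0;
}

/* Memory-map a whole file read-only and scan it in place. */
int fastq_scan_file(const char *path, fastq_stats *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    if (size == 0) { /* mmap rejects zero-length mappings */
        close(fd);
        out->records = 0;
        out->bases = 0;
        return 0;
    }
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after closing the descriptor */
    if (map == MAP_FAILED)
        return -1;
    /* Advisory only: tell the kernel we will read front to back. */
    posix_madvise(map, size, POSIX_MADV_SEQUENTIAL);
    int rc = fastq_scan((const char *)map, size, out);
    munmap(map, size);
    return rc;
}
```

A `main` that loops over `argv[1]` through `argv[argc - 1]`, calls `fastq_scan_file` on each path, and prints the results would match the one-or-more-files command-line shape described above. The sketch counts lines rather than searching for `@`, because `@` is also a legal character at the start of a quality line. Parsing the mapping in place avoids copying the file through `read` buffers, and the kernel pages data in on demand.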

Tags: bioinformatics, c, development, fasta, fastq, git, parser, programming, project, ruby

Comments

2 responses to “Quick FASTQ File Parsing Via Memory Mapping In C/C++”

2011.11.21

Allen Watts

Hi Preston, I was a student of yours in CST200 in fall ’10. I was wondering if (this has nothing to do with your post here) you might have any references to OMR Java libraries? I’ve since graduated and work for a software development company. I’m researching a potential project that will involve reading a “play slip”, and I’d like to assemble the job in Java. So far I’ve found zilch where OMR libraries are concerned. Any ideas? (It’s looking like C# will be the viable alternative here.)

2011.11.21

preston.lee

Hi Allen,

Good to hear you’re doing the development thing! If you’re looking for object-relational mapping (ORM) software, check out this list:

http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software

Preston
