1. Introduction

The proBAM format, defined by the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI), was designed to facilitate data comparison, exchange and verification. In particular, proBAM was developed to allow identified peptides/PSMs to be mapped to a genome. The format is based on the widely used SAM (Sequence Alignment/Map) and BAM (binary SAM) formats, which were designed to encode alignments of sequencing reads to a genome. Because proBAM preserves the BAM file architecture, most BAM-compatible software should also work with proBAM files. In addition to proBAM, proBAMconvert can generate the similar but more compact proBED format, as well as peptide-based proBAM files.
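Because proBAM keeps the SAM/BAM layout, every record is one tab-separated line that starts with the 11 mandatory SAM columns. The minimal sketch below only illustrates that shared layout. The helper name `psm_to_sam_line` and all field values are hypothetical, and the exact proBAM rules for filling each column (and its additional tags) are defined by the proBAM specification, not by this example.

```python
# Minimal sketch: format one SAM-style alignment line for a PSM.
# The 11 mandatory SAM columns and their order are standard SAM; the helper
# name and example values are hypothetical and do NOT reproduce the proBAM
# specification's exact encoding rules or its extra tags.

SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def psm_to_sam_line(record: dict) -> str:
    """Join the 11 mandatory SAM fields, tab-separated, in their fixed order."""
    missing = [f for f in SAM_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing SAM fields: {missing}")
    return "\t".join(str(record[f]) for f in SAM_FIELDS)


if __name__ == "__main__":
    example = {
        "QNAME": "spectrum_0001",  # hypothetical spectrum/PSM identifier
        "FLAG": 0,                 # mapped, forward strand
        "RNAME": "chr1",           # reference sequence (chromosome) name
        "POS": 1000,               # 1-based leftmost genomic coordinate
        "MAPQ": 255,               # 255 = mapping quality unavailable in SAM
        "CIGAR": "30M",            # a 10-residue peptide spans 30 nucleotides
        "RNEXT": "*",
        "PNEXT": 0,
        "TLEN": 0,
        "SEQ": "*",
        "QUAL": "*",
    }
    print(psm_to_sam_line(example))
```

Because the column order matches plain SAM, generic SAM/BAM readers can parse such lines even if they ignore proBAM-specific meaning.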

proBAMconvert in a nutshell

proBAMconvert reads common peptide identification files and attempts to extract all relevant attributes from them, including comments and enzyme information, and converts these attributes to comply with proBAM/proBED. Next, the protein identifiers for every PSM are extracted from the peptide identification file. Because different software tools tend to follow their own rules for encoding protein identifiers, proBAMconvert is designed to handle a wide range of encodings. A crucial prerequisite is that the protein identifiers are among those that proBAMconvert can recognize (see chapter 4). Once the protein identifiers have been retrieved, genomic information (transcript sequence, exon information, genomic coordinates, etc.) is extracted from Ensembl. Using this genomic information, the peptides are then mapped onto the corresponding sequence reconstructed from Ensembl. Finally, the proBAM/proBED file is generated by combining the genomic information from Ensembl with the PSM information from the peptide identification file. proBAMconvert has various options that allow users to tailor the proBAM/proBED output to a specific research question.
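The mapping step described above can be illustrated with a minimal sketch. It assumes the coding segments (CDS parts of the exons) of one transcript are already known, listed in transcript 5'->3' order with 1-based inclusive genomic coordinates, and that the peptide occurs in the translated protein. The functions `peptide_to_genomic_blocks` and `blocks_to_cigar` are hypothetical illustrations, not proBAMconvert's actual code, and no Ensembl query is made here.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 1-based, inclusive genomic coordinates


def peptide_to_genomic_blocks(peptide: str, protein: str,
                              cds_segments: List[Block], strand: str) -> List[Block]:
    """Map a peptide onto the genome through the coding segments of its transcript.

    Hypothetical sketch: cds_segments must be in transcript (5'->3') order.
    Returns the covered genomic blocks sorted by ascending start coordinate.
    """
    aa_index = protein.find(peptide)   # first occurrence only
    if aa_index < 0:
        raise ValueError("peptide not found in protein sequence")
    offset = 3 * aa_index              # 0-based nucleotide offset into the CDS
    nt_remaining = 3 * len(peptide)    # one codon per residue

    blocks: List[Block] = []
    for seg_start, seg_end in cds_segments:
        seg_len = seg_end - seg_start + 1
        if offset >= seg_len:          # peptide starts beyond this segment
            offset -= seg_len
            continue
        take = min(seg_len - offset, nt_remaining)
        if strand == "+":
            blocks.append((seg_start + offset, seg_start + offset + take - 1))
        else:  # minus strand: transcript order runs from high to low coordinates
            blocks.append((seg_end - offset - take + 1, seg_end - offset))
        nt_remaining -= take
        offset = 0
        if nt_remaining == 0:
            break
    if nt_remaining:
        raise ValueError("coding segments too short for this peptide")
    return sorted(blocks)


def blocks_to_cigar(blocks: List[Block]) -> str:
    """Encode sorted genomic blocks as a CIGAR string (M = aligned, N = intron)."""
    parts = []
    for i, (start, end) in enumerate(blocks):
        if i > 0:
            gap = start - blocks[i - 1][1] - 1
            if gap > 0:
                parts.append(f"{gap}N")
        parts.append(f"{end - start + 1}M")
    return "".join(parts)


if __name__ == "__main__":
    # Toy transcript: two coding segments separated by an intron.
    blocks = peptide_to_genomic_blocks("LVP", "MKLVPEP", [(100, 108), (200, 229)], "+")
    print(blocks, blocks_to_cigar(blocks))
```

A peptide that spans an exon-exon junction yields several blocks, which the CIGAR string records as aligned runs separated by `N` (skipped intron) operations. A real converter would also have to handle peptides that match the protein at more than one position, which this sketch ignores.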

