5. Accepted Formats

This section describes the file formats accepted by proBAMconvert. Although proBAMconvert can parse pepXML, mzIdentML and mzTab files as defined by the HUPO Proteomics Standards Initiative, the software packages that generate these files may vary slightly in their encodings, which can affect the parsing routines. Please note that proBAMconvert is still under development; if you encounter trouble converting a specific file, do not hesitate to contact us.

file formats

The table below specifies the accepted file formats. The preferred format for proBAMconvert is mzTab; however, proBAMconvert should also work with mzIdentML and pepXML.



name              version     description
mzTab             1.0.0       mzTab files can contain protein, peptide, and small molecule identifications together with basic quantitative information. mzTab is not intended to store an experiment’s complete data / evidence but only its final reported results. This format is also intended to provide local LIMS systems as well as MS proteomics repositories a simple way to share and combine basic information. (source: http://www.psidev.info/mztab#mzTab_1_0)
mzIdentML         1.1.1       A large number of different proteomics search engines are available that produce output in a variety of different formats. It is intended that mzIdentML will provide a common format for the export of identification results from any search engine. The format was originally developed under the name AnalysisXML as a format for several types of computational analyses performed over mass spectra in the proteomics context. It has been decided to split development into two formats: mzIdentML for peptide and protein identification (described here) and mzQuantML. (source: http://www.psidev.info/mzidentml)
pepXML            1.18        pepXML is an open data format developed at the SPC/Institute for Systems Biology for the storage, exchange, and processing of peptide sequence assignments of MS/MS scans. pepXML is intended to provide a common data output format for many different MS/MS search engines and subsequent peptide-level analyses. Several search engines already have native support for outputting pepXML and converters are available to transform output files to pepXML. (source: http://tools.proteomecenter.org/wiki/index.php?title=Formats:pepXML)


protein identifiers

Depending on the software package used to generate the protein/peptide identification file, the protein identifier encodings may differ. The supported encodings are specified below; the list of compatible encodings will be expanded in future releases.

compatible protein identifier formats

The general rule is that the protein ID should be separated from all other information by "|" or "_". proBAMconvert parses these fields while searching for the protein ID. Furthermore, decoy tags should be attached directly to the protein ID, separated by an underscore. The protein identifier formats recognized by proBAMconvert are given below.

Legend:
[protein_ID]: the protein ID; a list of compatible protein IDs is provided below
[DECOY]: Decoy annotation (see below)

Supported formats:


  ..|..|...|[protein_ID]_[DECOY]|...|..|..

  .._.._..._[protein_ID]_[DECOY]_..._.._..
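
As an illustration of the rule above, the following Python sketch splits an identifier string on "|" and "_" and returns the first field that looks like a protein ID. It is not proBAMconvert's actual parsing code: the function names and the Ensembl-style ID pattern are assumptions made for this example only.

```python
import re
from typing import Callable, List, Optional

# Assumption: an Ensembl-style protein ID (e.g. ENSP00000354587) stands in
# for the list of compatible protein IDs. The pattern is illustrative only.
EXAMPLE_ID_PATTERN = re.compile(r"^ENSP\d+$")


def looks_like_protein_id(field: str) -> bool:
    """Hypothetical check: does this field match the example ID pattern?"""
    return EXAMPLE_ID_PATTERN.match(field) is not None


def split_identifier(raw: str) -> List[str]:
    """Split an identifier string on '|' and '_' and drop empty fields."""
    return [field for field in re.split(r"[|_]", raw) if field]


def find_protein_id(
    raw: str,
    is_protein_id: Callable[[str], bool] = looks_like_protein_id,
) -> Optional[str]:
    """Return the first field accepted by is_protein_id, or None."""
    for field in split_identifier(raw):
        if is_protein_id(field):
            return field
    return None


if __name__ == "__main__":
    print(find_protein_id("db|ENSP00000354587_REVERSED|description"))
    print(find_protein_id("db_ENSP00000354587_description"))
```

Passing a different predicate as is_protein_id adapts the sketch to other ID schemes without changing the splitting logic.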



decoy protein ids

proBAMconvert automatically identifies decoy proteins by tags attached to the protein ID. Compatible decoy annotations are provided below; future releases will enable the specification of custom decoy annotations.

Legend:
[protein_ID]: the protein ID; a list of compatible protein IDs is provided below

Supported decoy annotations:


  REV_[protein_ID]

  DECOY_[protein_ID]

  [protein_ID]_REVERSED
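
The three annotations listed above can be recognized with simple prefix and suffix checks. The Python sketch below shows one way to do this. It is not taken from proBAMconvert's source: the function name is hypothetical, and case-sensitive matching is an assumption.

```python
from typing import Tuple

# Decoy tags taken from the list of supported annotations above.
DECOY_PREFIXES = ("REV_", "DECOY_")
DECOY_SUFFIXES = ("_REVERSED",)


def strip_decoy_tag(identifier: str) -> Tuple[str, bool]:
    """Return (protein_id, is_decoy) for a single protein identifier.

    Assumption: tags are matched case-sensitively and at most one tag
    is removed per identifier.
    """
    for prefix in DECOY_PREFIXES:
        if identifier.startswith(prefix):
            return identifier[len(prefix):], True
    for suffix in DECOY_SUFFIXES:
        if identifier.endswith(suffix):
            return identifier[:-len(suffix)], True
    return identifier, False


if __name__ == "__main__":
    for example in ("REV_ENSP00000354587",
                    "DECOY_ENSP00000354587",
                    "ENSP00000354587_REVERSED",
                    "ENSP00000354587"):
        print(example, strip_decoy_tag(example))
```

An identifier without a recognized tag is returned unchanged and flagged as a target (non-decoy) protein.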


Supported protein annotations

proBAMconvert automatically detects which protein annotation was used.




Continue to the next chapter, "tutorial", where proBAMconvert is demonstrated with a real use case.

