4. Options

This section discusses the options and limitations of both the GUI and the command line versions of proBAMconvert. See also the proBAMconvert GitHub page.

GUI options

The table below describes the proBAMconvert GUI options from top to bottom, in the order in which they appear in the GUI (Figure 1).

Figure 1: proBAMconvert graphical user interface


option                          description
choose file                     Clicking "choose file" opens a file browser where the file to be converted can be selected (mzid, pepxml or mztab file).
working directory               Clicking "working directory" opens a folder browser where you can specify the working directory. Generated files will be stored here (default = GUI directory).
project name                    Specify the project name in the field to the right; generated files will have this name.
select species                  The dropdown to the right lets you select the species to which the mapping should be performed.
select database                 The dropdown to the right lets you select the database from which the genome annotations will be used. For now, only Ensembl is available.
select database version         The dropdown to the right lets you choose between different versions of the database.
remove duplicate psm mappings   When protein isoforms are included in the search space, a PSM can map to multiple isoforms. This can lead to confusion when visualising or analysing the data. Turning this option on (Y (yes)) prevents a PSM with multiple protein hits from being mapped to the same location more than once.
conversion mode                 The conversion mode specifies which output is generated. The options are proBAM_psm (default, proBAM format), proBAM_peptide (peptide-based proBAM), proBAM_peptide_mod (peptide-based proBAM in which peptides with different modifications are considered different peptides) and proBED. For more information, please consult the proBAM/proBED specification documents.
Advanced Settings               Opens a new window with advanced settings.
Manual                          Opens the manual page in the default browser.
allow mismatches                Mismatches can be allowed while mapping peptides to the genome. The dropdown to the right specifies the number of mismatches allowed.
convert                         Clicking this button starts the conversion to proBAM. Make sure all options are correctly selected before running proBAMconvert.
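
The effect of the allow mismatches option can be pictured with a small sketch. The Python below is not proBAMconvert's own mapping code; it assumes a plain position-by-position comparison of the peptide against every window of a protein sequence, counting substitutions only. The function name find_peptide and the example sequences are hypothetical.

```python
def find_peptide(peptide, protein, max_mismatches=0):
    """Return start positions where the peptide matches the protein
    with at most max_mismatches substitutions (illustrative only)."""
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        # Count positions where the peptide and the window differ.
        mismatches = sum(1 for a, b in zip(peptide, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


protein = "MKTAYIAKQRQISFVK"  # hypothetical protein sequence
print(find_peptide("AYIAK", protein))                    # exact match only
print(find_peptide("AYLAK", protein, max_mismatches=1))  # one substitution allowed
```

With zero mismatches allowed, only identical windows are reported; raising the limit also accepts windows that differ in a few residues, which is why a higher setting can produce additional mappings.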

Advanced options:
Figure 2: proBAMconvert advanced options

option                    description
sorting order             The BAM sorting order (how the rows should be sorted). Options: unknown, unsorted, query_name, coordinate.
decoy annotation(s)       Specifies how decoy annotations are recognized. Multiple decoy annotations should be separated by a ','. The default value is "REV_,DECOY_,_REVERSED,REVERSED_,_DECOY" and covers the decoy annotation of most software.
3-frame translation       When 3-frame translation='N', the coding sequence of the transcript is translated and used to map the corresponding peptide. When 3-frame translation='Y', the transcript is translated in all 3 frames and used to map the corresponding peptide, this may be useful to find.
annotation identifiers    Peptide identification files commonly specify the identifier of the protein in which the corresponding peptide has been identified. proBAMconvert is able to recognize and use different protein identifiers; this option determines how proBAMconvert processes these identifiers. The available options are:
                            First: the first identifier recognized in the PSM file will be used throughout the whole conversion (DEFAULT)
                            Ensembl_tr: Ensembl transcript IDs will be used as identifiers
                            Ensembl_pr: Ensembl protein IDs will be used as identifiers
                            UniProt_ACC: UniProt accessions will be used as identifiers
                            UniProt_Entry: UniProt entries will be used as identifiers
                            RefSeq: RefSeq IDs will be used as identifiers
                            all: all of the above identifiers will be used as identifiers
include_unmapped          Specifies whether unmapped PSMs should be included in the final output (only for proBAM).
add comment(s)            Adds additional comments to the proBAM/proBED file. Each new line starts a new comment line.
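
To illustrate what the 3-frame translation option refers to, the sketch below translates a nucleotide sequence in each of the three forward reading frames using the standard genetic code. It is a minimal illustration rather than proBAMconvert's implementation; the function names and the example transcript are hypothetical, and unknown codons are rendered as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA sequence codon by codon; '*' marks a stop codon."""
    seq = seq.upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        # Codons containing unexpected characters become 'X' (assumption).
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


def three_frame_translation(transcript):
    """Return the translations of reading frames 0, 1 and 2."""
    return [translate(transcript[frame:]) for frame in range(3)]


transcript = "ATGGCCATTGTAATGGGCCGCTGA"  # hypothetical transcript sequence
for frame, protein in enumerate(three_frame_translation(transcript)):
    print(frame, protein)
```

In this example only frame 0 corresponds to the annotated coding sequence; the other two frames yield different amino acid sequences against which peptides could also be matched.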

Command line options

The table below describes the proBAMconvert command line options. Each option can be given by either its full name or its abbreviation, as listed in the table.



full name                   abbreviation  required  description
--help                      -h            no        Show the help message (CLI usage and options) and exit.
--name                      -n            yes       Specify the project name; generated files will have this name.
--database                  -d            no        Specify the annotation database; currently only Ensembl is supported. DEFAULT: Ensembl
--version                   -v            no        Specify the annotation database version (e.g. 75 for Ensembl database version 75). DEFAULT: 87
--species                   -s            yes       Specify the species to which the mapping should be performed. Full scientific species names should be provided, with underscores between words. Currently supported values: homo_sapiens, mus_musculus, drosophila_melanogaster, danio_rerio.
--file                      -f            yes       Specify the location of the file to be converted.
--mismatches                -m            no        Mismatches can be allowed while mapping peptides to the genome; specify the number of mismatches allowed. DEFAULT: 0
--map_decoy                 -a            no        Map decoy PSMs to the genome (can be Y (yes) or N (no)). This can be necessary for downstream analysis. DEFAULT: N
--rm_duplicates             -r            no        When protein isoforms are included in the search space, a PSM can map to multiple isoforms. This can lead to confusion when visualising or analysing the data. Turning this option on (Y (yes)) prevents a PSM with multiple protein hits from being mapped to the same location more than once. DEFAULT: N
--directory                 -d            no        Specify the location of the working directory. DEFAULT: current directory
--conversion_mode           -X            no        The conversion mode specifies which output is generated. The options are proBAM_psm (default, proBAM format), proBAM_peptide (peptide-based proBAM), proBAM_peptide_mod (peptide-based proBAM in which peptides with different modifications are considered different peptides) and proBED. For more information, please consult the proBAM/proBED specification documents.
--include_unmapped          -U            no        Specifies whether to include unmapped PSMs in the output files. DEFAULT: N
--pre_picked_annotation     -P            no        Peptide identification files commonly specify the identifier of the protein in which the corresponding peptide has been identified. proBAMconvert is able to recognize and use different protein identifiers; this option determines how proBAMconvert processes these identifiers. DEFAULT: First. The available options are:
                                                      First: the first identifier recognized in the PSM file will be used throughout the whole conversion (DEFAULT)
                                                      Ensembl_tr: Ensembl transcript IDs will be used as identifiers
                                                      Ensembl_pr: Ensembl protein IDs will be used as identifiers
                                                      UniProt_ACC: UniProt accessions will be used as identifiers
                                                      UniProt_Entry: UniProt entries will be used as identifiers
                                                      RefSeq: RefSeq IDs will be used as identifiers
                                                      all: all of the above identifiers will be used as identifiers
--sorting_order             -O            no        The BAM sorting order (how the rows should be sorted). Options: unknown, unsorted, query_name, coordinate. DEFAULT: unknown
--decoy_annotation          -E            no        Specifies how decoy annotations are recognized. Multiple decoy annotations should be separated by a ','. The default value is "REV_,DECOY_,_REVERSED,REVERSED_,_DECOY" and covers the decoy annotation of most software.
--three_frame_translation   -T            no        When 3-frame translation='N', the coding sequence of the transcript is translated and used to map the corresponding peptide. When 3-frame translation='Y', the transcript is translated in all 3 frames and used to map the corresponding peptide, this may be useful to find. DEFAULT: N
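
As a rough sketch of how the options in the table above combine into a single call, the Python below assembles a command line from the three required options plus a few optional ones. The script name proBAM.py, the project name and the input file name are assumptions made for illustration; they are not taken from this manual, so substitute the entry point and paths of your own installation.

```python
import shlex

# Hypothetical entry point: the script name is an assumption, not from this manual.
SCRIPT = "proBAM.py"


def build_command(name, species, psm_file, **optional):
    """Assemble an argument list from the required options (--name, --species,
    --file) and any optional options given by their full name."""
    cmd = ["python", SCRIPT, "--name", name, "--species", species, "--file", psm_file]
    for option, value in optional.items():
        cmd += ["--" + option, str(value)]
    return cmd


cmd = build_command(
    "my_project",        # hypothetical project name
    "homo_sapiens",
    "results.mzid",      # hypothetical input file
    version=87,
    mismatches=1,
    rm_duplicates="Y",
    conversion_mode="proBAM_psm",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the conversion. Options that are left out fall back to the defaults listed in the table.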


Continue to the next chapter, "accepted formats", where the accepted file formats are discussed.

