Tutorial

Introduction

To familiarize users with the proBAM format and the proBAMconvert tool, a real-life use case is provided that demonstrates proBAM's functionality. The example uses multi-omics data from a mouse embryonic stem cell sample. mRNA-seq data provides an overview of the transcriptome, and RIBO-seq data provides an overview of what is being translated (the translatome). Specific drug treatment makes it possible to pinpoint translation initiation sites by stalling ribosomes at the start sites. Shotgun MS provides the proteome, and N-terminal COFRADIC allows identification of the N-terminal peptidome.

Data

RNA-seq and RIBO-seq data were obtained from the publicly available dataset GSE30839; see Ingolia et al. for a detailed description. Both were processed using PROTEOFORMER as described by Crappé et al. PROTEOFORMER generates a custom protein sequence search space based on ribosome profiling data, allowing the detection of protein isoforms (in this example, Ensembl version 74 was used as the underlying annotation). This custom protein sequence search space was used for subsequent protein/peptide identification. The shotgun and N-terminal COFRADIC mass spectrometry data of the mouse embryonic stem cells are publicly available through PRIDE (PRIDE accession: PXD000124); see Menschaert et al. for a detailed description. The proteomics data were processed using SearchGUI (an interface for configuring and running proteomics identification search engines) and PeptideShaker (a platform for interpretation of proteomic results), using the parameters described by Menschaert et al., with the sole difference that the MS-GF+ search engine was used in both cases. Results obtained from the RNA-seq and RIBO-seq data were exported as bedgraph files; only chromosome 10 was processed because of size limits. Results from the proteomics experiments were exported as mzIdentML by PeptideShaker and afterwards converted to pepXML.

Requirement

Download and install proBAMconvert as described in section 3. Then download and unzip the archive containing the bedgraph and pepXML files.

download result.zip

The table below provides an overview of the files in result.zip.

filename                     file type  description
mESC_mRNA_sense              bedgraph   Bedgraph file containing RNA-seq results (sense strand)
mESC_mRNA_antisense          bedgraph   Bedgraph file containing RNA-seq results (antisense strand)
mESC_RIBOseq_CHX_sense       bedgraph   Bedgraph file containing RIBO-seq results of translating ribosomes (treated with cycloheximide, sense strand)
mESC_RIBOseq_CHX_antisense   bedgraph   Bedgraph file containing RIBO-seq results of translating ribosomes (treated with cycloheximide, antisense strand)
mESC_RIBOseq_HARR_sense      bedgraph   Bedgraph file containing RIBO-seq results of initiating ribosomes (treated with harringtonine, sense strand)
mESC_RIBOseq_HARR_antisense  bedgraph   Bedgraph file containing RIBO-seq results of initiating ribosomes (treated with harringtonine, antisense strand)
NTermCofr                    pepXML     pepXML file containing peptide-to-spectrum matches identified by the MS-GF+ search engine from the N-terminal COFRADIC MS experiment
Shotgun                      pepXML     pepXML file containing peptide-to-spectrum matches identified by the MS-GF+ search engine from the shotgun MS experiment
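After unzipping, it can be useful to confirm that all eight files listed above are present. The following is a minimal Python 3 sketch using only the standard library. It compares file names by stem only, because the exact file extensions inside the archive are an assumption here, and the function names are illustrative rather than part of proBAMconvert.

```python
import zipfile
from pathlib import PurePosixPath

# File name stems taken from the table above; extensions are ignored on purpose.
EXPECTED_STEMS = {
    "mESC_mRNA_sense",
    "mESC_mRNA_antisense",
    "mESC_RIBOseq_CHX_sense",
    "mESC_RIBOseq_CHX_antisense",
    "mESC_RIBOseq_HARR_sense",
    "mESC_RIBOseq_HARR_antisense",
    "NTermCofr",
    "Shotgun",
}


def missing_files(names):
    """Return the expected stems that do not occur among the given file names."""
    found = {PurePosixPath(name).stem for name in names}
    return sorted(EXPECTED_STEMS - found)


def check_archive(zip_path, target_dir):
    """Unzip the archive into target_dir and report which expected files are missing."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return missing_files(archive.namelist())
```

Calling check_archive("result.zip", "result") returns an empty list when every expected file was found; the target directory name "result" is a placeholder.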

proBAMconvert

After unzipping result.zip, you will find two pepXML files (NTermCofr.pepXML and Shotgun.pepXML), which have to be converted to proBAM files using proBAMconvert. Below is a step-by-step guide to converting the two files.

Convert with the proBAMconvert GUI

Open the proBAMconvert GUI either by using the standalone application or by executing the proBAMconvert_GUI.py script with Python (see section 3 for more information). Adjust the options as stated in the table below:




option                         value                                        explanation
choose file                    location of pepXML file                      Specify the pepXML file (NTermCofr.pepXML or Shotgun.pepXML) to be converted. Clicking the "Choose file" button opens a directory browser where you can select the pepXML file.
choose working directory       location where proBAM files should be saved  Specify the directory where the generated proBAM files should be saved. Clicking the "working directory" button opens a directory browser where you can select the desired directory.
project name                   NTermCofr or Shotgun                         Specify how the generated files should be named, depending on which pepXML file was selected (tip: use logical names summarizing the content).
select species                 mus musculus                                 The experiments were performed on mouse embryonic stem cells (Mus musculus).
select database                ENSEMBL                                      The Ensembl database is used to map the peptide-to-spectrum matches (PSMs) to the genome.
select database version        74                                           The RNA-seq and RIBO-seq reads were mapped to the genome using Ensembl version 74. To keep the mapping consistent, Ensembl version 74 should be used here as well.
map decoys                     N                                            Decoys should not be mapped; this option is mainly intended for downstream analysis.
remove duplicate psm mappings  Y                                            The search space provided by PROTEOFORMER contains many protein isoforms, so many PSMs map to multiple protein isoforms within the same gene. This clutters the visualization, because the same PSM is drawn several times at the same location. With this option, a PSM is not mapped to a position again when another inferred protein has already placed it there.
allowed mismatches             0                                            Only exact matches are allowed in this example (lower computation time).

Leave the other options at their default values. Make sure the options are selected as in the picture below, then press convert.



The console window indicates the progress. Once the job has finished, open the working directory, which will contain several new files: a SAM file, a BAM file, a sorted BAM file, and an index file for the sorted BAM file. The SAM format is the human-readable version of the BAM file; have a look at it to grasp the underlying proBAM architecture. BAM files are the binary equivalent of SAM files, and sorting and indexing them allows software applications to look up data quickly.
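To explore the SAM file programmatically rather than in a text editor, each alignment line can be split into the eleven mandatory SAM fields plus the optional TAG:TYPE:VALUE attributes, as laid out in the SAM specification. The Python 3 sketch below does only that. The example record is made up for illustration and is not taken from the tutorial data, and XX is a placeholder tag, not a proBAM attribute.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM alignment line into mandatory fields and optional tags."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    tags = {}
    for column in columns[11:]:
        # Optional fields look like TAG:TYPE:VALUE; the value may contain colons.
        tag, value_type, value = column.split(":", 2)
        if value_type == "i":
            value = int(value)
        elif value_type == "f":
            value = float(value)
        tags[tag] = value
    record["tags"] = tags
    return record


def read_sam(path):
    """Yield parsed alignment records, skipping header lines that start with '@'."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue
            yield parse_sam_line(line)


# Made-up record for illustration only (not taken from the tutorial data).
example = ("spectrum_1\t0\t10\t1000\t255\t27M\t*\t0\t0\t"
           "ATGGCTGGCAAAGCTGCTGGCAAAGCT\t*\tNH:i:1\tXX:Z:demo")
```

Iterating over read_sam("NTermCofr.sam") and printing record["tags"] for the first few records gives a quick overview of the attributes proBAMconvert stores per PSM. The file name assumes the project name from the table above was used.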

proBAMconvert command line

The same conversion as described for the proBAMconvert GUI can be run from the command line:



  [python proBAM.py] or [executable] --file "path to file" --directory "path to working directory" --species mus_musculus --rm_duplicates Y --database ENSEMBL --version 74 --name "Project name"
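Because both pepXML files need the same settings, the call above can also be scripted. The Python 3 sketch below builds the argument list for each file from the flags shown above and runs it with subprocess. It assumes that proBAM.py sits in the current directory and that "python" resolves to the interpreter proBAMconvert was installed for. The function names and the output directory are placeholders.

```python
import subprocess
from pathlib import Path


def build_command(pepxml_path, working_dir, project_name,
                  python="python", script="proBAM.py"):
    """Return the proBAMconvert argument list for one pepXML file."""
    return [
        python, script,
        "--file", str(pepxml_path),
        "--directory", str(working_dir),
        "--species", "mus_musculus",
        "--rm_duplicates", "Y",
        "--database", "ENSEMBL",
        "--version", "74",
        "--name", project_name,
    ]


def convert_all(pepxml_files, working_dir):
    """Convert each pepXML file in turn, using the file stem as project name."""
    for path in pepxml_files:
        project_name = Path(path).stem  # e.g. "NTermCofr" for NTermCofr.pepXML
        # check=True stops the loop if a conversion exits with an error code.
        subprocess.run(build_command(path, working_dir, project_name), check=True)
```

For example, convert_all(["NTermCofr.pepXML", "Shotgun.pepXML"], "probam_output") would run the two conversions one after the other.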


After converting both files, proceed to the final step.

Visualization

A proBAM file inherits all the characteristics of a BAM file. This implies that software compatible with BAM files should be able to load and process proBAM files. One advantage of proBAM is that it can be visualized by standard genome browsers, such as the Integrative Genomics Viewer (IGV). Here, IGV is used to demonstrate how the proBAM format allows integration of proteomics with genomics and translatomics. First, download or open the IGV web application, which can be found here. Once the IGV browser has opened, select the correct genome (Mouse (mm10)), as shown in the figure below.



Afterwards, go to File (upper left corner) and select "load from file". Navigate to the directory where result.zip was unzipped and open the bedgraph files (tip: you can select multiple bedgraph files by holding the Ctrl key while selecting them). The different tracks (bedgraphs) are displayed on the left side (see picture below). IGV does not scale your data automatically. To fix this, select the tracks on the left, right-click on them, and make sure autoscale is turned on.
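Why autoscaling matters becomes clear when the value ranges of the tracks are compared: RNA-seq and RIBO-seq coverage can differ widely in magnitude. The Python 3 sketch below reads bedGraph data (four whitespace-separated columns: chromosome, start, end, value) and reports the minimum and maximum signal. It assumes plain, uncompressed files; the function names are illustrative.

```python
def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def value_range(lines):
    """Return (min, max) of the signal column, or None for an empty track."""
    values = [value for _, _, _, value in parse_bedgraph(lines)]
    if not values:
        return None
    return min(values), max(values)
```

Passing an open file handle works directly, for instance value_range(open("mESC_mRNA_sense.bedgraph")), where the .bedgraph extension is an assumption about how the unzipped files are named.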



Next, go to File, click "load from file", and navigate to the directory where proBAMconvert generated its output. Select the "project name".sorted.bam files, where "project name" is the project name chosen when converting NTermCofr.pepXML and Shotgun.pepXML. This loads the proBAM files into the IGV browser. (Optional: you can load Ensembl gene annotations by going to File, then "load from server", and selecting Ensembl genes.) Finally, enter "cactin" in the navigation bar next to "go" and press go. This gives a view similar to the picture below, where evidence from the different experiments is combined, providing visual insight into the data. Hovering over the proBAM alignments shows a window summarizing the attributes embedded in the BAM file.
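The manual loading steps can also be captured in an IGV batch script, which IGV desktop can run via Tools > Run Batch Script. The Python 3 sketch below only generates the script text, using the IGV batch commands new, genome, load, and goto. The helper name and the file names in the usage note are assumptions, and paths containing spaces are not handled.

```python
def igv_batch_script(track_files, locus="cactin", genome="mm10"):
    """Return IGV batch script text that loads the tracks and jumps to a locus."""
    lines = ["new", f"genome {genome}"]
    # One load command per track, in the order the tracks should appear.
    lines += [f"load {path}" for path in track_files]
    lines.append(f"goto {locus}")
    return "\n".join(lines) + "\n"


def write_batch_script(path, track_files, **kwargs):
    """Write the generated batch script to a file that IGV can run."""
    with open(path, "w") as handle:
        handle.write(igv_batch_script(track_files, **kwargs))
```

For example, write_batch_script("tutorial.igv.txt", ["mESC_mRNA_sense.bedgraph", "NTermCofr.sorted.bam", "Shotgun.sorted.bam"]) would produce a script that reproduces the view described above for the listed tracks.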



This concludes the tutorial. Feel free to explore the data further (other genes of interest: fbxo30, cct2, bclaf1).