MS-GF+

Usage: java -Xmx3500M -jar MSGFPlus.jar

[-conf ConfigurationFile] (Configuration file path; options specified at the command line will override settings in the config file)
   An example parameter file is at https://github.com/MSGFPlus/msgfplus/blob/master/docs/examples/MSGFPlus_Params.txt
   Additional parameter files are at https://github.com/MSGFPlus/msgfplus/tree/master/docs/ParameterFiles

[-s SpectrumFile] (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt)
   Spectra should be centroided (see below for MSConvert example). Profile spectra will be ignored.

[-d DatabaseFile] (*.fasta or *.fa or *.faa)

[-decoy DecoyPrefix] (Prefix for decoy protein names; Default: XXX)

[-o OutputFile (*.mzid)] (Default: [SpectrumFileName].mzid)

[-t PrecursorMassTolerance] (e.g. 2.5Da, 20ppm or 0.5Da,2.5Da; Default: 20ppm)
   Use a comma to define asymmetric values. 
   E.g. "-t 0.5Da,2.5Da" will set 0.5Da to the left (ObservedPepMass < TheoreticalPepMass) 
                              and 2.5Da to the right (ObservedPepMass > TheoreticalPepMass)
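
   As an illustration of the asymmetric form, the Python sketch below parses a -t value and checks whether an observed mass falls inside the resulting window. It is not taken from the MS-GF+ source: parse_tolerance and within_tolerance are hypothetical helper names, and both the ppm conversion relative to the theoretical mass and the inclusive boundaries are assumptions.

```python
import re

def parse_tolerance(spec):
    """Parse '20ppm', '2.5Da' or '0.5Da,2.5Da' into ((left, unit), (right, unit))."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) == 1:
        parts = parts * 2  # a single value applies to both sides
    if len(parts) != 2:
        raise ValueError(f"expected one or two tolerance values: {spec!r}")
    parsed = []
    for part in parts:
        match = re.fullmatch(r"([0-9]*\.?[0-9]+)(Da|ppm)", part)
        if match is None:
            raise ValueError(f"unrecognized tolerance: {part!r}")
        parsed.append((float(match.group(1)), match.group(2)))
    return parsed[0], parsed[1]

def to_daltons(value, unit, reference_mass):
    # Assumption: ppm values are taken relative to the theoretical mass.
    return value if unit == "Da" else reference_mass * value * 1e-6

def within_tolerance(observed, theoretical, spec):
    """Left tolerance applies when observed < theoretical, right tolerance otherwise."""
    (left, left_unit), (right, right_unit) = parse_tolerance(spec)
    delta = observed - theoretical
    if delta < 0:
        return -delta <= to_daltons(left, left_unit, theoretical)
    return delta <= to_daltons(right, right_unit, theoretical)
```

   With "0.5Da,2.5Da", an observed mass 0.4 Da below the theoretical mass is accepted, while one 0.6 Da below is rejected; on the high side, anything up to 2.5 Da above is accepted.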

[-ti IsotopeErrorRange] (Range of allowed isotope peak errors; Default: 0,1)
   Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation.
   The combination of -t and -ti determines the precursor mass tolerance.
   E.g. "-t 20ppm -ti -1,2" tests abs(ObservedPepMass - TheoreticalPepMass - n * 1.00335Da) < 20ppm for n = -1, 0, 1, 2.
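
   The test described for -t and -ti can be written out as a short Python sketch. This is an illustration only, not the MS-GF+ implementation: matching_isotope_errors is a hypothetical name, and expressing the error in ppm relative to the theoretical mass is an assumption.

```python
ISOTOPE_SPACING_DA = 1.00335  # spacing used in the -ti example above

def matching_isotope_errors(observed, theoretical, tol_ppm=20.0, iso_range=(0, 1)):
    """Return every isotope offset n for which the corrected mass error is below tol_ppm.

    The defaults mirror the documented defaults (-t 20ppm, -ti 0,1).
    """
    hits = []
    for n in range(iso_range[0], iso_range[1] + 1):
        error_da = observed - theoretical - n * ISOTOPE_SPACING_DA
        # Assumption: ppm is computed relative to the theoretical mass.
        error_ppm = abs(error_da) / theoretical * 1e6
        if error_ppm < tol_ppm:
            hits.append(n)
    return hits
```

   For example, matching_isotope_errors(2001.0134, 2000.0, 20.0, (-1, 2)) accepts only n = 1: after subtracting one isotope spacing the remaining error is about 0.01 Da, roughly 5 ppm.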

[-thread NumThreads] (Number of concurrent threads to be executed; Default: Number of available cores)
   This is best set to the number of physical cores in a single NUMA node.
   A single NUMA node generally corresponds to one physical processor.
   The default also counts hyperthreaded (logical) cores, which can increase the run time rather than reduce it,
   because the multithreaded part of scoring parameter generation is also I/O intensive.

[-tasks NumTasks] (Override the number of tasks to use on the threads; Default: internally calculated based on inputs)
   More tasks than threads will reduce the memory requirements of the search, but will be slower (how much depends on the inputs).
   1 <= tasks <= numThreads: will create one task per thread, which is the original behavior.
   tasks = 0: use the default calculation, the minimum of (numThreads * 3) and (numSpectra / 250).
   tasks < 0: multiply number of threads by abs(tasks) to determine number of tasks (i.e., -2 means "2 * numThreads" tasks).
   One task per thread will use the most memory, but will usually finish the fastest.
   2-3 tasks per thread will use comparatively less memory, but may cause the search to take 1.5 to 2 times as long.
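
   The rules for -tasks can be summarized in a small Python sketch. It follows the cases listed above but is not the MS-GF+ code: resolve_num_tasks is a hypothetical name, and three details are assumptions (integer division by 250, a floor of one task, and using an explicit value larger than numThreads as given).

```python
def resolve_num_tasks(tasks, num_threads, num_spectra):
    """Translate a -tasks value into the number of tasks, per the documented cases."""
    if tasks < 0:
        # Negative values are a multiplier on the thread count.
        return num_threads * abs(tasks)
    if tasks == 0:
        # Default: minimum of (threads * 3) and (numSpectra / 250).
        # Assumptions: integer division, and never fewer than one task.
        return max(1, min(num_threads * 3, num_spectra // 250))
    if tasks <= num_threads:
        # 1 <= tasks <= numThreads: one task per thread.
        return num_threads
    # Assumption: an explicit value above numThreads is used unchanged.
    return tasks
```

   With 4 threads and 10,000 spectra, "-tasks 0" gives 12 tasks and "-tasks -2" gives 8; with only 1,000 spectra, the default drops to 4.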

[-verbose 0/1] (Console output message verbosity; Default: 0)
   0: Report total progress only
   1: Report total and per-thread progress/status

[-tda 0/1] (Target decoy strategy; Default: 0)
   0: Don't use a decoy database
   1: Search with a decoy database (forward + reverse proteins)

[-m FragmentationMethodID] (Fragmentation Method; Default: 0)
   0: As written in the spectrum or CID if no info
   1: CID
   2: ETD
   3: HCD
   4: UVPD

[-inst InstrumentID] (Instrument ID; Default: 0)
   0: Low-res LCQ/LTQ
   1: Orbitrap/FTICR/Lumos
   2: TOF
   3: Q-Exactive

[-e EnzymeID] (Enzyme ID; Default: 1)
   0: Unspecific cleavage
   1: Trypsin
   2: Chymotrypsin
   3: Lys-C
   4: Lys-N
   5: Glutamyl endopeptidase
   6: Arg-C
   7: Asp-N
   8: alphaLP
   9: No cleavage

[-protocol ProtocolID] (Protocol ID; Default: 0)
   0: Automatic
   1: Phosphorylation
   2: iTRAQ
   3: iTRAQPhospho
   4: TMT
   5: Standard

[-ntt 0/1/2] (Number of Tolerable Termini; Default: 2)
   When EnzymeID is 1 (trypsin),
     2: Only search for fully-tryptic peptides
     1: Search for semi-tryptic and fully-tryptic peptides
     0: Non-tryptic search
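
   To make the -ntt values concrete, the Python sketch below counts how many termini of a peptide are consistent with tryptic cleavage, using the prefix.PEPTIDE.suffix notation that appears in the tab-separated output. It is illustrative only: count_tryptic_termini and passes_ntt are hypothetical names, and the cleavage rule (after K or R, no proline exception) and the treatment of "-" or "_" as a protein terminus are assumptions rather than documented MS-GF+ behavior.

```python
import re

TRYPSIN_SITES = {"K", "R"}       # assumption: cleavage C-terminal to K or R
PROTEIN_TERMINUS = {"-", "_"}    # assumption: markers for a protein terminus

def count_tryptic_termini(annotated):
    """Count enzymatic termini (0, 1 or 2) of a peptide written as 'X.PEPTIDE.Y'."""
    if len(annotated) < 5 or annotated[1] != "." or annotated[-2] != ".":
        raise ValueError(f"expected prefix.PEPTIDE.suffix, got {annotated!r}")
    prev_res, next_res = annotated[0], annotated[-1]
    # Drop inline modification masses such as '+15.995' before inspecting residues.
    residues = re.sub(r"[+-][0-9.]+", "", annotated[2:-2])
    n_term_ok = prev_res in TRYPSIN_SITES or prev_res in PROTEIN_TERMINUS
    c_term_ok = residues[-1] in TRYPSIN_SITES or next_res in PROTEIN_TERMINUS
    return int(n_term_ok) + int(c_term_ok)

def passes_ntt(annotated, ntt):
    """True if the peptide has at least `ntt` enzymatic termini."""
    return count_tryptic_termini(annotated) >= ntt
```

   Under these assumptions "K.NLANPTSVILASIQM+15.995LEYLGMADK.A" has two tryptic termini and is found with any -ntt value, whereas "A.PEPTIDEK.G" has one and is only found with -ntt 1 or -ntt 0.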

[-mod ModificationFileName] (Modification file; Default, used only if -mod is not specified: standard amino acids with fixed C+57)

[-minLength MinPepLength] (Minimum peptide length to consider; Default: 6)

[-maxLength MaxPepLength] (Maximum peptide length to consider; Default: 40)

[-minCharge MinCharge] (Minimum precursor charge to consider if charges are not specified in the spectrum file; Default: 2)

[-maxCharge MaxCharge] (Maximum precursor charge to consider if charges are not specified in the spectrum file; Default: 3)

[-n NumMatchesPerSpec] (Number of matches per spectrum to be reported; Default: 1)

[-addFeatures 0/1] (Include additional features in the output; enable this to post-process results with Percolator; Default: 0)
   0: Output basic scores only
   1: Output additional features

[-ccm ChargeCarrierMass] (Mass of charge carrier; Default: mass of a proton, 1.00727649)
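
   The charge carrier mass enters the usual conversion between a precursor m/z and a neutral peptide mass. The Python sketch below shows that standard relation; the function names are hypothetical, and how MS-GF+ applies the value internally is not shown here.

```python
PROTON_MASS = 1.00727649  # default -ccm value listed above

def neutral_mass(mz, charge, ccm=PROTON_MASS):
    """Neutral mass from an observed m/z: remove one charge carrier per charge."""
    return (mz - ccm) * charge

def precursor_mz(mass, charge, ccm=PROTON_MASS):
    """Expected m/z for a neutral mass carrying `charge` charge carriers."""
    return mass / charge + ccm
```

   For a precursor at m/z 1285.3457 with charge 3, neutral_mass gives about 3853.0153 Da with the default proton mass.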

[-ignoreMetCleavage 0/1] (N-terminal methionine cleavage behavior; Default: 0)
   0: Consider protein N-terminal Met cleavage
   1: Ignore protein N-terminal Met cleavage

[-maxMissedCleavages Count] (Exclude peptides with more than this number of missed cleavages from the search; Default: -1, meaning no limit)

[-minNumPeaks Count] (Minimum number of ions a spectrum must have to be examined; Default: 10)

[-iso NumIsoforms] (Number of isoforms to consider per peptide; Default: 128)

[-numMods Count] (Maximum number of dynamic (variable) modifications per peptide; Default: 3)

[-allowDenseCentroidedPeaks 0/1] (Allow spectra with high-density centroid data, for mzML/mzXML input only; Default: 0)
   0: Disabled
   1: Include spectra with high-density centroid data in the search
   MS-GF+ checks the distance between consecutive peaks in each spectrum; if the median distance is less than 50 ppm, the spectrum is treated as a profile spectrum regardless of the value provided in the mzML or mzXML file.
   This parameter overrides that check when the mzML/mzXML file says the spectrum is centroided.
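
   The density check can be approximated with a few lines of Python. This is a sketch of the idea only: the function names are hypothetical, and measuring each gap in ppm relative to the lower of the two peaks is an assumption.

```python
from statistics import median

def median_peak_spacing_ppm(mz_values):
    """Median distance between consecutive peaks, in ppm."""
    mzs = sorted(mz_values)
    # Assumption: each gap is expressed relative to the lower m/z of the pair.
    gaps = [(b - a) / a * 1e6 for a, b in zip(mzs, mzs[1:])]
    return median(gaps)

def looks_like_profile(mz_values, threshold_ppm=50.0):
    """True if the median spacing is below the threshold (50 ppm by default)."""
    if len(mz_values) < 2:
        return False  # not enough peaks to measure a spacing
    return median_peak_spacing_ppm(mz_values) < threshold_ppm
```

   Peaks spaced 0.01 m/z apart near m/z 500 are about 20 ppm apart and would be flagged, while typical centroided fragment peaks are separated by far more than 50 ppm.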
      

Examples:

Example command (using a parameter file):

java -Xmx3500M -jar MSGFPlus.jar -s Dataset.mzML -d ProteinList.fasta -conf MSGFPlus_PartTryp_MetOx_20ppmParTol.txt

Example command (high-precision spectra, using arguments):

java -Xmx3500M -jar MSGFPlus.jar -s Dataset.mzML -d IPI_human_3.79.fasta -inst 1 -t 20ppm -ti -1,2 -ntt 2 -tda 1 -o PSMs.mzid

Example command (low-precision spectra):

java -Xmx3500M -jar MSGFPlus.jar -s Dataset.mzML -d IPI_human_3.79.fasta -inst 0 -t 0.5Da,2.5Da -ntt 2 -tda 1 -o PSMs.mzid
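
When searches are launched from a script, the same command lines can be assembled programmatically. The Python sketch below rebuilds the high-precision example; build_msgfplus_command and run_search are hypothetical helpers, only options documented above are used, and the file names are the placeholders from the examples.

```python
import subprocess

def build_msgfplus_command(spectrum_file, database_file, output_file,
                           jar_path="MSGFPlus.jar", max_heap="3500M", **options):
    """Assemble the argument list for a MS-GF+ search.

    Keyword arguments map directly to documented options, e.g. inst=1 becomes '-inst 1'.
    """
    cmd = ["java", f"-Xmx{max_heap}", "-jar", jar_path,
           "-s", spectrum_file, "-d", database_file]
    for flag, value in options.items():
        cmd += [f"-{flag}", str(value)]
    cmd += ["-o", output_file]
    return cmd

def run_search(cmd):
    """Launch the search; raises CalledProcessError if java exits with an error."""
    subprocess.run(cmd, check=True)

cmd = build_msgfplus_command(
    "Dataset.mzML", "IPI_human_3.79.fasta", "PSMs.mzid",
    inst=1, t="20ppm", ti="-1,2", ntt=2, tda=1,
)
# run_search(cmd)  # uncomment once java and MSGFPlus.jar are available
```

Passing the arguments as a list avoids shell quoting problems with values such as "-1,2".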

MS-GF+ output

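MS-GF+ writes results as an mzIdentML (version 1.1) file. See http://www.psidev.info/mzidentml/ for details on the mzIdentML format. For every PSM, MS-GF+ reports the following scores, named here as they appear in the tab-separated example: DeNovoScore, MSGFScore, SpecEValue, EValue, QValue and PepQValue.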

MS-GF+ output example

There are two options for converting an MS-GF+ output file (.mzid) into a tab-separated file (.tsv):

  1. The MzIDToTsv utility built into MSGFPlus.jar (see the MzIDToTsv page)
  2. The Mzid-To-Tsv-Converter standalone application, available on GitHub

Shown below is a sample of the MS-GF+ output in table form, as extracted from a simple mzIdentML file, test.mzid:

#SpecFile SpecID ScanNum FragMethod Precursor IsotopeError PrecursorError(ppm) Charge Peptide Protein DeNovoScore MSGFScore SpecEValue EValue QValue PepQValue
test.mgf index=0 26559 CID 1285.3457 1 -5.049801 3 K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T test 299 244 1.4807088E-31 3.2871733E-29 0.0 0.0
test.mgf index=0 26559 CID 1285.3457 1 -5.049801 3 K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T test_isoform 299 244 1.4807088E-31 3.2871733E-29 0.0 0.0
test.mgf index=1 -1 CID 870.11743 0 0.14029178 3 K.NLANPTSVILASIQM+15.995LEYLGMADK.A test2 156 136 2.2559852E-22 4.4217308E-20 0.0 0.0
(Text file of this table: test_Unrolled.tsv)
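
A converted .tsv file can be read with any tab-aware parser. The Python sketch below loads two of the rows shown above from an in-memory string and filters them by a score column; read_psms and filter_psms are hypothetical helpers, and the sketch assumes the file is tab-delimited with the header line shown in the table.

```python
import csv
import io

HEADER = ["#SpecFile", "SpecID", "ScanNum", "FragMethod", "Precursor",
          "IsotopeError", "PrecursorError(ppm)", "Charge", "Peptide", "Protein",
          "DeNovoScore", "MSGFScore", "SpecEValue", "EValue", "QValue", "PepQValue"]
ROWS = [
    ["test.mgf", "index=0", "26559", "CID", "1285.3457", "1", "-5.049801", "3",
     "K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T", "test",
     "299", "244", "1.4807088E-31", "3.2871733E-29", "0.0", "0.0"],
    ["test.mgf", "index=1", "-1", "CID", "870.11743", "0", "0.14029178", "3",
     "K.NLANPTSVILASIQM+15.995LEYLGMADK.A", "test2",
     "156", "136", "2.2559852E-22", "4.4217308E-20", "0.0", "0.0"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"

def read_psms(handle):
    """Read PSM rows into dictionaries keyed by the header names."""
    return list(csv.DictReader(handle, delimiter="\t"))

def filter_psms(psms, column, max_value):
    """Keep PSMs whose numeric value in `column` is at most `max_value`."""
    return [psm for psm in psms if float(psm[column]) <= max_value]

psms = read_psms(io.StringIO(SAMPLE_TSV))
confident = filter_psms(psms, "QValue", 0.01)
```

For a real file, replace the io.StringIO object with open("test_Unrolled.tsv", newline="") or the path of your own converted file.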