MS-GFDB

MS-GFDB is an older application that is no longer under development. It has been superseded by MS-GF+,
which provides all the functionality of MS-GFDB plus numerous improvements.

Differences between MS-GF+ and MS-GFDB

MS-GFDB

Usage: java -Xmx2000M -jar MSGFDB.jar
	-s SpectrumFile (*.mzXML, *.mzML, *.mgf, *.ms2, *.pkl or *_dta.txt)
	-d DatabaseFile (*.fasta or *.fa)
	-t ParentMassTolerance (e.g. 2.5Da, 30ppm, or 0.5Da,2.5Da)
	   Use a comma to set asymmetric values, e.g. "-t 0.5Da,2.5Da" sets 0.5Da to the left (expMass<theoMass) and 2.5Da to the right (expMass>theoMass).
	[-o outputFileName] (Default: stdout)
	[-thread NumOfThreads] (Number of concurrent threads to be executed, Default: Number of available cores)
	[-tda 0/1] (0: don't search decoy database (default), 1: search decoy database to compute FDR)
	[-m FragmentationMethodID] (0: as written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: Merge spectra from the same precursor)
	[-inst InstrumentID] (0: Low-res LCQ/LTQ (Default for CID and ETD), 1: High-res LTQ (Default for HCD), 2: TOF)
	[-e EnzymeID] (0: No enzyme, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: Glu-C, 6: Arg-C, 7: Asp-N, 8: aLP, 9: Endogenous peptides)
	[-c13 0/1/2] (Number of allowed C13, Default: 1)
	[-nnet 0/1/2] (Number of allowed non-enzymatic termini, Default: 1)
	[-mod ModificationFileName] (Modification file, Default: standard amino acids with fixed C+57)
	[-minLength MinPepLength] (Minimum peptide length to consider, Default: 6)
	[-maxLength MaxPepLength] (Maximum peptide length to consider, Default: 40)
	[-minCharge MinPrecursorCharge] (Minimum precursor charge to consider if not specified in the spectrum file, Default: 2)
	[-maxCharge MaxPrecursorCharge] (Maximum precursor charge to consider if not specified in the spectrum file, Default: 3)
	[-n NumMatchesPerSpec] (Number of matches per spectrum to be reported, Default: 1)
	[-uniformAAProb 0/1] (0: use amino acid probabilities computed from the input database (default), 1: use probability 0.05 for all amino acids)
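
The asymmetric form of -t can be illustrated with a short Python sketch. It is one reading of the usage text above, not MS-GFDB's own code: the function names are hypothetical, only the exact spellings "Da" and "ppm" are accepted, and converting ppm relative to the theoretical mass is an assumption.

```python
import re


def parse_tolerance(spec):
    """Parse a -t value such as '2.5Da', '30ppm', or '0.5Da,2.5Da'.

    Returns ((left_value, left_unit), (right_value, right_unit)).
    A single value is applied to both sides.
    """
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) not in (1, 2):
        raise ValueError(f"expected one or two tolerances, got: {spec!r}")
    parsed = []
    for part in parts:
        m = re.fullmatch(r"(\d+(?:\.\d+)?)(Da|ppm)", part)
        if m is None:
            raise ValueError(f"unrecognized tolerance: {part!r}")
        parsed.append((float(m.group(1)), m.group(2)))
    if len(parsed) == 1:
        parsed = parsed * 2
    return parsed[0], parsed[1]


def to_daltons(value, unit, theo_mass):
    # Assumption: a ppm tolerance is taken relative to the theoretical mass.
    return value if unit == "Da" else theo_mass * value / 1e6


def within_tolerance(exp_mass, theo_mass, spec):
    """Apply the left tolerance when expMass < theoMass, else the right one."""
    left, right = parse_tolerance(spec)
    if exp_mass < theo_mass:
        return theo_mass - exp_mass <= to_daltons(*left, theo_mass)
    return exp_mass - theo_mass <= to_daltons(*right, theo_mass)
```

With "-t 0.5Da,2.5Da", an experimental mass 1.0 Da below the theoretical mass falls outside the window, while one 2.0 Da above it falls inside.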

Parameters:

If multiple MS-GFDB processes access the same database file, it is strongly recommended to index the database prior to the database search by running BuildSA (see below).

If -tda 1 is specified, MS-GFDB automatically creates a combined target/reversed database file (DBFileName.revConcat.fasta). The DatabaseFile passed to "-d" must therefore contain only target proteins.
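
A minimal Python sketch of assembling a search command and deriving the name of the combined database file. The helper names and the example file names are hypothetical; the flags are taken from the usage text above, and the assumption that the combined file is written next to the input database is not confirmed by this page.

```python
from pathlib import Path


def build_search_command(spectrum_file, database_file, tolerance,
                         output_file=None, tda=0, jar="MSGFDB.jar"):
    """Assemble the argument list for an MS-GFDB search (does not run it)."""
    if tda not in (0, 1):
        raise ValueError("-tda accepts 0 or 1")
    cmd = ["java", "-Xmx2000M", "-jar", jar,
           "-s", str(spectrum_file),
           "-d", str(database_file),   # target proteins only when tda == 1
           "-t", tolerance,
           "-tda", str(tda)]
    if output_file is not None:
        cmd += ["-o", str(output_file)]
    return cmd


def combined_database_name(database_file):
    """Name of the target/reversed file created when -tda 1 is used.

    Assumption: the file is written next to the input database.
    """
    db = Path(database_file)
    return db.with_name(db.stem + ".revConcat.fasta")


# Hypothetical file names, for illustration only.
command = build_search_command("spectra.mzXML", "human.fasta", "30ppm",
                               output_file="results.tsv", tda=1)
```

The resulting list can be passed to subprocess.run(command, check=True) to launch the search.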

MS-GFDB output

MS-GFDB outputs a tab-delimited file with the following columns: #SpecFile, Scan#, FragMethod, Precursor, PMError, Charge, Peptide, Protein, DeNovoScore, MSGFScore, SpecProb, P-value, EFDR.

MS-GFDB output example

#SpecFile	SpecIndex	Scan#	FragMethod	Precursor	PMError(ppm)	Charge	Peptide	Protein	DeNovoScore	MSGFScore	SpecProb	P-value	FDR	PepFDR
090121_NM_Trypsin_20.mzXML	2838	2838	CID	964.7707	1.5199227	3	K.TIQNSSVSPTSSSSSSSSTGETQTQSSSR.L	IPI:IPI00002349.2|SWISS-PROT:Q7Z417|TREMBL:A1L3A7|ENSEMBL:ENSP00000225388|REFSEQ:NP_065823|H-INV:HIT000001036|VEGA:OTTHUMP00000181037 Tax_Id=9606 Gene_Symbol=NUFIP2 Nuclear fragile X mental retardation-interacting protein 2	190	181	9.380133E-30	2.9333857E-22	0.0	0.0
090121_NM_Trypsin_20.mzXML	3671	3671	ETD	1113.4758	0.6583758	2	R.VGPADDGPAPSGEEEGEGGGEAGGK.E	IPI:IPI00016725.2|SWISS-PROT:Q9UKN8|TREMBL:B3KNH2;Q05CN7|ENSEMBL:ENSP00000361219|REFSEQ:NP_036336|H-INV:HIT000071196|VEGA:OTTHUMP00000022434 Tax_Id=9606 Gene_Symbol=GTF3C4 General transcription factor 3C polypeptide 4	162	158	1.9912463E-28	6.0892146E-21	0.0	0.0
090121_NM_Trypsin_20.mzXML	3031	3031	ETD	651.64874	1.7510794	3	K.GAAAAAAASGAAGGGGGGAGAGAPGGGR.L	IPI:IPI00644073.1|VEGA:OTTHUMP00000038687 Tax_Id=9606 Gene_Symbol=INTS3 18 kDa protein	214	202	6.7318633E-28	2.093763E-20	0.0	0.0
090121_NM_Trypsin_20.mzXML	19088	19088	CID	1199.0916	10.392676	2	K.VNFSPPGDTNSLFPGTWYLER.V	IPI:IPI00945760.1|TREMBL:B7Z784;B7Z7M8;B7Z8R3|REFSEQ:NP_001159579 Tax_Id=9606 Gene_Symbol=HMGCS2 hydroxymethylglutaryl-CoA synthase, mitochondrial isoform 2 precursor	243	243	2.9611275E-27	8.838129E-20	0.0	0.0
090121_NM_Trypsin_20.mzXML	3030	3030	CID/ETD	651.64874	1.7510794	3	K.GAAAAAAASGAAGGGGGGAGAGAPGGGR.L	IPI:IPI00644073.1|VEGA:OTTHUMP00000038687 Tax_Id=9606 Gene_Symbol=INTS3 18 kDa protein	389	389	7.508096E-33	2.335189E-25	0.0	0.0
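
Because the output is tab-delimited and the Protein field contains spaces, it is safest to split on tabs only. The following Python sketch reads such a file into dictionaries keyed by the header names. The function names are hypothetical, the sample reuses two rows from the example above, and treating a smaller SpecProb as a stronger match is an assumption of the sketch.

```python
import csv
import io


def read_msgfdb_results(handle):
    """Yield one dict per result row, keyed by the header names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return  # empty input
    header[0] = header[0].lstrip("#")  # "#SpecFile" -> "SpecFile"
    for row in reader:
        if row:  # skip blank lines
            yield dict(zip(header, row))


def best_by_spec_prob(rows):
    """Return the row with the smallest SpecProb (assumed to be the best match)."""
    return min(rows, key=lambda r: float(r["SpecProb"]))


# Two rows copied from the example output, joined with tabs.
HEADER = ["#SpecFile", "SpecIndex", "Scan#", "FragMethod", "Precursor",
          "PMError(ppm)", "Charge", "Peptide", "Protein", "DeNovoScore",
          "MSGFScore", "SpecProb", "P-value", "FDR", "PepFDR"]
PROTEIN = ("IPI:IPI00644073.1|VEGA:OTTHUMP00000038687 Tax_Id=9606 "
           "Gene_Symbol=INTS3 18 kDa protein")
ROWS = [
    ["090121_NM_Trypsin_20.mzXML", "3031", "3031", "ETD", "651.64874",
     "1.7510794", "3", "K.GAAAAAAASGAAGGGGGGAGAGAPGGGR.L", PROTEIN,
     "214", "202", "6.7318633E-28", "2.093763E-20", "0.0", "0.0"],
    ["090121_NM_Trypsin_20.mzXML", "3030", "3030", "CID/ETD", "651.64874",
     "1.7510794", "3", "K.GAAAAAAASGAAGGGGGGAGAGAPGGGR.L", PROTEIN,
     "389", "389", "7.508096E-33", "2.335189E-25", "0.0", "0.0"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

results = list(read_msgfdb_results(io.StringIO(SAMPLE)))
best = best_by_spec_prob(results)
```

For a real results file, open it with open(path, newline="") and pass the handle to read_msgfdb_results in place of the io.StringIO object.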

BuildSA

Index a protein database for fast searching.

Usage: java -cp MSGFDB.jar msdbsearch.BuildSA
	-d DatabaseFile (*.fasta or *.fa)
	[-tda 0/1/2] (0: target only, 1: target-decoy database only, 2: both)

Parameters:

BuildSA creates a suffix array of the protein database. For an input database file DBFileName.fasta, BuildSA generates four auxiliary files (DBFileName.canno, DBFileName.cnlcp, DBFileName.csarr, DBFileName.cseq). It needs to be run only once per database file.
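
Since the index only has to be built once, a wrapper script can check for the four auxiliary files before calling BuildSA. The Python sketch below does this. The helper names are hypothetical, and it assumes the auxiliary files are written next to the database file, which this page does not state explicitly.

```python
from pathlib import Path

# Extensions of the auxiliary files listed above.
INDEX_EXTENSIONS = (".canno", ".cnlcp", ".csarr", ".cseq")


def missing_index_files(database_file):
    """List the BuildSA auxiliary files not yet present for this database.

    Assumption: the auxiliary files sit next to the database file.
    """
    db = Path(database_file)
    expected = [db.with_suffix(ext) for ext in INDEX_EXTENSIONS]
    return [p for p in expected if not p.exists()]


def build_index_command(database_file, tda=None, jar="MSGFDB.jar"):
    """Assemble the BuildSA argument list (does not run it)."""
    cmd = ["java", "-cp", jar, "msdbsearch.BuildSA", "-d", str(database_file)]
    if tda is not None:
        if tda not in (0, 1, 2):
            raise ValueError("-tda accepts 0, 1 or 2")
        cmd += ["-tda", str(tda)]
    return cmd
```

A caller would run the command from build_index_command only when missing_index_files returns a non-empty list.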