BuildSA
Index a protein database (FASTA file) for fast searching.
Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.msdbsearch.BuildSA
-d DatabaseFile (*.fasta or *.fa or *.faa; if a directory path, index all FASTA files)
[-tda 0/1/2] (0: Target database only, 1: Concatenated target-decoy database only, 2: Both (Default))
[-o OutputDir] (Directory to save index files; default is the same as the input file)
[-decoy DecoyPrefix] (Prefix for decoy protein names; default is XXX)
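As an illustration of how the usage line fits together, here is a minimal Python sketch that assembles the BuildSA command as an argument list. The function name `build_sa_command` and its keyword arguments are hypothetical helpers, not part of MS-GF+. The jar location and the `3500M` heap size are taken from the usage line and may need adjusting for your installation.

```python
def build_sa_command(database, tda=2, output_dir=None, decoy_prefix=None,
                     jar="MSGFPlus.jar", max_heap="3500M"):
    """Assemble the BuildSA command line as a list of arguments.

    Hypothetical helper: only the flags shown in the usage line are emitted.
    """
    if tda not in (0, 1, 2):
        raise ValueError("tda must be 0, 1, or 2")
    cmd = [
        "java", f"-Xmx{max_heap}", "-cp", str(jar),
        "edu.ucsd.msjava.msdbsearch.BuildSA",
        "-d", str(database),
        "-tda", str(tda),
    ]
    # Optional flags are added only when requested, so BuildSA's own
    # defaults apply otherwise.
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if decoy_prefix is not None:
        cmd += ["-decoy", decoy_prefix]
    return cmd


# Print the command instead of running it (the file name is a placeholder).
print(" ".join(build_sa_command("proteins.fasta", tda=2, decoy_prefix="REV")))
```

The list can be handed to `subprocess.run(cmd, check=True)`. Passing arguments as a list rather than a single string means paths containing spaces need no extra quoting.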
Parameters:
- -d DatabaseFilePath
  - Name of a protein database file, or a directory containing one or more protein database files
  - Database file names must end with ".fasta" or ".fa" or ".faa"
- -tda 0/1/2
  - If 0, only "DatabaseFile" will be indexed.
  - If 1, a new database file (*.revCat.fasta) will be generated by appending reversed proteins. Only this forward-reverse database will be indexed.
  - If 2, both the original database and the forward-reverse database file will be indexed (default).
- -o OutputDirectory
  - Path to the output directory (use double quotes if the path contains spaces)
  - By default, the index files are created in the same directory as the database file
- -decoy DecoyPrefix
  - Text to prepend to protein names when including decoy (reverse sequence) proteins in the .revCat.fasta file and related index files
  - Defaults to XXX; an underscore is also added, giving names like XXX_Contaminant_TRYP_BOVIN
  - Use -decoy REV to get names like REV_Contaminant_TRYP_BOVIN
  - If -decoy is used with BuildSA, it should also be used with MS-GF+
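To make the -tda and -decoy behavior concrete, the following Python sketch builds a forward-reverse protein list by appending one reversed copy of each protein, named with the decoy prefix plus an underscore. It is a simplified illustration, not the BuildSA implementation. The function names are hypothetical, the sequence is made up, and the real .revCat.fasta file may differ in details such as header formatting.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def add_reversed_decoys(records, decoy_prefix="XXX"):
    """Return the target records followed by one reversed decoy per target."""
    decoys = [(f"{decoy_prefix}_{header}", seq[::-1]) for header, seq in records]
    return list(records) + decoys


# Made-up sequence, used only to show the naming and the reversal.
example = ">Contaminant_TRYP_BOVIN\nACDEFGHIK\n"
for header, seq in add_reversed_decoys(read_fasta(example), decoy_prefix="REV"):
    print(f">{header}\n{seq}")
```

Because the prefix is written into the decoy protein names, a search that uses a different prefix would presumably not recognize these entries as decoys, which is why the same -decoy value should be given to both BuildSA and MS-GF+.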
BuildSA creates a suffix array of the protein database. For an input database file DBFileName.fasta, BuildSA will generate 4 auxiliary files:
- DBFileName.canno
- DBFileName.cnlcp
- DBFileName.csarr
- DBFileName.cseq
BuildSA only needs to be executed once for each protein database file.
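Since the index only has to be built once, a script can check for the four auxiliary files before calling BuildSA again. The sketch below is one way to do that in Python. The helper names `expected_index_files` and `needs_indexing` are hypothetical, and the check covers only the four files listed above for the original database; it does not look at the forward-reverse database or detect a FASTA file that changed after indexing.

```python
from pathlib import Path

# Extensions of the four auxiliary files listed above.
INDEX_SUFFIXES = (".canno", ".cnlcp", ".csarr", ".cseq")


def expected_index_files(database, output_dir=None):
    """Return the paths of the index files expected for a database file.

    If output_dir is None, the files are assumed to sit next to the
    database, matching the documented default for -o.
    """
    db = Path(database)
    directory = Path(output_dir) if output_dir is not None else db.parent
    return [directory / (db.stem + suffix) for suffix in INDEX_SUFFIXES]


def needs_indexing(database, output_dir=None):
    """Return True if any of the expected index files is missing."""
    return not all(p.is_file() for p in expected_index_files(database, output_dir))


for path in expected_index_files("DBFileName.fasta"):
    print(path)
```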