BuildSA

Index a protein database (FASTA file) for fast searching.

Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.msdbsearch.BuildSA
	    -d DatabaseFile (*.fasta or *.fa or *.faa; if a directory path, index all FASTA files)
	    [-tda 0/1/2] (0: Target database only, 1: Concatenated target-decoy database only, 2: Both (Default))
	    [-o OutputDir] (Directory to save index files; default is the same as the input file)
	    [-decoy DecoyPrefix] (Prefix for decoy protein names; default is XXX)
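
As a minimal sketch, the usage line above can be assembled from a script. The Python helper below is hypothetical: the function name buildsa_command, its parameter names, and the example file name proteins.fasta are assumptions made for illustration. It emits only the options listed in the usage text and does not run anything by itself.

```python
def buildsa_command(database, tda=2, output_dir=None, decoy_prefix=None,
                    jar="MSGFPlus.jar", heap="3500M"):
    """Return the BuildSA command line as an argument list (hypothetical helper)."""
    if tda not in (0, 1, 2):
        raise ValueError("-tda must be 0, 1, or 2")
    cmd = [
        "java", f"-Xmx{heap}", "-cp", jar,
        "edu.ucsd.msjava.msdbsearch.BuildSA",
        "-d", str(database),
        "-tda", str(tda),
    ]
    # Optional arguments are added only when the caller supplies them,
    # so BuildSA falls back to its own defaults otherwise.
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if decoy_prefix is not None:
        cmd += ["-decoy", decoy_prefix]
    return cmd


if __name__ == "__main__":
    # "proteins.fasta" is an assumed example file name.
    cmd = buildsa_command("proteins.fasta", tda=2, decoy_prefix="REV")
    print(" ".join(cmd))
    # To actually run BuildSA (requires Java and MSGFPlus.jar):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list means a path containing spaces does not need extra quoting when it is handed to subprocess.run.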

Parameters:

-d DatabaseFilePath
- Name of a protein database file, or a directory containing one or more protein database files
- Database file names must end with ".fasta" or ".fa" or ".faa"
-tda 0/1/2
- If 0, only "DatabaseFile" will be indexed.
- If 1, a new database file (*.revCat.fasta) will be generated by appending reversed proteins to the original database. Only this forward-reverse database will be indexed.
- If 2, both the original database and the forward-reverse database file will be indexed.
-o OutputDirectory
- Path to the output directory (use double quotes if the path contains spaces)
- By default, the index files are created in the same directory as the database file
-decoy DecoyPrefix
- Text to prepend to the names of decoy (reversed-sequence) proteins in the .revCat.fasta file and its index files
- Defaults to XXX; an underscore is added automatically, giving names like XXX_Contaminant_TRYP_BOVIN
- Use -decoy REV to get names like REV_Contaminant_TRYP_BOVIN
- If -decoy is used with BuildSA, the same prefix should also be passed to MS-GF+
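
The decoy naming and the forward-reverse database described above can be illustrated with a short Python sketch. This is not BuildSA's implementation: the function names, the in-memory (name, sequence) record format, and the short example sequence are assumptions. Only the idea of appending reversed proteins and the Prefix_ProteinName naming come from the parameter descriptions.

```python
def decoy_name(protein_name, prefix="XXX"):
    """Prepend the decoy prefix and an underscore to a protein name."""
    return f"{prefix}_{protein_name}"


def with_reversed_decoys(records, prefix="XXX"):
    """Return the target records followed by their reversed-sequence decoys.

    records: list of (name, sequence) tuples (assumed in-memory format).
    """
    decoys = [(decoy_name(name, prefix), seq[::-1]) for name, seq in records]
    return list(records) + decoys


if __name__ == "__main__":
    # The sequence is a made-up fragment, used only for illustration.
    targets = [("Contaminant_TRYP_BOVIN", "MKWVTF")]
    for name, seq in with_reversed_decoys(targets, prefix="REV"):
        print(f">{name}\n{seq}")
```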

BuildSA creates a suffix array of the protein database. For an input database file DBFileName.fasta, BuildSA will generate 4 auxiliary files:

DBFileName.canno
DBFileName.cnlcp
DBFileName.csarr
DBFileName.cseq

BuildSA only needs to be executed once for each protein database file.
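
Because the index only has to be built once, a wrapper script can check for the four files listed above before calling BuildSA. The Python sketch below is a hypothetical helper (the names index_files and needs_indexing are assumptions). It covers only the index of the original database, not the files created for the forward-reverse database, and it does not detect a FASTA file that changed after indexing.

```python
from pathlib import Path

# Extensions of the four auxiliary files listed in the text.
INDEX_SUFFIXES = (".canno", ".cnlcp", ".csarr", ".cseq")


def index_files(database, output_dir=None):
    """Return the expected index file paths for a database file."""
    db = Path(database)
    # By default the index files sit next to the database file.
    directory = Path(output_dir) if output_dir is not None else db.parent
    return [directory / (db.stem + suffix) for suffix in INDEX_SUFFIXES]


def needs_indexing(database, output_dir=None):
    """True if at least one expected index file is missing."""
    return not all(path.is_file() for path in index_files(database, output_dir))


if __name__ == "__main__":
    # "DBFileName.fasta" is the placeholder name used in the text.
    for path in index_files("DBFileName.fasta"):
        print(path)
    print("run BuildSA:", needs_indexing("DBFileName.fasta"))
```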