sta—Single Template Alignment
The Single Template Alignment (STA) program in PRIME is designed for protein sequences with medium to high sequence identity (>20%). Features of this program include the following:
- STA differs from standard sequence alignment programs such as BLAST by taking into account secondary structure matching as well as profile-sequence matching. As a result, STA often generates better alignments than BLAST in regions where sequence conservation is relatively weak.
- STA uses a position-specific substitutional matrix (PSSM) for the query sequence, derived from PSI-BLAST, to match the template sequence.
- To minimize the inaccuracy of any single secondary structure prediction, a novel and robust algorithm derives a composite secondary structure for the query sequence from multiple predictions. The composite secondary structure prediction (SSP) is aligned to the secondary structure assignment (SSA) of the template.
- STA can apply constraints, such as user-selected residue pairs to be aligned together, or residues in either the query or the template that are not to be aligned (gaps).
- STA can use templates directly from the PDB, and can also use templates from the Prime fold library.
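To give a feel for what a composite secondary structure is, the sketch below combines several per-residue predictions by a simple majority vote. It is only an illustration: the algorithm that STA actually uses is not described in this section, and the function name `composite_ss`, the H/E/L alphabet, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def composite_ss(predictions, tie_order="LHE"):
    """Combine equal-length prediction strings by per-position majority vote.

    Illustrative only; this is NOT the algorithm STA uses. Ties are broken
    by the order of states in tie_order, which is an arbitrary assumption.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        tied = [state for state in counts if counts[state] == best]
        # Prefer the earliest state in tie_order; unknown symbols sort last.
        tied.sort(key=lambda s: tie_order.index(s) if s in tie_order else len(tie_order))
        consensus.append(tied[0])
    return "".join(consensus)


if __name__ == "__main__":
    print(composite_ss(["HHHEELL", "HHLEELL", "HHHLELL"]))
```

A vote like this smooths out positions where one predictor disagrees with the others, which is the general motivation for using several predictions rather than one.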
Syntax
$SCHRODINGER/sta [options] jobname
The input is read from the file jobname.inp and the output is written to jobname.out.
Command Input File
The input file consists of lines containing keyword-value pairs, one per line. The keywords are described in Table 1.
| Keyword syntax | Description |
|---|---|
| QUERY_NAME | Name of the query sequence. |
| QUERY_FILE | File name of the query sequence in FASTA format. |
| ALIFORMAT | Alignment file format. Default is Maestro. Set to dev for plain text format. |
| BGNRES | The first residue of the query that is used in alignment. Default: 1. |
| ENDRES | The last residue of the query sequence that is used. Default: the last residue of the original query sequence. |
| PREDSS_n | Secondary structure prediction file of the query sequence, in CASP format. One occurrence of this keyword is required for each file, and there must be at least one SSP file specified. If only one prediction is used, the keyword is |
| TEMPLATE_PDB | File name of the template structure, in PDB format. |
| MODE | Optional. Needed in the input file only when there are user-specified constraints. In that case, the value is set to user. |
| PAIR | Optional. User-specified residue pairing, used only with MODE user. Example lines: MODE user; PAIR 12 15 20 24; PAIR 60 65. |
| GAP | Optional. User-specified gaps, used only with MODE user. Example lines: MODE user; GAP P10 P11 T34 T35. |
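As a rough illustration of the one-pair-per-line layout, the following Python sketch writes an input file from a few values. The keyword names are those used in the example input file in this document. The function name `render_sta_input` is hypothetical, numbering the PREDSS keywords from 0 is an assumption based on that example, and constraint keywords are not covered.

```python
def render_sta_input(query_name, query_file, ssp_files, template_pdb,
                     bgnres=None, endres=None):
    """Return the text of an sta input file, one keyword-value pair per line."""
    if not ssp_files:
        # The keyword table states that at least one SSP file is required.
        raise ValueError("at least one SSP file must be specified")

    lines = [f"QUERY_NAME {query_name}", f"QUERY_FILE {query_file}"]
    # Assumption: PREDSS keywords are numbered consecutively from 0.
    # The keyword form for a single prediction may differ and is not handled.
    lines += [f"PREDSS_{i} {path}" for i, path in enumerate(ssp_files)]
    if bgnres is not None:
        lines.append(f"BGNRES {bgnres}")
    if endres is not None:
        lines.append(f"ENDRES {endres}")
    lines.append(f"TEMPLATE_PDB {template_pdb}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_sta_input("query", "query_seq.fasta",
                            ["query-ssp1.casp", "query-ssp2.casp"],
                            "pdb1vpe.ent", bgnres=1, endres=149)
    # sta reads its input from jobname.inp; "myjob" is a placeholder job name.
    with open("myjob.inp", "w") as handle:
        handle.write(text)
```

Generating the file from code keeps the SSP entries and residue range consistent when the same query is aligned against many templates.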
Files
- Input file: named jobname.inp. Contains keywords for running the program.
- Query sequence file: file containing the complete sequence of the query, in FASTA format. Specified in the input file.
- Secondary structure prediction files in CASP format for the query sequence.
- Template PDB file: template structure file, in PDB format. The sequence used for the template is taken from the SEQRES lines in the PDB file. If SEQRES is not found in the PDB file, the template sequence is taken from the ATOM records.
- Output alignment file: file named jobname.out containing the pairwise alignment between the query and the template. By default, the alignment is in Maestro format. To generate plain text format, include ALIFORMAT dev in the command input file. In plain text format, aligned residues from the query and the template are placed on top of each other; see below for an example.
- Log file: file named jobname.log containing the progress of the sta program, including warnings and error messages.
Return Value and Errors
sta returns 0 for success and non-zero for failure. In case of failure, check the log file for details, and search for ERROR. To further analyze the failure, use the -DEBUG option when you run sta.
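The check described above can be scripted. The Python sketch below builds the command line from the documented syntax, runs it, and collects log lines containing ERROR when the return value is non-zero. The helper names (`sta_command`, `error_lines`, `run_sta`) are hypothetical, and the sketch assumes that the log file is written to the current directory and that -DEBUG is placed before the job name like any other option.

```python
import os
import subprocess
from pathlib import Path


def sta_command(jobname, schrodinger=None, debug=False):
    """Build the argument list for: $SCHRODINGER/sta [options] jobname."""
    root = schrodinger or os.environ["SCHRODINGER"]
    cmd = [str(Path(root) / "sta")]
    if debug:
        cmd.append("-DEBUG")
    cmd.append(jobname)
    return cmd


def error_lines(log_text):
    """Return the lines of a log that contain the string ERROR."""
    return [line for line in log_text.splitlines() if "ERROR" in line]


def run_sta(jobname, schrodinger=None, debug=False):
    """Run sta and return (return code, ERROR lines from jobname.log)."""
    result = subprocess.run(sta_command(jobname, schrodinger, debug))
    errors = []
    log_path = Path(f"{jobname}.log")
    # Assumption: the log file is written to the current working directory.
    if result.returncode != 0 and log_path.exists():
        errors = error_lines(log_path.read_text())
    return result.returncode, errors
```

A caller might rerun a failed job with `debug=True` to get more detail in the log before inspecting it by hand.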
Examples
Below is an example input file, where query_seq.fasta stores the query sequence in FASTA format.
QUERY_NAME query
QUERY_FILE query_seq.fasta
PREDSS_0 query-ssp1.casp
PREDSS_1 query-ssp2.casp
PREDSS_3 query-ssp3.casp
BGNRES 1
ENDRES 149
TEMPLATE_PDB pdb1vpe.ent
Below is an example of the plain text format for output alignment files. This format is generated when you include ALIFORMAT dev in the command input file.
probe length=70======tmplt_T0281_d1dq3a2 length=79
TotalScore= 42 SEQ= -4 SSTR= 102 SEQID= 0.061538 nali= 65
probe bgnRes=1 endRes=70
template bgnRes=3 endRes=75

ProbeAA: ..MWMPPRPEEVARKLRRLGFVERMAKGGHRLYTHPDGRIVVVPFH...SGELP....KG
ProbeSS:   LLLLLLHHHHHHHHHHLLLEEEEELLLEEEEELLLLLEEEeLLL   LLLLL    HH
Fold AA: GNFGLPLNFNAFKEWASEYGVEFKTNGSQTIAIIND...ERISLGQWHTRNRVSKAVLVK
Fold SS: LLLEELLLHHHHHHHHHLLLLEEEEELLEEEEEELL   EEEELLLHHHHLLEEHHHHHH

ProbeAA: TFKRILRDAGLTEEEFHNL....
ProbeSS: HHHHHHHHhLLLHHHHHLL
Fold AA: MLRKLYEATK.DEEVKRMLHLIE
Fold SS: HHHHHHHHHL LHHHHHHHHHHL
The file contains header information at the top, then in the alignment section, the query sequence (ProbeAA), query composite secondary structure (ProbeSS), template sequence (Fold AA) and template secondary structure (Fold SS) are given. Gaps are denoted by periods in the sequences and by spaces in the secondary structures.
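For downstream scripting, the sequence rows of the plain text format can be read back with a few lines of Python. The sketch below is a minimal reader, assuming each row of the alignment sits on its own line in the file and starts with the ProbeAA: or Fold AA: label; the function names are hypothetical. It uses only the sequence rows, where gaps are periods, because the secondary structure rows depend on exact spacing.

```python
def parse_alignment(text):
    """Concatenate the ProbeAA and Fold AA rows of a plain text alignment."""
    probe, fold = [], []
    for line in text.splitlines():
        if line.startswith("ProbeAA:"):
            probe.append(line.split(":", 1)[1].strip())
        elif line.startswith("Fold AA:"):
            fold.append(line.split(":", 1)[1].strip())
    query, template = "".join(probe), "".join(fold)
    if len(query) != len(template):
        raise ValueError("query and template rows differ in length")
    return query, template


def aligned_pairs(query, template):
    """Return (query, template) residue pairs for columns without a gap."""
    return [(q, t) for q, t in zip(query, template) if q != "." and t != "."]


if __name__ == "__main__":
    sample = (
        "ProbeAA: ..MWK\n"
        "ProbeSS:   LLH\n"
        "Fold AA: GNF.L\n"
        "Fold SS: LLL H\n"
    )
    query, template = parse_alignment(sample)
    print(aligned_pairs(query, template))
```

Concatenating the rows before pairing them means an alignment that wraps over several blocks is handled the same way as a single-block one.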