sta—Single Template Alignment

The Single Template Alignment (STA) program in Prime is designed for protein sequences with medium to high sequence identity (>20%). Features of this program include the following:

STA differs from standard sequence alignment programs such as BLAST by taking into account secondary structure matching as well as profile-sequence matching. As a result, STA often produces better alignments than BLAST in regions where sequence conservation is relatively weak.
STA uses a position-specific substitutional matrix (PSSM) for the query sequence, derived from PSI-BLAST, to match the template sequence.
To reduce the effect of inaccuracy in any single secondary structure prediction, STA derives a composite secondary structure for the query sequence from multiple predictions. The composite secondary structure prediction (SSP) is aligned to the secondary structure assignment (SSA) of the template.
STA can apply constraints, such as user-selected residue pairs to be aligned together or residues in either query or template that are not to be aligned (gaps).
STA can use templates directly from the PDB, and can also use templates from the Prime fold library.

Syntax

$SCHRODINGER/sta [options] jobname

The input is read from the file jobname.inp and the output is written to jobname.out.

Command Input File

The input file consists of lines containing keyword-value pairs, one per line. The keywords are described in Table 1.

Table 1. Keywords for the sta program
Keyword syntax	Description
`QUERY_NAME` name	Name of the query sequence.
`QUERY_FILE` filename	File name of the query sequence in FASTA format.
`FORMAT` string	Alignment file format. Default is Maestro. Set to `dev` for plain text format; see the Examples section for a sample of this format.
`BGNRES` n	The first residue of the query that is used in alignment. Default: 1.
`ENDRES` n	The last residue of the query sequence that is used. Default: the last residue of the original query sequence.
`PREDSS`[`_`i] filename	Secondary structure prediction (SSP) file of the query sequence, in CASP format. One occurrence of this keyword is required for each file, and at least one SSP file must be specified. If only one prediction is used, the keyword is `PREDSS`; if more than one is used, the keyword is suffixed with an index i, starting from zero: `PREDSS_0`, `PREDSS_1`, and so on.
`TEMPLATE_PDB` filename	Filename of the template structure, in PDB format.
`MODE` string	Optional. Needed in the input file only when there are user-specified constraints. In that case, the value is set to `user`.
`PAIR` pair-list	Optional. User-specified residue pairing, used only when `MODE` is set to `user`. The residue pairs can be listed after the keyword on one or more lines. For example, the following lines pair residues 12, 20, and 60 of the query sequence with residues 15, 24, and 65 of the template: `MODE user`, `PAIR 12 15 20 24`, `PAIR 60 65`.
`GAP` gap-list	Optional. User-specified gaps, used only when `MODE` is set to `user`. Can be used with or without `PAIR`. Gaps are specified by the residues in either the query (P) or the template (T) that are aligned with them. For example, the following lines open gaps against query residues 10 and 11 and template residues 34 and 35: `MODE user`, `GAP P10 P11 T34 T35`.
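
The keywords in Table 1 can be assembled into an input file programmatically. The following Python sketch is one way to do so; the function name `build_sta_input` and its parameters are hypothetical, and the keyword order and column spacing simply mirror the example input file in this section and are assumed not to matter to sta.

```python
def build_sta_input(query_name, query_file, template_pdb, ssp_files,
                    pairs=(), gaps=()):
    """Return the text of an sta input file as keyword-value lines.

    pairs: iterable of (query_residue, template_residue) tuples.
    gaps:  iterable of strings such as "P10" (query) or "T34" (template).
    """
    if not ssp_files:
        raise ValueError("at least one SSP file must be specified")

    def entry(keyword, value):
        # Pad the keyword to a fixed width, as in the example input file.
        return f"{keyword:<16}{value}"

    lines = [
        entry("QUERY_NAME", query_name),
        entry("QUERY_FILE", query_file),
    ]
    if len(ssp_files) == 1:
        # A single prediction uses the bare keyword.
        lines.append(entry("PREDSS", ssp_files[0]))
    else:
        # Several predictions use an index suffix starting from zero.
        for i, path in enumerate(ssp_files):
            lines.append(entry(f"PREDSS_{i}", path))
    lines.append(entry("TEMPLATE_PDB", template_pdb))

    # MODE is needed only when user-specified constraints are present.
    if pairs or gaps:
        lines.append(entry("MODE", "user"))
    if pairs:
        lines.append(entry("PAIR", " ".join(f"{q} {t}" for q, t in pairs)))
    if gaps:
        lines.append(entry("GAP", " ".join(gaps)))
    return "\n".join(lines) + "\n"


text = build_sta_input(
    "query", "query_seq.fasta", "pdb1vpe.ent",
    ["query-ssp1.casp", "query-ssp2.casp"],
    pairs=[(12, 15), (20, 24), (60, 65)],
    gaps=["P10", "P11", "T34", "T35"],
)
print(text)
```

Writing the returned text to jobname.inp gives a file that follows the keyword syntax in Table 1.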

Files

Input file: Named jobname.inp. Contains keywords for running the program.
Query sequence file: File containing the complete sequence of the query, in FASTA format. Specified in the input file.
Secondary structure prediction files: One or more files in CASP format for the query sequence.
Template PDB file: Template structure file, in PDB format. The sequence used for the template is taken from SEQRES lines in the PDB file. If SEQRES is not found in the PDB file, the template sequence is taken from ATOM records.
Output alignment file: File named jobname.out containing the pairwise alignment between the query and the template. By default, the alignment is in Maestro format. To generate plain text format, include FORMAT dev in the command input file. In plain text format, aligned residues from the query and the template are placed on top of each other; see below for an example.
Log file: File named jobname.log containing progress of the sta program, including warnings and error messages.

Return Value and Errors

sta returns 0 for success and non-zero for failure. In case of failure, check the log file for details, and search for ERROR. To further analyze the failure, use the -DEBUG option when you run sta.
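
The return-value check and the log search can be scripted. The Python sketch below is a minimal illustration, not part of the sta distribution: the helper names `run_sta` and `error_lines` are hypothetical, and it assumes that sta runs to completion before returning and that error messages in the log contain the literal string ERROR on the same line as the message.

```python
import os
import subprocess
from pathlib import Path


def error_lines(log_text):
    """Return the lines of a log that contain the string ERROR."""
    return [line for line in log_text.splitlines() if "ERROR" in line]


def run_sta(jobname, debug=False):
    """Run sta on jobname.inp and return (return_code, error_lines).

    Follows the documented syntax: $SCHRODINGER/sta [options] jobname.
    """
    exe = os.path.join(os.environ["SCHRODINGER"], "sta")
    cmd = [exe]
    if debug:
        cmd.append("-DEBUG")  # option mentioned above for failure analysis
    cmd.append(jobname)

    result = subprocess.run(cmd)

    log = Path(f"{jobname}.log")
    errors = error_lines(log.read_text()) if log.exists() else []
    return result.returncode, errors
```

A caller might invoke `run_sta("myjob")` first and repeat the run with `debug=True` only if the return code is non-zero.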

Examples

Below is an example input file, where query_seq.fasta stores the query sequence in FASTA format.

QUERY_NAME      query
QUERY_FILE      query_seq.fasta
PREDSS_0        query-ssp1.casp
PREDSS_1        query-ssp2.casp
PREDSS_2        query-ssp3.casp
BGNRES          1
ENDRES          149
TEMPLATE_PDB    pdb1vpe.ent

Below is an example of the plain text format for output alignment files. This format is generated when you include FORMAT dev in the command input file.

probe length=70======tmplt_T0281_d1dq3a2 length=79 TotalScore= 42 SEQ= -4 SSTR= 102
 SEQID= 0.061538 nali= 65
probe bgnRes=1 endRes=70    template bgnRes=3 endRes=75
ProbeAA: ..MWMPPRPEEVARKLRRLGFVERMAKGGHRLYTHPDGRIVVVPFH...SGELP....KG
ProbeSS:   LLLLLLHHHHHHHHHHLLLEEEEELLLEEEEELLLLLEEEeLLL   LLLLL    HH
Fold AA: GNFGLPLNFNAFKEWASEYGVEFKTNGSQTIAIIND...ERISLGQWHTRNRVSKAVLVK
Fold SS: LLLEELLLHHHHHHHHHLLLLEEEEELLEEEEEELL   EEEELLLHHHHLLEEHHHHHH
ProbeAA: TFKRILRDAGLTEEEFHNL....
ProbeSS: HHHHHHHHhLLLHHHHHLL
Fold AA: MLRKLYEATK.DEEVKRMLHLIE
Fold SS: HHHHHHHHHL LHHHHHHHHHHL

The file begins with header information. The alignment section then gives, for each block, the query sequence (ProbeAA), the query composite secondary structure (ProbeSS), the template sequence (Fold AA), and the template secondary structure (Fold SS). Gaps are denoted by periods in the sequences and by spaces in the secondary structures.
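
The plain text layout is simple enough to read with a short script. The Python sketch below is an illustration only; the function names are hypothetical, and it assumes every alignment line starts with one of the four nine-character labels shown above. It joins the ProbeAA and Fold AA lines across blocks and counts the columns in which neither sequence has a gap. The sample is just the second block of the example above, not the full alignment.

```python
PREFIX_WIDTH = 9  # "ProbeAA: " and "Fold AA: " are each 9 characters wide


def parse_alignment(text):
    """Return the gapped query and template sequences as two strings."""
    query, template = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ProbeAA:"):
            query.append(line[PREFIX_WIDTH:])
        elif line.startswith("Fold AA:"):
            template.append(line[PREFIX_WIDTH:])
    return "".join(query), "".join(template)


def aligned_columns(query, template):
    """Count columns where neither sequence has a gap (a period)."""
    return sum(1 for q, t in zip(query, template) if q != "." and t != ".")


sample = """\
ProbeAA: TFKRILRDAGLTEEEFHNL....
ProbeSS: HHHHHHHHhLLLHHHHHLL
Fold AA: MLRKLYEATK.DEEVKRMLHLIE
Fold SS: HHHHHHHHHL LHHHHHHHHHHL
"""

query, template = parse_alignment(sample)
print(aligned_columns(query, template))  # prints 18
```

The secondary structure lines are skipped here because their gaps are spaces, which can be lost when trailing whitespace is trimmed; a parser that needs them should pad each line to the length of the matching sequence line.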