sta—Single Template Alignment

The Single Template Alignment (STA) program in Prime is designed for protein sequences with medium to high sequence identity (>20%) to the template. Its features include the following:

  • STA differs from standard sequence alignment programs such as BLAST by taking into account secondary structure matching as well as profile-sequence matching. As a result, STA often generates a better alignment than BLAST in regions where sequence conservation is relatively weak.

  • STA uses a position-specific scoring matrix (PSSM) for the query sequence, derived from PSI-BLAST, to match the template sequence.

  • To reduce the effect of inaccuracies in any single secondary structure prediction (SSP), a novel and robust algorithm derives a composite secondary structure for the query sequence from multiple predictions. The composite SSP is aligned to the secondary structure assignment (SSA) of the template.

  • STA can apply constraints, such as user-selected residue pairs to be aligned together or residues in either query or template that are not to be aligned (gaps).

  • STA can use templates directly from the PDB, and can also use templates from the Prime fold library.

Syntax

$SCHRODINGER/sta [options] jobname

The input is read from the file jobname.inp and the output is written to jobname.out.

Command Input File

The input file consists of lines containing keyword-value pairs, one per line. The keywords are described in Table 1.

Table 1. Keywords for the sta program

Keyword syntax
    Description

QUERY_NAME name
    Name of the query sequence.

QUERY_FILE filename
    Name of the file containing the query sequence, in FASTA format.

FORMAT string
    Alignment file format. Default is Maestro. Set to dev for plain text format; see the Examples section for a sample of this format.

BGNRES n
    The first residue of the query sequence that is used in the alignment. Default: 1.

ENDRES n
    The last residue of the query sequence that is used in the alignment. Default: the last residue of the original query sequence.

PREDSS[_i] filename
    Secondary structure prediction (SSP) file for the query sequence, in CASP format. One occurrence of this keyword is required for each file, and at least one SSP file must be specified. If only one prediction is used, the keyword is PREDSS; if more than one prediction is used, the keyword is suffixed with an index i, starting from zero: PREDSS_0, PREDSS_1, and so on.

TEMPLATE_PDB filename
    Name of the template structure file, in PDB format.

MODE string
    Optional. Required only when user-specified constraints (PAIR or GAP) are given, in which case the value must be user.

PAIR pair-list
    Optional. User-specified residue pairing, used only when MODE is set to user. The residue pairs can be listed after the keyword on one line or spread over multiple lines. For example, the following lines pair residues 12, 20, and 60 of the query sequence with residues 15, 24, and 65 of the template, respectively.

        MODE user
        PAIR 12 15 20 24
        PAIR 60 65

GAP gap-list
    Optional. User-specified gaps, used only when MODE is set to user. Can be used with or without PAIR. Each gap is specified by the residue in the query (P) or the template (T) that is aligned with it. For example, the following lines open gaps against query residues 10 and 11 and template residues 34 and 35.

        MODE user
        GAP P10 P11 T34 T35
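
As an illustration of the constraint syntax described above, the following Python sketch generates the MODE, PAIR, and GAP lines from lists of residue numbers. The function constraint_lines and its parameters are hypothetical helpers and are not part of sta. Only the keyword names, the query-then-template order within each pair, and the P and T prefixes are taken from the keyword descriptions.

```python
def constraint_lines(pairs=(), query_gaps=(), template_gaps=()):
    """Build input-file lines for user-specified constraints.

    pairs          -- (query_residue, template_residue) tuples
    query_gaps     -- query residues aligned with gaps (written as P<n>)
    template_gaps  -- template residues aligned with gaps (written as T<n>)
    """
    if not (pairs or query_gaps or template_gaps):
        return []  # no constraints, so MODE is not needed
    lines = ["MODE user"]
    if pairs:
        # Flatten (query, template) tuples into "q1 t1 q2 t2 ..." order.
        flat = [str(n) for pair in pairs for n in pair]
        lines.append("PAIR " + " ".join(flat))
    gaps = [f"P{n}" for n in query_gaps] + [f"T{n}" for n in template_gaps]
    if gaps:
        lines.append("GAP " + " ".join(gaps))
    return lines


print("\n".join(constraint_lines(pairs=[(12, 15), (20, 24), (60, 65)],
                                 query_gaps=[10, 11],
                                 template_gaps=[34, 35])))
```

The sketch writes all pairs on a single PAIR line, which the PAIR description permits; splitting them over several PAIR lines should be equivalent.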

Files

  • Input file—named jobname.inp. Contains keywords for running the program.

  • Query sequence file—File containing the complete sequence of the query, in FASTA format. Specified in the input file.

  • Secondary structure prediction files in CASP format for the query sequence.

  • Template PDB file—Template structure file, in PDB format. The sequence used for the template is taken from SEQRES lines in the PDB file. If SEQRES is not found in the PDB file, the template sequence is then taken from ATOM records.

  • Output alignment file—File named jobname.out containing pairwise alignment between the query and the template. By default, the alignment is in Maestro format. To generate plain text format, include FORMAT dev in the command input file. In plain text format, aligned residues from the query and the template are placed on top of each other—see below for an example.

  • Log file—File named jobname.log containing progress of the sta program, including warnings and error messages.
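
The SEQRES-then-ATOM fallback described for the template PDB file can be sketched in Python as follows. This is only an illustration of the rule, not the code used by sta. The function template_sequences is hypothetical; grouping by chain and mapping unrecognized residue names to X are assumptions, since the handling of multiple chains and nonstandard residues is not described here. The column positions follow the fixed-column PDB format.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def template_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} for a PDB file.

    SEQRES records are used when present; otherwise the sequence is
    taken from the residues that appear in ATOM records.
    """
    seqres, atoms, seen = {}, {}, set()
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            # Chain ID is column 12; residue names occupy columns 20-70.
            seqres.setdefault(line[11], []).extend(line[19:70].split())
        elif line.startswith("ATOM"):
            chain = line[21]
            key = (chain, line[22:27])  # residue number plus insertion code
            if key not in seen:         # count each residue only once
                seen.add(key)
                atoms.setdefault(chain, []).append(line[17:20].strip())
    source = seqres if seqres else atoms
    return {chain: "".join(THREE_TO_ONE.get(name, "X") for name in names)
            for chain, names in source.items()}
```

A sequence taken from ATOM records omits any residues that are missing from the structure, which is one reason the SEQRES lines are the preferred source.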

Return Value and Errors

sta returns 0 for success and non-zero for failure. If sta fails, search the log file for ERROR to find the details. To analyze the failure further, run sta with the -DEBUG option.
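
A minimal Python sketch of this check is given below. It assumes that sta runs in the foreground, that the SCHRODINGER environment variable is set, and that the log file is written to the current directory; none of these is confirmed here. The helper names build_sta_command, error_lines, and run_sta are hypothetical.

```python
import os
import subprocess


def build_sta_command(jobname, schrodinger=None, debug=False):
    """Return the argument list for `$SCHRODINGER/sta [options] jobname`."""
    schrodinger = schrodinger or os.environ["SCHRODINGER"]
    command = [os.path.join(schrodinger, "sta")]
    if debug:
        command.append("-DEBUG")
    command.append(jobname)
    return command


def error_lines(log_text):
    """Return the log lines that contain the string ERROR."""
    return [line for line in log_text.splitlines() if "ERROR" in line]


def run_sta(jobname, debug=False):
    """Run sta; on a non-zero return value, print ERROR lines from the log."""
    result = subprocess.run(build_sta_command(jobname, debug=debug))
    if result.returncode != 0:
        # Assumption: jobname.log is in the current working directory.
        with open(jobname + ".log") as log:
            for line in error_lines(log.read()):
                print(line)
    return result.returncode
```

Calling run_sta("jobname") and, if it returns non-zero, repeating the call with debug=True mirrors the procedure described above.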

Examples

Below is an example input file, where query_seq.fasta stores the query sequence in FASTA format.

QUERY_NAME      query
QUERY_FILE      query_seq.fasta
PREDSS_0        query-ssp1.casp
PREDSS_1        query-ssp2.casp
PREDSS_2        query-ssp3.casp
BGNRES          1
ENDRES          149
TEMPLATE_PDB    pdb1vpe.ent
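
An input file like the one above can also be generated by a script. The Python sketch below is one way to do so. The function build_sta_input is a hypothetical helper, and the keyword order and column alignment simply copy the example, since it is not stated whether either matters to sta. The PREDSS naming follows the rule in Table 1: PREDSS for a single file, and PREDSS_0, PREDSS_1, and so on for several.

```python
def build_sta_input(query_name, query_file, predss_files, template_pdb,
                    bgnres=None, endres=None):
    """Return the text of a jobname.inp file as keyword-value lines."""
    if not predss_files:
        raise ValueError("at least one secondary structure prediction "
                         "file is required")
    rows = [("QUERY_NAME", query_name), ("QUERY_FILE", query_file)]
    if len(predss_files) == 1:
        rows.append(("PREDSS", predss_files[0]))
    else:
        # Several predictions: suffix the keyword with an index from zero.
        rows += [(f"PREDSS_{i}", name) for i, name in enumerate(predss_files)]
    if bgnres is not None:
        rows.append(("BGNRES", bgnres))
    if endres is not None:
        rows.append(("ENDRES", endres))
    rows.append(("TEMPLATE_PDB", template_pdb))
    # Pad keywords to 16 columns, as in the example input file.
    return "".join(f"{key:<16}{value}\n" for key, value in rows)


print(build_sta_input("query", "query_seq.fasta",
                      ["query-ssp1.casp", "query-ssp2.casp", "query-ssp3.casp"],
                      "pdb1vpe.ent", bgnres=1, endres=149), end="")
```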

Below is an example of the plain text format for output alignment files. This format is generated when you include FORMAT dev in the command input file.

probe length=70======tmplt_T0281_d1dq3a2 length=79 TotalScore= 42 SEQ= -4 SSTR= 102
 SEQID= 0.061538 nali= 65
probe bgnRes=1 endRes=70    template bgnRes=3 endRes=75
ProbeAA: ..MWMPPRPEEVARKLRRLGFVERMAKGGHRLYTHPDGRIVVVPFH...SGELP....KG
ProbeSS:   LLLLLLHHHHHHHHHHLLLEEEEELLLEEEEELLLLLEEEeLLL   LLLLL    HH
Fold AA: GNFGLPLNFNAFKEWASEYGVEFKTNGSQTIAIIND...ERISLGQWHTRNRVSKAVLVK
Fold SS: LLLEELLLHHHHHHHHHLLLLEEEEELLEEEEEELL   EEEELLLHHHHLLEEHHHHHH
ProbeAA: TFKRILRDAGLTEEEFHNL....
ProbeSS: HHHHHHHHhLLLHHHHHLL
Fold AA: MLRKLYEATK.DEEVKRMLHLIE
Fold SS: HHHHHHHHHL LHHHHHHHHHHL

The file begins with header information. The alignment section then gives, for each block of columns, the query sequence (ProbeAA), the query composite secondary structure (ProbeSS), the template sequence (Fold AA), and the template secondary structure (Fold SS). Gaps are denoted by periods in the sequences and by spaces in the secondary structures.
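
The plain text format can be read back with a few lines of Python. The sketch below is not part of sta. It assumes only what the example shows: the sequence rows start with the labels ProbeAA: and Fold AA:, and gaps in those rows are periods. The names parse_plain_alignment and aligned_pairs are hypothetical.

```python
GAP = "."

SAMPLE = """\
probe bgnRes=1 endRes=70    template bgnRes=3 endRes=75
ProbeAA: ..MWMPPRPEEVARKLRRLGFVERMAKGGHRLYTHPDGRIVVVPFH...SGELP....KG
Fold AA: GNFGLPLNFNAFKEWASEYGVEFKTNGSQTIAIIND...ERISLGQWHTRNRVSKAVLVK
ProbeAA: TFKRILRDAGLTEEEFHNL....
Fold AA: MLRKLYEATK.DEEVKRMLHLIE
"""


def parse_plain_alignment(text):
    """Return (query, template) as gapped strings of equal length."""
    rows = {"ProbeAA:": [], "Fold AA:": []}
    for line in text.splitlines():
        for prefix, chunks in rows.items():
            if line.startswith(prefix):
                # Join the blocks of each row; all other lines are ignored.
                chunks.append(line[len(prefix):].strip())
    query = "".join(rows["ProbeAA:"])
    template = "".join(rows["Fold AA:"])
    if len(query) != len(template):
        raise ValueError("query and template rows differ in length")
    return query, template


def aligned_pairs(query, template):
    """Return (query_residue, template_residue) for columns without a gap."""
    return [(q, t) for q, t in zip(query, template) if GAP not in (q, t)]


query, template = parse_plain_alignment(SAMPLE)
print(len(query.replace(GAP, "")), len(template.replace(GAP, "")))
```

Removing the gap characters from each row recovers the ungapped sequences, whose lengths can be compared with the lengths reported in the header.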