COM, COMPARE


NAME
COM, COMPARE - compare two sequences.

SYNOPSIS
COM
COM minimal_score segment_width
COM OFF
COM COR serial1 serial2
COM ZOO zoom_factor
COMPARE
COMPARE minimal_score segment_width
COMPARE OFF
COMPARE CORNER serial1 serial2
COMPARE ZOOM zoom_factor

DESCRIPTION
Compare two sequences. Garlic maintains two sequence buffers: the main sequence buffer and the reference sequence buffer. After both buffers have been filled, the command COMPARE may be used to draw a plot comparing the two sequences. Matching residues are marked by dots or squares, depending on the zoom factor. Dots or squares representing residue pairs which match exactly are colored yellow, while those representing residues which differ but have similar properties are colored red. The reference sequence is assigned to the horizontal (x) axis and the other (investigated) sequence, taken from the main sequence buffer, to the vertical (y) axis.

The plot contains only residues which belong to segments whose score (the number of matching residue pairs in the segment) is equal to or larger than the minimal score. Both the minimal score and the width of the segment used for comparison may be specified by the user. The hard-coded defaults are SEGMENT_WIDTH and MINIMAL_SCORE, defined in the file defines.h. The values used in the original garlic package are SEGMENT_WIDTH=5 and MINIMAL_SCORE=5.
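The segment filter described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not garlic's source: the function names are hypothetical, sequences are plain strings of one-letter codes, a pair matches only when the codes are identical (a stand-in for garlic's 0/1 substitution matrix), and only the matching pairs inside a qualifying segment are marked. Each diagonal segment of width segment_width, starting at reference position i and main position j, is scored by counting its matching pairs.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for garlic's 0/1 substitution matrix (assumption):
   this sketch counts identical one-letter codes only. */
static int residues_match(char a, char b)
{
    return a == b;
}

/* Score of the diagonal segment starting at ref[i] and seq[j]:
   the number of matching pairs among `width` consecutive pairs. */
static int segment_score(const char *ref, const char *seq,
                         size_t i, size_t j, size_t width)
{
    int score = 0;
    for (size_t k = 0; k < width; k++)
        score += residues_match(ref[i + k], seq[j + k]);
    return score;
}

/* Hypothetical helper: set plot[j * ref_len + i] to 1 for every matching
   pair (i, j) lying in at least one segment that scores >= min_score.
   ref runs along x, seq (the main sequence) along y. The caller supplies
   a zero-initialized array of strlen(ref) * strlen(seq) ints. */
static void compare_sequences(const char *ref, const char *seq,
                              size_t width, int min_score, int *plot)
{
    size_t ref_len = strlen(ref), seq_len = strlen(seq);
    if (width == 0 || width > ref_len || width > seq_len)
        return;
    for (size_t j = 0; j + width <= seq_len; j++) {
        for (size_t i = 0; i + width <= ref_len; i++) {
            if (segment_score(ref, seq, i, j, width) < min_score)
                continue;
            /* Mark only the matching pairs of a qualifying segment. */
            for (size_t k = 0; k < width; k++)
                if (residues_match(ref[i + k], seq[j + k]))
                    plot[(j + k) * ref_len + (i + k)] = 1;
        }
    }
}
```

For example, comparing "ACDEFG" (reference) with "XCDEFX" using a segment width of 3 and a minimal score of 3 marks the four pairs C/C, D/D, E/E and F/F. The brute-force loop costs O(n * m * width), which keeps the sketch simple rather than fast.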

SCORING
Two sequences are compared using fixed-width segments. The matrix shown below is used to identify matching residues. It was inspired by the Dayhoff PAM250 matrix. However, while the PAM250 matrix contains a range of values, both positive and negative, the only values used in the substitution matrix below are zero and one: one identifies matching residue pairs and zero identifies pairs which do not match.
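One compact way to store a 0/1 substitution matrix is a small table indexed by one-letter codes. The sketch below is hypothetical: the type and function names are invented, identical pairs are set to one automatically, and the "similar" pairs are passed in as a string of two-letter pairs, because the pairs actually used by garlic are defined by its own matrix and are not reproduced here. The pairs I/L and K/R in the usage example are for illustration only.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical 0/1 substitution matrix indexed by one-letter codes 'A'..'Z'. */
typedef struct {
    unsigned char m[26][26];
} SubstMatrix;

/* Map a one-letter code (either case) to 0..25, or -1 if not a letter. */
static int code_index(char c)
{
    c = (char)toupper((unsigned char)c);
    return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
}

/* Build a symmetric matrix: 1 for identical codes and for every pair listed
   in `similar` (e.g. "ILKR" means I/L and K/R); 0 everywhere else.
   The pair list is an illustrative input, NOT garlic's actual table. */
static void matrix_init(SubstMatrix *sm, const char *similar)
{
    memset(sm, 0, sizeof *sm);
    for (int k = 0; k < 26; k++)
        sm->m[k][k] = 1;
    size_t n = strlen(similar);
    for (size_t p = 0; p + 1 < n; p += 2) {
        int a = code_index(similar[p]);
        int b = code_index(similar[p + 1]);
        if (a >= 0 && b >= 0)
            sm->m[a][b] = sm->m[b][a] = 1;
    }
}

/* Return 1 if the pair counts as a match, 0 otherwise. */
static int matrix_match(const SubstMatrix *sm, char x, char y)
{
    int a = code_index(x), b = code_index(y);
    return (a >= 0 && b >= 0) ? sm->m[a][b] : 0;
}
```

After matrix_init(&sm, "ILKR"), the pairs A/A, I/L and K/R score one, while A/W and I/K score zero. Summing matrix_match over a segment gives exactly the segment score used by COMPARE.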



The scoring scheme used in garlic is much simpler than the elaborate schemes used by advanced sequence alignment programs. Nevertheless, it is still quite useful, and I believe that many users will enjoy experimenting with various segment widths and minimal scores.

KEYWORDS AND PARAMETERS
All keywords and associated parameters are explained in the table below. Note that the minimal score may not exceed the segment width.

KEYWORD AND/OR PARAMETERS      DESCRIPTION
(none)                         Draw sequence comparison, using the default
                               segment width and the default minimal score.
OFF                            Return to the main drawing mode. The same may
                               be achieved by hitting the ESCAPE key.
minimal_score segment_width    Draw sequence comparison, using the specified
                               minimal score and segment width. Both
                               parameters should be positive.
CORNER ref_offset main_offset  Draw sequence comparison, but skip some leading
                               residues: skip (ref_offset - 1) residues of
                               the reference sequence and (main_offset - 1)
                               residues of the main sequence.
ZOOM zoom_factor               Change the zoom factor.
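The parameter rules stated above (both values positive, minimal score not exceeding the segment width) and the CORNER offset arithmetic can be written as two small helpers. The names are hypothetical and this is not garlic's code; clamping offsets below 1 to 1 is an assumption.

```c
#include <stddef.h>

/* Validate COMPARE parameters as documented: both values positive
   and the minimal score not exceeding the segment width. */
static int compare_params_valid(int minimal_score, int segment_width)
{
    return minimal_score > 0 && segment_width > 0 &&
           minimal_score <= segment_width;
}

/* CORNER skips (offset - 1) leading residues, so a 1-based offset becomes
   a 0-based start index of offset - 1. Offsets below 1 are clamped to 1
   (assumption; garlic may handle them differently). */
static size_t corner_start_index(int offset)
{
    return offset > 1 ? (size_t)(offset - 1) : 0;
}
```

With these helpers, compare 8 10 passes validation while compare 11 10 does not, and CORNER 100 100 starts both sequences at 0-based index 99, i.e. after skipping 99 residues.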

MOUSE USAGE
The pointing device (mouse) may be used to find the serial numbers and names of the residues in a matching pair and of the residues which follow them. When the pointer reaches a chosen pair, information about this pair and the residues which follow it is shown in the output window (the bottom right corner). Color codes: yellow is used for residues which are equal, magenta for acceptable substitutions and red for mismatching residues.
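The three-way color code used in the output window can be expressed as a small classification function. This is a hypothetical sketch: the enum and function names are invented, and the test for an acceptable substitution is passed in as a function pointer because garlic's substitution matrix is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical names for the three classes described above. */
typedef enum {
    PAIR_EQUAL,     /* identical residues, shown in yellow */
    PAIR_SIMILAR,   /* acceptable substitution, shown in magenta */
    PAIR_MISMATCH   /* mismatching residues, shown in red */
} PairClass;

/* Caller-supplied test for an acceptable substitution (assumption). */
typedef int (*SimilarFn)(char a, char b);

/* Classify one residue pair given as one-letter codes. */
static PairClass classify_pair(char a, char b, SimilarFn similar)
{
    if (a == b)
        return PAIR_EQUAL;
    if (similar != NULL && similar(a, b))
        return PAIR_SIMILAR;
    return PAIR_MISMATCH;
}

/* Color name matching the color codes listed above. */
static const char *pair_color(PairClass c)
{
    switch (c) {
    case PAIR_EQUAL:   return "yellow";
    case PAIR_SIMILAR: return "magenta";
    default:           return "red";
    }
}
```

Checking identity first means an identical pair is always reported as yellow, even if the similarity test would also accept it.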



EXAMPLES
COMMAND                 DESCRIPTION
load 9PAP.pdb           Load the structure of papain, select all hetero
sel het                 atoms and then select the complement of this
sel com                 selection. The purpose of this trick is to exclude
seq from 1              all hetero atoms from the selection. Extract the
seq copy                sequence and copy it to the reference buffer.
load 1HUC.pdb           After that, load the structure of human cathepsin B,
sel a,b/*/*/*           select chains A and B (one molecule consists of two
seq from 2              chains), extract the sequence and compare it with
compare 8 10            the content of the reference buffer. The minimal
                        score is 8 and the segment width is 10.

load 9PAP.pdb           Load the structure of papain and copy the sequence
sel het                 to the reference buffer, using the same method as
sel com                 in the previous example. Read the second sequence
seq from 1              from the specified FASTA file (one-letter codes).
seq copy                Compare the two sequences, using the specified
seq load sample.fasta   minimal score and segment width (6/7). This is
compare 6 7             the typical use of sequence comparison: compare a
                        fresh sequence with the sequence of a solved
                        structure.
com zoom 4              Change the zoom factor to 4.
com cor 100 100         Start both sequences at residue 100, i.e. skip
                        the first 99 residues of each sequence.

NOTES
(1) You don't need protein 3D structures to compare two sequences. The minimal information required for protein comparison is the primary structure, i.e. the protein sequence.

RELATED COMMANDS
The command SEQ (SEQUENCE) is used to manipulate the contents of the main sequence buffer and the reference sequence buffer. LOAD is used to load a PDB file. SELECT, ADD and RESTRICT are used for selection.