PreDom Help

DiD (View interface)

How to use

  1. Select the domain structure database to view in the "Database" cell.
  2. Enter the PDB ID in the [PDB ID] field and click [Find chains] to list all the chain IDs of the specified PDB entry in the adjacent menu. Then select the chain ID from the list.
    To clear the chain list, click [Reset]. (The [PDB ID] field is also initialized.)
  3. Click [View] to submit the query.

Description of the result

List of the domains (bottom left table)
The list of the domains in the specified data.
Field name                     Description
domain                         domain number
show                           switch to show/hide the domain (uncheck to hide)
scheme of interface residues   buttons to change the scheme of the domain-domain interface residues
residue number                 range of the residue numbers of the domain
interface area                 area of the domain-domain interface (Å²)
                                 total: area of the whole domain-domain interface
                                 core: area of the domain-domain interface originating in the core interface residues
Amino acid sequence (top left box)
The amino acid sequence of the specified data. For each domain, the core interface residues are highlighted with a dark background color and the peripheral interface residues with an intermediate color. Placing the mouse pointer over an interface residue displays its residue name and residue number in the "interaction residue" field below.
3D Structure (right box)
The Jmol visualization of the specified data. Initially, the domain-domain interface residues are shown in cpk format and the other residues in cartoon format. As in the amino acid sequence, the core interface residues are shown in a dark color. Placing the mouse pointer over an interface residue in the amino acid sequence turns that residue white in the structure to show its position.
Data about a residue (below the 3D structure)
Click a residue in the amino acid sequence to display the data describing that residue here.

DiD (Predict interface)

How to use

  1. Select the domain structure database to use in the "Database" cell.
  2. Click [Browse...] next to the [PDB format file] field and select the PDB format file of the target data. Then click [Find chains] to list all the chain IDs in the specified data, and select the chain ID from the list.
    To clear the chain list, click [Reset]. (The [PDB format file] field is also initialized.)
  3. Modify the values of the following parameters if needed.
    Threshold of sequence identity
    The minimum sequence identity that a homologous domain must share with the target domain.
    Threshold of sequence coverage
    The minimum required sequence coverage of a homologous domain by the target domain.
    Threshold of E-value
    The maximum E-value allowed in the BLAST search.
    Threshold of IP score
    The minimum required IP score (a score based on the occurrence of each amino acid residue at domain-domain interfaces).
    Threshold of residue contact
    The maximum allowed distance between atoms of candidate domain-domain interface residues. Two residues are regarded as being in contact when all the distances between their atoms are within this value.
    Minimum size of interface
    The minimum required number of residues in contact. Predicted domain-domain interface residues are discarded when fewer of them than this value are in contact.
  4. Click [Start Prediction] to execute prediction.
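The sequence identity, sequence coverage and E-value thresholds act together as a filter on the homologous domains found by the BLAST search. The following is a minimal sketch of that filtering step, assuming each hit is available with its identity and coverage in percent. The `Hit` record and the `select_homologs` function are hypothetical names used only for illustration; they are not part of DiD, and the IP-score and contact parameters are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit against a homologous domain (hypothetical record)."""
    domain_id: str
    identity: float  # sequence identity with the target domain, in percent
    coverage: float  # sequence coverage, in percent
    evalue: float    # BLAST E-value


def select_homologs(hits, min_identity, min_coverage, max_evalue):
    """Keep only the hits that pass all three thresholds.

    Identity and coverage are minimums (>=); the E-value is a maximum (<=).
    """
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]
```

For example, with thresholds of 30% identity, 70% coverage and an E-value of 1e-3, a hit with 50% identity, 90% coverage and E-value 1e-10 is kept, while a hit failing any one of the three conditions is dropped.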

Description of the result page

List of the results (bottom left table)
List of the obtained results. All results of the KIP method are listed first, followed by the results of the IP method.
Field name          Description
method              method of the prediction (KIP or IP)
homologous domain   ID of the homologous domain (KIP results only). Clicking the link displays the amino acid sequence and the 3D structure of that domain in another window.
Identity/coverage   sequence identity/coverage of the homologous domain (KIP results only)
Amino acid sequence (top left box)
The amino acid sequence of the target domain. The predicted domain-domain interface residues of the result selected with the leftmost radio button in the result list are highlighted with a blue-green background color. Placing the mouse pointer over an interface residue displays its residue name and residue number in the "interaction residue" field below.
3D Structure (right box)
The Jmol visualization of the target domain. The predicted domain-domain interface residues are shown in cpk format and the other residues in cartoon format. Placing the mouse pointer over an interface residue in the amino acid sequence turns the corresponding residue in this figure white to indicate its position in the structure.
Item                     Description
[cartoon][cpk] buttons   buttons to change the display format of residues other than the domain-domain interface residues
[spin] checkbox          switch to start/stop spinning the structure
Definition of interface residues (topmost checkbox)
Selects the criterion a residue must meet to be displayed as a domain-domain interface residue.
Checked:
Only the residues whose positions and residue names both coincide with those of the domain-domain interface residues in the homologous domains are displayed as the interface residues.
Unchecked:
All the residues located at the positions of the domain-domain interface residues in the homologous domain are displayed as the interface residues.
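The difference between the checked and unchecked settings can be illustrated with a small sketch. It assumes the alignment between the target domain and a homologous domain is given as pairs of 0-based sequence indices; `interface_positions` and its arguments are hypothetical and only mirror the rule described above, not DiD's actual code.

```python
def interface_positions(alignment, target_seq, homolog_seq, homolog_interface, strict):
    """Return the target positions displayed as interface residues.

    alignment:         list of (target_index, homolog_index) pairs of aligned residues
    homolog_interface: set of homolog indices that are domain-domain interface residues
    strict:            True  = checkbox checked (residue names must also match)
                       False = checkbox unchecked (an aligned position is enough)
    """
    positions = []
    for t, h in alignment:
        if h not in homolog_interface:
            continue
        if strict and target_seq[t] != homolog_seq[h]:
            continue  # same position but a different amino acid: rejected when checked
        positions.append(t)
    return positions
```

With target `ACDEF`, homolog `ACNEF` and homolog interface positions {1, 2, 3}, the checked setting keeps positions 1 and 3 (C and E match), while the unchecked setting also keeps position 2, where D in the target is aligned with N in the homolog.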

DiD (Statistics)

Contents

This page shows statistics on the representative domains of the protein domain structure databases SCOP and CATH.

Predict 3D structure of multi-domain proteins

How to use

Select the data format of the query from the [Input type] pull-down menu.

    FASTA
    Using amino acid sequence data in FASTA format, the whole 3D structure of the protein is predicted.
    PDB
    Using 3D structure data of each domain, the whole 3D structure of the query protein is predicted. Currently, PreDom:Structure can accept up to three domain structures.

prediction using amino acid sequence data

  1. Select “Amino acid sequence in FASTA format” or “FASTA format file”. If you select “Amino acid sequence in FASTA format”, input the amino acid sequence of the query protein in FASTA format into the text area field. If you select “FASTA format file”, click [Browse…] and select the FASTA format file of the query protein.
  2. Click [Start Prediction] to execute prediction.
  3. Depending on the query protein sequence, one of the following four types of prediction result is obtained.
    1. If the BLAST search finds in the PDB the 3D structure of a protein almost identical to the query protein (sequence identity ≥ 95%), that structure is shown in the JSmol viewer together with the alignment obtained from the BLAST result.
    2. If the BLAST search finds in the PDB the 3D structure of a homologous protein (sequence identity ≥ 25%), you can perform homology modeling using that structure as the template. Homology modeling requires a MODELLER license key.
      1. Click the [Show] button to display the alignment obtained from the BLAST result and the 3D structure of the template protein in the JSmol viewer.
      2. Click the [Click here] button in the "Status" row to open a new window for setting the homology modeling parameters.
      3. Enter your job name (optional), e-mail address and MODELLER license key in the window, and click [Run prediction] to run homology modeling.
      4. You are notified by e-mail when the job is complete.
      5. The result page, whose URL is given in the e-mail, shows the homology modeling results: five predicted structures of the query, information on the template structure, the alignment of the query and template proteins obtained from MODELLER, and quality scores of the model structures (molpdf, DOPE score and GA341 score).
      6. The coordinates of the predicted structures can be downloaded in PDB format.
    3. If the 3D structures of two consecutive domains in the query protein sequence are already known or can be predicted by homology modeling, the whole 3D structure of the two domains is predicted using the DINE score.
      1. Click the [Show] button to display the alignment obtained from the BLAST result and the 3D structure of the PDB entry in the JSmol viewer.
      2. Click the [Click here] button in the "Status" row to open a new window for setting the DINE score parameters. The parameters are described in the "prediction using 3D structure data" section.
      3. Enter your job name (optional) and e-mail address, and click [Run prediction] to run the prediction. If homology modeling is needed to predict the structure of one or both domains, your MODELLER license key is also required.
      4. You are notified by e-mail when the job is complete.
      5. The result page, whose URL is given in the e-mail, shows the prediction results. See the "2-domain protein" paragraph in the "Description of the result page" section.
    4. If the 3D structures of more than two domains in the query protein sequence are already known or can be predicted by homology modeling, the known structures or the template structures for homology modeling are shown.
      1. Click the [Show] button to display the alignment obtained from the BLAST result and the 3D structure of the PDB entry in the JSmol viewer.
      2. Click the [Click here] button in the "Status" row to open a new window for setting the homology modeling parameters.
      3. Enter your job name (optional), e-mail address and MODELLER license key, and click [Run prediction] to run homology modeling.
      4. You are notified by e-mail when the job is complete.
      5. The result page, whose URL is given in the e-mail, shows the structures predicted by homology modeling. If the query contains exactly three domains and all of their structures are already known or can be modeled by homology modeling, the whole structure of the three domains is also predicted. See the "multi-domain protein composed of more than 2 domains" paragraph in the "Description of the result page" section.
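The two sequence identity thresholds that separate the first two cases can be written as a simple decision rule. This is a sketch under the assumption that only the best identity among the whole-protein BLAST hits is considered; the server may apply further criteria (for example, coverage), and the domain-level cases 3 and 4 are not modeled. The function names and returned labels are hypothetical.

```python
def classify_hit(identity):
    """Map the sequence identity (%) of a PDB hit to the case it triggers."""
    if identity >= 95.0:
        return "near-identical structure"    # case 1: structure shown in JSmol
    if identity >= 25.0:
        return "homology modeling template"  # case 2: MODELLER license key needed
    return "not used"


def best_case(identities):
    """Classify using the best hit among the BLAST results (an assumption)."""
    if not identities:
        return "no structure found"
    return classify_hit(max(identities))
```

Both thresholds are inclusive, matching the "≥" used in the description of the two cases.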

prediction using 3D structure data

  1. Click [Browse...] next to one of the [PDB format file] fields and select the PDB format file of the domain. Then click the adjacent [Find chains] button to list all the chain IDs in the specified PDB data, and select the chain ID from the list. To clear the chain list, click [Reset]. (The [PDB format file] field is also initialized.)
    Repeat this procedure for each of the other domains (up to three domains in total).
  2. Modify the values of the following parameters if needed.
    • Parameters for KIP method
      Threshold of sequence identity
      The minimum sequence identity that a homologous domain must share with the target domain to be used for the domain-domain interface prediction.
      Threshold of sequence coverage
      The minimum required sequence coverage of a homologous domain by the target domain for the domain-domain interface prediction.
      Threshold of BLAST E-value
      The maximum E-value allowed in the BLAST search.
    • Parameters for IP method
      Threshold of IP score
      The minimum required IP score (a score based on the occurrence of each amino acid residue at domain-domain interfaces).
      Threshold of residue contact
      The maximum allowed distance between atoms of candidate domain-domain interface residues. Two residues are regarded as being in contact when all the distances between their atoms are within this value.
      Minimum size of interface
      The minimum required number of residues in contact. Predicted domain-domain interface residues are discarded when fewer of them than this value are in contact.
  3. You are notified by e-mail when the job is complete. Enter any name in the [Job name] field to identify the job, and the e-mail address that should receive the notification in the [E-Mail] field.
  4. Click [Start Prediction] to execute prediction.
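The two contact-related IP parameters (threshold of residue contact and minimum size of interface) can be sketched as follows. Residues are represented as lists of atom coordinates in Å, which is an assumption. The contact test follows the wording of this page literally (every inter-atomic distance within the threshold); a looser reading in which any single atom pair within the threshold suffices is offered as an option, since the page does not say which one the server uses. Counting a residue as "in contact" when it contacts at least one other candidate residue is also an assumption, and all names are hypothetical.

```python
import math
from itertools import combinations


def in_contact(atoms_a, atoms_b, threshold, mode="all"):
    """Contact test between two residues given as lists of (x, y, z) coordinates.

    mode="all": every atom-atom distance must be <= threshold (literal reading)
    mode="any": at least one atom pair within threshold (alternative reading)
    """
    distances = [math.dist(a, b) for a in atoms_a for b in atoms_b]
    if mode == "all":
        return all(d <= threshold for d in distances)
    return any(d <= threshold for d in distances)


def filter_interface(candidates, threshold, min_size, mode="all"):
    """Keep or discard a set of predicted interface residues.

    candidates: dict residue_number -> list of atom coordinates.
    Returns the sorted residue numbers, or [] if fewer than min_size
    candidate residues are in contact with another candidate.
    """
    contacted = set()
    for (res_a, atoms_a), (res_b, atoms_b) in combinations(candidates.items(), 2):
        if in_contact(atoms_a, atoms_b, threshold, mode):
            contacted.update((res_a, res_b))
    if len(contacted) < min_size:
        return []
    return sorted(candidates)
```

Raising the contact threshold or lowering the minimum interface size makes the filter more permissive; the literal "all" reading is much stricter than "any" for large residues.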

Description of the result page

2-domain protein

    List of the candidate structures (top table)
    The ten most plausible candidate structures are listed in descending order of the DINE score.
    The DINE score is defined as:

    $$\mathrm{DINE} = w_{\mathrm{dock}} S_{\mathrm{dock}} + w_{\mathrm{int}} S_{\mathrm{int}} + w_{\mathrm{ete}} S_{\mathrm{ete}}$$

    where $S_{\mathrm{dock}}$, $S_{\mathrm{int}}$ and $S_{\mathrm{ete}}$ are the docking score (physicochemical complementarity of the interface, evaluated by ZRANK), the interface score (ratio of predicted interface residues in the domain-domain interface) and the end-to-end distance score (how well the distance between the domain ends fits its statistical distribution), respectively,

    and $w_{\mathrm{dock}}$, $w_{\mathrm{int}}$ and $w_{\mathrm{ete}}$ are the weights of the docking score, domain interface score and end-to-end distance score, respectively.

    The default values of $w_{\mathrm{dock}}$, $w_{\mathrm{int}}$ and $w_{\mathrm{ete}}$ are 7, 8 and 1, respectively. You can change the weights in this list to recalculate the DINE score of each candidate structure; the candidate structures are then re-sorted by the recalculated scores.
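The score and the re-sorting behavior can be illustrated with a short sketch using the default weights 7, 8 and 1. The `Candidate` record, the score values and the function names are hypothetical; how DiD computes and scales the individual component scores is not described on this page, so they are simply taken as given here.

```python
from dataclasses import dataclass

# Default weights from this page: w_dock = 7, w_int = 8, w_ete = 1
DEFAULT_WEIGHTS = {"dock": 7.0, "int": 8.0, "ete": 1.0}


@dataclass
class Candidate:
    name: str
    s_dock: float  # docking score (derived from ZRANK)
    s_int: float   # interface score
    s_ete: float   # end-to-end distance score


def dine(c, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the three component scores."""
    return (weights["dock"] * c.s_dock
            + weights["int"] * c.s_int
            + weights["ete"] * c.s_ete)


def rank_candidates(candidates, weights=DEFAULT_WEIGHTS, top=10):
    """Sort by DINE score in descending order and keep the best `top` candidates."""
    return sorted(candidates, key=lambda c: dine(c, weights), reverse=True)[:top]
```

Changing the weights in the web list corresponds to calling `rank_candidates` with a different `weights` dictionary: a candidate with a strong docking score but a weak interface score can move to the top when $w_{\mathrm{dock}}$ is raised.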

    Bottom area
    This area shows information about the candidate structure selected with the radio buttons in the list above.
    Amino acid sequence (top left box)
    The amino acid sequence of the candidate structure. Placing the mouse pointer over a residue displays its residue name and residue number in the “interaction residue” field below. The legend also shows numerical data on the domain-domain interface residues.
    List of the domains (bottom left table)
    List of the input domains.
    Field name                               Description
    domain number                            domain number of each query
    residue number                           range of the residue numbers of the domain and the linkers
    show                                     switch to show/hide the domain (uncheck to hide)
    scheme of predicted interface residues   buttons to change the scheme of the domain-domain interface residues
    3D Structure (right box)
    The JSmol visualization of the candidate structure. Initially, the domain-domain interface residues are shown in wireframe format and the other residues in cartoon format. As in the amino acid sequence, the domain linker residues are shown in a dark color. Placing the mouse pointer over an interface residue or a domain linker residue in the amino acid sequence turns the corresponding residue in this figure white to show its position in the structure.
    Results of the docking simulation (below the 3D structure)
    Numerical results of the docking simulation are shown here. Click the [Download PDB] button to download a PDB file of the above structure.

multi-domain protein composed of more than 2 domains

    If the query protein is composed of more than two domains, the 3D structure is predicted as follows.

    1. The 3D structures of each pair of adjacent domains (domain1-domain2, domain2-domain3, and so on) are predicted using the DINE score.
    2. Decoy structures of the whole protein are constructed by superposing the shared domain n of the predicted domain(n-1)-domain(n) and domain(n)-domain(n+1) structures.
    3. Decoy structures with no collisions between any of the domains are sorted by the harmonic mean of the DINE scores of domain1-domain2, domain2-domain3, and so on.
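    The ranking in step 3 can be sketched as below. The sketch assumes the decoys have already been built by superposition (step 2 is not shown), that each decoy carries the DINE scores of its adjacent domain pairs plus a flag for inter-domain collisions, and that the DINE scores are positive (Python's `statistics.harmonic_mean` rejects negative values). The function name and data layout are hypothetical.

```python
from statistics import harmonic_mean


def rank_decoys(decoys, top=10):
    """Rank multi-domain decoys.

    decoys: list of (name, pair_scores, has_collision), where pair_scores are the
            DINE scores of domain1-domain2, domain2-domain3, ... (assumed positive).
    Decoys with any inter-domain collision are discarded; the rest are sorted by
    the harmonic mean of their pair scores, best first.
    """
    scored = [(name, harmonic_mean(scores))
              for name, scores, has_collision in decoys
              if not has_collision]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

    The harmonic mean is pulled down strongly by a single low value, so a decoy whose pair scores are uniformly good (for example 4 and 4, mean 4.0) ranks above one with one excellent and one poor pair (2 and 8, mean 3.2).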

    Use the pull-down menu at the top left to show any of the ten most plausible candidate structures.

    Amino acid sequence (top left box)
    The amino acid sequence of the candidate structure. Placing the mouse pointer over a residue displays its residue name and residue number in the “interaction residue” field below. The legend also shows numerical data on the domain-domain interface residues.
    List of the domains (bottom left table)
    List of the input domains.
    Field name                               Description
    domain number                            domain number of each query
    residue number                           range of the residue numbers of the domain and the linkers
    show                                     switch to show/hide the domain (uncheck to hide)
    scheme of predicted interface residues   buttons to change the scheme of the domain-domain interface residues
    3D Structure (right box)
    The JSmol visualization of the candidate structure. Initially, the domain-domain interface residues are shown in wireframe format and the other residues in cartoon format. As in the amino acid sequence, the domain linker residues are shown in a dark color. Placing the mouse pointer over an interface residue or a domain linker residue in the amino acid sequence turns the corresponding residue in this figure white to show its position in the structure. Click the [Download PDB] button to download a PDB file of the above structure.

Evtrace

How to use

  1. BLAST search

    Search for homologous sequences of the query sequence by BLAST.

    1. Select the data format of the query protein in the [Input] field. A PDB ID, a PDB format file, a FASTA format file and amino acid sequence text in FASTA format are accepted.
      PDB ID
      Enter the PDB ID of the query protein in lowercase. If the PDB ID is found in our database, you can select a chain ID from the chains of that PDB entry.
      PDB format file
      Select the 3D structure file of the query protein in PDB format.
      FASTA format file
      Select the amino acid sequence file of the query protein in FASTA format.
      FASTA format text
      Enter the amino acid sequence of the query protein in FASTA format.
    2. Modify the following BLAST parameters in the [BLAST parameters] field if needed.
      Database
      Name of the database searched by BLAST
      E-value
      Expectation value threshold
    3. Click [BLAST] to run a BLAST search.
      Command: blastp -query (amino acid sequence of query) -db (database name) -evalue (Expectation value threshold)
  2. Homologous sequences to query data

    Select sequences from the BLAST results and submit them to the evolutionary trace analysis. In this analysis, a multiple alignment of the selected sequences is built, a phylogenetic tree is constructed from it by the neighbor-joining (NJ) method, and the sequences are grouped on the tree.

    1. BLAST results are shown in the table below (click an ID to view the pairwise alignment of the query and subject sequences). Sequences satisfying the default conditions for sequence identity (≥20%) and coverage (≥90%) are shown in red and are initially selected with their checkboxes. You can change the selection by modifying the conditions in the drop-down lists or by checking/unchecking individual checkboxes.
    2. Modify the condition for the evolutionary trace analysis if needed.
      Multiple alignment method
      Program for multiple alignment. ClustalW2 or MAFFT can be used.
      Command-line options for ClustalW2: -align -infile -outfile
      Command-line options for MAFFT: --auto --clustalout
    3. Click [Exec trace] to perform the evolutionary trace analysis.
  3. Results of evtrace

    Results of the evolutionary trace analysis are shown.

    The following parameters for sequence grouping and for clustering of trace residues can be changed here.
    Tident
    Cutoff value of identity for sequence grouping. Click a radio button to change the results to those under the corresponding cutoff condition.
    Tdist
    Distance threshold for nearest-neighbor clustering of trace residues. Two clusters are merged if they are closer than this threshold. Click [Change] after entering a value to update results.
    Tresnum
    Number threshold for nearest-neighbor clustering of trace residues. Clusters containing fewer residues than this threshold are ignored. Click [Change] after entering a value to update results.
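    The Tdist/Tresnum clustering can be sketched as single-linkage (nearest-neighbor) clustering, which is what merging two clusters whenever they are closer than Tdist amounts to. The sketch also applies the rule given for the [Clustering TRs] tab that buried trace residues (surface accessibility < 0.1) are not used. Representing each residue by a single coordinate (for example its Cα atom) and measuring the distance between clusters as that of their closest residue pair are assumptions; the function name is hypothetical.

```python
import math


def cluster_trace_residues(residues, t_dist, t_resnum, min_accessibility=0.1):
    """Nearest-neighbor (single-linkage) clustering of trace residues.

    residues: dict residue_number -> ((x, y, z), surface_accessibility)
    t_dist:   Tdist; two clusters merge if any pair of their residues is closer than this
    t_resnum: Tresnum; clusters with fewer residues than this are ignored
    """
    # Buried trace residues (accessibility < 0.1) are not used in the clustering.
    exposed = [r for r, (_, acc) in residues.items() if acc >= min_accessibility]

    # Union-find: each residue starts in its own cluster.
    parent = {r: r for r in exposed}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for i, a in enumerate(exposed):
        for b in exposed[i + 1:]:
            if math.dist(residues[a][0], residues[b][0]) < t_dist:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for r in exposed:
        clusters.setdefault(find(r), []).append(r)
    kept = [sorted(c) for c in clusters.values() if len(c) >= t_resnum]
    return sorted(kept)
```

    Because merging is transitive, residues 0 Å, 3 Å and 6 Å apart along a line end up in one cluster with Tdist = 4 Å even though the two outer residues are 6 Å apart; this chaining is characteristic of single-linkage clustering.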

Description of the result

The following results are shown in separate tabs.

[Sequence group]
List of sequences with their group ID. All sequences selected on the previous page are grouped using the phylogenetic tree shown in the [Tree] tab. Clicking an ID shows the pairwise alignment of the query and subject sequences.
[Multiple alignment (group)]
Summary of the multiple alignment obtained by the selected program. Only the consensus sequence of each group and the query sequence are shown here. The invariant and class-specific trace residues are shown in pink and gray, respectively. Click [Download raw data] to download the file of the result.
[Multiple alignment (sequence)]
Multiple alignment obtained by the selected program. All sequences and the query sequence are shown here. The invariant and class-specific trace residues are shown in pink and gray, respectively. Click [Download the multiple alignment] to download the file of the result.
[Trace residues (TR)]
Invariant (pink) and class-specific (gray) trace residues on the amino acid sequence and 3D structure.
[Clustering TRs]
Spatial clusters of trace residues on the amino acid sequence and 3D structure. Trace residues at buried positions (surface accessibility < 0.1) are not used in the clustering.
[Tree]
Phylogenetic trees of the amino acid sequences.
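The distinction between invariant and class-specific trace residues follows the usual evolutionary trace definition: a position is invariant if it carries the same amino acid in every sequence, and class-specific if it is conserved within each sequence group but differs between groups. The following is a minimal sketch, assuming an alignment given as equal-length strings and sequence groups taken from the chosen Tident cutoff. Skipping gapped columns is a simplification, evtrace may handle gaps and other details differently, and all names are hypothetical.

```python
def classify_trace_positions(alignment, groups):
    """Classify alignment columns as invariant or class-specific.

    alignment: dict sequence_id -> aligned sequence (all of equal length, '-' = gap)
    groups:    list of lists of sequence IDs (the sequence groups at the chosen Tident)
    Returns (invariant_columns, class_specific_columns), 0-based.
    """
    length = len(next(iter(alignment.values())))
    invariant, class_specific = [], []
    for col in range(length):
        column = {sid: seq[col] for sid, seq in alignment.items()}
        if "-" in column.values():
            continue  # simplification: gapped columns are skipped
        if len(set(column.values())) == 1:
            invariant.append(col)  # same amino acid in every sequence
        elif all(len({column[sid] for sid in group}) == 1 for group in groups):
            class_specific.append(col)  # conserved within each group, varies between groups
    return invariant, class_specific
```

For example, with sequences q = `ACDK`, a = `ACDK`, b = `AGER` and c = `AGES` split into groups {q, a} and {b, c}, column 0 is invariant, columns 1 and 2 are class-specific, and column 3 is neither because R and S differ within the second group.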