How to use
- Select the domain structure database to view in the "Database" cell.
- Enter the PDB ID in the [PDB ID] field, and click [Find chains] to list all the chain IDs in the specified PDB data on the next menu. Then select the chain ID from the list.
To clear the chain list, click [Reset]. (The [PDB ID] field is also initialized.)
- Click [View] to submit the query.
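The chain listing performed by [Find chains] can be illustrated with a minimal sketch. It assumes the standard PDB fixed-column layout, in which the chain ID is column 22 of ATOM/HETATM records. The function name `find_chain_ids` and the sample records are hypothetical and are not part of this server.

```python
def find_chain_ids(pdb_text: str) -> list[str]:
    """Return chain IDs in order of first appearance in ATOM/HETATM records."""
    chains: list[str] = []
    for line in pdb_text.splitlines():
        # In PDB format the chain ID is column 22 (string index 21).
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
    return chains


# Made-up coordinate records for illustration only.
example = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C\n"
    "ATOM    603  N   MET B   1      10.000  11.000  12.000  1.00 20.00           N\n"
)
print(find_chain_ids(example))  # ['A', 'B']
```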
Description of the result
- List of the domains (bottom left table)
- The list of the domains in the specified data.
Field name | Description |
domain | domain number |
show | switch to show/hide the domain (uncheck to hide) |
scheme of interface residues | buttons to change the scheme of the domain-domain interface residues |
residue number | range of the residue number of the domain |
interface area | area of the domain-domain interface (Å²); total: the area of the whole domain-domain interface; core: the area of the domain-domain interface originating in the core interface residues |
- Amino acid sequence (top left box)
- The amino acid sequence of the specified data. For each domain, the core interface residues are indicated with deep background color, while the peripheral interface residues are indicated with intermediate color. By placing the mouse pointer over one of the interface residues, the residue name and the residue number are displayed in the "interaction residue" field below.
- 3D Structure (right box)
- The Jmol visualization of the specified data. Initially, the domain-domain interface residues are shown in cpk format, while other residues are shown in cartoon format. As with the amino acid sequence, the core interface residues are indicated with deep color. By placing the mouse pointer over one of the interface residues in the amino acid sequence, the residue color turns white to show its position in the structure.
- Data about a residue (below the 3D structure)
- By clicking one of the residues in the amino acid sequence, the data describing the residue are shown here.
How to use
- Select the domain structure database to use in the "Database" cell.
- Click [Browse...] next to the [PDB format file] field and select the PDB format file of the target data. Then click [Find chains] to list all the chain IDs in the specified data, and select the chain ID from the list.
To clear the chain list, click [Reset]. (The [PDB format file] field is also initialized.)
- Modify the value of the following parameters if needed.
- Threshold of sequence identity
- The minimum sequence identity to the target domain required of the homologous domains.
- Threshold of sequence coverage
- The minimum of the required sequence coverage by the target domain of the homologous domains.
- Threshold of E-value
- The maximum of the allowed E-value for performing BLAST search.
- Threshold of IP score
- The minimum of the required IP score (score evaluated on the basis of the occurrence of each amino acid residue on the domain-domain interface).
- Threshold of residue contact
- The maximum allowed distance between atoms in the candidate domain-domain interface residues. A residue pair is regarded as being in contact when all the distances between the atoms of the two residues are within this value.
- Minimum size of interface
- The minimum of the required number of contacted residues. Predicted domain-domain interaction residues are discarded when the number of contacted residues among them is less than this value.
- Click [Start Prediction] to execute prediction.
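The "Threshold of residue contact" and "Minimum size of interface" parameters can be illustrated with a minimal sketch. It follows the wording above literally (every inter-atomic distance of a residue pair must be within the threshold); the actual implementation on the server may differ. All names and coordinates below are hypothetical.

```python
import math

Atom = tuple[float, float, float]


def in_contact(res_a: list[Atom], res_b: list[Atom], max_dist: float) -> bool:
    """True when all inter-atomic distances between two residues are <= max_dist."""
    dists = [math.dist(a, b) for a in res_a for b in res_b]
    return bool(dists) and all(d <= max_dist for d in dists)


def contacted_residues(residues: dict[str, list[Atom]], max_dist: float) -> set[str]:
    """Collect every residue that is in contact with at least one other residue."""
    names = list(residues)
    contacted: set[str] = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if in_contact(residues[a], residues[b], max_dist):
                contacted.update((a, b))
    return contacted


def keep_interface(residues: dict[str, list[Atom]], max_dist: float, min_size: int) -> bool:
    """Discard a predicted interface when it has fewer contacted residues than min_size."""
    return len(contacted_residues(residues, max_dist)) >= min_size


# Made-up single-atom residues for illustration only.
candidate = {"R1": [(0.0, 0.0, 0.0)], "R2": [(3.0, 0.0, 0.0)], "R3": [(20.0, 0.0, 0.0)]}
print(keep_interface(candidate, max_dist=4.0, min_size=2))  # True
print(keep_interface(candidate, max_dist=4.0, min_size=3))  # False
```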
Description of the result page
- List of the results (bottom left table)
- List of the obtained results. All the results by the KIP method are listed first, then the results by the IP method follow.
Field name | Description |
method | Method of the prediction (KIP or IP) |
homologous domain | ID of the homologous domain (only for the results by the KIP method). By clicking the link, the amino acid sequence and the 3D structure of that domain are displayed on another window. |
Identity/coverage | The sequence identity/coverage of the homologous domain (only for the results by the KIP method). |
- Amino acid sequence (top left box)
- The amino acid sequence of the target domain. The predicted domain-domain interface residues of the result selected by the leftmost radio button in the result list are indicated with blue-green background color. By placing the mouse pointer over one of the interface residues, the residue name and the residue number are displayed in the "interaction residue" field below.
- 3D Structure (right box)
- The Jmol visualization of the target domain. The predicted domain-domain interface residues are shown in cpk format, while other residues are shown in cartoon format. By placing the mouse pointer over one of the interface residues in the amino acid sequence, the corresponding residue color in this figure turns white to indicate its position in the structure.
Item | Description |
[cartoon][cpk] buttons | Buttons to change the display format of the residues except the domain-domain interface residues |
[spin] checkbox | Switch to start/stop spinning the structure. |
- Setting of the definition of interface residues (topmost checkbox)
- Select the requirement for the domain-domain interface residues to display.
- Checked:
- Only the residues whose positions and residue names both coincide with those of the domain-domain interface residues in the homologous domains are displayed as the interface residues.
- Unchecked:
- All the residues located at the positions of the domain-domain interface residues in the homologous domain are displayed as the interface residues.
Contents
The following statistics concerning the representative domains for different protein domain structure databases (SCOP and CATH) are shown in this web page.
- List of the IDs of the representative domains
- Histogram of the area of the domain-domain interface
- Histogram of the number of the domain-domain interface residues
- Scatter plot of the number of residues (whole domain versus domain-domain interface)
- Propensity of the domain-domain interface residues
How to use
Select the data format of the query by changing the [Input type] pulldown menu.
- FASTA
- Using amino acid sequence data in FASTA format, the whole 3D structure of the protein is predicted.
- PDB
- Using 3D structure data of each domain, the whole 3D structure of the query protein is predicted. Currently, PreDom:Structure can accept up to three domain structures.
prediction using amino acid sequence data
- Select “Amino acid sequence in FASTA format” or “FASTA format file”. If you select “Amino acid sequence in FASTA format”, input the amino acid sequence of the query protein in FASTA format into the text area field. If you select “FASTA format file”, click [Browse…] and select the FASTA format file of the query protein.
- Click [Start Prediction] to execute prediction.
- There are four types of prediction results according to the query protein sequence.
- If 3D structure data of a protein almost identical to the query protein (sequence identity ≥ 95%) are found in the PDB by the BLAST search, the 3D structure of the found protein is shown in the JSmol viewer. Additionally, the alignment obtained from the BLAST result is shown.
- If the 3D structure data of the homologous protein (sequence identity ≥ 25%) is found in the PDB by the BLAST search, you can perform homology modeling using the found structure as template. To perform the homology modeling, the license key of MODELLER is needed.
- If [Show] button is clicked, the alignment obtained from the BLAST result is shown. The 3D structure of the template protein is also shown in JSmol viewer.
- Click the [Click here] button in the "Status" row to open a new window for setting the parameters of the homology modeling run.
- Enter your job name (optional), e-mail address and MODELLER license key in the window and click [Run prediction] to run the homology modeling.
- Completion of the program execution is notified by e-mail.
- When you access the result page, whose URL is given in the e-mail, the results of the homology modeling are shown: five predicted structures of the query, information on the template structure, the alignment of the query and template proteins obtained from MODELLER, and quality scores of the model structures (molpdf, DOPE score and GA341 score).
- You can download the coordinate data in PDB format of predicted structures.
- If the 3D structures of two consecutive domains in the query protein sequence are already known or can be predicted by homology modeling, the whole 3D structure of the two domains is predicted using the DINE score.
- If [Show] button is clicked, the alignment obtained from the BLAST result is shown. The 3D structure of the PDB entry is also shown in JSmol viewer.
- Click [Click here] button in the "Status" row to open a new window for setting some parameters of DINE score. Details of parameters are described in "prediction using 3D structure data" section.
- Enter your job name (optional) and e-mail address and click [Run prediction] to run the prediction. If homology modeling is required to predict the 3D structure of one or both domains, the MODELLER license key is also needed.
- Completion of the program execution is notified by e-mail.
- When you access the result page, whose URL is given in the e-mail, the prediction results are shown. See the "2-domain protein" paragraph in the "Description of the result page" section.
- If 3D structures of more than two domains in the query protein sequence are already known or predictable by homology modeling, known structures or template structures for homology modeling are shown.
- If [Show] button is clicked, the alignment obtained from the BLAST result is shown. The 3D structure of the PDB entry is also shown in JSmol viewer.
- Click [Click here] button in the "Status" row to open a new window for setting parameters of homology modeling.
- Enter your job name (optional), e-mail address and license key of MODELLER. Then, click [Run prediction] to run homology modeling.
- Completion of the program execution is notified by e-mail.
- When you access the result page, whose URL is given in the e-mail, the structures predicted by homology modeling are shown. If the query contains only three domains and all domain structures are already known or can be modeled by homology modeling, the whole structure is predicted. See the "multi-domain protein composed of more than 2 domains" paragraph in the "Description of the result page" section.
prediction using 3D structure data
- Click [Browse...] next to one of the [PDB format file] fields and select the PDB format file of the domain. Then click the next [Find chains] button to list all the chain IDs in the specified PDB data, and select the chain ID from the list. To clear the chain list, click [Reset]. (The [PDB format file] field is also initialized.)
Repeat this procedure for another domain.
- Modify the value of the following parameters if needed.
- Parameters for KIP method
- Threshold of sequence identity
- The minimum of the required sequence identity with the target domain of the homologous domains for the domain-domain interface prediction.
- Threshold of sequence coverage
- The minimum of the required sequence coverage by the target domain of the homologous domains for the domain-domain interface prediction.
- Threshold of BLAST E-value
- The maximum of the allowed E-value for performing BLAST search.
- Parameters for IP method
- Threshold of IP score
- The minimum of the required IP score (score evaluated on the basis of the occurrence of each amino acid residue on the domain-domain interface).
- Threshold of residue contact
- The maximum allowed distance between atoms in the candidate domain-domain interface residues. A residue pair is regarded as being in contact when all the distances between the atoms of the two residues are within this value.
- Minimum size of interface
- The minimum of the required number of contacted residues. Predicted domain-domain interaction residues are discarded when the number of contacted residues among them is less than this value.
- Completion of the program execution is notified by e-mail. Enter an arbitrary name in the [Job name] field to identify the job, and the e-mail address to receive the notification in the [E-Mail] field.
- Click [Start Prediction] to execute prediction.
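How the three KIP thresholds (sequence identity, sequence coverage and BLAST E-value) jointly select homologous domains can be sketched as follows. This is only an illustration: the `Hit` record, the domain IDs and the threshold values are hypothetical, and the server's default values are not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    domain_id: str    # hypothetical ID of a homologous domain
    identity: float   # sequence identity in percent
    coverage: float   # sequence coverage in percent
    evalue: float     # BLAST E-value


def filter_homologs(hits: list[Hit], min_identity: float,
                    min_coverage: float, max_evalue: float) -> list[Hit]:
    """Keep hits that satisfy all three thresholds at once."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


# Made-up hits and thresholds for illustration only.
hits = [
    Hit("dom_a", identity=62.0, coverage=95.0, evalue=1e-40),
    Hit("dom_b", identity=28.0, coverage=97.0, evalue=1e-12),
    Hit("dom_c", identity=70.0, coverage=55.0, evalue=1e-30),
]
kept = filter_homologs(hits, min_identity=30.0, min_coverage=80.0, max_evalue=1e-5)
print([h.domain_id for h in kept])  # ['dom_a']
```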
Description of the result page
2-domain protein
- List of the candidate structures (top table)
- Ten most plausible candidate structures on the basis of the DINE score are listed in descending order of the DINE score.
- The DINE score is defined as:
DINE = w_dock * S_dock + w_int * S_int + w_ete * S_ete
- S_dock, S_int, S_ete:
Scores of the docking (physicochemical complementarity of the interface, evaluated by ZRANK), the interface (ratio of predicted interface residues in the domain-domain interface) and the end-to-end distance (how well the distance between the domain ends fits the statistically expected one), respectively.
- w_dock, w_int, w_ete:
Weights of the docking score, domain interface score and end-to-end distance score, respectively.
The default values of w_dock, w_int and w_ete are 7, 8 and 1, respectively. You can change the weight values in this list and re-calculate the DINE score of each candidate structure with the changed weights. The candidate structures are then re-sorted by the re-calculated scores.
- Bottom area
- Information about a candidate structure selected by the radio buttons in the above list is shown in this area.
- Amino acid sequence (top left box)
- The amino acid sequence of the candidate structure. By placing the mouse pointer over one of these residues, the residue name and the residue number are displayed in the “interaction residue” field below. Numerical data about the domain-domain interface residues is also shown in the legend.
- List of the domains (bottom left table)
- List of the input domains.
Field name | Description |
domain number | domain number of each query |
residue number | range of the residue number of the domain and the linkers |
show | switch to show/hide the domain (uncheck to hide) |
scheme of predicted interface residues | buttons to change the scheme of the domain-domain interface residues |
- 3D Structure (right box)
- The JSmol visualization of the candidate structure. Initially, the domain-domain interface residues are shown in wireframe format, while other residues are shown in cartoon format. As with the amino acid sequence, the domain linker residues are indicated with deep color. By placing the mouse pointer over one of the interface residues or the domain linker residues in the amino acid sequence, the corresponding residue in this figure turns white to show its position in the structure.
- Results of the docking simulation (below the 3D structure)
- Numerical results of the docking simulation are shown here. Click the [Download PDB] button to download a PDB file of the above structure.
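The weighted DINE score and the re-sorting of candidate structures described for the top table can be sketched as below. The default weights 7, 8 and 1 come from the text; the `Candidate` record and the component scores are made-up values for illustration, and how the server scales each component score is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    s_dock: float  # docking score
    s_int: float   # interface score
    s_ete: float   # end-to-end distance score


def dine(c: Candidate, w_dock: float = 7.0, w_int: float = 8.0, w_ete: float = 1.0) -> float:
    """Weighted sum of the three component scores."""
    return w_dock * c.s_dock + w_int * c.s_int + w_ete * c.s_ete


def rank(candidates: list[Candidate], **weights: float) -> list[Candidate]:
    """Sort candidates in descending order of the DINE score."""
    return sorted(candidates, key=lambda c: dine(c, **weights), reverse=True)


# Made-up component scores for illustration only.
candidates = [Candidate("A", 0.5, 0.2, 0.9), Candidate("B", 0.3, 0.6, 0.1)]
print([c.name for c in rank(candidates)])  # default weights: ['B', 'A']
print([c.name for c in rank(candidates, w_dock=8.0, w_int=1.0, w_ete=1.0)])  # ['A', 'B']
```

Changing the weights changes only the ordering; the component scores of each candidate stay the same.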
multi-domain protein composed of more than 2 domains
If the query protein is composed of more than two domains, the 3D structure is predicted as follows.
- The 3D structure of each pair of adjacent domains (domain1-domain2, domain2-domain3 and so on) is predicted using the DINE score.
- Decoy structures are constructed by superposing domain-n of predicted domain-(n-1)-domain-n and domain-n-domain-(n+1) structures.
- The decoy structures in which no domains collide are sorted by the harmonic mean of the DINE scores of domain1-domain2, domain2-domain3 and so on.
Using the pull-down menu on the top left, one of ten most plausible candidate structures can be shown.
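The ranking of decoy structures can be sketched as follows. The sketch assumes that collision-free decoys are sorted in descending order of the harmonic mean (the text does not state the direction) and that all pairwise DINE scores are positive, which `statistics.harmonic_mean` requires. The dictionary layout and the score values are hypothetical.

```python
from statistics import harmonic_mean


def rank_decoys(decoys: list[dict], top_n: int = 10) -> list[dict]:
    """Drop decoys with domain collisions, then sort by harmonic mean of pair scores."""
    collision_free = [d for d in decoys if not d["collision"]]
    ranked = sorted(
        collision_free,
        key=lambda d: harmonic_mean(d["pair_scores"]),
        reverse=True,  # assumption: a higher harmonic mean is more plausible
    )
    return ranked[:top_n]


# Made-up decoys: pair_scores holds the DINE scores of domain1-domain2, domain2-domain3.
decoys = [
    {"name": "d1", "pair_scores": [6.0, 3.0], "collision": False},
    {"name": "d2", "pair_scores": [5.0, 5.0], "collision": False},
    {"name": "d3", "pair_scores": [9.0, 9.0], "collision": True},
]
print([d["name"] for d in rank_decoys(decoys)])  # ['d2', 'd1']
```

The harmonic mean penalizes a decoy in which one domain pair scores poorly, so `d1` (scores 6 and 3, harmonic mean 4) ranks below `d2` (scores 5 and 5) even though their arithmetic means are close.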
- Amino acid sequence (top left box)
- The amino acid sequence of the candidate structure. By placing the mouse pointer over one of these residues, the residue name and the residue number are displayed in the “interaction residue” field below. Numerical data about the domain-domain interface residues is also shown in the legend.
- List of the domains (bottom left table)
- List of the input domains.
Field name | Description |
domain number | domain number of each query |
residue number | range of the residue number of the domain and the linkers |
show | switch to show/hide the domain (uncheck to hide) |
scheme of predicted interface residues | buttons to change the scheme of the domain-domain interface residues |
- 3D Structure (right box)
- The JSmol visualization of the candidate structure. Initially, the domain-domain interface residues are shown in wireframe format, while other residues are shown in cartoon format. As with the amino acid sequence, the domain linker residues are indicated with deep color. By placing the mouse pointer over one of the interface residues or the domain linker residues in the amino acid sequence, the corresponding residue in this figure turns white to show its position in the structure. Click the [Download PDB] button to download a PDB file of the above structure.
How to use
- BLAST search
Search for homologous sequences of the query sequence by BLAST.
- Select a data format of query protein in the [Input] field. PDB ID, file in the PDB format, amino acid sequence in the FASTA format and file of amino acid sequence in the FASTA format are accepted.
- PDB ID
- Input PDB ID of the query protein in lowercase. If the PDB ID is found in our database, you can select a chain ID from all chain IDs in the PDB entry.
- PDB format file
- Select the 3D structure file of the query protein in the PDB format.
- FASTA format file
- Select the amino acid sequence file of the query protein in the FASTA format.
- FASTA format text
- Input the amino acid sequence of query protein in the FASTA format.
- Modify following BLAST parameters in the [BLAST parameters] field if needed.
- Database
- Database name searched by BLAST
- E-value
- Expectation value threshold
- Click [BLAST] to run a BLAST search.
Command: blastp -query (amino acid sequence of query) -db (database name) -evalue (Expectation value threshold)
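For readers who want to reproduce the search locally, the command above can be assembled as an argument list, for example in Python. This is a sketch only: the file name `query.fasta` and the database name `pdbaa` are placeholders, and running the command requires a local BLAST+ installation, which is not assumed here (the example only builds and prints the command).

```python
import shlex


def blastp_command(query_fasta: str, database: str, evalue: float) -> list[str]:
    """Build the blastp argument list; -query expects a FASTA file path."""
    return ["blastp", "-query", query_fasta, "-db", database, "-evalue", str(evalue)]


# Placeholder file and database names for illustration only.
cmd = blastp_command("query.fasta", "pdbaa", 1e-3)
print(shlex.join(cmd))  # blastp -query query.fasta -db pdbaa -evalue 0.001

# To execute it where BLAST+ is installed:
#   import subprocess
#   subprocess.run(cmd, check=True)
```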
- Homologous sequences to query data
Select sequences from the BLAST results and then submit them to the evolutionary trace analysis. In the evolutionary trace analysis, a phylogenetic tree is constructed from the multiple alignment of those sequences by the NJ method, and grouping of sequences is performed on the tree.
- BLAST results are shown in the table below (click each ID to view the pairwise alignment of query and subject sequences). Sequences satisfying default conditions for sequence identity (≥20%) and coverage (≥90%) are shown in red and selected with checkboxes initially. Selections can be changed by modifying the conditions on the drop-down lists or check/uncheck each checkbox.
- Modify a condition for the evolutionary trace analysis if needed.
- Multiple alignment method
- Program for multiple alignment. ClustalW2 or MAFFT can be used.
Command-line options for ClustalW2: -align -infile -outfile
Command-line options for MAFFT: --auto --clustalout
- Click [Exec trace] to perform evolutionary trace analysis.
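The alignment options listed above correspond to command lines of roughly the following shape. This sketch assumes that the executables are named `clustalw2` and `mafft` and are on the PATH, and it uses placeholder file names; it only builds the argument lists and does not run the programs.

```python
def clustalw2_command(infile: str, outfile: str) -> list[str]:
    """ClustalW2 takes its input and output files as -infile=... and -outfile=..."""
    return ["clustalw2", "-align", f"-infile={infile}", f"-outfile={outfile}"]


def mafft_command(infile: str) -> list[str]:
    """MAFFT writes the alignment to standard output; redirect it to save a file."""
    return ["mafft", "--auto", "--clustalout", infile]


# Placeholder file names for illustration only.
print(" ".join(clustalw2_command("selected.fasta", "selected.aln")))
print(" ".join(mafft_command("selected.fasta")))
```

With `--clustalout`, MAFFT emits the alignment in Clustal format, so either program yields output in the same format.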
Results of evtrace
Results of the evolutionary trace analysis are shown.
Following parameters for sequence grouping and clustering of trace residues can be changed here.
- Tident
- Cutoff value of identity for sequence grouping. Click a radio button to change the results to those under the corresponding cutoff condition.
- Tdist
- Distance threshold for nearest-neighbor clustering of trace residues. Two clusters are merged if they are closer than this threshold. Click [Change] after entering a value to update results.
- Tresnum
- Number threshold for nearest-neighbor clustering of trace residues. Clusters with the number of residues less than this threshold are ignored. Click [Change] after entering a value to update results.
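The roles of Tdist and Tresnum in the nearest-neighbor clustering can be sketched as below. The sketch assumes one representative coordinate per trace residue (which atom the server uses is not stated here) and implements the merging rule as single-linkage clustering with a union-find structure. Function and variable names are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def cluster_trace_residues(coords: dict[int, Coord], t_dist: float, t_resnum: int) -> list[list[int]]:
    """Merge residues closer than t_dist; ignore clusters smaller than t_resnum."""
    parent = {r: r for r in coords}

    def find(r: int) -> int:
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    ids = sorted(coords)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(coords[a], coords[b]) < t_dist:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[int, list[int]] = {}
    for r in ids:
        clusters.setdefault(find(r), []).append(r)
    return sorted(c for c in clusters.values() if len(c) >= t_resnum)


# Made-up residue numbers and coordinates for illustration only.
trace = {10: (0.0, 0.0, 0.0), 11: (3.0, 0.0, 0.0), 12: (6.0, 0.0, 0.0), 50: (30.0, 0.0, 0.0)}
print(cluster_trace_residues(trace, t_dist=4.0, t_resnum=2))  # [[10, 11, 12]]
```

Residues 10 and 12 are 6 apart, beyond the threshold, but they end up in the same cluster because each is within the threshold of residue 11; the isolated residue 50 forms a one-residue cluster that falls below Tresnum and is ignored.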
Description of the result
Following results are shown on each tab.
- [Sequence group]
- List of sequences with their group ID. All sequences selected in the previous page are grouped using the phylogenetic tree shown in the [Tree] tab. The pairwise alignment of query and subject sequences is shown when the ID is clicked.
- [Multiple alignment (group)]
- Summary of the multiple alignment obtained by the selected program. Only consensus sequences for each group and the query sequence are shown here. The invariant and class-specific trace residues are shown in pink and gray, respectively. Click [Download raw data] to download the file of the result.
- [Multiple alignment (sequence)]
- Multiple alignment obtained by the selected program. All sequences and the query sequence are shown here. The invariant and class-specific trace residues are shown in pink and gray, respectively. Click [Download the multiple alignment] to download the file of the result.
- [Trace residues (TR)]
- Invariant (pink) and class-specific (gray) trace residues on the amino acid sequence and 3D structure.
- [Clustering TRs]
- Spatial clusters of trace residues on the amino acid sequence and 3D structure. Trace residues located at internal positions (surface accessibility < 0.1) are not used in the clustering.
- [Tree]
- Phylogenetic trees of the amino acid sequences.