Help file for CAPIH

CAPIH aims to provide HIV and AIDS researchers with a comparative view of host-HIV-1 protein interactions between human and three major animal models: chimpanzee, rhesus macaque, and mouse. The interface demonstrates that a considerable number of differences exist between human and the model organisms, including amino acid substitutions, insertions/deletions (indels), and post-translational modifications (PTMs). The interface provides interactive protein-protein interaction (PPI) graphs and comparisons of human-chimpanzee-macaque-mouse orthologous proteins. Users can find potential missing proteins in the PPI networks and identify genetic changes that are associated with host-HIV-1 PPIs by inspecting the multiple sequence alignments accompanying the PPI graphs.

Part I. System Requirements

Windows 2000 or above, or Mac OS X 10.4 or above, is required. CAPIH has been tested on Internet Explorer 6.0, Firefox 3.0, and Safari 3. The Java Runtime Environment is needed to view the PPI graphs, the multiple sequence alignments with the identified genetic changes, and the feature settings of the alignment viewer. Java can be obtained from

Part II. Query Schemes

The user can search for the protein of interest using six query schemes. Alternatively, the user can choose to search by "All", which includes all six query schemes. CAPIH provides an auto-complete function for each query scheme but NOT for "All" queries. If a specific query returns no results, users are advised to use the "All" query scheme for more results, and then refine the search when necessary.

1.   Gene/Protein accession number:

CAPIH integrates eight sequence accession numbering systems, as listed below:

Accession System

Ensembl protein
Ensembl gene
Ensembl transcript
RefSeq protein
RefSeq transcript
HUGO: HGNC:16286 (2~5 digits)
UniProt: Q8WXG5 ("Q" accessions may include both numbers and letters)
UCSC gene

The users can also refine their searches by querying again at "Refine your search", or start a new search by returning to the Homepage.
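The accession formats listed above can be told apart by their shape. The following is a minimal sketch in Python, not part of CAPIH itself: the regular expressions follow the public ID conventions of each database as we understand them, and both the patterns and the function name `classify_accession` are assumptions made for illustration.

```python
import re

# Illustrative patterns only. They are assumptions based on the public ID
# conventions of each database, NOT taken from the CAPIH implementation.
ACCESSION_PATTERNS = [
    ("Ensembl protein", re.compile(r"^ENS[A-Z]*P\d{11}$")),
    ("Ensembl gene", re.compile(r"^ENS[A-Z]*G\d{11}$")),
    ("Ensembl transcript", re.compile(r"^ENS[A-Z]*T\d{11}$")),
    ("RefSeq protein", re.compile(r"^[NX]P_\d+(\.\d+)?$")),
    ("RefSeq transcript", re.compile(r"^[NX][MR]_\d+(\.\d+)?$")),
    ("HUGO", re.compile(r"^HGNC:\d+$")),
    ("UniProt", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    ("UCSC gene", re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")),
]


def classify_accession(query):
    """Return the name of the first accession system whose pattern matches,
    or None if the query does not look like any known accession."""
    query = query.strip()
    for name, pattern in ACCESSION_PATTERNS:
        if pattern.match(query):
            return name
    return None


if __name__ == "__main__":
    for example in ["HGNC:16286", "Q8WXG5", "ENSP00000379976", "apoptosis"]:
        print(example, "->", classify_accession(example))
```

A query that matches none of the patterns (such as a free-text keyword) returns None, which is the case where one of the other query schemes or the "All" scheme would be the better choice.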

2.   Gene description

The user can also search for any keywords that appear in the gene descriptions. Examples include "tumor", "cyclin", "transporter", "cytokine", etc. If the keyword search returns a large number of entries, the user can refine the search using another keyword.

3.   Gene ontology description

This query scheme is designed for users who are interested in particular cellular functions, biological processes, or cellular compartments. For example, users may search for "cell cycle", "immune response", "viral", and so on. Please refer to the Gene Ontology web site for a complete list of GO terms.

4.   Protein domain

If the users are interested in finding a specific protein domain that may be involved in host-HIV-1 interactions, they may use the "Protein domain" query scheme. Examples include "PDZ", "zinc finger", "forkhead", "catalytic", etc. Protein domain names can be found at domain prediction databases. Some of them are given below:




5.   Tissue

This query scheme allows users to focus on proteins that are expressed in specific tissues. For example, users may be interested in proteins that function in the brain, liver, or testis/ovary. The tissues included in the CAPIH database can be found in the Tissue_list.

Part III. Results

1.   Result gene list

An example search result is given below. The user obtains a list of genes that match the query keyword. In this example, the keyword is "apoptosis" and the query scheme is "Search by All". A list of 177 genes is returned.

The user can then refine the search by entering a second keyword. For example, if we submit a second keyword "caspase", the interface returns only 6 entries.
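The two-step search described above amounts to filtering a result list twice. The sketch below illustrates the idea in Python under stated assumptions: the records, field names, and the `search` function are hypothetical toy examples and do not reflect the actual CAPIH database schema or its search code.

```python
def search(records, keyword):
    """Return the records in which any field contains the keyword
    (case-insensitive substring match)."""
    needle = keyword.lower()
    return [
        record for record in records
        if any(needle in str(value).lower() for value in record.values())
    ]


# Hypothetical toy records, for illustration only.
RECORDS = [
    {"gene": "CASP6", "description": "caspase 6", "go": "apoptosis"},
    {"gene": "BCL2", "description": "B-cell lymphoma 2", "go": "regulation of apoptosis"},
    {"gene": "CCNT1", "description": "cyclin T1", "go": "transcription"},
]

if __name__ == "__main__":
    first_pass = search(RECORDS, "apoptosis")   # broad "All"-style query
    refined = search(first_pass, "caspase")     # refine with a second keyword
    print([r["gene"] for r in first_pass])
    print([r["gene"] for r in refined])
```

Because the second call operates only on the output of the first, the refined list can never be larger than the initial one, which mirrors the drop from 177 to 6 entries in the example.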

The user can then click on the "[+]" sign in front of the gene of interest, and a new window will pop up with detailed information. The results include three parts: summary of lineage-specific genetic changes, multiple sequence alignments with feature settings, and an interactive PPI graph.

2.   Summary of genetic changes

For example, if the user clicks on the gene "CASP6", a radar graph is displayed, as shown below:

The radar diagrams show the distributions of species-specific genetic changes in different genomic regions. Blue, green, and purple represent the numbers of substitutions, post-translational modifications, and indels, respectively.

AA: amino acid

IPR: Interpro-predicted protein domain

CDS: coding sequences

3UTR: 3' untranslated region

5UTR: 5' untranslated region

3.   Multiple sequence alignment and feature settings

By default, the CAPIH interface shows the amino acid sequence alignment with all the features being color-shaded. These include all the protein domains and species-specific genetic changes.

For a clearer view, the user can choose from the "feature settings" box to show only some of the features.

The user may also choose to view the nucleotide sequence or UTR alignments.
Note that the "Conservation" section indicates the conservation of physicochemical properties at each position, whereas the "Quality" section refers to the BLOSUM62 score. The "Consensus" section shows the most common amino acid residue at each position with the corresponding percentage. For more information, please refer to the JalView website at
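To make the "Consensus" row concrete, the sketch below computes the most common residue and its percentage for each alignment column. It is a simplified illustration in Python and not JalView's own code: the function name `consensus` and the toy four-sequence alignment are assumptions, and gap handling and tie-breaking in JalView may differ.

```python
from collections import Counter


def consensus(alignment):
    """For each column of an alignment (equal-length strings), return a tuple
    (most common residue, percentage of sequences carrying that residue)."""
    if not alignment:
        return []
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sequences = len(alignment)
    result = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, 100.0 * count / n_sequences))
    return result


if __name__ == "__main__":
    # Hypothetical toy alignment of four sequences, one per species.
    toy = ["MKTA", "MKTA", "MRTA", "MKSA"]
    for position, (residue, percent) in enumerate(consensus(toy), start=1):
        print(position, residue, f"{percent:.0f}%")
```

In the toy alignment, columns 1 and 4 are fully conserved (100%), while columns 2 and 3 each have one deviating sequence (75%).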

4.   PPI graph

This is an interactive graph, with circles and lines representing proteins and interactions, respectively. Solid lines are human-HIV-1 PPIs, while dashed lines represent human PPIs. White, blue, and green circles represent the target protein, the HIV-1 protein(s), and the non-target human proteins that interact with the target, respectively. The semicircles around each human protein indicate the presence of orthologues in chimpanzee (grey), rhesus macaque (orange), and mouse (pink). The user can drag the protein nodes with the left mouse button to relocate them for a clearer view. The user can also obtain more information on the proteins and PPIs from the upper-right panel by clicking on the circles and triangles, respectively. For further information, the user can click on the hyperlinks provided in this panel. Note that the PPIs are classified into seven different groups, which are represented by different colors as shown in the bottom-right panel. The graph can be zoomed in or out by holding the right mouse button and moving the mouse. The entire graph can also be moved by holding the left mouse button and moving the mouse.

Part IV. Additional Information

1.   Protein Table and Missing Proteins

The human proteins known to interact with HIV-1 proteins are listed in the Protein Table. Note that some proteins that are listed in the HIV-1, Human Protein Interaction Database are not included in the Protein Table. These missing proteins may have no Ensembl counterparts, be redundant, or have no orthologues in any of the three compared species. These proteins can be found in the Missing Protein table.

2.   ID cross reference

This table gives ID cross references for the proteins listed in the Protein Table to eight different accession systems: Ensembl protein/gene/transcript, RefSeq protein/transcript, HUGO, UniProt, and UCSC gene. All the accessions are hyperlinked to their source databases, which provide more detailed information.

3.   Download

The contents of CAPIH are downloadable. The "Download" hyperlink allows users to retrieve most of the CAPIH metadata in simple text format for further analyses. These include PPI data, the ID cross reference table, species-specific PTMs/indels/substitutions, and experimentally verified PTMs.
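Once downloaded, the text files can be loaded with standard tools. The sketch below is written in Python and assumes a tab-delimited file with a header row; the actual delimiter and column names of the CAPIH downloads are not specified here, so the column names and placeholder values in the example are hypothetical.

```python
import csv
import io


def read_table(handle, delimiter="\t"):
    """Read a delimited text table with a header row into a list of dicts.
    The tab delimiter is an assumption about the download format."""
    return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical file content with placeholder values, for illustration only.
    sample = io.StringIO(
        "human_protein\thiv_protein\n"
        "PROTEIN_A\tHIV_PROTEIN_X\n"
        "PROTEIN_B\tHIV_PROTEIN_X\n"
    )
    rows = read_table(sample)
    print(len(rows), "rows; columns:", list(rows[0].keys()))
```

For a real file, the in-memory `io.StringIO` object would be replaced by `open(path, newline="")`, and the delimiter adjusted if the download turns out not to be tab-separated.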

4.   Statistics

This section gives the users an overview of this database, including the distribution of species-specific substitutions/indels, the numbers of species-specific PTMs, and the numbers of protein domains affected by species-specific PTMs.

Part V. Examples

Here we use a well-known group of host restriction factors as an example to illustrate how CAPIH can be useful for HIV-related studies.

Querying CAPIH by the keyword "APOBEC" returns 7 entries, including AICDA (activation-induced cytidine deaminase) and six proteins of the APOBEC3 family (A, B, C, D, F, and G), as shown below.

Among the seven proteins, only AICDA has a mouse orthologue, whereas all six APOBEC3 members are found only in primates, which is consistent with the observation that the family has expanded in primates [1]. Since the APOBEC3 proteins are known to be involved in host defense against retroviruses, these proteins have undergone substantial changes because of positive selection [2,3]. This is a good example of remarkably different host factors even between very closely related species such as human and chimpanzee. Indeed, we found a considerable number of genetic changes between the human-chimpanzee orthologues. As shown below, the numbers of amino acid substitutions, indels, and potential PTM alterations are strikingly large. Recall that the majority of the other human-chimpanzee orthologous proteins are almost identical, with an average of two amino acid substitutions per orthologous pair [4].
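The counts of substitutions and indels discussed above can be derived from a pairwise alignment of two orthologues. The following Python sketch shows one simple way to do this; it is not the method used to build CAPIH. The function name `count_changes`, the convention of counting each contiguous gap run as one indel, and the toy sequences are all assumptions made for illustration.

```python
def count_changes(seq_a, seq_b, gap="-"):
    """Count amino acid substitutions and indel events between two aligned
    sequences of equal length. A contiguous run of gaps in one sequence is
    counted as a single indel event."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    substitutions = 0
    indels = 0
    gap_in = None  # which sequence the current gap run belongs to, if any
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column is gapped in both sequences: uninformative
        current = "a" if a == gap else "b" if b == gap else None
        if current is not None and current != gap_in:
            indels += 1  # a new gap run starts here
        if current is None and a != b:
            substitutions += 1
        gap_in = current
    return substitutions, indels


if __name__ == "__main__":
    # Hypothetical toy alignment: one substitution (K/R) and two indels.
    print(count_changes("MKT-AAL", "MRTGA-L"))
```

Restricting the two input strings to the columns of a single protein domain gives the corresponding per-domain counts.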

However, it is noteworthy that both human APOBEC 3F (ENSP00000379976) and 3G (ENSP00000263247) are assigned the same chimpanzee orthologue (ENSPTRP00000024783), with a much higher sequence identity for the 3F than the 3G orthologous pair. Similarly, human APOBEC 3C/3D (ENSP00000355340 / ENSP00000370983) also share the same chimpanzee orthologue (ENSPTRP00000024782). In addition, in the current version, human APOBEC 3D does not have the cytidine deaminase domain ("CDA"; IPR016193), which is the major determinant of antiviral activity and virus specificity. Therefore, to be conservative, we may narrow our analysis to 3A, 3B, 3C, and 3F.
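Many-to-one orthologue assignments like these can be detected mechanically by grouping human proteins by their assigned chimpanzee orthologue. The Python sketch below uses the four accession pairs mentioned above; the function name `shared_orthologues` and the dictionary layout are assumptions for illustration and are not a CAPIH data format.

```python
from collections import defaultdict

# Human protein -> assigned chimpanzee orthologue, as stated in the text above.
ORTHOLOGUES = {
    "ENSP00000379976": "ENSPTRP00000024783",  # human APOBEC 3F
    "ENSP00000263247": "ENSPTRP00000024783",  # human APOBEC 3G
    "ENSP00000355340": "ENSPTRP00000024782",  # human APOBEC 3C
    "ENSP00000370983": "ENSPTRP00000024782",  # human APOBEC 3D
}


def shared_orthologues(mapping):
    """Return {orthologue: sorted list of human proteins} for every orthologue
    that is assigned to more than one human protein."""
    groups = defaultdict(list)
    for human_id, orthologue_id in mapping.items():
        groups[orthologue_id].append(human_id)
    return {
        orthologue_id: sorted(human_ids)
        for orthologue_id, human_ids in groups.items()
        if len(human_ids) > 1
    }


if __name__ == "__main__":
    for orthologue_id, human_ids in shared_orthologues(ORTHOLOGUES).items():
        print(orthologue_id, "<-", ", ".join(human_ids))
```

Entries flagged this way are candidates for the conservative treatment described above, since the sequence comparison for at least one member of each group may not reflect a true one-to-one orthologous pair.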

Notably, the CDA domain is considered responsible for the host-retrovirus PPIs and the host-range specificities of retroviruses. Therefore, genetic changes that occur in this protein domain have the potential to change host-virus PPIs and, consequently, host susceptibility to retroviral diseases. It is evident that the APOBEC3 members have experienced very different evolutionary paths in this domain. As shown below, 3B and 3C have clearly diverged more than the other two, both in terms of the number of amino acid substitutions and the number of potential PTM changes. In contrast, 3A has been relatively conserved. It is therefore speculated that APOBEC 3B/3C may have played an important role in the divergence of hominoid immune responses against retroviruses. Nevertheless, the changes in 3A and 3F, though not as drastic, can also have functional effects. Functional studies are required to confirm the biological implications of these changes. Also noteworthy is that no indels are found in the CDA domain in any of the four proteins. This suggests that indels are subject to strong negative selection in this domain even if the substitution rate is accelerated.






For any other questions, please email to


1.   Chiu YL, Greene WC: APOBEC3G: an intracellular centurion. Philosophical Transactions of the Royal Society of London 2009, 364(1517):689-703.

2.   Sawyer SL, Emerman M, Malik HS: Ancient adaptive evolution of the primate antiviral DNA-editing enzyme APOBEC3G. PLoS Biology 2004, 2(9):E275.

3.   Zhang J, Webb DM: Rapid evolution of primate antiviral enzyme APOBEC3G. Human Molecular Genetics 2004, 13(16):1785-1791.

4.   The Chimpanzee Sequencing and Analysis Consortium: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 2005, 437(7055):69-87.