Journal of Medical Chemical, Biological and Radiological Defense
J Med CBR Def  |  Volume 7, 2009
Submitted 22 February 2009 |  Accepted 3 June 2009  |  Revised 22 June 2009 | Published 4 August 2009

Utilizing The Quantum Intelligence System For Drug Discovery (QIS D2) For Anti-HIV And Anti-Cancer Cocktail Detection

Ying Zhao, Sherry Wei, Ian Oglesby and Charles Zhou

Quantum Intelligence, Inc.

3375 Scott Blvd, Suite 100

Santa Clara, CA 95054


Suggested citation: Zhao, Y.; Wei, S.; Oglesby, I.; Zhou, C. (2009), “Utilizing The Quantum Intelligence System For Drug Discovery (QIS D2) For Anti-HIV And Anti-Cancer Cocktail Detection”, JMedCBR 7, 7 July 2009, http://www.jmedcbr.org/issue0701/Zhou/Zhou_07_09.html.




Screening for a chemical’s affinity with biological targets and/or molecular targets (e.g., proteins and genes) is critical in a typical drug development process. The most common method for drug-target affinity screening is to computationally “dock” small molecules or ligands into a molecular target. For example, the interactions between the countermeasure human acetylcholinesterase (HuAChE) and the nerve agent VX have been studied using a method similar to docking. However, this docking method has its shortcomings, complicated by insufficient 3D structure data for certain protein targets and the existence of different drug-protein binding mechanisms outside of the prototypical “lock and key” model. The Quantum Intelligence System for Drug Discovery, QIS D2, is a screening system designed to predict and extrapolate the affinity between a chemical and its molecular target using samples of experimental data. Previously, we showed that the QIS D2 system is capable of handling ~40,000 chemicals, performing automatic sequence clustering using ~1200 structure fragments, and accurately predicting the chemicals’ impacts on ~60 efficacy targets, ~500 toxicity targets, ~11,000 gene targets, ~200 molecular targets, and ~60 pathway targets. Here, we extended the application of the QIS D2 system to examine its capabilities in screening two FDA approved drug cocktails, anti-HIV and anti-cancer, and compared our results to publicly available data. Our findings indicate that the QIS D2 screening results are consistent with the recommended combined treatments in the anti-HIV case and with publications of clinical studies in the anti-cancer case. Furthermore, we believe that the QIS D2 system can be expanded to screen for the efficacy of drug cocktails against biothreat agents, such as Arenaviruses (Junin, Lassa Fever), Bunyaviruses (Rift Valley Fever), Filoviruses (Ebola and Marburg), Poxvirus (Smallpox), B. anthracis (Anthrax), Y. pestis (Plague), F. tularensis, C. burnetii, Burkholderia mallei (Glanders), B. pseudomallei, Brucella species (Brucellosis), B. melitensis, and B. suis. The potential to use the QIS D2 system to select a list of drugs or drug cocktails against these biothreat agents is also discussed within.



Medical countermeasures against both known and unknown pathogens and infectious diseases are discovered through a traditional drug development life-cycle. The phases of this cycle include: initial candidate screening, preclinical discovery, large multi-center clinical trials, and post-marketing safety surveillance.

A typical drug discovery process starts with a virtual screening (in silico) of therapeutic efficacy, with a specific biological target matched against a large chemical library. A chemical library may contain the 3D structures of millions of chemicals and/or their fragments. The biological targets are also screened against so called “drug-able” or “drug-like” properties such as pharmacologic absorption, distribution, metabolism, excretion, and toxicity (ADME/tox). The in silico screening of efficacy and ADME/tox properties has been intensively studied for drug discovery [Hert, 2004; Cavasotto et al., 2004]. Potentially effective drugs identified in silico can then be validated in vitro. Such a drug discovery process typically takes more than ten years before FDA issues a license.

More recently, screening a chemical’s affinity with biological targets and/or with molecular targets (e.g., proteins and genes) has become critical to drug discovery. When the structure of a target is known (usually from X-ray crystallography or homology modeling), the most common drug-target affinity screening process is computationally “docking” small molecules or ligands into that target. The current state-of-the-art docking programs include GOLD [Jones et al., 1997], FlexX [Rarey et al., 1996], Glide [Schrodinger, 2003], ICM [Totrov et al., 1997] and EUDOC [Pang et al., 2001]. The GOLD (Genetic Optimization for Ligand Docking) program uses a genetic algorithm (GA) to explore the full structural fit. FlexX predicts the geometry of the protein-ligand complex and estimates the binding affinity. In particular, FlexX has been used to screen for inhibitors of human aldose reductase [Kraemer et al., 2004]. The Glide (Grid-Based Ligand Docking with Energetics) algorithm represents the shape and properties of the receptor on a grid that provides progressively more accurate scoring of the ligand pose. The Internal Coordinate Mechanics (ICM) program accounts for protein flexibility when docking ligands to protein kinases [Cavasotto, 2004] and is based on a stochastic algorithm that relies on global optimization of the entire flexible ligand in the receptor field (flexible ligand/grid receptor approach). EUDOC [Pang et al., 2001] uses the fast affine transformation to translate and rotate a ligand in the putative binding pocket of a receptor to search for energetically favorable orientations and positions. EUDOC has been used in virtual screening for farnesyltransferase inhibitor leads [Perola et al 2000; Perola et al 2004].

Structure-activity relationship modeling (SAR modeling) has been extensively studied since the 1970s. Chemical fingerprint-based methods have also been used for virtual screening [Hert, 2004], where intermolecular structure similarity is defined using a Tanimoto Coefficient [Miller, 2002]. Homology modeling deals with the prediction of ligand-binding function for unannotated sequences and the prediction of specific residue–ligand contacts in proteins without solved structures. The Homology module in Insight II [Perola et al 2004] builds a three-dimensional model of a protein using both its amino acid sequence and the structures of known, related proteins. Systematic discovery of multi-component therapeutics has gained momentum in recent years. Cell-based high-throughput screening (cHT) is typically used in identifying effective combinations of therapeutic compounds [Borisy et al., 2003].
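
As an illustration of the fingerprint similarity mentioned above, the Tanimoto coefficient between two fingerprints A and B is the number of shared features divided by the number of features present in either. The following is a minimal sketch, not taken from the cited tools; the fragment labels and the convention for two empty fingerprints are assumptions made for illustration.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of features."""
    if not fp_a and not fp_b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fragment sets, for illustration only.
mol_a = {"C-C", "C-O", "C=O", "C-N"}
mol_b = {"C-C", "C-O", "C-N", "C-S", "N-H"}

print(tanimoto(mol_a, mol_b))  # 3 shared / (4 + 5 - 3) = 0.5
```

A value of 1 means identical feature sets and 0 means no features in common; screening methods of this kind rank library compounds by their similarity to a known active.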

The success of these molecular docking programs depends on detailed understanding of the target protein 3D structures, including catalytic sites and protein geometry. These docking programs usually search for the optimal combination of target protein shape, orientation, and conformation (pose) for ligand “fit” based on the lock-and-key model. The lock-and-key model describes the behavior of drug-protein interaction by assuming that the drug molecule acts like a key which fits into the binding site of a target protein that acts as a lock or pocket [Perola et al 2004].

As an example, the interactions between the nerve agent VX {O-ethyl S-[2-(diisopropylamino)ethyl] methyl-phosphonothioate} and its countermeasure human acetylcholinesterase (HuAChE) have been studied extensively by X-ray crystallography, site-directed mutagenesis, and kinetic studies of the AChE mutants with selective covalent and noncovalent ligands. This example involves the SAR, lock-and-key, and docking approaches discussed above: site-directed mutagenesis of AChE and enzyme kinetics reveal that the acyl pocket residue Phe295, and to a lesser extent Phe297, determine the specificity toward acylating substrates and phosphylating agents such as the nerve agent VX [Ordentlich 2005].

However, the current collection of molecular docking applications is limited to the lock-and-key type of drug-protein interaction. Firstly, a 3D structure of a protein target is not always available for the docking program. Secondly, the mechanisms of drug-protein binding are not limited to the lock-and-key type of fitting. Other forms of drug-protein interactions, such as novel binding mechanisms, are much less likely to be discovered when using the current docking algorithms.

Quantum Intelligence System for Drug Discovery, QIS D2, is a screening system developed from a DARPA biodefense Bio-SPICE (Biological Simulation Program for Intra- and Inter-Cellular Evaluation) project via Small Business Innovation Research (SBIR) funds. QIS D2 predicts and extrapolates the affinity between a chemical and a molecular target using sample experimental data, taking into full consideration the similarities between chemicals and molecular targets and expanding drug discovery beyond molecular docking of lock-and-key interactions. QIS D2 includes the following capabilities for drug discovery:

  • It is a statistical machine learning system which sifts through experimental data (training data set) and looks for the patterns and correlations that link known information such as chemical structures to unknown information such as drug characteristics of efficacy, toxicity, and affinities with molecular targets.
  • It uses the discovered patterns to predict a large number of drug characteristics. QIS D2 includes a tied-mixture Expectation and Maximization (EM) method that is especially capable of predicting large-scale targets. A validation data set is used to confirm the predictive accuracy.
  • It computes the sensitivity of drug characteristics with respect to the known attributes; the sensitivity can be used to optimize future drug design.

In summary, the QIS D2 system can be successfully trained, tested and validated on evidence data sets (either experimental or logical) for predicting the potential in vitro or in vivo effects of drug molecules in biological systems. We developed a data model for the QIS D2 system that integrates diversified information (e.g. chemical structures, toxicity, efficacy and association) with molecular targets (e.g. genes or proteins), and affinity scores along biological pathways. We developed several technologies, such as pathway scoring, large-scale tied-mixture (EM) prediction and Context-Concept-Cluster analysis to enhance the system’s predictive capabilities. The QIS D2 system is scaled up to handle ~40,000 chemicals, perform an automatic sequence clustering using ~1200 structure fragments, and accurately predict ~60 efficacy targets, ~500 toxicity targets, ~11,000 gene targets, ~200 molecular targets and ~60 pathway targets [Zhao, et al. 2005a, 2005b, 2006].

In this paper, we used publicly available data to examine our system’s efficacy in drug-discovery of cocktails for FDA approved drugs. In particular, we applied QIS D2 to screen for anti-HIV and anti-cancer cocktails. Our screening results are consistent with the recommended combined treatments by FDA in the anti-HIV example and with publications of clinical studies in the anti-cancer example.

Compared to the previous approaches, the drug-target interactions in our system are not limited to the lock-and-key type. Instead, these interactions can be directly learned and discovered by cross-examining and mining the public and other sources of evidence/data.



The structural information for the chemicals used in QIS D2 is first generated using a proprietary method. This method is similar to other chemical fragment based methods used in structure-activity relationship (SAR) modeling [Hert, 2004], except that the fragments are ordered in sequence. The input to the coding method is a Corina [Miller, 2002] fragment, which is a structural descriptor. As an example, Figure 1 shows the compound with NCI structure number NSC 89, which can be represented as an ordered sequence of fourteen fragment descriptors. In other words, NSC 89 has the sequence elements S1, …, S14, where S1 = C:__C-1__C-1__O-2, etc.

In this first case study, we described how we utilized QIS D2 to screen anti-HIV drug cocktails by combining the structure, efficacy and toxicity data of two drugs. Using this information and our QIS D2 system, we predicted the efficacy of the anti-HIV drug cocktails for HIV/AIDS treatment. Then we compared the results from our system with the recommended treatments by FDA.

The steps for using the system are summarized as follows:

Step 1: Build the drug cocktail database

We looked for chemicals with anti-HIV activities from the following four databases:

  1. "FDA Approved Anti-HIV Drugs" from "NIAID HIV/OI Searchable Chemical Database”[NIAID]. We used 14 drugs as shown in Table 1.
  2. "Combination Therapy - a Simple Fact Sheet from the AIDS Treatment Data Network"[ATDN]
  3. The AIDS Antiviral Screen from the NCI/NIH’s Developmental Therapeutics Program. The "aids_ec50_may04.txt" file from the NCI DTP website [NCI DTP] contains 39,365 chemicals, with the concentrations necessary to see a protective effect on cells infected with the AIDS virus. The protective effect is measured at 50% survival of such cells (EC50). We use EC50 as the efficacy measure.
  4. The "aids_ic50_may04.txt" file from the same NCI website [NCI DTP] contains 39,350 chemicals, with the concentrations necessary to inhibit the growth of uninfected cells by 50% (IC50). Here, we use IC50 as the toxicity measure.

Among the four sources, we selected 194 chemicals that show high efficacy for anti-HIV properties, either approved by FDA or with antiviral experimental evidence, i.e., low EC50 and high IC50.

The chemical structure data are from NCI Open Compounds Database (http://cactus.nci.nih.gov/ncidb2/download.html). This database contains ~260,000 compound structures in 2D/3D with canonical properties as of 2003. We used this database to build the structure fragments for the 194 chemicals. These chemicals consist of a total of 152 structure fragments.

Step 2: Make drug cocktail A+B

From the 194 chemicals, there are a total of 18,721 possible pair-wise cocktails (194 × 193 / 2). Each cocktail is defined as follows:

  • Assume Drug A has a structure sequence S1a, S2a, … and Drug B has a structure sequence S1b, S2b, …. For Cocktail A+B, the structure value is then S1a + S1b, S2a + S2b, ….
  • Efficacy is computed according to the Bliss additivism model [Berenbaum 2003]. The combined response C for two single compounds with effects A and B is C = A + B – A*B, where each effect is expressed as fractional inhibition between 0 and 1. When two drugs work by a similar mechanism, the subtracted term removes the overlap in terms of probability of inhibition [Borisy, et al. 2003]. We labeled Drug A’s efficacy measure as Ea and Drug B’s efficacy measure as Eb. For each drug, we normalized the EC50 efficacy measure and transformed it into a probability of inhibition of the virus. For Cocktail A+B, the efficacy measure is Ea + Eb – Ea*Eb.
  • We labeled Drug A’s toxicity measure as Ta and Drug B’s toxicity measure as Tb. Likewise, we normalized each drug’s IC50 toxicity measure and transformed it into a probability of inhibition of growth of the normal cells. For Cocktail A+B, the toxicity measure is Ta + Tb – Ta*Tb.

Step 3: Apply QIS D2

The list of 18,721 chemical combinations is entered into the QIS D2 system to screen for the best cocktails with maximum efficacy and minimum toxicity as defined above. Here is the outline of how to use QIS D2:


1. Divide the data into training and test sets

For the 18,721 available cocktail combinations, we divided them into two groups, each containing half of the total data.

Training and test sets are divided randomly. The training set includes 9,361 cocktails and is used to discover the predictive patterns and correlations between the cocktails’ efficacies and their structures. The test set includes 9,360 cocktails and is held-out initially and used later to validate the discovered patterns.
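
A random half split of this kind can be sketched as follows. The integer cocktail identifiers and the seed are placeholders, and the choice to give the extra item to the training set when the total is odd is an assumption that happens to reproduce the 9,361 / 9,360 sizes reported above.

```python
import random


def split_half(items, seed=0):
    """Shuffle a copy of items and split it into two near-equal halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) + 1) // 2  # training half gets the extra item when odd
    return shuffled[:cut], shuffled[cut:]


cocktail_ids = range(18_721)  # placeholder identifiers for the cocktails
train, test = split_half(cocktail_ids)
print(len(train), len(test))  # 9361 9360
```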


2. Cluster the training cocktails

The next step is to cluster the cocktails in the training data based on their structures. Before clustering the cocktails, we first grouped the 194 single drugs into 20 clusters. A chemical’s structure information is decomposed into a sequence using a clustering method called “Context-Concept-Cluster search/match”, or CCC search/match. The method employs statistical context patterns for extracting contexts, concepts, and clusters from training samples. This method is related to Latent Semantic Analysis (LSA) for information indexing, search, and retrieval [Letsche et al 1997; Dumais et al 1988]. Two drugs are grouped into the same cluster when their structures, reflected in the structure sequences, are similar based on the algorithm. Table 1 shows the clustering results for the 14 FDA approved anti-HIV drugs that are included in the 194 chemicals. Table 1 also includes the categorizations of action mechanisms, e.g., nucleoside/tide reverse transcriptase inhibitors and protease inhibitors of the 14 drugs obtained from their published label information. Among the 20 clusters, only Clusters 1, 2, 3, 11, 14 and 15 contain the 14 drugs in Table 1. Clusters 4, 5, 6, 7, 8, 9, 10, 13, 16, 18, 19 and 20 do not include any of these drugs; they contain the rest of 194 chemicals. The structural similarity between two chemicals may contribute to the similarity of action mechanisms. This relationship is reflected in Table 1 where the clustering results are consistent with their names, typically derived from their structures and action mechanisms. The data shown in Table 1 is a validation of the clustering algorithm.


3. Generate cocktail clusters

We applied the same clustering algorithm to group the cocktails in the training set into 20 clusters. The QIS D2 system sorts the 20 clusters by the percentage of cocktails in each cluster that have a combined efficacy above a threshold. In other words, clusters with higher percentages of high efficacy cocktails are listed before those with lower percentages. The sorted clusters of cocktails are, in order of high to low efficacy, Cluster 8, 15, 17, 1, 0, 19, 18, 12, 11, 3, 13, 7, 4, 14, 10, 2, 5, 16, 6, 19. These results indicate that Cluster 8 has the highest proportion of high efficacy cocktails among all the clusters, with 98 out of 187 cocktails at high efficacy. The percentage of high efficacy cocktails in each cluster of the training data is used as a prediction for the test data, or for future data whose efficacy is not known. Since the sorted clusters and efficacy predictions are generated from the training set, the resulting chart, shown in Figure 2, is called a train gains chart.
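
The sorting step described above can be sketched as follows, assuming each training cocktail has already been assigned a cluster ID. The records and the threshold value are hypothetical; the source does not state the threshold used.

```python
from collections import defaultdict


def rank_clusters(records, threshold):
    """Rank clusters by their share of cocktails with efficacy above threshold.

    records: iterable of (cluster_id, combined_efficacy) pairs.
    Returns a list of (cluster_id, hits, size, hit_rate), best cluster first.
    """
    size = defaultdict(int)
    hits = defaultdict(int)
    for cluster_id, efficacy in records:
        size[cluster_id] += 1
        if efficacy > threshold:
            hits[cluster_id] += 1
    ranked = [(c, hits[c], size[c], hits[c] / size[c]) for c in size]
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Hypothetical training records: (cluster ID, combined efficacy).
records = [(8, 0.9), (8, 0.8), (8, 0.4), (15, 0.7), (15, 0.3), (6, 0.2), (6, 0.1)]
ranked = rank_clusters(records, threshold=0.5)
print([row[0] for row in ranked])  # [8, 15, 6]
```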

In the train gains chart of Figure 2, the x-axis is the cumulative percentage of total candidate cocktails for the sorted clusters of the training data. The y-axis is the cumulative percentage of the high efficacy cocktails (hits) for the sorted clusters. Three x-y relations are shown in Figure 2. The blue curve (second from top) is the screening generated by the QIS D2 system, the red line (third from top) is random screening, and the green line (top) is perfect screening. The straight red line means that if a random screening method were used, any percentage of the 9,361 cocktail candidates would contain the same percentage of the high efficacy candidates. The green line represents the upper bound for any screening system in this example: since the training cocktail set contains 2,443 (26%) cocktail candidates that have high efficacy according to the formula and the threshold, a perfect screening, if it existed, would screen out exactly these 26% of the candidates. There would be two clusters: all the high efficacy candidates would be in the first cluster, spanning from 0% to 26% on the x-axis, as represented by the first segment of the green line; the rest of the candidates, with low efficacy, would be in the second cluster, as represented by the second segment of the green line. A practical screening system such as the QIS D2 system usually generates a blue curve, i.e., a gains chart between the red and green lines. For instance, at the point of Cluster 17, the x-axis value for the blue curve represents 936 candidates cumulatively from Clusters 8, 15, and 17, or 10% of the total 9,361 candidates. The y-axis value at this same point represents 440 high efficacy candidates, or 18% of the total 2,443 high efficacy ones. The yellow line is a lift curve, which is the ratio between the blue curve and the red reference. The lift curve is linked to the concept of Return on Investment (ROI) in drug discovery.

If resources for experiments in drug discovery were unlimited, one would study and test all the candidates in order to hit all potential high efficacy cocktails. However, if resources are limited, a good screening method helps to identify the candidates that are most likely to be effective in the end; experiments then need only be performed on these selected candidates. For example, Cluster 8 has a lift of 2.5, which indicates that a $1 investment would yield $2.50 in return. However, if one works on all the candidates without initial screening, a $1 investment would result in a $1 return. ROI for QIS D2 is high when experiments are focused on the top sorted clusters. Likewise, ROI decreases as experiments include more and more clusters.
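
The gains and lift curves described above can be computed from per-cluster counts once the clusters are sorted. The sketch below is illustrative only: it uses the cumulative figures quoted in the text (936 candidates and 440 hits through Cluster 17, out of 9,361 candidates and 2,443 hits) and lumps the remaining clusters into a single assumed row, since the per-cluster counts are not given. Lift is taken here as the cumulative ratio of the gains curve to the random baseline.

```python
def gains_table(ranked):
    """Cumulative gains and lift for clusters sorted best first.

    ranked: list of (label, hits, size).
    Returns rows of (label, cum_fraction_of_candidates, cum_fraction_of_hits, lift).
    """
    total_size = sum(size for _, _, size in ranked)
    total_hits = sum(hits for _, hits, _ in ranked)
    rows, cum_size, cum_hits = [], 0, 0
    for label, hits, size in ranked:
        cum_size += size
        cum_hits += hits
        x = cum_size / total_size  # x-axis of the gains chart
        y = cum_hits / total_hits  # y-axis of the gains chart
        rows.append((label, x, y, y / x))  # lift = gains curve / random baseline
    return rows


# First row: totals through Cluster 17 as quoted in the text.
# Second row: all remaining clusters lumped together (an assumption).
ranked = [
    ("clusters 8, 15, 17", 440, 936),
    ("remaining clusters", 2443 - 440, 9361 - 936),
]
rows = gains_table(ranked)
for label, x, y, lift in rows:
    print(f"{label}: x={x:.2f} y={y:.2f} lift={lift:.2f}")
# clusters 8, 15, 17: x=0.10 y=0.18 lift=1.80
# remaining clusters: x=1.00 y=1.00 lift=1.00
```

The lift always falls to 1 once every candidate is included, which is the point made above about ROI decreasing as more clusters are tested.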


4. Validate clusters using the test set

After the training phase, the cluster models from the QIS D2 system were applied to the test set of 9,360 cocktails for validation. Using the same parameters generated from the training set, each test set cocktail candidate is assigned to its respective cluster. A test gains chart is then drawn as shown in Figure 3. The clusters of candidates for the test set are sorted in the same order as for the training set. The test gains chart is generated as follows: the x-axis is the cumulative percentage of the total 9,360 test candidate cocktails for the clusters sorted in the same order as the training clusters. The y-axis is the cumulative percentage of the high efficacy cocktails (hits) in the test set for the clusters.

If the QIS D2 system accurately captures the chemical structure cluster patterns and their relations to the probability of having high efficacy candidates, the training and test gains charts should look similar to each other. This is indeed the case in Figures 2 and 3. Consequently, the test gains chart validates the accuracy and efficiency of the QIS D2 system.

In practice, if we do not know the actual cocktail efficacy for a test candidate, we use a predicted efficacy: the percentage of high efficacy candidates, computed from the training set, in the cluster to which the candidate belongs.
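
This prediction rule amounts to a lookup of the training hit rate for the candidate's cluster, as in the sketch below. The rate for Cluster 8 comes from the figures quoted earlier (98 of 187); the other rate is made up, and the fallback to the overall training rate for a cluster not seen in training is an assumption of this sketch that the source does not address.

```python
def predict_high_efficacy(cluster_id, train_rates, fallback):
    """Return the training-set hit rate of the candidate's cluster.

    fallback is returned for a cluster absent from training
    (an assumption of this sketch, not a rule from the source).
    """
    return train_rates.get(cluster_id, fallback)


# Cluster 8 rate is from the text (98 of 187); the rate for cluster 6 is made up.
train_rates = {8: 98 / 187, 6: 0.05}
overall_rate = 2443 / 9361  # share of high efficacy cocktails in the training set

print(round(predict_high_efficacy(8, train_rates, overall_rate), 3))   # 0.524
print(round(predict_high_efficacy(99, train_rates, overall_rate), 3))  # 0.261
```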



According to the "Combination Therapy - a Simple Fact Sheet from the AIDS Treatment Data Network" [ATDN], the recommended therapeutic combinations are formed by using one drug or a combination of drugs from Column A and a combination of drugs from Column B, as shown in Table 2. These recommendations have already been thoroughly studied in clinical trials and serve as a real-life validation of QIS D2. In other words, when the QIS D2 system predicts cocktails that are consistent with the clinical results, the evidence validates the model and assumptions that go into the design of the QIS D2 system. The red numbers in the brackets of Table 2 are the cluster IDs for single drugs shown in Table 1. The cluster numbers are only labels for identification of the groupings. The total number of pair-wise combinations recommended by the fact sheet is 48. This number includes all the combinations in Column A and one drug from Column A combined with one from Column B, while excluding the ones whose structures are not available (marked with NA) and the ones not recommended to mix with others. Our results show that the top cocktail clusters, Clusters 8, 15, 17, 1, 0, 19, and 18, contain 46 (96%) of the 48 total recommended combinations (the shaded region). This 96% indicates high accuracy of the QIS D2 system. Besides being accurate, our system identifies a total of 4,867 cocktails in the top seven clusters that are also likely to be high efficacy combinations in real life, which is 26% of the total possible combinations. The 48 recommended cocktails in the fact sheet represent only a small fraction of the seven clusters. The results from our system’s analysis indicate there is a great opportunity to discover novel cocktails in these top seven clusters that are not included in the fact sheet.




Here we applied the QIS D2 system to search for anti-cancer drug cocktails. Similar to the methodology utilized in the first case study, we combined the structure, efficacy, and toxicity data of two drugs to predict drug cocktails for breast cancer treatment. Then we compared the results from our tool with publications of clinical studies. The steps are summarized as follows:

Step 1: Build the drug cocktail database

We extracted 143 chemicals with a total of 114 structure fragments as input into our QIS D2 system from three databases: EPA Fathead Minnow Acute Toxicity Database (EPAFHM), FDA Maximum (Recommended) Daily Dose Database (FDAMDD), and NCTR Estrogen Receptor Binding Database (NCTRER).


Step 2: Make drug cocktail A+B

From the 143 chemicals, there are a total of 10,153 possible pair-wise cocktail combinations (143 × 142 / 2). For the single drug efficacy measure, we used the NCI anti-cancer databases, which include anti-cancer efficacy measures for 41,000 compounds, given as the concentrations that inhibit growth by 50% in sixty different cancer cell lines. The toxicity data are from the Registry of Toxic Effects of Chemical Substances (RTECS). About 500 unique toxic effects across a wide range of categories, including primary irritation, mutagenic effects, reproductive effects, and tumorigenic effects, have been collected by the National Institute for Occupational Safety and Health (NIOSH) since the 1970s for 150,000 chemicals. We also collected up to ~11,000 molecular targets (genes and proteins) from the NCI anti-cancer database. The association of a drug with a molecular target is defined as its correlation along the sixty cancer cell lines [Scherf, 2000]. For a cocktail A+B, the structure, efficacy, and toxicity are defined in the same way as in the anti-HIV case study.
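
The drug-target association described above can be illustrated with a correlation computed across cell lines. The sketch below assumes a Pearson correlation between a drug's activity profile and a target's measured level across the same cell lines; the source only says "correlation", so the choice of Pearson, the function name, and the five-value profiles (standing in for sixty cell lines) are all assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


# Hypothetical profiles over five cell lines (the real data span sixty).
drug_activity = [1.0, 2.0, 3.0, 4.0, 5.0]
target_level = [2.0, 4.0, 6.0, 8.0, 10.0]

print(pearson(drug_activity, target_level))  # 1.0
```

A value near 1 or -1 would indicate that the drug's activity tracks the target's level across cell lines, while a value near 0 would indicate no linear association.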


Step 3: Apply QIS D2

A total of 10,153 cocktails are analyzed in the QIS D2 system using a process similar to the anti-HIV case study: the 10,153 cocktails are randomly divided into training and test data sets. The structure clusters are generated and sorted according to the training data. The training and test gains charts are again similar, for the reasons given in the previous case study. Therefore, we only use the test gains chart shown in Figure 4 for the analysis below.


The predicted high efficacy clusters from QIS D2 are sorted as Cluster 15, 16, 1, 3, etc. in Figure 4. The results are validated in two breast cancer treatment studies as shown here. The four drugs in the studies are shown in Table 3.

The clinical studies [von Minckwitz et al., 2005; Nabholtz et al., 2003] suggest that AT (doxorubicin and docetaxel) significantly improves the clinical efficacy measures TTP (time to progression, primary end point) and ORR (overall response rate) compared with AC (doxorubicin and cyclophosphamide) in patients with MBC (metastatic breast cancer). A recent clinical study [Martin, 2005] compares TAC (docetaxel plus doxorubicin and cyclophosphamide) with FAC (fluorouracil plus doxorubicin and cyclophosphamide) and concludes that TAC significantly improves the rates of disease-free and overall survival among women with operable node-positive breast cancer.

AT and TAC are included in Clusters 15 and 3, which are high efficacy clusters from the QIS D2 system. AC and FAC are included in Clusters 14 and 11, which are low efficacy clusters. Therefore, the clusters are consistent with the clinical results. A possibly effective cocktail that is not mentioned in the clinical studies is 5-Fluorouracil combined with Docetaxel, in Cluster 4.



We used the QIS D2 system to screen drug cocktails for HIV and cancer treatments. The predictive nature of the QIS D2 system uses chemical structure clusters to evaluate treatments and drugs from existing databases.

The same approach and predictive modeling can be used for biothreat countermeasures. The QIS D2 system can be applied to screen the efficacies of FDA approved drug cocktails against various biothreat and emerging threat agents.

Since the mechanisms of complex drug-protein interactions are variable (e.g., mutations in the threat agent or target protein can affect the interactions), a more effective approach for generating biothreat countermeasures is to develop cocktails of known drugs having different mechanisms of action. Conventional drug development methods are limited in these applications due to their focus on the lock-and-key model for drug-protein interactions. Moreover, the traditional chemical libraries used in conventional virtual screening algorithms typically only contain a small number of active compounds that have the potential to become effective drugs. A more rapid and effective approach to targeting biowarfare and infectious disease agents is to utilize cocktails of available drugs (compounds) through intelligent mining existing information databases. By using a library of approved drugs, 100% of the compounds are biologically active. Essential information about these compounds, including pharmacologic absorption, distribution, metabolism, excretion, and toxicity (ADME/tox) are also available. This approach allows effective countermeasures to be deployed rapidly in the field without going through the lengthy safety checks involved with using unknown compounds.

FDA labels for each drug contain rich information about the drug regarding its mechanism of action, microbiology, clinical pharmacology, drug interactions, adverse effects, dosage and administration. Each drug is developed independently and specifically for certain diseases. Cross-examining and mining the data altogether could potentially result in novel candidate countermeasures by simply extrapolating the patterns learned across the board. In addition, analyzing the existing drugs takes advantages of safety tests and data that already have been obtained for humans; therefore, deployment of the drug in the field for a new indication may be expedited.

For future application of this system, the drug information from databases of FDA licensed drugs and drugs in advanced clinical development can first be compiled. The efficacy patterns across the collection of these drugs can be developed using the QIS D2 system with respect to pathogen-directed microbial targets and host-pathogen interactions. For example, sulfadiazine has been found to be effective against Burkholderia mallei infection in both animal models and human subjects. B. mallei is usually sensitive to tetracyclines, ciprofloxacin, streptomycin, novobiocin, gentamicin, imipenem, ceftazidime, and sulfonamides. Thus, the associations of these drugs with microbial targets in B. mallei are factored into the model building. Specifically, a predictive model linking a drug’s efficacy to a pathogen microbial target or host-directed mechanism is built; the model can then be used to select existing drugs or cocktails as countermeasures for biothreat agents of interest, such as the intracellular pathogenic bacteria Y. pestis, F. tularensis, Brucella species, B. mallei, B. pseudomallei, and C. burnetii. The principle for calculating the cocktail efficacy is the same as described for anti-HIV and breast cancer treatment.

Compared with the other approaches discussed in the introduction, our approach discovers and learns efficacy patterns of therapeutic countermeasures directly from existing drugs, for which a wide collection of experimental data is available for learning and modeling. Our system can analyze diversified information and generate predictive models for countermeasures against new biothreats; it is therefore flexible enough to model a wide range of drugs or drug targets. The ultimate objective is to achieve fast and effective selection and construction of new therapeutic countermeasures prior to experimental tests. In addition to its low cost in comparison to experimental screening, QIS D2 can also be generalized to screen countermeasures against multiple pathogens, including broad-spectrum pathogen-directed or host-directed therapeutics. The collection of drugs mainly comes from FDA licensed drugs, drug candidates in advanced stages of development, and drug candidates under clinical development or in IND-enabling (Investigational New Drug) studies. The eventual goal is to use the system to design therapeutic protocols of greater efficacy and faster implementation for biothreat agents, leading to therapeutic countermeasures against bacterial and viral threat agents and fulfilling the medical treatment mission of Chemical, Biological and Radiological (CBR) defense.


Using publicly available data, we presented two case studies showcasing the QIS D2 system’s ability to screen for drug cocktails used in HIV and breast cancer treatment. The two diseases were selected because of the extensive database of drugs for each and the reasonable understanding of the disease mechanisms. In the first example, the results from the QIS D2 system show that the top cocktail clusters contain 96% (46 of 48) of the FDA recommended anti-HIV cocktails. In the second example, the top two clusters from the QIS D2 system contain the two most effective anti-cancer drug combinations from the clinical studies. These results give us confidence that our approach of clustering drugs by their structures and correlating the clusters with efficacies can guide the development of drug treatments for other conditions and diseases. In addition to being accurate in these two case studies, our system learns drug-target relation patterns directly by cross-examining and mining public and other sources of evidence and data, and it can discover diverse types of drug-target relations that are not limited to lock-and-key interactions. Our system is potentially applicable to a wide range of drug discovery applications, especially CBR defense, where this flexibility would be useful.
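The gains-chart evaluation summarized above can be illustrated with a minimal sketch. It assumes that clusters are ranked by the hit rate observed on training data and that the test gains chart then accumulates, in that order, the share of test cocktails screened and the share of recommended cocktails captured. The function name, the cluster labels, and all numbers below are hypothetical and are not taken from the QIS D2 system or from the study data.

```python
from typing import Dict, List, Tuple


def cumulative_gains(
    train_rate: Dict[str, float],
    test_hits: Dict[str, int],
    test_sizes: Dict[str, int],
) -> List[Tuple[str, float, float]]:
    """Return (cluster, fraction screened, fraction of hits captured) rows.

    Clusters are ordered by their training hit rate (highest first), which
    is an assumed ranking rule, not necessarily the one used by QIS D2.
    """
    order = sorted(train_rate, key=lambda c: train_rate[c], reverse=True)
    total_hits = sum(test_hits.values())
    total_size = sum(test_sizes.values())
    rows = []
    seen_hits = 0
    seen_size = 0
    for cluster in order:
        seen_hits += test_hits[cluster]
        seen_size += test_sizes[cluster]
        rows.append((cluster, seen_size / total_size, seen_hits / total_hits))
    return rows


# Hypothetical toy data: four clusters of candidate cocktails.
train_rate = {"A": 0.5, "B": 0.1, "C": 0.3, "D": 0.0}
test_hits = {"A": 6, "B": 1, "C": 3, "D": 0}      # recommended cocktails per cluster
test_sizes = {"A": 10, "B": 20, "C": 10, "D": 10}  # all cocktails per cluster

gains = cumulative_gains(train_rate, test_hits, test_sizes)
for cluster, screened, captured in gains:
    print(f"{cluster}: screened {screened:.0%}, captured {captured:.0%}")
```

In this toy example, the two top-ranked clusters cover 40% of the candidate cocktails but capture 90% of the recommended ones, which is the kind of concentration a gains chart is meant to reveal.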

One of the limitations of the QIS D2 system is that it requires as complete a dataset as possible for drugs that have been used to treat the different biothreats. Part of our success with HIV and breast cancer was possible because the FDA had large datasets, developed from published studies, of drugs used to treat the two diseases. The existing biothreat datasets (dose, animal model, cell target, treatment target, etc.) are much smaller than those for HIV and cancer; therefore, the same strength of analysis cannot be expected. In a two-year collection and analysis effort funded by the DoD (the BioAgent Fate Program), all available sources of documentation for biothreat agents, including classified sources, were compiled and reviewed. That effort revealed that information on the fundamentals of the hazard posed by biothreat agents was actually sparser than originally thought. Inconsistencies in agent preparation and dose-delivery methods, especially aerosol generation, result in much of the therapeutic efficacy information being qualitative in nature rather than quantitative. Unlike the HIV and cancer case studies, there are almost no human data on most of the biothreat agents as delivered by the route of hostile intent; e.g., inhalation anthrax is the predominant concern from hostile use, and yet the natural occurrence of anthrax is predominantly by ingestion. The development of animal models for most biothreat diseases is extremely challenging and not at all complete. Despite these limitations and challenges, using QIS D2, starting from FDA approved drug cocktails, can help to guide further drug and treatment modalities in biodefense.


This project is partially supported by DARPA contract #W31P4Q-04-C-R197 (2004). We are grateful for the collaboration and support of the Bio-SPICE community, especially Dr. Sri Kumar. We also want to thank Dr. A. Khan from DTRA, Dr. J. Huggins from USAMRIID, Dr. Marti Jett from WRAIR, and the reviewers for the Journal of Medical Chemical, Biological and Radiological Defense.




Figure 1: A chemical’s structure is represented as a sequence




Figure 2: The train gains chart for cocktails from 20 clusters.





Figure 3: The test gains chart for the cocktails from 20 clusters. The shaded area includes Clusters 8, 15, 17, 1, 0, 19, and 18, which together contain 46 (96%) of the 48 recommended combinations.




Figure 4: The test gains chart for efficacy of different drug clusters for breast cancer group I cells.





Nucleoside/tide Reverse Transcriptase Inhibitors (NRTI)
abacavir (Ziagen, 1592U89 Succinate)
lamivudine, 3TC (Epivir)
zidovudine, AZT (Retrovir, Azidothymidine)
stavudine, d4T (Zerit, 2',3'-Didehydro-3'-deoxythymidine)
didanosine, ddI (Videx, Videx EC, Dideoxyinosine)
zalcitabine, ddC (HIVID, Dideoxycytidine)
Protease Inhibitors (PI)
amprenavir (Agenerase, VX-478, 141W94)
nelfinavir (Viracept, AG-1343)
saquinavir (Fortovase, Invirase, Ro 31-8959)
indinavir (Crixivan, MK639, L-735,524)
ritonavir (Norvir, ABT-538)
Non-nucleoside Reverse Transcriptase Inhibitors (NnRTI)
delavirdine (Rescriptor, BHAP, U-90152)
nevirapine (Viramune, BI-RG-587)
Calanolide A


Table 1: FDA approved anti-HIV drugs clustered by structure using the QIS D2 system and compared with their mechanisms of action

Strongly Recommended
Column A: Sustiva (efavirenz, NA); Crixivan (indinavir, 11); Viracept (nelfinavir, 15); Norvir (ritonavir, 11) + Crixivan (indinavir, 11); Kaletra (NA); Norvir (ritonavir, 11) + Fortovase (NA)
Column B: Videx (didanosine, 17) + Epivir (lamivudine, 17); Videx (didanosine, 17) + Zerit (stavudine, 17); Epivir (lamivudine, 17) + Zerit (stavudine, 17); Videx (didanosine, 17) + Retrovir (zidovudine, 17); Epivir (lamivudine, 17) + Retrovir (zidovudine, 17)

Recommended as
Column A: Ziagen (abacavir, 2); Agenerase (amprenavir, 14); Rescriptor (delavirdine, 3); Viracept (nelfinavir, 15) + Fortovase (NA); Viramune (nevirapine, 1); Norvir (ritonavir, 11); Fortovase (NA)
Column B: Retrovir (zidovudine, 17) + HIVID (zalcitabine, 17)

Not recommended because of insufficient data
Column A: hydroxyurea in combo with ARVs; Norvir (ritonavir, 11) + Agenerase (amprenavir, 14); Norvir (ritonavir, 11) + Viracept (nelfinavir, 15); Viread (NA)

Not Recommended and should not be offered
Column A: Invirase (saquinavir, 11)
Column B: Zerit (stavudine, 17) + Retrovir (zidovudine, 17); HIVID (zalcitabine, 17) + Videx (didanosine, 17); HIVID (zalcitabine, 17) + Epivir (lamivudine, 17); HIVID (zalcitabine, 17) + Zerit (stavudine, 17)



Table 2: Anti-HIV cocktail recommendations. The numbers in parentheses are the cluster IDs of the single drugs shown in Table 1.
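As an illustration of how the single-drug cluster IDs in Table 2 could be combined to describe a cocktail, the following sketch maps each brand name to its cluster ID and represents a cocktail as the sorted tuple of its members' IDs. The cluster IDs are copied from Table 2; the tuple "signature" and the function names are assumptions made for this example and are not a description of how QIS D2 actually clusters cocktails.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Cluster IDs as listed in Table 2. Drugs shown there as "NA"
# (Sustiva, Kaletra, Fortovase, Viread) are deliberately left out.
CLUSTER_ID: Dict[str, int] = {
    "Videx": 17, "Epivir": 17, "Zerit": 17, "Retrovir": 17, "HIVID": 17,
    "Crixivan": 11, "Norvir": 11, "Invirase": 11,
    "Viracept": 15, "Agenerase": 14, "Rescriptor": 3,
    "Ziagen": 2, "Viramune": 1,
}


def cocktail_signature(drugs: Sequence[str]) -> Optional[Tuple[int, ...]]:
    """Sorted tuple of cluster IDs, or None if any drug has no cluster ID."""
    ids = []
    for drug in drugs:
        if drug not in CLUSTER_ID:
            return None  # at least one component is "NA" in Table 2
        ids.append(CLUSTER_ID[drug])
    return tuple(sorted(ids))


def group_by_signature(
    cocktails: List[Sequence[str]],
) -> Dict[Tuple[int, ...], List[Sequence[str]]]:
    """Group cocktails that share the same cluster-ID signature."""
    groups = defaultdict(list)
    for cocktail in cocktails:
        signature = cocktail_signature(cocktail)
        if signature is not None:
            groups[signature].append(cocktail)
    return dict(groups)


# One Column A entry combined with one Column B pair (hypothetical selection).
example = [
    ("Crixivan", "Videx", "Epivir"),
    ("Norvir", "Crixivan", "Videx", "Zerit"),
    ("Viracept", "Epivir", "Retrovir"),
    ("Sustiva", "Videx", "Epivir"),  # skipped: Sustiva has no cluster ID
]
groups = group_by_signature(example)
for signature, members in groups.items():
    print(signature, members)
```

Cocktails built from structurally similar components share a signature, so a recommendation status known for one member of a group can be compared against the others in the same group.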

Drug: NSC number
docetaxel (Taxotere): 628503



Table 3: Four anti-cancer drugs




Borisy AA, Elliott PJ, Hurst NW, Lee MS, Lehar J, Price ER, Serbedzija G, Zimmermann GR, Foley MA, Stockwell BR, Keith CT., (2003) Systematic discovery of multicomponent therapeutics. Proc Natl Acad Sci U S A. 100(13), 7977-82.

Berenbaum, M.C. (1981) Criteria for analyzing interactions between biologically active agents. Adv Cancer Res. 35:269–335.

Cavasotto, C.N. and Abagyan, R.A. (2004) Protein flexibility in ligand docking and virtual screening to protein kinases. J Mol Biol. 337(1), 209-25.

Dumais, S. T., Landauer, T. K., Deerwester, S. (1988) Using latent semantic analysis to improve information retrieval. In Proceedings of CHI'88: Conference on Human Factors in Computing. New York, ACM.

Hert J, Willett P, Wilton DJ, Acklin P, Azzaoui K, Jacoby E, Schuffenhauer A., (2004) Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures. J Chem Inf Comput Sci. 44(3), 1177-85.

Jones G, Willett P, Glen RC, Leach AR, Taylor R., (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol. 267, 727-48.

Kraemer O, Hazemann I, Podjarny AD, Klebe G., (2004) Virtual screening for inhibitors of human aldose reductase. Proteins, 55(4), 814-23.

Letsche, T.A. and Berry, M.W. (1997) Large-scale information retrieval with latent semantic indexing. Information Sciences 100(1-4), 105-137.

Miller, M.A. (2002) Chemical database techniques in drug discovery. Nature Reviews Drug Discovery 1, 220-227.

Martin M, Pienkowski T, Mackey J, Pawlicki M, Guastalla JP, Weaver C, et al, (2005) Adjuvant docetaxel for node-positive breast cancer. N Engl J Med. 352(22), 2302-13.

Nabholtz JM, Falkson C, Campos D, Szanto J, Martin M, Chan S, et al. (2003) Docetaxel and doxorubicin compared with doxorubicin and cyclophosphamide as first-line chemotherapy for metastatic breast cancer: results of a randomized, multicenter, phase III trial. J Clin Oncol. 21(6), 968-75.

National Research Council (2006) Overcoming Challenges to Develop Countermeasures against Aerosolized Bioterrorism Agents: Appropriate Use of Animal Models. Committee on Animal Models for Testing Interventions Against Aerosolized Bioterrorism Agents, Board on Life Sciences, Institute for Laboratory Animal Research, Division on Earth and Life Sciences, National Research Council. ISBN: 0-309-66093-9. Available at http://books.nap.edu/openbook.php?record_id=11640&page=59

Pang, Y.P., Perola, E., Xu, K., Prendergast, F.G., (2001) EUDOC: a computer program for identification of drug interaction sites in macromolecules and drug leads from chemical databases. Journal of Computational Chemistry, 22(15),1750 - 1771.

Perola, E., Walters, W.P. and Charifson, P.S. (2004) A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance. Proteins, 56(2), 235-49.

Perola E, Xu K, Kollmeyer TM, Kaufmann SH, Prendergast FG, Pang YP., (2000) Successful virtual screening of a chemical database for farnesyltransferase inhibitor leads. J Med Chem.,43(3), 401-8.

Rarey, M., Kramer, B., Lengauer, T., and Klebe, G. (1996) A fast flexible docking method using an incremental construction algorithm. J Mol Biol. 261, 470-489.

Schrödinger (2003) Glide, version 2.5. New York: Schrödinger.

Totrov M. (1997) Flexible protein-ligand docking by global energy optimization in internal coordinates. Proteins, 215-220.

Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet, 24(3), 236-44.

von Minckwitz, G., Raab G, Caputo A, Schütte M, Hilfrich J, et al. (2005) Doxorubicin with cyclophosphamide followed by docetaxel every 21 days compared with doxorubicin and docetaxel every 14 days as preoperative treatment in operable breast cancer: the GEPARDUO study of the German Breast Group. J Clin Oncol. 23(12), 2676-85.

Zhao, Y., Zhou, C. C., Oglesby, I., and Zhou, C. (2005a) Large-scale Drug Function Prediction by Integrating QIS D2 and Bio-SPICE. Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference - Workshops, Stanford, CA.

Zhao, Y. and Zhou, C. C. (2005b) Drug Characteristics Prediction. Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference - Workshops, Stanford, CA.

Zhao, Y., Wei, S., Oglesby, I. and Zhou, C. C. (2006) DARPA Scientific and Technical Final Report “Development of Predictive Algorithm for In Silico Drug Toxicity and Efficacy Assessment”, DI-MISC-80711A, Contract No. W31P4Q-04-C-R197.

Website: ATDN - http://www.atdn.org/simple/combo.html
Website: NIAID - http://www.niaid.nih.gov/daids/dtpdb/FDADRUG.asp
Website: NCI DTP - http://dtp.nci.nih.gov/docs/aids/aids_data.html