Introduction

The design of drug molecules possessing desired physical, chemical and biological properties is a challenging problem in the pharmaceutical industry. Structure-activity relationships (SARs) are models that attempt to relate certain structural aspects of molecules to their physicochemical, biological or toxicological properties [1]. The ideal goal of SAR research is to predict the behavior of chemical species from a minimal set of input data [2]. In recent years, nonempirical graph theoretical parameters have been used in SAR studies for predicting chemical behavior [1, 3–8]. These graph invariants are usually a single number or a vector, which can be used to characterize and order molecules and to predict their properties [9]. A graph invariant is a graph theoretical property that has the same value for isomorphic graphs [10]. A single real-number graph invariant characterizing a molecular graph is usually called a topological index (TI).

A graph G = (V, E) is an ordered pair of two sets V and E, the former being a nonempty set of vertices and the latter a set of unordered pairs of elements of V. When the elements of V represent the atoms of a molecule and the elements of E symbolize covalent bonds between pairs of atoms, G becomes a molecular graph (or constitutional graph) [11]. Topological indices are numerical indices derived from the molecular graph that represents the structural formula of a molecule, and they may be topostructural or topochemical [12]. Topostructural descriptors quantify information strictly about the adjacency and topological distance between atoms within a molecular structure, while topochemical descriptors encode information about both the molecular topology and the chemical nature of the atoms and bonds within a molecule.

Although a number of topological indices have been reported in the literature, only a handful of them have been employed successfully in SAR studies. Hosoya’s index [12, 13], Randic’s molecular connectivity index (\(\chi\)) [3, 14], the higher-order connectivity indices (\({}^{n}\chi\)) for paths of length n defined by Kier and Hall [7], the hydrogen E-state index [15, 16], Balaban’s index (J) [17–20], Wiener’s index [21–23], the Zagreb group parameters \(M_{1}\) and \(M_{2}\) [24, 25], and the eccentric connectivity index [26–32] are some of the topological indices frequently employed in SAR studies.

The importance of cyclin-dependent kinases (CDKs) to the process of cell division has stimulated interest in them as potential targets for the treatment of proliferative diseases such as cancer, psoriasis and restenosis [33–35] and for the prevention of chemotherapy-associated side effects such as alopecia [36]. The CDKs are serine/threonine protein kinases that become active when they associate with their respective cyclin subunits. Cyclins are so called because of their characteristic pattern of appearance and disappearance during the cell division cycle [37]. The CDKs consist of a catalytic subunit (CDK1–CDK8) and a regulatory subunit (cyclin A–cyclin H). These proteins are regulated in several ways: subunit production, complex formation, (de)phosphorylation, cellular localization and interaction with various natural protein inhibitors [38]. More recently, however, it has become clear that CDKs are involved in many other cellular processes, including regulation of transcription, differentiation, cell death, neuronal functions, neurodegeneration and exocytosis [39–43].

G1 is the phase of the cell cycle wherein the cell is responsive to growth factor-dependent signals. As such, G1 regulation is frequently disrupted in cancer through deregulation of cyclin/CDK activity. Deregulation of the G1 phase provides tumorigenic cells with a growth advantage. Cyclin E, the regulatory cyclin for CDK2, is considered a requisite regulator of G1 progression. Cyclin E is overexpressed in cancer, suggesting that cyclin E/CDK2 deregulation contributes to tumorigenesis [44]. CDK2 activity is required for progression through G1 to the S phase of the cell cycle, and CDK2 is one of the key components of the G1 checkpoint. Checkpoints serve to maintain the proper sequence of cell cycle events and allow the cell to respond to insults or to proliferative signals, while the loss of proper checkpoint control in cancer cells contributes to tumorigenesis [45, 46]. In preclinical studies, CDK inhibitors have shown the ability not only to block neoplastic cell proliferation but also to induce programmed cell death through a variety of mechanisms. The latter capacity may stem from the diverse effects that CDK inhibitors exert on multiple kinases and apoptotic regulatory molecules. In addition, there is abundant preclinical evidence that CDK inhibitors can potentiate, generally in a dose-dependent and sequence-dependent manner, the anti-tumor effects of many established cytotoxic agents [47]. These observations make CDK2 and its regulatory pathways compelling targets for the development of novel chemotherapeutic agents.

Inhibition of CDKs, as regulating enzymes within the cell cycle, results in antiproliferative effects, which has made them an interesting target for the development of novel small-sized cytostatics for combined cytostatic therapies [48–50]. Flavopiridol, the first inhibitor of CDKs, which are among the most important explored targets in cancer therapy, is presently undergoing phase II clinical trials [51, 52]. The present CDK inhibitors are either nonselective or show inhibition profiles toward groups of CDK subtypes such as CDK1, CDK2 and CDK5, or CDK4 and CDK6 [53]. Despite intense efforts, no specific CDK inhibitor has been discovered so far [52].

In the present study, the relationship of Wiener’s topochemical index (a distance-based topochemical descriptor), the molecular connectivity topochemical index (an adjacency-based topochemical descriptor) and the eccentric connectivity topochemical index (an adjacency-cum-distance-based topochemical descriptor) with the CDK2 inhibitory activity of indole-2-ones has been investigated.

Methodology

Calculations of topological indices

Wiener’s topochemical index (\(W_{c}\)) is a modified form of the oldest and most widely used distance-based TI, Wiener’s index [21–23]; the modified index takes into consideration the presence as well as the relative position of heteroatoms in a molecular structure. Various modifications of Wiener’s index have been reported, including the hyper-Wiener index [54], the new hyper-Wiener index [55] and Wiener’s topochemical index [56]. \(W_{c}\) is used to overcome the problem of degeneracy. It is defined as the sum of the chemical distances between all pairs of vertices in the hydrogen-suppressed molecular graph, i.e.

$$ W_{c} = \frac{1}{2} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{n} P_{i_{c} j_{c}} $$
(1)

where \(P_{i_{c} j_{c}}\) is the chemical length of the path that contains the least number of edges between vertices i and j in the graph G, and n is the number of vertices in G, so that i and j each run from 1 to n.
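Equation (1) can be illustrated with a minimal Python sketch. The text does not spell out how the chemical distance matrix is built, so the sketch assumes, purely for illustration, that each row of the ordinary topological distance matrix is scaled by the atomic weight of that row's atom relative to carbon. The names `RELATIVE_WEIGHT`, `topological_distances` and `wiener_topochemical` are hypothetical, and the relative weights are computed from standard atomic weights rather than taken from the cited work.

```python
from collections import deque

# Atomic weights relative to carbon (12.011), from standard atomic weights.
# ASSUMPTION: the cited work may use slightly different values.
RELATIVE_WEIGHT = {"C": 1.0, "N": 14.007 / 12.011, "O": 15.999 / 12.011}


def topological_distances(adjacency):
    """All-pairs shortest path lengths (edge counts) by breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            vertex, d = queue.popleft()
            dist[source][vertex] = d
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, d + 1))
    return dist


def wiener_topochemical(adjacency, atoms):
    """Half the sum of all entries of the (assumed) chemical distance matrix."""
    dist = topological_distances(adjacency)
    total = 0.0
    for i, row in enumerate(dist):
        # ASSUMPTION: row i is scaled by the relative atomic weight of atom i.
        total += RELATIVE_WEIGHT[atoms[i]] * sum(row)
    return total / 2.0


# Hydrogen-suppressed n-butane, C0-C1-C2-C3, as an adjacency list.
butane = [[1], [0, 2], [1, 3], [2]]
print(wiener_topochemical(butane, ["C", "C", "C", "C"]))  # 10.0
```

For a graph containing only carbon atoms the sketch reduces to the ordinary Wiener index, which is 10 for n-butane.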

Molecular connectivity topochemical index (\(\chi^{A}\)) is a modified form of one of the most widely used adjacency-based TIs, the molecular connectivity index [3, 14], and it takes into consideration the presence as well as the relative position of heteroatom(s) in a molecular structure. The molecular connectivity topochemical index is reported in the literature as the atomic molecular connectivity index [57]. The authors now feel that the atomic molecular connectivity index should be renamed the molecular connectivity topochemical index, on similar grounds as \(W_{c}\) [56], the Zagreb topochemical index (\(M^{c}_{1}\)) [58] and the eccentric connectivity topochemical index (\(\xi^{c}_{c}\)) [59], for the sake of simplicity and to avoid any kind of confusion.

The molecular connectivity topochemical index, or atomic molecular connectivity index, is denoted by \(\chi^{A}\) and is expressed as

$$ \chi^{A} = \sum\limits_{i = 1}^{n} \left( V^{c}_{i} V^{c}_{j} \right)^{-1/2} $$
(2)

where n is the number of vertices, and \(V^{c}_{i}\) and \(V^{c}_{j}\) are the modified degrees of the adjacent vertices i and j forming the edge {i, j} in graph G. The modified degree of a vertex is obtained from the adjacency matrix by substituting the row elements corresponding to a heteroatom with its atomic weight relative to that of a carbon atom [57].
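The calculation in Eq. (2) can be sketched in Python as follows. The sketch assumes that each edge of the graph contributes exactly one term, as in the original molecular connectivity index, and that the modified degree of an atom is the row sum of the adjacency matrix after the nonzero entries of that row are replaced by the atom's relative atomic weight. The names `RELATIVE_WEIGHT`, `modified_degrees` and `connectivity_topochemical` are hypothetical, and the weights are derived from standard atomic weights.

```python
import math

# Atomic weights relative to carbon (12.011), from standard atomic weights.
# ASSUMPTION: the cited work may use slightly different values.
RELATIVE_WEIGHT = {"C": 1.0, "N": 14.007 / 12.011, "O": 15.999 / 12.011}


def modified_degrees(adjacency, atoms):
    """Row sums of the adjacency matrix with heteroatom rows reweighted."""
    return [RELATIVE_WEIGHT[atoms[i]] * len(neighbours)
            for i, neighbours in enumerate(adjacency)]


def connectivity_topochemical(adjacency, atoms):
    """Sum of (V_i * V_j) ** -0.5 over every edge {i, j} of the graph."""
    degree = modified_degrees(adjacency, atoms)
    total = 0.0
    for i, neighbours in enumerate(adjacency):
        for j in neighbours:
            if i < j:  # visit each undirected edge once
                total += 1.0 / math.sqrt(degree[i] * degree[j])
    return total


# Hydrogen-suppressed n-butane, C0-C1-C2-C3, as an adjacency list.
butane = [[1], [0, 2], [1, 3], [2]]
print(round(connectivity_topochemical(butane, ["C"] * 4), 4))  # 1.9142
```

With carbon atoms only, the modified degrees equal the ordinary vertex degrees, so the value for n-butane matches the familiar molecular connectivity index.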

Eccentric connectivity topochemical index is denoted by \(\xi^{c}_{c}\) and is defined as the summation of the product of the chemical eccentricity and the chemical degree of each vertex in the hydrogen-suppressed molecular graph having n vertices, that is

$$ \xi^{c}_{c} = \sum\limits_{i = 1}^{n} \left( E_{ic} \times V_{ic} \right) $$
(3)

where \(V_{ic}\) is the chemical degree of vertex i, \(E_{ic}\) is the chemical eccentricity of vertex i and n is the number of vertices in graph G [59]. The eccentric connectivity topochemical index is a modified form of an adjacency-cum-distance-based TI, the eccentric connectivity index [26–32], and this modified index takes into consideration the presence as well as the relative position of heteroatom(s) in a molecular structure.
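A minimal Python sketch of Eq. (3) is given below. It assumes, for illustration only, that the chemical eccentricity of a vertex is its largest topological distance scaled by the relative atomic weight of that vertex's atom, and that the chemical degree is the number of neighbours scaled in the same way. The names `RELATIVE_WEIGHT`, `bfs_distances` and `eccentric_connectivity_topochemical` are hypothetical.

```python
from collections import deque

# Atomic weights relative to carbon (12.011), from standard atomic weights.
# ASSUMPTION: the cited work may use slightly different values.
RELATIVE_WEIGHT = {"C": 1.0, "N": 14.007 / 12.011, "O": 15.999 / 12.011}


def bfs_distances(adjacency, source):
    """Shortest path lengths (edge counts) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour not in dist:
                dist[neighbour] = dist[vertex] + 1
                queue.append(neighbour)
    return dist


def eccentric_connectivity_topochemical(adjacency, atoms):
    """Sum over vertices of chemical eccentricity times chemical degree."""
    total = 0.0
    for i, neighbours in enumerate(adjacency):
        weight = RELATIVE_WEIGHT[atoms[i]]
        # ASSUMPTION: both quantities are scaled by the atom's relative weight.
        eccentricity = weight * max(bfs_distances(adjacency, i).values())
        degree = weight * len(neighbours)
        total += eccentricity * degree
    return total


# Hydrogen-suppressed n-butane, C0-C1-C2-C3, as an adjacency list.
butane = [[1], [0, 2], [1, 3], [2]]
print(eccentric_connectivity_topochemical(butane, ["C"] * 4))  # 14.0
```

For an all-carbon graph the sketch reduces to the ordinary eccentric connectivity index, which is 3·1 + 2·2 + 2·2 + 3·1 = 14 for n-butane.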

Model development

A data set [60] comprising 67 indole-2-ones based upon the basic structure depicted in Fig. 1 was selected for the present investigations. The data set comprised both active and inactive compounds. The values of \(W_{c}\) were computed for each analogue in the data set using an in-house computer program. For the selection and evaluation of range-specific features, exclusive activity ranges were identified from the frequency distribution of the response level, and the active range was subsequently identified by analyzing the resultant data through maximization of the moving average with respect to the active compounds (<35% = inactive, 35–65% = transitional, ≥65% = active) [61]. Each analogue was then assigned a biological activity, which was compared with the reported CDK2 inhibitory activity. The CDK2 inhibitory activity was reported quantitatively as IC50 at different concentrations. Analogues possessing IC50 values of <5 nM were considered to be active and analogues possessing IC50 values of ≥5 nM were considered to be inactive for the purposes of the present study.

Fig. 1 Basic structure of indole-2-ones

The percentage degree of prediction of a particular range was derived from the ratio of the number of compounds predicted correctly to the total number of compounds present in that range. The overall degree of prediction was derived from the ratio of the total number of compounds predicted correctly to the total number of compounds present in both the active and inactive ranges.
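The two ratios described above can be written as a short Python sketch. The function names are hypothetical. The example counts are those reported for the Wiener's topochemical index model in the Results section (40 of 46 correct in the inactive ranges, 48 of 54 overall); the active-range count of 8 of 8 is inferred here as the difference between those totals, since all analogues in the active range were reported as active.

```python
def degree_of_prediction(correct, total):
    """Percentage of compounds in a range whose activity was predicted correctly."""
    if total <= 0:
        raise ValueError("a range must contain at least one compound")
    return 100.0 * correct / total


def overall_degree_of_prediction(counts):
    """Pooled percentage over the active and inactive ranges.

    counts maps a range name to (correct, total); compounds falling in the
    transitional range are left out by the caller.
    """
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return degree_of_prediction(correct, total)


# Counts for the W_c model; the active-range pair is inferred (see lead-in).
wc_counts = {"active": (8, 8), "inactive": (40, 46)}
for name, (correct, total) in wc_counts.items():
    print(name, round(degree_of_prediction(correct, total), 2))
print("overall", round(overall_degree_of_prediction(wc_counts), 2))  # overall 88.89
```

Compounds in the transitional range are not counted in either ratio, which is why the pooled total is 54 rather than 67.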

The aforementioned procedure was followed for \(\chi^{A}\) and \(\xi^{c}_{c}\). The results are summarized in Tables 1 and 2.

Table 1 Relationship of Wiener’s topochemical index, molecular connectivity topochemical index and eccentric connectivity topochemical index with CDK2 inhibitory activity
Table 2 Proposed models for the prediction of CDK2 inhibitory activity

Results and discussion

Efficient discovery and creation of novel drug molecules depend on the ability to explore and quantify the relationships between molecular structure and function—particularly the biological activity. The problem in the development of a suitable correlation between chemical structures and properties can be attributed to the nonquantitative nature of chemical structures. Graph theory was successfully employed through the translation of chemical structures into characteristic numerical descriptors by resorting to graph invariants [62, 63]. Topological descriptors are such numerical graph invariants, which quantify the chemical structures so as to facilitate the development of suitable correlations with quantified biological activities.

The importance of CDKs in cell-cycle regulation, their interaction with oncogenes and tumor suppressors, and their frequent deregulation in human tumors, has encouraged an active search for agents capable of perturbing the function of CDKs [64]. The potential use of these inhibitors is being extensively evaluated not only for cancer chemotherapy but also in restenosis, psoriasis, tumoral angiogenesis, atherosclerosis, glomerulonephritis, Alzheimer’s disease and viral infections.

In the present investigations, \(W_{c}\) (a distance-based topochemical descriptor), \(\chi^{A}\) (an adjacency-based topochemical descriptor) and \(\xi^{c}_{c}\) (an adjacency-cum-distance-based topochemical descriptor) have been employed to study the relationship with the CDK2 inhibitory activity of indole-2-one derivatives. The selected data set of 67 analogues included both active and inactive compounds.

Retrofit analysis of the data in Tables 1 and 2 reveals the following information with regard to the model based upon \(W_{c}\):

  • A total of 48 out of 54 compounds were classified correctly in both the active and inactive ranges. The overall accuracy of prediction was found to be 88.89% with regard to CDK2 inhibitory activity.

  • The active range had \(W_{c}\) values of 1940.251–2029.979. All the analogues in the active range exhibited CDK2 inhibitory activity.

  • Two inactive ranges were observed: a lower inactive range with index values of <1940.251 and an upper inactive range with index values of >2795.718. The activity of 40 out of 46 compounds in these inactive ranges was predicted correctly.

  • A transitional range with \(W_{c}\) values varying from >2029.979 to 2795.718 was observed, indicating a gradual transition from the active to the upper inactive range and vice versa.

  • The average IC50 value of correctly predicted compounds in the active range was found to be only 2.26 nM. This clearly indicates high potency of the active range.

Retrofit analysis of the data in Tables 1 and 2 reveals the following information with regard to the model based upon \(\chi^{A}\):

  • A total of 52 out of 59 compounds were classified correctly in both the active and inactive ranges using the model based upon \(\chi^{A}\). The overall accuracy of prediction was found to be 88.14% with regard to CDK2 inhibitory activity.

  • The active range had \(\chi^{A}\) values of 10.606–11.260. Nine out of 11 analogues in the active range exhibited CDK2 inhibitory activity.

  • A transitional range with \(\chi^{A}\) values varying from >11.260 to 12.352 was observed, indicating a gradual transition from the active to the upper inactive range and vice versa.

  • The average IC50 value was found to be 2.28 nM for correctly predicted compounds in the active range. This clearly indicates high potency of the active range.

  • Two inactive ranges were observed: a lower inactive range with index values of <10.606 and an upper inactive range with index values of >12.352. The activity of 43 out of 48 compounds in these inactive ranges was predicted correctly.

Retrofit analysis of the data in Tables 1 and 2 reveals the following information with regard to the model based upon \(\xi^{c}_{c}\):

  • A total of 44 out of 49 compounds were classified correctly in both the active and inactive ranges. The overall accuracy of prediction was found to be 89.80% with regard to CDK2 inhibitory activity.

  • The active range had \(\xi^{c}_{c}\) values of 864.354–907.513. All five analogues in the active range exhibited CDK2 inhibitory activity.

  • A transitional range with \(\xi^{c}_{c}\) values varying from 767.648 to <864.354 was observed, indicating a gradual transition from the active to the lower inactive range and vice versa.

  • The average IC50 value was found to be 2.18 nM for correctly predicted compounds in the active range. This clearly indicates high potency of the active range.

  • Two inactive ranges were observed: a lower inactive range with index values of <767.648 and an upper inactive range with index values of >907.513. The activity of 39 out of 44 compounds in these inactive ranges was predicted correctly.
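The range-based models listed above can be expressed as a small lookup, sketched here in Python. The breakpoints are the values quoted in the three lists. The upper bound of the active range for the molecular connectivity topochemical index is taken as 11.260, on the assumption that it coincides with the lower bound of the transitional range. The names `MODELS` and `classify` and the model keys are hypothetical.

```python
INF = float("inf")

# Each model is an ascending list of (upper bound, bound is inclusive, label).
# A value is assigned to the first range whose upper bound it does not exceed.
MODELS = {
    "W_c": [
        (1940.251, False, "inactive"),      # value < 1940.251
        (2029.979, True, "active"),         # 1940.251 to 2029.979
        (2795.718, True, "transitional"),   # > 2029.979 to 2795.718
        (INF, True, "inactive"),            # value > 2795.718
    ],
    "chi_A": [
        (10.606, False, "inactive"),        # value < 10.606
        (11.260, True, "active"),           # ASSUMED upper bound, see lead-in
        (12.352, True, "transitional"),     # > 11.260 to 12.352
        (INF, True, "inactive"),            # value > 12.352
    ],
    "xi_cc": [
        (767.648, False, "inactive"),       # value < 767.648
        (864.354, False, "transitional"),   # 767.648 to < 864.354
        (907.513, True, "active"),          # 864.354 to 907.513
        (INF, True, "inactive"),            # value > 907.513
    ],
}


def classify(model, value):
    """Return the predicted activity label for an index value."""
    for upper, inclusive, label in MODELS[model]:
        if value < upper or (inclusive and value == upper):
            return label
    raise ValueError("index value is not a finite number")


print(classify("W_c", 2000.0))    # active
print(classify("xi_cc", 800.0))   # transitional
```

Keeping the breakpoints in a table rather than in nested conditionals makes the differing order of the ranges visible: the transitional range lies above the active range for the first two indices but below it for the eccentric connectivity topochemical index.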

Conclusion

Investigations reveal significant correlations of all three topochemical indices with the CDK2 inhibitory activity of indole-2-one derivatives. The overall accuracy of prediction varied from a minimum of 88% for the model based on \(\chi^{A}\) to a maximum of ~90% for the model based on \(\xi^{c}_{c}\). The high predictability of the proposed models based upon the topochemical indices offers a vast potential for providing lead structures for the development of potent CDK2 inhibitory agents.