
Database diversity assessment: New ideas, concepts, and tools

Journal of Computer-Aided Molecular Design

Abstract

We present some new ideas for characterizing and comparing large chemical databases. Comparing the contents of large databases is not trivial, since it implies pairwise comparison of hundreds of thousands of compounds. We have developed methods for categorizing compounds into groups or series based on their ring-system content, using precalculated structure-based hashcodes. Two large databases can then be compared by simply comparing their hashcode tables. Furthermore, the number of distinct ring-system combinations can be used as an indicator of database diversity. We also present an independent technique for diversity assessment called the ‘saturation diversity’ approach. This method is based on picking as many mutually dissimilar compounds as possible from a database or a subset thereof. We show that both methods yield similar results. Since the two methods measure very different properties, this probably says more about the properties of the databases studied than about the methods.
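The two measures described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several details are assumptions: each compound is taken to be already reduced to a list of canonical ring-system strings (ring perception is assumed to happen elsewhere), the hashcode is a stand-in built with SHA-1 over the sorted strings rather than the paper's own structure-based hashcode, fingerprints are modelled as sets of feature bits compared by Tanimoto similarity, and the dissimilarity threshold is a free parameter. The saturation step is a simple greedy pass, so it yields a lower bound on the largest mutually dissimilar set, not necessarily the maximum.

```python
import hashlib
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Sequence


def ring_system_hashcode(ring_systems: Iterable[str]) -> str:
    """Stand-in hashcode for a compound's ring-system combination (assumption).

    Sorting makes the key independent of the order in which ring systems are
    listed; duplicates are kept, so two phenyl rings differ from one.
    Acyclic compounds (empty list) all share a single key.
    """
    key = "|".join(sorted(ring_systems))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


def hashcode_table(db: Sequence[Sequence[str]]) -> Counter:
    """Count how many compounds fall into each ring-system series."""
    return Counter(ring_system_hashcode(rs) for rs in db)


def compare_tables(a: Counter, b: Counter) -> Dict[str, int]:
    """Compare two databases by set operations on their hashcode keys."""
    return {
        "only_a": len(a.keys() - b.keys()),
        "only_b": len(b.keys() - a.keys()),
        "shared": len(a.keys() & b.keys()),
    }


def tanimoto(x: FrozenSet[int], y: FrozenSet[int]) -> float:
    """Tanimoto similarity of two bit sets; two empty sets count as identical."""
    union = len(x | y)
    return len(x & y) / union if union else 1.0


def saturation_pick(fps: Sequence[FrozenSet[int]], threshold: float) -> List[int]:
    """Greedy saturation: keep a compound only if it is dissimilar
    (similarity < threshold) to every compound kept so far.

    The result depends on input order and is a heuristic, not a maximum set.
    """
    picked: List[int] = []
    for i, fp in enumerate(fps):
        if all(tanimoto(fp, fps[j]) < threshold for j in picked):
            picked.append(i)
    return picked


if __name__ == "__main__":
    # Toy ring-system lists (hypothetical data, for illustration only).
    db_a = [["c1ccccc1"], ["c1ccccc1", "C1CCNCC1"], ["c1ccncc1"]]
    db_b = [["c1ccccc1"], ["c1ccc2ccccc2c1"]]
    ta, tb = hashcode_table(db_a), hashcode_table(db_b)
    print("distinct ring-system combinations in A:", len(ta))
    print(compare_tables(ta, tb))

    fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}),
           frozenset({7, 8}), frozenset({1, 2, 3})]
    print("saturation picks:", saturation_pick(fps, threshold=0.5))
```

With these toy inputs, database A has three distinct ring-system combinations, of which one (benzene alone) is shared with B. The number of saturation picks, here the compounds at indices 0 and 2, plays the role of a diversity count: the more compounds that can be kept before every remaining one is too similar to something already picked, the more diverse the set.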




About this article

Cite this article

Nilakantan, R., Bauman, N. & Haraki, K.S. Database diversity assessment: New ideas, concepts, and tools. J Comput Aided Mol Des 11, 447–452 (1997). https://doi.org/10.1023/A:1007937308615



  • DOI: https://doi.org/10.1023/A:1007937308615
