Article Info

Comparison of Similarity Method to Improve Retrieval Performance for Chemical Data

Suhaila Zainudin, Nevy Rahmi Nurjana
dx.doi.org/10.17576/apjitm-2018-0701-08

Abstract

Drug discovery is the process through which new drugs are identified. One of the most common techniques in drug discovery is similarity searching, a form of virtual screening that compares the structures of molecules in a chemical database using established similarity methods. This study has two objectives. The first is to measure the structural similarity within a chemical dataset using the Mean Pairwise Similarity (MPS) calculation. The second is to determine the best coefficient for similarity searching. The experiments use the ECFP2 fingerprint as the molecular descriptor, together with three similarity coefficients: Tanimoto, Soergel and Euclidean. The results show that the Tanimoto and Soergel coefficients perform better than the Euclidean coefficient. In future work, other combinations of fingerprints (such as Daylight, BCI, Unity and MDL) and similarity coefficients could be studied.
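The quantities named in the abstract can be sketched in Python. This sketch assumes that binary fingerprints are stored as sets of "on" bit positions. These sets are a hypothetical stand-in for folded ECFP2 bit vectors, and no fingerprint generation is shown. The coefficient formulas are the standard binary forms, written in terms of a (bits set in A), b (bits set in B) and c (bits set in both). Several details are assumptions, not the paper's stated method: the exact normalization of the Euclidean coefficient, whether MPS is computed with Tanimoto, and every function name below.

```python
import math
from itertools import combinations

# Hypothetical representation: a fingerprint is a frozenset of "on" bit positions,
# standing in for a folded ECFP2 bit vector.

def counts(fp_a, fp_b):
    """Return (a, b, c): bits set in A, bits set in B, bits set in both."""
    return len(fp_a), len(fp_b), len(fp_a & fp_b)

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity c / (a + b - c); assumed 0.0 when both fingerprints are empty."""
    a, b, c = counts(fp_a, fp_b)
    union = a + b - c
    return c / union if union else 0.0

def soergel_distance(fp_a, fp_b):
    """Soergel distance (a + b - 2c) / (a + b - c); for binary data this equals 1 - Tanimoto."""
    a, b, c = counts(fp_a, fp_b)
    union = a + b - c
    return (a + b - 2 * c) / union if union else 0.0

def euclidean_distance(fp_a, fp_b):
    """Unnormalized Euclidean distance sqrt(a + b - 2c) between two binary vectors."""
    a, b, c = counts(fp_a, fp_b)
    return math.sqrt(a + b - 2 * c)

def mean_pairwise_similarity(fps, sim=tanimoto):
    """Average similarity over all N(N-1)/2 distinct pairs in a set of fingerprints."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(sim(x, y) for x, y in pairs) / len(pairs)

def rank_database(query, database, score, higher_is_more_similar=True):
    """Return database indices ordered from most to least similar to the query.

    Similarities (Tanimoto) sort descending; distances (Soergel, Euclidean) sort ascending.
    """
    return sorted(
        range(len(database)),
        key=lambda i: score(query, database[i]),
        reverse=higher_is_more_similar,
    )

if __name__ == "__main__":
    # Toy data, invented for illustration only.
    actives = [frozenset({1, 4, 7, 9}), frozenset({1, 4, 8, 9}), frozenset({2, 4, 7, 9})]
    print("MPS (Tanimoto):", round(mean_pairwise_similarity(actives), 3))

    query = frozenset({1, 4, 7, 9})
    database = [frozenset({3, 5, 6}), frozenset({1, 4, 8, 9}), frozenset({1, 4, 7, 9, 10})]
    print("Tanimoto ranking:", rank_database(query, database, tanimoto))
    print("Soergel ranking:", rank_database(query, database, soergel_distance, False))
    print("Euclidean ranking:", rank_database(query, database, euclidean_distance, False))
```

On these toy fingerprints the MPS is 0.511, and all three coefficients rank the database as `[2, 1, 0]`. On binary data, Soergel is the complement of Tanimoto, so those two always give identical rankings. Euclidean depends only on the number of mismatched bits (a + b - 2c) and ignores how many bits the pair shares. That difference is one plausible reason the two groups of coefficients can behave differently on real data.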

Keywords

mean pairwise similarity; virtual screening; similarity searching; retrieval; chemoinformatics

Area

Data Mining and Optimization