Abstract
The main theme of this paper is to take inspiration from methods used in computer science and related disciplines and to apply them to develop improved biotechnology. In particular, our proposed improvements adapt various information-theoretic coding techniques that originate in computational and information processing disciplines, re-tailored to work in the biotechnology context. (a) We apply Error-Correcting Codes, developed to correct transmission errors in electronic media, to decrease (in certain contexts, optimally) error rates in optically-addressed DNA synthesis (e.g., of DNA chips). (b) We apply Vector-Quantization (VQ) Coding techniques (which were previously used to cluster, quantize, and compress data such as speech and images) to improve I/O rates (in certain contexts, optimally) for transformation of electronic data to and from DNA with bounded error. (c) We also apply VQ Coding techniques, some of which hierarchically cluster the data, to improve associative search in DNA databases by reducing the problem to that of exact affinity separation. These improvements in biotechnology appear to have some general applicability beyond biomolecular computing.
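As a toy illustration of the kind of error-correcting code referred to in (a), the sketch below implements the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit block. It is not the construction used in this paper: the function names are hypothetical, and how code bits would map onto steps of optically-addressed synthesis is not modeled here.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position
    # of the flipped bit (0 means no error was detected).
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    sent = hamming74_encode(word)
    received = list(sent)
    received[4] ^= 1  # simulate a single error
    print(hamming74_decode(received) == word)
```

The redundancy here is three parity bits per four data bits; codes with other rate and distance trade-offs follow the same encode, corrupt, decode pattern.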
As a motivating example, this paper improves biotechnology methods for associative search in DNA databases. Baum [B95] previously proposed the use of biotechnology affinity methods (DNA annealing) to do massively parallel associative search in large databases encoded as DNA strands, but left many issues undeveloped. Using in part our improved biotechnology techniques based on Error-Correction and VQ Coding, we develop detailed procedures for the following tasks:
(i) The database may initially be in conventional (electronic, magnetic, or optical) media, rather than in the form of DNA strands. For input and output (I/O) to and from conventional media, we apply DNA chip technology improved by Error-Correction and VQ Coding methods for error-correction and compression.
(ii) The query may not be an exact match, or even a partial match, with any data in the database. Since DNA annealing affinity methods work best for exact matches, we apply various VQ Coding methods to reduce the associative search to exact matching.
(iii) We also briefly discuss how to extend associative search queries in DNA databases to more sophisticated hybrid queries that also include Boolean formula conditionals with a bounded number of Boolean variables, by combining our methods for DNA associative search with known BMC methods for solving small SAT problems. For example, these extended queries could be executed on natural DNA strands (e.g., from blood or other body tissues) which are appended with DNA words encoding binary information about each strand. The appended information could consist of the social security number of the person whose DNA was sampled, cell type, the date, further medical data, etc.
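The reduction in (ii) from approximate to exact matching can be sketched in conventional software as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the codebook, the record vectors, and all function names are hypothetical, and the dictionary lookup merely stands in for exact affinity separation on DNA words.

```python
def nearest_codeword(vec, codebook):
    """Return the index of the codebook vector closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def build_index(records, codebook):
    """Map each codeword index to the ids of the records quantized to it."""
    index = {}
    for record_id, vec in records.items():
        index.setdefault(nearest_codeword(vec, codebook), []).append(record_id)
    return index


def associative_lookup(query, codebook, index):
    """Quantize the query, then do an exact-match lookup on its codeword index."""
    return index.get(nearest_codeword(query, codebook), [])


if __name__ == "__main__":
    # Hypothetical 2-D codebook and database, chosen only for illustration.
    codebook = [(0, 0), (10, 10)]
    records = {"a": (1, 0), "b": (9, 11), "c": (0, 2)}
    index = build_index(records, codebook)
    print(associative_lookup((8, 9), codebook, index))
```

Because both the stored records and the query are mapped to codeword indices, a query that matches no record exactly still retrieves every record in the same cluster through a single exact comparison.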
A postscript version of this paper is at URL http://www.cs.duke.edu/~reif/paper/Error-Restore/Error-Restore.ps.
Supported by Grants NSF/DARPA CCR-9725021, CCR-96-33567, NSF IRI-9619647 and EIA-0086015, ARO contract DAAH-04-96-1-0448, and ONR contract N00014-99-1-0406.
References
Adleman, L.M., “Molecular Computation of Solutions to Combinatorial Problems”, Science, 266, 1021, (1994).
Adleman, L.M., P.W.K. Rothemund, S. Roweis, E. Winfree, “On Applying Molecular Computation To The Data Encryption Standard”, 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton, June, 1996
Bach, E., A. Condon, E. Glaser, and C. Tanguay, “Improved Models and Algorithms for DNA Computation”, Proc. 11th Annual IEEE Conference on Computational Complexity, J. Computer and System Sciences, to appear.
Barnes, W.M., “PCR amplification of up to 35-kb DNA with high fidelity and high yield from bacteriophage templates”, Proc. Natl. Acad. Sci., 91, 2216–2220, (1994).
Baum, E. B., “How to build an associative memory vastly larger than the brain”, Science, pp 583–585, April 28, 1995.
Baum, E. B., “DNA Sequences Useful for Computation”, 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton University, June 1996.
Beaucage, S.L., and Caruthers, M.H. (1981). “Deoxynucleoside phosphoramidites— A new class of key intermediates for deoxypolynucleotide synthesis”, Tetrahedron Lett. 22,1859–1862.
E. R. Berlekamp, “Algebraic Coding Theory”, McGraw-Hill Book Company, NY (1968).
Blanchard, A. P., R. J. Kaiser and L. E. Hood, “High-density oligonucleotide arrays”, Biosens. Bioelec., Vol. 11, 687–690, (1996).
Boneh, D., C. Dunworth, R. Lipton, “Breaking DES Using a Molecular Computer”, Princeton CS Tech-Report number CS-TR-489-95, (1995).
Boneh, D., and R. Lipton, “Making DNA Computers Error Resistant”, Princeton CS Tech-Report CS-TR-491-95, Also in 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton University, June 1996.
Boneh, D., and R. Lipton, “A Divide and conquer approach to DNA sequencing”, Princeton University, 1996.
N. E. Broude, T. Sano, C. L. Smith, and C. R. Cantor, “Enhanced DNA Sequencing by hybridization”, Proc. Natl. Acad. Sci., Vol. 91, pp. 3071–3076, (April, 1994).
Cai, W., A. Condon, R.M. Corn, Z. Fei, T. Frutos, E. Glaser, Z. Guo, M.G. Lagally, Q. Liu, L.M. Smith, and A. Thiel, “The Power of Surface-Based Computation”, Proc. First International Conference on Computational Molecular Biology (RECOMB97), January, 1997.
Cai, W., E. Rudkevich, Z. Fei, A. Condon, R. Corn, L.M. Smith, M.G. Lagally, “Influence of Surface Morphology in Surface-Based DNA Computing”, Submitted to the 43rd AVS National Symposium, Abstract No. BI+MM-MoM10, (1996).
Chee, M., R. Yang, E. Hubbell, A. Berno, X. C. Huang, D. Stern, J. Winkler, D. J. Lockhart, M. S. Morris and S. P. A. Fodor, “Accessing genetic information with high-density DNA arrays”, Science, Vol. 274, 610–614, (1996).
Chen, J., and D. Wood, “A New DNA Separation Technique with Low Error Rate”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Clelland, C.T., Risca, V., and C. Bancroft. “Genomic Steganography: Amplifiable Microdots”. To appear in Nature, 1999.
Cover, T. M. and J. A. Thomas, “Elements of Information Theory”, John Wiley, New York, NY, (1991).
Deaton, R., R.C. Murphy, M. Garzon, D.R. Franceschetti, and S.E. Stevens, Jr., “Good encodings for DNA-based solutions to combinatorial problems”, Proceedings of the 2nd Annual DIMACS Meeting on DNA Based Computers, June 1996.
Deaton, R., R.C. Murphy, M. Garzon, D.R. Franceschetti, and S.E. Stevens, Jr., “Reliability and efficiency of a DNA-based computation”, Phys. Rev. Lett. 80, 417–420 (1998).
Deaton, R., R.C. Murphy, J.A. Rose, M. Garzon, D.R. Franceschetti, and S.E. Stevens, Jr., “A DNA Based Implementation of an Evolutionary Search for Good Encodings for DNA Computation”, ICEC’97 Special Session on DNA Based Computation, Indiana, April, 1997.
Deputat, M., G. Hajduczok, E. Schmitt, “On Error-Correcting Structures Derived from DNA”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Drmanac, R., S. Drmanac, Z. Strezoska, T. Paunesku, I. Labat, M. Zeremski, J. Snoddy, W. K. Funkhouser, B. Koop, L. Hood, and R. Crkenjakov, “DNA Sequence Determination by Hybridization: A Strategy for Efficient Large-Scale Sequencing”, Science, 260, 1649–1652, (1993).
Fodor, S. P. A., J. L. Read, C. Pirrung, L. Stryer, A. T. Lu and D. Solas, “Light-directed spatially addressable parallel chemical synthesis”, Science, Vol. 251, 767–773, (1991).
Frutos, A.G., A.J. Thiel, A.E. Condon, L.M. Smith, R.M. Corn, “DNA Computing at Surfaces: 4 Base Mismatch Word Design”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Garzon, M., R. Deaton, P. Neathery, R.C. Murphy, D.R. Franceschetti, S.E. Stevens Jr., “On the Encoding Problem for DNA Computing”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Gehani, A., T. H. LaBean, and J.H. Reif, “DNA-based Cryptography”, 5th DIMACS Workshop on DNA Based Computers, MIT, June, 1999. DNA Based Computers, V, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. E. Winfree), American Mathematical Society, 2000. http://www.cs.duke.edu/~reif/paper/DNAcypt/crypt.ps
Gehani, A. and J. Reif, “Micro flow bio-molecular computation”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999). Also, special issue of Biosystems, Vol. 52, Nos. 1–3, (ed. by L. Kari, H. Rubin, and D. H. Wood), pp 197–216, (1999). http://www.cs.duke.edu/~reif/paper/geha/microflow.ps.
Gersho, A., R. Gallager, and R. M. Gray, “Vector Quantization and Signal Compression”, Kluwer Academic Publishers, (1991).
Gray, J. M., T. G. Frutos, A.M. Berman, A.E. Condon, M.G. Lagally, L.M. Smith, R.M. Corn, “Reducing Errors in DNA Computing by Appropriate Word Design”, University of Wisconsin, Department of Chemistry, October 9, 1996.
Gray, R. M., “Source Coding Theory”, Kluwer Academic Publishers, Boston, (1990).
Grumbach, S., and F. Tahi, “Compression of DNA Sequences”, Proceedings of the IEEE Data Compression Conference (DCC’94), Snowbird, UT, 72–82, March 1994.
Hamming, R. W., “Error detecting and error correcting codes”, Bell System Technical Journal, Vol. 29, 147–160, (1950).
Hartemink, A., David Gifford, J. Khodor, “Automated constraint-based nucleotide sequence selection for DNA computation”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999).
Hartemink, A.J., D.K. Gifford, “Thermodynamic Simulation of Deoxy-oligonucleotide Hybridization for DNA Computation”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Jain, A. K. and R. C. Dubes, “Algorithms for clustering data,” Prentice Hall, Englewood Cliffs, N.J., (1988).
Khodor, J., and David K. Gifford, “The Efficiency of Sequence-Specific Separation of DNA Mixtures for Biological Computing”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Khodor, J., D. Gifford, “Design and implementation of computational systems based on programmed mutagenesis”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999).
Lipschutz, R.J., Fodor, P.A., Gingeras, T.R., and Lockhart, D.J. Nature Genetics Supplement, vol 21, pp 20–24 (1999).
LaBean, T. H., Yan, H., Kopatsch, J., Liu, F., Winfree, E., Reif, J.H. and Seeman, N.C., “The construction, analysis, ligation and self-assembly of DNA triple crossover complexes”, J. Am. Chem. Soc. 122, 1848–1860 (2000). http://www.cs.duke.edu/~reif/paper/DNAtiling/tilings/JACS.pdf
LaBean, T. H., E. Winfree, J. H. Reif, “Experimental Progress in Computation by Self-Assembly of DNA Tilings”, 5th International Meeting on DNA Based Computers (DNA5), MIT, Cambridge, MA, (June, 1999). To appear in DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. E. Winfree, American Mathematical Society, 2000. http://www.cs.duke.edu/~thl/tilings/labean.ps
Landweber, L.F. and R. Lipton, “DNA 2 DNA Computations: A Potential ‘Killer App’?”, 3rd Annual DIMACS Meeting on DNA Based Computers, University of Pennsylvania, (June 1997).
van Lint, J. H., “Coding Theory”, Lecture Notes in Mathematics, Springer Verlag, NY, (1971).
Lipton, R.J., “DNA Solution of Hard Computational Problems”, Science, 268, 542–545, (1995).
Liu, Q., A. Frutos, L. Wang, A. Thiel, S. Gillmor, T. Strother, A. Condon, R. Corn, M. Lagally, L. Smith, “Progress towards demonstration of a surface based DNA computation: A one word approach to solve a model satisfiability problem”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999).
Liu, Q., Z. Guo, A.E. Condon, R.M. Corn, M.G. Lagally, and L.M. Smith, “A Surface-Based Approach to DNA Computation”, Proc. 2nd Annual Princeton Meeting on DNA-Based Computing, June 1996.
Liu, Q., A.J. Thiel, A.G. Frutos, R.M. Corn, L.M. Smith, “Surface-Based DNA Computation: Hybridization and Destruction”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Loewenstern, D. and Yianilos, P., “Significantly lower entropy estimates for natural DNA sequences”, J.A. Storer and M. Cohn (Eds.), IEEE Data Compression Conference, Snowbird, UT, pp. 151–161, (March, 1997).
Mao, C., T.H. LaBean, J. H. Reif, and N.C. Seeman, “An Algorithmic Self-Assembly”, Nature, Sept 28, (2000). http://www.cs.duke.edu/~reif/paper/SELF-ASSEMBLE/AlgorithmicAssembly.pdf
Marshall, A., Hodgson, J. 1998 Nature Biotechnology 16, pp 27–31.
McGall, G.H., Barone, A.D., Diggelmann, M., Ngo, N., Gentalen, E., and Fodor, S.P.A. “The Efficiency of Light-Directed Synthesis of DNA Arrays on Glass Substrates”. J. Am. Chem. Soc., 119(22): 5081–5090, (1997).
Mills, A., B. Yurke, P. Platzman, “Error-tolerant massive DNA neural-network computation”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999).
Mir, K.U., “A Restricted Genetic Alphabet for DNA Computing”, 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton University, (June 1996).
Nevill-Manning, C.G. and I.H. Witten, “Protein is Incompressible”, J.A. Storer and M. Cohn (Eds.), IEEE Data Compression Conference, Snowbird, UT, pp. 257–266, (March, 1999).
Pease, A. C., D. Solas, E. J. Sullivan, M. T. Cronin, C. P. Holmes and S. P. Fodor, “Light-generated oligonucleotide arrays for rapid DNA sequence analysis”, Proc. Natl Acad. Sci. USA, Vol. 91, 5022–5026, (1994).
V. Pless, “Introduction to the theory of error-correcting codes,” John Wiley and Sons, NY (1982).
Orlian, M., F. Guarnieri, C. Bancroft, “Parallel Primer Extension Horizontal Chain Reactions as a Paradigm of Parallel DNA-Based Computation”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Reif, J. (ed.), Synthesis of Parallel Algorithms, Morgan Kaufmann, (1993).
Reif, J.H., “Parallel Molecular Computation: Models and Simulations”, Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA95), ACM, Santa Barbara, 213–223, June 1995. Algorithmica, special issue on Computational Biology, 1999. (http://www.cs.duke.edu/~reif/paper/paper.html)
Reif, J.H., “Local Parallel Biomolecular Computation”, 3rd DIMACS Meeting on DNA Based Computers, Univ. of Penns., (June, 1997). DIMACS Series in Discrete Mathematics and Theoretical Computer Science, ed. H. Rubin, (1999). (http://www.cs.duke.edu/~reif/paper/Assembly.ps and Assembly.fig.ps)
Reif, J.H., “Paradigms for Biomolecular Computation”, First International Conference on Unconventional Models of Computation, Auckland, New Zealand, January 1998. Unconventional Models of Computation, edited by C.S. Calude, J. Casti, and M.J. Dinneen, Springer Pub., Jan. 1998, pp 72–93. (http://www.cs.duke.edu/~reif/paper/paradigm.ps)
Reif, J.H., T. H. LaBean, and N.C. Seeman, “Challenges and Applications for Self-Assembled DNA Nanostructures”, Invited paper, Sixth International Meeting on DNA Based Computers (DNA6), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Leiden, The Netherlands, (June, 2000) ed. A. Condon. To be published by Springer-Verlag as a volume in Lecture Notes in Computer Science, (2000). http://www.cs.duke.edu/~reif/paper/SELFASSEMBLE/selfassemble.ps
Roberts, S.S., “Turbocharged PCR”, Jour. of N.I. H. Research, 6, 46–82, (1994).
Rose, J.A., R. Deaton, M. Garzon, and S.E. Stevens Jr., “The Effect of Uniform Melting Temperatures on the Efficiency of DNA Computing”, Third Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 23–26, 1997. Published in DNA Based Computers, III, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol 48 (ed. H. Rubin), American Mathematical Society, (1999).
Roweis, S., E. Winfree, R. Burgoyne, N.V. Chelyapov, M.F. Goodman, P.W.K. Rothemund, L. M. Adleman, “A Sticker Based Architecture for DNA Computation”, 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton University, June 1996. Also as Laboratory for Molecular Science, USC technical report “A Sticker Based Model for DNA Computation”, May 1996.
Rubin, H., “Looking for the DNA killer app.”, Nature, 3, 656–658, (1996).
Shannon, C. E., “A mathematical theory of communication”, Bell System Technical Journal, Vol. 27, 379–423 and 623–656, (1948).
Shannon, C. E., “Communication in the presence of noise”, Proceedings of the I. R. E., Vol. 37, 10–21, (1949).
Suyama, A., “DNA chips — Integrated Chemical Circuits for DNA Diagnosis and DNA computers”, To appear,(1998).
Wang, L., Q. Liu, A. Frutos, S. Gillmor, A. Thiel, T. Strother, A. Condon, R. Corn, M. Lagally, L. Smith, “Surface-based DNA computing operations: DESTROY and READOUT”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999).
Winfree, E., F. Liu, Lisa A. Wenzler, N. C. Seeman, “Design and Self-Assembly of Two Dimensional DNA Crystals”, Nature 394: 539–544, (1998).
Winfree, E., X. Yang, N.C. Seeman, “Universal Computation via Self-assembly of DNA: Some Theory and Experiments”, 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton, June, 1996.
Wood, D. H., “Applying error correcting codes to DNA computing”, 4th DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June, 1998. DNA Based Computers, IV, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (ed. H. Rubin), American Mathematical Society, (1999).
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reif, J.H., LaBean, T.H. (2001). Computationally inspired biotechnologies: Improved DNA synthesis and associative search using Error-Correcting Codes and Vector-Quantization. In: Condon, A., Rozenberg, G. (eds) DNA Computing. DNA 2000. Lecture Notes in Computer Science, vol 2054. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44992-2_11
DOI: https://doi.org/10.1007/3-540-44992-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42076-7
Online ISBN: 978-3-540-44992-8
eBook Packages: Springer Book Archive