ABSTRACT
With increasing amounts of data available on the web and a diverse range of users interested in programmatically accessing that data, web automation must become easier. Automation helps users complete many tedious interactions, such as scraping data, completing forms, or transferring data between websites. However, writing web automation scripts typically requires an expert programmer because the writer must be able to reverse engineer the target webpage. We have built a record and replay tool, Ringer, that makes web automation accessible to non-coders. Ringer takes a user demonstration as input and creates a script that interacts with the page as a user would. This approach makes Ringer scripts more robust to webpage changes because user-facing interfaces remain relatively stable compared to the underlying webpage implementations. We evaluated our approach on benchmarks recorded on real webpages and found that it replayed 4x more benchmarks than a state-of-the-art replay tool.
Ringer: web automation by demonstration. OOPSLA '16.