BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//HiPEDS – EPSRC Centre for Doctoral Training - ECPv6.15.11//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:HiPEDS – EPSRC Centre for Doctoral Training
X-ORIGINAL-URL:https://wp.doc.ic.ac.uk/hipeds
X-WR-CALDESC:Events for HiPEDS – EPSRC Centre for Doctoral Training
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:UTC
BEGIN:STANDARD
TZOFFSETFROM:+0000
TZOFFSETTO:+0000
TZNAME:UTC
DTSTART:20160101T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=UTC:20170823T120000
DTEND;TZID=UTC:20170823T130000
DTSTAMP:20260426T001213Z
CREATED:20180420T133506Z
LAST-MODIFIED:20180420T133506Z
UID:1805-1503489600-1503493200@wp.doc.ic.ac.uk
SUMMARY:HiPEDS Seminar: Web Data Extraction: A Crash Course
DESCRIPTION:HiPEDS CDT Seminar Series with Giorgio Orsi\, Senior Research Scientist at Meltwater.\n\nAbstract: Data acquisition plays an important role in modern organisations and is a strategic business process for data-driven companies such as insurers\, retailers\, and search engines. Data acquisition processes range from manual data collection and purchase to cheaper but often technically challenging methods such as automated collection and crowdsourcing. The abundance of web data has made web scraping (also known as web data extraction or web wrapping) an essential tool in data acquisition processes. A wrapper is a program that turns web content into structured data using techniques ranging from visual analysis of the rendered page to DOM tree mining. Web scraping is often the only viable data collection method for websites\, in particular when no API is available. Although web scraping typically relies on inducing a wrapper for every source\, a number of semi- or fully automated techniques for web scraping have emerged. These recent advances have finally allowed for accurate and fully automated wrapper induction at the scale of hundreds of thousands of sources. They have also helped to revitalise the area\, as is evident from a growing number of web scraping startups\, e.g.\, Import.io\, DiffBot\, ScrapingHub\, and Wrapidity.\n\nThis lecture is a crash course in web scraping. We will start with an overview of the available techniques and technologies\, discussing when and where they are appropriate. We will then introduce the open-source OXPath language for declarative web scraping.\n\nBio: Giorgio Orsi is a Senior Research Scientist at Meltwater and an Honorary Researcher at the School of Computer Science of the University of Birmingham. His research deals with the algorithmic aspects of large-scale data processing and with the logical foundations of information integration and knowledge representation. Giorgio is a co-investigator of the EPSRC Programme Grant VADA (Value Added Data Systems) and a co-founder of Wrapidity\, an Oxford University startup that was recently acquired by Meltwater to boost collection of outside data using AI.
URL:https://wp.doc.ic.ac.uk/hipeds/event/hipeds-seminar-web-data-extraction-a-crash-course/
LOCATION:Huxley Building\, Room 217/218\, Imperial College London\, London\, SW7 2AZ\, United Kingdom
ORGANIZER;CN="Giannis Evagorou":MAILTO:g.evagorou15@imperial.ac.uk
END:VEVENT
END:VCALENDAR