What is Web Scraping, and What are the Legal Issues Involved?

By Christopher Adkins
Published On: July 1st, 2015

Spiders, bots and crawlers: Oh my! Almost everyone who has a website or online social media profile is a victim of web scraping, whether they know it or not.

What is web scraping?

Web scraping is a technique used by web crawlers, or bots, to extract information from websites. Collected information is transformed into structured data that can be stored and analyzed, typically in a database or spreadsheet. This technology drives a substantial amount of business, and many companies’ viability relies on it. However, controversy can arise when commercial companies use scraping software to collect substantial amounts of data from websites for their own profit.
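As a rough illustration of the extraction step described above, the following Python sketch turns a fragment of HTML into structured rows and then into CSV text that could be loaded into a spreadsheet. The markup, the class names ("item", "name", "price") and the helper names ItemParser and scrape_to_csv are hypothetical, chosen only for this example. A real scraper would also download pages over the network, which is omitted here.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">24.50</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> pairs."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished records
        self._field = None    # field held by the <span> currently open, if any
        self._current = {}    # record being assembled for the current <li>

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "item":
            self._current = {}
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Text outside a tracked <span> (such as whitespace) is ignored.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append(self._current)
            self._current = {}


def scrape_to_csv(html):
    """Return (rows, csv_text) extracted from the given HTML string."""
    parser = ItemParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, out.getvalue()


rows, csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```

The same pattern scales from one page to thousands, which is where the commercial value, and the controversy, comes from.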

When does web scraping violate a website’s terms of use?

Web scraping is used for many reasons, including price comparisons and targeted advertising. Websites often prohibit scraping through their terms of use. There are two types of online terms of use: clickwrap and browsewrap. Clickwrap terms require the user to click to indicate agreement with the terms of use. Browsewrap terms are simply listed on the website, without requiring any action from the user. Consequently, browsewrap terms are harder to enforce: if the user never saw the terms of use, no contract was formed because there was no ‘meeting of the minds’.

What are the legal risks for businesses that use web scraping?

Companies that use web scraping face legal risks, but under current law it is unclear what crawlers can and cannot do. A major open question is whether scraping information in violation of a website’s terms and conditions amounts to trespass to chattels or breach of contract. Some website owners’ claims have been viable in these situations, so there is a risk. A scraper that uses the scraped information commercially will likely face greater liability. Additionally, if the scraper collects copyrighted information, its operators could be liable for infringement.

Can web scraping give rise to a trespass to chattels action?

Trespass to chattels is a tort claim arising when a party has intentionally interfered with another person’s lawful possession of movable personal property. Traditionally, trespass to chattels has included dispossession of the property by taking it, destroying it, or barring the owner’s access to it. In the digital age, it has been argued that websites should also be treated as chattels.

In eBay v. Bidder’s Edge, a notable claim involving scraping as “illegal data mining,” a California court held that the thousands of queries Bidder’s Edge sent each day, electronic signals retrieving information from eBay’s system, were sufficiently tangible to support a trespass action. However, eBay had not actually suffered any injury or harm from the trespass. While the court acknowledged this, it stated that eBay was not required to wait until it suffered harm before seeking an injunction.

The Supreme Court of California interpreted the eBay decision further in Intel v. Hamidi, stating that showing a risk of future harm substantiated claims of Internet trespass to chattels. Accordingly, to determine if there is a substantial likelihood of future harm, a court should look to the volume or frequency of interferences. Subsequent courts in other jurisdictions have applied this analysis, requiring that the plaintiff demonstrate damage, or substantial risk of future damage, to their computer system. Thus, the degree of protection for online content is not settled, and will depend on the type of access made by the scraper, the amount of information accessed and copied, the degree to which the access adversely affects the site owner’s system, and the types and manner of prohibitions on such activity.
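Because the volume and frequency of requests figure in this analysis, scraper operators commonly limit how often their software contacts a given site. The Python sketch below shows one simple way to do that: a small class that enforces a minimum delay between successive requests. The class name Throttle, its parameters, and the two-second interval in the usage comment are assumptions made for illustration. Nothing in the cases above sets a specific request rate, and throttling alone does not settle whether a given scraper is lawful.

```python
import time


class Throttle:
    """Enforces a minimum delay between successive requests to one site.

    The clock and sleep functions are injectable so the logic can be
    tested without actually waiting.
    """

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical usage: call throttle.wait() before each page request.
# throttle = Throttle(min_interval_seconds=2.0)
# for url in urls:
#     throttle.wait()
#     fetch(url)  # fetch() stands in for whatever download function is used
```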

Can web scraping give rise to a breach of contract action?

In regard to breach of contract claims for violating a site’s terms of service, the United States Court of Appeals for the Second Circuit held in Specht v. Netscape Communications Corp. that terms of use are not enforceable if there is not reasonable notice of the existence of the terms and unambiguous consent to that license. Merely clicking on a button does not show assent to license terms if those terms were not obvious and if it was not explicit to the consumer that clicking meant agreeing to the license. California courts went on to determine in Ticketmaster Corp v. Tickets.com, Inc. that a hyperlink to the terms of use placed in the footer of a web page does not constitute prominent notice of those terms. However, if the terms are prominent, then a user will be held to the terms on inquiry notice.

Clickwrap agreements seem to carry more weight, as a Texas state court found grounds of trespass to chattels and breach of contract in American Airlines, Inc. v. Farechase, Inc. The court enforced an ‘if you use this site, you agree’ terms of service statement on American Airlines websites, and enjoined software company Farechase from accessing and scraping data to redistribute and sell it to travel agents and online travel systems.

Can web scraping give rise to a copyright infringement action?

Scraping generally raises some copyright law issues. Visiting a copyrighted website temporarily for the purpose of extracting factual information and reproducing it does not violate the website owner’s copyright. Any factual information that is extracted is protected under fair use in the Copyright Act. However, if the extracted information is a copyrighted work, the scraper may be subject to copyright liability.

The Digital Millennium Copyright Act of 1998 was enacted to control and regulate copyright issues in a technological world. Section 1201(a)(1) of the DMCA states “No person shall circumvent a technological measure that effectively controls access to a work protected under this title.” This provision speaks to web scraping, particularly when bots evade measures that website owners put in place to protect their content.

What types of damages have been awarded in previous web scraping cases?

Typically, injunctions are the only remedies sought. A plaintiff in a case against a scraper must show substantial harm in order to receive damages, and courts in web scraping cases have generally granted injunctions rather than awarding damages.

It seems that the design and nature of the crawled websites determine the legal liability, rather than the actual activity of the crawler itself. If your business operates scraping technology, be wary of what you crawl! If you operate a website, check out these tips to create an effective user policy.

Disclaimer: This website provides general information and discussion about legal topics. The content is not legal advice and should not be relied upon as such. Always seek the advice of a licensed attorney for legal matters.
