How to Scrape Yelp Data: Tutorial

Yelp has become an invaluable resource for consumers looking for information on local businesses. With over 200 million reviews of restaurants, shops, services and more, Yelp offers a wealth of data for anyone looking to better understand the local business landscape.

In this comprehensive tutorial, we'll cover everything you need to know to effectively scrape Yelp data using Python. Whether you're looking to conduct market research, monitor your brand's online reputation or gain competitive intelligence, scraping Yelp data can provide powerful insights.

Why Scrape Yelp Data?

Here are some of the most common reasons people scrape Yelp data:

Market Research: Analyze consumer sentiment and preferences to inform business strategy and marketing. Gain insights into competitors.
Reputation Management: Monitor reviews and ratings for your own business. Be the first to know about emerging issues or problems.
Location Planning: Identify high-demand areas and saturated markets to find the best places to open new locations.
Data Analysis: Yelp data can be combined with other data sources for deeper analysis of consumer behavior and economic trends.
Competitive Intelligence: Track competitors' reviews, ratings and number of reviews over time. Monitor new entrants into your market.
Customer Service: Follow up quickly on negative reviews and customer complaints to provide great service.
Lead Generation: Reach out to reviewers as potential new customers and convert them.

The data available from each Yelp business listing includes:

Business name, address, phone number and website
Categories like cuisine types for restaurants
Price range
Hours of operation
Parking and amenity information
Reviews and ratings
Photos
And more

With over 39 million business listings in countries around the world, Yelp provides an immense amount of data to tap into.

Is Web Scraping Yelp Allowed?

Yelp's terms of service expressly prohibit scraping their data. However, many businesses still scrape Yelp data for internal analysis and insight. It's important to carefully check Yelp's terms and consult qualified legal counsel before beginning any scraping project.

When scraping Yelp or any website, take precautions: add reasonable delays between requests so you don't overload the servers, and consider proxies to reduce the chance of IP blocks. Scrape responsibly.

Now let's look at how to effectively scrape Yelp while avoiding blocks and bans.

Yelp Scraping Project Setup

To scrape Yelp listings and reviews, we'll use Python 3 with three key packages: Requests, BeautifulSoup and Pandas.

If you don't already have Python 3 installed, download it from python.org and run the installer.

Then open a terminal or command prompt and install the packages we'll need:

pip install requests beautifulsoup4 pandas

Now create a new Python file and import the libraries:

from bs4 import BeautifulSoup
import requests 
import pandas as pd

BeautifulSoup will help parse HTML and XML content from the pages we scrape. Requests will let us send HTTP requests to fetch pages. Pandas will allow us to organize the scraped data into a DataFrame.

Scraping a Yelp Business Page

Let's start by scraping data from a single Yelp business page. Here are the steps:

Get the URL for the business page you want to scrape. For example:

url = 'https://www.yelp.com/biz/gary-danko-san-francisco'

Use Requests to download the page content:

page = requests.get(url)

Pass the page content to BeautifulSoup to create a parsed document:

soup = BeautifulSoup(page.content, 'html.parser')

Inside this document, we can now use CSS selectors or other methods to extract the data we want. For example, to get the business name:

name = soup.select_one('h1[class="css-11q1g5y"]').text

Extract other data like the number of reviews, rating, phone number, etc. For example:

reviews = soup.select_one('span[class=" css-1fdy0l5"]').text
category = soup.select_one('span[class=" css-1e4fd8g"]').text
phone = soup.select_one('p[class="css-17ih8de"]').text

Store the extracted data in a Python dictionary:

data = {
  'name': name,
  'reviews': reviews,
  'category': category,
  'phone': phone
}

And that's it! With just a few lines of Python code we've scraped key data from a Yelp listing. The full code would look like:

import requests
from bs4 import BeautifulSoup

url = 'https://www.yelp.com/biz/gary-danko-san-francisco'

# Download the page and stop early on an HTTP error
page = requests.get(url, timeout=30)
page.raise_for_status()
soup = BeautifulSoup(page.content, 'html.parser')

# Extract each field; select_one returns None if a selector no longer matches
name = soup.select_one('h1[class="css-11q1g5y"]').text
reviews = soup.select_one('span[class=" css-1fdy0l5"]').text
category = soup.select_one('span[class=" css-1e4fd8g"]').text
phone = soup.select_one('p[class="css-17ih8de"]').text

# Collect the extracted fields in one dictionary
data = {
  'name': name,
  'reviews': reviews,
  'category': category,
  'phone': phone
}

print(data)

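To find CSS selectors, inspect elements in your browser: right-click the element, choose "Inspect", and read its tag and class attributes in the developer tools. Class names such as css-11q1g5y are auto-generated and can change whenever Yelp updates its site, so verify every selector in this tutorial against the live page before relying on it.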
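Because a selector that stops matching makes select_one return None, calling .text on the result raises an AttributeError. A minimal sketch of one way to guard against that follows; the helper name text_or_none and the sample markup are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup


def text_or_none(node, selector):
    """Return the stripped text of the first match for selector, or None."""
    match = node.select_one(selector)
    return match.get_text(strip=True) if match else None


# Tiny stand-in for a downloaded page; the markup is made up for illustration.
sample_html = '<html><body><h1 class="css-11q1g5y"> Gary Danko </h1></body></html>'
soup = BeautifulSoup(sample_html, "html.parser")

name = text_or_none(soup, "h1")                       # element exists
phone = text_or_none(soup, 'p[class="css-17ih8de"]')  # element is missing

print(name, phone)
```

With a helper like this, a missing field shows up as None in the output instead of stopping the whole scrape.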

Now let's look at scaling up to scrape multiple pages.

Scraping Yelp Search Results

To scrape multiple listings from Yelp search results, we need to:

Get the URL for the search results page. For example, a search for "restaurants in San Francisco":

url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco,+CA'

Fetch the page with Requests and parse it with BeautifulSoup as before.
Use a CSS selector to get all listing divs:

results = soup.select('div[class="container__09f24__rOJ9a hoverable__09f24__Ow1FB margin-t3__09f24__AqcTw margin-b3__09f24__uhzZp padding-t3__09f24__TM6dY padding-r3__09f24__eoPmA padding-b3__09f24__hLrAI padding-l3__09f24__TPKFt border-color--default__09f24__NPAKY"]')

Loop through each result and extract the data into a dictionary:

data = []  # one dictionary per listing

for result in results:
  name = result.select_one('a[class="css-166la90"]').text
  reviews = result.select_one('span[class=" css-1fdy0l5"]').text
  # The star rating is exposed through the element's aria-label attribute
  rating = result.select_one('div[class="i-stars__373c0___sZu0"]')['aria-label']

  data.append({
    'name': name,
    'reviews': reviews,
    'rating': rating
  })

Store all the data in a Pandas DataFrame and export to a CSV:

df = pd.DataFrame(data) 
df.to_csv('yelp_data.csv', index=False)
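The scraped review counts and ratings arrive as strings, which makes sorting and averaging awkward. The sketch below shows one possible way to pull numeric values out of them before export. The helper first_number, the sample rows, and the exact string formats ("1,204 reviews", "4.5 star rating") are assumptions for illustration, so adjust the pattern to whatever the live pages return.

```python
import re

import pandas as pd


def first_number(text):
    """Return the first integer or decimal found in text as a float, else None."""
    if text is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


# Hypothetical scraped rows; the string formats are assumptions.
rows = [
    {"name": "Cafe A", "reviews": "1,204 reviews", "rating": "4.5 star rating"},
    {"name": "Cafe B", "reviews": "87 reviews", "rating": "4 star rating"},
]

df = pd.DataFrame(rows)
df["review_count"] = df["reviews"].map(first_number)
df["rating_value"] = df["rating"].map(first_number)

print(df[["name", "review_count", "rating_value"]])
```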

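This captures every listing on a search results page. Each page shows only a limited number of businesses, so gathering hundreds of listings means repeating these steps across successive result pages.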
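To cover more than one results page, you can generate a URL per page and run the same parsing loop on each. The sketch below assumes that Yelp search paginates with a start offset in the query string and shows about 10 results per page. Both are assumptions to confirm in your browser's address bar, and search_page_urls is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.yelp.com/search"


def search_page_urls(term, location, pages, page_size=10):
    """Build one search URL per results page using a 'start' offset."""
    urls = []
    for page in range(pages):
        params = {"find_desc": term, "find_loc": location}
        if page > 0:
            params["start"] = page * page_size  # assumed offset parameter
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


for url in search_page_urls("Restaurants", "San Francisco, CA", pages=3):
    print(url)
```

Each generated URL can then be fetched and parsed exactly like the single search page above, appending to the same data list.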

Scraping Yelp Reviews

In addition to listing data, reviews provide great insights. To scrape reviews from a Yelp business page:

On the business page, get all the review divs:

reviews = soup.select('div[class="container__09f24__JeJaa margin-b3__09f24__wQDhs padding-b3__09f24__q34t2 border-color--default__09f24__NPAKY"]')

Loop through the review divs.
Extract rating, date, text and username from each.

for review in reviews:
  # Match the star widget by partial class name so every rating level is captured,
  # not only blocks carrying one specific star-count class
  rating = review.select_one('div[class*="i-stars"]').get('aria-label')
  date = review.select_one('span[class="css-chan6m"]').text
  # Only English-language review text is selected here
  text = review.select_one('p[lang="en"]').text
  username = review.select_one('a[class="css-166la90"]').text

Store in a dictionary and append to a list.
Convert the review data to a DataFrame for analysis.

Now you can scrape detailed review data along with business info!
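The review steps above can be sketched end to end as follows. The sample markup and its class names (review, user, date) are made up so the example runs offline, and the helpers text_of and parse_reviews are hypothetical. On the live site you would substitute the selectors you verified in the browser.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up markup that mimics the shape of a review block; real class names differ.
sample_html = """
<div class="review">
  <a class="user">Alex P.</a>
  <div class="i-stars" aria-label="5 star rating"></div>
  <span class="date">1/2/2023</span>
  <p lang="en">Great tasting menu.</p>
</div>
<div class="review">
  <a class="user">Sam R.</a>
  <div class="i-stars" aria-label="2 star rating"></div>
  <p lang="en">Slow service.</p>
</div>
"""


def text_of(node, selector):
    """Return stripped text for the first match, or None if nothing matches."""
    match = node.select_one(selector)
    return match.get_text(strip=True) if match else None


def parse_reviews(html):
    """Turn each review block into a dictionary and return them as a list."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.select("div.review"):
        stars = review.select_one('div[class*="i-stars"]')
        rows.append({
            "username": text_of(review, "a.user"),
            "rating": stars.get("aria-label") if stars else None,
            "date": text_of(review, "span.date"),
            "text": text_of(review, 'p[lang="en"]'),
        })
    return rows


review_df = pd.DataFrame(parse_reviews(sample_html))
print(review_df)
```

The second sample review has no date element, so its date column is left empty rather than raising an error.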

Tips for Effective Yelp Scraping

Here are some tips for scraping Yelp efficiently and avoiding blocks:

Use proxies: Rotating proxies mask scrapers and avoid easy detection by Yelp.
Add random delays: Don't hit Yelp too aggressively. Add 3-5 second delays between requests.
Scrape during off-peak hours: Try to scrape at night or early morning when traffic is lower.
Check for CAPTCHAs: Detect CAPTCHA pages and handle them appropriately, often by solving them manually.
Use a headless browser: Tools like Selenium and Splinter can mimic human browsing behavior more closely.
Scale gradually: Start small and slowly expand the amount of data you pull per day/week as needed.
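Several of these tips can be combined in one small fetch helper. The sketch below waits a random 3-5 seconds, optionally routes the request through a proxy, and returns None when the response looks blocked. The function names, the User-Agent string, and the "captcha" substring check are all assumptions for illustration; a real block page may need a more specific test.

```python
import random
import time

import requests

# Placeholder header value; use one appropriate for your project.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}


def looks_like_captcha(html):
    """Crude heuristic: flag pages that mention a CAPTCHA anywhere."""
    return "captcha" in html.lower()


def polite_get(url, session=None, proxies=None,
               min_delay=3.0, max_delay=5.0, sleep=time.sleep):
    """Wait a random interval, fetch url, and return HTML or None if blocked."""
    sleep(random.uniform(min_delay, max_delay))
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=HEADERS, proxies=proxies, timeout=30)
    if response.status_code != 200 or looks_like_captcha(response.text):
        return None
    return response.text


# Example usage (commented out so nothing is fetched on import):
# html = polite_get("https://www.yelp.com/biz/gary-danko-san-francisco")
# if html is None:
#     print("Blocked or CAPTCHA page; slow down or switch proxy")
```

Passing the sleep function as a parameter keeps the helper easy to test without actually waiting.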

Analyzing Yelp Data

Once you've scraped Yelp data, the real work begins! Here are some ideas for how to analyze and extract value from Yelp data:

Sentiment analysis – Identify positive and negative sentiment in review text.
Review summarization – Automatically detect key topics in reviews.
Competitor benchmarking – Compare ratings and review volumes over time.
Identification of customer pain points – Find common complaints in reviews.
Market mapping – Analyze search results for competitor data, saturation by region etc.
Predictive modeling – Estimate future ratings and review volumes.
Integrating other data – Combine Yelp data with weather, demographic, or search trend data for deeper insights.

The analysis options are nearly endless. Yelp data opens up a world of possibilities for better understanding consumers and gaining competitive intelligence.
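As a starting point, two of the ideas above (competitor benchmarking and finding pain points) can be sketched with Pandas and the standard library. The review rows, the 2-star threshold, and the tiny stopword list below are invented for illustration, and simple word counts stand in for a real sentiment or topic model.

```python
from collections import Counter
import re

import pandas as pd

# Made-up reviews for illustration only.
reviews = pd.DataFrame({
    "business": ["Cafe A", "Cafe A", "Cafe B", "Cafe B"],
    "rating": [5, 2, 1, 4],
    "text": [
        "Friendly staff and great coffee",
        "Slow service and cold coffee",
        "Rude staff, slow service",
        "Nice patio",
    ],
})

# Competitor benchmarking: mean rating and review count per business.
summary = reviews.groupby("business")["rating"].agg(["mean", "count"])

# Pain points: count words in reviews rated 2 stars or lower.
STOPWORDS = {"and", "the", "a"}
low_rated = reviews.loc[reviews["rating"] <= 2, "text"]
words = Counter(
    word
    for text in low_rated
    for word in re.findall(r"[a-z']+", text.lower())
    if word not in STOPWORDS
)

print(summary)
print(words.most_common(3))
```

On real data you would replace the word counts with a proper text-analysis step, but the same DataFrame layout carries over.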

Scraping Yelp with the Oxylabs API

While a custom Python scraper works well, for large scale scraping an API can save tons of development time. Oxylabs offers an API that handles proxies, browsers, CAPTCHAs, and more so you can focus just on results.

To use the Oxylabs API:

Sign up for an account.
Install the Python wrapper:

pip install oxylabs

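Import and instantiate a client (treat the class and method names below as illustrative and confirm them against the current Oxylabs documentation):

from oxylabs import OxylabsAPI

client = OxylabsAPI(api_key='YOUR_API_KEY')

Use the get method to fetch any URL:

page = client.get(url='https://www.yelp.com/biz/gary-danko-san-francisco', render=True)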

It handles proxies, browsers, CAPTCHAs and blocks automatically. The render=True parameter will load JavaScript content.

You can now pass the page result to BeautifulSoup to parse and extract data just like before.

For larger jobs, Oxylabs offers plans for parallel scraping, and integrations like Scala and Scrapy. The API saves tons of effort so you can focus on data analysis and application.

Conclusion

Scraping data from Yelp provides a wealth of information for understanding local businesses, consumers, competitors and markets. With just a bit of Python code and packages like BeautifulSoup, Requests and Pandas, you can extract everything from business listings to detailed reviews.

By following best practices like using proxies, scraping responsibly, and analyzing Yelp data effectively, you can gain powerful insights without running into major issues.

Yelp data powers everything from academic studies to competitive intelligence and reputation management. Combining scraped Yelp data with other sources can uncover strategic opportunities and take your market research to the next level.

I hope this guide gives you everything you need to start scraping Yelp data successfully.