Competitor and benchmark analysis for business intelligence can be a burdensome task, as it requires gathering and analyzing data from multiple sources. The purpose of this article is to provide a guide on how to automate major parts of the data extraction and analysis process with Python.
As a data analyst and Python developer with over 5 years of experience automating business intelligence workflows, I highly recommend taking advantage of Python's capabilities for accelerating competitor analysis. Once you learn how to leverage Python for these tasks, you can dedicate more time to extracting strategic insights and less to manual data collection.
Here's an overview of the main steps we'll cover:
- Use web scraping APIs and packages like Selenium to gather competitor website data and content
- Leverage APIs like Clearbit and BuiltWith to collect technology stack and contact data
- Connect marketing analytics APIs like SEMrush and Ahrefs to pull in campaign insights
- Utilize packages like Pandas, NumPy and Matplotlib to process and analyze the data
- Visualize and present the analyzed competitor intelligence in Jupyter Notebooks
Let's go through each of these steps in more detail.
Gathering Competitor Website Data
The first step is gathering the raw data from competitor websites. This can include content, metadata, HTML, and more. Python has some great web scraping tools that can automate this process.
Web Scraping Packages
For basic web scraping tasks, I recommend using Python packages like Beautiful Soup and Scrapy. Beautiful Soup is great for parsing HTML and pulling out specific elements, while Scrapy lets you crawl entire websites and extract data at scale.
Here's some sample code using Beautiful Soup to scrape a product description:
import requests
from bs4 import BeautifulSoup

url = 'https://www.competitor-site.com/product-page'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Pull the text of the element holding the product description
description = soup.find(id='product-description').get_text()
print(description)
Scrapy can go through an entire site and extract prices, descriptions, images and more into a structured format.
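As a rough sketch of what that looks like (the start URL and CSS selectors here are placeholders you would adapt to the target site's markup), a small Scrapy spider might be:

import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'
    start_urls = ['https://www.competitor-site.com/products']

    def parse(self, response):
        # Extract fields from each product listing (selectors are placeholders)
        for product in response.css('div.product'):
            yield {
                'name': product.css('h2::text').get(),
                'price': product.css('.product-price::text').get(),
                'image': product.css('img::attr(src)').get(),
            }
        # Follow pagination links to crawl the rest of the catalog
        next_page = response.css('a.next-page::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)

Running this with scrapy runspider spider.py -o products.csv writes the extracted items to a structured CSV file.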
Browser Automation with Selenium
For more complex scraping cases involving JavaScript rendering or login walls, I recommend Selenium. This lets you automate and control browsers like Chrome and Firefox through Python code.
Here's some sample Selenium code to scrape competitor pricing after a login:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.competitor-site.com')

# Fill in the login form and submit it
username_input = driver.find_element(By.ID, 'username')
username_input.send_keys('myusername')
password_input = driver.find_element(By.ID, 'password')
password_input.send_keys('mypassword')
login_button = driver.find_element(By.XPATH, '//button[text()="Log in"]')
login_button.click()

# Navigate to the product listings and collect each price element
driver.get('https://www.competitor-site.com/products')
prices = driver.find_elements(By.CLASS_NAME, 'product-price')
for price in prices:
    print(price.text)

driver.quit()
This logs into the competitor website, navigates to the product listings, and extracts the pricing information.
Web Scraping Services
Services like ScraperAPI and ProxyCrawl also offer cloud-based web scraping solutions, handling proxy rotation, CAPTCHAs, and more automatically. These can be great alternatives to dealing with the complexities of Selenium and Scrapy configurations.
Here's an example using ScraperAPI (after setting your API key) to extract competitor content from a JavaScript-heavy site:
import requests

api_key = 'XXX'  # ScraperAPI key
url = 'https://www.competitor-site.com/product/fancy-product'

# render=true asks ScraperAPI to execute JavaScript before returning the page
params = {'api_key': api_key, 'url': url, 'render': 'true'}
response = requests.get('http://api.scraperapi.com', params=params)

# ScraperAPI returns the rendered page HTML as the response body
content = response.text
print(content)
The key is choosing the right tool or service for each specific scraping need, balancing cost, data needs and complexity.
Gathering Technology Stack Details
Understanding the technologies and services competitors use for their website and marketing provides great intelligence. Python has integrations with APIs that can surface this type of insight.
Clearbit Logo and Tech Stack API
Clearbit offers an API that can return the technologies and services detected on any website domain. You pass the competitor domain, and it returns details on the tech stack, including frameworks, JS libraries, analytics tools, and more.
Here's a sample request:
import clearbit
import json

clearbit.key = 'sk_XXX'  # Set Clearbit API key

# Look up the company record for the competitor's domain
domain = 'competitorwebsite.com'
company = clearbit.Company.find(domain=domain, stream=True)
print(json.dumps(company, indent=2))
This prints out a JSON response like:
{
  "tech": [
    "Google Analytics",
    "Google Tag Manager",
    "jQuery",
    "React",
    "WordPress"
  ],
  "logo": "https://logo.clearbit.com/competitorwebsite.com"
}
BuiltWith – Web Technology Profiling API
BuiltWith offers an API that serves similar technology detection data on websites. It returns details on platforms, frameworks, ecommerce platforms, plugins, and more.
Here's a sample Python request to the BuiltWith API:
import requests

api_key = 'XXX'  # BuiltWith API key
domain = 'competitorwebsite.com'
url = f'https://api.builtwith.com/v13/api.json?KEY={api_key}&LOOKUP={domain}'

response = requests.get(url)

# Drill into the detected technologies; the exact response layout
# varies by BuiltWith API version, so adjust these keys as needed
tech_stack = response.json()['Results'][0]['Meta']['Applications']
print(tech_stack)
This outputs the list of detected technologies, like:
["jQuery", "Google Analytics", "iPhone Mobile Compatible", "Facebook for Websites"]
Marketing Campaign Analytics
Marketing analytics tools like SEMrush, Ahrefs, and SimilarWeb provide great visibility into competitors' digital marketing and SEO campaigns. Their APIs let you pull this data directly into your Python analysis.
SEMrush API
SEMrush offers robust organic and paid search data on competitors. You can use the SEMrush API to extract things like:
- Top organic keywords rankings
- Paid search spend
- Display ad campaigns
- Landing page optimization opportunities
Here's sample Python code using the python-semrush client package to pull top organic keywords from the SEMrush API:
from python_semrush.semrush import SemrushClient

client = SemrushClient(key='XXX')  # Set SEMrush API key
domain = 'competitorwebsite.com'

# domain_organic returns the domain's ranking-keyword rows as a list of dicts
results = client.domain_organic(domain, database='us')

# Column names mirror the SEMrush report header, e.g. 'Keyword'
for keyword in results[:10]:
    print(keyword['Keyword'])
This prints the first 10 ranking keywords for that competitor domain.
Ahrefs API
The Ahrefs API provides backlink data and organic/paid keywords details. You can extract things like:
- Top backlink sources
- Referring domains over time
- Keyword rankings and traffic
- Paid ad history
Here's a sample script calling the Ahrefs v2 API to get the top 10 referring domains:
import requests

api_key = 'XXX'  # Ahrefs API key
url = 'https://apiv2.ahrefs.com'

params = {
    'from': 'refdomains',              # the referring-domains report
    'target': 'competitorwebsite.com',
    'mode': 'domain',
    'order_by': 'domain_rating:desc',
    'limit': 10,
    'output': 'json',
    'token': api_key,
}

response = requests.get(url, params=params)
data = response.json()

for domain in data['refdomains']:
    print(domain['refdomain'])
This iterates through the top referring domains to the competitor.
Analyzing the Data with Python
Once you've collected all the raw competitive data, it's time to process and analyze it to surface insights. Python's data analysis libraries like Pandas, NumPy and Matplotlib are perfect for this task.
Loading Data into Pandas
The first step is loading the scraped competitor data into Pandas DataFrames for analysis:
import pandas as pd

# Load each dataset into its own DataFrame
products_df = pd.read_excel('competitor_data.xlsx')
keywords_df = pd.read_csv('competitor_keywords.csv')
You can load and consolidate the data from all your scraping scripts and APIs into Pandas.
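If the datasets share a key column, you can also join them into a single DataFrame for cross-analysis. Here's a minimal sketch, assuming both files carry a Domain column (a placeholder for whatever key your data actually shares):

# Join product and keyword data on a shared key (assumed column: 'Domain')
combined_df = products_df.merge(keywords_df, on='Domain', how='left')
print(combined_df.head())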
Data Cleaning
Next, I recommend cleaning the data to fix any errors, inconsistencies, or missing values:
import numpy as np

# Fill missing phone numbers, normalize address casing,
# drop rows without a product name, and convert text prices to NaN
products_df['Phone'] = products_df['Phone'].fillna('No Phone')
products_df['Address'] = products_df['Address'].str.title()
products_df = products_df.dropna(subset=['Product Name'])
products_df['Price'] = products_df['Price'].replace('Call for Quote', np.nan)
This fills missing phone values, normalizes addresses, drops rows missing a product name, and replaces text price values with NaN.
Analysis with the Pandas API
Once clean, the Pandas API makes it easy to start doing analysis on the competitive dataset:
# Summary stats for pricing
print(products_df['Price'].describe())

# Segment revenue by product category
by_category = products_df.groupby('Category').agg({'Revenue': 'sum'})
print(by_category)

# Plot average price over time
products_df.groupby('Month')['Price'].mean().plot()
The .groupby(), .agg(), .describe(), .plot() and other methods enable fast insights.
Matplotlib Visualizations
For more advanced visualizations, Matplotlib is a great tool. You can create charts, graphs and more to better understand the competitor analysis data:
# Top 10 keywords bar chart
import matplotlib.pyplot as plt

keywords_df['Keyword'].value_counts()[:10].plot(kind='bar')
plt.title('Top 10 Keywords')
plt.xlabel('Keywords')
plt.ylabel('Frequency')
plt.show()
This plots an informative bar chart of the most common keywords in the dataset.
Presenting Results in Jupyter Notebooks
To easily present and share the final analysis and visualizations, I love using Jupyter Notebooks. The mix of code, plots, markdown cells and more makes Notebooks the perfect competitor intelligence reporting format.
For example, you can walk through the analysis in a logical flow:
- Import and prep the data
- Surface interesting trends in the data
- Create informative plots for the trends
- Summarize key takeaways and recommendations
This helps easily tell the data story and highlight the most important insights to the stakeholders consuming the report.
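And when stakeholders don't run Jupyter themselves, you can export the finished notebook to a self-contained HTML report they can open in any browser (assuming the notebook is saved as competitor_report.ipynb):

jupyter nbconvert --to html competitor_report.ipynb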
In summary, by tapping into Python's powerful libraries for web scraping, APIs, data analysis and reporting, you can automate and accelerate time-intensive competitor intelligence workflows. The time savings add up to hours and days that can be redirected to actually making strategic decisions based on the competitor insights.
On top of open source Python libraries, leveraging cloud services like ScraperAPI and Clearbit APIs can save additional development work building out specialty crawlers and data connectors.
As you build out your own competitor data analysis pipelines, I recommend:
- Starting with a clear goal of the insights you want to unlock – this focuses your data collection
- Being ethical – stay away from any unauthorized data access or terms of service violations
- Automating incrementally – tackle scraping and integration in small steps vs. a giant system
- Checking the data quality – bad data leads to bad insights (see the sketch after this list)
- Visualizing key metrics – visual presentation of trends helps interpretation
- Iterating over time – plan periodic refreshes to stay up to date
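On the data quality point, even a few lightweight checks run after each refresh can catch scraping breakage early. Here's a minimal sketch, reusing the products_df DataFrame from the analysis section:

# Lightweight sanity checks to run after each data refresh
assert not products_df.empty, 'Scrape returned no rows'

# Flag unexpectedly high rates of missing values in key columns
missing_prices = products_df['Price'].isna().mean()
if missing_prices > 0.2:
    print(f'Warning: {missing_prices:.0%} of rows are missing a price')

# Catch duplicate rows left over from overlapping crawls
duplicates = products_df.duplicated(subset=['Product Name']).sum()
print(f'{duplicates} duplicate product rows found')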
Competitor intelligence is a never-ending activity. By tapping into Python's strengths for data automation, you can ensure your efforts scale and accelerate over time. The insights uncovered today create opportunities for better strategic decisions tomorrow.