Competitor and benchmark analysis for business intelligence can be a burdensome task, as it requires gathering and analyzing data from multiple sources. The purpose of this article is to provide a guide on how to automate major parts of the data extraction and analysis process with Python.
As a data analyst and Python developer with over 5 years of experience automating business intelligence workflows, I highly recommend taking advantage of Python's capabilities for accelerating competitor analysis. Once you learn how to leverage Python for these tasks, you can dedicate more time to extracting strategic insights and less to manual data collection.
Here's an overview of the main steps we'll cover:
- Use web scraping APIs and packages like Selenium to gather competitor website data and content
- Leverage APIs like Clearbit and BuiltWith to collect technology stack and contact data
- Connect marketing analytics APIs like SEMrush and Ahrefs to pull in campaign insights
- Utilize packages like Pandas, NumPy and Matplotlib to process and analyze the data
- Visualize and present the analyzed competitor intelligence in Jupyter Notebooks
Let's go through each of these steps in more detail.
Gathering Competitor Website Data
The first step is gathering the raw data from competitor websites. This can include content, metadata, HTML, and more. Python has some great web scraping tools that can automate this process.
Web Scraping Packages
For basic web scraping tasks, I recommend using Python packages like Beautiful Soup and Scrapy. Beautiful Soup is great for parsing HTML and pulling out specific elements, while Scrapy lets you crawl entire websites and extract data at scale.
Here's some sample code using Beautiful Soup to scrape a product description:
import requests
from bs4 import BeautifulSoup

url = 'https://www.competitor-site.com/product-page'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Pull the text of the element holding the product description
description = soup.find(id='product-description').get_text()
print(description)
Scrapy can go through an entire site and extract prices, descriptions, images and more into a structured format.
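As a rough sketch of what that looks like (the start URL and CSS selectors here are placeholders you would adapt to the target site's markup), a small Scrapy spider might be:

import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'
    start_urls = ['https://www.competitor-site.com/products']

    def parse(self, response):
        # Extract fields from each product listing (selectors are placeholders)
        for product in response.css('div.product'):
            yield {
                'name': product.css('h2::text').get(),
                'price': product.css('.product-price::text').get(),
                'image': product.css('img::attr(src)').get(),
            }
        # Follow pagination links to crawl the rest of the catalog
        next_page = response.css('a.next-page::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)

Running this with scrapy runspider spider.py -o products.csv writes the extracted items to a structured CSV file.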
Browser Automation with Selenium
For more complex scraping cases involving JavaScript rendering or login walls, I recommend Selenium. This lets you automate and control browsers like Chrome and Firefox through Python code.
Here's some sample Selenium code to scrape competitor pricing after a login:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.competitor-site.com')

# Fill in the login form and submit it
username_input = driver.find_element(By.ID, 'username')
username_input.send_keys('myusername')
password_input = driver.find_element(By.ID, 'password')
password_input.send_keys('mypassword')
login_button = driver.find_element(By.XPATH, '//button[text()="Log in"]')
login_button.click()

# Navigate to the product listings and collect each price element
driver.get('https://www.competitor-site.com/products')
prices = driver.find_elements(By.CLASS_NAME, 'product-price')
for price in prices:
    print(price.text)

driver.quit()
This logs into the competitor website, navigates to the product listings, and extracts the pricing information.
Web Scraping Services
Services like ScraperAPI and ProxyCrawl also offer cloud-based web scraping solutions, handling proxy rotation, CAPTCHAs, and more automatically. These can be great alternatives to dealing with the complexities of Selenium and Scrapy configurations.
Here's an example using ScraperAPI (after setting your API key) to extract competitor content from a JavaScript-heavy site:
import requests

api_key = 'XXX'  # ScraperAPI key
url = 'https://www.competitor-site.com/product/fancy-product'

# render=true asks ScraperAPI to execute JavaScript before returning the page
params = {'api_key': api_key, 'url': url, 'render': 'true'}
response = requests.get('http://api.scraperapi.com', params=params)

# ScraperAPI returns the rendered page HTML as the response body
content = response.text
print(content)
The key is choosing the right tool or service for each specific scraping need, balancing cost, data needs and complexity.
Gathering Technology Stack Details
Understanding the technologies and services competitors use for their website and marketing provides great intelligence. Python has integrations with APIs that can surface this type of insight.
Clearbit Logo and Tech Stack API
Clearbit offers an API that can return the technologies and services detected on any website domain. You pass the competitor domain, and it returns details on the tech stack, including frameworks, JS libraries, analytics tools, and more.
Here's a sample request:
import clearbit
import json

clearbit.key = 'sk_XXX'  # Set Clearbit API key

# Look up the company record for the competitor's domain
domain = 'competitorwebsite.com'
company = clearbit.Company.find(domain=domain, stream=True)
print(json.dumps(company, indent=2))
This prints out a JSON response like:
{
  "tech": [
    "Google Analytics",
    "Google Tag Manager",
    "jQuery",
    "React",
    "WordPress"
  ],
  "logo": "https://logo.clearbit.com/competitorwebsite.com"
}
BuiltWith – Web Technology Profiling API
BuiltWith offers an API that serves similar technology detection data on websites. It returns details on platforms, frameworks, ecommerce platforms, plugins, and more.
Here's a sample Python request to the BuiltWith API:
import requests

api_key = 'XXX'  # BuiltWith API key
domain = 'competitorwebsite.com'
url = f'https://api.builtwith.com/v13/api.json?KEY={api_key}&LOOKUP={domain}'

response = requests.get(url)

# Drill into the detected technologies; the exact response layout
# varies by BuiltWith API version, so adjust these keys as needed
tech_stack = response.json()['Results'][0]['Meta']['Applications']
print(tech_stack)
This outputs the list of detected technologies, like:
["jQuery", "Google Analytics", "iPhone Mobile Compatible", "Facebook for Websites"]
Marketing Campaign Analytics
Marketing analytics tools like SEMrush, Ahrefs, and SimilarWeb provide great visibility into competitors' digital marketing and SEO campaigns. Their APIs let you pull this data directly into your Python analysis.
SEMrush API
SEMrush offers robust organic and paid search data on competitors. You can use the SEMrush API to extract things like:
- Top organic keywords rankings
- Paid search spend
- Display ad campaigns
- Landing page optimization opportunities
Here's sample Python code using the python-semrush client package to pull top organic keywords from the SEMrush API:
from python_semrush.semrush import SemrushClient

client = SemrushClient(key='XXX')  # Set SEMrush API key
domain = 'competitorwebsite.com'

# domain_organic returns the domain's ranking-keyword rows as a list of dicts
results = client.domain_organic(domain, database='us')

# Column names mirror the SEMrush report header, e.g. 'Keyword'
for keyword in results[:10]:
    print(keyword['Keyword'])
This prints the first 10 ranking keywords for that competitor domain.
Ahrefs API
The Ahrefs API provides backlink data and organic/paid keywords details. You can extract things like:
- Top backlink sources
- Referring domains over time
- Keyword rankings and traffic
- Paid ad history
Here's a sample script calling the Ahrefs v2 API to get the top 10 referring domains:
import requests

api_key = 'XXX'  # Ahrefs API key
url = 'https://apiv2.ahrefs.com'

params = {
    'from': 'refdomains',              # the referring-domains report
    'target': 'competitorwebsite.com',
    'mode': 'domain',
    'order_by': 'domain_rating:desc',
    'limit': 10,
    'output': 'json',
    'token': api_key,
}

response = requests.get(url, params=params)
data = response.json()

for domain in data['refdomains']:
    print(domain['refdomain'])
This iterates through the top referring domains to the competitor.
Analyzing the Data with Python
Once you've collected all the raw competitive data, it's time to process and analyze it to surface insights. Python's data analysis libraries like Pandas, NumPy and Matplotlib are perfect for this task.
Loading Data into Pandas
The first step is loading the scraped competitor data into Pandas DataFrames for analysis:
import pandas as pd

# Load each dataset into its own DataFrame
products_df = pd.read_excel('competitor_data.xlsx')
keywords_df = pd.read_csv('competitor_keywords.csv')
You can load and consolidate the data from all your scraping scripts and APIs into Pandas.
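If the datasets share a key column, you can also join them into a single DataFrame for cross-analysis. Here's a minimal sketch, assuming both files carry a Domain column (a placeholder for whatever key your data actually shares):

# Join product and keyword data on a shared key (assumed column: 'Domain')
combined_df = products_df.merge(keywords_df, on='Domain', how='left')
print(combined_df.head())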
Data Cleaning
Next, I recommend cleaning the data to fix any errors, inconsistencies, or missing values:
import numpy as np

# Fill missing phone numbers, normalize address casing,
# drop rows without a product name, and convert text prices to NaN
products_df['Phone'] = products_df['Phone'].fillna('No Phone')
products_df['Address'] = products_df['Address'].str.title()
products_df = products_df.dropna(subset=['Product Name'])
products_df['Price'] = products_df['Price'].replace('Call for Quote', np.nan)
This fills missing phone values, normalizes addresses, drops rows missing a product name, and replaces text price values with NaN.
Analysis with the Pandas API
Once clean, the Pandas API makes it easy to start doing analysis on the competitive dataset:
# Summary stats for pricing
print(products_df['Price'].describe())

# Segment revenue by product category
by_category = products_df.groupby('Category').agg({'Revenue': 'sum'})
print(by_category)

# Plot average price over time
products_df.groupby('Month')['Price'].mean().plot()
The .groupby(), .agg(), .describe(), .plot() and other methods enable fast insights.
Matplotlib Visualizations
For more advanced visualizations, Matplotlib is a great tool. You can create charts, graphs and more to better understand the competitor analysis data:
# Top 10 keywords bar chart
import matplotlib.pyplot as plt

keywords_df['Keyword'].value_counts()[:10].plot(kind='bar')
plt.title('Top 10 Keywords')
plt.xlabel('Keywords')
plt.ylabel('Frequency')
plt.show()
This plots an informative bar chart of the most common keywords in the dataset.
Presenting Results in Jupyter Notebooks
To easily present and share the final analysis and visualizations, I love using Jupyter Notebooks. The mix of code, plots, markdown cells and more makes Notebooks the perfect competitor intelligence reporting format.
For example, you can walk through the analysis in a logical flow:
- Import and prep the data
- Surface interesting trends in the data
- Create informative plots for the trends
- Summarize key takeaways and recommendations
This helps easily tell the data story and highlight the most important insights to the stakeholders consuming the report.
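And when stakeholders don't run Jupyter themselves, you can export the finished notebook to a self-contained HTML report they can open in any browser (assuming the notebook is saved as competitor_report.ipynb):

jupyter nbconvert --to html competitor_report.ipynb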
In summary, by tapping into Python's powerful libraries for web scraping, APIs, data analysis and reporting, you can automate and accelerate time-intensive competitor intelligence workflows. The time savings add up to hours and days that can be redirected to actually making strategic decisions based on the competitor insights.
On top of open source Python libraries, leveraging cloud services like ScraperAPI and Clearbit APIs can save additional development work building out specialty crawlers and data connectors.
As you build out your own competitor data analysis pipelines, I recommend:
- Starting with a clear goal of the insights you want to unlock – this focuses your data collection
- Being ethical – stay away from any unauthorized data access or terms of service violations
- Automating incrementally – tackle scraping and integration in small steps vs. a giant system
- Checking the data quality – bad data leads to bad insights (see the sketch after this list)
- Visualizing key metrics – visual presentation of trends helps interpretation
- Iterating over time – plan periodic refreshes to stay up to date
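On the data quality point, even a few lightweight checks run after each refresh can catch scraping breakage early. Here's a minimal sketch, reusing the products_df DataFrame from the analysis section:

# Lightweight sanity checks to run after each data refresh
assert not products_df.empty, 'Scrape returned no rows'

# Flag unexpectedly high rates of missing values in key columns
missing_prices = products_df['Price'].isna().mean()
if missing_prices > 0.2:
    print(f'Warning: {missing_prices:.0%} of rows are missing a price')

# Catch duplicate rows left over from overlapping crawls
duplicates = products_df.duplicated(subset=['Product Name']).sum()
print(f'{duplicates} duplicate product rows found')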
Competitor intelligence is a never-ending activity. By tapping into Python's strengths for data automation, you can ensure your efforts scale and accelerate over time. The insights uncovered today create opportunities for better strategic decisions tomorrow.