How to build a web scraper with Google Sheets

Must-have marketer skill: Web Scraping

Here’s a trick: You can do it in Google Sheets. Scrape links, tables, emails, social handles. ANYTHING in a sheet.

Here’s the 2-minute tutorial?

The function to know: =IMPORTXML().

This pulls in any structured data type like CSV, RSS, HTML, or XML from a URL. For this tutorial, we will work with XML or “xpaths.”

=IMPORTXML() takes 2 Arguments

  1. The URL you want to use, including the “https://” bit and wrapped in “” . For example: “https://kolbyhatch.com

2. The xpath of the element you want to scrape.

Let’s briefly talk xpaths ?

Think of these as roadmaps to data types on a webpage.

w3schools has a great tutorial on them:

https://www.w3schools.com/xml/xpath_intro.asp

Now xpaths can be simple or complicated, but getting them is easy.

  1. On the webpage, open inspector (⌃⌘C)
  2. Find the HTML element you want to scrape (title, H1, href, etc.)
  3. Right-click, Copy –> “Copy xpath”

For more basic scrapping like, page titles, or links on a webpage, you can use these generic xpaths:

  • All links = //@href
  • All H1s = //h1 (works for any heading, h2, h3)
  • Title = //title

(There’s more on the w3school tutorial above)

Putting it all together:

=ImportXML(“https://thehustle.co/home/“, //@href)

This will scrape all links on The Hustle’s homepage.

Using =ImportXML() function and copied xpaths you can build sheets that scrape…

  • Front pages of publishers, giving you article links, titles, authors
  • Scrape emails from company “about” pages
  • Scrape social handles from pages

[insert endless list]

Try it for yourself: I made a simple scraper that pulls links from any webpage.

Copy the sheet to your drive, drop in a URL and boom.

Here it is:

https://docs.google.com/spreadsheets/d/12Yc4FnOMEWli4mmQw6IJQxDUWwdS-mhz1jlBCgkfPSw/edit?usp=sharing

Personally, I’ve used this trick to quickly pull data from hosted tables, or scrape social handles. You’ll have to tweak/copy the right xpaths).

It’s good for building lists of leads or contacts.

It does have limitations. You can crash a sheet if the data set is too large — I’ve had issues over 300 cells.

And sites like FB, TikTok, Instagram, make it really hard to do this.

Related Posts:

You caught us early!

Thanks for taking an interest in Subtrics! We’re currently onboarding a limited number of clients at the moment.

Join our waitlist below to reserve your spot in line. We’ll reach out as we get availability.

A valid email is required to create the new Google Form template.

I will never sell, share, or spam any emails submitted to me.

Occassionally, I share updates to my subscribers. You are free to unsubscribe at any time.