Extract data from URL to Excel

Businesses rely on the internet for all kinds of critical information, from contact information and shipment tracking to competitor pricing and data from portals. And while these tasks seem simple, searching websites and portals and copying and pasting the information into Excel can quickly take up a lot of your precious time. Plus, manually entering data into a spreadsheet is highly prone to human error. But with robotic process automation (RPA), you can streamline these repetitive tasks with automated data scraping from websites.

Automated data scraping collects data across many sources and pulls it into one spot—like an Excel spreadsheet—to eliminate errors and give you time back to work on more critical projects. Here are just some of the ways real companies are using automated data scraping:

  • Gathering contact information from an online portal
  • Price comparisons for competitive analysis
  • Monitoring real estate prices from MLS
  • Testing data for machine learning projects
  • Tracking shipments from UPS, FedEx, etc.
  • And many more!


In the video above, you’ll see an Automate bot running a task that enters UPS tracking numbers into the UPS website, performs automated data scraping to get delivery tracking information, and enters it into an Excel file. After the task runs, the video goes on to show how that task was built. All but step 1 are shown in the video.

Step 1: Download an Automate trial

Step 2: Build the task by starting with variables. (If you need a basic primer on how to build Automate tasks, Automate Academy is a great place to learn.)

In this task, you’ll add variables for file names, rows, etc. Notice that this task builder is drag and drop, with no coding required!

Step 3: Open the Excel workbook to get the tracking numbers. You’ll store these as a dataset to use later on.

Step 4: Add a step to create a report workbook to write the dataset to.

Step 5: Use the report workbook with tracking numbers and column headings in a web browser activity.

Step 6: Identify which pieces of information you need. This includes telling the Automate bot where to find the data you want scraped. Put this step in a loop that goes through all the tracking numbers, scraping the data from the UPS website into Excel.

Step 7: For each piece of data you want scraped from the website, write the variable value to a cell in the workbook.

This is just one example of Excel automation. There are so many other ways Automate and Excel can work together to take manual work off your plate.

Excel’s Power Query (called Get & Transform since Excel 2016) is another great tool for building queries to get data from the web. Within a couple of minutes you can build a query that pulls data from a webpage and transforms it into the desired format. This is especially useful for a webpage that is updated frequently, because you can simply refresh your query to pull in the new data.
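
Under the hood, Power Query records every query as a short script in its M language. As a rough illustration only (the URL and table index below are placeholders, not part of the walkthrough that follows), a query that pulls the first table it finds on a webpage looks something like this:

    let
        // download the page and parse the tables it contains
        Source = Web.Page(Web.Contents("https://example.com/some-page")),
        // pick the first table found on the page (index 0)
        FirstTable = Source{0}[Data]
    in
        FirstTable

You rarely need to write this by hand; the From Web button described below generates it for you, and the Advanced Editor lets you tweak it afterwards.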

Remember, if you’re not using Excel 2016 or later, then you’ll need to install the Power Query add-in.

Data to Extract

In this post we’re going to take a look at how we can pull data from a series of similar pages. I’m a big MMA fan, so the example we’re going to look at is getting a list of all UFC results from Wikipedia.


If you visit the Wikipedia page for UFC events, there’s a table of Past Events. If you click on one of the events, you’ll see a results table. If you look at a few more events, you’ll notice the structure is exactly the same and they all have a results table. This is the data I want to get, but from all 400+ events listed in the Past Events section. If the number of pages were much larger, you might be better off using another tool like Python, but we’re going to be using Power Query.

Create a Query Function

First, we will create a query to extract the data on one page. We will then turn this into a function query whose input is an event page URL. This way we can apply the query to each URL in a list of all the URLs.


Head to the Data tab in the ribbon and press the From Web button under the Get & Transform section. If you’re working with Excel 2013 or earlier via the add-in, then this will be found under the Power Query tab.


Enter the URL and press the OK button.


Excel will connect with the page and the Navigator dialog box will open.

  1. The tables available to import from the webpage will be listed. Select the Results table.
  2. A preview of our selected data will appear.
  3. Press the Edit button. This will open the Query Editor window.


Rename the query to fGetWikiResults. This will be the name we call to use our query function later on.


Now we can edit our query to turn it into a query function. Go to the View tab and press the Advanced Editor button. This will allow us to edit the code that Excel has created to extract the data from this URL.


We will need to edit this code to the following. The parts that need to be added are the first line, which wraps the existing query in a function definition taking a URL parameter, and the closing in GetResults line.


let GetResults = (URL) =>
    // inner query: download the page at the given URL and pick out the results table
    let
        Source = Web.Page(Web.Contents(URL)),
        Data1 = Source{1}[Data],
        #"Changed Type" = Table.TransformColumnTypes(Data1,{{"Header", type text}, {"Weight class", type text}, {"", type text}, {"2", type text}, {"3", type text}, {"Method", type text}, {"Round", Int64.Type}, {"Time", type time}, {"Notes", type text}})
    in
        #"Changed Type"
in GetResults

Press the Done button when finished editing the query. This will turn our query into a parametrized query with the URL as an input.
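
If you want to sanity-check the function at this point, you can create a quick test query that invokes it against a single event page (the UFC 217 URL is just the example page mentioned earlier):

    let
        // call the function query against one event page to preview its results table
        Test = fGetWikiResults("https://en.wikipedia.org/wiki/UFC_217")
    in
        Test

Equivalently, the function query’s parameter box in the editor lets you invoke it with a URL to preview the output.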


You should see that the data preview in the query editor has been replaced with a parameter input. We don’t need to enter anything here; we can just leave it blank.


We can then save our query function by going to the Home tab and pressing the Close & Load button.


You should now see the fGetWikiResults query function in the Queries & Connections window.

Get a List of URLs

Now we will need to get our list of event page URLs from the Past Events page. We could use Power Query to import this table, but that would just pull in the text and not the underlying hyperlinks. The best way to get the list of URLs is to parse the source code of the page. You can view any webpage’s source code by pressing Ctrl + U in the Chrome browser.

You’ll need to be fairly familiar with HTML to find what you’re looking for. The first couple of rows of HTML we are interested in look like this; note the href="/wiki/..." attributes in the anchor tags, which are the hyperlinks we want. You can parse these out in another Excel workbook using some filters and basic text formulas, and you will also need to prepend the start of the address (https://en.wikipedia.org) to get full URLs such as https://en.wikipedia.org/wiki/UFC_217. A Power Query alternative is sketched after the HTML snippet below.

<tr>
<td>416</td>
<td><a href="/wiki/UFC_217" title="UFC 217">UFC 217: Bisping vs. St-Pierre</a></td>
<td><span class="sortkey" style="display:none;speak:none">000000002017-11-04-0000</span><span style="white-space:nowrap">Nov 4, 2017</span></td>
<td><a href="/wiki/Madison_Square_Garden" title="Madison Square Garden">Madison Square Garden</a></td>
<td><a href="/wiki/New_York_City,_New_York" class="mw-redirect" title="New York City, New York">New York City, New York</a>, U.S.</td>
<td><span style="display:none" class="sortkey">7004182010000000000</span>18,201<sup id="cite_ref-21" class="reference"><a href="#cite_note-21">[21]</a></sup></td>
</tr>
<tr>
<td>415</td>
<td><a href="/wiki/UFC_Fight_Night:_Brunson_vs._Machida" title="UFC Fight Night: Brunson vs. Machida">UFC Fight Night: Brunson vs. Machida</a></td>
<td><span class="sortkey" style="display:none;speak:none">000000002017-10-28-0000</span><span style="white-space:nowrap">Oct 28, 2017</span></td>
<td><a href="/wiki/Gin%C3%A1sio_do_Ibirapuera" title="Ginásio do Ibirapuera">Ginásio do Ibirapuera</a></td>
<td><a href="/wiki/S%C3%A3o_Paulo" title="São Paulo">São Paulo</a>, Brazil</td>
<td><span style="display:none" class="sortkey">7004102650000000000</span>10,265<sup id="cite_ref-22" class="reference"><a href="#cite_note-22">[22]</a></sup></td>
</tr>
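
If you’d rather stay inside Power Query than use worksheet formulas, a rough sketch along the following lines will also pull the event links out of the raw HTML. The page URL, the /wiki/UFC prefix filter, and the step names here are assumptions for illustration, not part of the original workbook:

    let
        // download the raw HTML of the Past Events page as plain text
        Html = Text.FromBinary(Web.Contents("https://en.wikipedia.org/wiki/List_of_UFC_events")),
        // split on every href attribute and keep the text up to the closing quote
        Pieces = Text.Split(Html, "href="""),
        Links = List.Transform(List.Skip(Pieces, 1), each Text.BeforeDelimiter(_, """")),
        // keep only links that look like event pages
        EventLinks = List.Select(Links, each Text.StartsWith(_, "/wiki/UFC")),
        // prepend the site address and remove duplicates
        FullUrls = List.Distinct(List.Transform(EventLinks, each "https://en.wikipedia.org" & _)),
        // turn the list into a one-column table
        URLs = Table.FromList(FullUrls, Splitter.SplitByNothing(), {"URL"})
    in
        URLs

Either way, the goal is the same: a single column of full event page URLs.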


Once we have the full list of event URLs, we can turn the list into an Excel Table using the Ctrl + T keyboard shortcut and name it URL_List.

Use the Query Function on our URL List

We are now ready to use the fGetWikiResults query function on our list of event URLs.


Create a query based on the URL_List table. Select a cell in the table, go to the Data tab in the ribbon, and press the From Table/Range button under the Get & Transform section.
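
For reference, the query this creates starts from a single source step along these lines (the table name matches the URL_List table created above):

    // read the URL_List table from the current workbook
    Source = Excel.CurrentWorkbook(){[Name="URL_List"]}[Content]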


Now we will add a custom column to the query. This is where we’ll invoke our fGetWikiResults query function. Go to the Add Column tab and press the Custom Column button.


Give the custom column a new name (Results Data) and then enter the custom column formula fGetWikiResults([URL]).
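
Behind the dialog, this simply adds a Table.AddColumn step to the query. Assuming the column is named Results Data and the preceding step is called Source (your step names may differ), the generated line looks roughly like:

    // invoke the function query once per row, passing in that row's URL
    #"Added Custom" = Table.AddColumn(Source, "Results Data", each fGetWikiResults([URL]))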


The new custom column will contain a Table for each URL, and we will need to expand these tables to see the results. Click the expand icon in the Results Data column heading, select Expand from the menu, and press the OK button.
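
The expand operation also translates into a single line of M. The exact list of column names depends on what the event results tables contain, so treat the names below as examples only:

    // expand the nested results tables into columns alongside each URL
    #"Expanded Results Data" = Table.ExpandTableColumn(#"Added Custom", "Results Data",
        {"Weight class", "Method", "Round", "Time", "Notes"},
        {"Weight class", "Method", "Round", "Time", "Notes"})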


Some of the column headings were missing in our source data, so we can rename them. Double-click a column heading to rename it.


We can now Close & Load the query, and the results data will load into a new sheet. This will take a good few minutes, so be patient; it’s also why you should consider Python or a similar tool if you need to scrape many more pages than in this example.

About the Author


John is a Microsoft MVP and qualified actuary with over 15 years of experience. He has worked in a variety of industries, including insurance, ad tech, and most recently Power Platform consulting. He is a keen problem solver and has a passion for using technology to make businesses more efficient.

How do I get data from URL to Excel?

Select Data > Get & Transform > From Web, paste or type the URL into the text box, and then select OK. In the Navigator pane, select the table you want (the Results table in this example) and load it into your workbook.

Can Excel pull live data from a website?

Yes. You can easily import a table of data from a web page into Excel and regularly update the table with live data by refreshing the query.

How do I extract data from a website?

There are a few common approaches:

  • Code a web scraper with a general-purpose programming language such as Python, Java, JavaScript, PHP, C, or C#.
  • Use a data service.
  • Use Excel (Power Query) for data extraction.
  • Use a dedicated web scraping tool.

How do I automatically extract data from a website into Excel at regular intervals?

To extract data from websites on a schedule, you can take advantage of data extraction tools like Octoparse. These tools can pull data from websites automatically and save it in many formats, such as Excel, JSON, CSV, or HTML, or send it to your own database via APIs. Within Excel itself, you can also set a Power Query connection to refresh automatically at a fixed interval through the connection properties.