How to Scrape the HREF Attribute Using XPathOnURL SEO Tools

The joys of XPath and SEO Tools for Excel. Here we are going to talk through how to scrape the HREF attribute using the =XPathOnURL function, as this is often needed when you want to scrape the actual links from a website which match certain criteria.

If you just want the quick answer, here is how you do it:

=XPathOnURL("http://www.example.com", "//a", "href")

Well, something like that anyway, as the XPath you use depends on which links you actually need. One thing to note about the XPathOnURL function is that it doesn't work quite the same as standard XPath, but this will be explained a little later.

Firstly, if you want to scrape the HREF attribute then you may be able to do this much more quickly using the Google Chrome plugin called XPath Helper, but that isn't always the case.

Example

I came across an example recently when I needed to scrape one HREF attribute on around 100 pages, so the XPath Helper plugin wouldn't quite do it. The setup I was working with had a lot of pages where I needed to scrape some data, particularly one specific HREF attribute on each page.

In this example I want to scrape a link to an image file, which just happens to sit within the first DIV on the page (in this fictitious example!).

Normally, if I wanted to do this via standard XPath, I would use the XPath of: //div/a/@href, which is saying "get the HREF attribute of an A tag which is contained within a DIV tag". (The similar-looking //div/a[@href] selects the A tags that have an HREF attribute, not the attribute value itself.)

The XPathOnURL function within SEO Tools doesn't quite work in the same way. The XPath you pass in only selects the element, and by default the function returns the content between the opening and closing tags. If you want to pull back an attribute instead, you need to add an extra parameter to the function: , "href", which tells the function to return the HREF attribute of the selected element.
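The same split between "select the element" and "name the attribute" can be mimicked outside Excel. Below is a minimal Python sketch using only the standard library, not a description of how SEO Tools is implemented. The markup in PAGE and the helper name scrape are made up for illustration, and ElementTree only supports a subset of XPath and needs well-formed markup, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for one of the pages.
PAGE = """
<html>
  <body>
    <div>
      <a href="/images/photo.jpg">View image</a>
    </div>
    <div>
      <a>No link here</a>
      <a href="/contact">Contact</a>
    </div>
  </body>
</html>
"""


def scrape(markup, xpath, attribute=None):
    """Hypothetical helper: return the text of each matching element,
    or the named attribute of each matching element if one is given."""
    root = ET.fromstring(markup)
    matches = root.findall(xpath)
    if attribute is None:
        # No attribute requested: return the content between the tags.
        return [(el.text or "").strip() for el in matches]
    # Attribute requested: skip elements that do not carry it.
    return [el.get(attribute) for el in matches if el.get(attribute) is not None]


# Two arguments: the link text. Three arguments: the HREF values.
link_texts = scrape(PAGE, ".//div/a")
link_hrefs = scrape(PAGE, ".//div/a", "href")
print(link_texts)
print(link_hrefs)
```

The XPath is identical in both calls. Only the extra attribute argument changes what comes back, which is the same idea as adding "href" as the third parameter in the formula.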

I am sure that you will come across a need for this at some point – especially if doing a lot of scraping!

That is all of the explanation I am going to do here. Go and give it a go yourself if you ever need to scrape the HREF attribute using XPathOnURL.


Michael Cropper

Founder & Managing Director at Contrado Digital Ltd

