It is often the case that interesting requests come in from people you are working with for which there doesn’t appear to be a useful tool available for quickly gathering the information. One such request came in today: how to quickly identify all external links on a website.
This is actually quite a common issue. Content management systems make it easy to add content (and links) across hundreds or thousands of pages on a website, so how do you easily find all of the external links?
If you are looking for a quick answer, then this is the XPath required to identify external links on a single web page:
//a[not(contains(@href, 'www.michaelcropper.co.uk'))]/@href
So what does this actually mean?
- //a : Get me any links that …
- [not(contains( : … do not contain …
- @href : … in their HREF attribute …
- 'www.michaelcropper.co.uk'))] : … this website address, and …
- /@href : … get the HREF attribute for each of those links
Make sense? Good. Let’s look at actually using this XPath in a useful way.
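To make the filter described in the list above concrete, here is a minimal sketch of the same logic in Python. It does not run XPath at all; it uses the standard library’s HTML parser to collect every link’s HREF and then drops any that contain the site address. The names `LinkCollector`, `external_links` and `sample_html` are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (the //a ... /@href part)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def external_links(html, own_host):
    """Keep only hrefs that do not contain own_host (the not(contains(...)) part)."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if own_host not in href]


# Hypothetical page content, used only to show the behaviour.
sample_html = """
<p><a href="http://www.michaelcropper.co.uk/about">About</a>
<a href="http://www.example.com/page">Elsewhere</a>
<a href="/contact">Contact</a></p>
"""

print(external_links(sample_html, "www.michaelcropper.co.uk"))
```

Note that the relative link `/contact` passes the filter as well, because it does not contain the site address. The XPath behaves the same way, so on a site that uses relative links it may be worth adding a condition such as `starts-with(@href, 'http')` to the expression.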
SEO Tools Plugin
Now the interesting thing is that when using XPathOnURL with SEO Tools, this doesn’t bring back every matching HREF attribute. Instead it pulls back only the first matching URL on the page, which may be good enough for this purpose. So the function would be as follows when the URL you want to test against is in cell A1:
=XPathOnUrl(A1, "//a[not(contains(@href, 'www.michaelcropper.co.uk'))]/@href")
In the example above I was testing on the URL http://www.michaelcropper.co.uk/2012/06/googles-business-plan-steal-content-and-screw-publishers-1081.html as that contains a link to an external website. So now we want to look at scaling this up for a bunch of URLs on a website.
Now that you know how to check whether a specific URL contains an external link, the next step is to do this for all URLs on the site you want to check.
Simply install Xenu and run it against the website, then export all of the website’s URLs into an Excel file.
You will now have a huge list of all URLs on the website, which you can run the same XPathOnURL function against to identify all pages that contain at least one external link.
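If you would rather script this step than fill a spreadsheet column, the same check can be sketched in Python using only the standard library. This is an alternative to the SEO Tools approach, not a description of it, and it assumes you already have the list of URLs (for example, copied out of the Xenu export). The names `pages_with_external_links`, `has_external_link`, `fetch_html` and `fake_site` are hypothetical, and unlike the XPath above this version only counts links that start with http:// or https://, so relative links are ignored.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def has_external_link(html, own_host):
    """True if any absolute link on the page does not contain own_host."""
    collector = LinkCollector()
    collector.feed(html)
    return any(
        href.startswith(("http://", "https://")) and own_host not in href
        for href in collector.hrefs
    )


def fetch_html(url):
    """Download a page; assumes UTF-8 and replaces undecodable bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def pages_with_external_links(urls, own_host, fetch=fetch_html):
    """Return the URLs whose pages contain at least one external link."""
    return [url for url in urls if has_external_link(fetch(url), own_host)]


# Offline demonstration: a made-up two-page "site" stands in for real downloads.
fake_site = {
    "http://www.michaelcropper.co.uk/a": '<a href="http://www.example.com/">out</a>',
    "http://www.michaelcropper.co.uk/b": '<a href="/internal">in</a>',
}
flagged = pages_with_external_links(
    fake_site, "www.michaelcropper.co.uk", fetch=fake_site.get
)
print(flagged)
```

To run it against a real site, drop the `fetch` argument so that `fetch_html` downloads each page, and pass in the list of URLs from the export.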
This is likely only one solution to the problem, and it doesn’t allow you to create a definitive list of every single external link on every page of the website, but it does tell you which pages contain a link to another website.
Simple, but effective.