Finding and Fixing Broken Links with BLC

Broken links cause real problems, not only for your users, but also for the health of your site. Broken links are a real indicator of a neglected website. People want fresh takes, not stale leftovers.

It honestly amazes me how many links end up on our websites. They are the fundamental building block of the web, and they are everywhere. Oftentimes we don’t realize just how many links exist on our sites, nor do we have a system for maintaining them.

One of my clients has a website with over 300,000 links on it. Three hundred thousand! How do you manage all of those links?

In the past I have used automated tools for this, tools like SiteImprove or Sitechecker. But these tools are expensive, and for a lot of sites they are just flat out overkill. I had another client that was spending over $15k a month for one of these tools, and they weren't even using it! They thought that just owning the tool improved things, but that’s not the case. A tool can tell you what is wrong, but it can't fix anything for you. Fixing the links is up to you.

Owning a website is a lot like owning a home: if you take care of your yearly maintenance, then your home will take care of you. It’s not glamorous, but without maintenance, the rest of your work is all for naught: it doesn't matter if you have a shiny new roof if the foundation collapses underneath you.

So let's learn how to keep our foundation in good shape by taking care of our links. Like I said, links are the fundamental building block of the web, and without them our websites are nothing, so let's take care of them.

Getting Past the Fancy Tools

SiteImprove and others like it are nice, but they are frankly overkill for most businesses. Most website maintenance can be easily done with less expensive tools.

My favorite tool for checking broken links is a terminal utility called broken-link-checker (BLC for short). Don’t worry if you aren't a fan of the terminal; using this utility will be a good way to get your feet wet.

If you don’t already have Node and npm installed on your computer, you'll have to start there. Go to nodejs.org and follow the instructions. This shouldn't take long.

npm is the package manager that comes bundled with Node. It makes it much easier to install, uninstall, and update programs like the one we're about to use.

Once you have npm, you should be able to open a terminal (or cmd in Windows) and run node -v to see the version you're running. For our terminal utility to work, you need to have v9 or greater installed.

node -v
>> v13.12.0
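
If you would rather let the terminal do that comparison for you, here is a minimal sketch. The node_major helper is a name made up for illustration, and the minimum of 9 is simply the requirement mentioned above.

```shell
# Pull the major version number out of a string like "v13.12.0".
# node_major is a hypothetical helper name, not part of Node or npm.
node_major() {
  version="${1#v}"        # drop the leading "v"
  echo "${version%%.*}"   # keep everything before the first dot
}

# Compare the installed version against the minimum mentioned above (9).
if command -v node >/dev/null 2>&1; then
  if [ "$(node_major "$(node -v)")" -ge 9 ]; then
    echo "Node is new enough"
  else
    echo "Node is too old, please upgrade"
  fi
else
  echo "Node is not installed"
fi
```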

I have v13 on my machine, so I should be good to go. Now we need to install our terminal utility by running this command:

npm install broken-link-checker -g

This will install this program globally, which means you'll be able to run it at any time from your terminal.
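
To confirm the install worked, you can ask your shell whether it can find the blc command. This is only a sketch, and is_installed is a helper name invented for this example.

```shell
# is_installed is a hypothetical helper: it succeeds when the named
# command can be found on your PATH.
is_installed() {
  command -v "$1" >/dev/null 2>&1
}

if is_installed blc; then
  echo "blc is ready to use"
else
  echo "blc was not found, try the install command again"
fi
```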

You can use it to scan any URL, whether you own the website or not.

One of the perks of this utility is that you can even point it at your own local server if you want; it works just as well. Or you can use your live site. All you need is a URL.

To scan my entire site, I would run this command:

blc https://timothymiller.dev -ro

And that outputs something like this:

Getting links from: https://timothymiller.dev/
├───OK─── https://timothymiller.dev/posts/2020/filtering-out-politics-in-feedbin/
├───OK─── https://timothymiller.dev/posts/2020/creating-a-raspberry-pi-nas/
├───OK─── https://timothymiller.dev/posts/2020/a-superior-git-remote/
├───OK─── https://timothymiller.dev/posts/2020/making-a-real-bonefide-plugin-for-11ty/
├───OK─── https://timothymiller.dev/posts/2020/own-your-own-feedbin-data-with-11ty/
├───OK─── https://timothymiller.dev/posts/2020/switching-to-an-ergonomic-colemak-keyboard/

...and so on. My own small site has over 2000 links!


For those of you unused to terminal commands, let's talk about what that command does. Here it is again:

blc https://timothymiller.dev -ro

Each of these words, separated by a space, is an instruction for your computer. Here's what they each mean:

  • blc: this is how you fire up the broken-link-checker program. blc stands for Broken Link Checker.
  • https://timothymiller.dev: this is the URL you want to scan. This can be localhost or any other URL.
  • -ro: these are options, which change how BLC scans your site. Two single-letter options are combined here. r stands for "recursive", which means BLC will scan the whole website rather than just the one URL you provided. o stands for "ordered", which means BLC will list the links in the same order they appear on the actual page.
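
If the short flags are hard to remember, the same scan can be written with the long option names. The sketch below wraps it in a small function; scan_site is a name made up for this example, and it assumes blc is already installed.

```shell
# scan_site is a hypothetical wrapper around the command explained above.
# --recursive and --ordered are the long forms of -r and -o.
scan_site() {
  blc "$1" --recursive --ordered
}

# Usage:
# scan_site https://timothymiller.dev
```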

BLC has many more options that you can use, and you can see those by running blc --help.

Once the scan finishes, you'll also see a final report that gives a basic summary of the results:

Finished! 2093 links found. 1754 excluded. 7 broken.
Elapsed time: 1 minute, 7 seconds

Look at that, over 2000 links found in just over a minute! Like I said, it’s amazing how links multiply, even with a small site, and how much time this tool can save you.
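
If all you care about is the number of broken links, you can pull it out of that summary line. This is a rough sketch that assumes the summary keeps the exact wording shown above; broken_count is a helper name invented for this example. It takes the last "Finished!" line it sees, in case the output contains more than one.

```shell
# broken_count is a hypothetical helper. It reads BLC output on stdin and
# prints the "broken" number from the last "Finished!" line it sees.
broken_count() {
  sed -n 's/^Finished!.* \([0-9][0-9]*\) broken\.$/\1/p' | tail -n 1
}

# Example using the summary line shown above (prints 7):
echo 'Finished! 2093 links found. 1754 excluded. 7 broken.' | broken_count
```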

I almost always use the -ro options when using this tool. But there are a few more options that I use often and find useful.

It’s handy to differentiate between internal and external links. Internal links could break any time you change your site: you could disable or delete a page, you could make a typo, you could make a bad redirect. But it takes a change on your end to break an internal link, so if you aren't changing your site often, you don’t need to scan your internal links often. You can scan just your internal links by using the --exclude-external (or -e) option.

External links, on the other hand, could break at any time. It’s a good idea to check your external links every month or two, and find adequate replacements if and when anything breaks. This way you won’t end up with a huge backlog of links that take days (or weeks) to fix. To scan only external links, use the --exclude-internal or -i option.
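
Put together with the -ro options from earlier, those two scans might look like the sketch below. The function names are made up for this example, and both assume blc is installed.

```shell
# Hypothetical wrappers for the two kinds of scans described above.

# Internal links only: skip everything that points to another site.
scan_internal() {
  blc "$1" -ro --exclude-external
}

# External links only: skip links that point back to your own site.
scan_external() {
  blc "$1" -ro --exclude-internal
}

# Usage:
# scan_internal https://timothymiller.dev
# scan_external https://timothymiller.dev
```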

Outputting to a File

Watching the output in your terminal works fine for smaller sites, but for larger sites you'll probably want to capture that output somewhere. I like to create a .txt file with all of these links, and then I can search for broken links whenever I have the time.

To do this, we first need to decide where we want to save this file. I often use my desktop. To do that, you would need to run this command first:

cd ~/Desktop

This will change directories (cd) to your desktop. Now, when we run the command to create a file, it will create it on your desktop.

Then to create the file, we need to add a little more to the end of our command:

blc https://timothymiller.dev -ro >> timlinks.txt

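That should do it! Now you won’t see any output in your terminal, but the scan will run and save everything into that text file. Note that >> appends to the file if it already exists, so running the command twice leaves you with two reports in one file. Use a single > instead if you want each run to overwrite the last one.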
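
Once the report is saved, you don't have to scroll through it by hand. Here is a rough sketch that prints only the broken links, along with the page each one was found on. It assumes broken lines contain the word BROKEN and that each page's section starts with "Getting links from:", as in the output above. list_broken is a helper name invented for this example.

```shell
# list_broken is a hypothetical helper. It assumes broken links are marked
# with the word BROKEN, and that each page starts with "Getting links from:".
list_broken() {
  awk '
    /^Getting links from:/ { page = $0 }          # remember the current page
    /BROKEN/ {
      if (page != "") { print page; page = "" }   # print the page heading once
      print                                       # then the broken link itself
    }
  ' "$1"
}

# Usage, once the scan has finished:
# list_broken timlinks.txt
```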

Common Problems

BLC works incredibly well for many situations. However, I have found a number of issues in my day-to-day use, and sometimes you have to take the results with a grain of salt.

For example, I have seen a number of false positives, where BLC flags links that are not broken. Here are some of the most common false positives that I’ve seen:

Redirects 404ing

BLC seems to struggle with some types of redirects, or with too many redirects. This is a known issue, and they may fix it in the next version of the tool (v0.8), but there’s no telling when that version will come out.

If you notice a lot of 404s on links that work when you visit them in your browser, you might want to try adding the --get flag, which makes BLC send GET requests instead of its default HEAD requests. The scan might take a little longer, but it might fix some of your redirects. Other redirects you may just have to verify manually. I had one site that redirected five or six times, and I couldn't find a way to eliminate that false positive.
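
For reference, a rescan with that flag added to the usual options might look like this. scan_with_get is a name made up for this example, and it assumes blc is installed.

```shell
# scan_with_get is a hypothetical wrapper: the usual recursive, ordered
# scan, with --get added.
scan_with_get() {
  blc "$1" -ro --get
}

# Usage:
# scan_with_get https://timothymiller.dev
```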

Mysterious 403 errors

Every once in a while I get an HTTP_403 error, which means the site refused BLC's request. I’ve tried a number of different ways to get around this, but haven’t yet found a solution. This is another thing that may get fixed in the next version, if we’re lucky. Fortunately this issue doesn't come up too often, so I just manually verify that the URL works and move on.
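
To make that manual check less tedious, you can pull the affected URLs out of a saved report and work through them as a list. This is only a sketch: it assumes the report was saved to a text file, that each flagged line contains HTTP_403, and that the URL on that line is followed by a space. list_403 is a helper name invented for this example.

```shell
# list_403 is a hypothetical helper. It prints each URL that appears on a
# line containing HTTP_403, once, in sorted order.
list_403() {
  grep 'HTTP_403' "$1" | grep -oE 'https?://[^ ]+' | sort -u
}

# Usage, with the report file created earlier:
# list_403 timlinks.txt
```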

Fin.

BLC is a great tool to have in your tool belt, and works incredibly well for 95% of use cases. I wish it were more actively developed and maintained, but despite that it is still an incredibly useful tool, one that I will undoubtedly use for years to come.

