Replace all links using regex

After scraping a web page you will usually need to make many small textual edits. The basic find and replace in your IDE is not enough, and even a global find/replace does not help much once you are battling small variations in links. Something as simple as the difference between ‘.jpg’, ‘.JPG’, and ‘.JPEG’ can cost you far too much time. Regex takes a while to understand, but if you make a habit of noticing when it applies and working out the right pattern each time, you will grow as a problem solver.

What regex does is let you focus on patterns in the text you are working with. Instead of finding and replacing individual occurrences such as “$12.35”, “$3.56”, and “$0.98”, you can search for every bit of text that starts with a ‘$’, has one or more digits, then a ‘.’, then exactly two digits.
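Here is a minimal sketch of that price pattern in Python. The sample text and the name PRICE are made up for illustration; only the pattern itself comes from the description above.

```python
import re

# '\$' is a literal dollar sign, '\d+' is one or more digits,
# '\.' is a literal dot, '\d{2}' is exactly two digits.
PRICE = re.compile(r"\$\d+\.\d{2}")

# Hypothetical sample text, not from a real page.
text = "Coffee $3.56, lunch $12.35, gum $0.98, parking $5"

prices = PRICE.findall(text)
print(prices)  # → ['$3.56', '$12.35', '$0.98']
```

Note that “$5” is skipped because it has no ‘.’ followed by two digits, so one pattern covers every price without listing them one by one.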

You can do this with numbers and letters. You can even search for instances like ‘3em’ and ‘1em’ and target just the ‘em’ or just the number. Take a look at the following regex. You paste it into the find field of your editor, with regex mode turned on.
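To make the ‘3em’ and ‘1em’ idea concrete, here is a small sketch using capture groups, which is one way (not the only one) to change part of a match and keep the rest. The CSS string and the replacement values are invented for the example.

```python
import re

# Hypothetical sample CSS.
css = "h1 { margin: 3em; } p { padding: 1em; }"

# Target just the unit: keep the number (group 1), swap 'em' for 'rem'.
units_changed = re.sub(r"(\d+)em\b", r"\g<1>rem", css)

# Target just the number: replace the digits, keep the 'em' (group 1).
numbers_changed = re.sub(r"\d+(em\b)", r"2\g<1>", css)

print(units_changed)    # → h1 { margin: 3rem; } p { padding: 1rem; }
print(numbers_changed)  # → h1 { margin: 2em; } p { padding: 2em; }
```

The ‘1’ in ‘h1’ is left alone in both cases because it is not followed by ‘em’.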

(?<=a href=(\"|'))[^\"']+(?=(\"|'))

That is an idea. It is composed of several groups of smaller ideas. The first thing to understand is that a regex will return zero or more results. This regex looks for anything between a pair of double quotes ("...") or single quotes ('...') immediately after ‘a href=’.
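Outside of an editor, the same regex can be tried in Python. This is only a sketch: the HTML snippet and the ‘#’ replacement are placeholders I made up. One Python detail worth knowing is that the pattern contains capture groups, so re.findall would return the captured quote characters rather than the links. Using finditer with group(0) gives the full match instead.

```python
import re

# The regex from above, pasted unchanged into a raw triple-quoted string.
HREF = re.compile(r"""(?<=a href=(\"|'))[^\"']+(?=(\"|'))""")

# Hypothetical scraped HTML with both quote styles and a non-link tag.
html = (
    '<a href="http://old.example.com/a.html">A</a> '
    "<a href='/b.html'>B</a> "
    '<img src="c.jpg">'
)

# group(0) is the whole match, i.e. only the text between the quotes.
links = [m.group(0) for m in HREF.finditer(html)]
print(links)  # → ['http://old.example.com/a.html', '/b.html']

# Replace every matched link; 'a href=' and the quotes are left in place.
replaced = HREF.sub("#", html)
print(replaced)

# Zero results is a normal outcome, not an error.
print([m.group(0) for m in HREF.finditer("<p>no links</p>")])  # → []
```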

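This regex only works on links where the href attribute comes right after the ‘a’ and a single space. If another attribute such as class sits between them, the link is not matched. Also, the first part, ‘(?<=’, is a lookbehind, which older JavaScript engines do not support (it was added to the language in ECMAScript 2018). A lookbehind tells the regex engine that ‘a href=’ and the opening quote must come right before the match, but are not part of what gets found/replaced. The ‘(?=’ at the end is a lookahead, which does the same job for the closing quote.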
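The limitation is easy to see in a quick test. The second pattern below, LOOSE, is my own looser sketch and not part of the original regex: it allows other attributes before href and uses capture groups instead of a lookbehind, so it also works in engines without lookbehind support. The HTML and the new path are made up, and a pattern like this is still only a rough tool for messy markup.

```python
import re

# The original regex: href must follow 'a ' directly.
LOOKBEHIND = re.compile(r"""(?<=a href=(\"|'))[^\"']+(?=(\"|'))""")

# Hypothetical looser variant:
#   group 1 = '<a ... href=', group 2 = the quote, group 3 = the link,
#   and '\2' requires the same quote character to close the value.
LOOSE = re.compile(r"""(<a\b[^>]*?\bhref=)(["'])([^"']+)\2""")

# Hypothetical link with an attribute between 'a' and 'href'.
html = '<a class="nav" href="/old/page.html">Page</a>'

print(LOOKBEHIND.search(html))  # → None

# Put groups 1 and 2 back, and swap only the link itself.
replaced = LOOSE.sub(r"\g<1>\g<2>/new/page.html\g<2>", html)
print(replaced)  # → <a class="nav" href="/new/page.html">Page</a>
```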
