I am trying to write a simple scraper tool that will extract a specific URL from a webpage. The page has many URLs, but I want to get the one that ends with a specific set of characters.
For example, if somewhere in the page source there is a URL that looks like this:
I want to return https://www.website.com/dog.pdf without the quotes. If there is more than one match, I only want to return the first one.
So the regex should extract everything after
source: and up to and including the
I've looked at other questions, but most answers refuse to provide a regex and instead say to use
endswith(). But since the page source could be massive, I'm worried about performance. I am new to Python, though, and perhaps I'm just not understanding how to use those methods.
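
To make the goal concrete, here is a minimal sketch of the behavior I'm after. It assumes the URL appears in the page source as `source: "<url>"` with double quotes and that the ending to match is `.pdf`. The names `PDF_SOURCE_RE`, `first_pdf_url`, and the sample string are made up for illustration.

```python
import re

# Assumption: the target looks like  source: "<url ending in .pdf>"  in the page source.
# The capture group holds the URL only, so the surrounding quotes are dropped.
PDF_SOURCE_RE = re.compile(r'source:\s*"([^"]+?\.pdf)"')


def first_pdf_url(page_source):
    """Return the first URL after `source:` that ends in .pdf, or None if there is none."""
    match = PDF_SOURCE_RE.search(page_source)  # search() stops at the first match
    return match.group(1) if match else None


# Hypothetical page fragment with two candidate URLs; only the first should be returned.
sample = (
    'player.setup({source: "https://www.website.com/dog.pdf"});\n'
    'player.setup({source: "https://www.website.com/cat.pdf"});'
)
print(first_pdf_url(sample))
```

Because `search()` returns as soon as it finds the first match, it does not need to scan the rest of the page after that point, which should help with the performance concern on large pages.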