How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

Noah@lemmy.dbzer0.com · edited 9 days ago

aMockTie@beehaw.org · 8 days ago

In my experience, this scenario typically means that there is some sort of API (very likely undocumented) being used on the backend. Finding it requires a bit more investigation and testing with browser developer tools, the JS console, and often trial and error. But once you overcome that (admittedly very complex and technical) hurdle, you can almost always get away with just using the requests library.

I’ve had to do that kind of thing more times than I’d like to admit, but the juice is almost always worth the squeeze.
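
For illustration, here is a minimal sketch of what that can look like once the endpoint is known. Everything specific in it is an assumption: the URL `https://example.com/api/items`, the `page` query parameter, and the `fetch_items` helper are hypothetical placeholders, and a real site will have its own endpoint and parameters, and possibly required headers or cookies. Only `requests.Session`, `get`, `raise_for_status`, and `json` are actual requests calls.

```python
import requests

# Hypothetical endpoint: the kind of URL you might spot in the Network tab
# of the browser developer tools while the page loads its data.
API_URL = "https://example.com/api/items"


def fetch_items(page=1, session=None, url=API_URL):
    """Fetch one page of JSON from the (assumed) backend endpoint."""
    # Reuse a caller-supplied session if given, otherwise create one.
    http = session or requests.Session()
    response = http.get(
        url,
        params={"page": page},  # hypothetical query parameter
        headers={"Accept": "application/json"},
        timeout=10,  # seconds; avoids hanging forever on a dead server
    )
    # Raise an exception on 4xx/5xx instead of trying to parse an error page.
    response.raise_for_status()
    # Parse the JSON body into plain Python lists and dicts.
    return response.json()
```

Calling `fetch_items(page=1)` would return whatever JSON the endpoint sends back, already parsed, so no HTML parsing or browser rendering is involved.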

chicken@lemmy.dbzer0.com · 8 days ago

Well if I was doing it I probably would be trying to focus on browser emulation to avoid having to dig into those sorts of details. It sounds like OP is a beginner and needs a simple method.

aMockTie@beehaw.org · 8 days ago

I agree that OP sounds like a beginner, and what you’ve suggested is likely the best approach for someone who is familiar with frontend tools and frameworks. Selenium (and admittedly BeautifulSoup) is probably too low level for this particular user, but that doesn’t mean they can’t still learn some fundamentals while solving this problem without resorting to something as heavy and complicated as background browser emulation and rendering. I could be wrong though.

How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

I have been trying for hours to figure this out. From following build tutorials to just trying to find prebuilt scrapers, I can’t seem to make it click.