How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

Noah@lemmy.dbzer0.com · edited 5 days ago

I don’t want a point-and-click scraper, just a guide that doesn’t assume I have a background and that uses layman’s terms for easier reading. Thanks for believing that I can build the basic skills necessary! Much appreciated :3

umami_wasabi@lemmy.ml · edited 5 days ago

I don’t have a single guide for you, but I can lay out a road map.

A programming language. I prefer Python.
Basic HTML syntax and CSS selectors
HTTP, specifically methods, status codes (no need to memorize them all cuz you can go look them up), and cookies
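
To make the HTTP item a bit more concrete, here is a minimal sketch using only Python’s standard library. It shows that status codes really can be looked up instead of memorized, and what a cookie looks like once it is parsed. The helper names `describe_status` and `parse_set_cookie`, and the sample cookie string, are made up for illustration.

```python
from http import HTTPStatus
from http.cookies import SimpleCookie


def describe_status(code: int) -> str:
    """Look up the standard phrase for an HTTP status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def parse_set_cookie(header_value: str) -> dict[str, str]:
    """Turn the value of a Set-Cookie header into a {name: value} dict.

    Attributes such as Path or HttpOnly are cookie settings, not cookies,
    so they do not show up as separate entries.
    """
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    # A few codes you will meet often while scraping.
    for code in (200, 301, 404, 500):
        print(describe_status(code))

    # Hypothetical cookie a server might send back after a request.
    print(parse_set_cookie("session_id=abc123; Path=/; HttpOnly"))
```

The methods part (GET, POST, and so on) is easiest to pick up once you start sending real requests.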

Once you have that foundation, you can try building a web scraper. I advise against starting with Scrapy, not because it is bad, but because it is too overwhelming and abstracted for a beginner. Instead, I advise using requests for HTTP and BeautifulSoup4 for HTML parsing. You will build a more solid foundation and can transition to Scrapy later, when you need its advanced features.
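
As a rough idea of what the requests plus BeautifulSoup4 combination looks like, here is a small sketch, assuming both packages are installed (`pip install requests beautifulsoup4`). The function names `fetch_html` and `extract_links` and the sample HTML are placeholders for illustration, not part of either library. It downloads a page, then uses a CSS selector to pull out every link.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with an HTTP GET and return its HTML as text."""
    response = requests.get(url, timeout=10)
    # Raise an error for 4xx/5xx status codes instead of parsing an error page.
    response.raise_for_status()
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # "a[href]" is a CSS selector: all <a> elements with an href attribute.
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("a[href]")]


# Stand-in HTML so the parsing step can be tried without any network access.
SAMPLE_HTML = """
<html><body>
  <a href="/first">First page</a>
  <a>No link here</a>
  <a href="/second"> Second page </a>
</body></html>
"""

if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
    # With a real site you would combine the two steps:
    # extract_links(fetch_html("https://example.com"))
```

Keeping the download and the parsing in separate functions means you can practice selectors on saved HTML first and only hit a real site once that part works.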

When you get stuck, don’t be afraid to pause your attempt and read the tutorials again. Head to the Python Community on Discord to get interactive help. We welcome noobs, as we were once noobs too. Just don’t ever mention scraping there, as they can’t help if they suspect you’re trying to do something inappropriate, malicious, or illegal. They are notoriously against yt-dlp, which frustrates me a bit. Phrase it nicely and in a generic way. I will be there occasionally offering help.

Noah@lemmy.dbzer0.com · 5 days ago

The Discord thing is a no-go since I don’t really know how to make my issue palatable. That’s why I used Lemmy. Thanks again!

I have been trying for hours to figure this out. From tutorials on building one to just trying to find prebuilt ones, I can’t seem to make it click.