Readit News
avghaloplayer commented on CyberScraper-2077 – A LLM Based Web Scraper   github.com/itsOwen/CyberS... · Posted by u/avghaloplayer
avghaloplayer · a year ago
Hello everyone, a few days ago I decided to create an LLM-based web scraper. It could come in handy for data analysts and anyone who scrapes data. The app is in continuous development, with new improvements and features added every week. If you check it out, that would be really awesome.

Info about CyberScraper-2077:

Rip data from the net, leaving no trace. Welcome to the future of web scraping.

About

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Whether you're a corpo data analyst, a street-smart netrunner, or just someone looking to pull information from the digital realm, CyberScraper 2077 has got you covered.

Features

AI-Powered Extraction: Utilizes cutting-edge AI models to understand and parse web content intelligently.

Sleek Streamlit Interface: User-friendly GUI that even a chrome-armed street samurai could navigate.

Multi-Format Support: Export your data in JSON, CSV, HTML, SQL or Excel – whatever fits your cyberdeck.

Stealth Mode: Implements stealth-mode parameters that help keep it from being detected as a bot.

Ollama Support: Use a huge library of open-source LLMs.

Async Operations: Lightning-fast scraping that would make a Trauma Team jealous.

Smart Parsing: Structures scraped content as if it were extracted straight from the engram of a master netrunner.

Ethical Scraping: Respects robots.txt and site policies. We may be in 2077, but we still have standards.

Caching: We implemented content-based and query-based caching using an LRU cache and a custom dictionary to reduce redundant API calls.

Upload to Google Sheets: Now you can easily upload your extracted CSV data to Google Sheets with one click.

Proxy Mode (Coming Soon): Built-in proxy support to keep you ghosting through the net.

Navigate through the Pages: Navigate through a website and scrape data from multiple pages.

If you are unable to scrape a website because you are getting blocked, try out the current browser feature.

Github: https://github.com/itsOwen/CyberScraper-2077
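The Ethical Scraping feature says the scraper respects robots.txt. As a minimal sketch of what such a check can look like (not necessarily how CyberScraper does it), Python's standard `urllib.robotparser` can answer "may this user agent fetch this URL?". The robots.txt body and the `ExampleScraperBot` user-agent string below are made up for illustration; against a live site you would call `set_url(...)` and `read()` on the parser instead of `parse(...)`.

```python
from urllib import robotparser

# Example robots.txt body (made up for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "ExampleScraperBot"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def is_allowed(rp: robotparser.RobotFileParser, url: str) -> bool:
    """Return True if robots.txt permits USER_AGENT to fetch url."""
    return rp.can_fetch(USER_AGENT, url)


rules = build_parser(ROBOTS_TXT)
print(is_allowed(rules, "https://example.com/products"))      # → True
print(is_allowed(rules, "https://example.com/private/data"))  # → False
```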

avghaloplayer commented on Ask HN: GitHub Copilot Alternatives    · Posted by u/accidbuddy
avghaloplayer · a year ago
At this point, there are so many good alternatives to Copilot!
avghaloplayer commented on Ask HN: Which programming language can I learn on my Termux-enabled Android tab?    · Posted by u/FerretFred
avghaloplayer · a year ago
You can download apps like Sololearn. I know they are not that great, but you can learn the basics of any language you want!
avghaloplayer commented on Hot Take: Low Code/No Code platforms die as LLMs get better    · Posted by u/livewirecrazy
avghaloplayer · a year ago
I agree with your statement. Also, the recent interview with the former Google CEO revealed many things about what AI will be capable of in the upcoming years. If you think about it, it's insane!
avghaloplayer commented on Ask HN: Is unlimited vacation still a thing in tech jobs?    · Posted by u/chirau
avghaloplayer · a year ago
Some employers do, some don't.
avghaloplayer commented on Ask HN: How to store and share passwords in a company?    · Posted by u/hu3
avghaloplayer · a year ago
I would suggest using a self-hosted, open-source password manager like Passbolt.
avghaloplayer commented on Show HN: Every open source tool from the "What's HN working on" thread   github.com/getomni-ai/dat... · Posted by u/themanmaran
samstave · a year ago
https://github.com/SoMaCoSF/CyberScraper-2077

Need to figure out how to connect Datasette with CyberScraper.
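One possible bridge, sketched under assumptions: Datasette serves SQLite databases, so scraped records exported as a JSON array of flat objects could be loaded into a SQLite file and then opened with `datasette scraped.db`. The function name `load_json_export`, the file name, and the assumed shape of the JSON export are hypothetical, not an existing CyberScraper or Datasette API.

```python
import json
import sqlite3


def load_json_export(db_path: str, table: str, json_text: str) -> int:
    """Load a JSON array of flat objects into a SQLite table; return its row count.

    Column names come from the union of the objects' keys; missing keys become NULL.
    Identifiers are double-quoted but not otherwise sanitized, so use trusted names only.
    """
    records = json.loads(json_text)
    if not records:
        return 0  # nothing to load
    columns = sorted({key for record in records for key in record})
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
            conn.executemany(
                f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
                [tuple(record.get(c) for c in columns) for record in records],
            )
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()
```

For example, `load_json_export("scraped.db", "products", exported_json)` followed by `datasette scraped.db` should make the table browsable in Datasette.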

---

Heh - I forgot to look at your username. :-) long time fan.

BTW I recall when you guys announced Lanyrd on YC.

avghaloplayer · a year ago
Hey, I am the developer behind CyberScraper-2077. I will look into your feature request :)

u/avghaloplayer

Karma: 2 · Cake day: August 24, 2024
About
Hello there :)

My github: https://github.com/itsOwen
