Persist scraped web pages #530
Comments
Thanks for trying aider and filing this issue. Re-scraping a webpage should only take a moment, and ensures you have a fresh copy of the data it contains. Persisting or caching the content could lead to problems with not picking up new page content. Can you help me understand the problem you are having with re-scraping?
Agreed. Although it "should only take a moment", doing it multiple times a day adds up, especially when not on a good connection. My specific use case is when I need to use specific features of tools/frameworks like Laravel or Filament. I find myself needing to re-scrape in order to provide context for tasks. I may also be using the tool wrong, you know 🤷♂️
I wonder if this also fits into the broader RAG feature.
This is in furtherance of #400
It would be a good addition to not have to re-scrape the same webpage over and over, as it is wasteful.
Scraped pages should persist, perhaps in a system-wide cache, such that subsequent calls to
/web http://already-scraped.com/specific-page
will only re-scrape if the page has not already been scraped or if the user specifically asks for it, perhaps via a switch provided to the /web command.
Many thanks for the awesome job!
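A minimal sketch of how such persistence might work, assuming a URL-keyed, disk-backed cache with an age limit and a force-refresh flag. The names here (`ScrapeCache`, `fetch`, the cache directory) are illustrative assumptions, not part of aider's actual code:

```python
import hashlib
import time
from pathlib import Path


class ScrapeCache:
    """Disk-backed cache for scraped page text, keyed by URL (illustrative sketch)."""

    def __init__(self, cache_dir="~/.cache/aider-web", max_age_seconds=24 * 3600):
        self.cache_dir = Path(cache_dir).expanduser()
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.max_age = max_age_seconds

    def _path_for(self, url):
        # Hash the URL so any URL maps to a safe, unique filename.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.txt"

    def fetch(self, url, scraper, force=False):
        """Return cached content unless it is missing, stale, or force=True."""
        path = self._path_for(url)
        if not force and path.exists():
            age = time.time() - path.stat().st_mtime
            if age < self.max_age:
                return path.read_text(encoding="utf-8")
        content = scraper(url)  # scraper: any callable that returns the page text
        path.write_text(content, encoding="utf-8")
        return content
```

Under this sketch, a hypothetical switch such as `/web --force http://already-scraped.com/specific-page` would simply map to `fetch(url, scrape_fn, force=True)`, while plain `/web` calls reuse the cached copy until it goes stale.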