
Testing current full web crawling functionality #114

Open
BradKML opened this issue Apr 3, 2023 · 0 comments
BradKML commented Apr 3, 2023

I am currently testing whether PyWebCopy can download an entire website (subdomain) rather than just a single webpage. Unfortunately, it did not work as intended; save_webpage and save_website should behave differently, but here they do not appear to.

import os  # directory-setup snippet borrowed from https://stackoverflow.com/a/14125914
relative_path = r'book_test'
current_directory = os.getcwd()
final_directory = os.path.join(current_directory, relative_path)
os.makedirs(final_directory, exist_ok=True)  # create the output folder if needed

from pywebcopy import save_website

# Attempt to mirror the whole site, not just the single page at this URL.
save_website(url='https://www.nateliason.com/notes',
             project_folder=final_directory,
             project_name="test_site",
             bypass_robots=True, debug=True, open_in_browser=False,
             delay=None, threaded=False)

In the debug logs, none of the URL requests go beyond the main URL; the crawler never descends into the other pages linked from it. What could be the cause of this?
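As a way to rule out the target site itself, one can check independently whether same-domain links are even discoverable in the page's HTML. Below is a minimal sketch using only the standard library; the class name and sample HTML are illustrative, not part of PyWebCopy:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects absolute same-domain links from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative hrefs
        if urlparse(absolute).netloc == self.base_host:  # keep same-domain only
            self.links.add(absolute)

# Hypothetical page content, just to demonstrate the filtering.
html = '<a href="/notes/deep-work">Notes</a> <a href="https://other.com/x">Other</a>'
collector = LinkCollector("https://www.nateliason.com/notes")
collector.feed(html)
print(sorted(collector.links))
# → ['https://www.nateliason.com/notes/deep-work']
```

If links like these do appear when run against the real page's HTML, then the pages are discoverable and the problem lies in how PyWebCopy follows them rather than in the site's markup.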
