heckenmann/tor-scrapy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

tor-scrapy

webcrawler using a tor-proxy, elasticsearch and scrapy

What you need

  • docker
  • docker-compose
  • an internet connection :\

How to create and run

You can set the crawler's start URLs in docker-compose.yml under scrapy / urls. Multiple URLs are comma-separated.
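As a sketch, the relevant part of docker-compose.yml might look like the following. The service name scrapy and the urls key come from the description above; the exact nesting (an environment variable) and the example URLs are assumptions:

```yaml
# Sketch of the scrapy service in docker-compose.yml (layout assumed).
# "urls" holds the comma-separated list of start URLs for the crawler.
services:
  scrapy:
    environment:
      urls: "http://example.com,http://example.org"
```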

docker-compose up -d

The crawler starts its work automatically.

How to stop / start

docker-compose stop
docker-compose start

How to delete

docker-compose down

Usage

To find something in your index, you can use Kibana. Open this address in your browser:

http://localhost:5601

Set "crawler" as the index pattern and "timestamp" as the time field name.

Then click "Discover" in the left sidebar to see all crawled pages. To search for an entry, type your keywords into the Kibana search bar at the top. For an introduction to Kibana, see https://www.elastic.co/guide/en/kibana/current/introduction.html
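If you would rather query Elasticsearch directly instead of going through Kibana, you can POST a query body to the index's _search endpoint. This sketch assumes Elasticsearch publishes its default port 9200 and that the index is named "crawler" with a "timestamp" field, as described above. A minimal full-text query, newest results first:

```json
{
  "query": { "query_string": { "query": "example" } },
  "sort": [ { "timestamp": { "order": "desc" } } ]
}
```

For example, saved as query.json it could be sent with: curl -s -H 'Content-Type: application/json' -d @query.json http://localhost:9200/crawler/_search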
