eliasdabbas/seo_crawler
SEO Crawler

Bare-bones Basic SEO Crawler using Python Scrapy

This project has become part of the advertools package; check out the documentation page.

Using Scrapy, extract the main SEO elements of a website for exploratory analysis. It works by supplying a list of known URLs to crawl, and returns structured results.

The main elements include:

  • url: the actual URL crawled
  • slug: the path part of the URL
  • directories: the path split by slashes, giving the different folders (directories) in each path
  • title: the <title> tag
  • h1, h2, h3, h4: the heading tags
  • link_urls: not activated by default; needs special configuration to make sure you only get links to certain sites
  • description: the meta description
  • link_text: depends on the above; extracts the anchor text of each link
  • link_count: the number of links on the page (based on your criteria)
  • load_time: page load time in seconds
  • status_code: the HTTP response status code of the page (200, 301, 404, etc.)
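As a rough illustration of how the URL-derived fields (slug and directories) could be computed, here is a minimal standard-library sketch; it is an assumption about the splitting logic, not the project's actual code:

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Derive slug and directories from a URL (illustrative sketch)."""
    path = urlsplit(url).path                        # e.g. "/blog/2020/seo-tips"
    slug = path.rstrip("/").rsplit("/", 1)[-1]       # last path segment
    directories = [d for d in path.split("/") if d]  # non-empty path folders
    return {"slug": slug, "directories": directories}

parts = url_parts("https://example.com/blog/2020/seo-tips")
# parts["slug"] → "seo-tips"
# parts["directories"] → ["blog", "2020", "seo-tips"]
```

Splitting the path into directories makes it easy to later group or filter crawled pages by site section.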

Many other elements could be added to the list, but they differ from site to site. Some examples:

  • publishing date
  • product price
  • content category
  • tags of an article
  • whether or not a certain keyword is in a certain location
  • type of content (inferred from a URL directory, or from certain content on page)
  • etc.
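Site-specific fields like these could be added as a small post-processing step on each crawled page. A hypothetical sketch (the `enrich` function, the dict field names, and the inference rule are all assumptions for illustration):

```python
from urllib.parse import urlsplit

def enrich(page, keyword):
    """Add site-specific fields to a crawled page dict (hypothetical example)."""
    path_dirs = [d for d in urlsplit(page["url"]).path.split("/") if d]
    # Whether the keyword appears in the page title (case-insensitive)
    page["keyword_in_title"] = keyword.lower() in page.get("title", "").lower()
    # Infer the content type from the first URL directory, if any
    page["content_type"] = path_dirs[0] if path_dirs else None
    return page

enrich({"url": "https://example.com/blog/seo-tips",
        "title": "Ten SEO Tips"}, "seo")
# → keyword_in_title: True, content_type: "blog"
```

The same pattern extends to prices, publishing dates, or tags, typically by adding an XPath or CSS selector tailored to the site's markup.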
