This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Best practices for writing rules

Patrick G edited this page Apr 17, 2018 · 4 revisions

Don't base rules on the order of elements on the page

Rules should never be based on the order of elements on the page, unless you control that order (for example, on a file server where you set the ordering explicitly).

Here are some dos and don'ts for CSS selectors.

  • Do: table tr td:contains('Windows 64-bit') ~ td:contains('Download') a: This selects the link inside a cell containing the text Download that follows a cell containing Windows 64-bit in the same row.
  • Don't: table tr:nth-child(5) td:nth-child(3) a: This would silently return the wrong link if the table's rows or columns are reordered.
  • Do: .download-links a[href$='.exe']:contains('64-bit'): This selects the download link whose URL ends in .exe and whose text contains 64-bit.
  • Don't: .download-links a: This selects the first link inside an element with the class download-links, but what guarantees that it is the right one?
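To show why a position-based selector fails silently while a text-based one keeps working, here is a minimal Python sketch. It is not a real CSS engine: the table is a hypothetical list of rows (each cell a (text, href) pair, with example.com URLs as placeholders), and the two helper functions only mimic the nth-child and :contains / ~ selectors above.

```python
def by_position(rows, row_number, col_number):
    """Mimics `tr:nth-child(row) td:nth-child(col) a` (1-based, like CSS)."""
    return rows[row_number - 1][col_number - 1][1]


def by_text(rows, row_text, link_text):
    """Mimics `td:contains(row_text) ~ td:contains(link_text) a`:
    find a cell containing row_text, then a later cell in the same
    row containing link_text, and return that cell's link."""
    for row in rows:
        for i, (text, _) in enumerate(row):
            if row_text in text:
                for later_text, href in row[i + 1:]:
                    if link_text in later_text and href:
                        return href
    return None


# Hypothetical download URLs.
WIN32 = "https://example.com/myprogram-1.2-win32.exe"
WIN64 = "https://example.com/myprogram-1.2-win64.exe"

original = [
    [("Windows 32-bit", None), ("Download", WIN32)],
    [("Windows 64-bit", None), ("Download", WIN64)],
]
# The site later adds a macOS row at the top of the table.
reordered = [
    [("macOS", None), ("Download", "https://example.com/myprogram-1.2.dmg")],
] + original

print(by_position(original, 2, 2))   # the 64-bit URL: correct today
print(by_position(reordered, 2, 2))  # the 32-bit URL: silently wrong
print(by_text(reordered, "Windows 64-bit", "Download"))  # still the 64-bit URL
```

Nothing in the position-based version signals an error after the reorder; it simply returns a different, valid-looking link.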

Don't use regexps for download links unless you absolutely have to

Regexps for download links break easily when the page's markup changes. Better options are the HTML extractors or a Template extractor.
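As an illustration of that fragility, the sketch below compares a naive regexp with an HTML parser on the same link before and after a cosmetic markup change. It uses Python's standard html.parser as a stand-in for an HTML-aware extractor; it is not the HTML or Template extractor mentioned above, and the markup is hypothetical.

```python
import re
from html.parser import HTMLParser

# A naive regexp that assumes href is the first attribute and double-quoted.
NAIVE = re.compile(r'<a href="([^"]+\.exe)">')


class ExeLinkCollector(HTMLParser):
    """Collects href values of <a> tags ending in .exe,
    regardless of attribute order or quoting style."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.endswith(".exe"):
                self.links.append(href)


def parse_links(html):
    parser = ExeLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


before = '<a href="/files/myprogram-1.2.exe">Download</a>'
# The same kind of link after a redesign: extra class, single quotes.
after = "<a class='btn' href='/files/myprogram-1.3.exe'>Download</a>"

print(NAIVE.findall(before))  # ['/files/myprogram-1.2.exe']
print(NAIVE.findall(after))   # []: the regexp no longer matches
print(parse_links(after))     # ['/files/myprogram-1.3.exe']
```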

Don't be too specific with your selectors

You only need to be specific enough for the selector to return the correct link now and in the reasonable future.

  • Do: a[href*='myprogram-'][href$='.exe']: This will check the program name and the extension of the installer, but it won't break if extra info is added in between.
  • Do: a[href*='myprogram-'][href$='.exe']:contains('Download installer'): If more than one link matches the selector but you only want one, you can add a :contains selector.
  • Don't: a[href*='myprogram'][href$='.exe']: What if a download link for myprogramaddon is added on the same page?
  • Don't: a[href$='.exe']: What if another executable is added to the page?
  • Don't: a[href*='myprogram-'][href$='.exe']:contains('Click here to download myprogram.'): What if the text of the link changes?
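A rough way to check how specific a selector is: list the links on the page and see which ones it would match. The Python sketch below approximates the attribute selectors [href*=...] and [href$=...] with plain substring and suffix checks; the hrefs are hypothetical.

```python
# Hypothetical hrefs found on a download page.
LINKS = [
    "/dl/myprogram-2.0-setup.exe",
    "/dl/myprogramaddon-1.0.exe",
    "/dl/otherutility.exe",
]


def select(links, contains=None, ends_with=None):
    """Rough stand-in for a[href*=contains][href$=ends_with]."""
    return [
        href for href in links
        if (contains is None or contains in href)
        and (ends_with is None or href.endswith(ends_with))
    ]


# a[href$='.exe']: matches every executable on the page.
print(select(LINKS, ends_with=".exe"))
# a[href*='myprogram'][href$='.exe']: also picks up myprogramaddon.
print(select(LINKS, contains="myprogram", ends_with=".exe"))
# a[href*='myprogram-'][href$='.exe']: only the program itself,
# even with extra text ("-setup") between the name and the extension.
print(select(LINKS, contains="myprogram-", ends_with=".exe"))
```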

Watch out when using suffix checks in selectors for multiple architectures

An example explains this one best.

Suppose you have two selectors: a[href*='myprogram-'][href$='.exe'] for the 32-bit installer and a[href*='myprogram-'][href$='.64bit.exe'] for the 64-bit one. The first selector also matches the 64-bit link, because that URL ends in .exe too. If the order of the links changes, you may accidentally return the 64-bit version for the 32-bit link. In this case, change the first selector to a[href*='myprogram-'][href$='.exe']:not([href$='.64bit.exe']).
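The effect of the :not() exclusion can be sketched in Python. This assumes the rule returns the first matching link (as with .download-links a above) and approximates the selector with string checks; the hrefs are hypothetical.

```python
# Hypothetical links, first in the original order, then reordered by the site.
LINKS_V1 = ["/dl/myprogram-3.1.exe", "/dl/myprogram-3.1.64bit.exe"]
LINKS_V2 = list(reversed(LINKS_V1))


def first_match(links, contains, ends_with, not_ends_with=None):
    """First href matching a[href*=contains][href$=ends_with],
    optionally with :not([href$=not_ends_with]). Assumes the rule
    uses the first match, as discussed above."""
    for href in links:
        if contains in href and href.endswith(ends_with):
            if not_ends_with is None or not href.endswith(not_ends_with):
                return href
    return None


# Naive 32-bit selector: right only while the 32-bit link comes first.
print(first_match(LINKS_V1, "myprogram-", ".exe"))  # /dl/myprogram-3.1.exe
print(first_match(LINKS_V2, "myprogram-", ".exe"))  # /dl/myprogram-3.1.64bit.exe
# With the exclusion, the order of the links no longer matters.
print(first_match(LINKS_V2, "myprogram-", ".exe", ".64bit.exe"))  # /dl/myprogram-3.1.exe
```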