Best practices for writing rules
Never base a rule on the order of elements on the page, unless you control that order (for example, on a file server where you can set the ordering explicitly).
Here are some examples of dos and don'ts for CSS selectors.
- Do: table tr td:contains('Windows 64-bit') ~ td:contains('Download') a
  This selects the link in the cell containing the text Download, in the same row as a preceding cell containing the text Windows 64-bit.
- Don't: table tr:nth-child(5) td:nth-child(3) a
  This silently returns the wrong link if the table ordering changes.
- Do: .download-links a[href$='.exe']:contains('64-bit')
  This selects the download link whose URL ends in .exe and whose text contains 64-bit.
- Don't: .download-links a
  This selects the first link inside the element with class download-links, but what guarantees that this is the right one?
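To make the difference concrete, here is a small Python sketch that models the table as a list of rows and compares position-based lookup with content-based lookup. It is plain Python rather than a CSS engine, and the row data, URLs, and function names are all made up for illustration.

```python
# Hypothetical download table: each row is (platform label, link text, href).
ROWS = [
    ("Linux 64-bit", "Download", "https://example.com/app-linux64.tar.gz"),
    ("Windows 32-bit", "Download", "https://example.com/app-win32.exe"),
    ("Windows 64-bit", "Download", "https://example.com/app-win64.exe"),
]


def link_by_position(rows, index):
    """Loosely mimics tr:nth-child(n), using a 0-based index:
    returns whatever row happens to sit at that position."""
    return rows[index][2]


def link_by_content(rows, platform):
    """Loosely mimics td:contains(...): returns the row whose label has the text."""
    matches = [href for label, _text, href in rows if platform in label]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, got {len(matches)}")
    return matches[0]


# Simulate the site adding a new row at the top of the table.
reordered = [("macOS", "Download", "https://example.com/app-mac.dmg")] + ROWS

print(link_by_position(ROWS, 2))                     # the Windows 64-bit link
print(link_by_position(reordered, 2))                # now the Windows 32-bit link
print(link_by_content(reordered, "Windows 64-bit"))  # still the Windows 64-bit link
```

The position-based lookup keeps returning a valid link after the reorder, which is why the failure goes unnoticed. The content-based lookup either finds the intended row or raises an error.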
Regular expressions for download links break easily. Better options are the HTML extractors or a Template extractor.
You only need to be specific enough for the selector to return the correct link now and in the reasonable future.
- Do: a[href*='myprogram-'][href$='.exe']
  This checks the program name and the extension of the installer, but it won't break if extra info is added in between.
- Do: a[href*='myprogram-'][href$='.exe']:contains('Download installer')
  If the selector has more than one match but you only want one, you can add the :contains selector.
- Don't: a[href*='myprogram'][href$='.exe']
  What if a download link for myprogramaddon is added on the same page?
- Don't: a[href$='.exe']
  What if another executable is added to the page?
- Don't: a[href*='myprogram-'][href$='.exe']:contains('Click here to download myprogram.')
  What if the text of the link changes?
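The effect of each attribute test can be checked with a rough Python emulation, where `*=` becomes a substring test and `$=` becomes a suffix test. This is only a sketch of the matching logic, not how any particular selector engine is implemented; the hrefs and the helper name `select` are invented for the example.

```python
# Hypothetical hrefs found on a download page.
HREFS = [
    "/files/myprogram-2.1-setup.exe",
    "/files/myprogramaddon-1.0.exe",
    "/files/othertool.exe",
    "/files/myprogram-2.1.zip",
]


def select(hrefs, ends_with, contains=None):
    """Rough stand-in for a[href*='contains'][href$='ends_with'].

    When contains is None, only the suffix test is applied,
    as in a[href$='ends_with'].
    """
    return [
        h for h in hrefs
        if h.endswith(ends_with) and (contains is None or contains in h)
    ]


# Specific enough: only the installer for myprogram matches.
print(select(HREFS, ".exe", contains="myprogram-"))

# Too loose: the trailing hyphen is gone, so myprogramaddon matches as well.
print(select(HREFS, ".exe", contains="myprogram"))

# Far too loose: every executable on the page matches.
print(select(HREFS, ".exe"))
```

Only the first call returns a single link. The other two depend on page order to pick the right one.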
When two selectors can match the same link, make them mutually exclusive. An example explains this one best.
If you have two selectors (a[href*='myprogram-'][href$='.exe'] and a[href*='myprogram-'][href$='.64bit.exe']), the first one also matches the 64-bit link, because that link ends in .exe too. Remember that the order of links could change, so you may accidentally return the 64-bit version for the 32-bit link. In this case, change the first selector to a[href*='myprogram-'][href$='.exe']:not([href$='.64bit.exe']).
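The overlap is easy to see in a short Python sketch that emulates the two versions of the 32-bit selector with string tests. The hrefs and function names are hypothetical, and the string checks only approximate what a real selector engine does.

```python
# Hypothetical page where the 64-bit link happens to come first.
LINKS = [
    "/dl/myprogram-3.0.64bit.exe",
    "/dl/myprogram-3.0.exe",
]


def select_32bit_naive(hrefs):
    """Emulates a[href*='myprogram-'][href$='.exe']."""
    return [h for h in hrefs if "myprogram-" in h and h.endswith(".exe")]


def select_32bit_exclusive(hrefs):
    """Emulates a[href*='myprogram-'][href$='.exe']:not([href$='.64bit.exe'])."""
    return [
        h for h in hrefs
        if "myprogram-" in h
        and h.endswith(".exe")
        and not h.endswith(".64bit.exe")
    ]


# The naive selector matches both links; taking the first gives the 64-bit build.
print(select_32bit_naive(LINKS)[0])

# The exclusive selector matches only the 32-bit build, whatever the order.
print(select_32bit_exclusive(LINKS))
print(select_32bit_exclusive(list(reversed(LINKS))))
```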