
Adguard clarification #169

Open
ameshkov opened this issue May 20, 2019 · 10 comments

Comments

@ameshkov

ameshkov commented May 20, 2019

Hi!

I've just found out that Adguard is listed as a tracker on whotracksme: https://whotracks.me/trackers/adguard.html

This is not quite true, but I can see where it comes from. Please let me clarify the situation.

  1. AdGuard for Windows/Mac is a network-level content blocker, so it cannot simply add custom JS/CSS to web pages the way browser extensions do.
  2. To do that, it injects a content script, <script src="https://local.adguard.com/blahblah/content-script.js">, which takes care of cosmetic rules.
  3. Connections to local.adguard.com are intercepted by the network driver and processed locally. In newer versions, we have also changed the domain to local.adguard.org.
  4. This is a common approach for network-level software. For instance, Kaspersky is listed as a tracker for the very same reason: it adds a content script to every page.
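Step 2 can be illustrated with a minimal sketch of how a network-level filter might splice a content-script tag into an intercepted HTML response. This is a hypothetical illustration, not AdGuard's code: the function name, the placeholder script path, and the choice to insert the tag just before `</head>` are all assumptions.

```python
import re

# Hypothetical placeholder URL; the real script path is not given in this thread.
CONTENT_SCRIPT_URL = "https://local.adguard.org/PLACEHOLDER/content-script.js"

# Matches a closing </head> tag, case-insensitively, allowing whitespace before '>'.
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_content_script(html: str, script_url: str = CONTENT_SCRIPT_URL) -> str:
    """Return html with a <script src=...> tag inserted just before </head>.

    If the document has no </head>, the tag is prepended instead.
    """
    tag = f'<script src="{script_url}"></script>'
    match = _HEAD_CLOSE.search(html)
    if match is None:
        return tag + html
    return html[: match.start()] + tag + html[match.start():]


if __name__ == "__main__":
    page = "<html><head><title>Example</title></head><body>Hi</body></html>"
    print(inject_content_script(page))
```

Once the tag is in the page, the browser fetches the script like any other third-party resource, which is why it shows up in tools that record third-party requests.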

What's important here:

  1. There are no remote connections; everything is processed locally.
  2. There is no tracking, fingerprinting, or anything of the sort.
@sammacbeth
Contributor

Hi, thanks for the clarification; this seems like a reasonable explanation. We do detect all third-party calls from the page, so these injected ones will also appear in the data. The fingerprinting signal could be triggered if you are injecting a persistent identifier with your script, which our system is detecting as unique to individual users.

I think a simple fix for these cases will be to check the IP third-party requests are served on, and exclude ones served from localhost addresses.
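A minimal sketch of that proposed filter, assuming each recorded request carries the IP address it was served from (the `server_ip` field and both function names are hypothetical). Python's standard `ipaddress` module already knows which addresses are loopback (`127.0.0.0/8` and `::1`).

```python
import ipaddress
from typing import Iterable


def is_served_locally(server_ip: str) -> bool:
    """True if the request was answered by a loopback address (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(server_ip).is_loopback


def drop_local_requests(requests: Iterable[dict]) -> list[dict]:
    """Keep only third-party requests that were not served from a loopback address.

    Each request is assumed to be a dict with a 'server_ip' key (hypothetical schema).
    """
    return [r for r in requests if not is_served_locally(r["server_ip"])]


if __name__ == "__main__":
    sample = [
        {"host": "local.example", "server_ip": "127.0.0.1"},
        {"host": "cdn.example", "server_ip": "198.51.100.7"},
    ]
    print(drop_local_requests(sample))
```

The check only sees the address the browser connected to, so any redirection to localhost that happens below the browser would still look like a public IP to it.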

@ameshkov
Author

> The fingerprinting signal could be triggered if you are injecting a persistent identifier with your script, which our system is detecting as unique to individual users.

Yeah, and I guess "Cookies" is triggered by using a subdomain of adguard.com (cookies set by the adguard.com website are sent along with the request). That's why we changed it to local.adguard.org in the newer versions.
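The cookie effect follows from how browsers scope cookies. A simplified sketch of the domain-matching rule from RFC 6265 (section 5.1.3) shows why a cookie set with `Domain=adguard.com` is also sent to `local.adguard.com` but not to `local.adguard.org`. It deliberately ignores host-only cookies, public-suffix checks, and the `Path`/`Secure` attributes; `domain_matches` is a hypothetical helper name.

```python
import ipaddress


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match for a cookie set with a Domain attribute.

    A leading dot in the Domain attribute is ignored, as browsers do.
    """
    host = request_host.lower()
    domain = cookie_domain.lower().removeprefix(".")
    if host == domain:
        return True
    # IP addresses never match by suffix, only exactly.
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass
    # Otherwise the host must end with "." + domain, e.g. local.adguard.com vs adguard.com.
    return host.endswith("." + domain)


if __name__ == "__main__":
    for host in ("adguard.com", "local.adguard.com", "local.adguard.org"):
        print(host, domain_matches(host, "adguard.com"))
```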

> I think a simple fix for these cases will be to check the IP third-party requests are served on, and exclude ones served from localhost addresses.

Well, it's trickier than that. Localhost cannot be used because it won't work for HTTPS websites.

There are two real IP addresses (194.177.23.34 for local.adguard.com and 176.103.133.77 for local.adguard.org). Network driver intercepts connections to them internally, but they look like valid connections to browsers and other user-mode software.

Something like this:

socket.connect('176.103.133.77')
---> internally, it is changed to socket.connect('127.0.0.1:somerandomport')
---> socket.send/receive are now communicating with localhost, but the browser thinks it communicates with `176.103.133.77`
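The redirect described above can be modelled as a lookup from the address the application asked for to the endpoint it is actually connected to. This is a user-space toy model under stated assumptions (the `route` function, the `Endpoint` type, and the local port are hypothetical); the real interception happens inside the network driver, which this sketch does not attempt to reproduce.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int


# The two addresses mentioned in this thread as being intercepted.
INTERCEPTED_IPS = frozenset({"194.177.23.34", "176.103.133.77"})


def route(requested: Endpoint, local_port: int) -> tuple[Endpoint, Endpoint]:
    """Return (actual_endpoint, apparent_endpoint) for a connection attempt.

    Intercepted addresses are silently served by a local listener, while the
    application keeps seeing the address it originally requested.
    """
    if requested.host in INTERCEPTED_IPS:
        return Endpoint("127.0.0.1", local_port), requested
    return requested, requested


if __name__ == "__main__":
    actual, apparent = route(Endpoint("176.103.133.77", 443), local_port=50123)
    print("actual:", actual)
    print("browser sees:", apparent)
```

From the browser's point of view only the apparent endpoint exists, which is why a loopback check on the browser side cannot distinguish these requests from genuinely remote ones.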

@sammacbeth
Contributor

> There are two real IP addresses (194.177.23.34 for local.adguard.com and 176.103.133.77 for local.adguard.org). Network driver intercepts connections to them internally, but they look like valid connections to browsers and other user-mode software.

In that case, it is not possible, from where we measure, to verify that this request is not sent externally. I suggest we leave this issue open for reference but keep the tracker listing on the site as-is.

@ameshkov
Author

ameshkov commented May 20, 2019

Well, it still bothers me that AG is categorized as a tracker while we do exactly the opposite. I suppose this is a mistake that can be corrected.

If your only concern is verification, you can easily verify what I'm saying yourself.

How to verify it yourself:

  1. Install AdGuard for Mac (or Windows, for that matter);
  2. Run tcpdump: `tcpdump -v -i en0 'host 176.103.133.77 or host 194.177.23.34'`;
  3. Disable AdGuard;
  4. Open local.adguard.com (if you installed the current stable release, AG v1.5) or local.adguard.org (for AG v2.0);
  5. You'll see a lot of stuff in the tcpdump output (as AG is not running);
  6. Re-enable AG, restart the browser (just in case), and restart tcpdump;
  7. When you open local.adguard.com (see step 4), you'll see nothing in the tcpdump output;
  8. If you open example.org, you'll see the content script being loaded from local.adguard.com, but there will still be nothing in the tcpdump output.
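If you want to script step 2, a small helper can build the capture command. This is a convenience sketch, assuming `tcpdump` is installed and that `en0` is the right interface, as in the command above (adjust it for your machine); `tcpdump_command` is a hypothetical helper. Options are placed before the filter, and the filter is passed as a single argument.

```python
import shlex

# Addresses quoted in this thread for local.adguard.org and local.adguard.com.
ADGUARD_IPS = ("176.103.133.77", "194.177.23.34")


def tcpdump_command(interface: str = "en0", hosts: tuple[str, ...] = ADGUARD_IPS) -> list[str]:
    """Build a tcpdump argv that prints (-v) any packet to or from one of `hosts`."""
    filter_expr = " or ".join(f"host {h}" for h in hosts)
    # Options come first; the whole filter expression is passed as one argument.
    return ["tcpdump", "-v", "-i", interface, filter_expr]


if __name__ == "__main__":
    cmd = tcpdump_command()
    # Print a shell-ready version; capturing packets usually requires root (sudo).
    print(shlex.join(cmd))
    # To run it from Python instead: subprocess.run(["sudo", *cmd], check=True)
```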

If this is not enough, we can arrange a demonstration (Skype/Zoom/whatever) where I'll answer any questions you might have and even show parts of the code responsible for this logic, so that you have no doubts.

@sammacbeth
Contributor

I was able to verify that the current version of the AdGuard application behaves as you describe; however, I am still unsure that removing the entry is the correct response:

WhoTracks.Me is a transparency tool - it shows which entities are tracking them, and who have the potential to track them. One example of this distinction is Google Fonts. Google Fonts does not track the users who load fonts from it; however, we still display it on the WhoTracks.Me website. Why? It has significant reach, and because of this it could switch to tracking users overnight simply by starting to set cookies on font requests.

Because AdGuard injects itself as a man-in-the-middle on all loaded pages, WhoTracks.Me also considers AdGuard as having the potential to track. As local.adguard.com points at a server you control, an update to your client could also enable the requests to go out. This is not to suggest that this is something you would do, but there is precedent for acquisitions of services redefining privacy policies in order to extract tracking data out of existing user-bases who trusted the previous company.

Perhaps I am missing some technical issues, but I believe you could point this local hostname at 127.0.0.1. As you are intercepting all traffic, the same interception method should work as for external IPs, but then at the extension level we could detect that this is internal. This would allow us to filter it as safe traffic, and once users have updated to this version, your entry would no longer be shown on the site.

@ameshkov
Author

ameshkov commented May 27, 2019

Hi Sam,

I have nothing against listing AdGuard on WhoTracks.me. There is nothing wrong with showing people that local.adguard belongs to us (even if it pointed to localhost). In fact, when I use Ghostery, I want to see as much information as possible about every third-party domain. However, I think the following statements are mistakes that can be corrected:

> considers AdGuard as having the potential to track
>
> This is not to suggest that this is something you would do, but there is precedent for acquisitions of services redefining privacy policies in order to extract tracking data out of existing user-bases who trusted the previous company.

Just for the sake of argument, this is also true for any browser extension, including Ghostery :)

> Perhaps I am missing some technical issues, but I believe you could point this local hostname at 127.0.0.1.

It's been years since we opted for this approach. At that time, there were some browser bugs that prevented us from simply using localhost. We'll re-evaluate this option, but in any case, there are some use cases in which using a domain name is necessary. For instance, you can run AdGuard for Mac in proxy mode and configure your home devices to use it as a proxy server (which lets you manage content blocking centrally).

Edit: just pointing the domain name at localhost won't work, since browsers don't like domains that point to localhost, so we'll need to use 127.0.0.1 directly.

@sammacbeth
Contributor

Regarding the statements:

  • Cookies: this should disappear once enough users' browsers stop sending a cookie to the local.adguard domain. You mentioned that you are migrating users to a cookie-less domain, so that should fix it.
  • Fingerprint: I'll have to look at why our unique-identifier detection mechanism is triggering on your requests. In the data it is barely over the threshold for this signal to be marked as 'fingerprinting', so it may just have been a temporary false positive. We can see in next month's release whether this has changed.
  • Category description: this can be reworded to make it more neutral.
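The "barely over the threshold" remark can be illustrated with a toy version of a k-anonymity-style unique-identifier check. This is a hypothetical sketch, not WhoTracks.Me's actual pipeline: the observation format, the rule that a value seen from fewer than `k` distinct users counts as a likely unique identifier, and the `threshold` parameter are all assumptions.

```python
from collections import defaultdict
from typing import Iterable

# One observation: (user_id, tracker, value seen in a request to that tracker).
Observation = tuple[str, str, str]


def unsafe_values(observations: Iterable[Observation], k: int) -> set[tuple[str, str]]:
    """(tracker, value) pairs seen from fewer than k distinct users: likely unique IDs."""
    users = defaultdict(set)
    for user, tracker, value in observations:
        users[(tracker, value)].add(user)
    return {key for key, seen_by in users.items() if len(seen_by) < k}


def flagged_trackers(observations: Iterable[Observation], k: int, threshold: float) -> dict[str, float]:
    """Trackers whose share of requests carrying an unsafe value exceeds threshold."""
    obs = list(observations)  # iterated twice below
    unsafe = unsafe_values(obs, k)
    total = defaultdict(int)
    with_uid = defaultdict(int)
    for _user, tracker, value in obs:
        total[tracker] += 1
        if (tracker, value) in unsafe:
            with_uid[tracker] += 1
    rates = {t: with_uid[t] / total[t] for t in total}
    return {t: r for t, r in rates.items() if r > threshold}


if __name__ == "__main__":
    sample = [
        ("u1", "a", "shared"), ("u2", "a", "shared"), ("u3", "a", "shared"),
        ("u1", "b", "id-1"), ("u2", "b", "id-2"), ("u3", "b", "common"), ("u4", "b", "common"),
    ]
    print(flagged_trackers(sample, k=2, threshold=0.4))
```

In this toy data, tracker `b` sends a per-user value in half of its requests, so it is flagged at `threshold=0.4` but not at `threshold=0.5`: a small shift in either the data or the threshold flips the signal, which is the kind of borderline case described above.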

@ameshkov
Author

Sam, this would be great, thank you!

Also, please don't get me wrong: I am not pushing for any immediate changes. I often use WhoTracks.me myself to find out which company is behind this or that tracker, and you're doing a fantastic job. But I believe it is in your best interest to have accurate data.

@Voltairine-de-Cleyre

> WhoTracks.Me is a transparency tool - it shows which entities are tracking them, and who have the potential to track them.

"...and [which entities] have the potential to track them"

That's a fairly vague definition that essentially includes every entity.

@GhostDog98

I feel like the inclusion of "any potential tracking tool" reduces the efficacy of such a database to near zero, especially if these false positives are not clearly marked or are easily conflated.
If your antivirus started deleting programs because they have the "potential" to download malware, since they contact third-party APIs, that would be far from acceptable. WhoTracks, however, lists any entity that is judged to have the "potential" to track, which encompasses literally every entity, as @Voltairine-de-Cleyre pointed out.
This makes it even harder to recommend WhoTracks to my colleagues or friends who are very much non-techy and may see that something like "Google Fonts" is listed as a tracker, then try to block it.
Beyond that, claims like the one in the attached screenshot (image not preserved) are kinda disingenuous when 21-22% of it comes from Google Fonts, Google Tag Manager, or the Google static CDN.

While I don't think manual review of every single record in your dataset is reasonable to expect, when a false positive like AdGuard is identified (which decreases this "potential" for tracking), it should probably be removed...


4 participants