Mark device as a fraudster - eg DeviceCheck #14

Open
Guar1s opened this issue Sep 21, 2022 · 9 comments

Comments

@Guar1s

Guar1s commented Sep 21, 2022

We are Allowme, a business unit of Tempest Security Intelligence, a cybersecurity company from Brazil, Latam, with more than 22 years in operation. Allowme's mission is to help companies protect the digital identities of their legitimate customers through a complete fraud prevention platform.

Context and threat
In web application fraud, financial gain depends almost entirely on the attacker's ability to automate tactics. Re-identifying an attacker is therefore essential to take away that scale. For this to work, a device flagged as attacking in one context should be easy to identify during a later attack, whether it targets something new or the same previous target.

Proposal
Some mechanism for persistently storing a small amount of tamper-resistant information (4 bits) on the device is essential for more effective and efficient defenses and controls, while preserving the user experience for legitimate users.

This signal would not be the only one used to identify fraud, but a fraud-prevention application may find it relevant when it is set to true on a specific device.

This functionality can be implemented in both Web browsers (eg Chrome) and mobile operating systems (eg Android).

A similar implementation would be Apple's DeviceCheck, available at: https://developer.apple.com/documentation/devicecheck

Relevant signals
Secure and persistent storage area on the device
Browser lifetime

Privacy implications and safeguards
Since this information does not reveal any user PII and is only relevant to fraud detection and containment systems, there is no threat to user privacy.

@supanate7

Thanks for proposing this!

A couple of questions for you:

  1. How would False Positives be handled? Would FP users not be able to access certain services/sites/apps? How might an FP user remove their fraud label(s)?
  2. What about legitimate users on compromised devices that have a mix of human-driven non-fraud and malware-driven fraud behavior?
  3. Definitions of fraud and/or trust vary from service to service, so how would the threshold for applying a "fraud" label be determined and potentially updated post-launch?

@bmayd

bmayd commented Sep 27, 2022

@Guar1s
Being able to store a small amount of data to "label" things is a very interesting notion and worth pursuing, but as @supanate7 suggests, I don't get a good sense of what the proposal is intending to label or how and by whom labels get applied in a way that is trustworthy and meaningful.

Is the suggestion that labels get applied to devices or to applications? If devices are labeled, who labels them and, with only 4 bits, what kind of signal would be meaningful across device contexts (activity that one application considers inappropriate, another may accept)? If applications get labeled, who decides what data gets set and what it means? In the case of something general purpose like a browser, would a single label be applied to all origins, or would origins set their own labels or ask a mediator to set a label? Again, I'm not sure how we'd make labeling trustworthy and would appreciate more on that. It looks like the Apple implementation you reference has some significant dependencies on Apple infrastructure and tooling and relies on their ability to control the on-device environment, which I think would be difficult to generalize to other OS/app contexts.

@supanate7
Regarding managing label lifecycles and false positives: something I've implemented that sounds similar to this proposal was the labeling of websites that showed signs of supporting ad fraud. We put sites that behaved suspiciously into a "penalty box" by applying a "do not buy" flag with an associated time-to-live (TTL). After the TTL expired we'd remove the flag and allow buying again. If a site was flagged again within a given period we would increase the TTL and eventually ban it or stop receiving requests from it.
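
The penalty-box lifecycle described above could be sketched roughly as follows. This is only an illustration, not the system that was actually deployed: the class name `PenaltyBox`, the one-day base TTL, the doubling rule, the 30-day repeat window, and the four-strike ban are all assumed values chosen for the example.

```python
from dataclasses import dataclass, field

# All three constants are assumptions made for this sketch.
BASE_TTL = 24 * 3600             # first penalty lasts one day (seconds)
REPEAT_WINDOW = 30 * 24 * 3600   # a new flag within 30 days counts as a repeat
MAX_STRIKES = 4                  # fourth strike inside the window is a ban


@dataclass
class PenaltyBox:
    # site -> (strike count, flag expiry time, time of the most recent flag)
    entries: dict = field(default_factory=dict)

    def flag(self, site: str, now: float) -> None:
        """Apply or escalate the "do not buy" flag for a site."""
        strikes, _, last = self.entries.get(site, (0, 0.0, None))
        if last is not None and now - last > REPEAT_WINDOW:
            strikes = 0  # the earlier offence has aged out
        strikes += 1
        if strikes >= MAX_STRIKES:
            ttl = float("inf")                     # permanent ban
        else:
            ttl = BASE_TTL * 2 ** (strikes - 1)    # TTL doubles per repeat
        self.entries[site] = (strikes, now + ttl, now)

    def is_blocked(self, site: str, now: float) -> bool:
        """True while the flag's TTL has not yet expired."""
        entry = self.entries.get(site)
        return entry is not None and now < entry[1]
```

A false positive recovers automatically here once its TTL runs out, which is the property that makes the approach relevant to the false-positive question above.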

@Guar1s
Author

Guar1s commented Oct 4, 2022

@supanate7 and @bmayd this implementation has as its fundamental idea the orchestration of 4-bit checkers by the company responsible for the website or mobile application.

The purpose of using 4 bits is not limited to fraud marking, but it can also identify a device that has already passed through security mechanisms, increasing the reliability of the transaction.

Mobile operating systems: associated with the device identifier, known only to the OS manufacturer (e.g. Android), and with the app ID in the store.

Web browsers: associated with a unique browser identifier, known only to the manufacturer (e.g. Chrome), and with the website domain.

[Image attachment: Untitled (8)]

It is very important that the values used in the 4 bits are recorded permanently and that they can only be changed by the company responsible for the website or application.

All the intelligence for manipulating the bits to identify suspicious devices, or devices already used in fraud, is the responsibility of the companies; each can build its own intelligence using these and other signals, which may or may not be linked to the website or application.
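
As a rough illustration of how a company might interpret its own 4 bits, here is a minimal sketch. The proposal does not define what any bit means, so the flag names and bit assignments in `DeviceBits` are purely hypothetical, as are the helper functions.

```python
from enum import IntFlag


class DeviceBits(IntFlag):
    # Hypothetical meanings; each company would choose its own.
    FRAUD_CONFIRMED = 0b0001
    FRAUD_SUSPECTED = 0b0010
    PASSED_SECURITY_CHECK = 0b0100
    RESERVED = 0b1000


MASK = 0b1111  # only 4 bits of state exist per (device, site or app) pair


def set_flag(state: int, flag: DeviceBits) -> int:
    """Return the 4-bit state with the given flag turned on."""
    return (state | int(flag)) & MASK


def clear_flag(state: int, flag: DeviceBits) -> int:
    """Return the 4-bit state with the given flag turned off."""
    return state & ~int(flag) & MASK


def has_flag(state: int, flag: DeviceBits) -> bool:
    """Check whether the given flag is set in the 4-bit state."""
    return bool(state & int(flag))
```

For example, a service could set `PASSED_SECURITY_CHECK` after a successful step-up verification and later clear `FRAUD_SUSPECTED` once a review finds a false positive.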

@philippp
Contributor

philippp commented Oct 4, 2022

The quality questions may be secondary if services can set and read their own bits. Services could encode counters, isolate an affected population when debugging technical issues, and more.

It's a powerful concept, but if services can "shard" into N namespaces (e.g. A top level origin iFrames N subdomains, each subdomain sets its own 4 bits, and reports it to the embedding top-level origin), the service can use 4*N bits of storage (and build a strong identifier).
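
To make the sharding concern concrete, the sketch below splits one identifier across N origins that each hold 4 bits, then reassembles it from the values they report back. No such storage API exists today; the two function names and the idea of one nibble per subdomain are assumptions used only to show the arithmetic.

```python
BITS_PER_ORIGIN = 4  # each origin is assumed to control exactly 4 bits


def split_into_nibbles(identifier: int, n_origins: int) -> list[int]:
    """Split an identifier into one 4-bit value per origin (lowest bits first)."""
    if identifier < 0 or identifier >= 1 << (BITS_PER_ORIGIN * n_origins):
        raise ValueError("identifier does not fit in 4 * n_origins bits")
    return [(identifier >> (BITS_PER_ORIGIN * i)) & 0xF for i in range(n_origins)]


def join_nibbles(nibbles: list[int]) -> int:
    """Rebuild the identifier from the 4-bit values reported by each origin."""
    identifier = 0
    for i, nibble in enumerate(nibbles):
        identifier |= (nibble & 0xF) << (BITS_PER_ORIGIN * i)
    return identifier
```

With four embedded subdomains, the embedder can round-trip any 16-bit value: `join_nibbles(split_into_nibbles(0xBEEF, 4))` gives back `0xBEEF`.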

@Guar1s
Author

Guar1s commented Oct 5, 2022

@philippp I understand your concern, but I think it is unlikely that unique identifiers could be generated with only 4 bits, even using subdomains or another technique. About 5 billion users are currently connected to the internet. Since 4 bits have 16 unique combinations (2^4), mapping just 1% of all users would require dozens of subdomains.

However, this risk can be mitigated with a security-oriented deployment: anti-bot mechanisms, time-based throttling of token requests and validation, among other possibilities.

I made an initial draft as a suggestion for implementation and use, shown below:

[Image attachment: Untitled (15)]

@philippp
Contributor

"It is very important that the values used in the 4 bits are recorded permanently and that they can only be changed by the company responsible for the website or application."

My understanding from this is that every website has its own 4 bits that it can write and read on a client. If this is the case, fingerprinter.com could embed N iframed websites into its pages. Each of the embedded websites would be able to set and read 4 bits -- for a total of N*4 bits -- and report their 4 bits to fingerprinter.com (their embedder).
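
For a sense of scale, here is a back-of-the-envelope sketch of how many 4-bit origins it takes to give every member of a population a distinct value. The helper `origins_needed` is hypothetical, and the 5 billion figure is the user count mentioned earlier in this thread.

```python
import math


def origins_needed(population: int, bits_per_origin: int = 4) -> int:
    """Smallest N such that 2 ** (bits_per_origin * N) >= population."""
    # (population - 1).bit_length() is ceil(log2(population)) for population >= 2.
    bits = max(1, (population - 1).bit_length())
    return math.ceil(bits / bits_per_origin)


INTERNET_USERS = 5_000_000_000  # figure quoted in this thread

one_percent = origins_needed(INTERNET_USERS // 100)  # 26 bits -> 7 origins
everyone = origins_needed(INTERNET_USERS)            # 33 bits -> 9 origins
```

Under these assumptions, 7 iframes cover 1% of users and 9 cover all of them, because each additional origin multiplies the number of distinct values by 16.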

@bmayd

bmayd commented Oct 14, 2022

I too think we want to be extremely careful when considering any sort of durable, independently managed, highly reliable signal that can be written to, and retrieved from, a client. It is important to keep in mind the data-rich context these signals exist in, where a small number of reliable values can be leveraged for all sorts of purposes. The example @philippp provides of using multiple iFrames clearly describes a potential for abuse, but there are many other values like IP address, time zone, screen resolution, etc., which could be used instead of, or in addition to, multiple iFrames. Having a highly reliable signal also can be leveraged to make relatively unreliable signals much more robust.

The use-case you are pursuing is the arbitrary labeling of devices within a context: website, app, etc. for a range of purposes as this earlier point suggests:

The purpose of using 4 bits is not limited to fraud marking, but it can also identify a device that has already passed through security mechanisms, increasing the reliability of the transaction.

While the intent is for the mechanism to be used for risk assessment, it provides a general purpose means of reliably identifying sub-populations within a context, and that capability can be used to support a broad range of use-cases, not all of which are appropriate. I think any capability like this needs to be accompanied by carefully considered guards and tight control over how it can be applied to prevent or minimize abuse.

@dvorak42
Member

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when representing the proposal, please attach it to this issue; otherwise the chairs will give a brief overview based on the initial post.

@dvorak42
Member

From the CG meeting, discussion about the entropy issues came up, about how if many sites are identifying fraudsters, it just becomes an additional fingerprint mechanism. There's also the concern that if this requires durable long-term marking of clients, this has worse identifiability properties than existing web storage mechanisms which can be cleared pretty easily to disconnect browsing sessions. There were some comments about whether it would be possible to combine this with the Private State Token (fka Trust Token) mechanisms to limit the amount of entropy being used and having sites use the same limited number of bits rather than having a bunch of parallel fraudster marking bits to alleviate some of the entropy concerns.

5 participants