Re: [dev] [surf] [patch] 13 patches from my Universal Same-Origin Policy branch

From: <tautolog_AT_gmail.com>
Date: Wed, 25 Mar 2015 19:38:23 -0700

Thank you for the reply, Markus. 

That is a very good point. The reason I wanted to try this approach is that, even within a very large anonymity set at the HTTP-header level, the IP address's network or region can be used to split large sets down to individuals. I saw this in a paper, but I don't have the reference offhand. So I thought: what about adding noise?

The nice thing about adding noise is that, no matter how much signal is picked up, any noise that is not filtered out remains anonymizing. Yes, the noise patterns may themselves become signal, but other noise can drown that signal out, too. Also, the filtering process itself can drop a lot of signal.

It is nearly impossible to hide from active, targeted, sophisticated surveillance, but "full-take" passive collection can be significantly hindered by small amounts of sophistication that break naive assumptions.

Your suggestion is very good, and I am trying to build something like that, but with little effect on compatibility. Maybe collect the set of valid headers with large anonymity sets, and select a subset of headers that match the real configuration in only the most important features. That way, only obscure compatibility tests will fail. And have an option to send the real user-agent string when an issue occurs. After all, if you only use the real one rarely, how can it be profiled? I suppose you could trick people into turning that on, but that is a fairly targeted action, not a full-take action, which is the primary concern.

I can collect a set of common user-agent strings, find the subset that are WebKit, and use those. Since compatibility tests are usually about the rendering engine, that would avoid most compatibility issues with a random user-agent. Maybe provide the set of common user-agent strings, grouped by rendering engine, as a separate open source project. I deal with enough traffic to collect this myself.
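To make that concrete, a rough sketch of what I have in mind (the strings below are placeholders, not a measured most-common list):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* pool of common WebKit user-agent strings; placeholders only */
static const char *uapool[] = {
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) "
	"AppleWebKit/600.3.18 (KHTML, like Gecko) "
	"Version/8.0.3 Safari/600.3.18",
	"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
	"(KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36",
	"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
	"(KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36",
};

/* pick once per session so behaviour stays stable while running */
const char *
pickua(void)
{
	static const char *ua;

	if (!ua)
		ua = uapool[rand() % (sizeof(uapool) / sizeof(uapool[0]))];
	return ua;
}

int
main(void)
{
	srand(time(NULL));
	printf("%s\n", pickua());
	return 0;
}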

Websites rarely need to know whether you are running Linux, but if you are going to download software, you can enable the correct OS to be sent. Besides, the correct API would be for websites to request that the browser identify the OS, just as they request device location, with the user accepting the request explicitly. The need is rare enough that not every website should know the operating system. Browsers already do this for credit card data (not that I would use it); they could have a form fill for the operating system, too: when an input is named "operating_system", prompt the user to fill it.

The "noise" I add to the accept-language header is easily identified as a new signal, so I am leaning toward abandoning it, but there are some interesting opportunities there. For example, it can be used when active surveillance is not an issue, but passive surveillance is an issue, to add friction to the passive surveillance machine. 

Ben
  Original Message  
From: Markus Teich
Sent: Wednesday, March 25, 2015 6:14 PM
To: dev_AT_suckless.org
Reply To: dev mail list
Subject: Re: [dev] [surf] [patch] 13 patches from my Universal Same-Origin Policy branch

Nick wrote:
> - [PATCH 07/13] add random entropy to user-agent and accept-language headers.
>
> I definitely like the idea, but wonder whether the solution in the patch is a
> bit overkill. After all, if we're basically just trying to defeat hashing
> correlations, then one random byte at the end of each variable should be
> enough. Also, unless I'm misreading it, am I correct in thinking the
> user-agent string is fully random? I'm currently using one from an oldish
> firefox, to reduce fingerprintability a bit, and I get annoying warnings on
> github and a few other places as a result - isn't it better to use a
> common-ish UA string with some random crap on the end, so most stupid websites
> won't do something annoying?

Heyho,

randomizing these headers at all rapidly shrinks the anonymity set size. Sure,
for a dumb adversary every request seems to come from another user, but a smart
adversary won't take long to detect these changes, filter them out and end up
with a nice list of all surf users (and of any browsers using the same pattern,
which would probably not be many). When setting the headers to a very common
value (unfortunately I did not find _the_ most common UA and accept-language
header values), users are guaranteed to be part of a very large anonymity set.
If you really want to randomize the headers, pick a pool of the most common
values and choose one of them at random. This can however lead to different
behaviour when visiting a website twice.
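If you went that route anyway, deriving the pick from the host name would at
least keep the value stable per site. A rough sketch (the pool entries are
placeholders for whatever the most common values turn out to be):

#include <stdio.h>

static const char *pool[] = {
	"common-value-1", /* placeholder for a most common UA string */
	"common-value-2",
	"common-value-3",
};

const char *
pickforhost(const char *host)
{
	unsigned long h = 5381;

	/* djb2 string hash over the host name */
	while (*host)
		h = h * 33 + (unsigned char)*host++;
	return pool[h % (sizeof(pool) / sizeof(pool[0]))];
}

int
main(void)
{
	printf("%s\n", pickforhost("example.org"));
	return 0;
}

The downside is that every surf user then sends the same value for a given
site, so the host-to-value mapping itself becomes a recognizable pattern.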

I strongly advise against the randomization. It is also simpler in code not to
use it.

--Markus
Received on Thu Mar 26 2015 - 03:38:23 CET
