Re: [dev] surf vertical and horizontal same-origin policy patch (updated, with profiling mitigation)

From: Ben Woolley <tautolog_AT_gmail.com>
Date: Fri, 23 Jan 2015 22:01:32 -0800

Hi all,

I have attached an update.

1. It is against the latest master.
2. It includes an originprompt.html and an originprompt-nojs.html; the
latter works properly when JavaScript is disabled.
3. The Web Storage database has been moved into the per-origin folder,
even though it is probably already compliant with the same-origin
policy. This just makes certain, in case that changes. The spec allows
the same-origin policy to be broken here, and if cookies get blocked
due to industry pressure, then I want protections in place to prevent
this feature from taking the place of trojan cookies.
4. I added a randomized User-Agent if it is NULL in the config file.
WebKit normally returns a default when the user-agent property is NULL
or "".
5. I added an Accept-Language header that forwards the locale in
$LANG, and adds some additional random locales at a lower quality to
throw off naive profiling.
6. I left in the download directory config.
7. I fixed one case where the originprompt was being used even when
the navigation was explicit.
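
To illustrate item 5, here is a minimal sketch in Python (not the C code
from the patch) of an Accept-Language value that leads with the locale from
$LANG and appends a few random locales at lower quality. The decoy pool,
the q-value range, the "en" fallback and the function names are all
assumptions made for the example.

```python
import os
import random

# Hypothetical pool of decoy locales; the list used by the patch is not shown here.
DECOY_LOCALES = ["de-DE", "fr-FR", "es-ES", "ja-JP", "pt-BR", "it-IT"]


def language_tag(lang_env):
    """Turn a POSIX locale such as 'en_US.UTF-8' into a tag such as 'en-US'."""
    base = lang_env.split(".")[0].split("@")[0]
    if base in ("", "C", "POSIX"):
        return "en"  # assumed fallback when $LANG names no real language
    return base.replace("_", "-")


def accept_language(lang_env, rng, decoys=2):
    """Build an Accept-Language value: real locale first, decoys at lower q."""
    primary = language_tag(lang_env)
    pool = [loc for loc in DECOY_LOCALES if loc != primary]
    parts = [primary]  # no q parameter means q=1.0
    for loc in rng.sample(pool, decoys):
        q = rng.choice([0.1, 0.2, 0.3, 0.4, 0.5])
        parts.append(f"{loc};q={q}")
    return ", ".join(parts)


if __name__ == "__main__":
    print(accept_language(os.environ.get("LANG", ""), random.Random()))
```

The real locale keeps the highest quality, so content negotiation should
still pick it, while the full header value differs from one run to the next.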

I read some papers on the profiling issue, and most seem to say that
lowering the diversity is the key, effectively lowering the
"bandwidth" of the "signal", and they want to avoid randomizing
anything. However:
1. If noise is added to this "signal", then noise reduction techniques
must be used, and such techniques usually need an appropriate model or
profile of the noise to discard it, and that is a fairly difficult
thing to do at scale.
2. A valid concern is that semantics could suffer. But it is not
difficult to add noise that is semantically valid. If a profiling
method needed to rely on semantics, then the available bandwidth is
limited even further. For example, the order of values may be
semantically insignificant, but a particular ordering would have
profiling value in itself, because it alters a digest of the
header. If the order is randomized, a profiler has to understand the
semantics to normalize it, and gets less signal entropy as a result.
Naive digests would be useless.
3. Digests are commonly used to share device identifiers in the
tracking industry, and it is trivial for the industry to retool that
same code for other headers, like User-Agent. By breaking naive digest
methods, the tracking industry would need to use more sophisticated
methods that return less value.
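
The digest argument in points 2 and 3 can be sketched as follows, in Python.
This assumes a tracker that hashes the raw header value with SHA-256 (the
actual digest used by any given tracker is not known here), and a header
whose items carry explicit q-values, so that their order does not change
the meaning. The function names are made up for the example.

```python
import hashlib
import random


def naive_digest(header_value):
    """Hash the raw header bytes, as a naive fingerprinting script might."""
    return hashlib.sha256(header_value.encode("utf-8")).hexdigest()


def shuffled(header_value, rng):
    """Reorder the comma-separated items without changing any of them."""
    items = [item.strip() for item in header_value.split(",")]
    rng.shuffle(items)
    return ", ".join(items)


def canonical_digest(header_value):
    """What a profiler must do instead: parse and normalize before hashing."""
    items = sorted(item.strip() for item in header_value.split(","))
    return naive_digest(", ".join(items))


if __name__ == "__main__":
    first = "en-US, de-DE;q=0.3, fr-FR;q=0.2"
    second = "fr-FR;q=0.2, en-US, de-DE;q=0.3"
    print(naive_digest(first) == naive_digest(second))          # orderings differ
    print(canonical_digest(first) == canonical_digest(second))  # same item set
    print(shuffled(first, random.Random()))
```

The raw digests of the two orderings differ, so a naive digest stops being a
stable identifier. Only the normalized digest matches, and computing it
requires the profiler to understand the header's syntax.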

Future plans:
1. I plan on doing more semantically valid randomization like what I
did to the Accept-Language header.
2. I was thinking of using dmenu instead of the HTML prompt, by using
a wrapper script that launched surf or aborted. This wrapper could
then isolate by merely exporting a different $HOME to surf, for each
origin. This would allow me to move a bunch of code out of surf.c and
into a shell script. If I can get the changes to surf.c down to just a
few lines, then I can package up the wrapper separately, and make
changes to it without affecting the surf build.
3. This also may make it easier to support other embeddable browsers,
like dillo, since the per-origin $HOME would work there. The prompt
could even map different browsers to different origins. A simple
origin library with a standard interface could be used by various
browsers, just calling out to it whenever navigation occurs.
4. I thought about using GtkMenu when you click a link, but dmenu is
surf's conventional menu, and suits surf's keyboard-driven use cases.
5. I am thinking of using the stylesheet regex technique to map URLs
to origins, so that grouped origins like google subdomains can be
easier to set up. Currently, I use symbolic links to map origin
folders together. The main benefit of the regex approach is that the
configuration can all be in one place. Symbolic links are easy to
create, but can be difficult to maintain. However, if I break the code
out into a separate library, I would probably adopt thttpd's glob
patterns ("*" matches anything in between delimiters, while "**"
matches anything across delimiters).
6. I ran into a cross-origin POST issue. I still need to figure out a
good way to handle that other than mapping the origin profiles
together with a symbolic link.
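
A rough sketch of how plans 2 and 5 could fit together, in Python rather
than the shell script mentioned above: host names are mapped to an origin
group with thttpd-style globs, and the group name selects a per-origin
$HOME. Using "." as the delimiter for host names, the rule table, the
directory layout and all function names are assumptions for illustration.

```python
import os
import re

# Hypothetical rule table: first matching pattern wins.
ORIGIN_RULES = [
    ("**.google.com", "google"),
    ("google.com", "google"),
    ("*.example.org", "example"),
]


def glob_to_regex(pattern, delimiter="."):
    """'*' matches within one delimited part, '**' matches across delimiters."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^" + re.escape(delimiter) + "]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def origin_for(host, rules=ORIGIN_RULES):
    """Return the origin group for a host, or the host itself if ungrouped."""
    for pattern, name in rules:
        if glob_to_regex(pattern).fullmatch(host):
            return name
    return host


def surf_environment(host, base, environ):
    """Copy the environment, pointing HOME at the per-origin directory."""
    env = dict(environ)
    env["HOME"] = os.path.join(base, origin_for(host))
    return env
```

A wrapper could then start the browser with something like
subprocess.call(["surf", url], env=surf_environment(host, base, os.environ)),
after creating the directory if needed. Hosts that match no rule get a
directory of their own, which keeps the default fully isolated.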

As always, any input would be appreciated, and thanks again for
providing such an easy browser to work with.

Thank you,

Ben

On 1/8/15, stanio_AT_cs.tu-berlin.de <stanio_AT_cs.tu-berlin.de> wrote:
> Hi
>
> sounds very interesting. thanks. will review, test and report when I get
> some spare time…
>
>

Received on Sat Jan 24 2015 - 07:01:32 CET
