Computers, Privacy & the Constitution

Search Engines and Technological Privacy Solutions

Few of us share all of our intimate thoughts, anxieties and desires with even the closest of friends. Yet we have no similar qualms about dutifully recording each of our fleeting thoughts in the query field of a search engine. The AOL search data fiasco amply demonstrates just how much information can be gleaned about a person even from 'anonymous' search logs. True, the New York Times did most of its sleuthing the old-fashioned way, with reporters poring over the logs. But there is no reason to think the same degree of profiling cannot be achieved in automated fashion, and applied to all search engine users, as data mining techniques mature. Nor are government agencies ignorant of what can be learned from search logs.

But even the privacy-conscious tend to balk at the thought of search-engine abstinence. There have been calls for search engines to limit what data they retain and how long they store it for. Such proposals go hand in hand with calls for new legislation and government oversight. As Eben has suggested, however, many privacy concerns can be alleviated by general adoption of freedom-enabling software. Can we rely on hacks to blunt search engine profiling?

When discussing social networking sites and the privacy issues surrounding the service provider's ability to monitor which profiles each user spends time browsing, Eben has advanced wall warts as a potential solution: small, cheap, network-connected Linux servers that can host the owner's social networking profile (and perhaps provide back-up hosting for the profiles of his friends). It is easy to imagine how wall warts could replace Facebook. A personal server, always on and accessible from everywhere, could host your email or your documents, removing the need for third-party services like Gmail or Google Docs. Once the appropriate wall-wart software is written, many online privacy concerns would disappear without any need for legislative solutions.

But not search engine surveillance. A web search engine requires significant hardware investments: servers to constantly index web pages, store the results, and scan through the abstract web map produced to return relevant search results. Google maintains at least half a million servers dedicated to these tasks. Since no one has figured out an adequate way to do the indexing and searching without a central server, privacy hacks have focused on enabling users to access the indexes created by companies like Google and Yahoo while revealing as little information to the search provider as possible.

One approach is to hide true searches amongst a cloud of ghost queries. This is the approach attempted by the TrackMeNot Firefox plugin, which periodically sends randomized search queries to popular search engines like AOL, Yahoo!, Google, and MSN, hoping to obfuscate a user's real searches with background noise. A nice idea in theory, but not so practical if the random search noise is easy to filter out. Because TrackMeNot is open-source, concerned search providers can examine its noise-generating algorithms, making it easier to identify features shared by the fake queries it generates. If fake queries can be categorized, search engines can sort the wheat from the chaff, or fight back by blocking access to users of the plugin. This is not to say that the approach is entirely without merit; newer versions of the TrackMeNot plugin have implemented increasingly sophisticated techniques geared to making fake queries look more like the real thing. As in the realm of cryptology, understanding the algorithm won't improve the chances of defeating it if the searches it generates are indistinguishable from real user searches in all their characteristics.
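The weakness of predictable noise can be illustrated with a toy model. The sketch below is not TrackMeNot's actual algorithm; the seed list, function names, and sample queries are all hypothetical. It only shows that when decoys are drawn from a vocabulary the search provider can learn, a one-line filter recovers the real queries.

```python
import random

# Hypothetical decoy vocabulary (an assumption for illustration,
# not the list any real plugin uses).
SEED_TERMS = ["weather", "recipes", "football scores", "cheap flights", "movie times"]

def make_ghost_queries(n, rng):
    """Return n decoy queries drawn from the fixed seed list."""
    return [rng.choice(SEED_TERMS) for _ in range(n)]

def mix(real_queries, ghosts_per_real, rng):
    """Combine real queries with decoys and shuffle the order they are sent in."""
    ghosts = make_ghost_queries(ghosts_per_real * len(real_queries), rng)
    stream = list(real_queries) + ghosts
    rng.shuffle(stream)
    return stream

def filter_known_noise(stream, known_terms):
    """What a provider could do once it knows the decoy vocabulary."""
    known = set(known_terms)
    return [q for q in stream if q not in known]

rng = random.Random(0)
real = ["symptoms of depression", "divorce lawyer brooklyn"]
stream = mix(real, ghosts_per_real=3, rng=rng)
recovered = filter_known_noise(stream, SEED_TERMS)
```

Here `recovered` contains exactly the two real queries, however the stream was shuffled. The obfuscation only holds up if decoys are statistically indistinguishable from genuine searches, which is the direction the paragraph above attributes to newer versions of the plugin.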

Scroogle exemplifies another common approach, which involves anonymizing search queries by routing them through a portal shared with many other users. Since the search engine sees all the queries as coming from the proxy, it cannot use the originating computer's IP address as a unique identifier; it cannot categorize a series of searches as the thoughts of any particular person. The problem? One must trust the proxy not to keep its own logs, for one. And even if a trustworthy proxy exists (say, a website based in a country with laws severely limiting data retention), search engines can simply block requests from that proxy once it is discovered. Unlike the game of whack-a-mole between the content industry and peer-to-peer file sharing services, search engines can block anonymizing proxies like Scroogle faster than new ones can gain popularity, since they do not need any judicial imprimatur to engage in effective self-help.
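A minimal model makes both halves of this argument concrete. The classes and addresses below are hypothetical (the IPs come from reserved documentation ranges) and do not describe how Scroogle or any search engine is actually built. The sketch assumes a search engine that logs queries by source address and a proxy that forwards queries under its own address without recording who sent them.

```python
from collections import defaultdict

class SearchEngine:
    """Toy search provider that keeps a per-IP query log and a blocklist."""
    def __init__(self):
        self.log = defaultdict(list)
        self.blocked = set()

    def search(self, source_ip, query):
        if source_ip in self.blocked:
            return None  # request refused
        self.log[source_ip].append(query)
        return f"results for {query!r}"

class Proxy:
    """Toy anonymizing portal: forwards queries under its own address."""
    def __init__(self, ip, engine):
        self.ip = ip
        self.engine = engine

    def search(self, user_ip, query):
        # user_ip is deliberately neither forwarded nor stored.
        return self.engine.search(self.ip, query)

engine = SearchEngine()
proxy = Proxy("203.0.113.10", engine)
proxy.search("198.51.100.1", "bankruptcy advice")
proxy.search("198.51.100.2", "hiv test clinic")

seen_addresses = set(engine.log)      # only the proxy's address appears
engine.blocked.add(proxy.ip)          # unilateral self-help by the provider
after_block = proxy.search("198.51.100.1", "anything")
```

The engine's log holds a single pooled entry for the proxy, so no query can be tied to an individual user. The same property is what makes the proxy trivial to block: one entry in `blocked` cuts off everyone behind it.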

Tor is still the gold standard in terms of online anonymity, but the exit nodes of the Tor network can also be identified and blocked by search providers if few volunteers are willing to run relays. Many potential relay operators are dissuaded by the possibility of incurring legal liability for abetting criminal conduct by other users of the Tor network. Even if no liability exists, relay operators may still come under investigation by law enforcement, which can be a burden in itself.

Perhaps the solution lies in combining the Tor and TrackMeNot approaches. A Firefox plugin could route the search requests of other plugin users, so that queries originating from any particular address would represent the thoughts of many actual users. Since all queries would be user-generated, the plugin would be very difficult to detect. And because only search queries would be routed, there would be no danger of abetting anonymous copyright infringement, defamation, or trafficking in child pornography. More people should be willing to run a highly limited search-anonymization plugin than a full-fledged Tor relay. Still, what if one bad apple uses the plugin to make incriminating queries? If law enforcement has access to search engine logs and uses them as a means to narrow down the list of suspects in a given crime, innocent people may come under investigation simply for running a plugin meant to preserve their privacy. The threat of that may be enough to dissuade many from employing such a plugin.
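The "only search queries would be routed" restriction could be enforced by a relay policy along the following lines. This is a sketch of one possible check, not a description of any existing plugin: the function name is hypothetical, and the endpoint table is an illustrative assumption that a real deployment would have to maintain itself.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative allowlist: (host, path) -> name of the query-string parameter
# that carries the search terms. Treat these entries as assumptions.
SEARCH_ENDPOINTS = {
    ("www.google.com", "/search"): "q",
    ("search.yahoo.com", "/search"): "p",
}

def is_relayable(method, url):
    """Accept only plain GET requests to a known search endpoint
    that actually carry a non-empty search term."""
    if method.upper() != "GET":
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    param = SEARCH_ENDPOINTS.get((parts.hostname, parts.path))
    if param is None:
        return False
    return bool(parse_qs(parts.query).get(param))

ok = is_relayable("GET", "https://www.google.com/search?q=privacy")
upload = is_relayable("POST", "https://www.google.com/search?q=privacy")
download = is_relayable("GET", "https://example.org/file.torrent")
```

A relay that forwards a peer's request only when `is_relayable` returns true cannot be used to post content or fetch arbitrary files. It does nothing, however, about the bad-apple problem described above: an incriminating query is still a search query, and it still leaves from the relay operator's address.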

This suggests that FOSS alone probably cannot solve the problem. At the very least, what is needed are restrictions on the circumstances under which search logs are subject to subpoena. Law enforcement should not be allowed to go on fishing expeditions through the records of everyone's thoughts. A subpoena for the search history of a particular IP address should require preexisting evidence reasonably linking that IP address to illegal behavior.

-- AndreiVoinigescu - 17 May 2009


If we had wallwart servers, wouldn't this be pretty easy? It would be simple to maintain some Tor-like routing daemon that could run in the wallwart and reroute queries semi-anonymously. I also like the idea of having it only run for queries.

The real question I had when reading this was whether people would accept even this as a solution... it would be popular among a particular privacy-valuing subset, but it seems to me as though the average person may actually value Google's "value adding" services enough to eschew the privacy filter and purposefully give Google their information.

-- TheodoreSmith - 20 May 2009

It looks like I was beaten to the punch a bit by Ted's comment. You propose some interesting ideas here, Andrei, but even assuming you can get people to see that there is a problem, I, too, wonder if these are solutions people can accept. In my own paper, I argued that part of the reason people don't do the things you propose is that they spurn freedom itself, but I suspect laziness and technological ineptness are also partly to blame. I'll throw myself to the fire by saying that while I have AdBlock and TrackMeNot (because they were easy to install and worked in a framework I already understand), really, the only way I will get a wall wart server (might I also embarrassingly contend that this name is, well, sort of distasteful to those of us non-tech people who need to be seduced by it?) is if Justin or Ted a) came to my house with the wallwart, b) installed the wallwart, and c) agreed to maintain the wallwart for me, forever. In some sense, this is just an example of the spurning of freedom I discuss in my paper; I could certainly learn to use this technology (there is no reason I need others to do it for me), but despite my strong feelings about these problems, I have yet to take some of these steps because they seem out of (easy) technological reach. I doubt that making the technology easier or integrating it into familiar contexts will have any effect on the underlying problem of rejecting the burdens of freedom, but it may at least cause people to seriously consider the questions, rather than rejecting them outright on ease or ability grounds.

-- DanaDelger - 20 May 2009

 


r3 - 20 May 2009 - 20:30:50 - DanaDelger
All material on this collaboration platform is the property of the contributing authors.
All material marked as authored by Eben Moglen is available under the license terms CC-BY-SA version 4.