Computers, Privacy & the Constitution

Search Engines and Technological Privacy Solutions

Few of us share all of our intimate thoughts, anxieties and desires with even the closest of friends. Yet we have no similar qualms about dutifully recording each of our fleeting thoughts in the query field of a search engine. The AOL search data fiasco amply demonstrates just how much information can be gleaned about a person even from 'anonymous' search logs. True, the New York Times did most of its sleuthing the old-fashioned way, with reporters poring over the logs, but there is no reason to think the same degree of profiling cannot be achieved in automated fashion, and applied to all search engine users, as data mining techniques mature. Nor are government agencies ignorant of what can be learned from search logs.

But even the privacy-conscious tend to balk at the thought of search-engine abstinence. There have been calls for search engines to limit what data they retain and how long they store it. Such proposals go hand in hand with calls for new legislation and government oversight. As Eben has suggested, however, many privacy concerns can be alleviated by the general adoption of freedom-enabling software. Can we rely on hacks to blunt search engine profiling?

When discussing social networking sites and the privacy issues surrounding the service provider's ability to monitor which profiles each user spends time browsing, Eben has advanced wall warts as a potential solution: small, cheap, network-connected Linux servers that can host the owner's social networking profile (and, perhaps, provide backup hosting for the profiles of his friends). It is easy to imagine how wall warts could replace Facebook. A personal server, always on and accessible from everywhere, could host your email or your documents, removing the need for third-party services like Gmail or Google Docs. Once the appropriate wall wart software is written, many online privacy concerns would disappear without any need for legislative solutions.

But not search engine surveillance. A web search engine requires significant hardware investments: servers to constantly index web pages, store the results, and scan through the resulting map of the web to return relevant search results. Google maintains at least half a million servers dedicated to these tasks. Since no one has figured out an adequate way to do the indexing and searching without central servers, privacy hacks have focused on enabling users to access the indexes created by companies like Google and Yahoo! while revealing as little information to the search provider as possible.
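To make concrete what "indexing and searching" involves, here is a minimal sketch of an inverted index in Python. The documents and query are invented for illustration; a real engine builds and ranks structures like this over billions of pages, which is exactly why the hardware burden is hard to escape.

```python
# Toy inverted index: maps each term to the set of documents containing it.
from collections import defaultdict

documents = {
    "page1": "privacy and free software",
    "page2": "search engines and privacy",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Return the documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index[terms[0]].copy()
    for term in terms[1:]:
        results &= index[term]
    return results

print(search("privacy search"))  # {'page2'}
```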

One approach is to hide true searches amongst a cloud of ghost queries. This is the approach attempted by the TrackMeNot Firefox plugin, which periodically sends randomized search queries to popular search engines like AOL, Yahoo!, Google, and MSN, hoping to obfuscate a user's real searches with background noise. A nice idea in theory, but not so practical if the random search noise is easy to filter out. Because TrackMeNot is open source, concerned search providers can examine its noise-generating algorithm, making it easier to identify features shared by the fake queries it generates. If fake queries can be categorized, search engines can sort the wheat from the chaff, or fight back by blocking access to users of the plugin. This is not to say that the approach is entirely without merit; newer versions of the TrackMeNot plugin have implemented increasingly sophisticated techniques geared to making fake queries look more like the real thing. As in cryptography, understanding the algorithm will not improve the chances of defeating it if the searches it generates are indistinguishable from real user searches in all their characteristics.
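A rough sketch of the ghost-query idea follows; this is not TrackMeNot's actual code, and the word list, timing, and search endpoint are illustrative assumptions. The sketch also shows why naive noise is filterable: a generator that draws from a fixed vocabulary on a roughly predictable schedule hands the search provider exactly the features it needs to separate fake queries from real ones.

```python
# Sketch of the ghost-query idea (not TrackMeNot's actual code).
# Randomly assembled queries are sent alongside real ones so the log
# fills with noise. Word list, timing, and endpoint are assumptions.
import random
import time
import urllib.parse
import urllib.request

SEED_WORDS = ["weather", "recipe", "history", "python", "privacy", "news"]
SEARCH_URL = "https://www.example-search.com/search?q="  # hypothetical endpoint

def send_ghost_query():
    query = " ".join(random.sample(SEED_WORDS, random.randint(1, 3)))
    url = SEARCH_URL + urllib.parse.quote(query)
    try:
        urllib.request.urlopen(url, timeout=10)  # response is discarded
    except OSError:
        pass  # noise queries are fire-and-forget

# A real plugin would run this indefinitely in the background.
for _ in range(10):
    send_ghost_query()
    time.sleep(random.uniform(30, 300))  # irregular intervals are harder to filter
```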

Scroogle exemplifies another common approach: anonymizing search queries by routing them through a portal shared by many other users. Since the search engine sees all the queries as coming from the proxy, it cannot use the originating computer's IP address as a unique identifier; it cannot tie a series of searches to the thoughts of any particular person. The problem? One must trust the proxy not to keep its own logs, for one. And even if a trustworthy proxy exists (say, a website based in a country with laws severely limiting data retention), search engines can simply block requests from that proxy once it is discovered. Unlike the game of whack-a-mole between the content industry and peer-to-peer file sharing services, search engines can block anonymizing proxies like Scroogle faster than new ones can gain popularity, since they do not need any judicial imprimatur to engage in effective self-help.
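The proxy approach can be sketched in a few lines. The portal URL below is a hypothetical placeholder; a real portal like Scroogle would also strip identifying headers and cookies before forwarding the query.

```python
# Sketch of proxy-style anonymization: the query goes to an intermediary,
# which forwards it to the search engine, so the engine only ever sees the
# proxy's IP address. The portal URL is a hypothetical placeholder.
import urllib.parse
import urllib.request

PROXY_PORTAL = "https://anonymizing-portal.example.org/search?q="  # hypothetical

def proxied_search(query):
    url = PROXY_PORTAL + urllib.parse.quote(query)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()  # results come back from the proxy, not the engine

results = proxied_search("data retention law")
```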

Tor is still the gold standard for online anonymity, but the exit nodes of the Tor network can also be identified and blocked by search providers, especially if few volunteers are willing to run relays. Many potential relay operators are dissuaded by the possibility of incurring legal liability for abetting criminal conduct by other users of the Tor network. Even if no liability exists, relay operators may still come under investigation by law enforcement, which can be a burden in itself.
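For comparison, routing a query through Tor is simple on the client side. The sketch below assumes a local Tor client listening on its default SOCKS port (9050) and the requests library installed with SOCKS support; the search URL is again a placeholder.

```python
# Sketch of routing a search through Tor's local SOCKS proxy.
# Assumes a Tor client on its default port 9050 and requests[socks] installed.
import requests

TOR_PROXY = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def tor_search(query):
    # The search engine sees the exit node's IP rather than the user's,
    # unless it chooses to block known exit nodes outright.
    url = "https://www.example-search.com/search"  # hypothetical endpoint
    resp = requests.get(url, params={"q": query}, proxies=TOR_PROXY, timeout=30)
    return resp.text
```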

Perhaps the solution lies in combining the Tor and TrackMeNot approaches. A Firefox plugin could route the search requests of other plugin users, so that queries originating from any particular address would represent the thoughts of many actual users. Since all queries would be user-generated, the plugin would be very difficult to detect. And because only search queries would be routed, there would be no danger of abetting anonymous copyright infringement, defamation, or trafficking in child pornography. More people should be willing to run a highly limited search-anonymization plugin than a full-fledged Tor relay. Still, what if one bad apple uses the plugin to make incriminating queries? If law enforcement has access to search engine logs and uses them to narrow down the list of suspects in a given crime, innocent people may come under investigation simply for running a plugin meant to preserve their privacy. The threat of that may be enough to dissuade many from employing such a plugin.
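A minimal sketch of what such a plugin's core logic might look like is below. Peer discovery, encryption, rate limiting, and the actual browser integration are all omitted, and every URL and peer name is hypothetical; the point is only the shape of the idea, in which each participant both forwards its own queries through a random peer and submits queries on behalf of others.

```python
# Sketch of the peer-relay idea: each plugin user forwards queries through
# another randomly chosen user, so every IP address emits a mix of many
# people's real searches. Peer list and endpoints are hypothetical.
import random
import urllib.parse
import urllib.request

PEERS = ["https://peer1.example.net/relay", "https://peer2.example.net/relay"]
SEARCH_URL = "https://www.example-search.com/search?q="  # hypothetical endpoint

def relay_search(query):
    """Hand the query to a random peer, which submits it to the search engine."""
    peer = random.choice(PEERS)
    url = peer + "?q=" + urllib.parse.quote(query)
    with urllib.request.urlopen(url, timeout=15) as resp:
        return resp.read()

def handle_peer_request(query):
    """What each participant does for others: submit the query as its own."""
    url = SEARCH_URL + urllib.parse.quote(query)
    with urllib.request.urlopen(url, timeout=15) as resp:
        return resp.read()
```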

This suggests that FOSS alone probably cannot solve the problem. At the very least, what is needed are restrictions on the circumstances under which search logs are subject to subpoena. Law enforcement should not be allowed to go on fishing expeditions through the records of everyone's thoughts. A subpoena for the search history of a particular IP address should require preexisting evidence reasonably linking that IP address to illegal behavior.

-- AndreiVoinigescu - 17 May 2009


 