Header Ziff Davis Enterprise
Wednesday, December 17, 2008 6:06 PM/EST

Yahoo, Not Google, Moves Search Data Match Closer to an Endgame

There is a game worthy of the great "Tom & Jerry" cartoons afoot in the land of search engines.

On one side are Google, Yahoo and Microsoft, all of which collect user log data because they argue it helps them improve the way their search engines and other Web services work, maintain security and prohibit fraud. Google has great YouTube videos on the reasons for saving user logs, which include our search queries, IP addresses and cookies.

On the other side are privacy advocates, regulators and legislators, who argue that our civil liberties are being infringed upon at the level of bits and bytes. These groups don't want search engines storing user data any longer than they have to.

The European Commission's Article 29 Working Party, an ominous-sounding advisory panel made up of data protection commissioners from each of the European Union's 27 member countries, leads the way in pushing for tighter data retention limits, calling for search engines to delete search records after six months.

Until today, the search engines came closest to this target in September, when Google reduced its data anonymization timeline from 18 months to nine months. This seemed like a monumental gesture at the time, but, oh, what a difference a few months make!

Yahoo today, Dec. 17, vowed to anonymize log data within 90 days for not only search, but also page views, page clicks, ad views and ad clicks, with certain exceptions for fraud, security and legal obligations. Yahoo's overture shaved 10 months off of its previous data erasure policy.

Microsoft, which seems painfully opposed to any sort of leadership position in search, is still stuck at 18 months. The chagrin in Redmond is palpable. To recap, Yahoo is quickest to nuke your data from its system at three months, Google at nine, Microsoft at 18.

Yahoo said in a statement:

Yahoo conducted a comprehensive review of its data practices across the globe. The heads of business and engineering units worked with privacy and data governance teams to thoroughly review data needs for global products and services, striving to ensure that Yahoo retains data only long enough to serve our business and create the highest-quality user experiences while maintaining the ability to fight fraud, secure systems and meet legal obligations.

Yahoo decided that three months of data retention was all it needed to continue successfully offering its services. But doesn't a reduction of 10 months seem like a drastic change to anyone, particularly as these companies fought tooth and nail to keep data as long as they wanted?

Why was Yahoo dragging its feet? Perhaps Yahoo wants to evoke as much good will as it can as it slides ever so slowly into obliteration. Search Engine Land and Ars Technica seem to think so.

How much more value can Yahoo derive from holding user data for 90 days? My suspicion is none. Why not just eliminate the storing of user data logs?

My belief is that Yahoo, Google and Microsoft can do this without seriously degrading their services, and eventually will. They just don't want to unless and until the EU, the Department of Justice or some major politicking group forces their hands.

This is quite the cat-and-mouse game indeed, but I wonder to what end? Google, Yahoo and Microsoft are slated to present their cases for data retention before the Article 29 Working Party panel in February. Will 2009 be the year Google, Yahoo and Microsoft cease storing our search and other user data altogether?

I put some of my questions to Google in the context of Yahoo's news today. Besides the company line about taking privacy seriously, Google Senior Privacy Counsel Jane Horvath responded:

When we make changes to our policies, they are dependent on what will be best for our users both in terms of the services we provide and the respect of their privacy. It is a balance that we are continually evaluating.

There will be big changes afoot regarding user data retention in 2009. The Big Three of search will succumb to their clearly softening stances on retaining our data.

We will see a glut of innovation, with the companies' search algorithms giving us similar or superior results based on our searches without storing and picking over our search queries, IP addresses and cookies.

I'm certain the companies can do this already; they've either become too reliant on our data or are scared to set us free and lose competitive advantages.

What do you think?

Comments (27)

Nick :

You said (my emphasis):
"My belief is that Yahoo, Google and Microsoft can do this ***with*** seriously degrading their services, and eventually will. They just don't want to unless and until the EU, the Department of Justice or some major politicking group forces their hands."

Was that a major typo and proofing error, or do you really believe that removing the user data will degrade the services?

What - you're shocked that someone's pedantic about a typo that reverses the entire meaning of a sentence?

drfugawe :

"My belief is that Yahoo, Google and Microsoft can do this with seriously degrading their services,..."

Did you mean, ... can do this without ... ?

Proofreading is a good thing.

I think you're an idiot in over your head who has no idea what you're talking about.

zen :

I applaud Yahoo's decision.

Regardless of whether they are playing the politics of inevitable obsolescence, Yahoo has successfully managed to rip the curtain open on the “Wizard of Fraudz”, i.e. none of the big three actually needs to retain data longer than 90 days, other than to mine it for marketing and investment purposes.

They cannot hide the fact that, in their world, our every move is Gold.

Webgirl :

I think I will use Yahoo as a search engine from now on.

terrence :

Just wondering, what makes you so certain? I'm really asking, not being snarky.

Clint Boulton :

Anonymous: And I think if you had anything interesting to say, you'd reply constructively, not throw unsupported thunderbolts from the mountain of anonymity. Reserve such comments for TC (aka TrollCave), where they are encouraged.

Clint Boulton :

drfugawe:

It's been corrected, thanks.

another anon :

Yahoo can afford to throw away the data because they don't know what to do with it. That's probably not the case for Google. But I have to agree with the previous commenter: you really have no idea what you're talking about.

Clint Boulton :

Nick.

My mistake. It's been fixed, thanks.

meanguy :

Many online behaviors are seasonal. Take for example holiday shopping. If you think that knowing what users did this xmas isn't valuable next xmas, you're mistaken.

What's the data retention policy on that grocery card you swipe for thirty cents off bananas?

I want Amazon to maintain all my shopping data indefinitely. Why? Because it has value to me.

If Google can make my online experience better, they can keep as much crap as they want as long as they want. Note: serving up ads that are more interesting to me counts as better experience.

Clint Boulton :

Terrence:

Good question. I spoke with Microsoft's director of privacy tonight and will have another piece on this tomorrow. He claimed MSFT can't do search well without the data, but if YHOO can lower it to 3 months, then so can Google and Microsoft. From there, they can certainly bring it down to 24-48 hours if they focused on it. They (google, msft, yhoo) don't want to give up the data because they haven't figured out how to solve the search, security and click fraud challenges without collecting query, IP address and cookie info.

Clint Boulton :

Another anon: You are a fount of insight. :) Yahoo may indeed be grasping at straws, and probably now lacks the search chops to solve the challenges of providing good search without collecting a data trove. Of the Big 3, I'd expect Google has the best shot at it. How would they do this? If I knew that, I'd start the next Google. I just don't believe the companies who say one minute they can't cut the data retention periods anymore, and then go and magically do it. Microsoft's position is they'd go to 6 months if the others did. If that's the case, why not go to 12, or even 9? Why are they still at 18? It won't wash. Every time these companies feel they have no recourse, they cave to the politics. In 2009, they may give up the ghost entirely. No, I don't have any evidence. I just see a progression of compromise that seems to be leading to zero retention. But what do I know? Maybe I'll be surprised and the companies will dig in their heels until they get tired of spending millions on legal fees during a recession. Maybe I have no clue what I'm talking about. :)

Pete :

How do we know they are purging the data? That would actually be hard to believe. It takes more work to dump it than it would to hide it.

Barry :

Clint, I think you're right. This primarily is a goodwill-building strategy, my guess is aimed more at the European market. The EU is more vigilant and distrustful of global multinational companies and much better about enforcing anti-trust laws than the U.S. Justice Department, which now appears to be in the pocket of the corporate robber-barons of Wall Street.

Yahoo indeed is sliding "ever so slowly into obliteration" under the market assault by Google, not unlike Netscape did in the late nineties when Microsoft instantiated its Borg-like assimilation of the browser market. And we all know how that turned out - it was ten years of crappy "Internet Exploder" before the open-source movement began to catch up. But even Firefox is bankrolled in large part by Google. How's a competitor supposed to catch a break, anyway?

I see this move primarily as a stop-gap strategy for Yahoo, who is still casting about for an identity in an increasingly crowded market space. None the less, it's a smart move. Privacy laws eventually will catch up to these companies and they will be required to change their data retention policies anyway. Might as well score some goodwill brownie points in the process.

I would like to see the three companies work out some type of system, without a lot of input from the government. We are in a mess right now due to government control of our business.

Hank :

meanguy et al.,

The grocery can retain the data on the card saving me thirty cents off my bananas forever--the data helps them and they reward me with a discount. BUT the card has NO connection to me. ALL of my grocery cards are unregistered. They can know what the holder of my card buys midweek and whatever else they can glean, but they have no need to know WHO I am (the clerk often hands me my receipt and says thank you Mr. ...er...ah... Customer! when they find no name on it).

Google and friends can certainly compile data about my searching proclivities without needing to associate it forever with Hank. Let them have everything and anything for a month or three to analyze, dissect, and study; however, let them then dissociate the data from the identifiable ME and turn it into something statistically useful.

Like the grocer, they can meld my pre-holiday buying habits into something usable for their needs, which is something greater than what I do as Hank. Our collective tendencies are meaningful in data mining. It is NOT important to know what Hank bought.

If you want a particular store to remember your hat size (i.e., Amazon), then feel free to let them store that information for you. Demand that they retain that data on you! I, meanwhile, will insist that Google NOT retain my personal information any longer than it should reasonably take them to find some useful patterns.

Regards,
Mr. ...er...ah... Customer :)

SumDumGuy :

Anonymization is not what you think it is.
I do data forensics for a living and this so-called "anonymization" would barely slow me or my team down if we were hired to identify an actual user of one of these services.

Google's stated method of anonymization is to zero out the last part of the IP address. For example, 192.168.1.100 becomes 192.168.1.xxx. The problem with that approach is that it isn't really anonymous - it is just slightly disguised.

There are additional pieces of information that can also help to identify a user - version of operating system, version of web browser, what web page they "came from" to get to the current page and what page they "went to" when leaving the current page, plus the time and date of their browsing.

All of these pieces of information, taken together, can easily undo the "anonymization" that google and presumably Yahoo and MS claim to do.
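
The last-octet truncation described above, and the way the leftover fields can still single out a visitor, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the helper names (truncate_ipv4, unique_fingerprints) and the sample log rows are made up for this example and are not any search engine's actual code or data.

```python
import ipaddress
from collections import Counter


def truncate_ipv4(address):
    """Zero the last octet of an IPv4 address (hypothetical helper)."""
    # Treat the address as a member of its /24 block and return the block's
    # base address, e.g. 192.168.1.100 -> 192.168.1.0.
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


# Made-up log rows for illustration only: (ip, browser, operating system).
LOG = [
    ("192.168.1.100", "Firefox 3.0", "Windows XP"),
    ("192.168.1.101", "IE 7", "Windows XP"),
    ("192.168.1.102", "IE 7", "Windows XP"),
    ("192.168.1.103", "Safari 3", "Mac OS X"),
]


def unique_fingerprints(rows):
    """Return the 'anonymized' records whose remaining fields are still unique."""
    keys = [(truncate_ipv4(ip), browser, os_name) for ip, browser, os_name in rows]
    counts = Counter(keys)
    return [key for key in keys if counts[key] == 1]


if __name__ == "__main__":
    # The truncated address narrows a visitor to a block of 256 candidates;
    # any record printed here is the only one in its block with that
    # browser and operating system combination.
    for record in unique_fingerprints(LOG):
        print(record)
```

In this toy log, two of the four records remain one of a kind even after truncation, which is the point being made: adding referrer and timestamp fields would only narrow things further.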

Jerry Filo :

I'm going to stop using Google and Gmail because I don't trust them at all. My gmail now shows ads that are related to my email message content. You may be OK with that but I'm not a fan of Google prying into my emails.

Jeff :

I hope your Yahoo holdings go up after writing this!

R S'Chevalier :

I'm concerned about their having people's personal information, and I do believe much of it is sold, regardless that many claim not to share such information. Some of the emails I receive from foreign locations certainly prove to me such personal information is not being kept confidential.

R S'Chevalier :

Well... if big brother government can and does keep multi millions of personal information files, the majority of which is totally unnecessary, then the private business sector is simply the monkey see and monkey do. Of course that's a personal opinion.

Neo :

Definitely a move in the right direction from Yahoo. Now if only they could provide a better search result...
Google is collecting too much user data. Now that Google Chrome is getting popular, only god and google know how much information they collect from us. Soon they will release an open source operating system which will be capable of collecting every single detail of a person.

First. It is clear that the writer has no clue what they are talking about in regards to search.

Second. This is the only move Yahoo! could have made to fend off Microsoft.

If you are Yahoo! and have MS breathing down your neck to buy out JUST your Search, which would leave Yahoo! a pointless and sad company in my mind, and Google has been forced to halt all dealings with you, what is your next move? You devalue the Search service in the eyes of would-be shark investors and the buyout crew at MS, while at the same time enhancing the public's perception of you.

Microsoft will be bound by the rules Yahoo! has in place if they do indeed buy up the search division. Not only that, but if MS were to purchase the search division and try to change how long they held on to the public information, you can imagine the backlash and how it would devalue MicroHoo Search. And you better believe that Yahoo! knows Microsoft cannot function on a 3 month data window.

Think about it. There is simply no reason to include page views, page clicks, ad views and ad clicks to Yahoo!'s list, and major Corps like Yahoo! never willingly give up something like this unless it can be weaponized in some way.

Look for a disgruntled response from Microsoft stating that they no longer feel they "need" Yahoo to compete with Google and now, after spending Billions of dollars on their Search, they have a plan to gain market share.

SumDumGuy :

Hey Hank - even if your "grocery card" does not have a name on it, chances are the store's back-office has already correlated your name, address and consumer habits with it by dint of paying with a credit card.

There are consumer profiling services that merchants can subscribe to. They feed the service the stream of their credit card transaction information and any additional information like "grocery card" numbers and delivery addresses, etc. The service correlates all the data, cross-references on common values like credit card numbers and then builds consumer profiles based on all associated data which are then made available to all of the subscriber companies (and anyone else willing to pay for them).

Hank :

Sorry you missed my point SumDumGuy. It was not a "how-to" on playing spy games. My point is that neither google nor grocery will NEED the name Hank to make their world go around. I do not object to their compiling statistical trends via my habits; however, I do not believe they require years of associating their mined data with Hank in order to benefit.

By the way (again, not the point of my comment), when my local grocery store first opened they handed out "temporary" cards at the door. One particular checker was always annoyed to find me still using it years later. I would try to avoid her lane, but sometimes we would end up together with her invariably telling me that my card could not be used for writing checks and was only meant to be used until I registered for a "real" card.

I paid with cash from a fee-free ATM and needed no checks cashed. I do not delude myself into thinking they could not find my name somehow (my car plates?). BUT I do wonder what good it would do them if they were able to find my out-of-town PO Box that my credit card is attached to. Or know the fake name my telephone is listed under.

I'm starting to sound paranoid when I'm simply ornery. My PO Box is my mailing address because I move around too much. I'll date myself, but my phone was in a fake name because, idiotically, unlisted numbers cost more (Hello, is Mr Kjer Wong there? does not bring me to the phone). ALL of my grocery cards are unregistered because I never saw a reason why my name should matter to them.

Thanks for the info; though, if they could explain to me why my PO Box number would benefit them, I'll simply give it to them and save them from having to try to trace it through some profiling service. Or, maybe, in these poor economic times, they could unsubscribe, save consumers some money, and have a larger profit. After all, the exact identity of Hank and his box are really not that important!

Back on point: I chuckle when google offers me recipes for spam when I delete my junk mail. I am not bothered by their "snooping" in order to generate advertising revenue and give me "Over 7274.989467 megabytes (and counting)" of FREE storage! I join with those arguing for shorter-term use of the identifiable (Hank) part of that data. Thus, I urge google to at least match Yahoo's play--regardless of the motive behind that move.

CAT :

The Wizard of Fraudz, or more precisely self-proclaimed AdSenseBoy Mindaugas Lipskas from Lithuania. He and like-minded cyber-criminals have continued to commit fraud right under the nose of Google, Yahoo or GoDaddy for quite a while.

Plus they have, just like other companies (e.g. German Telekom), begun to create their own massive data pile in the wild, with no control over it so far. Phishing, spam and all kinds of malware are their main form of income. Where those crooks do this by themselves, Telekom uses "Black Ops" called SAF, which store their customers' data forever, not just a few days or weeks.
