by Mike Masnick
Fri, Feb 26th 2010 9:06am
from the it-ain't-personal dept
That said, if you want even more evidence that Google's ranking decisions aren't personal, but are actually based on what its system determines will give the best possible results, witness the story of Google employee Jason Morrison, who recently discovered that his own personal site had been delisted from Google. It actually took him a few weeks to notice, but once he did and dug into the issue (using Google's public tools and his own site's admin tools), he quickly realized that he had made a mistake that caused Google's crawlers to believe his site was no longer up.
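The story doesn't say exactly what Morrison's mistake was, but the general shape of the decision is easy to sketch: a crawler treats a site as gone when its configuration or responses say so. Here's a minimal, purely illustrative Python sketch; the status codes and the robots.txt rule are assumptions for the example, not Google's actual delisting logic.

```python
# Hypothetical sketch of how a crawler might decide a site has vanished.
# The specific signals checked here are illustrative assumptions.

def should_delist(status_code: int, robots_disallows_all: bool) -> bool:
    """Return True if the crawler should treat the site as gone."""
    if robots_disallows_all:        # e.g. an accidental "Disallow: /" in robots.txt
        return True
    if status_code in (404, 410):   # page missing or explicitly gone
        return True
    if 500 <= status_code < 600:    # persistent server errors look like downtime
        return True
    return False

print(should_delist(200, False))  # healthy site stays listed -> False
print(should_delist(200, True))   # a robots.txt slip gets it dropped -> True
```

The point of the sketch is that nothing in the decision is personal: a misconfiguration by a Google employee trips the same checks as anyone else's.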
Now, that certainly doesn't preclude the possibility that Google takes revenge on sites it doesn't like, but it's at least more evidence that the ranking system really is pretty algorithmically focused -- and even Google employees aren't immune to being delisted for screwing up. If your site gets delisted from Google, it's not personal.
by Mike Masnick
Fri, Nov 20th 2009 1:49am
from the rethinking-the-niche dept
That said, I think this really only tells a part of the story -- and maybe not the most important or most interesting part. That's because (and, again, this may be due to my own econ education) it doesn't surprise me in the slightest that we'd see hits follow a winner takes all approach (that's how hits work). Nor is it a surprise that the effect would seem stronger as the world globalizes and borders and barriers become less of an issue. So, yes, of course there will be a "globalized" winner takes all situation at the hits level. But is that all?
What's much more interesting to me is what happens beyond the hits. And, as you start to dig down into subsectors or subcultures, you begin to notice an interesting pattern there as well: that those subsectors and subcultures follow that same power law pattern themselves. The big name bands in a subculture may seem "small" in the wider world, but they're huge within the subculture. Within that subculture, they're the winner who took all -- but from a more limited population.
In some ways, it's the fractalization of culture.
Just as a fractal repeats the same pattern as you zoom in on its smaller segments, so do cultural subsegments. And those segments continue to thrive, despite the recommendation systems just pushing people to the hits. Part of that may be that once you've begun exploring those subcultures, the recommendation engines and collaborative filters drive you towards the "hits within" the subculture -- or it may be that the impact of algorithmic recommendation engines isn't quite as dominating as some make it out to be. Yes, people do rely on those recommendation engines... somewhat. But they trust people they know even more. And once you get involved in a subculture, you quickly find other people already involved in that culture who act as guides, pointing you both to the "hits" and to the interesting, "diverse" long tail places to go as well.
So, yes, there is a winner take all effect found in the recommendation engines, but it hasn't resulted in less diversity within our cultural output or our cultural consumption -- and that's because people don't just follow that limited algorithmic overlord to find the content they want to consume. In fact, the original statistical model highlighted above more or less makes this point: even if each individual sees a more diverse culture, the culture as a whole can still end up more homogenized -- but really only among the hits. Because the world is global, the really big hits go global and become winner-take-all in a much larger market. But, at the same time, the niches thrive as well.
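The "fractalization" claim can be made concrete with a toy model. Popularity in hit-driven markets is often described with a Zipf-style power law, and the notable property is that the curve looks the same whether you plot the whole culture or zoom into one niche. This sketch uses invented numbers purely for illustration:

```python
# Toy illustration of "fractal" winner-take-all: a Zipf-like popularity
# curve has the same shape at every scale. Population sizes are invented.

def zipf_shares(n: int, s: float = 1.0) -> list[float]:
    """Popularity share of ranks 1..n under a Zipf law with exponent s."""
    weights = [1 / rank**s for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

global_hits = zipf_shares(1000)   # every band in the global market
niche_hits = zipf_shares(50)      # bands within one subculture

# In both cases the top act dominates its own population by the same ratio:
print(round(global_hits[0] / global_hits[9], 2))  # rank 1 vs rank 10, globally
print(round(niche_hits[0] / niche_hits[9], 2))    # same ratio inside the niche
```

Both prints give 10.0: the niche's big name is "small" globally but dominates its own smaller population in exactly the same proportion, which is the self-similar pattern the paragraph describes.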
by Mike Masnick
Fri, Jul 31st 2009 7:07am
from the do-computers-need-incentive-to-create? dept
In other words, Wolfram Research is claiming that each page of results returned by the Wolfram Alpha engine is a unique, copyrightable work, like a report or term paper. That makes Wolfram Alpha different not just from classic search engines, but from most software. While software companies routinely retain sole ownership of their software and license it to users, Wolfram Research has taken the additional step of claiming ownership of the output of the software itself. It's a bold assertion, and one that could have significant ramifications for the software industry as a whole.

It really depends on the output, but in many cases I have trouble believing the output really is copyrightable. After all, you cannot copyright facts and (in the US, at least) you can't copyright a collection of facts, either. The article doesn't discuss that, and seems to assume that the output may be copyrightable, but I would think that it would need to be significantly more unique and have additional creativity before it could be covered (and then, only the unique parts would be covered). Still, there may be a legal gray area, as McAllister notes:
Suppose you have an Excel spreadsheet full of numbers that you input, but then you ask Excel to generate a series of complex graphs based on rules, formulae, and templates designed by Microsoft. Or what about pivot tables? What about mash-ups or tools like Mozilla Jetpack? If unique presentations based on software-based manipulation of mundane data are copyrightable, who retains what rights to the resulting works?

I'm guessing that the graphs still wouldn't be copyrightable, as they'd really just be the same collection of data, but you could see a mathematically illiterate court finding otherwise...
by Mike Masnick
Fri, Jul 24th 2009 10:29am
from the confusion-abounds dept
In both cases, the companies were upset that when people searched on their company names, the first suggestion was the company name followed by the word "arnaque," which means "scam." Of course, as you probably know, Google Suggest works by finding the most common searches matching what you've typed and showing them to you. So, all this really meant was that an awful lot of people were doing searches questioning whether or not these two companies were scams. But is Google liable for its algorithm accurately suggesting the most common searches associated with those company names? It appears the courts split on that decision. (It's worth noting that there was one major difference between the lawsuits: Direct Energie sued under civil code, while CNFDI sued for libel -- which apparently makes it a criminal case in France.)
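The mechanism at issue is simple frequency counting, and a toy version makes clear why "arnaque" surfaced: the suggestion engine just reflects what users typed. This is a hedged sketch of the idea described above, with an invented query log; it is not Google's actual implementation.

```python
from collections import Counter

# Minimal sketch of frequency-based suggestions: completions are ranked
# purely by how often users searched them. The query log is invented.

query_log = [
    "direct energie arnaque",
    "direct energie arnaque",
    "direct energie tarif",
    "direct energie avis",
    "direct energie arnaque",
]

def suggest(prefix: str, log: list[str], k: int = 3) -> list[str]:
    """Return the k most frequent logged queries starting with prefix."""
    counts = Counter(q for q in log if q.startswith(prefix))
    return [q for q, _ in counts.most_common(k)]

print(suggest("direct energie", query_log))
# the most-searched phrase tops the list, regardless of alphabetical order
```

Note that nothing in the function knows or cares what "arnaque" means; the ranking is an observation about user behavior, which is exactly the point the second judge grasped and the first did not.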
With Direct Energie, the judge seemed to not really understand Google Suggest or how it worked, declaring that no algorithm could justify the prejudice caused by Google. He then got confused, saying that it was clearly Google's fault because the search on "direct energie arnaque" was neither first alphabetically in the list nor the one with the highest number of results. Despite Google's explanation, the judge seems to have totally ignored the reason why it was at the top of the list (the number of people searching for it). Because of this, he said it was no limit on free speech to force Google to change the results, and ordered Google to do so (though he did not award any damages). This gets the basic facts backwards, and it seems quite ridiculous to find Google guilty of such a charge when all it's actually doing is accurately counting up what people are legitimately searching for.
The CNFDI ruling seems much more reasonable. There was one oddity (though it probably has more to do with French law than with the judge): the judge ruled that Google could be liable for libel because the company had been informed by CNFDI of the issue, thereby removing any safe harbors. In the US, Section 230 safe harbors on libel thankfully do not get waived once you've been informed; instead, the law takes the much more logical position that a third-party service provider should never be blamed for the actions of its users. Thus, it would be flat-out ridiculous to blame Google for the phrases people are searching for. But, even with Google having lost its local "safe harbor" protections, the judge properly recognized that the suggestion came from the algorithm looking at what people were searching for, and noted that the suggestion was based on "a valid observation." On top of that, he pointed out that search engines are "important tools for the free circulation of ideas and information," and that the fact that many people were questioning whether CNFDI was a scam was, in fact, important and potentially useful information, and thus not libelous by itself. Finally, the court also noted that forcing Google to remove such a suggestion would be too big a burden on free speech and citizens' rights.
It should be no surprise that I think the second ruling is much more sensible, while the first ruling makes little sense, and appears to have been decided without a full understanding of what Google's Suggest feature is or how it works. Still, I imagine we'll be seeing similar cases around the world... and hopefully they'll find themselves in front of judges more like the one that dealt with the CNFDI case...
by Mike Masnick
Tue, Jul 14th 2009 8:22pm
from the entitlement-culture dept
Ryan, who alerted us to this story, has written up a biting, but reasonable, response, where he notes that being ranked highly in Google is no one's right. And demanding that Google be transparent about its algorithm is meaningless (while being especially ironic, given that this "well-known exec" is demanding transparency while wanting to remain anonymous himself). The key point Ryan makes:
You want an algorithm, here it is:

1.) Sites that are useful to visitors will rank high.
2.) Popular sites that are useful to visitors will rank higher.
3.) Sites that don't offer any value to the web or are irrelevant to the query won't rank well.
4.) Sites that are harmful or spammy won't be included in the index.

Seriously, that's Google's algorithm in plain English. There's your disclosure. The weighting factors and code behind it don't matter -- these principles are all you really need to know.

Indeed. Create useful sites with useful content that people use, and don't be spammy, and you'll most likely rank well in Google. You don't need to force Google to reveal the nuts and bolts of its algorithm. That doesn't change anything. If you're trying to craft your websites to the specifics of the algorithm, you're already lost. If you're creating websites that match the "plain English" code above, you're going to be just fine.
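For fun, those four plain-English rules can be written as literal code. Everything here is invented for illustration (the inputs and weighting are made up, and real ranking uses hundreds of signals), but it shows how little the exact weights matter to the principles:

```python
# Toy scorer for the four plain-English rules above.
# Fields and weights are illustrative assumptions, not Google's signals.

def rank_score(useful: bool, popularity: int, relevant: bool, spammy: bool):
    """Return a ranking score, or None if the site shouldn't be indexed."""
    if spammy:                       # rule 4: harmful/spammy sites are excluded
        return None
    if not useful or not relevant:   # rule 3: no value, or irrelevant to the query
        return 0
    return 1 + popularity            # rules 1-2: useful ranks; popular ranks higher

print(rank_score(useful=True, popularity=5, relevant=True, spammy=False))   # 6
print(rank_score(useful=True, popularity=0, relevant=True, spammy=False))   # 1
print(rank_score(useful=False, popularity=9, relevant=True, spammy=False))  # 0
print(rank_score(useful=True, popularity=9, relevant=True, spammy=True))    # None
```

Notice that a spammy site's popularity never even gets read; that's the "you're already lost" point in code form.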