by Mike Masnick
Wed, Sep 9th 2009 6:41am
Filed Under:
copyright, real estate listings, scraping
Companies:
century 21 canada, rogers, zoocasa
Why Doesn't Century 21 Canada Want More People Viewing Its Real Estate Listings?
from the someone-please-explain dept
Of course, the real estate business has always been focused on bogus exclusions on data through the MLS system -- and apparently they don't like the idea of that data being more widely available. But, still, it's difficult to see what right Century 21 has to complain about, since the site links to Century 21 postings and should only provide them with more traffic. Unless, of course, its fear is that it can't compete by offering enough useful info on its own site.
Power.com Says Facebook Can't Block Access To User Data
from the seems-like-a-tough-claim dept
Now the case is getting odder, as Power.com has countersued Facebook, claiming that Facebook is "unlawfully withholding the data that users own (as stated in Facebook’s own ToS)." Of course, if that's true, I'm not sure if Power.com has the standing to make that claim. Wouldn't that be an issue for the user to raise themselves? Besides, I don't think there's any rule that even if a site lets you retain the copyright on content that it needs to make it easy to access. So now we have lawsuits coming from both sides that don't make much sense. The two sites should just learn to play nicely with each other.
by Mike Masnick
Wed, Jun 10th 2009 9:21pm
Filed Under:
copyright, infringement, scraping, terms of service, trademark
Can Scraping Non-Infringing Content Become Copyright Infringement... Because Of How Scrapers Work?
from the this-seems-troubling dept
Judge Fogel concluded that the allegations of the complaint made out a sufficient claim of copyright infringement because Power Ventures "need only access and copy one page to commit copyright infringement." The court also found that the ToU prohibited downloading, scraping or distributing content from the Facebook Web site content except that belonging to the user, and that in any event, using automated methods, i.e., "data mining, robots, scraping, or similar data gathering or extraction methods" to access any content were also prohibited by the ToU. Thus, the court found that the allegation that Power Ventures accessed Facebook via automated means made out a claim of direct copyright infringement, while the allegation that Facebook users utilized the Power.com interface to access their own profile pages made out a claim of secondary copyright infringement.
Thus, because the terms of service said you can't do any automated scraping of the site, it's suddenly infringing? Even worse, the court found that even though the data being used by Power.com isn't owned by Facebook (it's the users') the scraping was still copyright infringement, because in order to scrape the non-infringing content, Power.com had to first "scrape" the whole page. O'Toole explains:
OK, so far the court has found that Power.com made unauthorized copies of the Facebook Web site. What about the fact that Facebook does not own the copyright in its users' profile data? Facebook surmounted this hurdle by arguing that the content of the Facebook page that surrounded the user's data is copyrightable and is owned by Facebook. According to Facebook, the Power.com scraper operated in a manner that required it to copy the entire Web page in order to extract the user's profile data....
All of this seems a bit troubling, as it would effectively rule out scraping even non-infringing content, just because the scraper had to first read through copyrighted content to get to the non-infringing stuff. But, that seems to go against the entire purpose of copyright law. The fact that the scraper reads copyrighted content shouldn't mean that it's infringement. It's not doing anything with that content other than using it to find the content it can make use of. Anyway, this ruling probably doesn't mean all that much, since it was just to reject the dismissal request, but it does seem odd that the judge gave so much weight to Facebook's terms of service, and seems to indicate the mere act of scraping can be copyright infringement.
Note that the court is conditioning its ruling on the assertion that the Power Ventures scraper necessarily copied the entire Web page before it processed the page and extracted the profile data. That comports with my (limited) understanding of how a Web scraper works. But is it true? If it were true, couldn't an argument be made that this is a fair use of the page? I'll leave that for better lawyers.
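On the narrow technical question of whether a scraper has to copy the whole page first, here is a minimal sketch of how a typical one behaves. It is not Power.com's code, which the post does not show. The page markup, the "profile" id and the ProfileExtractor name are all made up for illustration. The point is only that the full page, site-owned wrapper included, sits in memory before the one user-owned field is pulled out of it.

```python
from html.parser import HTMLParser

# Hypothetical page: site-owned markup wrapped around one user-owned field.
# In a real scraper this string would be the full body of an HTTP response.
PAGE = """<html><head><title>Example Social Site</title></head>
<body>
<div id="nav">Site navigation, logo and other site-owned markup</div>
<div id="profile">Jane Doe, Toronto</div>
<div id="footer">Site-owned footer text</div>
</body></html>"""


class ProfileExtractor(HTMLParser):
    """Collects the text inside <div id="profile"> and ignores the rest.

    Assumes the profile div contains no nested divs, which is enough
    for this illustration.
    """

    def __init__(self):
        super().__init__()
        self.in_profile = False
        self.profile_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "profile":
            self.in_profile = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_profile = False

    def handle_data(self, data):
        if self.in_profile:
            self.profile_text.append(data)


def extract_profile(page):
    """Read through the whole page, keep only the profile text."""
    parser = ProfileExtractor()
    parser.feed(page)  # every byte of the page passes through the parser
    return "".join(parser.profile_text).strip()


profile = extract_profile(PAGE)
print(len(PAGE), "characters held in memory;", len(profile), "characters kept")
```

The transient copy of the whole page exists only so the parser can find the profile field. Nothing outside that field is stored or redistributed, which is the distinction the commentary above is drawing.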
by Mike Masnick
Wed, Apr 22nd 2009 4:39am
Filed Under:
monetizing content, plagiarism, scraping, syndication
Companies:
fair syndication coalition
New Consortium Says If Others Can Monetize Better Than We Can... We Deserve Their Money?
from the please-explain dept
That makes no sense to me. If you can't monetize your own content better than other sites, you don't deserve to be in business. If other sites are actually getting traffic and ad revenue that you think you deserve, it means you're doing a bad job giving people a real reason to visit your site and to interact with your community. Simply demanding money from the sites that have done things better makes no sense. Of course, the reality is that most of these sites haven't done things better, and don't make any money. So the whole grandstanding seems rather wasted effort.
Focus on making your own site worth visiting. Stop worrying what others are doing with your content.
by Mike Masnick
Fri, Aug 8th 2008 11:11am
Filed Under:
aggregation, airlines, cancel, global distribution services, scraping, ticket prices
Companies:
ryanair
Airline Plans To Cancel All Flights Booked Through 3rd Party Websites
from the piss-off-your-customers-much? dept
Yes, we understand that these airlines prefer people to purchase flights from the airlines directly, but it still seems bizarre to try to cut off a great promotional channel. People already know to go look at 3rd party sites for airfare, so actively working against having your flights promoted doesn't make much sense. Then actively pissing off a bunch of your customers who booked through those sites by canceling their flights is even more braindead, as you've just formed a huge group of customers who will complain about your airline and spread the word about how you canceled their legitimately purchased flight for no reason other than spite and a confusion over business models. When Ryanair started promoting how some of its seats might come with sexual gratification, I'd bet many passengers didn't realize it would end with them getting screwed.
by Tom Lee
Tue, Jul 29th 2008 1:50am
Filed Under:
aggregation, airlines, global distribution services, scraping, ticket prices
Companies:
american airlines, kayak
The Airlines' Ongoing Struggle With Price Aggregation Sites
from the airlines-vs.-aggregators? dept
It's proving pretty difficult to figure out exactly what happened between American Airlines and Kayak last week. Last Wednesday TechCrunch reported that American Airlines was pulling its listings from the airfare search engine. Comments left by Kayak's CEO Steve Hafner and VP Keith Melnick chalked the split up to Kayak's display of AA fares from Orbitz: American had demanded that Kayak suppress the Orbitz listings, and Kayak refused.
Presumably one of two things is making American want to avoid comparison to Orbitz prices: either, as TechCrunch speculates, users clicking the Orbitz option put AA on the hook for two referral fees -- one to Kayak and one to Orbitz; or AA has struck a deal with Orbitz that provides the latter's users with cheaper fares than can be found on aa.com.
Either way, the news doesn't appear to be as dire as it first sounded. It doesn't seem that AA flights will be disappearing from Kayak -- it's just the links to buy them at aa.com that will go missing. As Jaunted points out this might wind up costing flyers a few more dollars, but it shouldn't be a major inconvenience for Kayak customers.
The more interesting aspect of this episode is how it reveals the stresses at play in the relationship between the airlines and travel search engines like Kayak. It's no secret, of course, that the airlines are having a rough time as rising fuel prices put even more pressure on their perennially-failing business model. But while an airline attempting to control the distribution of its prices is nothing new, one can't help but wonder whether ever-narrowing margins might lead to a shakeup of this market.
Kayak, like most travel search sites, gets its data from one of a handful of Global Distribution Services: businesses that charge airlines a fee to aggregate price and reservation information. Some airlines, like Southwest, opt out of the GDS system in order to avoid those fees. Others, like American, participate in the system but try to send as much online business as possible to their own sites. Presumably each airline tries to find an equilibrium point at which the business brought in by participation in a GDS and the payments associated with it add up to the most profit.
But so long as the financial temptation to retreat from the GDSes persists, GDS data will be less than complete. And that creates an opportunity for another kind of fare-aggregation business -- one based upon scraping the data from the airlines' websites. It's been done before, after all, albeit on a limited scale. And since most people recognize that prices can't be copyrighted, there doesn't seem to be any legal barrier stopping such an aggregator from stepping in (nothing besides the need to write a lot of tedious screen-scraping software, that is). Of course, that won't stop airlines from suing, but the legal basis for their argument seems pretty weak.
Whether such a business is likely to emerge and succeed, I couldn't say. But it does seem certain that as fuel prices rise we'll be seeing more and more travel industry infighting -- and more and more hoops for online fare-shoppers to jump through.
by Tom Lee
Fri, Jan 4th 2008 3:20pm
Filed Under:
privacy, robert scoble, scraping, social graphs, social networks
Just Assume Any Info You Put Online Is Public
from the welcome-to-the-new-world dept
Having noted that a script acting on Scoble's behalf can only access information that Scoble himself can reach manually, Julian argues that this can't be considered the only criterion in evaluating the situation:
[P]rivacy is not just a function of the publicity of your personal information, but of the searchability and aggregability of that information. Public closed-circuit surveillance cameras, for instance, typically capture the same information that a casual observer on the street is already privy to. But we recognize that being spotted by diverse random pedestrians, or even being captured on diffuse and disconnected private security cameras, is not intrusive in the same way as being captured on a citywide surveillance system that is searchable from a centralized location.
All of this seems true: individuals' attitudes about privacy are rightly driven by a pragmatic appraisal of the likelihood of someone doing something bad with the available information — a judgment based on the information's value and the cost of obtaining it. Ripping up your credit card statement before throwing it in the trash doesn't make it impossible for a dumpster-diving thief to target you, but it increases the difficulty of ripping you off enough that you'll probably be safe.
But I think Julian makes a mistake when he assumes that this is a viable way to conduct your life online. The problem with applying this approach to a digital context is that a user's estimation of the accessibility of a given piece of online information is almost invariably going to be too low — and will be getting more so by the second. The costs of automatically collecting data are very small and getting smaller.
There are a few reasons for this. First, the tools are getting better. Libraries like WWW::Mechanize are simple for any programmer to use and available in a variety of languages. And GUI-based applications like Dapper and Piggy Bank aim to make things even simpler. Second, if done properly, it's very difficult to prevent, detect or punish automated data collection. Facebook's script detection technology is impressively existent relative to that of its competitors, but it's still almost certainly trivial to subvert it with proxies, faked user agents and plausibly human delays. Third, once the data is collected it can, of course, be easily distributed.
And the situation is only going to get worse! In fact, it's getting worse at such a rapid rate that counting on the privacy of any even slightly public online information is a mistake.
The negative reaction to Scoble's script is coming from users who think of it as a violation of the covenant they perceived to surround their data. But that covenant was based upon their own mistaken understanding of the internet. Scoble's actions shouldn't be viewed by these users as a transgression against them, but rather as a pleasantly benign lesson.
It's fine to lament the situation, or to applaud Facebook for taking steps to keep its valuable, freely-acquired user data away from competitors (and, while they're at it, script-employing users). But this assertion of community norms is unlikely to stop those who, unlike Scoble, are genuinely acting in bad faith. The technology for containing digital cats in digital bags is woefully inadequate, and it's unlikely to improve anytime soon.
Thu, Jan 3rd 2008 3:11pm
Filed Under:
privacy, robert scoble, scraping, social graphs, social networks
Companies:
facebook
Is There A Conflict Between Open Social Graphs And Your Privacy?
from the what-about-your-friends? dept
Intuitively, it makes sense for users to be able to make whatever use they please of information about their own social networks. But in a social network, "your" information is someone else's as well. And on a site like Facebook, much of that information will have been provided in the context of a set of individually calibrated privacy controls, by people who expected it to be used in that context by a limited audience. Exporting that information without permission, then, raises important privacy questions.
Within Facebook, users have a fair amount of control over who can access what information about them. I can choose to block particular users on Facebook, rendering myself wholly invisible to them, as though I weren't even on the network. I can decide how much of my profile information will be visible to friends, to people who live in my region, to the general Facebook membership, and to the Internet at large. I can even decide how aggressively public, so to speak, such information will be. Lots of Facebook users are happy to let friends view their relationship status, but disable those status notifications in their news feeds, to prevent everyone they know from being simultaneously blasted with the news that "Bob has gone from being in a relationship to being single." Automated data collection "liberates" information from those constraints, possibly against the wishes of the people who provided it.
It's true that a script can only sweep up information that would already have been visible to a particular user anyway. But privacy is not just a function of the publicity of your personal information, but of the searchability and aggregability of that information. Public closed-circuit surveillance cameras, for instance, typically capture the same information that a casual observer on the street is already privy to. But we recognize that being spotted by diverse random pedestrians, or even being captured on diffuse and disconnected private security cameras, is not intrusive in the same way as being captured on a citywide surveillance system that is searchable from a centralized location. By the same token, I may be unhappy with the possibility of someone forming an external public database full of data I've freely shared with more narrow communities—personal, regional, or whatever.
None of this is to deny the initial intuition that it's desirable for users' social graphs to be portable to some extent. But as with all forms of intimacy, openness and privacy complement each other: We feel free to share information about ourselves to the extent that we have some assurances about how that information will be used. So while it's one thing to argue that Facebook should enable greater openness or portability in some particular way, subject to user control, it seems like quite another to criticize them for enforcing a rule about indiscriminate automated data collection.
by Mike Masnick
Wed, Oct 10th 2007 9:41am
Filed Under:
copyright, news, scraping
Companies:
associated press, moreover, verisign
AP Sues VeriSign For Copyright Infringement; Mostly Pointless
from the what's-going-on-here? dept
Update: Rafat Ali from PaidContent stopped by in the comments to point to the full lawsuit documents, posted on his site. From there, it appears that the AP's lawsuit is mostly ridiculous, with just a little tiny bit of reasonable thrown in. Most of the claims are about the fact that Moreover is spidering and scraping AP news feeds, and providing both free and paid subscribers headlines and the opening lede. However, it's pretty difficult for the AP to make a copyright claim here, since those links are almost definitely fair use, especially since they point people to legitimate AP licensees. There's a little gray area where Moreover indexes and caches the articles on its own servers -- but Google has been doing that for years without much of a problem -- and if the AP is really upset about it, there's always the old robots.txt solution. The one area where the AP may have a claim (though, the evidence does not seem clear from the exhibits) is in saying that there are times when Moreover will show subscribers a full AP article hosted on its own servers, rather than passing them through to a licensee. If true, then that would likely be copyright infringement -- though the "damages" would be minimal, if anything. Finally, the claim that this is an AP trademark infringement by listing the AP as a news source seems laughable. All in all, the original assessment stands: this is the AP unable to adapt and lashing out at those who are helping to promote their content.
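For readers who haven't seen "the old robots.txt solution" in action, here is a rough sketch using Python's standard urllib.robotparser. The bot name ExampleAggregatorBot, the /articles/ path and the news.example.com host are invented for illustration and are not taken from the lawsuit documents. The idea is that a publisher lists what a given crawler may not fetch, and a well-behaved crawler checks those rules before requesting a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: one named crawler is
# kept out of /articles/, everyone else is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAggregatorBot
Disallow: /articles/

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("ExampleAggregatorBot", "SomeOtherBot"):
    for path in ("/articles/story.html", "/index.html"):
        url = "https://news.example.com" + path
        print(agent, path, "allowed" if allowed(agent, url) else "blocked")
```

Note that robots.txt is a request, not an enforcement mechanism: it only works against crawlers that choose to honor it.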





