Company Claims Its Software Can Magically Identify 'Rogue Sites'

from the that's-not-how-it-works dept

A company called RogueFinder is claiming that it has automated the process for finding rogue sites:

The basic idea is to draw links between seemingly unconnected “rogue” web sites, e.g. web sites selling counterfeit goods. According to the RogueFinder web site, its software takes minutes to do what it takes forensics teams months to achieve.

It uses data from registries, registrars, web hosts, servers, and ISPs as well as inspecting the sites’ “invisible source code”.

Sounds useful for playing parlor tricks. Not so sure for a system involved in blocking protected speech. As we’ve discussed time and time again, one of the issues in all of this is that determining what is and what is not infringing is not an easy task. At all. It takes a human being who can actually analyze the situation and how it falls under copyright law — including exploring specific exemptions. It’s time that we got rid of the myth that there’s any significant way to magically identify what’s infringing and what’s not.


Comments on “Company Claims Its Software Can Magically Identify 'Rogue Sites'”

That Anonymous Coward (profile) says:

Re: Re: Re: Re:

I thought I covered the magic smoke before… had to go find it again…

“The magic pixies who live inside the thinking box, when you double click it they are forced to once again pick up their instruments and reproduce the drivel to appease their human captors.

The magic smoke one sometimes sees leaving a computer case is actually the souls of pixies pushed too far and too hard to reproduce too many songs in a public performance.

Before you torrent that next album, won’t you stop and think of the pixies?”


Anonymous Coward says:

Link a system like this up with SOPA, automate it, and you could probably shut down a significant chunk of the web, certainly what’s accessible in the US. The false positives would just be acceptable collateral damage with, it seems, limited recourse. Of course, then the countermeasures to get around the system come online and we have a race to see who is better.

What a waste of time/money/effort.

Anonymous Coward says:

Re: Re: Re:

Sounds like a ‘make work’ program for lawyers….

We have software that can identify thousands of people you can sue automatically… Imagine not having to spend all that time and effort gathering ip addresses and fake names for your extortion schemes… er legal filings, with our automated software, you just point it to a piece of content, enter the number of suckers (aka litigants) you want to try and extort money from, and our system will use its “Magic Six Degrees of Kevin Bacon Methodology” to identify the appropriate number of individuals to include in your suit.

fine print: no warranty expressed or implied, all results made up on the spot based on random ip address associations, no guarantee of actual infringement or any proof is ever provided by this software, the results of this system are not valid for legal filings and should not be relied upon for initiating legal proceedings… (we know nobody reads the fine print… so if you use our software to identify people to sue, you are violating our licensing agreement on any suits filed, and you agree to pay us $1000 per name identified by our softwar and used in your suit)

Yes, THIS IS SOFTWAR…. get in the game or move on…

TechnoMage (profile) says:

I doubt it

If this were possible, don’t you think Google (or some other search engine) would be selling this service to the MPAA/RIAA/etc.?

Everyone likes to shout “free market” all the time, but forget to take it into account in most discussions that actually require it.

Plus first of all, all they are doing is using bots to crawl websites they ‘mark’ as rogue, and then to make a database of other sites that are linked to by sites that link to that original marked site(s).

This would be the 3rd homework problem in any class teaching how to make a search engine: 1) how to make a crawler-bot, 2) how to make a DB of sites, 3) (this) how to link sites together in groups.
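
As a rough illustration of that third step, here is a minimal Python sketch. It assumes a crawler has already produced a list of (linking site, linked site) pairs; the function name `co_linked_sites` and every `.example` domain are made up for illustration and are not anything RogueFinder has described.

```python
def co_linked_sites(links, marked):
    """Return sites that are linked to by any site that also links to a marked site."""
    marked = set(marked)
    # Step 1: find every site that links to at least one marked site.
    linkers = {source for source, target in links if target in marked}
    # Step 2: collect everything else those linkers point to.
    return {
        target
        for source, target in links
        if source in linkers and target not in marked
    }


# Hypothetical crawl output: (linking site, linked site) pairs.
links = [
    ("forum.example", "rogue.example"),
    ("forum.example", "other-shop.example"),
    ("forum.example", "encyclopedia.example"),
    ("blog.example", "news.example"),
]

result = co_linked_sites(links, marked=["rogue.example"])
print(sorted(result))
```

In this toy data, `encyclopedia.example` lands in the result purely because the same forum links to it, which is the sort of false positive a human would still have to weed out.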

Anonymous Coward says:

Spectral Evidence

Spectral Evidence
From Wikipedia, the free encyclopedia

Spectral evidence is a form of evidence based upon dreams and visions. It was admitted in court during the Salem witch trials by the appointed chief justice, William Stoughton. The booklet A Tryal of Witches taken from a contemporary report of the proceedings of the Bury St. Edmunds witch trial of 1662 became a model for, and was referenced in the Trials when the magistrates were looking for proof that such evidence could be used in a court of law.

Spectral evidence was testimony that the accused witch’s spirit (i.e. spectre) appeared to the witness in a dream or vision (for example, a black cat or wolf). The dream or vision was admitted as evidence. Thus, witnesses (who were often the accusers) would testify that “Goody Proctor bit, pinched, and almost choked me,” and it would be taken as evidence that the accused were responsible for the biting, pinching and choking even though they were elsewhere at the time.


(Citations omitted.)

Nom says:

It might be useful, but it will invariably have a lot of false positives and false negatives. Especially if you consider the logical malleability of law and how it relates to IP.

It might be useful as a tool to gather potential instances of infringement. However, these instances will still need people to verify if they are infringing or not.

Although, based upon many companies’ previous behavior when given tools to locate potentially infringing material, they are likely to take this software’s list and send out mass takedown notices without properly checking.

Can’t wait to see the false positive rate from this software.

Violated (profile) says:

Oh great like we need this.

DMCA law gave them cruise missiles. Over half target a business rival. One third were invalid attacks.

Now SOPA will give them nuclear weapons and you can watch part of the Internet get obliterated before your eyes.

Then what better than for lazy copyright owners to put the pending WWIII all on computer control?

Grae (profile) says:

Re: Re:

“invisible source code” is not a technical term based on actual technology; therefore this is not “technology being used to crack down on piracy”.

It’s more like a dash of poor understanding and a bucket of desperation sweat mixed with a cup of web crawling search engine bots to make a “magic” potion that cures the poor, poor ailment all the IP welfare leeches are suffering from known as “being forced to adapt your business model to fit reality.”

No one else in the world gets to sit around like a lazy piece of trash and perpetually make money off of work they did in the past. So make sure you keep the tear stains off your resume as you go out there and look for a real job.

Anonymous Coward says:

Re: Re:

Actually, if I were to guess, it means that the software will find links on the pages, and then follow those links and/or access those ‘other’ pages.

For example, on our ‘job application’ page, we have been getting several ‘error’ emails (every time someone doesn’t fill out the page correctly, and attempts to submit the job application in a ‘bad’ aka ‘sql injection’ type format, we are sent an email informing us of the submitting computer/user info…yes, we coded this ourselves…not a software package).

If you physically look at our page, all you see is a submit button. In the “source code” section, you can see where some things are processed and then it jumps to a different page. That 2nd page does some additional checking, and then inserts the data into the database. The user never sees anything except ‘processing’.

We have had bots recently that have been skipping the first page, and going directly to the 2nd page and attempting to inject code there. However, we have already built in for that possibility, so the 2nd page errors out and shoots us an email.

If you were to attempt to explain ‘invisible source code’ to a non techie, then technically, to them, the 2nd page is ‘invisible’. Nothing on their screen gives them the impression that they are on a different page.

abc gum says:

In our inexhaustible race to the bottom, would such snake oil sites fall under the guise of rogue?

1 vagrant, tramp
2 a dishonest or worthless person : scoundrel
3 a mischievous person : scamp
4 a horse inclined to shirk or misbehave
5 an individual exhibiting a chance and usually inferior biological variation

Anonymous Coward says:

Re: Re:

As I was reading the article I pictured a crowd of people gathered round a wagon with a man standing on it saying “You sir, yes you. Are you plagued with rogue sites in the middle of the night? Want something to get rid of them? Well look no further, I’ve got the solution to all your ills.”

Anonymous Coward says:

The funny part is to some extent, this isn’t really a hard thing to do.

Many of the “rogue sites” do things that are common between sites. From linking images from “file hosts” to intentional misspellings of words, there are plenty of things you can use to filter down and give the old mark 1 eyeball something to look at.

Many of these sites use similar source code and layouts; those that move from domain to domain and host to host often upload the same site over and over again, with minor variations. Over time, you can build up a library of these pages and be able to spot similar sites. Duplicate content is one of the ways these sites often stand out.
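
One common way to spot near-duplicate pages is to compare overlapping word sequences ("shingles") and score the overlap with Jaccard similarity. The sketch below is only my guess at how such a check could look, not a description of any real product; the function names and the sample page texts are invented.

```python
def shingles(text, size=3):
    """Split text into the set of overlapping word sequences of the given size."""
    words = text.split()
    if len(words) < size:
        # Too short to shingle: treat the whole text as one unit.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(page_a, page_b, size=3):
    """Score two page texts between 0.0 (nothing shared) and 1.0 (identical shingles)."""
    return jaccard(shingles(page_a, size), shingles(page_b, size))


# Invented sample texts standing in for page content already in the library.
known = "cheap brand shoes free shipping order now secure checkout"
clone = "cheap brand shoes free shipping order today secure checkout"
unrelated = "local weather forecast for the weekend with rain expected"

print(similarity(known, clone))
print(similarity(known, unrelated))
```

A re-uploaded copy with one word changed still scores well above an unrelated page, so a threshold on this score could hand the "mark 1 eyeball" a short list to check. Where to put that threshold is a judgment call that would need real data.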

You can also look at the products they offer, the hosts they use, the payment processors, and all of that stuff to look for commonalities. If you can filter 100,000 sites offering “nike shoes” down to a list of 200-300 that are likely rogue, then review them by hand, you would probably have a pretty high success rate.
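
A filter along those lines could be as simple as counting how many attributes a candidate shares with sites already judged rogue. This is a hedged sketch under my own assumptions: the field names (`host`, `payment_processor`, `registrar`), the `min_shared` threshold, and all the sample values are hypothetical.

```python
def shortlist(candidates, known_bad, min_shared=2):
    """Return (domain, shared_fields) for candidates matching enough known-bad attributes."""
    flagged = []
    for site in candidates:
        # Fields where this site's value also appears among previously confirmed rogue sites.
        shared = sorted(
            field
            for field, bad_values in known_bad.items()
            if site.get(field) in bad_values
        )
        if len(shared) >= min_shared:
            flagged.append((site["domain"], shared))
    return flagged


# Hypothetical attribute values seen on sites a human already reviewed.
known_bad = {
    "host": {"host-1.example"},
    "payment_processor": {"pay-9.example"},
    "registrar": {"reg-3.example"},
}

candidates = [
    {"domain": "a.example", "host": "host-1.example",
     "payment_processor": "pay-9.example", "registrar": "reg-0.example"},
    {"domain": "b.example", "host": "host-1.example",
     "payment_processor": "pay-2.example", "registrar": "reg-0.example"},
]

flagged = shortlist(candidates, known_bad)
print(flagged)
```

Here `b.example` shares only a web host with a known site and is left off the list, while `a.example` goes to hand review. Nothing in the output is proof of anything; it only decides what a person looks at first.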

You could also use honeypots to catch their spam. Opening a wordpress site and allowing open comments is a great way to find out who is scamming what. Similar results can happen using various forum software and other types of sites that permit user comments or postings.

100% success rate? No way. Reasonably successful? I suspect that it can be done.

Anonymous Coward says:

Re: Re:

Again, it’s about what it’s gonna take to get there.

Some of us ‘old time network admins’ that just now got out from under the ‘omg how do we filter spam’ umbrella know that it took YEARS before filtering out spam without filtering legit emails became manageable.

The hours we spent with configuring software solution after software solution….the months we spent reading log files, the years we spent making phone calls to the ISP, to the sending server IT dept, the finger pointing about whose fault it is that a legit email didn’t make it.

(don’t give me that ‘3rd party crap’, those are the hardest to track down why an email didn’t make it to its destination…but I digress.)

Now….now….because some dying “entertainment” industry can’t save their own ass and want to go crying to Gov for a handout… I get to find out why our purchasing agent can’t find the rivets he needs to build this sidewall to the plane, because of more filtering crap.

Haven’t we learned by now?

SO I wanna know……is the **AA’s or the government going to reimburse businesses for IT time spent tracking down problems with legit business activities…such as purchasing steel, or shipping products because of the 92 different filtering softwares that are going to flood the market with horrible code, and a lack of understanding of business rules outside of their own world?

Griff says:

It might actually work

I sat on a plane soon after 9/11 with someone whose obscure research was suddenly now funded. It in effect looked at meta data of phone calls and could identify different types of groups (family, sales force, people planning a bachelor party, terrorist cell) by the different network behaviour.

So, based on various factors (a site 2 days old with gigs of content, for example?), registrant’s associations with previous “rogues”, traffic patterns (if they could get access to this or deduce it from response times), I could see how sites could be characterised into types (news, eCommerce etc) pretty rapidly.

And this doesn’t need to be perfect – it just needs to make the xxAA’s job slightly less of a needle in a haystack. Simply finding a site which has music for download on it has already narrowed the field a bit. (It’s not like the results are ever going to be used as actual proof of anything). And the backlinks (the company they keep) will give clues too.

If they can cheaply trawl a million newly registered domains and give a vague probability that a site might be a non legit download site, that changes the odds and the timelag in the game of whack a mole.

I suspect that this company will sell the “software” as a service, charge a fortune, but their “server” will actually be some google-literate students told to locate content in return for pocket money, focusing particularly on content pertinent to the paying customers they have signed up. In the distorted world of said content holders, this will appear to offer great value, and a follow on service of filing a takedown will be sold by the lawyers for each site located. Content holders will think this is hugely helpful and will be reminded of all the lost sales that it is preventing.

And not a single extra CD will be sold as a result.

One interesting question arises.

If said “software” finds a site (or thousands of sites) and verifies that they are indeed offering infringing content (how, by downloading?), is the holder of the software in violation of any laws? Every time this software downloads, a sale is lost!

TtfnJohn (profile) says:

Beyond the thought that “invisible source code” might be something as innocent as shrouded PHP scripting using the Zend engine I have no idea what that is. And all that does is make the source inaccessible to prevent copying unless it’s encrypted which means the browser has to receive an encryption key so that it will actually run the code on the client side if that’s what’s intended.

The other possibility is that it’s code that runs on the server and the client side never sees it during the code’s execution. From what they describe it could be either or nothing at all. If that’s what’s happening they may be going to use the application to break into servers, something itself that’s illegal but I guess this band of lawyers gets to excuse this because they’re on the side of the “angels”. At least in their minds.

Now data mining CAN be useful. Not “will be useful” as their site (a multi page advertisement in reality) claims, as there are no guarantees. First you have to know what you’re looking for. They claim they do, though the sites they describe are usually those associated with harvesting credit card numbers, passwords, identity theft and that sort of thing in the sense that they set up look alike sites of a bank and ask questions of the user no bank ever would. They may also have to do with the gray/black market for prescription drugs. They claim that by their software’s analysis of the data mined they can create a collection of, frankly, unbelievable connections between owners, hosts, ISPs and other data to bring the offender to court.

The thing is this, found on the About Us page.
“ROGUEFINDER Investigative Software is currently in active development by a team at RogueFinder LLC, located in New York City.
The impressive team includes experienced intellectual property attorneys, private investigators, software analysts and technical consultants. Each team member is involved in critically important elements of the software, including:..”

Whoops. The software isn’t finished yet. But, hey, we’re working on it.
Notably missing from the list are statistical analysts, which one needs to do effective data mining, as all data mining does is result in a stack of statistics which get tossed out of whack the moment some unexpected data comes along if you’re relying on a collection of preset

They also claim the software is patent pending, along with the usual copyright and trade mark claims. While I won’t, completely, dispute the last two, the first seems unlikely as they would be relying on an aircraft carrier stuffed full of prior art to do what they claim to be able to do. (With unfinished software even!). As for copyright, there may be questions there too as some things cannot be subject to copyright. Things like facts, mathematical equations (aka algorithms) and many others that appear in software. The specific expression in that software is protected by copyright, before someone tries to jump on me for that.

More than anything the site looks like an almost well written ad for vapourware stuffed with an over abundance of stock photos. If they’re looking to tag those who send out spam with the Nigerian scam in them, fake bank notices about expired passwords and what have you, it’s gonna fail, if for no other reason than that those sites are run by organized crime, often Russian, who have far more resources available to them to counter this vapourware than this law firm has. And I can hear them laughing from here some 4000 miles away to the east as the crow flies. I can hear them tapping out software right now to counter what this software claims to do.

As for file sharing sites, the ones copyright purists want to target as SOPA and PIPA claim to do, virtually all of those are small operations with few, if any ads, collecting some support through donations and stuff. Not the kind of sites that are likely to be raking in money.

File lockers are both ad and subscription supported but their legitimate uses far outweigh any illegitimate uses. They do respond to takedown notices so they follow the letter and spirit of the DMCA as it is.

As I said, all they’ve done is warn the very people best equipped to counter them. And counter them they will while the law firm collects a hoped for ton of fees on games of whack a mole. I have yet to figure out how a New York based law firm can bring suit in Russia, Canada, France, the UK and so on when they’re not members of the Bar in any of those countries. Unless, once again, the idea is to collect a liability award in the United States and whack the site owners if they’re foolish enough to visit the U.S. at some point in the future under the name they used to register their site(s). Good luck there.

I’m not for a moment minimizing the threat of fake prescription drugs, the possibility of identity theft or other serious issues where organized crime would see a profit. Hell, I’ll even concede that perhaps another fake Dior handbag might hurt someone, somewhere though we already know and have known for years that the majority of those come from Hong Kong.

File sharing by individuals it won’t stop.

Still, if I was tasked with reviewing this software with an eye to using it I’d want to see real world data, test results, a complete and detailed description of the methodology and the complete source code. Until I got all of that not a penny would go their way.

Something about this stinks. Badly.

TtfnJohn (profile) says:

Re: Re:

Oh yeah, and the database needed to store all the data gathered to data mine the Internet would be enormous. Given that HTTP protocol is connect, exchange, drop then do it all again at the next click the number of connections and drops on the Web is massive. Beyond massive.

Even if they do write software to isolate suspicious transactions, at the end of the day it will still take human eyeballs to verify it all.

Of course, all it takes is to bust one 14 year old girl and one granny sharing their own photos that are mistakenly identified as bearing an actionable copyright. Not that we haven’t been down that road before. Of course it won’t happen. Not in a million years! Ok, a million microseconds then.

DB (profile) says:

What Protected Speech?

Tell me again: if someone is enabling and encouraging the downloading of unauthorized copies of a work, which is illegal under the law of the location where the download is saved, how is that “protected speech”? Free speech is “free” as in “freedom”: you can convey your own ideas about society and government, and critique and petition the government. “Free” speech is not “free” as in “I don’t want to pay”. Not to say there aren’t potential problems in SOPA/PIPA or creating a new ITC action in OPEN, but I still don’t see how it’s censorship or a restraint on free speech if it were used to stop illegal downloading from foreign sites.

Larry P. says:

Rogue Sites

The thing is that you wouldn’t need this software to be anywhere near 100 percent accurate. Investigators working for civil litigants and the feds now aren’t anywhere near 100 percent accurate, and their conclusions are routinely accepted when Courts seize sites. As for how it works, I would think it would be pretty easy to link sites based on a lot of variables that you can see without even getting into “invisible source code,” whatever that means lol
