How Deep Should Deep Search Go?

from the faux-form-entries dept

Wed, Apr 16th 2008 06:02pm - Mike Masnick

For years, people have talked about the “deep web” or “dark web” of information that’s hidden from the public (and search engines), sometimes behind registration or paywalls, but more often behind specific forms. That is, there’s a lot of information that’s dynamically generated on the fly, based on how someone fills out a form. For a search engine, that’s problematic, as it doesn’t get to see any of that information and inform people that it’s there (even if it’s “public” info). However, it looks like Google is attacking this problem by setting up its spiders to actually enter information into public forms to try to dig a layer or two deeper. The search engine is trying to be quite careful on this, as obviously it might make people question whether a search engine should be entering “fake” data into a form to dig deeper into it. It appears that Google is only doing this on specific sites — and is paying attention to all robots.txt type info that wards off its spider. As for the more interesting question of what Google is entering into forms, apparently it tries to guess reasonable info from the context of the site. Who knows how well this actually works? But it’s an interesting experiment. However, how long will it be until someone freaks out when they realize some info they thought was “private” or hidden from search engines is made public by this process?

Comments on “How Deep Should Deep Search Go?”

Anonymous Coward

April 16, 2008 at 6:38 pm

Already happened.

Like Google said in their statement, they’ll still respect the proper flags. If you or the person you hired as a web admin messed up and didn’t flag your site so it wouldn’t be indexed, it’s YOUR fault.

robots.txt exists for a reason. So you CAN have private data on the Internet, or at least be able to file litigation if someone ignores the file.

Everyone’s favorite analogy is if you leave the front door of your house open you can’t complain. This is wrong. But they have the right idea.

With the Internet not only is your front door wide open, but by putting it up on a Webpage you’ve made a giant banner or neon sign saying “FREE STUFF HERE” so people will stop by.

The tools have already been given to you to avoid this worry. If you chose not to use them, willingly or due to ignorance, the responsibility is on your hands.
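For readers who have not worked with it, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser. The rules, the "ExampleBot" user agent, and the example.com URLs are assumptions made up for illustration; nothing here describes how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption,
# not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name.
for url in (
    "https://example.com/about.html",
    "https://example.com/private/report.html",
    "https://example.com/search?q=widgets",
):
    print(url, may_fetch(parser, "ExampleBot", url))
```

With rules like these, a crawler that honors the file would skip both the /private/ pages and the form-driven /search results, which is the protection the comment above is pointing to.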

joesmith323@hotmail.com

April 16, 2008 at 7:06 pm

what does the site want?

It strikes me that some Google search results already come back looking like a query result from the target web site.

It seems to me, however, that Google should not be doing this. It would be better for the site to have pages of text specifically designed for the crawler to review, with a tag pointing to the appropriate “real” pages on the site.

Anonymous Coward

April 16, 2008 at 7:44 pm

Re: what does the site want?

“It strikes me that some Google search results already come back looking like a query result from the target web site.”

On some sites, if you do a “search” on that site, the site will place a direct link to those search results on a Google-findable page. For example, some sites have lists of “most popular searches” and Google can index those links.

I think that if a form has a finite number of choices, then Google should try searches. I also think Google should steer clear of POST requests.
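The idea of trying searches only when a form has a finite number of choices can be sketched roughly as follows. This is an illustration under stated assumptions, not Google's method: the function name enumerate_get_urls, the form fields, and the example.com action URL are all hypothetical. The sketch builds one URL per combination of dropdown values and refuses to do anything for a POST form.

```python
from itertools import product
from urllib.parse import urlencode


def enumerate_get_urls(action: str, method: str, choices: dict) -> list:
    """Build one URL per combination of finite form choices.

    `choices` maps a field name to its list of allowed values, such as
    the options of a <select>. Forms that are not GET are skipped.
    """
    if method.upper() != "GET":
        return []  # steer clear of POST: it may change state on the server

    names = sorted(choices)  # fixed field order gives stable URLs
    urls = []
    for combo in product(*(choices[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action}?{query}")
    return urls


# Hypothetical form with two dropdowns.
urls = enumerate_get_urls(
    "https://example.com/cars",
    "get",
    {"make": ["ford", "honda"], "year": ["2007", "2008"]},
)
for url in urls:
    print(url)
```

The number of URLs is the product of the option counts, so even a few dropdowns multiply quickly; a real crawler would presumably need some cap on how many combinations it tries.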

Anonymous Coward

April 16, 2008 at 9:53 pm

From the article: “However, how long will it be until someone freaks out when they realize some info they thought was ‘private’ or hidden from search engines is made public by this process?”

Not long. As the first response already said, it’s very easy to ask Google to keep private data private.

This is Techdirt. Generally, readers who understand what Google is doing will also understand how Google allows you to counter them. Was that question really as provocative as you had hoped? I wouldn’t want to close on a line like that.

Stefan Mai

April 17, 2008 at 8:43 am

The issue is fairly clear cut. Few forms on the internet use GET submission, because GET pushes the form data up into the URL, which leaves you fairly limited on length and input types. The only place you SHOULD be using GET is for searching or accessing deeper information, not for submitting data, which is what POST is for. The verbs are there for a reason, Google is respecting this, and it’s a null issue.
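To make the GET/POST distinction concrete, here is a small sketch of how a crawler could tell the two kinds of form apart using Python's standard html.parser. The class and function names and the sample HTML are assumptions for illustration only. A form with no method attribute is treated as GET, which is the HTML default.

```python
from html.parser import HTMLParser


class FormMethodCollector(HTMLParser):
    """Collect (action, method) pairs for every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attr_map = dict(attrs)
            # A missing method attribute means GET in HTML.
            method = (attr_map.get("method") or "get").lower()
            self.forms.append((attr_map.get("action") or "", method))


def crawlable_forms(html: str) -> list:
    """Return the actions of forms that submit with GET."""
    collector = FormMethodCollector()
    collector.feed(html)
    return [action for action, method in collector.forms if method == "get"]


# Hypothetical page with one search form and one submission form.
SAMPLE_HTML = """
<form action="/search"><input name="q"></form>
<form action="/signup" method="POST"><input name="email"></form>
"""

print(crawlable_forms(SAMPLE_HTML))
```

Only the search form survives the filter; the sign-up form, which submits data with POST, is left alone.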
