How Deep Should Deep Search Go?

from the faux-form-entries dept

For years, people have talked about the "deep web" or "dark web": information that's hidden from the public (and from search engines), sometimes behind registration or paywalls, but more often behind forms. That is, a lot of information is generated dynamically, based on how someone fills out a form. For a search engine, that's a problem, since it never sees that information and so can't tell people it's there (even if it's "public" info). However, it looks like Google is attacking this problem by setting up its spiders to actually enter information into public forms to try to dig a layer or two deeper. The search engine is trying to be quite careful about this, since people might obviously question whether a search engine should be entering "fake" data into a form just to dig deeper. It appears that Google is only doing this on specific sites -- and is honoring robots.txt and similar directives that ward off its spider. As for the more interesting question of what Google is entering into forms, apparently it tries to guess reasonable values from the context of the site. Who knows how well this actually works? But it's an interesting experiment. Still, how long will it be until someone freaks out upon realizing that some info they thought was "private" or hidden from search engines has been made public by this process?
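To make the idea concrete, here is a minimal sketch of what a form-filling spider that respects robots.txt might look like. It is not Google's actual method, which isn't public. The robots.txt contents, the `ExampleBot` user agent, the example.com URLs, and the `candidate_form_urls` helper are all made up for illustration; only Python's standard `urllib.robotparser` and `urllib.parse` modules are real.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption, not real data).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def candidate_form_urls(action_url, field, guesses, robots_txt, agent="ExampleBot"):
    """Build GET URLs for a one-field form, keeping only those robots.txt allows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    urls = []
    for value in guesses:
        # Fill the form field with a guessed value, as a query string.
        url = f"{action_url}?{urlencode({field: value})}"
        if rules.can_fetch(agent, url):
            urls.append(url)
    return urls


# Guessed values would come from the site's context; these are placeholders.
allowed = candidate_form_urls(
    "http://example.com/search", "q", ["cars", "boats"], ROBOTS_TXT
)
blocked = candidate_form_urls(
    "http://example.com/private/search", "q", ["cars"], ROBOTS_TXT
)
print(allowed)
print(blocked)
```

The sketch only covers forms submitted via GET, where a filled-in form is just a URL with a query string. A site that doesn't want this kind of crawling can list the form's path under `Disallow`, as `/private/` is here.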

Filed Under: dark web, databases, deep search, hidden information, search engines
Companies: google


Reader Comments



  1. joesmith323@hotmail.com, 16 Apr 2008 @ 7:06pm

    what does the site want?

    It strikes me that some Google search results already come back looking like a query result from the target web site.

    It seems to me, however, that Google should not be doing this. It would be better for the site to have pages of text specifically designed for the crawler to review, with a tag pointing to the appropriate "real" pages on the site.
