by Mike Masnick
Wed, Apr 16th 2008 6:02pm
For years, people have talked about the "deep web" or "dark web": information that's hidden from the public (and from search engines), sometimes behind registration or paywalls, but more often behind forms. That is, a lot of information is generated on the fly, based on how someone fills out a form. For a search engine, that's a problem, since it never gets to see any of that information and can't tell people that it's there (even if it's "public" info). However, it looks like Google is attacking this problem by setting up its spiders to actually enter information into public forms to try to dig a layer or two deeper. The search engine is trying to be quite careful about this, since it will obviously make some people question whether a search engine should be entering "fake" data into a form at all. It appears that Google is only doing this on specific sites, and is paying attention to all robots.txt-type instructions that ward off its spider. As for the more interesting question of what Google is entering into forms, apparently it tries to guess reasonable values from the context of the site. Who knows how well this actually works? But it's an interesting experiment. Still, how long will it be until someone freaks out upon realizing that some info they thought was "private" or hidden from search engines has been made public by this process?
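To make the idea a bit more concrete, here's a rough sketch of what "fill in a form, but respect robots.txt" could look like for a crawler. This is not Google's method, which isn't public in any detail here. The function name `candidate_form_urls`, the `ExampleBot` user agent, the example.com URLs, and the assumption that the form submits via a simple GET query string are all made up for illustration. The only real API used is Python's standard `urllib` library.

```python
from urllib.parse import urlencode, urljoin
from urllib.robotparser import RobotFileParser


def candidate_form_urls(robots_txt, base_url, action, field, guesses,
                        agent="ExampleBot"):
    """Build GET form-submission URLs, keeping only those robots.txt allows.

    All names here are hypothetical; a real crawler would also have to
    discover the form, pick plausible values, and rate-limit itself.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    urls = []
    for value in guesses:
        # A GET form submission is just the action URL plus a query string.
        url = urljoin(base_url, action) + "?" + urlencode({field: value})
        if rules.can_fetch(agent, url):
            urls.append(url)
    return urls


# Hypothetical robots.txt that wards spiders off one part of the site.
ROBOTS = """User-agent: *
Disallow: /private/
"""

allowed = candidate_form_urls(
    ROBOTS, "https://example.com/", "/search", "q", ["boston", "new york"])
blocked = candidate_form_urls(
    ROBOTS, "https://example.com/", "/private/lookup", "q", ["boston"])

print(allowed)
print(blocked)
```

The point of the sketch is the ordering: the crawler works out what it would submit, then checks the site's stated wishes before fetching anything. A site owner who doesn't want form results crawled can say so with a `Disallow` rule.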