by Mike Masnick
Wed, Apr 16th 2008 6:02pm
For years, people have talked about the "deep web" or "dark web": information that's hidden from the public (and from search engines), sometimes behind registration or paywalls, but more often behind forms. That is, a lot of information is generated dynamically, based on how someone fills out a form. For a search engine, that's a problem, since its crawler never sees that information and so can't tell people it exists (even if it's "public" info).
However, it looks like Google is attacking this problem by setting up its spiders to actually enter information into public forms to try to dig a layer or two deeper. The company is trying to be quite careful about this, since people may well question whether a search engine should be entering "fake" data into forms. It appears that Google is only doing this on specific sites, and is honoring robots.txt and similar directives that ward off its spider. As for the more interesting question of what Google enters into the forms, apparently it tries to guess reasonable values from the context of the site.
Who knows how well this actually works? But it's an interesting experiment. The real question is how long it will be until someone freaks out upon realizing that some info they thought was "private" or hidden from search engines has been made public by this process.
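To make the idea concrete, here is a minimal sketch of what a polite form-probing crawler might do: build one GET URL per guessed value for a form field, then drop any URL that the site's robots.txt rules disallow. This is not Google's actual method, which isn't public in detail. The site, the `ExampleBot` user agent, the helper names `build_form_urls` and `allowed_urls`, and the guessed values are all hypothetical.

```python
from urllib.parse import urlencode, urljoin
from urllib.robotparser import RobotFileParser


def build_form_urls(page_url, form_action, field_name, candidate_values):
    """Build one GET URL per candidate value for a single-field form."""
    base = urljoin(page_url, form_action)
    return [f"{base}?{urlencode({field_name: value})}" for value in candidate_values]


def allowed_urls(robots_lines, user_agent, urls):
    """Keep only the URLs that the robots.txt rules permit for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt rules and guessed form values (assumptions).
robots = ["User-agent: *", "Disallow: /private/"]
public = build_form_urls("http://example.com/", "/search", "q", ["used cars", "boston"])
private = build_form_urls("http://example.com/", "/private/lookup", "q", ["boston"])

crawlable = allowed_urls(robots, "ExampleBot", public + private)
for url in crawlable:
    print(url)
```

Only the two `/search` URLs survive the filter, since the form under `/private/` is excluded by the `Disallow` rule. A real crawler would also have to decide which forms are safe to submit at all (for example, sticking to GET forms that don't change anything on the server), which this sketch doesn't attempt.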