by Mike Masnick
Wed, Apr 16th 2008 6:02pm
For years, people have talked about the "deep web" or "dark web": information that's hidden from the public (and from search engines), sometimes behind registration or paywalls, but more often behind specific forms. That is, there's a lot of information that's generated dynamically, on the fly, based on how someone fills out a form. That's a problem for a search engine, which never gets to see any of that information and so can't tell people it's there (even when it's "public" info).

However, it looks like Google is attacking this problem by setting up its spiders to actually enter information into public forms to try to dig a layer or two deeper. The search engine is being quite careful about this, as obviously some people might question whether a search engine should be entering "fake" data into a form to dig deeper into a site. It appears that Google is only doing this on specific sites -- and is respecting any robots.txt-type info that wards off its spider.

As for the more interesting question of what Google is actually entering into forms, apparently it tries to guess reasonable info from the context of the site. Who knows how well this actually works? But it's an interesting experiment. Then again, how long will it be until someone freaks out upon realizing that some info they thought was "private" or hidden from search engines has been made public by this process?
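To make the idea concrete, here's a minimal sketch of what form-aware crawling could look like. This is not Google's actual method (which isn't public); the `FormFinder` class and `candidate_urls` function are hypothetical names, and the "guessed" values are just a placeholder for whatever context-derived heuristics a real crawler might use. A real crawler would also check robots.txt first (e.g. with Python's `urllib.robotparser`) before fetching anything.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

class FormFinder(HTMLParser):
    """Collect forms and their text-input field names from an HTML page."""
    def __init__(self):
        super().__init__()
        self.forms = []       # list of (action, method, field_names)
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = (a.get("action", ""), a.get("method", "get").lower(), [])
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # Only plain text/search inputs; skip hidden, password, etc.
            if a.get("type", "text") in ("text", "search"):
                self._current[2].append(a.get("name"))

def candidate_urls(base_url, html, guesses):
    """Build crawlable URLs by filling GET forms with guessed values.

    `guesses` maps field names to values a crawler might plausibly try
    (in practice these would be inferred from the site's content).
    """
    parser = FormFinder()
    parser.feed(html)
    urls = []
    for action, method, fields in parser.forms:
        if method != "get":
            continue  # never submit POST forms: they may have side effects
        params = {name: guesses.get(name, "") for name in fields if name}
        urls.append(urljoin(base_url, action) + "?" + urlencode(params))
    return urls

page = """<form action="/search" method="get">
            <input type="text" name="q">
          </form>"""
print(candidate_urls("http://example.com/", page, {"q": "widgets"}))
# -> ['http://example.com/search?q=widgets']
```

Note the two safety choices baked in: only GET forms are submitted (POST requests can change server state), which mirrors why a crawler doing this has to be conservative about which forms it touches.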