by Mike Masnick
Wed, Apr 16th 2008 6:02pm
For years, people have talked about the "deep web" or "dark web" of information that's hidden from the public (and search engines), sometimes behind registration or paywalls, but more often behind specific forms. That is, there's a lot of information that's dynamically generated on the fly, based on how someone fills out a form. For a search engine, that's problematic: it never sees that information, so it can't tell people it's there (even if it's "public" info). However, it looks like Google is attacking this problem by setting up its spiders to actually enter information into public forms to try to dig a layer or two deeper. The search engine is trying to be quite careful about this, since people might well question whether a search engine should be entering "fake" data into a form to dig deeper into it. It appears that Google is only doing this on specific sites, and is honoring any robots.txt-type instructions that ward off its spider. As for the more interesting question of what Google is entering into forms, apparently it tries to guess reasonable info from the context of the site. Who knows how well this actually works? But it's an interesting experiment. Still, how long will it be until someone freaks out when they realize some info they thought was "private" or hidden from search engines is made public by this process?
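To make the idea a bit more concrete, here's a rough sketch of what "fill in a form, but respect robots.txt" could look like. This is not Google's actual method, which isn't public in any detail here. It assumes a simple GET form, and the function name, the `ExampleBot` user agent, and the guessed values are all made up for illustration.

```python
from urllib.parse import urlencode, urljoin
from urllib.robotparser import RobotFileParser


def candidate_form_urls(robots_txt, base_url, form_action, field, guesses,
                        agent="ExampleBot"):
    """Build the URLs a GET form would produce for each guessed value,
    keeping only those the site's robots.txt allows (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())  # parse rules from text, no network

    action_url = urljoin(base_url, form_action)
    allowed = []
    for value in guesses:
        # A GET form submission is just the action URL plus a query string.
        url = action_url + "?" + urlencode({field: value})
        if rules.can_fetch(agent, url):
            allowed.append(url)
    return allowed


# Assumed robots.txt that wards crawlers off one part of the site.
ROBOTS = "User-agent: *\nDisallow: /private/\n"

public = candidate_form_urls(ROBOTS, "https://example.com/", "/search",
                             "q", ["used cars", "boats"])
blocked = candidate_form_urls(ROBOTS, "https://example.com/", "/private/lookup",
                              "q", ["used cars"])
```

The sketch only builds the URLs and never requests them. A real crawler would still have to pick sensible guesses from the page's context and decide which forms are safe to touch at all, which is the hard part.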