Spider Trap




Common techniques used are:
  • Creation of infinitely deep directory structures, such as
    http://foo.com/bar/foo/bar/foo/bar/foo/bar/.....

  • Dynamic pages, such as calendars, that produce an infinite number of pages for a web crawler to follow.

  • Pages filled with a very large number of characters, which crash the lexical analyzer parsing the page.
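
As an illustration of how a crawler might guard against the first and third techniques, here is a minimal Python sketch. The function names, the thresholds (max_depth, max_repeats, max_bytes) and their default values are assumptions chosen for this example; they do not come from any particular crawler. The heuristics can misfire on legitimate sites with deep or repetitive paths.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_path_trap(url, max_depth=12, max_repeats=3):
    """Flag URLs whose path is very deep or keeps repeating one segment.

    max_depth and max_repeats are illustrative thresholds, not standard values.
    """
    # Split the path into non-empty segments, e.g. /bar/foo/ -> ["bar", "foo"].
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    # A segment that recurs many times suggests a looping directory structure.
    counts = Counter(segments)
    return any(n > max_repeats for n in counts.values())


def exceeds_size_cap(body, max_bytes=5_000_000):
    """Return True if a downloaded page body is larger than the cap.

    A crawler could skip parsing such pages instead of feeding them
    to its lexical analyzer. The 5 MB default is an arbitrary example.
    """
    return len(body) > max_bytes
```

A crawler could call looks_like_path_trap() before queueing a link and exceeds_size_cap() before parsing a response, skipping the URL when either returns True.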


There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
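
Because no detector catches every trap, one common fallback is to cap how much work the crawler spends on any single host, so that an unrecognized trap wastes only a bounded number of requests. The sketch below is one hypothetical way to do this; the class name HostBudget and the default limit of 1000 pages are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


class HostBudget:
    """Per-host fetch counter that refuses URLs once a host's budget is spent."""

    def __init__(self, max_pages_per_host=1000):
        # Illustrative default; a real crawler would tune this per site.
        self.max_pages_per_host = max_pages_per_host
        self.fetched = Counter()

    def allow(self, url):
        """Return True and charge the host's budget, or False if it is exhausted."""
        host = urlsplit(url).netloc.lower()
        if self.fetched[host] >= self.max_pages_per_host:
            return False
        self.fetched[host] += 1
        return True
```

With this guard, a calendar that generates pages without end still stops consuming requests once its host reaches the limit, at the cost of possibly truncating large legitimate sites.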


EXTERNAL LINKS

  • http://danzcontrib2.free.fr/en/pieges.php

  • http://www.ikt-ret.dk/projects/werd.shtml

  • http://evolt.org/article/Using_Apache_to_stop_bad_robots/18/15126/

  • http://www.fleiner.com/bots/