Many web spiders, and especially ‘dodgy’ content bots, do not respect the robots.txt file. Below is some code which can be added to your .htaccess file to help block bots based on their User-Agent header. Banning via IP address, although useful, is a bit of a losing battle, as the originator can simply switch to another proxy.

## Bot Protection
RewriteEngine On
# Deny any request whose User-Agent matches one of the patterns below.
# [NC] makes the match case-insensitive; [OR] means any single match is enough.
RewriteCond %{HTTP_USER_AGENT} (Access|appid) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Capture|Client|Copy|crawl) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Data|devSoft|Domain|download) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Engine|fetch|filter|genieo) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Jakarta|Java|Library|link|wsr-agent|MJ12bot|SeznamBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|nutch|Preview|Proxy|Publish|Kraken|Baiduspider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (scraper|spider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Win32|WinHttp) [NC]
# [F] answers the matching request with 403 Forbidden
RewriteRule .* - [F]
## End Bot Protection
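
To check the rules are working, you can send a request with a spoofed User-Agent using curl; anything matching the list above should be refused with a 403, while a normal browser agent still gets through. (example.com below is just a placeholder for your own domain.)

# A blocked agent should get back "403 Forbidden"
curl -I -A "MJ12bot" https://example.com/

# A normal browser-style agent should still be served
curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" https://example.com/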