The following is the first post in a three-part series on bot detection and neutralization based on botnet analysis. The series begins by addressing commodity form/comment spam.
One of the unfortunate realities of running a site on the Internet is the amount of "background noise" -- the automated, unsophisticated, poorly targeted attacks that make up the bulk of malicious web traffic. For the sake of this series, we're calling this "botnet" traffic.
Identifying Bot Behavior - Form Spam
We've written before about active interrogation as a method for distinguishing bots from human users, but sometimes telling the two apart takes far less precision and no active measures at all.
Form spam looks very different from legitimate use of a form. A legitimate user will post once or twice, at a rate consistent with typing messages by hand and with, you know, having other things to do, sleeping, etc. A spammer or commodity bot, on the other hand, just doesn't look right: POSTs are sustained, even if at a low rate; the content doesn't match the site it's posted to; and the posts overwhelmingly include links.
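To make those signals concrete, here is a minimal sketch of a passive check over the POSTs seen from a single client. It is an illustration only, not the detection method used in this series: the `Post` record, the function name, and every threshold are assumptions picked for the example, and the "content doesn't match the site" signal is left out because it needs site-specific knowledge.

```python
import re
from dataclasses import dataclass

# Crude link detector; an assumption for this sketch, not a full URL parser.
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class Post:
    timestamp: float  # seconds since the epoch
    body: str         # submitted form text


def looks_like_form_spam(posts, min_posts=5, min_span_hours=6.0, min_link_ratio=0.8):
    """Return True when one client's POSTs match the spam pattern described above.

    All thresholds are hypothetical defaults and would need tuning per site.
    """
    # A legitimate user posts once or twice; too few posts means no verdict.
    if len(posts) < min_posts:
        return False

    # "Sustained, even at a low rate": activity stretches over many hours.
    times = sorted(p.timestamp for p in posts)
    span_hours = (times[-1] - times[0]) / 3600.0

    # "Overwhelmingly include links": share of posts containing a URL.
    with_links = sum(1 for p in posts if LINK_RE.search(p.body))
    link_ratio = with_links / len(posts)

    return span_hours >= min_span_hours and link_ratio >= min_link_ratio
```

A client that submits two hand-typed comments never reaches `min_posts`, while one that submits a link-bearing post every hour for half a day trips both conditions. Nothing here requires challenging the client; it only reads traffic the site already receives.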
Some form spam examples:
These posts are easily recognized as spam by a human reader, and many popular blogging platforms can detect them and send them to a spam folder if configured to do so. That detection, however, typically relies on IP reputation (error prone), analysis of each and every post (inefficient), or a phrase blacklist (inefficient AND error prone).
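As a small illustration of why a phrase blacklist is error prone, consider the naive matcher below. The phrase list and sample texts are made up for this example and do not come from any particular platform.

```python
# Hypothetical blacklist; real lists are far longer, which is where the
# per-post cost comes from.
BLACKLIST = ("casino", "viagra")


def blacklist_flags(text, phrases=BLACKLIST):
    """Return True if any blacklisted phrase appears in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


legit = "We stayed near the old casino building in Monte Carlo."
spam = "Buy v1agra now http://example.com/deal"

print(blacklist_flags(legit))  # True: a legitimate comment is flagged
print(blacklist_flags(spam))   # False: a trivially obfuscated spam post gets through
```

Every post has to be scanned against every phrase, and the result is still wrong in both directions.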