So, all of this makes it easier to enter data on the web, which is a great thing. This morning I asked the question, “who enters the most data on the internet?” The answer is spammers. It is generally thought that 90% of all e-mail sent is spam, and a quick glance at my blog’s spam counter shows 7,300 fake comments caught compared to 56 real comments.
So, why will HTML 5 forms be such a problem? Well, at the moment, spammers use automated tools to crawl the internet, looking for forms to fill in to spread their advertising links or perform XSS attacks. To bypass most validation, the crawlers look for labelled form fields and fill them in accordingly. Quite simply, HTML 5 forms will make this job easier.
Instead of labelling a field with “e-mail”, there’s now a specific input type, <input type="email">, which validates an e-mail address. The common anti-spam method of adding a second e-mail field, hidden from normal users, will stop working: bots will simply ignore the decoy, as there is a clear (and CSS-visible) e-mail address field to target.
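For anyone who hasn’t seen the hidden-field trick, here is a minimal server-side sketch of how it is usually checked. The field names `email` and `email_confirm` are my own placeholders, not taken from any particular blog engine, and the form is assumed to arrive as a simple mapping of field names to values.

```python
from typing import Mapping

# Hypothetical field names, chosen only for illustration.
REAL_FIELD = "email"
DECOY_FIELD = "email_confirm"  # hidden from people with CSS, visible to naive bots


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Return True when the hidden decoy field has been filled in.

    A person never sees the decoy, so it should arrive empty or missing.
    """
    return bool(form.get(DECOY_FIELD, "").strip())
```

The check only catches bots that fill in every field that looks like an e-mail address. A bot that targets the one field explicitly typed as an e-mail input leaves the decoy empty and passes.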
Form validation may be useful for the normal user, but it’s even more useful for the spammer. With the limits on each input field now declared in plain text on the input itself, it becomes trivial for bots to enter data that passes validation.
So, what can be done about this? Well, I’m not sure. Some anti-spam methods will still work, for instance recording when the page was loaded and seeing how long it took to complete the form. Very short times are treated as spam, short times are sent for moderation and normal times are approved. There’s CAPTCHA, which is inaccessible, and then there’s blacklisting, which hasn’t worked for years.
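The timing approach can be sketched roughly as follows. The two thresholds are assumptions of mine (I haven’t given any numbers above, and the right values depend on the form), as are the function and label names.

```python
# Assumed thresholds in seconds; tune them for the length of the form.
SPAM_BELOW = 3.0       # faster than this is treated as spam
MODERATE_BELOW = 10.0  # faster than this, but not spam, goes to moderation


def classify_submission(rendered_at: float, submitted_at: float) -> str:
    """Classify a submission by how long the form took to complete.

    Both arguments are timestamps in seconds, e.g. from time.time().
    Returns "spam", "moderate" or "approve".
    """
    elapsed = submitted_at - rendered_at
    if elapsed < SPAM_BELOW:
        # Also covers a negative elapsed time, which suggests a forged timestamp.
        return "spam"
    if elapsed < MODERATE_BELOW:
        return "moderate"
    return "approve"
```

For this to mean anything, the render time has to be kept on the server or signed before it is sent to the browser. If it travels in an ordinary hidden field, a bot can simply submit an older timestamp.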
If you have any theories, please share them here. If there’s a solution, or something the working group can do to make spam more difficult rather than easier, it should get into the spec sooner rather than later.