John Flinchbaugh Blog: Referrer Verification Implemented

I've written before about plans to verify referrers, and I've finally implemented it.

It works pretty much as I had described. If the referrer is already in the list, I just increment its count (and don't verify it again). This keeps me from pounding away on valid incoming referrers all day long. At this point, I do still pound on incoming spam referrers on every hit, since they never make it into the cache at all. I may end up implementing some sort of negative cache as well.
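To make the caching behavior concrete, here is a minimal sketch in Java. The class name `ReferrerCounter`, its methods, and the use of a `Predicate` as the verification step are all assumptions for illustration, not the actual code behind this site.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical helper: counts hits per referrer and verifies only new ones.
class ReferrerCounter {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final Predicate<String> verifier; // assumed verification step

    ReferrerCounter(Predicate<String> verifier) {
        this.verifier = verifier;
    }

    /** Records one hit; returns true if the hit was counted. */
    boolean record(String referrer) {
        // Known referrer: bump the count and skip verification entirely.
        if (counts.computeIfPresent(referrer, (k, v) -> v + 1) != null) {
            return true;
        }
        // New referrer: verify first. Failures are not stored anywhere,
        // so a spam referrer is re-verified on every hit (no negative cache).
        if (!verifier.test(referrer)) {
            return false;
        }
        counts.merge(referrer, 1, Integer::sum);
        return true;
    }

    int count(String referrer) {
        return counts.getOrDefault(referrer, 0);
    }
}
```

A negative cache would amount to a second map (or set) of rejected referrers that is consulted before the verifier is called.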

If the referrer is new:

1. Check it against the blacklist.
2. Fetch the referrer URL and search the content for a link to my site (which makes it a valid referrer).
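The two checks above could look roughly like the following sketch. The class `ReferrerVerifier`, the keyword-style blacklist, and the regular expression used to spot a link are assumptions; fetching the page is left to the caller so the matching logic can be tried without network access.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical verifier: blacklist check plus a search for a link back to the site.
class ReferrerVerifier {
    private final List<String> blacklist; // assumed to be plain keywords
    private final Pattern linkToSite;

    ReferrerVerifier(List<String> blacklist, String siteHost) {
        this.blacklist = blacklist;
        // Matches href="http://host or href='https://host (quotes optional).
        this.linkToSite = Pattern.compile(
                "href\\s*=\\s*[\"']?https?://" + Pattern.quote(siteHost),
                Pattern.CASE_INSENSITIVE);
    }

    /** True if the referrer URL contains any blacklisted keyword. */
    boolean isBlacklisted(String referrerUrl) {
        String url = referrerUrl.toLowerCase(Locale.ROOT);
        return blacklist.stream()
                .anyMatch(k -> url.contains(k.toLowerCase(Locale.ROOT)));
    }

    /** Returns the first line of the page that links to the site, if any. */
    Optional<String> findLinkLine(String html) {
        return Arrays.stream(html.split("\\R"))
                .filter(line -> linkToSite.matcher(line).find())
                .findFirst();
    }

    /** A referrer is valid if it is not blacklisted and its page links back. */
    boolean isValid(String referrerUrl, String html) {
        return !isBlacklisted(referrerUrl) && findLinkLine(html).isPresent();
    }
}
```

Returning the matching line rather than a bare boolean makes it easy to log exactly what matched.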

To keep the normal user from feeling the latency introduced by the extra check, I've moved the whole referrer process into a message-driven bean of its own.
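A message-driven bean needs a JMS provider and an application server, so it is hard to show in a stand-alone snippet. The sketch below swaps in a single-threaded `ExecutorService` to illustrate the same idea: the request thread only enqueues the referrer and returns, and the slow verification runs elsewhere. The class `ReferrerQueue` and its methods are hypothetical, and this is not the bean described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the message-driven bean: a background worker
// that processes referrers off the request thread.
class ReferrerQueue {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> processor; // e.g. verify and count

    ReferrerQueue(Consumer<String> processor) {
        this.processor = processor;
    }

    /** Called from the request path; returns immediately. */
    void submit(String referrer) {
        worker.execute(() -> processor.accept(referrer));
    }

    /** Stops accepting work and waits for queued referrers to finish. */
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }
}
```

Unlike a JMS queue, this in-memory version loses pending referrers if the process dies, which is probably acceptable for referrer statistics.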

Interestingly, I found that Google forbids the default "Java" User-agent: it returned 403 Forbidden when I tried to verify a link. I had to change the User-agent header on the URLConnection. Google blocks the stock wget User-agent as well.
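Overriding the header takes one call to `setRequestProperty` before the connection is used. In this sketch the class `ReferrerFetcher`, the User-agent string, and the timeout values are placeholders I made up for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper that opens a connection with a custom User-Agent.
class ReferrerFetcher {
    // Placeholder value; any string that is not the JDK default ("Java/<version>").
    static final String USER_AGENT = "ReferrerVerifier/1.0";

    static URLConnection open(String url) throws IOException {
        // openConnection() only creates the object; no network traffic yet.
        URLConnection conn = new URL(url).openConnection();
        // Must be set before connect() or getInputStream() is called.
        conn.setRequestProperty("User-Agent", USER_AGENT);
        // Timeouts (assumed values) keep a slow site from stalling the worker.
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        return conn;
    }
}
```

Reading the page is then a matter of calling `getInputStream()` on the returned connection.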

So that I can watch it work, the callback action logs the line of HTML that matched, or logs the whole page when nothing matched. It's quite satisfying to watch it actually working. Maybe I'll be able to trim my blacklist to only the most obvious keywords, now that I won't need to add to it nearly as frequently.