Fighting spam with our Jahia Akismet integration module

As you may know, spammers are constantly finding new ways to pollute websites with links in an attempt to boost their search ranking. Even technical solutions such as CAPTCHAs are no longer sufficient, since they are designed to catch automated scripts, while spammers now resort to actual humans to post spam.

One of our engineers at Jahia designed a module for our CMS that integrates with the commercial Akismet anti-spam service. Although this module is not part of our packaged product, we have been using it successfully in production on our Jahia website for almost a year, and we thought it might interest others, so we present it in detail here.

The Akismet service provides a REST API to which form submissions are sent for analysis, and it responds immediately by indicating whether the message is genuine or spam. The nice thing about the Jahia module is that the integration was done using our built-in rule engine, which makes it very easy to use with any content inserted into the content repository, whatever user interface was used (the default web interface, a native mobile application, a REST API call, or anything else).
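
To make the exchange with Akismet more concrete, here is a minimal Java sketch of a comment-check call. It is not the code used by the module. It assumes Akismet's publicly documented 1.1 endpoint (https://YOUR_KEY.rest.akismet.com/1.1/comment-check), a form-encoded POST with parameters such as blog, user_ip, user_agent and comment_content, and a plain-text response body of "true" when the content is considered spam. The class and method names (AkismetSketch, formEncode, isSpam, checkComment) are made up for illustration, and java.net.http requires Java 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: names are hypothetical, not the module's API.
final class AkismetSketch {

    // Encodes the parameters as application/x-www-form-urlencoded,
    // keeping the iteration order of the map.
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Assumption: the service answers "true" for spam and "false" otherwise.
    // Anything else (for example an error message) is treated as "not spam".
    static boolean isSpam(String responseBody) {
        return "true".equals(responseBody.trim());
    }

    // Sends the parameters to the comment-check endpoint (needs network access).
    static boolean checkComment(String apiKey, Map<String, String> params) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://" + apiKey + ".rest.akismet.com/1.1/comment-check"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(params)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return isSpam(response.body());
    }
}
```

A caller would fill a map with the site URL (blog), the poster's IP address and user agent, and the post content, then call AkismetSketch.checkComment(apiKey, params). Treating unexpected responses as "not spam" is a design choice of this sketch, so that a service outage does not flag genuine posts.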

Here is the rule itself (from this file):

rule "A post is created - apply spam filtering"
    when
        A new node is created
            - it has the type jnt:post
    then
        Check content of the node for spam
end

As you can see, this rule triggers for all new content nodes of type “jnt:post” that are created in the system. In Jahia’s out-of-the-box modules this can be a forum post, an article or a blog post comment. If you write your own modules that also use the jnt:post type, this rule will automatically apply to them as well.

When the rule is executed, it submits the content of the node to the Akismet service and, if the content is marked as spam, simply adds a mixin type called “jmix:spamFilteringSpamDetected” to the new content node. This mixin type clearly identifies the content as spam, and it can be used to render a different interface, to run queries that retrieve all spam nodes and delete them, or to send notifications to system administrators that spam has been inserted on the site.
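
As an illustration of the query use case, the following Java sketch builds a JCR-SQL2 statement that selects every node carrying the mixin below a given path. This is an assumption about how one might write such a query, not code taken from the module; the class name SpamQuerySketch, the method flaggedPostsQuery and the example path /sites/mySite are hypothetical.

```java
// Illustrative sketch only: builds the query text, it does not execute it.
final class SpamQuerySketch {

    // Mixin type added by the rule when Akismet flags a post.
    static final String SPAM_MIXIN = "jmix:spamFilteringSpamDetected";

    // Returns a JCR-SQL2 statement selecting all flagged nodes under sitePath.
    static String flaggedPostsQuery(String sitePath) {
        // JCR-SQL2 string literals escape a single quote by doubling it.
        String escaped = sitePath.replace("'", "''");
        return "SELECT * FROM [" + SPAM_MIXIN + "] AS post"
                + " WHERE ISDESCENDANTNODE(post, '" + escaped + "')";
    }

    public static void main(String[] args) {
        // "/sites/mySite" is a placeholder path used for the example.
        System.out.println(flaggedPostsQuery("/sites/mySite"));
    }
}
```

With a JCR session at hand, the resulting string could be passed to the standard javax.jcr.query.QueryManager.createQuery(statement, Query.JCR_SQL2) call, and the returned nodes reviewed or removed.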

For example, on our jahia.com website, detected spam is not visible to unauthenticated users, but moderators still see the post, flagged as spam. It is then easy for them to notice it and delete it if it is indeed spam, or to use the “Not a spam” button to remove the mixin, after which the content reverts to a normal object.

So how can you install and use this module on your own Jahia system? First, here are a few requirements:

- Jahia 6.6.1.0 or later

- A subscription to the Akismet service and a valid API key

- Internet access from your website's server to at least the Akismet.com website

- Git, a Java SDK and Maven installed on your system for checking out the source code and compiling it

Installation steps:

  1. Check out the project from the GitHub repository: https://github.com/Jahia/jahia-spam-filtering
    (for example using: git clone https://github.com/Jahia/jahia-spam-filtering or one of the GitHub Windows or Mac clients)

  2. Compile using: mvn clean install

  3. Copy the target/spam-filtering-1.0-SNAPSHOT.war to your Jahia’s tomcat/webapps/ROOT/WEB-INF/var/shared_modules directory

  4. Start Jahia

  5. Open the file tomcat/webapps/ROOT/modules/spam-filtering/META-INF/spring/mod-spam-filtering.xml

  6. Replace the following part:
    <property name="apiKey">
     <bean class="org.jahia.utils.EncryptionUtils$EncryptedPasswordFactoryBean">
       <property name="password" value="kaP5+N/UcZ66cQK937+4EZPncOHYh8xa" />
     </bean>
    </property>
    with:
    <property name="apiKey" value="YOUR_AKISMET_API_KEY" />
    where YOUR_AKISMET_API_KEY is the key the Akismet service sent you when you subscribed to it.

  7. Restart Jahia

  8. Try to insert a forum post with the subject “spam test” and no message; it should then be marked as spam.

As you can see, detecting and handling spam in a Jahia CMS installation is relatively straightforward, and I’m sure you will be happy to have spent the time setting this up.

 

Author: Serge Huber

Serge Huber is the Chief Technology Officer (CTO) at Jahia, and Co-Founder of Jahia Solutions Group SA as well as the Jahia project before the creation of the group. With more than 15 years’ experience in developing web content management (WCM) and content management system (CMS) solutions in various technologies, his history includes building high-visibility, mission-critical applications for organizations such as the French government, the Swiss Federal Institute of Technology of Lausanne and Garmin. He now oversees the future development and evolution of Jahia’s software and manages the interaction with open source communities such as the Apache Foundation, where he is a committer for the Apache Jackrabbit Project. In his spare time, he enjoys experimenting with innovative technology - the kind that is mind-blowing and future-changing.