URL Rewriting

Out of the box Jahia adds support for the improved SEO and user experience through various way of URL rewriting. This includes the following features:

  • URL mapping - allows providing mappings (user-friendly URLs, vanity URLs etc.) for frequently used or so called common resources, e.g. the last news page, company's profile, contact page, special offers etc. It also eases a web site or content migration activities, e.g. in case an old resource is replaced with the new one, but the URL locations should remain the same for preventing "broken links" for bookmarks and search engines.
  • SEO URL rewriting - implies URL "shortening" by removing "technical" information (deactivated by default).
  • Controlling browser caching - through the URL rewrite engine rules Jahia also "controls" the client side (Web browser and proxy server) caching of the pages and static assets.
  • Forcing secure browsing for logged in users -

This section provides an overview of aforementioned features and possible use cases.

As an introduction a short description of the URL format definition in Jahia and rewriting rules scope is given in the next sections.

When the URL rewriting rules are referred to, the UrlRewriteFilter-based rule definitions are meant here.

The lookup for rules is done in the following locations (using Ant-style patterns):

  • WEB-INF/etc/config/urlrewrite.xml - core Jahia rules
  • WEB-INF/etc/config/urlrewrite-*.xml
  • modules/**/META-INF/urlrewrite*.xml

And for SEO-related rules in:

  • WEB-INF/etc/config/seo-urlrewrite*.xml
  • modules/**/META-INF/seo-urlrewrite*.xml

This means that the URl rewriting rules can be also defined in custom modules.

URL format

The following is a "full" URL of a resource in Jahia, omitting the protocol, port and context path:

/cms/<rendering-mode>/<workspace>/<language>/<path>[.<template>].<template-type>

Where possible values for <rendering-mode> are:

  • render - preview mode
  • contribute - simplified editing mode
  • edit - full editing mode
  • studio - content studio mode

and <workspace> can be one of:

  • default - staged content
  • live - published content or content, created in the live mode directly

Some examples::

  • /cms/edit/default/en/sites/ACME/home/about-us.html
  • /cms/render/live/en/users/root.user-content.html - user-content is a rendering template here.

Rewriting scope

It is important to understand the scope the different URL mappings or URL rewriting is applied to.

The one exception to all rules is the case where the server is accessed using a localhost domain names. This mode is targeted for development or quick prototyping phases and in such cases neither URL mappings nor SEO URL rewrite rules are applied to the URLs and all links remain unchanged.

The following table summarizes the applicable conditions for URL rules, in case a domain name other than localhost or an IP address is used:

Rules/Mode Studio Edit Contribute Render/Preview Live
URL mappings
(vanity URLs)
no no no yes yes
SEO URL rules no no no no yes

Thus, in the Studio, Edit and Contribute modes the URL mappings and URL rewriting are not applied to make navigation and editing more consistent and predictable.

URL mapping

URL mappings are designed to provide language-dependent (if the site has multiple languages) custom URL (URL part) for a content resource.

Consider the following case, a HR department publishes new "Internship" page and would like to promote its URL in a more user-friendly way.

The internal page URL is e.g.:

http://127.0.0.1:8080/cms/render/live/en/sites/ACME/home/about-us/jobs/internship.html  

A HR manager adds the following URL mapping for the newly created internship page in English to be /internship and for the German (company site is multilingual with English and German versions) the /praktikum:

So the URLs for English and German versions of that page get "shortened" to:

http://127.0.0.1:8080/cms/render/live/internaship

http://127.0.0.1:8080/cms/render/live/praktikum

i.e. the language and path of the page are replaced with the custom URL - /internaship, that was defined.

The rest of the URL - the "technical" part - can be automatically removed by activating SEO URL rewriting as described further in this chapter.

Another use case could be in case of a content migration from a previous version of the site platform, say using an ASP.NET CMS system, and keeping important links still working and not leading to 404 - "Page not found" errors.

The old page was accessible under e.g. "about.aspx" name. Creating for a new About Us page the following mappings:

Will still allow the old URL to /about.aspx to work, but a client-side redirect will be sent to the browser to /about, as this mapping is declared as the default one.

This follows the "single URL for resource principle" for the improved SEO. Nevertheless this redirection can be deactivated in WEB-INF/etc/config/jahia.properties by setting the following flag to false:

# This parameter will control, that if vanity URLs exists for a node and if
# it has been accessed with a non-default vanity URL, we inform the client that
# the resource has permanently moved (HTTP status code 301)
permanentMoveForVanityURL                              = false

SEO URL rewriting

Jahia provides ready-to-use set of URl rewriting rules, located in the WEB-INF/etc/config/seo-urlrewrite.xml file, which are targeted to remove the "technical" part from the URLs to Jahia resources when rendering in Live mode.

In order to enable SEO URL rewriting the following flag should be set to "true" in the WEB-INF/etc/config/jahia.properties:

# Enable this option to make URL rewrite engine "shorten" the URL to
# content in live mode, .e.g.:
#     http://my.acme.org/cms/render/live/en/site/myAcme/home.html
# can become:
#     http://my.acme.org/cms/home.html
urlRewriteSeoRulesEnabled                              = true

Note that a flush of Jahia output caches is required after changing the value of this flag.

Activating this feature allows in most of the cases to "reduce", for example, the following URL:

http://my.acme.org/cms/render/live/en/site/global/home.html

to:

http://my.acme.org/cms/home.html

assuming that the global site has the my.acme.org defined as a domain name in site properties and English is the default language of the site.

For non-default site-language, the language remains in the URL after rewriting. E.g. for a German home page of the same site the URL will be:

http://my.acme.org/cms/de/home.html

There are multiple improvement change requests submitted for this feature (one of them for removing /cms, which can be achieved in most of the cases), based on our internal usability tests and early Jahia 6.5 BETA program feedback, which are scheduled to be addressed in the near future.

Controlling browser caching

The client side cache for pages and static assets (CSS, JavaScript etc.) is controlled using URL rewrite engine rules, defined in the WEB-INF/etc/config/urlrewrite.xml file:

    <!-- Client-side caching -->
    <rule>
        <name>Set not cache headers</name>
        <note>Resources that should not be cached</note>
        <from>^/(welcome.*|start|validateTicket|administration.*|cms/.*|engines/.*\.jsp(\?.*)?|tools/.*|gwt/org\.(.*)/.*\.nocache\..*)$</from>
        <set type="response-header" name="Expires">Wed, 09 May 1979 05:30:00 GMT</set> 
        <set type="response-header" name="Cache-Control">no-cache, no-store, must-revalidate, proxy-revalidate, max-age=0</set> 
        <set type="response-header" name="Pragma">no-cache</set>
        <set type="request" name="jahiaCacheControlSet">done</set>         
    </rule>
    <rule>
        <name>JCR files</name>
        <note>Do not set any cache expiration for JCR files</note>
        <from>^/(files|repository)/.*$</from>
        <set type="response-header" name="Cache-Control">public, must-revalidate, max-age=1</set> 
        <set type="request" name="jahiaCacheControlSet">done</set>         
    </rule>
    <rule>
        <name>Set cache expires headers</name>
        <note>Cache all other resources by default</note>
        <condition type="header" name="Expires" operator="equal">^$</condition>
        <condition type="attribute" name="jahiaCacheControlSet" operator="equal">^$</condition>
        <set type="expires">1 month</set>
        <set type="response-header" name="Cache-Control">public, max-age=2678400</set> 
    </rule>
    <!-- end of client-side caching -->

Those rules can be adjusted to achieve fine-grained tuning of the caching strategy.

Forcing secure browsing for logged in users

There is an option for "forcing" a switch to an SSL (HTTPS protocol) for a user session, from login to logout. This allows sites with higher security concerns to force secured connections for logged in users.

For details see Enabling SSL for logged in users.