Sanitizing file names

Gentics Content.Node offers a central translation table that allows you to configure how file names are sanitized.

Chapters

  1. Overview

1 Overview

Gentics Content.Node uses the $SANITIZE_CHARACTER array from the node.conf file to transform special characters in file names and folder paths. The Aloha Editor headerids plugin uses the same configuration to generate header IDs from text content.

The purpose of this feature is to replace special characters, in a meaningful way, with characters that are allowed in a file name.

For example: the character “è” is used in many languages, so it makes sense to replace it with “e” in the filename, because it would be lost otherwise. The default settings for sanitizing characters are:


		'é' => 'e',
		'è' => 'e',
		'ë' => 'e',
		'ê' => 'e',
		'à' => 'a',
		'ä' => 'ae',
		'â' => 'a',
		'Ä' => 'Ae',
		'ù' => 'u',
		'ü' => 'ue',
		'û' => 'u',
		'Ü' => 'Ue',
		'ö' => 'oe',
		'ô' => 'o',
		'Ö' => 'Oe',
		'ï' => 'i',
		'î' => 'i',
		'ß' => 'ss',
		' ' => '_'

This will transform strings as follows:


	"äöï 23.jpg" => "aeoei_23.jpg"
	"ia 23$%.html" => "ia_23__.html"
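The two transformations above can be reproduced with a small Python sketch. It is an illustration only, not the Gentics implementation: the function name `sanitize` is hypothetical, the table is a subset of the defaults listed above, and the set of characters that fall back to “_” is an assumption chosen only so that both examples come out as documented.

```python
import re
import unicodedata

# Subset of the default table shown above (illustration only).
SANITIZE_CHARACTER = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ï": "i", "è": "e", "ß": "ss",
    " ": "_",
}

# Assumption: any character outside this set is replaced by "_".
# The product's real fallback rule may differ.
DISALLOWED = re.compile(r"[^A-Za-z0-9_.\-]")


def sanitize(name, table=SANITIZE_CHARACTER):
    """Apply the translation table, then replace leftover characters with '_'."""
    # Use composed characters so that "ä" is one code point, as in the table.
    name = unicodedata.normalize("NFC", name)
    mapped = "".join(table.get(ch, ch) for ch in name)
    return DISALLOWED.sub("_", mapped)


print(sanitize("äöï 23.jpg"))    # aeoei_23.jpg
print(sanitize("ia 23$%.html"))  # ia_23__.html
```

The table is applied first, so a character such as “ä” gets its meaningful replacement before the generic fallback has a chance to turn it into “_”.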

You can redefine the predefined replacements or add new ones in the “node.conf” file like this:

/Node/etc/node.conf

$SANITIZE_CHARACTER["ï"] = "i";
$SANITIZE_CHARACTER["ä"] = "ae";
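The effect of such assignments can be pictured as a merge of two tables. The Python sketch below assumes that an entry set in node.conf replaces a default with the same key and that all other defaults stay in place. The names `DEFAULTS` and `overrides` and the “ñ” entry are hypothetical and serve only as an illustration.

```python
# Subset of the default table (illustration only).
DEFAULTS = {"ï": "i", "ä": "ae", " ": "_"}

# Hypothetical site-specific settings: one redefined entry, one new entry.
overrides = {"ä": "a", "ñ": "n"}

# Later entries win, so overrides take precedence over the defaults.
effective = {**DEFAULTS, **overrides}

print(effective["ä"])  # a   (redefined)
print(effective["ñ"])  # n   (added)
print(effective["ï"])  # i   (default kept)
```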

Do not replace any character with “/” or “\”, since those are separators for path names.

Do not define replacements for characters that are already allowed in file names by default. The following regular expression matches the characters that are not allowed: /[^\s\w\.\-\(\)\[\]{}\$]/. The allowed characters are the alphanumeric characters including “_” and all of these characters: “.-()[]{}$/”.

This feature only works with UTF-8. Replacement strings should contain only alphanumeric characters or underscores.

Replacing a character with a non-alphanumeric character is not supported:

/Node/etc/node.conf

$SANITIZE_CHARACTER["a"] = "ä"; // ä will be replaced by _
$SANITIZE_CHARACTER["e"] = "ë"; // ë will be replaced by _
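The rules above for replacement values can be checked before a table is put into node.conf. The following Python sketch is one possible check, not part of the product. The helper name `invalid_entries` is hypothetical, and the sketch assumes that “alphanumeric” means ASCII letters and digits and that a replacement must not be empty.

```python
import re

# Assumption: a valid replacement is one or more ASCII letters, digits or "_".
# This also rules out "/" and "\", the path separators.
VALID_REPLACEMENT = re.compile(r"[A-Za-z0-9_]+")


def invalid_entries(table):
    """Return the entries whose replacement value breaks the rules above."""
    return {
        source: replacement
        for source, replacement in table.items()
        if not VALID_REPLACEMENT.fullmatch(replacement)
    }


table = {"ä": "ae", "a": "ä", "x": "/", " ": "_"}
print(invalid_entries(table))  # {'a': 'ä', 'x': '/'}
```

An empty result means that every replacement value in the table passes the check.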