Extracting Translatable Strings From Zend_Validate

Recently I started working with Zend Framework’s Zend_Form and its integrated automatic translation functionality using Zend_Translate. In general, the functionality is really great, but if you have ever tried to translate all of the possible validation error messages, you know that finding them can be a long manual task. Naturally, I brought this up on #zftalk, and the consensus seemed to be that the translatable messages could be extracted using a little PHP and ReflectionClass magic.

ReflectionClass to the Rescue

The following is the script that I came up with, which pulls translatable error messages along with available message variables and generates a data structure based on the class name of the validators. Here is the script in its entirety, with some short comments describing how it works:

$zend_library = "/my/php/library";
set_include_path($zend_library);

// Create a new RecursiveDirectoryIterator for the Zend_Validate
// directory and recursively iterate over the files.

$messages = array();
$rdi = new RecursiveDirectoryIterator($zend_library . '/Zend/Validate');
foreach (new RecursiveIteratorIterator($rdi) as $path) {

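    // Skip anything that is not a PHP source file (such as the '.'
    // and '..' directory entries that some PHP versions return).
    // Then read in the file and generate the expected class name by
    // converting the portion of the path starting with Zend/Validate
    // and replacing '/' with '_'. For example, this converts the path
    // Zend/Validate/Regex.php to Zend_Validate_Regex.

    $path = (string) $path;
    if (substr($path, -4) !== '.php') {
        continue;
    }
    require_once $path;
    $class = strtr(preg_replace('#.*(Zend/Validate/.*)\.php$#', '\1', $path), '/', '_');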

    if (class_exists($class)) {

        // If the expected class exists, create a ReflectionClass
        // instance and fetch the properties of the class. Make
        // sure that the property '_messageTemplates' exists, as
        // this indicates that we are dealing with an actual concrete
        // Zend_Validate class.

        $reflection = new ReflectionClass($class);
        if ($reflection->hasProperty('_messageTemplates')) {

            // Get the default properties as an associative array and
            // add the 'value' key to the _messageVariables property
            // since by default, 'value' is always available for substitution
            // into a message template.

            $props = $reflection->getDefaultProperties();
            $props['_messageVariables']['value'] = true;

            // Add an element to the $messages array, keyed on
            // the name of the current class. The element contains
            // an array of translatable templates, and an array of
            // valid message variables that can be substituted into
            // those templates.

            $messages[$class] = array(
                'templates' => $props['_messageTemplates']
              , 'variables' => array_keys($props['_messageVariables'])
            );
        }
    }
}
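To try the reflection step in isolation, without a Zend Framework checkout, here is a minimal sketch. The class Demo_Validate_Range and its messages are hypothetical stand-ins that only mimic the two protected properties the script looks for; they are not part of Zend Framework.

```php
<?php
// Hypothetical stand-in for a validator class; NOT part of Zend Framework.
class Demo_Validate_Range
{
    protected $_messageTemplates = array(
        'notBetween' => "'%value%' is not between '%min%' and '%max%'",
    );

    protected $_messageVariables = array(
        'min' => '_min',
        'max' => '_max',
    );
}

$reflection = new ReflectionClass('Demo_Validate_Range');

// getDefaultProperties() returns the declared default values of all
// properties, including protected ones, without creating an instance.
$props = $reflection->getDefaultProperties();
$props['_messageVariables']['value'] = true;

$demo = array(
    'templates' => $props['_messageTemplates'],
    'variables' => array_keys($props['_messageVariables']),
);

print_r($demo);
```

Because only the declared defaults are read, no validator is instantiated, so validators that need constructor arguments should not cause trouble at this step.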

Printing out an element of the $messages array shows the structure that is now available to work with:

print_r($messages['Zend_Validate_Regex']);

Array
(
    [templates] => Array
        (
            [regexNotMatch] => '%value%' does not match against pattern '%pattern%'
        )
    [variables] => Array
        (
            [0] => pattern
            [1] => value
        )
)
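The entries under variables are the placeholder names that may appear between percent signs in a template, so translators need to leave those tokens intact. As a rough illustration of the substitution (this is a sketch, not Zend_Validate's actual implementation, and fill_template is a hypothetical helper name):

```php
<?php
// Hypothetical helper for illustration; not a Zend Framework function.
function fill_template($template, array $values)
{
    // Build a map such as array('%value%' => 'abc', ...).
    $pairs = array();
    foreach ($values as $name => $value) {
        $pairs['%' . $name . '%'] = (string) $value;
    }
    // strtr() with an array replaces every placeholder in one pass.
    return strtr($template, $pairs);
}

$template = "'%value%' does not match against pattern '%pattern%'";
echo fill_template($template, array(
    'value'   => 'abc',
    'pattern' => '/^[0-9]+$/',
)), "\n";
```

A translated template can reorder the placeholders freely, as long as the names between the percent signs stay unchanged.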

Processing the Output

Now that we have the data we need to start providing translations, what can be done with it? I am using the Zend_Translate_Tmx translation adapter, so I iterate over the data and write out a single TMX 1.4 file containing all of my validation error messages, with notes recording the source class of each translatable string and the variables that can be used in it. The following code illustrates how this is done:

$format = '
        <tu tuid="%s">
            <note>Validation message template for class %s.</note>
            <note>Possible variables include %s</note>
            <tuv xml:lang="en">
                <seg>%s</seg>
            </tuv>
        </tu>
';

echo <<<XML
<?xml version="1.0" ?>
<!DOCTYPE tmx SYSTEM "http://www.lisa.org/fileadmin/standards/tmx1.4/tmx14.dtd.txt">
<tmx version="1.4">
    <header creationtool="Reflection" creationtoolversion="1.0" datatype="winres" segtype="sentence" adminlang="en-us" srclang="en-us" o-tmf="txt">
    </header>
    <body>
XML;

foreach ($messages as $class_name => $data) {
    $variables = join(', ', $data['variables']);
    foreach ($data['templates'] as $key => $template) {
        // Escape XML special characters so the output stays well-formed.
        printf($format, $key, $class_name, $variables, htmlspecialchars($template, ENT_NOQUOTES));
    }
}

echo "    </body>\n"
  .  "</tmx>\n";

The collected data can of course also be formatted in whatever way is required by the type of Zend_Translate_Adapter that you are using.
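As one example of a different output format, here is a minimal sketch that flattens the data into a PHP file returning a plain array keyed on the template identifier. It assumes you are using the array translation adapter and that it loads a file of this shape; check this against your Zend Framework version. The $messages value below is a one-validator stand-in for the structure built earlier.

```php
<?php
// Stand-in for the $messages structure built by the extraction script.
$messages = array(
    'Zend_Validate_Regex' => array(
        'templates' => array(
            'regexNotMatch' => "'%value%' does not match against pattern '%pattern%'",
        ),
        'variables' => array('pattern', 'value'),
    ),
);

// Flatten to identifier => default English text. Identifiers shared by
// several validators overwrite one another here.
$translations = array();
foreach ($messages as $class_name => $data) {
    foreach ($data['templates'] as $key => $template) {
        $translations[$key] = $template;
    }
}

// var_export() emits valid PHP source for the array.
$source = "<?php\nreturn " . var_export($translations, true) . ";\n";
echo $source;
```

Redirecting the output to a file gives you a starting point in which the English text can be replaced with the translated text for each language.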

Hopefully this will save you some time in your translation efforts. One thing to watch out for: several validators reuse the same message template identifiers, so those messages cannot each be given their own translation (see bug ZF-3164 for more info).
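To see which identifiers are affected in your copy of the library, a short sketch along these lines can list every identifier that more than one class uses. The Demo_Validate_A and Demo_Validate_B entries are made-up sample data; in practice you would run the loops over the real $messages array.

```php
<?php
// Made-up sample data in the same shape as the $messages array.
$messages = array(
    'Demo_Validate_A' => array(
        'templates' => array('invalid' => 'A text', 'tooShort' => 'Too short'),
        'variables' => array('value'),
    ),
    'Demo_Validate_B' => array(
        'templates' => array('invalid' => 'B text'),
        'variables' => array('value'),
    ),
);

// Record which classes declare each template identifier.
$owners = array();
foreach ($messages as $class_name => $data) {
    foreach (array_keys($data['templates']) as $key) {
        $owners[$key][] = $class_name;
    }
}

// Keep only the identifiers declared by more than one class.
$collisions = array();
foreach ($owners as $key => $classes) {
    if (count($classes) > 1) {
        $collisions[$key] = $classes;
    }
}

print_r($collisions);
```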

Chris Abernethy
PHP Wrangler, MySQL DBA, Linux SysAdmin and all around computer guy, developing LAMP applications since Slackware came on 10 floppy disks.
