Thursday, July 12, 2012

How to implement i18n without performance overhead

i18n is hard to implement and often costs a lot of performance. Typical implementations use gettext() or a custom t() function to translate a string: t() looks up a translation key in an INI or XML file and returns the localized value. For example, t('setting', 'de') returns the German translation 'Einstellung'.
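For comparison, a runtime t() of the kind described above could look like the minimal sketch below. It is not taken from gettext or any framework. The optional $dir parameter and the per-request static cache are assumptions, and the lang/<code>.ini layout is the one used later in this post.

```php
<?php
// Sketch of a conventional runtime t(), for comparison with the compiler
// approach. Assumption: translations live in <dir>/<lang>.ini, one
// "English = localized" pair per line.
function t($key, $lang, $dir = 'lang') {
  // parsed files are kept for the rest of the request
  static $loaded = array();
  $file = $dir.'/'.$lang.'.ini';
  if (!isset($loaded[$file])) {
    $loaded[$file] = is_file($file) ? parse_ini_file($file) : array();
  }
  // every call still costs a function call plus a hash lookup
  return isset($loaded[$file][$key]) ? $loaded[$file][$key] : $key;
}

// usage, assuming lang/de.ini contains the line "setting = Einstellung":
// echo t('setting', 'de');
```

Every translated string in every request goes through this lookup, which is the overhead the rest of this post tries to remove.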

Typical optimizations keep the translations in associative arrays (hashmaps) cached in APC or Memcached. This needs a lot of memory for the arrays and burns CPU cycles on every one of the many t() calls. So the question is: can we do better?

Yes! We use a just-in-time compiler that replaces the translatable strings in our PHP files with their translations and writes the compiled PHP files to disk, so APC can cache them like any other PHP file.

An example PHP class looks like this:
// example.php
<?php

class example {
  public function now() {
    return '{t}Hello World, now it is:{/t} '.date('{t}m/d/Y g:i a{/t}');
  }
}
The "{t}" and "{/t}" patterns serve as opening and closing tags indicating strings to be translated.
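Because the markers follow a fixed pattern, the strings that need translating can also be collected automatically, for example to check which entries a language file is still missing. The helper below is a hypothetical sketch, not part of the original setup. It reuses the {t}...{/t} pattern from the translate() function further down, so marked strings must not contain "{".

```php
<?php
// Hypothetical helper: list every string wrapped in {t}...{/t} in a PHP
// source, e.g. to see which keys a lang/<code>.ini file has to cover.
function find_translatable($source) {
  preg_match_all('!\{t\}([^\{]+)\{/t\}!', $source, $matches);
  // $matches[1] holds the captured text of each marker; drop duplicates
  return array_values(array_unique($matches[1]));
}

$source = <<<'PHP'
return '{t}Hello World, now it is:{/t} '.date('{t}m/d/Y g:i a{/t}');
PHP;
print_r(find_translatable($source));
```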

The translation files look like this:
// lang/en.ini (empty)

// lang/de.ini
Hello World, now it is: = Hallo Welt, jetzt ist es:
m/d/Y g:i a = d/m/Y H:i
Each language has one translation file (lang/<language-code>.ini) with one translation per line: the English string, followed by " = " and the localized string. English is the source language, so en.ini can stay empty; untranslated strings are left as they are.
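PHP's built-in INI parser turns such a file into exactly the associative array the compiler needs. The sketch below feeds the de.ini content to parse_ini_string(), the string counterpart of parse_ini_file(). Note that the INI format reserves some characters (such as ?{}|&~!()^") in keys, so English strings containing them may need a different storage format.

```php
<?php
// The de.ini content from above, parsed with PHP's INI parser.
// Keys and values are trimmed; the result is a plain associative array.
$ini = <<<'INI'
Hello World, now it is: = Hallo Welt, jetzt ist es:
m/d/Y g:i a = d/m/Y H:i
INI;

$strings = parse_ini_string($ini);
print_r($strings);
```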

Now we need a proxy to translate the PHP class before including it:
// instead of
require("core/example.php");
echo (new example())->now();

// we write
define('LANG', 'en');
require(translate('core/example.php'));
echo (new example())->now();

// input:  core/example.php
// output: cache/<lang>_example.php_<lang file mtime>_<source mtime>.php
function translate($file) {
  $lang_file = 'lang/'.LANG.'.ini';
  $lang_mtime = filemtime($lang_file);
  // both modification times are part of the name, so editing either the
  // source file or the translation file leads to a rebuild
  $cache_file = 'cache/'.LANG.'_'.basename($file).'_'.$lang_mtime.'_'.filemtime($file).'.php';
  // (re)build translation?
  if (!file_exists($cache_file)) {
    $lang_file_php = 'cache/'.LANG.'_'.$lang_mtime.'.php';

    // convert the .ini file into a .php file (reused by later builds)
    if (!file_exists($lang_file_php)) {
      file_put_contents($lang_file_php, '<?php $strings='.
        var_export(parse_ini_file($lang_file), true).';', LOCK_EX);
    }
    // callback for one {t}...{/t} match: return the translation, or the
    // original text if the language file has none
    $tr = function($match) use ($lang_file_php) {
      static $strings = null;
      if ($strings === null) require($lang_file_php); // sets $strings
      if (!isset($strings[ $match[1] ])) return $match[1];
      // the markers sit inside single-quoted PHP strings, so escape ' and \
      return addcslashes($strings[ $match[1] ], "'\\");
    };
    // replace every {t}...{/t} with its translation
    file_put_contents($cache_file, preg_replace_callback(
      '!\{t\}([^\{]+)\{/t\}!', $tr, file_get_contents($file)), LOCK_EX);
  }
  return $cache_file;
}
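Before including example.php, translate() checks whether a translated version already exists in cache/ and builds one if not. Because the cache file name contains the modification time of example.php, changing example.php also triggers a new build. The build first converts the translation file (.ini) into a .php file, then replaces every {t}...{/t} in example.php with its translation and stores the result in cache/. From then on, a request only pays for the file_exists() and filemtime() checks, and APC caches the translated file like any other PHP file.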
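Stripped of caching and file handling, the replacement at the heart of translate() can be tried on a plain string. The sketch below is illustrative only: translate_source() is a hypothetical name, the translations come from an inline array rather than an .ini file, and quotes inside translations are not escaped.

```php
<?php
// Hypothetical helper: replace every {t}...{/t} in $source with its entry
// in $strings, or with the original text when there is no entry.
function translate_source($source, array $strings) {
  return preg_replace_callback('!\{t\}([^\{]+)\{/t\}!',
    function ($match) use ($strings) {
      return isset($strings[$match[1]]) ? $strings[$match[1]] : $match[1];
    }, $source);
}

$de = array(
  'Hello World, now it is:' => 'Hallo Welt, jetzt ist es:',
  'm/d/Y g:i a' => 'd/m/Y H:i',
);
$source = "return '{t}Hello World, now it is:{/t} '.date('{t}m/d/Y g:i a{/t}');";

// prints: return 'Hallo Welt, jetzt ist es: '.date('d/m/Y H:i');
echo translate_source($source, $de), "\n";
// prints: return 'Hello World, now it is: '.date('m/d/Y g:i a');
echo translate_source($source, array()), "\n"; // like the empty en.ini
```

The compiled line contains only string literals, so at runtime there is no lookup left to do.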

To make things even faster, we can skip file_exists() and filemtime() by using a small static compiler instead of the just-in-time compiler:
// compiler.php
<?php
$langs = array('en', 'de');
$files = array('core/example.php');

foreach ($langs as $lang) {
  // load translations
  $strings = parse_ini_file('lang/'.$lang.'.ini');

  // callback for one {t}...{/t} match: return the translation, or the
  // original text if the language file has none
  $tr = function($match) use ($strings) {
    if (!isset($strings[ $match[1] ])) return $match[1];
    // the markers sit inside single-quoted PHP strings, so escape ' and \
    return addcslashes($strings[ $match[1] ], "'\\");
  };

  foreach ($files as $file) {
    // translate .php into localized .php file: cache/<lang>_<basename>
    file_put_contents('cache/'.$lang.'_'.basename($file), preg_replace_callback(
      '!\{t\}([^\{]+)\{/t\}!', $tr, file_get_contents($file)), LOCK_EX);
  }
}

// index.php
define('LANG', 'en');
require('cache/'.LANG.'_example.php');
echo (new example())->now();
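LOCK_EX only takes an advisory lock, and require does not wait for it, so a request could in principle include a cached file while the compiler is still writing it. One common remedy, sketched here as an assumption rather than part of the original compiler, is to write to a temporary file in the same directory and rename() it into place: on POSIX file systems, rename() within one file system replaces the target atomically. write_atomic() is a hypothetical helper that could stand in for the file_put_contents() calls.

```php
<?php
// Hypothetical helper: write $contents to $path so that readers see either
// the old file or the complete new one, never a partially written file.
function write_atomic($path, $contents) {
  // temporary file in the target directory (which must exist), so that
  // rename() does not have to cross file systems
  $tmp = tempnam(dirname($path), 'tr_');
  if ($tmp === false || file_put_contents($tmp, $contents) === false) {
    return false;
  }
  chmod($tmp, 0644);          // tempnam() creates the file with mode 0600
  return rename($tmp, $path); // replaces $path in one step on POSIX
}
```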

Sometimes a translated string has to be combined with other values whose position depends on the language, e.g. "10 EUR" vs. "USD 10". This is easy to do with sprintf():
// PHP
$str = sprintf('{t}USD %d{/t}', 10);

// lang/de.ini
USD %d = %d EUR
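With more than one value, the translation may also need to reorder them. sprintf() supports argument numbers (%1$s, %2$d) for this, so the calling code stays unchanged. In this sketch the German entry is hypothetical and a plain array stands in for the parsed .ini file. PHP's INI syntax gives special meaning to several characters, so keeping the $ out of the English key is deliberate, and whether parse_ini_file() accepts the $ in the value is worth testing before relying on it.

```php
<?php
// Hypothetical German entry: swap currency code and amount.
$strings = array('%s %d' => '%2$d %1$s');

$key = '%s %d'; // the text that would sit inside {t}...{/t}
$format = isset($strings[$key]) ? $strings[$key] : $key;

echo sprintf($key, 'USD', 10), "\n";    // English order: USD 10
echo sprintf($format, 'USD', 10), "\n"; // translated order: 10 USD
```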

By using a compiler for translations, we can make i18n a lot easier and faster!

