Externalized PHP Strings Performance for Internationalization - SQLite Anyone?

October 26, 2005

I just got started down the road of internationalizing our core enterprise product and I have to admit, I'm scared shitless. :)
After four days of training on everything you must take into account, it's staggering how much work needs to be done to internationalize a product. Things that need to be taken into consideration:

  • Table Column length - if your tables are sized for English only, you're going to have a problem when German comes around and needs 40% more room. Is your app ready to have all the text in the GUI expand by at least 40%?

  • Text in images - A no-no if you want localization turned around in a reasonable amount of time. Imagine redoing all your graphics in every language you want to support. Yikes. So replace text in images with plain text and maybe a nice background image.

  • String Matching - You cannot do anything with strings anymore :) No more strstr, str_replace or preg_match without Unicode character support. How do you know what strings will be coming in? String length? Some Japanese characters take 3 bytes in UTF-8, so strlen will return 3 even though it's only one character. For this you'll need the mbstring extension, since PHP does not support multibyte chars natively (PHP6 will in the next couple of YEARS).

  • Inline CSS/Hardcoded Markup - Have bold tags all over? Lose 'em. Bold in Japanese does nothing but leave the text unreadable. Have small fonts? You'll need to support at least 10pt fonts and 13pt line height to allow for Japanese/Chinese height expansion. You can externalize your bold tags using a span with a CSS class called bold: <span class="bold">. Also, all fonts must be externalized to a localizable CSS file, because the Arial and Verdana fonts do not support Japanese or Chinese characters.

  • Unfair claims - in Germany, for example, it's illegal to claim a product is "the best" or "the fastest", e.g. "My script is the fastest PHP script of its kind".

  • Externalize those strings! - No hardcoded strings should be left in your HTML or PHP code; they must be pulled from a strings file that can be sent out to a translator.

  • Security Questions - not everyone has a favorite basketball team, a social security #, or a favorite football team, so think about things that are common to all countries if you're going to ask someone a question to remember their password.

  • But they're just pictures! - The seemingly innocent thumbs-up gesture is offensive in Australia, a picture of your hand making a "hitchhiking" sign will get your ass kicked in Nigeria, and showing your hand like a stop sign is offensive in Greece on the same scale as giving someone the finger.

  • Concatenation - root of all evil. A simple string like

    echo getString("welcome") . ', ' . $user;

    which would echo out "Welcome, JimmyP" would be offensive in Japanese (Japanese readers expect the subject of this sentence to be at the start, not the end, and the concatenated version also lacks the -san after the name), and there is no way to externalize that. The proper way would be to do something like

    $string = getString("welcomeUser");
    $value = str_replace("%USERNAME%", $user, $string);
    echo $value;

    where your string would be "Welcome, %USERNAME%", which can be sent to a translator who will understand that this is a welcome message with a user name variable.
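The placeholder idea from the last bullet can be pushed a little further. The following is only a sketch: getString() here is a hard-coded stand-in for whatever reads your real strings file, formatString() is a hypothetical helper name, and the Japanese string is an illustrative translation, not one from our translators.

```php
<?php
// Hypothetical stand-in for getString(): a real version would read the
// externalized strings file for the requested language.
function getString(string $key, string $lang = 'en'): string
{
    $strings = [
        'en' => ['welcomeUser' => 'Welcome, %USERNAME%'],
        'ja' => ['welcomeUser' => '%USERNAME%さん、ようこそ'], // illustrative only
    ];
    return $strings[$lang][$key] ?? $key;
}

// Hypothetical helper: replace every %NAME% placeholder with its value.
function formatString(string $template, array $values): string
{
    $pairs = [];
    foreach ($values as $name => $value) {
        $pairs['%' . $name . '%'] = (string) $value;
    }
    // strtr() replaces all pairs in one pass and never re-scans replaced text.
    return $pairs === [] ? $template : strtr($template, $pairs);
}

echo formatString(getString('welcomeUser', 'en'), ['USERNAME' => 'JimmyP']), "\n";
echo formatString(getString('welcomeUser', 'ja'), ['USERNAME' => 'JimmyP']), "\n";
```

Because the translator controls where %USERNAME% sits in the sentence, the same calling code works for both word orders.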

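Going back to the string matching point in the list above, here is a small sketch of the byte vs. character problem. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the sample word is just an illustration.

```php
<?php
// Assumes mbstring is loaded and this file is saved as UTF-8.
$word = '日本語'; // three characters, three bytes each in UTF-8

echo strlen($word), "\n";             // 9 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n"; // 3 (characters)

// mb_substr() counts characters, so it never cuts one in half.
echo mb_substr($word, 0, 2, 'UTF-8'), "\n"; // 日本

// preg_* functions need the /u modifier to treat pattern and subject as UTF-8.
echo preg_match('/^.{3}$/u', $word), "\n"; // 1
```

Without the /u modifier the same pattern would count bytes and fail to match.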

So what is the best way to externalize your strings? I did some initial performance testing and this is what I've come up with so far:

1. INI file - store all your strings in a .ini file, such as welcomeMsg = "Hi There", which can be parsed with parse_ini_file()

2. Native PHP arrays - store your strings in a .php file using $msg['welcomeMsg'] = "Hi There"; which can simply be included as a native PHP array

3. XML format - store your strings as <string key="welcomeMsg">Hi There</string> and parse them using PHP5's SimpleXML

4. SQLite - store your strings in one SQLite DB per language, with a function wrapper for queries

The winner???
Well, in a test of 50,000-, 10,000- and 1,000-string files, #4 SQLite was about FIVE times as fast as SimpleXML (2nd place) and roughly nine times as fast as parsing INI files or including native arrays, going by the 50,000-string timings below.

Here are the #'s:

== Testing Methods ==

Each test file had extra text added to every 4th entry (for example, 400-character strings on every 4th element) so that file size and parsing time would be closer to realistic string data.
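The original test scripts are not shown here, so the following is only a sketch of how such a test could be set up. buildTestStrings() and timeLoader() are hypothetical names, and the padding character and 400-character length are assumptions based on the description above.

```php
<?php
// Hypothetical generator: every 4th value is padded out to a long string,
// mirroring the test data described above.
function buildTestStrings(int $count, int $longLength = 400): array
{
    $strings = [];
    for ($i = 0; $i < $count; $i++) {
        $value = 'valueofstring' . $i;
        if ($i % 4 === 0) {
            $value = str_pad($value, $longLength, 'x');
        }
        $strings['key' . $i] = $value;
    }
    return $strings;
}

// Hypothetical timer: run a loader once and return the elapsed seconds.
function timeLoader(callable $loader): float
{
    $start = microtime(true);
    $loader();
    return microtime(true) - $start;
}

// Write the strings out as a native PHP array file, then time loading it.
$strings = buildTestStrings(1000);
$file = tempnam(sys_get_temp_dir(), 'str');
file_put_contents($file, '<?php return ' . var_export($strings, true) . ';');

$elapsed = timeLoader(function () use ($file) {
    return include $file;
});
printf("%d strings loaded in %.4f sec\n", count($strings), $elapsed);
unlink($file);
```

Running each loader several times, as in the numbers below, helps smooth out disk caching effects.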

=== Native PHP Arrays ===
This test was done using a file containing PHP's native arrays.

Example: $msg['key'] = "value";
<pre>
<?php
$msg['keyname0'] = "valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength";
$msg['keyname1'] = "valueofstring1";
$msg['keyname2'] = "valueofstring2";
$msg['keyname3'] = "valueofstring3";
?>
</pre>


50,000 strings

.69 sec
.68 sec
.64 sec
.59 sec
.64 sec
.60 sec

10,000 strings

.23 sec
.23 sec
.23 sec
.21 sec
.22 sec
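For reference, loading such a file could look something like the sketch below. loadStringArray() is a hypothetical helper, and a temp file stands in for a real per-language strings file.

```php
<?php
// A temp file stands in for a real per-language strings file (hypothetical layout).
$file = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents(
    $file,
    "<?php\n\$msg['keyname1'] = \"valueofstring1\";\n\$msg['keyname2'] = \"valueofstring2\";\n"
);

// Hypothetical loader: include the file inside a function so $msg stays
// local instead of leaking into the global scope.
function loadStringArray(string $path): array
{
    $msg = [];
    include $path;
    return $msg;
}

$msg = loadStringArray($file);
echo $msg['keyname1'], "\n"; // valueofstring1
unlink($file);
```

Note that the whole file is parsed on every request, no matter how few strings the page actually uses.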

=== XML File Format ===
Tested using an XML file parsed with PHP5's SimpleXML, using its methods for accessing strings.

Example: <string key="key">value</string>
<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<strings>
<string key="key0">valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength</string>
<string key="key1">valueofstring1</string>
<string key="key2">valueofstring2</string>
<string key="key3">valueofstring3</string>
</strings>
</pre>

50,000 strings

.32 sec
.34 sec
.37 sec
.41 sec
.38 sec

10,000 strings

.14 sec
.15 sec
.15 sec
.07 sec
.15 sec

1,000 strings

.01 sec
.01 sec
.01 sec
.01 sec
.01 sec
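One way to pull a single string out of that XML by key is an XPath query. This is only a sketch: xmlString() is a hypothetical helper, it assumes the SimpleXML extension, and it assumes keys are plain identifiers so they can be embedded in the query safely.

```php
<?php
// Assumes the SimpleXML extension and the <strings>/<string key="..."> layout above.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<strings>
  <string key="key1">valueofstring1</string>
  <string key="key2">valueofstring2</string>
</strings>
XML;

// Hypothetical lookup helper built on SimpleXMLElement::xpath().
function xmlString(SimpleXMLElement $doc, string $key): ?string
{
    // Only allow simple identifier keys, so nothing odd ends up in the XPath.
    if (!preg_match('/^\w+$/', $key)) {
        return null;
    }
    $matches = $doc->xpath('/strings/string[@key="' . $key . '"]');
    return $matches ? (string) $matches[0] : null;
}

$doc = simplexml_load_string($xml);
echo xmlString($doc, 'key2'), "\n"; // valueofstring2
```

The document still has to be parsed in full before the first lookup, which is where the time in the numbers above goes.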

=== INI FILE TYPE ===
Parsing .ini files using PHP's parse_ini_file() function.
[php]
key0 = "valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength"
key1 = "valueofstring1"
key2 = "valueofstring2"
key3 = "valueofstring3"
[/php]

50,000 strings

.68 sec
.65 sec
.63 sec
.60 sec
.65 sec

10,000 strings

.21 sec
.22 sec
.22 sec
.18 sec
.21 sec

1,000 strings

.02 sec
.02 sec
.02 sec
.02 sec
.02 sec
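Reading the INI format back is close to a one-liner. In the sketch below a temp file stands in for a real strings file, and the error handling is my own addition.

```php
<?php
// A temp file stands in for a real per-language .ini strings file.
$file = tempnam(sys_get_temp_dir(), 'ini');
file_put_contents($file, "key1 = \"valueofstring1\"\nkey2 = \"valueofstring2\"\n");

// parse_ini_file() returns an associative array, or false on failure.
$strings = parse_ini_file($file);
if ($strings === false) {
    throw new RuntimeException('Could not parse ' . $file);
}

echo $strings['key1'], "\n"; // valueofstring1
unlink($file);
```

One thing to watch with this format: words such as yes, no, true, false, on, off, none and null are reserved and must not be used as keys.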

=== SQLite ===
Got a little tip from Wez Furlong about SQLite in PHP5, and boy was he right: using SQLite on 50,000 strings, picking out the strings you needed took a teeeeeny tiny .07 seconds!
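A function wrapper for the SQLite approach could look roughly like this. It is a sketch under assumptions: it uses PDO with the pdo_sqlite driver rather than whatever the original test used, the table and column names are made up, and an in-memory database stands in for a per-language DB file.

```php
<?php
// Assumes PDO with the pdo_sqlite driver. Table and column names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE strings (code TEXT PRIMARY KEY, value TEXT NOT NULL)');

// Load a couple of sample strings inside one transaction.
$insert = $db->prepare('INSERT INTO strings (code, value) VALUES (?, ?)');
$db->beginTransaction();
foreach (['welcomeMsg' => 'Hi There', 'welcomeUser' => 'Welcome, %USERNAME%'] as $code => $value) {
    $insert->execute([$code, $value]);
}
$db->commit();

// Hypothetical wrapper: fetch only the strings a page actually needs.
function getStrings(PDO $db, array $codes): array
{
    if ($codes === []) {
        return [];
    }
    $marks = implode(',', array_fill(0, count($codes), '?'));
    $stmt = $db->prepare("SELECT code, value FROM strings WHERE code IN ($marks)");
    $stmt->execute(array_values($codes));
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // [code => value, ...]
}

$page = getStrings($db, ['welcomeMsg']);
echo $page['welcomeMsg'], "\n"; // Hi There
```

The likely reason this wins is that the primary key index lets SQLite find a handful of rows without reading all 50,000 strings, while the file-based formats have to parse everything first.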

Comments

  1. timvw says:
    November 3, 2005 @ 00:31

    Have you had a look at http://www.php.net/gettext?

  2. Jim says:
    November 4, 2005 @ 01:06

    Yep, but we dismissed it after a few trials. It wasn't working for our needs, so we needed a custom solution to work with our translation teams overseas.

  3. Max Barel says:
    July 23, 2006 @ 15:40

    If this thread is still alive, can you elaborate on why gettext didn't fill your needs? I plan to use it and don't want to miss the point.

  4. Amit Gupta says:
    November 6, 2005 @ 10:23

    How about MySQL for storing Unicode characters? MySQL does support Unicode chars using the unicode collation as well as several language-specific collations! PS: which syntax highlighting plugin are you using? It's broken in several places; if you just take a look, you'll know!

  5. Elliot Anderson says:
    November 13, 2005 @ 08:08

    "The seemingly innocent thumbs-up gesture is offensive in Australia" ... I'm an Aussie and I have never heard that before.

  6. Shawn says:
    November 22, 2005 @ 08:33

    Great article. May just have to implement this in one of my big projects. '"The seemingly innocent thumbs-up gesture is offensive in Australia" ... I'm an Aussie and I have never heard that before.' .. Ditto

  7. Jim says:
    November 22, 2005 @ 19:38

    Really, you guys haven't heard of that one? Maybe they got the country wrong then.

  8. Anonymous says:
    February 13, 2006 @ 15:22

    There are some custom scripts you may use for UTF-8 encoding. See http://cms.naczasie.pl/lgpl/UTF-8.php, find DOWNLOAD and click :)

  10. Ben says:
    April 14, 2006 @ 23:55

    Hi, informative article, thanks. Just thought I'd point out what appears to be a mistake, for the benefit of future googlers. In your second code example, you have 'echo getString("welcomeUser");' but I think you mean 'echo $value;'.

  11. Mgccl says:
    June 19, 2007 @ 23:13

    Just how can you access those strings in XML? I mean, once you convert the XML file into simpleXML objects, what would you do to find the key and return the string?

  12. speps says:
    July 1, 2007 @ 15:49

    @Mgccl: see the PHP documentation for SimpleXMLElement->xpath(), and you may need to find docs about XPath too (Google or Wikipedia).
