Externalized PHP Strings Performance for Internationalization - SQLITE Anyone?
October 26, 2005
I just got started down the road of internationalizing our core enterprise product and I have to admit, I'm scared shitless. :)
After four days of training on everything you must take into account, it's staggering how much work needs to be done to internationalize a product. Things that need to be taken into consideration:
- Table column length - If your tables are sized for English only, you're going to have a problem when German comes around and needs 40% more room. Is your app ready to have all the text in the GUI expand by at least 40%?
- Text in images - A no-no if you want localization turned around in a reasonable amount of time. Imagine redoing all your graphics in every language you want to support. Yikes. So replace text in images with real text and maybe a nice background image.
- String matching - You cannot do anything with strings anymore :) No more strstr, str_replace, or preg_match without Unicode character support. How do you know what strings will be coming in? String length? Some Japanese characters take 3 bytes, so strlen will return 3 even though it's only one character. For this you'll need the mbstring extension, since PHP does not support multibyte characters natively (PHP6 will in the next couple of YEARS).
- Inline CSS/hardcoded markup - Have bold tags all over? Lose 'em. Bold in Japanese does nothing but leave the text unreadable. Have small fonts? You'll need to support at least 10pt fonts and 13pt line height to allow for Japanese/Chinese height expansion. You can externalize your bold tags using a span with a CSS class called bold: <span class="bold">. All fonts must also be externalized to a localizable CSS file, because Arial and Verdana do not support Japanese or Chinese characters.
- Unfair claims - In Germany, for example, it's illegal to claim a product is "the best" or "the fastest", e.g. "My script is the fastest PHP script of its kind".
- Externalize those strings! - No hardcoded strings may remain in your HTML or PHP code; they must be pulled from a strings file that can be sent out to a translator.
- Security questions - Not everyone has a favorite basketball team, a social security #, or a favorite football team, so think about things common to all countries if you're going to ask someone a question to remember their password.
- But they're just pictures! - The seemingly innocent thumbs-up gesture is offensive in Australia, a picture of your hand making a "hitchhiking" sign will get your ass kicked in Nigeria, and showing your hand like a stop sign is offensive in Greece on the same scale as giving someone the finger.
- Concatenation - The root of all evil. A simple string built by concatenating a greeting and a user name, which would echo out "Welcome, JimmyP", would be offensive in Japanese (Japanese readers expect the subject of the sentence at the start, not the end, and expect the name to carry -san), and there is no way to externalize it. The proper way is to do something like
$string = getString("welcomeUser");
where your string would be "Welcome, %USERNAME%", which can be sent to a translator who will understand that this is a welcome message with a user-name variable.
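To make the placeholder idea concrete, here is a minimal sketch. Only the getString("welcomeUser") call and the "Welcome, %USERNAME%" string come from the text above; the optional locale argument, the inlined tables, the fillPlaceholders() helper and the Japanese string are assumptions made up for illustration.

```php
<?php
// Hypothetical lookup helper matching the getString("welcomeUser") call above.
// The tables are inlined here; a real version would read them from the
// external strings store.
function getString(string $key, string $locale = 'en'): string
{
    static $tables = [
        'en' => ['welcomeUser' => 'Welcome, %USERNAME%'],
        // Illustrative translation only: the name moves to the front and gains -san.
        'ja' => ['welcomeUser' => '%USERNAME%さん、ようこそ'],
    ];
    // Fall back to the key itself so a missing string is easy to spot.
    return $tables[$locale][$key] ?? $key;
}

// Hypothetical helper: swap each %PLACEHOLDER% for its value.
function fillPlaceholders(string $template, array $values): string
{
    return strtr($template, $values);
}

$string  = getString('welcomeUser');
$message = fillPlaceholders($string, ['%USERNAME%' => 'JimmyP']);
echo $message, "\n";

$japanese = fillPlaceholders(getString('welcomeUser', 'ja'), ['%USERNAME%' => 'JimmyP']);
echo $japanese, "\n";
```

Because the translator owns the whole sentence, the placeholder can move wherever the target language needs it, which plain concatenation cannot do.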
So what is the best way to externalize your strings? I did some initial performance testing, and this is what I've come up with so far:
1. INI file - store all your strings in a .ini file, such as welcomeMsg = "Hi There", which can be parsed with parse_ini_file()
2. Native PHP arrays - store your strings in a .php file using $msg['welcomeMsg'] = "Hi There"; which can simply be included as a native PHP array
3. XML format - store your strings as <string key="welcomeMsg">Hi There</string> and parse them using PHP5's SimpleXML
4. SQLite - store your strings in one SQLite DB per language, with a function wrapper for queries
The winner???
Well, in tests on 50,000, 10,000 and 1,000 string files, #4 SQLite was FIVE times as fast as SimpleXML (2nd place) and, going by the 50,000-string timings below, roughly 9 times faster than parsing INI files.
Here are the #'s:
== Testing Methods ==
Every 4th entry in the test files was given extra text, such as a 400-character string, so that the data more realistically mimics the file size and parsing time of real string files.
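The post does not include the test script itself, so the following is only a rough sketch of how such data and timings could be produced. The function names makeTestStrings() and timeIt() are made up, and the timed workload is a stand-in for whichever loader is being measured.

```php
<?php
// Build $count test values; every 4th one is padded to 400 characters,
// loosely following the description above.
function makeTestStrings(int $count): array
{
    $strings = [];
    for ($i = 0; $i < $count; $i++) {
        $value = "valueofstring{$i}";
        if ($i % 4 === 0) {
            $value = str_pad($value, 400, 'testinglongerstrings');
        }
        $strings["key{$i}"] = $value;
    }
    return $strings;
}

// Run a callable once and return the elapsed wall-clock time in seconds.
function timeIt(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

$strings = makeTestStrings(1000);
$elapsed = timeIt(function () use ($strings) {
    // Stand-in workload: look up every key once.
    foreach (array_keys($strings) as $key) {
        $found = $strings[$key];
    }
});
printf("%d strings: %.4f sec\n", count($strings), $elapsed);
```

Running each loader several times, as in the result lists below, helps smooth out disk-cache and scheduling noise.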
=== Native PHP Arrays ===
This test was done using a file containing PHP's native arrays.
Example: $msg['key'] = "value";
<pre>
<?PHP
$msg['keyname0'] = "valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength";
$msg['keyname1'] = "valueofstring1";
$msg['keyname2'] = "valueofstring2";
$msg['keyname3'] = "valueofstring3";
?>
</pre>
50,000 strings
.69 sec
.68 sec
.64 sec
.59 sec
.64 sec
.60 sec
10,000 strings
.23 sec
.23 sec
.23 sec
.21 sec
.22 sec
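For reference, loading a native-array strings file boils down to a plain include. This sketch writes a tiny strings file to a temp location so it is self-contained; the temp-file handling is an assumption for the example, and a real project would ship one such file per language.

```php
<?php
// Contents of a small strings file in the $msg['key'] = "value"; format.
$source = <<<'PHP'
<?php
$msg['welcomeMsg'] = "Hi There";
$msg['keyname1'] = "valueofstring1";
PHP;

// Write it to a temporary file so the sketch runs anywhere.
$file = tempnam(sys_get_temp_dir(), 'strings_');
file_put_contents($file, $source);

$msg = [];
include $file;   // the included file fills $msg in the current scope
unlink($file);

echo $msg['welcomeMsg'], "\n";
```

Note that the whole file is compiled and every string lands in memory, even if the page only needs a handful of them.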
=== XML File Format ===
Tested using an XML file parsed with PHP5's SimpleXML methods for accessing strings.
Example: <string key="key">value</string>
<?xml version="1.0" encoding="UTF-8" ?>
<strings>
<string key="key0">valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength</string>
<string key="key1">valueofstring1</string>
<string key="key2">valueofstring2</string>
<string key="key3">valueofstring3</string>
</strings>
50,000 strings
.32 sec
.34 sec
.37 sec
.41 sec
.38 sec
10,000 strings
.14 sec
.15 sec
.15 sec
.07 sec
.15 sec
1,000 strings
.01 sec
.01 sec
.01 sec
.01 sec
.01 sec
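The post doesn't say which SimpleXML calls were used to pick out strings, so here is a sketch of two plausible ways to do it, assuming the <string key="..."> layout shown above. The XML is inlined to keep the example self-contained.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<strings>
  <string key="welcomeMsg">Hi There</string>
  <string key="key1">valueofstring1</string>
</strings>
XML;

$doc = simplexml_load_string($xml);

// Option A: walk the document once and build a key => value array.
$msg = [];
foreach ($doc->string as $node) {
    $msg[(string) $node['key']] = (string) $node;
}

// Option B: ask for a single key with XPath.
$hits = $doc->xpath('/strings/string[@key="key1"]');
$one  = $hits ? (string) $hits[0] : null;

echo $msg['welcomeMsg'], "\n";
echo $one, "\n";
```

Option A pays the full parse plus a loop up front; option B still parses the whole document but skips building the array.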
=== INI FILE TYPE ===
Parsing .ini files using PHP's parse_ini_file() function
[php]
key0 ="valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength"
key1 ="valueofstring1"
key2 ="valueofstring2"
key3 ="valueofstring3"
[/php]
50,000 strings
.68 sec
.65 sec
.63 sec
.60 sec
.65 sec
10,000 strings
.21 sec
.22 sec
.22 sec
.18 sec
.21 sec
1,000 strings
.02 sec
.02 sec
.02 sec
.02 sec
.02 sec
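Loading the INI variant is a one-liner with parse_ini_file(). A minimal sketch, again using a temp file (an assumption for the example) so it runs on its own:

```php
<?php
// Two entries in the key = "value" format used by the test file above.
$ini = "welcomeMsg = \"Hi There\"\nkey1 = \"valueofstring1\"\n";

// Write them to a temporary file so the sketch is self-contained.
$file = tempnam(sys_get_temp_dir(), 'ini_');
file_put_contents($file, $ini);

// parse_ini_file() returns an associative array of key => value pairs.
$msg = parse_ini_file($file);
unlink($file);

echo $msg['welcomeMsg'], "\n";
```

One thing to watch with this format: the PHP manual lists reserved words such as yes, no, true, false, on, off, null and none that must not be used as INI keys, so string codes need some care.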
=== SQLite ===
Got a little tip from Wez Furlong about SQLite in PHP5, and boy was he right: using SQLite on 50,000 strings, picking out the strings you needed took a teeeeeny tiny .07 seconds!
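The test code for this one isn't shown either. As a rough sketch of the "one DB per language plus a function wrapper" idea, the example below uses PDO's SQLite driver rather than the original PHP5 sqlite_* functions, since those are no longer bundled with current PHP. The table layout, the in-memory database and the fetchString() name are all assumptions.

```php
<?php
// In-memory database to keep the sketch self-contained; the real thing
// would open one file per language, e.g. a (hypothetical) strings_en.db.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE strings (code TEXT PRIMARY KEY, value TEXT NOT NULL)');

// Load a couple of strings inside one transaction.
$insert = $db->prepare('INSERT INTO strings (code, value) VALUES (?, ?)');
$db->beginTransaction();
foreach (['welcomeMsg' => 'Hi There', 'key1' => 'valueofstring1'] as $code => $value) {
    $insert->execute([$code, $value]);
}
$db->commit();

// Hypothetical wrapper: fetch one string by its code, or null if missing.
function fetchString(PDO $db, string $code): ?string
{
    $stmt = $db->prepare('SELECT value FROM strings WHERE code = ?');
    $stmt->execute([$code]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : $value;
}

echo fetchString($db, 'welcomeMsg'), "\n";
```

The likely reason this approach wins is that only the requested rows are read, found through the primary-key index, instead of parsing the entire strings file on every request.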
Anonymous says:
February 13, 2006 @ 15:22
There are some custom scripts you may use for UTF-8 encoding. See http://cms.naczasie.pl/lgpl/UTF-8.php, find DOWNLOAD and click :)