Externalized PHP Strings Performance for Internationalization - SQLite, Anyone?
October 26, 2005
I've just started down the road of internationalizing our core enterprise product, and I have to admit, I'm scared shitless. :)
After four days of training on everything you must take into account, it's staggering how much work is needed to internationalize a product. Things that need to be taken into consideration:
- Table column length - if your tables are sized for English only, you're going to have a problem when German comes around and needs about 40% more space. Is your app ready for all the text in the GUI to expand by at least 40%?
- Text in images - a no-no if you want localization turned around in a reasonable amount of time. Imagine redoing all your graphics in every language you want to support. Yikes. So replace text in images with plain text, maybe over a nice background image.
- String matching - you can't handle strings the old way anymore :) No more strstr, str_replace or preg_match without Unicode character support. How do you know what strings will be coming in? What about string length? In UTF-8 a Japanese character typically takes 3 bytes, so strlen() returns 3 even though it's only one character. For this you'll need the mbstring extension, since PHP does not support multibyte characters natively (PHP 6 will, in the next couple of YEARS).
- Inline CSS/hardcoded markup - have bold tags all over? Lose 'em. Bold in Japanese does nothing but leave the text unreadable. Have small fonts? You'll need to support at least 10pt fonts with a 13pt line height to allow for Japanese/Chinese height expansion. You can externalize your bold tags using a span with a CSS class called bold: <span class="bold">. Also, all fonts must be externalized to a localizable CSS file, because Arial and Verdana do not support Japanese or Chinese characters.
- Unfair claims - in Germany, for example, it's illegal to claim a product is "the best" or "the fastest", e.g. "My script is the fastest PHP script of its kind".
- Externalize those strings! - no hardcoded strings may appear in your HTML or PHP code; they must be pulled from a strings file that can be sent out to a translator.
- Security questions - not everyone has a favorite basketball team, a Social Security number, or a favorite football team, so stick to things common to all countries if you're going to ask someone a question to help them remember their password.
- But they're just pictures! - the seemingly innocent thumbs-up gesture is offensive in Australia, a picture of your hand making a "hitchhiking" sign will get your ass kicked in Nigeria, and holding your palm out like a stop sign is as offensive in Greece as giving someone the finger.
- Concatenation - the root of all evil. A simple concatenated string that echoes out "Welcome, JimmyP" would be offensive in Japanese (Japanese readers expect the name to come at the start of this sentence, not the end, and to be followed by -san), and because the sentence is glued together from fragments in code, there is no way to externalize it. The proper way is to do something like
$string = getString("welcomeUser");
where your string would be "Welcome, %USERNAME%", which can be sent to a translator and understood as a welcome message containing a user name variable.
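Here's a minimal sketch of that placeholder approach. getString() and formatString() are hypothetical helpers (unlike the one-argument call above, this getString() also takes the string table and a language code), and the strings live in an in-memory array purely to keep the example self-contained; in practice they'd come from one of the stores compared below. The Japanese entry only illustrates the reordering described above and is not a vetted translation. Written for PHP 7+.

```php
<?php
// Hypothetical per-language string tables; in a real app these would be
// loaded from an INI file, PHP array, XML file or SQLite database.
$strings = [
    'en' => ['welcomeUser' => 'Welcome, %USERNAME%'],
    // Illustrative only: the name moves to the front and gains -san.
    'ja' => ['welcomeUser' => '%USERNAME%さん、ようこそ'],
];

// Hypothetical lookup helper: returns the raw translated template,
// falling back to the key itself so missing strings are easy to spot.
function getString(array $table, string $lang, string $key): string
{
    return $table[$lang][$key] ?? $key;
}

// Replace each %NAME% placeholder with its value. The translator decides
// where the placeholder sits, so word order is never hardcoded.
function formatString(string $template, array $vars): string
{
    $search = [];
    foreach (array_keys($vars) as $name) {
        $search[] = '%' . $name . '%';
    }
    return str_replace($search, array_values($vars), $template);
}

echo formatString(getString($strings, 'en', 'welcomeUser'), ['USERNAME' => 'JimmyP']), "\n";
echo formatString(getString($strings, 'ja', 'welcomeUser'), ['USERNAME' => 'JimmyP']), "\n";
```

The code calling formatString() is identical for both languages; only the template handed to the translator changes.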
So what is the best way to externalize your strings? I did some initial performance testing, and this is what I've come up with so far:
1. INI file - store all your strings in an .ini file, e.g. welcomeMsg = "Hi There", which can be parsed with parse_ini_file()
2. Native PHP arrays - store your strings in a .php file as $msg['welcomeMsg'] = "Hi There"; which can simply be included as a native PHP array
3. XML format - store your strings as <string code="welcomeMsg">Hi There</string> and parse them with PHP 5's SimpleXML
4. SQLite - store your strings in one SQLite database per language, with a function wrapper for queries
The winner???
Well, in tests with 50,000-, 10,000- and 1,000-string files, #4 SQLite came out on top: at 50,000 strings it was about five times as fast as SimpleXML (second place) and roughly nine times as fast as parsing an INI file (.07 sec versus .60 to .68 sec).
Here are the numbers:
== Testing Methods ==
To make the string data more realistic, extra text was added on every 4th iteration (for example, 400-character strings on every 4th element) to mimic real file sizes and parsing times.
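A rough sketch of how test data like this could be generated and timed, assuming the pattern visible in the example files below (key0, key4, key8, ... carry the long padding; every other value is a short valueofstringN). generateStrings() and timeRuns() are hypothetical names. The default of 4 padding repeats matches those examples (218 characters for key0) rather than the 400-character figure, so adjust it to taste.

```php
<?php
// Padding text copied from the example files in this post.
const PADDING = 'testinglongerstringsinthisfileforgoodiegoodielength';

// Build $n key => value pairs; every 4th value (index 0, 4, 8, ...)
// gets extra text so file size and parse time look more realistic.
function generateStrings(int $n, int $repeats = 4): array
{
    $strings = [];
    for ($i = 0; $i < $n; $i++) {
        $value = 'valueofstring' . $i;
        if ($i % 4 === 0) {
            $value .= str_repeat(PADDING, $repeats);
        }
        $strings['key' . $i] = $value;
    }
    return $strings;
}

// Run $fn $runs times and return each wall-clock duration in seconds.
function timeRuns(callable $fn, int $runs = 5): array
{
    $times = [];
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        $fn();
        $times[] = microtime(true) - $start;
    }
    return $times;
}

$data = generateStrings(50000);
printf("%d strings, key0 is %d chars\n", count($data), strlen($data['key0']));
```

The same generated array can then be written out as a PHP, INI or XML file, so all four formats are timed against identical content.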
=== Native PHP Arrays ===
This test used a file containing native PHP arrays.
Example: $msg['key'] = "value";
<pre>
<?php
$msg['keyname0'] = "valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength";
$msg['keyname1'] = "valueofstring1";
$msg['keyname2'] = "valueofstring2";
$msg['keyname3'] = "valueofstring3";
?>
</pre>
50,000 strings: .69, .68, .64, .59, .64, .60 sec
10,000 strings: .23, .23, .23, .21, .22 sec
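A minimal sketch of the native-array approach: write a strings file in the $msg['key'] = "value"; style shown above, then include it. writePhpStringsFile() and loadPhpStrings() are hypothetical helpers, and the temporary file is only there to keep the example self-contained. Since every include pulls in the whole array, load time presumably grows with file size, which fits the numbers above.

```php
<?php
// Write a strings file in the same $msg['key'] = "value"; style used above.
function writePhpStringsFile(string $path, array $strings): void
{
    $lines = ['<?php'];
    foreach ($strings as $key => $value) {
        // var_export() produces a safely quoted PHP string literal.
        $lines[] = '$msg[' . var_export($key, true) . '] = ' . var_export($value, true) . ';';
    }
    file_put_contents($path, implode("\n", $lines) . "\n");
}

// Include the file inside a function so $msg stays local, then return it.
function loadPhpStrings(string $path): array
{
    $msg = [];
    include $path;
    return $msg;
}

$path = tempnam(sys_get_temp_dir(), 'msg');
writePhpStringsFile($path, ['welcomeMsg' => 'Hi There', 'keyname1' => 'valueofstring1']);
$msg = loadPhpStrings($path);
echo $msg['welcomeMsg'], "\n";
unlink($path);
```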
=== XML File Format ===
Tested using an XML file parsed with PHP 5's SimpleXML methods for accessing strings.
Example: <string key="key">value</string>
<?xml version="1.0" encoding="UTF-8" ?>
<strings>
  <string key="key0">valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength</string>
  <string key="key1">valueofstring1</string>
  <string key="key2">valueofstring2</string>
  <string key="key3">valueofstring3</string>
</strings>
50,000 strings: .32, .34, .37, .41, .38 sec
10,000 strings: .14, .15, .15, .07, .15 sec
1,000 strings: .01, .01, .01, .01, .01 sec
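Mgccl asks in the comments how to actually find a key once the file is loaded with SimpleXML. Here's a minimal sketch of two options, using the key attribute from the example above: an XPath query per key (as speps suggests) or a single pass that builds a plain PHP array. Both helper names are hypothetical.

```php
<?php
// A tiny strings file in the format shown above.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<strings>
  <string key="key0">valueofstring0</string>
  <string key="welcomeMsg">Hi There</string>
</strings>
XML;

$doc = simplexml_load_string($xml);

// Option 1: an XPath query for a single key. Keys here are plain
// identifiers; a key containing a double quote would need escaping.
function xmlStringByXpath(SimpleXMLElement $doc, string $key): ?string
{
    $hits = $doc->xpath('/strings/string[@key="' . $key . '"]');
    return $hits ? (string) $hits[0] : null;
}

// Option 2: walk the document once and build a key => value array.
function xmlStringsToArray(SimpleXMLElement $doc): array
{
    $out = [];
    foreach ($doc->string as $node) {
        $out[(string) $node['key']] = (string) $node;
    }
    return $out;
}

echo xmlStringByXpath($doc, 'welcomeMsg'), "\n";
```

If a page needs many strings, building the array once is likely cheaper than running one XPath query per key.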
=== INI FILE TYPE ===
Parsing .ini files using PHP's parse_ini_file() function.
[php]
key0 ="valueofstring0testinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielengthtestinglongerstringsinthisfileforgoodiegoodielength"
key1 ="valueofstring1"
key2 ="valueofstring2"
key3 ="valueofstring3"
[/php]
50,000 strings: .68, .65, .63, .60, .65 sec
10,000 strings: .21, .22, .22, .18, .21 sec
1,000 strings: .02, .02, .02, .02, .02 sec
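A minimal sketch of the INI approach with parse_ini_file(). The temporary file and its contents are made up for the example. The last two entries show a gotcha worth knowing before sending these files to translators.

```php
<?php
// Write a small strings file in the key = "value" format shown above.
$path = tempnam(sys_get_temp_dir(), 'ini');
file_put_contents($path, implode("\n", [
    'welcomeMsg = "Hi There"',
    'key1 = "valueofstring1"',
    'welcomeUser = "Welcome, %USERNAME%"',
    'answerQuoted = "no"',
    // Unquoted: "no" is a reserved word and comes back as an empty string.
    'answerBare = no',
]) . "\n");

// parse_ini_file() returns the whole file as a key => value array,
// or false if the file cannot be parsed.
$strings = parse_ini_file($path);
if ($strings === false) {
    throw new RuntimeException("Could not parse $path");
}
echo $strings['welcomeUser'], "\n";
unlink($path);
```

Per the PHP manual, unquoted values null, off, no and false become an empty string (on, yes and true become "1"), and characters such as ?{}|&~!()^" have special meaning, so keeping every translated value in double quotes is the safer habit.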
=== SQLite ===
Wez Furlong gave me a little tip about SQLite in PHP 5, and boy was he right: with 50,000 strings stored in SQLite, picking out the strings you needed took a teeeeeny tiny .07 seconds!
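A minimal sketch of the SQLite approach. It uses PDO's SQLite driver as a current equivalent (the PHP 5-era sqlite_* functions were moved out of core in PHP 5.4) and an in-memory database so it's self-contained. The table and column names and the getStrings() wrapper are hypothetical. Note what the query does: it fetches only the keys a page needs through the primary-key index, while the other three methods load every string in the file. That difference probably accounts for much of the .07-second result.

```php
<?php
// In-memory database so the sketch is self-contained. A real setup would
// open one file per language, e.g. new PDO('sqlite:/path/to/strings_ja.db')
// (hypothetical path).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE strings (string_key TEXT PRIMARY KEY, string_value TEXT NOT NULL)');

// Bulk-load 50,000 test strings in a single transaction (much faster
// than committing each INSERT separately).
$insert = $db->prepare('INSERT INTO strings (string_key, string_value) VALUES (?, ?)');
$db->beginTransaction();
for ($i = 0; $i < 50000; $i++) {
    $insert->execute(['key' . $i, 'valueofstring' . $i]);
}
$insert->execute(['welcomeUser', 'Welcome, %USERNAME%']);
$db->commit();

// Hypothetical wrapper: fetch only the keys a page needs. The PRIMARY KEY
// index lets SQLite look these rows up without scanning the whole table.
function getStrings(PDO $db, array $keys): array
{
    if ($keys === []) {
        return [];
    }
    $placeholders = implode(',', array_fill(0, count($keys), '?'));
    $stmt = $db->prepare(
        "SELECT string_key, string_value FROM strings WHERE string_key IN ($placeholders)"
    );
    $stmt->execute(array_values($keys));
    // FETCH_KEY_PAIR turns two-column rows into a key => value array.
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

$page = getStrings($db, ['welcomeUser', 'key42']);
echo $page['welcomeUser'], "\n";
```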
timvw says:
November 3, 2005 @ 00:31
Have you had a look at http://www.php.net/gettext?
Jim says:
November 4, 2005 @ 01:06
Yep, but we dismissed it after a few trials. It wasn't working for our needs, so we needed a custom solution to work with our translation teams overseas.
Max Barel says:
July 23, 2006 @ 15:40
If this thread is still alive, can you elaborate on why gettext didn't fill your needs? I plan to use it and don't want to miss the point.
Amit Gupta says:
November 6, 2005 @ 10:23
How about MySQL for storing Unicode characters? MySQL does support Unicode characters using the Unicode collation as well as several language-specific collations!! PS: which syntax highlighting plugin are you using? It's broken in several places; if you just take a look, you'll know!!
Elliot Anderson says:
November 13, 2005 @ 08:08
"The seemingly innocent thumbs-up gesture is offensive in Australia" ... I'm an Aussie and I have never heard that before.
Shawn says:
November 22, 2005 @ 08:33
Great article. May just have to implement this in one of my big projects. '"The seemingly innocent thumbs-up gesture is offensive in Australia" ... I'm an Aussie and I have never heard that before.' .. Ditto
Jim says:
November 22, 2005 @ 19:38
Really, you guys haven't heard of that one? Maybe they got the country wrong then.
Anonymous says:
February 13, 2006 @ 15:22
There are some custom scripts you may use for UTF-8 encoding. See http://cms.naczasie.pl/lgpl/UTF-8.php, find DOWNLOAD and click :)
Ben says:
April 14, 2006 @ 23:55
Hi, informative article, thanks. Just thought I'd point out what appears to be a mistake, for the benefit of future googlers. In your second code example, you have 'echo getString("welcomeUser");' but I think you mean 'echo $value;'.
Mgccl says:
June 19, 2007 @ 23:13
Just how can you access those strings in XML? I mean, once you convert the XML file into SimpleXML objects, what would you do to find the key and return the string?
speps says:
July 1, 2007 @ 15:49
@Mgccl: see the PHP documentation for SimpleXMLElement->xpath(), and maybe you will need to find docs about XPath too (Google or Wikipedia).