PHP Cool RegEx Tricks Tutorial by Jim Plush  

I came across an article on Advanced Regular Expressions by George Schlossnagle in an April 2004 edition of PHP Architect this morning. It has been hiding in a drawer for a year. The article is outstanding, and two of the features I could most use in everyday life are Pattern Naming and Commenting your regex. I'm assuming you already have a good understanding of how preg_match works.

I know a lot of the time I'll come across a regex that looks like:

^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$

This is a mild example, but honestly it will take you a nice chunk of time to decode what it is actually looking for. Then you have to count the parentheses to figure out which array index each value you need is going to be in.
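To make the index-counting problem concrete, here is a minimal sketch that runs the expression above against a sample amount. The input '1,234.56' is a value I made up for illustration, and the labels in the output are my own reading of what each group holds.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$amount = '1,234.56';
$regex  = '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/';

if (preg_match($regex, $amount, $matches)) {
    // Groups are numbered by the position of their opening parenthesis.
    echo "Whole match:     {$matches[0]}\n"; // 1,234.56
    echo "Integer part:    {$matches[1]}\n"; // 1,234
    echo "Last ,ddd group: {$matches[2]}\n"; // ,234
    echo "Decimal part:    {$matches[3]}\n"; // .56
} else {
    echo "NO STRING MATCH!\n";
}
```

Nothing in the pattern tells you that index 3 is the decimal part. You only find out by counting parentheses, which is exactly the problem the rest of this tutorial tries to solve.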

What this tutorial will cover:

1. How to create a simple RegEx match

2. How to add pattern names to your regular expressions

3. How to add comments to your regular expressions

PART 1 EASY REGEX

Let's take the string below and use it in all the examples throughout the tutorial. This will make it easier to get the hang of everything.


$string = "My blog is located at http://www.litfuel.net/plush/";


Now let's create a very simple Regular Expression check that gives me the whole string in one variable.


SECTION 1 - Simple RegEx Example

// Capture any run of characters (anything except a newline) in group 1
$regex = "#(.*)#";
if(preg_match($regex, $string, $matches))
{
     // <pre> only keeps var_dump's line breaks readable in a browser
     echo "<pre>";
     var_dump($matches);
} else {
     echo 'NO STRING MATCH!';
}


That is going to output:

array(2) {
  [0]=>
  string(51) "My blog is located at http://www.litfuel.net/plush/"
  [1]=>
  string(51) "My blog is located at http://www.litfuel.net/plush/"
}

Now you can see that $matches[1] contains my entire string. Imagine if we had 20 pattern matches we wanted to display. Having to remember every single index number would drive you batty, for example that $matches[14] contains a phone number.

The answer to this problem is naming your patterns using "?P<>".



Now let's take a look at how to make our life easier by NAMING the pattern we want to capture.


SECTION 2 - Naming our Pattern

// Now let's give our pattern the name "string" so we can reference it by key, like an associative array
$regex = "#(?P<string>.*)#";
if(preg_match($regex, $string, $matches))
{
     echo "My String is: {$matches['string']}";
} else {
     echo 'NO STRING MATCH!';
}


That is going to output:

My String is: My blog is located at http://www.litfuel.net/plush/

As you can see, I referred to the variable I wanted to output as $matches['string']. You can start to see how handy that could become.
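One detail worth knowing: naming a group does not take its number away. Here is a small sketch, using the same string and pattern as above, that lists the keys preg_match fills in. The variable $keys is just a helper name I picked for the example.

```php
<?php
$string = "My blog is located at http://www.litfuel.net/plush/";

preg_match("#(?P<string>.*)#", $string, $matches);

// A named group is stored twice: once under its name, once under its number.
$keys = array_keys($matches);
print_r($keys); // the keys are 0, "string" and 1

// Both entries hold the same captured text.
var_dump($matches['string'] === $matches[1]); // bool(true)
```

So old code that still reads $matches[1] keeps working after you add names to a pattern.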




Let's take a more practical example: extracting the domain from my string using a named regex pattern. This is a really simple domain example. You wouldn't want to use this in production, because I'm assuming the URL always starts with http://www. and has no period anywhere after the domain (a path like /index.html would end up inside the match). I just tried to make it simple to understand.


SECTION 3 - Extracting our domain with a named RegEx Pattern

$regex = "#http://www\.(?P<domain>.*\..{2,3})#";
if(preg_match($regex, $string, $matches))
{
     echo "My Domain is: {$matches['domain']}";
} else {
     echo 'NO STRING MATCH!';
}


That is going to output:

My Domain is: litfuel.net

Again, once you start to have multiple patterns this comes in handy. Or imagine you're using someone else's regular expression in your code: when you var_dump the matches, you have to guess what each variable is supposed to be based on the pattern. With named patterns, people can quickly see what you are trying to do.
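Here is a rough sketch of what several named patterns in one expression could look like, using the same string. The group names scheme, host and path, and the pattern itself, are my own choices for illustration. Like the domain example, this is not a production-grade URL matcher.

```php
<?php
$string = "My blog is located at http://www.litfuel.net/plush/";

// Three hypothetical group names: scheme, host and path.
$regex = '#(?P<scheme>https?)://(?P<host>[^/\s]+)(?P<path>/\S*)?#';

if (preg_match($regex, $string, $matches)) {
    // The path group is optional, so fall back to an empty string if it is missing.
    $path = $matches['path'] ?? '';

    echo "Scheme: {$matches['scheme']}\n"; // http
    echo "Host:   {$matches['host']}\n";   // www.litfuel.net
    echo "Path:   {$path}\n";              // /plush/
} else {
    echo "NO STRING MATCH!\n";
}
```

Anyone reading the echo lines can tell what each piece is without going back to count parentheses in the pattern.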




That example was cool, but you probably had to study it for a minute. What if I could break it up across several lines and comment each line so you could understand my regex pattern quickly? Let's use the "x" modifier, which tells the regex engine to ignore whitespace in the pattern and lets me add comments.


SECTION 4 - Commenting Patterns
$regex = '!
  http://www      # we are looking for a pattern starting with http://www in our string
  \.              # followed by a period that we are escaping with a backslash
  (?P<domain>     # now we open a pattern that we are calling "domain"
    .+            # any character, 1 or more times
    \.            # followed by a period
    .{2,3}        # followed by any character 2 to 3 times
  )               # end of the "domain" pattern
  !x';

if(preg_match($regex, $string, $matches))
{
     echo "My Domain is: {$matches['domain']}";
} else {
     echo 'NO STRING MATCH!';
}


That is going to output:

My Domain is: litfuel.net

Again the same string prints out, but this one is obviously much easier to understand. You should also notice I changed my pattern delimiter from # to ! because, with the x modifier, the # character starts a comment.
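To tie both tricks together, here is one possible way to rewrite the hard-to-read expression from the start of this tutorial with names and comments. The group names whole, thousands and cents are made up for this sketch, and '1,234.56' is a sample value of my own. I use / as the delimiter here, so the comments must not contain a / character.

```php
<?php
// Group names (whole, thousands, cents) are hypothetical, picked for readability.
$regex = '/
    ^
    (?P<whole>
        [0-9]+                     # a plain run of digits, e.g. 1234
      |                            # or
        [0-9]{1,3}                 # one to three leading digits
        (?P<thousands>,[0-9]{3})*  # then any number of ",123" groups
    )
    (?P<cents>\.[0-9]{1,2})?       # optional decimal point and 1 or 2 digits
    $
    /x';

if (preg_match($regex, '1,234.56', $matches)) {
    echo "Whole: {$matches['whole']}\n"; // 1,234
    echo "Cents: {$matches['cents']}\n"; // .56
} else {
    echo "NO STRING MATCH!\n";
}
```

It is the same expression as before, only spread out, so it should match exactly what the one-line version matches.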

I hope that you have learned something about naming and commenting your regular expressions from this tutorial. If you have any comments you can reach me at: jiminoc (At) gmail.com