Regular Expressions

First off, regular expressions are great.

They are a handy, quick way to validate or parse data, and you can use them in almost every language. But of all things, I keep forgetting neat regexes, and in fairness they are a pain to recall because the syntax is plain nutty.

This is where that ends. I am going to collect all the neat regexes on this blog as I come across them rather than rely on mother Google.

I'll start with a couple of simple functions I use to validate URLs and email addresses (thanks WordPress core!).

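function is_url( $url ) {
	// Scheme, dotted host name, optional port; the pattern is not anchored at the end,
	// so anything after the host (path, query string) is not checked
	return preg_match( '/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i', $url ) === 1;
}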
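
Since that pattern already captures the scheme, host and port, it can parse a URL as well as validate it. Here is a minimal sketch of the idea, assuming a hypothetical helper called url_parts (my name for illustration, not something from WordPress or PHP) and swapping the numbered groups for named ones:

```php
<?php
// Hypothetical helper: reuses the is_url() pattern, but with named groups
// so the pieces can be read back out of $m by name.
function url_parts( $url ) {
	$pattern = '/^(?P<scheme>https?|ftp):\/\/(?P<host>[A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(?P<port>\d+)?\/?/i';

	if ( preg_match( $pattern, $url, $m ) !== 1 ) {
		return false;
	}

	return array(
		'scheme' => strtolower( $m['scheme'] ),
		'host'   => $m['host'],
		// The port group is optional, so it may be missing or empty
		'port'   => ( isset( $m['port'] ) && $m['port'] !== '' ) ? (int) $m['port'] : null,
	);
}

var_dump( url_parts( 'http://example.com:8080/path' ) ); // array with scheme, host and port
var_dump( url_parts( 'not a url' ) ); // bool(false)
```

Like is_url, this only looks at the start of the string, so the path and query string are ignored rather than validated.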

function is_email( $email ) {
	// Test for the minimum length the email can be
	if ( strlen( $email ) < 3 ) {
		return false;
	}

	// Test for an @ character after the first position
	if ( strpos( $email, '@', 1 ) === false ) {
		return false;
	}

	// Split out the local and domain parts
	list( $local, $domain ) = explode( '@', $email, 2 );

	// LOCAL PART
	// Test for invalid characters
	if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {
		return false;
	}

	// DOMAIN PART
	// Test for sequences of periods
	if ( preg_match( '/\.{2,}/', $domain ) ) {
		return false;
	}

	// Test for leading and trailing periods and whitespace
	if ( trim( $domain, " \t\n\r\0\x0B." ) !== $domain ) {
		return false;
	}

	// Split the domain into subs
	$subs = explode( '.', $domain );

	// Assume the domain will have at least two subs
	if ( 2 > count( $subs ) ) {
		return false;
	}

	// Loop through each sub
	foreach ( $subs as $sub ) {
		// Test for leading and trailing hyphens and whitespace
		if ( trim( $sub, " \t\n\r\0\x0B-" ) !== $sub ) {
			return false;
		}

		// Test for invalid characters
		if ( !preg_match('/^[a-z0-9-]+$/i', $sub ) ) {
			return false;
		}
	}

	// Congratulations your email made it!
	return true;
}

Lastly and most recently, I wanted to parse a string of code to find an assignment value.
I came across a neat regex to help me parse out the values of these variables.

$code = "var string_variable = Superduper;
var digit_variable = 123456;";

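function get_assignment_value( $needle, $haystack, $type = 'string' ) {
	// Digits only for the 'digit' type, any word characters otherwise
	$value = ( $type === 'digit' ) ? '\d+' : '\w+';

	// Escape the variable name and make sure it is not the tail end of a longer name
	$pattern = '/(?<!\w)' . preg_quote( $needle, '/' ) . ' = (?P<value>' . $value . ')/';

	// Check the match itself rather than empty(), which would reject a value of "0"
	if ( preg_match( $pattern, $haystack, $matches ) !== 1 ) {
		return false;
	}

	if ( $type === 'digit' ) {
		return (int) $matches['value'];
	}

	return $matches['value'];
}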

var_dump( get_assignment_value( 'string_variable', $code ) ); // string(10) "Superduper"
var_dump( get_assignment_value( 'digit_variable', $code, 'digit' ) ); // int(123456) 
var_dump( get_assignment_value( 'digit_variable', $code  ) ); // string(6) "123456" 
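
If you want every assignment in one go rather than one lookup at a time, the same named-group trick works with preg_match_all. This is only a sketch: get_all_assignments is a hypothetical name, and it assumes each assignment looks like "var name = value" with a value made of word characters, as in the sample above.

```php
<?php
// Hypothetical helper: collects every "var name = value" pair in one pass.
function get_all_assignments( $haystack ) {
	preg_match_all(
		'/\bvar\s+(?P<name>\w+)\s*=\s*(?P<value>\w+)/',
		$haystack,
		$matches,
		PREG_SET_ORDER // one array per match, each holding the named groups
	);

	$assignments = array();
	foreach ( $matches as $match ) {
		// Cast purely numeric values to int, mirroring the 'digit' type above
		$is_digits = preg_match( '/\A\d+\z/', $match['value'] ) === 1;
		$assignments[ $match['name'] ] = $is_digits ? (int) $match['value'] : $match['value'];
	}

	return $assignments;
}

$code = "var string_variable = Superduper;
var digit_variable = 123456;";

var_dump( get_all_assignments( $code ) );
```

The result is an array keyed by variable name, so a single regex pass replaces repeated calls with different needles.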

My main source of guidance on this voodoo is here, along with their sometimes hard-to-find but useful reference.

Also, here is a good starter on building a regex.

I know regular expressions
