PHP: Unserialize a string after non-UTF8 characters are stripped out

This may be a pretty rare problem, but I’ll post it regardless.

I have a serialized string containing non-UTF8 characters, which I strip out using some code I posted a while back. The problem is that the string lengths recorded in the serialized data are now wrong. serialize() stores the length of each string in bytes, and some multibyte characters have been replaced with a single-byte ? character, so the recorded lengths no longer match the data.

When you then try to unserialize this data, unserialize() fails and returns false instead of your data.
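To make the failure concrete, here is a minimal sketch. It is an illustration under assumptions, not the stripping code from my earlier post: the array, the "café" value and the str_replace() call that stands in for the stripping step (a two-byte sequence becoming a one-byte ?) are all made up for the example.

```php
<?php
// Build the UTF-8 string explicitly so the example does not depend on the
// encoding this file is saved in: "\u{e9}" is "é", two bytes in UTF-8.
$name = "caf\u{e9}";

$serialized = serialize(['name' => $name]);
echo $serialized, PHP_EOL;
// a:1:{s:4:"name";s:5:"café";}   (5 is a byte count, not a character count)

// Stand-in for the stripping step: the 2-byte "é" becomes a 1-byte "?".
$stripped = str_replace("\u{e9}", '?', $serialized);
echo $stripped, PHP_EOL;
// a:1:{s:4:"name";s:5:"caf?";}   (still claims 5 bytes, only 4 remain)

// The length prefix no longer matches the data, so unserialize() fails.
// The @ only hides the "Error at offset" notice/warning PHP emits.
$result = @unserialize($stripped);
var_dump($result); // bool(false)
```

The only thing that changed between the two strings is one byte of string content; the s:5 prefix is what breaks the parse.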

I came across a clever fix for this on Stack Overflow. It works out the new length of each string and rewrites the length prefix in the serialized data using a regex with preg_replace().

My solution is very similar, except that it uses preg_replace_callback(), because the /e modifier used in that answer was deprecated in PHP 5.5 and removed in PHP 7.0.

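function mb_unserialize($string) {
    // Recalculate the length of every serialized string (s:N:"...";) so it
    // matches the bytes actually present, then unserialize the result.
    // Note: a string value that itself contains "; will confuse the lazy match.
    $string = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function ($matches) {
            // strlen() counts bytes, which is what serialize() records.
            return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";';
        },
        $string
    );
    return unserialize($string);
}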
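A quick usage sketch follows. The function is repeated (slightly simplified) so the example runs on its own, and the serialized input is a hypothetical example where "é" was already replaced by ?, leaving a stale s:5 prefix.

```php
<?php
// mb_unserialize() as above, repeated so this example is self-contained.
function mb_unserialize($string) {
    $string = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function ($matches) {
            // Rewrite the length prefix using the real byte length.
            return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";';
        },
        $string
    );
    return unserialize($string);
}

// Hypothetical damaged data: "caf?" is 4 bytes but is still declared as 5.
$stripped = 'a:2:{s:4:"name";s:5:"caf?";s:4:"city";s:5:"Paris";}';

$plain = @unserialize($stripped); // false: the s:5 prefix is wrong
$data  = mb_unserialize($stripped);

echo $data['name'], PHP_EOL; // caf?
echo $data['city'], PHP_EOL; // Paris
```

Strings whose prefixes were already correct (name, city, Paris) are rewritten with the same length, so only the damaged entry actually changes.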
