tohokuaikiのチラシの裏 (tohokuaiki's scratch paper)

Technical notes and the like.

Regular expressions are slow

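With PHP's PCRE regular expression functions, it seems that in terms of cost
complex regular expression >>>> simple regular expression
so if you can't decide between writing one complex regular expression and simply looping over simple ones with foreach, foreach may well be the better choice.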

<?php
// 52 two-letter patterns: 26 uppercase pairs and 26 lowercase pairs.
$str = array("AB","BC","CD","DE","EF","FG","GH","HI","IJ","JK","KL",
             "LM","MN","NO","OP","PQ","QR","RS","ST","TU","UV",
             "VW","WX","XY","YZ","ZA",
             "ab","bc","cd","de","ef","fg","gh","hi","ij","jk","kl","lm",
             "mn","no","op","pq","qr","rs","st","tu","uv","vw","wx","xy","yz","za");
$text = <<<EOF
GNU GENERAL PUBLIC LICENSE
........ (lots and lots of text goes here)
EOF;

$regexp = implode('|', $str);

// Case 1: 52 simple preg_match_all calls per iteration.
$start = microtime(true);
for ($i = 0; $i < 200; $i++) {
    foreach ($str as $s) {
        preg_match_all("/" . $s . "/", $text, $m);
    }
}
var_dump(microtime(true) - $start);

// Case 2: a single preg_match_all call with a 52-way alternation per iteration.
$start = microtime(true);
for ($i = 0; $i < 200; $i++) {
    preg_match_all("/(" . $regexp . ")/", $text, $m);
}
var_dump(microtime(true) - $start);


Well, as expected, the 52-call version took 0.270004 sec, while the single-call version took 3.180048 sec.

Incidentally, PHP has substr_count, so there was no need for regular expressions here in the first place; with that it took 0.01 sec.
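
For reference, here is a rough sketch of what the substr_count version might look like. The helper name count_needles, the needle list, and the short sample text are made up for illustration and are not from the benchmark above, which ran against a much longer text. No timing is claimed for this snippet.

```php
<?php
// Hypothetical helper (not part of the original benchmark): counts each
// fixed string with substr_count, which does a plain substring search
// and never touches the regex engine.
function count_needles(array $needles, string $text): array
{
    $counts = array();
    foreach ($needles as $needle) {
        // substr_count is case-sensitive and counts non-overlapping occurrences.
        $counts[$needle] = substr_count($text, $needle);
    }
    return $counts;
}

// Short stand-in sample; swap in the real text to compare timings.
$sample = "GNU GENERAL PUBLIC LICENSE";
$counts = count_needles(array("GE", "NE", "LI", "xy"), $sample);
var_dump($counts);
```

This only works because every pattern here is a fixed string. Once a pattern needs character classes, anchors, or repetition, substr_count no longer applies and it is back to the preg functions.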