Regular expressions are a very powerful tool for searching and processing text, and even in their simplest form can make many seemingly complex tasks very easy.
Fortunately, Perl successfully integrates the power of regular expressions. Its flexible pattern matching ability gives it a great advantage over other languages (such as C) in the CGI world. For example, one statement is normally enough for Perl to process one string, while C would probably require many statements to implement the same task.
One example is converting every + (plus) sign in a string to a space. (Recall that input data are encoded before they are sent to the CGI program, and each space is converted into a plus sign. After receiving the input data, we need to convert each plus sign back to the original space. This is usually the first step after receiving input data from the client.) In Perl, a single statement:
$str =~ tr/+/ /;
can do it quite well, while in C we need a subroutine:
void ADDToSpace(char *str)
{
    register int i;
    /* walk the string and overwrite each '+' with a space */
    for (i = 0; str[i]; i++)
        if (str[i] == '+')
            str[i] = ' ';
}
/pattern/
The following are the commonly used patterns:
/pattern/ | Description |
x? | match zero or one character 'x' |
x* | match zero or more characters 'x' |
.* | match zero or more of any character (except newline) |
x+ | match one or more characters 'x' |
.+ | match one or more of any character (except newline) |
{m} | match exactly m occurrences of the preceding item, e.g. x{3} matches "xxx" |
[] | match characters included in [] |
[^] | match characters not in [] |
[0-9] | match any digit from '0' to '9' |
[a-z] | match any character from 'a' to 'z' |
[^0-9] | match any character that is not a digit from '0' to '9' |
[^a-z] | match any character that is not a letter from 'a' to 'z' |
^ | match at the beginning of the string |
$ | match at the end of the string (or just before a trailing newline) |
\d | same as [0-9] |
\d+ | match one or more digits, same as [0-9]+ |
\D | same as [^0-9] |
\D+ | same as [^0-9]+ |
\w | match one word character (letter, digit, or underscore), same as [a-zA-Z0-9_] for ASCII text |
\w+ | same as [a-zA-Z0-9_]+ |
\W | match one non-word character, same as [^a-zA-Z0-9_] |
\W+ | match one or more non-word characters, same as [^a-zA-Z0-9_]+ |
\s | match one whitespace character, same as [ \n\t\r\f] for ASCII text |
\s+ | match one or more whitespace characters, same as [ \n\t\r\f]+ |
\S | match one non-whitespace character, same as [^ \n\t\r\f] |
\S+ | match one or more non-whitespace characters, same as [^ \n\t\r\f]+ |
a|b|c | match 'a' or 'b' or 'c' |
abc | match substring "abc" |
(pattern) | () is a very useful operator that remembers the substring it matched. The substring matched by the first () is assigned to $1, the second to $2, and so on. I will give an example later on. |
/pattern/i | match while ignoring the difference between uppercase and lowercase |
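To make a few of the table entries concrete, here is a minimal sketch that tests some patterns against sample strings. The helper name matches and the sample strings are invented for this illustration; they are not part of the original tutorial.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches the compiled pattern, 0 otherwise.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

print matches("color",     qr/colou?r/), "\n";   # 1: x? makes the 'u' optional
print matches("colour",    qr/colou?r/), "\n";   # 1
print matches("abc123",    qr/^\d+$/),   "\n";   # 0: not digits from start to end
print matches("12345",     qr/^\d{5}$/), "\n";   # 1: exactly five digits
print matches("PERL",      qr/perl/i),   "\n";   # 1: /i ignores case
print matches("tab\there", qr/\s/),      "\n";   # 1: \s matches the tab character
```

The patterns are passed as qr// objects so that each one is written exactly as it would appear between the slashes of a match.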
Example | Description |
/perl/ | match a string that contains the substring "perl" |
/^perl/ | match a string that starts with "perl" |
/perl$/ | match a string that ends with "perl" |
/c|g|i/ | match a string that contains 'c' or 'g' or 'i' |
/cg{2,4}i/ | match a string that contains 'c' followed by 2 to 4 'g' characters, followed by 'i' |
/cg*i/ | match a string that contains 'c' followed by zero or more 'g' characters, followed by 'i' |
/c..i/ | match a string that contains 'c' followed by any two characters, followed by 'i' |
/[cgi]/ | match a string that contains 'c' or 'g' or 'i' |
/\d/ | match a string that contains at least one digit |
/\W/ | match a string that contains at least one non-alphanumeric (non-word) character |
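The examples in this table can be tried against a list of words. The following is only a sketch: the helper matching_words and the word list are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that match the given pattern.
sub matching_words {
    my ($pattern, @words) = @_;
    return grep { $_ =~ $pattern } @words;
}

# "cgggggi" has five 'g' characters, one more than {2,4} allows.
my @words = ("perl", "cgi-perl", "perlish", "cggi", "cgggggi", "java");

print join(",", matching_words(qr/^perl/,    @words)), "\n";   # perl,perlish
print join(",", matching_words(qr/perl$/,    @words)), "\n";   # perl,cgi-perl
print join(",", matching_words(qr/cg{2,4}i/, @words)), "\n";   # cggi
```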
print "Please input a string: \n";
$string = <STDIN>;    # accept an input string from standard input
chomp($string);       # built-in function that removes the trailing newline
if ($string =~ /cgi/) {
    print "The input string includes the substring cgi!\n";
} else {
    print "The input string does not include the substring cgi!\n";
}
$string = "chmod 711 cgi";
$string =~ /(\w+)\s+(\d+)/;
The (\w+) matches one or more word characters, and the matching substring is assigned to the variable $1. The \s+ matches one or more whitespace characters. The (\d+) matches one or more digits, and $2 gets the matching result. So now $1="chmod" and $2="711". Note that () is the important operator listed in the table above.
$_ = "chmod 711 cgi";
/(\w+)\s+(\d+)/;
We get the same result as in the previous example. Note that if we do not specify a target string, the default variable $_ is used.
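When a match fails, $1 and $2 are not reset and still hold whatever an earlier successful match left in them, so it is safer to check that the match succeeded before using the captures. One possible way to do that is sketched below; the helper name parse_command is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the command and the mode out of a line such as
# "chmod 711 cgi". Returns an empty list when the line does not match.
sub parse_command {
    my ($line) = @_;
    # In list context a match returns the captured groups ($1, $2, ...),
    # or an empty list when the match fails.
    if (my ($cmd, $mode) = $line =~ /(\w+)\s+(\d+)/) {
        return ($cmd, $mode);
    }
    return;
}

my ($cmd, $mode) = parse_command("chmod 711 cgi");
print "$cmd $mode\n";    # prints "chmod 711"
```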
$string = "chmod 711 cgi";
@list = split(/\s+/, $string);    # split the string on whitespace
Now we get:
@list=("chmod","711","cgi");
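The same split function is handy for CGI input. As a rough sketch, assuming the input arrives as name=value pairs joined by & (the sample string and the helper name parse_pairs are invented here), the pairs could be separated like this:

```perl
use strict;
use warnings;

# Hypothetical helper: break "name=value&name=value" into a hash.
sub parse_pairs {
    my ($query) = @_;
    my %form;
    foreach my $pair (split /&/, $query) {
        # The limit of 2 keeps any further '=' inside the value.
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;    # convert each plus sign back to a space
        $form{$name} = $value;
    }
    return %form;
}

my %form = parse_pairs("lang=Perl&topic=regular+expressions");
print "$form{topic}\n";    # prints "regular expressions"
```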
tr/SEARCHLIST/REPLACELIST/
This translates each character in SEARCHLIST to the corresponding character in REPLACELIST. Here are two examples:
$string = "testing";
$string =~ tr/et/ET/;     # now $string="TEsTing"
$string =~ tr/a-z/A-Z/;   # now $string="TESTING"

$string = "CGI+Perl";
$string =~ tr/+/ /;       # now $string="CGI Perl"
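The tr operator also returns the number of characters it translated, which can be useful for checking whether anything changed. A small sketch, with a hypothetical helper named plus_to_space:

```perl
use strict;
use warnings;

# Hypothetical helper: convert '+' to ' ' and report how many were changed.
sub plus_to_space {
    my ($text) = @_;
    my $count = ($text =~ tr/+/ /);    # tr returns the number of characters translated
    return ($text, $count);
}

my ($decoded, $n) = plus_to_space("CGI+and+Perl");
print "$decoded ($n replaced)\n";    # prints "CGI and Perl (2 replaced)"
```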
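s/PATTERN/REPLACE/eg
This substitutes the text matching PATTERN with REPLACE, where 'e' and 'g' are optional modifiers: 'g' replaces every match instead of only the first, and 'e' evaluates REPLACE as a Perl expression. For example: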
$string = "i:love:perl";
$string =~ s/:/*/;        # now $string="i*love:perl"
$string =~ s/:/*/;        # now $string="i*love*perl"
$string =~ s/\*/+/g;      # now $string="i+love+perl" (* must be escaped in the pattern)
$string =~ s/\+/ /g;      # now $string="i love perl" (+ must be escaped too)
$string =~ s/perl/cgi/;   # now $string="i love cgi"
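Characters such as * and + have a special meaning inside a pattern, so they must be escaped to match literally. The sketch below contrasts a substitution without 'g' against one that uses \Q...\E to quote a literal string; both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the first '+' (no 'g' modifier).
sub replace_first_plus {
    my ($text) = @_;
    $text =~ s/\+/ /;
    return $text;
}

# Hypothetical helper: replace every literal occurrence of $from with $to.
# \Q...\E quotes any metacharacters in $from so that they match literally.
sub replace_literal {
    my ($text, $from, $to) = @_;
    $text =~ s/\Q$from\E/$to/g;
    return $text;
}

print replace_first_plus("i+love+perl"), "\n";          # prints "i love+perl"
print replace_literal("i*love*perl", "*", "+"), "\n";   # prints "i+love+perl"
```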
$string = "i love perl";
$string =~ s/(love)/<$1>/;   # now $string="i <love> perl"
# Here the first match "love" is assigned to $1
$string = "www22cgi44";
$string =~ s/(\d+)/$1*2/e;   # now $string="www44cgi44"
# The modifier 'e' means that $1*2 is evaluated as an expression instead of being used as a plain string
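Combining 'e' with 'g' applies the expression to every match. The sketch below shows this on the string above, and then applies the same idea to CGI input. The second helper assumes that the input uses standard URL encoding, where a special character is sent as % followed by two hexadecimal digits; that detail is not covered in the text above, and both helper names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: double every number in the string ('e' plus 'g').
sub double_numbers {
    my ($text) = @_;
    $text =~ s/(\d+)/$1*2/eg;
    return $text;
}

# Hypothetical helper: decode a URL-encoded value (assumes %XX hex escapes).
sub url_decode {
    my ($value) = @_;
    $value =~ tr/+/ /;                                # '+' back to a space
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;    # %XX back to the character
    return $value;
}

print double_numbers("www22cgi44"), "\n";   # prints "www44cgi88"
print url_decode("i+love+perl%21"), "\n";   # prints "i love perl!"
```

The plus signs are translated before the %XX escapes are decoded, so an encoded plus sign (%2B) survives as a real '+'.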