7.5 Use Parentheses to Build more Complicated Patterns

Now we change the rules in a way that lets more complicated regular expressions be written.
In the previous section, we built regular expressions from quasichars, anchors, repeaters, and branches. The rules we gave for those regular expressions did not really require that quasichars match only single characters. That just made the rules easier to explain. All that mattered was that a quasichar could be tested to see if it matches a substring beginning at a definite place. A pattern, too, can be tested to see if it matches a substring beginning at a definite place. So, there is no reason not to let quasichars be patterns. Therefore, we do let quasichars be patterns, but we insist that such quasichar patterns be surrounded with parentheses to keep things unambiguous.

Explaining why a quasichar pattern that matches an empty string cannot be followed by a repeater is more difficult. After all, the theory says that the * repeater is idempotent, which should mean that a** is the same as a*. Why then should the practice forbid a** or (a*)*? I have not looked at the code to see why, but I suppose it has something to do with avoiding infinite recursion or an infinite loop. Whatever the reason, theory and practice differ here. However, the divergence is not very consequential.

Now for an example. Consider this,

    x*

which matches zero or more copies of the letter x, and this,

    cat|dog

which matches "cat" or "dog." If we replace the quasichar x with the second pattern in parentheses, we get

    (cat|dog)*

which matches zero or more consecutive substrings, each of which is "cat" or "dog." To be even more concrete,

    regexp "(cat|dog)*" catdogcatbert Match

will return true and set Match to catdogcat.
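The example above can be tried in a Tcl shell. The following is only a minimal sketch: the variable names Str, Found, Match, and Match2 are illustrative, and the second regexp is added here as a contrast to show what happens when the parentheses are left out.

```tcl
# Zero or more consecutive copies of "cat" or "dog".
set Str "catdogcatbert"
set Found [regexp "(cat|dog)*" $Str Match]

puts "Found = $Found"    ;# regexp returns 1 when the pattern matches
puts "Match = $Match"    ;# the matched substring: catdogcat

# Without parentheses the repeater applies only to the quasichar g,
# so the pattern means "cat", or "do" followed by zero or more g's.
regexp "cat|dog*" $Str Match2
puts "Match2 = $Match2"  ;# only the leading cat is matched
```

The parentheses are what make the repeater apply to the whole branch pattern instead of to the single character just before it.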
Here is a short example of the power of parentheses. Recall that the Tcl pattern matcher interprets ^ as an empty string just before the first character of the string you are trying to match. In other words, ^ is not just a control character the way ( is. Instead, ^ is seen as matching something. Now, consider the following,

    set LineBrk_ "\n"
    regexp "(^|$LineBrk_)To:" $Str Match

This will match the first occurrence of "To:" that is immediately preceded by the start of the given string or a break between lines. In other words, it matches the first occurrence of "To:" at the beginning of a line.
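To see this pattern at work, here is a small sketch. The message text stored in Str is made up for illustration, and Found is a hypothetical variable name used only to hold the result of the second test.

```tcl
# A made-up message whose "To:" header is on the second line.
set LineBrk_ "\n"
set Str "From: alice\nTo: bob\nSubject: Reply-To: carol"

# Matches "To:" at the start of the string or right after a line break.
if {[regexp "(^|$LineBrk_)To:" $Str Match]} {
    # Match includes the line break when "To:" is not on the first line.
    puts "Matched [string length $Match] characters"
}

# "Reply-To:" contains "To:", but not at the start of a line.
set Found [regexp "(^|$LineBrk_)To:" "Subject: Reply-To: carol"]
puts "Found = $Found"    ;# 0, because no line begins with "To:"
```

Note that when the match comes from the line-break branch, Match holds the line break as well as "To:", since the parenthesized pattern is part of what was matched.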