Short Review and Recommendation

Rating: :) :) :) :) :) (5 of 5)

If you use regular expressions much, reading "Five Habits for Successful Regular Expressions" will likely help you use them more effectively. Look at the list of five habits in the section below this one. Unless you are sure you understand the implications of these brief suggestions and already have the habit of following them, I recommend reading this article. It's short, and Tony Stubblebine makes his points clearly and concisely.

George Woolley, Oakland.pm

Miscellaneous

The Five Habits

1. Use Whitespace and Comments
2. Write Tests
3. Group the Alternation Operator
4. Use Lazy Quantifiers
5. Use Available Delimiters
Some References
Notes:
Notes:
Some Abbreviations

The following are abbreviations for "regular expression": regex and regexp.
There are also variants of these involving capitals, e.g. RegExp. IMHO is an abbreviation of "in my humble opinion". IMO ("in my opinion") is a variant on this that is less frequently used. This review talks a lot about regexes. The whole review is IMO.

Note:
In Case You Get Bored

Or even:

March Meeting Announcement

March Oakland.pm Meeting
when: Tue. Mar. 9, 7:30-9:30pm. (We meet 2nd Tuesdays.)
where: Joshua Wait's place, 1903 Virginia Street Apt. 3, Berkeley, CA 94709
directions: see links on home page
what:
* introductions
* giveaways
* talk by Tony Stubblebine, "Regular Expression Best Practices"
who: open to anyone interested.
how much: no fee for our meetings.

Blurb for Tony's Talk

Regular expressions are broken and obtuse. They're hard to write. They are even harder to read, especially if you are not the original programmer. Tony Stubblebine, author of "Regular Expression Pocket Reference," will explain how to decrease development time while increasing reliability and readability.

Some Questions To Ponder

This section contains questions to encourage discussion among Oakland.pm members before Tony Stubblebine's March presentation. The questions are:
My Answers
Longer Review

Contents
Note:
About This Review

Warning: The review below is quite long, especially considering it's a review of a short article. For a shorter review, see my short review at the top of this page. Or you could just go read the Tony Stubblebine article.

Intent: My intent in writing this review is
Contents: I look at each of the five habits the author recommends in light of my own experience with regexes. For each habit, I briefly
You may be wondering why I give bad rather than good examples. I'm aware of several reasons:
Hey, there are good examples in the article.

The Promise of the Article

The author says that regular expressions are
They are IMO also the best notation that's widely available for manipulating strings. Oh, well! The author says that if you adopt the five habits he describes, you'll eliminate most of the trial and error involved in writing regexes. Kool!

What The Stubblebine Article Is Not

The Stubblebine article is short and focused. The article is not
Some people believe that regexes are often used where there is a preferable alternative. Could be. In any case, this article does not address the question of when to use a regex and when not to.

About the Reviewer

I come to this review with a number of biases. Below I touch on some of them.

String Manipulation: When I started writing programs way back in the early '60s, one of the first things I noticed was that in the more widely used languages the notation for handling mathematical expressions was reasonably well developed, but the facilities for manipulating strings sucked. I had a strong background in formal logic, and in my view
so it seemed odd that the widely used languages were brain dead regarding strings.

Perl: Up until 1994, I didn't have a favorite language. In 1994, I discovered Perl, and it quickly became my favorite language. Perl is not brain dead regarding strings; rather, it has excellent regular expressions beautifully integrated into the language. And it's widely used. I'm a big advocate of Perl.

The author of the article being reviewed gives examples in PHP, Python and Perl. As I said, I've been using Perl since 1994, with a fair amount of emphasis on regexes. I've played with Python, but haven't used PHP at all. There was one year (just one, alas) when I had a dream job: I wrote code in nothing but Perl. I even got to teach a class in Perl. I was mostly writing filters to process website log files, and I wrote an incredible number of regexes. Most of my use of regexes has been in the context of some shell or Perl running on some flavor of Linux/Unix. I've never used Perl on any flavor of Windows. I'm not an expert on Perl or regular expressions.

O'Reilly and Tony Stubblebine: I own lots of O'Reilly books, and I think both O'Reilly and their books rock. And Tony works for O'Reilly. (Lucky him!) Also, I'm very active in Oakland.pm, and Tony is scheduled to speak to us at our March meeting.

Consequences: Mostly, it's up to you to adjust for my biases. But please note that all the examples are Perl.

1. Use Whitespace and Comments

The Habit: This habit is to use whitespace and indentation when writing regexes, and to comment them too, as you do when writing the rest of your code. You do indent and comment your code. Right? In Perl 5, implementing this habit requires the x modifier, which tells the regex engine to ignore unescaped whitespace outside character classes and to treat # as the start of a comment. Of course, there's more to it than that.

What I've Been Doing: I do, indeed,
However, when it comes to regexes I rarely
The sad truth is that while I do sometimes use the x modifier, it's not my habit to do so. :( Perhaps partly to compensate for this bad habit, I do
New Year's Resolution: Next year, I'll use whitespace and comments to make even moderately complex regexes clearer.

Bad Example: I was once asked to modify a Perl script. The script contained a humongous regex, all on one line, that
Notes:
Silly Bad Example:

    m/ t h e \s r e d ( c a t | d o g ) /

Notes on Silly Example
Notes on Compactness
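For contrast with the silly bad example, here is a minimal sketch of one way the same pattern might look with the x modifier used as intended: whitespace for layout, comments for intent. It is my own illustration, not from the article; the choices of \s+ (rather than a single \s) and of \b word boundaries are assumptions about what the pattern is meant to accept.

```perl
use strict;
use warnings;

# Under /x, unescaped whitespace is ignored and # starts a comment,
# so a space we actually want to match must be written explicitly (\s).
my $pet_re = qr/
    \b the \s+ red \s+     # the phrase "the red", words separated by whitespace
    ( cat | dog )          # capture which animal was mentioned
    \b                     # end at a word boundary, so "cattle" is rejected
/x;

for my $text ('I saw the red cat', 'the red dog barked', 'the red cattle') {
    if ( $text =~ $pet_re ) {
        print "$text => $1\n";
    }
    else {
        print "$text => no match\n";
    }
}
```

The layout costs a few lines, but each piece of the pattern now carries its own explanation.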
2. Write Tests

The Habit: This habit involves
Hmm, what does "test suite" mean here? I think it just means a well-conceived collection of test cases that has been captured for later reuse. (The author has confirmed this understanding.)

What I've Been Doing: I do different things depending especially on
If there is a large amount of data, I may go through a separate phase where I research the data. "Know your data" is a dictum that I take seriously. This phase overlaps with, but isn't the same as, collecting sample test data. When I was writing log filters, I used Unix shell commands extensively in this research, especially grep, cut, sort and uniq.

Typically, my test cases would be in two files:
I also made sure that my filters (and changes to them) were tested by someone else after I finished my testing. Filters were not released until both I and the external tester were satisfied.

New Year's Resolution: I resolve to keep this habit.

Aside: I've found the module Test::More easy to understand and quite useful. But you may prefer Test::Simple, which I gather is simpler. You can find out more about these modules in the Perl documentation. For example, you could type the following at a Unix/Linux command line:

    perldoc Test::Simple

Bad Example: Someone modified one of my filters
The filter malfunctioned because of the changes, and considerable effort was wasted in the resulting confusion.

3. Group the Alternation Operator

The Habit: This habit consists of grouping alternatives using parentheses.

What I've Been Doing: I have this habit. I generally do this even if the expression consists of nothing but the alternatives. For example, I'm likely to write something like

    m/(gif|jpg|jpeg|png)/

even though the parentheses are unnecessary.

New Year's Resolution: I resolve to keep this habit and combine it with the other habits, resulting in regexes that look more like this:

    m/ ( gif | jpeg | jpg | png ) /x

Notes:
    m/ ( gif    # image file extensions
       | jpeg
       | jpg
       | png
       ) /x

Aside: Beware of accidentally creating a null (empty) alternative. Debugging this error could be quite annoying. The problem is that the null alternative matches the empty string, so the regex as a whole will always match.

Bad Example:

    m/\.gif|png|jpeg|jpg|/

Notes:
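To see the null-alternative problem from the bad example concretely, here is a small sketch that runs the bad regex and a grouped, anchored version over a few file names. The file names and the \z anchor are my own assumptions for illustration; the point is that the trailing | lets even readme.txt "match".

```perl
use strict;
use warnings;

my @files = ('logo.gif', 'photo.jpeg', 'readme.txt');
my (%bad_match, %good_match);

for my $file (@files) {
    # The bad example: the trailing | adds an empty alternative, and the
    # empty string matches at any position, so every name "matches".
    $bad_match{$file} = $file =~ m/\.gif|png|jpeg|jpg|/ ? 1 : 0;

    # Grouped and anchored: the \. now applies to every alternative,
    # and the extension has to end the name.
    $good_match{$file} = $file =~ m/
        \. ( gif | jpeg | jpg | png )   # image file extensions
        \z                              # at the very end of the name
    /x ? 1 : 0;

    print "$file: bad=$bad_match{$file} good=$good_match{$file}\n";
}
```

Notice that without grouping, the bad example's \. belongs only to the gif alternative, which is a second bug hiding behind the first.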
4. Use Lazy Quantifiers

The Habit: The habit is to use lazy quantifiers when this will make a regex easier to read.

What I've Been Doing: Bad me. I almost never use lazy quantifiers. I don't have this habit. :( Perhaps as compensation, I sometimes use the anchors ^ and $ and describe the whole line in cases where it's not really necessary.

New Year's Resolution: I resolve to develop the habit of asking, for each regex I write: would it be more readable and/or more accurate using lazy quantifiers?

Bad Example:

    s/<.*>//g;    # zap html tags

Note:
Silly Bad Example:

    s/x{2}?//;

Notes:
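Here is a minimal sketch of why the "zap html tags" bad example misbehaves, comparing the greedy .* with the lazy .*? on one line of markup. The sample string is my own.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* runs to the LAST '>' on the line, so the text
# between the tags is deleted along with the tags.
( my $greedy = $html ) =~ s/<.*>//g;

# Lazy: .*? stops at the FIRST '>' after each '<', so only the
# tags themselves are removed.
( my $lazy = $html ) =~ s/<.*?>//g;

print "greedy: '$greedy'\n";    # greedy: ''
print "lazy:   '$lazy'\n";      # lazy:   'bold and italic'
```

The lazy version is still not an HTML parser (a '>' inside an attribute value would fool it, for instance); <[^>]*> is another common way to write the same idea.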
5. Use Available Delimiters

The Habit: This habit is to use alternate delimiters when doing so leads to greater readability by reducing the number of escape characters.

What I've Been Doing: I have this habit. Boy, does it make my life easier. I don't just count the number of backslashes saved; for example, I generally don't use the delimiter / in a pattern that would need multiple backslashes.

New Year's Resolution: I resolve to keep this habit.

Aside: When you write a match, I suggest including the m even if it's not required. In the past I've had a bad habit of not doing that. Leaving it out can lead to puzzling compile error messages if you change delimiters. Try the second bad example.

Bad Examples:

    if ( $line =~ /\"(http:\/\/.*\/.*\.html)\"/ ) { $url = $1; }

Notes:
Or how about the following attempted improvement:

    if ( $line =~ #\"(http://.*/.*\.html)\"# ) { $url = $1; }

Note:
Silly Bad Example:

    s#^\##\#\##;

Notes:
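For comparison, here is a sketch of the first bad example rewritten with braces as delimiters and the m kept in place. The sample $line is my own. With m{...} the slashes no longer need escaping, and the double quotes never needed escaping inside a regex in the first place. (Keeping the m and writing m#...# would also fix the second bad example, since without the m Perl reads the # as the start of a comment.)

```perl
use strict;
use warnings;

my $line = '<a href="http://example.com/docs/index.html">Docs</a>';
my $url;

# Same pattern as the first bad example, minus the backslashes:
# braces are the delimiter, so / and " are just ordinary characters.
if ( $line =~ m{"(http://.*/.*\.html)"} ) {
    $url = $1;
}

print "$url\n";    # http://example.com/docs/index.html
```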
So What?

Part of what I'm trying to say is:
Final Thoughts

As is likely obvious by now, IMO Tony's recommendations are sound.

Practice: It's all very well to resolve to change, but an intellectual understanding that something is a good idea is not enough to bring about change. And we all know that New Year's resolutions usually come to naught. So I am making a point of getting some practice in following the five habits. And you?

Intent: Unless you know the intent of a regex, you generally can't say how good it is. For example, the regexes for the following would likely be different even though both would be dealing with phone numbers:
Test: Also, IMO the most important habit is testing. If you do that well, you'll likely end up adopting the other good habits too, in self-defense.
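Since testing is the habit I rate most highly, here is a minimal sketch of capturing regex test cases for reuse with Test::More, which I mention under habit 2. The image-file regex and the test strings are hypothetical; the idea is to keep one list of strings that should match and one of near misses that should not. It uses like, unlike and done_testing, which a reasonably recent Test::More provides.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical regex under test: does a path name an image file?
my $image_re = qr/
    \. ( gif | jpeg | jpg | png )   # image file extensions
    \z                              # at the end of the path
/x;

# Cases that should match ...
my @should_match = ( '/images/logo.gif', '/photos/cat.jpeg', '/icons/a.png' );

# ... and near misses that should not.
my @should_not = ( '/index.html', '/images/gif.txt', '/logo.gif.bak' );

like( $_, $image_re, "matches: $_" ) for @should_match;
unlike( $_, $image_re, "does not match: $_" ) for @should_not;

done_testing();
```

When someone later changes the regex, rerunning this file shows at once whether any captured case broke.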
Last Updated: 2004-01-09