Re: Another bug? [wml 2.0.6]
- From: Ken McGlothlen <nospam@thanx>
- Date: 18 Dec 2000 02:38:20 -0800
Denis Barbier <barbier@imacs.polytechnique.fr> writes:
| On Sun, Dec 17, 2000 at 02:01:57PM -0800, Ken McGlothlen wrote:
| [...]
| > while( $body ) {
| >     if( $body =~ /^(<[^>]+>|[^a-z]+|&\w+;)(.*)$/ ) {
| >         # got a tag or non-lowercase characters or entities
| >         $result .= $1;
| >         $body = $2;
| >     } elsif( $body =~ /^([a-z]+)(.*)$/ ) {
| >         # got lowercase characters
| >         $result .= qq{<font size="$lcsize">} . uc( $1 ) . "</font>";
| >         $body = $2;
| >     }
| > }
| > return( $result );
| [...]
|
| And indeed, the loop above never breaks if $body begins with a character
| other than '<' or a lowercase letter. Maybe you should add
|     } else {
|         $result .= $body;
|         $body = '';
Well, yes, I'd be the first to admit that the structure I hacked up needs
improvement. But I don't think that's the answer, either.
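The problem here is with the regexes; I see three things wrong with them.
The first one (which didn't enter into this particular problem) is in the
tag-recognition bit of the if clause: I was worried that /<[^>]+>/ was too
greedy and have changed it to /<[^>]+?>/. Strictly speaking the two match the
same text, because [^>] can never run past a '>', so this one is cosmetic.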
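A quick check of how the two tag patterns behave, on a made-up sample string. Because [^>] can never match a '>', the greedy and non-greedy forms both stop at the first '>' and capture the same text.

```perl
use strict;
use warnings;

# Made-up sample input: two tags around some text.
my $text = '<b>bold</b> text';

# Both patterns are anchored at the start, like the if clause in the loop.
my ($greedy) = $text =~ /^(<[^>]+>)/;    # greedy quantifier
my ($lazy)   = $text =~ /^(<[^>]+?>)/;   # non-greedy quantifier

# [^>] cannot consume a '>', so both captures end at the first '>'.
print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```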
The second thing is what's causing the infinite loop: the line break in
"This is a Test" in the second example. Neither regex can get past it,
because without the /s modifier the '.' in the trailing (.*) doesn't match a
newline, so (.*)$ can never reach the end of the string. (Newline is the only
character '.' refuses to match by default; [^a-z] itself matches it fine.)
This can be solved by adding the /s modifier (m//s) to the regexes.
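A small demonstration of the newline problem, using a hypothetical input modelled on the "This is a Test" example with a line break in it (the exact text of the original example isn't shown here). Without /s neither of the original patterns matches, so the loop never advances; with /s the first one does.

```perl
use strict;
use warnings;

# Hypothetical input with a line break in the middle.
my $body = "This is a\nTest";

# Without /s, '.' does not match "\n", so (.*)$ cannot reach the end of
# the string and neither of the original patterns matches.
my $plain_if    = $body =~ /^(<[^>]+>|[^a-z]+|&\w+;)(.*)$/ ? 1 : 0;
my $plain_elsif = $body =~ /^([a-z]+)(.*)$/ ? 1 : 0;

# With /s, '.' also matches "\n", so the first pattern now succeeds.
my $s_if = $body =~ /^(<[^>]+>|[^a-z]+|&\w+;)(.*)$/s ? 1 : 0;

# A negated class such as [^a-z] matches a newline even without /s.
my $class_nl = "\n" =~ /^[^a-z]$/ ? 1 : 0;

print "without /s: if=$plain_if elsif=$plain_elsif\n";
print "with /s:    if=$s_if\n";
print "[^a-z] matches newline: $class_nl\n";
```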
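The third thing is that the alternatives in the if clause need to be
reordered. Perl takes the first alternative that matches, so with [^a-z]+
ahead of &\w+; an entity like &amp; loses its '&' to [^a-z]+ and its name
then gets uppercased; the entity alternative has to come first.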
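A short check of why the order of the alternatives matters, on a made-up string starting with an entity. Perl tries alternatives left to right at the same position and keeps the first one that succeeds, rather than the longest.

```perl
use strict;
use warnings;

# Made-up input beginning with an HTML entity.
my $body = '&amp; more';

# Original order: [^a-z]+ is tried before &\w+; and takes only the '&'.
my ($old) = $body =~ /^(<[^>]+>|[^a-z]+|&\w+;)/s;

# Reordered: the entity alternative is tried first and takes "&amp;" whole.
my ($new) = $body =~ /^(<[^>]+?>|&\w+;|\s+|[^a-z]+)/s;

print "old: $old\n";
print "new: $new\n";
```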
So the while loop should read
    while( length $body ) {    # not just $body: a leftover "0" is false
        if( $body =~ /^(<[^>]+?>|&\w+;|\s+|[^a-z]+)(.*)$/s ) {
            # got a tag or entity or whitespace or non-lc
            $result .= $1;
            $body = $2;
        } elsif( $body =~ /^([a-z]+)(.*)$/s ) {
            # got lowercase characters
            $result .= qq{<font size="$lcsize">} . uc( $1 ) . "</font>";
            $body = $2;
        } else {
            # got something weird: take one character so the loop advances
            $body =~ /^(.)(.*)$/s;
            $result .= $1;
            $body = $2;
        }
    }
and this indeed works.
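For trying the corrected loop on its own, here is a minimal sketch that wraps it in a sub. The name smallcaps() and its ($body, $lcsize) arguments are hypothetical, not part of WML. The loop condition is length $body rather than plain $body, so a leftover "0" (false in Perl) is still copied instead of silently dropped.

```perl
use strict;
use warnings;

# Hypothetical wrapper: smallcaps() and its arguments are assumptions for
# illustration; the loop body follows the corrected loop above.
sub smallcaps {
    my ( $body, $lcsize ) = @_;
    my $result = '';
    while ( length $body ) {
        if ( $body =~ /^(<[^>]+?>|&\w+;|\s+|[^a-z]+)(.*)$/s ) {
            # tag, entity, whitespace or other non-lowercase text: copy as is
            $result .= $1;
            $body = $2;
        } elsif ( $body =~ /^([a-z]+)(.*)$/s ) {
            # run of lowercase letters: uppercase it inside a smaller font
            $result .= qq{<font size="$lcsize">} . uc($1) . "</font>";
            $body = $2;
        } else {
            # defensive: copy one character so the loop always advances
            $body =~ /^(.)(.*)$/s;
            $result .= $1;
            $body = $2;
        }
    }
    return $result;
}

print smallcaps( "This is a\nTest", '-1' ), "\n";
```

With the line-broken input, HIS, IS, A and EST come out wrapped in font tags and the newline between the two lines is preserved.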
The *real* bug in this is that I allowed myself to get misled by the difference
between HTML and Perl, where whitespace has very different impacts. :)
So, my apologies, Denis. At least I used a question mark in the subject line.
:)
---Ken McGlothlen
mcglk@artlogix.com
______________________________________________________________________
Website META Language (WML) www.engelschall.com/sw/wml/
Official Support Mailing List sw-wml@engelschall.com
Automated List Manager majordomo@engelschall.com