
Re: Another bug? [wml 2.0.6]



Denis Barbier <barbier@imacs.polytechnique.fr> writes:

| On Sun, Dec 17, 2000 at 02:01:57PM -0800, Ken McGlothlen wrote:
| [...]
| > 	while( $body ) {
| > 	    if( $body =~ /^(<[^>]+>|[^a-z]+|&\w+;)(.*)$/ ) {
| > 		# got a tag or non-lowercase characters or entities
| > 		$result .= $1;
| > 		$body = $2;
| > 	    } elsif( $body =~ /^([a-z]+)(.*)$/ ) {
| > 		# got lowercase characters
| > 		$result .= qq{<font size="$lcsize">} . uc( $1 ) . "</font>";
| > 		$body = $2;
| > 	    }
| > 	}
| > 	return( $result );
| [...]
| 
| And indeed, the loop above never breaks if $body begins with a character
| other than '<' or a lowercase letter.  Maybe you should add
|             } else {
|  		$result .= $body;
|  		$body = '';

Well, yes, I'd be the first to admit that the structure I hacked up needs
improvement.  But I don't think that's the answer, either.

The problem here is with the regexes:  there are three things I changed.

The first one (which didn't enter into this particular problem) is the
tag-recognition bit in the if clause:  I made it /<[^>]+?>/ instead of
/<[^>]+>/.  Strictly speaking that's cosmetic, since [^>] can never run past
the closing '>' anyway, so both forms match the same text.

The second thing is what's causing the infinite loop: the newline in
"This is a Test" in the second example.  Neither regex matches, because "."
doesn't normally match a newline, so (.*)$ can't reach the end of a string
that has a newline in the middle.  (It's the only character that fails.)
This can be solved by using the /s modifier on the regexes.
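
A quick way to see the effect of /s, as a minimal sketch: the sample string
stands in for the second example, and the variable names are made up for the
illustration.

```perl
use strict;
use warnings;

# Stand-in for the second example: a body with a newline in the middle.
my $body = "This is a\nTest";

# Without /s the match fails on the embedded newline.
my $without_s = ( $body =~ /^([^a-z]+)(.*)$/ )  ? 1 : 0;

# With /s the same pattern matches, and $2 holds the rest of the string.
my $with_s    = ( $body =~ /^([^a-z]+)(.*)$/s ) ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```

Since neither branch of the original loop matches in the first case, $body
never shrinks and the while never ends.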

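The third thing is that the alternatives in the if clause need to be
reordered:  the entity pattern has to come before [^a-z]+, or else the "&" gets
eaten as a non-lowercase character and the entity name is uppercased along
with everything else.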
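
A small sketch of why the order of the alternatives matters, assuming the
point of the reordering is the entity case.  The sample string and variable
names are made up for the illustration.

```perl
use strict;
use warnings;

my $body = "&amp; more";

# Old order: [^a-z]+ is tried before the entity pattern and takes just "&".
my ($old) = $body =~ /^(<[^>]+>|[^a-z]+|&\w+;)/s;

# New order: the entity pattern is tried first and takes "&amp;" whole.
my ($new) = $body =~ /^(<[^>]+?>|&\w+;|\s+|[^a-z]+)/s;

print "old: $old\n";
print "new: $new\n";
```

With the old order, the "amp" left behind falls through to the lowercase
branch and comes out wrapped in a font tag.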

So the while loop should read

	# test length, not truth, so a leftover "0" isn't dropped
	while( length $body ) {
	    if( $body =~ /^(<[^>]+?>|&\w+;|\s+|[^a-z]+)(.*)$/s ) {
		# got a tag or entity or whitespace or non-lc
		$result .= $1;
		$body = $2;
	    } elsif( $body =~ /^([a-z]+)(.*)$/s ) {
		# got lowercase characters
		$result .= qq{<font size="$lcsize">} . uc( $1 ) . "</font>";
		$body = $2;
	    } else {
		# got something weird
		$body =~ /^(.)(.*)$/s;
		$result .= $1;
		$body = $2;
	    }
	}

and this indeed works.
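
For anyone who wants to try it outside WML, here is a self-contained sketch of
the loop wrapped in a sub.  The name smallcaps and the two arguments are
assumptions made for the illustration, not part of the original code, and the
loop tests length $body so that a remaining "0" is not dropped.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the sub name and its arguments are assumptions.
sub smallcaps {
    my( $body, $lcsize ) = @_;
    my $result = '';
    while( length $body ) {
        if( $body =~ /^(<[^>]+?>|&\w+;|\s+|[^a-z]+)(.*)$/s ) {
            # tag, entity, whitespace, or other non-lowercase: copy as-is
            $result .= $1;
            $body = $2;
        } elsif( $body =~ /^([a-z]+)(.*)$/s ) {
            # lowercase run: uppercase it inside a font tag
            $result .= qq{<font size="$lcsize">} . uc( $1 ) . "</font>";
            $body = $2;
        } else {
            # anything else: copy one character, or stop if nothing matches
            $body =~ /^(.)(.*)$/s or last;
            $result .= $1;
            $body = $2;
        }
    }
    return( $result );
}

print smallcaps( "This is a\nTest", "-1" ), "\n";
```

The embedded newline is copied through by the first branch, so the loop
terminates on the input that used to hang.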

The *real* bug in this is that I allowed myself to get misled by the difference
between HTML and Perl, where whitespace has very different impacts.  :)

So, my apologies, Denis.  At least I used a question mark in the subject line.
:)

						---Ken McGlothlen
						   mcglk@artlogix.com
______________________________________________________________________
Website META Language (WML)                www.engelschall.com/sw/wml/
Official Support Mailing List                   sw-wml@engelschall.com
Automated List Manager                       majordomo@engelschall.com