Why does a Perl 5.6 regex run a lot slower on Perl 5.8?
perldeveloper
created: 2004-08-13 08:59:54
I'll cut to the chase: the same Perl code runs under Perl 5.8.0 (and Perl 5.8.5) a lot slower. What does a lot mean? Well, in this very case I'm presenting here, it means about five hundred times slower. Since I cannot believe that there exist recompiling options that can make Perl run 500 times slower/faster, one of the following must hold:
Since my code resembles Chapter 3 in a Perl textbook (regular expressions and IO reading/writing), the second one must be true: Perl 5.8.x is not backward compatible. I would expect that when it comes to obscure or old deprecated functionality, but I wouldn't expect it when it comes to regular expressions. Regular expressions are the main reason why I chose Perl; if that breaks down, I might as well forget about Perl altogether and stick to Java and the ubiquitous Python (which is already the preferred choice over Perl in web development).

Out of decency towards the Perl community, I feel obliged to spend some time before jumping to conclusions, and examine my tests on three versions of Perl: 5.6.1, 5.8.0 (shipped with RedHat9), and 5.8.5, a very lite hand-made compilation, built for performance and no extra specialized functionality. However, I do not have the time nor the resources to conduct a test on another operating system. RedHat 9 is however a standard Linux operating system and this outrageous behavior is most probably common to many others, if not all.

The following tables outline debugging information obtained running perl -d:DProf and then dprofpp tmon.out.

Perl5.6.1
Total Elapsed Time = 0.080048 Seconds
  User+System Time = 0.080048 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 87.4   0.070  0.070     40   0.0018 0.0018  main::extract
 12.4   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00   0.000  0.010      2   0.0000 0.0050  main::BEGIN
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::import
 0.00   0.000  0.000      1   0.0000 0.0000  strict::bits
 0.00   0.000  0.000      1   0.0000 0.0000  Exporter::import
 0.00   0.000  0.000      1   0.0000 0.0000  warnings::bits
Perl5.8.0
Total Elapsed Time = 123.5199 Seconds
  User+System Time = 39.62993 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 97.1   38.49 38.520     40   0.9622 0.9630  main::extract
 0.05   0.020  0.020      1   0.0200 0.0200  utf8::SWASHNEW
 0.03   0.010  0.010      1   0.0100 0.0100  utf8::AUTOLOAD
 0.00       - -0.000      1        -      -  utf8::SWASHGET
 0.00       - -0.000      1        -      -  Exporter::import
 0.00       - -0.000      1        -      -  warnings::unimport
 0.00       - -0.000      2        -      -  warnings::import
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      2        -      -  strict::unimport
 0.00       - -0.000      4        -      -  strict::bits
 0.00       - -0.000      2        -      -  strict::import
 0.00       - -0.000      3        -      -  main::BEGIN
 0.00       - -0.000      5        -      -  utf8::BEGIN
Perl5.8.5
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 98.4   0.630  0.630     40   0.0157 0.0157  main::extract
 1.56   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       -  0.010      2        - 0.0050  main::BEGIN
The main::extract subroutine takes about 9 times longer under Perl 5.8.5, and 549 times more under Perl 5.8.0, compared to Perl 5.6.1. The program itself took 1,543 times longer to finish under Perl 5.8.0 than it did under Perl 5.6.1. You may be wondering what the Perl program is:
use strict;
use warnings;

open (FILE, "a.txt");
my $text = "";
while (<FILE>) {
     $text .= $_;
}

close (FILE);

while (my ($one, $two) = extract ($text)) {
     $text = $one . $two;
}

sub extract {
     my ($text) = @_;

     if ($text =~ /(.*?)whatever(.*)/is) {
          return ($1, $2);
     }

     return ();
}
As you can see, this code slurps a file and removes all occurrences of a certain word (`whatever'). If you're wondering why Perl 5.8.0 took 2 minutes, it's not because I was using a larger file for that run, nor because the file was large: its size was exactly 11,221 bytes (roughly ten thousand).

When the /.*? regular expression is changed to /^.*? (an explicit version of the same regexp), and instead of a 10,000 byte file, a 5,000,000 byte file is used, here are the debugging results for the main::extract subroutine:

Perl 5.6.1
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 88.1   0.670  0.670      1   0.6700 0.6700  main::extract
Perl 5.8.0
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 95.0   2.490  2.510      1   2.4900 2.5100  main::extract
Perl 5.8.5
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 19.5   0.080  0.080      1   0.0800 0.0800  main::extract
It's obvious that the little hat balanced out the differences between the three releases (although 3.7 times longer with Perl 5.8.0 is reason enough NOT to upgrade). Perl 5.8.5, in its current build was faster than Perl 5.6.1. The differences exist on account of different versions and different build parameters. To be more exact, here are the configuration summaries for the three releases:

Perl 5.6.1 Configuration Summary
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
Perl 5.8.0. Configuration Summary
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio= d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
Perl 5.8.5 Configuration Summary
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
The conclusion is that all regular expressions written like this:
$text =~ /(.*?)/
take a thousand times more on 5.8.0. The same expressions written as
$text =~ /^(.*?)/
which obviously means the same thing (look for the first occurrence of <whatever> and save the text preceding it in the corresponding variables) have the same performance implications across these two versions.

In my honest opinion, this is not an issue of bad code and good code; this is an issue of good Perl and bad Perl. I've only discovered this strange behavior using standard regular expressions and moving from 5.6 to 5.8, which are consecutive versions. If the changes are so dramatic when upgrading to the next version, what is one to expect of Perl in other respects?

I can tell you one thing: if IBM had written Perl, this would have never happened. Maybe there aren't enough alpha and beta testers, maybe developers don't have the time to write enough warning messages. What's certain is that Perl is not seen as a product, and the members of the community it attempts to serve are not being looked upon as customers. And that's the very difference between Open source and closed source software. What good is it being free, if it is deceiving its users about the problems it claims to solve?
Re: The Deceiver
created: 2004-08-13 09:16:11
I am the person to blame. I made the change in the regex engine that is causing the problem you're facing. Let me explain:
The conclusion is that all regular expressions written like this:
$text =~ /(.*?)/
take a thousand times more on 5.8.0. The same expressions written as
$text =~ /^(.*?)/
which obviously means the same thing (look for the first occurrence of <whatever> and save the text preceding it in the corresponding variables) have the same performance implications across these two versions.
Sadly, that is not true, and that is exactly what I had to change in the source of perl. You say that /(.*)X/ and /^(.*)X/ are equivalent, but that is a half-truth. Consider this case:
"xxyyyRyyy" =~ /(.*)R\1/
If, as you state, the leading ^ is implied, the regex fails, because "xxyyy" cannot be found after the "R" as my regex requires. Only by not anchoring that regex can it ever match ($1 is "yyy").
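The difference can be checked directly. The following is only a minimal sketch (the variable names are mine, not from the post above) that runs the same string against the unanchored and the explicitly anchored form of the regex:

```perl
use strict;
use warnings;

my $str = "xxyyyRyyy";

# Unanchored: the engine is free to start the match at any offset.
my ($free) = $str =~ /(.*)R\1/;

# Anchored: the match must start at offset 0, so $1 would have to be
# "xxyyy", which does not occur again after the "R".
my ($anchored) = $str =~ /^(.*)R\1/;

print "unanchored: ", (defined $free     ? "'$free'"     : "no match"), "\n";
print "anchored:   ", (defined $anchored ? "'$anchored'" : "no match"), "\n";
```

On a perl that handles the backreference correctly, the first line should report 'yyy' and the second should report no match.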

There is no "easy" way to fix this problem in the source of perl; you have to explicitly state the anchor yourself. The reason is that perl has no way of knowing whether or not you'll end up using what you captured as a backreference, so anchoring has an unknown effect. The problem is not only when the .* is captured, either; any capturing in the regex causes a problem.

(The case of "abc\ndef1" =~ /.*\d/ is already handled by the engine so as not to fail. It would fail if the regex were treated as /^.*\d/, but the engine makes it (?m:^) if necessary.)
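For what it's worth, here is a small sketch of that multi-line case. The helper matched_text is hypothetical and exists only for this illustration; it pulls the matched text out through @- and @+ so that no capture group has to be added to the regexes being compared:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $s, or undef.
sub matched_text {
    my ($s, $re) = @_;
    return $s =~ $re ? substr($s, $-[0], $+[0] - $-[0]) : undef;
}

my $str = "abc\ndef1";

my $unanchored    = matched_text($str, qr/.*\d/);       # may start on any line
my $string_anchor = matched_text($str, qr/^.*\d/);      # must start at offset 0
my $line_anchor   = matched_text($str, qr/(?m:^).*\d/); # may start after a newline

for my $pair (['/.*\d/', $unanchored],
              ['/^.*\d/', $string_anchor],
              ['/(?m:^).*\d/', $line_anchor]) {
    my ($label, $got) = @$pair;
    printf "%-14s %s\n", $label, defined $got ? "'$got'" : "no match";
}
```

The first and third forms should both find "def1" on the second line, while the form anchored with a plain ^ cannot match at all.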

_____________________________________________________
Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^2: The Deceiver
created: 2004-08-13 10:04:28
Thank you for your quick answer, and thanks again for taking the time to explain. Although I agree that the two regular expressions have different meanings, the real question here is why Perl 5.6.1 is 500-1,000 times faster than Perl 5.8.0 on the same regular expression -- this is my real query. Am I to assume that Perl 5.6.1 did not properly parse certain regular expressions and Perl 5.8.0 now does? I just tried your regular expressions and they yielded the same results under both versions. How unstable is my previous code, if new versions can make it obsolete in performance, as if to discourage upgrading?
Re^3: The Deceiver
created: 2004-08-13 10:08:15
I'm not entirely sure why the regexes were so much slower, unless they just never could actually match. In that circumstance, /.*FAIL/ would be a lot slower than /^.*FAIL/.
_____________________________________________________
Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^4: The Deceiver
created: 2004-08-17 18:05:06

And since perldeveloper removes the "whatever"s from the string the regexp will fail in the last iteration. So I would not be surprised if most of the wasted time was in the last iteration :-)

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne

Re^3: The Deceiver
created: 2004-08-13 10:41:32
#reg.pl
$s = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyRRRRyyyy\n" x 500;
$n = 0;
$n++ while ($s =~ /(.*?)RRRR\1/sg);
print "$n matches\n";

time ~/bin/perl5.8.0 reg.pl 
500 matches

real    0m4.836s
user    0m4.800s
sys     0m0.010s

time ~/bin/perl5.6.1 reg.pl 
0 matches

real    0m0.020s
user    0m0.020s
sys     0m0.000s
So, in fact, you are complaining that a bug got fixed. The problem is that these are extremely inefficient regular expressions because they involve a lot of backtracking. I recommend reading Mastering Regular Expressions for a detailed explanation.
Re^4: The Deceiver
created: 2004-08-13 10:55:51
That's very good to hear. What's not good is that code that relied on nothing like backreferencing regexps got squashed in the upgrade process. Like japhy guessed, the regexp failed to match, but wouldn't it make sense even for a /(.*)TEXT\1/ regexp to first look for /TEXT/ and then worry about getting the appropriate group match (be it greedy or reluctant)? This slowing down is a terrible shock some people might get (including me) when moving old code to new code. But on another note, I do agree I'm a long way from mastering regular expressions.
Re^5: The Deceiver
created: 2004-08-13 11:12:20
If I remove the backreference by changing the regex in my example to $n++ while ($s =~ /(.*?)RRRR/sg);, I get the following:
time ~/bin/perl5.8.0 reg.pl
500 matches

real    0m0.018s
user    0m0.010s
sys     0m0.010s

time ~/bin/perl5.6.1 reg.pl
1 matches

real    0m0.015s
user    0m0.010s
sys     0m0.000s

So at least in this case Perl 5.8.0 doesn't have a speed problem. I don't know exactly what's going on in your code though.

Re^6: The Deceiver
created: 2004-08-13 11:27:27
Sorry to reply to myself, but I think I found something. The problem seems to manifest itself more clearly when using the /i modifier and the regex fails. It seems the engine is wasting a lot of time normalizing case.

$s = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyRRRyyyy\n" x 300;
$n = 0;
$n++ while ($s =~ /(.*?)RRRR/isg);
print "$n matches\n";

To summarize: 5.6.1: 0 matches, 0.32 s; 5.8.0: 0 matches, 2.2 s.

But note that if I change the regex to /x(.*?)RRRR/isg the results are reversed: 5.6.1: 9.2 s; 5.8.0: 1.4 s. That's because now 5.6.1 can't get away with the fake anchor. Interesting...
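Anyone wanting to repeat this comparison on their own build could start from something like the sketch below. It is not the script I used for the numbers above: the test string is assumed to have 30 leading x's, the repeat count is deliberately smaller, and Time::HiRes is used instead of the shell's time so that both regexes run in one process. No timings are claimed here, since they depend entirely on the perl version and how it was built.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# The line contains only "RRR", so /RRRR/ can never match.
# (Line shape and repeat count are assumptions for this sketch.)
my $line = ("x" x 30) . "yyyRRRyyyy\n";
my $s    = $line x 100;

my %regex = (
    unanchored => qr/(.*?)RRRR/is,
    anchored   => qr/^(.*?)RRRR/is,
);

my %matched;
for my $name (sort keys %regex) {
    my $start = time;
    $matched{$name} = ($s =~ $regex{$name}) ? 1 : 0;
    printf "%-10s matched=%d  %.4f s\n", $name, $matched{$name}, time - $start;
}
```

Both forms should report matched=0; only the elapsed times are expected to differ between perl versions.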

Re^5: The Deceiver
created: 2004-08-13 21:15:46
In a perfect world, code should be correct and fast. Perl's code used to be wrong and fast. Now it is correct and slow. Correct and slow is generally better than wrong and fast. In time it is likely to speed up again and we'll all be happy.

Unfortunately you began with the rude shock of seeing an amazing slowdown. Therefore while in other circumstances you might agree that you want the right answer, anything below the speed which you were accustomed to is bad.

On the specific optimization that you offer, you're right and wrong. You're right that you can optimize that one regular expression that way and it would be good for that regular expression. But it wouldn't speed up the one that you did want to run. Furthermore adding a check for that special case would slow down the compilation of every other regular expression out there (including the one that you wanted to run). Furthermore you've just added a code path that has potential bugs which might not get caught.

This is not to say that you never want to speed up special cases - of course you do and the regular expression engine has a lot of special tricks. But you have to balance out what is sped up by any one trick against how it slows other people down and causes opportunities for bugs to lurk.

That said, I'd like to point out why the optimization that you point out would not solve your problem. It would tell how to solve a particular expression that you weren't running. The one that you tried to run is different enough that the optimization would probably not run. What you actually would have benefited from is an optimization that says, "Check that there are no backreferences within the RE, then turn on the old special case optimizations." Which might or might not work out to be worthwhile. (And I do not wonder that japhy just chose to turn the optimization off rather than put a test that is that complicated in.)

Re: The Deceiver
created: 2004-08-13 09:33:50

Ok so japhy told you why that is slow, here's a way to make your code fast regardless - don't even bother with the capturing.

Capturing is always slow because it has to make a copy of the source string. $1, internally, is just substr( $safe_copy_of_match, $-[1], $+[1] - $-[1] ). So the largest speed hit (that I'm aware of) is the memory operation of making a safe duplicate of the data that was just matched. COW (copy on write) may mitigate this if/when it ever gets into perl.
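As an illustration only (the sample string is made up), this sketch checks that the offsets in @- and @+ describe the same text as $1 and $2:

```perl
use strict;
use warnings;

my $text = "before whatever after";
my ($one, $two);

if ($text =~ /(.*?)whatever(.*)/is) {
    # $-[N] is the start offset of group N, $+[N] is its end offset.
    $one = substr($text, $-[1], $+[1] - $-[1]);
    $two = substr($text, $-[2], $+[2] - $-[2]);
    print "offsets agree with the captures\n" if $one eq $1 && $two eq $2;
}
```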

Likely to be fastest. This was my second thought.

my $whatever_index = index lc $text , $whatever;
return( substr( $text, 0, $whatever_index ),
        substr( $text, $whatever_index + length $whatever ) );

This may be the fastest. It was my third thought.

my $whatever_index = index lc $text, $whatever;
my $whatever_length = length $whatever;
return unpack "a" . $whatever_index . "x" . $whatever_length . "a*", $text;

This was my first thought. Use a plain regex to *locate* the thing in the string and then just substr() the equivalent of the captures out. This happens to be simplest to look at so it wins on the visual-complexity scale. This is a great general technique to avoid capturing on regexes and as such is a great post-benchmarking optimization.

if ( $text =~ /whatever/i )
{
    return( substr( $text, 0, $-[0] ),
            substr( $text, $+[0] ) );
}
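Put together as a drop-in for the original loop, the index/substr idea might look roughly like the sketch below. It makes a few assumptions that are not in the snippets above: the word is passed in as a second argument, a "not found" result returns an empty list so the loop terminates, and the data is plain ASCII (lc can change the length of some Unicode strings, which would throw the offsets off).

```perl
use strict;
use warnings;

# Hypothetical replacement for extract(): split $text around the first
# case-insensitive occurrence of $word, without using any captures.
sub extract {
    my ($text, $word) = @_;
    my $at = index(lc($text), lc($word));
    return () if $at < 0;                      # nothing left to remove
    return (substr($text, 0, $at),
            substr($text, $at + length($word)));
}

my $text = "aaa Whatever bbb WHATEVER ccc";
while (my ($one, $two) = extract($text, "whatever")) {
    $text = $one . $two;
}
print "$text\n";
```

The loop ends because a list assignment in scalar context yields the number of elements on its right-hand side, which is zero once the word no longer occurs.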
Re: The Deceiver
created: 2004-08-13 11:10:24
I must be overlooking something, but why wouldn't this code work to remove all instances of "whatever"...
open (FILE, "a.txt");
$/=undef;
$txt = <FILE>;
$txt =~ s/whatever//sig;


-- All code is 100% tested and functional unless otherwise noted.
Re^2: The Deceiver
created: 2004-08-13 11:20:31
Finding the things around 'whatever' is different than just removing 'whatever'.
Re^3: The Deceiver
created: 2004-08-13 11:52:25
My code was just a literal translation of the OP's remark...
"As you can see, this code slurps a file and removes all occurrences of a certain word (`whatever')."


-- All code is 100% tested and functional unless otherwise noted.
Re^3: The Deceiver
created: 2004-08-13 13:09:17
Take a look at how that extract() routine is used a little more closely...
Re^2: The Deceiver
created: 2004-08-13 15:35:15
I was trying to make a point about the fact that this code runs incredibly slower on the prepackaged RH9 Perl 5.8.0 compared to the prepackaged (Mandrake 8 I believe) Perl 5.6.1. The example above was whipped up especially for this experiment, after a period of tracking down the exact pieces of code which were slowing down my original Perl programs.

Only after noticing that the =~ /(.*?) constructs were leading to neverending pauses in the Perl 5.8.0 code, did I realize that adding a ^ anchor would eliminate the inherent ambiguity (the /s switch was on). That's how I made this short example, in which I added the extract subroutine so I can get clear results in the DProf debugger and can make direct comparisons against the Perl versions. I was astounded to see that the slowdown ratio was not between 1.0 and 2.0 (meaning a tad slower), but somewhere between 500.0 and 1,000.0, explaining why buying new hardware was definitely more expensive than having somebody replace all /(.*?) regexps with /^(.*?) :).
Re: The Deceiver
created: 2004-08-13 13:02:57
So quick to blame the Perl community...

Perl 5.8.0 is slow on your system because Red Hat compiled it with threads and debugging turned on (which you didn't do in your 5.8.5 compile) and because they set the locale to use unicode and folded in a bunch of patches for unicode that were not in the official 5.8.0 release. This has been written about extensively. See the Red Hat bugzilla for more details. This was fixed in 5.8.1. The remaining slowdown of 3.7 is probably due to the regex change that japhy mentioned.

Re^2: The Deceiver
created: 2004-08-13 16:01:07
What would you think if a newer version of compiler/interpreter made your code finish in one day rather than in two minutes? I'm not really blaming anybody in the Perl community (it's not like I'm looking for a refund as such ;) ); if I were to blame somebody, I'd first blame myself for choosing Perl and then whoever wrote the code for not having shown more command in programming.

Anyway, what really matters to me is that now the code works as expected and overall, all my Perl programs run 20% faster under Perl 5.8.5, which makes it all worth it in the end. But what if it didn't... :D
Re^3: The Deceiver
created: 2004-08-13 17:09:46
What would you think if a newer version of compiler/interpreter made your code finish in one day rather than in two minutes?

The whole point of my post is that it wasn't caused by a new version of Perl, but rather by things that Red Hat did when packaging Perl for RH 8/9. The regex change was a difference in Perl itself, but it was also fixing a bug so it seems like a reasonable choice to include it.

Slapping People For Help
created: 2004-08-13 13:29:57
I can tell you one thing: if IBM had written Perl, this would have never happened. Maybe there aren't enough alpha and beta testers, maybe developers don't have the time to write enough warning messages. What's certain is that Perl is not seen as a product, and the members of the community it attempts to serve are not being looked upon as customers. And that's the very difference between Open source and closed source software. What good is it being free, if it is deceiving its users about the problems it claims to solve?

Do you often find that insinuating that people are ignorant, malicious, sloppy, or stupid makes them likely to help you?

Re: Slapping People For Help
created: 2004-08-13 15:21:40
Maybe I was trying to convey too many ideas and feelings out of context (on one hand I'm carefully explaining the problem and seeking professional advice, on the other hand I'm criticizing the ones responsible). In the right context, anything can be made to sound the way it was intended, and I do apologize if this particular bit sounded too harsh.
Re^2: Slapping People For Help
created: 2004-08-13 17:50:49

I understand the frustration. It's sometimes difficult to remember that dozens of people have put thousands of hours into a project given away freely for other people to use when you find an apparent bug, but it's very wise to keep that in mind.

Your description of the problem was very good, though.

Re: Slapping People For Help
created: 2004-08-16 13:19:33
I think the point is that you don't need to worry about that if you're talking to a commercial vendor. It's a shift in attitude that anyone moving to F/OSS, hopefully, will get used to.
Re^2: Slapping People For Help
created: 2004-08-16 14:49:43

I've talked to proprietary vendors before. Maybe some don't cause you to worry, but those I can think of did not inspire me with confidence.

Re^2: Slapping People For Help
created: 2004-08-17 04:25:14

Barnraising your IT might be an interesting read.

Make sure to read the comments on sentiments about commercial vendors and contracts.

Makeshifts last the longest.

Re: The Deceiver
created: 2004-08-13 15:08:51
This is an issue of good Perl and bad Perl

You are correct. .*? is perhaps one of the least efficient singular regex constructs available. Why are you matching text you are not keeping, anyways? Are you unaware that there is an entirely separate construct (s/whatever//) made for removing text?

Have you not read the extensive perlre documentation for the product that you are using? Just because something is free doesn't mean that you automatically know how to use it right-out-of-the-box.

Also, if IBM had written Perl, it would probably take over a minute to start while it loaded its built in WSADIE plugins for J2EE development.

Re^2: The Deceiver
created: 2004-08-13 16:53:07
I believe that simple functionality like /(.*?) regular expressions must be consistent across small upgrades. Consistent in behavior and consistent in speed. As I said, I can deal with 3.7 factor, as other optimizations balance it out in the end, but not with 500. The fact that Perl 5.8.0 got mutilated in the RedHat 9 release is however a direct consequence of its being Open Source.

The fact that it happened deteriorates Perl's image and mitigates the effort of all people taking part in developing Perl. Probably, legally speaking, whoever was involved in redistributing Perl in RedHat 9 was breaking the very license under which Perl is being offered, abusing intellectual property -- knowingly or not knowingly. Maybe I was the last person on earth who found out Perl is very messed up by default on RedHat 9, but this still makes the Perl community indirectly responsible for not catering for the needs of its developers and allowing other people to distort its intentions.
Re^3: The Deceiver
created: 2004-08-13 17:15:09

I do not understand your reasoning. This loose collective that you are calling the perl community is supposed to police the entire tech sector to make sure that perl is implemented correctly everywhere? Even the 8000 pound Gorilla of Microsoft can't do something like that (I've seen pretty horrible stuff done with their tools). That is akin to saying the fact that your house is defective is the fault of the company who made the hammers that were used.

Re^3: The Deceiver
created: 2004-08-13 19:59:10

What do you expect? RedHat have packaged a completely broken version of GCC. RedHat have packaged problematically patched Linux kernels. RedHat have packaged a half-broken TeX distribution. That RedHat would seriously bork a Perl package seems hardly surprising.

It's not the GCC project's fault and not the Linux kernel team's fault and not the TeX project's fault that RedHat broke their software and it isn't the Perl5 porters' fault that RedHat broke their Perl package either.

Makeshifts last the longest.

Re^4: The Deceiver
created: 2004-08-15 09:10:50
To add a few bits to your list,

... RedHat ships with badly packaged Tcl/Tk, (this was discussed in Tcl::Tk module development list and this makes supporting of Tcl::Tk considerably harder on RedHat)

Re: The Deceiver
created: 2004-08-13 16:49:19
I can tell you one thing: if IBM had written Perl, this would have never happened.

Not sure I agree with *that*... As an old mainframe programmer, I can tell you that when COBOL went from COBOL to VS COBOL II back in the late 80s, we practically had to recompile our entire mainframe library.

And it wasn't like it was obscure stuff... They did away with the EXAMINE statement, which was a staple of COBOL development.

It did away with the ON statement... and would no longer accept LABEL RECORDS...

Worst of all, the TRANSFORM statement vanished.

They had (supposedly) good reasons for making those kind of fundamental changes, but it didn't change the fact that COBOL, arguably the de facto programming standard of the time, was fundamentally changed long after it was a mature product.

So, don't be so sure that IBM wouldn't have done the same thing... they've done it before :)

Trek

Re^2: The Deceiver
created: 2004-08-13 17:24:53
I agree with your analogy, it's pretty much like what Perl 6 is going to be to Perl 5. But you can't complain about it when words like `backward compatibility' do not appear in the upgrade manual. What if all these COBOL statements had been kept, but would have instead run much slower, because on the new COBOL version their meaning had extended and many new checks were necessary?

Or worse, as in this case, the old code touched a bug which made things run too fast, leading to unrealistically high speed expectations from the developer (and indirectly the management). I used the IBM example because they would have taken care of the following: they would've not fixed a bug whose fixing would seriously slow down all previous code (without making a big thing about it and write self-healing code) and they would've not let their product be mutilated by any redistributor.
Re^3: The Deceiver
created: 2004-08-13 17:40:09
What if all these COBOL statements had been kept, but would have instead run much slower, because on the new COBOL version their meaning had extended and many new checks were necessary?

There was a *little* of that... it wasn't so much that code ran slower, but the amount of memory that was taken up by the reserved levels (77 level, if I recall correctly... it was a *long* time ago...) did increase, which had its own set of side effects.

But I'll at least grant that we had prior warnings, and did receive a transformation guide from IBM

Trek

Re^4: The Deceiver
created: 2004-08-14 06:42:34
And probably had to pay BIG money to get this "excellent" service.

In some places IBM is still best remembered as the company that made typewriters which came with a service contract that guaranteed they were replaced or repaired within 24 hours. They broke down a lot, but were repaired/replaced even faster. The price of the service contract was such that you actually paid for a new machine every two years!

Wonderful piece of company PR: how to turn bad quality and expensive servicing into a strong selling point!

At least we don't have that with Open Source!

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Why does a Perl 5.6 regex run a lot slower on Perl 5.8?
created: 2004-08-15 00:11:48

I am going to guess that you have a UTF-8 LANG set in your environment.

A consequence of this in 5.8.0, but not in 5.6.1 or 5.8.5, is that your file is implicitly opened as UTF-8. This may seem minor, but because you included the /i modifier, it probably slowed it down a lot, since case insensitivity in Unicode is a lot more complicated. You could test this by modifying your environment and rerunning, or by explicitly opening the file as latin-1, or by removing the /i.
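A rough sketch of the "open it explicitly" test follows. The temporary file is only a stand-in for a.txt so that the example runs on its own, and I am assuming the :raw layer is an acceptable way to ask for plain bytes here; whether it removes the slowdown on the Red Hat 5.8.0 build is something you would have to measure yourself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a.txt so the sketch is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "some text whatever more text\n";
close $tmp_fh or die "close: $!";

# ":raw" requests plain bytes, independent of LANG / LC_ALL.
open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
my $text = do { local $/; <$fh> };   # slurp the whole file
close $fh;

# A byte string should not carry perl's internal UTF-8 flag.
print utf8::is_utf8($text) ? "character string\n" : "byte string\n";
```

The other quick check is to run the original script with the locale overridden for that one command (for example LANG=C) and compare the timings.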

You seem to discount the speed up you saw between 5.6.1 and 5.8.5 with your second regex version. I don't think this is really fair. I suspect that the regex engine really is faster in the later versions, when they are actually doing the same thing.

The problem really seems to be that due to some subtleties in how certain things work in different versions of perl, the regex engine is not doing the same things in each of your cases. Since you are so willing to criticize the Perl community, I will gladly turn around and criticize you. This is not particularly obscure information. It's pretty well explained in perldelta, perlunicode, and other man pages. You apparently made the decision to upgrade perl versions without taking the time to research what changed. 5.6 to 5.8 is not a minor change: there are significant changes between the two which you would have been well advised to consider before making the switch.

Furthermore, did you even stop to wonder why there were additional functions being called in one case and not the others? Don't you think this ought to have been a clue that things were not as simple as you would like to think?

Re^2: Why does a Perl 5.6 regex run a lot slower on Perl 5.8?
created: 2004-08-15 09:51:50
I am content about the one thing that I find relevant: my Perl codebase was successfully ported to Perl 5.8.5. My blaming the Perl community and other people's blaming me have both proved beside the point and their only merit -- artistic at most. As seems to always be the case, those people who had only technical points to make were the most useful.

perlmonks.org content © perlmonks.org and Aristotle, chromatic, CountZero, Courage, diotalevi, itub, japhy, Jenda, jryan, kscaldef, perldeveloper, perrin, PhilHibbs, sleepingsquirrel, synistar, tilly, TrekNoid
