Perl's ++ operator seems to behave like this, when the "watching" is done with a subroutine call.
Consider the output of
$m = 20; print ++$m + $m++;
and
$m = 20; print noop(++$m) + $m++;
sub noop{ return shift }
What's going on here? If you'd like to see how arcane your knowledge of Perl really is, try guessing what this prints without reading ahead or evaluating the code.
For those with way too much time on their hands, this document and the code samples it refers to can be found at http://cs.marlboro.edu/talks/increment_weirdness/.
Before going into any detail, I'd like to make it clear that this is an academic exercise only.
Steve Oualline has a nice description of the use of increment operators and their side effects in his book "Practical C++ Programming", pg 79. Essentially he says that if you want to understand how these tricky increment expressions work, the right answer should really be
"If you don't write code like this, then you don't have to worry about these sorts of questions."
Even though clearly none of *us* would write code like this (ahem), a friend (Mark Francillon) gave me some expressions like (++$m + $m++) as puzzlers, and I was curious enough to look into it.
To really understand the pre and post increment ops, I wrote my own preInc and postInc subroutines, doing what I *thought* these operators were supposed to do.
# This is an attempt at emulating ++$m with preInc($m)
sub preInc {
$_[0] = $_[0] + 1; # Increment input argument (the side effect),
return $_[0]; # and return the new, incremented value.
}
# And this is an attempt at emulating $m++ with postInc($m)
sub postInc {
my $temp = $_[0]; # Remember original value (no shift, so $_[0] still aliases the caller's variable),
$_[0] = $_[0] + 1; # increment input argument (the side effect),
return $temp; # and return the old, un-incremented value.
}
my $m = 20;
print preInc($m) + postInc($m); # This prints 42.
# The final value of $m is 22.
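This all makes perfect sense to me. What's going on here is that preInc($m) bumps $m to 21 and returns 21, then postInc($m) returns 21 and bumps $m to 22, so the sum is 21 + 21 = 42.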
If this was the whole story then I wouldn't be writing all this down. However, the value returned by the Perl interpreter is *not* 42, but 43. If you don't believe me, try it for yourself.
my $m=20;
print ++$m + $m++; # This prints 43 !
# The final value of $m is still 22.
And then I started pulling out my hair.
My first attempt at understanding what was going on was to write a subroutine that would examine the intermediate results. You can find all the gory minutiae in increment_detail.pl.
my $m = 20; # Declared before look() so the sub sees this same variable.
# Print values and addresses of passed argument and $m.
sub look {
print "look was passed '" . $_[0] . "' at " . \$_[0] . ".\n";
print "while \$m is '" . $m . "' at " . \$m . ".\n";
return $_[0];
}
my $p = look(++$m) + look($m++);
print $p;
But that's where the quantum weirdness popped up. After a variety of attempts it became clear that any subroutine call wrapped around (++$m) changes the result of the calculation to 42.
sub noop { # do nothing
return shift;
}
my $m=20; print noop(++$m) + $m++; # This prints 42 !
So if I tried to watch it do this weird thing, it wouldn't do it. By now this felt like a conspiracy.
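To put the two cases side by side, here is a minimal sketch that runs both expressions from a fresh $m = 20. The sub noop is the one from the post above; the names $direct and $wrapped are only illustrative. The values in the comments are what this thread reports for Perl 5, not something the documentation promises.

```perl
use strict;
use warnings;

# Returns a copy of its first argument, as in the post above.
sub noop { return shift }

# Bare expression: ++$m hands the variable itself to the addition.
my $m = 20;
my $direct = ++$m + $m++;

# Same expression with the pre-increment wrapped in a sub call.
$m = 20;
my $wrapped = noop(++$m) + $m++;

print "direct:  $direct\n";    # 43 on the perls discussed here
print "wrapped: $wrapped\n";   # 42 on the perls discussed here
print "final \$m: $m\n";       # 22 in both cases
```

Running both in one script makes it easier to see that only the value handed to + differs; $m ends at 22 either way.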
Another friend (Brandt Kurowski) suggested using operator overloading to watch the intermediate steps, without disturbing the calculation. This works, and has helped me understand what's going on, but hasn't quite answered all my questions.
See IncrementOverload.pm for that analysis.
==== increment weirdness: ++$m + $m++ ==========
m = 20 at 0x80ab23c
p = ++m + m++
*** inc 0x80ab23c : 20 --> 21
m is 21 at 0x80ab23c
*** copy 0x80ab23c --> 0x804c120
m is 21 at 0x80ab23c
*** inc 0x804c120 : 21 --> 22
m is 22 at 0x804c120
*** add : 22 at 0x804c120 + 21 at 0x80ab23c = 43 at 0x80ab11c
m is 22 at 0x804c120
p = 43 at 0x80ab11c
The discussion on pg 357 and thereabouts of the Camel describes some
of the inner workings of the increment operators; in particular, if
there's more than one pointer to something then it makes a copy first
and increments the copy. Thus to overload ++, you must also overload
the copy operator. In the listing above, the "inc", "copy" and "add"
lines are printed out by overloaded subroutines, all invoked while
evaluating ++$m + $m++.
Here's a blow by blow account.
Staring at all this long enough gives the gist of *how* Perl gets 43: ++$m evaluates to $m itself, which is incremented *again* by $m++. So by the time the addition operation actually needs a value for the term on the left, its value is 22.
The calculation starts out as I'd expect, with ++$m incrementing $m (..23c) in place. Since the left term of the sum is going to be needed later, I'm guessing that another name (pointer), say left_sum, is also now given to that (..23c) value.
The second increment, $m++, now sees something with several names, and so makes a copy (..120) before proceeding. The original address, namely (..23c) which now contains 21, is given another name, something like right_sum, to be used later when the terms are added together.
The copy is then incremented to 22 (..120), which is the final value of $m.
In the last addition step, the left_sum pointer has apparently been carried along with the renaming of $m; that is, by the time the sum is evaluated its value is 22.
On the other hand, when ++$m is wrapped in a subroutine call, left_sum isn't "carried along" in this way; left_sum remains 21 after $m gets to 22.
How exactly the addition operation keeps track of what it's going to be adding isn't entirely clear to me; the devil seems to be in the details of what the thing I'm calling left_sum ends up referring to.
There are more comments at the end of Increment.pm, along with printouts for some other (perhaps illuminating) variations.
After all this analysis on *how* this works, I'm still left with one big question.
Why is 43 the right answer?
Clearly Perl's answer must be the right one, and so I'm sure there's some really good reason why the operators *should* behave this way in this context - I just can't quite get my head around what that reason might be. :)
Anyway, it was fun trying to look a bit under the hood.
- barrachois
Fun.
The reason that using your preInc() sub changes the result is because the value it returns is a copy of $m's value at the end of the call to the sub.
By contrast, the value of $m used by the + operator in the non-sub statement is the value of $m at the point after both the subexpressions involved in the + operation have been evaluated.
I assume that the order of the evaluation is a result of converting the expressions to their reverse polish form. Clear as mud, but this is (roughly) equivalent
$m=20;
$r = \$m;
print ${ $$r += 1; $r } + ${ my $t=$$r; $$r += 1; \$t };
43
At least, it produces the same result and is close enough for my mental processes to get a notional handle on the mechanism.
Consider the output of
$m = 20; print ++$m + $m++;
and
$m = 20; print noop(++$m) + $m++; sub noop{ return shift }
What's going on here? If you'd like to see how arcane your knowledge of Perl really is, try guessing what this prints without reading ahead or evaluating the code.
This is trivially explained, and I'm amazed there are still people giving this any serious consideration. Let me spell it out once more, in easy to read letters:
Modifying the same variable using auto-increment/decrement more than once in the same statement gives undefined behaviour.
The first sentence in perlop.pod regarding auto-increment and auto-decrement reads:
"++" and "--" work as in C.
And there you have it. Works as in C. And in C, using it twice on the same variable in the same statement gives undefined behaviour.
Abigail
I'm sorry that I'm the one that has to tell you this, but people start programming in the year 2004 and I believe we can expect more people to start programming in the future. I bet someone somewhere just recently wrote his first hello world program. So I'm not surprised that issues like this are brought up again. Most of us have different stages we go through as (Perl) programmers. Giving this any serious consideration may be one of them. This isn't a trivial issue for many, and trying to understand how it currently works is something that should be encouraged IMHO, as it'll give a greater understanding of how Perl works, even though the behaviour isn't defined. All kinds of enlightenment should be encouraged and not ridiculed.
And there you have it. Works as in C.
Unfortunately, not everyone knows C.
You have apparently spelled this out several times, so I recommend you to send a documentation patch for perlop that clarifies what "as in C" really means if this upsets you so much. Then you've done some real good. Frankly, I don't see what there is to get upset about. He is just a guy that wants to explore Perl.
ihb
The documentation is complete. The problem isn't a problem. That the documentation refers to other documentation may be annoying, but it is complete. And, whether you like it or not, every single person who worked/works on the Perl sourcecode and nearly everyone who worked/works on the Perl design is fluent in C and has no problem with statements like that. In other words, any single person who would have ever worked on the documentation is fluent in C and has no problem with statements like that.
. . . so I recommend you to send a documentation patch for perlop that clarifies what "as in C" really means if this upsets you so much.
Ah, but he's not the one who's upset by the documentation. You are. He (and I) is upset by your unwillingness to read said documentation.
An analogy - if you were to pick up a textbook on calculus, it would assume you knew how to do algebra. Would this bother you? What if it said "The edge cases of this transformation are similar to this transformation as done in Set theory."? Because, frankly, that's what perlop is saying. "The edge cases of the pre- and post-increment operators are the same as in C."
Let's look at something else - the reason why it's the same as in C. Because Perl is implemented in C! Do you really think the Perl developers reinvented the wheel here? No! They delegated to the source language, because that's the behavior they were looking for. And, if the ANSI C standard ever changes how this edge case behaves, then Perl will have that change as a matter of course. Hence, the reason to refer to the C documentation. It's very OO in its thinking. We inherit from C and delegate back to C.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
Second, your analogy has a fallacy: algebra is a pre-req for calculus, but C is not a pre-req for Perl. A person going into Perl first of all languages is not expected to know C (or sed, or awk, or sh!). Therefore, depending on just what behavior it is they have to refer to C for, they might not understand what it is they're reading, because it'd be in a C context, not a Perl context. Besides, how much of Perl behaves like some language from which it is derived? Why don't we bundle the docs from those other languages with Perl, so Perl's docs can merely cross-reference theirs?
One problem I see is that while Perl is written in C, Perl does things differently than C. For one, Perl handles this pre-inc and post-inc thing differently. The documentation says the behavior is undefined, but it had better NOT say the behavior is undefined, "just like in C", because it is obviously not. I think Perl's docs need updating, in this case. I think the behavior is definable, we just haven't written it down to define it.
Instead of yelling at the original poster for venturing into the forbidden zone... I don't know. Something. Just don't chastise him for experimenting with the language and reporting his findings. He's learning. Let him learn.
Just don't chastise him for experimenting with the language and reporting his findings. He's learning. Let him learn.
Oh, goodie. Now that's helpful! Anyone who's experimenting with the language, we just let them go, and admire them from a distance, keeping silent.
And then we wonder why people stop programming in Perl because their production code failed in mysterious ways. They walked into a swamp, and we let them, cause that will "teach them".
Now please excuse me, I've some "dangerous curve" signs to remove.
Abigail
Certainly, tell him not to use that kind of construct in his production code, and tell him that the documentation does not provide a defined behavior for it, but don't be angry or impatient with him.
And you've blended "learning" with "doing in practice". Just because I learn something (like how crazy this ++X and X++ business is) does not mean I'm going to employ it. Some "swamps" need to be investigated for signs of life, but that doesn't mean the scientist needs to live in the swamp.
Abigail
No social workers, no, but you could have made your point without resorting to being insulting; I quote...
"This is trivially explained, and I'm amazed there are still people giving this any serious consideration. Let me spell it out once more, in easy to read letters: "
Now if that's not equally rude, patronising and unhelpful then I don't know what is. A simple "The behaviour is undefined, see Perldoc for reference." would have done nicely. Furthermore, when your abrasive tone is questioned, you resort to abuse...
"What the fuck do you want from me? I explain what's wrong, including quotes from the documentation, and still you're not satisfied."
Your skill or knowledge is not being questioned, your social skills are. A terse, to-the-point answer is a good thing, but being rude isn't.
"Stercus! Dixit Pooh. Eeyore, missilis lux navigii heffalumporum iaculas. Piglet, mecum ad cellae migratae secundae concurras."
I think you don't understand the idea behind the concept of "undefined behavior" in language specifications. It doesn't mean that the language will use a random number generator to give you an unpredictable result every time. In most cases, you'll find that the behavior is perfectly reproducible, and you could think of documenting it. Don't do it! Undefined behavior means that compiler writers have the freedom to choose whatever behavior they want, and the users of the language SHOULD NOT rely on the way the behavior is "defined", because the compiler writer has the freedom to change it in the next version.
It's just like saying: cool, this module has this interesting undocumented variable! Let's use it! If you use it, it's under your own risk. The documentation is like a contract. And please don't send a documentation patch to the author before making sure that it's not undocumented for a reason.
This behaviour has been under discussion as something that just may well change. There was a discussion between Juerd and hv once where Juerd said something about the current way being the only sane way, really, and then hv countered with something else reasonable (this is all perl6, mind you) and noted that this behaviour is undefined for a reason.
The observed behaviour is an artifact of the implementation - different implementations are allowed to have different behaviours.
The documentation is complete. The problem isn't a problem. That the documentation refers to other documentation may be annoying, but it is complete. And, whether you like it or not, every single person who worked/works on the Perl sourcecode and nearly everyone who worked/works on the Perl design is fluent in C and has no problem with statements like that. In other words, any single person who would have ever worked on the documentation is fluent in C and has no problem with statements like that.
There are several incorrect statements there, but I'd just like to say that I at least have a problem with such statements in the documentation. They don't belong, and should be fixed; it's just a very low priority to me.
Well, like it or not, there is a body of knowledge out there.
When the manual says that the behaviour of something is undefined, then it also implies "and it may come back to bite you if you try to use it".
Enlightenment is welcomed, but the answer to the question is clear, it is documented. It is written in plain English for all to see.
Right, my point was that one shouldn't read documentation. sigh
I could try to explain my point again more clearly, but I don't know how.
Enlightenment is welcomed, but the answer to the question is clear, it is documented. It is written in plain English for all to see.
And I wrote my argumentation in plain English for you to see, but you still didn't get it. And that's the problem, isn't it? What's clear for you may not be clear to someone else.
If you take the time to read my post again, please read it in the context of Abigail-II's reply. Argumentation is always in a context.
ihb
Any self-respecting programmer knows C. It is only the most important and influential language of the last 25 years.
And understanding how it is working currently is valuable in forming an in-your-head model of what perl actually does with your code.
(Notwithstanding that, constructs with undefined behavior should not be used in real code.)
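For real code, one way to avoid the question entirely is to spell out each increment as its own statement and keep explicit temporaries. The following is only a sketch of that idea; the names $left and $right are made up for illustration, and it deliberately picks the "42" reading of the expression.

```perl
use strict;
use warnings;

my $m = 20;

# Step 1: pre-increment, then snapshot the value for the left operand.
$m += 1;
my $left = $m;        # 21

# Step 2: snapshot the value for the right operand, then post-increment.
my $right = $m;       # 21
$m += 1;

my $sum = $left + $right;
print "$sum\n";       # 42
print "$m\n";         # 22
```

Because every read and every write of $m sits in its own statement, the result no longer depends on whether an operator hands back the variable or a copy.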
Good morning Abigail.
Looking up the word "behaviour", I find that it relates to "actions" and "reactions". Things that have happened. Actions that have taken place. Past tense.
Saying that behaviour is undefined, makes no sense. Until something has happened it isn't behaviour.
If you predict that a certain thing will (future tense) display a certain behaviour, and you get it wrong, it didn't display the "wrong behaviour", or "no behaviour". The action the thing displayed (past tense) was the behaviour. You were simply wrong in your prediction.
Therefore, you can't say a "behaviour is undefined", because an action or reaction doesn't become behaviour until it has happened, and when it has happened, it can be measured, and is therefore not "undefined".
It may not be predictable--though in this case, I (rashly) bet the outcome has been the same since Perl 1; maybe you would confirm that?--which kinda makes it a bit predictable.
It may also be unexplainable. Which may explain why the explanations are so lacking.
It is, however, OBSERVABLE! Huge red font omitted.
If a phenomenon can be observed, it can be questioned. The strongest defining impulse of the human being over our non-sentient co-inhabitants is the ability and need to ask the question WHY?.
You may well be correct in reaching your conclusion that asking why is not "worth your effort" or "likely to result in you reaching a satisfactory conclusion", but that is your conclusion. It satisfies your criteria.
Is it your assertion that because you have reached a conclusion that satisfies you, no one else in the entire world is allowed to think about this, or discuss it any place where you might observe that discussion?
I think the more accurate Comp-Sci phrase is "not well defined".
For any given implementation, you can tell what will happen, because it is very unlikely to vary from what happened last time on that same implementation.
It may well vary with other implementations.
The implementers may (and do) choose not to define the behaviour.
Mostly, because if they do define the behaviour, then
Which if the behaviour is generally non-useful, and is likely to be influenced by things (like compilers) beyond their control, then the extra work that this would commit them to is not worth it.
Also a waste of energy for what is essentially a non-useful behaviour.
But given the OP's disclaimer, his statement that this was purely an academic exercise, etc., his being condemned on the basis of a non sequitur and a short phrase in an obscure document saying that the behaviour "is same as C" hardly seems justified. (C may define it as not being well defined, but my attempts to locate such a definition in K&R or the Waite Group's guide have failed.)
For any given implementation, you can tell what will happen, because it is very unlikely to vary from what happened last time on that same implementation.
Does that even apply to the order that keys %foo return the keys of %foo?
ihb
Yes and no.
Unless I use srand, print rand() for 1.. 100; will likely give me different output for each run of the program. So no, the output will not be the same. Yes, the output will be 100 random(ish) numbers between 0 and 1.
The return order of keys %hash since 5.8.something is deliberately randomised, so the definition of the order is that they will be random.
This is different to pre-5.8.something in that the order was previously "an order" that was neither random nor easily predictable, but was consistent between runs of the same program on a given build.
The order is both defined, and explainable.
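As a small illustration of the keys %foo point, here is a sketch with a made-up hash: the set of keys is guaranteed, the order is not, and sorting is the usual way to get an order you can rely on.

```perl
use strict;
use warnings;

# Example data; the hash contents are invented for this sketch.
my %foo = (apple => 1, pear => 2, fig => 3);

# keys returns every key, but the order is not something to rely on.
my @any_order = keys %foo;

# Sorting gives an order that is the same on every run and every build.
my @sorted = sort keys %foo;

print scalar(@any_order), " keys\n";   # 3 keys
print "@sorted\n";                     # apple fig pear
```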
In the case of keys %foo, the results are defined: you will get all keys. The order they're returned is defined but poorly so: it's a mistake for documentation to say they are returned in "random order," because the order has nothing to do with a PRNG. It would be better to say the order is not defined, as that is more accurate. Tuesday's compiler on John's machine will produce something inconsistent with your own experiments.
In the case of ++$m + $m++, the results are undefined: no documentation claims to give an authoritative answer on such expressions, and some documentation will specifically disavow any predictable results.
For those who do not program in C, here's the gist of the action. The do { my $x = $m; $m++; $x } mechanism is not a sufficient postincrement operator, because real post-incrementation happens after the rest of the expression. However, the specific moment that is defined to be after the rest of the expression is not clearly defined; various versions of a C compiler will differ, and various hardware platforms may also have something to say about it. It's undefined.
--
[ e d @ h a l l e y . c c ]
That is not entirely the case here though. See the following code and output.
my $f = 1;
print '$f = ', $f, ' and $f + $f++ = ', $f + $f++, "\n";
my $g = 1;
print '$g = ', $g;
print ' and $g + $g++ = ', $g + $g++, "\n";
__END__
$f = 2 and $f + $f++ = 3
$g = 1 and $g + $g++ = 3
This is not the expected behaviour in C, and I'm only auto-incrementing a single time.
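One way to read that output: the left operand of + is the variable itself, so its value is only fetched once the addition runs, after $f++ has already bumped it. A hedged sketch of the difference, where adding 0 is just one illustrative way to force a temporary copy and the names $aliased and $copied are made up:

```perl
use strict;
use warnings;

my $f = 1;
my $aliased = $f + $f++;        # left operand is $f itself, read after the increment

my $g = 1;
my $copied = ($g + 0) + $g++;   # left operand is a temporary holding 1

print "aliased: $aliased\n";    # 3, as in the output above
print "copied:  $copied\n";     # 2
```

The same reasoning explains why the first print shows $f = 2: the $f in the argument list is the variable too, and it is printed after the increment has happened.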
I think the reason it works this way is simpler than you make out. The important thing is the implementation of the preincrement operator: do { $m = 20; ++$m } does not return the value 21, rather it returns the variable $m.
Therefore the sequence of operations reduces to:
$m = 20;               # $m == 20
++$m;                  # $m == 21
$temp1 = $m++;         # $m == 22, $temp1 == 21
$temp2 = $m + $temp1;  # 22 + 21 = 43
print $temp2;
Hope this helps,
Hugo
my $x;
# unless overloaded, numeric values of two refs to the same thing are equal
{package X; use Carp; sub is_x { carp \$x == \$_[0] ? "ok" : "not ok" }}
X::is_x($x);
X::is_x(++$x);
X::is_x($x++);
X::is_x($x+0);
Part of what surprised me was that "return" gave me back a copy, so my "noop" subroutine should actually be called "copy".
sub noop {
X::is_x( $_[0] ); # ok
return $_[0];
}
X::is_x( noop($x) ); # not_ok
Update: The plot thickens.
I see over on [Aliasing and return, how does return work?] a discussion of just this issue, with a suggestion by japhy to use lvalue subs.
A quick test shows that one can indeed return $x itself this way - but the increment expression is still altered by a call to this kind of subroutine, too.
sub noop_lval :lvalue { $_[0] }
X::is_x( noop_lval($x) ); # ok !
$m=20; print noop_lval(++$m) + $m++; # 42 (still)
$m=20; print ++$m + $m++; # 43
So I guess the copy that "return" makes isn't the whole issue.
You can follow the results by realizing that post-increment returns a value, while pre-increment returns an alias to the variable. That makes subsequent operations of high enough precedence turn up in pieces that have already been evaluated.
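The alias-versus-value distinction can be checked directly by comparing addresses, in the same spirit as the X::is_x test earlier in the thread. This is a minimal sketch; the helper name same_scalar is invented here, and the results in the comments reflect how current Perl 5 implements the two operators rather than any documented guarantee.

```perl
use strict;
use warnings;

# True when both arguments are the very same scalar (same address).
# Sub arguments are aliases, so \$_[0] is the address of what was passed.
sub same_scalar { return \$_[0] == \$_[1] ? 1 : 0 }

my $m = 20;
my $pre_is_alias  = same_scalar(++$m, $m);   # pre-increment: the variable itself
my $post_is_alias = same_scalar($m++, $m);   # post-increment: a separate temporary

print "pre:  $pre_is_alias\n";    # 1
print "post: $post_is_alias\n";   # 0
```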
There is no reason that perl must do it that way. Perl could do away with the C restriction, which is there for good C reasons, and return values for each increment operation. That would make perl do what you expected. But for foreseeable perl, as has been said (and shouted), "Don't do that!"
After Compline,
Zaxo
I applaud the determination and curiosity of the OP. It's a fascinating tale, and only misguided in retrospect of knowing that Perl itself does not know the answer. If nothing else, a great amount is learned about Perl's treatment of variables. On another note, I see no reason for Perl to maintain this. A language is what you DEFINE it to be. Perhaps Perl 6 will be bold enough to say 'this is the order of evaluation, this is the behaviour, anything else is wrong' and have done with it. IANA compiler writer btw, dragon books scare me, maybe I'm missing a really good reason, but in my mind it's just a question of definition and sticking to it. On another note, Barrachois' approach is typical empirical programming, good stuff: tweak it and see, try and break it and find out how it works. If you never encountered a problem like that before, you wouldn't know what 'Behaviour is undefined' even means in the documentation.
Fun indeed! Interesting philosophy. Undefined but consistent is oxymoronic; if it's consistent (not just constrained) then it's defined. The subtlety is that we _really_ mean not defined across platforms and implementations. The platform, processor architecture, C compiler used and a host of other things add variables (that may as well be random) into the equation that defines the statement's behaviour.
While this is a semantic argument, it is not as straightforward as you might think. The term 'undefined' in comp-sci lingo comes from what electrical engineers designing digital circuits call 'undefined', the so-called "don't cares" in a truth table. It's not that these "don't cares" will not take on a value if the circuit happens to somehow land in that state -- it's just that the designer doesn't care, and will not guarantee at all the value of that state.
So when you're reading about a language and it says "undefined", it is with a capital 'U' and can be read as "not guaranteed", as in "this is not part of the contract of the API".
The designers of said language would probably slap you upside the head if you said "but it's consistent, therefore it's defined", mostly because in their lingo the indeterminate variables that you yourself mention are all taken seriously when incorporated under the umbrella of a strict definition such as "undefined". And you are correct that, on the hardware level, the influences can be random. These are generally avoided via the use of guard circuits -- see here for a discussion regarding guard clauses.
Cheers,
Matt
perlmonks.org content © perlmonks.org and Abigail-II, andyf, Anonymous Monk, barrachois, BrowserUk, diotalevi, dragonchild, Elgon, eric256, gam3, halley, hv, ihb, itub, jacques, japhy, jdtoronto, mojotoad, southampton, ysth, Zaxo