I'm getting the warning Use of implicit split to @_ is deprecated on this line:
$entries=split(/\s+/);
I really just want to count the elements there, I'm not interested in the resulting array. So what's the most elegant way to get the number of entries and throw away the resulting array (and remove the warning)?
Thx
I.
2006-05-04 Retitled by [Arunbear], as per consideration
Original title: 'strip into @_ deprecated'
Here is one way to do it.
use strict;
use warnings;
$_ = 'here the text goes';
my @entries;
print scalar (@entries = split /\s+/, $_);
Without return list, here is one way
use strict;
use warnings;
$_ = 'here the text goes';
my $count;
print $count = $_ =~ s/\S+//g;
Updated: added the second method, without a return list, as [blazar] pointed out, though it is not the most elegant way. Thanks.
Prasad
"So what's the most elegant way to get the number of entries and throw away the resulting array?"
I think he means: "discarding the return list, retaining only its length".
print $count = $_ =~ s/\S+//g;
Thx a lot
Why does a simple
print $count = s/\S+//g;
not count anything?
This (as suggested below) did not count anything either:
$count = () = s/\S+//g;
Although this may occasionally work for you in this circumstance, it's not logical to modify the original string just to count the number of occurrences. Just use /\S+/g instead.
$_ = 'Hi There! x';
my $entries = () = /\S+/g;
print $entries;
The so-called "[wp://goatse] operator":
=()=
BTW: if you don't know what [wp://goatse] is, then chances are you don't want to!
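To make the point about not modifying the string concrete, here is a minimal sketch. The variable names are mine and purely illustrative; it only assumes that s///g returns the number of substitutions and that a list assignment in scalar context returns the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $text = 'here the text goes';

# Counting with a match in list context leaves the string untouched.
my $by_match = () = $text =~ /\S+/g;

# Counting with s///g destroys the words, so work on a copy here.
my $copy = $text;
my $by_subst = ( $copy =~ s/\S+//g );

print "match: $by_match, substitution: $by_subst\n";
print "original: '$text'\n";
print "after s///g: '$copy'\n";
```

Both counts agree, but only the whitespace is left in the copy that went through the substitution.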
Incidentally, do you really want \s+? The default, which is ' ', is a special case and does what you mean in the vast majority of cases.
$ perl -MO=Deparse -e '$a = () = split'
$a = () = split(" ", $_, 1);
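The Deparse output above shows perl supplying a limit of 1 when split is assigned to an empty list. A small sketch of what that means for the count, with made-up variable names; it assumes the behaviour described in perlfunc, where an omitted (or zero) LIMIT becomes one more than the number of variables being assigned to:

```perl
use strict;
use warnings;

my $line = 'foo bar baz';

# Assigning to an empty list: perl fills in an implicit LIMIT of 1,
# so split hands back the whole string as a single field.
my $via_empty_list = () = split ' ', $line;

# Assigning to a real array imposes no such limit.
my @fields    = split ' ', $line;
my $via_array = @fields;

print "empty list: $via_empty_list, array: $via_array\n";
```

So the =()= idiom silently reports one field here, however many words the line has.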
GAWD! Well, the fact that you write "optimized" yourself suggests that it is really an unwanted side effect of an optimization... may I push it as far as to dare to say that it is a bug?
Well, another trick that I verified not to be flawed is:
my $count=map $_, split;
of course it doesn't just taste as good... hmmm, how 'bout:
my $count=+(split); # ?!?
(also verified!)
$ perl -lpe '$_=+(split)'
foo
1
bar baz
2
Well, probably not, as it is well documented in perlfunc.
A workaround is to set the limit explicitly to undef (though it generates a warning):
$count = () = split ' ', $_, undef;
amazingly, setting it to 0 doesn't work:
$ perl -MO=Deparse -e '$a = () = split(" ", $_, undef)'
$a = () = split(" ", $_, undef);
$ perl -MO=Deparse -e '$a = () = split(" ", $_, 0)'
$a = () = split(" ", $_, 1);
A better workaround that doesn't generate warnings is to use a zero-but-true value:
$count = () = split ' ', $_, '0e0';
that is parsed as:
$ perl -MO=Deparse -e '$a = () = split(" ", $_, "0e0")'
$a = () = split(" ", $_, '0e0');
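A quick sketch of the zero-but-true workaround in a complete script, with illustrative names. The assumption is the one the Deparse output suggests: a string limit such as '0e0' is not rewritten at compile time, and at run time it is numerically zero, which means "no limit".

```perl
use strict;
use warnings;

my $line = 'foo bar baz';

# A literal 0 (or no limit at all) would be replaced by 1 here.
my $with_zero = () = split ' ', $line, 0;

# '0e0' is a string, so it is left alone and acts as "no limit".
my $with_0e0 = () = split ' ', $line, '0e0';

print "limit 0: $with_zero, limit '0e0': $with_0e0\n";
```

The '0e0' string looks like a number, so it does not trigger a numeric warning the way undef does.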
Well yeah, but aren't you just back to square one?
C:\test>perl -nwle"print $n = +(split)"
Use of implicit split to @_ is deprecated at -e line 1.
Name "main::n" used only once: possible typo at -e line 1.
foo
1
foo bar
2
D'Oh! I forgot to -w when I tested it. It's incredible how annoying this little thing can be!!! All in all I would regard the assignment to @_ and the connected warning as spurious, since split is not called in void context. But... they're there!
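For readers on newer perls: as far as I know, perlfunc states that split only overwrote @_ in scalar and void context before Perl 5.11, so from 5.12 on a plain scalar-context split should just count the fields. A minimal sketch under that assumption (it will still warn on the older perls discussed in this thread):

```perl
use strict;
use warnings;

my $line = 'foo bar baz';

# Assumption: perl 5.12 or newer, where split in scalar context
# returns the number of fields and leaves @_ alone.
my $count = split ' ', $line;

print "$count\n";
```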
$ perl -MO=Deparse -e '$a = () = split(" ",$_,0)'
$a = () = split(" ", $_, 1);
$ perl -MO=Deparse -e '$a = () = split(" ",$_,100000)'
$a = () = split(" ", $_, 100000);
Btw, just out of curiosity I put this version in the benchmark as well, and it's faster than the regexp version, but still slower than $n = @{[ split ]};
I.
b) hm, I just wanted to catch spaces and tabs and thought \s+ most appropriate.
Never seen =()= in the perldoc before :-/
I.
Because it's not an operator in itself. It's an assignment to a list, which is in turn fed into another assignment. It's just a means to create a list context. Others may find a better wording to describe it: possibly mine is not as technically accurate as it could be. Unfortunately, as others already explained, it's not reliable to use it with split.
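Spelled out, the two assignments look like this. It is only a sketch with names of my choosing; the one rule it relies on is that a list assignment evaluated in scalar context yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $text = 'Hi There! x';

# Inner assignment: "() = ..." puts the match in list context.
# Outer assignment: the inner list assignment, now in scalar context,
# evaluates to the number of elements the match produced.
my $count = ( () = $text =~ /\S+/g );

# The same thing with an explicit temporary array.
my @words = $text =~ /\S+/g;
my $same  = scalar @words;

print "$count $same\n";
```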
You should not be sorry. Even if the topic seems trivial and elementary, it turned out to be more complex than one would probably think, and thus the discussion has been very interesting.
BTW: to insert a link you should use [id://527973] or [id://527973|here], which render like Perl Idioms Explained - my $count = () = /.../g and here respectively. This is the preferred way since they will bring up the correct link both if you're in http://perlmonks.org and http://www.perlmonks.org, or any other possible mirror. See this node for more info.
my $entries = 1; s/\s/$entries++/eg;
No, no, no, that would modify the original string in a most probably unwanted way. And if you really wanted to do it, then probably it should have been \s+. But you do not want to do so: a match would be better suited.
my $str = "this is a test string";
my $cnt = 1; # num tokens = whitespace + 1
++$cnt while $str =~ /\s+/g;
print $cnt, "\n"; # prints 5
-- [189756|Tanalis]
#include [http://www.liquidfusion.org.uk|www.liquidfusion.org.uk]
The =()= does work with matches:
$ perl -lpe '($_=()=/\s+/g)++'
foo
1
bar baz
2
foo bar baz
3
But I would use [doc://split], especially with the smart behaviour provided by the default ' ' argument.
That fails when input has leading or trailing blanks:
$ perl -lpe '($_=()=/\s+/g)++'  ## leading
 foo bar
3
$ perl -lpe '($_=()=/\s+/g)++'  ## trailing
foo bar 
3
$ perl -lpe '($_=()=/\s+/g)++'  ## both
 foo bar 
4
$ _
It's better to count occurrences of actual elements (\S+):
$ perl -wle 'print scalar (()=/\S+/g) for "a b c", " a b c", "a b c ", " a b c "'
3
3
3
3
$ _
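The two approaches side by side, as a small script rather than one-liners. This is just a sketch with invented variable names; it counts separators plus one and actual tokens for the same four inputs.

```perl
use strict;
use warnings;

my ( @by_separator, @by_token );

for my $line ( 'a b c', ' a b c', 'a b c ', ' a b c ' ) {
    # Separators + 1: thrown off by leading or trailing blanks.
    my $separators_plus_one = 1 + ( () = $line =~ /\s+/g );

    # Tokens: counts the elements themselves.
    my $tokens = () = $line =~ /\S+/g;

    push @by_separator, $separators_plus_one;
    push @by_token,     $tokens;

    print "'$line': separators+1=$separators_plus_one tokens=$tokens\n";
}
```

Only the token count stays at 3 for all four strings.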
Anyway I prefer [doc://split] too, like [id://446266]++'s [id://547122|0e0 solution].
--
David Serrano
I know. My entire point was not to maintain a counter manually à la
++$cnt while $str =~ /\s+/g;
Well done to point out about \S anyway, since one often forgets about \S, \D and \W.
Another way that avoids a named array.
$entries = @{[ split /\s+/ ]};
Also, split /\s+/ is similar to the slightly magical split ' ', except that split ' ' suppresses the empty leading field that leading whitespace would otherwise produce.
In turn, split ' ' is the same as [split] with no arguments, so you could reduce your code to:
$entries = @{[ split ]};
That works if you don't have leading whitespace, or if you don't want to count the empty field that any leading whitespace would produce as an entry.
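A short sketch of the difference, using a made-up input with two leading blanks. With /\s+/ the leading whitespace yields an empty first field; with ' ' it is skipped.

```perl
use strict;
use warnings;

my $line = '  foo bar';

# /\s+/ treats the leading blanks as a separator after an empty field.
my @plain = split /\s+/, $line;

# ' ' is the special case: leading whitespace is skipped first.
my @magical = split ' ', $line;

print scalar(@plain), " fields with /\\s+/, ",
      scalar(@magical), " fields with ' '\n";
```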
No need to reference-dereference: see the last suggestion in Re^3: Counting the number of items returned by split without using a named array. If only I had thought of it earlier... well, it has been extremely interesting to learn that the =()= trick wouldn't work with split anyway...
$entries = @{[ split ]};
$entries = () = /\S+/g;
I.
Just out of curiosity (it doesn't really matter in my case): which one would be the more (CPU- and memory-) efficient one?
It hardly ever matters. As a wild guess I would say that since the former involves doing something and then undoing it and that something is taking a reference, it is more computationally intensive. In case of doubt
use Benchmark;
I may well (and happily!) be proven wrong...
$ cat foo.pl ; echo "--------";echo; ./foo.pl
#!/usr/bin/perl
##################
use Benchmark qw(:all) ;
sub test1(){
$entries = @{[ split ]};
}
sub test2(){
$entries = () = /\S+/g;
}
$_="This is an example string with several words bla bla bla\n";
$count=1E+7;
timethis ($count, "test1()");
print "------------\n";
timethis ($count, "test2()");
--------
timethis 10000000: 13 wallclock secs (12.37 usr + 0.01 sys = 12.38 CPU) @ 807754.44/s (n=10000000)
------------
timethis 10000000: 7 wallclock secs ( 6.66 usr + 0.00 sys = 6.66 CPU) @ 1501501.50/s (n=10000000)
Adding use warnings; results in
Use of uninitialized value in split at foo.pl line 9.
Hmmm... adding (-l and) print $entries to both subs results in:
Use of uninitialized value in split at foo.pl line 9.
0
Use of uninitialized value in pattern match (m//) at foo.pl line 14.
0
Now,
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all :hireswallclock);
my $str="This is an example string with several words bla bla bla";
sub test1 () {
local $_=$str;
my $entries = @{[ split ]};
}
sub test2 () {
local $_=$str;
my $entries = () = /\S+/g;
}
cmpthese -10, {
deref => \&test1,
goatse => \&test2,
};
__END__
results in
          Rate goatse  deref
goatse 16325/s     --   -42%
deref  28017/s    72%     --
C:\test>p1
our $s = join ' ', 'aa'..'zz';;
cmpthese -3, {
split => q[ $_=$s; my $n = @{[ split ]}; ],
regex => q[ $_=$s; my $n = () = /\S+/g; ]
};;
        Rate regex split
regex  546/s    --  -50%
split 1102/s  102%    --
Assuming I didn't goof on the benchmark, [split] appears to be quicker.
$ cat a.pl; echo '-----------' ;echo;./a.pl
#!/usr/local/bin/perl -w
use Benchmark qw(:all) ;
our $s = join ' ', 'aa'..'zz';;
cmpthese( -3, {
split => q[ $_=$s; my $n = @{[ split ]}; ],
regex => q[ $_=$s; my $n = () = /\S+/g; ]
});
-----------
        Rate regex split
regex  981/s    --  -53%
split 2098/s  114%    --
Hm, now where's the big diff to the other Bench?
Try switching warnings on in your first benchmark. It will probably explain the difference :)
perlmonks.org content © perlmonks.org and Anonymous Monk, blazar, borisz, BrowserUk, Hue-Bond, lima1, prasadbabu, salva, smokemachine, Tanalis