Counting the number of items returned by split without using a named array
Anonymous Monk
created: 2006-05-03 05:46:10
Hi all,

I'm getting the warning Use of implicit split to @_ is deprecated on this line:

   $entries=split(/\s+/);

I really just want to count the elements there, I'm not interested in the resulting array. So what's the most elegant way to get the number of entries and throw away the resulting array (and remove the warning)?


Thx
I.

2006-05-04 Retitled by [Arunbear], as per consideration
Original title: 'strip into @_ deprecated'

Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 05:58:55

Here is one way to do it.

use strict;
use warnings;

$_ = 'here the text goes';
my @entries;
print  scalar (@entries = split /\s+/, $_);

Without a return list, here is one way:

use strict;
use warnings;

$_ = 'here the text goes';
my $count;
print $count = $_ =~ s/\S+//g;

Update: added a second method without a return list, as [blazar] pointed out, though it is not the most elegant way. Thanks.

Prasad

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:53:53

"So what's the most elegant way to get the number of entries and throw away the resulting array?"

I think he means: "discarding the return list, retaining only its length".

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 08:53:30
Ha, great
print $count = $_ =~ s/\S+//g;

was probably what I wanted. I just didn't know you could do that with a simple search (although I wondered if one couldn't just use a search in some way).

Thx a lot

Why does a simple
print $count = s/\S+//g;

not work, though?

This (as suggested below) did not count anything either:

$count = () = s/\S+//g;

I.
Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:02:58
I use m//g below, __not__ s///g.
Boris
Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:11:07

Although this may occasionally work for you in this circumstance, it's not logical to modify the original string just to count the number of occurrences. Just use /\S+/g instead.
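A minimal sketch of that suggestion, assuming a made-up sample string: the list-context match counts the fields and leaves the string alone, while s///g returns the same count but strips the words out of whatever it is applied to (a copy here).

```perl
use strict;
use warnings;

# Hypothetical sample input, with leading and trailing blanks on purpose.
my $text = "  here the   text goes ";

# m//g in list context returns every match; assigning that list to ()
# and then to a scalar yields the number of matches.
my $count = () = $text =~ /\S+/g;
print "match count: $count\n";        # 4

# s///g returns the number of substitutions, but rewrites its target.
my $copy = $text;
my $subs = ($copy =~ s/\S+//g);
print "substitutions: $subs\n";       # 4
print "the copy now holds only whitespace\n" if $copy !~ /\S/;
```

The original $text is untouched by the match, which is why the match form is usually the safer way to count.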

Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:02:30
$_ = 'Hi There! x';

my $entries = () = /\S+/g;

print $entries;
Boris
Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:07:13

The so-called "[wp://goatse] operator":

=()=

BTW: if you don't know what [wp://goatse] is, then chances are you don't want to!

Incidentally, do you really want \s+? The default pattern, ' ', is a special case and does what you mean in the vast majority of cases.

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:34:53
The =()= trick cannot be used with split, as it is "optimized" into split /foo/, $bar, 1:
$ perl -MO=Deparse -e '$a = () = split'

$a = () = split(" ", $_, 1);
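To see the effect of that implicit limit at run time rather than through Deparse, here is a small sketch with a made-up three-word string. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $str = "foo bar baz";   # hypothetical sample input

# Assigning split to an empty list: the implicit limit becomes 0 + 1,
# so split stops after one field and the count is always 1 here.
my $goatse = () = split ' ', $str;

# Going through a real array imposes no such limit.
my @fields    = split ' ', $str;
my $via_array = @fields;

print "=()= with split: $goatse\n";     # 1
print "via named array: $via_array\n";  # 3
```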
Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:50:20

GAWD! Well, the fact that you write "optimized" yourself suggests that it is really an unwanted side effect of an optimization... may I push it as far as to dare to say that it is a bug?

Well, another trick that I verified not to be flawed is:

my $count=map $_, split;

of course it doesn't just taste as good... hmmm, how 'bout:

my $count=+(split);  # ?!?

(also verified!)

$ perl -lpe '$_=+(split)'
foo
1
bar baz
2
Re^4: Counting the number of items returned by split without using a named array
created: 2006-05-03 07:15:45
may I push it as far as to dare to say that it is a bug?

Well, probably not, as it is well documented in perlfunc.

A workaround is to set the limit explicitly to undef (though it generates a warning):

$count = () = split ' ', $_, undef;

amazingly, setting it to 0 doesn't work:

$ perl -MO=Deparse -e '$a = () = split(" ", $_, undef)'
$a = () = split(" ", $_, undef);

$ perl -MO=Deparse -e '$a = () = split(" ", $_, 0)'
$a = () = split(" ", $_, 1);

A better workaround that doesn't generate warnings is to use a zero-but-true value:

$count = () = split ' ', $_, '0e0';
that is parsed as:
$ perl -MO=Deparse -e '$a = () = split(" ", $_, "0e0")'
$a = () = split(" ", $_, '0e0');
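A short sketch of the '0e0' workaround in use, assuming a made-up input string. The idea, as far as I understand it, is that the string constant is not recognised as a literal zero at compile time, yet still means "no limit" when split runs.

```perl
use strict;
use warnings;

local $_ = "foo bar baz";   # hypothetical sample input

# '0e0' is numerically zero, so split applies no limit, but it is not
# the integer constant 0 that the list-assignment rewrite looks for.
my $count = () = split ' ', $_, '0e0';

print "$count\n";
```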

Re^4: Counting the number of items returned by split without using a named array
created: 2006-05-03 08:32:04

Well yeah, but aren't you just back to square one?

C:\test>perl -nwle"print $n = +(split)"
Use of implicit split to @_ is deprecated at -e line 1.
Name "main::n" used only once: possible typo at -e line 1.
foo
1
foo bar
2

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Counting the number of items returned by split without using a named array
created: 2006-05-03 08:41:09

D'Oh! I forgot to -w when I tested it. It's incredible how annoying this little thing can be!!! All in all I would regard the assignment to @_ and the connected warning as spurious, since split is not called in void context. But... they're there!
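For readers on a newer perl: perlfunc notes that split overwrote @_ in scalar and void context only prior to Perl 5.11. A minimal sketch, assuming perl 5.12 or later and a made-up input string, where plain scalar-context split is enough:

```perl
use strict;
use warnings;

local $_ = "foo bar baz";   # hypothetical sample input
@_ = ('untouched');         # sentinel to show @_ is left alone

# Scalar context: split returns the number of fields.
my $count = split;

print "count: $count\n";    # 3
print "\@_ is still: @_\n";
```

On the perls discussed in this thread the same line still triggers the implicit-split warning, so the other workarounds remain relevant there.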

Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-04 08:35:51
pretty amazing, that "optimization":
$ perl -MO=Deparse -e '$a = () = split(" ",$_,0)'
$a = () = split(" ", $_, 1);

BUT:
$ perl -MO=Deparse -e '$a = () = split(" ",$_,100000)'
$a = () = split(" ", $_, 100000);

so it's not completely impossible to use it with split, just limited to a fixed maximum number of fields in the split.
It's a bit unexpected (at least for me), though, that an explicit split(" ",$_,0) was 'optimized' as well.

Btw, just out of curiosity I put this version in the benchmark as well: it's faster than the regexp version, but still slower than $n = @{[ split ]};

I.

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 08:55:52
a) You're right, I didn't want to know that

b) hm, I just wanted to catch spaces and tabs and thought \s+ most appropriate.

Never seen =()= in the perldoc before :-/
I.

Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:07:24

Because it's not an operator of itself. It's an assignment to a list further piped into another assignment. It's just a means to create a list context. Others may find a better wording to describe it: possibly mine is not as technically accurate as it could be. Unfortunately as others already explained, it's not reliable to use it with split.
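To make that description concrete, here is a small sketch of the underlying rule: a list assignment evaluated in scalar context yields the number of elements on its right-hand side, no matter how many the left-hand side keeps. The words() helper is hypothetical, just something that returns a list.

```perl
use strict;
use warnings;

# Hypothetical helper that returns a three-element list.
sub words { return qw(alpha beta gamma) }

# Empty list on the left: nothing is stored, the count is still 3.
my $n = () = words();

# One variable on the left: $first keeps 'alpha', the count is still 3.
my $m = ( my ($first) = words() );

print "$n $m $first\n";   # 3 3 alpha
```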

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 13:45:56
Hm, sorry to have asked, as I see the two prominent solutions to the problem had already been discussed here
(hope linking works as expected now, preview was fine... if not: was meant to link here http://www.perlmonks.org/?node_id=527973) I.
Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-04 05:00:57

You should not be sorry. Even if the topic seems trivial and elementary, it turned out to be more complex than one would probably think, and thus the discussion has been very interesting.

BTW: to insert a link you should use [id://527973] or [id://527973|here], which render like Perl Idioms Explained - my $count = () = /.../g and here respectively. This is the preferred way since they will bring up the correct link both if you're in http://perlmonks.org and http://www.perlmonks.org, or any other possible mirror. See this node for more info.

Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:09:12
don't know if it's the "most elegant way", but if you don't want to generate an array, count the blanks, e.g.:
my $entries = 1;
s/\s/$entries++/eg;
Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:21:16

No, no, no, that would modify the original string in a most probably unwanted way. And if you really wanted to do it, then probably it should have been \s+. But you do not want to do so: a match would be better suited.

Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 10:37:05
I know that of course. It is just one quick n dirty way to count blanks (therefore the "e.g." ). but the goatse thing is nicer, i must admit
Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:12:38
Why not just count the instances of whitespace?
my $str = "this is a test string";
my $cnt = 1;                        # num tokens = whitespace + 1

++$cnt while $str =~ /\s+/g;
print $cnt, "\n";                   # prints 5

-- [189756|Tanalis]
#include [http://www.liquidfusion.org.uk|www.liquidfusion.org.uk]

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 07:07:39

The =()= does work with matches:

$ perl -lpe '($_=()=/\s+/g)++'
foo
1
bar baz
2
foo bar baz
3

But I would use [doc://split], especially with the smart behaviour provided by the default ' ' argument.

Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:38:44

That fails when input has leading or trailing blanks:

$ perl -lpe '($_=()=/\s+/g)++'   ## leading
 foo bar
3
$ perl -lpe '($_=()=/\s+/g)++'   ## trailing
foo bar 
3
$ perl -lpe '($_=()=/\s+/g)++'   ## both
 foo bar 
4
$ _

It's better to count occurrences of actual elements (\S+):

$ perl -wle 'print scalar (()=/\S+/g) for "a b c", " a b c", "a b c ", " a b c "'
3
3
3
3
$ _

Anyway I prefer [doc://split] too, like [id://446266]++'s [id://547122|0e0 solution].

--
David Serrano

Re^4: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:47:44

I know. My entire point was not to maintain a counter manually à la

++$cnt while $str =~ /\s+/g;

Well done to point out about \S anyway, since one often forgets about \S, \D and \W.

Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:16:09

Another way that avoids a named array.

$entries = @{[ split /\s+/ ]};

Also, split /\s+/ is similar to the slightly magical split ' ', except that split ' ' suppresses the empty leading field that leading whitespace would otherwise produce.

In turn, split ' ' is the same as [split] with no arguments, so you could reduce your code to:

$entries = @{[ split ]};

That works if you don't have leading whitespace, or don't want to count the empty leading field that leading whitespace would produce as an entry.
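A small sketch of the difference, assuming a made-up line with leading blanks. Note that the extra field produced by /\s+/ is an empty string rather than undef.

```perl
use strict;
use warnings;

my $line = "  foo bar baz";          # hypothetical input, leading blanks

my @plain = split ' ',   $line;      # ("foo", "bar", "baz")
my @regex = split /\s+/, $line;      # ("", "foo", "bar", "baz")

printf "split ' '   : %d fields\n", scalar @plain;   # 3
printf "split /\\s+/ : %d fields\n", scalar @regex;   # 4
print  "leading field is an empty string, not undef\n"
    if defined $regex[0] && $regex[0] eq '';

# The anonymous-array form gives the same count without a named array.
local $_ = $line;
my $entries = @{[ split ]};          # 3
print "anonymous array: $entries\n";
```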


Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 06:57:47

No need to reference-dereference: see the last suggestion in Re^3: Counting the number of items returned by split without using a named array. If only I had thought of it earlier... well, it has been extremely interesting to learn that the =()= trick wouldn't work with split anyway...

Re^2: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:34:55
My bias goes towards either
$entries = @{[ split ]};

or
$entries = () = /\S+/g;

as "most elegant".
Both are sufficiently short, although I'm not sure which one is easier to understand for the uninitiated reader :)
Just out of curiosity (it doesnt really matter in my case):
which one would be the more (CPU- and memory-) efficient one?

I.

Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 09:42:44
Just out of curiosity (it doesnt really matter in my case): which one would be the more (CPU- and memory-) efficient one?

It hardly ever matters. As a wild guess I would say that since the former involves doing something and then undoing it and that something is taking a reference, it is more computationally intensive. In case of doubt

use Benchmark;

I may well (and happily!) be proven wrong...

Re^4: Counting the number of items returned by split without using a named array
created: 2006-05-03 11:02:24
As you wish (or not):
$ cat foo.pl ; echo "--------";echo; ./foo.pl 
#!/usr/bin/perl 
##################
use Benchmark qw(:all) ;

sub test1(){
$entries = @{[ split ]};
}

sub test2(){
$entries = () = /\S+/g;
}

$_="This is an example string with several words bla bla bla\n";
$count=1E+7;
timethis ($count, "test1()");
print "------------\n";
timethis ($count, "test2()");


--------

timethis 10000000: 13 wallclock secs (12.37 usr +  0.01 sys = 12.38 CPU) @ 807754.44/s (n=10000000)
------------
timethis 10000000:  7 wallclock secs ( 6.66 usr +  0.00 sys =  6.66 CPU) @ 1501501.50/s (n=10000000)

Re^5: Counting the number of items returned by split without using a named array
created: 2006-05-04 06:49:20

Adding use warnings; results in

Use of uninitialized value in split at foo.pl line 9.

Hmmm... adding (-l and) print $entries to both subs results in:

Use of uninitialized value in split at foo.pl line 9.
0
Use of uninitialized value in pattern match (m//) at foo.pl line 14.
0

Now,

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw(:all :hireswallclock);

my $str="This is an example string with several words bla bla bla";

sub test1 () {
    local $_=$str;
    my $entries = @{[ split ]};
}

sub test2 () {
    local $_=$str;
    my $entries = () = /\S+/g;
}

cmpthese -10, {
    deref => \&test1,
    goatse => \&test2,
};

__END__

results in

          Rate goatse  deref
goatse 16325/s     --   -42%
deref  28017/s    72%     --
Re^3: Counting the number of items returned by split without using a named array
created: 2006-05-03 10:08:52

C:\test>p1
our $s = join ' ', 'aa'..'zz';;
cmpthese -3, { 
    split => q[ $_=$s; my $n = @{[ split ]}; ], 
    regex => q[ $_=$s; my $n = () = /\S+/g;  ] 
};;
        Rate regex split
regex  546/s    --  -50%
split 1102/s  102%    --

Assuming I didn't goof on the benchmark, [split] appears to be quicker.


Re^4: Counting the number of items returned by split without using a named array
created: 2006-05-03 11:16:16
Strange
perhaps it depends on the length of the input string?
My attempt shows a different result
You seem to be on Windows, so just for comparison:
$ cat a.pl; echo '-----------' ;echo;./a.pl 
#!/usr/local/bin/perl -w
           use Benchmark qw(:all) ;

our $s = join ' ', 'aa'..'zz';;
cmpthese( -3, { 
    split => q[ $_=$s; my $n = @{[ split ]}; ], 
    regex => q[ $_=$s; my $n = () = /\S+/g;  ] 
});

-----------

        Rate regex split
regex  981/s    --  -53%
split 2098/s  114%    --


Hm, now where's the big diff to the other Bench?
I.
Re^5: Counting the number of items returned by split without using a named array
created: 2006-05-03 11:20:58
Argh... Link went wrong :( meant to link to the other benchmark above: http://www.perlmonks.org/?node_id=547169
Re^5: Counting the number of items returned by split without using a named array
created: 2006-05-03 12:34:19
Hm, now where's the big diff to the other Bench?

Try switching warnings on in your first benchmark. It will probably explain the difference :)


Re^6: Counting the number of items returned by split without using a named array
created: 2006-05-03 13:32:11
Ok, I see :)
That could explain it *g*

I.

Re: Counting the number of items returned by split without using a named array
created: 2006-05-03 17:16:30
my$s=my@a=split/\s+/;

perlmonks.org content © perlmonks.org and Anonymous Monk, blazar, borisz, BrowserUk, Hue-Bond, lima1, prasadbabu, salva, smokemachine, Tanalis
