Using split() to divide a string by length
japhy
created: 2006-04-13 15:22:19
I caught the tail end of a discussion on irc.freenode.net's #perl channel about how to split a string into equal-sized chunks. Some people were trying to use split() to accomplish this; one person fell prey to this:
my $string = "abcdefghi";
my @fields = split /(?=.{3})/, $string;
They expected this to mean "split $string at every location that is followed by three characters (and then skip ahead three characters!)", but what it really means is "split $string at every location that is followed by three characters". They ended up getting ("a", "b", "c", "d", "e", "f", "ghi").
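
To see why, it may help to list the offsets where the lookahead actually matches. The sketch below is only an illustration (the @positions array is a made-up name, not something from the IRC discussion): it collects every zero-width match position with a //g loop and then runs the same split.

```perl
use strict;
use warnings;

my $string = "abcdefghi";

# Collect every offset at which the zero-width lookahead succeeds.
my @positions;
while ($string =~ /(?=.{3})/g) {
    push @positions, pos($string);
}
print "match positions: @positions\n";   # 0 1 2 3 4 5 6

# split cuts at each of those offsets (a zero-width match at offset 0
# produces no leading empty field), so only the tail stays together.
my @fields = split /(?=.{3})/, $string;
print join(",", @fields), "\n";          # a,b,c,d,e,f,ghi
```

Nothing in the pattern consumes characters, so nothing makes the next match start three characters later.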

So how can you use split() to do this? Someone said "Couldn't you abuse \G?", and that reminded me of the internal assignment to $_ of the string being matched against, and the resulting use of pos()! I present:

my @fields = split /(?(?{pos() % 3})(?!))/, $string;
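
Reading the pattern inside out: (?{ pos() % 3 }) is a code block used as the condition of a (?(cond)yes) conditional. When the current offset is not a multiple of 3, the condition is true and the yes-branch (?!) fails unconditionally; otherwise there is no branch to try and the pattern matches the empty string. A minimal check, assuming a perl that supports (?{ }) code blocks in a literal pattern (the second test string is an addition for illustration):

```perl
use strict;
use warnings;

# Length is an exact multiple of 3.
my @exact  = split /(?(?{ pos() % 3 })(?!))/, "abcdefghi";

# Length is not a multiple of 3: the last field is simply shorter.
my @ragged = split /(?(?{ pos() % 3 })(?!))/, "abcdefgh";

print join(",", @exact),  "\n";   # abc,def,ghi
print join(",", @ragged), "\n";   # abc,def,gh
```

The chunk size is hard-coded here on purpose; interpolating a variable into a pattern that contains a code block brings in extra rules (see `use re 'eval'`) that this sketch avoids.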

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Using split() to divide a string by length
created: 2006-04-13 15:39:22

Another trick that works is to capture the split characters, which places them in @fields as well and makes pos advance beyond them. Since every group except possibly the last one is consumed by the pattern, the ordinary split fields between the matches are mostly empty, so we need to filter out false elements with grep:

my $string = join '', 'a'..'z';

my @fields = grep {$_} split /(.{3})/, $string;

print "@fields\n";

__END__
abc def ghi jkl mno pqr stu vwx yz

After Compline,
Zaxo

Re^2: Using split() to divide a string by length
created: 2006-04-13 15:51:08
Your grep should be grep { defined }.

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^3: Using split() to divide a string by length
created: 2006-04-13 16:58:34

Yep, I tried with defined first because that's the way I thought it worked, too. With that, the result of mine is,

 abc  def  ghi  jkl  mno  pqr  stu  vwx yz
Note the extra spaces, indicating that there are defined empty strings instead of undefs in those positions.

After Compline,
Zaxo

Re^4: Using split() to divide a string by length
created: 2006-04-13 17:06:48
Then grep length, ... is needed.
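
The difference between the filters shows up as soon as one chunk is the string "0". A small sketch (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $string = "abcdef0";   # the final chunk is the single character "0"

# Capturing split: empty fields between matches, captures in between.
my @raw      = split /(.{3})/, $string;   # ("", "abc", "", "def", "0")

my @truthy   = grep { $_ }     @raw;      # drops "" but also drops "0"
my @defined  = grep { defined } @raw;     # drops nothing: "" is defined
my @nonempty = grep { length }  @raw;     # drops only the empty strings

print scalar(@raw), " raw fields\n";      # 5 raw fields
print "truthy:   @truthy\n";              # abc def
print "nonempty: @nonempty\n";            # abc def 0
```

So grep { length } is the filter that keeps every real chunk regardless of its content.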
Re: Using split() to divide a string by length
created: 2006-04-13 16:08:05

I find unpack more suitable for this task.

print for unpack '(A3)*', "abcdefghi";;
abc
def
ghi
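
One detail to keep in mind with this template: on unpack, A strips trailing whitespace and nulls from each field, while a returns the field untouched. The sketch below compares the two on a string containing spaces (the sample strings are made up for illustration):

```perl
use strict;
use warnings;

my $string = "ab cd  ef";                  # 9 characters, some of them spaces

my @stripped = unpack '(A3)*', $string;    # "ab", "cd", " ef"
my @exact    = unpack '(a3)*', $string;    # "ab ", "cd ", " ef"

# A short final chunk is returned as-is by either format.
my @ragged   = unpack '(a3)*', "abcdefgh"; # "abc", "def", "gh"

print join("|", @stripped), "\n";          # ab|cd| ef
print join("|", @exact),    "\n";          # ab |cd | ef
print join("|", @ragged),   "\n";          # abc|def|gh
```

If the chunks must concatenate back to the original string, (a3)* is the safer choice.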

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Using split() to divide a string by length
created: 2006-04-13 16:09:38
Note: Parentheses in pack/unpack templates require Perl 5.8.0.
Re^3: Using split() to divide a string by length
created: 2006-04-13 17:24:46

Yep, I know. I remember it being added.

I also remember it from the last time you told me.

And the time before that.

So, what is your point?

  • I can't use parens in pack/unpack templates because it's only been available for 4 years*?
  • I shouldn't mention my preference for a solution because it's only been available for the last 8 releases?
  • Every time I suggest a solution that uses a feature that isn't available in every build of perl, I should add a footnote that ikegami has (unnecessarily) reminded me that this feature has only been available for the last 8 releases and 4 years*?

I know, I know. You're just "expanding knowledge".

Perhaps you should also consider adding footnotes to all your posts that use or recommend other features that have not been around forever? Like say, the 3-arg open; or even hashes?

(*) For the pedantic, 3 years, 8 months, 16 days 4 hours (approx. at the time of posting).


Re^4: Using split() to divide a string by length
created: 2006-04-13 17:34:46

See Re: Version, version, why change the version..

Re^5: Using split() to divide a string by length
created: 2006-04-13 17:55:42

So now I'm gonna ask you the same question. What is your point?

Are you seriously suggesting that no post on PM can mention the use of a 5.8.x feature?

Or that if they are mentioned, then the post must also duplicate the deltas and give a history of each feature's inception?


Re^4: Using split() to divide a string by length
created: 2006-04-13 19:06:34
I'm not telling you. I'm telling your readers, the ones to whom you just suggested a feature that may not be available to them. 5.6 still has a very large user base.
Re: Using split() to divide a string by length
created: 2006-04-13 16:10:31
This doesn't use split, but is the first thing I think of:
my $string = "abcdefghi";
my @fields = $string =~ /.{1,3}/g;
Hmm, $string =~ /.{1,3}/g should even be faster than split /(?(?{pos() % 3})(?!))/, $string.
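
One caveat with the match-based version: without the /s modifier, . does not match a newline, so any newline in the input is silently dropped from the chunks. A quick sketch of the difference (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $string = "ab\ncdefg";

# Without /s the newline can never be part of a chunk, so it is skipped.
my @plain  = $string =~ /.{1,3}/g;    # "ab", "cde", "fg"

# With /s every character, newline included, lands in some chunk.
my @dotall = $string =~ /.{1,3}/gs;   # "ab\n", "cde", "fg"

print scalar(@plain), " chunks, ", length(join '', @plain),  " chars\n";   # 3 chunks, 7 chars
print scalar(@dotall), " chunks, ", length(join '', @dotall), " chars\n";  # 3 chunks, 8 chars
```

For single-line data like the example above the two forms behave identically.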
Re^2: Using split() to divide a string by length
created: 2006-04-13 17:03:53
What is your grep line all about?

Caution: Contents may have been coded under pressure.
Re: Using split() to divide a string by length
created: 2006-04-13 18:09:33

I hereby propose that we patch split such that its first argument, if it's a reference to an integer, will split the string into chunks of characters, each with as many chars as that integer (except the last of course). Come on! Who's with me? :-)

(for the humor impaired, I'm not being serious)

Re: Using split() to divide a string by length
created: 2006-04-14 12:12:14

I'm not sure split is the right choice for extracting fixed-length substrings. Isn't that really what substr is for (I mean, if you don't want to use unpack)?

sub split_len {
    ## split_len( $chars, $string[, $limit] ) 
    ## - splits $string into chunks of $chars chars
    ## - limits number of segments returned to $limit, if provided
    my ($chars, $string, $limit) = @_;

    my ($i, @result);  
    for ($i = 0; ($i+$chars) < length($string); $i+=$chars) {
        last if (defined $limit && @result >= $limit);
        push @result, substr($string, $i, $chars);
    }

    # deal with any short remainders
    return @result if (defined $limit && @result >= $limit);
    if ($i < length($string)) {
        push @result, substr($string, $i);
    }

    return @result;
}
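
The cases most likely to trip up a loop like this are a string whose length is an exact multiple of the chunk size, a short final chunk, and an early stop on the limit. The sketch below checks all three with a compact, hypothetical helper (chunk_with_limit is not the function above, just a stand-in with the same argument order):

```perl
use strict;
use warnings;

# Hypothetical helper: substr-based chunking with an optional cap on the
# number of chunks returned. substr clamps the last chunk by itself.
sub chunk_with_limit {
    my ($chars, $string, $limit) = @_;
    my @result;
    for (my $i = 0; $i < length $string; $i += $chars) {
        last if defined $limit && @result >= $limit;
        push @result, substr($string, $i, $chars);
    }
    return @result;
}

my @all    = chunk_with_limit(3, 'abcdefghi');      # exact multiple of 3
my @short  = chunk_with_limit(3, 'abcdefgh');       # short final chunk
my @capped = chunk_with_limit(3, 'abcdefghi', 2);   # stop after 2 chunks

print "@all\n";      # abc def ghi
print "@short\n";    # abc def gh
print "@capped\n";   # abc def
```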
<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet (http://en.wikipedia.org/wiki/Trebuchet)
Re^2: Using split() to divide a string by length
created: 2006-04-19 06:18:41
# deal with any short remainders

substr does it for us: "If OFFSET and LENGTH specify a substring that is partly outside the string, only the part within the string is returned". This is my version. Doesn't implement $limit (nor parameter checking) but features $start:

sub split_len {
    my ($str, $start, $len) = @_;
    my @ret;

    for (my $strlen = length $str; $start < $strlen; $start += $len) {
        push @ret, substr $str, $start, $len;
    }
    return @ret;
}

my $c =  join '', 'a'..'z';
print "@{[ split_len $c, 0, 3 ]}\n";
print "@{[ split_len $c, 0, 4 ]}\n";
print "@{[ split_len $c, 3, 4 ]}\n";
__END__
abc def ghi jkl mno pqr stu vwx yz
abcd efgh ijkl mnop qrst uvwx yz
defg hijk lmno pqrs tuvw xyz
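
The substr behaviour quoted above can be checked directly. This sketch (variable names are made up for illustration) also shows the boundary case where the offset equals the string length: substr then returns an empty string without warning, so a loop bound that lets the offset reach the length would push one empty trailing chunk.

```perl
use strict;
use warnings;

my $str = "abcdefgh";                # length 8

my $inside  = substr $str, 3, 3;     # fully inside the string
my $partial = substr $str, 6, 3;     # partly outside: clamped to "gh"
my $at_end  = substr $str, 8, 3;     # offset == length: empty string

printf "[%s] [%s] [%s]\n", $inside, $partial, $at_end;   # [def] [gh] []
```

An offset beyond the end of the string is a different case: substr returns undef there and warns.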

--
David Serrano

perlmonks.org content © perlmonks.org and ambrus, BrowserUk, chibiryuu, duff, Hue-Bond, ikegami, japhy, radiantmatrix, Roy Johnson, Zaxo
