fetchall_arrayref with slice and row count
gmax
created: 2006-01-31 12:54:52

The [http://search.cpan.org/perldoc?DBI#fetchall_arrayref|DBI docs] say that fetchall_arrayref can get records in batches, as a compromise between fetching one-by-one and all-at-once.

The example from the manual goes like this.

  my $rows = []; # cache for batches of rows
  while( my $row = ( shift(@$rows) || # get row from cache, or reload cache:
                     shift(@{$rows=$sth->fetchall_arrayref(undef,10_000)||[]}) )
  ) {
    ...
  }

This idiom makes sense, because it can fetch records at high speed, as shown in [id://273952].

A few days ago, brother [bradcathey] asked my help for a case that was similar to the idiom in the manual, with the addition of a slice, so that fetchall_arrayref produces a AoH instead of a AoA. This case led to an error, as shown in the following code.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use DBI;

my $driver = shift || 'mysql';
my $db   = 'test';
my $user = undef;
my $pass = undef;
my $dbh;
$driver = lc $driver;

if ($driver eq 'mysql' ) {

   $dbh = DBI->connect('DBI:mysql:'.$db
                  . ";mysql_read_default_file=$ENV{HOME}/.my.cnf"
              , $user, $pass, {RaiseError => 1})
              or die "Can't connect: $DBI::errstr\n";
    printf "DBI: %s - DBD::mysql: %s\n", $DBI::VERSION, $DBD::mysql::VERSION;

}
elsif ($driver eq 'sqlite') {

   $db   = 'test.db';
   if ( -f $db) {
        unlink $db;
   }
   $dbh = DBI->connect('DBI:SQLite:'.$db, $user, $pass, {RaiseError => 1})
              or die "Can't connect: $DBI::errstr\n";
   printf "DBI: %s - DBD::SQLite: %s\n", $DBI::VERSION, $DBD::SQLite::VERSION;
}
else {
    die "driver $driver not supported in this test\n";
}

   my $max_fields  = 3;
   my $field_size  = 9;
   my $max_records = 10;
   my $max_commit  = 5;
   my $text        = 'abcdefghi';
   my $inserted    = 0;

   my $create_query = qq{CREATE TABLE testdbi ( }
    . (join q{,}, map { qq{id$_ char($field_size) not null} } ( 1..$max_fields ))
    . qq{, primary key (id1))};

   {
        local $dbh->{PrintError} = 0;
        eval { $dbh->do(qq{DROP TABLE testdbi}) };
   }
   $dbh->do(qq{begin});
   $dbh->do($create_query);

   my $sth = $dbh->prepare(qq{INSERT INTO testdbi VALUES (}
       . (join q{,}, map {q{?}} (1..$max_fields))
       .q{)} );

   for (1..$max_records) {
       $inserted += $sth->execute( map {$text++} (1..$max_fields) );
       if (($inserted % $max_commit) == 0) {
           $dbh->do(qq{commit});
           print "$inserted\n";
           $dbh->do(qq{begin});
       }
   }
   $dbh->do(qq{commit});
   print "inserted $inserted records\n";


   my $max_rows = 3; #must leave a remainder for the error
                     #multiples of 2 and 5 work
   print "without slice\n";
   my $rowcache = [];

   my $select_query = qq{SELECT id1, id2 FROM testdbi ORDER BY id1};
   $sth = $dbh->prepare($select_query);
   $sth->execute;

   my $count = 0;
   while (my $aref = ( shift(@{ $rowcache } )
          || shift (@{$rowcache=$sth->fetchall_arrayref(undef, $max_rows)
          || [] } ) ) )
   {
      my $rec =  Data::Dumper->Dump([$aref],['rec']);
      $rec =~ s/\s+/ /g;
      $count++;
      print "$count $rec \n";
   };

#---- HERE STARTS THE FAILING CODE ------
   print "with slice\n";

   $rowcache = [];
   $sth = $dbh->prepare($select_query);
   $sth->execute;

   $count = 0;
   while (my $aref = ( shift(@{ $rowcache })
          || shift (@{$rowcache=$sth->fetchall_arrayref({}, $max_rows)
          || [] } ) ) )
   {
      my $rec =  Data::Dumper->Dump([$aref],['rec']);
      $rec =~ s/\s+/ /g;
      $count++;
      print "$count $rec \n";
   };

__END__
$ perl test_dbi.pl mysql
DBI: 1.49 - DBD::mysql: 3.0002
5
10
inserted 10 records
without slice
1 $rec = [ 'abcdefghi', 'abcdefghj' ];
2 $rec = [ 'abcdefghl', 'abcdefghm' ];
3 $rec = [ 'abcdefgho', 'abcdefghp' ];
4 $rec = [ 'abcdefghr', 'abcdefghs' ];
5 $rec = [ 'abcdefghu', 'abcdefghv' ];
6 $rec = [ 'abcdefghx', 'abcdefghy' ];
7 $rec = [ 'abcdefgia', 'abcdefgib' ];
8 $rec = [ 'abcdefgid', 'abcdefgie' ];
9 $rec = [ 'abcdefgig', 'abcdefgih' ];
10 $rec = [ 'abcdefgij', 'abcdefgik' ];
with slice
1 $rec = { 'id1' => 'abcdefghi', 'id2' => 'abcdefghj' };
2 $rec = { 'id1' => 'abcdefghl', 'id2' => 'abcdefghm' };
3 $rec = { 'id1' => 'abcdefgho', 'id2' => 'abcdefghp' };
4 $rec = { 'id1' => 'abcdefghr', 'id2' => 'abcdefghs' };
5 $rec = { 'id1' => 'abcdefghu', 'id2' => 'abcdefghv' };
6 $rec = { 'id1' => 'abcdefghx', 'id2' => 'abcdefghy' };
7 $rec = { 'id1' => 'abcdefgia', 'id2' => 'abcdefgib' };
8 $rec = { 'id1' => 'abcdefgid', 'id2' => 'abcdefgie' };
9 $rec = { 'id1' => 'abcdefgig', 'id2' => 'abcdefgih' };
10 $rec = { 'id1' => 'abcdefgij', 'id2' => 'abcdefgik' };
DBD::mysql::st fetchall_arrayref failed: fetch() without execute() at test_dbi.pl line 106.
DBD::mysql::st fetchall_arrayref failed: fetch() without execute() at test_dbi.pl line 106.

$ perl test_dbi.pl sqlite
DBI: 1.49 - DBD::SQLite: 1.09
5
10
inserted 10 records
without slice
1 $rec = [ 'abcdefghi', 'abcdefghj' ];
2 $rec = [ 'abcdefghl', 'abcdefghm' ];
3 $rec = [ 'abcdefgho', 'abcdefghp' ];
4 $rec = [ 'abcdefghr', 'abcdefghs' ];
5 $rec = [ 'abcdefghu', 'abcdefghv' ];
6 $rec = [ 'abcdefghx', 'abcdefghy' ];
7 $rec = [ 'abcdefgia', 'abcdefgib' ];
8 $rec = [ 'abcdefgid', 'abcdefgie' ];
9 $rec = [ 'abcdefgig', 'abcdefgih' ];
10 $rec = [ 'abcdefgij', 'abcdefgik' ];
with slice
1 $rec = { 'id1' => 'abcdefghi', 'id2' => 'abcdefghj' };
2 $rec = { 'id1' => 'abcdefghl', 'id2' => 'abcdefghm' };
3 $rec = { 'id1' => 'abcdefgho', 'id2' => 'abcdefghp' };
4 $rec = { 'id1' => 'abcdefghr', 'id2' => 'abcdefghs' };
5 $rec = { 'id1' => 'abcdefghu', 'id2' => 'abcdefghv' };
6 $rec = { 'id1' => 'abcdefghx', 'id2' => 'abcdefghy' };
7 $rec = { 'id1' => 'abcdefgia', 'id2' => 'abcdefgib' };
8 $rec = { 'id1' => 'abcdefgid', 'id2' => 'abcdefgie' };
9 $rec = { 'id1' => 'abcdefgig', 'id2' => 'abcdefgih' };
10 $rec = { 'id1' => 'abcdefgij', 'id2' => 'abcdefgik' };

As you can see from the sample output, the example fails when using both a slice and a row count with [CPAN://DBD::mysql], but it does not fail with [CPAN://DBD::SQLite]. However, changing $max_records to 1,000, the script fails with SQLite2 (segmentation fault).

While initially I was thinking that perhaps there was a bug in DBD::mysql, this latter fact made me think that perhaps using fetchall_arrayref with a slice and row count together was never an intended behavior. Why? Because the row count device is intended for speed, while the slice (leading to a hashref per record) is intended for clarity, but has a speed penalty that does not agree with the row count option.

Thus, I advised [bradcathey] to use the recipe described at [href://?node_id=284436#bindhash|DBI recipes /binding a hash], i.e. using bind_col with a static hash. This way, there can be clarity without suffering too much speed penalty.

So, what do you think? It is a bug, or wrong usage?

 _  _ _  _  
(_|| | |(_|><
 _|   
Re: fetchall_arrayref with slice and row count
created: 2006-01-31 14:04:29
I found the same thing to be true in node 320547. I guess you could say that the general consensus was that it was a bug. I think it would be nice to have the best of both worlds (slice + limit).

blokhead

perlmonks.org content © perlmonks.org and blokhead, gmax

prlmnks.org © 2006 edmund von der burg (eccles & toad)

v 0.03