my $utf8key = "\x{05D0}";
dbmopen(my %hash, "/tmp/mydb", 0666) || die "d'oh!";
$hash{$utf8key} = "bar";
dbmclose(%hash);
This prints:
Wide character in null operation at ./test.pl line 8.
As checked with Encode::is_utf8, the string in $utf8key has the utf8 flag on.
Is this a bug in the dbm implementation or am I just confused?
It happens with perl5.8.8 and perl5.9.3. Thanks for any help.
It's not the fault of tied hashes. A tied STORE receives the key with its wide character intact:
use Tie::Hash qw( );
our @ISA = 'Tie::StdHash';

sub STORE {
    my ($self, $key, $val) = @_;
    print($key eq "\x{05D0}" ? "utf" : "not utf", "\n");
    return $self->SUPER::STORE($key, $val);
}

my %h;
tie %h, __PACKAGE__;
$h{"\x{05D0}"} = 1;  # Prints 'utf' in 5.8.6
DBM files probably don't support Unicode keys. The workaround is to encode your character strings into byte strings; UTF-8 is probably the best-suited encoding.
use Encode;
my $utf8key = "\x{05D0}";
my $usable_key = encode( 'utf8', $utf8key );
dbmopen(my %hash, "/tmp/mydb", 0666) || die "d'oh!";
$hash{$usable_key} = "bar";
dbmclose(%hash);
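Keys read back from the file (through keys or each) are byte strings, so they need decoding again, and every lookup key needs the same encode() as the store. A minimal round-trip sketch, assuming a scratch directory from File::Temp in place of /tmp/mydb; dbmopen uses whichever *DBM_File module AnyDBM_File selects on your system:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Scratch location (an assumption), used instead of /tmp/mydb.
my $dir = tempdir(CLEANUP => 1);

my $utf8key = "\x{05D0}";    # one character: HEBREW LETTER ALEF

# Store: turn the character string into UTF-8 bytes first.
dbmopen(my %hash, "$dir/mydb", 0666) or die "dbmopen failed: $!";
$hash{ encode('utf8', $utf8key) } = "bar";
dbmclose(%hash);

# Read back: keys come out as bytes, so decode them again,
# and encode the key again for the lookup.
dbmopen(%hash, "$dir/mydb", 0666) or die "dbmopen failed: $!";
my @keys  = map { decode('utf8', $_) } keys %hash;
my $value = $hash{ encode('utf8', $utf8key) };
dbmclose(%hash);

printf "%d key(s), value '%s'\n", scalar @keys, $value;
```

Forgetting the encode() on lookup is the usual mistake: the fetch then misses (or fails) even though the entry is in the file.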
With that change I don't get the warning message. I also noticed some differences in the content of the resulting DBM file: the OP's version had null bytes where the encoded version had non-null bytes, suggesting that the warning issued by the OP's version reflects an actual failure to store the data.
Having to encode the hash keys like this is certainly a PITA (a minor one, but still). Perhaps the maintainer(s) of the various *DBM_File modules can be persuaded to update them to handle this properly; that should be easy enough to do, I'd expect.
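Until that happens, Perl's bundled DBM modules already provide DBM filter hooks (documented in perldbmfilter) that can hide the encoding step. A minimal sketch, assuming SDBM_File is tied directly instead of going through dbmopen, because the filter methods are called on the object that tie returns; the scratch directory is also an assumption:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;

my $dir = tempdir(CLEANUP => 1);

my %hash;
my $db = tie(%hash, 'SDBM_File', "$dir/mydb", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $dir/mydb: $!";

# Encode every key on its way into the DBM library
# (this also covers lookups, exists and delete)...
$db->filter_store_key(sub { $_ = encode_utf8($_) if defined $_ });
# ...and decode every key coming back out (keys, each).
$db->filter_fetch_key(sub { $_ = decode_utf8($_) if defined $_ });

$hash{"\x{05D0}"} = "bar";    # no explicit encode() needed

my @keys  = keys %hash;
my $value = $hash{"\x{05D0}"};

undef $db;    # drop the extra reference so untie doesn't warn
untie %hash;

printf "%d key(s), value '%s'\n", scalar @keys, $value;
```

The same pair of filters could be installed for values (filter_store_value and filter_fetch_value) if the values may contain wide characters too.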
> I don't get the warning message
It's not a warning; it's a fatal error. The check was added in newer versions of Perl.
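On a perl where the check is fatal, the failure can be trapped with eval. A minimal sketch, assuming SDBM_File as the backend (dbmopen may pick a different *DBM_File module) and a scratch directory; the message text has varied between versions (the OP saw "in null operation"), so only its "Wide character" prefix is relied on:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;

my $dir = tempdir(CLEANUP => 1);
tie(my %hash, 'SDBM_File', "$dir/mydb", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $dir/mydb: $!";

# A wide-character key cannot be turned into bytes, so the store dies.
my $ok    = eval { $hash{"\x{05D0}"} = "bar"; 1 };
my $error = $ok ? '' : $@;

my $msg = $ok ? "stored\n" : "store failed: $error";
print $msg;

untie %hash;
```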
> I also noticed some differences in the content of the resulting dbm file
Not here. Perl 5.8.0 with Encode, 5.8.0 without Encode, and 5.8.8 with Encode all produce this file:
0000: 02 00 FE 03 FB 03 00 00 00 00 00 00 00 00 00 00
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
03D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
03E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
03F0: 00 00 00 00 00 00 00 00 00 00 00 62 61 72 D7 90
Since strings of characters are stored internally as UTF-8, the resulting file is identical in all three cases.
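The byte-level claim can be checked directly. A minimal sketch that compares perl's internal buffer for the key (inspected under the bytes pragma, for illustration only) with the output of encode('utf8', ...); both come out as d790, the same D7 90 that ends the dump after "bar" (62 61 72):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $key = "\x{05D0}";    # U+05D0, HEBREW LETTER ALEF

# Number of characters perl sees.
my $chars = length $key;

# Perl's internal buffer, byte by byte. The bytes pragma is used here
# only to peek at the internal representation; avoid it in real code.
my $internal = do {
    use bytes;
    join '', map { sprintf '%02x', ord substr($key, $_, 1) } 0 .. length($key) - 1;
};

# What an explicit UTF-8 encoding produces.
my $encoded = unpack 'H*', encode('utf8', $key);

printf "%d char; internal %s; encoded %s\n", $chars, $internal, $encoded;
```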
perlmonks.org content © perlmonks.org and Anonymous Monk, graff, ikegami, kwaping, saintmike