This might be blatantly obvious, but nobody in the CB seemed to spot the problem, so I thought I'd ask here.
I have some UTF-16 files I need to convert to UTF-8, so I wrote this tiny program:
#! perl
use encoding "utf16", STDOUT => "utf8";
while (<>) { print }
It works fine. Then I realized I could just do it all from the command line:
perl -Mencoding=utf16,STDOUT,utf8 -p -e 1 < in > out
Much to my surprise, the output isn't UTF-8; it's some strange UTF-16-ish thing (the file is hosed, basically).
Just for grins, I even tried
perl -Mencoding=utf16,STDOUT,utf8 -n -e print < in > out
But it had the same results.
Anybody see what's going on here? Isn't that command line identical to the little program?
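One way to pin down what "UTF-16-ish" means is to look at the first few bytes of the output file. The following is only a sketch and not part of the original post: `sniff_bom` and `sniff_file` are hypothetical helper names, and it recognizes just the three byte-order marks listed in the code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify a byte string by its leading byte-order mark.
# Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here.
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-8 BOM'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE BOM' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE BOM' if $bytes =~ /\A\xFE\xFF/;
    return 'no BOM';
}

# Hypothetical helper: read the first bytes of a file without any decoding
# or CRLF translation, then classify them.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    read $fh, my $head, 4;
    close $fh;
    return sniff_bom(defined $head ? $head : '');
}

print "$_: ", sniff_file($_), "\n" for @ARGV;
```

Running it on both `in` and `out` shows whether a BOM survived the conversion; a plain UTF-8 file without a BOM is reported as "no BOM".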
For reference, I'm using ActiveState on Windows XP. Perl -V output is below.
$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
optimize='-MD -Zi -DNDEBUG -O1',
cppflags='-DWIN32'
ccversion='12.00.8804', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
libpth=\lib
libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
PERL_IMPLICIT_SYS PERL_MALLOC_WRAP
PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES
USE_PERLIO USE_SITECUSTOMIZE
Locally applied patches:
ActivePerl Build 817 [257965]
Iin_load_module moved for compatibility with build 806
PerlEx support in CGI::Carp
Less verbose ExtUtils::Install and Pod::Find
Patch for CAN-2005-0448 from Debian with modifications
Partly reverted 24733 to preserve binary compatibilty
27528 win32_pclose() error exit doesn't unlock mutex
27527 win32_async_check() can loop indefinitely
27515 ignore directories when searching @INC
27359 Fix -d:Foo=bar syntax
27210 Fix quote typo in c2ph
27203 Allow compiling swigged C++ code
27200 Make stat() on Windows handle trailing slashes correctly
27194 Get perl_fini() running on HP-UX again
27133 Initialise lastparen in the regexp structure
27034 Avoid "Prototype mismatch" warnings with autouse
26970 Make Passive mode the default for Net::FTP
26921 Avoid getprotobyname/number calls in IO::Socket::INET
26897,26903 Make common IPPROTO_* constants always available
26670 Make '-s' on the shebang line parse -foo=bar switches
26379 Fix alarm() for Windows 2003
26087 Storable 0.1 compatibility
25861 IO::File performace issue
25084 long groups entry could cause memory exhaustion
24699 ICMP_UNREACHABLE handling in Net::Ping
Built under MSWin32
Compiled at Mar 20 2006 17:54:25
@INC:
c:/Perl/lib
c:/Perl/site/lib
.
Update
Somehow it's related to using the Cygwin bash shell, because both the command line and the tiny program work fine under cmd.exe. It's getting too complicated for my tiny brain, so I'm just going to stop worrying about it and use what works.
Update: Here are the small test programs I used. They can probably tell you what's different between cygwin and cmd.exe.
T56.pm:
package T56;
use base 'Exporter';
sub import
{
my $class = shift;
print "$class import @{[scalar(@_)]} items: @_\n";
my $fh = $_[0];
print $fh "Output to first param\n";
}
1;
t56:
use T56 STDOUT, bar => 'baz';
Compare the output of these to see what's different:
perl t56
perl -MT56=STDOUT,bar,baz -e 1
End of Update
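For comparison, the two invocations above can be imitated in a single file. This is a sketch under assumptions, not the poster's test: the package is inlined, `import` returns its report instead of printing it, and the `-M` form is modelled as a `split` on commas, which is how the deparsed `use encoding (split(/,/, ...))` line quoted elsewhere in this thread shows it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package T56;

# Inlined variant of the test module: report what import() was given.
sub import {
    my ($class, @args) = @_;
    return sprintf '%s import %d items: %s', $class, scalar @args, "@args";
}

package main;

# Arguments as written in the script form. The bareword STDOUT in the
# original is quoted here because "use strict" rejects barewords.
my @from_script = ('STDOUT', bar => 'baz');

# Arguments as the -MT56=STDOUT,bar,baz switch would build them
# (assumption: a plain split on commas).
my @from_switch = split /,/, 'STDOUT,bar,baz';

print T56->import(@from_script), "\n";
print T56->import(@from_switch), "\n";
```

Both lines print `T56 import 3 items: STDOUT bar baz`, so at the Perl level the two argument lists are the same three strings. Any difference between shells would have to come from somewhere else, such as how the shell or the I/O layers treat the redirected files.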
Here's my Perl info:
Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
Platform:
osname=linux, osvers=2.6.8-1.521smp, archname=i386-linux-thread-multi
uname='linux jane.spidermaker.fedoralegacy.org 2.6.8-1.521smp #1 smp mon aug 16 09:25:06 edt 2004 i686 athlon i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dversion=5.8.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.2 5.8.1 5.8.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -g -pipe -march=i386 -mcpu=i686',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
ccversion='', gccversion='3.3.3 20040412 (Red Hat Linux 3.3.3-7)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.3.3.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.3.3'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.3/i386-linux-thread-multi/CORE'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
Locally applied patches:
SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962
Built under linux
Compiled at Jan 28 2006 09:58:00
%ENV:
PERL5LIB="/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi:/usr/lib/perl5/site_perl/5.8.3"
@INC:
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.3
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.2
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.1
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.0
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/5.8.3
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/5.8.2
/usr/lib/perl5/site_perl/5.8.3/5.8.1
/usr/lib/perl5/site_perl/5.8.3/5.8.0
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/5.8.3
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl/5.8.2
/usr/lib/perl5/site_perl/5.8.1
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.3
/usr/lib/perl5/vendor_perl/5.8.2
/usr/lib/perl5/vendor_perl/5.8.1
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
perl -Mencoding=utf16,STDOUT,utf8 -p -e 1
works for me, and so does:
perl -Mencoding=utf8,STDIN,utf16 -p -e 1
using both 5.8.7 and 5.8.8 on FreeBSD (-V output below, FWIW).
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=freebsd, osvers=6.1-prerelease, archname=amd64-freebsd
uname='freebsd thing4.ldc.upenn.edu 6.1-prerelease freebsd 6.1-prerelease #0: fri mar 17 22:23:07 est 2006 root@thing4.ldc.upenn.edu:usrobjusrsrcsysthing4 amd64 '
config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.8.8/mach -Dprivlib=/usr/local/lib/perl5/5.8.8 -Dman3dir=/usr/local/lib/perl5/5.8.8/perl/man/man3 -Dman1dir=/usr/local/man/man1 -Dsitearch=/usr/local/lib/perl5/site_perl/5.8.8/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.8.8 -Dscriptdir=/usr/local/bin -Dsiteman3dir=/usr/local/lib/perl5/5.8.8/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Ui_malloc -Ui_iconv -Uinstallusrbinperl -Dcc=cc -Duseshrplib -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -Doptimize=-O2 -fno-strict-aliasing -pipe -Ud_dosuid -Ui_gdbm -Dusethreads=n -Dusemymalloc=y -Duse64bitint'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=define uselongdouble=undef
usemymalloc=y, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include',
optimize='-O2 -fno-strict-aliasing -pipe ',
cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include'
ccversion='', gccversion='3.4.4 [FreeBSD] 20050518', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -Wl,-E -L/usr/local/lib'
libpth=/usr/lib /usr/local/lib
libs=-lm -lcrypt -lutil
perllibs=-lm -lcrypt -lutil
libc=, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-R/usr/local/lib/perl5/5.8.8/mach/CORE'
cccdlflags='-DPIC -fPIC', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: MYMALLOC PERL_MALLOC_WRAP USE_64_BIT_ALL
USE_64_BIT_INT USE_LARGE_FILES USE_PERLIO
Locally applied patches:
defined-or
Built under freebsd
Compiled at Mar 18 2006 13:05:40
@INC:
/usr/local/lib/perl5/5.8.8/BSDPAN
/usr/local/lib/perl5/site_perl/5.8.8/mach
/usr/local/lib/perl5/site_perl/5.8.8
/usr/local/lib/perl5/site_perl
/usr/local/lib/perl5/5.8.8/mach
/usr/local/lib/perl5/5.8.8
.
My first inclination for this sort of thing would be to use binmode:
perl -pe 'BEGIN{binmode STDIN,":encoding(utf16)";binmode STDOUT,":utf8"} 1;' < in > out
But that involves a bit more typing.
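The same idea can be written as a small script that opens the files itself, so it does not depend on how the shell sets up STDIN and STDOUT. This is a sketch, not a tested fix for the Cygwin problem: `utf16_to_utf8` is a hypothetical name, and the input is assumed to start with a byte-order mark, which the `UTF-16` decoder needs in order to pick the byte order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy a UTF-16 file (with BOM) to a UTF-8 file.
# ":raw" comes first so no CRLF translation is applied to the bytes
# before the encoding layer decodes or encodes them.
sub utf16_to_utf8 {
    my ($in_path, $out_path) = @_;

    open my $in, '<:raw:encoding(UTF-16)', $in_path
        or die "Cannot read $in_path: $!";
    open my $out, '>:raw:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage: perl utf16_to_utf8.pl in.txt out.txt
utf16_to_utf8(@ARGV) if @ARGV == 2;
```

Because the layers are named in `open`, nothing is passed through `-M` or the `encoding` pragma, which sidesteps the question of how the STDOUT argument is interpreted.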
use encoding (split(/,/, 'utf16,STDOUT,utf8', 0));
That seems to indicate that the problem is that STDOUT is just a string rather than the actual filehandle when it's passed into encoding::import via the -M command line parameter.
Actually, perlrun says the same thing under the handling of the -M parameter. But in fact, it is just the string STDOUT being passed to the pragma, even in the program. To pass the filehandle (in a simple way, without using lexicals or other modules), you would usually pass a reference to the glob, as in:
# WRONG use of encoding pragma, just an example
use encoding 'utf16', \*STDOUT, 'utf8';
But anyway, as to the true source of the problem, I gave up trying to figure it out, as noted in the update.
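The distinction drawn above between the string STDOUT and a reference to the glob can be checked with `ref`. A minimal sketch follows; `describe_arg` is a hypothetical helper name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: say what kind of value an import() argument is.
sub describe_arg {
    my ($arg) = @_;
    my $type = ref $arg;
    return 'glob reference'  if $type eq 'GLOB';
    return "$type reference" if $type;
    return 'plain string';
}

print describe_arg('STDOUT'), "\n";   # the name of the handle, as text
print describe_arg(\*STDOUT), "\n";   # a reference to the glob itself
```

The first call reports a plain string and the second a glob reference. A module that receives only the name has to look the handle up itself.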
perlmonks.org content © perlmonks.org and bpphillips, graff, ikegami, sgifford, VSarkiss