Strange interaction between command line and encoding
VSarkiss
created: 2006-06-02 12:57:49

This might be blatantly obvious, but nobody in the CB seemed to spot the problem, so I thought I'd ask here.

I have some UTF-16 files I need to convert to UTF-8, so I wrote this tiny program:

#! perl
use encoding "utf16", STDOUT => "utf8";
while (<>) { print }
It works fine. Then I realized I could just do it all from the command line:
perl -Mencoding=utf16,STDOUT,utf8 -p -e 1 < in > out
Much to my surprise the output isn't UTF-8, it's some strange UTF-16-ish thing (the file is hosed, basically).

Just for grins, I even tried

perl -Mencoding=utf16,STDOUT,utf8 -n -e print < in > out
But it had the same results.

Anybody see what's going on here? Isn't that command line identical to the little program?

For reference, I'm using ActiveState on Windows XP. Perl -V output is below.

$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\Perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\Perl\lib\CORE"  -machine:x86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS PERL_MALLOC_WRAP
                        PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES
                        USE_PERLIO USE_SITECUSTOMIZE
  Locally applied patches:
        ActivePerl Build 817 [257965]
        Iin_load_module moved for compatibility with build 806
        PerlEx support in CGI::Carp
        Less verbose ExtUtils::Install and Pod::Find
        Patch for CAN-2005-0448 from Debian with modifications
        Partly reverted 24733 to preserve binary compatibilty
        27528 win32_pclose() error exit doesn't unlock mutex
        27527 win32_async_check() can loop indefinitely
        27515 ignore directories when searching @INC
        27359 Fix -d:Foo=bar syntax
        27210 Fix quote typo in c2ph
        27203 Allow compiling swigged C++ code
        27200 Make stat() on Windows handle trailing slashes correctly
        27194 Get perl_fini() running on HP-UX again
        27133 Initialise lastparen in the regexp structure
        27034 Avoid "Prototype mismatch" warnings with autouse
        26970 Make Passive mode the default for Net::FTP
        26921 Avoid getprotobyname/number calls in IO::Socket::INET
        26897,26903 Make common IPPROTO_* constants always available
        26670 Make '-s' on the shebang line parse -foo=bar switches
        26379 Fix alarm() for Windows 2003
        26087 Storable 0.1 compatibility
        25861 IO::File performace issue
        25084 long groups entry could cause memory exhaustion
        24699 ICMP_UNREACHABLE handling in Net::Ping
  Built under MSWin32
  Compiled at Mar 20 2006 17:54:25
  @INC:
    c:/Perl/lib
    c:/Perl/site/lib
    .

Update
Somehow it's related to using the Cygwin bash shell, because both the command line and the tiny program work OK under cmd.exe. It's getting too complicated for my tiny brain, so I'm just going to stop worrying about it and use what works.
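
One way to compare the two shells, offered only as a sketch of where to look and not as a diagnosis: print the PerlIO layers that each standard handle starts with. PerlIO::get_layers is a core function; layer_report is a helper name made up for this example.

```perl
use strict;
use warnings;

# Return the PerlIO layer names on a handle as one space-separated string.
# Pass a true second argument to ask for the output side of the handle.
# (layer_report is a hypothetical helper, not part of any module.)
sub layer_report {
    my ($fh, $is_output) = @_;
    my @layers = $is_output
        ? PerlIO::get_layers($fh, output => 1)
        : PerlIO::get_layers($fh);
    return join ' ', @layers;
}

# Report on STDERR so the result is not mixed into redirected output.
print STDERR 'STDIN:  ', layer_report(\*STDIN),     "\n";
print STDERR 'STDOUT: ', layer_report(\*STDOUT, 1), "\n";
```

Saved under a hypothetical name such as layers.pl and run as "perl layers.pl < in > out" from both bash and cmd.exe, any difference in the two reports would be a starting point for further digging.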

Re: Strange interaction between command line and encoding
created: 2006-06-02 14:01:29
Strange. It works for me on Linux with Perl 5.8.3. Some quick tests show that the arguments to a class's import function are identical for both the command-line version and the use statement. Maybe it's time to fire up the debugger or add some print statements to your encoding.pm?

Update: Here are the small test programs I used. They can probably tell you what's different between Cygwin and cmd.exe.

T56.pm:

package T56;
 
use base 'Exporter';
sub import
{
  my $class = shift;
  print "$class import @{[scalar(@_)]} items: @_\n";
  my $fh = $_[0];
  print $fh "Output to first param\n";
}
1;

t56:

use T56 STDOUT,  bar => 'baz';

Compare the output of these to see what's different:

perl t56
perl -MT56=STDOUT,bar,baz -e 1

End of Update

Here's my Perl info:

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.6.8-1.521smp, archname=i386-linux-thread-multi
    uname='linux jane.spidermaker.fedoralegacy.org 2.6.8-1.521smp #1 smp mon aug 16 09:25:06 edt 2004 i686 athlon i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dversion=5.8.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.2 5.8.1 5.8.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -march=i386 -mcpu=i686',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.3.3 20040412 (Red Hat Linux 3.3.3-7)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.3.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.3'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.3/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
 
 
Characteristics of this binary (from libperl):
  Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Locally applied patches:
        SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962
  Built under linux
  Compiled at Jan 28 2006 09:58:00
  %ENV:
    PERL5LIB="/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi:/usr/lib/perl5/site_perl/5.8.3"
  @INC:
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.3
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.2
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.1
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi/5.8.0
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/5.8.3
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/5.8.2
    /usr/lib/perl5/site_perl/5.8.3/5.8.1
    /usr/lib/perl5/site_perl/5.8.3/5.8.0
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/5.8.3
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/site_perl/5.8.2
    /usr/lib/perl5/site_perl/5.8.1
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.3
    /usr/lib/perl5/vendor_perl/5.8.2
    /usr/lib/perl5/vendor_perl/5.8.1
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl

--
[http://www.suspectclass.com/~sgifford/|sgifford's Web page]
Re: Strange interaction between command line and encoding
created: 2006-06-02 15:55:50
perl -Mencoding=utf16,STDOUT,utf8 -p -e 1
works for me, and so does:
perl -Mencoding=utf8,STDIN,utf16 -p -e 1
using both 5.8.7 and 5.8.8 on FreeBSD (-V output below, FWIW).
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=freebsd, osvers=6.1-prerelease, archname=amd64-freebsd
    uname='freebsd thing4.ldc.upenn.edu 6.1-prerelease freebsd 6.1-prerelease #0: fri mar 17 22:23:07 est 2006 root@thing4.ldc.upenn.edu:usrobjusrsrcsysthing4 amd64 '
    config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.8.8/mach -Dprivlib=/usr/local/lib/perl5/5.8.8 -Dman3dir=/usr/local/lib/perl5/5.8.8/perl/man/man3 -Dman1dir=/usr/local/man/man1 -Dsitearch=/usr/local/lib/perl5/site_perl/5.8.8/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.8.8 -Dscriptdir=/usr/local/bin -Dsiteman3dir=/usr/local/lib/perl5/5.8.8/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Ui_malloc -Ui_iconv -Uinstallusrbinperl -Dcc=cc -Duseshrplib -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -Doptimize=-O2 -fno-strict-aliasing -pipe  -Ud_dosuid -Ui_gdbm -Dusethreads=n -Dusemymalloc=y -Duse64bitint'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include',
    optimize='-O2 -fno-strict-aliasing -pipe ',
    cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include'
    ccversion='', gccversion='3.4.4 [FreeBSD] 20050518', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -Wl,-E -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lm -lcrypt -lutil
    perllibs=-lm -lcrypt -lutil
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='  -Wl,-R/usr/local/lib/perl5/5.8.8/mach/CORE'
    cccdlflags='-DPIC -fPIC', lddlflags='-shared  -L/usr/local/lib'


Characteristics of this binary (from libperl): 
  Compile-time options: MYMALLOC PERL_MALLOC_WRAP USE_64_BIT_ALL
                        USE_64_BIT_INT USE_LARGE_FILES USE_PERLIO
  Locally applied patches:
        defined-or
  Built under freebsd
  Compiled at Mar 18 2006 13:05:40
  @INC:
    /usr/local/lib/perl5/5.8.8/BSDPAN
    /usr/local/lib/perl5/site_perl/5.8.8/mach
    /usr/local/lib/perl5/site_perl/5.8.8
    /usr/local/lib/perl5/site_perl
    /usr/local/lib/perl5/5.8.8/mach
    /usr/local/lib/perl5/5.8.8
    .
My first inclination for this sort of thing would be to use binmode:
perl -pe 'BEGIN{binmode STDIN,":encoding(utf16)";binmode STDOUT,":utf8"} 1;' < in > out
But that involves a bit more typing.
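
The same binmode idea can be written as a small script. This is a minimal sketch, not a confirmed fix for the problem in this thread. The name utf16_to_utf8 is made up for the example, and the leading :raw is an assumption added here to clear any default layers before the encoding layer is pushed; the one-liner above does not do that. The demonstration runs on in-memory handles so it needs no input file.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Copy $in to $out, decoding UTF-16 (BOM expected) and encoding UTF-8.
# ASSUMPTION: ':raw' is added to strip any default layers first;
# the one-liner in the post does not do this.
sub utf16_to_utf8 {
    my ($in, $out) = @_;
    binmode($in,  ':raw:encoding(UTF-16)') or die "binmode on input: $!";
    binmode($out, ':raw:encoding(UTF-8)')  or die "binmode on output: $!";
    while (defined(my $line = <$in>)) {
        print {$out} $line;
    }
    return;
}

# Small demonstration on in-memory handles instead of real files.
my $utf16 = encode('UTF-16', "caf\x{e9}\n");   # BOM plus big-endian bytes
open(my $in,  '<', \$utf16)   or die "open input: $!";
open(my $out, '>', \my $utf8) or die "open output: $!";
utf16_to_utf8($in, $out);
close($out) or die "close output: $!";
print unpack('H*', $utf8), "\n";   # hex dump of the UTF-8 result
```

To use it as a filter, call utf16_to_utf8(\*STDIN, \*STDOUT) instead of the demonstration. Because of :raw, line endings pass through untranslated, so CRLF in the input stays CRLF in the output.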
Re: Strange interaction between command line and encoding
created: 2006-06-02 16:28:19
Update: OK, I think I am way off base... The original usage has a "fat-comma" on the right side of STDOUT which makes it a string anyway... I can't even claim no coffee since I don't drink it! Ah well...

OK, I may be way off base, but using B::Deparse to see how perl treats the -M command line switch shows it's processed as:
use encoding (split(/,/, 'utf16,STDOUT,utf8', 0));
That seems to indicate that the problem is that STDOUT is just a string rather than the actual filehandle when it's passed into encoding::import via the -M command line parameter.

HTH


-- Brian
Re^2: Strange interaction between command line and encoding
created: 2006-06-02 16:34:23
But STDOUT => "utf8" returns strings too! It's equivalent to 'STDOUT', "utf8".
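
A quick way to check this point: build both argument lists and compare them. The sketch below reuses the split expression from the B::Deparse output quoted earlier in the thread; the two sub names are made up for illustration.

```perl
use strict;
use warnings;

# Arguments as the -M switch builds them, per the B::Deparse output.
sub args_from_switch { return split(/,/, 'utf16,STDOUT,utf8', 0) }

# Arguments as the script's use line builds them; => quotes the bareword.
sub args_from_use { return ("utf16", STDOUT => "utf8") }

my @switch = args_from_switch();
my @pragma = args_from_use();

print "switch: @switch\n";
print "use:    @pragma\n";
print "references among them: ", scalar(grep { ref $_ } @switch, @pragma), "\n";
print "lists match: ", ("@switch" eq "@pragma" ? "yes" : "no"), "\n";
```

Both lists hold the same three plain strings, so the difference in behaviour has to come from somewhere other than the arguments handed to import.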
Re^2: Strange interaction between command line and encoding
created: 2006-06-02 16:48:30

Actually, perlrun says the same thing under the handling of the -M parameter. But in fact, it is just the string STDOUT being passed to the pragma, even in the program. To pass the filehandle (in a simple way, without using lexicals or other modules), you would usually pass a reference to the glob, as in:

# WRONG use of encoding pragma, just an example
use encoding 'utf16', \*STDOUT, 'utf8';
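
To make the string versus glob reference distinction concrete, here is a small sketch that has nothing to do with the encoding pragma itself. describe_arg is a made-up helper showing how a module's import could tell the two kinds of argument apart.

```perl
use strict;
use warnings;

# Classify an argument the way an import routine could tell the two apart.
# (describe_arg is a hypothetical helper for this example only.)
sub describe_arg {
    my ($arg) = @_;
    return ref($arg) eq 'GLOB' ? 'glob reference' : 'plain string';
}

print 'STDOUT   => ', describe_arg('STDOUT'),  "\n";
print '\*STDOUT => ', describe_arg(\*STDOUT), "\n";

# A glob reference can be used directly as a filehandle.
my $fh = \*STDOUT;
print {$fh} "written through the glob reference\n";
```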

But anyway, as to the true source of the problem, I gave up trying to figure it out, as noted in the update.

[id://149675|Do not rebuke them with harsh words ... but rather lead them gently - with URLs - so that they may learn wisdom.]

perlmonks.org content © perlmonks.org and bpphillips, graff, ikegami, sgifford, VSarkiss
