Different linebreaks for different folks...
EvanK
created: 2006-05-08 17:26:17
I'm curious how (and how often) others deal with files with foreign linebreaks in their scripts. Personally, I have to deal with this quite a lot, and not just the Windows (CRLF) <--> Linux (LF) issues. We also have a few Mac users (CR), which keeps us on our toes.
 
For instance, let's say a clueless user FTPs a configuration file onto the Linux server in binary mode, so its foreign line endings arrive untranslated, and everything comes crashing down. I've dealt with this many a time, so I've grown accustomed to opening ALL my config/data files in binmode and splitting them with /\x0D?\x0A|\x0D/ (that is, if I'm not using a module to read them in).
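A minimal sketch of the binmode-and-split approach described above. The helper names split_any_newline and read_lines_raw are made up for illustration and are not from the original post; the sketch also assumes the whole file fits comfortably in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: split a byte string on CRLF, LF, or bare CR.
# CRLF wins over a bare CR because \x0D?\x0A is tried first.
sub split_any_newline {
    my ($data) = @_;
    return split /\x0D?\x0A|\x0D/, $data;
}

# Hypothetical helper: slurp a file as raw bytes, then split it.
sub read_lines_raw {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't read $path: $!";
    binmode $fh;                          # keep the I/O layer from translating CRLF
    my $data = do { local $/; <$fh> };    # slurp mode: read everything at once
    close $fh;
    return split_any_newline( defined $data ? $data : '' );
}
```

Note that split drops trailing empty fields, so a final newline at the end of the file does not produce an extra empty line.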
 
How about my fellow monks? Thoughts? Comments? Insults?

__________
Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
- Terry Pratchett

Re: Different linebreaks for different folks...
created: 2006-05-09 04:18:05
Adam Kennedy just posted a use.perl journal entry about File::LocalizeNewlines which is supposed to transparently handle foreign newlines.

Re: Different linebreaks for different folks...
created: 2006-05-10 12:25:13

I rarely find it a problem. I only have to deal with Unix<->Windows issues, and it's a simple matter to convert between the two. If I forget, the results are usually noticeable enough to remind me right away.
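The post does not say how the conversion is done; one possible way to do it in Perl is a pair of substitutions. The function names crlf_to_lf and lf_to_crlf are hypothetical, and the sketch assumes the text has already been read as raw bytes (for example with binmode on the handle).

```perl
use strict;
use warnings;

# Hypothetical helper: Windows (CRLF) to Unix (LF).
sub crlf_to_lf {
    my ($text) = @_;
    $text =~ s/\x0D\x0A/\x0A/g;
    return $text;
}

# Hypothetical helper: Unix (LF) to Windows (CRLF).
# The lookbehind skips any LF that is already preceded by a CR,
# so running it twice does not produce CR CR LF.
sub lf_to_crlf {
    my ($text) = @_;
    $text =~ s/(?<!\x0D)\x0A/\x0D\x0A/g;
    return $text;
}
```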

Re: Different linebreaks for different folks...
created: 2006-05-10 13:16:16

I've used something like this little stub on the odd occasion I've needed to "guess" at a file's newline style:

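sub guess_newline {
  my $file = shift;
  open my $F, '<', $file or return;
  binmode $F;  # read raw bytes so no I/O layer translates CRLF first

  my ($buf, $sep);
  until ( eof($F) or defined $sep ) {
    # read a 1 KB chunk plus 0-4 random extra bytes, so that chunk
    # boundaries drift relative to fixed-length lines
    read( $F, $buf, 1024 + int(rand(5)) ) or last;

    # the trailing dot below is important, in case part of a newline is
    # truncated in the read! Since "." never matches \x0A, a lone \x0D
    # is only accepted when some other character follows it.
    $sep = $1 if $buf =~ m/(\x0A|\x0D|\x0D\x0A)./;
  }

  close $F;
  return $sep;
}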

The random length is a workaround for a couple of files I've run across where each line (including the newline) was exactly 1024 bytes. With a fixed 1024-byte read, every chunk ended on the newline, so the trailing dot had nothing to match and my regex never matched. ;-)

guess_newline works very well when used like this:

{
   local $/ = guess_newline($filename) || die "Can't guess sep for $filename";
   open my $IN, '<', $filename or die "Can't read $filename: $!";
   while (<$IN>) { ... }
}
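To see why the trailing dot in guess_newline matters, here is a small sketch that applies the same pattern to a few hand-made buffers. The helper sep_in_chunk and the sample buffers are made up for illustration; only the regex comes from the stub above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the pattern from guess_newline to one buffer
# and return the captured separator, or undef if nothing matched.
sub sep_in_chunk {
    my ($buf) = @_;
    return $buf =~ m/(\x0A|\x0D|\x0D\x0A)./ ? $1 : undef;
}

my @chunks = (
    "abc\x0D\x0Adef",   # complete CRLF inside the chunk
    "abc\x0Adef",       # LF
    "abc\x0Ddef",       # bare CR
    "abc\x0D",          # chunk cut between the CR and the LF of a CRLF
);

for my $chunk (@chunks) {
    my $sep = sep_in_chunk($chunk);
    # Print each result as hex; "(no match)" means "read another chunk".
    printf "%-14s => %s\n",
        unpack( 'H*', $chunk ),
        defined $sep ? unpack( 'H*', $sep ) : '(no match)';
}
```

The last buffer is the interesting one: without the trailing dot, the pattern would match the lone \x0D and report a Mac-style file, when the file may really be CRLF with the \x0A sitting in the next chunk. With the dot, the chunk simply fails to match and the loop reads on.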
<radiant.matrix>
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet (http://en.wikipedia.org/wiki/Trebuchet)

perlmonks.org content © perlmonks.org and EvanK, fireartist, radiantmatrix, spiritway

prlmnks.org © 2006 edmund von der burg (eccles & toad)
