Separating individual lines of a file
tgrossner
created: 2006-02-02 16:49:10
I need to separate the lines of a file into separate files according to the first several characters of each line. Example: the original file contains:

host1 BUNCHOFDATAINALINE
host2 BUNCHOFDATAINALINE
host1 BUNCHOFDATAINALINE
host2 BUNCHOFDATAINALINE

I want to separate this into two files, called host1 and host2, containing the BUNCHOFDATAINALINE in the order it appeared in the original file. I am a newbie to Perl, but not so new that I don't know this IS possible. Thanks, o great ones...
Re: Separating individual lines of a file
created: 2006-02-02 16:55:04

you can put your sample data inside <code></code> tags.

it would also be helpful if you put the code you've attempted for this inside those tags, too, so we can see where your code is not doing what you expect.



--chargrill
$/  =  q#(\w)#  ;  sub  sig { print scalar reverse  join  ' ',  @_  }  sig
map { s$\$/\$/$\$2\$1$g && $_ } split( ' ', ",erckha rlPe erthnoa stJu" );
Re^2: Separating individual lines of a file
created: 2006-02-02 17:03:54
Here is my start on the code:
#!/usr/bin/perl
open ORIGFILE, "noc.060202_13";
#print ORIGFILE;
push(@DATA,<ORIGFILE>);




foreach my $LINE (@DATA){
open FH, "$LINE[0]";
print FH $LINE;
close FH;

}
After setting up the foreach loop, I can print $LINE to STDOUT and see each line, but I can't seem to pull out the first element as $LINE[0] to name the file.
Re^3: Separating individual lines of a file
created: 2006-02-02 17:16:59

first, that's an interesting use of push - you could just do @DATA=<ORIGFILE>;, but i don't think that's a big deal.

second, you'll want to open your file for writing by using the open FILEHANDLE, ">filename" form.

third, as written, your code will open a file called "host1 BUNCHOFDATAINALINE". you might want to split( /pattern/, expression ) each line of the file so you can separately refer to the separate pieces of data, like ( $HOSTNAME, $SOMEDATA ).
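
To make the third point concrete, here is a minimal sketch of splitting one line into a host name and the rest of the line. It assumes the two parts are separated by whitespace, as in the sample data; the helper name split_line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into ( host, rest of line ).
# The limit of 2 keeps any whitespace inside the data portion intact.
sub split_line {
    my ($line) = @_;
    chomp $line;
    my ( $host, $data ) = split ' ', $line, 2;
    return ( $host, $data );
}

my ( $HOSTNAME, $SOMEDATA ) = split_line("host1 BUNCHOFDATAINALINE\n");
print "$HOSTNAME -> $SOMEDATA\n";    # prints "host1 -> BUNCHOFDATAINALINE"
```

With the pieces separated like this, $HOSTNAME can be used as the output file name and $SOMEDATA as the text to write.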



--chargrill
Re^4: Separating individual lines of a file
created: 2006-02-02 17:24:17
ok, on the split function, can i split $line into two sections, one being the first 16 characters, then name the file via this string of characters?
Re^5: Separating individual lines of a file
created: 2006-02-02 18:20:29

if you're specifically interested in just the first 16 characters, you could also look into substr( EXPR, OFFSET, LENGTH ). when i saw "host1 BUNCHOFDATAINALINE" i assumed that splitting on whitespace was what you were looking for, but when you phrase it as "one being the first 16 characters", substr comes to mind.

you may wish to examine the differences between split and substr (see perldoc -f split and perldoc -f substr) to see which would suit you better.
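
As a rough illustration of the substr approach, the sketch below takes a fixed-width prefix as the file name and everything after it as the data. The helper name fixed_prefix and the width parameter are assumptions for the example, and it assumes every line is at least that many characters long.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first $width characters as the file name
# and everything after them as the data.
sub fixed_prefix {
    my ( $line, $width ) = @_;
    my $name = substr( $line, 0, $width );    # characters 0 .. $width-1
    my $data = substr( $line, $width );       # the remainder of the line
    return ( $name, $data );
}

my ( $name, $data ) = fixed_prefix( "host1 BUNCHOFDATAINALINE", 5 );
print "[$name] [$data]\n";    # prints "[host1] [ BUNCHOFDATAINALINE]"
```

Note that substr cuts at a character position, so the separating space stays with the data here, whereas split on whitespace discards it.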



--chargrill
Re^5: Separating individual lines of a file
created: 2006-02-03 07:38:00
ok, on the split function, can i split $line into two sections, one being the first 16 characters, then name the file via this string of characters?
Yes, you can, but chances are that substr or unpack is better suited to the task. Split works best for splitting on a pattern. Well, more precisely, that is exactly what it is for!
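
For comparison, a minimal unpack sketch, assuming (this is only a guess at the real data layout) a 16-character, space-padded name field followed by the data. The helper name unpack_line is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: a 16-character, space-padded name field followed
# by the data. The 'A' format drops the trailing padding from the name.
sub unpack_line {
    my ($line) = @_;
    chomp $line;
    my ( $name, $data ) = unpack 'A16 A*', $line;
    return ( $name, $data );
}

my $sample = sprintf( "%-16s%s\n", 'host1', 'BUNCHOFDATAINALINE' );
my ( $name, $data ) = unpack_line($sample);
print "[$name] [$data]\n";    # prints "[host1] [BUNCHOFDATAINALINE]"
```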
Re^3: Separating individual lines of a file
created: 2006-02-02 23:13:26
Something that chargrill didn't mention, but might be an issue:

Doing a file open and file close for every line can get really expensive and time consuming if there happen to be thousands of lines of input.

Perl allows you to store file handles in a hash, so you can open a new file each time you see a new "hostname" string, and just re-use that handle whenever you see the same name again:

# set $listfile to some constant, or to $ARGV[0] (and supply the file name
# as a command-line arg when you run the script)

my %outfh;  # hash to hold output file handles

open ORIGFILE, $listfile or die "$listfile: $!";
while ( <ORIGFILE> ) {
    my ( $host, $data ) = split " ", $_, 2;
    if ( ! exists( $outfh{$host} )) {
        open( $outfh{$host}, ">", $host ) or die "$host: $!";
    }
    print $outfh{$host} $data;
}
# perl will flush and close output files when done
Of course, if there are lots of different host names in the input file (or if there is something really wrong and unexpected in the list file contents), the script would die when it tries to open too many file handles.
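
If the number of host names really is a concern, one possible workaround (a sketch, not something from the code above) is to group the lines in memory first and then write one file at a time, so only a single output handle is ever open. This trades file handles for memory, so it assumes the input fits in RAM. The helper name split_buffered and the $outdir parameter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by host in memory, then write each
# output file in turn so that only one output handle is open at a time.
sub split_buffered {
    my ( $listfile, $outdir ) = @_;
    my %lines;    # host name => array ref of data lines, in input order

    open my $in, '<', $listfile or die "$listfile: $!";
    while ( my $line = <$in> ) {
        my ( $host, $data ) = split ' ', $line, 2;
        next unless defined $data;    # skip blank lines
        push @{ $lines{$host} }, $data;
    }
    close $in;

    for my $host ( sort keys %lines ) {
        open my $out, '>', "$outdir/$host" or die "$outdir/$host: $!";
        print {$out} @{ $lines{$host} };
        close $out or die "$outdir/$host: $!";
    }
    return scalar keys %lines;    # number of files written
}
```

Calling split_buffered( $ARGV[0], '.' ) would write the per-host files into the current directory.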
Re^4: Separating individual lines of a file
created: 2006-02-03 11:25:05
I am trying out your code; I replaced the $listfile with $ARGV[0].
#!/usr/bin/perl


# set $listfile to some constant, or to $ARGV[0] (and supply the file
#+name
# as a command-line arg when you run the script)

my %outfh;  # hash to hold output file handles

open ORIGFILE, $ARGV[0] or die "$ARGV[0]: $!";
while ( <ORIGFILE> ) {
    my ( $host, $data ) = split " ", $_, 2;
    if ( ! exists( $outfh{$host} )) {
        open( $outfh{$host}, ">", $host ) or die "$host: $!";
    }
    print $outfh{$host} $data;
}
# perl will flush and close output files when done
But this produces a syntax error of
Scalar found where operator expected at ./nocsplit.pl line 17, near "} $data"
        (Missing operator before  $data?)
syntax error at ./nocsplit.pl line 17, near "} $data"
Execution of ./nocsplit.pl aborted due to compilation errors.
It seems to not like the
print to $outfh{$host} $data;
Re^5: Separating individual lines of a file
created: 2006-02-03 15:14:17

There may be a more elegant solution, but graff's code works if you change the line:

print $outfh{$host} $data;

to:

my $fh = $outfh{$host};
print $fh $data;

dave

Re^5: Separating individual lines of a file
created: 2006-02-05 02:58:35
print $outfh{$host} $data;

In addition to Not_a_Number's solution of assigning to a temporary variable, another possibility is the one given in perldoc -f print:

print { $outfh{$host} } $data;
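
The block is needed because print only accepts a bareword or a plain scalar variable as its filehandle; anything more complex, such as a hash element, has to be wrapped in braces. Below is a sketch of graff's loop with that one change, wrapped in a sub so it can be tried out. The sub name split_by_host and the $outdir parameter are hypothetical additions, as are the lexical input handle and the explicit closes.

```perl
use strict;
use warnings;

# Hypothetical helper: graff's hash-of-filehandles loop, with the handle
# expression wrapped in a block so that print accepts it.
sub split_by_host {
    my ( $listfile, $outdir ) = @_;
    my %outfh;    # host name => output file handle

    open my $in, '<', $listfile or die "$listfile: $!";
    while ( my $line = <$in> ) {
        my ( $host, $data ) = split ' ', $line, 2;
        next unless defined $data;    # skip blank lines
        if ( !exists $outfh{$host} ) {
            open( $outfh{$host}, '>', "$outdir/$host" )
                or die "$outdir/$host: $!";
        }
        print { $outfh{$host} } $data;
    }
    close $in;

    for my $host ( keys %outfh ) {
        close $outfh{$host} or die "$outdir/$host: $!";
    }
    my @hosts = sort keys %outfh;
    return @hosts;
}
```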
Re^3: Seperating individual lines of a file
created: 2006-02-03 07:31:59

People generally do

use strict;
use warnings;

nowadays, and that's the single best piece of advice I can give you!

Also, people do

my @DATA=<ORIGFILE>;

but then they also prefer to avoid slurping in files all at once, and they iterate over the lines with a while loop rather than a for loop:

while (my $line=<ORIGFILE>) { # ...

In any case you have to specify '>' mode in open for writing (and '>>' for appending). More generally, I recommend that you stick with the three-argument form of open and lexical handles, and always check the return value:

open my $in, '<', "whatever" or die "can't open `whatever': $!\n";
open my $out, '>', "whatever" or die "can't open `whatever': $!\n";
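
Putting those recommendations together, here is a small sketch that shows the difference between '>' and '>>' and reads a file back line by line. Both helper names (write_text and count_lines) are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text to $file using the given mode.
# '>' truncates the file first, '>>' appends to whatever is there.
sub write_text {
    my ( $mode, $file, $text ) = @_;
    open my $out, $mode, $file or die "$file: $!\n";
    print {$out} $text;
    close $out or die "$file: $!\n";
    return;
}

# Hypothetical helper: iterate over a file with a while loop instead of
# slurping it, and return the number of lines seen.
sub count_lines {
    my ($file) = @_;
    my $count = 0;
    open my $in, '<', $file or die "$file: $!\n";
    while ( my $line = <$in> ) {
        $count++;
    }
    close $in;
    return $count;
}
```

Writing with '>', then '>>', leaves two lines in the file; writing with '>' again leaves one.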
Re: Separating individual lines of a file
created: 2006-02-02 20:00:56
perl -ne 'open(FILE, ">>".$1) if /^(.*?)\s(.*)$/; print FILE $2."\n"; close FILE' data_file
Re^2: Separating individual lines of a file
created: 2006-02-03 07:42:47
Interesting because golfed, although not to the extreme. But the OP should be warned about this circumstance. More importantly, and worth repeating, graff's comment applies to your solution as well.

perlmonks.org content © perlmonks.org and blazar, chargrill, graff, Not_a_Number, smokemachine, tgrossner
