Easy matrix builder
hj4jc
created: 2006-01-18 12:07:01
I created 200 text files, each containing 30 thousand rows and 1 column. I wanted the quickest, easiest way to build a matrix with 30 thousand rows and 200 columns. I am sure many of you here can do this in less than 10 minutes; if you are one of them, I would be grateful if you could share other ways to do this or modules I could use. I also hope somebody out there finds this code easy and useful.
#!/usr/bin/perl
#This script takes multiple files that contain the same number of rows,
#and builds a matrix where the number of columns would equal the number of files.
use warnings;
use strict;
#opens all txt files in the current directory
#and stores the names of the files in @files array
opendir(FILES,".")||die "Cannot open files in the directory\n";
my @files=();
my @matrix=();
for(readdir(FILES)){
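	#match only names ending in .txt, and skip the output file from an earlier run
	if($_=~/\.txt$/ && $_ ne "matrix.txt"){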
		push(@files, $_);
	}
}
my $n=scalar(@files);
print "This directory contains $n .txt files\n";
#opens a file which will contain all the ratios
open(TREE,">","matrix.txt")||die "Cannot open matrix.txt for writing: $!\n";
#goes through each file in @files array
for my $i(0..$#files){
	print "Working on ... $files[$i].\n";
	open(FH,"<",$files[$i])||die "Cannot open $files[$i]: $!\n";
	my $j=0;
	while(my $line=<FH>){
		chomp $line;
		my @line=split("\t",$line); #This is more applicable if the file contained more than one column, tab delimited
		#@matrix is an array of array
		#jth array in @matrix contains records from different files for jth line
		push @{$matrix[$j]}, "$line[0]";
		$j++;
	}
}
for my $tmp(@matrix){
	#$records joins the elements by tab
	my $records=join "\t", @$tmp;
	print TREE "$records\n";
}
exit;
Re: Easy matrix builder
created: 2006-01-18 12:14:04
paste * > matrix.txt
We're building the house of the future together.
Re^2: Easy matrix builder
created: 2006-01-18 13:06:56

paste is also available under *nix (which, IIRC, is where the idea for many of the ppt programs came from). However, watch the number of open files: 200 may be approaching the per-user or per-process limit on some machines or operating systems.

If paste does what I think it does, it will open all of the files and then pull a line from each in a loop.
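
To make that guess concrete, here is a minimal Perl sketch of the behaviour described above: open every file first, then pull one line from each per pass. It is not the source of paste or of the ppt version. The name paste_files is hypothetical, and padding shorter files with empty fields is an assumption about how such a tool would behave.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write line N of every
# named file, joined by tabs, as line N of the output handle.
sub paste_files {
    my ($out, @names) = @_;

    # Every input is opened up front, so this is the point where a
    # per-process open-file limit would be hit with many inputs.
    my @handles = map {
        open(my $fh, '<', $_) or die "Cannot open $_: $!\n";
        $fh;
    } @names;

    while (1) {
        my $got_line = 0;
        my @row;
        for my $fh (@handles) {
            my $line = <$fh>;
            if (defined $line) {
                chomp $line;
                $got_line = 1;
            }
            else {
                $line = '';    # assumption: pad exhausted files with an empty field
            }
            push @row, $line;
        }
        last unless $got_line;    # stop once every file is exhausted
        print {$out} join("\t", @row), "\n";
    }

    close $_ for @handles;
    return;
}

# Does nothing when no file names are given on the command line.
paste_files(\*STDOUT, @ARGV) if @ARGV;
```

Only one row is held in memory at a time, but all 200 handles stay open for the whole run, which is exactly the limit to watch for.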

--MidLifeXis

Re^3: Easy matrix builder
created: 2006-01-19 04:18:32
Indeed, he most probably meant the *NIX tool and pointed to ppt as a Perl implementation to look at for inspiration.
Re: Easy matrix builder
created: 2006-01-18 12:32:28

I think you've done a fine job. However, there are numerous things in your code that could be simpler. For starters, I would leave the determination of the input and output files to the shell. I.e. use @ARGV as the file list, and redirect output. Those are things the shell is better at.

Here's one way to do it using a standard module:

use Tie::File;
use strict;
use warnings;

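# Tie each file named on the command line to an array,
# so that element $i of the array is line $i of the file.
my @files_as_arrays = map {
        my @a;
        tie @a, 'Tie::File', $_ or die "Cannot tie $_: $!\n";
        \@a
} @ARGV;

{
        # "@a" interpolates with tabs between elements inside this block
        local $" = "\t";
        my $i = 0;
        while (1) {
                # line $i of every file becomes row $i of the matrix
                my @a = map { $_->[$i] } @files_as_arrays;
                $i++;
                # stop once no file has a line left at this index
                grep { defined $_ } @a or last;
                print "@a\n"
        }
}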
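
For comparison, the same advice (file list from @ARGV, output via shell redirection) can also be applied to the array-of-arrays approach from the original post. This is only a sketch: the name build_matrix is hypothetical, and treating an empty line as an empty field is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a reference to an
# array of rows, where row N holds the first column of line N of each file.
sub build_matrix {
    my @names = @_;
    my @matrix;
    for my $name (@names) {
        open(my $fh, '<', $name) or die "Cannot open $name: $!\n";
        my $row = 0;
        while (my $line = <$fh>) {
            chomp $line;
            # keep only the first tab-delimited column
            my ($first) = split /\t/, $line;
            # assumption: an empty line contributes an empty field
            push @{ $matrix[$row++] }, defined $first ? $first : '';
        }
        close $fh;
    }
    return \@matrix;
}

# Prints nothing when no file names are given on the command line.
print join("\t", @$_), "\n" for @{ build_matrix(@ARGV) };
```

Saved under a hypothetical name such as build_matrix.pl, it would be run as: perl build_matrix.pl *.txt > matrix.out. Unlike reading one line from every file per pass, this holds the whole matrix in memory, but it needs only one open input file at a time.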

perlmonks.org content © perlmonks.org and blazar, hj4jc, jdporter, MidLifeXis
