Reading (and parsing) a byte stream
qbxk
created: 2006-03-04 21:35:22
I'm trying to read a stream of bytes, sent over http (yes), and read specific bytes out of it. I'm finding, to my disbelief, that I don't know how to determine the integer value of a byte. Some context:
   my $sock = Net::HTTP->new(Host => "server.com") || die $@;
   $sock->write_request(GET => "bytestream") or die $@;
   $sock->read_response_headers( );  #i don't need these
   while(1){
      $sock->recv($buf, $size); 
      my $n = length($buf);
      # oh.. i feel like reading byte #4 today.
      next if $n < 4;  # i don't want to talk about this case, ok?
      my $byte = bytes::substr($buf, 4, 1, undef); 
           #remove it from the stream too
      print $byte; #not looking like an integer.

      $byte *= 16; #yields a warning:
                   # Argument "x" isn't numeric ...etc
                   #  where "x" is some crazy character
   }
I have a hunch ord($byte) is what I'm looking for, but the byte I'm reading isn't yielding the value I expect it to, which could be a different problem...

For the more curious: I'm attempting to implement an icecast stream recorder; the stream format is explained here: http://www.smackfu.com/stuff/programming/shoutcast.html
I'm storing the metaint from the headers (just not in my example), and yes, the "byte" I'm trying to read is that metadata length byte. My next question will be how to convert the metadata (a string of bytes) into a character string, and then how to write to disk (with some prefiltering) only the MPEG frames and none of the metadata.
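A minimal sketch of why `$byte *= 16` warns and what to do instead: a byte pulled out of `$buf` is a one-character string, not a number, so Perl tries to parse the character itself as a numeral. `ord` (or `unpack "C"`) returns its integer value, 0..255. The `* 16` follows the format page linked above, where the metadata length byte counts 16-byte blocks. The helper name `meta_length` and the sample byte are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one length byte into a metadata length in bytes.
# The byte is a one-character string, so decode it with unpack "C" (same
# value as ord); using it directly as a number triggers the
# "Argument ... isn't numeric" warning.
sub meta_length {
    my ($byte) = @_;
    my $blocks = unpack "C", $byte;   # same value as ord($byte)
    return $blocks * 16;              # the byte counts 16-byte blocks
}

my $byte = "\x02";    # hypothetical sample byte
printf "ord=%d metalen=%d\n", ord($byte), meta_length($byte);
# prints: ord=2 metalen=32
```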

Maybe somebody will recommend an alternative to reinventing the wheel, which I'm open to, but my requirements are highly specific and I haven't found anything that meets them all, so here we are.


It's not what you look like, when you're doin' what you’re doin'.
It's what you’re doin' when you’re doin' what you look like you’re doin'!
     - Charles Wright & the Watts 103rd Street Rhythm Band
Re: Reading (and parsing) a byte stream
created: 2006-03-04 21:55:42

I don't see anything on CPAN that would solve this problem, though HTTP::Handle might make it a bit easier. I do see a couple of things that might be your problem, though. First off, do you want byte #4, or the byte at index 4? You've got the latter at the moment (from bytes::substr($buf,4,1,undef)), which is of course what most people would call byte #5.

I think that's probably the issue you're looking at, because ord usually does what you expect in this case, but if you're parsing binary data, you really should be looking at unpack, which is specifically designed for this task. Assuming you weren't doing anything with the rest of the string, the invocation in this case would be

my $byteval  = unpack "x3C", $buf;
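To make the off-by-one concrete, here is a small sketch with a made-up buffer: `unpack "x3C"` skips three bytes and reads the fourth (index 3), while `substr($buf, 4, 1)` reads index 4, the fifth byte. `ord` on a `substr` at the same offset gives the same number as `unpack`.

```perl
use strict;
use warnings;

# Hypothetical buffer: bytes with values 10, 20, 30, 40, 50, 60.
my $buf = pack "C*", 10, 20, 30, 40, 50, 60;

my $fourth = unpack "x3C", $buf;        # skip 3 bytes, read 1: index 3
my $fifth  = ord substr($buf, 4, 1);    # offset 4 is the fifth byte

# unpack and ord/substr agree when they point at the same offset.
my $same = ord substr($buf, 3, 1);

print "fourth=$fourth fifth=$fifth same=$same\n";
# prints: fourth=40 fifth=50 same=40
```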

Good luck!



If God had meant us to fly, he would *never* have given us the railroads.
    --Michael Flanders

Re: Reading (and parsing) a byte stream
created: 2006-03-04 22:17:07
I'm shooting in the dark here, but...

I have a hunch ord($byte) is what I'm looking for, but the byte I'm reading isn't yielding the value I expect it to, which could be a different problem...

...sounds to me like a problem related to the "endianness" of the data.

Instead of using ord(), try using some form of unpack(). The protocol specification should specify the endianness (i.e. if it's big- or little-endian).


acid06
perl -e "print pack('h*', 16369646), scalar reverse $="
Re^2: Reading (and parsing) a byte stream
created: 2006-03-05 22:57:23

I'm afraid your shot misses—endianness has to do with byte order, not the significance of bits within bytes. So if you have a four-byte number, the most-significant byte may be at the beginning (Big-endian, or $Config{byteorder} eq '4321') or at the end (Little-endian, '1234'), but the values of the bytes themselves don't change.
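A short sketch of that distinction: the same 32-bit value packed big-endian (`N`) and little-endian (`V`) yields the same four byte values in opposite order, while a single byte, such as the Icecast length byte, has no byte order to get wrong. The value 0x01020304 is arbitrary.

```perl
use strict;
use warnings;

my $value = 0x01020304;    # arbitrary example value

my @big    = unpack "C*", pack("N", $value);   # network order (big-endian)
my @little = unpack "C*", pack("V", $value);   # VAX order (little-endian)

print "big:    @big\n";       # prints: big:    1 2 3 4
print "little: @little\n";    # prints: little: 4 3 2 1

# Each byte keeps its value; only the positions swap.
my $swapped = join(",", reverse @big) eq join(",", @little);

# A single byte decodes the same way regardless of platform byte order.
my $one = unpack "C", "\x0a";    # 10
```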



If God had meant us to fly, he would *never* have given us the railroads.
    --Michael Flanders

Re^3: Reading (and parsing) a byte stream
created: 2006-03-05 23:12:19
Not really.
From the Wikipedia article on Endianness:

Endianness also applies in the numbering of bits within a byte or word. In a consistently big-endian architecture the bits in the word are numbered from the left, bit zero being the most significant bit and bit 7 being the least significant bit in a byte.

So endianness should have to do with bit order.
Just because the usual way of packing/unpacking little- or big-endian data in Perl (Network and VAX types) does not follow this pattern, it doesn't mean it's not correct.


acid06
perl -e "print pack('h*', 16369646), scalar reverse $="
Re^4: Reading (and parsing) a byte stream (bit order)
tye
created: 2006-03-05 23:30:04

Bit order only matters if you've got a way of addressing things smaller than bytes or if you've encoded bits from one byte into multiple bytes. That isn't the case here so bit order should not matter. The data is transmitted in units of bytes and thus the values of the bytes are preserved, no matter how each unit along the way chooses to store those byte values.
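A sketch of that point: Perl's pack/unpack let you pick either bit order when turning a byte into a bit string (`B` lists the most significant bit first, `b` the least significant first), but that only changes how the bits are written out. Packing each string back with its matching template restores the same byte, so the numeric value that travels over the socket never depended on the notation. The byte value 6 is arbitrary.

```perl
use strict;
use warnings;

my $byte = chr 6;    # arbitrary example byte, value 6

my $msb_first = unpack "B8", $byte;    # "00000110"
my $lsb_first = unpack "b8", $byte;    # "01100000"

# Round-trip each bit string through its own template: same byte back.
my $round_msb = ord pack("B8", $msb_first);    # 6
my $round_lsb = ord pack("b8", $lsb_first);    # 6

print "$msb_first $lsb_first $round_msb $round_lsb\n";
# prints: 00000110 01100000 6 6
```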

- tye        

Re: Reading (and parsing) a byte stream
created: 2006-03-04 22:43:00

Possibly check out what number you expect, vs. what number you actually get. Take a look at them in binary (not hex), and see whether you're getting the bits reversed. I'm thinking some sort of '-endian' problem, but I'm not sure that fits all the facts... From the link you provided, I'm wondering whether you're simply getting a zero from the metadata.
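A quick sketch of that check: `sprintf "%08b"` shows a byte as eight binary digits, and comparing one string against the reverse of the other flags a bit-order mix-up (or makes it obvious that you simply read a zero). The helper name `compare_bits` and the values 2 and 64 are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical debugging helper: print expected and actual byte values in
# binary and return 1 if one is the other with its bits reversed.
sub compare_bits {
    my ($expected, $actual) = @_;
    my $exp_bits = sprintf "%08b", $expected;
    my $act_bits = sprintf "%08b", $actual;
    my $reversed = (scalar(reverse $exp_bits) eq $act_bits) ? 1 : 0;
    print "expected $exp_bits\nactual   $act_bits\n";
    print "bit-reversed!\n" if $reversed;
    return $reversed;
}

# Expecting 2 but reading 64 would be a bit reversal.
compare_bits(2, 64);
# prints:
# expected 00000010
# actual   01000000
# bit-reversed!
```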

Re: Reading (and parsing) a byte stream
created: 2006-03-04 23:06:01

Take a look at the use of '/' in the documentation for pack; you can also use this format character in unpack.

Basically, an unpack format of 'C/A' will read the byte value represented by the C, and then use that as the length (repeat count) for the item following the '/', in this case 'A' for ASCII data. As you also need the length of the metadata in order to remove it from the stream, you'll need a template of "a$DATASIZE C X C/A", which will capture the data to the first variable and the length to the second, back up over the length byte, and then use it to capture the metadata to the third variable.

I've used a datasize of 10 and an array to simulate the reads in this example. The critical part is exiting the while loop when there is not enough data left to fulfil the data size, and then reading the next lump and appending it to the residual:

#! perl -slw
use strict;
use bytes;

my $DATASIZE = 10;
my @stream = (
    "abcdefghij\x04fredabcdefghij\x06barneyabcdefghij\x00abcde",
    "fghij\x07bam bamabcdefghij\x00abcdefghij"
);

my $stream = '';
for ( @stream ) {
    $stream .= $_;
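    while( length( $stream ) > $DATASIZE ) {
        # Peek at the length byte; wait for more input if the metadata it
        # announces hasn't fully arrived yet.
        my $need = $DATASIZE + 1 + unpack( "x$DATASIZE C", $stream );
        last if length( $stream ) < $need;
        my( $data, $len, $meta ) = unpack "a$DATASIZE C X C/A", $stream;
        print "\ndata:$data";
        print "meta:$meta" if $len;
        # Drop the data block, the length byte and the metadata.
        $stream = bytes::substr( $stream, $need );
    }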
}
__END__
c:\test>junk

data:abcdefghij
meta:fred

data:abcdefghij
meta:barney

data:abcdefghij

data:abcdefghij
meta:bam bam

data:abcdefghij
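One caveat when applying the `C/A` trick to real Icecast data: according to the format page linked in the question (and the `* 16` in qbxk's follow-up code), the length byte counts 16-byte blocks, not bytes, so `C/A` on its own would read only a sixteenth of the metadata. A sketch that does the multiplication explicitly, under that assumption; the helper name `split_cycle`, the metaint of 10 and the sample data are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "audio block + length byte + metadata"
# cycle off the front of $stream. Returns (audio, metadata, rest), or an
# empty list if the whole cycle hasn't arrived yet.
# Assumes the length byte counts 16-byte blocks.
sub split_cycle {
    my ($stream, $metaint) = @_;
    return if length($stream) < $metaint + 1;
    my $metalen = 16 * unpack("x$metaint C", $stream);
    return if length($stream) < $metaint + 1 + $metalen;
    my ($audio, $meta) = unpack "a$metaint x a$metalen", $stream;
    $meta =~ s/\0+\z//;    # metadata is padded out with NUL bytes
    return ($audio, $meta, substr($stream, $metaint + 1 + $metalen));
}

# 10 audio bytes, length byte 1 (= 16 bytes), "title=x" NUL-padded to
# 16 bytes, then the start of the next cycle.
my $meta16 = pack "a16", "title=x";
my $stream = "abcdefghij" . chr(1) . $meta16 . "klm";
my ($audio, $meta, $rest) = split_cycle($stream, 10);
print "audio=$audio meta=$meta rest=$rest\n";
# prints: audio=abcdefghij meta=title=x rest=klm
```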

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Reading (and parsing) a byte stream
created: 2006-03-05 05:27:52

I have recently posted an example of reading a byte stream from a socket and interpreting it with the unpack function in the "GPM mouse handling" thread. That one is simple because the records are of a fixed size (28 bytes).
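For the fixed-size case the pattern boils down to: read exactly one record's worth of bytes, then unpack it with a template. A sketch under loudly hypothetical assumptions: the 6-byte layout below (a 16-bit big-endian id plus a 32-bit big-endian value) is invented for illustration and is not the 28-byte GPM record, the helper name `read_records` is made up, and an in-memory filehandle stands in for the socket.

```perl
use strict;
use warnings;

use constant RECORD_SIZE => 6;    # hypothetical layout: "n" (2) + "N" (4)

# Read fixed-size records from $fh until EOF and decode each one.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (1) {
        my $got = read($fh, my $rec, RECORD_SIZE);
        die "read error: $!" unless defined $got;
        last if $got < RECORD_SIZE;    # EOF (or a trailing partial record)
        my ($id, $value) = unpack "n N", $rec;
        push @records, [$id, $value];
    }
    return @records;
}

# In-memory stand-in for a socket: two records back to back.
my $data = pack("n N", 1, 1000) . pack("n N", 2, 70000);
open my $fh, '<', \$data or die "can't open in-memory handle: $!";
binmode $fh;
for my $r (read_records($fh)) {
    print "id=$r->[0] value=$r->[1]\n";
}
# prints:
# id=1 value=1000
# id=2 value=70000
```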

Re: Reading (and parsing) a byte stream
created: 2006-03-06 02:23:53
Thanks for all this help!

I found a solution, using the read function instead of recv() - much simpler coding this way, it's not documented well where this function comes from though. Net::HTTP is the child of several large classes...
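A sketch of why read is the simpler choice: recv (like sysread) hands back whatever a single system call returned, which on a socket can be fewer bytes than you asked for, so the caller has to loop. Perl's buffered read does that accumulation itself and, in practice, only comes up short at end of file. The helper below (the name `read_exactly` is hypothetical) shows the loop you would otherwise write around a short-returning call. It is also best not to mix sysread/recv with buffered reads (`<$sock>`, `read`) on one handle, since the buffered layer may already hold bytes the system call will never see.

```perl
use strict;
use warnings;

# Hypothetical helper: keep calling sysread until exactly $len bytes have
# arrived, or EOF. Returns fewer than $len bytes only if the peer closed.
sub read_exactly {
    my ($fh, $len) = @_;
    my $buf = '';
    while (length($buf) < $len) {
        # Append at the end of $buf; may return fewer bytes than requested.
        my $got = sysread($fh, $buf, $len - length($buf), length($buf));
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;    # EOF
    }
    return $buf;
}

# A pipe stands in for the socket here.
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "0123456789") or die "syswrite: $!";
close $writer;
my $first  = read_exactly($reader, 4);     # "0123"
my $second = read_exactly($reader, 10);    # "456789" (EOF after 6 bytes)
print "$first|$second\n";
# prints: 0123|456789
```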

Another confounding issue was that, I suppose, Net::HTTP::read_response_headers() just isn't reading the headers "right"; it's somehow messing with the byte counts. So I read my own headers now. ;P

Here's a very basic solution. I hope to turn this into a subclass of Net::HTTP, or perhaps of IO::Socket::INET, and call it Net::Icecast; I'm open to suggestions. As you can see, this nugget is also rather wanting for features:

#!/usr/bin/perl

###  Written by qbxk for perlmonks
###  It is provided as is with no warranties, express or implied, of any kind. Use posted code at your own risk.

$|++;
use warnings;
use strict;
use IO::Socket;
use IO::Socket::INET;
use Net::HTTP;
use Data::Dumper;
use Carp::Assert;

# use constant USER_AGENT => 'WinampMPEG/2.9';  # I got refused by some public servers unlessen i done it thar way
use constant USER_AGENT => 'Stream-Recorder-0.01';

my %HOST = ( 
   host => 'icecast.uvm.edu', port => 8005, mount => '/wruv_fm_256'
);

use constant DEBUG => 1;

sub debug(@) { print STDERR "\n" . join("\n", @_) . "\n"; }
sub debug_raw(@) { print STDERR @_; }

sub open_connection {
   my %args = (
      host => undef,
      port => 80,
      mount => '',
      user_agent => USER_AGENT,
      @_
   );
   
   die "Need a host name" unless defined($args{host});
   
   $args{mount} =~ s/^\/+//g;

   my $sock = Net::HTTP->new(Host => $args{host}, PeerPort => $args{port} ) || die $@;
   $sock->write_request(GET => "/$args{mount}", 'User-Agent' => $args{user_agent}, 'Icy-MetaData' => 1) or die $@;
   
   # my ($code, $mess, %headers) = $sock->read_response_headers( laxed => 1 )

   my ($code, $mess, %headers);
   while( <$sock> ) {
      s/\s*$//g;
      last if /^\s*$/;
      
      if( /^(?:HTTP\/1\.[01]|ICY) ([0-9]+) (.+)$/ ) {
         ($code, $mess) = ($1 +0, $2);
      }
      else {
         my ($h, $v) = split(/:\s*/, $_, 2);  # limit 2: values such as URLs contain colons
         $headers{$h} = $v;
      }
   }
   return ($sock,$code,$mess,%headers);
}

main: {

   my ($s,$code, $mess, %headers) = open_connection( %HOST );
  
   debug "$code|$mess\n" . Dumper(\%headers);
 
   # TODO: timeout on $s.

   exit if( $code != 200 ); # scream and shout

   my ($metaint) = map { (/^icy-metaint$/i && $headers{$_}) or () } keys %headers;
      assert( $metaint > 0 );

   open OUT, '>stream-out.mp3' or die "Can't write stream-out.mp3: $!";
   binmode OUT; # very important

   while( 1 ) {   
      my $buf;
      $s->read($buf, $metaint) or last;   # 0 at EOF, undef on error
      print OUT $buf;
      
      my ($metadata, $metalen, $metabyte);
      
      $s->read($metabyte, 1) or last;
      $metalen = unpack("C",$metabyte) * 16;

      if( $metalen > 0) {
         #We have NEW metadata! JOY
         $s->read($metadata, $metalen);
         $metadata = unpack("A$metalen", $metadata);
         assert( $metadata =~ /Stream/, "Not good metadata!" ); #don't dump a lot of BS (binary *#$!), just die.
         
         debug "$metalen - [$metadata]";
      }
      else {
         $metadata = '';
         debug_raw "-";
      }
   }
}
You'll find a clean, "un-meta"ed, ever-growing mp3 file called "stream-out.mp3" in your working directory... I've done enough for one night, so that's how it stays.
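On the follow-up question of turning the metadata into a character string: with Icecast/SHOUTcast servers the block is commonly ASCII text of the form StreamTitle='...';StreamUrl='...'; padded with NUL bytes to a multiple of 16. That layout is an assumption here, so check it against the server you record. A sketch that strips the padding and collects the key/value pairs into a hash; the helper name `parse_icy_metadata` and the sample title are hypothetical, and a title containing `';` would confuse this simple pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a NUL-padded ICY metadata block such as
#   StreamTitle='Artist - Song';StreamUrl='';
# into a hash ref. Assumes the key='value'; layout described above.
sub parse_icy_metadata {
    my ($raw) = @_;
    $raw =~ s/\0+\z//;                     # drop the NUL padding
    my %fields;
    while ($raw =~ /(\w+)='(.*?)';/g) {    # non-greedy value up to the next ';
        $fields{$1} = $2;
    }
    return \%fields;
}

# Sample block padded to 64 bytes (4 x 16), as a server would pad it.
my $raw  = pack "a64", "StreamTitle='Some Artist - Some Song';StreamUrl='';";
my $meta = parse_icy_metadata($raw);
print "title: $meta->{StreamTitle}\n";
# prints: title: Some Artist - Some Song
```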

It's not what you look like, when you're doin' what you’re doin'.
It's what you’re doin' when you’re doin' what you look like you’re doin'!
     - Charles Wright & the Watts 103rd Street Rhythm Band

perlmonks.org content © perlmonks.org and acid06, ambrus, BrowserUk, ChemBoy, qbxk, spiritway, tye

prlmnks.org © 2006 edmund von der burg (eccles & toad)
