If you are using threads, do as little as possible that consumes memory in your main thread, and that includes initialising data, before you spawn your threads. Here are the timing and memory usage stats from two consecutive runs of a simple threaded script. The only difference between them is the relative position of two lines of code:
c:\test>junk

Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
tperl.exe                  10172                         0     64,840 K
Taken 3.278383 seconds

c:\test>junk

Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
tperl.exe                   2924                         0    173,516 K
Taken 8.761321 seconds
For the first run, the code looked like this:
#! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];
sub simplesub { sleep 10, return 1 }
my $start = time;
my @threads = map{ threads->create( \&simplesub ) } 1 .. 10;
my @array = 0 .. 1e5;
my %hash = 1 .. 1e5;
system qq[tasklist /fi "pid eq $$"];
printf "Taken %f seconds", time() - $start;
$_->join for @threads;
For the second run, like this:
#! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];
sub simplesub { sleep 10, return 1 }
my $start = time;
my @array = 0 .. 1e5;
my %hash = 1 .. 1e5;
my @threads = map{ threads->create( \&simplesub ) } 1 .. 10;
system qq[tasklist /fi "pid eq $$"];
printf "Taken %f seconds", time() - $start;
$_->join for @threads;
So, another secret to (somewhat) lighter threads is to ensure that you spawn your threads early in the program, before you generate lots of data structures in your main thread. Everything that exists in your main thread's memory at the time of the spawn (including everything created by all the packages you have [use]d, whether they appear physically before or after the point of spawn!) will be cloned wholesale into the memory of each thread you spawn!
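A minimal sketch of the cloning behaviour described above, assuming a perl built with ithreads. The variable names are illustrative only. The thread receives its own copy of whatever exists at spawn time, so changes it makes are invisible to the main thread, and data built after the spawn is never copied into it.

```perl
#! perl -w
use strict;
use threads;

# Data that exists before the spawn is copied into the new interpreter.
my @early = ( 1 .. 5 );

my $thr = threads->create( sub {
    # This push changes only the thread's private clone of @early.
    push @early, 'added in thread';
    return scalar @early;
} );

# Data created after the spawn never reaches the already-cloned thread.
my @late = ( 1 .. 1000 );

my $count_in_thread = $thr->join;
my $count_in_main   = scalar @early;

print "thread saw $count_in_thread elements, main still has $count_in_main\n";
```

The thread reports 6 elements while the main thread still has 5, which is the same copying that makes late-spawned threads heavy.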
That has the downside that you don't always want to spawn your threads right at the start of your code, as you often don't have everything they need at that point. That, in turn, requires that you arrange for your threads to wait for the information they require, and that you have some method of passing that information to them later, once it is available. And that introduces the complications of queues and shared memory and synchronisation.
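One common way to do that waiting and passing is the core Thread::Queue module. The following is only a sketch under that assumption: the queue names and the squaring "work" are placeholders, and an undef is used as an agreed stop marker.

```perl
#! perl -w
use strict;
use threads;
use Thread::Queue;

# Create the queues and the worker first, while the interpreter is still small.
my $work    = Thread::Queue->new;
my $results = Thread::Queue->new;

my $worker = threads->create( sub {
    # dequeue() blocks until the main thread supplies something.
    while ( defined( my $n = $work->dequeue ) ) {
        $results->enqueue( $n * $n );
    }
    return;
} );

# Only now build the heavy data in the main thread; the worker never clones it.
my @big = ( 1 .. 1e5 );

# Hand over the parameters once they are known, then an undef as a stop marker.
$work->enqueue( 3, 4 );
$work->enqueue( undef );
$worker->join;

my @squares = ( $results->dequeue, $results->dequeue );
print "@squares\n";
```

The cost is exactly the complication mentioned above: every program has to set up its own queues and agree on a stop convention.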
What I've been looking for for a while now is a simple interface to a mechanism that allows me to spawn my threads early, with new, clean, uncloned, interpreters, in a suspended state and then 'resume' them, passing any parameters they require using a simple, clean interface.
my( $Xthread ) = threads->create( { suspended => 1 }, \&Xthread );
my( $Ythread ) = threads->create( { suspended => 1 }, \&Ythread );
... Do other stuff that gets me the parameters for X
$Xthread->resume( $arg1, $arg2 );
... Generate/fetch/calculate args for Y
$Ythread->resume( $Yarg1, $Yarg2 );
... tum te tum
my( @Yresults ) = $Ythread->join;
...
my( @Xresults ) = $Xthread->join;
Does anyone have suggestions for how to go about doing this?
If the threads could be 're-resumed' with different parameters that would be even better.
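For illustration, the suspended/resume idea can be approximated today with a pair of queues per thread. This is a sketch, not an existing API: the SuspendedThread package and its create/resume/result/finish methods are hypothetical names, and the thread still clones whatever exists when create() is called, so it only stays light if it is created early.

```perl
#! perl -w
use strict;
use threads;
use Thread::Queue;

# Hypothetical helper package; nothing here is part of threads.pm.
package SuspendedThread;

sub create {
    my( $class, $code ) = @_;
    my $in  = Thread::Queue->new;
    my $out = Thread::Queue->new;
    my $thr = threads->create( sub {
        # "Suspended": block here until resume() supplies an argument list.
        while ( defined( my $args = $in->dequeue ) ) {
            $out->enqueue( [ $code->( @$args ) ] );
        }
    } );
    return bless { in => $in, out => $out, thr => $thr }, $class;
}

sub resume { my $self = shift; $self->{in}->enqueue( [ @_ ] ); return $self }
sub result { my $self = shift; return @{ $self->{out}->dequeue } }
sub finish { my $self = shift; $self->{in}->enqueue( undef ); $self->{thr}->join; return }

package main;

my $Xthread = SuspendedThread->create( sub { my $sum = 0; $sum += $_ for @_; return $sum } );

# ... later, once the arguments are known ...
$Xthread->resume( 1, 2, 3 );
my( $first ) = $Xthread->result;

# The same thread can be 're-resumed' with different parameters.
$Xthread->resume( 10, 20 );
my( $second ) = $Xthread->result;

$Xthread->finish;
print "$first $second\n";
```

Note that the coderef is fixed at creation time here; only the arguments arrive later, which is weaker than the interface asked for above.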
If the main program needed another process it would ask the original parent to do it for it.
Could you explain that in a bit more detail for me? I've never done much with fork, especially in Perl.
#!/usr/bin/perl -w
#use forking open to start 3 more processes
unless (open X, "-|") { print 1+5; exit }; # X=1+5
unless (open Y, "-|") { print 2*3; exit }; # Y=2*3
unless (open Z, "-|") { print <X>+<Y>; exit }; # Z=X+Y
print "Z = "; print <Z>."\n";
Even in the latest Gtk2 code, which allows some fancier thread work, thru their thread-safety mechanism, experts like muppet still say the best way is to do it like you suggest. Create the threads first, before anything else is declared, and you will have few problems.
This is the basic thread I use; you can either hard-code the thread's code, or pass it via a shared variable and eval it. When the thread is created, it goes right to sleep, and wakes up once per second to see if it needs to do anything. The one drawback with this method is that you need to clean the threads up when exiting: wake them up, tell them to die, then join them.
sub work{
my $dthread = shift;
$|++;
while(1){
if($shash{$dthread}{'die'} == 1){ goto END };
if ( $shash{$dthread}{'go'} == 1 ){
eval( system( $shash{$dthread}{'data'} ) );
foreach my $num (1..100){
$shash{$dthread}{'progress'} = $num;
print "\t" x $dthread,"$dthread->$num\n";
select(undef,undef,undef, .5);
if($shash{$dthread}{'go'} == 0){last}
if($shash{$dthread}{'die'} == 1){ goto END };
}
$shash{$dthread}{'go'} = 0; #turn off self before returning
}else
{ sleep 1 }
}
END:
}
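As a possible variation on the polling loop above, the sleeping thread can block on a condition variable from threads::shared instead of waking once per second. This is a sketch only; the %ctl hash and its go/stop/done keys are illustrative names, not part of the code above.

```perl
#! perl -w
use strict;
use threads;
use threads::shared;

# Shared control flags; the names are illustrative.
my %ctl :shared = ( go => 0, stop => 0, done => 0 );

my $thr = threads->create( sub {
    my $runs = 0;
    WORK: while ( 1 ) {
        {
            lock( %ctl );
            # Sleep inside cond_wait until signalled; no periodic polling.
            cond_wait( %ctl ) until $ctl{go} || $ctl{stop};
            last WORK if $ctl{stop};
            $ctl{go} = 0;       # turn off self before working
        }
        ++$runs;                # the real work would go here
        lock( %ctl );
        $ctl{done}++;
        cond_broadcast( %ctl ); # tell the main thread this run has finished
    }
    return $runs;
} );

# Wake the worker once and wait until it reports completion.
sub wake {
    lock( %ctl );
    $ctl{go} = 1;
    cond_broadcast( %ctl );
    my $target = $ctl{done} + 1;
    cond_wait( %ctl ) until $ctl{done} >= $target;
    return;
}

wake() for 1 .. 2;

# Clean-up: tell the worker to stop, then join it.
{ lock( %ctl ); $ctl{stop} = 1; cond_broadcast( %ctl ); }
my $runs = $thr->join;
print "worker ran $runs times\n";
```

The clean-up step is still needed, but the thread consumes no CPU while idle and reacts immediately when woken.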
Yes. I've been using and describing these techniques here for 3 years or more, but I am looking for a way to encapsulate the messy and fiddly business of shared data, access control and the process of spawning 'clean&light' threads into a module with a simple interface. I've gotten close a couple of times, but there is always something that I haven't found a good way to do.
Your example code misses the point. In a nutshell, the problem is
Possible interface:
use threads::lite;
my @threads = threads::lite->spawn( 10 );
...
## Then when I know what I want a thread to do
my $Xthread = pop @threads;
$Xthread->run( \&doX, @Xargs );
....
my @Xresults = $Xthread->join;
Possible interface:
use threads::lite;
my $threadFactory = threads::lite->genFactory;
....
my( $Xthread ) = $threadFactory->create( \&doX, @Xargs );
my( $Ythread ) = $threadFactory->create( \&doY, @Yargs );
Don't take any notice of the module/method names shown. I could care less whether they are camelCase() or hugely_verbose_with_under_scores()--though I have my preferences like others, and I'd prefer that they weren't Hugely_Verbose_With_Camel_Case_And_Underscores() as I've encountered occasionally.
The crux of the matter is how to create light threads (which means early), but use them when I need them; and without having to reinvent the wheel of queues and synchronisation and all that good stuff in every program; and without cloning everything in my current thread into every thread I spawn.
Ie. A simple interface to lightweight, 'only-clone-what-is-needed' threads.
I'm not much into making objects, but that would be my first attempt.
You know more about it than me, I'm pretty content to stick with functional worker threads which I control thru a hash.
I like this idea. A suggestion for the interface:
use threads::lite;
my $factory = threads::lite->new( -threads => 10 ); #reserve 10 threads
my $x_thr = $factory->create( \&doX, \@Xargs, \%optional_configs );
my $y_thr = $factory->create( \&doY, \@Yargs );
The general ideas are
sub _default_thread {
my $thr_id = shift;
if (defined $s_coderef[$thr_id] && ref $s_coderef[$thr_id] eq 'CODE') {
$s_coderef[$thr_id]->(@{ $s_param[$thr_id] });
$s_coderef[$thr_id] = undef;
}
else { sleep(1) }
}
This is just off the top of my head, so take it as such.
Although it looks like your advice, when it comes to perl and not general computing, is right:
paranoid% perl -w junk1.pl
Taken 1.603796 seconds%
paranoid% perl -w junk2.pl
Taken 4.308179 seconds%
I'm still avoiding threads with perl, there's no good reason for lib authors to make their libs thread-safe, thus your perl apps will never be thread-safe, and, there is basically nothing that threading has to offer (well, headaches and longer development times, but if we wanted that, we would be programming java)
(But we do get a lot of people who read a book about GUIs, and they can't seem to live without threads these days)
A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox
I see just the opposite. When you have a situation where you need to share data between separate processes, it is easier for me to use threads and threads::shared. I suppose if you are used to setting up safe shared memory segments for IPC, then it may be easier for you. But I still see that, in that situation, threads and shared data are easier to set up, and safer. I shudder when I see those shared memory segments which are not cleaned up... I've seen some shared mem segment apps, which are supposed to clean up after themselves, leave shared memory segments intact after a kill 9 or a control-c. I will take threads any day.
And there is the option of dealing with a gazillion pipes....yuck.
But I agree with you that if you don't need to share data, forking is preferred over threads.
Although it looks like your advice, when it comes to perl and not general computing, is right:
Without wishing to offend you, this is a Perl forum, and the subject is Perl threads. Ithreads are not forked processes; not pthreads; nor green threads; nor any other flavour. COW is not available everywhere, and Ithreads do not (yet) make use of COW anywhere that I am aware of. As such, your prior experience is of little value in a thread relating to them.
I'm still avoiding threads with perl, there's no good reason for lib authors to make their libs thread-safe, thus your perl apps will never be thread-safe, and, there is basically nothing that threading has to offer (well, headaches and longer development times, but if we wanted that, we would be programming java) (But we do get a lot of people who read a book about GUIs, and they can't seem to live without threads these days)
I rarely bother to read threads about web/cgi and related technologies because they don't interest me.
Again, without wishing to offend you, you must have seen the word "thread" in the title of the post. You obviously have no interest in threads, so why bother to expend effort to respond? Especially in such a negative vein. Isn't it easier to simply note the subject and move on?
FYI. Threads have many, many uses beyond "GUIs", though they are one good use. And despite your undisguised attempts to imply that GUI applications are somehow inferior, for the vast majority of computer users, as opposed to computer technologists and geeks, GUI applications are easier to use and allow them to use their computer systems as tools to perform their primary job roles without having to become computer specialists.
I started to try and explain the way iThreads work, and how they remove the need for the vast majority of modules to be coded to be thread safe--at a conservative guess, 90% of the modules on CPAN work perfectly well in conjunction with threads without any special care needing to be taken by their users, beyond not attempting to share objects across threads.
Then I realised that, going by the tone of your post, you simply wouldn't care. You have no interest in threads and your mind is closed to their possibilities. So, I won't bother.
A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox
I've no idea who Alan Cox is, but it is apparent that he is just as ill informed on the subject.
First off, as far as forking is concerned, you're right: by doing the set-up work in the parent, it gets copied into the kids in shared copy-on-write memory. Thus, there is a huge runtime boon to doing that - both in memory and CPU.
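A small sketch of the fork case, assuming a platform with a native fork() (on Windows, perl emulates fork with interpreter threads, so the memory argument does not carry over). The %lookup table and the pipe are illustrative. How many pages actually stay shared depends on the OS and on how much perl touches the data.

```perl
#! perl -w
use strict;

# Set-up done once in the parent; a forked child sees it without re-creating it.
my %lookup = map { $_ => $_ * 2 } 1 .. 1000;

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: read the inherited data and report one value back to the parent.
    close $reader;
    print {$writer} $lookup{21}, "\n";
    close $writer;
    exit 0;
}

# Parent: collect the child's answer and reap it.
close $writer;
chomp( my $answer = <$reader> );
close $reader;
waitpid( $pid, 0 );
print "child answered $answer\n";
```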
That said, threads are another beast. As BrowserUk points out, these are perl threads, which make them a slightly different beast than regular win32- or p- threads.
They're different enough that I don't bother using them. However, I look forward to perl 6 partly for the hopes that by putting threading into the base language, we might get some good, lightweight threads where the types of workarounds that BrowserUk mentioned in his OP are no longer necessary. Of course, what fixing threads does to PONIE in threaded situations ... well, I don't know.
I really wish I had a thread-safe perl where I could just do stuff in parallel and not have to worry about inter-thread communication. I have some very parallelisable tasks in my code which could really gain from this, especially when it's running on multi-CPU machines (usually 4-way machines). Unfortunately, I'm using blessed references all over the place, and the overhead probably would kill me.
A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox
I'm assuming this is the Alan Cox in question? I suppose it's all well and good for someone who hacks OS kernels for fun and profit to make such statements. However, as someone who has also hacked kernels (including of the realtime, SMP kind) for fun and profit, I'd adjust Mssr. Cox's assertion a bit:
Threads are for people who have better things to do than program state machines.
However, if Thread::Apartment is as capable as current testing indicates, then I'll agree with your assertion that "there's no good reason for lib authors to make their libs thread-safe". Because they won't have to, assuming they're reasonably OO Perl. Just pop them into an apartment thread, and call the methods and/or invoke its closures as needed.
Yes I realize there are issues apartment threading can't solve. But I've managed to get some threads-hostile DBI drivers to behave, and hope to have Tk working soon, which indicates many otherwise threads hostile modules should be supportable.
Yes, but any solution that uses string eval means that you lose all the compile-time checking of the code contained in the strings(*), as well as being extremely slow if you call the code more than once.
For example, if you wish to spawn a thread to handle client connects, the time spent re-evaling the code to run in the thread, will leave your main thread unresponsive to accept new connections for too long.
And when things go wrong in your threads, you are left with no clues as to what and where.
It also uses Storable freeze/thaw combinations to pass data to/from/between threads. This is even slower than shared data; doesn't handle large volumes well; and makes assumptions about what the data will contain.
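To illustrate what a freeze/thaw hand-off involves, here is a sketch using only Storable itself. The %job structure is made up for the example. The receiving side gets a full deep copy, and by default Storable refuses data that contains code references, which is one of the assumptions about content mentioned above.

```perl
#! perl -w
use strict;
use Storable qw[ freeze thaw ];

# A made-up job description to pass to another thread.
my %job = ( name => 'resize', args => [ 640, 480 ] );

# freeze() serialises the whole structure into one plain byte string ...
my $frozen = freeze( \%job );

# ... and thaw() rebuilds an independent deep copy from that string.
my $copy = thaw( $frozen );
$copy->{args}[0] = 800;
print "original: $job{args}[0], copy: $copy->{args}[0]\n";

# With default settings, a structure holding a coderef cannot be frozen.
my $ok = eval { freeze( { callback => sub { 1 } } ); 1 };
print "freezing a coderef ", ( $ok ? "worked" : "failed" ), "\n";
```

Every hand-off pays for a full serialise and a full rebuild, which is why large volumes hurt.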
I admire gmpassos greatly for the attempt, but it doesn't really work well in use.
(*) IMO, a much better reason for avoiding string eval than "security issues".
So your findings about runtime and memory usage led me to ponder how likely it would be that some simple-minded perl script using DBI to do stuff with data from Oracle tables might get kind of ugly, just because a simple-minded perl hacker doesn't know about, think about, or have a choice regarding the kind of coding adjustment you demonstrated in your benchmarks.
For example, someone decides to load a lot of data from a file before connecting to the database -- and then DBD::Oracle starts doing stuff with threads "under the table", completely unbeknownst to the hapless programmer. (Maybe DBD::Oracle doesn't really do stuff with threads, but then I don't understand why it seems to need thread support...)
Anyway, part of what you're hoping for seems unattainable, unless you accept a trade-off: you can economize on memory and runtime if you can specify up front exactly how many threads you intend to use. That's great, but isn't there a whole class of apps whose defining trait is the ability to start new threads on an as-needed basis (not knowing in advance how many will be needed)?
I am not someone who can go into detail on this, but roughly speaking, it sounds like what is needed is a way to define some sort of initial minimal state -- like a snapshot at startup -- such that each new thread starts out with just the minimal stuff defined therein; the parent process might know of specific data that a given thread would need, and would explicitly enable the access (whether copied or shared), but without this action from the parent, the thread must simply accumulate its own data separately.
I don't have a clue how that would be implemented (for all I know, it might already be implemented!) -- but just in conceptual terms, that seems like what you'd want.
As I've (I hope correctly) explained in the thread, but I summarise here also to assuage any fears, threads created by calling C APIs (pthread_create()/CreateThread()/_beginthread()/other) will be completely unaffected by any memory considerations associated with Perl's cloning of Perl data.
They would possibly be affected by the changes to process stack size settings as I described in the other recent thread, if they choose to use implicit stack size settings, but that's less common in C/C++ as they have access to the calls/parameters to use explicit values.
Anyway, part of what you're hoping for seems unattainable, unless you accept a trade-off: you can economize on memory and runtime if you can specify up front exactly how many threads you intend to use. That's great, but isn't there a whole class of apps whose defining trait is the ability to start new threads on an as-needed basis (not knowing in advance how many will be needed)?
Yes. That is the problem in a nutshell. Creating a "factory thread" very early in the script, before anything else heavy is loaded, is relatively easy to do. Even passing coderefs (which are allocated on the heap and (I believe) threadsafe) to that thread factory, so that it can spawn the new thread from a lightweight environment, should be possible. The real problem comes in transferring context and parameters, and retrieving results.
I'm convinced it is possible, I just haven't put the right set of incantations together yet. At least, I hope that is the problem.
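A rough sketch of the factory-thread idea, under a simplifying assumption: jobs are looked up by name in a table that exists before the factory is spawned, because Thread::Queue cannot carry coderefs. The %jobs table, queue names and request ids are all hypothetical. It does not solve the context-transfer problem; it only shows workers being cloned from the light factory rather than from the heavy main thread.

```perl
#! perl -w
use strict;
use threads;
use Thread::Queue;

# Hypothetical jobs, looked up by name inside the factory's workers.
my %jobs = (
    sum    => sub { my $t = 0; $t += $_ for @_; return $t },
    concat => sub { return join '', @_ },
);

my $requests = Thread::Queue->new;
my $replies  = Thread::Queue->new;

# Spawn the factory before anything heavy is loaded.
my $factory = threads->create( sub {
    my @kids;
    while ( defined( my $req = $requests->dequeue ) ) {
        my( $id, $name, @args ) = @$req;
        # Each worker is cloned from the factory, not from the main thread.
        push @kids, threads->create( sub {
            $replies->enqueue( [ $id, $jobs{$name}->( @args ) ] );
        } );
    }
    $_->join for @kids;
    return;
} );

# Heavy data built afterwards lives only in the main thread.
my @heavy = ( 1 .. 1e5 );

$requests->enqueue( [ 1, 'sum', 1 .. 4 ] );
$requests->enqueue( [ 2, 'concat', 'a', 'b' ] );

# Replies may arrive in any order, so key them by request id.
my %result;
for ( 1 .. 2 ) {
    my( $id, $value ) = @{ $replies->dequeue };
    $result{$id} = $value;
}

$requests->enqueue( undef );    # stop marker for the factory
$factory->join;
print "$result{1} $result{2}\n";
```

Parameters and results still travel through queues as copies, so large arguments remain expensive.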
Yeah. I wish I understood, or one of the guys that know would tell us, where the memory growth actually arises.
If you run this (having substituted a suitable mem routine for your platform), and then play with the various values, it's really difficult to divine where the growth occurs and what controls how much.
#! perl -slw
use strict;
use Data::Dumper;
use threads;
use threads::shared;
no warnings 'misc';
our $N ||= 100;
our $D ||= 1.e5;
our $SHARED;
sub mem {
my @filler = 1 .. $D unless @_;
my @filler : shared = 1 .. $D if @_;
my( $usage ) = `tasklist /NH /FI \"pid eq $$\" ` =~ m[ (\S+) \s+ K \s* $ ]x;
$usage =~ tr[,][]d;
return 1024 * $usage;
}
my @data = 1 .. $D unless $SHARED;
my @data:shared = 1 .. $D if $SHARED;
printf "start : %6d\n", my $start = mem;
for ( 1 .. $N ) {
my $thread = threads->create( \&mem );
printf "%3d : %6d\n", $_, $thread->join;
}
printf "end : %6d\n", my $end = mem;
printf "Growth: %6d\n", $end - $start;
Here are some typical results on my system:
c:\test\fork>..\534254 -N=10 -D=1
start : 3141632
  1 : 3993600
  2 : 4018176
  3 : 4018176
  4 : 4022272
  5 : 4038656
  6 : 4038656
  7 : 4022272
  8 : 4018176
  9 : 4055040
 10 : 4055040
end : 4034560
Growth: 892928

c:\test\fork>..\534254 -N=10 -D=1 -SHARED
start : 3145728
  1 : 3997696
  2 : 4022272
  3 : 4022272
  4 : 4030464
  5 : 4026368
  6 : 4022272
  7 : 4059136
  8 : 4059136
  9 : 4059136
 10 : 4059136
end : 4042752
Growth: 897024

c:\test\fork>..\534254 -N=10 -D=1e3
start : 3211264
  1 : 4128768
  2 : 4149248
  3 : 4149248
  4 : 4157440
  5 : 4149248
  6 : 4153344
  7 : 4157440
  8 : 4161536
  9 : 4157440
 10 : 4198400
end : 4132864
Growth: 921600

c:\test\fork>..\534254 -N=10 -D=1e3 -SHARED
start : 3555328
  1 : 4345856
  2 : 4378624
  3 : 4382720
  4 : 4386816
  5 : 4395008
  6 : 4390912
  7 : 4390912
  8 : 4390912
  9 : 4386816
 10 : 4395008
end : 4374528
Growth: 819200

c:\test\fork>..\534254 -N=10 -D=1e5
start : 9723904
  1 : 17874944
  2 : 17907712
  3 : 17920000
  4 : 17940480
  5 : 17924096
  6 : 17940480
  7 : 17936384
  8 : 17944576
  9 : 17936384
 10 : 17952768
end : 11743232
Growth: 2019328

c:\test\fork>..\534254 -N=10 -D=1e5 -SHARED
start : 43819008
  1 : 48697344
  2 : 48721920
  3 : 48726016
  4 : 48726016
  5 : 48730112
  6 : 48730112
  7 : 48730112
  8 : 48734208
  9 : 48738304
 10 : 48738304
end : 44195840
Growth: 376832
On the other hand, if you set $self->{'reuse'} = 1; and comment out the line
my @ReturnData = $z->{'thread'}->join;
the threads will run in parallel and the memory climbs with each thread.
So the trick is to find a way to have the main thread watch for when each thread is ready to join, then relaunch it, instead of making another thread object. That is why it is easier with Tk, Gtk2, POE, etc, where you can have an event loop watching the thread. I am toying with the idea of how to put a self-contained method in the object to watch for the thread finishing its code run.
You could set the thread to be non-reuse and detach it. Then have the Zthread object store the return value in its object. Then the main program would just have to wait an amount of time, get the thread returns out of the Zthread object, and undef the object. That will be my next step.
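One way the main program might watch for finished threads without an event loop is to poll for joinable threads, assuming a threads.pm recent enough to provide threads->list( threads::joinable ). This is a sketch only; the timed select stands in for real work and the polling interval is arbitrary.

```perl
#! perl -w
use strict;
use threads;

# Three short-lived threads that finish at different times.
my @threads = map {
    my $n = $_;
    threads->create( sub {
        select( undef, undef, undef, 0.05 * $n );   # stand-in for real work
        return $n * 2;
    } );
} 1 .. 3;

# Poll for threads that have finished, and join only those.
my @results;
while ( @results < @threads ) {
    push @results, $_->join for threads->list( threads::joinable );
    select( undef, undef, undef, 0.01 );            # do not spin at full speed
}

my @sorted = sort { $a <=> $b } @results;
print "@sorted\n";
```

The threads run in parallel and each one is joined as soon as it is done, so finished threads do not pile up unjoined.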
#!/usr/bin/perl
use warnings;
use strict;
$|=1;
package Zthread;
use threads;
use threads::shared;
sub new {
my ($class, %arg) = @_;
my $self = {
# 'name' => $arg{-name}, #identifying name
# 'reuse' => $arg{-reuse}, # control reuse of thread
};
bless $self;
threads::shared::share($self->{'counter'});
threads::shared::share($self->{'go'});
$self->{'counter'} = 0;
$self->{'go'} = 0;
$self->{'die'} = 0;
#sets whether threads are kept alive, but sleeping
#after a run finishes
$self->{'reuse'} = 0;
$self->{'thread'} = threads->new( sub{
while(1){
if($self->{'die'} == 1){ goto END };
print $self->{'go'};
if ( $self->{'go'} == 1 ){
# eval( system( $self->{'data'} ) );
foreach my $num (1..20){
print 'Thread-> ',$self->{'counter'},"\n";
$self->{'counter'}++;
select(undef,undef,undef, .1);
if($self->{'go'} == 0){last}
if($self->{'die'} == 1){ goto END };
}
if( ! $self->{'reuse'} ){
print "We are done boss->$self->{'counter'}\n";
goto END;
}else{
print "We are done boss->$self->{'counter'} .... going to sleep\n";
$self->{'go'} = 0; #turn off self before returning
}
}else
{ select(undef,undef,undef, .1); }
}
END:
return "We are done boss->$self->{'counter'}\n";
});
return ($self);
}
###################################################
sub start {
my $self = shift;
$self->{'go'} = 1;
}
################################################
sub getCounter {
my $self = shift;
return $self->{'counter'};
}
1;
package main;
my @psizes;
foreach my $run (1..10){
my $z = Zthread->new();
$z->start();
print "wait end marker for \n";
#commenting out the following line will make them run
#in parallel but memory will climb each run
my @ReturnData = $z->{'thread'}->join;
my @size = split "\n", `cat /proc/$$/status`;
(my $vmsize) = grep { /VmSize/ } @size;
my (undef, $size) = split ' ', $vmsize;
# print "\nThread returned @ReturnData Size->$size\n";
print "\nThread returned Size->$size\n";
push @psizes, $size;
}
print "@psizes\n";
<>;
__END__
perlmonks.org content © perlmonks.org and acid06, Anonymous Monk, BrowserUk, Eyck, graff, perrin, radiantmatrix, renodino, Tanktalus, vagnerr, zentara