It's a small project, and I'm pretty green to Perl.
It's an architecture and reality check question. Hope you don't mind.
I'm building a section of a web site where logged-in users can download files we've uploaded for them. The files are 10 - 20 megabytes (engineering reports saved as PDFs).
We can't count on them having ftp clients, and our web host (shared hosting) won't allow cgi driven downloads or uploads exceeding 2 megs. (I know, host sucks.)
The current plan is, we'll ftp the file up, and my tool will provide a link so they can right-click and "save as".
(linux, apache, mod_perl, cgi-application and session plugin, HTML-Template, dbi, mysql.)
Originally, I wanted to control the download so we had as much positive info as possible that it was downloaded: date/time, IP address, login id, etc. Since it doesn't look like I can use CGI.pm to drive the download or the upload, and they're likely to not have an ftp client or know how to use one, the right-clicked link appears reasonable.
Unless I'm missing something.
What protocol is being used in the right-clicked "save as"? Ftp managed by the browser? Http?
Does this act appear in the web server log so that I can programmatically find and record the download data?
I've spent a good deal of time in various searches here at Perlmonks, MS knowledge base, and others, and feel I've run out of even knowing where else to look.
Thanks in advance!
What protocol is being used in the right-clicked "save as"? Ftp managed by the browser? Http?
Whichever protocol is specified in the link. If the link says http://..., HTTP will be used. If the link says ftp://..., FTP will be used.
Thanks for this. I think I was wondering if HTTP changed its behavior for larger files being downloaded and not displayed. If the browser defaulted to ftp for the download, I wondered how that could affect me.
Plus I'm looking to get some confidence that I'm approaching the application in a reasonable way (linking to the file rather than handling it directly via CGI.pm). It seems counterintuitive that the host will easily serve a huge download via http, but won't allow it via cgi (up or down), although I'm betting this is a security measure.
You could also wrap all of your statistic gathering into a small CGI that responds with a page containing instructions and a right-clickable link. This would allow you to have your cake and eat it too. As a bonus, you could allow the user to specify whether they want an http or ftp link in their request.
1- Receives as a parameter the filename to be downloaded
2- Given the current authenticated user, creates a symbolic link of the real file to a file in "/htdocs/downloads/", where the symlink name would be something like filename_USERNAME_MD5GARBAGE.pdf. The MD5GARBAGE would be generated by (using md5_hex from Digest::MD5, since plain md5 returns raw binary that isn't safe in a filename):
    my $md5garbage = md5_hex($real_filename, $username, "any random string");
3- Then your CGI redirects the browser (using the Location HTTP header) to the URL http://yoursite/downloads/filename_USERNAME_MD5GARBAGE.pdf.
4- Later on, look into your access log and any successful hit to "filename_username_xxxx.pdf" means that that specific user downloaded that specific file.
5- Periodically, clean up the old symlinks in the /htdocs/downloads/ directory.
(Yeah, looks like a dirty hack...)
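For what it's worth, steps 2, 3 and 5 above could be sketched roughly like this. It is only a sketch under assumptions: the sub names, the secret string, and the directory/URL arguments are made up for illustration, and it assumes Apache is allowed to follow symlinks in the download directory (Options FollowSymLinks or SymLinksIfOwnerMatch).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Basename qw(fileparse);
use File::Spec;

# Placeholder secret (hypothetical); use your own hard-to-guess string.
my $SECRET = 'any random string';

# Build the per-user link name: filename_USERNAME_MD5GARBAGE.ext
sub link_name {
    my ($real_path, $username) = @_;
    # Refuse usernames that could smuggle path characters into the link name.
    die "unsafe username\n" unless $username =~ /\A[A-Za-z0-9]+\z/;
    my ($base, undef, $ext) = fileparse($real_path, qr/\.[^.]*/);
    my $token = md5_hex($real_path, $username, $SECRET);
    return "${base}_${username}_${token}${ext}";
}

# Step 2: create the symlink. Returns the URL to use in step 3.
sub publish_for_user {
    my ($real_path, $username, $download_dir, $url_prefix) = @_;
    my $name = link_name($real_path, $username);
    my $link = File::Spec->catfile($download_dir, $name);
    unless (-l $link) {
        symlink($real_path, $link) or die "symlink $link: $!\n";
    }
    return "$url_prefix/$name";
}

# Step 5: remove symlinks older than $max_age_secs (judged by the link's own mtime).
sub cleanup_links {
    my ($download_dir, $max_age_secs) = @_;
    opendir(my $dh, $download_dir) or die "opendir $download_dir: $!\n";
    for my $name (readdir $dh) {
        my $path = File::Spec->catfile($download_dir, $name);
        next unless -l $path;                 # only touch symlinks
        my $mtime = (lstat $path)[9];
        unlink $path if time() - $mtime > $max_age_secs;
    }
    closedir $dh;
}
```

In a plain CGI script, step 3 would then be something like print "Location: $url\n\n"; with the URL returned by publish_for_user.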
1- Receives as a parameter the filename to be downloaded... and could also:
And don't forget one of the newer ways of handling such heavy transfers: BitTorrent and .torrent files. Users downloading the same file share pieces with each other, which improves download times and reduces the load on the host.
We can't count on them having ftp clients, and our web host (shared hosting) won't allow cgi driven downloads or uploads exceeding 2 megs. (I know, host sucks.)
Yes, plus I'll know the unique filename. (assigned at upload)
Is the server aware when the client disconnects or the transfer is otherwise interrupted, and does it log anything that indicates a failed download?
    #ip address ... request                   status code  size  ...
    10.10.10.10 ... "GET /your/file HTTP/1.0" 200          65536 ...

The "size" field gives you the bytes sent. If the connection was broken mid-way, that value will be lower than your file size.
See the Apache access log documentation (http://httpd.apache.org/docs/2.0/logs.html#accesslog) for the full details.
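As a rough illustration of pulling those fields out of the log, here is a minimal sketch. It assumes the server writes Apache's Common Log Format (or the combined format, which starts the same way); the sub names are invented for this example, and the expected file size is assumed to come from wherever you recorded it at upload time.

```perl
use strict;
use warnings;

# Parse one access log line in Common Log Format.
# Returns a hash ref, or nothing if the line doesn't match.
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~ m{
        \A (\S+) \s+ \S+ \s+ (\S+) \s+     # client IP, ident field, user
        \[ ([^\]]+) \] \s+                 # timestamp
        " (\S+) \s+ (\S+) [^"]* " \s+      # method and requested path
        (\d{3}) \s+ (\d+|-)                # status code and size in bytes
    }x;
    return {
        ip     => $1,
        user   => $2,
        time   => $3,
        method => $4,
        path   => $5,
        status => $6,
        bytes  => ($7 eq '-' ? 0 : $7),    # CLF logs "-" when no bytes were sent
    };
}

# True if the entry is a plain 200 response that sent the whole file.
sub full_download {
    my ($entry, $file_size) = @_;
    return $entry->{status} == 200 && $entry->{bytes} == $file_size;
}
```

You would loop over the log, call parse_log_line on each line, and compare the entries whose path matches one of your download links against the known file size.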
Very good.
Thanks for your help!
In other words, a 206 status means the browser asked for a byte range of the data because it already had some of it (e.g. the connection was interrupted during the download and the browser resumed it). So your log may also have entries with status code 206, and their size won't match the total file size...
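To make the 200 vs. 206 distinction concrete, here is a small sketch of how logged responses for one file might be classified. The sub names are invented, and the summing rule in probably_complete is only a heuristic of mine, not something the server guarantees: a browser may re-request overlapping ranges, so treat the result as a hint.

```perl
use strict;
use warnings;

# Classify a single logged response for a file of known size.
sub classify_download {
    my ($status, $bytes, $file_size) = @_;
    return 'complete'   if $status == 200 && $bytes == $file_size;
    return 'incomplete' if $status == 200;   # full request, but cut short
    return 'partial'    if $status == 206;   # byte-range request (e.g. a resume)
    return 'other';
}

# Heuristic: add up the bytes of all 200/206 responses for one user and file.
# Each entry is an array ref of [status, bytes].
sub probably_complete {
    my ($file_size, @entries) = @_;
    my $total = 0;
    for my $entry (@entries) {
        my ($status, $bytes) = @$entry;
        $total += $bytes if $status == 200 || $status == 206;
    }
    return $total >= $file_size ? 1 : 0;
}
```

With the per-user link names suggested earlier in the thread, the entries for one link already belong to one user and one file, which makes this kind of grouping straightforward.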
Oh, wouldn't it be easier if your ISP allowed "full" CGIs?? :-P
perlmonks.org content © perlmonks.org and chanio, cupojoe, idsfa, ikegami, rhesa, salvix