File download tool, file size issues, cgi-application
cupojoe
created: 2006-03-02 11:53:36
Hello Monks,

It's a small project, and I'm pretty green to Perl.

It's an architecture and reality check question. Hope you don't mind.

Building a section of a web site so logged in users can download files we've uploaded for them to get. The files are 10 - 20 megabytes (pdf'd engineering reports).

We can't count on them having ftp clients, and our web host (shared hosting) won't allow cgi driven downloads or uploads exceeding 2 megs. (I know, host sucks.)

The current plan is, we'll ftp the file up, and my tool will provide a link so they can right-click and "save as".

(linux, apache, mod_perl, cgi-application and session plugin, HTML-Template, dbi, mysql.)

Originally, I wanted to control the download so we had as much positive info as possible that it was downloaded: date/time, IP address, login id, etc. Since it doesn't look like I can use CGI.pm to drive the download or the upload, and they're unlikely to have an ftp client or know how to use one, the right-clicked link appears reasonable.

Unless I'm missing something.

What protocol is being used in the right-clicked "save as"? Ftp managed by the browser? Http?

Does this act appear in the web server log so that I can programmatically find and record the download data?

I've spent a good deal of time in various searches here at Perlmonks, MS knowledge base, and others, and feel I've run out of even knowing where else to look.

Thanks in advance!

Re: File download tool, file size issues, cgi-application
created: 2006-03-02 11:55:56
What protocol is being used in the right-clicked "save as"? Ftp managed by the browser? Http?

Whichever protocol is specified in the link. If the link says http://..., HTTP will be used. If the link says ftp://..., FTP will be used.

Re^2: File download tool, file size issues, cgi-application
created: 2006-03-02 12:39:53

Thanks for this. I think I was wondering if HTTP changed its behavior for larger files being downloaded and not displayed. If the browser defaulted to ftp for the download, I wondered how that could affect me.

Plus I'm looking to get some confidence that I'm approaching the application in a reasonable way (linking to the file rather than handling it directly via CGI.pm). It seems counterintuitive that the host will easily serve a huge download via http, but won't allow it via cgi (up or down, although I'm betting this is a security measure).

Re: File download tool, file size issues, cgi-application
created: 2006-03-02 12:55:37

You could also wrap all of your statistic gathering into a small CGI that responds with a page containing instructions and a right-clickable link. This would allow you to have your cake and eat it too. As a bonus, you could allow the user to specify whether they want an http or ftp link in their request.
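
A rough sketch of what such a wrapper might look like, in plain Perl. The helper names (`format_record`, `record_request`, `instructions_page`), the tab-separated log layout, and the log path are all assumptions made for illustration, not anything from this thread. Note that it records the request for the link, not a completed download.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build one tab-separated record for a download request.
sub format_record {
    my ($login, $ip, $file, $time) = @_;
    my $stamp = strftime('%Y-%m-%d %H:%M:%S', gmtime($time));
    return join("\t", $stamp, $ip, $login, $file) . "\n";
}

# Hypothetical helper: append the record to a log file (the path is an assumption).
sub record_request {
    my ($log_path, @fields) = @_;
    open my $fh, '>>', $log_path or die "Cannot open $log_path: $!";
    print {$fh} format_record(@fields);
    close $fh or die "Cannot close $log_path: $!";
}

# Hypothetical helper: the instructions page with a right-clickable link.
# The URL is assumed to be generated by the server and already safe for HTML.
sub instructions_page {
    my ($url) = @_;
    return qq{<p>Right-click the link and choose "Save As":</p>\n}
         . qq{<p><a href="$url">Download your report</a></p>\n};
}
```

In a CGI the caller would pass something like `$ENV{REMOTE_ADDR}` and the session's login id to `record_request`, then print the page.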


The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
Re^2: File download tool, file size issues, cgi-application
created: 2006-03-02 13:15:15
Yeah, we have anonymous ftp, but I just tried an http download with an 18 meg file and it was fine. The "small" cgi you mentioned is roughly what's happening, except there's also an admin side so our guys can create accounts, set logins/pw's for downloaders, assign the ftp'd files to a login, and see brief reports about selected accts. Users see their page with instructions and are informed that the file and link will expire in 10 days (space savings). There's an auto email to our admin that it happened, a configurable email to the downloader thanking them and reminding them of the 10 day thing, etc.
Re: File download tool, file size issues, cgi-application
created: 2006-03-02 13:12:10
If your host allows symbolic links in your htdocs (Options FollowSymLinks), you could have the files to be downloaded in another directory outside the htdocs and have a simple CGI script (or mod_perl handler) that:

1- Receives as a parameter the filename to be downloaded

2- Given the current authenticated user, creates a symbolic link of the real file to a file in "/htdocs/downloads/", where the symlink name would be something like filename_USERNAME_MD5GARBAGE.pdf. The MD5GARBAGE would be generated by

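use Digest::MD5 qw(md5_hex);  # md5_hex returns a filename-safe hex string; md5() returns raw bytes
my $md5garbage = md5_hex($real_filename, $username, "any random string");
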
3- Then your CGI redirects the browser (using the Location HTTP header) to the URL http://yoursite/downloads/filename_USERNAME_MD5GARBAGE.pdf.

4- Later on, look into your access log and any successful hit to "filename_username_xxxx.pdf" means that that specific user downloaded that specific file.

5- Periodically, clean up the old symlinks in the /htdocs/downloads/ directory.

(Yeah, looks like a dirty hack...)
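
Steps 2 and 3 above could be sketched roughly like this. The function names, the `$secret` argument, and the directory and URL passed to `publish` are hypothetical. The sketch also assumes the username has already been checked to contain only characters that are safe in a filename.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Basename qw(fileparse);

# Build the per-user link name, e.g. report_joe_<32 hex chars>.pdf.
# $secret is any string known only to the server (an assumption of this sketch).
sub link_name {
    my ($real_path, $username, $secret) = @_;
    my ($base, undef, $ext) = fileparse($real_path, qr/\.[^.]*/);
    my $digest = md5_hex($real_path, $username, $secret);
    return "${base}_${username}_${digest}${ext}";
}

# Create the symlink in the public directory (if it is not there yet)
# and return the redirect header for the CGI response.
sub publish {
    my ($real_path, $username, $secret, $public_dir, $base_url) = @_;
    my $name = link_name($real_path, $username, $secret);
    my $link = "$public_dir/$name";
    unless (-l $link) {
        symlink($real_path, $link) or die "symlink failed: $!";
    }
    return "Location: $base_url/$name\n\n";
}
```

Because the name is derived from the file, the user, and the secret, the same user asking for the same file always gets the same link, which keeps the access log easy to search.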

Re^2: File download tool, file size issues, cgi-application
created: 2006-03-03 11:56:55
1- Receives as a parameter the filename to be downloaded
..and could also:
  • 1.a- Compress the file to download
  • 2.a- Know the speed of the connection and mention: bzipped file name, file size and estimated time to download it.
  • 3.a- And, perhaps, even mention the MD5 sum to check with after finishing the download if the user has a way of doing it at his desktop.
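
For the checksum and time estimate, a small sketch might look like the following. The function names are made up for illustration, and the connection speed is simply passed in by the caller. This sketch does not try to measure it.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file, read in binary mode.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Rough download time in seconds for a given size and link speed.
# $bits_per_second is supplied by the caller (an assumption of this sketch).
sub estimated_seconds {
    my ($size_bytes, $bits_per_second) = @_;
    return $size_bytes * 8 / $bits_per_second;
}
```

The page could then show the file size (`-s $path`), the estimate, and the digest next to the link.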
Besides, as a starting point, just try creating an icon on your desktop that points to an ftp file at a well-configured ftp site. See how easily it could work. Then try adding the minimum required to fulfill all your needs.

And don't forget one of the newest ways of doing such heavy tasks: bittorrent and .torrent files. Similar users get together to improve the download time and diminish the host work.

Re: File download tool, file size issues, cgi-application
created: 2006-03-02 13:33:58
Have a look at CGI::Application::Plugin::Stream.
Re^2: File download tool, file size issues, cgi-application
created: 2006-03-02 13:48:09
This looks good, but the OP stated that he can't serve files bigger than 2MB through any CGI script:
We can't count on them having ftp clients, and our web host (shared hosting) won't allow cgi driven downloads or uploads exceeding 2 megs. (I know, host sucks.)
Re^3: File download tool, file size issues, cgi-application
created: 2006-03-02 15:55:11
Right, I did gloss over that. In that case, there's no way to check that the download was successful at the time of download. The only option that remains is scanning the server logs, and checking if the downloaded size equals the file size. I'd probably give the download link a query param that makes it easy to pick it up from the logs.
Re^4: File download tool, file size issues, cgi-application
created: 2006-03-02 18:54:56

Yes, plus I'll know the unique filename. (assigned at upload)

Is the server aware whether the client disconnects or whether there's any other physical interference, and therefore log something related to download failure?

Re^5: File download tool, file size issues, cgi-application
created: 2006-03-02 19:10:14
The only hint you'll likely be able to get is the reported download size. The log should say something like
#ip address  ...  request                   status code  size  ...
10.10.10.10  ...  "GET /your/file HTTP/1.0" 200          65536  ...
The "size" field gives you the bytes sent. If the connection was broken mid-way, that value will be lower than your file size.

See the Apache access log documentation (http://httpd.apache.org/docs/2.0/logs.html#accesslog) for the full details.
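
As a rough illustration of the size check, here is a sketch that parses a line in Apache's common/combined log layout and compares the bytes sent with the file size. The function names are hypothetical, and the regex assumes the default field order. A custom LogFormat would need a different pattern. As said above, the size is only a hint, not proof that the file was saved.

```perl
use strict;
use warnings;

# Parse one line of Apache's common/combined log format.
# Returns a hash ref, or undef if the line does not match.
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\S+)\s+\S+\s+(\S+)\s+      # client IP, identd, user
        \[([^\]]+)\]\s+              # timestamp
        "(\S+)\s+(\S+)[^"]*"\s+      # method, path
        (\d{3})\s+(\d+|-)            # status, bytes sent
    }x;
    return {
        ip => $1, user => $2, time => $3, method => $4,
        path => $5, status => $6, bytes => ($7 eq '-' ? 0 : $7),
    };
}

# True when a 200 response sent at least the whole file.
sub looks_complete {
    my ($entry, $file_size) = @_;
    return $entry->{status} == 200 && $entry->{bytes} >= $file_size;
}
```

The file size to compare against can come from `-s $path` on the server.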

Re^6: File download tool, file size issues, cgi-application
created: 2006-03-02 19:20:08

Very good.

Thanks for your help!

Re^6: File download tool, file size issues, cgi-application
created: 2006-03-03 10:52:36
Also watch out for status code 206, it is the "Partial Content" status code.

In other words, it means the browser asked for a byte-range of data because it already had some data (e.g.: if the connection was interrupted during the download). So your log may also have a status code 206 and the size won't match the total file size...
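
One way to account for that when scanning the log is sketched below. It is only a heuristic: it adds up the bytes of the 200 and 206 responses for one user's file, so overlapping ranges would be counted twice. The entry structure (hash refs with `status` and `bytes`) and the function name are assumptions of this sketch.

```perl
use strict;
use warnings;

# Classify the log entries for one user's file. Each entry is a hash ref
# with 'status' and 'bytes' keys (a hypothetical structure for this sketch).
sub classify_download {
    my ($file_size, @entries) = @_;
    my $total       = 0;
    my $saw_partial = 0;
    for my $e (@entries) {
        next unless $e->{status} == 200 || $e->{status} == 206;
        # A single 200 response with the full size is the clear case.
        return 'complete' if $e->{status} == 200 && $e->{bytes} >= $file_size;
        $saw_partial = 1 if $e->{status} == 206;
        $total += $e->{bytes};
    }
    # Resumed downloads: the pieces add up to at least the file size.
    return 'probably complete (resumed)' if $saw_partial && $total >= $file_size;
    return 'incomplete';
}
```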

Oh, wouldn't it be easier if your ISP allowed "full" CGIs?? :-P

Re: File download tool, file size issues, cgi-application
created: 2006-03-02 19:22:41

Thanks to everyone for the help.

perlmonks.org content © perlmonks.org and chanio, cupojoe, idsfa, ikegami, rhesa, salvix
