Count number of words in PDF
anniyan
created: 2006-01-05 06:31:48

Dear Monks,

I need to count the number of words in a PDF file.

What I tried: I saved the PDF file as a Word file and, using CPAN://Win32::OLE, added a VBA macro to count the number of words. The problem is that saving the PDF as a Word document takes a very long time.

Earlier I did some work using CPAN://CAM::PDF, but I found no method in it that reports the number of words.

Is it possible to count the number of words with the Word application? If so, is there any other module to accomplish this? Even if there is only a module to 'save as' PDF to Word in Perl, please guide me.

Regards,
Anniyan
(CREATED in HELL by DEVIL to s|EVILS|GOODS|g in WORLD)

Re: Count number of words in PDF
created: 2006-01-05 06:38:17
anniyan,

CAM::PDF::PageText looks as though it will extract the text from a PDF; once you have the text, you can count the number of words. Failing that, pdftotext is available as part of the Xpdf suite. You could convert the PDF file to a text file and count the number of words within it.
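A minimal sketch of the first suggestion, assuming your installed CAM::PDF provides numPages() and getPageText() (the latter relies on CAM::PDF::PageText internally). The sub names count_words and count_pdf_words are made up for illustration, and a "word" is assumed to be any run of non-whitespace characters, which may not match what Word reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count whitespace-separated tokens; one simple definition of a "word".
sub count_words {
    my ($text) = @_;
    return 0 unless defined $text;
    my @words = split ' ', $text;
    return scalar @words;
}

# Sum the word counts of every page in a PDF (hypothetical helper).
sub count_pdf_words {
    my ($file) = @_;
    require CAM::PDF;    # loaded lazily so count_words() works without it
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file as a PDF\n";
    my $total = 0;
    for my $page (1 .. $pdf->numPages()) {
        $total += count_words($pdf->getPageText($page));
    }
    return $total;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    printf "%s: %d words\n", $file, count_pdf_words($file);
}
```

Text extraction quality depends on how the PDF was produced, so treat the result as an estimate.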

Hope this helps.

Martin
Re: Count number of words in PDF
created: 2006-01-05 07:37:09
There is also ps2ascii, part of the Ghostscript tools. Windows versions are available.

I'm not really a human, but I play one on earth. flash japh
Re: Count number of words in PDF
created: 2006-01-05 12:51:48
Why are you trying to convert it into a Word file anyway? I guess the conversion from PDF to a Word document takes so long because of the text formatting involved.

A better way could be to convert the PDF into a plain-text file and count the words from there.
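Something along these lines might do, assuming the pdftotext program from Xpdf is installed and on your PATH. The sub names are hypothetical, and a "word" is taken to be any whitespace-separated token.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count whitespace-separated tokens read from an open filehandle.
sub count_words_in_handle {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        my @words = split ' ', $line;
        $count += @words;
    }
    return $count;
}

# Convert the PDF to a temporary text file, then count its words.
sub count_pdf_words {
    my ($pdf) = @_;
    my ($tmp_fh, $txt) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    close $tmp_fh;
    # List form of system() avoids shell quoting problems with file names.
    system('pdftotext', $pdf, $txt) == 0
        or die "pdftotext failed on $pdf (exit status $?)\n";
    open my $in, '<', $txt or die "Cannot read $txt: $!\n";
    my $total = count_words_in_handle($in);
    close $in;
    return $total;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $pdf = shift @ARGV;
    printf "%s: %d words\n", $pdf, count_pdf_words($pdf);
}
```

This skips Word entirely, so the slow "save as" step goes away.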

perlmonks.org content © perlmonks.org and anniyan, marto, Truman, zentara
