writing Arabic text in Excel
pg09
created: 2006-04-03 14:04:30
I'm trying to read Arabic text from an MS Word document and write the extracted text to Excel. I'm using the following code, but it prints 'boxes' instead of Arabic text. Any help will be appreciated.
use Unicode::Map();
use Spreadsheet::WriteExcel;
use Win32::OLE::Const 'Microsoft Word';

my $workbook = Spreadsheet::WriteExcel->new("word.xls");
my $worksheet = $workbook->addworksheet();
my $word = Win32::OLE->new('Word.Application', 'Quit');

my $map = Unicode::Map->new("ISO-8859-6");
my $utf16 = $map->to_unicode($doc->Words->Item(1)->Text);
$worksheet->write(0, 0, $utf16, $format2);
Re: writing Arabic text in Excel
created: 2006-04-03 17:00:31
The Spreadsheet::WriteExcel write() method expects Unicode data to be in UTF-8 format (in Perl 5.8).

You indicate that the data you are reading from Word is ISO-8859-6 but it may be UTF-16LE, which most Windows applications use internally.

Either way you should try to convert it to UTF-8 instead of UTF-16 if you are using write().

You can also write UTF-16BE and UTF-16LE data using the (poorly named) write_unicode() and write_unicode_le() methods.

--
John.

Re^2: writing Arabic text in Excel
created: 2006-04-03 19:27:08
Thanks for your response! However, I'm not able to install Unicode::Map8 through 'ppm' on my Windows 2000 machine. It complains: "Searching for 'Unicode::Map8' returned no results. Try a broader search first." However, I'm sure the module is present in the 'C:\Perl\lib' directory. Please advise.
Re^3: writing Arabic text in Excel
ff
created: 2006-04-03 23:49:07
I haven't gone further than the following, but googling for "perl unicode map8 ppm" leads to an alternative PPM repository that seems to have a build from Aug 1, 2003. Elsewhere(?) I noticed several PPM packages that were built for 5.6.

This is Google's cache of http://apache.hoxt.com/perl/win32-bin/ppmpackages/ as retrieved on Feb 26, 2006 09:28:24 GMT.
Re^3: writing Arabic text in Excel
created: 2006-04-04 04:02:18
I'm not able to install Unicode::Map8 through 'ppm' on my Windows 2000 machine.

If you are using Perl 5.8 you can use the core Encode module to convert between encodings.

See also the perluniintro and perlunicode manpages for more information.
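
As a minimal sketch of the Encode approach: the helper name to_perl_string and the sample bytes below are made up for illustration, and the two encodings are simply the ones mentioned earlier in this thread. Which encoding Word actually hands back depends on how Win32::OLE is configured, so treat this as a starting point rather than a drop-in fix.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw bytes in a known encoding into a Perl
# character string, which is the form write() is described as expecting.
sub to_perl_string {
    my ($bytes, $encoding) = @_;
    return decode($encoding, $bytes);
}

# The same four Arabic letters (U+0643 U+062A U+0627 U+0628) as raw bytes
# in the two encodings mentioned in this thread.
my $iso_bytes   = "\xE3\xCA\xC7\xC8";
my $utf16_bytes = "\x43\x06\x2A\x06\x27\x06\x28\x06";

my $from_iso   = to_perl_string($iso_bytes,   'ISO-8859-6');
my $from_utf16 = to_perl_string($utf16_bytes, 'UTF-16LE');

# Both byte sequences decode to the same character string.
print "same text\n" if $from_iso eq $from_utf16;

# If raw UTF-8 bytes are needed instead (for example, to write to a file):
my $utf8_bytes = encode('UTF-8', $from_iso);
printf "%d characters, %d UTF-8 bytes\n",
    length($from_iso), length($utf8_bytes);
```

The decoded string (not the re-encoded bytes) is what would be handed to $worksheet->write().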

--
John.

Re: writing Arabic text in Excel
created: 2006-04-04 11:42:40
First of all, you should instruct Win32::OLE to use Unicode with the following two lines:
use Win32::OLE qw(CP_UTF8);
Win32::OLE->Option(CP=>CP_UTF8);
Secondly, it is better not to use the obsolete Unicode::Map module. It dates from the time when Unicode support in Perl was weak; now you should use the more robust Unicode support built into perl5.8.x.

Thirdly, the boxes probably indicate characters that are missing from the font in use.

BR,
Vadim.

Re^2: writing Arabic text in Excel
created: 2006-04-05 19:00:52
Thanks, this helped! The Arabic text gets printed fine, but symbols shared with English such as (), [], ..., etc. show up in the wrong places. That is, these symbols are laid out left to right (as in English) rather than right to left (as in Arabic). Following is my code:
use Win32::OLE qw(CP_UTF8);
Win32::OLE->Option(CP=>CP_UTF8);
use Win32::OLE::Const 'Microsoft Word';
use Spreadsheet::WriteExcel;

my $workbook = Spreadsheet::WriteExcel->new("word.xls");
my $worksheet = $workbook->addworksheet();
my $word = Win32::OLE->new('Word.Application', 'Quit');
my $doc = $word->Documents->Open("C:\\file.doc");

my $string = $doc->Words->Item(1)->Text;
$worksheet->write(0, 0, $string);
Re^3: writing Arabic text in Excel
created: 2006-04-06 17:57:18
I believe your problem lies in mixed right-to-left and left-to-right text...
I can't advise much here, but I believe Word is quite good at this, so it probably deserves some respect for that :):):)
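
One thing that might be worth trying, offered as an untested guess rather than a known fix: punctuation has no direction of its own in Unicode, so it takes its direction from its neighbours. Surrounding it with U+200F RIGHT-TO-LEFT MARK (an invisible character with strong right-to-left direction) gives it right-to-left neighbours on both sides. The helper name mark_rtl_punctuation and the choice of punctuation characters are assumptions for illustration, and whether Excel then displays the cell as intended has not been checked here.

```perl
use strict;
use warnings;

# U+200F RIGHT-TO-LEFT MARK: invisible, with strong right-to-left direction.
my $RLM = "\x{200F}";

# Hypothetical helper: surround each run of common punctuation with RLM so
# that it sits between two right-to-left characters.
# Note: this also affects punctuation inside any embedded English text.
sub mark_rtl_punctuation {
    my ($text) = @_;
    $text =~ s/([()\[\]{}.,:;!?]+)/$RLM$1$RLM/g;
    return $text;
}

# Four Arabic letters wrapped in parentheses.
my $marked = mark_rtl_punctuation("(\x{0643}\x{062A}\x{0627}\x{0628})");
printf "%d characters after marking\n", length $marked;
```

The marked string would be passed to $worksheet->write() in place of the original one.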
Re: writing Arabic text in Excel
created: 2006-04-10 16:20:13
I'm extracting 'highlighted' Arabic text from Word and outputting it to an Excel file. To do this I'm iterating over each word and checking whether it is highlighted. However, this code takes far too long to finish. Is there a better way to do this? Following is code similar to what I'm using:
use Win32::OLE qw(CP_UTF8);
Win32::OLE->Option(CP=>CP_UTF8);
use Win32::OLE::Const 'Microsoft Word';
use Spreadsheet::WriteExcel;

my $word = Win32::OLE->new('Word.Application', 'Quit');
my $doc = $word->Documents->Open($file);
my $workbook = Spreadsheet::WriteExcel->new($out_file); 
my $worksheet = $workbook->addworksheet();
my $row = 0;
my $col = 0;

for(my $i = 1; $i <= $doc->Words->Count; $i++)
{
   if($doc->Words->Item($i)->HighlightColorIndex > 0) 
   {
      $worksheet->write($row++, $col, 
           $doc->Words->Item($i)->Text);     
   }
}
 
Thank you for any help in advance!
Re^2: writing Arabic text in Excel
created: 2006-04-11 07:39:27

I don't know if it'd be significantly quicker, but you could try using Word's Find object to search for the highlighted text, then loop through whatever it returns.
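
A rough sketch of that idea, not tested against a real document: the helper name highlighted_runs is made up, and $doc is assumed to be the Word document object opened through Win32::OLE as in the question. Note that each hit is a highlighted run, which may span several words, rather than a single word.

```perl
use strict;
use warnings;

# Hypothetical helper. $doc is assumed to be a Word Document object
# obtained through Win32::OLE.
sub highlighted_runs {
    my ($doc) = @_;
    my $range = $doc->Content;      # Range covering the main document text
    my $find  = $range->Find;

    $find->ClearFormatting;
    $find->{Text}      = '';        # match on formatting only
    $find->{Highlight} = 1;         # only highlighted text
    $find->{Forward}   = 1;
    $find->{Wrap}      = 0;         # 0 = wdFindStop: do not wrap around

    my @runs;
    my $last_start = -1;
    while ($find->Execute) {
        # After a successful Execute the range is redefined to the match.
        my $start = $range->{Start};
        last if $start <= $last_start;   # stop if the search no longer advances
        $last_start = $start;
        push @runs, $range->{Text};
    }
    return @runs;
}
```

With the objects from the question, the results could be written out with something like: $worksheet->write($row++, $col, $_) for highlighted_runs($doc); This replaces several OLE calls per word with one call per highlighted run, which is where any saving would come from.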

perlmonks.org content © perlmonks.org and ff, jmcnamara, john_oshea, pg09, vkon