For example: IBM, CLF, ESR, MS, ST.
I'm not a linguist, so I have no idea how to search for this. Any starting point would be helpful.
Good luck.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
--
[ e d @ h a l l e y . c c ]
The dictionary definition of "abbreviate" is simply "to make shorter". Going by that definition, I would say that an acronym is a specific type of abbreviation.
The dictionary definition of "acronym" could be interpreted to mean what you say, but I don't think it has to be. That definition says nothing about being able to pronounce the series of letters, just that it is a word formed from the first letters (or parts) of a series of words, which would include "IBM".
I typically think of an abbreviation as a shortening of a single word, such as "Inc."
----
send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.
Something like Lingua::EN::Sentence might be a good start (but it only has a tiny list of acronyms/abbreviations).
Joost.
Someone once told me that there are certain rules of thumb for determining if something is pronounceable.
For example: if it has no vowels, it's an acronym/abbreviation. Certain letter combinations are never seen in a real word... I wish there were a way to find this rule base.
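The "no vowels" rule of thumb could be sketched roughly like this. The helper name `lacks_vowels` is made up for illustration, and counting 'y' as a vowel is my own assumption rather than part of any known rule base, so treat this as a weak signal at best and not a real pronounceability test:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the token contains no vowel at all.
# ASSUMPTION: 'y' is counted as a vowel so that words like "sky"
# are not flagged; adjust the character class to taste.
sub lacks_vowels {
    my ($token) = @_;
    return $token !~ /[aeiouy]/i ? 1 : 0;
}

# Try the rule on the tokens from the original question.
for my $token (qw(IBM CLF ESR MS ST)) {
    printf "%-4s %s\n", $token,
        lacks_vowels($token) ? 'no vowels, likely an abbreviation' : 'has a vowel, undecided';
}
```

Note that IBM and ESR both contain a vowel, so this rule on its own would miss two of the five examples.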
[http://ftp.gnu.org/pub/gnu/vera/vera-1.9.tar.gz|vera] is a text database of acronyms available at [http://www.gnu.org|gnu.org]. It would be pretty easy to take the initial character of your candidate acronym, lowercase it and search the vera.? file for a match.
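my $meaning;
# Capture a run of capitals as the candidate ([A-Z]? matched at most one letter).
if ( /\b([A-Z]+)\b/ ) {
    my $acronym = $1;    # copy now: a later successful match resets $1
    # vera is split into one file per initial character
    open my $fh, '<', '/path/to/vera/vera.' . lc( substr $acronym, 0, 1 )
        or die $!;
    while (<$fh>) {
        # the expansion is taken from the line after the matching entry
        $meaning = <$fh> and last if /\b\Q$acronym\E\b/;
    }
}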
If you want to do a lot of that, it would probably pay to set up a simple database of the vera data. Perl can do that for you, too.
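As a rough sketch of what such a "simple database" might look like, the following loads vera-style text into a hash keyed by acronym. It is untested against the real vera files, and the entry layout (an `@item ACRONYM` line followed by one line with the expansion) is an assumption on my part; `build_index` is a made-up name. The sample data is in-memory so the snippet runs without vera installed:

```perl
use strict;
use warnings;

# Build an in-memory index from a filehandle of vera-style text.
# ASSUMPTION: each entry is an "@item ACRONYM" line followed by one
# line holding the expansion; check this against the real vera files.
sub build_index {
    my ($fh) = @_;
    my %index;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^\@item\s+(\S+)/;
        my $acronym = $1;
        my $meaning = <$fh>;
        last unless defined $meaning;    # file ended right after an entry line
        chomp $meaning;
        # one acronym may have several meanings, so keep a list
        push @{ $index{$acronym} }, $meaning;
    }
    return \%index;
}

# Usage with an in-memory sample instead of a real vera file.
my $sample = "\@item IBM\nInternational Business Machines\n"
           . "\@item MS\nMicrosoft\n";
open my $fh, '<', \$sample or die $!;
my $index = build_index($fh);
printf "%s: %s\n", $_, join( '; ', @{ $index->{$_} } ) for sort keys %$index;
```

Once the data is in a hash, each lookup is a single `$index->{$acronym}` instead of a scan through a file. Persisting it (for example by tying a hash to a DBM file) would need the lists flattened to strings first.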
After Compline,
Zaxo
The problem with any/all solutions to this problem is that of acceptable usage. For example, there are technical documents that refer to SCUBA but travel brochures that refer to scuba. Similarly, scientists work with LASER but companies sell laser devices.
You can't create rules about pronounceability because both 'laser' and 'scuba' are pronounceable. 'Sky' contains no vowels and so might or might not be an acronym. Qantas contains no 'u' after the Q, nor does Iraq. Qantas is the "Queensland and Northern Territory Aerial Services"; however, no one uses that outside of a trivia game these days. It's considered a word. Iraq, of course, is from another language and doesn't follow English rules.
Databases like vera can help you with a list of known acronyms, but in real estate advertising 'LUG' is a lock-up garage, whereas in a mechanical journal a 'lug' is a type of nut.
Given all this, when parsing user text I normally require that there be one or more lower-case letters in the user's input. That way I know they didn't just type it with the Caps-Lock button on. If it's all in capitals I'll ask them to change or confirm what they've entered.
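A minimal sketch of that all-capitals check might look like the following. The name `needs_confirmation` is hypothetical, and the character classes assume plain ASCII input, which real user text may not satisfy:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the input has letters but none of them
# are lower-case, which suggests Caps-Lock rather than real acronyms.
# ASSUMPTION: ASCII letters only.
sub needs_confirmation {
    my ($text) = @_;
    return 0 unless $text =~ /[A-Za-z]/;    # no letters, nothing to judge
    return $text =~ /[a-z]/ ? 0 : 1;
}

for my $input ( 'IBM makes laptops', 'IBM MAKES LAPTOPS' ) {
    printf "%-20s %s\n", $input,
        needs_confirmation($input) ? 'ask user to confirm' : 'accept';
}
```

The check says nothing about which words are acronyms; it only decides whether capitalisation in the input can be trusted as a signal at all.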
perlmonks.org content © perlmonks.org and andyf, Anonymous Monk, BigLug, blyman, dragonchild, halley, hardburn, Jasper, Joost, Zaxo
prlmnks.org © 2006 edmund von der burg (eccles & toad)