What’s In a Name? Transliteration and Variation




As the international community worked to freeze the assets of Muammar Gaddafi, it ran into a problem: how do you spell his name? Is it Gadhafi, Gaddafi, or Qaddafi? It then became clear that there are over 100,000 different ways to spell the name.

The problem is transliteration. The Romanized form of an Arabic name varies depending on who transliterated it and in which country. There is no standard spelling. Check out this image from the Wall Street Journal.

How do you search for and freeze the assets of a person whose name you don’t really know how to spell?

[Image from the Wall Street Journal]

It is impractical to search on every variation of the name by hand, because you will almost certainly misspell one or miss some variation. You need technology to make this conversion for you.
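To get a feel for why listing every variation by hand is impractical, here is a minimal Python sketch. The segment alternatives in `SEGMENTS` are hypothetical, hand-picked for illustration (they are not a linguistic rule set), and `surname_variants` is a made-up helper name. Even this small table for the surname alone multiplies out to hundreds of spellings.

```python
from itertools import product

# Hypothetical, hand-picked alternatives for each segment of the surname.
# These lists are illustrative assumptions, not a transliteration standard.
SEGMENTS = [
    ["", "Al-", "El-"],        # optional article prefix
    ["G", "Gh", "K", "Q"],     # initial consonant
    ["a", "e"],                # first vowel
    ["d", "dd", "dh", "ddh"],  # middle consonant
    ["a"],                     # second vowel
    ["f", "ff"],               # final consonant
    ["i", "y"],                # ending
]


def surname_variants(segments):
    """Return every spelling formed by picking one option per segment."""
    return ["".join(parts) for parts in product(*segments)]


variants = surname_variants(SEGMENTS)
print(len(variants))          # 3 * 4 * 2 * 4 * 1 * 2 * 2 = 384
print("Qaddafi" in variants)  # True
```

Add first-name variants and multi-part forms and the count multiplies again, and any spelling that falls outside the table is still missed.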

Ideally, you should be able to submit a name, such as Muammar Gaddafi, and have the search system automatically recognize its cultural derivation and all the other possible variations. It would find Gheddafi, Qadhafi, Al-Gaddafi, and El-Qaddafi, as well as the variations in the first name and multi-part variations.

This isn’t really a data quality problem, as these are all perfectly valid ways to spell the name. Algorithms that simply match on the text won’t do either, because an exact comparison treats Gaddafi and Qadhafi as two unrelated names.
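A small Python sketch shows the gap. An exact text comparison finds nothing for the query "Gadhafi", while a crude, hand-tuned key does group the spellings. The `crude_key` function and its rules are hypothetical and were written for this one name only. It is not how InfoSphere Global Name Recognition works, and it would need different rules for every other name and culture.

```python
import re


def crude_key(name):
    """Collapse a romanized name to a rough consonant skeleton.

    Hypothetical, hand-tuned rules for this one surname only.
    """
    s = name.lower()
    s = re.sub(r"^(al|el)[- ]", "", s)     # drop a leading article
    s = re.sub(r"[^a-z]", "", s)           # drop hyphens, spaces, apostrophes
    s = re.sub(r"^(gh|kh|g|k|q)", "k", s)  # unify the initial consonant
    s = s.replace("h", "")                 # drop h, as in "dh"
    s = re.sub(r"[aeiouy]", "", s)         # drop vowels (and y)
    s = re.sub(r"(.)\1+", r"\1", s)        # collapse doubled letters
    return s


names = ["Gaddafi", "Gheddafi", "Qadhafi", "Al-Gaddafi", "El-Qaddafi"]
query = "Gadhafi"

# Exact text matching: the query spelling is not in the list at all.
exact = [n for n in names if n == query]

# Key-based matching: every spelling reduces to the same skeleton.
by_key = [n for n in names if crude_key(n) == crude_key(query)]

print(exact)   # []
print(by_key)  # all five names
```

A key this crude also collides with any unrelated string that happens to share the same skeleton, and its rules encode nothing about gender, culture, or how a multi-part name should be parsed.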

Technology like IBM’s InfoSphere Global Name Recognition is the answer. InfoSphere Global Name Recognition (GNR) addresses the name search problem by applying a unique linguistic-based approach that determines things like the gender, cultural heritage, and proper parsing of a name. This approach allows you to find the name variations that result from transliteration.

Freezing the assets of Gaddafi is a great example of the multi-cultural name problem. Do you have any interesting examples?



Practical International Data Management Online.  A free resource from GRC Data Intelligence. For comments, questions or feedback: pidm@grcdi.nl