Data Fields – ASCII Equivalents

The Registry generates ASCII equivalents for many fields with values that use diacritical characters in Latin scripts (such as u for ü and l for ł). The mapping is based on Unicode NFKD decomposition plus the Latin supplement (Latin-ASCII.xml) from the Unicode Common Locale Data Repository. This applies to the following fields:


NOTE: The Registry automatically whitespace-normalizes[1] these fields. This has no effect on searching or de-duplication, but makes display values more consistent.

NOTE: ASCII equivalents let the Registry return consistent results whether or not a user applies accent marks. They also have a key implication for data validation: since the Registry ignores diacritical marks, accented and unaccented characters are equivalent for Registry validation, search, and de-duplication. Given the choice, always use the accented version when registering or modifying a record, since that tends to be the more correct representation of local language use.
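
The mapping described above can be approximated in a few lines of Python. This is only a sketch, not the Registry's implementation: the function name `ascii_equivalent` and the table `LATIN_ASCII_SUPPLEMENT` are hypothetical, and the table is a small hand-picked subset of the CLDR Latin-ASCII rules, covering characters that NFKD alone does not decompose.

```python
import unicodedata

# Hypothetical, hand-picked subset of the CLDR Latin-ASCII rules.
# These characters have no NFKD decomposition, so they need explicit entries.
LATIN_ASCII_SUPPLEMENT = {
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
    "æ": "ae", "Æ": "AE",
    "ß": "ss",
}


def ascii_equivalent(text: str) -> str:
    """Return an approximate ASCII equivalent of Latin-script text."""
    # Step 1: replace characters covered by the supplement table.
    substituted = "".join(LATIN_ASCII_SUPPLEMENT.get(ch, ch) for ch in text)
    # Step 2: NFKD splits accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", substituted)
    # Step 3: drop the combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Accented and unaccented spellings reduce to the same comparison key.
print(ascii_equivalent("Müller") == ascii_equivalent("Muller"))
print(ascii_equivalent("Łódź"))
```

Comparing the folded values rather than the original strings is what makes accented and unaccented input equivalent for search and de-duplication, while the stored value can keep its accents for display.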

[1] Tabs, carriage returns, non-breaking spaces, etc. are replaced with spaces, multiple spaces in a row are replaced with a single space, and leading and trailing spaces are removed.
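
The whitespace normalization in footnote [1] can be sketched as follows. The function name `normalize_whitespace` is hypothetical, and the code assumes that Python's Unicode-aware `\s` class is an acceptable stand-in for the Registry's definition of whitespace (it matches tabs, carriage returns, newlines, and non-breaking spaces).

```python
import re


def normalize_whitespace(value: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    # \s matches Unicode whitespace for str patterns, including U+00A0.
    collapsed = re.sub(r"\s+", " ", value)
    # Remove the leading and trailing spaces left after collapsing.
    return collapsed.strip()


print(normalize_whitespace("  The\tGreat\u00a0 Escape\r\n"))
```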

Updated on April 11, 2021
