Data Fields – ASCII Equivalents

The Registry generates ASCII equivalents for many fields whose values use diacritical characters in Latin scripts (for example, u for ü and l for ł). The mapping is based on Unicode NFKD decomposition plus the Latin supplement (Latin-ASCII.xml) from the Unicode Common Locale Data Repository (CLDR). This applies to the following fields:

BaseObjectData/ResourceName
BaseObjectData/AlternateResourceName
BaseObjectData/AssociatedOrg/DisplayName
BaseObjectData/AssociatedOrg/AlternateName
BaseObjectData/Credits/Director/DisplayName
BaseObjectData/Credits/Actor/DisplayName
BaseObjectData/Description
ExtraObjectMetadata/CompilationInfo/Entry/DisplayName
ExtraObjectMetadata/CompositeInfo/Element/Description
ExtraObjectMetadata/ManifestationInfo/…/Description
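
The two-step mapping described above (NFKD decomposition, then a supplemental table for letters that do not decompose) can be sketched in Python. This is an illustration only, not the Registry's implementation: the function name ascii_equivalent and the SUPPLEMENT table are hypothetical, and the table is a small hand-picked subset standing in for the full Latin-ASCII.xml rules.

```python
import unicodedata

# Hypothetical, hand-picked subset of Latin-to-ASCII mappings for letters
# that NFKD leaves intact. The real Latin-ASCII.xml covers far more cases.
SUPPLEMENT = {
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
    "ß": "ss",
}

def ascii_equivalent(text: str) -> str:
    """Return an approximate ASCII equivalent of Latin-script text."""
    # Step 1: NFKD splits a character such as "ü" into "u" + U+0308.
    decomposed = unicodedata.normalize("NFKD", text)
    pieces = []
    for ch in decomposed:
        # Drop the combining marks produced by the decomposition.
        if unicodedata.combining(ch):
            continue
        # Step 2: map letters with no decomposition (e.g. "ł") via the table.
        pieces.append(SUPPLEMENT.get(ch, ch))
    return "".join(pieces)

print(ascii_equivalent("Müller"))
print(ascii_equivalent("Łódź"))
```

Characters outside the table and without a decomposition pass through unchanged in this sketch, so non-Latin text is left as entered.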

NOTE: The Registry automatically whitespace-normalizes[1] these fields. This has no effect on searching or de-duplication, but it makes display values more consistent.

NOTE: ASCII equivalents let the Registry return consistent results whether or not a user supplies accent marks. They also have a key data validation implication: because the Registry ignores diacritical marks, accented and unaccented characters are equivalent for Registry validation, search, and de-duplication. Given the choice, always use the accented version when registering or modifying a record, since it tends to represent local language use more accurately.


[1] Tabs, carriage returns, non-breaking spaces, etc. are replaced with spaces, multiple spaces in a row are replaced with a single space, and leading and trailing spaces are removed.
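
The whitespace rules in footnote [1] can be approximated with a single regular expression. This is a minimal sketch, not the Registry's code: the function name normalize_whitespace is hypothetical, and it assumes the "etc." in the footnote covers every character Python treats as Unicode whitespace.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs to one space and trim both ends."""
    # In a Python 3 str pattern, \s matches Unicode whitespace, which
    # includes tab, carriage return, line feed, and U+00A0 (non-breaking
    # space). Treating all of these alike is an assumption of this sketch.
    collapsed = re.sub(r"\s+", " ", text)
    # Remove the leading and trailing space left over from the collapse.
    return collapsed.strip()

print(normalize_whitespace("  The\tGood,\r\nthe\u00a0Bad  "))
```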

Updated on April 11, 2021
