
Capitalization, Diacriticals, and Ligatures

When EIDR evaluates a text field (title, name, etc.), it ignores case and punctuation, so capital and lowercase letters are equivalent.[1] EIDR also treats accented and unaccented characters as identical: à, Ă, and ä are all the same as “a,” and ß is the same as “ss.” The primary implication is that a title or name with accented characters or orthographic ligatures[2] is treated as identical to the same title or name without them, and the Registry will reject it as a duplicate.

  • Do not include multiple titles, names, etc. that differ only in capitalization, punctuation, diacriticals, or orthographic ligatures.
  • When given the choice, include the version of a name or title that has the accents and ligatures, and rely on the Registry to treat it the same as the version without them in search and de-duplication operations.
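
The matching behavior described above can be approximated with a short Python sketch. This is not the Registry's actual algorithm, which this article does not specify. The function name `comparison_key` and the `LIGATURES` table are hypothetical, and the choice to delete punctuation outright (rather than replace it with a space) is an assumption.

```python
import unicodedata

# Hypothetical expansion table for orthographic ligatures that Unicode
# normalization does not decompose. The Registry's real list is not
# given in this article.
LIGATURES = {"æ": "ae", "œ": "oe"}


def comparison_key(text: str) -> str:
    """Return a simplified key for comparing two titles or names."""
    # casefold() removes case distinctions and expands "ß" to "ss".
    folded = text.casefold()
    for ligature, expansion in LIGATURES.items():
        folded = folded.replace(ligature, expansion)
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Keep letters, digits, and whitespace; this drops the combining marks
    # and (by assumption) all punctuation.
    kept = "".join(ch for ch in decomposed if ch.isalnum() or ch.isspace())
    # Collapse runs of whitespace so spacing differences do not matter.
    return " ".join(kept.split())


# Titles that differ only in case, accents, or ligatures share a key.
print(comparison_key("Amélie") == comparison_key("AMELIE"))
print(comparison_key("Straße") == comparison_key("STRASSE"))
print(comparison_key("Œuvre") == comparison_key("oeuvre"))
```

Checking a candidate title against the existing titles on a record with a key like this before submission is one way to catch variants that the Registry would treat as duplicates.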

[1] Even though such fields are case-insensitive, capitalization best practices still apply. Controlled vocabulary lists, such as Associated Org Roles, are defined by an XML schema. XML is case-sensitive, so capitalization does matter for those values.

[2] Do not use typographic ligatures – those that are purely stylistic rather than linguistic – such as “ﬃ,” a single character that replaces the three-letter sequence “ffi.”

Updated on April 9, 2021
