Content ID Data Model

Background

EIDR Content IDs identify audiovisual works and their derived versions, representations, and collections independent of distribution channel, ownership, or subsequent re-use for another purpose. Regardless of the EIDR record type or its position in the registration tree, all EIDR IDs look the same and are assigned randomly from the pool of available identifiers.

A Content ID consists of a DOI prefix (“10.5240”), which identifies the DOI as an EIDR Content ID; a separating slash (“/”); a suffix of 20 hexadecimal digits (presented in five groups of four, separated by hyphens: e.g., CA2D-7927-3635-DD8D-7D75); and a check character, appended after a final hyphen, which can be used to verify that the ID has been transmitted correctly.

Figure 1: EIDR Content ID Structure

Together, these parts form a Content ID: e.g., 10.5240/CA2D-7927-3635-DD8D-7D75-U (the ID for the 1964 classic movie, Santa Claus Conquers the Martians). Content IDs differ from EIDR Video Service IDs (e.g., 10.5239/CBAD-899D for PBS Vermont), which identify the service or channel by which audiovisual works are delivered to consumers.
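The structure described above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the trailing check value is an ISO 7064 Mod 37,36 check character computed over the 20 hexadecimal digits of the suffix. This article does not specify the algorithm, but that assumption does reproduce the example ID shown above.

```python
import re

# DOI prefix, slash, five hyphen-terminated groups of four uppercase
# hexadecimal digits, then a single check character.
CONTENT_ID_RE = re.compile(r"^10\.5240/((?:[0-9A-F]{4}-){5})([0-9A-Z])$")

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def check_character(suffix_digits: str) -> str:
    """Compute an ISO 7064 Mod 37,36 check character (assumed algorithm)."""
    modulus = 36
    p = modulus
    for ch in suffix_digits:
        s = (p % (modulus + 1)) + ALPHABET.index(ch)
        # A remainder of zero is replaced by the modulus itself.
        p = ((s % modulus) or modulus) * 2
    return ALPHABET[(modulus + 1 - p % (modulus + 1)) % modulus]


def is_valid_content_id(content_id: str) -> bool:
    """Check the overall shape of the ID and its trailing check character."""
    match = CONTENT_ID_RE.match(content_id)
    if match is None:
        return False
    digits = match.group(1).replace("-", "")
    return check_character(digits) == match.group(2)


print(is_valid_content_id("10.5240/CA2D-7927-3635-DD8D-7D75-U"))
```

Under these assumptions the example ID validates, while an ID with a mistyped digit or a different prefix (such as a Video Service ID) is rejected. The pattern accepts uppercase input only; a production validator might normalize case first.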

Accompanying each Content ID is a selection of descriptive metadata designed to aid search, discovery, and de-duplication. Values include Title, Alternate Title, Release Date, selected Participants (production companies, directors, and actors), etc. In addition, each Content ID record can include an unlimited number of Alternate Identifiers, including third-party house IDs, publicly resolvable IDs, etc.

EIDR provides a public Content ID search service at ui.eidr.org and a similarly configured Sandbox system for training and integration testing at sandboxui.eidr.org.

Overview

The EIDR Content ID Registry’s data model can seem intimidating at first, but it need not be. It is necessarily complex, since it must accommodate record types and practices across the global media & entertainment industry, but the elements that apply to any particular situation are limited and backed by published guidelines that normalize registration practice.

The EIDR Content ID Registry implements a simple registration tree structure with four basic record types:

  • Collection – a grouping record such as a Series, Season, or Compilation.
  • Abstraction – an abstract work in its most general form, including movies, episodes, and TV specials.
  • Edit – creative changes to a work, including both complete versions and clips.
  • Manifestation – technical representations and encodings, including language versions (“subs and dubs”).

Figure 2: A Typical Feature Film Hierarchy

There are also record relationships such as isPackagingOf and isSupplementalTo to link related records.

Episodic records are common enough that they have their own Collection types: Series and Season.[1] The Abstraction records in an episodic registration tree are called Episodes.

Figure 3: A Typical Episodic Hierarchy
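As a rough illustration of the registration tree, the sketch below models the four basic record types and an episodic hierarchy in Python. The class, field, and label names are hypothetical and chosen for this example; they are not taken from the EIDR schemas.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RecordType(Enum):
    COLLECTION = "Collection"        # e.g., Series, Season, Compilation
    ABSTRACTION = "Abstraction"      # e.g., movie, Episode, TV special
    EDIT = "Edit"                    # creative version or clip
    MANIFESTATION = "Manifestation"  # technical representation or encoding


@dataclass(eq=False)
class Record:
    label: str
    record_type: RecordType
    # repr=False avoids infinite recursion between parent and children.
    parent: Optional["Record"] = field(default=None, repr=False)
    children: List["Record"] = field(default_factory=list)

    def add_child(self, label: str, record_type: RecordType) -> "Record":
        child = Record(label, record_type, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Return the labels from the root of the tree down to this record."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return list(reversed(labels))


# Placeholder labels for an episodic registration tree.
series = Record("Series", RecordType.COLLECTION)
season = series.add_child("Season 1", RecordType.COLLECTION)
episode = season.add_child("Episode 1", RecordType.ABSTRACTION)
edit = episode.add_child("Broadcast cut", RecordType.EDIT)
encoding = edit.add_child("HD encoding", RecordType.MANIFESTATION)

print(" > ".join(encoding.path()))
```

A feature film would follow the same pattern without the two Collection levels, with the Abstraction record at the root.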

All EIDR Content records[2] share a common set of Base Object Data. Child records inherit many Base Object Data values from their parent and generally supply only self-defined values for those data elements that are unique to the child. For example, an Edit may inherit its parent’s title (when it does not have a unique title of its own) while providing its own duration (since each cut may have a slightly different runtime). Certain record types extend the Base Object Data with Extra Object Metadata. This is where you will find data elements unique to an Episode or Edit along with relationship information that links one record to another, such as the included content ID list for a Composite record or the target ID of an isPromotionFor relationship.
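The inheritance behavior can be pictured as a lookup that falls back to the parent when a child has no value of its own. This is a minimal sketch with hypothetical names and placeholder values. It treats every field as inheritable, which is a simplification, since only some Base Object Data values are inherited.

```python
from typing import Any, Dict, Optional


class Record:
    """Hypothetical record that holds only its self-defined values."""

    def __init__(self, own_values: Dict[str, Any],
                 parent: Optional["Record"] = None) -> None:
        self.own_values = own_values
        self.parent = parent

    def resolve(self, name: str) -> Any:
        """Return the nearest value for `name`, walking up the tree."""
        node: Optional[Record] = self
        while node is not None:
            if name in node.own_values:
                return node.own_values[name]
            node = node.parent
        return None


# Placeholder data: the Edit has its own duration but no title of its own.
abstraction = Record({"title": "Example Movie", "duration_minutes": 90})
edit = Record({"duration_minutes": 85}, parent=abstraction)

print(edit.resolve("title"))             # inherited from the parent
print(edit.resolve("duration_minutes"))  # self-defined on the Edit
```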

All EIDR records are available from the Registry as XML data according to the structure defined by the EIDR XML Schemas. The EIDR Registry imposes further data validation business rules that cannot be expressed in the schemas.[3] The Registry validation rules help tailor the generic data representations to specific situations and ensure a base level of data quality. Most EIDR users are shielded from the raw Registry XML by the systems they use to interact with the Registry, such as their own in-house master title management systems or the EIDR Web UI.[4] Even users who never see the underlying XML should be familiar with the EIDR data structures and allowed values that apply to their particular area of interest.
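To show what a business rule layered on top of schema validation might look like, the sketch below encodes the conditional rule cited in footnote [3] (a Short must run 40 minutes or less). The dictionary keys and the function name are assumptions made for this example and do not reflect the actual Registry XML element names or its validation code.

```python
from typing import Dict, List

MAX_SHORT_MINUTES = 40  # limit taken from the Short example in footnote [3]


def business_rule_errors(record: Dict[str, object]) -> List[str]:
    """Return rule violations for one record (hypothetical field names)."""
    errors: List[str] = []
    minutes = record.get("duration_minutes")
    # Conditional requirement: the limit applies only to Short records.
    if record.get("referent_type") == "Short" and isinstance(minutes, (int, float)):
        if minutes > MAX_SHORT_MINUTES:
            errors.append(
                f"Short duration {minutes} min exceeds {MAX_SHORT_MINUTES} min"
            )
    return errors


print(business_rule_errors({"referent_type": "Short", "duration_minutes": 55}))
print(business_rule_errors({"referent_type": "Short", "duration_minutes": 12}))
```

The first call reports one violation and the second reports none. A record of any other type is not subject to this particular check.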

EIDR IDs themselves are always unique: the same ID is never issued twice. It is equally important that the item each ID references is unique, so every time a record is added to the Registry or an existing record is updated, it passes through de-duplication review. This helps ensure that no item is registered under more than one EIDR ID.[5]

The two best pieces of advice when working with the EIDR Registry to obtain Content IDs are:

  • “Start at the top and work down.”
  • “Go as far as you need to go, then stop.”

Putting these together, when searching for existing registrations or registering new records, always start at the top with the Abstraction record. For episodic works, start with the Series Collection, then the Season Collection (if applicable), and then the Episodes (all Abstractions). Only after that should you work through Edits and Manifestations. Go as deep into the EIDR hierarchy as you need to go for your current business needs, but do not add unnecessary complication by registering more granular record types than your process actually requires. They can always be added later.

For example, if your current application requires Abstraction records but you do not have a current need for Edits, focus on the Abstraction records for now and leave the Edits for another day. Similarly, if you need Edit IDs to support a current avails workflow, but do not yet need Manifestation IDs for the deliverables in the fulfillment chain, then register the Abstraction and Edit records now and leave the Manifestations for later. Of course, if you need Manifestations or Clips, register them, but add them to your process after you have first mastered Abstraction records and Edits.
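The two pieces of advice can be combined into a simple ordering rule. The helper below is a hypothetical planning aid, not part of any EIDR tool: given the deepest record type a workflow requires, it lists the record types to search for or register, from the top down, and stops there.

```python
from typing import List

# Work-level record types, ordered from the top of the tree downward.
WORK_LEVELS = ["Abstraction", "Edit", "Manifestation"]


def registration_plan(deepest_needed: str, episodic: bool = False,
                      has_seasons: bool = True) -> List[str]:
    """List record types in registration order, stopping at the deepest needed."""
    if deepest_needed not in WORK_LEVELS:
        raise ValueError(f"Unknown record type: {deepest_needed}")
    plan: List[str] = []
    if episodic:
        # Episodic works start with their Collection records.
        plan.append("Series (Collection)")
        if has_seasons:
            plan.append("Season (Collection)")
    stop = WORK_LEVELS.index(deepest_needed)
    plan.extend(WORK_LEVELS[: stop + 1])
    return plan


# An avails workflow that needs Edit IDs but not Manifestation IDs.
print(registration_plan("Edit"))
# An episodic workflow that needs only Episode (Abstraction) records.
print(registration_plan("Abstraction", episodic=True))
```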

Footnotes

[1] EIDR uses North American nomenclature rather than European, where a Programme or Brand is equivalent to an EIDR Series and a Series is equivalent to an EIDR Season.

[2] EIDR also maintains separate Registries for Party, User, and Video Service records. This document focuses on the EIDR Content Record Registry.

[3] There are certain data validation rules that an XML schema cannot represent directly, such as conditional requirements: e.g., if a record is identified as a Short, then its duration must be ≤ 40 minutes.

[4] At https://ui.eidr.org/.

[5] A given abstract work has only one EIDR Abstraction ID and an EIDR Abstraction ID refers to only one abstract work – often referred to as a “one-to-one” relationship. The strict enforcement of this rule is relaxed in the lower levels of the EIDR hierarchy where a “one-to-many” relationship may be acceptable: e.g., since it is so difficult to uniquely and unambiguously describe every encoding of every edit of a work, it is often sufficient that an EIDR Manifestation ID refer to only one encoding while allowing for the possibility that a given encoding may have been assigned more than one EIDR Manifestation ID at different times and for different purposes.

Updated on March 9, 2021
