Regular Expressions
These fields are used for the Identifier Schema record type only.
FAIRsharing records for identifier schemas include two distinct regular expression (regex) fields to support consistent validation and matching across systems.
The regular expression field captures the 'canonical' identifier string, that is, the identifier as defined by its schema specification, independent of any transport mechanism, resolver URL, or display formatting. This regex should match only the intrinsic identifier syntax governed by the issuing authority. For example, for DOIs this would match the DOI name, which begins with "10." (e.g. 10.1234/abcd), not the resolver form https://doi.org/10.1234/abcd.
The secondary regular expressions field allows additional patterns that match commonly encountered, non-canonical representations of the same identifier. These may include resolver URLs (e.g. https://doi.org/10.1234/abcd), legacy resolver domains, URN forms, or other widely used serialisations. These patterns exist to support identifier recognition in real-world data, where identifiers frequently appear embedded in URLs or in prefixed forms. Secondary regexes should not redefine the identifier syntax itself; rather, they provide practical matching support while preserving the authoritative schema definition expressed in the primary (canonical) regex.
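The distinction between the two fields can be sketched in a few lines of Python. The patterns below are illustrative assumptions for DOIs only; they are not the values stored in any FAIRsharing record, and the names CANONICAL_DOI, SECONDARY_DOI and classify are hypothetical. The canonical pattern matches only the DOI name, while each secondary pattern matches a wrapped form and captures the DOI name inside it.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns only (assumptions, not FAIRsharing's recorded values).
# Canonical: the DOI name itself, e.g. 10.1234/abcd
CANONICAL_DOI = re.compile(r"10\.\d{4,9}/\S+")

# Secondary: common non-canonical forms; group 1 captures the DOI name.
SECONDARY_DOI = [
    re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)"),  # resolver URLs
    re.compile(r"doi:(10\.\d{4,9}/\S+)", re.IGNORECASE),          # prefixed form
]


def classify(value: str) -> Optional[Tuple[str, str]]:
    """Return (form, canonical_identifier), or None if nothing matches."""
    # Try the canonical pattern first: it defines the identifier syntax.
    if CANONICAL_DOI.fullmatch(value):
        return ("canonical", value)
    # Fall back to the secondary patterns, which only wrap the same syntax.
    for pattern in SECONDARY_DOI:
        match = pattern.fullmatch(value)
        if match:
            return ("secondary", match.group(1))
    return None


print(classify("10.1234/abcd"))
print(classify("https://doi.org/10.1234/abcd"))
```

Checking the canonical pattern first keeps the schema definition authoritative: a secondary match never changes what counts as a valid identifier, it only recovers the canonical string from a wrapped form.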
Maintaining this separation also supports consistent evaluation of Globally Unique, Persistent, and Resolvable Identifier Schemas (GUPRI), by distinguishing intrinsic identifier structure from resolver infrastructure. See our community alignment and GUPRI pages for more information about other identifier schema properties.
The community can use our API to identify which identifier schema(s) a particular identifier string belongs to.
