The framework for the mmCIF dictionary is defined by the Dictionary Description Language (DDL). The role of the DDL is to define the data items which may be used to construct the definitions in the mmCIF dictionary, and also to define the relationships between these defining data items. The DDL is itself expressed as a dictionary, using its own definitional content. Browse the content of the current version of the DDL dictionary here.
The DDL contains no information about macromolecular structure; rather, it defines data items which can be used to describe other data. The DDL is actually quite generic. It defines data items that describe the general features of a data item like a textual description, a data type, a set of examples, a range of permissible values, or perhaps a discrete set of permitted values.
The DDL combines collections of related data items into categories. A category is essentially a table in which each repetition of the group of related items adds a row. Within a category, those data items which determine the uniqueness of their group are designated as key items in the category. No data item group in a category is allowed to have a set of duplicate values of its key items. Each data item is assigned membership in one or more categories. Parent-child relationships may be specified for items which belong to multiple categories. These relationships permit the expression of the very complicated data structures required to describe macromolecular structure.
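The key rule above can be illustrated with a short sketch. It treats a category as a list of rows and reports any key values that occur more than once. The function name, the row layout, and the author names are hypothetical and chosen only for illustration; the DDL itself does not prescribe any such code.

```python
from collections import Counter


def duplicate_keys(rows, key_items):
    """Return the key tuples that occur in more than one row of a category."""
    counts = Counter(tuple(row[k] for k in key_items) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows of a category keyed on two items.
rows = [
    {"citation_id": "primary", "name": "Smith, J."},
    {"citation_id": "primary", "name": "Jones, A."},
    {"citation_id": "primary", "name": "Smith, J."},  # repeats the first key
]

dups = duplicate_keys(rows, ["citation_id", "name"])
print(dups)
```

A category whose rows all carry distinct key values yields an empty list.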
The DDL also provides some other levels of data organization in addition to the category. Related categories may be collected together in category groups, and parent relationships may be specified for these groups. This higher level of association provides a means of organizing large complicated collections of categories into smaller, more relevant, and potentially interrelated groups. Within the level of a category, subcategories of data items may be defined among groups of related data items. The subcategory provides a mechanism to identify that, for example, the data items month, day, and year collectively define a date.
The highest levels of data organization provided by the DDL are the data block and the dictionary. The dictionary level collects a set of related definitions into a single unit, and provides for a detailed revision history to be maintained on the collection. The data block level ties the contents of a dictionary to the data_ section in which it is contained. The identifier for the data block, and hence the dictionary, is added implicitly to the key of each category.
The following sections provide schematic diagrams of each of the organizational features provided by the DDL. In these diagrams, boxes enclose the data items within each category. Key data items are preceded by dark dots. Data items common to multiple categories are identified by connecting lines, with the arrows pointing at the parent definition of the data item.
In this section, several examples are presented which illustrate how the elements of the DDL are combined into dictionary definitions.
save__citation.journal_abbrev
    _item_description.description
;              Abbreviated name of the journal cited as given in the
               Chemical Abstracts Service Source Index.
;
    _item.name                  '_citation.journal_abbrev'
    _item.category_id             citation
    _item.mandatory_code          no
    _item_aliases.alias_name    '_citation_journal_abbrev'
    _item_aliases.dictionary      cif_core.dic
    _item_aliases.version         2.0.1
    _item_type.code               line
    _item_examples.case          'J. Mol. Biol.'
    save_
The ITEM_DESCRIPTION category holds a text description of each data item. This is typically written in the form of a definition for the data item.
The ITEM category holds the item name, the category name, and a code indicating whether a value for this item is mandatory in each row of the category. Besides yes and no, the mandatory code may take the value implicit. The implicit value is used to indicate that a value is required for the item but that it can be derived from the context of the definition and need not be specified. This feature is most often used in the DDL to indicate that item name values can be derived from the name of the save frame in which they are defined.
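As a rough illustration of how an implicit item name could be recovered from its context, the sketch below strips the save_ prefix from a save frame header. The function name and the error handling are assumptions made for this example and are not part of the DDL or of any mmCIF software.

```python
def implicit_item_name(save_frame_header):
    """Derive an item name from a header such as 'save__cell.length_a'."""
    prefix = "save_"
    if not save_frame_header.startswith(prefix):
        raise ValueError("not a save frame header")
    name = save_frame_header[len(prefix):]
    # Item definitions are framed under a name that begins with an underscore;
    # category frames such as 'save_CELL' carry no item name.
    if not name.startswith("_"):
        raise ValueError("save frame does not define a data item")
    return name


item_name = implicit_item_name("save__citation.journal_abbrev")
print(item_name)
```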
Note that the value of the
_item.name in the above example is enclosed
in quotation marks. This is a requirement of the mmCIF syntax and avoids confusing data values
with item names.
The mmCIF dictionary contains a superset of the definitions that were originally defined
in the CIF core dictionary,
cif_core.dic. In order to maintain backward
compatibility with original definitions, the
ITEM_ALIASES category was introduced
to hold the item name, dictionary name and version in which the original definition of an
item was published.
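One way a reader of older files might use the alias information is sketched below: names from the original core dictionary are mapped onto their mmCIF equivalents. The ALIASES table is a hand-written stand-in containing only the two aliases that appear in the examples on this page, and to_mmcif_names is a hypothetical helper.

```python
# Alias name -> mmCIF item name, taken from the _item_aliases.alias_name
# values in the example definitions on this page.
ALIASES = {
    "_citation_journal_abbrev": "_citation.journal_abbrev",
    "_cell_length_a": "_cell.length_a",
}


def to_mmcif_names(record):
    """Rename aliased items; names without a known alias are left unchanged."""
    return {ALIASES.get(name, name): value for name, value in record.items()}


converted = to_mmcif_names({"_cell_length_a": "58.39", "_cell.length_b": "86.70"})
print(converted)
```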
The ITEM_TYPE category holds a reference to a data type defined in the ITEM_TYPE_LIST category. A reference to the data type is used here, rather than a detailed data type description, in order to avoid repeating the description for other data items. A single list of data types and associated regular expressions is stored in the ITEM_TYPE_LIST category, and this may be referenced by all of the definitions in the dictionary. In the mmCIF dictionary, the codes that are used to describe the data types are generally easy to interpret. In this case, line indicates that a single line of text will be accepted for this data item.
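The idea of a shared type list can be sketched as a table that maps each type code to a regular expression, against which values are checked. The patterns below are illustrative assumptions written for this example; they are not copied from the ITEM_TYPE_LIST category of the mmCIF dictionary.

```python
import re

# Hypothetical stand-in for ITEM_TYPE_LIST: type code -> regular expression.
TYPE_LIST = {
    "line": r"[^\n]*",                               # a single line of text
    "float": r"-?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?",  # a simple decimal number
    "code": r"[A-Za-z0-9_.-]+",                      # a short identifier
}


def conforms(type_code, value):
    """Return True if the whole value matches the pattern for type_code."""
    return re.fullmatch(TYPE_LIST[type_code], value) is not None


print(conforms("line", "J. Mol. Biol."))
print(conforms("line", "first line\nsecond line"))
```

Because every definition refers to a type only by its code, a pattern is written once and reused wherever that code appears.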
Textual examples of data items can be included in the ITEM_EXAMPLES category. In this case only a single example has been provided, but many examples can be provided by using a loop_ directive.
save__cell.length_a
    _item_description.description
;              Unit-cell length a corresponding to the structure reported.
;
    _item.name                  '_cell.length_a'
    _item.category_id             cell
    _item.mandatory_code          no
    _item_aliases.alias_name    '_cell_length_a'
    _item_aliases.dictionary      cif_core.dic
    _item_aliases.version         2.0.1
    loop_
    _item_dependent.dependent_name
                                '_cell.length_b'
                                '_cell.length_c'
    loop_
    _item_range.maximum
    _item_range.minimum            .    0.0
                                  0.0   0.0
    _item_related.related_name  '_cell.length_a_esd'
    _item_related.function_code   associated_esd
    _item_sub_category.id         cell_length
    _item_type.code               float
    _item_type_conditions.code    esd
    _item_units.code              angstroms
    save_

save__cell.length_a_esd
    _item_description.description
;              The estimated standard deviation of _cell.length_a.
;
    _item.name                  '_cell.length_a_esd'
    _item.category_id             cell
    _item.mandatory_code          no
    _item_default.value           0.0
    loop_
    _item_dependent.dependent_name
                                '_cell.length_b_esd'
                                '_cell.length_c_esd'
    _item_related.related_name  '_cell.length_a'
    _item_related.function_code   associated_value
    _item_sub_category.id         cell_length_esd
    _item_type.code               float
    _item_units.code              angstroms
    save_
Some data items are only meaningful when expressed as a complete set. The ITEM_DEPENDENT category is used to store this type of information. Those additional data items within the category which are required for the meaningful interpretation of the item are listed in this category. In the above example, the cell lengths in the b and c directions are defined as dependent items of the cell length in the a direction.
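A completeness check based on the dependent items could look like the following sketch. The function name and the dictionary-based row layout are assumptions for illustration; only the item names come from the example definition.

```python
def missing_dependents(row, item, dependents):
    """List the dependent items absent from a row in which `item` is present."""
    if item not in row:
        return []  # nothing to check when the item itself is not given
    return [name for name in dependents if name not in row]


# Dependent items as listed in the _cell.length_a definition.
dependents_of_a = ["_cell.length_b", "_cell.length_c"]

incomplete_row = {"_cell.length_a": 58.39, "_cell.length_b": 86.70}
missing = missing_dependents(incomplete_row, "_cell.length_a", dependents_of_a)
print(missing)
```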
The permissible range of values for a numerical data item is stored in the ITEM_RANGE category. Each boundary condition is defined as the non-inclusive range between a pair of minimum and maximum values. If multiple boundary conditions are specified using the loop_ directive, a value is acceptable if it satisfies any one of them. A discrete boundary value may be set by assigning the desired value to both the maximum and minimum value. In the above example, the first condition admits values greater than zero and the second admits zero itself, so the permissible cell length range is defined as greater than or equal to zero.
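The range rules can be sketched in a few lines. The worked example only comes out as "greater than or equal to zero" if a value passes when it satisfies at least one of the listed conditions, so the sketch assumes that reading. The function name and the use of None for the '.' placeholder are choices made for this example.

```python
def in_range(value, conditions):
    """Check a value against (minimum, maximum) pairs; None means no bound."""
    for minimum, maximum in conditions:
        if minimum is not None and minimum == maximum:
            # Equal bounds denote a single permitted boundary value.
            if value == minimum:
                return True
            continue
        above = minimum is None or value > minimum
        below = maximum is None or value < maximum
        if above and below:  # strictly inside the non-inclusive range
            return True
    return False


# The two conditions from the _cell.length_a definition, as (minimum, maximum).
CELL_LENGTH_RANGE = [(0.0, None), (0.0, 0.0)]

print(in_range(58.39, CELL_LENGTH_RANGE))
print(in_range(0.0, CELL_LENGTH_RANGE))
print(in_range(-1.0, CELL_LENGTH_RANGE))
```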
A number of special relationships may be defined between data items. For those relationships which occur frequently within the dictionary, the source or function of the relationship has been standardized. In the example above, this feature is used to identify that _cell.length_a_esd is the estimated standard deviation of _cell.length_a. The recognized relationships are fully described in the DDL definition of _item_related.function_code in category ITEM_RELATED. The current list includes the following kinds of relationships:
alternate: _item_related.related_name is an alternative expression in terms of its application and attributes to the item in this definition.
alternate_exclusive: _item_related.related_name is an alternative expression in terms of its application and attributes to the item in this definition. Only one of the alternative forms may be specified.
convention: _item_related.related_name differs from the defined item only in terms of a convention in its expression.
conversion_constant: _item_related.related_name differs from the defined item only by a known constant.
conversion_arbitrary: _item_related.related_name differs from the defined item only by an arbitrary constant.
associated_value: _item_related.related_name is meaningful when associated with the defined item.
associated_esd: _item_related.related_name is the estimated standard deviation of the defined item.
Sets of data items within a category may be collected into named subcategories. The ITEM_SUB_CATEGORY category is used to store the subcategory membership of a data item. In the above example, item _cell.length_a is added to the subcategory CELL_LENGTH. Although not shown, _cell.length_b and _cell.length_c are similarly added to this subcategory.
The ITEM_UNITS category holds the name of the system of units in which an item is expressed. The name assigned to _item_units.code refers to a single list of all of the unit types used in the dictionary. This list is stored in the category ITEM_UNITS_LIST. Conversion factors between different systems of units are provided in the data table stored in the ITEM_UNITS_CONVERSION category.
save_CELL
    _category.description
;              Data items in the CELL category record details about the
               crystallographic cell parameters.
;
    _category.id                  cell
    _category.mandatory_code      no
    _category_key.name          '_cell.entry_id'
    loop_
    _category_group.id           'inclusive_group'
                                 'cell_group'
    loop_
    _category_examples.detail
    _category_examples.case
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;
    Example 1 - based on PDB entry 5HVP and laboratory records for the
                structure corresponding to PDB entry 5HVP
;
;
    _cell.entry_id                         '5HVP'
    _cell.length_a                         58.39
    _cell.length_a_esd                      0.05
    _cell.length_b                         86.70
    _cell.length_b_esd                      0.12
    _cell.length_c                         46.27
    _cell.length_c_esd                      0.06
    _cell.angle_alpha                      90.00
    _cell.angle_beta                       90.00
    _cell.angle_gamma                      90.00
    _cell.volume                           234237
    _cell.details
    ; The cell parameters were refined every twenty frames during
      data integration. The cell lengths given are the mean of
      55 such refinements; the esds given are the root mean
      square deviations of these 55 observations from that mean.
    ;
;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;
    Example 2 - based on data set TOZ of Willis, Beckwith & Tozer
                [(1991). Acta Cryst. C47, 2276-2277].
;
;
    _cell.length_a                        5.959
    _cell.length_a_esd                    0.001
    _cell.length_b                       14.956
    _cell.length_b_esd                    0.001
    _cell.length_c                       19.737
    _cell.length_c_esd                    0.003
    _cell.angle_alpha                    90.0
    _cell.angle_beta                     90.0
    _cell.angle_gamma                    90.0
    _cell.volume                       1759.0
    _cell.volume_esd                      0.3
;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    save_
The name and textual description of a category are stored in the category named CATEGORY. The item _category.mandatory_code indicates whether the category must appear in any data block based on this dictionary.
The list of data items which uniquely identify each row of a category is stored in the CATEGORY_KEY category. In the example above, the item _cell.entry_id is defined as the category key. This item is a reference to the top level identifier in the mmCIF dictionary, _entry.id. Because only a single entry may exist within an mmCIF data block, this key assignment means that only a single row may exist in the CELL category.
Membership in category groups is stored in the category CATEGORY_GROUP. Each category group must have a corresponding definition in the CATEGORY_GROUP_LIST. In the above example, the CELL category is assigned to the category groups cell_group and inclusive_group. The former contains other categories which describe properties of the crystallographic cell, and the latter includes all of the categories in the mmCIF dictionary.
Complete and annotated examples of a category are stored in the CATEGORY_EXAMPLES category. The text of the category example is stored in item _category_examples.case, and any associated annotation is stored in item _category_examples.detail. Multiple examples are defined for the CELL category above.
save_CITATION
    _category.description
;              Data items in the CITATION category record details about the
               literature cited relevant to the contents of the data block.
;
    _category.id                  citation
    _category.mandatory_code      no
    _category_key.name          '_citation.id'
    loop_
    _category_group.id           'inclusive_group'
                                 'citation_group'
#
#   --------- Abbreviated Definition ----------
    save_

save__citation.id
    _item_description.description
;              The value of _citation.id must uniquely identify a record in the
               CITATION list.

               The _citation.id 'primary' should be used to indicate the
               citation that the author(s) consider to be the most pertinent to
               the contents of the data block.

               Note that this item need not be a number; it can be any unique
               identifier.
;
    loop_
    _item.name
    _item.category_id
    _item.mandatory_code
               '_citation.id'                   citation          yes
               '_citation_author.citation_id'   citation_author   yes
               '_citation_editor.citation_id'   citation_editor   yes
               '_software.citation_id'          software          yes
    _item_aliases.alias_name    '_citation_id'
    _item_aliases.dictionary      cif_core.dic
    _item_aliases.version         2.0.1
    loop_
    _item_linked.child_name
    _item_linked.parent_name
               '_citation_author.citation_id'   '_citation.id'
               '_citation_editor.citation_id'   '_citation.id'
               '_software.citation_id'          '_citation.id'
    _item_type.code               code
    loop_
    _item_examples.case          'primary'
                                 '1'
                                 '2'
    save_

save_CITATION_AUTHOR
    _category.description
;              Data items in the CITATION_AUTHOR category record details
               about the authors associated with the citations in the
               CITATION list.
;
    _category.id                  citation_author
    _category.mandatory_code      no
    loop_
    _category_key.name          '_citation_author.citation_id'
                                '_citation_author.name'
    loop_
    _category_group.id           'inclusive_group'
                                 'citation_group'
#
#   --------- Abbreviated Definition ----------
    save_

save__citation_author.citation_id
    _item_description.description
;              This data item is a pointer to _citation.id in the CITATION
               category.
;
    _item.name                  '_citation_author.citation_id'
    _item.mandatory_code          yes
    _item_aliases.alias_name    '_citation_author_citation_id'
    _item_aliases.dictionary      cif_core.dic
    _item_aliases.version         2.0.1
    save_
This example illustrates how an item which occurs in multiple categories is defined. In the above case of the citation identifier (_citation.id), the ITEM category is preceded by a loop_ directive, and within this loop, all of the definitions of the citation identifier are listed.
For instance, the citation identifier is
also an item in category
CITATION_AUTHOR where it has the item name
_citation_author.citation_id. For conformity with the manner in
which the CIF core dictionary has been organized, a skeleton definition of the
child data item
_citation_author.citation_id has been included
in the dictionary. In fact, this skeleton definition is formally unnecessary.
In early mmCIF dictionary definitions, all of the instances of a data item were defined
within the parent item definition. Items which are related to the parent definition are listed in
The repetition of a data item in multiple categories gives rise to parent-child relationships between such definitions. These relationships are stored in the ITEM_LINKED category. In the example above, this category stores the list of data items which are children of the citation identifier, _citation.id.
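The parent-child links lend themselves to a simple consistency check: every value of a child item should also occur among the values of its parent. The sketch below assumes data held as a mapping from item name to a list of values; the function name and the sample values are hypothetical, while the child and parent names are taken from the _item_linked loop in the example.

```python
def orphaned_children(data, links):
    """Map each child item to those of its values that the parent lacks."""
    orphans = {}
    for child, parent in links:
        parent_values = set(data.get(parent, []))
        missing = [v for v in data.get(child, []) if v not in parent_values]
        if missing:
            orphans[child] = missing
    return orphans


# (child_name, parent_name) pairs from the _citation.id definition.
LINKS = [
    ("_citation_author.citation_id", "_citation.id"),
    ("_citation_editor.citation_id", "_citation.id"),
    ("_software.citation_id", "_citation.id"),
]

# Hypothetical data block contents.
data = {
    "_citation.id": ["primary", "1"],
    "_citation_author.citation_id": ["primary", "primary", "2"],
}

orphans = orphaned_children(data, LINKS)
print(orphans)
```

Child items that do not appear in the data are simply skipped, since an absent item has no values to check.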