The Protein Data Bank Exchange macromolecular Crystallographic Information Framework (PDBx/mmCIF) provides the foundation for the deposition, annotation, and archiving of structural data obtained by a range of experimental techniques.
PDBx/mmCIF uses data blocks to organize related information. A data block is a logical partition of a data file, introduced by a data_ record. A data block may be named by appending a text string to the data_ record (as in data_4HHB). It ends at the next data_ record or at the end of the file.
An example of the data block identifier at the beginning of the model file for PDB entry 4HHB:
data_4HHB
#
_entry.id   4HHB
#
_audit_conform.dict_name       mmcif_pdbx.dic
_audit_conform.dict_version    5.367
_audit_conform.dict_location   http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic
#
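Splitting a file into its data blocks follows directly from the rule above. The Python sketch below is a simplified illustration, not a complete CIF parser. It assumes each data_ record starts a line, and it does not guard against a line beginning with data_ inside a semicolon-delimited text field. The helper name `split_data_blocks` is hypothetical.

```python
import re

# A data_ record at the start of a line opens a new data block; the text right
# after "data_" is the block name. CIF reserved words are case-insensitive.
DATA_RECORD = re.compile(r"^[ \t]*data_(\S*)", re.MULTILINE | re.IGNORECASE)


def split_data_blocks(text: str) -> dict[str, str]:
    """Map each data block name to the text it contains.

    A block runs until the next data_ record or the end of the file.
    """
    matches = list(DATA_RECORD.finditer(text))
    blocks: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        blocks[match.group(1)] = text[match.end():end]
    return blocks


sample = "data_4HHB\n#\n_entry.id 4HHB\n#\n"
print(list(split_data_blocks(sample)))  # ['4HHB']
```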
The PDBx/mmCIF format uses the ASCII character set. Every data item is identified by a name that begins with an underscore and consists of a category name followed by an attribute name, separated by a period.
An example of a PDBx/mmCIF data item name (_category.attribute):
_entity.id
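Every item name follows the _category.attribute pattern, so a program can recover the category and attribute with a simple split. This Python sketch uses a hypothetical helper, `split_item_name`, and assumes the category name itself contains no period.

```python
def split_item_name(name: str) -> tuple[str, str]:
    """Split an item name such as '_entity.id' into (category, attribute)."""
    if not name.startswith("_"):
        raise ValueError(f"item names begin with an underscore: {name!r}")
    # Category and attribute are separated by the first period.
    category, sep, attribute = name[1:].partition(".")
    if not sep or not category or not attribute:
        raise ValueError(f"expected _category.attribute, got {name!r}")
    return category, attribute


print(split_item_name("_entity.id"))  # ('entity', 'id')
```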
Data items are presented in two styles: key-value and tabular.
An example of the key-value style, in which each PDBx/mmCIF item name is followed directly by its value:
_cell.entry_id      4HHB
_cell.length_a      63.150
_cell.length_b      83.590
_cell.length_c      53.800
_cell.angle_alpha   90.00
_cell.angle_beta    99.34
_cell.angle_gamma   90.00
_cell.Z_PDB         4
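Reading the key-value style amounts to pairing each item name with the token that follows it. The Python sketch below is deliberately minimal. The helper `parse_key_value` is hypothetical, and it assumes one unquoted value per line with no loops or multi-line values. It also shows that values arrive as text and need an explicit conversion.

```python
def parse_key_value(text: str) -> dict[str, str]:
    """Collect key-value items, one "name value" pair per line.

    Simplified: skips blank lines and '#' comment lines, and assumes each value
    is a single unquoted token (no loops, quoted, or multi-line values).
    """
    items: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'name value', got {line!r}")
        name, value = parts
        items[name] = value
    return items


cell = parse_key_value("""
_cell.entry_id      4HHB
_cell.length_a      63.150
_cell.angle_beta    99.34
""")
# Values are stored as text; numeric items need an explicit conversion.
print(float(cell["_cell.length_a"]))  # 63.15
```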
An example of the tabular style, used when each item has multiple values. In this style, a loop_ record is followed by the data item names, one per line, and then by rows of white-space-delimited data values:
loop_
_audit_author.name
_audit_author.pdbx_ordinal
_audit_author.identifier_ORCID
'Fermi, G.'    1   0000-000x-xxxx-xxxx
'Perutz, M.F'  2   0000-000x-xxxx-xxxx
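The tabular style maps naturally onto a list of rows, with the values taken in groups as wide as the list of item names. Below is a simplified Python sketch using hypothetical helpers `tokenize` and `parse_loop` (not a library API). It handles a single loop_ with quoted values and # comments, following the CIF convention that a quote closes only when white space or the end of the line follows it. It does not handle semicolon-delimited text fields. It also treats any token beginning with an underscore right after loop_ as an item name, so a quoted first value starting with "_" would be misread.

```python
import re

# One token: a single- or double-quoted string, or a run of non-blank
# characters. A quote closes only when followed by white space or end of line.
TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize(text: str) -> list[str]:
    """Split text into value tokens, dropping quotes and '#' comments."""
    tokens: list[str] = []
    for line in text.splitlines():
        for m in TOKEN.finditer(line):
            single, double, bare = m.groups()
            if bare is not None and bare.startswith("#"):
                break  # an unquoted '#' starts a comment to the end of the line
            if single is not None:
                tokens.append(single)
            elif double is not None:
                tokens.append(double)
            else:
                tokens.append(bare)
    return tokens


def parse_loop(text: str) -> list[dict[str, str]]:
    """Parse one loop_ construct into a list of rows keyed by item name."""
    tokens = tokenize(text)
    if not tokens or tokens[0].lower() != "loop_":
        raise ValueError("text must start with loop_")
    names: list[str] = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("_"):
        names.append(tokens[i])
        i += 1
    values = tokens[i:]
    if not names or len(values) % len(names):
        raise ValueError("value count is not a multiple of the item count")
    width = len(names)
    return [dict(zip(names, values[j:j + width])) for j in range(0, len(values), width)]


rows = parse_loop("""
loop_
_audit_author.name
_audit_author.pdbx_ordinal
_audit_author.identifier_ORCID
'Fermi, G.'    1   0000-000x-xxxx-xxxx
'Perutz, M.F'  2   0000-000x-xxxx-xxxx
""")
print(rows[1]["_audit_author.name"])  # Perutz, M.F
```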
The hash symbol (#) is used to separate categories for readability, although it is not strictly necessary. It is also used to indicate comments.
Numbers and single-word data values (i.e., those containing no white space) are written on their own, without quotes:
_cell.length_a   63.150
A single value composed of multiple words separated by white space must be quoted:
_audit_author.name   'Fermi, G.'
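These two rules matter most when writing files. The Python sketch below shows a hypothetical `format_value` helper that leaves numbers and single words bare and quotes any value containing white space. It also applies a few CIF 1.1 quoting rules that this section does not state: values starting with characters such as _ or #, reserved words such as data_, and literal ? or . values. These extra rules are assumptions of the sketch, applied conservatively because quoting a value is always permitted. Multi-line values are rejected here, since they need a different form.

```python
# Characters that may not begin an unquoted value (CIF 1.1 syntax, assumed here).
SPECIAL_START = ("_", "#", "$", "'", '"', ";", "[", "]")
# Reserved words; matched conservatively as prefixes, since quoting is always allowed.
RESERVED = ("data_", "loop_", "save_", "global_", "stop_")


def format_value(value: str) -> str:
    """Render a single-line string as a PDBx/mmCIF value token."""
    if "\n" in value:
        raise ValueError("multi-line values need a semicolon-delimited text field")
    needs_quotes = (
        value == ""
        or value in ("?", ".")  # bare ? and . would read as placeholders
        or any(ch.isspace() for ch in value)
        or value.startswith(SPECIAL_START)
        or value.lower().startswith(RESERVED)
    )
    if not needs_quotes:
        return value
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    raise ValueError("value contains both quote characters; use a text field")


print(format_value("63.150"))     # 63.150
print(format_value("Fermi, G."))  # 'Fermi, G.'
```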
A single value that spans multiple lines is placed on new lines between a pair of semicolons, each at the start of a line:
loop_
_entity_poly.entity_id
_entity_poly.type
_entity_poly.nstd_linkage
_entity_poly.nstd_monomer
_entity_poly.pdbx_seq_one_letter_code
_entity_poly.pdbx_seq_one_letter_code_can
_entity_poly.pdbx_strand_id
_entity_poly.pdbx_target_identifier
1 'polypeptide(L)' no no
;VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD
LHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
;
;VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD
LHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
;
A,C ?
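A reader recognizes a multi-line value by a semicolon in the first column, which opens the field, and the next line that starts with a semicolon, which closes it. The Python sketch below uses a hypothetical helper, `extract_text_fields`, and a shortened sequence for brevity. As a simplification, it ignores any text after the closing semicolon on the same line. The usage line assumes that, for the sequence items, line breaks only wrap the text, so removing them recovers the continuous one-letter sequence.

```python
from typing import Optional


def extract_text_fields(text: str) -> list[str]:
    """Return the contents of each semicolon-delimited text field in `text`.

    A field opens with a line whose first character is ';' and closes at the
    next line beginning with ';'. Text after the closing ';' is ignored here.
    """
    fields: list[str] = []
    current: Optional[list[str]] = None
    for line in text.splitlines():
        if line.startswith(";"):
            if current is None:
                current = [line[1:]]  # opening ';': keep the text after it
            else:
                fields.append("\n".join(current))  # closing ';'
                current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        raise ValueError("unterminated text field")
    return fields


loop_text = """1 'polypeptide(L)' no no
;VLSPADKTNV
LHAHKLRVDP
;
;VLSPADKTNV
LHAHKLRVDP
;
A,C ?
"""
seq, seq_can = extract_text_fields(loop_text)
print("".join(seq.split()))  # VLSPADKTNVLHAHKLRVDP
```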
Two special characters serve as placeholders for item values that, for some reason, cannot be given explicitly. The question mark (?) marks an item value as missing. The period (.) indicates that there is no appropriate value for the item or that a value has been intentionally omitted.
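Software usually keeps the two placeholders apart instead of collapsing both into one "empty" value, because they mean different things. The Python sketch below shows one way to do that with a hypothetical `Placeholder` enum and `interpret` helper. It assumes, following common CIF practice, that only bare ? and . are placeholders and that a quoted '?' is literal text.

```python
from enum import Enum
from typing import Union


class Placeholder(Enum):
    MISSING = "?"         # the value is missing
    NOT_APPLICABLE = "."  # no appropriate value, or intentionally omitted


def interpret(token: str, quoted: bool = False) -> Union[str, Placeholder]:
    """Turn a raw value token into a string or a Placeholder marker."""
    if not quoted and token in ("?", "."):
        return Placeholder(token)
    return token


row = ["1", "polymer", "nat", "?", "?"]
values = [interpret(v) for v in row]
print(sum(v is Placeholder.MISSING for v in values))  # 2
```

Distinct enum members, rather than None for both, keep the "missing" and "not applicable" meanings available to downstream code.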
The PDBx/mmCIF Dictionary supports primary data types (integers, real numbers, and text), defines boundary conditions and controlled vocabularies, and can link data items to express relationships between them (e.g., parent-child relationships).
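To illustrate how data types, boundary conditions, and controlled vocabularies can be enforced, the Python sketch below checks raw values against a small rule table. The `ItemRule` class, the `check` function, and the rules themselves are hypothetical and written only for this example. The authoritative definitions live in the PDBx/mmCIF dictionary and may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union


@dataclass(frozen=True)
class ItemRule:
    kind: type                                # primitive type: int, float, or str
    minimum: Optional[float] = None           # inclusive lower boundary, if any
    allowed: Optional[FrozenSet[str]] = None  # controlled vocabulary, if any


# Illustrative rules for this sketch only, not copied from the dictionary.
RULES = {
    "_cell.length_a": ItemRule(float, minimum=0.0),
    "_entity.pdbx_number_of_molecules": ItemRule(int, minimum=1),
    "_entity.type": ItemRule(
        str, allowed=frozenset({"polymer", "non-polymer", "branched", "macrolide", "water"})
    ),
}


def check(name: str, raw: str) -> Union[int, float, str, None]:
    """Convert a raw value to its rule's type, then check boundary and vocabulary."""
    if raw in ("?", "."):
        return None  # placeholders carry no value to check
    rule = RULES[name]
    value = rule.kind(raw)  # raises ValueError if the text is not of this type
    if rule.minimum is not None and value < rule.minimum:
        raise ValueError(f"{name}: {value} is below {rule.minimum}")
    if rule.allowed is not None and value not in rule.allowed:
        raise ValueError(f"{name}: {value!r} is not in the controlled vocabulary")
    return value


print(check("_cell.length_a", "63.150"))     # 63.15
print(check("_entity.type", "non-polymer"))  # non-polymer
```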
For example, the entity identifier assigned to each molecule in the "parent" _entity category (shown here) is referenced by "child" categories (as shown in the subsequent example):
loop_
_entity.id
_entity.type
_entity.src_method
_entity.pdbx_description
_entity.formula_weight
_entity.pdbx_number_of_molecules
_entity.pdbx_ec
_entity.pdbx_mutation
_entity.pdbx_fragment
_entity.details
1 polymer     nat 'Hemoglobin subunit alpha'        14981.087 1  ? ? ? ?
2 polymer     nat 'Hemoglobin subunit beta'         16032.274 1  ? ? ? ?
3 non-polymer syn 'PROTOPORPHYRIN IX CONTAINING FE' 616.487   2  ? ? ? ?
4 water       nat water                             18.015    93 ? ? ? ?
An example of a "child" category that describes the source of each polymer entity listed in the "parent" category above:
loop_
_entity_src_nat.entity_id
_entity_src_nat.pdbx_src_id
_entity_src_nat.pdbx_alt_source_flag
_entity_src_nat.pdbx_beg_seq_num
_entity_src_nat.pdbx_end_seq_num
_entity_src_nat.common_name
_entity_src_nat.pdbx_organism_scientific
_entity_src_nat.pdbx_ncbi_taxonomy_id
1 1 sample 1 140 horse 'Equus caballus' 9796
2 1 sample 1 146 horse 'Equus caballus' 9796
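Following a parent-child link is essentially a key lookup: each _entity_src_nat.entity_id value must match some _entity.id value. The Python sketch below pairs child rows with their parents using a hypothetical `join_parent_child` helper. It uses a few columns from the _entity and _entity_src_nat examples above, represented (as an assumption of the sketch) as dictionaries with the category prefix dropped from each item name.

```python
def join_parent_child(parents, children, parent_key="id", child_key="entity_id"):
    """Pair each child row with the parent row it references.

    Raises KeyError for a child whose key matches no parent (a broken link).
    """
    by_id = {row[parent_key]: row for row in parents}
    pairs = []
    for child in children:
        ref = child[child_key]
        if ref not in by_id:
            raise KeyError(f"no parent row with {parent_key} = {ref!r}")
        pairs.append((by_id[ref], child))
    return pairs


entity = [
    {"id": "1", "type": "polymer", "pdbx_description": "Hemoglobin subunit alpha"},
    {"id": "2", "type": "polymer", "pdbx_description": "Hemoglobin subunit beta"},
    {"id": "3", "type": "non-polymer", "pdbx_description": "PROTOPORPHYRIN IX CONTAINING FE"},
    {"id": "4", "type": "water", "pdbx_description": "water"},
]
entity_src_nat = [
    {"entity_id": "1", "pdbx_end_seq_num": "140", "pdbx_organism_scientific": "Equus caballus"},
    {"entity_id": "2", "pdbx_end_seq_num": "146", "pdbx_organism_scientific": "Equus caballus"},
]

for parent, child in join_parent_child(entity, entity_src_nat):
    print(parent["pdbx_description"], "->", child["pdbx_organism_scientific"])
# Hemoglobin subunit alpha -> Equus caballus
# Hemoglobin subunit beta -> Equus caballus
```

Indexing the parent rows once keeps the join to a single pass over the children. A child that points to a nonexistent parent is reported as an error instead of being silently skipped.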
The primary PDBx/mmCIF resource, mmcif.wwpdb.org, contains all relevant data dictionaries and documentation, as well as a detailed description of the format's development and history. The sections below present key PDBx/mmCIF categories with descriptions and examples, to help users understand and adopt the PDBx/mmCIF format.