PDBx/mmCIF FAQ

What is the formal syntax specification for PDBx/mmCIF?

The PDBx/mmCIF data files produced by the wwPDB conform to both the CIF 1.0 and 1.1 syntax specifications. The current syntax specification for CIF 1.1 is maintained at the IUCr CIF site.

Where can I find PDBx/mmCIF data files for PDB entries?

PDB entries in PDBx/mmCIF format are stored on the FTP sites of the wwPDB partners at one of the following locations:

Entries containing very large structures in PDBx/mmCIF format are currently stored separately at one of the following locations:

The PDB structure entry files in PDBx/mmCIF format are named following the convention <PDB_4-LETTER-ID_CODE>.cif.gz (e.g. 1abc.cif.gz). Experimental data files containing X-ray structure factors are distributed only in PDBx/mmCIF format and are named following an older PDB naming convention, r<PDB_ID_CODE>sf.ent.gz (e.g. r1abcsf.ent.gz).
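
As a rough illustration of the structure-file naming convention, the Python sketch below builds the compressed file name from a PDB ID. The function name and the ID pattern (one digit followed by three alphanumeric characters) are assumptions made for this example; they are not part of any wwPDB tooling.

```python
import re

# Assumption for this sketch: a PDB ID is one digit followed by
# three alphanumeric characters (e.g. 1abc).
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def structure_file_name(pdb_id: str) -> str:
    """Return the compressed PDBx/mmCIF file name for a PDB ID."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    # File names in the archive use the lower-case ID.
    return f"{pdb_id.lower()}.cif.gz"


print(structure_file_name("1ABC"))
```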

A complete description of the download options for PDB data files is maintained here by the wwPDB. The special handling of PDB entries containing very large structures is described here.

Are PDBx/mmCIF files hard to read? What does the syntax look like?

The PDBx/mmCIF format has a simple appearance with only a few syntax elements. All of the syntax elements used in PDB data files are shown in the snippet below, which describes a polymer sequence.

The essential syntax features include:

All data items are identified by name, and each name begins with the underscore character (e.g. _entity_poly.entity_id).
Data item names consist of a category name and an attribute name separated by a period: _category.attribute.
Data categories are presented in two styles: key-value and tabular. In the example, the categories entity_name_com and entity_poly both use the key-value style, and the entity_poly_seq category uses the tabular style. In the tabular style, the data item names corresponding to the table columns follow a reserved loop_ token and are followed by rows of whitespace-delimited data values.
Any character data value may be quoted using enclosing single or double quotes; however, character values containing internal whitespace (e.g. the value of _entity_name_com.name) must be quoted. Character values that extend over multiple lines are quoted using leading and trailing semicolons positioned at the first character position of the lines surrounding the multi-line character value (e.g. _entity_poly.pdbx_seq_one_letter_code).
Lines beginning with the hash symbol # are comments.
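
To make the naming rule concrete, here is a minimal Python sketch that splits a data item name into its category and attribute parts. The helper name split_item_name is hypothetical and chosen only for this example.

```python
from typing import Tuple


def split_item_name(item_name: str) -> Tuple[str, str]:
    """Split '_category.attribute' into ('category', 'attribute')."""
    if not item_name.startswith("_") or "." not in item_name:
        raise ValueError(f"not a PDBx/mmCIF data item name: {item_name!r}")
    # Drop the leading underscore, then split at the first period.
    category, attribute = item_name[1:].split(".", 1)
    return category, attribute


print(split_item_name("_entity_poly.entity_id"))
```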

Look here for a more complete description of PDBx/mmCIF data file and dictionary syntax.

#  <-- a comment line 
_entity_name_com.entity_id  1
_entity_name_com.name       "Pantoate--beta-alanine ligase, Pantoate-activating enzyme"
 
_entity_poly.entity_id                      1 
_entity_poly.type                           'polypeptide(L)' 
_entity_poly.nstd_linkage                   no 
_entity_poly.nstd_monomer                   no 
_entity_poly.pdbx_seq_one_letter_code       
;AMAIPAFHPGELNVYSAPGDVADVSRALRLTGRRVMLVPTMGALHEGHLALVRAAKRVPGSVVVVSIFVNPMQFGAGGDL
DAYPRTPDDDLAQLRAEGVEIAFTPTTAAMYPDGLRTTVQPGPLAAELEGGPRPTHFAGVLTVVLKLLQIVRPDRVFFGE
KDYQQLVLIRQLVADFNLDVAVVGVPTVREADGLAMSSRNRYLDPAQRAAAVALSAALTAAAHAATAGAQAALDAARAVL
DAAPGVAVDYLELRDIGLGPMPLNGSGRLLVAARLGTTRLLDNIAIEIGTFAGTDRPDGYR
;

# 
loop_
_entity_poly_seq.entity_id 
_entity_poly_seq.num 
_entity_poly_seq.mon_id 
_entity_poly_seq.hetero 
1 1   ALA n 
1 2   MET n 
1 3   ALA n 
1 4   ILE n 
1 5   PRO n 
1 6   ALA n 
1 7   PHE n 
# ....  abbreviated ....
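
The syntax elements in the snippet above can be read with very little code. The following Python sketch is one possible way to do it, not a reference implementation. It handles comments, key-value items, loop_ tables, quoted values and semicolon-delimited multi-line values. The names tokenize, parse and SAMPLE are made up for this example, and the sequence in SAMPLE is shortened. Two simplifications are assumed: quoting is delegated to Python's shlex module, which covers the quoting seen in the snippet but not every rule of the CIF specification, and data block headers are not handled.

```python
import shlex
from typing import Dict, Iterator, List, Tuple


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, token) pairs; kind is 'loop', 'name' or 'value'."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Multi-line value: runs until the next line starting with ';'.
            parts = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                parts.append(lines[i])
                i += 1
            yield ("value", "\n".join(parts))
        elif not line.lstrip().startswith("#"):
            # Simplification: shlex strips the quotes, so a quoted value that
            # starts with '_' would be mistaken for a data item name.
            for tok in shlex.split(line):
                if tok == "loop_":
                    yield ("loop", tok)
                elif tok.startswith("_"):
                    yield ("name", tok)
                else:
                    yield ("value", tok)
        i += 1


def parse(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Return {category: [row, ...]}; each row maps attribute -> value."""
    data: Dict[str, List[Dict[str, str]]] = {}
    tokens = list(tokenize(text))
    i = 0
    while i < len(tokens):
        kind, tok = tokens[i]
        if kind == "loop":
            # Tabular style: column names first, then rows of values.
            i += 1
            names: List[str] = []
            while i < len(tokens) and tokens[i][0] == "name":
                names.append(tokens[i][1])
                i += 1
            values: List[str] = []
            while i < len(tokens) and tokens[i][0] == "value":
                values.append(tokens[i][1])
                i += 1
            if not names:
                raise ValueError("loop_ without data item names")
            category = names[0][1:].split(".", 1)[0]
            attrs = [n.split(".", 1)[1] for n in names]
            rows = data.setdefault(category, [])
            for start in range(0, len(values), len(names)):
                rows.append(dict(zip(attrs, values[start:start + len(names)])))
        elif kind == "name":
            # Key-value style: a name followed by exactly one value.
            if i + 1 >= len(tokens) or tokens[i + 1][0] != "value":
                raise ValueError(f"missing value for {tok}")
            category, attr = tok[1:].split(".", 1)
            data.setdefault(category, [{}])[0][attr] = tokens[i + 1][1]
            i += 2
        else:
            raise ValueError(f"unexpected value: {tok!r}")
    return data


SAMPLE = """\
#  <-- a comment line
_entity_name_com.entity_id  1
_entity_name_com.name       "Pantoate--beta-alanine ligase, Pantoate-activating enzyme"
_entity_poly.pdbx_seq_one_letter_code
;AMAIPAFHPG
ELNVYSAPGD
;
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
_entity_poly_seq.hetero
1 1   ALA n
1 2   MET n
"""

parsed = parse(SAMPLE)
print(parsed["entity_name_com"][0]["name"])
print([row["mon_id"] for row in parsed["entity_poly_seq"]])
```

Both styles end up in the same shape here: a key-value category becomes a list holding a single row, and a tabular category becomes a list with one row per data line.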