ABOUT THIS EXAMPLE:

This example shows one way to use the partner-atom information recorded for a connection in the struct_conn category to locate that atom in the atom_site category and, in this case, to report its (x, y, z) Cartesian coordinates. Because the struct_conn table contains everything needed to identify a partner atom uniquely, the basic idea is to find the row in the atom_site table that carries the same identifying values. Here we look only at partner atoms involved in covalent bonds and report their (x, y, z) coordinates, but the program is easily extended to other connection types or to connections involving specific atoms.
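
The matching idea can be tried out without the CIF library. The following is a minimal sketch in standard C++ only; the Row alias and the functions findMatchingRows and findPartner are hypothetical stand-ins invented for illustration and are not part of CIFPARSE-OBJ. For brevity it matches on four of the identifiers used by Connections3.C and leaves out label_alt_id.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one table row: attribute name -> value.
using Row = std::map<std::string, std::string>;

// Return the index of every row whose named columns equal the given targets.
// colNames[k] is compared against targets[k]; a row lacking a column never matches.
std::vector<std::size_t> findMatchingRows(const std::vector<Row>& table,
                                          const std::vector<std::string>& colNames,
                                          const std::vector<std::string>& targets)
{
    std::vector<std::size_t> hits;
    if (colNames.size() != targets.size())
    {
        return hits;
    }
    for (std::size_t r = 0; r < table.size(); ++r)
    {
        bool allEqual = true;
        for (std::size_t k = 0; k < colNames.size() && allEqual; ++k)
        {
            Row::const_iterator it = table[r].find(colNames[k]);
            allEqual = (it != table[r].end() && it->second == targets[k]);
        }
        if (allEqual)
        {
            hits.push_back(r);
        }
    }
    return hits;
}

// Find the atom_site rows for one partner ("ptnr1_" or "ptnr2_") of a connection.
// The connection row is expected to hold the partner-prefixed identifiers.
std::vector<std::size_t> findPartner(const std::vector<Row>& atomSite,
                                     const Row& conn,
                                     const std::string& partner)
{
    const char* ids[] = {"auth_asym_id", "label_atom_id", "auth_comp_id", "auth_seq_id"};
    std::vector<std::string> colNames, targets;
    for (const char* id : ids)
    {
        colNames.push_back(id);
        targets.push_back(conn.at(partner + id)); // throws if the attribute is absent
    }
    return findMatchingRows(atomSite, colNames, targets);
}
```

Calling findPartner(atomSite, conn, "ptnr1_") on toy data returns the indices of the rows that agree on all four identifiers; in a well-formed file there should be exactly one, and its Cartn_x, Cartn_y and Cartn_z values are the coordinates being sought.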

BUILD INSTRUCTIONS:

Files: Connections3.C, 5HVP.cif

		  Save Connections3.C to /path/to/cifparse-obj-vX.X-prod-src/parser-test-app-vX.X/src/
		  Save the CIF file anywhere, e.g., /path/to/cifparse-obj-vX.X-prod-src/parser-test-app-vX.X/bin/
		  Add Connections3.ext to the BASE_MAIN_FILES list in the Makefile in /path/to/cifparse-obj-vX.X-prod-src/parser-test-app-vX.X
		  Execute make in the same directory as the Makefile
		  cd to bin, where the executable has been made, and run ./Connections3 /path/to/5HVP.cif
		

Functions of Note

#include "CifFile.h"
string CifFile::GetFirstBlockName()
Returns the first data block name. CifFile inherits this method from TableView. Related: CifFile::GetBlockNames(vector<string>& blockNames).
Block& CifFile::GetBlock(const string& blockName)
Retrieves a data block specified by some blockName. CifFile inherits this method from TableView.
ISTable& Block::GetTable(const string& name)
Retrieves a table (i.e., category) within the block, specified by some name.
#include "ISTable.h"
unsigned int ISTable::GetNumRows()
Returns the number of rows in the table (i.e., category).
const string& ISTable::operator()(const unsigned int rowIndex, const string colName)
Returns the value of the attribute colName at row index rowIndex.
void ISTable::Search(vector<unsigned int>& res, const vector<string>& targets, const vector<string>& colNames, const unsigned int fromRowIndex = 0, const eSearchDir searchDir = eFORWARD, const eSearchType searchType = eEQUAL, const string& indexName = string())
Populates res with the indices of every row in which the attributes named in colNames hold the corresponding values in targets.
vector<string>& ISTable::GetRow(const unsigned int rowIndex)
Returns the row in the zero-indexed category table specified by some rowIndex. Related: void ISTable::GetRow(vector<string>& row, const unsigned int rowIndex, const string& fromColName = string(), const string& toColName = string()).

Basic Sample Output

./Connections3 5HVP.cif
Found a covalent bond between atoms located at: (10.978, 24.553, 5.700) & (12.126, 24.878, 5.008)
Found a covalent bond between atoms located at: (15.234, 20.788, 5.209) & (15.468, 22.108, 5.234)
Found a covalent bond between atoms located at: (14.767, 21.286, 5.304) & (15.827, 20.674, 4.766)
Found a covalent bond between atoms located at: (11.557, 24.225, 5.627) & (10.971, 25.421, 5.774)
Found a covalent bond between atoms located at: (17.704, 19.162, 5.397) & (17.913, 18.389, 6.465)
Found a covalent bond between atoms located at: (8.709, 26.437, 5.751) & (7.475, 25.908, 5.659)
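
As a quick sanity check on output like this, the distance between the two reported positions can be computed; mmCIF Cartesian coordinates are given in ångströms, so a covalent pair should come out at roughly bond length. The sketch below is an illustration only: distanceBetween is a hypothetical helper, not part of Connections3.C or the library, and it assumes each vector holds x, y, z as text, in that order.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Distance between two points whose coordinates are given as text in the
// order x, y, z. std::stod throws if a value is not numeric (e.g. "?").
double distanceBetween(const std::vector<std::string>& a,
                       const std::vector<std::string>& b)
{
    double sumSq = 0.0;
    for (std::size_t k = 0; k < 3; ++k)
    {
        const double d = std::stod(a[k]) - std::stod(b[k]);
        sumSq += d * d;
    }
    return std::sqrt(sumSq);
}
```

For the first pair listed above, (10.978, 24.553, 5.700) and (12.126, 24.878, 5.008), this gives about 1.38, a plausible covalent bond length.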
	  
 
/*************************
* Connections3.C
*
* For some CIF file, determine the (x, y, z) Cartesian coordinates
* of every atom involved in a covalent linkage.
*
* Method: Using the identifying information in the struct_conn category table,
* whittle down the set of possible indices in the atom_site category table to one.
*
* Highlighted lines contain footnoted references or explanations.
*************************/

#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include <vector>

#include "CifFile.h"
#include "CifParserBase.h"
#include "ISTable.h"

void showUsage();

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        showUsage();
    }

    // The name of the CIF file
    string cifFileName = argv[1];
    
    // A string to hold any parsing diagnostics
    string diagnostics;

    // Create CIF file and parser objects
    CifFile *cifFileP = new CifFile;
    CifParser *cifParserP = new CifParser(cifFileP);

    // Parse the CIF file
    cifParserP->Parse(cifFileName, diagnostics);

    // Delete the CIF parser, as it is no longer needed
    delete cifParserP;

    // Display any diagnostics
    if (!diagnostics.empty())
    {
        std::cout << "Diagnostics: " << std::endl << diagnostics << std::endl;
    }

    // Get the first data block name in the CIF file
    string firstBlockName = cifFileP->GetFirstBlockName(); 

    // Retrieve the first data block
    Block &block = cifFileP->GetBlock(firstBlockName); 

    // Retrieve the table corresponding to the struct_conn category, which delineates connections [1]
    ISTable& struct_conn = block.GetTable("struct_conn");
 
    // Retrieve the table corresponding to the atom_site category, which describes atomic constituents [2]
    ISTable& atom_site = block.GetTable("atom_site");
     
    // Iterate through every row in the struct_conn category table, where each row delineates an interatomic connection
    for (unsigned int i = 0; i < struct_conn.GetNumRows(); ++i)
    {
        // Verify that the linkage is covalent [3]
        if (struct_conn(i, "conn_type_id") != "covale")
        {
            continue;
        }
		
        std::cout << "\nFound a covalent bond between atoms located at: ";
         
        // Analyze the current row twice, once per partner
        for (unsigned int j = 0; j < 2; ++j)
        {  
            // Determine which partner we are dealing with
            string partner = (!j) ? "ptnr1_" : "ptnr2_";
 
            // Will hold the index of the atom in the atom_site category table
            vector<unsigned int> results;
			
            // Holds the attribute names and target values for these attributes [4]
            vector<string> colNames, targets;
			
            colNames.push_back("label_alt_id");
            targets.push_back(struct_conn(i, "pdbx_" + partner + "label_alt_id"));
			
            colNames.push_back("auth_asym_id");
            targets.push_back(struct_conn(i, partner + "auth_asym_id"));
		
            colNames.push_back("label_atom_id");
            targets.push_back(struct_conn(i, partner + "label_atom_id"));
		
            colNames.push_back("auth_comp_id");
            targets.push_back(struct_conn(i, partner + "auth_comp_id"));
			
            colNames.push_back("auth_seq_id");
            targets.push_back(struct_conn(i, partner + "auth_seq_id"));
			
            // Perform a search on the atom_site table using the atom's unique identification information
            atom_site.Search(results, targets, colNames);
			
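            // Retrieve and display the atom's coordinates [5]
            if (results.empty())
            {
                // No atom_site row matched this partner's identifiers
                std::cout << "(no matching atom)";
            }
            else
            {
                vector<string> temp;
                atom_site.GetRow(temp, results[0], "Cartn_x", "Cartn_z");
                std::cout << "(" + temp[0] + ", " + temp[1] + ", " + temp[2] + ")";
            }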

            // Print the separator ('&') after partner 1
            if (!j)
            {
                std::cout << " & ";
            }
        }
    }
    std::cout << std::endl;
    return 0;
}

void showUsage()
{
    std::cout << "Usage: ./Connections3 /path/to/file.cif" << std::endl;
    exit(1);
}

NOTES AND REFERENCES

  1. http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/struct_conn.html
  2. http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html
  3. For an enumeration of the connection types and their descriptions, see: http://mmcif.wwpdb.org/dictionaries/mmcif_std.dic/Items/_struct_conn_type.id.html
  4. Note that for brevity we are assuming that author-provided values, which are non-mandatory but commonly present, exist for three of these attributes (viz., asym_id, comp_id, seq_id), and that the alt_id, also non-mandatory, is both present and necessary to identify each partner atom. In a more extensive program, these are easily accounted for with ISTable::IsColumnPresent(const string& columnName), which returns a bool indicating the presence or absence of some column specified by columnName. Note also that while some columns may be present, their values may be "?", which indicates a missing data item value, or ".", which indicates that there is no appropriate value for that data item or that it has been intentionally omitted.
  5. Note that in many CIF files the x, y, and z in Cartn_x, Cartn_y, Cartn_z are capitalized.
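
The handling of "?" and "." values described in note 4 could be sketched as follows. This is one possible approach under stated assumptions, not the method used by Connections3.C: isMissingValue and addCriterion are hypothetical helpers written in standard C++ only, and they simply leave an identifier out of the search when struct_conn holds no usable value for it.

```cpp
#include <string>
#include <vector>

// True when a CIF value carries no usable data: "?" (missing) or
// "." (not applicable or intentionally omitted).
bool isMissingValue(const std::string& value)
{
    return value == "?" || value == ".";
}

// Append (colName, target) to the search criteria only when the target is
// usable. Returns true if the pair was added, false if it was skipped.
bool addCriterion(std::vector<std::string>& colNames,
                  std::vector<std::string>& targets,
                  const std::string& colName,
                  const std::string& target)
{
    if (isMissingValue(target))
    {
        return false;
    }
    colNames.push_back(colName);
    targets.push_back(target);
    return true;
}
```

In the real program each push_back pair would be replaced by one such call, guarded by ISTable::IsColumnPresent for the optional columns. Dropping a criterion widens the search, so the result list may then contain more than one row (for example, several alternate locations of the same atom) and should be checked before results[0] is used.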