PDB File Format Explained — How to Read Protein Structure Files
What is a PDB file?
The PDB (Protein Data Bank) format is the classic text format for storing 3D coordinates of biomolecular structures. Most protein structures you open in a molecular viewer are stored in either this format or the newer mmCIF format. The PDB format uses fixed-width columns (80 characters per line, a layout inherited from punch cards), while mmCIF uses a more flexible key-value structure.
Anatomy of a PDB file
A PDB file contains several types of records, each identified by a keyword in the first 6 columns. The most important ones are HEADER (classification, deposition date, and ID code of the entry), ATOM (coordinates of each atom in a standard residue), HETATM (coordinates of atoms in non-standard residues such as ligands and water), and CONECT (bond connections between atoms).
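As a rough illustration of how the record keyword works, the sketch below tallies the records in a file by reading only the first 6 columns of each line. The function name count_record_types and the sample lines are hypothetical, and the payload of each sample line is deliberately left out because only the keyword matters here.

```python
from collections import Counter

def count_record_types(lines):
    """Tally PDB records by the keyword stored in columns 1-6."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Keywords shorter than 6 characters are padded with spaces.
        keyword = line[:6].strip()
        counts[keyword] += 1
    return counts

# Hypothetical sample: only the keyword columns are meaningful.
SAMPLE = [
    "HEADER    (rest of record omitted)",
    "ATOM      (rest of record omitted)",
    "ATOM      (rest of record omitted)",
    "HETATM    (rest of record omitted)",
    "CONECT    (rest of record omitted)",
    "END",
]

counts = count_record_types(SAMPLE)
for keyword, n in sorted(counts.items()):
    print(keyword, n)
```

With a real file you would pass an open file handle instead of the sample list, since iterating over a file yields its lines.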
Reading ATOM records
The ATOM record is where the structural data lives. Each line describes one atom in a strict column format: the atom serial number (columns 7-11), atom name (13-16), residue name (18-20), chain ID (22), residue sequence number (23-26), and the X, Y, Z coordinates in Angstroms (31-38, 39-46, 47-54). These are followed by the occupancy (55-60) and the temperature factor, or B-factor (61-66), with the element symbol near the end of the line (77-78).
For example, the line ATOM 1 N MET A 1 27.340 24.430 2.614 (shown here with the column padding collapsed and the occupancy and B-factor fields omitted) means: atom serial number 1 is a nitrogen (N) in a methionine (MET) residue, chain A, residue number 1, at coordinates (27.340, 24.430, 2.614) Angstroms.
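Because the fields are defined by column position rather than by whitespace, a parser should slice each line instead of splitting it. The following is a minimal sketch of that idea using the column ranges listed above. The Atom type and parse_atom_line function are illustrative names, and the occupancy and B-factor values in the sample line are hypothetical, since the example above does not give them.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM line by column (1-based column N is index N-1)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66
    )

# The example atom, written field by field so the padding is explicit.
SAMPLE_LINE = (
    "ATOM  "    # columns 1-6   record keyword
    "    1"     # columns 7-11  serial number
    " "         # column 12     unused
    " N  "      # columns 13-16 atom name
    " "         # column 17     alternate location
    "MET"       # columns 18-20 residue name
    " "         # column 21     unused
    "A"         # column 22     chain ID
    "   1"      # columns 23-26 residue number
    "    "      # columns 27-30 insertion code and padding
    "  27.340"  # columns 31-38 X
    "  24.430"  # columns 39-46 Y
    "   2.614"  # columns 47-54 Z
    "  1.00"    # columns 55-60 occupancy (hypothetical value)
    " 15.20"    # columns 61-66 B-factor (hypothetical value)
)

atom = parse_atom_line(SAMPLE_LINE)
print(atom.name, atom.res_name, atom.chain_id, atom.res_seq)
print(atom.x, atom.y, atom.z)
```

Slicing matters in practice because adjacent fields can touch with no space between them, for example when a coordinate is large and negative, and a whitespace split would then merge two fields into one.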
Understanding B-factors
In experimental structures, the B-factor (temperature factor) indicates how much an atom vibrates or how uncertain its position is. High B-factors mean more uncertainty. In AlphaFold predictions, the B-factor column stores the pLDDT confidence score instead: a value from 0 to 100 indicating how confident the prediction is for that residue. Note that the meaning is reversed, because a high pLDDT means more confidence, not less. This is why our structure viewer can color by confidence: it reads the B-factor column.
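To make the idea concrete, here is a small sketch that reads the B-factor column (61-66) and averages it per residue, then labels each value with a confidence band. It assumes the input is an AlphaFold-style file where that column holds pLDDT. The band cut-offs (90, 70, 50) follow the ones commonly used for AlphaFold coloring and are an assumption here; the viewer mentioned above may use different thresholds. All function names, including the demo_line helper that builds correctly padded sample lines, are hypothetical.

```python
from collections import defaultdict

def per_residue_b_factor(lines):
    """Average the B-factor column (61-66) over the atoms of each residue."""
    values = defaultdict(list)
    for line in lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]))  # (chain ID, residue number)
            values[key].append(float(line[60:66]))
    return {key: sum(v) / len(v) for key, v in values.items()}

def plddt_band(score):
    """Map a pLDDT score to a confidence band (assumed cut-offs)."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"

def demo_line(serial, name, res_name, chain, res_seq, b_factor):
    """Build a padded ATOM line with dummy coordinates (illustration only)."""
    return (
        f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )

# Hypothetical scores, not taken from any real prediction.
lines = [
    demo_line(1, "N", "MET", "A", 1, 42.5),
    demo_line(2, "CA", "MET", "A", 1, 42.5),
    demo_line(3, "N", "LYS", "A", 2, 91.0),
]

scores = per_residue_b_factor(lines)
for (chain, res_seq), score in scores.items():
    print(chain, res_seq, score, plddt_band(score))
```

The same per_residue_b_factor function works on an experimental structure, but there the numbers are temperature factors and the pLDDT bands do not apply.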
PDB vs mmCIF
The PDB format has a hard limit of 99,999 atoms and 62 chains. Both limits come from the column widths: the atom serial number has only five columns, and the chain ID is a single character (26 uppercase letters, 26 lowercase letters, and 10 digits). For larger structures like ribosomes, the newer mmCIF (macromolecular Crystallographic Information File) format is required. mmCIF uses a dictionary-based approach with labeled data categories, making it more extensible and less error-prone. Most modern tools support both formats.
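The limits above translate into a simple check when deciding which format to write. The sketch below is one way to express it, with hypothetical function names; it assumes the chain ID alphabet is exactly the letters and digits, which is where the figure of 62 comes from.

```python
import string

# Five serial-number columns hold at most 99,999.
MAX_PDB_ATOMS = 99_999

# One chain ID column: A-Z, a-z, 0-9 gives 62 distinct IDs.
CHAIN_ID_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits
)

def fits_in_pdb_format(n_atoms: int, n_chains: int) -> bool:
    """Return True if the structure stays within the PDB column limits."""
    return n_atoms <= MAX_PDB_ATOMS and n_chains <= len(CHAIN_ID_ALPHABET)

def choose_format(n_atoms: int, n_chains: int) -> str:
    """Pick 'pdb' when the structure fits, otherwise fall back to 'mmcif'."""
    return "pdb" if fits_in_pdb_format(n_atoms, n_chains) else "mmcif"

print(choose_format(5_000, 2))      # small protein
print(choose_format(150_000, 80))   # large complex
```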
You can download both PDB and mmCIF files for any protein on our structure predictor.