记录一下学习分子动力学(Gromacs)的流程,希望给后人提供一点帮助。
工作环境 Gromacs 需要在Linux下安装,再次不赘述如何安装ubuntu双系统/子系统的方法,可以自行搜索。
1 2 3 4 5 conda create --name gmx python=3.7.3 conda activate gmx conda install gromacs
1 2 3 4 5 conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/ conda config --set show_channel_urls yes conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
1 conda install networkx=2.3
avogadro 用于给配体加氢操作,访问avogadro 下载安装。
CHARMM36 访问网站 ,下载CHARMM36力场,访问网站 ,下载对应cgenff_charmm2gmx脚本,解压至工作目录。
sort_mol2_bonds.pl 处理分子的脚本文件,访问此处 下载,本项目选择的是cgenff_charmm2gmx_py3_nx2.py,解压至工作目录。
操作步骤 生成拓扑
将对接复合物的pdb文件以文本格式打开,复合物前面为小分子,剪切出来新建为mol.pdb文件,剩下部分保存为pro.pdb文件;
使用avogadro打开mol.pdb文件,构建—氢原子—添加氢原子,文件—输出为mol.mol2文件
1 2 3 4 cd dygmx pdb2gmx -f pro.pdb -o protein_processed.gro -ignh perl sort_mol2_bonds.pl mol.mol2 mol_fix.mol2
文本编辑器打开mol2文件,把@MOLECULE下的星号替换成正确残基名,为了方便演示,这里我暂定义为resi。
访问CGenFF ,需用教育网邮箱注册,将上面fixed的mol2文件上传,转换为str文件下载。
1 2 python cgenff_charmm2gmx_py3_nx2.py resi mol_fix.mol2 mol_fix.str charmm36-jul2022.ff gmx editconf -f resi_ini.pdb -o resi.gro
此时已经生成了两个gro文件,新建文本文档,命名为complex.gro,首先将蛋白质的gro文件全部内容复制过去,再将小分子第三行到倒数第二行复制,插入到先前蛋白质的最后一行以前。下面为为示例数据:
1 2 3 4 163ASN C 1691 0.621 -0.740 -0.126 163ASN O1 1692 0.624 -0.616 -0.140 163ASN O2 1693 0.683 -0.703 -0.011 5.99500 5.19182 9.66100 0.00000 0.00000 -2.99750 0.00000 0.00000 0.00000
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 163ASN C 1691 0.621 -0.740 -0.126 163ASN O1 1692 0.624 -0.616 -0.140 163ASN O2 1693 0.683 -0.703 -0.011 1JZ4 C4 1 2.429 -2.412 -0.007 1JZ4C7 2 2.155 -2.721 -0.411 1JZ4C8 3 2.207 -2.675 -0.533 1JZ4 C9 4 2.267 -2.551 -0.545 1JZ4 C10 5 2.277 -2.473 -0.430 1JZ4 C11 6 2.169 -2.646 -0.295 1JZ4 C12 7 2.229 -2.519 -0.308 1JZ4 C13 8 2.246 -2.441 -0.181 1JZ4 C14 9 2.392 -2.470 -0.139 1JZ4 OAB 10 2.341 -2.354 -0.434 1JZ4 H1 11 2.531 -2.436 0.015 1JZ4 H2 12 2.366 -2.453 0.069 1JZ4 H3 13 2.417 -2.306 -0.010 1JZ4 H4 14 2.107 -2.812 -0.407 1JZ4 H5 15 2.199 -2.735 -0.617 1JZ4H6 16 2.304 -2.518 -0.635 1JZ4H7 17 2.137 -2.681 -0.204 1JZ4 H8 18 2.178 -2.476 -0.106 1JZ4 H9 19 2.227 -2.337 -0.193 1JZ4 H10 20 2.458 -2.429 -0.214 1JZ4 H11 21 2.402 -2.577 -0.131 1JZ4 H12 22 2.374 -2.321 -0.516 5.99500 5.19182 9.66100 0.00000 0.00000 -2.99750 0.00000 0.00000 0.00000
此时复合物文件的原子数为2个文件之和,因此需要在第2行把2个文件原子数相加并保存。
再打开topol.top文件,在[ moleculetype ] 以上添加如下信息:
1 2 ; Include ligand parameters
1 2 ; Include ligand topology
再拉到最后面,[ molecules ] 项,在Protein_chain_A 下添加:
定义盒子
1 2 gmx editconf -f complex.gro -o newbox.gro -bt dodecahedron -d 1.0 gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro
添加离子
构建完盒子后,现在得到了一个带电蛋白质的溶剂化系统(如果打开topol.top,看到[ atoms ] 项目中会存在类似于qtot 6
字段),因此需要在系统中添加离子。
新建ions.mdp文件,将下列代码粘贴进去并保存:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ; LINES STARTING WITH ';' ARE COMMENTS title = Minimization ; Title of run ; Parameters describing what to do , when to stop and what to save integrator = steep ; Algorithm (steep = steepest descent minimization) emtol = 1000.0 ; Stop minimization when the maximum force < 10.0 kJ/mol emstep = 0.01 ; Energy step size nsteps = 50000 ; Maximum number of (minimization) steps to perform ; Parameters describing how to find the neighbors of each atom and how to calculate the interactions nstlist = 1 ; Frequency to update the neighbor list and long range forces cutoff-scheme = Verlet ns_type = grid ; Method to determine neighbor list (simple, grid) rlist = 1.0 ; Cut-off for making neighbor list (short range forces) coulombtype = cutoff ; Treatment of long range electrostatic interactions rcoulomb = 1.0 ; long range electrostatic cut-off rvdw = 1.0 ; long range Van der Waals cut-off pbc = xyz ; Periodic Boundary Conditions
1 2 3 gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral
此时如果访问topol.top拖到最下面应该可以看到添加了离子。
能量最小化
动力学模拟需要把让系统处于能量最小值,先新建em.mdp文件,粘贴以下内容并保存:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ; LINES STARTING WITH ';' ARE COMMENTS title = Minimization ; Title of run ; Parameters describing what to do , when to stop and what to save integrator = steep ; Algorithm (steep = steepest descent minimization) emtol = 1000.0 ; Stop minimization when the maximum force < 10.0 kJ/mol emstep = 0.01 ; Energy step size nsteps = 50000 ; Maximum number of (minimization) steps to perform ; Parameters describing how to find the neighbors of each atom and how to calculate the interactions nstlist = 1 ; Frequency to update the neighbor list and long range forces cutoff-scheme = Verlet ns_type = grid ; Method to determine neighbor list (simple, grid) rlist = 1.2 ; Cut-off for making neighbor list (short range forces) coulombtype = PME ; Treatment of long range electrostatic interactions rcoulomb = 1.2 ; long range electrostatic cut-off vdwtype = cutoff vdw-modifier = force-switch rvdw-switch = 1.0 rvdw = 1.2 ; long range Van der Waals cut-off pbc = xyz ; Periodic Boundary Conditions DispCorr = no
如果后续报错,可以换成这个em.mdp,原理可能是steep算法问题,替换掉不会过于激进,也有可能是define = -DFLEXIBLE
这段改变了什么条件,且不去深究了。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 ; VARIOUS PREPROCESSING OPTIONS = title = include = define = -DFLEXIBLE ; RUN CONTROL PARAMETERS = integrator = cg ; start time and timestep in ps = tinit = 0 dt = 0.001 nsteps = 100 ; ENERGY MINIMIZATION OPTIONS = emtol = 0.00001 emstep = 0.001 nstcgsteep = 1000 ; OUTPUT CONTROL OPTIONS ; Output frequency for coords (x), velocities (v) and forces (f) nstxout = 0 nstvout = 0 nstfout = 0 ; Output frequency for energies to log file and energy file nstlog = 0 nstcalcenergy = -1 nstenergy = 0 ; Output frequency and precision for .xtc file nstxtcout = 0 lincs_order = 8 rcoulomb = 0.9 rvdw = 0.9 rlist = 0.9
1 2 gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr gmx mdrun -v -deffnm em
平衡 约束配体
1 2 3 4 gmx make_ndx -f resi.gro -o index_resi.ndx ... > 0 & ! a H* > q
1 2 gmx genrestr -f resi.gro -n index_resi.ndx -o posre_resi.itp -fc 1000 1000 1000
然后需要处理topol.top文件,打开拉到底部区域,定位到; Include water topology
处,在前面插入并保存:
1 2 3 4 ; Ligand position restraints #ifdef POSRES #include "posre_resi.itp" #endif
恒温器
1 2 3 4 5 gmx make_ndx -f em.gro -o index.ndx ... > 1 | 13 > q
再做NVT平衡,新建nvt.mdp,粘贴以下内容并保存,但是注意里面有个字段要修改! 找到tc-grps
,把后面的Protein_JZ4改成上一部合并的蛋白质残基索引!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 title = Protein-ligand complex NVT equilibration define = -DPOSRES ; position restrain the protein and ligand ; Run parameters integrator = md ; leap-frog integrator nsteps = 50000 ; 2 * 50000 = 100 ps dt = 0.002 ; 2 fs ; Output control nstenergy = 500 ; save energies every 1.0 ps nstlog = 500 ; update log file every 1.0 ps nstxout-compressed = 500 ; save coordinates every 1.0 ps ; Bond parameters continuation = no ; first dynamics run constraint_algorithm = lincs ; holonomic constraints constraints = h-bonds ; bonds to H are constrained lincs_iter = 1 ; accuracy of LINCS lincs_order = 4 ; also related to accuracy ; Neighbor searching and vdW cutoff-scheme = Verlet ns_type = grid ; search neighboring grid cells nstlist = 20 ; largely irrelevant with Verlet rlist = 1.2 vdwtype = cutoff vdw-modifier = force-switch rvdw-switch = 1.0 rvdw = 1.2 ; short-range van der Waals cutoff (in nm) ; Electrostatics coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics rcoulomb = 1.2 ; short-range electrostatic cutoff (in nm) pme_order = 4 ; cubic interpolation fourierspacing = 0.16 ; grid spacing for FFT ; Temperature coupling tcoupl = V-rescale ; modified Berendsen thermostat tc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accurate tau_t = 0.1 0.1 ; time constant, in ps ref_t = 300 300 ; reference temperature, one for each group, in K ; Pressure coupling pcoupl = no ; no pressure coupling in NVT ; Periodic boundary conditions pbc = xyz ; 3-D PBC ; Dispersion correction is not used for proteins with the C36 additive FF DispCorr = no ; Velocity generation gen_vel = yes ; assign velocities from Maxwell distribution gen_temp = 300 ; temperature for Maxwell distribution gen_seed = -1 ; generate a random seed
1 2 3 gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr gmx mdrun -v -deffnm nvt
然后做再做NPT平衡,新建npt.mdp,粘贴以下内容并保存,一样要改tc-grps
!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 title = Protein-ligand complex NPT equilibration define = -DPOSRES ; position restrain the protein and ligand ; Run parameters integrator = md ; leap-frog integrator nsteps = 50000 ; 2 * 50000 = 100 ps dt = 0.002 ; 2 fs ; Output control nstenergy = 500 ; save energies every 1.0 ps nstlog = 500 ; update log file every 1.0 ps nstxout-compressed = 500 ; save coordinates every 1.0 ps ; Bond parameters continuation = yes ; continuing from NVT constraint_algorithm = lincs ; holonomic constraints constraints = h-bonds ; bonds to H are constrained lincs_iter = 1 ; accuracy of LINCS lincs_order = 4 ; also related to accuracy ; Neighbor searching and vdW cutoff-scheme = Verlet ns_type = grid ; search neighboring grid cells nstlist = 20 ; largely irrelevant with Verlet rlist = 1.2 vdwtype = cutoff vdw-modifier = force-switch rvdw-switch = 1.0 rvdw = 1.2 ; short-range van der Waals cutoff (in nm) ; Electrostatics coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics rcoulomb = 1.2 pme_order = 4 ; cubic interpolation fourierspacing = 0.16 ; grid spacing for FFT ; Temperature coupling tcoupl = V-rescale ; modified Berendsen thermostat tc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accurate tau_t = 0.1 0.1 ; time constant, in ps ref_t = 300 300 ; reference temperature, one for each group, in K ; Pressure coupling pcoupl = Berendsen ; pressure coupling is on for NPT pcoupltype = isotropic ; uniform scaling of box vectors tau_p = 2.0 ; time constant, in ps ref_p = 1.0 ; reference pressure, in bar compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1 refcoord_scaling = com ; Periodic boundary conditions pbc = xyz ; 3-D PBC ; Dispersion correction is not used for proteins with the C36 additive FF DispCorr = no ; Velocity generation gen_vel = no ; velocity generation off after NVT
1 2 3 gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr gmx mdrun -v -deffnm npt
模拟
新建md.mdp文件,粘贴以下内容并保存,同样记得修改!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 title = Protein-ligand complex MD simulation ; Run parameters integrator = md ; leap-frog integrator nsteps = 5000000 ; 2 * 5000000 = 10000 ps (10 ns) dt = 0.002 ; 2 fs ; Output control nstenergy = 5000 ; save energies every 10.0 ps nstlog = 5000 ; update log file every 10.0 ps nstxout-compressed = 5000 ; save coordinates every 10.0 ps ; Bond parameters continuation = yes ; continuing from NPT constraint_algorithm = lincs ; holonomic constraints constraints = h-bonds ; bonds to H are constrained lincs_iter = 1 ; accuracy of LINCS lincs_order = 4 ; also related to accuracy ; Neighbor searching and vdW cutoff-scheme = Verlet ns_type = grid ; search neighboring grid cells nstlist = 20 ; largely irrelevant with Verlet rlist = 1.2 vdwtype = cutoff vdw-modifier = force-switch rvdw-switch = 1.0 rvdw = 1.2 ; short-range van der Waals cutoff (in nm) ; Electrostatics coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics rcoulomb = 1.2 pme_order = 4 ; cubic interpolation fourierspacing = 0.16 ; grid spacing for FFT ; Temperature coupling tcoupl = V-rescale ; modified Berendsen thermostat tc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accurate tau_t = 0.1 0.1 ; time constant, in ps ref_t = 300 300 ; reference temperature, one for each group, in K ; Pressure coupling pcoupl = Parrinello-Rahman ; pressure coupling is on for NPT pcoupltype = isotropic ; uniform scaling of box vectors tau_p = 2.0 ; time constant, in ps ref_p = 1.0 ; reference pressure, in bar compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1 ; Periodic boundary conditions pbc = xyz ; 3-D PBC ; Dispersion correction is not used for proteins with the C36 additive FF DispCorr = no ; Velocity generation gen_vel = no ; continuing from NPT equilibration
1 2 3 gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_100.tpr gmx mdrun -deffnm md_0_100
参考链接 Gromacs Tutorials :官方教程,非常细致,现存所有所有教程都是对官方教程的汉化。
详细版本-用conda安装gromacs
中医药方向怎么发SCI?分子对接OUT了,快来学分子动力学模拟! :其中有段写作可参考的话术。
……