qm_descriptors.py: Generate and Extract Descriptor Sets

The script qm_descriptors.py provides a set of descriptors for one or more structures in the form of a .mae, .sdf, or .csv file. The descriptors are obtained either by parsing existing Jaguar .out files or by generating a set of Jaguar jobs and parsing the resulting .out files. These two modes of operation are invoked with the command line options -outs and -maes respectively, and are detailed below. The descriptors are written to one file per parsed .out file, and also to an aggregate file named all_calcs.props.ext that contains all of the structures.

The command syntax for -outs is:

jaguar run qm_descriptors.py -outs outfile1 outfile2 ... [options]

There are two available options when using -outs: -props and -formats. The -props option specifies the properties to harvest from the Jaguar .out files given on the command line; the syntax is -props prop1 prop2 ... . The properties are listed in the tables below. The -formats option selects the format or formats for writing out the properties (one or more of Maestro, SDF, and CSV), with the syntax -formats {mae|sdf|csv} [{mae|sdf|csv}...]. If neither -props nor -formats is provided, reasonable defaults are used (see the OutputFormats and Properties sections of the example config file below).
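
As an illustration of the -outs syntax, the following Python sketch assembles the command line as an argument list. The function name build_outs_command and the file names are hypothetical and are not part of qm_descriptors.py; the sketch only builds the command and does not run it, and it assumes that jaguar can be invoked exactly as written in the syntax line above.

```python
import shlex
from typing import List, Optional, Sequence

# Output formats accepted by -formats, as listed in the text above.
ALLOWED_FORMATS = ("mae", "sdf", "csv")


def build_outs_command(
    out_files: Sequence[str],
    props: Optional[Sequence[str]] = None,
    formats: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble a 'jaguar run qm_descriptors.py -outs ...' argument list.

    Hypothetical helper for illustration only.
    """
    if not out_files:
        raise ValueError("at least one .out file is required")
    cmd = ["jaguar", "run", "qm_descriptors.py", "-outs", *out_files]
    if props:
        # Property keywords come from the tables in this document.
        cmd += ["-props", *props]
    if formats:
        bad = [f for f in formats if f not in ALLOWED_FORMATS]
        if bad:
            raise ValueError(f"unsupported format(s): {bad}")
        cmd += ["-formats", *formats]
    return cmd


cmd = build_outs_command(
    ["mol1.out", "mol2.out"],  # hypothetical file names
    props=["scf_energy", "homo_lumo_gap"],
    formats=["csv"],
)
print(shlex.join(cmd))
```

Omitting props and formats leaves the corresponding options off the command line, so the script falls back to its defaults.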

The command syntax for -maes is:

jaguar run qm_descriptors.py -maes maefile [options]

where maefile is a single Maestro file. This file can have multiple structures, all of which are used to generate descriptors. The only available option when using -maes is -configs, which controls the settings of the generated Jaguar jobs as well as the properties that are harvested and the format used for output. If -configs is not provided, reasonable defaults are used (given below). The syntax of the option is -config configfile. The config file format is a Python-digestible YAML file. An example is given below, showing the default settings. The allowed settings for the OutputFormats and Properties sections are the same as in the -formats and -props options when using -outs.

The BasisSets, Functionals, and GenKeyvals sections define the settings for the generated Jaguar jobs. For BasisSets and Functionals, you can use any basis set or functional defined in Jaguar, and you can specify multiple basis sets and functionals. Each combination of a basis set and a functional is used with each structure, resulting in multiple subjobs. The GenKeyvals section takes any legal Jaguar gen section keyword: value pair. The values of the charge and multiplicity (molchg and multip) specified in this section can be overridden by values set in the Maestro input file. The Properties section defines the properties to extract from the Maestro file. The allowed values of the property keywords are listed in the tables below, and they do not take any arguments. The example below shows one property keyword, all_descriptor_numeric, which you could replace with a different keyword from any of the tables, and you can add as many properties as you want, in the same format, one per line.

OutputFormats:
    - sdf
BasisSets:
    - LACVP*
Functionals:
    - b3lyp-d3
GenKeyvals:
    mulken: 2
    ldips: 5
    ipolar: -2
    nmr: 1
    fukui: 1
    esp_analysis: 1
    epn: 1
    nbo: 1
    ifreq: 1
    icfit: 1
Properties:
    - all_descriptor_numeric
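
Because each combination of a basis set and a functional is used with each structure, the number of subjobs grows as the product of the three list lengths. The Python sketch below illustrates that counting only. The function name enumerate_subjobs, the structure titles, and the second basis set are hypothetical choices for illustration, and the order in which the script actually creates or names its subjobs is not specified here.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_subjobs(
    structures: Sequence[str],
    basis_sets: Sequence[str],
    functionals: Sequence[str],
) -> List[Tuple[str, str, str]]:
    """Return one (structure, basis set, functional) tuple per subjob.

    Hypothetical helper: it mirrors the rule stated in the text, not the
    script's internal implementation.
    """
    return list(product(structures, basis_sets, functionals))


structures = ["mol1", "mol2", "mol3"]  # hypothetical structure titles
basis_sets = ["LACVP*", "6-31G**"]     # default plus one illustrative extra
functionals = ["b3lyp-d3"]             # default from the example config

subjobs = enumerate_subjobs(structures, basis_sets, functionals)
print(len(subjobs))  # 3 structures x 2 basis sets x 1 functional = 6
```

Adding a second functional to the list would double the count, which is worth keeping in mind before submitting a large Maestro file.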

Table 1. Atomic Descriptors for qm_descriptors

Keyword Description
atom_name Atom Name
charge_esp Atomic Charges from ESP
charge_lowdin Lowdin Atomic Charges
charge_mulliken Mulliken Atomic Charges
charge_nbo Atomic Charges from NBO
charge_stockholder Stockholder Atomic Charges
epn Electrostatic Potential at Atomic Nuclei
forces Atomic Forces
homo_nn Atomic Fukui Indices, f_NN HOMO
homo_ns Atomic Fukui Indices, f_NS HOMO
homo_sn Atomic Fukui Indices, f_SN HOMO
homo_ss Atomic Fukui Indices, f_SS HOMO
lumo_nn Atomic Fukui Indices, f_NN LUMO
lumo_ns Atomic Fukui Indices, f_NS LUMO
lumo_sn Atomic Fukui Indices, f_SN LUMO
lumo_ss Atomic Fukui Indices, f_SS LUMO
maxat_alie Max Atomic ALIE Values
maxat_esp Max Atomic ESP Values
minat_alie Min Atomic ALIE Values
minat_esp Min Atomic ESP Values
nmr_2d_avg_shift NMR 2D-Averaged Relative Shifts
nmr_abs_shift NMR Atomic Absolute Shifts
nmr_h_avg_shift NMR H-Averaged Relative Shifts
nmr_rel_shift NMR Atomic Relative Shifts
nmr_shielding NMR Isotropic Shielding per Atom
spin_lowdin Lowdin Spin Densities
spin_mulliken Mulliken Spin Densities

Table 2. Molecular Descriptors for qm_descriptors

Keyword Description
GTotal Total Gibbs Free Energy (HTotal - T*S), in Hartrees
HTotal Total Enthalpy (UTotal + pV), in Hartrees
S_min_eval Minimum value of S (overlap matrix)
UTotal Total Internal Energy (SCFE + ZPE + U), in Hartrees
ani_energy neural network potential energy, in Hartree
ani_stddev standard deviation in prediction of neural network energy, in Hartree
balance_alie ALIE balance on isodensity surface
balance_esp ESP balance on isodensity surface
bond_midpoint_charge Bond-Midpoint Charges Calculated in ESP Fitting
canonical_orbitals Number of canonical orbitals
dipole_strength Dipole Strengths of Normal Modes
dipolecomp_esp Dipole Moment Components Calculated from Electrostatic Potential Charges, in Debye
dipolecomp_mulliken Dipole Moment Components Calculated from Mulliken Charges, in Debye
dipolecomp_qm Dipole Moment Components Calculated from Wavefunction, in Debye
dipolemag_esp Dipole Moment Magnitude Calculated from Electrostatic Potential Charges, in Debye
dipolemag_mulliken Dipole Moment Magnitude Calculated from Mulliken Charges, in Debye
dipolemag_qm Dipole Moment Magnitude Calculated from Wavefunction, in Debye
doubted_geom Indicates a geometry step was not expected to be good
energy_aposteri a posteriori correction to the total energy (component (N0) in SCF summary), in Hartree
energy_aposteri0 Uncorrected energy in the case of a posteriori-corrected calculations (energy - energy_aposteri), in Hartree
energy_electronic Total electronic energy (component (L) in SCF summary), in Hartree
energy_one_electron Total one-electron energy (component (E) in SCF summary), in Hartree
energy_two_electron Total two-electron energy (component (I) in SCF summary), in Hartree
enthalpy Total Calculated Enthalpy
enthalpy_elec Electronic Contribution to Enthalpy
enthalpy_rot Rotational Contribution to Enthalpy
enthalpy_trans Translational Contribution to Enthalpy
enthalpy_vib Vibrational Contribution to Enthalpy
entropy Total Calculated Entropy
entropy_elec Electronic Contribution to Entropy
entropy_rot Rotational Contribution to Entropy
entropy_trans Translational Contribution to Entropy
entropy_vib Vibrational Contribution to Entropy
et_H_if Hamiltonian of initial to final state in e- transfer
et_H_ii Hamiltonian of initial state in e- transfer
et_S_if Overlap of initial and final state wfns in e- transfer
et_T_if e- transfer transition energy
excitation_energies Excitation energies, in eV
external_program_energy Energy produced by external program, in Hartree
force_constant Force Constants of Normal Modes
frequency Frequencies of Normal Modes, in cm-1
gas_phase_energy Gas Phase Energy, in Hartree
gibbs_free_energy Total Calculated Gibbs Free Energy
gibbs_free_energy_elec Electronic Contribution to Gibbs Free Energy
gibbs_free_energy_rot Rotational Contribution to Gibbs Free Energy
gibbs_free_energy_trans Translational Contribution to Gibbs Free Energy
gibbs_free_energy_vib Vibrational Contribution to Gibbs Free Energy
heat_capacity Total Calculated Heat Capacity
heat_capacity_elec Electronic Contribution to Heat Capacity
heat_capacity_rot Rotational Contribution to Heat Capacity
heat_capacity_trans Translational Contribution to Heat Capacity
heat_capacity_vib Vibrational Contribution to Heat Capacity
homo HOMO energy (set to None for open-shell calcs), in Hartree
homo_alpha Alpha HOMO energy (set to None for closed-shell calcs), in Hartree
homo_beta Beta HOMO energy (set to None for closed-shell calcs), in Hartree
homo_lumo_gap HOMO-LUMO Gap energy. Calculated as lower of same-spin orbital differences in unrestricted calcs, in Hartree
internal_energy Total Calculated Internal Energy
internal_energy_elec Electronic Contribution to Internal Energy
internal_energy_rot Rotational Contribution to Internal Energy
internal_energy_trans Translational Contribution to Internal Energy
internal_energy_vib Vibrational Contribution to Internal Energy
ir_intensity IR Intensities of Normal Modes
lambdamax_ev Excitation energy of state with highest oscillator strength, in eV
lambdamax_nm Excitation wavelength of state with highest oscillator strength, in nm
lmp2_energy LMP2 Energy, in Hartree
lnq Total Calculated lnQ
lnq_elec Electronic Contribution to lnQ
lnq_rot Rotational Contribution to lnQ
lnq_trans Translational Contribution to lnQ
lnq_vib Vibrational Contribution to lnQ
local_pol_alie Avg deviation from mean ALIE on isodensity surface
local_pol_esp Local polarity on isodensity surface
lumo LUMO energy (set to None for open-shell calcs), in Hartree
lumo_alpha Alpha LUMO energy (set to None for closed-shell calcs), in Hartree
lumo_beta Beta LUMO energy (set to None for closed-shell calcs), in Hartree
max_alie Maximum ALIE value on isodensity surface
max_esp Maximum ESP value on isodensity surface
mean_alie Mean ALIE value on isodensity surface
mean_esp Mean ESP value on isodensity surface
mean_neg_alie Mean negative ALIE value on isodensity surface
mean_neg_esp Mean negative ESP value on isodensity surface
mean_pos_alie Mean positive ALIE value on isodensity surface
mean_pos_esp Mean positive ESP value on isodensity surface
min_alie Minimum ALIE value on isodensity surface
min_esp Minimum ESP value on isodensity surface
nops_on Indicates a NOPS calculation
nuclear_repulsion Nuclear Repulsion Energy, in Hartree
opt_excited_state_energy_1 Energy of first excited state geometry optimization
orb_ener_alpha Alpha Orbital Energies for UHF calculations, in Hartrees
orb_ener_beta Beta Orbital Energies for UHF calculations, in Hartrees
orb_ener_rhf Orbital Energies for RHF calculations, in Hartrees
orb_symm_alpha Alpha Orbital Symmetries for UHF calculations
orb_symm_beta Beta Orbital Symmetries for UHF calculations
orb_symm_rhf Orbital Symmetries for RHF calculations
oscillator_strengths Excited state oscillator strengths
polar_alpha Polarizability
polar_beta First-Order Hyperpolarizability
polar_gamma Second-Order Hyperpolarizability
raman_activity Raman Activities of Normal Modes
raman_intensity Raman Intensities of Normal Modes
reaction_coord Reaction coordinate Number
reduced_mass Reduced Masses of Normal Modes
rotational_constants Rotational constants of molecule
rotational_strength Rotational Strengths of Normal Modes
s2 Spin: <S**2>
scf_energy SCF Energy, in Hartree
sig_neg_alie Variance of negative ALIE on isodensity surface
sig_neg_esp Variance of negative ESP on isodensity surface
sig_pos_alie Variance of positive ALIE on isodensity surface
sig_pos_esp Variance of positive ESP on isodensity surface
sig_tot_alie Total ALIE variance on isodensity surface
sig_tot_esp Total ESP variance on isodensity surface
singlet_excitation_energies Restricted Singlet Electronic excitation energies, in eV
singlet_oscillator_strengths Singlet excited state oscillator strengths
sm_iter Iteration number of string method
sm_point Number of points along string method string
solution_phase_energy Solution Phase Energy, in Hartree
solvation_energy Solvation Energy, in Hartree
spin_splitting_score Ligand field spin-splitting score for DBLOC calculations
symmetry Symmetries of Normal Modes
symmetry_number symmetry number for molecule
sz2 Spin: Sz*<Sz+1>
total_lo_correction Total localized orbital energy correction
transition_state_components Transition State Components
triplet_excitation_energies Restricted triplet electronic excitation energies, in eV
triplet_oscillator_strengths Triplet excitation energy oscillator strengths
zero_point_energy Zero Point Energy, in Hartree
zvar a mapping of scan variable names to values
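
Several molecular descriptors are reported in Hartree (for example homo_lumo_gap), while the excitation descriptors are reported in eV or nm. The following Python sketch shows the standard unit conversions between them. The function names are hypothetical, and the constants are the CODATA 2018 values for the Hartree energy in eV and for hc in eV nm; the sketch is a convenience for post-processing and does not describe how qm_descriptors.py computes its values.

```python
# Conversion constants (CODATA 2018).
HARTREE_TO_EV = 27.211386245988  # 1 Hartree expressed in eV
HC_EV_NM = 1239.841984           # h*c expressed in eV*nm


def hartree_to_ev(energy_hartree: float) -> float:
    """Convert an energy from Hartree to eV."""
    return energy_hartree * HARTREE_TO_EV


def ev_to_nm(energy_ev: float) -> float:
    """Convert an excitation energy in eV to a wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("excitation energy must be positive")
    return HC_EV_NM / energy_ev


# Example: a gap of 0.2 Hartree expressed in eV and as a wavelength.
gap_ev = hartree_to_ev(0.2)
print(round(gap_ev, 4), round(ev_to_nm(gap_ev), 1))
```

The same ev_to_nm relation links a value of lambdamax_ev to the corresponding lambdamax_nm.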

Table 3. Job Descriptors for qm_descriptors

Keyword Description
_sm_n_points number of string method points
basis Basis Set
charge Molecular charge of Input Structure
coords_frozen Number of frozen coordinates
coords_harmonic number of harmonic constraints
coords_ind Number of independent coordinates
coords_nred Number of non-redundant coordinates
coords_opt Number of optimization coordinates
fatal_error Error message in the event the job failed
fatal_errorno Error number in the event the job failed
functional DFT Functional
geopt_stuck Whether the geopt or tsopt got stuck
glibc Reported glibc version
host Job Host
job_id Job ID
lastexe Last Jaguar Executable Used
mae_in Maestro input file
mae_out Maestro output file
method Calculation Type
mol_weight Molecular weight of input geometry, in amu
multiplicity Spin Multiplicity of Input Structure
nbasis Number of Basis Functions
nelectron Number of Electrons
point_group Molecular point group of the input molecule
point_group_used Point group used in the calculation
qm_atoms Number of QM Atoms
status Job status - set to 0, 1, or 2 corresponding to UNKNOWN, OK, or SPLAT respectively
stoichiometry Stoichiometry of input geometry
symmetrized Whether the geometry has been symmetrized or not
ts_component_descriptions Descriptions of the transition state vector components
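
The status descriptor in the table above is an integer code. When post-processing the harvested properties, it can be convenient to map the code back to its name. The Python sketch below is one way to do that, using only the 0, 1, 2 mapping stated in the table; the JobStatus class and describe_status function are hypothetical names, not part of the script.

```python
from enum import IntEnum


class JobStatus(IntEnum):
    """Integer codes of the 'status' job descriptor, as listed in Table 3."""
    UNKNOWN = 0
    OK = 1
    SPLAT = 2


def describe_status(value) -> str:
    """Return the name for a status code given as an int or numeric string."""
    return JobStatus(int(value)).name


print(describe_status(1))    # OK
print(describe_status("2"))  # SPLAT
```

Accepting strings as well as integers is a design choice for values read back from text formats such as CSV, where every field arrives as a string.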

Table 4. Special Keywords for qm_descriptors

Keyword Description
all Returns all properties found in output file.
all_descriptor_numeric Returns all numeric Molecular/Atomic properties found in output file.
all_numeric Returns all numeric properties found in output file.