Setting Up a Conda Environment for Scientific Computations (DFT)

October 2, 2025

Published in Scientific Software

Abstract

Setting up a reproducible computational environment is crucial for density functional theory (DFT) calculations and materials science research. This guide walks you through creating a dedicated Conda environment equipped with essential tools for DFT analysis, visualization, and machine learning applications in computational chemistry.

Keywords: DFT, Conda, Python, Scientific Computing, Materials Science, Computational Chemistry

Overview

When working with DFT calculations, having a well-configured Python environment with the right packages is essential. This environment includes tools for:

First-principles calculations: ABINIT integration via AbiPy
Atomistic simulations: ASE (Atomic Simulation Environment)
Materials analysis: Pymatgen for computational materials science
Visualization: Plotly, Matplotlib, and Seaborn, plus OVITO if you install it separately
Machine learning: TensorFlow, PyTorch, and Keras for ML-driven analysis
Data processing: NumPy, SciPy, Pandas, and HDF5 support

Creating the Environment

To set up this environment, we'll use a pre-configured YAML file that ensures all dependencies are properly resolved.

Step 1: Create the Environment Configuration File

First, create a new file named environment.yml with the following content:

name: dft
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.12
  - numpy
  - scipy
  - pandas
  - matplotlib
  - seaborn
  - h5py
  - netcdf4
  - jupyterlab
  - beautifulsoup4
  - requests
  - pyyaml
  - phonopy
  - abipy
  - ase
  - mbuild
  - pymatgen
  - symfc
  - spglib
  - scikit-learn
  - plotly
  - bokeh
  - statsmodels
  - sympy
  - tqdm
  - networkx
  - rich
  - tabulate
  - pip
  - pip:
      - tensorflow
      - torch
      - torchvision
      - pytorch-lightning
      - keras
      - qiskit
      - qiskit-aer
      - apscheduler
      - autograd
      - bibtexparser
      - lifelines
      - pylatexenc

This configuration file includes:

Core scientific packages: NumPy, SciPy, Pandas, Matplotlib
DFT-specific tools: AbiPy, ASE, Pymatgen, Phonopy
Visualization: Plotly, Bokeh, Seaborn (OVITO is not listed in this file and has to be installed separately)
Machine learning frameworks: TensorFlow, PyTorch, Keras
Statistical analysis: Statsmodels, Scikit-learn, Lifelines
Interactive computing: JupyterLab (IPython is installed as one of its dependencies)

Step 2: Create the Environment

Once you've created the environment.yml file, create the Conda environment by running:

conda env create -f environment.yml

This command will:

Parse the YAML file
Resolve all dependencies
Download and install the required packages
Create an isolated environment named dft

The installation may take several minutes, depending on your internet connection and system specifications. It tends to finish quickly on macOS but can take much longer on clusters.

If you later want to remove the environment, deactivate it first and then delete it:

conda deactivate
conda env remove -n dft

Step 3: Activate the Environment

After the installation completes successfully, activate your new environment:

conda activate dft
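
A quick way to confirm that activation took effect is to check it from Python. The sketch below assumes that conda activate has set the CONDA_DEFAULT_ENV variable, which is standard Conda behaviour. The helper names active_conda_env and is_env_active are made up for this illustration.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active Conda environment, or None if there is none."""
    environ = os.environ if environ is None else environ
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>.
    return environ.get("CONDA_DEFAULT_ENV") or None


def is_env_active(expected="dft", environ=None):
    """True when the active Conda environment has the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    print("Active Conda environment:", active_conda_env())
    print("Interpreter prefix:", sys.prefix)
    if not is_env_active("dft"):
        print("The 'dft' environment does not appear to be active.")
```

If the interpreter prefix does not point inside the environment's directory, the python on your PATH is probably not the one Conda installed.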

Key Packages and Their Uses

Materials Science & DFT

AbiPy (0.9.8): Python interface for ABINIT, enabling automated post-processing and analysis
Pymatgen (2025.4.20): Robust library for materials analysis, including structure manipulation and phase diagram generation
ASE (3.25.0): Atomic Simulation Environment for atomistic simulations
Phonopy (2.38.1): Phonon calculations and lattice dynamics analysis
Spglib (2.6.0): Space group operations and symmetry analysis

Visualization

Plotly (6.0.1): Interactive, publication-quality graphs
Matplotlib (3.10.1): Comprehensive plotting library
Seaborn (0.13.2): Statistical data visualization

Machine Learning

TensorFlow (2.16.1): Deep learning framework
PyTorch (2.3.0): Flexible machine learning library
Keras (3.9.2): High-level neural networks API
Scikit-learn (1.4.2): Traditional machine learning algorithms

Data Processing

Pandas (2.2.3): Data manipulation and analysis
NumPy (1.26.4): Fundamental package for numerical computing
SciPy (1.15.2): Scientific and technical computing routines
h5py (3.13.0): HDF5 file format support

Verification

To verify your installation, you can test importing key packages:

import abipy
import pymatgen
import ase
import torch
import tensorflow as tf

print("All packages imported successfully!")
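
The plain import test above stops at the first failure. As a slightly more informative variant, the sketch below tries each import separately and reports a version where the module exposes __version__. The helper name check_packages is hypothetical, and not every package sets __version__ at the top level, so a "version unknown" result does not mean the install is broken.

```python
import importlib


def check_packages(modules):
    """Try to import each module and return {name: version string or failure note}."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except Exception as exc:
            # Usually ImportError, but a broken binary dependency can raise
            # other errors (for example OSError) during import.
            report[name] = f"FAILED ({type(exc).__name__}: {exc})"
            continue
        report[name] = getattr(module, "__version__", "version unknown")
    return report


if __name__ == "__main__":
    wanted = ["abipy", "pymatgen", "ase", "torch", "tensorflow"]
    for name, status in check_packages(wanted).items():
        print(f"{name:12s} {status}")
```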

Environment Management Tips

Exporting Your Environment

To share your exact environment configuration:

conda env export > environment_exact.yml
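
The exported file ends with a prefix line that records the absolute path of the environment on your machine, which is of no use to collaborators. A minimal sketch for dropping that line before sharing the file is shown below. The function name strip_prefix and the sample path are hypothetical, and the code only filters lines rather than parsing YAML.

```python
def strip_prefix(exported_yaml: str) -> str:
    """Drop the machine-specific top-level `prefix:` line from exported YAML text."""
    kept = [
        line
        for line in exported_yaml.splitlines()
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Shortened stand-in for the contents of environment_exact.yml.
    sample = (
        "name: dft\n"
        "channels:\n"
        "  - conda-forge\n"
        "dependencies:\n"
        "  - python=3.11\n"
        "prefix: /hypothetical/path/to/envs/dft\n"
    )
    print(strip_prefix(sample), end="")
```

Conda's own --no-builds and --from-history options for conda env export are also worth a look when the file is meant to be used on a different platform, since exact build strings rarely match across operating systems.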

Updating Packages

To update all Conda-managed packages in the active environment to their latest compatible versions (packages installed through pip are not touched):

conda update --all

Platform Considerations

This environment configuration is optimized for Apple Silicon (ARM) Macs but should work on other platforms with minor adjustments. Key architecture-specific optimizations include:

OpenBLAS: Optimized for ARM64 architecture
Native ARM builds: Most packages are compiled for osx-arm64
Accelerated frameworks: PyTorch can use Apple's Metal Performance Shaders (MPS) backend; TensorFlow needs the separate tensorflow-metal plugin for GPU acceleration

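For Intel-based Macs or Linux systems, you may need to adjust some platform-specific package versions. If you are working from an exported environment file, also remove or edit its prefix line, which records the absolute path of the environment on the original machine.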
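
To see which platform Conda will resolve packages for, you can map the operating system and CPU architecture reported by Python to Conda's platform names (osx-arm64, linux-64, and so on). The sketch below is an illustration only: the function name conda_subdir is made up, and the table covers just the common cases.

```python
import platform

# (platform.system(), platform.machine()) -> Conda platform name.
_SUBDIRS = {
    ("Darwin", "arm64"): "osx-arm64",
    ("Darwin", "x86_64"): "osx-64",
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Linux", "ppc64le"): "linux-ppc64le",
    ("Windows", "AMD64"): "win-64",
}


def conda_subdir(system=None, machine=None):
    """Return the Conda platform name for this machine, or 'unknown'."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return _SUBDIRS.get((system, machine), "unknown")


if __name__ == "__main__":
    print("Conda platform for this machine:", conda_subdir())
```

A package that has builds for osx-arm64 but none for linux-64 (or the reverse) is a common reason why the same environment.yml resolves on one machine and not on another.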

Troubleshooting

Common Issues

Slow package resolution: Conda's dependency solver can be slow. Consider using mamba for faster environment creation:

conda install mamba -n base -c conda-forge
mamba env create -f environment.yml

Conflicting dependencies: If you encounter conflicts, try creating a minimal environment first and adding packages incrementally.

Missing packages: Some packages might not be available for your platform. Check the conda-forge channel, or install via pip as a fallback.
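
The incremental approach suggested for conflicting dependencies can be sketched as a small command generator: one conda create call for the minimal base, then one conda install call per additional package, so the first failing command identifies the culprit. The function name, the environment name dft_minimal, and the package selection are placeholders for illustration. The script only prints the commands and does not run them.

```python
import shlex


def incremental_commands(env_name, base_packages, extra_packages, channel="conda-forge"):
    """Build one `conda create` command plus one `conda install` per extra package."""
    commands = [
        ["conda", "create", "-y", "-n", env_name, "-c", channel, *base_packages]
    ]
    for package in extra_packages:
        commands.append(
            ["conda", "install", "-y", "-n", env_name, "-c", channel, package]
        )
    return commands


if __name__ == "__main__":
    plan = incremental_commands(
        "dft_minimal",                      # hypothetical environment name
        base_packages=["python=3.12", "pip"],
        extra_packages=["numpy", "ase", "pymatgen", "abipy"],
    )
    for command in plan:
        # Printed only; run each line by hand and stop at the first failure.
        print(shlex.join(command))
```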

Local vs. Cluster: The Source Compilation Trap

Problem Symptoms

You might find that running conda env create -f environment.yml successfully creates your environment on a local macOS machine, but using the exact same environment.yml on a Linux cluster fails during the "Installing pip dependencies" stage. The errors typically complain about missing build tools (like ninja or cmake) or fail when trying to invoke system compilers (e.g., /usr/bin/cmake --version).

Root Cause: Clusters Are More Prone to Source Compilation

Why does a pip install work locally but crash on a cluster? It comes down to the two paths pip can take when installing a package:

Pre-compiled Wheels (.whl): If pip finds a wheel matching your system architecture and library versions, it installs it directly without compilation.
Source Distributions (.tar.gz): If it cannot find a matching wheel, it attempts to build the package from source locally. For packages with compiled extensions, this path requires a full compilation toolchain (e.g., cmake, ninja, and a C/C++ compiler like gcc or g++).

Cluster and HPC environments are typically more restricted and customized (with unique module systems, strict environment variables, and older system libraries). Because of this, pip is often unable to find a compatible pre-compiled wheel on a cluster, for example when an old glibc does not satisfy a wheel's manylinux tag. It then falls back to building from source and fails when the required build tools are missing.
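
Whether a wheel matches is decided by the tags in its filename, which follow the pattern name-version-python-abi-platform.whl. The sketch below splits a filename into those tags and prints the C library version Python detects on the current machine. The filenames are invented examples and the helper names are hypothetical. A platform tag such as manylinux_2_17_x86_64 means the wheel expects glibc 2.17 or newer on x86_64 Linux.

```python
import platform


def classify_distribution(filename: str) -> str:
    """Tell a pre-compiled wheel from a source distribution by file extension."""
    if filename.endswith(".whl"):
        return "wheel"
    if filename.endswith((".tar.gz", ".zip")):
        return "sdist"
    return "unknown"


def wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version, and its three compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # An optional build tag may sit between the version and the python tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    example = "example_pkg-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl"
    print(classify_distribution(example), wheel_tags(example))
    # On Linux this is typically ('glibc', '<version>'); elsewhere it may be empty.
    print("C library detected:", platform.libc_ver())
```

Comparing the detected glibc version on the cluster with the manylinux tag of the wheels a package publishes is often enough to explain why pip skipped them.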

Solution Approach

The most robust fix is to bundle the potential build tools directly into your Conda environment. By doing so, any necessary source compilation uses Conda's isolated toolchain instead of relying on the host cluster's unpredictable system packages.

Best Practices Checklist:

Pin a Reliable Python Version: For instance, explicitly use python=3.11.
Embed Build Dependencies: Put compilation tools (ninja, cmake, c-compiler, and cxx-compiler to provide gcc/g++) directly into your Conda dependencies.
Keep Deep Learning Frameworks in pip: Packages like TensorFlow, PyTorch, or Keras can stay in the pip section. With the compilation infrastructure already present in the Conda layer, any fallback source build can find the tools it needs inside the environment.
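
Before running a long install on a cluster, it can help to check from inside the activated environment that the build tools are actually visible. The sketch below looks for cmake, ninja, and a C/C++ compiler on PATH. It also checks the CC and CXX variables first, on the assumption that the conda-forge compiler packages export them on activation. The helper names are hypothetical.

```python
import os
import shutil


def locate(candidates, env_var=None, environ=None, which=shutil.which):
    """Return the path of the first tool that can be found, or None."""
    environ = os.environ if environ is None else environ
    names = list(candidates)
    override = environ.get(env_var, "") if env_var else ""
    if override.strip():
        # CC/CXX may carry flags; only the first word is the executable.
        names.insert(0, override.split()[0])
    for name in names:
        path = which(name)
        if path:
            return path
    return None


def find_build_tools(environ=None, which=shutil.which):
    """Map each required build tool to its path (None when it is missing)."""
    return {
        "cmake": locate(["cmake"], environ=environ, which=which),
        "ninja": locate(["ninja"], environ=environ, which=which),
        "C compiler": locate(["cc", "gcc", "clang"], "CC", environ, which),
        "C++ compiler": locate(["c++", "g++", "clang++"], "CXX", environ, which),
    }


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool:13s} {path or 'MISSING'}")
```

If a path points to /usr/bin rather than into the environment's own bin directory, the build would still depend on the host system's tools.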

Final Working `environment.yml` (Highlights)

Here is an example of a working configuration (note the prioritized conda-forge channel, the build tools listed at the top, and the pinned Python version):

name: dft
channels:
  - conda-forge
  - defaults
dependencies:
  - ninja
  - c-compiler
  - cxx-compiler
  - cmake
  - python=3.11
  - pip
  - pip:
      - tensorflow
      - torch
      - torchvision
      - pytorch-lightning
      - keras
      - qiskit
      - qiskit-aer
      - apscheduler
      - autograd
      - bibtexparser
      - lifelines
      - pylatexenc

(Tip: Continue putting scientific packages like NumPy, SciPy, Phonopy, ASE, and Pymatgen in the Conda layer; Conda usually handles their complex binary dependencies more gracefully than pip).

Key Takeaway

When configuring Conda environments for HPCs and clusters, do not just document your "business logic" packages. Treat your build toolchain as a core dependency. If you don't, the moment a pip package falls back to source compilation, missing cmake, ninja, or gcc will break your entire installation.

Next Steps

With your DFT environment configured, you're ready to:

Run ABINIT calculations and analyze results with AbiPy
Manipulate crystal structures using Pymatgen
Visualize atomistic data with OVITO (installed separately, since it is not part of the environment.yml above)
Apply machine learning to materials discovery
Perform phonon calculations with Phonopy

This environment provides a comprehensive toolkit for computational materials science research, from first-principles calculations to advanced data analysis and visualization.

Conclusion

Having a well-configured, reproducible computational environment is fundamental to modern materials science research. This Conda environment brings together the most essential tools for DFT calculations and analysis, ensuring you have everything needed for productive computational work.

Remember to document any additional packages you install and periodically export your environment to maintain reproducibility across projects and collaborators.