# Building Data Science Solutions With Anaconda
```bash
conda env create -f environment.yml
```

One of Conda's killer features is that it manages Python itself as a package. You can have one environment running Python 3.8 (for legacy code) and another running Python 3.11 (for newer language features).
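As a quick illustration, the same script can behave correctly under either environment's interpreter by checking the version at runtime. This is a minimal sketch (the environment name is a made-up example); `str.removeprefix` only exists on Python 3.9+, so the legacy environment needs a fallback:

```python
# Minimal sketch: one script running under different per-environment
# Python versions. str.removeprefix() exists only on Python 3.9+.
import sys

name = "env-churn"  # hypothetical environment name
if sys.version_info >= (3, 9):
    short = name.removeprefix("env-")
else:  # fallback for a legacy 3.8 environment
    short = name[len("env-"):] if name.startswith("env-") else name

print(short)  # churn
```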
```bash
conda list --export > conda-requirements.txt

# Or use conda-lock to pin exact binaries
conda install conda-lock
conda-lock -f environment.yml
```

| Practice | Why it matters |
|----------|----------------|
| Use `environment.yml` for everything | No manual `conda install` steps – guarantees reproducibility. |
| Version-lock critical packages | `pandas=2.0.3`, not just `pandas`. |
| Keep data separate from code | Use `data/raw` and `data/processed`; never commit large files. |
| Add a Makefile or shell script | Automate `conda env create`, `conda activate`, `python train.py`. |
| Test with a fresh environment | `conda env create -f environment.yml --prefix ./test_env` to verify. |

## 7. Common Pitfalls & How to Avoid Them

❌ **Mixing pip and conda carelessly** → can lead to broken dependencies. If you need both, install everything you can with conda first, then use pip for the remaining packages.
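To see why exact pins matter in practice, here is a small, hypothetical helper (not part of conda itself) that extracts `package=version` pins from a simple `environment.yml` body. A CI job could run a check like this against your pinned dependencies to catch unpinned packages:

```python
# Hypothetical helper: extract "name=version" pins from a simple
# environment.yml body (plain string parsing; no YAML library needed).
def parse_pins(env_yaml: str) -> dict:
    pins = {}
    for line in env_yaml.splitlines():
        item = line.strip().lstrip("- ").strip()
        if "=" in item:  # unpinned entries like "jupyter" are skipped
            name, _, version = item.partition("=")
            pins[name] = version
    return pins

pins = parse_pins("""
dependencies:
  - python=3.10
  - pandas=2.0.3
  - jupyter
""")
print(pins)  # {'python': '3.10', 'pandas': '2.0.3'}
```

Here `jupyter` shows up as unpinned, exactly the kind of entry the table above warns about.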
```bash
conda install -c conda-forge xgboost
```

Let's walk through a minimal but realistic project: a **customer churn prediction pipeline**.

Folder structure:

```
churn-solution/
├── environment.yml
├── data/
│   └── raw/
├── notebooks/
│   └── 01_eda.ipynb
├── src/
│   ├── preprocess.py
│   ├── train.py
│   └── predict.py
└── README.md
```

Step 1 – `environment.yml`:

```yaml
name: churn-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pandas=2.0
  - scikit-learn=1.3
  - matplotlib=3.7
  - seaborn=0.12
  - jupyter
  - pip
  - pip:
      - imbalanced-learn  # from PyPI if not in conda
```

Step 2 – EDA in Jupyter. Launch Jupyter from within the activated environment.
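Before the notebook work, `src/preprocess.py` turns raw churn records into model-ready features. A minimal sketch, assuming illustrative column names (`plan`, `monthly_fee`, `churn`) rather than any particular dataset:

```python
# Hypothetical sketch of src/preprocess.py: encode the target and
# one-hot encode categoricals. Column names are assumptions.
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["churn"] = (df["churn"] == "yes").astype(int)      # target to 0/1
    df = pd.get_dummies(df, columns=["plan"], dtype=int)  # one-hot categorical
    return df

raw = pd.DataFrame({
    "plan": ["basic", "pro"],
    "monthly_fee": [10, 30],
    "churn": ["yes", "no"],
})
processed = preprocess(raw)
print(list(processed.columns))  # ['monthly_fee', 'churn', 'plan_basic', 'plan_pro']
```

In the real pipeline this would read from `data/raw/` and write to `data/processed/`, keeping data and code separate as recommended above.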
```bash
conda search pandas
```

Packages can also be installed from alternative channels such as conda-forge, which often carries newer builds than the defaults channel.
```bash
jupyter notebook
```

Your notebook automatically uses the correct kernel.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib

# Load the raw data and separate features from the target
df = pd.read_csv("data/raw/churn.csv")
X = df.drop("churn", axis=1)
y = df["churn"]

# Train a baseline model and persist it for predict.py
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, "model.joblib")
```
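The counterpart, `src/predict.py`, only needs to reload the persisted model and score new records. Here is a self-contained sketch of that joblib save/load cycle, using synthetic data so it runs standalone (it assumes train.py saved the model with `joblib.dump`; in the real pipeline the features come from `data/raw/churn.csv`):

```python
# Sketch of the save/load cycle behind src/predict.py, on synthetic data.
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0, 1], [1, 0], [0, 0], [1, 1]] * 5)
y = np.array([0, 1, 0, 1] * 5)          # label equals the first feature
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
joblib.dump(model, "model.joblib")      # what train.py would do

loaded = joblib.load("model.joblib")    # what predict.py would do
print(loaded.predict([[1, 0], [0, 1]]).tolist())  # [1, 0]
```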
```bash
conda create -n project-name python=3.10
conda activate project-name
conda install jupyter pandas scikit-learn matplotlib
```

Then commit your `environment.yml` alongside your code. Your future self, and your team, will thank you.

Next steps: explore `conda build` for packaging your own libraries, or `anaconda-project` for automating multi-step workflows. The foundation you build with Anaconda today enables the production-grade solutions of tomorrow.
Start every new data science project with its own dedicated environment.
```bash
# TensorFlow with GPU support
conda install tensorflow-gpu cudatoolkit cudnn

# PyTorch with GPU support
conda install pytorch torchvision torchaudio cudatoolkit=11.7 -c pytorch
```

```bash
conda env export > environment.yml
```

This YAML file can be shared or version-controlled. A collaborator recreates the exact environment with:

```bash
conda env create -f environment.yml
```
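For reference, an exported file looks roughly like this. This is a trimmed, illustrative sketch: the actual package list, versions, and `prefix` path depend entirely on the machine that ran the export:

```yaml
name: project-name
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.10
  - pytorch=2.0.1
  - cudatoolkit=11.7
prefix: /home/user/miniconda3/envs/project-name
```

Because `conda env export` records exact builds for the exporting platform, teams often maintain a hand-written `environment.yml` with loose pins alongside a lock file for exact reproduction.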