2024-07-18 10:49:35 +02:00

Alphafold

Alphafold contains three parts:

  1. A conda environment containing the dependencies
  2. The alphafold module itself, containing the current code and submission scripts
  3. The database

Database Data

All download scripts work from merlin except the pdb-mmcif script, which uses rsync: the port used by the Alphafold script is blocked by PSI, and the US mirror does not work reliably. An alternative that works:

rsync -rlpt -v -z --info=progress2 --delete rsync.ebi.ac.uk::pub/databases/pdb/data/structures/divided/mmCIF/ $DIR

Tip: Use tmux sessions for the downloads.

Tip: Double-check that users have read permission on the database after copying or downloading it; missing permissions caused errors last time.
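The permission check from the second tip can be scripted. The following is a minimal sketch, assuming that "readable for users" means the "other" permission bits are set (read for files, read and execute for directories); if access is granted via a group or via AFS ACLs instead, the check has to be adapted. The function name `find_unreadable` is hypothetical.

```python
import os
import stat
from typing import List


def _open_to_others(path: str) -> bool:
    """True if 'other' may read the path (and enter it, if it is a directory)."""
    try:
        mode = os.stat(path).st_mode  # follows symlinks
    except OSError:
        return False  # e.g. a broken symlink
    needed = stat.S_IROTH
    if stat.S_ISDIR(mode):
        needed |= stat.S_IXOTH
    return (mode & needed) == needed


def find_unreadable(root: str) -> List[str]:
    """Return root and all paths below it that fail the check, sorted."""
    problems = [] if _open_to_others(root) else [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not _open_to_others(path):
                problems.append(path)
    return sorted(problems)
```

Calling `find_unreadable` on the database directory returns an empty list if everything passes the check; any returned path is a candidate for `chmod`.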

Conda Environment

Alphafold was installed following Spencer's instructions from older installs and the original git repo. Change: there is no central conda env anymore; instead, each version gets its own conda env.

The conda env should be installed from the environment.yml file, which combines conda-forge, bioconda and pip installations. Unfortunately, DeepMind does not provide an environment.yml file for Alphafold so far.

Miniconda is used for the installation. Using the central conda installation might cause problems (openAFS hardlink issues); if the central version is used, make sure to install from an openAFS host (e.g. pmod7). The central conda is also very old and probably needs to be updated.

After creating the env from the yml file, jax and jaxlib need to be installed into the conda env separately; this does not work directly from the environment.yml file (so far). Note that the original git repo currently contains many contradicting statements about the jaxlib versions.
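Because the jax and jaxlib versions are easy to get wrong, it can help to check what actually ended up in the env. The following is a minimal sketch, to be run inside the activated conda env; the function name `report_jax_install` is hypothetical, and the script only reports what is installed without judging which versions are correct.

```python
from importlib import metadata


def report_jax_install() -> dict:
    """Collect the installed jax/jaxlib versions and the devices jax can see."""
    report = {}
    for package in ("jax", "jaxlib"):
        try:
            report[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None  # package is not installed in this env
    try:
        import jax

        report["devices"] = [str(device) for device in jax.devices()]
    except Exception as exc:  # missing package or failing backend initialisation
        report["devices"] = []
        report["error"] = repr(exc)
    return report


for key, value in report_jax_install().items():
    print(f"{key}: {value}")
```

On a GPU node, the devices entry should list GPU devices if jaxlib was installed with matching CUDA support; a list containing only CPU devices suggests a CPU-only jaxlib build or a CUDA mismatch.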

OLD VERSIONS
conda create --name alphafold python==3.8
conda update -n base conda

source miniconda3/etc/profile.d/conda.sh
conda activate alphafold

conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.0.3 pdbfixer==1.7
conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04

pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 \
    dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 \
    numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0 pandas==1.3.4
pip install --upgrade jax jaxlib==0.1.69+cuda111 \
    -f https://storage.googleapis.com/jax-releases/jax_releases.html

NEW VERSION (2.3.2 current state) 

Create the conda env from the environment.yml file, which has the following content:

channels:
  - pytorch
  - conda-forge
  - defaults
  - anaconda
  - bioconda
dependencies:
  - python==3.10
  - pip 
  - openmm==7.7.0
  - cudnn  # Change version if not compatible with current system
  - cudatoolkit
  - pdbfixer
  - hmmer==3.4
  - hhsuite==3.3.0
  - kalign2==2.04
  - pip:
    - absl-py==1.0.0
    - biopython==1.79 
    - chex==0.0.7 
    - dm-haiku==0.0.10 
    - immutabledict==2.0.0 
    - ml-collections==0.1.0  
    - numpy==1.24.3 
    - scipy==1.11.1 
    - tensorflow-cpu==2.13.0 
    - jax==0.4.14 
    - pandas==2.0.3 
    - dm-tree==0.1.8 

Alphafold Code

In the file run_alphafold.py, the default of the flag --use_gpu_relax needs to be set to True. So far this is done manually. It is not clear whether this is really necessary.

Change:

flags.DEFINE_boolean('use_gpu_relax', None, 'Whether to relax on GPU. '

to:

flags.DEFINE_boolean('use_gpu_relax', True, 'Whether to relax on GPU. '
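The manual edit could be automated as part of the build. The following is a minimal sketch, assuming the flag definition in run_alphafold.py looks exactly like the line above; the helper name `patch_use_gpu_relax` is hypothetical, and it refuses to change anything unless it finds exactly one match.

```python
import re

# Matches the start of the flag definition up to the default value None.
_FLAG_PATTERN = re.compile(
    r"(flags\.DEFINE_boolean\(\s*'use_gpu_relax'\s*,\s*)None(\s*,)"
)


def patch_use_gpu_relax(source: str) -> str:
    """Return source with the use_gpu_relax default changed from None to True."""
    patched, count = _FLAG_PATTERN.subn(r"\g<1>True\g<2>", source)
    if count != 1:
        raise ValueError(f"expected exactly one flag definition, found {count}")
    return patched
```

To apply it, read run_alphafold.py with `pathlib.Path.read_text()`, pass the text through `patch_use_gpu_relax`, and write the result back with `write_text()`. As an alternative to patching, absl boolean flags can be set on the command line (`--use_gpu_relax=true`), so the edit may be unnecessary if the submission script passes the flag explicitly.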

Alphafold module

Add the new version to files/variants. The version number should match a GitHub tag (e.g. v2.0.1); otherwise, set $V_RELEASE to the commit hash.

As admin user:

cd MX/alphafold
./build <version>

Testing

Here's an example sequence:

mkdir example
cd example
cat > query.fasta <<EOF
>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
EOF

module use MX unstable
module load alphafold/2.1.1
sbatch alphafold_merlin.sh query.fasta
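Before submitting a longer job, it can be worth checking the query file for obvious mistakes. The following is a minimal sketch, assuming the query should contain only the 20 standard amino acid one-letter codes; the helper names `parse_fasta` and `unexpected_residues` are hypothetical and not part of Alphafold or the submission script.

```python
from typing import List, Set, Tuple

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unexpected_residues(sequence: str) -> Set[str]:
    """Return the characters that are not standard amino acid codes."""
    return set(sequence.upper()) - STANDARD_RESIDUES
```

For the example above, `parse_fasta` applied to the contents of query.fasta yields a single record named dummy_sequence, and `unexpected_residues` returns an empty set for its sequence.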