Software for scientific computation

This part lists gotchas and configuration notes for the software on the cluster. Most of these are domain-specific computation tools rather than system-wide tools.

Intel Parallel Studio XE

Since the cluster is in an air-gapped environment and rpm downloading does not seem to work well through the http and ftp proxies, one can instead run Intel's online installer on another computer with web access to download all the rpms, then rsync the resulting tar to the cluster to finish the installation. I also prefer intelpython as the favored version of Python, since it is very fast.

Warning: remember to include the cluster helper for the MKL library, which is omitted by default.

Pay special attention when MKL is upgraded to a new update by Parallel Studio XE. There are duplicated copies of the libraries, and much of the versioning is maintained through softlinks. In particular, the default module file generated by spack puts the intelpython3 lib path first in LD_LIBRARY_PATH, which is not what we want: intelpython3 may be an old version and contain softlinks to old versions of libraries such as MKL. So we need to hack spack's modules.yaml to add a prepended path by hand. Luckily, the customized prepend is by default added at the end, which overrides the default order and puts the Python lib last in the search order. This way we can reliably use the newer version of the MKL dynamic library at runtime.
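A sketch of what the modules.yaml hack could look like. The exact keys follow spack's modules configuration schema of that era, and the package name and MKL path below are placeholders, not the real cluster values:

```yaml
# ~/.spack/modules.yaml (sketch; package name and path are examples)
modules:
  tcl:
    intel-parallel-studio:
      environment:
        prepend_path:
          # spack appends this customized prepend after the default
          # entries, so it wins over the intelpython3 lib path
          LD_LIBRARY_PATH: /opt/intel/mkl/lib/intel64
```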

Memo: I have changed several softlinks in the intelpython3 lib to update 4, namely libmkl_intel_ilp64, libmkl_rt and libmkl_core, via ln -fs target link. Somehow Python does not follow the usual Linux .so search order but prefers the .so files in its own lib folder (I guess).
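For the record, the ln -fs step can be done (and verified) from Python as well. This is a minimal sketch run against a throwaway directory with fake library files; the real links would live in the intelpython3 lib folder:

```python
import os
import tempfile

def repoint(link, target):
    """Equivalent of `ln -fs target link`: replace a symlink atomically."""
    tmp = link + ".tmp"
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename over the old link, atomic on POSIX

# Demo with fake files standing in for the MKL update-3/update-4 libraries.
with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "libmkl_rt.so.update3")
    new = os.path.join(d, "libmkl_rt.so.update4")
    link = os.path.join(d, "libmkl_rt.so")
    open(old, "w").close()
    open(new, "w").close()
    os.symlink(old, link)          # link initially points at update3
    repoint(link, new)             # repoint it at update4
    ok = os.readlink(link).endswith("update4")
print(ok)  # → True
```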

PETSc + SLEPc

Configure note for PETSc: don't set the compilers explicitly when --with-mpi-dir is given.

Installed externally to spack.

export PETSC_DIR= &&export PETSC_ARCH=

./configure  --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-metis --with-blaslapack-dir=/opt/intel/mkl CFLAGS=-fPIC CXXFLAGS=-fPIC FFLAGS=-fPIC FCFLAGS=-fPIC F90FLAGS=-fPIC F77FLAGS=-fPIC --with-debugging=0 --with-mpi-dir=/opt/intel/impi/2019.3.199/intel64 --with-cxx-dialect=C++11

For the sinvert project: cmake -DMACHINE=linux -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc ../src. Without the explicit compilers it ends in a cmake error where the default openmpi gets picked up.

JAX

Tried in a conda virtual env.

Install script for GPU-enabled jax:

# install jaxlib
PIP=~/.conda/envs/newtest/bin/pip
PYTHON_VERSION=cp36  # alternatives: cp27, cp35, cp36, cp37
CUDA_VERSION=cuda100  # alternatives: cuda90, cuda92, cuda100
PLATFORM=linux_x86_64  # alternatives: linux_x86_64
BASE_URL='https://storage.googleapis.com/jax-releases'
$PIP install --upgrade $BASE_URL/$CUDA_VERSION/jaxlib-0.1.23-$PYTHON_VERSION-none-$PLATFORM.whl

$PIP install --upgrade jax  # install jax

Somehow jaxlib currently cannot find CUDA through the usual environment variable CUDA_HOME. Instead, to use the GPU, we must set the following environment variable by hand: export XLA_FLAGS=--xla_gpu_cuda_data_dir=/home/ubuntu/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.4.0/cuda-10.0.130-ihth6nd2vvikwyej5mufpke2sj2nhboj, where the right-hand side is the CUDA root path installed by spack.
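The same flag can be set from Python instead of the shell, as long as it happens before jax is imported. A minimal sketch using the spack CUDA path quoted above (substitute your own hash/path):

```python
import os

# CUDA root installed by spack (taken from the text above; adjust to yours).
cuda_root = ("/home/ubuntu/spack/opt/spack/linux-ubuntu18.04-x86_64/"
             "gcc-7.4.0/cuda-10.0.130-ihth6nd2vvikwyej5mufpke2sj2nhboj")

# Must be set BEFORE `import jax`, or XLA will not see the flag.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + cuda_root

# import jax  # safe to import now; jax should find the CUDA toolkit
print(os.environ["XLA_FLAGS"].split("=", 1)[0])  # → --xla_gpu_cuda_data_dir
```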

pytorch

Check GPU devices; see this Stack Overflow answer.

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device at 0x7efce0b03be0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'GeForce GTX 950M'

In [6]: torch.cuda.is_available()
Out[6]: True
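A small portable follow-up to the transcript above: selecting a device string without assuming a GPU (or even torch itself) is installed. The fallback logic is my own convention, not anything torch mandates:

```python
import importlib.util

# Pick a device defensively: CPU when torch is absent or sees no GPU.
if importlib.util.find_spec("torch") is not None:
    import torch
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
else:
    device = "cpu"  # torch not installed on this machine at all
print(device)
```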

tensorflow

WIP: check whether tensorflow-gpu is workable on a CPU-only device. Answer: see Stack Overflow; it seems the tensorflow-gpu binary can only be imported with GPU drivers installed, let alone preloaded cuda and cudnn. In contrast, GPU-enabled torch requires none of them to be present. It is not good for tf to be in this state, and installing GPU drivers on non-GPU nodes is an unacceptable solution for me, this workaround is so ugly!!!

Update: as of tensorflow 2.0, GPU support is already included in the usual pip release, and that pip version runs on CPU-only machines too. Therefore, to enable tf 2.0+ with GPU support, just conda install the corresponding tensorflow binary, then spack load cuda@10.1 and spack load cudnn@7.6, and you are all set.
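A quick check that works on both GPU and CPU-only nodes. Note the device-listing API is the one from TF 2.1+ (in TF 2.0 it lived under tf.config.experimental), and the import guard is just a convenience for machines without tensorflow:

```python
import importlib.util

# The plain `tensorflow` wheel (2.x) imports fine on CPU-only machines,
# unlike the old tensorflow-gpu wheel, so this probe is safe everywhere.
if importlib.util.find_spec("tensorflow") is not None:
    import tensorflow as tf
    n_gpus = len(tf.config.list_physical_devices("GPU"))
else:
    n_gpus = 0  # tensorflow not installed here at all
print("GPUs visible:", n_gpus)
```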

spark

Installed via spack as described in the VM part.

TODO: add SPARK_HOME to spark module file.

Only tested in a user conda env: pip install findspark, to call a Spark context from Python more smoothly.

dask

Only tested in a user conda env.

conda install dask. Pin the tornado version in a pinned file in the conda-meta dir of the conda virtual environment. Otherwise dask would upgrade tornado, which breaks jupyter notebook! See this issue for the jupyter notebook breakage.
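The pinning step above amounts to appending one line to the env's conda-meta/pinned file, which conda consults before changing packages. A minimal sketch, demoed against a throwaway directory instead of a real env; the "tornado <6" version bound is an assumption, adjust to whatever your jupyter needs:

```python
import os
import tempfile

def pin_package(env_prefix, spec):
    """Append a version pin to <env>/conda-meta/pinned.

    conda reads this file and refuses to move pinned packages
    during `conda install` / `conda update`.
    """
    pinned = os.path.join(env_prefix, "conda-meta", "pinned")
    os.makedirs(os.path.dirname(pinned), exist_ok=True)
    with open(pinned, "a") as f:
        f.write(spec + "\n")
    return pinned

# Demo in a scratch dir standing in for the conda env prefix.
with tempfile.TemporaryDirectory() as env:
    path = pin_package(env, "tornado <6")  # version bound is an assumption
    with open(path) as f:
        content = f.read().strip()
print(content)  # → tornado <6
```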

bazel

Follow this to add the bazel repo into apt. The packaged version is too high for building tensorflow; the version window that can build tf is RIDICULOUSLY narrow.

Qiskit

pip install qiskit[visualization] in a new conda env. It pulls in around 100 third-party packages, so don't try installing it in your main env, some of the packages there may break. With the visualization option, jupyter is installed automatically. And spack load intel-parallel-studio%intel may be necessary before pip install, otherwise qiskit may complain about BLAS not being found (though that shouldn't be the case). One may meet all kinds of errors during the installation; just remove the env, create a new one, and pip install again. There is no intrinsic error in the installation, but something may go wrong here and there...

The default Python version may be 3.6.0, which has incomplete typing support and may cause errors when using jupyter. Make sure the Python in such a conda env is no older than 3.6.8!
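The version check can be made explicit at the top of a notebook or script; this guard is my own convenience, not part of qiskit:

```python
import sys

# Guard against the incomplete-typing issue above: refuse pythons < 3.6.8.
MIN_VERSION = (3, 6, 8)
if sys.version_info[:3] < MIN_VERSION:
    raise RuntimeError("need python >= %d.%d.%d" % MIN_VERSION)
print("python ok:", sys.version.split()[0])
```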
