Software for scientific computing
In this part, software gotchas and configurations are listed. These packages are mostly specific computation tools rather than system-wide tools.
Intel Parallel Studio XE
Since the cluster is in an air-gapped environment and the rpm download does not work well through http and ftp proxies, one can download all the rpms on another computer with web access using the online installer provided by Intel, then rsync the resulting tar to the cluster to finish the installation. I prefer intelpython as the favored version of python since it is very fast.
Warn: remember to include the cluster helper for the MKL library, which is omitted by default.
Pay special attention when MKL is upgraded to a new update by Parallel Studio XE. There is some duplication of libraries, and many versioned files are maintained via softlinks. In particular, the default module file generated by spack puts the intelpython3 lib path first in LD_LIBRARY_PATH, which is not what we want, since intelpython3 may be an old version and holds softlinks to old versions of libraries such as MKL. So we need to hack spack's modules.yaml to add prepend paths by hand. Luckily, the customized prepend is by default added at the end, which overrides the default order and puts the python lib last in the search order. By doing this, we can reliably use the newer version of the MKL dynamic library at runtime.
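The modules.yaml hack can look like the sketch below; the package name, module system, and MKL path are illustrative, and the exact schema depends on the spack version in use:

```yaml
modules:
  tcl:
    intel-parallel-studio:
      environment:
        prepend_path:
          # Make sure the freshly-updated MKL lib dir wins over the older
          # softlinks bundled inside intelpython3 (path is an example only).
          LD_LIBRARY_PATH: "/opt/intel/compilers_and_libraries_2019.5/linux/mkl/lib/intel64"
```

After regenerating the modules, check the resulting module file to confirm this prepend lands after (i.e. ahead of, in search order) the intelpython3 entry.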
Memo: I have changed several softlinks in the intelpython3 lib to update4; they are libmkl_intel_ilp64, libmkl_rt and libmkl_core (ln -fs target link).
Somehow, python does not follow the usual Linux .so search order but prefers the .so files in its own lib folder (I guess).
PETSc+SLEPc
Configure for petsc: don't set the compiler when --with-mpi-dir is set.
Installed externally to spack.
export PETSC_DIR= && export PETSC_ARCH=
For the sinvert project: cmake -DMACHINE=linux -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc ../src
. Without the explicit compilers it ends in a cmake error where the default openmpi has been picked up.
Jax
Tried in a conda virtual env.
Install script for GPU-enabled jax
Somehow the current jaxlib cannot find cuda through the default environment variable CUDA_HOME. Instead, to use the GPU, we should set the following environment variable by hand:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/home/ubuntu/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.4.0/cuda-10.0.130-ihth6nd2vvikwyej5mufpke2sj2nhboj
where the right-hand side is the cuda root path from spack.
pytorch
Check gpu devices; see so.
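A minimal device check, assuming torch is importable; this mirrors the usual StackOverflow answer rather than anything cluster-specific:

```python
import torch

# Report whether CUDA is usable and enumerate visible GPUs; on a cpu-only
# node (or with a cpu-only build) this prints a message instead of raising.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"cuda:{i} ->", torch.cuda.get_device_name(i))
else:
    print("no CUDA device visible; running on CPU")
```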
tensorflow
WIP: shall check whether tensorflow-gpu is workable on a cpu-only device. Answer: see so; it seems the tensorflow-gpu binary can only be imported with gpu drivers installed, let alone pre-loaded cuda and cudnn. In contrast, gpu-enabled torch requires none of them to be present. It is not good for tf to be in this state, and installing gpu drivers on non-gpu nodes is an unacceptable solution for me; this workaround is so ugly!!!
Update: for tensorflow 2.0, GPU support is already included in the usual pip release, and that pip version also runs on cpu-only machines. Therefore, to enable tf2.0+ with GPU support, just conda install the corresponding tensorflow binary, then spack load cuda@10.1 and spack load cudnn@7.6, and you are all set.
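To confirm this, the stock device query works on both gpu and cpu-only machines, assuming tensorflow 2.x is importable (on exactly 2.0 the same call lives under tf.config.experimental):

```python
import tensorflow as tf

# An empty GPU list on a cpu-only node is normal; the import itself no
# longer requires gpu drivers in the 2.x pip release.
print("GPUs:", tf.config.list_physical_devices("GPU"))
print("CPUs:", tf.config.list_physical_devices("CPU"))
```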
spark
spack install, as described in the VM part.
TODO: add SPARK_HOME to the spark module file.
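Assuming spack generates tcl modules here, the addition would be a single line; the prefix below is illustrative, use the output of `spack location -i spark` for the real one:

```tcl
# Illustrative prefix; substitute the real spack install prefix.
setenv SPARK_HOME /path/to/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.4.0/spark-2.4.4-xxxxxxx
```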
Only tested in a user conda env: pip install findspark
, to call the spark context from python more smoothly.
dask
Only tested in a user conda env: conda install dask
. Fix the tornado version in a pinned file in the conda-meta dir of the conda virtual environment. Otherwise, dask would upgrade tornado, which breaks jupyter notebook! See this issue for the jupyter notebook breakdown.
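The pin itself is a one-line MatchSpec in the env's conda-meta/pinned file. The bound below is only an example; pin to whatever tornado version your jupyter still works with:

```
tornado <6
```

conda consults this file on every install/update in that env, so dask can no longer drag tornado forward.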
bazel
Follow this to add the bazel repo into apt. The version there is too high for building tensorflow; the version window available to build tf is RIDICULOUSLY narrow.
Qiskit
pip install qiskit[visualization]
in a new conda env: it requires around 100 third-party packages, so don't try installing it in your main env, since some of those packages may break things. With the visualization option, jupyter is installed automatically. And spack load intel-parallel-studio%intel
may be necessary before pip install; otherwise qiskit may complain about blas not being found (though that shouldn't be the case). One may also meet all kinds of errors during installation; just remove the env, create a new one, and pip install again. There are no intrinsic errors in the installation, but something may go wrong here and there...
The default python version may be 3.6.0, which has incomplete typing support and may induce errors when using jupyter. Make sure python in such a conda env is no older than 3.6.8!
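A quick guard to drop at the top of a notebook or script; the 3.6.8 floor is the one mentioned above:

```python
import sys

# Fail fast if the interpreter is older than 3.6.8, whose typing-module
# fixes are needed by jupyter/qiskit in this setup.
assert sys.version_info >= (3, 6, 8), (
    "python %s is too old; need >= 3.6.8" % sys.version.split()[0]
)
print(sys.version.split()[0])
```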