Guide to IASTU-HPC2
Container


Containers in general

Cgroup

  • Check the supported cgroup subsystems: cat /proc/cgroups

  • Check which cgroups a process belongs to: cat /proc/777/cgroup (777 being the PID of interest)

  • The cgroup tools seem not to be well integrated on Ubuntu; if you want the configuration to be applied automatically at boot, you need to add the systemd services (for cgconfig/cgrulesengd) by hand.

  • When using cpuset.cpus, cpuset.mems must also be specified for the limit to take effect; its value is the NUMA node number that the bound CPUs belong to. See the sketch after this list.

  • My experience: if you want to apply a cgroup policy to existing processes, especially services, you need to restart those services for the policy to take effect.

  • In cgrules.conf, a rule can be applied to a whole Unix group with the @group syntax (see the sketch after this list).

  • By some testing, the default value of cpu.shares is around 1000 (the documented default is 1024).

  • Possible issue: conflict between an external cgroup policy on users and Slurm's own cgroup usage. Symptom: job submission fails on the cgroup-enabled node with slurmstepd-master: error: Failed to invoke task plugins: task_p_pre_launch error. More specifically, the error comes from the external cpuset subsystem: the CPU-core binding applied outside of Slurm is unknown to slurmctld, so the cores assigned by sbatch fail to launch. Possible solutions: 1) the easy way (which I took): use the cpu subsystem instead of cpuset to limit user CPU usage in the external cgroup; 2) the hard way (which should in principle work): declare the CPU binding in slurm.conf so that Slurm never assigns the externally reserved cores. This is not a big issue in general, since it is rare for slurmd (usually only on compute nodes) and an external cgroup policy (usually only on login nodes) to exist on the same machine.
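As a rough illustration of the points above, here is a minimal libcgroup (cgroup v1) sketch for limiting login-node users; the group name limitusers, the Unix group students, and the CPU/NUMA numbers are placeholders, not the cluster's actual configuration.

# /etc/cgconfig.conf -- parsed by cgconfigparser (add a systemd unit on Ubuntu to run it at boot)
group limitusers {
    cpu {
        # relative CPU weight; an unrestricted group defaults to 1024
        cpu.shares = 512;
    }
    cpuset {
        # cpuset.mems must accompany cpuset.cpus; 0 is the NUMA node of cores 0-15 here
        # on nodes that also run slurmd, drop this block and use only the cpu controller
        cpuset.cpus = "0-15";
        cpuset.mems = "0";
    }
}

# /etc/cgrules.conf -- read by cgrulesengd; the leading @ selects a whole Unix group
@students       cpu,cpuset      limitusers/

# load the configuration now and move an already running process (PID 777) into the group
cgconfigparser -l /etc/cgconfig.conf
cgclassify -g cpu,cpuset:limitusers 777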

Singularity

A container runtime suitable for HPC.

Process - Singularity - Docker - VM: the order of increasing resource isolation. In practice, Singularity is more like AppImage.

Singularity caches image layers in ~/.singularity, so by default they cannot be shared across users. The cache location can be changed via the environment variable $SINGULARITY_CACHEDIR.
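A small sketch of relocating the cache so that layers can be reused across users; /opt/singularity-cache is a placeholder path and must be writable by the intended users.

# e.g. in /etc/profile.d/singularity.sh (placeholder path)
export SINGULARITY_CACHEDIR=/opt/singularity-cache
export SINGULARITY_TMPDIR=/tmp/$USER
# later pulls reuse layers already present in the shared cache
singularity pull docker://ubuntu:18.04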

basic commands

  • singularity exec container.img cat /etc/os-release. The --contain flag gives better separation (e.g. locally installed Python packages in the home directory are not passed in); --cleanenv strips the host environment variables; --containall gives the strictest namespace separation; --bind /host:/con bind-mounts a host directory into the container. See the example after this list.

  • singularity inspect -l --json ubuntu.img

  • singularity shell centos7.img; if --writable is added, the image may be modified.

  • singularity pull docker://..., which not only pulls the Docker image but also converts it to a SIF-format image.
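For example, a typical session might look like the following (the lolcow test image and the /data bind path are just illustrations):

# pull a small test image from Docker Hub and convert it to SIF
singularity pull lolcow.sif docker://godlovedc/lolcow
# run a command inside it with a clean environment, binding a host directory
singularity exec --cleanenv --bind /data:/data lolcow.sif cat /etc/os-release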

build image

The direct way: build a writable sandbox and modify it from inside the container:

sudo singularity build --sandbox ubuntu/ library://ubuntu

sudo singularity exec --writable ubuntu/ touch /foo

sudo singularity build new.sif ubuntu/

definition files (recommended)

BootStrap: library
From: ubuntu:16.04

%post
    apt-get -y update
    apt-get -y install fortune cowsay lolcat

%environment
    export LC_ALL=C
    export PATH=/usr/games:$PATH

%runscript
    fortune | cowsay | lolcat

%labels
    Author GodloveD

sudo singularity build lolcow.sif lolcow.def
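After the build, the image can be used directly; the %runscript section defines what singularity run executes.

# execute the %runscript (a SIF file is also directly executable as ./lolcow.sif)
singularity run lolcow.sif
# or run an arbitrary command inside the image
singularity exec lolcow.sif cowsay hello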

singularity hub

common images

Clear Linux family?

Integration with Kubernetes

k8s

As of now, there seems to be no open-source solution that integrates Kubernetes and Slurm well in an HPC setup. You can run Slurm with containers or Kubernetes with MPI, but you have to choose a single workload manager.

Possible routes: Sylabs' Singularity-Kubernetes integration and the wlm-operator, which bridges Kubernetes jobs to a Slurm cluster (see the references below).

Helm
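Helm is the package manager for Kubernetes; if the k8s route were taken, services would be deployed as charts roughly as below (the repository URL, chart name, and release name are placeholders).

# register a chart repository and deploy a release from it
helm repo add examplerepo https://example.org/charts
helm repo update
helm install my-service examplerepo/some-chart
# inspect and remove the release
helm status my-service
helm uninstall my-service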

Docker vs Singularity vs Shifter in HPC: Singularity has no root daemon, less namespace separation, automounts the home directory, shares the host network by default, supports MPI, passes environment variables into the container, and keeps the same user inside and outside, so a container behaves more like a local process than a VM. In general it is much easier to use than Docker.

Singularity can run containers as background services using the singularity instance commands (in this workflow Singularity is used more like Docker). See the example below.
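A short example of the instance workflow (the instance name web is arbitrary):

# start a named background instance from an image
singularity instance start lolcow.sif web
# list running instances and talk to one of them
singularity instance list
singularity exec instance://web ps aux
# stop it
singularity instance stop web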

.sif is the standard image format for Singularity 3+. Building an image requires root access, which is not available on the HPC cluster; instead, one can use the Sylabs remote builder service.
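A sketch of the remote-build workflow; it assumes an account on the Sylabs cloud builder and an access token obtained from it.

# authenticate against the remote build service once
singularity remote login
# build the definition file on the remote builder instead of locally
singularity build --remote lolcow.sif lolcow.def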

The definition file above is a demo; detailed syntax for def files can be found in the Singularity documentation.

reference:

  • Detailed list and benchmark comparison on resource managers
  • Container meets HPC
  • k8s, HPC, Slurm and MPI
  • cgroup limits on a per-user basis on Ubuntu
  • cgroup on Ubuntu 18.04
  • cpuset on NUMA architecture
  • introduction to cgroup at the filesystem level (series)
  • syntax of cgrules
  • cpu subsystem parameters and default values
  • Singularity site and image library
  • Singularity manual from the HPC user's perspective
  • https://www.sylabs.io/2019/04/the-singularity-kubernetes-integration-from-a-deep-learning-use-case-to-the-technical-specifics/
  • k8s meets HPC (arXiv)
  • https://github.com/sylabs/wlm-operator