Real Setup
This section reviews the actual setup of the cluster.
hardware
switch
Ports 23 and 24 are assigned to VLAN2 on the S1720-28GWR-4P.
AP
The reserved IP is 192.168.1.250/24. DHCP is turned off and the device sits in AP mode. When connecting to it, the IP is assigned by the master node's DHCP service in the 48 subnet.
master server
LAN on the upper RJ45 port (10/24) and WAN on the lower RJ45 port (44/24).
The master node is currently set with a static IP on the WAN side.
Hard disks (positions may not be accurate now): lower left for the 2.5" SSD, lower right for sdb (the old 2T HDD), upper left for sdc (the new 3.5" 2T HDD).
computation server
The leftmost RJ45 port is used (not the one for iDRAC). The NIC name is eno1 in the Ubuntu OS.
outdated server
Note: d1 is not ready to be opened to users; it is currently offline.
software
basic on master
First use fdisk to create one partition on each of sdb and sdc, then use mkfs.ext4 /dev/sdb1 to format the two disks. Mount the two 2T HDDs /dev/sdb1 and /dev/sdc1 to /DATA and /BACKUP, where /DATA has permissions similar to /tmp and is shared across NFS, namely chmod a+w /DATA and chmod a+t /DATA. Meanwhile, /BACKUP is writable only by root. There are several root crontab backup tasks, managed by rsync -az, from /home/ubuntu and /opt to /BACKUP. Besides, /etc/fstab is configured such that /DATA and /BACKUP are mounted automatically at reboot.
The backup crontab and the fstab mount config have not been included in the ansible workflow due to flexibility considerations.
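For reference, a hedged sketch of what the fstab entries and a root crontab backup line might look like (device names and the schedule are illustrative, not the actual config):
# /etc/fstab (illustrative)
/dev/sdb1  /DATA    ext4  defaults  0  2
/dev/sdc1  /BACKUP  ext4  defaults  0  2
# root crontab (illustrative schedule: daily at 3am)
0 3 * * * rsync -az /home/ubuntu /opt /BACKUP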
Note that the netplan config logic is merging instead of overwriting, so one must add dhcp4: false to config.yaml to make sure no DHCP is used.
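A minimal netplan sketch with DHCP disabled (interface name and address are hypothetical):
# /etc/netplan/config.yaml (illustrative)
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
      addresses: [192.168.48.1/24]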
swap partition
Temporary way to utilize the empty swap partitions: mount them locally under the /tmp/extra directory on c4 to c9.
ansible -i hosts cn[3:8] -m filesystem -a "dev=/dev/sda3 fstype=ext4 force=yes" --become -K
ansible -i hosts cn[3:8] -m file -a "path=/tmp/extra state=directory mode=01777" --become -K
ansible -i hosts cn[3:8] -m mount -a "path=/tmp/extra src=/dev/sda3 fstype=ext4 state=mounted" --become -K
Note how cn[3:8] corresponds to c4 to c9.
ansible
sudo apt install ansible
on the master node.
Test command: ansible-playbook site_test.yml -i hosts_test -vv
, remembering to change the group role in hosts_test; this test will be conducted on a remote VM server.
gpu drivers
driver-418 seems to have vanished from the PPA; install 430 instead on c9.
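A minimal sketch, assuming the driver comes from the graphics-drivers PPA (which may already be configured):
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-430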
spack
into ansible workflow
some spack things to note
python
into ansible workflow and Intel Parallel Studio installation; always use Intel Python and its conda for users
Preferred way: Intel Python + spack pip: spack load intel-parallel-studio, spack load py-setuptools, spack load py-pip. Then Intel Python + conda to create environments.
Never use the admin account's global pip. Reason: the packages would be installed in ~/.pip. However, if spack-pip later installs some packages, a dependency is automatically reused if it is already in ~/.pip, but this folder is not accessible by other users, which may lead to chaos with Python packages. For normal users, however, global pip3 is the recommended way to install packages.
jupyter
Use Intel Python and pip as root: pip install jupyter ipyparallel jupyter_contrib_nbextensions. Somehow the cluster tab works after several trials; don't know the exact solution though.
mathematica
Installed by the bash script under /opt/mathematica/verno, and the script is in the bin subdirectory of that path. Add it as a package in the spack override repo, and spack load mathematica to use it.
Possible issue: the activation needs to be carried out per user per node. Maybe have a look at MathLM (the Wolfram license manager) in the future. A script has already been written to activate all nodes at once.
To utilize remote kernel launching with a better interface, add tunnel.m at Kernels/packages. Besides, one should also add tunnel.sh and tunnel_sub.sh under ~/.Mathematica/FrontEnd in their home directory.
One-liner to make it usable for a given user: ansible -i /home/ubuntu/hpc/hosts all -m command -a "/usr/bin/python3 /home/ubuntu/softwares/mmaact/automma.py '/opt/mathematica/11.0.1/bin/math'" --become-user=<user> -Kvv --become
matlab
Mount the ISO to a directory, and use ssh -X to install via X11 forwarding. Remember to umount and then mount iso2.
It is worth noting that the matlab installer supports -X forwarding, while for matlab itself only ssh -Y works for the remote desktop scheme.
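A hedged sketch of the ISO handling (file names are hypothetical):
sudo mkdir -p /mnt/matlab
sudo mount -o loop matlab_dvd1.iso /mnt/matlab
# run the installer over ssh -X, then swap discs:
sudo umount /mnt/matlab
sudo mount -o loop matlab_dvd2.iso /mnt/matlab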
singularity
spack install singularity, and remember to check spack edit singularity: there is a post-install warning prompting you to run a script which changes the permissions of some files to set the s bit. This is crucial for singularity to be run by normal users.
ganglia
into ansible workflow
apt-get install ganglia-monitor ganglia-monitor-python gmetad ganglia-webfrontend
ganglia-monitor is the client-side gmond.
On clients, only ganglia-monitor and ganglia-monitor-python should be installed. The second one is necessary since it provides the modules that watch node metrics.
The ganglia configuration and installation workflow has been merged into the ansible playbooks.
ELK
into ansible workflow
add elastic repo and key
apt install elasticsearch; ES binds to localhost instead of the master address
apt install kibana and configure nginx reverse proxy
apt install logstash and configure the pipe; note the IP binding in the beats input config (the string IP needs "" quotes)
apt install filebeat (on all nodes)
sudo filebeat setup --template -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'
sudo filebeat setup -e -E output.logstash.enabled=false -E output.elasticsearch.hosts=['localhost:9200'] -E setup.kibana.host=localhost:5601
As a summary of the above points, these two setup commands are enough to init filebeat.
The -E flag is for temporarily pointing the output at the ES database in order to write in some templates and pipelines, since filebeat itself is configured to output to logstash. Note that filebeat setup needs this temporary link to the ES database; it is a must.
work test
curl -X GET "localhost:9200/_cat/indices?v"
disable logstash ssl (it seems to be disabled by default)
edit the index of the logstash output (add beat.hostname to the index name)
Detailed explanation of the timestamp mismatch for the system module of filebeat: there are two types of log files. The first carries time zone information or directly claims UTC timestamps, so the filebeat parser can confidently write the data into ES with UTC timestamps. That works because ES always accepts UTC timestamps and @timestamp is time-zone agnostic. The reason the timestamps look right in kibana is that kibana's default setting renders timestamps in the timezone defined by the browser, i.e. the local OS.
Coming back to the second type of log files, like syslog and authlog in ubuntu: they carry timestamps but give no indication of whether those timestamps are UTC or localtime. Actually rsyslog can run on UTC even if the OS is set to some other timezone, and this can persist until the rsyslog service is restarted. So to parse these logs and write UTC timestamps to ES, filebeat must have a way to specify whether the literal time strings in syslog should be converted by some timezone before writing to ES. This is in principle configured by /etc/filebeat/modules.d/system.yml: there is a variable in that file called var.convert_timezone; set it to true (the default seems to be false) and, in principle, you now get the correct time view in kibana.
But reconfiguring filebeat turns out not to be that easy. There are two totally different cases. In the first, the output of filebeat is directly some ES instance, which seems to be the default case supported by the docs. Here one should first stop the filebeat service, delete all previous pipelines by curl, and then start the filebeat service again; everything should be fine now, easy. In this case, restarting filebeat automatically generates new pipelines in ES, which you don't need to care about.
The second use case makes the output of filebeat go to logstash. I don't think this setup is very meaningful nowadays, since filebeat seems quite powerful by itself. However, if you insist on this approach and try to fix the time zone problem, it gets a little subtle. A simple change to system.yml won't work, though that may be a bug rather than a feature. There are several differences from the direct-ES case. Here the pipelines in ES cannot be auto-generated when filebeat starts, which is fair since filebeat has no chance to communicate with ES directly at normal runtime. So after stopping the filebeat service and deleting all previous pipelines in ES, you need to generate the pipelines by hand using the filebeat setup tool as indicated by the bash block above. Apart from setup --pipelines, setup -e is also suggested just in case: setup -e configures both the index template in ES and the dashboards in Kibana, while setup --pipelines, as the option name shows, adds the pipelines to ES. One can restart filebeat after these two commands. The extra -E flags on these commands are temporary configuration at setup time which overwrites the defaults in filebeat.yml; this is necessary because filebeat must connect to ES directly at setup time to write the pipelines and index templates back to ES. It is also worth noting the -M options for generating pipelines: it turns out that hacking convert_timezone in the yml files under modules.d doesn't work for logstash output. Instead, you MUST specify those variables with explicit -M options when generating the pipelines. -E is for configuration overwrite while -M is for module configuration overwrite.
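A hedged sketch of generating the pipelines by hand with explicit -M overrides (filebeat 6.x system module; adjust the module list to what is actually enabled):
sudo filebeat setup --pipelines --modules system -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]' -M "system.syslog.var.convert_timezone=true" -M "system.auth.var.convert_timezone=true"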
For a given pipeline in ES, one can check with
curl -XGET 'http://localhost:9200/_ingest/pipeline/filebeat-6.8.0-nginx-e*'
; make sure there is a timezone key field in it if the convert_timezone function is enabled. Also a debugging tip: set the kibana time range to "this week", so that you can notice data written into the future by some misconfiguration.
In sum, time is a big topic and a subtle issue in the ELK stack, and in development in general. Pay attention to it and be careful!
to change the config of filebeat
service filebeat stop
curl -XDELETE 'http://localhost:9200/_ingest/pipeline/filebeat*'
run the two setup steps from the summary
service filebeat start
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
(Thanks to the elastic folks, xpack security now comes for free on 6.8.0+.) Configuring the password in logstash needs quotes.
To query ES with user authentication, just add the -u esuser:espass option to curl commands.
Basic debug: unset http_proxy, then curl -v --user <user>:<pass> -XGET 'http://master:9200/_cluster/health?pretty' for the ES cluster. Failure of some ES node may lead to HTTP authentication error 401; it may have nothing to do with user passwords and authentication.
Misc note:
For debug tests on ES, curl will go through the proxy!!
The "no JAVA_HOME specified" warning in the ES service log doesn't matter.
Actually a missing hostname is OK, but the log volume from the compute nodes is just too small compared to the master... It is not an issue with the ELK stack, but with non-up-to-date syslog (time mismatch).
Actual problem: one should load the pipelines for all modules in one line; otherwise the latter overwrites the former.
Timestamps in ES are always UTC, but kibana shows them in the browser's default timezone.
cluster conf
ssl is a must
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca --pass "" --out elastic-stack-ca.p12
,sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12 --pass "" --out elastic-certificates.p12 --ca-pass ""
, we only need the final elastic-certificates.p12 file.
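A hedged sketch of the corresponding elasticsearch.yml settings on each node (6.8.x; it assumes elastic-certificates.p12 has been copied into the ES config directory):
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12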
elastalert
into ansible workflow
In general, a cool, reasonable, and easy-to-follow tool. The logical flow is better compared to tools acting as middleware inside logstash: here we just query the ES database periodically and send alerts according to some predefined rules.
pip3 install elastalert
pip3 install "elasticsearch>=5.0.0"
apt install elastalert
elastalert-create-index
Note: to use elastalert-test-rule, first unset http_proxy so that the localhost ES is accessible.
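A minimal sketch of a rule file (rule name, threshold, and addresses are hypothetical; the email alerter must be configured globally):
# rules/ssh_fail.yaml (illustrative)
name: ssh-auth-failures
type: frequency
index: filebeat-*
num_events: 20
timeframe:
  minutes: 5
filter:
- query:
    query_string:
      query: "system.auth.ssh.event: Failed"
alert:
- email
email:
- admin@example.com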
quota
sudo apt install quota
need further experiments on the VM cluster first before applying it; always be careful with disk stuff
See the VM corresponding part for operations.
Not included in ansible due to flexibility considerations. Only used on the master node.
ulimit
merged into ansible workflow
The hard limit can only be changed by root, while the soft limit is something anyone can change, but you have to change it explicitly first.
Check with ulimit -a as each individual user.
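For persistent limits, a hedged sketch of /etc/security/limits.conf entries (values are examples only):
*  soft  nofile  4096
*  hard  nofile  65536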
fail2ban
sudo apt install fail2ban
sudo fail2ban-client set sshd unbanip 166.
numa
apt install numactl
apt install hwloc
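A hedged usage sketch (the job binary is a placeholder):
numactl --hardware   # show the NUMA topology
numactl --cpunodebind=0 --membind=0 ./job   # pin a job to node 0 CPUs and memory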
cgroup
sudo apt install cgroup-tools
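A hedged usage sketch with cgroup-tools (cgroup v1; the group name and limit are examples):
sudo cgcreate -g cpu,memory:/limited
sudo cgset -r memory.limit_in_bytes=2G limited
cgexec -g cpu,memory:limited ./job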
tinc
sudo apt install tinc
Combine the tinc VPN with the HTTP proxy, such that the HTTP proxy IP is not public to everyone: make the HTTP proxy listen only on the tinc IP interface.
tincd -n netname -K to generate key pairs, and tincd -n netname to start the daemon. For debugging, try tincd -n netname -d5 -D for a foreground daemon with verbose output. In each tinc daemon debug window, quit the daemon by pressing CTRL-\.
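For reference, a hedged sketch of the tinc config layout on the master (netname, Name, and addresses are placeholders):
# /etc/tinc/netname/tinc.conf
Name = master
Interface = tinc
# /etc/tinc/netname/hosts/master
Address = <public ip>
Subnet = 10.26.11.1/32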
sudo iptables -t nat -I POSTROUTING 1 -o tinc -s 192.168.48.0/24 ! -d 192.168.48.0/24 -j SNAT --to-source 10.26.11.1
on the master node makes the compute nodes usable without any modification on them (this new SNAT line is hopefully also managed by the ansible playbooks). sudo iptables -t nat -nvL checks the current iptables rules.
jumbo frame
merged into ansible
ip link set eth0 mtu 9000
The benchmarks show little gain from enabling jumbo frames.
MTU 8500 is used instead of 9000 due to an issue with the Intel I219LM.
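To persist the MTU across reboots, a hedged netplan snippet (assuming the compute-node NIC name eno1):
network:
  version: 2
  ethernets:
    eno1:
      mtu: 8500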
docker
tmpreaper
mail
mailutils seems to use the hostname as the from address no matter what myhostname is configured in postfix; instead, use -aFrom on the command line. Workable example: echo "hello"|mail --debug-level 3 -s "subject" -aFrom user@some.localdomain receiver@mails.tsinghua.edu.cn. Or echo "hello"|mail -s "go" user@mails.tsinghua.edu.cn -r ubuntu@master.localdomain.
All nodes are now shipped with smartmontools.
Use the customized smail script for slurm to overcome the wrong sender address format.
backup
legacy approach (deprecated)
new approach based on restic
merged into ansible
apt install restic
ignorefile
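A hedged sketch of initializing the repository and running a backup with an ignore file (repository path and file locations are illustrative):
restic init -r /BACKUP/restic-repo
restic -r /BACKUP/restic-repo backup /home/ubuntu /opt --exclude-file=/etc/restic-ignore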
RAID1 on c8
sudo apt install smartmontools
on c8; it depends on postfix, which I have configured as local-only (not a big fan of postfix).
It seems that smartd is also enabled as a service.
RAID5 on c9
(June 4, 2021) Six 8T HDDs as a hardware RAID5 were added to c9. The array is mounted at /DATA.c9 and shared with the other nodes via NFS. Remember to check the disk health in this RAID5 frequently (maybe once a month) in case one disk in the RAID5 goes down.
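A hedged health-check example (behind a hardware RAID controller a device type such as -d megaraid,<N> may be needed to reach each physical disk):
sudo smartctl -H /dev/sda
sudo smartctl -H -d megaraid,0 /dev/sda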
some benchmarks
network
iperf: the master-to-computation-node bandwidth is around 940 Mbit/s, which is near the limit of the Gigabit NIC.
iperf for ipv6: iperf -sV on the server, iperf -c <remote> -B <src> -V on the client.
iperf for udp: iperf -su on the server, iperf -c <remote> -u -b 1000M on the client; you should specify the UDP bandwidth yourself, otherwise it reports a result around 1 Mbit/s. -r first sends then receives; -d does both at the same time.
cpu
memory
The available memory frequency is 2400 even though the modules are rated 2666; the speed is limited by the CPU (the Xeon Gold 5120 supports at most DDR4-2400).
disk
by brute force
dd if=/dev/zero of=/tmp/output bs=8k count=50k
cn3 ssd: 1.4 GB/s
cn3 writing to the home folder, which is shared via NFS and stored on the master's SSD: 89.1 MB/s
master ssd: 1.3 GB/s
master /DATA, sdb1: 1.3 GB/s (? hard to understand), with similar results for sdc on master; it is weird though (maybe due to the disk controller or a cache therein?)
dn1: sdb2 557 MB/s, similar result for sdb; sda (under lvm): 471 MB/s
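The surprisingly high HDD numbers above may be inflated by the page cache; a hedged variant that forces a flush before dd reports the rate:
dd if=/dev/zero of=/tmp/output bs=8k count=50k conv=fdatasync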
electrical consumption
For c[1-8], the peak power is about 260 W and 1.5 A. For c9 with two 2080 Ti cards, it is about 700 W and 3 A.