How do you monitor your servers / VPS:es?

krash@lemmy.ml · edit-2 11 months ago

How do you monitor your servers / VPS:es?

vegetaaaaaaa@lemmy.world · edit-2 11 months ago

Netdata (agent only, not the cloud-based features), a bunch of scanners running from cron/systemd timers, and rsyslog for logs (plus Graylog for larger setups).

My base ansible role for monitoring.

Since your question is also related to securing your setup, inspect and harden the configuration of all running services and the OS itself. Here is my common ansible role for basic stuff. Find (preferably official) hardening guides for your distribution and implement hardening guidelines such as DISA STIG, CIS benchmarks, ANSSI guides, etc.

JonnyJaap@lemmy.world · 11 months ago

I used Zabbix at some point, but I never looked at the data so I stopped. Zabbix shows all kinds of stuff.

I have Cockpit on my bare metal, which has some stats, and Netdata on my firewall. I do not track any of my VMs (except with vnstat, which runs on every device).

MrMcGasion@lemmy.world · 11 months ago

I’ve dabbled with some monitoring tools in the past, but never really stuck with anything proper for very long. I usually notice issues myself. I self-host my own custom new-tab page that I use across all my devices and between that, Nextcloud clients, and my home-assistant reverse proxy on the same vps, when I do have unexpected downtime, I usually notice within a few minutes.

Other than that I run fail2ban, and have my vps configured to send me a text message/notification whenever someone successfully logs in to a shell via ssh, just in case.
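For anyone wanting to try the login notification idea: one possible approach (not necessarily what is used above) is a small script run by `pam_exec` when a session opens. This is a minimal sketch that relies on the environment variables `pam_exec` exports (`PAM_TYPE`, `PAM_USER`, `PAM_RHOST`, `PAM_SERVICE`). The function name and message format are made up for illustration, and the actual delivery (SMS, push, etc.) is left out.

```python
import os
import socket


def login_message(env, hostname):
    """Return a notification line for a session open, or None otherwise.

    `env` is a mapping like os.environ; pam_exec exports PAM_TYPE,
    PAM_USER, PAM_RHOST and PAM_SERVICE to the script it runs.
    """
    if env.get("PAM_TYPE") != "open_session":
        return None  # ignore close_session and other PAM phases
    user = env.get("PAM_USER", "unknown")
    rhost = env.get("PAM_RHOST", "unknown")
    service = env.get("PAM_SERVICE", "unknown")
    return f"{service} login on {hostname}: {user} from {rhost}"


if __name__ == "__main__":
    msg = login_message(os.environ, socket.gethostname())
    if msg:
        print(msg)  # placeholder: swap in your own SMS/push delivery
```

It would typically be hooked in with a line such as `session optional pam_exec.so /usr/local/bin/login-notify` in `/etc/pam.d/sshd` (the script path is hypothetical). Using `optional` means a failing script should not block logins.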

Based on the logs over the years, most bots that try to log in use usernames like admin or root. I have root login disabled for ssh, and the one account that can be used over ssh has a non-obvious username that would also have to be guessed before an attacker could even try passwords. fail2ban does a good job of blocking IPs that fail after a few tries.
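If you want to check which usernames bots try on your own box, here is a rough sketch that counts them from sshd log lines. It assumes the usual OpenSSH "Failed password for ..." message format; the function name and sample lines are invented for illustration, and your log path and format may differ (journald, key-only auth, etc.).

```python
import re
from collections import Counter

# Matches OpenSSH lines such as:
#   "Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2"
#   "Failed password for root from 203.0.113.5 port 4242 ssh2"
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def count_failed_users(lines):
    """Count attempted usernames across failed-password log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; in practice read from your auth log instead.
    sample = [
        "sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2",
        "sshd[102]: Failed password for root from 203.0.113.9 port 5151 ssh2",
        "sshd[103]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2",
    ]
    for user, n in count_failed_users(sample).most_common():
        print(user, n)
```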

If I used containers, I would probably want a way to monitor them, but I personally dislike containers (for myself, I’m not here to “yuck” anyone’s “yum”) and deliberately avoid them.

Dataprolet@lemmy.dbzer0.com · 11 months ago

Uptime-Kuma

𝒍𝒆𝒎𝒂𝒏𝒏@lemmy.dbzer0.com · 11 months ago

I used to pass all the data through to Home Assistant and show it on some dashboards, but I decided to move over to Zabbix.

Works well but is quite full-featured, maybe more so than necessary for a self-hoster. Made a media type integration for my annunciator system so I hear issues happening with the servers, as well as updates on things, so I don’t really need to check manually. Also a custom SMART template that populates the disk’s physical location/bay (as the built-in one only reports SMART data).

It’s notified me of a few hardware issues that would have gone unnoticed on my previous system, and helped with diagnosing others. A lot of the sensors may seem useless, but trust me, once they flag up you should 100% check on your hardware. Hard drives losing power during high activity because of loose connections, and a CPU fan failure to name two.

It has a really steep learning curve though, so I'm not sure how much I can recommend it over something like Grafana+Prometheus. I haven’t used that combo, but it looks equally comprehensive as long as you check your dashboard regularly.

Just wish there were more android apps

namelivia@lemmy.world · 11 months ago

Prometheus, Loki and Grafana.

TheGreenGolem@lemmy.dbzer0.com · 11 months ago

It cannot notify you, you have to check it manually, but: I use DaRemote on my phone to periodically check my bare metal.

Cyberflunk@lemmy.world · 11 months ago

Reduce your threat profile. Run sslh so that port 443 handles both SSL and SSH. Adjust your host-based firewall to allow just 443. Attack yourself on that port and identify the resulting log entries. Add the new profiles to fail2ban. Enable fail2ban email. If you don’t like email, use a service that translates email into notifications; ntfy.sh offers free notifications.

Or… use something like Tailscale and don’t offer a remote login to the general Internet.
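As a rough illustration of the ntfy.sh part: publishing is an HTTP POST of the message body to `https://ntfy.sh/<topic>`, with an optional `Title` header. This is a minimal sketch using only the standard library. The helper names and the topic are hypothetical, and since topics on the public server act like shared secrets, pick one that is hard to guess.

```python
import urllib.request


def build_ntfy_request(topic, message, title=None, server="https://ntfy.sh"):
    """Build (but do not send) a POST request that publishes to an ntfy topic."""
    headers = {}
    if title:
        headers["Title"] = title  # shown as the notification title
    return urllib.request.Request(
        url=f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def notify(topic, message, title=None):
    """Send the notification and return the HTTP status code."""
    req = build_ntfy_request(topic, message, title)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Something like `notify("my-hard-to-guess-topic", "fail2ban banned 203.0.113.5", title="sshd")` could then be called from whatever script your alerting runs.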

I submitted your post to GPT; here’s what it thought:

https://shareg.pt/Tz0El4k