I’m looking into setting up monitoring combined with simple automation for my self-hosted setup. Currently I’m thinking about using Zabbix.
I want to:
Track bandwidth usage on a router/firewall and on a managed switch, and track CPU/RAM/disk usage on my VMs.
Simple monitoring (up/down/maintenance) on the router, switch, my VMs as well as on Linux services (Jellyfin/Forgejo/etc.) and Windows services (a lab for studying work-related tools).
I’m also interested in doing simple HTTPS checks on my web UIs (I’ve had a service running while the website returned 403 or 404 before) and testing nslookup against my internal DNS (if the service is up but the lookups time out, I still want to try restarting the service).
Are there any FOSS/FLOSS alternatives that I should look into before diving into Zabbix?
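To make the HTTPS and nslookup checks described above concrete, here is a rough Python sketch of what such a probe could look like, independent of any monitoring tool. The function names, the timeouts, and the rule "2xx/3xx is healthy" are my own assumptions, not something Zabbix or any other tool prescribes.

```python
import subprocess
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 403, 404, 500, ... are still responses, so report the code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure (e.g. self-signed cert), ...
        return 0


def is_healthy(status: int) -> bool:
    """Assumed rule: 2xx and 3xx are healthy; 0, 4xx and 5xx are failures."""
    return 200 <= status < 400


def dns_lookup_ok(name: str, server: str, timeout: float = 3.0) -> bool:
    """Run `nslookup name server`; False on timeout or non-zero exit."""
    try:
        result = subprocess.run(
            ["nslookup", name, server],
            capture_output=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0
```

A check like `is_healthy(http_status(url))` would catch the "service is running but the page returns 403/404" case, and `dns_lookup_ok` returning False while the DNS service is still up would be the trigger for a restart.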
I’ve been using Zabbix for ages now. It has issues but I got used to it.
I used to use MQTT, static_status and Healthchecks.io, and pass that data through to Home Assistant, but it started to get pretty cumbersome as the number of machines I had grew.
I now use just Zabbix and Healthchecks.io. I did need to spend some time writing new templates for some additional data I wanted to collect (like SMART data for SSDs that provide health metrics in non-standard attributes, and Healthchecks.io so I could see the status of various checks on my Zabbix dashboard).
Zabbix also has some additional features I found appealing, like proxies that can continue recording data when the main server is down, and built-in encryption. Some checks, like open ports or ICMP responses, can be run from the local agent, the remote server, or both, which helps quickly diagnose things like firewall config issues.
I did look at some other solutions, but I wanted something integrated to hit the ground running. Mobile apps are very limited, and there is no official one to my knowledge. I use Moobix, which I don’t believe is FOSS, but I could be wrong there.
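The idea of running the same check from both sides can be sketched in plain Python. This is not Zabbix's API, just an illustration: `port_open` would run once on the host itself and once from the monitoring server, and `diagnose` (a hypothetical helper with my own wording for the verdicts) compares the two results.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(local_ok: bool, remote_ok: bool) -> str:
    """Compare the result seen on the host with the one seen from the server."""
    if local_ok and remote_ok:
        return "ok"
    if local_ok:
        # The service answers on the host itself but not across the network.
        return "reachable locally only: suspect firewall or routing"
    if remote_ok:
        # Unusual, e.g. the service is bound to the external interface only.
        return "reachable remotely only: suspect bind address or local check"
    return "down: nothing is answering on that port"
```

When only the local check succeeds, the service itself is fine and the problem sits somewhere between the two machines, which is usually the firewall.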
Try each solution out and see what works best for you!
I use netdata (the FOSS agent only, not the cloud offering) on all my servers (physical, VMs…) and stream all metrics to a parent netdata instance. It works extremely well for me.
Other solutions are too cumbersome and heavy on maintenance for me. You can query netdata from Prometheus/Grafana [1] if you really need custom dashboards.
I guess you wouldn’t be able to install it on the router/switch, but there is an SNMP collector which should be able to query bandwidth info from the network appliances.
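For context on what "bandwidth over SNMP" means in practice: the device exposes ever-increasing octet counters per interface (ifInOctets is 32-bit, ifHCInOctets is 64-bit in IF-MIB), and the monitoring side turns two samples into a rate. A minimal sketch of that arithmetic follows; the function name and the wrap handling are my own assumptions, not netdata's actual implementation.

```python
def rate_bps(prev_octets: int, curr_octets: int, interval_s: float,
             counter_bits: int = 64) -> float:
    """Convert two octet-counter samples into bits per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # Assume the counter wrapped around once. A reboot of the device
        # also resets the counter and would produce a bogus spike here.
        delta += 2 ** counter_bits
    return delta * 8 / interval_s
```

For example, 1000 extra octets over a 10 second polling interval gives `rate_bps(1000, 2000, 10)`, which is 800 bits per second.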
Gonna check it out!
Is it easy to set up automatic responses to the alerts, e.g. restarting a service if it isn’t answering requests in a timely manner?
Have you used it together with Windows Servers too?
Windows Servers
No
setup automatic responses to the alerts
It should be possible using
script to execute on alarm = /your/custom/remediation-script
https://learn.netdata.cloud/docs/alerts-&-notifications/notifications/agent-dispatched-notifications/agent-notifications-reference. I have not experimented with this yet, but soon will (implementing a custom notification channel for specific alarms).
restarting a service if it isn’t answering requests
I’d rather find the root cause of the downtime/malfunction instead of blindly restarting the service, just my 2 cents.
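If you do go the remediation-script route, a compromise could be to save the recent logs before restarting, so the root cause can still be investigated afterwards. Below is a rough, untested-against-netdata sketch of such a script. The service name and log path are hypothetical, and it deliberately ignores whatever arguments netdata passes to the script, since I have not verified their layout.

```python
import subprocess

SERVICE = "jellyfin.service"          # hypothetical service to remediate
LOG_PATH = "/var/log/remediation.log"  # hypothetical path for saved logs


def remediate(service: str, log_path: str, run=subprocess.run) -> bool:
    """Save recent journal entries, restart the service, report the outcome.

    Returns True if the service is active after the restart.
    `run` is injectable so the logic can be exercised without systemd.
    """
    # Keep evidence for root-cause analysis before touching the service.
    logs = run(
        ["journalctl", "-u", service, "-n", "50", "--no-pager"],
        capture_output=True,
        text=True,
    )
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(logs.stdout or "")

    run(["systemctl", "restart", service])

    # `is-active --quiet` exits with 0 only when the unit is active.
    return run(["systemctl", "is-active", "--quiet", service]).returncode == 0
```

The script would call `remediate(SERVICE, LOG_PATH)` and exit non-zero when it returns False, so a failed restart is still visible somewhere.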
Also Uptime Kuma for fast and easy up/down checks, web services, etc.