Ask HN: What's your open source stack?
If you are using self-hosted open source to handle your logs, traces, and metrics instead of a third-party SaaS or an all-in-one platform, what do you use? How is it working out for you?
Inspired by a related thread about vendor lock-in.
Interesting thread. We're building Markhub, a B2B collaboration SaaS, and we have a pragmatic, hybrid approach to our observability stack. Our philosophy is: use open source where it gives us control and flexibility, and use managed SaaS where it saves us time and engineering overhead.
Here's our stack:
Logs: We self-host a simple stack using Fluentd to collect logs, which are then shipped to Elasticsearch for storage and analysis. It's powerful, gives us full control over our data, and is more cost-effective at our scale than a managed logging service.
Metrics & Monitoring: For this, we decided not to reinvent the wheel. We use Datadog. The out-of-the-box dashboards, alerting, and deep integration with our cloud provider (AWS/GCP) save our small team hundreds of hours. The cost is justified by the engineering time we save.
Traces: We're currently using Datadog's APM for tracing as well, as it's tightly integrated. However, we're actively exploring moving to OpenTelemetry for more vendor neutrality in the future.
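For a sense of what the logs piece looks like, a minimal Fluentd config is roughly the following (paths, tag, and the Elasticsearch host are illustrative, and it assumes the fluent-plugin-elasticsearch output plugin is installed):

```
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>

<match app.**>
  @type elasticsearch
  host elasticsearch.internal
  port 9200
  logstash_format true
  logstash_prefix app-logs
</match>
```

Tail the application logs, parse them as JSON, and ship them to daily `app-logs-*` indices in Elasticsearch.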
It's working out well. The key has been to be honest about our most valuable resource: engineering time. We self-host where we have a clear need for control (logs), and we pay for a service where the platform provides undeniable value and speed (metrics/monitoring).
For metrics and monitoring, I can highly recommend victoria-metrics + vmagent + vmalert + prometheus-node-exporter. Rock solid, lean, and performant. The documentation isn't all there and some of the Debian packages aren't stellar (so you may want to build your own, which is ezpz), but IME it's a reliable stack.
They integrate with Grafana much like Prometheus does, but we are not happy with Grafana[0], so recommendations for something low-maintenance we can drop into the above stack instead would be awesome.
[0]: Why? Managing dashboards/plugins in an airgapped IaC setting is like pulling teeth at every other turn. And at one point we supposedly had the container version pinned (by tag, not SHA; lesson learned), yet an image update from Docker Hub broke most of our dashboard/data-source links. Those dashboards are still broken, and I'd rather try something new before I recommit to Grafana.
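For concreteness, the scrape side of this setup boils down to a Prometheus-compatible config handed to vmagent (hostnames and the remote-write URL here are illustrative):

```yaml
# scrape.yml, passed to vmagent roughly like:
#   vmagent -promscrape.config=scrape.yml \
#           -remoteWrite.url=http://victoria-metrics:8428/api/v1/write
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    static_configs:
      - targets: ["host1:9100", "host2:9100"]  # prometheus-node-exporter
```

vmagent scrapes the node exporters and remote-writes into the VictoriaMetrics single-node instance; vmalert points at the same instance for rules.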
Good ol’ Prometheus and Grafana stack (Loki for logs, Tempo for traces) is perfect for smaller projects. You can also explore having OpenTelemetry collectors in the middle for more sophisticated processing and if you want to keep an eye on its ecosystem.
This is still the go-to OSS stack, and I wouldn't really recommend looking into smaller projects (usually backed by a single vendor) that claim better performance or lower resource usage for the same capabilities, because that always comes at a cost.
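If it helps anyone get started, a rough docker-compose sketch of that stack (untested, unpinned images, default ports; in practice Loki and Tempo each need a config file mounted in, so treat this as a starting point only):

```yaml
services:
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
  loki:
    image: grafana/loki
    ports: ["3100:3100"]          # log ingestion/query API
  tempo:
    image: grafana/tempo
    ports: ["3200:3200", "4317:4317"]  # HTTP API + OTLP gRPC
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]          # add the other three as data sources
```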
Node, TypeScript, Docker, Nmap, ESLint. I am thinking of dropping ESLint, though. Too many dependencies.
I use that stack for a self-hosted dashboard that I wrote. I can spin up a new web server or Docker container in seconds. The web servers have proxies provided automatically and serve WebSockets on the same port as HTTP. The dashboard tool also provides remote terminal and file system access. Everything displays in a web browser.
I run Pi-hole for DNS in my stack. I have toyed with the idea of writing my own Pi-hole alternative but have taken no action on it yet.
I also run PhotoPrism, Jellyfin, Mealie, and some other stuff.
VictoriaMetrics for metrics (a 7x RAM reduction vs. Prometheus), Loki for logs, though we're evaluating VictoriaLogs. Then Grafana for the UI. Cannot recommend VictoriaMetrics enough; it's fantastic.
VictoriaMetrics CTO here.
If you hit issues with VictoriaLogs, or have ideas on how its usability could be improved, please file them at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/ . We appreciate users' input and always try to make our products easier to use.
Shynet is a good, very lightweight visitor tracking system for websites. Super simple.
I mostly use SNMP with LibreNMS. It makes for a good, standard way of getting most of the metrics I care about.
logrotate, systemd, `journalctl -u server-name | grep whatever`
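For reference, a typical logrotate stanza for this (path and retention are illustrative):

```
# /etc/logrotate.d/server-name
/var/log/server-name/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
}
```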
This answer skips over how you gain access to the machine. I'm guessing you're a solo dev shop, but still.
`ssh`, and yes I am solo.
What if the machine is unavailable? It is better to store logs from multiple hosts in a centralized database, so the logs can be investigated even if the original host is no longer available for any reason.
I create a `logs` table in my Postgres database where I store important events (user upgraded, downgraded, signed up, etc.). I use the filesystem-based logs more or less for debugging or tracing specific things.
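Roughly like this (sketched with sqlite3 here so the example is self-contained; the real table is in Postgres, and the column names are illustrative):

```python
import sqlite3  # stand-in for Postgres in this sketch

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE logs (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        event      TEXT NOT NULL,   -- e.g. 'signed_up', 'upgraded', 'downgraded'
        detail     TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO logs (user_id, event) VALUES (?, ?)", (42, "upgraded"))
rows = conn.execute("SELECT user_id, event FROM logs").fetchall()
print(rows)  # [(42, 'upgraded')]
```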
If the server is unavailable, then my entire product is down, because the entire product, including the database, lives on one server.
Logs: log files, logrotate and grep.
Metrics: Prometheus, Grafana and prometheus-client (the Python library for writing Prometheus exporters).
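A small aside for anyone new to the exporter side: prometheus-client does the real work, but the text format it serves on /metrics is simple enough to sketch by hand (the metric name here is made up):

```python
# Minimal sketch of the Prometheus text exposition format that a
# client library renders for scrapes. Metric name is illustrative.
def render_counter(name: str, help_text: str, value: float,
                   labels: dict[str, str]) -> str:
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{{{label_str}}} {value}\n"
    )

print(render_counter("app_requests_total", "Total HTTP requests.", 1027,
                     {"method": "get", "code": "200"}))
```

In real code you'd just use `prometheus_client.Counter` and let the library handle registration and serving; this only shows what ends up on the wire.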