Go and Prometheus: Metrics and Alerts in Production
Prometheus has become the de facto standard for monitoring cloud-native applications. It pairs naturally with Go: Prometheus was created at SoundCloud, is itself written in Go, and both projects share a philosophy of simplicity and performance.
In this guide, you will learn how to instrument Go applications with Prometheus metrics, build Grafana dashboards, and configure intelligent alerts.
Table of Contents
- Prometheus Fundamentals
- Go Libraries for Prometheus
- Metric Types
- Instrumenting Applications
- Go Runtime Metrics
- Tracing and Context
- Grafana Dashboards
- Alerting with Alertmanager
- Production Patterns
Prometheus Fundamentals
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Go Applications │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ App 1 │ │ App 2 │ │ App 3 │ │
│ │ :8080/metrics│ │ :8081/metrics│ │ :8082/metrics│ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
└─────────┼────────────────┼────────────────┼─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ Prometheus Server │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Retriever │ │ TSDB │ │ Query Engine │ │
│ │ (Pull) │ │ (Storage) │ │ (PromQL) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└──────────┬───────────────────────────┬─────────────────────┘
           │                           │
           │ HTTP (PromQL queries)     │ Alerts (HTTP push)
           ▼                           ▼
┌─────────────────────────┐   ┌─────────────────────────┐
│         Grafana         │   │      Alertmanager       │
│    (Visualization and   │   │      (Routing and       │
│       Dashboards)       │   │      Notifications)     │
└─────────────────────────┘   └─────────────────────────┘
Data Model
Prometheus uses a dimensional data model:
http_requests_total{method="GET", endpoint="/api/users", status="200"} 1027
http_requests_total{method="POST", endpoint="/api/users", status="201"} 45
http_requests_total{method="GET", endpoint="/api/users", status="500"} 3
- Metric name: http_requests_total
- Labels (dimensions): method, endpoint, status
- Value: the current sample value (the timestamp is implicit)
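Each of those series is exposed as one line of Prometheus's text exposition format. As a stdlib-only illustration (this is not how the client library renders it, just a sketch of the format), a sample line can be assembled like this:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders one sample in the Prometheus text exposition
// format: name{label="value",...} value
func formatSample(name string, labels map[string]string, value float64) string {
	if len(labels) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	// Label order is not significant to Prometheus, but sorting
	// keeps the output deterministic.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf(`%s=%q`, k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(formatSample("http_requests_total",
		map[string]string{"method": "GET", "endpoint": "/api/users", "status": "200"},
		1027))
	// prints: http_requests_total{endpoint="/api/users",method="GET",status="200"} 1027
}
```

In practice the client library takes care of this; the point is that the /metrics endpoint is nothing more than plain text in this shape.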
Go Libraries for Prometheus
Installation
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promauto
go get github.com/prometheus/client_golang/prometheus/promhttp
Basic Setup
package main
import (
"net/http"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
var (
// Counter - only goes up
requestsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total HTTP requests",
},
[]string{"method", "endpoint", "status"},
)
// Gauge - can go up or down
activeConnections = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "active_connections",
Help: "Number of active connections",
},
)
// Histogram - distribution of observed values
requestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration",
Buckets: prometheus.DefBuckets, // 0.005, 0.01, 0.025, ..., 10
},
[]string{"method", "endpoint"},
)
// Summary - similar to a histogram, but computes quantiles client-side
responseSize = promauto.NewSummaryVec(
prometheus.SummaryOpts{
Name: "http_response_size_bytes",
Help: "HTTP response size",
Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
},
[]string{"method", "endpoint"},
)
)
func main() {
// Metrics endpoint
http.Handle("/metrics", promhttp.Handler())
// Application routes
http.HandleFunc("/api/data", handleRequest)
http.ListenAndServe(":8080", nil)
}
func handleRequest(w http.ResponseWriter, r *http.Request) {
start := time.Now()
// Simulated processing
processRequest(w, r)
// Record metrics
duration := time.Since(start).Seconds()
status := "200" // determined by the handler's result
requestsTotal.WithLabelValues(r.Method, "/api/data", status).Inc()
requestDuration.WithLabelValues(r.Method, "/api/data").Observe(duration)
}
Metric Types
1. Counter
Only goes up (it resets to zero when the process restarts). Useful for counting events.
var (
// Simple counter
errorsTotal = promauto.NewCounter(
prometheus.CounterOpts{
Name: "errors_total",
Help: "Total number of errors",
},
)
// Counter with labels
httpErrors = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "http_errors_total",
Help: "Total HTTP errors by type",
},
[]string{"code", "endpoint"},
)
// Panic counter
panicRecoveries = promauto.NewCounter(
prometheus.CounterOpts{
Name: "panic_recoveries_total",
Help: "Number of recovered panics",
},
)
)
// Usage
func processWithError() error {
if err := doSomething(); err != nil {
errorsTotal.Inc()
httpErrors.WithLabelValues("500", "/api/endpoint").Inc()
return err
}
return nil
}
// In Prometheus: rate(errors_total[5m]) gives errors per second
2. Gauge
Can go up or down. Useful for instantaneous values.
var (
// Active connections
activeConnections = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "active_connections",
Help: "Number of active connections",
},
)
// Queue size
queueSize = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "queue_size",
Help: "Queue size by queue name",
},
[]string{"queue_name"},
)
// Temperature (IoT example)
temperature = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "sensor_temperature_celsius",
Help: "Sensor temperature",
},
[]string{"sensor_id", "location"},
)
)
// Usage
func handleConnection(conn net.Conn) {
activeConnections.Inc()
defer activeConnections.Dec()
// Handle the connection
}
// Set() for absolute values
func updateQueueSize(name string, size int) {
queueSize.WithLabelValues(name).Set(float64(size))
}
// Prometheus query: active_connections
3. Histogram
Samples observations and counts them in configurable buckets. Useful for latencies.
var (
// Request latency
requestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: []float64{0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
},
[]string{"method", "endpoint"},
)
// Payload sizes
payloadSize = promauto.NewHistogram(
prometheus.HistogramOpts{
Name: "request_payload_bytes",
Help: "Payload size in bytes",
Buckets: prometheus.ExponentialBuckets(100, 10, 8), // 100, 1000, 10000...
},
)
)
// Usage
func timedHandler(w http.ResponseWriter, r *http.Request) {
start := time.Now()
defer func() {
duration := time.Since(start).Seconds()
requestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
}()
// Handle the request
}
// Useful queries:
// - histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
// - http_request_duration_seconds_count
// - http_request_duration_seconds_sum / http_request_duration_seconds_count (mean)
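To demystify what histogram_quantile computes from those _bucket counters, here is a stdlib-only sketch of the same estimate: find the bucket the target rank falls into, then interpolate linearly inside it. (This is an illustrative simplification; the real PromQL function additionally handles +Inf buckets and aggregation across series.)

```go
package main

import "fmt"

// Bucket mirrors one le="..." series: upper bound and cumulative count.
type Bucket struct {
	UpperBound float64 // the "le" label
	Count      float64 // cumulative observations <= UpperBound
}

// quantile estimates the q-quantile the way histogram_quantile does:
// locate the bucket containing the target rank, then interpolate
// linearly between the bucket's lower and upper bounds.
func quantile(q float64, buckets []Bucket) float64 {
	total := buckets[len(buckets)-1].Count
	rank := q * total
	lowerBound, lowerCount := 0.0, 0.0
	for _, b := range buckets {
		if b.Count >= rank {
			// fraction of this bucket's observations below the rank
			frac := (rank - lowerCount) / (b.Count - lowerCount)
			return lowerBound + (b.UpperBound-lowerBound)*frac
		}
		lowerBound, lowerCount = b.UpperBound, b.Count
	}
	return buckets[len(buckets)-1].UpperBound
}

func main() {
	// Cumulative counts as a histogram would export them:
	// 60 obs <= 0.1s, 90 obs <= 0.5s, 100 obs <= 1s
	buckets := []Bucket{{0.1, 60}, {0.5, 90}, {1, 100}}
	fmt.Printf("p50 ≈ %.3f s\n", quantile(0.50, buckets)) // p50 ≈ 0.083 s
	fmt.Printf("p95 ≈ %.3f s\n", quantile(0.95, buckets)) // p95 ≈ 0.750 s
}
```

This is also why bucket boundaries matter: the estimate can only be as precise as the bucket the quantile lands in.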
4. Summary
Similar to a histogram, but computes quantiles on the client side.
var (
// Latency with precomputed percentiles
requestLatency = promauto.NewSummaryVec(
prometheus.SummaryOpts{
Name: "http_request_latency_seconds",
Help: "HTTP request latency",
Objectives: map[float64]float64{
0.5: 0.05, // median with 5% error tolerance
0.9: 0.01, // P90 with 1% error tolerance
0.99: 0.001, // P99 with 0.1% error tolerance
},
MaxAge: 10 * time.Minute,
AgeBuckets: 5,
},
[]string{"method", "endpoint"},
)
)
// A summary gives more accurate percentiles, but the work happens on the
// client, and summary quantiles cannot be aggregated across instances.
// Use it when you need precise percentiles and have few time series.
Instrumenting Applications
A Complete HTTP Middleware
package middleware
import (
"net/http"
"strconv"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
httpDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration",
Buckets: prometheus.DefBuckets,
},
[]string{"method", "path", "status"},
)
httpRequests = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total HTTP requests",
},
[]string{"method", "path", "status"},
)
httpRequestSize = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_size_bytes",
Help: "HTTP request size",
Buckets: prometheus.ExponentialBuckets(100, 10, 7),
},
[]string{"method", "path"},
)
httpResponseSize = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_response_size_bytes",
Help: "HTTP response size",
Buckets: prometheus.ExponentialBuckets(100, 10, 7),
},
[]string{"method", "path"},
)
activeRequests = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "http_active_requests",
Help: "In-flight HTTP requests",
},
)
)
// responseWriter wraps http.ResponseWriter to capture the status code and response size
type responseWriter struct {
http.ResponseWriter
statusCode int
written int64
}
func (rw *responseWriter) WriteHeader(code int) {
rw.statusCode = code
rw.ResponseWriter.WriteHeader(code)
}
func (rw *responseWriter) Write(b []byte) (int, error) {
n, err := rw.ResponseWriter.Write(b)
rw.written += int64(n)
return n, err
}
// PrometheusMiddleware instruments HTTP handlers with the metrics above
func PrometheusMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
activeRequests.Inc()
defer activeRequests.Dec()
wrapped := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}
next.ServeHTTP(wrapped, r)
duration := time.Since(start).Seconds()
status := strconv.Itoa(wrapped.statusCode)
labels := prometheus.Labels{
"method": r.Method,
"path": r.URL.Path,
"status": status,
}
httpDuration.With(labels).Observe(duration)
httpRequests.With(labels).Inc()
httpResponseSize.WithLabelValues(r.Method, r.URL.Path).Observe(float64(wrapped.written))
if r.ContentLength > 0 {
httpRequestSize.WithLabelValues(r.Method, r.URL.Path).Observe(float64(r.ContentLength))
}
})
}
// Usage
func main() {
mux := http.NewServeMux()
mux.HandleFunc("/api/users", handleUsers)
mux.HandleFunc("/api/orders", handleOrders)
// Apply the middleware
handler := PrometheusMiddleware(mux)
// Metrics endpoint
http.Handle("/metrics", promhttp.Handler())
http.Handle("/", handler)
http.ListenAndServe(":8080", nil)
}
Database Instrumentation
package database
import (
"context"
"database/sql"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
_ "github.com/lib/pq"
)
var (
dbQueries = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "db_queries_total",
Help: "Total queries executed",
},
[]string{"operation", "table"},
)
dbQueryDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "db_query_duration_seconds",
Help: "Query duration",
Buckets: []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1},
},
[]string{"operation", "table"},
)
dbConnections = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "db_connections",
Help: "Number of connections in the pool",
},
[]string{"state"},
)
dbErrors = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "db_errors_total",
Help: "Database errors",
},
[]string{"operation", "error_type"},
)
)
// InstrumentedDB wraps *sql.DB to record query metrics
type InstrumentedDB struct {
*sql.DB
}
func NewInstrumentedDB(driver, dsn string) (*InstrumentedDB, error) {
db, err := sql.Open(driver, dsn)
if err != nil {
return nil, err
}
// Collect pool metrics in the background
go collectPoolMetrics(db)
return &InstrumentedDB{DB: db}, nil
}
func collectPoolMetrics(db *sql.DB) {
ticker := time.NewTicker(10 * time.Second)
defer ticker.Stop()
for range ticker.C {
stats := db.Stats()
dbConnections.WithLabelValues("open").Set(float64(stats.OpenConnections))
dbConnections.WithLabelValues("in_use").Set(float64(stats.InUse))
dbConnections.WithLabelValues("idle").Set(float64(stats.Idle))
}
}
func (db *InstrumentedDB) QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
return db.instrumentQuery("SELECT", "unknown", func() (*sql.Rows, error) {
return db.DB.QueryContext(ctx, query, args...)
})
}
func (db *InstrumentedDB) instrumentQuery(operation, table string, fn func() (*sql.Rows, error)) (*sql.Rows, error) {
start := time.Now()
defer func() {
dbQueryDuration.WithLabelValues(operation, table).Observe(time.Since(start).Seconds())
}()
dbQueries.WithLabelValues(operation, table).Inc()
rows, err := fn()
if err != nil {
dbErrors.WithLabelValues(operation, "query_error").Inc()
}
return rows, err
}
Go Runtime Metrics
Runtime Collector
package metrics
import (
"runtime"
"runtime/metrics"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
	// GC metrics
	gcCycles = promauto.NewCounter(
		prometheus.CounterOpts{
			Name: "go_gc_cycles_total",
			Help: "Number of completed garbage collection cycles",
		},
	)
	gcPauseNs = promauto.NewHistogram(
		prometheus.HistogramOpts{
			Name:    "go_gc_pause_duration_seconds",
			Help:    "Duration of GC pauses",
			Buckets: []float64{0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05},
		},
	)
	// Memory
	heapAlloc = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "go_memory_heap_alloc_bytes",
			Help: "Bytes allocated on the heap",
		},
	)
	heapSys = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "go_memory_heap_sys_bytes",
			Help: "Bytes obtained from the OS for the heap",
		},
	)
	// Goroutines and threads. Note: the default registry already exposes
	// go_goroutines and go_threads via the built-in Go collector, so
	// registering those exact names again with promauto would panic with a
	// duplicate-registration error; the custom gauges use an app_ prefix.
	goroutines = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "app_goroutines",
			Help: "Number of active goroutines",
		},
	)
	threads = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "app_threads",
			Help: "Number of OS threads (approximated here by GOMAXPROCS)",
		},
	)
)
// StartRuntimeMetrics starts collecting runtime metrics
func StartRuntimeMetrics() {
	// Initial collection
	collectRuntimeMetrics()
	// Periodic collection
	ticker := time.NewTicker(15 * time.Second)
	go func() {
		for range ticker.C {
			collectRuntimeMetrics()
		}
	}()
}
var lastNumGC uint32

func collectRuntimeMetrics() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	heapAlloc.Set(float64(m.HeapAlloc))
	heapSys.Set(float64(m.HeapSys))
	goroutines.Set(float64(runtime.NumGoroutine()))
	threads.Set(float64(runtime.GOMAXPROCS(0))) // approximation: GOMAXPROCS, not actual OS threads
	// GC metrics: add only the cycles completed since the last collection;
	// adding m.NumGC directly would grow the counter by the cumulative
	// total on every tick.
	if m.NumGC > lastNumGC {
		gcCycles.Add(float64(m.NumGC - lastNumGC))
	}
	lastNumGC = m.NumGC
	// Most recent GC pause (PauseNs is a circular buffer of 256 entries)
	if m.NumGC > 0 {
		lastPause := m.PauseNs[(m.NumGC+255)%256]
		gcPauseNs.Observe(float64(lastPause) / 1e9)
	}
}
// Advanced metrics via runtime/metrics (Go 1.16+)
func collectAdvancedMetrics() {
samples := []metrics.Sample{
{Name: "/sched/gomaxprocs:threads"},
{Name: "/sched/goroutines:goroutines"},
{Name: "/memory/classes/heap/free:bytes"},
{Name: "/memory/classes/heap/objects:bytes"},
{Name: "/gc/heap/allocs:bytes"},
{Name: "/gc/heap/frees:bytes"},
}
metrics.Read(samples)
for _, sample := range samples {
switch sample.Value.Kind() {
case metrics.KindUint64:
// handle uint64 samples, e.g. sample.Value.Uint64()
case metrics.KindFloat64:
// handle float64 samples, e.g. sample.Value.Float64()
}
}
}
Tracing and Context
Distributed Tracing
package tracing
import (
"context"
"net/http"
"time"
"github.com/opentracing/opentracing-go"
"github.com/opentracing/opentracing-go/ext"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
traceSpans = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "trace_spans_total",
Help: "Total spans created",
},
[]string{"operation", "service"},
)
traceDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "trace_span_duration_seconds",
Help: "Span duration",
Buckets: prometheus.DefBuckets,
},
[]string{"operation", "service"},
)
)
// TracedHandler adds distributed tracing to HTTP handlers
func TracedHandler(operation string, next http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
tracer := opentracing.GlobalTracer()
// Extract the parent span context from the request headers (distributed tracing)
spanCtx, _ := tracer.Extract(
opentracing.HTTPHeaders,
opentracing.HTTPHeadersCarrier(r.Header),
)
span := tracer.StartSpan(
operation,
ext.RPCServerOption(spanCtx),
)
defer span.Finish()
// Attach the span to the request context
ctx := opentracing.ContextWithSpan(r.Context(), span)
start := time.Now()
next(w, r.WithContext(ctx))
duration := time.Since(start).Seconds()
// Metrics
traceSpans.WithLabelValues(operation, "api").Inc()
traceDuration.WithLabelValues(operation, "api").Observe(duration)
}
}
// TracedFunction instruments internal functions
func TracedFunction(ctx context.Context, operation string, fn func() error) error {
span, ctx := opentracing.StartSpanFromContext(ctx, operation)
defer span.Finish()
start := time.Now()
err := fn()
duration := time.Since(start).Seconds()
if err != nil {
ext.Error.Set(span, true)
span.LogKV("error", err.Error())
}
traceSpans.WithLabelValues(operation, "internal").Inc()
traceDuration.WithLabelValues(operation, "internal").Observe(duration)
return err
}
Grafana Dashboards
Example Dashboard JSON
{
"dashboard": {
"title": "Go Application Metrics",
"panels": [
{
"title": "Request Rate",
"type": "stat",
"targets": [{
"expr": "sum(rate(http_requests_total[5m]))",
"legendFormat": "req/s"
}],
"gridPos": {"h": 8, "w": 6, "x": 0, "y": 0}
},
{
"title": "Error Rate",
"type": "stat",
"targets": [{
"expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100",
"legendFormat": "%"
}],
"gridPos": {"h": 8, "w": 6, "x": 6, "y": 0}
},
{
"title": "Latency P95",
"type": "graph",
"targets": [{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
"legendFormat": "{{method}} {{endpoint}}"
}],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
},
{
"title": "Goroutines",
"type": "graph",
"targets": [{
"expr": "go_goroutines",
"legendFormat": "goroutines"
}],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
},
{
"title": "Memory Usage",
"type": "graph",
"targets": [
{
"expr": "go_memory_heap_alloc_bytes",
"legendFormat": "Heap Alloc"
},
{
"expr": "go_memory_heap_sys_bytes",
"legendFormat": "Heap Sys"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
}
]
}
}
Essential PromQL Queries
# Requests per second
sum(rate(http_requests_total[5m]))

# P95 latency per endpoint
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)

# Error rate (%)
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100

# Memory usage
process_resident_memory_bytes{job="api"}

# Goroutine count trending up (possible leak); use deriv(), not rate(), on gauges
deriv(go_goroutines[5m]) > 10

# GC pressure (average GC pause duration)
rate(go_gc_duration_seconds_sum[5m]) / rate(go_gc_duration_seconds_count[5m])
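Since rate() shows up in almost every query above, it is worth seeing what it actually measures: the per-second increase of a counter over the window. A stdlib-only sketch of the core idea (the real function also extrapolates to the window boundaries; the counter-reset handling below is a simplification):

```go
package main

import "fmt"

// Sample is one scraped value of a counter at a given Unix timestamp.
type Sample struct {
	Timestamp int64   // seconds
	Value     float64 // cumulative counter value
}

// simpleRate mirrors the core idea of PromQL's rate(): the per-second
// increase between the first and last sample in the window.
func simpleRate(first, last Sample) float64 {
	if last.Timestamp == first.Timestamp {
		return 0
	}
	increase := last.Value - first.Value
	if increase < 0 {
		// counter reset (e.g. process restart): treat the last value
		// as the increase since the reset
		increase = last.Value
	}
	return increase / float64(last.Timestamp-first.Timestamp)
}

func main() {
	// 300 new requests over a 5-minute window → 1 req/s
	fmt.Println(simpleRate(Sample{0, 1000}, Sample{300, 1300})) // prints 1
}
```

This is also why counters plus rate() survive restarts gracefully, while applying rate() to a gauge (which can legitimately go down) produces nonsense; gauges call for deriv() or delta() instead.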
Alerting with Alertmanager
Alert Rules
# alerts.yml
groups:
  - name: api_alerts
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
          ) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High API error rate"
          description: "Error rate: {{ $value | humanizePercentage }}"
      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency above 500ms"
      # Too many goroutines (possible leak)
      - alert: GoroutineLeak
        expr: go_goroutines > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Possible goroutine leak"
      # API down
      - alert: APIDown
        expr: up{job="api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "API is down"
      # Long GC pauses
      - alert: LongGCPauses
        expr: |
          histogram_quantile(0.99,
            sum(rate(go_gc_pause_duration_seconds_bucket[5m])) by (le)
          ) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GC pauses above 100ms"
Alertmanager Configuration
# alertmanager.yml
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'

route:
  receiver: 'default'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      continue: true
    - match:
        severity: warning
      receiver: 'slack-warnings'

receivers:
  - name: 'default'
    email_configs:
      - to: 'team@example.com'
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: 'your-service-key'
        severity: critical
  - name: 'slack-warnings'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: 'Alert: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']
Production Patterns
1. Controlled Cardinality
// ❌ Avoid: unbounded label cardinality
httpRequests.WithLabelValues(r.Method, r.URL.Path, r.UserAgent()).Inc()
// Paths may contain IDs: /users/123, /users/456 → series explosion
// (and User-Agent values are effectively unbounded too)

// ✅ Do: normalize endpoints and keep every label's value set bounded
func normalizePath(path string) string {
	// Replace numeric IDs with a placeholder
	re := regexp.MustCompile(`/\d+`)
	return re.ReplaceAllString(path, "/:id")
}

httpRequests.WithLabelValues(
	r.Method,
	normalizePath(r.URL.Path),
	status, // a small, fixed set of status codes
).Inc()
2. Health Metrics
var (
healthCheck = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "health_check",
Help: "Health check status (1 = healthy, 0 = unhealthy)",
},
[]string{"check"},
)
dependencyUp = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "dependency_up",
Help: "External dependency status",
},
[]string{"name"},
)
)
func healthHandler(w http.ResponseWriter, r *http.Request) {
checks := map[string]bool{
"database": checkDatabase(),
"cache": checkCache(),
"queue": checkQueue(),
}
allHealthy := true
for name, healthy := range checks {
if healthy {
healthCheck.WithLabelValues(name).Set(1)
} else {
healthCheck.WithLabelValues(name).Set(0)
allHealthy = false
}
}
if allHealthy {
w.WriteHeader(http.StatusOK)
} else {
w.WriteHeader(http.StatusServiceUnavailable)
}
}
3. Graceful Shutdown
var shutdownTimestamp = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "app_last_shutdown_timestamp_seconds",
	Help: "Unix time of the last graceful shutdown",
})

func main() {
	srv := &http.Server{Addr: ":8080"}
	// Serve metrics on a dedicated port with its own mux
	go func() {
		mux := http.NewServeMux()
		mux.Handle("/metrics", promhttp.Handler())
		http.ListenAndServe(":9090", mux)
	}()
	// Graceful shutdown
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-quit
		// Record the shutdown time
		shutdownTimestamp.SetToCurrentTime()
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		srv.Shutdown(ctx)
	}()
	srv.ListenAndServe()
}
Conclusion
In this guide, you learned:
✅ Fundamentals: Prometheus data model and architecture
✅ Metrics: Counter, Gauge, Histogram, and Summary
✅ Instrumentation: HTTP middleware, database, runtime
✅ Tracing: Distributed tracing with OpenTracing
✅ Visualization: Grafana dashboards and PromQL queries
✅ Alerting: Alert rules and Alertmanager
✅ Production: Cardinality, health checks, graceful shutdown
Next Steps
- Go and Grafana - Advanced dashboards and visualizations
- Go Observability - Unified logs, metrics, and traces
- Go and Jaeger - End-to-end distributed tracing
FAQ
Q: What is the difference between Histogram and Summary? A: A histogram buckets observations and lets the server compute quantiles; a summary computes quantiles on the client. Histograms are more flexible for aggregation; summaries are more accurate for percentiles.
Q: How do I avoid high cardinality? A: Avoid unique values (IDs, timestamps) in labels. Normalize paths, group status codes (2xx, 4xx, 5xx), and use enums for finite value sets.
Q: Should I expose metrics on the same port as the application? A: For simplicity, yes. For security, use a separate port or protect /metrics with authentication.
Q: What is the ideal scrape interval? A: 15-30s is common. Tune it for the granularity you need versus the overhead you can afford.
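For reference, the scrape interval lives in the Prometheus server configuration, not in the application; a minimal sketch (the job name and target below are placeholders):

```yaml
# prometheus.yml
global:
  scrape_interval: 15s     # default for all jobs
  evaluation_interval: 15s # how often alert rules are evaluated

scrape_configs:
  - job_name: 'api'
    scrape_interval: 15s   # can be overridden per job
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:8080']
```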