Logging Integrations
This guide covers integrating the Greenfield Cluster's structured logging with popular log aggregation platforms.
Table of Contents
- Overview
- Log Format
- Splunk Integration
- Elasticsearch/ELK Stack
- GCP Cloud Logging
- AWS CloudWatch
- Grafana Loki
- Datadog
- New Relic
- Common Patterns
Overview
The backend-service and updated fastapi-example use structured JSON logging that's compatible with most log aggregation platforms. Logs include:
- ECS-compatible format: Works with Elastic Common Schema
- Trace correlation: trace.id and trace.span_id for log-trace correlation
- Structured fields: Machine-readable JSON for easy querying
- Rich metadata: Service, host, process information
Log Format
Example log entry:
{
  "@timestamp": "2024-01-15T10:30:45.123Z",
  "timestamp": "2024-01-15T10:30:45.123Z",
  "log": {
    "level": "INFO",
    "logger": "app.main",
    "origin": {
      "file": {
        "name": "main.py",
        "line": 145
      },
      "function": "call_frontend"
    }
  },
  "message": "GET /call-frontend - 200 (45.23ms)",
  "service": {
    "name": "backend-service",
    "version": "1.0.0",
    "environment": "production"
  },
  "trace": {
    "id": "1234567890abcdef1234567890abcdef",
    "span_id": "1234567890abcdef"
  },
  "process": {
    "pid": 1234
  },
  "host": {
    "hostname": "backend-service-7d8f9c-xkz2p"
  },
  "http": {
    "request": {
      "method": "GET",
      "path": "/call-frontend"
    },
    "response": {
      "status_code": 200,
      "duration_ms": 45.23
    }
  }
}
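The services produce this format through their own logging configuration. As a rough sketch of how such an entry can be assembled with Python's standard library (the formatter and field values below are illustrative placeholders, not the backend-service's actual code):

import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render a log record as a single JSON line with ECS-style keys."""
    def format(self, record):
        entry = {
            "@timestamp": datetime.now(timezone.utc).isoformat(),
            "log": {"level": record.levelname, "logger": record.name},
            "message": record.getMessage(),
            "service": {"name": "backend-service"},  # placeholder metadata
        }
        # Trace IDs are normally injected from the active span (see Common Patterns)
        if hasattr(record, "trace_id"):
            entry["trace"] = {"id": record.trace_id, "span_id": getattr(record, "span_id", "")}
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("app.main").addHandler(handler)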
Splunk Integration
Using HTTP Event Collector (HEC)
Step 1: Create HEC Token in Splunk
- Navigate to Settings → Data Inputs → HTTP Event Collector
- Click "New Token"
- Name your token (e.g., "kubernetes-logs")
- Select source type: _json
- Copy the token value
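Before deploying Fluent Bit, you can sanity-check the new token by posting a test event straight to the HEC endpoint. A minimal sketch with the requests library; the host and token are placeholders, and verify=False only mirrors the TLS.Verify Off setting used in the Fluent Bit output below:

import requests

resp = requests.post(
    "https://YOUR_SPLUNK_HOST:8088/services/collector/event",
    headers={"Authorization": "Splunk YOUR_HEC_TOKEN"},
    json={
        "sourcetype": "_json",
        "event": {"message": "HEC smoke test", "service": {"name": "backend-service"}},
    },
    verify=False,  # mirrors TLS.Verify Off below; do not disable verification in production
)
resp.raise_for_status()
print(resp.json())  # expect {"text": "Success", "code": 0}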
Step 2: Deploy Fluent Bit as DaemonSet
Create kustomize/base/logging/fluentbit-splunk.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: greenfield
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Daemon            off
        Log_Level         info
        Parsers_File      parsers.conf
    [INPUT]
        Name              tail
        Path              /var/log/containers/*greenfield*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
    [OUTPUT]
        Name          splunk
        Match         *
        Host          YOUR_SPLUNK_HOST
        Port          8088
        Splunk_Token  YOUR_HEC_TOKEN
        TLS           On
        TLS.Verify    Off
  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: greenfield
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.1
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
Step 3: Query Logs in Splunk
# Find all backend-service logs
index=kubernetes service.name="backend-service"
# Find errors
index=kubernetes service.name="backend-service" log.level="ERROR"
# Find slow requests
index=kubernetes service.name="backend-service" http.response.duration_ms>1000
# Find logs for specific trace
index=kubernetes trace.id="1234567890abcdef1234567890abcdef"
# Error trend over time
index=kubernetes service.name="backend-service" log.level="ERROR"
| timechart count by service.name
Elasticsearch/ELK Stack
Using Filebeat
Step 1: Deploy Elasticsearch and Kibana
# Add Elastic Helm repo
helm repo add elastic https://helm.elastic.co
# Install Elasticsearch
helm install elasticsearch elastic/elasticsearch \
--namespace logging \
--create-namespace
# Install Kibana
helm install kibana elastic/kibana \
--namespace logging
Step 2: Deploy Filebeat
Create kustomize/base/logging/filebeat.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: greenfield
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*greenfield*.log
        json.keys_under_root: true
        json.add_error_key: true
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      index: "kubernetes-logs-%{+yyyy.MM.dd}"
    setup.ilm.enabled: false
    setup.template.name: "kubernetes-logs"
    setup.template.pattern: "kubernetes-logs-*"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: greenfield
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      serviceAccountName: filebeat
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:8.11.0
          env:
            - name: ELASTICSEARCH_HOST
              value: elasticsearch.logging.svc.cluster.local
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config
              mountPath: /usr/share/filebeat/filebeat.yml
              subPath: filebeat.yml
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: filebeat-config
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        - name: data
          emptyDir: {}
Step 3: Create Kibana Index Pattern
- Open Kibana
- Navigate to Management → Index Patterns
- Create pattern: kubernetes-logs-*
- Select timestamp field: @timestamp
Step 4: Query Logs
# KQL (Kibana Query Language) examples
# Find backend service logs
service.name:"backend-service"
# Find errors
service.name:"backend-service" AND log.level:"ERROR"
# Find slow requests
service.name:"backend-service" AND http.response.duration_ms > 1000
# Find logs for trace
trace.id:"1234567890abcdef1234567890abcdef"
# Visualize error rate
service.name:"backend-service" AND log.level:"ERROR"
# Then create visualization: Count over time
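The same lookups can be scripted with the Python Elasticsearch client. A sketch against the 8.x API; the host matches the in-cluster service used by Filebeat above, and depending on your cluster and mapping you may need authentication options or the trace.id.keyword sub-field:

from elasticsearch import Elasticsearch

# Add basic_auth/TLS options here if security is enabled on your cluster
es = Elasticsearch("http://elasticsearch.logging.svc.cluster.local:9200")
resp = es.search(
    index="kubernetes-logs-*",
    # Use "trace.id.keyword" instead if trace.id is mapped as text
    query={"term": {"trace.id": "1234567890abcdef1234567890abcdef"}},
    sort=[{"@timestamp": {"order": "desc"}}],
    size=100,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["@timestamp"], hit["_source"]["message"])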
GCP Cloud Logging
Using Fluent Bit with Google Cloud Output
Step 1: Create Service Account
# Create service account
gcloud iam service-accounts create logging-writer \
--display-name="Logging Writer"
# Grant permissions
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:logging-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/logging.logWriter"
# Create key
gcloud iam service-accounts keys create logging-key.json \
--iam-account=logging-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com
# Create secret
kubectl create secret generic gcp-logging-key \
--from-file=key.json=logging-key.json \
--namespace=greenfield
Step 2: Deploy Fluent Bit
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-gcp-config
  namespace: greenfield
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Daemon        off
        Log_Level     info
        # Needed so the "docker" parser defined in parsers.conf is found
        Parsers_File  parsers.conf
    [INPUT]
        Name              tail
        Path              /var/log/containers/*greenfield*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
    [FILTER]
        Name       kubernetes
        Match      kube.*
        Kube_URL   https://kubernetes.default.svc:443
        Merge_Log  On
    [OUTPUT]
        Name                        stackdriver
        Match                       *
        google_service_credentials  /var/secrets/google/key.json
        resource                    k8s_container
        k8s_cluster_name            YOUR_CLUSTER_NAME
        k8s_cluster_location        YOUR_CLUSTER_LOCATION
  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit-gcp
  namespace: greenfield
spec:
  selector:
    matchLabels:
      app: fluent-bit-gcp
  template:
    metadata:
      labels:
        app: fluent-bit-gcp
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.1
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
            - name: google-cloud-key
              mountPath: /var/secrets/google
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-gcp-config
        - name: google-cloud-key
          secret:
            secretName: gcp-logging-key
Step 3: Query in Cloud Logging
# Find backend service logs
resource.type="k8s_container"
labels."k8s-pod/app"="backend-service"
# Find errors
resource.type="k8s_container"
labels."k8s-pod/app"="backend-service"
jsonPayload.log.level="ERROR"
# Find slow requests
resource.type="k8s_container"
labels."k8s-pod/app"="backend-service"
jsonPayload.http.response.duration_ms > 1000
# Find logs for trace
resource.type="k8s_container"
jsonPayload.trace.id="1234567890abcdef1234567890abcdef"
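The same filters can be run from a script with the google-cloud-logging client. A sketch assuming Application Default Credentials with at least roles/logging.viewer on the project; the project ID is a placeholder:

from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="YOUR_PROJECT_ID")
log_filter = (
    'resource.type="k8s_container" '
    'jsonPayload.trace.id="1234567890abcdef1234567890abcdef"'
)
for entry in client.list_entries(filter_=log_filter, order_by=gcp_logging.DESCENDING):
    print(entry.timestamp, entry.payload)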
AWS CloudWatch
Using Fluent Bit with CloudWatch Output
Step 1: Create IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
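How you attach this policy depends on your cluster (node instance role or an IRSA role for the DaemonSet). A sketch with boto3, using placeholder names and assuming the document above has been saved locally:

import boto3

# The policy document shown above, saved locally as a JSON file
with open("fluent-bit-cloudwatch-policy.json") as f:
    policy_document = f.read()

iam = boto3.client("iam")
created = iam.create_policy(
    PolicyName="fluent-bit-cloudwatch-logs",  # placeholder name
    PolicyDocument=policy_document,
)
iam.attach_role_policy(
    RoleName="YOUR_NODE_INSTANCE_ROLE",  # or the IRSA role used by the Fluent Bit DaemonSet
    PolicyArn=created["Policy"]["Arn"],
)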
Step 2: Deploy Fluent Bit
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-cloudwatch-config
  namespace: greenfield
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Daemon        off
        Log_Level     info
        # Needed so the "docker" parser defined in parsers.conf is found
        Parsers_File  parsers.conf
    [INPUT]
        Name    tail
        Path    /var/log/containers/*greenfield*.log
        Parser  docker
        Tag     kube.*
    [FILTER]
        Name       kubernetes
        Match      kube.*
        Merge_Log  On
    [OUTPUT]
        Name               cloudwatch_logs
        Match              *
        region             us-east-1
        log_group_name     /kubernetes/greenfield
        log_stream_prefix  from-fluent-bit-
        auto_create_group  true
  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
Step 3: Query CloudWatch Logs Insights
# Find backend service logs
fields @timestamp, message, service.name
| filter service.name = "backend-service"
| sort @timestamp desc
# Find errors
fields @timestamp, message, log.level, error
| filter service.name = "backend-service" and log.level = "ERROR"
| sort @timestamp desc
# Find slow requests
fields @timestamp, http.request.path, http.response.duration_ms
| filter service.name = "backend-service" and http.response.duration_ms > 1000
| sort http.response.duration_ms desc
# Aggregate errors by endpoint
fields http.request.path
| filter service.name = "backend-service" and log.level = "ERROR"
| stats count() by http.request.path
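These queries can also be run programmatically with boto3's start_query/get_query_results pair. A sketch using the region and log group configured in the Fluent Bit output above:

import time
import boto3

logs = boto3.client("logs", region_name="us-east-1")
query = logs.start_query(
    logGroupName="/kubernetes/greenfield",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString=(
        'fields @timestamp, message '
        '| filter service.name = "backend-service" and log.level = "ERROR" '
        '| sort @timestamp desc | limit 50'
    ),
)
while True:
    result = logs.get_query_results(queryId=query["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})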
Grafana Loki
Using Promtail
Step 1: Install Loki and Grafana
# Add Grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts
# Install Loki
helm install loki grafana/loki-stack \
--namespace logging \
--create-namespace \
--set grafana.enabled=true \
--set prometheus.enabled=false
Step 2: Deploy Promtail
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: greenfield
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    positions:
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki.logging.svc.cluster.local:3100/loki/api/v1/push
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: (backend-service|fastapi-app)
            action: keep
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: node_name
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          # Tell Promtail which files on the node to tail for each discovered pod
          - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
            separator: /
            target_label: __path__
            replacement: /var/log/pods/*$1/*.log
        pipeline_stages:
          - json:
              expressions:
                level: log.level
                service: service.name
                trace_id: trace.id
                span_id: trace.span_id
          - labels:
              level:
              service:
              trace_id:
Step 3: Query with LogQL
# Find backend service logs
{app="backend-service"}
# Find errors
{app="backend-service"} | json | level="ERROR"
# Find slow requests
{app="backend-service"} | json | http_response_duration_ms > 1000
# Find logs for trace
{trace_id="1234567890abcdef1234567890abcdef"}
# Rate of errors
rate({app="backend-service"} | json | level="ERROR" [5m])
# P95 latency
quantile_over_time(0.95,
{app="backend-service"}
| json
| unwrap http_response_duration_ms [5m]
) by (service)
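Outside Grafana, the same LogQL can be sent to Loki's query_range HTTP API, for example from a script or CI job. A sketch; the URL matches the Promtail client configuration above:

import time
import requests

now = time.time()
resp = requests.get(
    "http://loki.logging.svc.cluster.local:3100/loki/api/v1/query_range",
    params={
        "query": '{app="backend-service"} | json | level="ERROR"',
        "start": int((now - 3600) * 1e9),  # Loki expects nanosecond timestamps
        "end": int(now * 1e9),
        "limit": 100,
    },
)
resp.raise_for_status()
for stream in resp.json()["data"]["result"]:
    for timestamp, line in stream["values"]:
        print(timestamp, line)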
Datadog
Using Datadog Agent
Step 1: Create API Key Secret
kubectl create secret generic datadog-secret \
--from-literal api-key=YOUR_DATADOG_API_KEY \
--namespace=greenfield
Step 2: Deploy Datadog Agent
helm repo add datadog https://helm.datadoghq.com
helm install datadog datadog/datadog \
--namespace greenfield \
--set datadog.apiKeyExistingSecret=datadog-secret \
--set datadog.logs.enabled=true \
--set datadog.logs.containerCollectAll=true \
--set datadog.apm.enabled=true
Step 3: Query in Datadog
# Find backend service logs
service:backend-service
# Find errors
service:backend-service status:error
# Find slow requests
service:backend-service @http.response.duration_ms:>1000
# Find logs for trace
@trace_id:1234567890abcdef1234567890abcdef
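The same searches can be automated through the Datadog Logs Search API (v2). A sketch with placeholder credentials; it needs an application key in addition to the API key used by the agent, and the host should match your Datadog site:

import requests

resp = requests.post(
    "https://api.datadoghq.com/api/v2/logs/events/search",  # adjust for your Datadog site
    headers={
        "DD-API-KEY": "YOUR_DATADOG_API_KEY",
        "DD-APPLICATION-KEY": "YOUR_DATADOG_APP_KEY",
        "Content-Type": "application/json",
    },
    json={
        "filter": {
            "query": "service:backend-service status:error",
            "from": "now-1h",
            "to": "now",
        },
        "page": {"limit": 100},
    },
)
resp.raise_for_status()
for event in resp.json().get("data", []):
    attrs = event["attributes"]
    print(attrs.get("timestamp"), attrs.get("message"))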
New Relic
Using New Relic Kubernetes Integration
Step 1: Install New Relic
helm repo add newrelic https://helm-charts.newrelic.com
helm install newrelic-bundle newrelic/nri-bundle \
--set global.licenseKey=YOUR_LICENSE_KEY \
--set global.cluster=greenfield-cluster \
--namespace=greenfield \
--set newrelic-infrastructure.privileged=true \
--set ksm.enabled=true \
--set prometheus.enabled=true \
--set kubeEvents.enabled=true \
--set logging.enabled=true
Step 2: Query NRQL
-- Find backend service logs
SELECT * FROM Log WHERE service.name = 'backend-service'
-- Find errors
SELECT * FROM Log WHERE service.name = 'backend-service' AND log.level = 'ERROR'
-- Find slow requests
SELECT * FROM Log
WHERE service.name = 'backend-service'
AND http.response.duration_ms > 1000
-- Error rate over time
SELECT count(*) FROM Log
WHERE service.name = 'backend-service' AND log.level = 'ERROR'
TIMESERIES
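NRQL can also be executed programmatically through NerdGraph. A sketch with a placeholder account ID and user API key:

import requests

ACCOUNT_ID = 1234567  # placeholder: your New Relic account ID
NRQL = (
    "SELECT count(*) FROM Log "
    "WHERE service.name = 'backend-service' AND log.level = 'ERROR' TIMESERIES"
)
graphql = (
    '{ actor { account(id: %d) { nrql(query: "%s") { results } } } }'
    % (ACCOUNT_ID, NRQL)
)
resp = requests.post(
    "https://api.newrelic.com/graphql",
    headers={"API-Key": "YOUR_USER_API_KEY", "Content-Type": "application/json"},
    json={"query": graphql},
)
resp.raise_for_status()
print(resp.json()["data"]["actor"]["account"]["nrql"]["results"])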
Common Patterns
Log-Trace Correlation
In any of these systems, you can correlate logs with traces:
- From Trace to Logs: Open a trace in Jaeger, copy its trace_id, and search the logs for it
- From Logs to Trace: Copy the trace_id from a log entry and search for it in Jaeger
Example workflow:
1. User reports slow request
2. Find error in logs: trace_id = "abc123..."
3. Search Jaeger for trace_id = "abc123..."
4. See full distributed trace
5. Identify slow service
6. Search logs for that service with same trace_id
7. Find root cause in logs
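This workflow depends on the service writing the active trace ID into every log record. A minimal sketch of how that can be done with the OpenTelemetry SDK (the helper function is illustrative, not the services' actual logging code):

import logging
from opentelemetry import trace

logger = logging.getLogger("app.main")

def log_with_trace(level, message):
    """Attach the current span's IDs so log entries share a trace_id with Jaeger."""
    ctx = trace.get_current_span().get_span_context()
    extra = {}
    if ctx.is_valid:
        extra = {
            "trace_id": format(ctx.trace_id, "032x"),
            "span_id": format(ctx.span_id, "016x"),
        }
    logger.log(level, message, extra=extra)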
Alerting on Logs
Most platforms support alerting on log patterns:
Splunk Alert:
index=kubernetes service.name="backend-service" log.level="ERROR"
| stats count by service.name
| where count > 10
Elasticsearch Watcher (counts backend-service errors from the last 5 minutes, matching the 5-minute schedule):
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["kubernetes-logs-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                { "term": { "service.name": "backend-service" }},
                { "term": { "log.level": "ERROR" }},
                { "range": { "@timestamp": { "gte": "now-5m" }}}
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": { "gt": 10 }
    }
  }
}
Log Sampling
For high-volume services, sample logs:
import logging
import random

def should_log_debug():
    # Keep roughly 1% of DEBUG messages
    return random.random() < 0.01

def log_sampled(logger, level, message):
    # Always log INFO and above; sample everything below it
    if level >= logging.INFO or should_log_debug():
        logger.log(level, message)
Log Retention
Set appropriate retention periods:
- Development: 7 days
- Staging: 30 days
- Production: 90+ days
- Compliance: As required (years)
Cost Optimization
- Filter at source: Don't ship debug logs from production
- Sample high-volume logs: Keep only a fraction of verbose messages (see Log Sampling above)
- Use index lifecycle tiers: Separate hot/warm/cold storage
- Archive old logs: Move them to cheaper storage (S3, GCS)