Configuration System
This document explains Kepler's hierarchical configuration system, which provides flexible, user-friendly configuration management while maintaining operational simplicity.
Design Principle: Simple Configuration
"Simple Configuration to reduce learning curve - keep flags and configuration in sync (as much as possible)"
The configuration system balances flexibility with simplicity, providing sensible defaults while allowing precise control when needed.
Configuration Hierarchy
Configuration follows a clear precedence order, with higher levels overriding lower levels:
1. CLI Flags (highest precedence) → Operational overrides
2. YAML Files (middle precedence) → Persistent configuration
3. Default Values (lowest precedence) → Sensible out-of-box behavior
Example Configuration Flow
# Start with defaults
kepler
# ↓ Override with YAML file
kepler --config.file=production.yaml
# ↓ Override specific values with CLI flags
kepler --config.file=production.yaml --log.level=debug --monitor.interval=5s
Configuration Structure
Main Configuration Types
type Config struct {
	Log      Log      `yaml:"log"`      // Logging configuration
	Host     Host     `yaml:"host"`     // System paths
	Monitor  Monitor  `yaml:"monitor"`  // Collection behavior
	Rapl     Rapl     `yaml:"rapl"`     // Hardware filtering
	Exporter Exporter `yaml:"exporter"` // Export configuration
	Web      Web      `yaml:"web"`      // HTTP server
	Kube     Kube     `yaml:"kube"`     // Kubernetes integration
	Debug    Debug    `yaml:"debug"`    // Debug features
	Dev      Dev      `yaml:"dev"`      // Development options (no CLI flags)
}
Logging Configuration
type Log struct {
	Level  string `yaml:"level"`  // debug, info, warn, error
	Format string `yaml:"format"` // text, json
}
CLI Flags:
- --log.level: Override log level
- --log.format: Override log format
YAML Example:
log:
  level: info
  format: json
System Paths Configuration
type Host struct {
	SysFS  string `yaml:"sysfs"`  // Hardware sensor path (default: /sys)
	ProcFS string `yaml:"procfs"` // Process info path (default: /proc)
}
CLI Flags:
- --host.sysfs: Override sysfs path
- --host.procfs: Override procfs path
Use Cases:
- Container Deployment: Mount host paths to different locations
- Testing: Point to test fixtures
- Development: Use different filesystem layouts
Monitoring Configuration
type Monitor struct {
	// Collection timing
	Interval  time.Duration `yaml:"interval"`  // How often to collect (default: 3s)
	Staleness time.Duration `yaml:"staleness"` // Data freshness threshold (default: 10s)

	// Terminated workload tracking
	MaxTerminated                int   `yaml:"maxTerminated"`                // Capacity limit (default: 100)
	MinTerminatedEnergyThreshold int64 `yaml:"minTerminatedEnergyThreshold"` // Joules (default: 10)
}
CLI Flags:
- --monitor.interval: Collection frequency
- --monitor.staleness: Data freshness threshold
- --monitor.max-terminated: Terminated workload limit
- --monitor.min-terminated-energy-threshold: Energy threshold
YAML Example:
monitor:
  interval: 3s
  staleness: 10s
  maxTerminated: 100
  minTerminatedEnergyThreshold: 10
Hardware Configuration
type Rapl struct {
	Zones []string `yaml:"zones"` // Filter specific zones (empty = all zones)
}
CLI Flags:
- --rapl.zones: Comma-separated zone list
YAML Example:
rapl:
  zones: ["package", "dram"] # Only collect package and DRAM zones
Zone Options:
- package: CPU package energy (recommended)
- core: CPU core energy
- dram: Memory energy
- uncore: Uncore/cache energy
- psys: Platform system energy (if available)
Export Configuration
type Exporter struct {
	Stdout     StdoutExporter     `yaml:"stdout"`
	Prometheus PrometheusExporter `yaml:"prometheus"`
}
type StdoutExporter struct {
	Enabled *bool `yaml:"enabled"` // Pointer allows nil = use default
}
type PrometheusExporter struct {
	Enabled         *bool    `yaml:"enabled"`
	DebugCollectors []string `yaml:"debugCollectors"`
	MetricsLevel    Level    `yaml:"metricsLevel"`
}
CLI Flags:
- --exporter.stdout.enabled: Enable stdout exporter
- --exporter.prometheus.enabled: Enable Prometheus exporter
- --exporter.prometheus.debug-collectors: Debug collector list
- --exporter.prometheus.metrics-level: Metrics granularity
YAML Example:
exporter:
  stdout:
    enabled: false
  prometheus:
    enabled: true
    debugCollectors: ["go", "process"]
    metricsLevel: "container"
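The `*bool` fields exist because a plain `bool` cannot distinguish "explicitly set to false" from "not set". A minimal sketch of reading such a field follows; `enabledOr` is a hypothetical helper name, not part of Kepler.

```go
package main

import "fmt"

// enabledOr returns the explicit setting when one is present and the
// fallback when the pointer is nil (the option was not configured).
func enabledOr(v *bool, fallback bool) bool {
	if v == nil {
		return fallback
	}
	return *v
}

func main() {
	var unset *bool
	off := false

	fmt.Println(enabledOr(unset, true)) // not configured: fallback applies
	fmt.Println(enabledOr(&off, true))  // explicitly disabled: fallback ignored
}
```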
Web Server Configuration
type Web struct {
	Config          string   `yaml:"configFile"`      // TLS configuration file
	ListenAddresses []string `yaml:"listenAddresses"` // Bind addresses
}
CLI Flags:
- --web.config.file: TLS/auth configuration
- --web.listen-address: HTTP listen addresses (can be repeated)
YAML Example:
web:
  listenAddresses: ["0.0.0.0:8080", "[::]:8080"]
  configFile: "/etc/kepler/web-config.yaml"
Kubernetes Integration
type Kube struct {
	Enabled *bool  `yaml:"enabled"`  // Enable Kubernetes features
	Config  string `yaml:"config"`   // Kubeconfig path (empty = in-cluster)
	Node    string `yaml:"nodeName"` // Node name for metrics labels
}
CLI Flags:
- --kube.enabled: Enable Kubernetes integration
- --kube.config: Kubeconfig file path
- --kube.node-name: Node name override
YAML Example:
kube:
  enabled: true
  config: "" # Use in-cluster config
  nodeName: "node-1"
Debug Configuration Structure
type Debug struct {
	Pprof PprofDebug `yaml:"pprof"`
}
type PprofDebug struct {
	Enabled *bool `yaml:"enabled"`
}
CLI Flags:
- --debug.pprof.enabled: Enable pprof endpoints
YAML Example:
debug:
  pprof:
    enabled: true
Development Configuration Structure
type Dev struct {
	FakeCpuMeter struct {
		Enabled *bool    `yaml:"enabled"` // Use fake CPU meter
		Zones   []string `yaml:"zones"`   // Fake zone list
	} `yaml:"fake-cpu-meter"`
}
Important: Development options are NOT exposed as CLI flags - they must be set in YAML files only. This prevents accidental use in production.
YAML Example:
dev:
  fake-cpu-meter:
    enabled: true
    zones: ["package", "core", "dram"]
Configuration Loading Process
1. Default Configuration
Every configuration option has a sensible default:
func DefaultConfig() *Config {
	return &Config{
		Log: Log{
			Level:  "info",
			Format: "text",
		},
		Host: Host{
			SysFS:  "/sys",
			ProcFS: "/proc",
		},
		Monitor: Monitor{
			Interval:                     3 * time.Second,
			Staleness:                    10 * time.Second,
			MaxTerminated:                100,
			MinTerminatedEnergyThreshold: 10,
		},
		Exporter: Exporter{
			Stdout: StdoutExporter{
				Enabled: ptr.To(false),
			},
			Prometheus: PrometheusExporter{
				Enabled:         ptr.To(true),
				DebugCollectors: []string{"go"},
				MetricsLevel:    MetricsLevelAll,
			},
		},
		Web: Web{
			ListenAddresses: []string{"0.0.0.0:8080", "[::]:8080"},
		},
		Kube: Kube{
			Enabled: ptr.To(false),
		},
		Debug: Debug{
			Pprof: PprofDebug{
				Enabled: ptr.To(false),
			},
		},
		Dev: Dev{
			FakeCpuMeter: struct {
				Enabled *bool    `yaml:"enabled"`
				Zones   []string `yaml:"zones"`
			}{
				Enabled: ptr.To(false),
				Zones:   []string{"package", "core", "dram"},
			},
		},
	}
}
2. YAML File Loading
YAML files override defaults:
func FromFile(filename string) (*Config, error) {
	data, err := os.ReadFile(filename)
	if err != nil {
		return nil, fmt.Errorf("failed to read config file: %w", err)
	}
	// Unmarshal into the defaults so keys absent from the file keep their default values
	cfg := DefaultConfig()
	if err := yaml.Unmarshal(data, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config file: %w", err)
	}
	return cfg, nil
}
3. CLI Flag Integration
CLI flags are registered with kingpin and applied last:
func RegisterFlags(app *kingpin.Application) func(*Config) error {
	// Register all flags
	logLevel := app.Flag("log.level", "Log level (debug, info, warn, error)").String()
	logFormat := app.Flag("log.format", "Log format (text, json)").String()
	monitorInterval := app.Flag("monitor.interval", "Collection interval").Duration()
	monitorStaleness := app.Flag("monitor.staleness", "Data staleness threshold").Duration()
	exporterPrometheusEnabled := app.Flag("exporter.prometheus.enabled", "Enable Prometheus exporter").Bool()
	exporterStdoutEnabled := app.Flag("exporter.stdout.enabled", "Enable stdout exporter").Bool()
	// ... more flags

	// Return function that applies flags to config
	return func(cfg *Config) error {
		// A zero value is treated as "flag not given", so the YAML/default value is kept
		if *logLevel != "" {
			cfg.Log.Level = *logLevel
		}
		if *logFormat != "" {
			cfg.Log.Format = *logFormat
		}
		if *monitorInterval != 0 {
			cfg.Monitor.Interval = *monitorInterval
		}
		if *monitorStaleness != 0 {
			cfg.Monitor.Staleness = *monitorStaleness
		}
		// As written, a false value is indistinguishable from "not given",
		// so these two flags can enable an exporter but not disable one
		if *exporterPrometheusEnabled {
			cfg.Exporter.Prometheus.Enabled = ptr.To(true)
		}
		if *exporterStdoutEnabled {
			cfg.Exporter.Stdout.Enabled = ptr.To(true)
		}
		return nil
	}
}
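Because the simplified code above only reacts to a true value, a boolean flag cannot turn a YAML-enabled feature off. One way to tell "explicitly set to false" apart from "not given" is to record which flags actually appeared on the command line. The sketch below uses the standard library `flag` package rather than kingpin, purely to keep the example self-contained; `parseStdoutEnabled` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
)

// parseStdoutEnabled returns nil when the flag was absent, and a pointer to
// the parsed value when the user set it explicitly (true or false).
func parseStdoutEnabled(args []string) (*bool, error) {
	fs := flag.NewFlagSet("kepler-sketch", flag.ContinueOnError)
	value := fs.Bool("exporter.stdout.enabled", false, "Enable stdout exporter")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	var result *bool
	// Visit calls the function only for flags that were set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "exporter.stdout.enabled" {
			result = value
		}
	})
	return result, nil
}

func main() {
	for _, args := range [][]string{
		{},
		{"--exporter.stdout.enabled"},
		{"--exporter.stdout.enabled=false"},
	} {
		v, err := parseStdoutEnabled(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if v == nil {
			fmt.Println(args, "-> not set, keep YAML/default value")
			continue
		}
		fmt.Println(args, "-> explicitly set to", *v)
	}
}
```

With a nil result the YAML or default value stays untouched; a non-nil result overrides it in either direction.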
4. Complete Loading Flow
func parseArgsAndConfig() (*Config, error) {
	app := kingpin.New("kepler", "Power consumption monitoring exporter")
	configFile := app.Flag("config.file", "Path to YAML configuration file").String()
	updateConfig := RegisterFlags(app)
	kingpin.MustParse(app.Parse(os.Args[1:]))

	// Start with defaults
	cfg := DefaultConfig()

	// Override with YAML file if provided
	if *configFile != "" {
		loadedCfg, err := FromFile(*configFile)
		if err != nil {
			return nil, err
		}
		cfg = loadedCfg
	}

	// Apply CLI flags (highest precedence)
	if err := updateConfig(cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
Configuration Validation
Type Safety
Configuration uses Go's type system for validation:
type Level int

const (
	MetricsLevelNode Level = iota
	MetricsLevelProcess
	MetricsLevelContainer
	MetricsLevelVM
	MetricsLevelPod
	MetricsLevelAll
)

func (l *Level) UnmarshalYAML(value *yaml.Node) error {
	var s string
	if err := value.Decode(&s); err != nil {
		return err
	}
	switch s {
	case "node":
		*l = MetricsLevelNode
	case "process":
		*l = MetricsLevelProcess
	case "container":
		*l = MetricsLevelContainer
	case "vm":
		*l = MetricsLevelVM
	case "pod":
		*l = MetricsLevelPod
	case "all":
		*l = MetricsLevelAll
	default:
		return fmt.Errorf("invalid metrics level: %s", s)
	}
	return nil
}
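The same name-to-constant mapping is also needed when a level arrives as a CLI flag or is printed in logs. One possible arrangement, sketched here with hypothetical helpers `parseLevel` and `levelNames`, keeps a single table and derives both directions from it. The constants repeat the ones above so the example compiles on its own.

```go
package main

import "fmt"

type Level int

const (
	MetricsLevelNode Level = iota
	MetricsLevelProcess
	MetricsLevelContainer
	MetricsLevelVM
	MetricsLevelPod
	MetricsLevelAll
)

// levelNames is the single table used for both parsing and printing.
var levelNames = map[string]Level{
	"node":      MetricsLevelNode,
	"process":   MetricsLevelProcess,
	"container": MetricsLevelContainer,
	"vm":        MetricsLevelVM,
	"pod":       MetricsLevelPod,
	"all":       MetricsLevelAll,
}

// parseLevel converts a user-supplied name into a Level, rejecting
// anything that is not in the table.
func parseLevel(s string) (Level, error) {
	l, ok := levelNames[s]
	if !ok {
		return 0, fmt.Errorf("invalid metrics level: %s", s)
	}
	return l, nil
}

// String returns the configured name for a Level, or a numeric
// placeholder for values outside the table.
func (l Level) String() string {
	for name, v := range levelNames {
		if v == l {
			return name
		}
	}
	return fmt.Sprintf("Level(%d)", int(l))
}

func main() {
	l, err := parseLevel("pod")
	fmt.Println(l, err)

	_, err = parseLevel("cluster")
	fmt.Println(err)
}
```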
Runtime Validation
Configuration is validated after loading:
func (cfg *Config) Validate() error {
	var errs []error

	// Validate log level (slices.Contains needs Go 1.21+)
	validLevels := []string{"debug", "info", "warn", "error"}
	if !slices.Contains(validLevels, cfg.Log.Level) {
		errs = append(errs, fmt.Errorf("invalid log level: %s", cfg.Log.Level))
	}

	// Validate paths exist
	if _, err := os.Stat(cfg.Host.SysFS); err != nil {
		errs = append(errs, fmt.Errorf("sysfs path not accessible: %w", err))
	}

	// Validate intervals
	if cfg.Monitor.Interval <= 0 {
		errs = append(errs, errors.New("monitor interval must be positive"))
	}
	if cfg.Monitor.Staleness < cfg.Monitor.Interval {
		errs = append(errs, errors.New("staleness must be >= interval"))
	}

	// errors.Join returns nil when no errors were collected
	return errors.Join(errs...)
}
Configuration Examples
Development Configuration Example
# dev-config.yaml
log:
  level: debug
  format: text
dev:
  fake-cpu-meter:
    enabled: true
    zones: ["package", "core", "dram"]
exporter:
  stdout:
    enabled: true
  prometheus:
    enabled: true
    debugCollectors: ["go", "process"]
    metricsLevel: "all"
monitor:
  interval: 1s
  staleness: 3s
Usage:
kepler --config.file=dev-config.yaml
Production Configuration
# production.yaml
log:
  level: info
  format: json
host:
  sysfs: /host/sys
  procfs: /host/proc
kube:
  enabled: true
  nodeName: "${NODE_NAME}"
exporter:
  stdout:
    enabled: false
  prometheus:
    enabled: true
    metricsLevel: "container"
monitor:
  interval: 3s
  staleness: 10s
  maxTerminated: 50
web:
  listenAddresses: ["0.0.0.0:8080"]
  configFile: "/etc/kepler/web-config.yaml"
rapl:
  zones: ["package", "dram"]
Usage:
kepler --config.file=production.yaml --log.level=warn
Minimal Configuration
# minimal.yaml - only override what's necessary
kube:
  enabled: true
rapl:
  zones: ["package"]
Usage:
kepler --config.file=minimal.yaml
Environment-Specific Patterns
Container Deployment
host:
  sysfs: /host/sys # Host sysfs mounted in container
  procfs: /host/proc # Host procfs mounted in container
kube:
  enabled: true
  nodeName: "${NODE_NAME}" # From downward API
web:
  listenAddresses: ["0.0.0.0:8080"]
Kubernetes DaemonSet
apiVersion: v1
kind: ConfigMap
metadata:
  name: kepler-config
data:
  config.yaml: |
    host:
      sysfs: /host/sys
      procfs: /host/proc
    kube:
      enabled: true
      nodeName: "${NODE_NAME}"
    exporter:
      prometheus:
        metricsLevel: "pod"
---
apiVersion: apps/v1
kind: DaemonSet
spec:
  template:
    spec:
      containers:
        - name: kepler
          command: ["/kepler", "--config.file=/etc/kepler/config.yaml"]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config
              mountPath: /etc/kepler
      volumes:
        - name: config
          configMap:
            name: kepler-config
Configuration Best Practices
1. Use Defaults When Possible
Don't override unless necessary:
# Good - only override what's needed
log:
  level: debug
# Avoid - unnecessary overrides
log:
  level: debug
  format: text # This is already the default
2. Operational vs Development Settings
CLI Flags: Use for operational overrides
# Override log level for debugging
kepler --config.file=production.yaml --log.level=debug
# Override collection interval for testing
kepler --config.file=production.yaml --monitor.interval=1s
YAML Files: Use for persistent configuration
# production.yaml - persistent settings
monitor:
  interval: 3s
  maxTerminated: 50
3. Environment Variable Integration
For containerized deployments, use environment variables in YAML:
kube:
  nodeName: "${NODE_NAME}"
web:
  listenAddresses: ["${LISTEN_ADDRESS:-0.0.0.0:8080}"]
4. Configuration Validation
Always validate configuration in CI/CD:
# Validate configuration syntax
kepler --config.file=production.yaml --help > /dev/null
# Test with dry-run mode (if available)
kepler --config.file=production.yaml --dry-run
Troubleshooting Configuration
Common Issues
- Path Problems: Incorrect sysfs/procfs paths in containers
- Permission Issues: Insufficient privileges for hardware access
- YAML Syntax: Indentation and format errors
- Type Mismatches: Wrong data types in YAML
Debug Configuration Troubleshooting
Enable configuration debugging:
kepler --config.file=debug.yaml --log.level=debug
The startup log shows the final configuration:
INFO Configuration
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
log:
  level: debug
  format: text
monitor:
  interval: 3s
  staleness: 10s
...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Next Steps
After understanding the configuration system:
- Components: Understand how configuration flows through system components
- User Configuration Guide: End-user configuration documentation