Design Principles
The Kepler architecture is built on a set of core principles that drive all design decisions. Understanding these principles is essential for contributing to the project and maintaining architectural consistency.
1. Fair Power Allocation
"You can't rely on Prometheus scrapes for fairness. Processes that consumed power but aren't running anymore should be reported."
Problem Statement
Traditional monitoring systems only report currently running workloads, leading to unfair power attribution when processes terminate between scrapes. This creates several issues:
- Attribution Gaps: Power consumed by terminated processes is unaccounted for
- Unfair Billing: Running workloads get charged for terminated process energy
- Incomplete Metrics: Power consumption data has gaps and inconsistencies
Solution: Terminated Workload Tracking
Kepler implements a comprehensive terminated workload tracking system:
// Terminated workload trackers in PowerMonitor
terminatedProcessesTracker *TerminatedResourceTracker[*Process]
terminatedContainersTracker *TerminatedResourceTracker[*Container]
terminatedVMsTracker *TerminatedResourceTracker[*VirtualMachine]
terminatedPodsTracker *TerminatedResourceTracker[*Pod]
Key Features:
- Priority-Based Retention: Keep highest energy consumers to ensure fair attribution
- Export-Triggered Cleanup: Clear terminated workloads only after they've been exported to prevent data loss
- Configurable Capacity: Control memory usage while maintaining fairness
Configuration:
monitor:
maxTerminated: 100 # Track top 100 terminated workloads by energy
minTerminatedEnergyThreshold: 10 # Only track workloads with >10J consumption
2. Data Consistency, Completeness & Mathematical Integrity
"Data exported should be consistent, complete and not half complete"
Mathematical Relationships Enforced
Kepler enforces strict mathematical consistency across all power attribution levels:
Node Watts = Node Active Watts + Node Idle Watts
Node Active Watts = Σ(Process Watts)
Container Watts = Σ(Process Watts in Container)
VM Watts = Σ(Process Watts associated with VM)
Pod Watts = Σ(Container Watts in Pod)
Implementation Guarantees
- Atomic Snapshots: All related data captured at the same instant
- Energy Conservation Validation: Attribution totals must equal measured energy
- Hierarchical Aggregation: Bottom-up rollup ensures mathematical consistency
- Immutable Data Structures: Prevent partial updates during export
Testing Strategy
// Energy conservation tests validate:
assert.Equal(t, nodeActiveEnergy, sumOfProcessEnergies)
assert.Equal(t, containerEnergy, sumOfContainerProcessEnergies)
assert.Equal(t, podEnergy, sumOfPodContainerEnergies)
3. Computation-Presentation Separation (M/V Pattern)
"Keep Computation away from Presentation → Allow multiple ways to present the same data"
Architecture Pattern
Monitor (Model) → PowerDataProvider Interface → Multiple Exporters (Views)
Current Exporters
- ✅ Prometheus: Production metrics endpoint with full Prometheus ecosystem integration
- ✅ Stdout: Development and debugging with human-readable output
- 🔄 OpenTelemetry: Future extensibility for emerging observability standards
Benefits
- Extensibility: Add new export formats without changing core logic
- Flexibility: Different presentation needs (dashboards, alerts, debugging)
- Future-Proofing: Ready for emerging standards and requirements
- Testing: Easy to validate computation logic independently from presentation
M/V Pattern Implementation
type PowerDataProvider interface {
Snapshot() (*Snapshot, error) // Core data access
DataChannel() <-chan struct{} // Change notifications
ZoneNames() []string // Metadata access
}
4. Data Freshness Guarantee
"Data presented should be as fresh as possible within the stale duration"
Freshness Controls
- Configurable Staleness Threshold: Default 10s, prevents serving outdated data
- Automatic Refresh: Fresh data computed on-demand when stale
- Singleflight Protection: Prevents redundant calculations during concurrent requests
- Data Channel Notifications: Signal exporters when new data is available
Freshness Implementation
func (pm *PowerMonitor) isFresh() bool {
age := pm.clock.Now().Sub(snapshot.Timestamp)
return age <= pm.maxStaleness
}
func (pm *PowerMonitor) Snapshot() (*Snapshot, error) {
if !pm.isFresh() {
pm.synchronizedPowerRefresh() // Automatic refresh
}
return pm.snapshot.Load(), nil
}
Configuration
monitor:
staleness: 10s # Data older than 10s triggers automatic refresh
5. Deterministic Processing
"Parallel vs Serial processing should not produce different results"
Thread Safety Guarantees
- Atomic Snapshot Updates: Single writer ensures consistency
- Immutable Snapshots: No partial modifications possible
- Deterministic Attribution: Same CPU time ratios → same power attribution
- Race-Free Calculations: All concurrent access is read-only
Implementation Patterns
// Single writer pattern
func (pm *PowerMonitor) refreshSnapshot() error {
// Only one goroutine can update snapshots
newSnapshot := NewSnapshot()
// ... perform calculations ...
pm.snapshot.Store(newSnapshot) // Atomic update
}
// Multiple readers pattern
func (pm *PowerMonitor) Snapshot() (*Snapshot, error) {
return pm.snapshot.Load(), nil // Lock-free read
}
Testing Requirements
All concurrent code must be validated with race detection:
go test -race ./...
6. Prefer Package Reuse Over Reinvention
"Prefer reuse (of packages) over reinventing them (like in old kepler)"
Well-Maintained Dependencies
Area | Package | Benefit |
---|---|---|
Service Management | oklog/run |
Coordinated lifecycle management |
Metrics | prometheus/client_golang |
Standard Prometheus exposition |
Concurrency | golang.org/x/sync/singleflight |
Proven deduplication patterns |
Kubernetes | k8s.io/client-go |
Native Kubernetes API integration |
Configuration | alecthomas/kingpin/v2 |
Battle-tested CLI parsing |
Logging | log/slog |
Standard structured logging |
Package Reuse Benefits
- Reduced Maintenance Burden: Focus on core logic, not infrastructure
- Battle-Tested Implementations: Proven reliability and performance
- Community Support: Active maintenance and security updates
- Standards Compliance: Follow established patterns and conventions
Evaluation Criteria
When choosing external packages:
- Active Maintenance: Regular updates and responsive maintainers
- Production Usage: Proven track record in production systems
- API Stability: Stable interfaces that won't break frequently
- Performance: Suitable for Kepler's performance requirements
- License Compatibility: Compatible with Apache 2.0 license
7. Configurable Collection & Exposure
"Allow Users to configure what to collect and expose only those"
Metrics Level Configuration
Users can configure exactly what metrics to collect and expose:
exporter:
prometheus:
metricsLevel: "container" # node|process|container|vm|pod|all
Current Support Matrix
Level | Status | Metrics | Benefits |
---|---|---|---|
Node | ✅ | kepler_node_cpu_watts{zone="package"} |
System-level monitoring |
Process | 🔄 | kepler_process_cpu_watts{pid="1234"} |
Fine-grained analysis |
Container | 🔄 | kepler_container_cpu_watts{id="abc123"} |
Container-level billing |
VM | 🔄 | kepler_vm_cpu_watts{id="vm-123"} |
Virtualization monitoring |
Pod | ✅ | kepler_pod_cpu_watts{pod="webapp"} |
Kubernetes workload monitoring |
Configuration Benefits
- Reduced Cardinality: Lower storage and query costs for large clusters
- Lower Resource Usage: Collect only necessary data
- Focused Monitoring: Target specific workload types
- Scalability: Scale metrics collection based on cluster size
Zone Filtering
rapl:
zones: ["package", "dram"] # Only collect package and DRAM zones
8. Implementation Abstraction
"Allow implementation details to change without affecting the rest of the code. E.g: eBPF is only a means to an end"
Interface-Based Design
Core abstractions enable implementation flexibility:
// Hardware abstraction - can be RAPL, eBPF, or other implementations
type CPUPowerMeter interface {
Zones() ([]EnergyZone, error)
PrimaryEnergyZone() (EnergyZone, error)
}
// Resource tracking abstraction - can be procfs, eBPF, or other methods
type Informer interface {
Refresh() error
Processes() *Processes
Containers() *Containers
}
Current and Future Implementations
Layer | Current | Future | Abstraction Benefit |
---|---|---|---|
Hardware | RAPL sysfs | eBPF perf counters | Pluggable energy sources |
Resource Tracking | procfs reader | eBPF tracer | Different collection methods |
Export | Prometheus, stdout | OpenTelemetry | Multiple presentation formats |
Design Benefits
- Technology Evolution: Adopt new technologies without architectural changes
- Platform Support: Support different platforms through different implementations
- Testing: Easy to create fake implementations for testing
- Performance Optimization: Swap implementations based on performance needs
9. Simple Configuration & Low Learning Curve
"Simple Configuration to reduce learning curve - keep flags and configuration in sync (as much as possible)"
Hierarchical Configuration
Configuration follows a clear precedence order:
- CLI Flags (highest) - for operational overrides
- YAML Files (middle) - for persistent configuration
- Defaults (lowest) - for sensible out-of-box experience
Synchronization Strategy
- Operational Flags: Every YAML config option has corresponding CLI flag where operationally relevant
- Development Options:
dev.*
settings intentionally not exposed as CLI flags (internal only) - Consistent Naming:
--exporter.prometheus.enabled
↔exporter.prometheus.enabled
Examples
# CLI override of YAML config
kepler --config=production.yaml --log.level=debug --monitor.interval=5s
# Development mode (not exposed as CLI flag - internal only)
# Must be set in YAML: dev.fake_cpu_meter.enabled: true
Design Philosophy
- Operational vs Development: Clear separation between production and development settings
- Discoverability: All options documented with clear descriptions
- Validation: Configuration validation with helpful error messages
- Defaults: Sensible defaults that work out-of-the-box
How Principles Shape Architecture
These principles directly influenced major architectural decisions:
Principle | Architectural Impact | Implementation |
---|---|---|
Fair Allocation | Terminated workload tracking system | TerminatedResourceTracker with priority queues |
Mathematical Integrity | Atomic snapshots + energy conservation | Immutable Snapshot with validation |
M/V Separation | PowerDataProvider interface + pluggable exporters | Interface-based exporter system |
Data Freshness | Staleness control + singleflight pattern | isFresh() checks + automatic refresh |
Deterministic Processing | Thread-safe design + immutable snapshots | Single writer, multiple readers |
Package Reuse | Minimal custom implementations + proven libraries | Strategic dependency selection |
Configurable Exposure | Metrics level system + zone filtering | config.Level + filtering logic |
Implementation Abstraction | Interface-based layers + dependency injection | Clean interface boundaries |
Simple Configuration | Hierarchical config + CLI/YAML sync | kingpin + YAML parsing |
Next Steps
After understanding these principles, explore:
- System Components: How principles are implemented in practice
- Data Flow: How principles ensure fair and accurate power attribution
- Concurrency: How deterministic processing is achieved