A comprehensive distributed tracing and observability solution for Spring Boot microservices, integrating with Grafana, Elasticsearch, Kibana, Loki, Tempo, and Prometheus.
Traceflow is a distributed monitoring library designed to provide end-to-end observability for Java Spring Boot microservices. It consists of two main components:
- traceflow: A Java Spring library that provides tracing capabilities through custom annotations
- traceflow-libs: Docker Compose configurations and dependencies for the observability stack (Grafana, Elasticsearch, Kibana, Loki, Tempo, Prometheus)
The project uses OpenTelemetry for instrumentation and exports telemetry data to multiple backends:
- Tempo: Distributed tracing backend
- Loki: Log aggregation system
- Elasticsearch: Search and analytics engine
- Prometheus: Metrics collection and monitoring
- Grafana: Unified visualization dashboard
- Kibana: Elasticsearch data visualization
View and analyze application logs with powerful search capabilities
Monitor service metrics and performance with Prometheus integration
Visualize end-to-end request traces across microservices
- Java 17 or higher
- Maven 3.6+
- Docker & Docker Compose
- Spring Boot 3.x
Navigate to the traceflow-libs directory and start all services:
cd traceflow-libs
docker-compose up -d

This will start:
- Grafana (default: http://localhost:3000)
- Elasticsearch (default: http://localhost:9200)
- Kibana (default: http://localhost:5601)
- Loki (default: http://localhost:3100)
- Tempo (default: http://localhost:3200)
- Prometheus (default: http://localhost:9090)
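As a quick sanity check after `docker-compose up -d`, the sketch below tries a TCP connection to each default port listed above. It is illustrative only: the class name `StackProbe`, the one-second timeout, and the assumption that every service is published on `localhost` are not part of Traceflow. An open port only shows that something is listening, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: probes the default ports of the observability stack. */
final class StackProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host all count as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Default ports taken from the list above; adjust if your compose file differs.
        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("Grafana", 3000);
        services.put("Elasticsearch", 9200);
        services.put("Kibana", 5601);
        services.put("Loki", 3100);
        services.put("Tempo", 3200);
        services.put("Prometheus", 9090);

        services.forEach((name, port) -> System.out.printf(
                "%-14s %5d  %s%n", name, port,
                isReachable("localhost", port, 1000) ? "reachable" : "not reachable"));
    }
}
```

Run it with `java StackProbe.java`; containers such as Elasticsearch can take a minute or more before their ports accept connections.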
Download the OpenTelemetry Java agent and place it in the traceflow-libs directory:
cd traceflow-libs
curl -L -o opentelemetry-javaagent.jar \
https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

Build and install the Traceflow library to your local Maven repository:
cd traceflow
mvn clean install

Add the following dependencies to your service's pom.xml:
<!-- Distributed tracing starter -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-aop</artifactId>
<!-- No explicit version: let the Spring Boot 3.x parent/BOM manage it, as for the other starters -->
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<dependency>
<groupId>com.svc.traceflow</groupId>
<artifactId>traceflow</artifactId>
<version>1.0.0</version>
</dependency>
<!-- Distributed tracing end -->

Add the @EnableAspectJAutoProxy annotation to your main Spring Boot application class:
@SpringBootApplication
@EnableAspectJAutoProxy
public class YourServiceApplication {
public static void main(String[] args) {
SpringApplication.run(YourServiceApplication.class, args);
}
}

Use the @TraceflowSpan annotation on your controllers and service methods to enable tracing:
@RestController
@RequestMapping("/api")
public class YourController {
@GetMapping("/endpoint")
@TraceflowSpan("your-custom-span-name")
public ResponseEntity<String> yourEndpoint() {
// Your logic here
return ResponseEntity.ok("Success");
}
}

@Service
public class YourService {
@TraceflowSpan("service-operation")
public void performOperation() {
// Your business logic
}
}

The project includes a Makefile with pre-configured commands to run services with OpenTelemetry instrumentation:
# Run inventory service
make run-inventory
# Run orders service
make run-orders
# Run payment service
make run-payment
# Clean build artifacts
make clean-inventory
make clean-orders
make clean-payment
# Run tests
make test-inventory
make test-orders
make test-payment

If you prefer to run services manually, use the following Java options:
export JAVA_TOOL_OPTIONS="-javaagent:/path/to/opentelemetry-javaagent.jar \
-Dotel.traces.exporter=otlp \
-Dotel.metrics.exporter=otlp \
-Dotel.logs.exporter=otlp \
-Dotel.exporter.otlp.endpoint=http://localhost:4318 \
-Dotel.exporter.otlp.protocol=http/protobuf \
-Dotel.service.name=your-service-name \
-Dotel.instrumentation.jdbc.statement-sanitizer.enabled=false \
-Dotel.instrumentation.jdbc.capture-parameters=true"
./mvnw spring-boot:run

The Makefile defines the following OpenTelemetry configuration:
- Agent Location: `$(PWD)/traceflow-libs/opentelemetry-javaagent.jar`
- OTLP Endpoint: `http://localhost:4318`
- Protocol: `http/protobuf`
- Exporters: OTLP for traces, metrics, and logs
- JDBC Instrumentation: Enabled with parameter capture
Once your services are running, access the following dashboards:
| Service | URL | Description |
|---|---|---|
| Grafana | http://localhost:3000 | Unified visualization dashboard |
| Kibana | http://localhost:5601 | Elasticsearch data visualization |
| Prometheus | http://localhost:9090 | Metrics and monitoring |
| Elasticsearch | http://localhost:9200 | Search and analytics API |
Grafana is pre-configured with the following data sources:
- Tempo: Distributed tracing visualization
- Loki: Log aggregation and querying
- Elasticsearch: Log and event data
- Prometheus: Metrics and time-series data
Track request flows across multiple microservices with detailed span information:
- Service Dependencies: Visualize how services interact with each other
- Request Timeline: See the complete lifecycle of a request with timing information
- Span Details: Drill down into individual operations with metadata and tags
- Error Tracking: Identify failed requests and error patterns
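By default the OpenTelemetry Java agent propagates trace context between services in the W3C `traceparent` HTTP header, which is what lets spans from different services join a single trace. As a rough illustration of what that header carries, the sketch below parses a version-00 header into its trace ID, parent span ID, and sampled flag. `TraceparentSketch` and `TraceContext` are hypothetical names used only for this example; the agent does this for you.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses a W3C Trace Context "traceparent" header (version 00). */
final class TraceparentSketch {

    // Layout: version "00", 32 hex trace-id, 16 hex parent span-id, 2 hex flags.
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    /** The three fields carried by the header. */
    record TraceContext(String traceId, String spanId, boolean sampled) {}

    static TraceContext parse(String header) {
        Matcher m = FORMAT.matcher(header == null ? "" : header.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a version-00 traceparent header: " + header);
        }
        // The lowest bit of the flags byte is the "sampled" flag.
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0;
        return new TraceContext(m.group(1), m.group(2), sampled);
    }

    public static void main(String[] args) {
        TraceContext ctx = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println("traceId=" + ctx.traceId()
                + " spanId=" + ctx.spanId()
                + " sampled=" + ctx.sampled());
    }
}
```

The trace ID extracted here is the same value you would search for in Tempo or use to filter correlated log lines.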
Centralized logging with powerful search and filtering:
- Structured Logs: JSON-formatted logs with rich metadata
- Full-Text Search: Search across all logs using Elasticsearch
- Log Correlation: Link logs to traces using trace IDs
- Time-Series Analysis: Analyze log patterns over time
Real-time metrics and performance monitoring:
- HTTP Metrics: Request rates, response times, error rates
- JVM Metrics: Memory usage, garbage collection, thread pools
- Database Metrics: Connection pool stats, query execution times
- Custom Metrics: Define your own business metrics
Automatic database query tracing with:
- Query Performance: Track database query execution times
- SQL Statements: Capture SQL queries with parameters
- Connection Pooling: Monitor database connection usage
- Slow Query Detection: Identify performance bottlenecks
traceflow/
├── traceflow/                # Core library
│   ├── src/
│   │   └── main/
│   │       └── java/
│   │           └── com/svc/traceflow/
│   └── pom.xml
│
├── traceflow-libs/           # Observability stack
│   ├── docker-compose.yml
│   ├── grafana/
│   ├── loki/
│   ├── tempo/
│   ├── prometheus/
│   └── opentelemetry-javaagent.jar
│
├── inventory-api/ # Example service
├── orders-api/ # Example service
├── payment-api/ # Example service
├── Makefile
└── README.md
- ✅ Distributed Tracing: Track requests across multiple microservices
- ✅ Custom Span Annotation: Easy-to-use `@TraceflowSpan` annotation
- ✅ Multi-Backend Support: Integration with Tempo, Loki, Elasticsearch, and Prometheus
- ✅ Unified Visualization: Grafana dashboards for comprehensive observability
- ✅ JDBC Instrumentation: Automatic database query tracing
- ✅ Log Aggregation: Centralized logging with Loki and Elasticsearch
- ✅ Metrics Collection: Prometheus integration for performance monitoring
- ✅ Correlation: Link traces, logs, and metrics together
- ✅ Production-Ready: Built on widely used open-source components (OpenTelemetry, Grafana, Prometheus, Elasticsearch)
- Meaningful Span Names: Use descriptive span names that indicate the operation being performed

      @TraceflowSpan("inventory-check-availability")
      public boolean checkAvailability(String productId) { ... }

- Strategic Annotation Placement: Add `@TraceflowSpan` to:
  - REST API endpoints (controllers)
  - Service layer methods
  - Database operations
  - External API calls
  - Critical business logic
- Service Naming: Use consistent and descriptive service names across your microservices
  - Format: `{domain}-api` (e.g., `orders-api`, `inventory-api`)
  - Keep names lowercase and hyphen-separated
- Error Handling: Ensure proper exception handling to capture error traces

      @TraceflowSpan("process-payment")
      public PaymentResult processPayment(PaymentRequest request) {
          try {
              // Payment logic
          } catch (PaymentException e) {
              // Exception will be captured in trace
              throw e;
          }
      }

- Performance: Monitor trace overhead and adjust sampling rates if needed
  - For high-traffic services, consider implementing sampling strategies
  - Monitor the impact on application performance
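With the OpenTelemetry Java agent, sampling is normally configured rather than coded, for example by adding `-Dotel.traces.sampler=parentbased_traceidratio -Dotel.traces.sampler.arg=0.1` to keep roughly 10% of traces. To show the idea behind ratio-based sampling, here is a simplified sketch. It is not the agent's actual implementation, and `RatioSamplerSketch` is a hypothetical name. The key property is that the decision depends only on the trace ID, so every service reaches the same verdict for a given trace.

```java
import java.util.Random;

/** Hypothetical, simplified illustration of trace-ID ratio sampling. */
final class RatioSamplerSketch {

    /**
     * Decides whether to keep a trace, using only its 32-character hex trace ID.
     * The low 64 bits of the ID act as a uniformly distributed pseudo-random value.
     */
    static boolean shouldSample(String traceIdHex, double ratio) {
        if (ratio <= 0.0) {
            return false;
        }
        if (ratio >= 1.0) {
            return true;
        }
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long threshold = (long) (ratio * Long.MAX_VALUE);
        // Mask the sign bit so the comparison happens on a non-negative value.
        return (low & Long.MAX_VALUE) < threshold;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int total = 100_000;
        int kept = 0;
        for (int i = 0; i < total; i++) {
            // Random 128-bit trace ID rendered as 32 lowercase hex characters.
            String traceId = String.format("%016x%016x", random.nextLong(), random.nextLong());
            if (shouldSample(traceId, 0.10)) {
                kept++;
            }
        }
        System.out.printf("kept %d of %d traces (target ratio 0.10)%n", kept, total);
    }
}
```

A parent-based sampler additionally honours the sampled flag of an incoming request, so downstream services do not drop spans from a trace that an upstream service decided to keep.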
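- Verify that the OpenTelemetry agent is correctly loaded

      # Check if agent is loaded in logs
      grep "opentelemetry" logs/application.log

- Check that the OTLP endpoint is accessible

      curl http://localhost:4318/v1/traces

  OTLP over HTTP expects POST requests, so a plain GET will typically return an error status such as 405. Any HTTP response, as opposed to a connection failure, indicates the endpoint is reachable.
- Ensure the service name is correctly configured
- Verify network connectivity between services and OTLP collector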
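If grepping the logs is inconclusive, a running JVM can also report whether it was started with a `-javaagent` option. The sketch below is one way to check this from inside the process. `AgentCheck` is a hypothetical class, and matching on the jar name `opentelemetry-javaagent` assumes you kept the file name used in this README. On HotSpot JVMs, options supplied through `JAVA_TOOL_OPTIONS` are generally reported here as well, but treat that as an assumption to verify on your runtime.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: checks the JVM arguments for an attached Java agent. */
final class AgentCheck {

    /** Returns true if any JVM argument is a -javaagent option whose path contains the fragment. */
    static boolean hasJavaAgent(List<String> jvmArgs, String jarNameFragment) {
        return jvmArgs.stream()
                .anyMatch(arg -> arg.startsWith("-javaagent:") && arg.contains(jarNameFragment));
    }

    public static void main(String[] args) {
        // Arguments the JVM itself was started with (not the application's args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("OpenTelemetry agent attached: "
                + hasJavaAgent(jvmArgs, "opentelemetry-javaagent"));
    }
}
```

The same check could be called once at startup of a service and logged, which makes a missing agent visible without inspecting the launch command.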
- Confirm that `@EnableAspectJAutoProxy` is present in your main application class
- Verify that `@TraceflowSpan` annotations are on public methods
- Check that AOP dependencies are correctly configured
- Review application logs for AspectJ warnings
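A common reason for a missing custom span is that the annotated method is never called through the Spring proxy. Spring AOP intercepts only calls that pass through the proxy object, so a method invoked from another method of the same class (self-invocation) is not intercepted. The sketch below reproduces that behaviour with a plain JDK dynamic proxy. It is an analogy, not Traceflow's implementation: `@Span`, `OrderService`, and `traced` are hypothetical stand-ins, and Spring Boot normally uses class-based proxies rather than interface proxies.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo of proxy-based interception and the self-invocation pitfall. */
final class ProxySpanSketch {

    /** Stand-in for an annotation like @TraceflowSpan (not the library's definition). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Span {
        String value();
    }

    interface OrderService {
        @Span("place-order")
        void placeOrder();

        @Span("validate-order")
        void validate();
    }

    static final class OrderServiceImpl implements OrderService {
        @Override
        public void placeOrder() {
            validate(); // self-invocation: goes straight to "this", not through the proxy
        }

        @Override
        public void validate() {
            // validation logic would go here
        }
    }

    /** Wraps the target in a proxy that records the span name of each intercepted call. */
    static OrderService traced(OrderService target, List<String> recorded) {
        InvocationHandler handler = (proxy, method, args) -> {
            Span span = method.getAnnotation(Span.class);
            if (span != null) {
                recorded.add(span.value());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the original exception from the target
            }
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> spans = new ArrayList<>();
        traced(new OrderServiceImpl(), spans).placeOrder();
        System.out.println(spans); // prints [place-order]; validate-order is never recorded
    }
}
```

If a span is missing for this reason, moving the inner method to a separate bean, so the call crosses a proxy boundary, is the usual remedy.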
- Ensure required ports are not already in use

      netstat -tuln | grep -E '3000|3100|3200|4318|5601|9090|9200'

- Check Docker logs:

      docker-compose logs -f

- Verify system resources (memory, disk space)

      docker stats

- Check Docker Compose version compatibility
- Adjust JVM heap size for your services
- Configure trace sampling rates
- Review retention policies for logs and traces
- Monitor Elasticsearch disk usage
- Trace Sampling: Implement sampling for high-traffic services
- Log Levels: Use appropriate log levels (INFO for production)
- Retention Policies: Configure data retention based on requirements
- Resource Allocation: Ensure adequate CPU and memory for observability stack
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
[Specify your license here]
For issues and questions:
- Open an issue on GitHub
- Check the existing documentation
- Review example implementations in the sample services
- Add support for distributed caching traces
- Implement alert rules and notifications
- Add pre-configured Grafana dashboards
- Support for additional message brokers (Kafka, RabbitMQ)
- Enhanced security with authentication and authorization
- Multi-environment support (dev, staging, production)
Happy Tracing! 🚀
Built with ❤️ using OpenTelemetry, Spring Boot, and the Grafana Stack