miftakhulaziz/traceflow

Traceflow - Distributed Tracing Library for Spring Boot

A comprehensive distributed tracing and observability solution for Spring Boot microservices, integrating with Grafana, Elasticsearch, Kibana, Loki, Tempo, and Prometheus.

Overview

Traceflow is a distributed monitoring library designed to provide end-to-end observability for Java Spring Boot microservices. It consists of two main components:

  • traceflow: A Java Spring library that provides tracing capabilities through custom annotations
  • traceflow-libs: Docker Compose configurations and dependencies for the observability stack (Grafana, Elasticsearch, Kibana, Loki, Tempo, Prometheus)

Architecture

The project uses OpenTelemetry for instrumentation and exports telemetry data to multiple backends:

  • Tempo: Distributed tracing backend
  • Loki: Log aggregation system
  • Elasticsearch: Search and analytics engine
  • Prometheus: Metrics collection and monitoring
  • Grafana: Unified visualization dashboard
  • Kibana: Elasticsearch data visualization

Screenshots

Elasticsearch - Log Analysis and Search

[Screenshot: Elasticsearch Dashboard] View and analyze application logs with powerful search capabilities

Grafana - Metrics Visualization

[Screenshot: Grafana Metrics] Monitor service metrics and performance with Prometheus integration

Grafana - Distributed Tracing with Tempo

[Screenshot: Grafana Tempo Traces] Visualize end-to-end request traces across microservices

Prerequisites

  • Java 17 or higher
  • Maven 3.6+
  • Docker & Docker Compose
  • Spring Boot 3.x

Quick Start

1. Start the Observability Stack

Navigate to the traceflow-libs directory and start all services:

cd traceflow-libs
docker-compose up -d

This starts the observability stack: Grafana, Elasticsearch, Kibana, Loki, Tempo, and Prometheus.

2. Download OpenTelemetry Java Agent

Download the OpenTelemetry Java agent and place it in the traceflow-libs directory:

cd traceflow-libs
curl -L -o opentelemetry-javaagent.jar \
https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

3. Build the Traceflow Library

Build and install the Traceflow library to your local Maven repository:

cd traceflow
mvn clean install

Integration with Your Microservices

Step 1: Add Dependencies

Add the following dependencies to your service's pom.xml:

<!-- Distributed tracing starter -->
<!-- Version omitted so it is inherited from your Spring Boot 3.x dependency management -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>

<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>

<dependency>
    <groupId>com.svc.traceflow</groupId>
    <artifactId>traceflow</artifactId>
    <version>1.0.0</version>
</dependency>
<!-- Distributed tracing end -->

Step 2: Enable AspectJ Auto Proxy

Add the @EnableAspectJAutoProxy annotation to your main Spring Boot application class:

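import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.EnableAspectJAutoProxy;

@SpringBootApplication
@EnableAspectJAutoProxy
public class YourServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(YourServiceApplication.class, args);
    }
}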

Step 3: Add Tracing Annotations

Use the @TraceflowSpan annotation on your controllers and service methods to enable tracing:

@RestController
@RequestMapping("/api")
public class YourController {
    
    @GetMapping("/endpoint")
    @TraceflowSpan("your-custom-span-name")
    public ResponseEntity<String> yourEndpoint() {
        // Your logic here
        return ResponseEntity.ok("Success");
    }
}

@Service
public class YourService {
    
    @TraceflowSpan("service-operation")
    public void performOperation() {
        // Your business logic
    }
}
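
This README does not show Traceflow's internals, so the following is only a sketch of how an annotation like @TraceflowSpan can drive span creation. SketchSpan, InventoryService, and SpanSketch are hypothetical names. Plain reflection stands in for Spring AOP and OpenTelemetry, and the "span" is just a recorded name with a measured duration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class SpanSketch {
    // Names of the spans recorded so far, in completion order.
    static final List<String> recorded = new ArrayList<>();

    // Invokes a public method by name; if it carries @SketchSpan, records a span around the call.
    static Object invokeTraced(Object target, String methodName, Object... args) {
        for (Method m : target.getClass().getMethods()) {
            if (!m.getName().equals(methodName) || m.getParameterCount() != args.length) {
                continue;
            }
            SketchSpan span = m.getAnnotation(SketchSpan.class);
            long start = System.nanoTime();
            try {
                return m.invoke(target, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("call failed: " + methodName, e);
            } finally {
                // Runs on success and on failure, so failed calls are recorded too.
                if (span != null) {
                    long micros = (System.nanoTime() - start) / 1_000;
                    recorded.add(span.value());
                    System.out.println("span=" + span.value() + " durationMicros=" + micros);
                }
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        Object result = invokeTraced(new InventoryService(), "checkAvailability", "sku-42");
        System.out.println("result=" + result);
    }
}

// Hypothetical stand-in for @TraceflowSpan; the real annotation ships with the library.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchSpan {
    String value();
}

class InventoryService {
    @SketchSpan("inventory-check-availability")
    public boolean checkAvailability(String productId) {
        return !productId.isEmpty();
    }
}
```

The annotation needs RUNTIME retention to be visible to reflection or to an aspect. With SOURCE or CLASS retention, `getAnnotation` would return null and no span would be recorded.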

Running Services with OpenTelemetry

Using the Makefile

The project includes a Makefile with pre-configured commands to run services with OpenTelemetry instrumentation:

# Run inventory service
make run-inventory

# Run orders service
make run-orders

# Run payment service
make run-payment

# Clean build artifacts
make clean-inventory
make clean-orders
make clean-payment

# Run tests
make test-inventory
make test-orders
make test-payment

Manual Configuration

To run a service manually, export the following Java options in the same shell, then start the service:

export JAVA_TOOL_OPTIONS="-javaagent:/path/to/opentelemetry-javaagent.jar \
-Dotel.traces.exporter=otlp \
-Dotel.metrics.exporter=otlp \
-Dotel.logs.exporter=otlp \
-Dotel.exporter.otlp.endpoint=http://localhost:4318 \
-Dotel.exporter.otlp.protocol=http/protobuf \
-Dotel.service.name=your-service-name \
-Dotel.instrumentation.jdbc.statement-sanitizer.enabled=false \
-Dotel.instrumentation.jdbc.capture-parameters=true"

./mvnw spring-boot:run

OpenTelemetry Configuration

The Makefile defines the following OpenTelemetry configuration:

  • Agent Location: $(PWD)/traceflow-libs/opentelemetry-javaagent.jar
  • OTLP Endpoint: http://localhost:4318
  • Protocol: http/protobuf
  • Exporters: OTLP for traces, metrics, and logs
  • JDBC Instrumentation: Enabled with parameter capture
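
The settings above correspond to the agent flag and the -D system properties shown under Manual Configuration. As a rough sketch, the option string could be assembled in code, for example by a launcher that starts several services. OtelOptions and its methods are hypothetical helpers, not part of Traceflow or the Makefile.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class OtelOptions {
    // Builds a JAVA_TOOL_OPTIONS value from the agent path and a set of system properties.
    static String build(String agentJar, Map<String, String> props) {
        String flags = props.entrySet().stream()
                .map(e -> "-D" + e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(" "));
        return "-javaagent:" + agentJar + (flags.isEmpty() ? "" : " " + flags);
    }

    // The properties listed in this README, in the same order, for one service.
    static Map<String, String> defaults(String serviceName) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("otel.traces.exporter", "otlp");
        p.put("otel.metrics.exporter", "otlp");
        p.put("otel.logs.exporter", "otlp");
        p.put("otel.exporter.otlp.endpoint", "http://localhost:4318");
        p.put("otel.exporter.otlp.protocol", "http/protobuf");
        p.put("otel.service.name", serviceName);
        p.put("otel.instrumentation.jdbc.statement-sanitizer.enabled", "false");
        p.put("otel.instrumentation.jdbc.capture-parameters", "true");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build("/path/to/opentelemetry-javaagent.jar", defaults("orders-api")));
    }
}
```

The resulting string could be placed in the environment of a `ProcessBuilder` before launching a service, which has the same effect as exporting JAVA_TOOL_OPTIONS in the shell.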

Accessing the Observability Stack

Once your services are running, access the following dashboards:

| Service       | URL                   | Description                      |
|---------------|-----------------------|----------------------------------|
| Grafana       | http://localhost:3000 | Unified visualization dashboard  |
| Kibana        | http://localhost:5601 | Elasticsearch data visualization |
| Prometheus    | http://localhost:9090 | Metrics and monitoring           |
| Elasticsearch | http://localhost:9200 | Search and analytics API         |

Grafana Data Sources

Grafana is pre-configured with the following data sources:

  • Tempo: Distributed tracing visualization
  • Loki: Log aggregation and querying
  • Elasticsearch: Log and event data
  • Prometheus: Metrics and time-series data

Observability Features

1. Distributed Tracing (Tempo + Grafana)

Track request flows across multiple microservices with detailed span information:

  • Service Dependencies: Visualize how services interact with each other
  • Request Timeline: See the complete lifecycle of a request with timing information
  • Span Details: Drill down into individual operations with metadata and tags
  • Error Tracking: Identify failed requests and error patterns

2. Log Aggregation (Loki + Elasticsearch + Kibana)

Centralized logging with powerful search and filtering:

  • Structured Logs: JSON-formatted logs with rich metadata
  • Full-Text Search: Search across all logs using Elasticsearch
  • Log Correlation: Link logs to traces using trace IDs
  • Time-Series Analysis: Analyze log patterns over time
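
Log correlation depends on each log line carrying the trace ID of the request that produced it. OpenTelemetry's default propagation follows W3C Trace Context, where the IDs travel between services in a `traceparent` header of the form `version-traceid-spanid-flags`. The sketch below parses such a header and tags a log message with the IDs. TraceParent is a hypothetical helper, and this README does not state which log format the sample services use, so the `trace_id=` and `span_id=` prefixes are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TraceParent {
    // Version "00", 32 hex digits of trace ID, 16 hex digits of span ID, 2 hex digits of flags.
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String spanId;
    final boolean sampled;

    private TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    // Returns the parsed header, or empty if it is missing or malformed.
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // All-zero IDs are invalid in W3C Trace Context.
        if (isAllZero(m.group(1)) || isAllZero(m.group(2))) {
            return Optional.empty();
        }
        // The lowest flag bit marks the trace as sampled.
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
        return Optional.of(new TraceParent(m.group(1), m.group(2), sampled));
    }

    private static boolean isAllZero(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }

    // Prefixes a log message with the IDs so it can be matched to its trace.
    String tag(String message) {
        return "trace_id=" + traceId + " span_id=" + spanId + " " + message;
    }

    public static void main(String[] args) {
        parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
                .ifPresent(tp -> System.out.println(tp.tag("order accepted")));
    }
}
```

Searching Elasticsearch or Loki for the `trace_id` value then returns the log lines that belong to one trace in Tempo.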

3. Metrics Monitoring (Prometheus + Grafana)

Real-time metrics and performance monitoring:

  • HTTP Metrics: Request rates, response times, error rates
  • JVM Metrics: Memory usage, garbage collection, thread pools
  • Database Metrics: Connection pool stats, query execution times
  • Custom Metrics: Define your own business metrics
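
A custom business metric is a named counter or gauge split into series by labels. In a Spring Boot service it would normally be registered through Micrometer. The sketch below avoids that dependency to show only the name, label, and value model. BusinessCounter and `orders_created_total` are hypothetical names. The output imitates Prometheus's text exposition format for illustration only, since the setup in this README exports metrics over OTLP.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

class BusinessCounter {
    private final String name;
    // One adder per label combination; safe to update from many request threads.
    private final Map<String, LongAdder> series = new ConcurrentHashMap<>();

    BusinessCounter(String name) {
        this.name = name;
    }

    // Increments the series identified by a single label pair.
    // Simplification: label values are not escaped.
    void increment(String labelName, String labelValue) {
        String key = labelName + "=\"" + labelValue + "\"";
        series.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Renders one line per series, sorted by label so the output is stable.
    String render() {
        return new TreeMap<>(series).entrySet().stream()
                .map(e -> name + "{" + e.getKey() + "} " + e.getValue().sum())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        BusinessCounter orders = new BusinessCounter("orders_created_total");
        orders.increment("status", "paid");
        orders.increment("status", "paid");
        orders.increment("status", "failed");
        System.out.println(orders.render());
    }
}
```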

4. JDBC Instrumentation

Automatic database query tracing with:

  • Query Performance: Track database query execution times
  • SQL Statements: Capture SQL queries with parameters
  • Connection Pooling: Monitor database connection usage
  • Slow Query Detection: Identify performance bottlenecks

Project Structure

traceflow/
├── traceflow/                  # Core library
│   ├── src/
│   │   └── main/
│   │       └── java/
│   │           └── com/svc/traceflow/
│   └── pom.xml
│
├── traceflow-libs/             # Observability stack
│   ├── docker-compose.yml
│   ├── grafana/
│   ├── loki/
│   ├── tempo/
│   ├── prometheus/
│   └── opentelemetry-javaagent.jar
│
├── inventory-api/              # Example service
├── orders-api/                 # Example service
├── payment-api/                # Example service
├── Makefile
└── README.md

Features

  • Distributed Tracing: Track requests across multiple microservices
  • Custom Span Annotation: Easy-to-use @TraceflowSpan annotation
  • Multi-Backend Support: Integration with Tempo, Loki, Elasticsearch, and Prometheus
  • Unified Visualization: Grafana dashboards for comprehensive observability
  • JDBC Instrumentation: Automatic database query tracing
  • Log Aggregation: Centralized logging with Loki and Elasticsearch
  • Metrics Collection: Prometheus integration for performance monitoring
  • Correlation: Link traces, logs, and metrics together
  • Established Components: Built on widely used open-source tools (OpenTelemetry, Grafana, Tempo, Loki, Prometheus, Elasticsearch)

Best Practices

  1. Meaningful Span Names: Use descriptive span names that indicate the operation being performed

    @TraceflowSpan("inventory-check-availability")
    public boolean checkAvailability(String productId) { ... }
  2. Strategic Annotation Placement: Add @TraceflowSpan to:

    • REST API endpoints (controllers)
    • Service layer methods
    • Database operations
    • External API calls
    • Critical business logic
  3. Service Naming: Use consistent and descriptive service names across your microservices

    • Format: {domain}-api (e.g., orders-api, inventory-api)
    • Keep names lowercase and hyphen-separated
  4. Error Handling: Ensure proper exception handling to capture error traces

    @TraceflowSpan("process-payment")
    public PaymentResult processPayment(PaymentRequest request) {
        try {
            // Payment logic; paymentGateway is a hypothetical collaborator
            return paymentGateway.charge(request);
        } catch (PaymentException e) {
            // Exception will be captured in trace
            throw e;
        }
    }
  5. Performance: Monitor trace overhead and adjust sampling rates if needed

    • For high-traffic services, consider implementing sampling strategies
    • Monitor the impact on application performance
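
One common sampling strategy keeps a fixed fraction of traces and decides from the trace ID alone, so every service reaches the same verdict for a given request. The OpenTelemetry Java agent exposes ratio-based sampling through the `otel.traces.sampler` and `otel.traces.sampler.arg` properties. The class below is only a sketch of the idea, not the agent's implementation, and RatioSampler is a hypothetical name.

```java
class RatioSampler {
    private final long upperBound;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0");
        }
        // Scale the ratio onto the range of non-negative long values.
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    // Expects a 32-character hex trace ID. The same ID always gives the same answer.
    boolean shouldSample(String traceIdHex) {
        if (upperBound == Long.MAX_VALUE) {
            return true; // ratio 1.0 keeps everything
        }
        // Treat the low 64 bits (last 16 hex digits) as the random part of the ID.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        // Clear the sign bit so the value is non-negative, then compare to the bound.
        return (low & Long.MAX_VALUE) < upperBound;
    }

    public static void main(String[] args) {
        RatioSampler sampler = new RatioSampler(0.1);
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        System.out.println("sampled=" + sampler.shouldSample(traceId));
    }
}
```

Because the decision is a pure function of the trace ID, no coordination between services is needed, provided they all use the same ratio.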

Troubleshooting

Services not appearing in Grafana

  • Verify that the OpenTelemetry agent is correctly loaded
    # Check if agent is loaded in logs
    grep "opentelemetry" logs/application.log
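  • Check that the OTLP endpoint is reachable (OTLP/HTTP accepts POST requests, so an error status in reply to this GET still shows that the port is open)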
    curl http://localhost:4318/v1/traces
  • Ensure the service name is correctly configured
  • Verify network connectivity between services and OTLP collector
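
The connectivity checks above can also be run from the JVM, which helps when a service runs somewhere curl is not available. This is a minimal sketch using a plain TCP connect. EndpointCheck is a hypothetical helper, and the port list is taken from this README. A successful connect only shows that something is listening, not that it speaks OTLP.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class EndpointCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Grafana, OTLP/HTTP, Kibana, Prometheus, Elasticsearch
        int[] ports = {3000, 4318, 5601, 9090, 9200};
        for (int port : ports) {
            System.out.println("localhost:" + port + " reachable=" + isReachable("localhost", port, 500));
        }
    }
}
```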

Missing traces

  • Confirm that @EnableAspectJAutoProxy is present in your main application class
  • Verify that @TraceflowSpan annotations are on public methods
  • Check that AOP dependencies are correctly configured
  • Review application logs for AspectJ warnings
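
A further cause worth ruling out is self-invocation. @EnableAspectJAutoProxy applies aspects through proxies, so a call that a bean makes to one of its own methods never passes through the proxy and is not intercepted. The sketch below reproduces the effect with a JDK dynamic proxy in place of Spring. The names are hypothetical, and the recorded list stands in for created spans.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class SelfInvocationDemo {
    interface OrderService {
        void placeOrder();
        void reserveStock();
    }

    static class OrderServiceImpl implements OrderService {
        public void placeOrder() {
            reserveStock(); // direct call on "this": bypasses the proxy
        }

        public void reserveStock() {
        }
    }

    // Wraps the target in a proxy that records every call arriving from outside.
    static OrderService traced(OrderService target, List<String> seen) {
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                (proxy, method, args) -> {
                    seen.add(method.getName());
                    return method.invoke(target, args);
                });
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        traced(new OrderServiceImpl(), seen).placeOrder();
        System.out.println(seen); // prints [placeOrder]
    }
}
```

Only the outer call is recorded. In a Spring service, the usual remedy is to move the inner method to a separate bean so that the call crosses a proxy boundary.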

Docker containers failing to start

  • Ensure required ports are not already in use
    netstat -tuln | grep -E '3000|4318|9090|9200|5601'
  • Check Docker logs:
    docker-compose logs -f
  • Verify system resources (memory, disk space)
    docker stats
  • Check Docker Compose version compatibility

High memory usage

  • Adjust JVM heap size for your services
  • Configure trace sampling rates
  • Review retention policies for logs and traces
  • Monitor Elasticsearch disk usage

Performance Considerations

  • Trace Sampling: Implement sampling for high-traffic services
  • Log Levels: Use appropriate log levels (INFO for production)
  • Retention Policies: Configure data retention based on requirements
  • Resource Allocation: Ensure adequate CPU and memory for observability stack

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

[Specify your license here]

Support

For issues and questions:

  • Open an issue on GitHub
  • Check the existing documentation
  • Review example implementations in the sample services

Roadmap

  • Add support for distributed caching traces
  • Implement alert rules and notifications
  • Add pre-configured Grafana dashboards
  • Support for additional message brokers (Kafka, RabbitMQ)
  • Enhanced security with authentication and authorization
  • Multi-environment support (dev, staging, production)

Happy Tracing! 🚀

Built with ❤️ using OpenTelemetry, Spring Boot, and the Grafana Stack