Skip to content

Go 1.16 + OTLPProto/Collector upgrades#546

Merged
mdisibio merged 13 commits intografana:masterfrom
mdisibio:otel-collector-upgrade
Mar 1, 2021
Merged

Go 1.16 + OTLPProto/Collector upgrades#546
mdisibio merged 13 commits intografana:masterfrom
mdisibio:otel-collector-upgrade

Conversation

@mdisibio
Copy link
Copy Markdown
Contributor

@mdisibio mdisibio commented Feb 22, 2021

What this PR does:
This PR updates several OTEL dependencies and addresses some of the pain points. This is in the same line as another attempt #502 but goes further.

Changes:

  1. Go 1.16
  2. Update opentelemetry-collector from v0.6.1 to v0.16.0.
  3. Update opentelemetry-proto submodule from v0.3.0 to v0.7.0.
  4. Update otel/build-protobuf docker image from 0.1.0 to 0.2.1.
  5. Change the way otel proto is handled: Previously Tempo edited the OpenTelemetryCollector vendored code to mesh with opentelemetry-proto. Now Tempo leaves all the dependencies alone and creates its own derivative types in /pkg/tempopb. Remarshal/unmarshal is used to swap types in shim.go. This is unfortunate for efficiency but a significant win by decoupling things. The reverse swap also occurs in tempo-query.

Additional Changes/Notes:

  1. The goal is that a downstream app (i.e. grafana) could vendor /pkg/tempopb/ to pull in everything needed to communicate with Tempo with no other dependencies, and also importantly: no conflicts with other usages of OTEL proto or collector.
  2. Go 1.16 update includes new go mod tidy -e parameter to continue despite errors. This eliminates another hurdle with dependency updates which was the cmux -> test -> grpc hello world issue as stated in the previous PR. This error still occurs but the use of -e argument allows build process to continue. Note that Tempo still builds and tests successfully because the grpc hello world reference are in cmux tests which Tempo CI does not execute. Note this change also allows cleanup of the vendor-fix/prometheus work-around.
  3. Since there is no longer a link between vendoring and protos, Makefile was simplified to remove vendor-dependencies. Now there are only vendor-check and gen-protos.

Questions / Next Steps:

  1. This PR could be somewhat broken down into smaller PRs if desired.
  2. Testing:
    Local testing was performed to cover most cases, if anything else needed please mention.
    2.1 Unit tests
    2.2. integration tests
    2.3. Jaeger collector (using synthetic-load-generator)
    2.4. Zipkin collector (using curl)
    2.5. OTLP gRPC and HTTP collectors (using my own test app)
    2.6 Grafana/tempo-query lookup
  3. The proto swap in shim.go converts otlp.ExportTraceServiceRequest to tempopb.Trace which are technically wire-compatible. But Tempo does have its own definition of ExportTraceServiceRequest which we could use but needs a few more lines in makefile, etc. It seemed simpler to do it this way.

Which issue(s) this PR fixes:
None but likely introduces a few

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usual smattering of questions and thoughts. Also:

  • does this PR contain native support for OTLP in the distributor? i think we want to go ahead and add that. no idea what the additional load will look like with the bytes marshalling/unmarshalling in the shim.
  • ctrl+shift+f for 1.15. it is referenced in a few places that need to be updated.
  • contributing.md in particular probably needs review b/c it's where instructions related to these changes will be contained.
  • if we do add native support for grpc/http for otlp to this PR we need some documentation that explains why the shims are convenience layers for low volume situations, but if you want a true tracing backend you need a trace pipeline and you should push otlp to tempo only (and not through the shim)

Comment thread modules/distributor/distributor_test.go
Comment thread go.mod
Comment thread Makefile Outdated
Comment thread Makefile Outdated
Comment thread Makefile Outdated
@mdisibio
Copy link
Copy Markdown
Contributor Author

mdisibio commented Mar 1, 2021

  • ctrl+shift+f for 1.15. it is referenced in a few places that need to be updated.
  • contributing.md in particular probably needs review b/c it's where instructions related to these changes will be contained.

Good catch, addressed.

  • does this PR contain native support for OTLP in the distributor? i think we want to go ahead and add that. no idea what the additional load will look like with the bytes marshalling/unmarshalling in the shim.
  • if we do add native support for grpc/http for otlp to this PR we need some documentation that explains why the shims are convenience layers for low volume situations, but if you want a true tracing backend you need a trace pipeline and you should push otlp to tempo only (and not through the shim)

Yes agree, we definitely want to migrate to native support to reclaim the efficiency (and for other reasons), but think it makes to do in a followup PR: this one is already decently large and risk of drift, and benchmarking locally and in internal environment isn't show any performance regression in cpu or mem usage.

@mdisibio mdisibio changed the title WIP - Go/OTLPProto/OTLPCollector upgrades Go 1.16 + OTLPProto/Collector upgrades Mar 1, 2021
@mdisibio mdisibio requested a review from joe-elliott March 1, 2021 18:26
Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best PR since the first one.

i've been waiting months to fix all this. excellent work.

@mdisibio mdisibio merged commit 0448492 into grafana:master Mar 1, 2021
@mdisibio mdisibio deleted the otel-collector-upgrade branch May 27, 2021 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants