Download URLs for opentelemetry artifacts #1993

Open
svrnm opened this issue Mar 5, 2024 · 15 comments
Labels
area/project-infra Non-GitHub project infra (DockerHub, etc.)

Comments

@svrnm
Member

svrnm commented Mar 5, 2024

While many language SDKs are installed via their respective package managers, we have a set of projects that produce artifacts that are downloaded by end-users via GitHub. Some of them are

  • OpenTelemetry Collector (core + contrib)
  • OpenTelemetry Collector Builder (ocb)
  • OpenTelemetry Java Agent
  • OpenTelemetry .NET Autoinstrumentation

Right now those artifacts are served via GitHub and end users need to pull them from URLs like

https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.95.0/ocb_0.95.0_linux_amd64

Those URLs have 2 issues:

  • In docs (and probably some other places) they lead to code blocks that are hard to read or require unnecessary line breaks.
  • We cannot get centralized insights into how often each artifact has been downloaded.

As proposed by @austinlparker and discussed in open-telemetry/opentelemetry.io#4079 we would like to give scarf.sh a try, which can turn the URL above into something like

https://get.opentelemetry.io/ocb_0.95.0_linux_amd64

I raise this community issue, because to do so I would need some support from different SIGs:

  • @open-telemetry/governance-committee & @open-telemetry/technical-committee to take a look if this is a fit for our community (note that Scarf is vetted by LF, see "The Linux Foundation is Partnering With Scarf for OSS Usage Analytics")
  • @open-telemetry/sig-security-maintainers to review scarf and see if there are any security concerns we need to get out of the way (or if there are any blockers)
  • @open-telemetry/collector-maintainers, @open-telemetry/java-instrumentation-maintainers, @open-telemetry/dotnet-instrumentation-maintainers to take a look if they are OK with that for their artifacts

I can and will create issues in SIGs repositories as needed.


Notes:

  • Scarf can also be used for docker images, e.g. fluent is using that already: https://docs.fluentbit.io/manual/installation/docker
  • For the "shorter URLs" we can implement something in the docs repository as well, but this would come without analytics and with a lot more maintenance and setup effort.
@jpkrohling
Member

I was finally able to get to this. All in all, I'm happy with Scarf, but there's one thing I would recommend before adopting it: prepare a plan B. In case Scarf goes down for longer periods of time, we should be ready to switch to this plan B. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if temporarily. This could be something for the SIG Tooling to work on.

Here are some notes for reference:

  • Scarf will issue redirects for file downloads, like the ocb example given by @svrnm
  • Scarf will act as a reverse proxy for container images
  • Scarf claims to respect Do Not Track headers

I believe our configuration on scarf.sh has changed so that the correct URL to download the latest ocb would be:

https://get.opentelemetry.io/0.105.0/linux/amd64/ocb

And it resulted in the following redirect:

< HTTP/2 302 
< date: Thu, 18 Jul 2024 11:52:47 GMT
< location: https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.105.0/ocb_0.105.0_linux_amd64
< strict-transport-security: max-age=15724800; includeSubDomains
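
For the file-download case, the "few lines of Go" fallback could look roughly like the sketch below. It assumes the short path layout `/<version>/<os>/<arch>/ocb` shown above and only covers ocb; the names `ocbTarget` and `redirectHandler` are made up for illustration, and the listen address is passed as a command-line argument.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// releaseBase is the GitHub release download prefix used in the URLs above.
const releaseBase = "https://github.com/open-telemetry/opentelemetry-collector/releases/download/"

// ocbTarget maps a short path such as "/0.105.0/linux/amd64/ocb" to the
// corresponding GitHub release URL. It reports false for any other shape.
func ocbTarget(path string) (string, bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 4 || parts[3] != "ocb" {
		return "", false
	}
	for _, p := range parts[:3] {
		if p == "" {
			return "", false
		}
	}
	version := url.PathEscape(parts[0])
	goos := url.PathEscape(parts[1])
	arch := url.PathEscape(parts[2])
	// The release tag contains slashes, which must be escaped as %2F.
	tag := url.PathEscape("cmd/builder/v" + parts[0])
	return fmt.Sprintf("%s%s/ocb_%s_%s_%s", releaseBase, tag, version, goos, arch), true
}

// redirectHandler answers known paths with a 302 and everything else with a 404.
func redirectHandler(w http.ResponseWriter, r *http.Request) {
	target, ok := ocbTarget(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	http.Redirect(w, r, target, http.StatusFound)
}

func main() {
	// With an address argument (for example ":8080") the redirect server starts.
	if len(os.Args) > 1 {
		log.Fatal(http.ListenAndServe(os.Args[1], http.HandlerFunc(redirectHandler)))
	}
	// Without arguments, just print the mapping for one sample path.
	target, _ := ocbTarget("/0.105.0/linux/amd64/ocb")
	fmt.Println(target)
}
```

This only issues redirects, so it would not be sufficient for the container-image case, where a real proxy may be needed.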

@jpkrohling
Member

jpkrohling commented Jul 18, 2024

And a personal request: if we decide to use it for container images as well, can we use "cr" as the subdomain, instead of docker? Docker is one specific technology (and company), while "cr" is "container registry", as used elsewhere as well.

@austinlparker
Member

Yeah, we could make it whatever. download.opentelemetry.io? packages.opentelemetry.io?

@jpkrohling
Member

jpkrohling commented Jul 18, 2024

I like get.opentelemetry.io for the files, and cr.opentelemetry.io (or containers.opentelemetry.io) for containers, as we might have other packages in the future (npm, for instance).

@svrnm
Member Author

svrnm commented Jul 22, 2024

before adopting it: prepare for a plan B. In case Scarf goes down for longer periods of time, we should be ready to switch to this plan B. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if temporarily.

I thought about that potential plan B for a little bit. Here is a proposal (and I would like @chalin to also take a look): we use the website (specifically Netlify) and write redirects into netlify.toml, e.g.

[[redirects]]
from = "https://get.opentelemetry.io/:version/:os/:arch/ocb"
to = "https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv:version/ocb_:version_:os_:arch"

This provides functionality very similar to Scarf, minus the analytics.

@chalin
Contributor

chalin commented Jul 23, 2024

A few thoughts:

  • If we're going to do that, why not make this plan B our plan A? I'd rather not have to introduce another (analytics++) tool if we can avoid it.
  • Also, does it need to be a subdomain? Why not use, for example, https://opentelemetry.io/download/:version/:os/:arch/ocb
  • Btw, I'd rather that the redirects be programmed via the _redirects file, rather than the Netlify config file.

If y'all agree, then we could incrementally implement this Netlify-based redirects approach, without a need for a fallback plan B. WDYT?

@svrnm
Member Author

svrnm commented Jul 24, 2024

@chalin, good point! I think one reason for having scarf.sh is exactly the analytics part. For me, the short URLs are the main reason to have a solution.

@chalin
Contributor

chalin commented Jul 24, 2024

So you want to switch from GA4 to Scarf.sh for analytics? (If so, maybe we can move that discussion to another thread?) Does anyone have enough experience with the use of Scarf.sh for the purpose of analytics? (I'll ask internally.)

@svrnm
Member Author

svrnm commented Jul 25, 2024

No, this is not about switching from GA4 to scarf.sh. In this particular use case GA4 is not going to track anything, because these download URLs do not result in any HTML being downloaded or any JavaScript being executed.

We could probably use Netlify logs or something similar as an alternative, but if download analytics are important to us, scarf.sh (since it is LF/CNCF "approved") is the easiest thing to do.

@svrnm
Member Author

svrnm commented Aug 12, 2024

Following up on this: Netlify has analytics capabilities based on server-side logs, which would probably provide similar functionality if we go with the redirect option: https://docs.netlify.com/monitor-sites/site-analytics/

@svrnm svrnm changed the title from "Download URLs for opentelemetry artifacts with scarf.sh" to "Download URLs for opentelemetry artifacts" Aug 12, 2024
@jpkrohling
Member

Note that Scarf would also proxy the container images. During my review, I saw that they don't do a simple redirect of the container images, but rather, have a proper proxy in place especially to handle the authentication. That's the reason I suggested a Go application serving as proxy. For the cases where scarf issues a redirect, plain redirects at netlify would certainly work.

@svrnm
Member Author

svrnm commented Aug 13, 2024

Note that Netlify is able to do redirects as well; I used them for the go.opentelemetry.io prototype:

https://docs.netlify.com/routing/redirects/

I was not aware that a proxy is needed for Docker images (I assume there is a reason why they do that). This of course raises the question about required capacity. I could imagine this quickly running into hundreds of GB?

@jpkrohling
Member

This of course raises the question about required capacity

They have a page explaining that, but it's related to how auth works for Docker's registry.

When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.

https://docs.scarf.sh/gateway/#how-it-works

@svrnm
Member Author

svrnm commented Aug 14, 2024

This of course raises the question about required capacity

They have a page explaining that, but it's related to how auth works for Docker's registry.

When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.

This is the one compelling reason for Scarf: they figured that part out and probably also make sure that this works with registries across the board, whereas this would be our own responsibility if we go through Hugo + Netlify.

@trask trask added triage:deciding This issue needs more discussion or consideration. area/project-infra Non-GitHub project infra (DockerHub, etc.) and removed triage:deciding This issue needs more discussion or consideration. labels Aug 27, 2024
@trask trask reopened this Sep 18, 2024
@mx-psi
Member

mx-psi commented Jan 7, 2025

I am in favor of adopting Scarf. I would like to suggest that our plan B for now be to revert to the way things currently work (e.g. for the Collector this is just GitHub/Docker Hub). I personally don't think we need to figure out anything else right now. I also think it's important that we ensure our users can still opt out of Scarf if they want to use pre-built binaries.

6 participants