Remove plaintext NATS #932
This commit enables mTLS between service-discovery-controller and NATS. Only the client auth has to be specified in YAML; the NATS CA gets into service-discovery-controller via the nats-tls BOSH link.
Connections to TLS NATS need to be via hostname, not IP. The NATS TLS cert is for the hostname, not the IP. The lines removed by this PR ensure that components using NATS will get the hostname instead of the IP. However, this is the default behaviour [1]. We already have other components using TLS NATS without this line, so it seems perfectly safe. [1] https://bosh.io/docs/links/#consumers under "ip_addresses"
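The hostname-versus-IP point above can be illustrated with a small, self-contained Go sketch. It is not taken from any of the components in this PR: it generates a throwaway self-signed certificate whose only subject alternative name is a DNS name, then checks it against a hostname and against an IP address using the standard library's `VerifyHostname`. The hostname `nats.service.cf.internal` and the IP `10.0.1.10` are hypothetical values chosen for illustration, and the helper `selfSignedCert` is likewise made up for this example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert is a hypothetical helper: it returns a throwaway
// certificate whose only subject alternative name is the given DNS
// hostname. It deliberately contains no IP SANs.
func selfSignedCert(hostname string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: hostname},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{hostname},
	}
	// Self-signed: the template acts as its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Assumed values for illustration only; not read from any manifest.
	const host = "nats.service.cf.internal"
	const ip = "10.0.1.10"

	cert, err := selfSignedCert(host)
	if err != nil {
		panic(err)
	}
	// A client dialing by hostname passes the name check.
	fmt.Println("verify by hostname:", cert.VerifyHostname(host))
	// A client dialing by IP fails it, because the cert has no IP SAN.
	fmt.Println("verify by IP:", cert.VerifyHostname(ip))
}
```

In this sketch the first check returns a nil error and the second returns a hostname mismatch error, which is the failure a client would hit if a BOSH link handed it an IP address while the server certificate only names the host.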
Until now cf-deployment has deployed a NATS cluster that contained both plaintext and TLS nodes. Internal communication was over mTLS, but clients could choose to connect over plaintext. This commit removes the plaintext nodes. All the software used in cf-deployment has been modified to handle this and use NATS over mTLS. Without the nats job, and without the nats link, everything will use the nats-tls job and link. Any users requiring plaintext NATS could re-enable it by keeping this plaintext NATS job.
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/179038645 The labels on this GitHub issue will be updated when the story is started.
Looks good. I just had one question about the change to the route_emitter configuration.
@davewalter there’s a commit message about that removal
This is now only waiting on #931. Once that is merged, we can merge this.
Thanks to everyone who has been working to get this merged! Just this one PR left :)
@davewalter Unfortunately yes you are best to do two releases to avoid downtime. Embarrassingly I've forgotten what |
It's a container networking component that reconciles internal routes emitted by the route emitter via the NATS message queue. I am happy to cut a release, merge the last PR, and then cut a second fast-follow release.
Hi @davewalter, thanks a lot!
Hi @davewalter, |
I managed to look at the failure in the Windows test:
This is puzzling to me since both the I will keep digging tomorrow. |
The other failures are the clean install with isolation segments and upgrade install with isolated diego cells:
These appear to be more straightforward. The Similarly, the |
I've also fixed the Windows test failure, via 56860d5. The |
CI is finally green, so I cut v17.0.0.
WHAT is this change about?
This commit removes the plaintext NATS nodes. All the software used in cf-deployment has been modified to handle this and use NATS over mTLS.
Without the nats job, and without the nats link, everything standard in cf-deployment will use the nats-tls job and link. Any users with custom components that still require plaintext NATS can re-enable it by keeping the plaintext NATS job.
What customer problem is being addressed?
Until now cf-deployment has deployed a NATS cluster that contained both plaintext and TLS nodes. Internal communication was over mTLS, but clients could choose to connect over plaintext.
Communicating over plaintext is undesirable, as discussed by many users in #906. Previous work on this issue has stopped components relying on plaintext NATS, but we should carry on and remove it entirely.
Please provide any contextual information.
This PR addresses this issue: #929
We have already merged all prerequisite PRs:
Has a cf-deployment including this change passed cf-acceptance-tests?
Does this PR introduce a breaking change? Please take a moment to read through the examples before answering the question.
YES - removes the existing nats job, which could impact operators with custom components using NATS.
All route registrars will also need a change introduced in cloudfoundry/routing-release#214.
How should this change be described in cf-deployment release notes?
As an operator I now know that all NATS traffic is sent over mTLS.
Does this PR introduce a new BOSH release into the base cf-deployment.yml manifest or any ops-files?
No
Does this PR make a change to an experimental or GA'd feature/component?
GA'd feature/component
Please provide Acceptance Criteria for this change?
Deploy this change, run the acceptance tests, and confirm that they pass despite the lack of plaintext NATS.
What is the level of urgency for publishing this change?
Slightly Less than Urgent
Tag your pair, your PM, and/or team!
Collaborating with @ameowlia on this