-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support hostname waypoints #1224
Support hostname waypoints #1224
Conversation
Skipping CI for Draft Pull Request. |
Based on user feedback. Before/after dns outage ``` 2024-07-29T17:10:59.111431Z warn xds::client:xds{id=14} XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution, retrying in 15s 2024-07-29T17:22:14.958433Z warn xds::client:xds{id=3} XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution (hint: is the DNS server reachable?), retrying in 80ms ``` wrong dns name ``` 2024-07-29T17:22:47.910253Z warn xds::client:xds{id=10} XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s 2024-07-29T17:22:47.910253Z warn xds::client:xds{id=10} XDS client connection error: gRPC connection error connecting to https://istiodx.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s ``` Bad auth (ztunnel) ``` 2024-07-29T17:25:29.137815Z warn xds::client:xds{id=11} XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure", retrying in 15s 2024-07-29T17:35:00.273104Z warn xds::client:xds{id=9} XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure" (hint: check the control plane logs for more information), retrying in 5.12s ```
19d5dc5
to
f851be1
Compare
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems fine. Just a couple of nits from your friendly neighborhood unwrap nag.
What is the intended timeline for rolling this out? 1.23 shipping in pretty much imminent so guessing 1.24 ships with this in ztunnel in preparation for 1.25 in istiod?
What is the plan for knowing this works as intended if actual usability is that far out? It seems like timing wise this could complicate maintaining ztunnel in the short term and might make more sense to merge closer to shipping 1.24 so that any fixes for ztunnel that come up as folks get their hands on 1.23 in the wild won't need to contend with these changes as we try to backport them.
I think the key to this is just testing... which was lacking in the initial PR -- I have built it out a lot more now. WDYT? FWIW I have also been manually testing with Istiod PR, just hard to turn that into automation |
47ff2a3
to
42c73bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I do think we should reconsider carrying the diff if we don't get the cherrypick accepted and we won't have the istiod change in 1.24
In response to a cherrypick label: new pull request created: #1246 |
@ilrudie are you saying we should hold off on this change in 1.23 since related istiod changes aren't certain in 1.24? |
@thedebugger, The opposite, the istiod changes are ready, or nearly so, I think we should cherry-pick this to 1.23 so that we can merge the istiod changes soon and begin full testing. |
Regression from istio#1224 Before that PR (and now the behavior returns), we would use the IP family of the IP stuck in the workload.waypoint or service.waypoint field. This was always IPv4 due to istiod's implementation. With the PR, we started to use the original IP family of the request. Since waypoints had IPv6 addresses, we would use them. Due to another bug in Istiod (fixed in istio/istio#52555), the waypoints didn't actually listen on IPv6, though, so would drop the requests. With this PR, we will continue to send to the IP actually specified. In the near future we will move to hostname references, where we can correctly pick the best IP family based on the request.
Regression from #1224 Before that PR (and now the behavior returns), we would use the IP family of the IP stuck in the workload.waypoint or service.waypoint field. This was always IPv4 due to istiod's implementation. With the PR, we started to use the original IP family of the request. Since waypoints had IPv6 addresses, we would use them. Due to another bug in Istiod (fixed in istio/istio#52555), the waypoints didn't actually listen on IPv6, though, so would drop the requests. With this PR, we will continue to send to the IP actually specified. In the near future we will move to hostname references, where we can correctly pick the best IP family based on the request.
Regression from istio#1224 Before that PR (and now the behavior returns), we would use the IP family of the IP stuck in the workload.waypoint or service.waypoint field. This was always IPv4 due to istiod's implementation. With the PR, we started to use the original IP family of the request. Since waypoints had IPv6 addresses, we would use them. Due to another bug in Istiod (fixed in istio/istio#52555), the waypoints didn't actually listen on IPv6, though, so would drop the requests. With this PR, we will continue to send to the IP actually specified. In the near future we will move to hostname references, where we can correctly pick the best IP family based on the request.
Regression from #1224 Before that PR (and now the behavior returns), we would use the IP family of the IP stuck in the workload.waypoint or service.waypoint field. This was always IPv4 due to istiod's implementation. With the PR, we started to use the original IP family of the request. Since waypoints had IPv6 addresses, we would use them. Due to another bug in Istiod (fixed in istio/istio#52555), the waypoints didn't actually listen on IPv6, though, so would drop the requests. With this PR, we will continue to send to the IP actually specified. In the near future we will move to hostname references, where we can correctly pick the best IP family based on the request. Co-authored-by: John Howard <[email protected]>
* Improve XDS error diagnostics Based on user feedback. Before/after dns outage ``` 2024-07-29T17:10:59.111431Z warn xds::client:xds{id=14} XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution, retrying in 15s 2024-07-29T17:22:14.958433Z warn xds::client:xds{id=3} XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Temporary failure in name resolution (hint: is the DNS server reachable?), retrying in 80ms ``` wrong dns name ``` 2024-07-29T17:22:47.910253Z warn xds::client:xds{id=10} XDS client connection error: gRPC connection error:status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s 2024-07-29T17:22:47.910253Z warn xds::client:xds{id=10} XDS client connection error: gRPC connection error connecting to https://istiodx.istio-system.svc:15012: status: Unknown, message: "client error (Connect)", source: dns error: failed to lookup address information: Name or service not known, retrying in 10.24s ``` Bad auth (ztunnel) ``` 2024-07-29T17:25:29.137815Z warn xds::client:xds{id=11} XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure", retrying in 15s 2024-07-29T17:35:00.273104Z warn xds::client:xds{id=9} XDS client connection error: gRPC connection error connecting to https://istiod.istio-system.svc:15012: status: Unauthenticated, message: "authentication failure" (hint: check the control plane logs for more information), retrying in 5.12s ``` * Support hostname waypoints * add test and drop unwrap
Regression from istio#1224 Before that PR (and now the behavior returns), we would use the IP family of the IP stuck in the workload.waypoint or service.waypoint field. This was always IPv4 due to istiod's implementation. With the PR, we started to use the original IP family of the request. Since waypoints had IPv6 addresses, we would use them. Due to another bug in Istiod (fixed in istio/istio#52555), the waypoints didn't actually listen on IPv6, though, so would drop the requests. With this PR, we will continue to send to the IP actually specified. In the near future we will move to hostname references, where we can correctly pick the best IP family based on the request.
This PR adds support for sending traffic to waypoints identified by hostnames. Currently, this code is not exercised by the control plane, so it should never be hit. However, we need this to be present 1 release before we turn it on for istiod for compatibility purposes. This gives us that capability.
This is preferred long term as the implementation is simpler in both the control plane and ztunnel, and hostname is already the unique key for a service, so its a better lookup key. Additionally, it makes dual stack usage more clear (i.e. we don't have to pick an arbitrary IP to put, we can just reference the hostname).
Related: istio/istio#52318
Istiod change: istio/istio#52294