Proposal for IPV6 datapath support #240
base: devel
Conversation
🤖 Created branch: z_pr240/yboaron/v6_support |
Force-pushed from 3cb6a15 to 842e3c6
The image links don't work from the rich diff view.
* Note: the gateway should address corner cases related to this change; for example, in a dual-stack environment only the V4 public IP address may be successfully resolved.
* Run NAT discovery per IP family in the remote Endpoint.
* Advertise IP details in the local Endpoint based on the cluster networking type; for example, in a dual-stack cluster both the V4 and V6 public IPs should be advertised in the Endpoint.
* Continue advertising a **single** Endpoint. In the case of a dual-stack cluster, fields should contain both V4 and V6 addresses, kept separate.
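The "NAT discovery per IP family" bullet above can be sketched as a grouping step over the remote Endpoint's advertised IPs. This is purely illustrative; the function name and data shapes are assumptions, not Submariner code:

```python
import ipaddress

def group_by_family(endpoint_ips):
    """Split a remote Endpoint's advertised IPs by address family so
    NAT discovery can run once per family rather than once per Endpoint."""
    groups = {4: [], 6: []}
    for ip in endpoint_ips:
        groups[ipaddress.ip_address(ip).version].append(ip)
    return groups
```

Each family's group would then feed a separate discovery pass, so a dual-stack Endpoint gets both a V4 and a V6 result.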
I think we should change the appropriate IP fields to arrays, bump the API version, and implement a webhook converter so each version can be served for backwards compatibility.
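For illustration, the data transform such a conversion webhook would perform might look like the following. This is a hedged Python sketch of the transform only, not Submariner's actual API or converter; the field names (`publicIP(s)`, `privateIP(s)`, `healthCheckIP(s)`) and the V4-first preference for old clients are assumptions:

```python
import ipaddress

IP_FIELDS = ("publicIP", "privateIP", "healthCheckIP")

def convert_v1_to_v2(spec_v1: dict) -> dict:
    """Promote each single-IP string field to a one-element array field."""
    spec_v2 = dict(spec_v1)
    for field in IP_FIELDS:
        if field in spec_v2:
            value = spec_v2.pop(field)
            spec_v2[field + "s"] = [value] if value else []
    return spec_v2

def convert_v2_to_v1(spec_v2: dict) -> dict:
    """Demote each array field to a single IP, preferring the V4 entry
    so that pre-dual-stack clients keep working."""
    spec_v1 = dict(spec_v2)
    for field in (f + "s" for f in IP_FIELDS):
        if field in spec_v1:
            ips = spec_v1.pop(field)
            v4 = [ip for ip in ips if ipaddress.ip_address(ip).version == 4]
            spec_v1[field[:-1]] = (v4 or ips or [""])[0]
    return spec_v1
```

A real converter would operate on the full CRD object and be registered via a conversion webhook; this only shows the field mapping.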
If we are changing any CRDs, we should also consider the impact during upgrades as there will be brownfield deployments.
Yes - a webhook converter would handle that, i.e. convert to and from both versions depending on client requests.
In that case, I guess we need to be consistent and also change other resources, for example GlobalnetCIDRRange and ClustersetIPCIDRRange in the Broker resource. Right?
yeah whatever is needed
@tpantelis Why not follow the core API example and add a new field, xxxIPs, which is an array? This makes it easy to handle upgrades and allows us to deprecate the old xxxIP field over time.
We could do that, but then we'd have to handle both fields in various places in code going forward. A webhook converter handles all that transparently. Webhooks and CRD versioning were designed for cases like this. The K8s core API adds new fields and deprecates old ones because its types aren't CRDs and it's much harder to bump versions. We can discuss further (next year 🙂)
And Pod IPv6 egress packets for the same configuration will be:

![non-ovnk-ipv6-egress](./images/ipv6-non-ovnk-egress-packets.png)

**Note**: In the future, we may optimize this architecture for the dual-stack case.
If the cluster is IPv4-only or dual-stack, starting with a single V4 VXLAN for both might be a better option for the initial implementation. Otherwise you have to consider the impact on upgrades for subsequent changes in the future. Just a suggestion.
Force-pushed from 57dea3c to 4552153
* Lighthouse ignores imported service IP addresses that don't match the local cluster networking configuration.
* Lighthouse processes imported services regardless of the local cluster networking configuration, and relies on the DNS requests of local workloads.
  For example, in use case #3 in the table above, the Lighthouse DNS database will store both V4 and V6 addresses; however, the local DNS client will look for the V6 (AAAA) record.
Normally, all the clients/apps make DNS requests for both v4/v6 and try connecting to them in parallel with a slight preference for v6. Please see Happy Eyeballs algo - https://datatracker.ietf.org/doc/html/rfc6555
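The address-ordering step of Happy Eyeballs (RFC 6555, refined in RFC 8305) can be illustrated without any sockets: the resolved addresses are reordered so that families alternate, preferred family first, before connection attempts are staggered by a short delay. A hedged Python sketch of the ordering only, not taken from any particular client stack:

```python
import ipaddress

def interleave_by_family(addrs, prefer_v6=True):
    """Reorder resolved addresses so families alternate, starting with the
    preferred family (IPv6 by default), as a Happy Eyeballs client does
    before staggering its connection attempts."""
    v6 = [a for a in addrs if ipaddress.ip_address(a).version == 6]
    v4 = [a for a in addrs if ipaddress.ip_address(a).version == 4]
    first, second = (v6, v4) if prefer_v6 else (v4, v6)
    out = []
    for i in range(max(len(first), len(second))):
        if i < len(first):
            out.append(first[i])
        if i < len(second):
            out.append(second[i])
    return out
```

A real client would then dial `out[0]`, start `out[1]` after a small fallback delay (RFC 8305 suggests 250 ms), and keep the first connection that succeeds.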
Is this also the case for single-stack clients? I mean, does a V6 client/app try to connect to a V4 address returned from DNS?
Yes, if the DNS server returns any v4 address, then the client will try to connect to it (which would fail if the server is not listening on it).
Interesting. If a V6 client tries to connect to an IPv4 address, I assume we should go with the first approach.
If the local cluster supports only V4 and the remote cluster is V6-only (or dual-stack), then returning the V6 address from Lighthouse may not help, since we do not use any intermediate proxies (or NAT mechanisms) to translate V4 to V6. So, approach 1 seems reasonable.
Just double-checking: for use case #1, the V4 client may try to connect to the V6 address, right?
An IPv4 client can make a DNS query for both A (IPv4) and AAAA (IPv6) records. If there is a response to the AAAA query, the client can then check whether the network-interface/host supports IPv6 and decide accordingly whether to connect over IPv6. OTOH, some client implementations may first check the network-interface/host to see if IPv6 is supported and only then send a AAAA query. Therefore, the behavior depends on the implementation of the client.
It’s quite common to see simultaneous A/AAAA queries, and then connection attempts on whatever comes back first, with a fallback to the other, regardless of what the host claims to support (because there are systems with working local IPv6 but no external IPv6 connectivity, and likewise for IPv4).
Most of the modern client applications that support dual-stack environments implement the Happy eyeballs algorithm that tries to connect to both V4 and V6 addresses (returned from DNS) concurrently (or with a small delay in v4) and use the connection that succeeds first to improve the user-experience. In the Submariner context, since it is aware of both local and remote cluster networking and because it has control over LH DNS Server, approach 1 seems more optimal.
Trying to summarize:

For use cases similar to #1, the client will not try to connect to the server's V6 address, but it may waste time checking whether the local host/network interface supports IPv6.

Approach #1: This may not be optimal (I assume the extra time to check the local host/network-interface capabilities should be very short), but things should work in a reasonable time and it makes for a simpler/uniform Lighthouse implementation.

Approach #2: Faster connection to the remote server, but a more complex Lighthouse implementation.

cc @vthapar
I think there is some misunderstanding. I'm personally suggesting that we go with approach 1 and not approach 2. We cannot control whether the client checks if the local host/network interface supports V6 or not. However, we do control the LH DNS server and can respond with only the supported IP families (i.e., return the AAAA/V6 record of the remote service only if the local cluster supports V6/dual-stack). Approach 1 would also make the implementation of the LH DNS server simpler.
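Approach 1 as described here essentially reduces the DNS server's job to a family filter over the answer set. A minimal Python sketch, with hypothetical record and cluster representations (this is not Lighthouse code):

```python
import ipaddress

def filter_answers(record_ips, local_families):
    """Return only the imported-service IPs whose address family the local
    cluster supports, so the DNS server never hands out an address the
    client's cluster cannot reach (approach 1)."""
    return [ip for ip in record_ips
            if ipaddress.ip_address(ip).version in local_families]
```

For example, a V4-only cluster (`local_families == {4}`) would get only A records back, even if the remote service also has a V6 address.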
Signed-off-by: Yossi Boaron <[email protected]>
V4 cluster A should successfully join the clusterset.
Dual-stack cluster B should successfully join the clusterset.
V6 cluster C should fail to join the clusterset (because it can't connect to cluster A).
In the table above, it's mentioned that if we have a V6 cluster and a dual-stack cluster, V6 would be the supported connectivity, whereas this line says that cluster C will be unable to join the clusterset. Is this validation done at the time of joining the cluster to the Broker?
Yep, the validation should be done when the cluster joins the clusterset.
Cluster C (V6) won't be able to join the clusterset, because it can't connect to cluster A (V4).
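The join-time validation discussed here reduces to checking that the joining cluster shares at least one IP family with every existing member of the clusterset. A minimal Python sketch, with hypothetical data shapes:

```python
def can_join(new_families, existing_clusters):
    """A cluster may join only if it shares at least one IP family with
    every existing member; otherwise some inter-cluster tunnels could
    never be established."""
    return all(new_families & families
               for families in existing_clusters.values())
```

With clusters A (V4) and B (dual-stack) already joined, a V4 or dual-stack candidate passes, while a V6-only candidate is rejected because it cannot connect to A.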