Proposal for IPV6 datapath support #240

Open · wants to merge 1 commit into base: devel

Conversation

@yboaron commented Dec 3, 2024

No description provided.

@submariner-bot (Collaborator):

🤖 Created branch: z_pr240/yboaron/v6_support

@yboaron force-pushed the v6_support branch 2 times, most recently from 3cb6a15 to 842e3c6 on December 3, 2024 at 15:26
@tpantelis (Contributor) left a comment:

The image links don't work from the rich diff view.

* Note: the gateway should address corner cases related to this change, for example a dual-stack environment where only the V4 publicIP address is successfully resolved.
* Run NAT discovery per IP family in the remote Endpoint.
* Advertise IP details in the local Endpoint based on the cluster networking type; for example, in a dual-stack cluster both the V4 public IP and the V6 public IP should be advertised in the Endpoint.
* Continue advertising a **single** Endpoint; in a dual-stack cluster, fields should hold both the V4 and V6 addresses separately.
Contributor:

I think we should change the appropriate IP fields to arrays, bump the API version, and implement a webhook converter so each version can be served for backwards compatibility.
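
For illustration, a minimal sketch of what array-typed fields could look like in a bumped Endpoint version; the type, field names, and JSON tags here are hypothetical, not the actual Submariner API:

```go
// Hypothetical v2 shape; not the actual Submariner API types.
package v2

// EndpointSpec replaces each single-IP field with an array that holds at
// most one address per IP family, so a dual-stack cluster can advertise
// both its V4 and V6 addresses in a single Endpoint.
type EndpointSpec struct {
	ClusterID      string   `json:"cluster_id"`
	PublicIPs      []string `json:"public_ips,omitempty"` // e.g. ["1.2.3.4", "2001:db8::1"]
	PrivateIPs     []string `json:"private_ips,omitempty"`
	HealthCheckIPs []string `json:"healthcheck_ips,omitempty"`
	Subnets        []string `json:"subnets,omitempty"`
}
```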

Member:

If we are changing any CRDs, we should also consider the impact during upgrades as there will be brownfield deployments.

Contributor:

Yes, a webhook converter would handle that, i.e., convert to and from both versions depending on client requests.

Author:

In that case, I guess we need to be consistent and also change other resources, for example GlobalnetCIDRRange and ClustersetIPCIDRRange in the Broker resource. Right?

Contributor:

Yeah, whatever is needed.

Contributor:

@tpantelis Why not follow the core API example and add a new xxxIPs field which is an array? This makes it easy to handle upgrades and allows us to deprecate the old xxxIP field over time.
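
A sketch of that core-API additive pattern (cf. Pod's `podIP`/`podIPs`), with hypothetical field names:

```go
// Hypothetical sketch of the additive pattern; not the actual Submariner API.
package v1

// EndpointSpec keeps the deprecated singular field for old clients and adds
// a plural field, with the convention that PublicIPs[0] mirrors PublicIP.
type EndpointSpec struct {
	// Deprecated: use PublicIPs. Always equals PublicIPs[0] when set.
	PublicIP  string   `json:"public_ip,omitempty"`
	PublicIPs []string `json:"public_ips,omitempty"`
}

// SyncPublicIPs keeps the two fields consistent, so old clients reading only
// PublicIP and new clients reading PublicIPs see the same V4 address.
func (s *EndpointSpec) SyncPublicIPs() {
	switch {
	case len(s.PublicIPs) > 0:
		s.PublicIP = s.PublicIPs[0]
	case s.PublicIP != "":
		s.PublicIPs = []string{s.PublicIP}
	}
}
```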

@tpantelis (Contributor) commented Dec 16, 2024:

We could do that, but then we'd have to handle both fields in various places in the code going forward. A webhook converter handles all that transparently; webhooks and CRD versioning were designed for cases like this. The K8s core API adds new fields and deprecates old ones because its types aren't CRDs and it's much harder to bump versions there. We can discuss further (next year 🙂)
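
A sketch of the conversions such a webhook would perform. With controller-runtime these would live in `ConvertTo`/`ConvertFrom` methods implementing `conversion.Convertible` (and the real CRD types must also implement `runtime.Object`); plain functions with stand-in types show the idea:

```go
package conversion

// Hypothetical stand-ins for the old and new Endpoint shapes.
type EndpointV1 struct{ PublicIP string }
type EndpointV2 struct{ PublicIPs []string }

// v1ToV2 lifts the single v1 address into the v2 array.
func v1ToV2(in EndpointV1) EndpointV2 {
	var out EndpointV2
	if in.PublicIP != "" {
		out.PublicIPs = []string{in.PublicIP}
	}
	return out
}

// v2ToV1 serves old clients by returning the first (typically V4) address.
// A V6-only v2 Endpoint would surface its V6 address here, which is one of
// the corner cases a converter has to decide on.
func v2ToV1(in EndpointV2) EndpointV1 {
	var out EndpointV1
	if len(in.PublicIPs) > 0 {
		out.PublicIP = in.PublicIPs[0]
	}
	return out
}
```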

submariner/IPV6-datapath.md: resolved review thread (outdated)
submariner/IPV6-datapath.md: resolved review thread
And Pod IPv6 egress packets for the same configuration will be:
![non-ovnk-ipv6-egress](./images/ipv6-non-ovnk-egress-packets.png)

**Note**: In the future, we may optimize this architecture for the dual-stack case.
Member:

If the cluster is IPv4-only or dual-stack, starting with a single V4 VXLAN for both families might be a better option for the initial implementation. Otherwise, you have to consider the impact on upgrades from subsequent changes in the future. Just a suggestion.
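
A sketch of that suggestion, assuming the github.com/vishvananda/netlink package: a single VXLAN device over the IPv4 underlay carries both inner families once it has a VTEP address from each. The interface name, VNI, port, and addresses are all illustrative:

```go
package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// One VXLAN device over the IPv4 underlay; inner IPv6 traffic is still
	// carried fine because VXLAN encapsulates whole L2 frames.
	vxlan := &netlink.Vxlan{
		LinkAttrs: netlink.LinkAttrs{Name: "vx-submariner", MTU: 1400},
		VxlanId:   100,                     // illustrative VNI
		SrcAddr:   net.ParseIP("10.1.2.3"), // local IPv4 underlay address
		Port:      4800,                    // illustrative UDP port
	}
	if err := netlink.LinkAdd(vxlan); err != nil {
		panic(err)
	}

	// Assign a VTEP address from each family so both inner families route
	// over the same device (CIDRs are illustrative).
	for _, cidr := range []string{"240.1.2.3/16", "fd00:240::1/64"} {
		addr, err := netlink.ParseAddr(cidr)
		if err != nil {
			panic(err)
		}
		if err := netlink.AddrAdd(vxlan, addr); err != nil {
			panic(err)
		}
	}

	if err := netlink.LinkSetUp(vxlan); err != nil {
		panic(err)
	}
}
```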

submariner/IPV6-datapath.md: resolved review thread (outdated)
@maayanf24 mentioned this pull request on Dec 5, 2024.
@yboaron force-pushed the v6_support branch 3 times, most recently from 57dea3c to 4552153 on December 15, 2024 at 13:44
submariner/IPV6-datapath.md: two resolved review threads (outdated)
* Lighthouse ignores imported-service IP addresses that don't match the local cluster's networking configuration.
* Lighthouse processes imported services regardless of the local cluster's networking configuration, and relies on the local workloads' DNS requests.
For example, in use case #3 in the table above, the Lighthouse DNS database will store both the V4 and V6 addresses;
however, the local DNS client will query for the V6 (AAAA) record.
Member:

Normally, all clients/apps make DNS requests for both v4 and v6 and try connecting to them in parallel, with a slight preference for v6. Please see the Happy Eyeballs algorithm: https://datatracker.ietf.org/doc/html/rfc6555

@yboaron (Author) commented Dec 23, 2024:

Is this also the case for single-stack clients? I mean, does a V6 client/app try to connect to a V4 address returned from DNS?

Member:

Yes, if the DNS server returns any v4 address, then the client will try to connect to it (which would fail if the server is not listening on it).

Author:

Interesting. If a V6 client tries to connect to an IPv4 address, I assume we should go with the first approach.

Member:

If the local cluster supports only V4 and the remote cluster is V6-only (or dual-stack), then returning a V6 address from Lighthouse may not help, since we do not use any intermediate proxies (or NAT mechanisms) to translate between v4 and v6. So, approach 1 seems reasonable.

Member:

> just double checking, for use case #1, the V4 client may try to connect to the V6 address, right?

An IPv4 client can make DNS queries for both A (IPv4) and AAAA (IPv6) records. If there is a response to the AAAA query, the client can then check whether the network interface/host supports IPv6 and decide accordingly whether to connect over IPv6. OTOH, some client implementations may first check the network interface/host to see if IPv6 is supported and only then send a AAAA query. Therefore, the behavior depends on the implementation of the client.

Member:

It’s quite common to see simultaneous A/AAAA queries, and then connection attempts on whatever comes back first, with a fallback to the other, regardless of what the host claims to support (because there are systems with working local IPv6 but no external IPv6 connectivity, and likewise for IPv4).

Member:

Most modern client applications that support dual-stack environments implement the Happy Eyeballs algorithm: they try to connect to both the V4 and V6 addresses returned from DNS concurrently (or with a small delay before v4) and use whichever connection succeeds first, to improve the user experience. In the Submariner context, since Submariner is aware of both the local and remote cluster networking and has control over the LH DNS server, approach 1 seems more optimal.
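
For reference, Go's standard dialer implements this RFC 6555 fallback out of the box; a minimal sketch (host and port are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// When the resolver returns both A and AAAA records, net.Dialer tries
	// IPv6 first and starts the IPv4 attempt after FallbackDelay (RFC 6555).
	d := net.Dialer{
		Timeout:       5 * time.Second,
		FallbackDelay: 300 * time.Millisecond, // the documented default
	}
	conn, err := d.DialContext(context.Background(), "tcp", "example.com:443")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected via", conn.RemoteAddr()) // v4 or v6, whichever won
}
```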

@yboaron (Author) commented Jan 6, 2025:

Trying to summarize:

So, for use cases similar to #1, the client will not try to connect to the server's V6 address, but it may waste time checking whether the local host/network interface supports IPv6.

Approach #1:
This may not be optimal (I assume the extra time to check the local host/network-interface capabilities should be very short), but things should work in a reasonable time, and it makes for a simpler/uniform Lighthouse implementation.

Approach #2:
Faster connection to the remote server, but a more complex Lighthouse implementation.

cc @vthapar

Member:

I think there is some misunderstanding. I'm personally suggesting that we go with approach 1, not approach 2. We cannot control whether the client checks if the local host/network interface supports v6. However, we do control the LH DNS server and can respond with only the supported IP families (i.e., return the AAAA/v6 record of the remote service only if the local cluster supports v6/dual-stack). Approach 1 would also make the implementation of the LH DNS server simpler.
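
A sketch of approach 1 on the server side; the function name and signature are illustrative, not Lighthouse's actual code:

```go
package main

import (
	"fmt"
	"net"
)

// filterByLocalFamilies keeps only the addresses whose IP family the local
// cluster supports, so the DNS server never hands out an unreachable record.
func filterByLocalFamilies(addrs []string, localV4, localV6 bool) []string {
	var out []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			continue // skip unparseable entries
		}
		if ip.To4() != nil {
			if localV4 {
				out = append(out, a)
			}
		} else if localV6 {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	// A V4-only local cluster importing a dual-stack remote service
	// (use case #1 style): only the A record survives.
	addrs := []string{"10.96.0.10", "fd00::10"}
	fmt.Println(filterByLocalFamilies(addrs, true, false)) // [10.96.0.10]
}
```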


V4 cluster A should successfully join the clusterset.
Dual-stack cluster B should successfully join the clusterset.
V6 cluster C should fail joining the clusterset (because it can't connect to cluster A).
Member:

In the table above, it's mentioned that if we have a V6 cluster and a dual-stack cluster, V6 would be the supported connectivity, whereas this line says that cluster C will be unable to join the clusterset. Is this validation done at the time of joining the cluster to the Broker?

Author:

Yep, the validation should be done when the cluster joins the clusterset.
Cluster C (V6) won't be able to join the clusterset because it can't connect to cluster A (V4).
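
A sketch of that join-time check, with illustrative type and function names: the joining cluster must share at least one IP family with every cluster already in the clusterset.

```go
package main

import "fmt"

// ipFamilies records which address families a cluster's datapath supports.
type ipFamilies struct{ V4, V6 bool }

// checkJoin rejects a cluster that has no common IP family with some
// existing member, e.g. a V6-only cluster C joining a set that contains a
// V4-only cluster A.
func checkJoin(joining ipFamilies, members map[string]ipFamilies) error {
	for name, m := range members {
		if !((joining.V4 && m.V4) || (joining.V6 && m.V6)) {
			return fmt.Errorf("no common IP family with cluster %q", name)
		}
	}
	return nil
}

func main() {
	members := map[string]ipFamilies{
		"A": {V4: true},           // V4-only
		"B": {V4: true, V6: true}, // dual-stack
	}
	fmt.Println(checkJoin(ipFamilies{V6: true}, members)) // cluster C fails
}
```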
