RFD 0011: IPv6 and multiple IP addresses support in SDC
---
authors: Cody Mello <[email protected]>
state: draft
---

# RFD 11 IPv6 and multiple IP addresses support in SDC

# Introduction

This proposal lays out the work that needs to be done to add support for IPv6 to
SmartOS and SmartDataCenter (SDC). As the Internet grows and more people come
online around the globe, ISPs and online businesses are looking towards a future
where they will need to support IPv6. Major ISPs like AT\&T, Verizon and Comcast
have already started giving customers IPv6 service and IPv6-enabled modems and
routers.

Joyent customers and members of the SmartOS community are interested in using
IPv6 with their VMs, so this will be a useful feature and selling point to have.
(Note that it has been possible for some time to use IPv6 with [vmadm(1M)] but
[it required extra effort](https://digitalelf.net/2014/10/ipv6-the-smartos-way/).)
Parts of the stack have made room for future IPv6 support, but almost everything
that touches networking will require modifications and testing.

This proposal also suggests a change to the relationship between Network
Interface Cards (NICs) and IP addresses in SDC. Instead of always having one
address per NIC, it should be possible for multiple IP addresses to be placed on
a single NIC. We call this interface-centric provisioning. The idea is that when
a machine is being created, it should be possible to request multiple IP
addresses for each NIC (where addresses can be picked by the requester or left
up to the API based on what's available), as long as they are on the same VLAN
or overlay.

This document is approximately sorted in the order in which each step will need
to be implemented. Basic tooling will come first, and then the plumbing through
SDC. Once that is finished and tested, IPv6 support can be exposed through user
interfaces to end users and operators.

# Support in tooling

## vmadm(1M)

[vmadm(1M)] is the SmartOS tool for creating and managing virtual machines. It makes
use of illumos' ZFS and zone management tools to provision instances of the
images managed with [imgadm(1M)]. [vmadm(1M)] will need to be modified to appropriately
call out to [zonecfg(1M)] and set up the zone to use IPv6.

[vmadm(1M)] creates and updates objects based on an input JSON object. The following
fields will need to be modified to accommodate IPv6 and to allow specifying
multiple IP addresses:

- **nic.\*.ips** will accept an array of the following items:
    - An IPv4 or IPv6 address using CIDR notation to indicate the routing prefix
    - One of the strings `dhcp` or `addrconf`. Note that neither `dhcp` nor
      `addrconf` may appear more than once in the list, and that **ips** and
      **allowed\_ips** may contain no more than 32 items combined, since the
      kernel does not allow the `allowed-ips` property of a NIC to contain more
      entries than this
- **nic.\*.gateways** will accept an array of IPv4 and IPv6 addresses
- **nic.\*.network6\_uuid** will be a UUID for an IPv6 network
- **routes** will accept both IPv4 and IPv6 networks and addresses as keys and
  values (address family must match for each key-value pair, of course)
- **resolvers** will accept both IPv4 and IPv6 addresses

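The constraints on **ips** and **routes** listed above could be checked roughly as follows. This is only a sketch in Python using the standard `ipaddress` module, not the validation code in [vmadm(1M)] itself; the function names are hypothetical, and it assumes every route value is a literal address.

```python
import ipaddress

MAX_ALLOWED_IPS = 32  # combined limit on ips + allowed_ips stated above
AUTOCONF_KEYWORDS = {"dhcp", "addrconf"}


def validate_nic_ips(ips, allowed_ips=()):
    """Check a NIC's `ips` list against the rules above; return parsed entries."""
    if len(ips) + len(allowed_ips) > MAX_ALLOWED_IPS:
        raise ValueError("ips and allowed_ips together exceed 32 entries")
    seen_keywords = set()
    parsed = []
    for item in ips:
        if item in AUTOCONF_KEYWORDS:
            if item in seen_keywords:
                raise ValueError(f"{item!r} may appear only once")
            seen_keywords.add(item)
            parsed.append(item)
        elif "/" in item:
            # ip_interface keeps both the address and its prefix length.
            parsed.append(ipaddress.ip_interface(item))
        else:
            raise ValueError(f"{item!r} is missing a CIDR prefix length")
    return parsed


def validate_routes(routes):
    """Require each route's destination and gateway to share an address family."""
    for dest, gateway in routes.items():
        dest_version = ipaddress.ip_network(dest, strict=False).version
        gw_version = ipaddress.ip_address(gateway).version
        if dest_version != gw_version:
            raise ValueError(f"route {dest} -> {gateway} mixes address families")
```

Because `ipaddress` parses both families through the same entry points, a single code path can handle mixed IPv4 and IPv6 input, which matches the intent of not splitting the configuration by family.
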
The fields **nic.\*.ip**, **nic.\*.netmask** and **nic.\*.gateway** will
continue to exist as is, and will be returned by `vmadm get`, but will be
considered deprecated. Note that these new fields allow IPv4 and IPv6 to be
mixed. This is to avoid creating a split in configuration and creating an IPv6
dual for everything, like `ping` vs. `ping6` or `traceroute` vs. `traceroute6`
in other systems.

Much of this work has been finished in [OS-2994].

## fwadm(1M) and fwrule(5)

[fwadm(1M)] is the SmartOS tool for creating and managing firewall rules. Under
the hood, it makes use of [ipfilter(5)], and converts rules written in the
[fwrule(5)] domain-specific language to the language defined in [ipf(4)]. Like
[vmadm(1M)], [fwadm(1M)] takes a JSON object as input when creating or modifying
rules. Since it passes off the firewall rules field to the [fwrule(5)] parser,
IPv6 will only need to be accounted for in the **ips** field of remote VMs, and
the **ip** field of NIC objects.

The parser for the firewall rules is generated by the [Jison parser
generator](http://zaach.github.io/jison/). The grammar will need to be adjusted
to accept and validate IPv6 addresses, and the consumers will need to make sure
that the resulting rules get fed appropriately to [ipf(1M)].

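To illustrate the kind of validation and classification the rule consumers would need, here is a small sketch that sorts rule targets into IPv4 and IPv6 groups. It is written in Python rather than as a Jison grammar, the function name is hypothetical, and it says nothing about how the real parser is structured.

```python
import ipaddress


def group_targets_by_family(targets):
    """Split rule targets (addresses or CIDR subnets) into IPv4 and IPv6 lists."""
    groups = {4: [], 6: []}
    for target in targets:
        # ip_network accepts both a bare address and a subnet; strict=True
        # (the default) rejects subnets with host bits set, e.g. 10.0.0.1/24.
        network = ipaddress.ip_network(target)
        groups[network.version].append(str(network))
    return groups
```

Invalid input raises `ValueError`, so malformed addresses are rejected before any rules would be generated for them.
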
## Zone \& KVM Images and Setup

The startup scripts and programs for zones and KVM images will need to be
updated to make use of their IPv6 networking stack. For SmartOS and LX-branded
zones, much of this work has been done in [OS-4582] and [OS-4741]. KVM instances
that are assigned static IP addresses currently make use of the DHCPv4 server
embedded in QEMU, which is less than ideal. In order to respond appropriately to
the guest, QEMU needs to inspect each packet sent over the network. Instead, it
would be better to have the images check on boot (or more likely when the
system's networking service starts up) for what IP addresses they should be
using, and on what NICs. This data can be fetched via [mdata-get(1M)], and used
to configure the system.

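As a rough illustration of the boot-time step described above, a guest could turn the NIC metadata into a per-interface list of addresses to configure. In this sketch the JSON text would come from [mdata-get(1M)] (for example from a key such as `sdc:nics`); the exact key, the payload shape, and the function name are assumptions made for the example.

```python
import ipaddress
import json


def plan_addresses(nics_json):
    """Map each interface name to the (address, prefix length) pairs to configure."""
    plan = {}
    for nic in json.loads(nics_json):
        addresses = []
        for item in nic.get("ips", []):
            if item in ("dhcp", "addrconf"):
                continue  # left to the guest's DHCP client or SLAAC
            iface = ipaddress.ip_interface(item)
            addresses.append((str(iface.ip), iface.network.prefixlen))
        plan[nic["interface"]] = addresses
    return plan


# Hypothetical payload, shaped like the vmadm `ips` field described earlier.
sample = json.dumps([
    {"interface": "net0", "ips": ["10.0.0.5/24", "2001:db8::5/64"]},
    {"interface": "net1", "ips": ["dhcp"]},
])
print(plan_addresses(sample))
```
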
The logic in the brand scripts that prepare the NICs for zones will also need to
be updated, so that properties like **allowed-ips** and **dhcp-nospoof** will
get set correctly.

# Support in SDC APIs

## Networking API (napi)

The Networking API is responsible for managing data about networks within SDC.
It also takes care of contacting VMAPI and CNAPI for adding, removing and
updating NICs.

NAPI will need to be modified to support the following:

- Accept and validate new IPv6 networks (see [NAPI-308](https://devhub.joyent.com/jira/browse/NAPI-308)); a new field, **address\_family**, will need to be stored in Moray so people will be able to search for specific network types
- Manage and search IPv6 addresses
- Ensure that a NIC only has networks placed on it with the same VLAN tag
- Manage IPv6 network pools, and ensure that they are validated (a network pool
  is either IPv4 or IPv6)

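The checks in the list above can be sketched as follows. This is an illustration in Python rather than NAPI code; the `subnet` and `vlan_id` keys, the `"ipv4"`/`"ipv6"` values for **address\_family**, and the function names are assumptions for the example.

```python
import ipaddress


def address_family(subnet):
    """Return "ipv4" or "ipv6" for a subnet in CIDR notation."""
    return f"ipv{ipaddress.ip_network(subnet).version}"


def check_nic_networks(networks):
    """Require every network placed on one NIC to share a VLAN tag."""
    vlans = {net["vlan_id"] for net in networks}
    if len(vlans) > 1:
        raise ValueError(f"networks on a NIC must share a VLAN tag: {sorted(vlans)}")


def check_pool(networks):
    """Require a network pool to be entirely IPv4 or entirely IPv6."""
    families = {address_family(net["subnet"]) for net in networks}
    if len(families) > 1:
        raise ValueError("a network pool cannot mix IPv4 and IPv6 networks")
```

Note that the two checks are deliberately different: a NIC may carry both an IPv4 and an IPv6 network as long as the VLAN tags agree, while a pool may span VLANs but not address families.
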
## Firewall API (fwapi)

The Firewall API is responsible for creating and managing firewall rules. Since
rules are pushed off to the same library used by [fwadm(1M)], there is not much
that should require updating beyond searching. The following changes for the
ListRules endpoint will need to be made:

- **ip** will need to accept IPv6 addresses
- **subnet** will need to accept IPv6 subnets
- **address\_family** will need to be added, so that it is possible to search
  for rules affecting one of IPv4 or IPv6

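One possible reading of these search parameters is sketched below: a rule matches when at least one of its targets satisfies every filter supplied. The matching semantics, the representation of a rule as a list of target strings, and the function name are all assumptions for illustration; FWAPI's actual search behavior may differ.

```python
import ipaddress


def rule_matches(targets, ip=None, subnet=None, address_family=None):
    """Return True if any of a rule's targets satisfies all supplied filters."""
    for target in targets:
        network = ipaddress.ip_network(target)
        if address_family and f"ipv{network.version}" != address_family:
            continue
        if ip is not None:
            # Membership tests across address families are simply False.
            if ipaddress.ip_address(ip) not in network:
                continue
        if subnet is not None:
            wanted = ipaddress.ip_network(subnet)
            if wanted.version != network.version or not network.overlaps(wanted):
                continue
        return True
    return False
```
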
# Compute Node Agents

The SDC headnode communicates with each of the compute nodes through agents
running in their Global Zone. These agents perform a variety of tasks locally,
ranging from updating the headnode, to making sure that the state on the compute
node matches what the headnode has stored.

The only agent that should need to be updated is the Compute Node Agent
(cn-agent), which manipulates NICs and associated information. Currently, it
assumes that IP addresses are IPv4 and converts them into 32-bit numbers. After
reviewing the source code for the Firewall Agent (firewaller) and for the
Networking Agent (net-agent), it looks like neither one should need to be
updated, since they don't manipulate IP addresses themselves, but instead pass
them off to other services like NAPI.

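To show why the 32-bit assumption breaks down, the sketch below converts addresses of either family to integers. IPv6 addresses need 128 bits, and the same integer can mean different addresses in each family, so the family has to travel with the number. This is illustrative Python with hypothetical function names, not cn-agent's code.

```python
import ipaddress


def ip_to_number(address):
    """Convert an IPv4 or IPv6 address to (version, integer)."""
    parsed = ipaddress.ip_address(address)
    return parsed.version, int(parsed)


def number_to_ip(version, number):
    """Inverse of ip_to_number; the version picks the address width."""
    cls = ipaddress.IPv4Address if version == 4 else ipaddress.IPv6Address
    return str(cls(number))
```

For example, the integer 1 is `0.0.0.1` as IPv4 but `::1` as IPv6, which is why the sketch returns the version alongside the number.
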
# Overlay Networks

## Fabrics

Work on adding IPv6 support to fabrics will occur during a second phase once
standard zones and networks are working. Once support is added, we will assign
users /64 subnets located within the fd00::/8 private network. RFC 4193
recommends randomizing allocations within this space. We should probably give
the customer the option of either picking the prefix or having it randomized.

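A minimal sketch of such an allocation follows. It draws the 40-bit global ID from a cryptographic random source instead of using the sample hash-based algorithm given in RFC 4193, and the function names are hypothetical. The resulting /48 is then divided into /64 subnets using a 16-bit subnet ID.

```python
import ipaddress
import secrets

ULA_BASE = int(ipaddress.IPv6Address("fd00::"))


def ula_prefix(global_id=None):
    """Return a /48 inside fd00::/8 built from a 40-bit global ID."""
    if global_id is None:
        global_id = secrets.randbits(40)
    if not 0 <= global_id < 2 ** 40:
        raise ValueError("global ID must fit in 40 bits")
    # Layout: 8 bits of fd, the 40-bit global ID, then 80 bits for the site.
    return ipaddress.IPv6Network((ULA_BASE | (global_id << 80), 48))


def fabric_subnet(prefix, subnet_id):
    """Return the /64 selected by a 16-bit subnet ID within a /48 prefix."""
    if not 0 <= subnet_id < 2 ** 16:
        raise ValueError("subnet ID must fit in 16 bits")
    base = int(prefix.network_address) | (subnet_id << 64)
    return ipaddress.IPv6Network((base, 64))
```

Passing an explicit `global_id` covers the case where the customer picks the prefix; omitting it covers the randomized case.
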
Since customers will most likely end up wanting private IP addresses that can
access the rest of the internet, we may need to explore implementing IPv6
support in [ipnat(1M)], and possibly 6to4 options. These will require further
evaluation in the future to determine whether they are worth implementing or
should be left up to the network operator. Protocols like NAT64 require a lot of
configuration as well as a cooperating DNS64 server, which may not be worth
investing resources in.

# Operator- and User-facing support

## Operations Portal (adminui)

The Operations Portal is the web interface for managing SDC and provisioning new
compute nodes and virtual machines. There are several things that will need to
be updated here:

- When managing NICs on a VM, the interface will need to allow for assigning
  multiple IP addresses to the NIC
- Tests for validating input IP addresses and query parameters will need to
  accommodate IPv6
- The interface for creating new networks should make it clear that IPv6 can be
  used by giving example input
- When creating new network pools, once the address family is decided, only
  networks of the same type should be suggested

## CloudAPI

CloudAPI will need to be extended to allow provisioning with IPv6 addresses, and
to also accept multiple addresses per NIC. Currently, it accepts the fields
**ipv4\_uuid** and **ipv4\_count**, but they cannot be used to assign multiple
IP addresses. We will want it to support the following fields and allow them to
be used for assigning multiple IP addresses:

- **ipv4\_uuid** is the UUID of the IPv4 network to use (VLAN/VXLAN ID must match IPv6 network)
- **ipv4\_count** specifies how many IPv4 addresses should be selected from the pool and assigned to this NIC; it is currently restricted to 1
- **ipv4\_ips** is an array of IPv4 addresses that should be assigned to this NIC
- **ipv6\_uuid** is the UUID of the IPv6 network to use (VLAN/VXLAN ID must match IPv4 network)
- **ipv6\_count** specifies how many IPv6 addresses should be selected from the pool and assigned to this NIC
- **ipv6\_ips** is an array of IPv6 addresses that should be assigned to this NIC

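The shape of a NIC request using these fields could be validated along the following lines. This is a sketch only: the function name is hypothetical, and the rule that addresses can only be requested when the matching network UUID is present is an assumption rather than something this proposal specifies.

```python
import ipaddress


def check_nic_request(nic):
    """Validate the per-family address fields of one NIC in a provisioning request."""
    for version in (4, 6):
        prefix = f"ipv{version}_"
        count = nic.get(prefix + "count", 0)
        ips = nic.get(prefix + "ips", [])
        if not isinstance(count, int) or count < 0:
            raise ValueError(f"{prefix}count must be a non-negative integer")
        # Assumption: addresses can only be requested from a named network.
        if (count or ips) and not nic.get(prefix + "uuid"):
            raise ValueError(f"{prefix}uuid is required when requesting addresses")
        for ip in ips:
            if ipaddress.ip_address(ip).version != version:
                raise ValueError(f"{ip} does not belong in {prefix}ips")
```

Checking that the two networks share a VLAN/VXLAN ID would need a lookup against NAPI and is left out of the sketch.
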
We began moving towards this schema and mindset in [ZAPI-598]. With the work
laid out in this proposal, we will finish it up.

CloudAPI will also need to be extended to allow for managing firewall rules that
apply to IPv6 networks and addresses.

## Docker

Docker and our APIs for supporting it will need to gain the appropriate support
and plumbing. The Docker Inspect API call only allows a single IP address to be
returned. We will either need to pick a single representative IP address, or the
API will need to be improved to allow returning multiple IPv4 or IPv6 addresses.

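If we go with a representative address, one possible policy is to report the first address of a preferred family and fall back to whatever comes first otherwise. The policy, the default preference for IPv4, and the function name below are assumptions for illustration, not a decision made by this proposal.

```python
import ipaddress


def representative_ip(addresses, prefer=4):
    """Pick one address to report: first of the preferred family, else the first."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    for addr in parsed:
        if addr.version == prefer:
            return str(addr)
    return str(parsed[0]) if parsed else None
```
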
# IPv6 on the admin network

Making it possible for SDC to have an IPv6 admin network would be a nice feature
to offer, but it is not essential. Since the admin network is usually a
non-routable private network, there will probably never be a real need for it to
support IPv6. As a result, some of these features may be put off for a while.
They are enumerated here though as a point of reference. Note that as of
[OS-4802], SmartOS hosts can use IPv6 from the global zone, but this cannot be
used within SDC.

The simplest path to assigning IPv6 addresses to nodes on the admin network
would be to run [in.ndpd(1M)] alongside Booter, and send out Router
Advertisements with the autonomous bit set, so that everyone else performs
SLAAC. If additional information is to be sent though, or if more control over
assigning addresses is needed, then it may be better to use DHCPv6.

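For reference, the classic way a host forms a SLAAC address is to append a modified EUI-64 interface identifier, derived from its MAC address, to the advertised /64 prefix. The sketch below shows that derivation; the function name is hypothetical, and hosts may instead be configured to use other identifier schemes, so the result should not be taken as what every node will choose.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with a modified EUI-64 interface ID derived from a MAC."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 identifiers needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network.network_address + int.from_bytes(eui64, "big")
```

Since the address is a pure function of the prefix and the MAC, the headnode could predict a compute node's address from its known MAC under this scheme, which is part of what makes SLAAC attractive for the admin network.
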
## Binder

Binder is the DNS server used within SDC for locating admin services and compute
nodes. Currently, it only serves up IPv4 A records. Before SDC can be run on an
IPv6 admin network, Binder will need to gain support for serving IPv6 AAAA
records, so that various programs can continue to look up services on the admin
network via DNS.

## Booter

Booter is the DHCP and TFTP server used in SDC for assigning compute nodes IP
addresses and PXE booting them. In order for IPv6 to be used on the admin
network, Booter will need to gain support for DHCPv6 so that compute nodes can
get an IPv6 address and be sent the appropriate options and information to know
how to properly boot.

<!-- Manual page links -->
[in.ndpd(1M)]: https://smartos.org/man/1M/in.ndpd
[ipf(1M)]: https://smartos.org/man/1M/ipf
[ipnat(1M)]: https://smartos.org/man/1M/ipnat
[fwadm(1M)]: https://smartos.org/man/1M/fwadm
[imgadm(1M)]: https://smartos.org/man/1M/imgadm
[mdata-get(1M)]: https://smartos.org/man/1M/mdata-get
[vmadm(1M)]: https://smartos.org/man/1M/vmadm
[zonecfg(1M)]: https://smartos.org/man/1M/zonecfg
[ipf(4)]: https://smartos.org/man/4/ipf
[fwrule(5)]: https://smartos.org/man/5/fwrule
[ipfilter(5)]: https://smartos.org/man/5/ipfilter

<!-- Issue links -->
[OS-2994]: https://smartos.org/bugview/OS-2994
[OS-4582]: https://smartos.org/bugview/OS-4582
[OS-4741]: https://smartos.org/bugview/OS-4741
[OS-4802]: https://smartos.org/bugview/OS-4802
[ZAPI-598]: https://smartos.org/bugview/ZAPI-598