portmap: support hairpin, improve performance

This change improves the performance of the portmap plugin and fixes
hairpin, when a container is mapped back to itself.

Performance is improved by using a multiport test to reduce rule
traversal, and by using a masquerade mark.

Hairpin is fixed by enabling masquerading for hairpin traffic.
This commit is contained in:
Casey Callendrello
2017-11-03 16:53:12 +00:00
parent 7f98c94613
commit 5576f3120e
8 changed files with 513 additions and 296 deletions

View File

@ -8,6 +8,8 @@ You should use this plugin as part of a network configuration list. It accepts
the following configuration options:
* `snat` - boolean, default true. If true or omitted, set up the SNAT chains
* `markMasqBit` - int, (0-31), default 13. The mark bit to use for masquerading (see section SNAT). Cannot be set when `externalSetMarkChain` is used.
* `externalSetMarkChain` - string, default nil. If you already have a Masquerade mark chain (e.g. Kubernetes), specify it here. This will use that instead of creating a separate chain. When this is set, `markMasqBit` must be unspecified.
* `conditionsV4`, `conditionsV6` - array of strings. A list of arbitrary `iptables`
matches to add to the per-container rule. This may be useful if you wish to
exclude specific IPs from port-mapping
@ -15,7 +17,7 @@ exclude specific IPs from port-mapping
The plugin expects to receive the actual list of port mappings via the
`portMappings` [capability argument](https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md)
So a sample standalone config list (with the file extension .conflist) might
A sample standalone config list for Kubernetes (with the file extension .conflist) might
look like:
```json
@ -39,21 +41,31 @@ look like:
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": false,
"conditionsV4": ["!", "-d", "192.0.2.0/24"],
"conditionsV6": ["!", "-d", "fc00::/7"]
"externalSetMarkChain": "KUBE-MARK-MASQ"
}
]
}
```
A configuration file with all options set:
```json
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true,
"markMasqBit": 13,
"externalSetMarkChain": "CNI-HOSTPORT-SETMARK",
"conditionsV4": ["!", "-d", "192.0.2.0/24"],
"conditionsV6": ["!", "-d", "fc00::/7"]
}
```
## Rule structure
The plugin sets up two sequences of chains and rules - one "primary" DNAT
sequence to rewrite the destination, and one additional SNAT sequence that
rewrites the source address for packets from localhost. The sequence is somewhat
complex to minimize the number of rules non-forwarded packets must traverse.
will masquerade traffic as needed.
### DNAT
@ -68,50 +80,54 @@ rules look like this:
- `--dst-type LOCAL -j CNI-HOSTPORT-DNAT`
`CNI-HOSTPORT-DNAT` chain:
- `${ConditionsV4/6} -j CNI-DN-xxxxxx` (where xxxxxx is a function of the ContainerID and network name)
- `${ConditionsV4/6} -p tcp --destination-ports 8080,8043 -j CNI-DN-xxxxxx` (where xxxxxx is a function of the ContainerID and network name)
`CNI-DN-xxxxxx` chain:
- `-p tcp --dport 8080 -j DNAT --to-destination 172.16.30.2:80`
`CNI-HOSTPORT-SETMARK` chain:
- `-j MARK --set-xmark 0x2000/0x2000`
`CNI-DN-xxxxxx` chain:
- `-p tcp -s 172.16.30.2 --dport 8080 -j CNI-HOSTPORT-SETMARK` (masquerade hairpin traffic)
- `-p tcp -s 127.0.0.1 --dport 8080 -j CNI-HOSTPORT-SETMARK` (masquerade localhost traffic)
- `-p tcp --dport 8080 -j DNAT --to-destination 172.16.30.2:80` (rewrite destination)
- `-p tcp -s 172.16.30.2 --dport 8043 -j CNI-HOSTPORT-SETMARK`
- `-p tcp -s 127.0.0.1 --dport 8043 -j CNI-HOSTPORT-SETMARK`
- `-p tcp --dport 8043 -j DNAT --to-destination 172.16.30.2:443`
New connections to the host will have to traverse every rule, so large numbers
of port forwards may have a performance impact. This won't affect established
connections, just the first packet.
### SNAT
The SNAT rule enables port-forwarding from the localhost IP on the host.
This rule rewrites (masquerades) the source address for connections from
localhost. If this rule did not exist, a connection to `localhost:80` would
still have a source IP of 127.0.0.1 when received by the container, so no
packets would respond. Again, it is a sequence of 3 chains. Because SNAT has to
occur in the `POSTROUTING` chain, the packet has already been through the DNAT
chain.
### SNAT (Masquerade)
Some packets also need to have the source address rewritten:
* connections from localhost
* Hairpin traffic back to the container.
In the DNAT chain, a bit is set on the mark for packets that need snat. This
chain performs that masquerading. By default, bit 13 is set, but this is
configurable. If you are using other tools that also use the iptables mark,
you should make sure this doesn't conflict.
Some container runtimes, most notably Kubernetes, already have a set of rules
for masquerading when a specific mark bit is set. If so enabled, the plugin
will use that chain instead.
`POSTROUTING`:
- `-s 127.0.0.1 ! -d 127.0.0.1 -j CNI-HOSTPORT-SNAT`
- `-j CNI-HOSTPORT-MASQ`
`CNI-HOSTPORT-SNAT`:
- `-j CNI-SN-xxxxx`
`CNI-SN-xxxxx`:
- `-p tcp -s 127.0.0.1 -d 172.16.30.2 --dport 80 -j MASQUERADE`
- `-p tcp -s 127.0.0.1 -d 172.16.30.2 --dport 443 -j MASQUERADE`
Only new connections from the host, where the source address is 127.0.0.1 but
not the destination will traverse this chain. It is unlikely that any packets
will reach these rules without being SNATted, so the cost should be minimal.
`CNI-HOSTPORT-MASQ`:
- `--mark 0x2000 -j MASQUERADE`
Because MASQUERADE happens in POSTROUTING, it means that packets with source ip
127.0.0.1 need to pass a routing boundary. By default, that is not allowed
in Linux. So, need to enable the sysctl `net.ipv4.conf.IFNAME.route_localnet`,
where IFNAME is the name of the host-side interface that routes traffic to the
container.
127.0.0.1 need to first pass a routing boundary before being masqueraded. By
default, that is not allowed in Linux. So, the plugin needs to enable the sysctl
`net.ipv4.conf.IFNAME.route_localnet`, where IFNAME is the name of the host-side
interface that routes traffic to the container.
There is no equivalent to `route_localnet` for ipv6, so SNAT does not work
for ipv6. If you need port forwarding from localhost, your container must have
an ipv4 address.
There is no equivalent to `route_localnet` for ipv6, so connections to ::1
will not be portmapped for ipv6. If you need port forwarding from localhost,
your container must have an ipv4 address.
## Known issues
- ipsets could improve efficiency
- SNAT does not work with ipv6.
- forwarding from localhost does not work with ipv6.