portmap: support hairpin, improve performance
This change improves the performance of the portmap plugin and fixes hairpin, when a container is mapped back to itself. Performance is improved by using a multiport test to reduce rule traversal, and by using a masquerade mark. Hairpin is fixed by enabling masquerading for hairpin traffic.
This commit is contained in:
@ -8,6 +8,8 @@ You should use this plugin as part of a network configuration list. It accepts
|
||||
the following configuration options:
|
||||
|
||||
* `snat` - boolean, default true. If true or omitted, set up the SNAT chains
|
||||
* `markMasqBit` - int, (0-31), default 13. The mark bit to use for masquerading (see section SNAT). Cannot be set when `externalSetMarkChain` is used.
|
||||
* `externalSetMarkChain` - string, default nil. If you already have a Masquerade mark chain (e.g. Kubernetes), specify it here. This will use that instead of creating a separate chain. When this is set, `markMasqBit` must be unspecified.
|
||||
* `conditionsV4`, `conditionsV6` - array of strings. A list of arbitrary `iptables`
|
||||
matches to add to the per-container rule. This may be useful if you wish to
|
||||
exclude specific IPs from port-mapping
|
||||
@ -15,7 +17,7 @@ exclude specific IPs from port-mapping
|
||||
The plugin expects to receive the actual list of port mappings via the
|
||||
`portMappings` [capability argument](https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md)
|
||||
|
||||
So a sample standalone config list (with the file extension .conflist) might
|
||||
A sample standalone config list for Kubernetes (with the file extension .conflist) might
|
||||
look like:
|
||||
|
||||
```json
|
||||
@ -39,21 +41,31 @@ look like:
|
||||
{
|
||||
"type": "portmap",
|
||||
"capabilities": {"portMappings": true},
|
||||
"snat": false,
|
||||
"conditionsV4": ["!", "-d", "192.0.2.0/24"],
|
||||
"conditionsV6": ["!", "-d", "fc00::/7"]
|
||||
"externalSetMarkChain": "KUBE-MARK-MASQ"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
A configuration file with all options set:
|
||||
```json
|
||||
{
|
||||
"type": "portmap",
|
||||
"capabilities": {"portMappings": true},
|
||||
"snat": true,
|
||||
"markMasqBit": 13,
|
||||
"externalSetMarkChain": "CNI-HOSTPORT-SETMARK",
|
||||
"conditionsV4": ["!", "-d", "192.0.2.0/24"],
|
||||
"conditionsV6": ["!", "-d", "fc00::/7"]
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Rule structure
|
||||
The plugin sets up two sequences of chains and rules - one "primary" DNAT
|
||||
sequence to rewrite the destination, and one additional SNAT sequence that
|
||||
rewrites the source address for packets from localhost. The sequence is somewhat
|
||||
complex to minimize the number of rules non-forwarded packets must traverse.
|
||||
will masquerade traffic as needed.
|
||||
|
||||
|
||||
### DNAT
|
||||
@ -68,50 +80,54 @@ rules look like this:
|
||||
- `--dst-type LOCAL -j CNI-HOSTPORT-DNAT`
|
||||
|
||||
`CNI-HOSTPORT-DNAT` chain:
|
||||
- `${ConditionsV4/6} -j CNI-DN-xxxxxx` (where xxxxxx is a function of the ContainerID and network name)
|
||||
- `${ConditionsV4/6} -p tcp --destination-ports 8080,8043 -j CNI-DN-xxxxxx` (where xxxxxx is a function of the ContainerID and network name)
|
||||
|
||||
`CNI-DN-xxxxxx` chain:
|
||||
- `-p tcp --dport 8080 -j DNAT --to-destination 172.16.30.2:80`
|
||||
`CNI-HOSTPORT-SETMARK` chain:
|
||||
- `-j MARK --set-xmark 0x2000/0x2000`
|
||||
|
||||
`CNI-DN-xxxxxx` chain:
|
||||
- `-p tcp -s 172.16.30.2 --dport 8080 -j CNI-HOSTPORT-SETMARK` (masquerade hairpin traffic)
|
||||
- `-p tcp -s 127.0.0.1 --dport 8080 -j CNI-HOSTPORT-SETMARK` (masquerade localhost traffic)
|
||||
- `-p tcp --dport 8080 -j DNAT --to-destination 172.16.30.2:80` (rewrite destination)
|
||||
- `-p tcp -s 172.16.30.2 --dport 8043 -j CNI-HOSTPORT-SETMARK`
|
||||
- `-p tcp -s 127.0.0.1 --dport 8043 -j CNI-HOSTPORT-SETMARK`
|
||||
- `-p tcp --dport 8043 -j DNAT --to-destination 172.16.30.2:443`
|
||||
|
||||
New connections to the host will have to traverse every rule, so large numbers
|
||||
of port forwards may have a performance impact. This won't affect established
|
||||
connections, just the first packet.
|
||||
|
||||
### SNAT
|
||||
The SNAT rule enables port-forwarding from the localhost IP on the host.
|
||||
This rule rewrites (masquerades) the source address for connections from
|
||||
localhost. If this rule did not exist, a connection to `localhost:80` would
|
||||
still have a source IP of 127.0.0.1 when received by the container, so no
|
||||
packets would respond. Again, it is a sequence of 3 chains. Because SNAT has to
|
||||
occur in the `POSTROUTING` chain, the packet has already been through the DNAT
|
||||
chain.
|
||||
### SNAT (Masquerade)
|
||||
Some packets also need to have the source address rewritten:
|
||||
* connections from localhost
|
||||
* Hairpin traffic back to the container.
|
||||
|
||||
In the DNAT chain, a bit is set on the mark for packets that need snat. This
|
||||
chain performs that masquerading. By default, bit 13 is set, but this is
|
||||
configurable. If you are using other tools that also use the iptables mark,
|
||||
you should make sure this doesn't conflict.
|
||||
|
||||
Some container runtimes, most notably Kubernetes, already have a set of rules
|
||||
for masquerading when a specific mark bit is set. If so enabled, the plugin
|
||||
will use that chain instead.
|
||||
|
||||
`POSTROUTING`:
|
||||
- `-s 127.0.0.1 ! -d 127.0.0.1 -j CNI-HOSTPORT-SNAT`
|
||||
- `-j CNI-HOSTPORT-MASQ`
|
||||
|
||||
`CNI-HOSTPORT-SNAT`:
|
||||
- `-j CNI-SN-xxxxx`
|
||||
|
||||
`CNI-SN-xxxxx`:
|
||||
- `-p tcp -s 127.0.0.1 -d 172.16.30.2 --dport 80 -j MASQUERADE`
|
||||
- `-p tcp -s 127.0.0.1 -d 172.16.30.2 --dport 443 -j MASQUERADE`
|
||||
|
||||
Only new connections from the host, where the source address is 127.0.0.1 but
|
||||
not the destination will traverse this chain. It is unlikely that any packets
|
||||
will reach these rules without being SNATted, so the cost should be minimal.
|
||||
`CNI-HOSTPORT-MASQ`:
|
||||
- `--mark 0x2000 -j MASQUERADE`
|
||||
|
||||
Because MASQUERADE happens in POSTROUTING, it means that packets with source ip
|
||||
127.0.0.1 need to pass a routing boundary. By default, that is not allowed
|
||||
in Linux. So, need to enable the sysctl `net.ipv4.conf.IFNAME.route_localnet`,
|
||||
where IFNAME is the name of the host-side interface that routes traffic to the
|
||||
container.
|
||||
127.0.0.1 need to first pass a routing boundary before being masqueraded. By
|
||||
default, that is not allowed in Linux. So, the plugin needs to enable the sysctl
|
||||
`net.ipv4.conf.IFNAME.route_localnet`, where IFNAME is the name of the host-side
|
||||
interface that routes traffic to the container.
|
||||
|
||||
There is no equivalent to `route_localnet` for ipv6, so SNAT does not work
|
||||
for ipv6. If you need port forwarding from localhost, your container must have
|
||||
an ipv4 address.
|
||||
There is no equivalent to `route_localnet` for ipv6, so connections to ::1
|
||||
will not be portmapped for ipv6. If you need port forwarding from localhost,
|
||||
your container must have an ipv4 address.
|
||||
|
||||
|
||||
## Known issues
|
||||
- ipsets could improve efficiency
|
||||
- SNAT does not work with ipv6.
|
||||
- forwarding from localhost does not work with ipv6.
|
||||
|
Reference in New Issue
Block a user