This PR brings V2 API support to the win-overlay CNI. With the current V1
API, only the Docker runtime works with win-overlay. With these changes, we
should be able to use containerd as the runtime. Below are the key
points of this implementation:
1. Clear separation of V1 and V2 API support
2. New cni.conf sample that works for win-overlay
Signed-off-by: selansen <esiva@redhat.com>
Signed-off-by: mansikulkarni96 <mankulka@redhat.com>
Calling AddPort before AddProtocol returns an error, which means ConntrackDeleteFilter has been called without a port filter.
Signed-off-by: Sang Heon Lee <developistBV@gmail.com>
This commit adds a new parameter `ingressPolicy` (`string`) to the `firewall` plugin.
The supported values are `open` and `same-bridge`.
- `open` is the default and is a no-op.
- `same-bridge` creates "CNI-ISOLATION-STAGE-1" and "CNI-ISOLATION-STAGE-2"
that are similar to Docker libnetwork's "DOCKER-ISOLATION-STAGE-1" and
"DOCKER-ISOLATION-STAGE-2" rules.
e.g., when `ns1` and `ns2` are connected to bridge `cni1`, and `ns3` is
connected to bridge `cni2`, the `same-bridge` ingress policy disallows
communications between `ns1` and `ns3`, while allowing communications
between `ns1` and `ns2`.
Please refer to the comment lines in `ingresspolicy.go` for the actual iptables rules.
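A minimal sketch of how the two supported values and the `open` default could be validated. The `firewallConf` struct and `parseIngressPolicy` helper are hypothetical names used for illustration only, not the plugin's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// firewallConf is a hypothetical subset of the firewall plugin configuration.
type firewallConf struct {
	Type          string `json:"type"`
	Backend       string `json:"backend,omitempty"`
	IngressPolicy string `json:"ingressPolicy,omitempty"`
}

// parseIngressPolicy returns the effective ingress policy for a config.
// An absent value falls back to "open"; unknown values are rejected.
func parseIngressPolicy(data []byte) (string, error) {
	var conf firewallConf
	if err := json.Unmarshal(data, &conf); err != nil {
		return "", fmt.Errorf("failed to parse config: %w", err)
	}
	switch conf.IngressPolicy {
	case "":
		return "open", nil
	case "open", "same-bridge":
		return conf.IngressPolicy, nil
	default:
		return "", fmt.Errorf("unknown ingress policy %q", conf.IngressPolicy)
	}
}

func main() {
	policy, err := parseIngressPolicy([]byte(`{"type": "firewall", "ingressPolicy": "same-bridge"}`))
	fmt.Println(policy, err)
}
```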
The `same-bridge` ingress policy is expected to be used in conjunction
with the `bridge` plugin. It may not work as expected with other "main" plugins.
It should also be noted that the `same-bridge` ingress policy executes
raw `iptables` commands directly, even when the `backend` is set to `firewalld`.
We could potentially use the "direct" API of firewalld [1] to execute
iptables via firewalld, but it doesn't seem to have a clear benefit over just directly
executing raw iptables commands.
(In any case, we have already been executing raw iptables commands in the `portmap` plugin.)
[1] https://firewalld.org/documentation/direct/options.html
This commit replaces the `isolation` plugin proposal (issue 573, PR 574).
The design of `ingressPolicy` was discussed in the comments of the withdrawn PR 574,
but `same-network` was renamed to `same-bridge` then.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Use the constants already defined in the golang.org/x/sys/unix package
instead of open-coding them.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
The current code accidentally ignores partial reads, since it doesn't
check the return value of (io.Reader).Read.
What we actually want is io.ReadFull(rand.Reader, buf), which is
conveniently provided by rand.Read(buf).
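A short sketch of the two equivalent forms. The function names are illustrative and not taken from the patched code:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"io"
)

// randomBytes fills an n-byte buffer using rand.Read, which either fills
// the whole buffer or returns an error.
func randomBytes(n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// randomBytesExplicit spells out the same behaviour with io.ReadFull,
// which keeps reading until the buffer is full or an error occurs.
func randomBytesExplicit(n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(rand.Reader, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	b, err := randomBytes(16)
	fmt.Println(len(b), err)
}
```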
Signed-off-by: edef <edef@edef.eu>
The new macspoofchk field is added to the bridge plugin to support
anti-mac-spoofing.
When the parameter is enabled, traffic is limited to the mac addresses
of the container interface (the veth peer that is placed in the
container ns).
Any traffic that exits the pod is checked against the source mac address
that is expected. If the mac address is different, the frames are
dropped.
The implementation uses nftables and should only be used on nodes
that support it.
Signed-off-by: Edward Haas <edwardh@redhat.com>
Instead of moving the host side of the veth peer into the host
network namespace later, just create it in the host namespace
directly.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Controlling the mac address of the interface (veth peer) in the
container is useful for functionalities that depend on the mac address.
Examples range from dynamic IP allocations based on an identifier (the
mac) and up to firewall rules (e.g. no-mac-spoofing).
Enforcing a mac address at an early stage, rather than through a chained
plugin, ensures that the interface never has an incorrect intermediate
configuration. This is especially critical because a dynamic IP may already
be provided during this period.
It also has implications for future abilities that may land in the
bridge plugin, e.g. supporting no-mac-spoofing.
The field name used (`mac`) fits with other plugins which control the
mac address of the container interface.
The mac address may be specified through the following methods:
- CNI_ARGS
- Args
- RuntimeConfig [1]
The list is ordered by priority, from lowest to highest. The higher
priority method overrides any previous settings.
(e.g. if the mac is specified in RuntimeConfig, it will override any
specifications of the mac mentioned in CNI_ARGS or Args)
[1] To use RuntimeConfig, the network configuration should include the
`capabilities` field with `mac` specified (`"capabilities": {"mac": true}`).
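The override order can be sketched as follows. `resolveMAC` is a hypothetical helper, and how each source is parsed is left out:

```go
package main

import "fmt"

// resolveMAC returns the effective mac address given the values found in
// CNI_ARGS, Args and RuntimeConfig (empty string means "not specified").
// Sources are visited from lowest to highest priority, so a later
// non-empty value overrides an earlier one.
func resolveMAC(cniArgsMAC, argsMAC, runtimeConfigMAC string) string {
	mac := ""
	for _, candidate := range []string{cniArgsMAC, argsMAC, runtimeConfigMAC} {
		if candidate != "" {
			mac = candidate
		}
	}
	return mac
}

func main() {
	// RuntimeConfig wins over CNI_ARGS when both are set.
	fmt.Println(resolveMAC("c2:11:22:33:44:55", "", "c2:11:22:33:44:66"))
}
```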
Signed-off-by: Edward Haas <edwardh@redhat.com>
- support v2 api
- unify v1 and v2 api
BREAKING CHANGE:
- remove `HcnPolicyArgs` field
- merge `HcnPolicyArgs` into `Policies` field
Signed-off-by: thxcode <thxcode0824@gmail.com>
A dot is a valid character in interface names and is often used in the
names of VLAN interfaces. The sysctl net.ipv6.conf.<ifname>.disable_ipv6
key path cannot use dots both in the ifname and as path separator.
We switch to using / as key path separator so dots are allowed in the
ifname.
This works because sysctl.Sysctl() accepts key paths with either dots
or slashes as separators.
Also, print an error message to stderr if the sysctl cannot be read,
instead of silently hiding the error.
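A sketch of the slash-separated key and the stderr reporting. As an assumption for illustration, it reads the value straight from /proc/sys instead of going through sysctl.Sysctl(), and the helper names are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// disableIPv6Key builds the key with "/" as separator, so a dot inside
// the interface name (e.g. a VLAN interface) is not taken as a separator.
func disableIPv6Key(ifName string) string {
	return path.Join("net/ipv6/conf", ifName, "disable_ipv6")
}

// readDisableIPv6 reads the value from /proc/sys (assumption: direct file
// access stands in for the sysctl helper used by the plugin).
func readDisableIPv6(ifName string) (string, error) {
	data, err := os.ReadFile(path.Join("/proc/sys", disableIPv6Key(ifName)))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

func main() {
	value, err := readDisableIPv6("eth0.100")
	if err != nil {
		// Report the failure instead of silently ignoring it.
		fmt.Fprintf(os.Stderr, "failed to read sysctl: %v\n", err)
		return
	}
	fmt.Println(value)
}
```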
Signed-off-by: David Verbeiren <david.verbeiren@tessares.net>
conntrack does not have any way to track UDP connections, so
it relies on timers to delete a connection.
The problem is that UDP is connectionless, so a client will keep
sending traffic even after the server has gone, thus renewing the
conntrack entries.
Pods that use portmaps to expose UDP services need to flush the existing
conntrack entries on the exposed port when they are created;
otherwise conntrack will keep sending the traffic to the previous IP
until the connection ages out (i.e. until the client stops sending traffic).
Signed-off-by: Antonio Ojea <aojea@redhat.com>
The behaviour of nc depends on which implementation and version is installed on the current host.
Here we use our own client, which has stable behaviour.
Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
In GetCurrentNS, if there is a context switch between
getCurrentThreadNetNSPath and GetNS, another goroutine may execute in
the original thread and change its network namespace, then the original
goroutine would get the updated network namespace, which could lead to
unexpected behavior, especially when GetCurrentNS is used to get the
host network namespace in netNS.Do.
The added test has a chance to reproduce it with "-count=50".
The patch fixes it by locking the thread in GetCurrentNS.
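A Linux-only sketch of the idea, not the actual patch. The helper names are hypothetical, and the /proc path layout is an assumption about how the per-thread namespace is located:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// threadNetNSPath returns the /proc path of the network namespace of the
// OS thread that is currently running this goroutine.
func threadNetNSPath() string {
	return fmt.Sprintf("/proc/%d/task/%d/ns/net", os.Getpid(), syscall.Gettid())
}

// currentThreadNetNS opens the calling thread's network namespace.
// The goroutine is pinned to its OS thread for the duration of the call,
// so no other goroutine can run on that thread and change its namespace
// between computing the path and opening it.
func currentThreadNetNS() (*os.File, error) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	return os.Open(threadNetNSPath())
}

func main() {
	f, err := currentThreadNetNS()
	if err != nil {
		fmt.Fprintf(os.Stderr, "cannot open netns: %v\n", err)
		return
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}
```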
Signed-off-by: Quan Tian <qtian@vmware.com>
If the plugin receives portMappings in runtimeConfig, it adds a NAT policy for each port mapping on the generated endpoints.
It enables HostPort usage on Windows with win-bridge.
Signed-off-by: Vincent Boulineau <vincent.boulineau@datadoghq.com>
The current ns package code is very careful about not leaving the calling
thread with the overridden namespace set, for example when origns.Set() fails.
This is achieved by starting a new green thread, locking its OS thread, and
never unlocking it, which makes the Go runtime scrap the OS thread backing
the green thread after the goroutine exits.
While this works, it's probably not optimal: stopping and starting a new OS
thread is expensive and may be avoided if we unlock the thread after resetting
network namespace to the original. On the other hand, if resetting fails, it's
better to leave the thread locked and die.
While it won't work in all cases, we can still make an attempt to reuse the OS
thread when resetting the namespace succeeds. This can be achieved by unlocking
the thread only if the namespace reset succeeds.
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Sysctl names can use dots or slashes as separators:
- if dots are used, dots and slashes are interchanged.
- if slashes are used, slashes and dots are left intact.
The separator in use is determined by whichever of the two occurs first.
Reference: http://man7.org/linux/man-pages/man5/sysctl.d.5.html
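A minimal sketch of the rule above. `normalizeSysctlName` is an illustrative name, and this is not necessarily how the commit implements it:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSysctlName converts a sysctl name to its slash-separated form.
// The first '.' or '/' found decides which separator the name uses.
func normalizeSysctlName(name string) string {
	i := strings.IndexAny(name, "./")
	if i >= 0 && name[i] == '.' {
		// Dot-separated: swap dots and slashes in a single pass.
		return strings.NewReplacer(".", "/", "/", ".").Replace(name)
	}
	// Slash-separated (or no separator at all): leave the name intact.
	return name
}

func main() {
	fmt.Println(normalizeSysctlName("net.ipv4.conf.enp3s0/200.forwarding"))
	fmt.Println(normalizeSysctlName("net/ipv4/conf/enp3s0.200/forwarding"))
}
```

Both calls in `main` refer to the same key for the interface `enp3s0.200`.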
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Add the following idempotent functions to iptables utils:
DeleteRule: idempotently delete an iptables rule
DeleteChain: idempotently delete an iptables chain
ClearChain: idempotently flush an iptables chain
Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Concurrent use of the `portmap` and `firewall` plugins can result in
errors during iptables chain creation:
- The `portmap` plugin has a time-of-check-time-of-use race where it
checks for existence of the chain but the operation isn't atomic.
- The `firewall` plugin doesn't check for existing chains and just
returns an error.
This commit makes both operations idempotent by creating the chain and
then discarding the error if it's caused by the chain already
existing. It also factors the chain creation out into `pkg/utils` as a
site for future refactoring work.
Signed-off-by: Tim Gross <tim@0x74696d.com>
When running in a user namespace created by an unprivileged user, the
owner of /var/run will be reported as the unknown user (as defined in
/proc/sys/kernel/overflowuid) so any access to the directory will
fail.
If the XDG_RUNTIME_DIR environment variable is set, check whether the
current user is also the owner of /var/run. If the owner is different
than the current user, use the $XDG_RUNTIME_DIR/netns directory.
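The selection logic can be sketched as a pure function. `netnsRunDir` is a hypothetical helper, the `/var/run/netns` default is an assumption, and looking up the owner of /var/run is left to the caller:

```go
package main

import (
	"fmt"
	"path"
)

// netnsRunDir picks the directory for network namespace files.
// varRunOwnerUID is the owner of /var/run as seen by the process and
// currentUID is the uid of the current user.
func netnsRunDir(varRunOwnerUID, currentUID int, xdgRuntimeDir string) string {
	// Fall back to $XDG_RUNTIME_DIR/netns only when the variable is set
	// and /var/run is owned by someone other than the current user.
	if xdgRuntimeDir != "" && varRunOwnerUID != currentUID {
		return path.Join(xdgRuntimeDir, "netns")
	}
	return "/var/run/netns" // assumed default location
}

func main() {
	// Example values: /var/run appears to be owned by the overflow uid.
	fmt.Println(netnsRunDir(65534, 0, "/run/user/1000"))
}
```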
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>