This is a third draft of the first steps in implementing more general
"connection" tracking, as described at:
https://pad.passt.top/p/NewForwardingModel
This series changes the TCP connection table and hash table into a
more general flow table that can track other protocols as well. Each
flow uniformly keeps track of all the relevant addresses and ports,
which will allow for more robust control of NAT and port forwarding.
ICMP is converted to use the new flow table.
Caveats:
* We significantly increase the size of a connection/flow entry
- Can probably be mitigated, but I haven't investigated much yet
* We perform a number of extra getsockname() calls to know some of
the socket endpoints
- Haven't yet measured how much performance impact that has
- Can be mitigated in at least some cases, but again, haven't
tried yet
* UDP is not converted yet
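
To illustrate the getsockname() caveat above, here is a minimal sketch of
the kind of extra call involved. It is not code from this series:
sock_local_port() and udp_any_sock() are hypothetical helper names, and
only IPv4 and IPv6 sockets are handled. It shows that when the kernel
picks part of a socket's local endpoint (here, the port after binding to
port 0), one additional system call is needed to learn it.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not from the series: return the local port of
 * socket @s in host byte order, -1 on failure or unsupported family.
 */
int sock_local_port(int s)
{
	struct sockaddr_storage ss;
	socklen_t sl = sizeof(ss);

	memset(&ss, 0, sizeof(ss));
	if (getsockname(s, (struct sockaddr *)&ss, &sl) < 0)
		return -1;

	if (ss.ss_family == AF_INET)
		return ntohs(((struct sockaddr_in *)&ss)->sin_port);
	if (ss.ss_family == AF_INET6)
		return ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);

	return -1;
}

/* Hypothetical helper: UDP socket bound to any address, port chosen by
 * the kernel. Returns the socket, or -1 on failure.
 */
int udp_any_sock(void)
{
	struct sockaddr_in a;
	int s = socket(AF_INET, SOCK_DGRAM, 0);

	if (s < 0)
		return -1;

	memset(&a, 0, sizeof(a));
	a.sin_family = AF_INET;
	a.sin_addr.s_addr = htonl(INADDR_ANY);
	a.sin_port = 0;		/* Let the kernel pick the port */

	if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0) {
		close(s);
		return -1;
	}

	return s;
}
```

Calling sock_local_port() on a socket from udp_any_sock() returns the
port the kernel assigned, which is not known to the caller otherwise.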
Changes since v2:
* Cosmetic fixes based on review
* Extra doc comments for enum flow_type
* Rename flowside to flowaddrs which turns out to make more sense in
light of future changes
* Fix bug where the socket flowaddrs for tap initiated connections
wasn't initialised to match the socket address we were using in the
case of map-gw NAT
* New flowaddrs_from_sock() helper used in most cases which is cleaner
and should avoid bugs like the above
* Using newer centralised workarounds for clang-tidy issue 58992
* Remove duplicate definition of FLOW_MAX as maximum flow type and
maximum number of tracked flows
* Rebased on newer versions of preliminary work (ICMP, flow based
dispatch and allocation, bind/address cleanups)
* Unified hash table as well as base flow table
* Integrated ICMP
Changes since v1:
* Terminology changes
- "Endpoint" address/port instead of "correspondent" address/port
- "flowside" instead of "demiflow"
* Actually move the connection table to a new flow table structure in
new files
* Significant rearrangement of earlier patches on top of that new
table, to reduce churn
David Gibson (15):
flow: Common data structures for tracking flow addresses
tcp, flow: Maintain guest side flow information
tcp, flow: Maintain host side flow information
tcp_splice,flow: Maintain flow information for spliced connections
flow, tcp, tcp_splice: Uniform debug helpers for new flows
tcp, flow: Replace TCP specific hash function with general flow hash
flow: Add helper to determine a flow's protocol
flow, tcp: Generalise TCP hash table to general flow hash table
tcp: Re-use flow hash for initial sequence number generation
icmp: Store ping socket information in the flow table
icmp: Populate guest side information for ping flows
icmp: Populate and use host side flow information
icmp: Use 'flowside' epoll references for ping sockets
icmp: Merge EPOLL_TYPE_ICMP and EPOLL_TYPE_ICMPV6
icmp: Eliminate icmp_id_map
Makefile | 6 +-
flow.c | 260 ++++++++++++++++++++++++++++++++++++++++++
flow.h | 104 +++++++++++++++++
flow_table.h | 2 +
icmp.c | 211 +++++++++++++++++++---------------
icmp.h | 15 +--
icmp_flow.h | 31 +++++
passt.c | 15 +--
passt.h | 9 +-
tap.c | 11 --
tap.h | 1 -
tcp.c | 313 +++++++++++++++------------------------------------
tcp_conn.h | 9 --
tcp_splice.c | 63 ++++++++---
tcp_splice.h | 3 +-
util.c | 4 +-
util.h | 18 +++
17 files changed, 683 insertions(+), 392 deletions(-)
create mode 100644 icmp_flow.h
--
2.43.0
As with TCP, it turns out that there are a bunch of clean ups and
reworks to the ICMP code which will make integration with the flow
table easier, even before introducing a non-trivial version of the
flow table itself.
Based on the flow based dispatch/allocation, and bind/addressing
cleanup series.
Changes since v1:
* Rebased on newer version of flow dispatch & allocation series
* Added 12/12 splitting out close and new sequence functions
David Gibson (12):
checksum: Don't use linux/icmp.h when netinet/ip_icmp.h will do
icmp: Don't set "port" on destination sockaddr for ping sockets
icmp: Remove redundant initialisation of sendto() address
icmp: Don't attempt to handle "wrong direction" ping socket traffic
icmp: Don't attempt to match host IDs to guest IDs
icmp: Use -1 to represent "missing" sockets
icmp: Simplify socket expiry scanning
icmp: Share more between IPv4 and IPv6 paths in icmp_tap_handler()
icmp: Consolidate icmp_sock_handler() with icmpv6_sock_handler()
icmp: Warn on receive errors from ping sockets
icmp: Validate packets received on ping sockets
icmp: Dedicated functions for starting and closing ping sequences
checksum.c | 2 +-
icmp.c | 326 ++++++++++++++++++++++++++---------------------------
icmp.h | 5 +-
passt.c | 4 +-
4 files changed, 164 insertions(+), 173 deletions(-)
--
2.43.0
There are a number of things that are more-or-less general to flows
which are still explicitly handled in tcp.c and tcp_splice.c including
allocation and freeing of flow entries, and dispatch of deferred and
timer functions.
Even without adding more fields to the common flow structure, we can
handle a number of these in a more flow-centric way.
Unlike v1 this version is based on the hash table rework series.
Changes since v2:
* Realised the prealloc/commit functions were confusing and worked
poorly for some future stuff. Replaced with alloc/alloc_cancel
* Fixed a bug where newly allocated flow entries might not be
0-filled, because of the free tracking information in there. This
could cause very subtle problems.
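
The zero-fill problem above can be sketched as follows. This is an
illustration under assumptions, not the code from the series: the toy_*
names, the table size and the entry layout are all hypothetical. The
point is that when free entries store their own tracking data in the
same storage later used by a live flow, the allocator has to clear the
entry before handing it out, otherwise the new flow starts with stale
free-list data in its fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_FLOW_MAX	8	/* Hypothetical table size */

/* Hypothetical entry: while free, the same storage holds free-list data */
union toy_flow {
	struct {
		unsigned next;	/* Index of next free entry */
	} free;
	struct {
		int type;	/* 0 means "no type set yet" */
		int sock;
	} f;
};

union toy_flow toy_tab[TOY_FLOW_MAX];
unsigned toy_first_free;

/* Chain all entries into the free list */
void toy_flow_init(void)
{
	unsigned i;

	for (i = 0; i < TOY_FLOW_MAX; i++)
		toy_tab[i].free.next = i + 1;
	toy_first_free = 0;
}

/* Return a zeroed entry, or NULL if the table is full */
union toy_flow *toy_flow_alloc(void)
{
	union toy_flow *flow;

	if (toy_first_free >= TOY_FLOW_MAX)
		return NULL;

	flow = &toy_tab[toy_first_free];
	toy_first_free = flow->free.next;

	/* Without this, f.type would still contain the old free.next */
	memset(flow, 0, sizeof(*flow));
	return flow;
}

/* Put an entry back at the head of the free list */
void toy_flow_free(union toy_flow *flow)
{
	flow->free.next = toy_first_free;
	toy_first_free = (unsigned)(flow - toy_tab);
}
```

In this layout, dropping the memset() would make the second allocated
entry start with a non-zero type field, which is the sort of subtle
problem described above.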
Changes since v1:
* Store the timestamp of last flow timers run in a global, rather
than a ctx field
* Rebased on the TCP hash table rework
* Add patches 9..13/13 with changes to allocation and freeing of flow
entries.
David Gibson (13):
flow: Make flow_table.h #include the protocol specific headers it
needs
treewide: Standardise on 'now' for current timestamp variables
tcp, tcp_splice: Remove redundant handling from tcp_timer()
tcp, tcp_splice: Move per-type cleanup logic into per-type helpers
flow, tcp: Add flow-centric dispatch for deferred flow handling
flow, tcp: Add handling for per-flow timers
epoll: Better handling of number of epoll types
tcp, tcp_splice: Avoid double layered dispatch for connected TCP
sockets
flow: Move flow_log_() to near top of flow.c
flow: Move flow_count from context structure to a global
flow: Abstract allocation of new flows with helper function
flow: Enforce that freeing of closed flows must happen in deferred
handlers
flow: Avoid moving flow entries to compact table
flow.c | 223 ++++++++++++++++++++++++++++++++++++++++++---------
flow.h | 5 +-
flow_table.h | 20 +++++
icmp.c | 12 +--
icmp.h | 2 +-
log.c | 34 ++++----
passt.c | 20 +++--
passt.h | 9 +--
tcp.c | 143 +++++++++------------------------
tcp.h | 2 +-
tcp_conn.h | 8 +-
tcp_splice.c | 49 +++++------
tcp_splice.h | 4 +-
udp.c | 16 ++--
udp.h | 2 +-
15 files changed, 324 insertions(+), 225 deletions(-)
--
2.43.0
e5eefe77435a ("tcp: Refactor to use events instead of states, split out
spliced implementation") has exported tcp_sock_set_bufsize() to
be able to use it in tcp_splice.c, but 6ccab72d9b40 has removed its use
in tcp_splice.c, so we can set it static again.
Fixes: 6ccab72d9b40 ("tcp: Improve handling of fallback if socket pool is empty on new splice")
Cc: david(a)gibson.dropbear.id.au
Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>
---
tcp.c | 2 +-
tcp.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/tcp.c b/tcp.c
index f506cfdd3bc7..1680b516b5b9 100644
--- a/tcp.c
+++ b/tcp.c
@@ -929,7 +929,7 @@ static void tcp_get_sndbuf(struct tcp_tap_conn *conn)
* tcp_sock_set_bufsize() - Set SO_RCVBUF and SO_SNDBUF to maximum values
* @s: Socket, can be -1 to avoid check in the caller
*/
-void tcp_sock_set_bufsize(const struct ctx *c, int s)
+static void tcp_sock_set_bufsize(const struct ctx *c, int s)
{
int v = INT_MAX / 2; /* Kernel clamps and rounds, no need to check */
diff --git a/tcp.h b/tcp.h
index 27b11668f258..87a6bf9f0ee8 100644
--- a/tcp.h
+++ b/tcp.h
@@ -23,7 +23,6 @@ int tcp_init(struct ctx *c);
void tcp_timer(struct ctx *c, const struct timespec *ts);
void tcp_defer_handler(struct ctx *c);
-void tcp_sock_set_bufsize(const struct ctx *c, int s);
void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
/**
--
2.42.0
In most places where we need to get ICMP definitions, we get them from
<netinet/ip_icmp.h>. However in checksum.c we instead include
<linux/icmp.h>. Change it to use <netinet/ip_icmp.h> for consistency.
Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
checksum.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/checksum.c b/checksum.c
index 03b8a7c..f21c9b7 100644
--- a/checksum.c
+++ b/checksum.c
@@ -49,11 +49,11 @@
#include <arpa/inet.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
+#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/udp.h>
-#include <linux/icmp.h>
#include <linux/icmpv6.h>
/* Checksums are optional for UDP over IPv4, so we usually just set
--
2.43.0
I now have an in-progress draft of a unified hash table to go with the
unified flow table. This turns out to be easier if we first make some
preliminary changes to the structure of the TCP hash table. So, here
are those.
Changes since v1:
* Use while loops for the hash probing instead of some equivalent, but
hard to read, for loops.
* Switch from probing forwards through hash buckets to probing
backwards. This makes the code closer to the version in Knuth it's
based on, and thus easier to see if we've made a mistake in
adaptation.
* Improve the helpers for modular arithmetic in use
* Correct an error where we had things exactly the wrong way around
when finding entries to move during removal.
* Add a patch fixing a conceptual / documentation problem in some
adjacent code
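
As a rough illustration of the approach described above (linear probing,
probing backwards through the buckets, storing indices rather than
pointers, and moving entries on removal), here is a small self-contained
sketch. It is not the code from the series: all toy_* names are
hypothetical, the hash function is deliberately trivial so collisions
are easy to provoke, and the table is kept at most half full so that
probing always reaches an empty bucket. The removal logic follows the
linear probing deletion algorithm from Knuth, adapted as I understand
it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRIES	8			/* Entries in the toy table */
#define TOY_BUCKETS	(TOY_ENTRIES * 2)	/* Always leaves empty buckets */
#define TOY_NONE	UINT32_MAX		/* Marks an empty bucket */

uint32_t toy_key[TOY_ENTRIES];		/* Key of each entry */
uint32_t toy_hash_tab[TOY_BUCKETS];	/* Entry index, or TOY_NONE */

/* Previous bucket, wrapping around */
unsigned toy_mod_dec(unsigned x, unsigned m)
{
	return x ? x - 1 : m - 1;
}

/* Is @x in the cyclic half-open range [@lo, @hi) modulo @m? */
int toy_mod_between(unsigned x, unsigned lo, unsigned hi, unsigned m)
{
	return (x + m - lo) % m < (hi + m - lo) % m;
}

/* Home bucket of @key: deliberately weak hash, for illustration only */
unsigned toy_home(uint32_t key)
{
	return key % TOY_BUCKETS;
}

void toy_hash_init(void)
{
	unsigned b;

	for (b = 0; b < TOY_BUCKETS; b++)
		toy_hash_tab[b] = TOY_NONE;
}

/* Bucket holding @key, or the empty bucket where it would be inserted */
unsigned toy_hash_probe(uint32_t key)
{
	unsigned b = toy_home(key);

	while (toy_hash_tab[b] != TOY_NONE && toy_key[toy_hash_tab[b]] != key)
		b = toy_mod_dec(b, TOY_BUCKETS);

	return b;
}

/* Insert entry @idx, whose key is already set in toy_key[] */
void toy_hash_insert(uint32_t idx)
{
	toy_hash_tab[toy_hash_probe(toy_key[idx])] = idx;
}

/* Entry index for @key, or -1 if not found */
int toy_hash_lookup(uint32_t key)
{
	uint32_t idx = toy_hash_tab[toy_hash_probe(key)];

	return idx == TOY_NONE ? -1 : (int)idx;
}

/* Remove @key, moving later entries of the cluster into the hole */
void toy_hash_remove(uint32_t key)
{
	unsigned b = toy_hash_probe(key), s;

	if (toy_hash_tab[b] == TOY_NONE)
		return;		/* Not in the table */

	s = toy_mod_dec(b, TOY_BUCKETS);
	while (toy_hash_tab[s] != TOY_NONE) {
		unsigned h = toy_home(toy_key[toy_hash_tab[s]]);

		/* If the home bucket is in [s, b), probing from it never
		 * crosses the hole at b, so the entry can stay where it is.
		 */
		if (!toy_mod_between(h, s, b, TOY_BUCKETS)) {
			toy_hash_tab[b] = toy_hash_tab[s];
			b = s;	/* The hole moves to s */
		}
		s = toy_mod_dec(s, TOY_BUCKETS);
	}

	toy_hash_tab[b] = TOY_NONE;
}
```

The test in toy_hash_remove() is the part that is easy to get backwards:
an entry is moved only when the hole lies on its probe path, that is,
when its home bucket is not in the cyclic range between its current
bucket and the hole.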
David Gibson (4):
tcp: Fix conceptually incorrect byte-order switch in tcp_tap_handler()
tcp: Switch hash table to linear probing instead of chaining
tcp: Implement hash table with indices rather than pointers
tcp: Don't account for hash table size in tcp_hash()
flow.h | 11 +++++
tcp.c | 143 ++++++++++++++++++++++++++++-------------------------
tcp_conn.h | 2 -
util.h | 28 +++++++++++
4 files changed, 114 insertions(+), 70 deletions(-)
--
2.43.0
Reported-by: Yalan Zhang <yalzhang(a)redhat.com>
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
v2: Fix botched paragraph
README.md | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/README.md b/README.md
index c360037..916260e 100644
--- a/README.md
+++ b/README.md
@@ -406,13 +406,13 @@ upstream interface of the host, and the same default gateway as the default
gateway of the host. Addresses are translated in case the guest is seen using a
different address from the assigned one.
-For IPv6, the guest or namespace is assigned, via SLAAC, the same prefix as the
-upstream interface of the host, the same default route as the default route of
-the host, and, if a DHCPv6 client is running in the guest or namespace, also the
-same address as the upstream address of the host. This means that, with a DHCPv6
-client in the guest or namespace, addresses don't need to be translated. Should
-the client use a different address, the destination address is translated for
-packets going to the guest or to the namespace.
+For IPv6, the guest or namespace is assigned, via SLAAC, a prefix derived from
+the address of the upstream interface of the host, the same default route as the
+default route of the host, and, if a DHCPv6 client is running in the guest or
+namespace, also the same address as the upstream address of the host. This means
+that, with a DHCPv6 client in the guest or namespace, addresses don't need to be
+translated. Should the client use a different address, the destination address
+is translated for packets going to the guest or to the namespace.
### Local connections with _passt_
--
2.39.2