[LARTC] netfilter resets TCP conversation that was DNATed from the local machine to another

Michael freeware@adsl-209-204-165-151.sonic.net
Wed, 02 Jul 2003 14:36:36 -0700


The netfilter list had no answer for this.

I have a configuration:

/------------\ .0.{8,9}     .{0,1}.1 /----------\ 1.2.3.{4,5,6}  (          )
| Web server |---------+-------------| firewall |---------------(  Internet  )
\------------/         |        eth0 |  Squid   | eth1           (          )
                       |             \----------/
/---------\ .1.2       |
| browser |------------/
\---------/

The firewall runs Squid as a proxy for the 192.168.1.x clients, and it
works fine *except* when the target server resolves to one of the public
IPs on eth1.  When that happens, the client-to-Squid exchange goes
through OK, then Squid sends a SYN (from .0.1) to .0.8:80 and .0.8
answers with a SYN-ACK, but then netfilter spontaneously issues a RST to
.0.8:80 from a different port (i.e., not the one Squid was using)!  I
have no reject-with-tcp-reset lines in my tables.

Let's watch: I stop and start Squid. I access my domain, xxx.org, from 
my browser. On eth0, I see

10:52:42.684703 192.168.1.2.4358 > 192.168.1.1.squid: S 3347635646:3347635646(0) win 16060 <mss 1460,sackOK,timestamp 54041369 0,nop,wscale 0> (DF)
10:52:42.685601 192.168.1.1.squid > 192.168.1.2.4358: S 3951034191:3951034191(0) ack 3347635647 win 5792 <mss 1460,sackOK,timestamp 77755654 54041369,nop,wscale 0> (DF)
10:52:42.685952 192.168.1.2.4358 > 192.168.1.1.squid: . ack 1 win 16060 <nop,nop,timestamp 54041369 77755654> (DF)
10:52:42.686482 192.168.1.2.4358 > 192.168.1.1.squid: P 1:432(431) ack 1 win 16060 <nop,nop,timestamp 54041369 77755654> (DF)
10:52:42.686801 192.168.1.1.squid > 192.168.1.2.4358: . ack 432 win 6432 <nop,nop,timestamp 77755655 54041369> (DF)

That's my browser querying Squid. On eth1, I see

10:52:42.690711 1.2.3.4.32804 > myisp.domain: 2+ A? xxx.org. [|domain] (DF)
10:52:42.691228 1.2.3.4.32804 > myisp.domain: 3+ A? xxx.org. [|domain] (DF)
10:52:42.710447 myisp.domain > 1.2.3.4.32804: 2* 1/2/2 xxx.org. A 1.2.3.5 (119) (DF)
10:52:42.715838 myisp.domain > 1.2.3.4.32804: 3* 1/2/2 xxx.org. A 1.2.3.5 (119) (DF)

That's Squid looking up my domain. (Why twice? I don't know.) The OUTPUT 
chain in the nat table is

Chain OUTPUT (policy ACCEPT)
target prot opt in out source     destination
DNAT   tcp  --  *  *   0.0.0.0/0  1.2.3.5     multiport dports 80,443 to:192.168.0.8
DNAT   tcp  --  *  *   0.0.0.0/0  1.2.3.6     multiport dports 80,443 to:192.168.0.9
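
For anyone who wants to reproduce the setup, here is a rough sketch that
prints iptables commands which should give a listing like the one above.
It is reconstructed from the listing, not copied from my actual script:
the address pairs and ports come from the listing, while the function
names and the use of -A (append) are just assumptions for illustration.

```python
# Sketch: rebuild iptables invocations matching the nat-table listing.
# Address pairs and ports are taken from the listing; the helper names
# and the choice of -A (append) are illustrative assumptions.

DNAT_MAP = {
    "1.2.3.5": "192.168.0.8",
    "1.2.3.6": "192.168.0.9",
}


def dnat_rule(chain, public_ip, private_ip, ports=(80, 443)):
    """Return the argv for one DNAT rule in the nat table."""
    return [
        "iptables", "-t", "nat", "-A", chain,
        "-p", "tcp", "-d", public_ip,
        "-m", "multiport", "--dports", ",".join(str(p) for p in ports),
        "-j", "DNAT", "--to-destination", private_ip,
    ]


def all_rules():
    """One rule per public address, in both OUTPUT and PREROUTING."""
    rules = []
    for chain in ("OUTPUT", "PREROUTING"):
        for public_ip, private_ip in DNAT_MAP.items():
            rules.append(dnat_rule(chain, public_ip, private_ip))
    return rules


if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for argv in all_rules():
        print(" ".join(argv))
```

The script only prints the commands, so it is safe to run on any box.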

(The PREROUTING chain is identical, to handle requests coming from the 
Internet.) Then on eth0, I see

10:52:42.719385 192.168.0.1.36065 > 192.168.0.8.www: S 3950921369:3950921369(0) win 32767 <mss 16396,sackOK,timestamp 77755658 0,nop,wscale 0> (DF)
10:52:42.719797 192.168.0.8.www > 192.168.0.1.36065: S 3348203817:3348203817(0) ack 3950921370 win 5792 <mss 1460,sackOK,timestamp 30595894 77755658,nop,wscale 0> (DF)
10:52:42.720206 192.168.0.1.1028 > 192.168.0.8.www: R 3950921370:3950921370(0) win 0 (DF)
10:52:45.716310 192.168.0.1.36065 > 192.168.0.8.www: S 3950921369:3950921369(0) win 32767 <mss 16396,sackOK,timestamp 77755958 0,nop,wscale 0> (DF)
10:52:45.716595 192.168.0.8.www > 192.168.0.1.36065: S 3348203817:3348203817(0) ack 3950921370 win 5792 <mss 1460,sackOK,timestamp 30596194 77755658,nop,wscale 0> (DF)
10:52:45.716974 192.168.0.1.1028 > 192.168.0.8.www: R 3950921370:3950921370(0) win 0 (DF)
10:52:46.910244 192.168.0.8.www > 192.168.0.1.36065: S 3348203817:3348203817(0) ack 3950921370 win 5792 <mss 1460,sackOK,timestamp 30596314 77755658,nop,wscale 0> (DF)
10:52:46.910653 192.168.0.1.1028 > 192.168.0.8.www: R 3950921370:3950921370(0) win 0 (DF)

This pattern repeats: Squid resends its SYN from port 36065, the server
independently resends its SYN-ACK, and netfilter (or something) sends a
RST for every SYN-ACK, as if that were all it needed to do to handle the
incoming packet.
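
One thing the eth0 trace shows, put as a small script: each RST carries
sequence number 3950921370, which is exactly what the SYN-ACK
acknowledges, so it belongs to the 36065 connection, yet it leaves from
source port 1028. The sketch below picks that out mechanically. It
assumes tcpdump output in the format pasted above; the sample lines are
the first three from the trace with the TCP options trimmed, and the
function name is made up for this example.

```python
import re

# First three lines of the eth0 trace, TCP options trimmed for brevity.
TRACE = """\
10:52:42.719385 192.168.0.1.36065 > 192.168.0.8.www: S 3950921369:3950921369(0) win 32767
10:52:42.719797 192.168.0.8.www > 192.168.0.1.36065: S 3348203817:3348203817(0) ack 3950921370 win 5792
10:52:42.720206 192.168.0.1.1028 > 192.168.0.8.www: R 3950921370:3950921370(0) win 0
"""

# time, src.port > dst.port:, flags, seq:seq(len), optional "ack N"
LINE_RE = re.compile(
    r"^\S+ (?P<src>\S+)\.(?P<sport>[^.\s]+) > "
    r"(?P<dst>\S+)\.(?P<dport>[^.\s]+): "
    r"(?P<flags>[SRFP.]+) (?P<seq>\d+):\d+\(\d+\)(?: ack (?P<ack>\d+))?"
)


def find_mismatched_resets(trace):
    """Return (syn_port, rst_port, seq) for each RST whose sequence number
    follows a seen SYN but whose source port differs from that SYN's."""
    syn_ports = {}   # ISN + 1 -> source port of the SYN
    mismatches = []
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        seq = int(m["seq"])
        if m["flags"] == "S" and m["ack"] is None:
            # Plain SYN: remember which port opened this sequence space.
            syn_ports[seq + 1] = m["sport"]
        elif m["flags"] == "R" and seq in syn_ports:
            if syn_ports[seq] != m["sport"]:
                mismatches.append((syn_ports[seq], m["sport"], seq))
    return mismatches


if __name__ == "__main__":
    print(find_mismatched_resets(TRACE))
    # prints [('36065', '1028', 3950921370)]
```

So the SYN goes out from 36065 but the matching RST comes from 1028.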

Eventually Squid changes to port 36066, with the same effect, then sends 
its error page back to the browser. That's all the traffic that I see, 
excepting arp, igmp, my own ssh, and an external www request that 
happened during the long wait.

Any ideas as to why netfilter thinks it has handled the SYN-ACK by
sending the RST?

The netfilter thread starts at 
http://lists.netfilter.org/pipermail/netfilter/2003-June/045141.html