[LARTC] neighbor table overflow

Marco C. Coelho maillist1 at argontech.net
Fri Dec 7 18:17:09 CET 2007


Ok, I hope this helps someone else out there when they google "neighbor
table overflow solution linux kernel":

This is just an update to say that since gc_thresh1 was increased to a
number greater than the number of simultaneously connected PPPoE clients
on this box, it has not given me the neighbor table problem.

So set gc_thresh1 greater than the number of local connections you get with:

ip route | grep link | wc -l
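
A minimal sketch of that check, not part of the original fix: it counts the link-scope routes and compares the count with the current gc_thresh1. The function names (count_link_routes, needs_raise) and the "live" argument are made up for illustration, and the pattern 'scope link' is a slightly stricter match than the plain 'link' used above.

```shell
#!/bin/sh
# count_link_routes: count "scope link" routes in `ip route` output on stdin.
count_link_routes() {
    grep -c 'scope link'
}

# needs_raise COUNT THRESH1: succeeds (exit 0) when gc_thresh1 is NOT
# greater than the number of link routes, i.e. it should be raised.
needs_raise() {
    [ "$2" -le "$1" ]
}

# Only touch the live system when called with "live" as the first argument.
if [ "${1:-}" = "live" ]; then
    count=$(ip route | count_link_routes)
    thresh1=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh1)
    if needs_raise "$count" "$thresh1"; then
        echo "gc_thresh1=$thresh1 is not above $count link routes; raise it"
    else
        echo "gc_thresh1=$thresh1 is above $count link routes"
    fi
fi
```

Saved as, say, check_neigh.sh (a hypothetical name), it would be run as `sh check_neigh.sh live` on the router itself; without the argument it only defines the functions.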

So in /etc/sysctl.conf add something like:

# Added to stop "neighbor table overflow" messages in the kernel
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096
# Added to raise the conntrack table limit (it was hitting the maximum)
net.ipv4.ip_conntrack_max=99999
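
One way to arrive at gc_thresh values like the ones above, sketched under an assumption that is mine and not a kernel rule: take the smallest power of two above the connection count (never below the default of 128) for gc_thresh1, then double it for gc_thresh2 and double again for gc_thresh3. For 1000 connections this happens to give the 1024/2048/4096 shown above. suggest_gc_thresh is a hypothetical helper name.

```shell
#!/bin/sh
# suggest_gc_thresh COUNT: print sysctl.conf lines for the three neighbor
# table thresholds, sized from COUNT directly connected peers.
# Assumption: power-of-two sizing with a 1x/2x/4x spread.
suggest_gc_thresh() {
    count=$1
    t1=128                      # kernel default for gc_thresh1 per arp(7)
    while [ "$t1" -le "$count" ]; do
        t1=$((t1 * 2))          # grow until strictly above the count
    done
    echo "net.ipv4.neigh.default.gc_thresh1=$t1"
    echo "net.ipv4.neigh.default.gc_thresh2=$((t1 * 2))"
    echo "net.ipv4.neigh.default.gc_thresh3=$((t1 * 4))"
}
```

Calling `suggest_gc_thresh 1000` prints the three lines; review them before appending to /etc/sysctl.conf, then load the file with `sysctl -p`.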


Have a Merry Christmas!

Marco Coelho
Argon Technologies Inc.
www.argontech.net



Marco C. Coelho wrote:
> Still beating the same bush!
>
> I've done all the possible suggestions so far.  I still was getting a 
> neighbor table overflow.
> Looking at the arp(7) man page, I see:
>
>        gc_thresh1
>               The minimum number of entries to keep in the ARP cache.
>               The garbage collector will not run if there are fewer
>               than this number of entries in the cache.  Defaults to
>               128.
>
>        gc_thresh2
>               The soft maximum number of entries to keep in the ARP
>               cache.  The garbage collector will allow the number of
>               entries to exceed this for 5 seconds before collection
>               will be performed.  Defaults to 512.
>
>        gc_thresh3
>               The hard maximum number of entries to keep in the ARP
>               cache.  The garbage collector will always run if there
>               are more than this number of entries in the cache.
>               Defaults to 1024.
>
> Since this box never has fewer than 500 PPPoE connections, this Saturday
> I changed:
>
>                 WAS     NOW
> gc_thresh1      512     1024
> gc_thresh2     2048     2048
> gc_thresh3     4096     4096
>    
> What's strange is that when I do an 'arp -an' I only get a few entries
> back (IPs changed to protect the guilty).  Shouldn't this show all the
> ARP entries?
>
> ? (x.202.x.3) at 00:03:47:2D:8B:F9 [ether] on eth0
> ? (x.202.x.1) at 00:03:E3:88:EC:C2 [ether] on eth0
> ? (x.202.x.2) at 00:18:8B:76:EC:D8 [ether] on eth0
> ? (x.202.x.9) at 00:90:27:43:C2:CF [ether] on eth0
>
> ip route | grep link provides:
>
> snip (lots of pppoe connects)
> x.202.x.237 dev ppp53  proto kernel  scope link  src 10.20.1.1
> x.202.x.235 dev ppp339  proto kernel  scope link  src 10.20.1.1
> x.202.x.232 dev ppp185  proto kernel  scope link  src 10.20.1.1
> x.202.x.231 dev ppp313  proto kernel  scope link  src 10.20.1.1
> x.202.x.230 dev ppp67  proto kernel  scope link  src 10.20.1.1
> x.202.x.226 dev ppp74  proto kernel  scope link  src 10.20.1.1
> x.202.x.224 dev ppp150  proto kernel  scope link  src 10.20.1.1
> x.202.x.0/24 dev eth0  proto kernel  scope link  src x.202.224.8
> 192.168.1.0/24 dev eth3  proto kernel  scope link  src 192.168.1.8
>
> I don't think we are doing anything too special with this box that we 
> would see a kernel issue no one else is seeing.  Can arp poisoning 
> cause this?
>
> a dmesg after a clean reboot only gives:
>
> Shorewall:all2all:REJECT:IN=ppp413 OUT= MAC= SRC=x.202.x.165 
> DST=10.20.1.1 LEN=60 TOS=0x00 PREC=0x00 TTL=254 ID=39752 PROTO=ICMP 
> TYPE=8 CODE=0 ID=25040 SEQ=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=72 TOS=0x00 PREC=0x00 TTL=126 ID=48363 PROTO=UDP 
> SPT=427 DPT=427 LEN=52
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48492 DF 
> PROTO=TCP SPT=36005 DPT=9220 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48493 DF 
> PROTO=TCP SPT=36005 DPT=9220 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48517 DF 
> PROTO=TCP SPT=36005 DPT=9220 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48518 DF 
> PROTO=TCP SPT=33969 DPT=16398 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=72 TOS=0x00 PREC=0x00 TTL=126 ID=48519 PROTO=UDP 
> SPT=427 DPT=427 LEN=52
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48522 DF 
> PROTO=TCP SPT=33969 DPT=16398 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48526 DF 
> PROTO=TCP SPT=33969 DPT=16398 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48614 DF 
> PROTO=TCP SPT=35790 DPT=9220 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48630 DF 
> PROTO=TCP SPT=35790 DPT=9220 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48x6 DF PROTO=TCP 
> SPT=35790 DPT=9220 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48x8 DF PROTO=TCP 
> SPT=34718 DPT=16398 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48663 DF 
> PROTO=TCP SPT=34718 DPT=16398 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.202.x.110 
> DST=192.168.1.7 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=48679 DF 
> PROTO=TCP SPT=34718 DPT=16398 WINDOW=16384 RES=0x00 SYN URGP=0
> Shorewall:all2all:REJECT:IN=ppp160 OUT=eth3 SRC=x.y.x.110 
> DST=192.168.1.7 LEN=72 TOS=0x00 PREC=0x00 TTL=126 ID=48724 PROTO=UDP 
> SPT=427 DPT=427 LEN=52
>
> Kernel Version 2.6.18-8.1.6
>
>
> Looking for any suggestions.
>
> Marco
>
>
>
>
>
> Andrei Kovacs wrote:
>> On 10/25/07, Marco C. Coelho <maillist1 at argontech.net> wrote:
>>   
>>>  Looking into it further, an ip route shows:
>>>
>>>  x.0.0.0/8 via x.y.224.1 dev eth0  proto zebra  metric 20 equalize
>>>
>>>  So the x.0.0.0 announcement is coming into this box through OSPF (zebra).
>>>
>>>  The 169.254.0.0/16 is being automajically added through the sysconfig
>>> network scripts.  I'm looking into why.
>>>
>>>     
>>
>> Add "NOZEROCONF=yes" in /etc/sysconfig/network and the 169.254.0.0/16
>> network won't be created anymore.
>>
>>   
>>>  In either case I still don't see why these entries would make the neighbor
>>> table overflow.  Could it have been the previous fix to the hosts file?
>>>
>>>  mc
>>>
>>>  Alexandru Dragoi wrote:
>>>  Marco C. Coelho wrote:
>>>
>>>
>>>  the ip route with a grep for link returns:
>>>
>>> snip** too long
>>> x.y.x.198 dev ppp436 proto kernel scope link src 10.20.1.1
>>> x.y.x.196 dev ppp421 proto kernel scope link src 10.20.1.1
>>> x.y.x.197 dev ppp211 proto kernel scope link src 10.20.0.1
>>> x.y.x.194 dev ppp13 proto kernel scope link src 10.20.1.1
>>> x.y.x.192 dev ppp404 proto kernel scope link src 10.20.1.1
>>> x.y.x.254 dev ppp194 proto kernel scope link src 10.20.1.1
>>> x.y.x.253 dev ppp130 proto kernel scope link src 10.20.1.1
>>> x.y.x.252 dev ppp243 proto kernel scope link src 10.20.1.1
>>> x.y.x.249 dev ppp195 proto kernel scope link src 10.20.1.1
>>> x.y.x.248 dev ppp254 proto kernel scope link src 10.20.1.1
>>> x.y.x.247 dev ppp235 proto kernel scope link src 10.20.1.1
>>> x.y.x.242 dev ppp78 proto kernel scope link src 10.20.1.1
>>> x.y.x.240 dev ppp328 proto kernel scope link src 10.20.1.1
>>> x.y.x.237 dev ppp44 proto kernel scope link src 10.20.1.1
>>> x.y.x.236 dev ppp122 proto kernel scope link src 10.20.1.1
>>> x.y.x.234 dev ppp316 proto kernel scope link src 10.20.1.1
>>> x.y.x.232 dev ppp132 proto kernel scope link src 10.20.1.1
>>> x.y.x.231 dev ppp104 proto kernel scope link src 10.20.0.1
>>> x.y.x.226 dev ppp179 proto kernel scope link src 10.20.0.1
>>> x.y.224.0/24 dev eth0 proto kernel scope link src x.y.224.8
>>> 192.168.1.0/24 dev eth3 proto kernel scope link src 192.168.1.8
>>> 169.254.0.0/16 dev eth3 scope link
>>>
>>>  The one above must be deleted; many Red Hat-like distros add
>>> 169.254.0.0/16.
>>>
>>>
>>>  All the pppoe terminations (pppd) are shown, as well as the last three
>>> subnets. I'll have to see where the 169.254.0.0/16 is coming from?
>>>
>>> mc
>>>
>>>
>>>
>>>
>>> Alexandru Dragoi wrote:
>>>
>>>
>>>  Marco C. Coelho wrote:
>>>
>>>
>>>
>>>  This box is doing a lot. It terminates 1000 PPPoE connections,
>>> provides traffic shaping using TC/HTB, authenticates all users via
>>> RADIUS. It also runs OSPF routing for the internal network. Looking
>>> at a simple route output I see all the PPP connections coming through
>>> the box, and due to the OSPF I also see the rest of my network
>>> announcements. The only strange things are:
>>>
>>> 1. The last man working on this box had mistakenly edited the hosts
>>> file and added the machine name and complete domain name to the
>>> localhost (127.0.0.1) entry. They should only point to the eth0
>>> address. I have changed this.
>>>
>>> 2. The route output is making an announcement:
>>>
>>>  x.0.0.0 argontech.net 255.0.0.0 UG 20 0 0 eth0
>>>
>>>
>>>  This doesn't look dangerous for your problem, I was only talking about
>>> directly connected networks:
>>>
>>> # ip route |grep link
>>>
>>>
>>>
>>>
>>>  My public IP space is a /20 within that space, not the whole Class A.
>>> I have not found which box is announcing this within my network yet.
>>>
>>>
>>>
>>>
>>>
>>> Jeff Welling wrote:
>>>
>>>
>>>
>>>
>>>  On 10/23/07 06:56, Alexandru Dragoi wrote:
>>>
>>>
>>>
>>>  What about checking your routing table? You may have link routes
>>> for massive subnets (like 85.0.0.0/8 or 140.20.0.0/16). Some
>>> programs prefer to use "standard" netmask of classes A and B.
>>>
>>>
>>>  I'm betting that the OP has other things going on seeing as how
>>> s/he mentioned PPPoE, which to my knowledge is a layer 2 protocol,
>>> and thus not subject to typical routing scenarios. In essence the
>>> OP could have thousands of PPPoE connections terminating on one
>>> system with the ARP cache having to deal with where to send traffic
>>> to which MAC address. There is not a lot of room for routing in such
>>> a scenario.
>>>
>>>
>>>
>>>  I agree with Peter's suggestion, arpd. I ran into the neighbor table
>>> overflow problem recently, at the hands of our ISP. I was in the
>>> process of recompiling the kernel and mucking with arpd (I couldn't
>>> get it to run/start properly) when the problem disappeared as quickly
>>> as it showed up. Lucky for me, this was some kind of ISP problem, I
>>> was able to determine that much through `tcpdump -i X -n arp`.
>>>
>>> My 'two cents' is that you try arpd. I did a bit of looking when I
>>> came across that problem, and it seemed to be the last-ditch effort
>>> when changing the gc threshold had no effect. I wasn't able to confirm
>>> that it worked, though.
>>>
>>> Cheers.
>>> _______________________________________________
>>> LARTC mailing list
>>> LARTC at mailman.ds9a.nl
>>> http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
>>>