OSDN Git Service
Eric Dumazet [Sun, 22 Apr 2012 23:38:54 +0000 (23:38 +0000)]
tcp: sk_add_backlog() is too agressive for TCP
While investigating TCP performance problems on 10Gb+ links, we found a
tcp sender was dropping lot of incoming ACKS because of sk_rcvbuf limit
in sk_add_backlog(), especially if receiver doesnt use GRO/LRO and sends
one ACK every two MSS segments.
A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default
value (87380), allowing a too small backlog.
A TCP ACK, even being small, can consume nearly same truesize space than
outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and
is fast to compute.
Performance results on netperf, single flow, receiver with disabled
GRO/LRO : 7500 Mbits instead of 6050 Mbits, no more TCPBacklogDrop
increments at sender.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Apr 2012 23:34:26 +0000 (23:34 +0000)]
net: add a limit parameter to sk_add_backlog()
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.
No functional change expected in this patch, all callers still using the
old sk_rcvbuf limit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Mon, 23 Apr 2012 14:25:49 +0000 (14:25 +0000)]
net ax25: Fix the build when sysctl support is disabled.
Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since
20120420:
>
>
> include/net/ax25.h:447:75: error: expected ';' before '}' token
>
> static inline int ax25_register_dev_sysctl(ax25_dev *ax25_dev) { return 0 };
> static inline void ax25_unregister_dev_sysctl(ax25_dev *ax25_dev) {};
>
> First function: move ';' inside braces.
> Second function: drop the ';'.
Put the semicolons where it makes sense.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Mon, 23 Apr 2012 12:13:02 +0000 (12:13 +0000)]
net sysctl: Add place holder functions for when sysctl support is compiled out of the kernel.
Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since
20120420:
>
>
>
> ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined!
> ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined!
>
> when CONFIG_SYSCTL is not enabled.
Add static inline stub functions to gracefully handle the case when sysctl
support is not present.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Sat, 21 Apr 2012 18:53:22 +0000 (18:53 +0000)]
be2net: fix ethtool get settings
ethtool get settings was not displaying all the settings correctly.
use the get_phy_info to get more information about the PHY to fix this.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 23 Apr 2012 07:21:58 +0000 (03:21 -0400)]
tcp: Fix build warning after tcp_{v4,v6}_init_sock consolidation.
net/ipv4/tcp_ipv4.c: In function 'tcp_v4_init_sock':
net/ipv4/tcp_ipv4.c:1891:19: warning: unused variable 'tp' [-Wunused-variable]
net/ipv6/tcp_ipv6.c: In function 'tcp_v6_init_sock':
net/ipv6/tcp_ipv6.c:1836:19: warning: unused variable 'tp' [-Wunused-variable]
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wu Jiajun-B06378 [Thu, 19 Apr 2012 22:54:35 +0000 (22:54 +0000)]
gianfar: add GRO support
Replace netif_receive_skb with napi_gro_receive.
Signed-off-by: Jiajun Wu <b06378@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 21:56:11 +0000 (21:56 +0000)]
af_packet: packet_getsockopt() cleanup
Factorize code, since most fetched values are int type.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Thu, 19 Apr 2012 09:55:21 +0000 (09:55 +0000)]
tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()
This commit moves the (substantial) common code shared between
tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family
independent function, tcp_init_sock().
Centralizing this functionality should help avoid drift issues,
e.g. where the IPv4 side is updated without a corresponding update to
IPv6. There was already some drift: IPv4 initialized snd_cwnd to
TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to
2 (in this case it should not matter, since snd_cwnd is also
initialized in tcp_init_metrics(), but the general risks and
maintenance overhead remain).
When diffing the old and new code, note that new tcp_init_sock()
function uses the order of steps from the tcp_v4_init_sock()
implementation (the order is slightly different in
tcp_v6_init_sock()).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 09:38:17 +0000 (09:38 +0000)]
net: allow better page reuse in splice(sock -> pipe)
splice() from socket to pipe needs linear_to_page() helper to transfert
skb header to part of page.
We can reset the offset in the current sk->sk_sndmsg_page if we are the
last user of the page.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 20 Apr 2012 04:42:06 +0000 (04:42 +0000)]
team: add per-port option for enabling/disabling ports
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 20 Apr 2012 04:42:05 +0000 (04:42 +0000)]
team: allow to enable/disable ports
This patch changes content of hashlist (used to get port struct by
computed index (0...en_port_count-1)). Now the hash list contains only
enabled ports so userspace will be able to say what ports can be used
for tx/rx. This becomes handy when userspace will need to disable ports
which does not belong to active aggregator. By default, newly added port
is enabled.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 20 Apr 2012 04:42:04 +0000 (04:42 +0000)]
team: lb: let userspace care about port macs
Better to leave this for userspace
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Apr 2012 18:04:01 +0000 (20:04 +0200)]
net: change big iov allocations
iov of more than 8 entries are allocated in sendmsg()/recvmsg() through
sock_kmalloc()
As these allocations are temporary only and small enough, it makes sense
to use plain kmalloc() and avoid sk_omem_alloc atomic overhead.
Slightly changed fast path to be even faster.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 19 Apr 2012 03:41:57 +0000 (03:41 +0000)]
tcp: Repair connection-time negotiated parameters
There are options, which are set up on a socket while performing
TCP handshake. Need to resurrect them on a socket while repairing.
A new sockoption accepts a buffer and parses it. The buffer should
be CODE:VALUE sequence of bytes, where CODE is standard option
code and VALUE is the respective value.
Only 4 options should be handled on repaired socket.
To read 3 out of 4 of these options the TCP_INFO sockoption can be
used. An ability to get the last one (the mss_clamp) was added by
the previous patch.
Now the restore. Three of these options -- timestamp_ok, mss_clamp
and snd_wscale -- are just restored on a coket.
The sack_ok flags has 2 issues. First, whether or not to do sacks
at all. This flag is just read and set back. No other sack info is
saved or restored, since according to the standart and the code
dropping all sack-ed segments is OK, the sender will resubmit them
again, so after the repair we will probably experience a pause in
connection. Next, the fack bit. It's just set back on a socket if
the respective sysctl is set. No collected stats about packets flow
is preserved. As far as I see (plz, correct me if I'm wrong) the
fack-based congestion algorithm survives dropping all of the stats
and repairs itself eventually, probably losing the performance for
that period.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 19 Apr 2012 03:41:32 +0000 (03:41 +0000)]
tcp: Report mss_clamp with TCP_MAXSEG option in repair mode
The mss_clamp is the only connection-time negotiated option which
cannot be obtained from the user space. Make the TCP_MAXSEG sockopt
report one in the repair mode.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 19 Apr 2012 03:41:01 +0000 (03:41 +0000)]
tcp: Repair socket queues
Reading queues under repair mode is done with recvmsg call.
The queue-under-repair set by TCP_REPAIR_QUEUE option is used
to determine which queue should be read. Thus both send and
receive queue can be read with this.
Caller must pass the MSG_PEEK flag.
Writing to queues is done with sendmsg call and yet again --
the repair-queue option can be used to push data into the
receive queue.
When putting an skb into receive queue a zero tcp header is
appented to its head to address the tcp_hdr(skb)->syn and
the ->fin checks by the (after repair) tcp_recvmsg. These
flags flags are both set to zero and that's why.
The fin cannot be met in the queue while reading the source
socket, since the repair only works for closed/established
sockets and queueing fin packet always changes its state.
The syn in the queue denotes that the respective skb's seq
is "off-by-one" as compared to the actual payload lenght. Thus,
at the rcv queue refill we can just drop this flag and set the
skb's sequences to precice values.
When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent,
waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the
protocol POV the send queue looks like it was sent, but the data
between the write_seq and snd_nxt is lost in the network.
This helps to avoid another sockoption for setting the snd_nxt
sequence. Leaving the whole queue in a 'not yet sent' state (as
it will be after sendmsg-s) will not allow to receive any acks
from the peer since the ack_seq will be after the snd_nxt. Thus
even the ack for the window probe will be dropped and the
connection will be 'locked' with the zero peer window.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 19 Apr 2012 03:40:39 +0000 (03:40 +0000)]
tcp: Initial repair mode
This includes (according the the previous description):
* TCP_REPAIR sockoption
This one just puts the socket in/out of the repair mode.
Allowed for CAP_NET_ADMIN and for closed/establised sockets only.
When repair mode is turned off and the socket happens to be in
the established state the window probe is sent to the peer to
'unlock' the connection.
* TCP_REPAIR_QUEUE sockoption
This one sets the queue which we're about to repair. The
'no-queue' is set by default.
* TCP_QUEUE_SEQ socoption
Sets the write_seq/rcv_nxt of a selected repaired queue.
Allowed for TCP_CLOSE-d sockets only. When the socket changes
its state the other seq-s are changed by the kernel according
to the protocol rules (most of the existing code is actually
reused).
* Ability to forcibly bind a socket to a port
The sk->sk_reuse is set to SK_FORCE_REUSE.
* Immediate connect modification
The connect syscall initializes the connection, then directly jumps
to the code which finalizes it.
* Silent close modification
The close just aborts the connection (similar to SO_LINGER with 0
time) but without sending any FIN/RST-s to peer.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 19 Apr 2012 03:40:01 +0000 (03:40 +0000)]
tcp: Move code around
This is just the preparation patch, which makes the needed for
TCP repair code ready for use.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 19 Apr 2012 03:39:36 +0000 (03:39 +0000)]
sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:16 +0000 (10:56 +0000)]
drivers/net: decouple ISA and ISA_DMA_API
The two options are separate, and some platforms (e.g. arm pxa)
have ISA slots but no ISA dma controller, so they cannot build
drivers using the DMA API functions.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:15 +0000 (10:56 +0000)]
sungem: use mdelay instead of udelay where necessary
Some architectures like ARM cannot handle large numbers as
arguments to udelay, so the drivers should use mdelay when
delaying for multiple miliseconds.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:14 +0000 (10:56 +0000)]
donauboe: replace excessive udelay with msleep
No driver should spin the CPU for 10ms, so better use
an msleep, which is allowed in the ->suspend function.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:13 +0000 (10:56 +0000)]
8390: select CRC32 support
The ax88796 driver uses the CRC32 functions, so make sure that
they are actually enabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:12 +0000 (10:56 +0000)]
drivers/net: iwmc3200 depends on EXPERIMENTAL
The iwmc3200 driver selects other code in Kconfig that depends on
EXPERIMENTAL. Kconfig warns about this when CONFIG_EXPERIMENTAL
is not already set, so logically, these options should also
be marked experimental or promoted to stable.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:11 +0000 (10:56 +0000)]
caif: include linux/io.h
The caif_shmcore requires io.h in order to use ioremap, so include that
explicitly to compile in all configurations.
Also add a note about the use of ioremap(), which is not a proper way
to map a DMA buffer into kernel space. It's not completely clear what
the intention is for using ioremap, but it is clear that the result
of ioremap must not simply be accessed using kernel pointers but
should use readl/writel or memcopy_{to,from}io. Assigning the result
of ioremap to a regular pointer that can also be set to something
else is not ok.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:10 +0000 (10:56 +0000)]
drivers/net: add missing __devexit_p() annotations
Drivers that refer to a __devexit function in an operations
structure need to annotate that pointer with __devexit_p so
replace it with a NULL pointer when the section gets discarded.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 20 Apr 2012 10:56:09 +0000 (10:56 +0000)]
davinci_cpdma: export symbols used by other drivers
The davinci_emac driver can be a module, so the symbols
it needs from the cpdma driver must be exported.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Fri, 20 Apr 2012 18:50:36 +0000 (18:50 +0000)]
pch_gbe: remove suspicious comment
The time stamping code in this driver appears to have been copied from
the ixp4xx_eth.c driver, including this timing comment. I had actually
measured the time stamp delay on an IXP425, but I really doubt that this
value also applies here.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Fri, 20 Apr 2012 18:50:35 +0000 (18:50 +0000)]
pch_gbe: run the ptp bpf just once per packet
This patch fixes code which needlessly ran the BPF twice per
packet. Instead, we just run the classifier once and test
whether the packet is any kind of PTP event message.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:34 +0000 (18:50 +0000)]
pch_gbe: correct receive time stamp filtering
This patch fixes the driver so that multicast PTP event messages can
be recognized by the hardware time stamping unit. The station address
register must be set according to the desired transport type.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:33 +0000 (18:50 +0000)]
pch_gbe: do not set the channel control register
We will let the pch_gbe code do that according to the receive time stamp
filter.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:32 +0000 (18:50 +0000)]
pch_gbe: improve coding style
This patch clears up a few coding style issues:
- Makes two function definitions a bit nicer looking.
- Remove unneeded parentheses.
- Simplify macros for register bits.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:31 +0000 (18:50 +0000)]
pch_gbe: export a method to set the receive match address
The code in phc_gbe_main will need to call this method in order to set the
station address register according to the receive time stamping filter.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:30 +0000 (18:50 +0000)]
pch_gbe: reprogram multicast address register on reset
The reset logic after a Rx FIFO overrun will clear the programmed
multicast addresses. This patch fixes the issue by reprogramming the
registers after the reset.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:29 +0000 (18:50 +0000)]
pch_gbe: simplify transmit time stamping flag test
This patch makes logic surrounding the test of the
transmit time stamping flag more readable.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takahiro Shimizu [Fri, 20 Apr 2012 18:50:28 +0000 (18:50 +0000)]
pch_gbe: scale time stamps to nanoseconds
This patch fixes the helper functions that give the transmit and
receive time stamps to return nanoseconds, instead of arbitrary clock
ticks.
[ RC - Rebased Takahiro's changes and wrote a commit message
explaining the changes. ]
Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:46:06 +0000 (13:46 +0000)]
net: Remove register_net_sysctl_table
All of the users have been converted to use registera_net_sysctl so we
no longer need register_net_sysctl.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:45:29 +0000 (13:45 +0000)]
net: Delete all remaining instances of ctl_path
We don't use struct ctl_path anymore so delete the exported constants.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:44:49 +0000 (13:44 +0000)]
net: Convert all sysctl registrations to register_net_sysctl
This results in code with less boiler plate that is a bit easier
to read.
Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:43:55 +0000 (13:43 +0000)]
net: Convert nf_conntrack_proto to use register_net_sysctl
There isn't much advantage here except that strings paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:42:09 +0000 (13:42 +0000)]
net ipv4: Convert devinet to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.
We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:41:24 +0000 (13:41 +0000)]
net ipv6: Convert addrconf to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.
We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:40:37 +0000 (13:40 +0000)]
net decnet: Convert to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:38:03 +0000 (13:38 +0000)]
net neighbour: Convert to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.
We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:37:09 +0000 (13:37 +0000)]
net ipv6: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables
with .child entries.
Split the ipv6_table to remove the .child entries.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:35:39 +0000 (13:35 +0000)]
net llc: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables with .child
entries.
Kill the intermediate tables and use register_net_sysctl directly to
remove the need for compatibility code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:34:18 +0000 (13:34 +0000)]
net ax25: Simplify and cleanup the ax25 sysctl handling.
Don't register/unregister every ax25 table in a batch. Instead register
and unregister per device ax25 sysctls as ax25 devices come and go.
This moves ax25 to be a completely modern sysctl user. Registering the
sysctls in just the initial network namespace, removing the use of
.child entries that are no longer natively supported by the sysctl core
and taking advantage of the fact that there are no longer any ordering
constraints between registering and unregistering different sysctl
tables.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:32:39 +0000 (13:32 +0000)]
net ipv4: Remove the unneeded registration of an empty net/ipv4/neigh
sysctl no longer requires explicit creation of directories. The neigh
directory is always populated with at least a default entry so this
won't cause any user visible changes.
Delete the ipv4_path and the ipv4_skeleton these are no longer needed.
Directly register the ipv4_route_table.
And since I am an idiot remove the header definitions that I should
have removed in the previous patch.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:26:19 +0000 (13:26 +0000)]
net ipv6: Remove unneded registration of an empty net/ipv6/neigh
sysctl no longer requires explicit creation of directories. The neigh
directory is always populated with at least a default entry so this
should cause no user visible changes.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:25:13 +0000 (13:25 +0000)]
net core: Remove unneded creation of an empty net/core sysctl directory
On the next line we register the net_core_table in net/core which
creates the directory and ensures it exists.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:24:33 +0000 (13:24 +0000)]
net: Move all of the network sysctls without a namespace into init_net.
This makes it clearer which sysctls are relative to your current network
namespace.
This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.
This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:22:55 +0000 (13:22 +0000)]
net: Kill register_sysctl_rotable
register_sysctl_rotable never caught on as an interesting way to
register sysctls. My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace. What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.
That is a very silly way to go. Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.
The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf
I really don't expect anyone will miss them if they can't read them in a
child user namespace.
CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:20:32 +0000 (13:20 +0000)]
net sysctl: Initialize the network sysctls sooner to avoid problems.
If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrasturce is not setup
soon enough. So to avoid races call net_sysctl_init from sock_init().
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:19:46 +0000 (13:19 +0000)]
net sysctl: Register an empty /proc/sys/net
Implementation limitations of the sysctl core won't let /proc/sys/net
reside in a network namespace. /proc/sys/net at least must be registered
as a normal sysctl. So register /proc/sys/net early as an empty directory
to guarantee we don't violate this constraint and hit bugs in the sysctl
implementation.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 19 Apr 2012 13:18:47 +0000 (13:18 +0000)]
net: Implement register_net_sysctl.
Right now all of the networking sysctl registrations are running in a
compatibiity mode. The natvie sysctl registration api takes a cstring
for a path and a simple ctl_table. Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatiblity code in the sysctl core.
Switching from a ctl_path to a cstring results in less boiler plate
and denser code that is a little easier to read.
I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel but gcc will allow a
ctl_path pointer to be passed to a char * pointer with only issuing a
warning resulting in completely incorrect code can be built. Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel which makes clean conversion patches a bit easier
to read and write.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Apr 2012 00:40:31 +0000 (20:40 -0400)]
Merge branch 'tipc_net-next' of git://git./linux/kernel/git/paulg/linux
Huang, Xiong [Wed, 18 Apr 2012 22:01:31 +0000 (22:01 +0000)]
atl1c: remove MDIO_REG_ADDR_MASK in atl1c_mdio_read/write
MDIO_REG_ADDR_MASK is already applied in function
atl1c_write_phy_reg and atl1c_read_phy_reg
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:30 +0000 (22:01 +0000)]
atl1c: fix WoL(magic) issue for l2cb 1.1
l2cb 1.1 hardware has a bug for magic wakeup,
the workaround is to add pattern enable.
WoL related registers are refined as well.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:29 +0000 (22:01 +0000)]
atl1c: refine atl1c_pcie_patch
bit PCIE_PHYMISC_FORCE_RCV_DET is only for l1c&l2c to fix WoL issue,
other chips set bit5 of REG_MASTER_CTRL --- this way could save more
power than the former, and the bit should be kept all time.
l2cb 1.x has special setting for L0S/L1
l2cb 1.x & l1d 1.x should clear Vendor Message on some platforms,
otherwise it will cause the root complex hang.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:28 +0000 (22:01 +0000)]
atl1c: refine/update ASPM configuration
some platforms(BIOS or OS) may change ASPM configuration in
PCI Express Link Control Register directly and dynamically
regardless the device driver installation.
Checking if ASPM support during the driver init phase by reading
PCI Express Link Contrl Register doesn't make sense.
This refine/update assume L0S/L1 is defalut enabled as hw->ctrl_flags
inited. atl1c_set_aspm will set real configuration based on chip
capability to hardware register.
atl1c_disable_l0s_l1 and register definition of REG_PM_CTRL are
refined as well.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:27 +0000 (22:01 +0000)]
atl1c: clear bit MASTER_CTRL_CLK_SEL_DIS in atl1c_pcie_patch
bit MASTER_CTRL_CLK_SEL_DIS could be set before enter suspend
clear it after resume to enable pclk(PCIE clock) switch to
low frequency(25M) in some circumstances to save power.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:26 +0000 (22:01 +0000)]
atl1c: refine reg definition of REG_MASTER_CTRL
refine/update register REG_MASTER_CTRL definition according with
hardware spec.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:25 +0000 (22:01 +0000)]
atl1c: clear PCIE error status in atl1c_reset_pcie
clear PCIE error status (error log is write-1-clear).
REG_PCIE_UC_SEVERITY is removed as it's a standard pcie register,
and using kernle API to access it.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:24 +0000 (22:01 +0000)]
atl1c: remove dmar_dly_cnt and dmaw_dly_cnt
dmar_dly_cnt and dmaw_dly_cnt aren't used by hardware/driver any more.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:23 +0000 (22:01 +0000)]
atl1c: update right threshold for TSO
atl1c_configure_tx used a wrong value of MAX_TX_OFFLOAD_THRESH(9KB)
for TSO threshold.
the right value should be 7KB
Fast Ethernet controller doesn't support Jumbo frame.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:22 +0000 (22:01 +0000)]
atl1c: add module parameter for l1c_wait_until_idle
l1c_wait_until_idle is called for serval modules (TXQ/RXQ/TXMAC/RXMAC).
specific moudle have specific idle/busy status in reg REG_IDLE_STATUS.
the previous code return wrongly if all modules are in idle status,
regardless the 'stop' action is applied on individual module.
Refine the reg REG_IDLE_STATUS definition as well.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Wed, 18 Apr 2012 22:01:21 +0000 (22:01 +0000)]
atl1c: threshold for ASPM is changed based on chip capability
threshold setting to control ASPM for diff chips are different.
currently, all gigabit-capability chips have limited-ASPM under
100M throughput.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Wed, 18 Apr 2012 19:48:22 +0000 (19:48 +0000)]
stmmac: do not fail when probe and there is no csr clk defined
On some platforms, for example where we are doing the bring-up,
the csr clock is not passed from the framework and the Ethernet
device driver is failing when it can work w/o any issues and
using the default values. So this patch just warnings the case
of the csr clock cannot be acquired but w/o failing the probe
step. I have just tested it on ST STiH415 SoC (ARM).
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Wed, 18 Apr 2012 19:48:21 +0000 (19:48 +0000)]
stmmac: verify the dma_cfg platform fields
Recently the dma parameters that can be passed from the platform
have been moved from the plat_stmmacenet_data to the stmmac_dma_cfg.
In case of this new structure is not well allocated the driver can
fails. This is an example how this field is managed in ST platforms
static struct stmmac_dma_cfg gmac_dma_setting = {
.pbl = 32,
};
static struct plat_stmmacenet_data stih415_ethernet_platform_data[] = {
{
.dma_cfg = &gmac_dma_setting,
.has_gmac = 1,
[snip]
This patch so verifies that the dma_cfg passed from the platform.
In case of it is NULL there is no reason that the driver has to fail
and some default values can be passed. These are ok for all the
Synopsys chips and could impact on performances, only.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
cc: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Virlinzi [Wed, 18 Apr 2012 19:48:20 +0000 (19:48 +0000)]
stmmac: Move the mdio_register/_unregister in probe/remove
This patch moves the mdio_register/_unregister in probe/remove
functions and this also is required when hibernation on disk
is done.
Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st,com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st,com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Virlinzi [Wed, 18 Apr 2012 19:48:19 +0000 (19:48 +0000)]
stmmac: use custom init/exit functions in pm ops
Freeze and restore can call the custom init/exit functions.
Also the patch adds a custom data field that can be used
for storing platform data useful on restore the embedded
setup (e.g. GPIO, SYSCFG).
Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Spinadel [Thu, 19 Apr 2012 20:46:38 +0000 (13:46 -0700)]
iwlwifi: Remove inconsistent and redundant declaration
Remove declaration of iwl_alloc_traffic_mem from iwl-agn.h,
from methods that was exposed to support MVM.
MVM doesn't have to use this declaration.
CC: netdev@vger.kernel.org
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allan Stephens [Wed, 18 Apr 2012 13:42:56 +0000 (09:42 -0400)]
tipc: Ensure network address change doesn't impact configuration service
Enhances command validation done by TIPC's configuration service so
that it works properly even if the node's network address is changed in
mid-operation. The default node address of <0.0.0> is now recognized as an
alias for "this node" even after a new network address has been assigned.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Wed, 18 Apr 2012 13:42:29 +0000 (09:42 -0400)]
tipc: Ensure network address change doesn't impact rejected message
Revises handling of a rejected message to ensure that a locally
originated message is returned properly even if the node's network
address is changed in mid-operation. The routine now treats the
default node address of <0.0.0> as an alias for "this node" when
determining where to send a returned message.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Wed, 18 Apr 2012 13:27:22 +0000 (09:27 -0400)]
tipc: handle <0.0.0> as an alias for this node on outgoing msgs
Revises handling of send routines for payload messages to ensure that
they are processed properly even if the node's network address is
changed in mid-operation. The routines now treat the default node
address of <0.0.0> as an alias for "this node" when determining where
to send an outgoing message.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Wed, 18 Apr 2012 13:22:56 +0000 (09:22 -0400)]
tipc: properly handle off-node send requests with invalid addr
There are two send routines that might conceivably be asked by an
application to send a message off-node when the node is still using
the default network address. These now have an added check that
detects this and rejects the message gracefully.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Wed, 18 Apr 2012 13:12:09 +0000 (09:12 -0400)]
tipc: take lock while updating node network address
The routine that changes the node's network address now takes TIPC's
network lock in write mode while the main address variable and associated
data structures are being changed; this is needed to ensure that the
link subsystem won't attempt to send a message off-node until the sending
port's message header template has been updated with the node's new
network address.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 22:42:28 +0000 (18:42 -0400)]
tipc: Ensure network address change doesn't impact local connections
Revises routines that deal with connections between two ports on
the same node to ensure the connection is not impacted if the node's
network address is changed in mid-operation. The routines now treat
the default node address of <0.0.0> as an alias for "this node" in
the following situations:
1) Incoming messages destined to a connected port now handle the alias
properly when validating that the message was sent by the expected
peer port, ensuring that the message will be accepted regardless of
whether it specifies the node's old network address or it's current one.
2) The code which completes connection establishment now handles the
alias properly when determining if the peer port is on the same node
as the connected port.
An added benefit of addressing issue 1) is that some peer port
validation code has been relocated to TIPC's socket subsystem, which
means that validation is no longer done twice when a message is
sent to a non-socket port (such as TIPC's configuration service or
network topology service).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 22:36:42 +0000 (18:36 -0400)]
tipc: delete duplicate peerport/peernode helper functions
Prior to commit
23dd4cce387124ec3ea06ca30d17854ae4d9b772
"tipc: Combine port structure with tipc_port structure"
there was a need for the two sets of helper functions. But
now they are just duplicates. Remove the globally visible
ones, and mark the remaining ones as inline.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 22:22:49 +0000 (18:22 -0400)]
tipc: Ensure network address change doesn't impact new port
Re-orders port creation logic so that the initialization of a new
port's message header template occurs while the port list lock is
held. This ensures that a change to the node's network address that
occurs at the same time as the port is being created does not result
in the template identifying the sender using the former network
address. The new approach guarantees that the new port's template is
using the current network address or that it will be updated when
the address changes.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 22:17:35 +0000 (18:17 -0400)]
tipc: Optimize re-initialization of port message header templates
Removes an unnecessary check in the logic that updates the message
header template for existing ports when a node's network address is
first assigned. There is no longer any need to check to see if the
node's network address has actually changed since the calling routine
has already verified that this is so.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 22:16:34 +0000 (18:16 -0400)]
tipc: Ensure network address change doesn't impact name table updates
Revises routines that add and remove an entry from a node's name table
so that the publication scope lists are updated properly even if the
node's network address is changed in mid-operation. The routines now
recognize the default node address of <0.0.0> as an alias for "this node"
even after a new network address has been assigned.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 22:02:01 +0000 (18:02 -0400)]
tipc: Add routines for safe checking of node's network address
Introduces routines that test whether a given network address is
equal to a node's own network address or if it lies within the node's
own network cluster, and which work properly regardless of whether
the node is using the default network address <0.0.0> or a non-zero
network address that is assigned later on. In essence, these routines
ensure that address <0.0.0> is treated as an alias for "this node",
regardless of which network address the node is actually using.
Old users of the pre-existing more strict match in_own_cluster()
have been accordingly redirected to what is now called
in_own_cluster_exact() --- which does not extend matching to <0,0,0>.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Wed, 9 Nov 2011 19:22:52 +0000 (14:22 -0500)]
tipc: Don't record failed publication attempt as a success
No longer increments counter of number of publications by a node
if an attempt to add a new publication fails. This prevents TIPC from
incorrectly blocking future publications because the configured maximum
number of publications has been reached.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 21:57:52 +0000 (17:57 -0400)]
tipc: Update node-scope publications when network address is assigned
Ensures that node-scope name publications that exist prior to the
configuration of a node's network address are properly re-initialized
with that address when it is assigned. TIPC's node-scope publications
are now tracked using a publications list like the lists used for
cluster-scope and zone-scope publications so they can be easily updated
when required.
The inclusion of node scope name publications in a conventional publication
list means that they must now also be withdrawn, just like cluster and zone
scope publications are currently withdrawn. So some conditional tests on
scope ==/!= TIPC_NODE_SCOPE are inserted/removed accordingly.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allan Stephens [Tue, 17 Apr 2012 21:57:52 +0000 (17:57 -0400)]
tipc: Separate cluster-scope and zone-scope names into distinct lists
Utilizes distinct lists to track zone-scope and cluster-scope names
published by a node. For now, TIPC continues to process the entries
in both lists in the same way; however, an upcoming patch will utilize
the existence of the lists to prevent the sending of cluster-scope names
to nodes that are not part of the local cluster.
To achieve this, an array of publication lists is introduced, so
that they can be iterated over and accessed via publ->scope as
an index where convenient.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Michal Kubeček [Tue, 17 Apr 2012 02:02:06 +0000 (02:02 +0000)]
bonding: start slaves with link down for ARP monitor
Initialize slave device link state as down if ARP monitor is
active and net_carrier_ok() returns zero. Also shift initial
value of its last_arp_tx so that it doesn't immediately cause
fake detection of "up" state.
When ARP monitoring is used, initializing the slave device with
up link state can cause ARP monitor to detect link failure
before the device is really up (with igb driver, this can take
more than two seconds).
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Apr 2012 23:19:25 +0000 (23:19 +0000)]
nf_bridge: remove holes in struct nf_bridge_info
Put use & mask on same location to avoid two holes on 64bit arches
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 06:10:26 +0000 (06:10 +0000)]
ipv4: dont drop packet in defrag but consume it
When defragmentation is finalized, we clone a packet and kfree_skb() it.
Call consume_skb() to not confuse dropwatch, since its not a drop.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 07:07:40 +0000 (07:07 +0000)]
net: gro: GRO_MERGED_FREE consumes packets
As part of GRO processing, merged skbs should be consumed, not freed, to
not confuse dropwatch/drop_monitor.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 02:24:53 +0000 (02:24 +0000)]
net: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 02:24:48 +0000 (02:24 +0000)]
ipv6: dccp: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 02:24:42 +0000 (02:24 +0000)]
packet: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 02:24:36 +0000 (02:24 +0000)]
ipv6: tcp: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 02:24:28 +0000 (02:24 +0000)]
netlink: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 19 Apr 2012 02:24:17 +0000 (02:24 +0000)]
ip6_tunnel: dont drop packet but consume it
When we need to reallocate skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shan Wei [Wed, 18 Apr 2012 18:05:46 +0000 (18:05 +0000)]
net: fix compile error of leaking kmemleak.h header
net/core/sysctl_net_core.c: In function ‘sysctl_core_init’:
net/core/sysctl_net_core.c:259: error: implicit declaration of function ‘kmemleak_not_leak’
with same error in net/ipv4/route.c
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Tue, 17 Apr 2012 19:32:36 +0000 (19:32 +0000)]
atl1c: restore max-read-request-size in Device Conrol Register
in some platforms, we found the max-read-request-size in Device Control
Register is set to 0 by (BIOS?) during bootup, this will cause the
performance(throughput) very bad.
Restore it to a min-value.
register definition of REG_DEVICE_CTRL is removed, using kernel API to
access it as it's a standard pcie register.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang, Xiong [Tue, 17 Apr 2012 19:32:35 +0000 (19:32 +0000)]
atl1c: using fixed TXQ configuration for l2cb and l1c
using fixed TXQ config for l2cb and l1c regardless dmar_block
to make tx-DMA more stable.
register REG_TXQ_CTRL is refined as well.
Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>