commit ef16065a9e97e4fbcecc42e9effa3d2f15119794 Author: Greg Kroah-Hartman Date: Thu Feb 26 17:49:14 2015 -0800 Linux 3.10.70 commit c30748a365edbbc94084742b49d55c336a356f0b Author: Alex Elder Date: Tue Mar 25 15:36:02 2014 +0200 rbd: drop an unsafe assertion commit 638c323c4d1f8eaf25224946e21ce8818f1bcee1 upstream. Olivier Bonvalet reported having repeated crashes due to a failed assertion he was hitting in rbd_img_obj_callback(): Assertion failure in rbd_img_obj_callback() at line 2165: rbd_assert(which >= img_request->next_completion); With a lot of help from Olivier with reproducing the problem we were able to determine the object and image requests had already been completed (and often freed) at the point the assertion failed. There was a great deal of discussion on the ceph-devel mailing list about this. The problem only arose when there were two (or more) object requests in an image request, and the problem was always seen when the second request was being completed. The problem is due to a race in the window between setting the "done" flag on an object request and checking the image request's next completion value. When the first object request completes, it checks to see if its successor request is marked "done", and if so, that request is also completed. In the process, the image request's next_completion value is updated to reflect that both the first and second requests are completed. By the time the second request is able to check the next_completion value, it has been set to a value *greater* than its own "which" value, which caused an assertion to fail. Fix this problem by skipping over any completion processing unless the completing object request is the next one expected. Test only for inequality (not >=), and eliminate the bad assertion. Tested-by: Olivier Bonvalet Signed-off-by: Alex Elder Reviewed-by: Sage Weil Reviewed-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman commit b2b501af4181db36a3301cef3e4ccab16571ad27 Author: Austin Lund Date: Thu Jul 24 07:40:20 2014 -0300 media/rc: Send sync space information on the lirc device commit a8f29e89f2b54fbf2c52be341f149bc195b63a8b upstream. Userspace expects to see a long space before the first pulse is sent on the lirc device. Currently, if a long time has passed and a new packet is started, the lirc codec just returns and doesn't send anything. This makes lircd ignore many perfectly valid signals unless they are sent in quick sucession. When a reset event is delivered, we cannot know anything about the duration of the space. But it should be safe to assume it has been a long time and we just set the duration to maximum. Signed-off-by: Austin Lund Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 572d332c02bb349d6fe428bf17e3068631064976 Author: Saran Maruti Ramanara Date: Thu Jan 29 11:05:58 2015 +0100 net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param [ Upstream commit cfbf654efc6d78dc9812e030673b86f235bf677d ] When making use of RFC5061, section 4.2.4. for setting the primary IP address, we're passing a wrong parameter header to param_type2af(), resulting always in NULL being returned. At this point, param.p points to a sctp_addip_param struct, containing a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation id. Followed by that, as also presented in RFC5061 section 4.2.4., comes the actual sctp_addr_param, which also contains a sctp_paramhdr, but this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that param_type2af() can make use of. Since we already hold a pointer to addr_param from previous line, just reuse it for param_type2af(). Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Saran Maruti Ramanara Signed-off-by: Daniel Borkmann Acked-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a7df378ab94e59b29128b6d6b95da9fd67b40337 Author: Florian Westphal Date: Wed Jan 28 10:56:04 2015 +0100 ppp: deflate: never return len larger than output buffer [ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ] When we've run out of space in the output buffer to store more data, we will call zlib_deflate with a NULL output buffer until we've consumed remaining input. When this happens, olen contains the size the output buffer would have consumed iff we'd have had enough room. This can later cause skb_over_panic when ppp_generic skb_put()s the returned length. Reported-by: Iain Douglas Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6bed3166d097a20ffcf2d440825c611500b0ff97 Author: Eric Dumazet Date: Thu Jan 29 21:35:05 2015 -0800 ipv4: tcp: get rid of ugly unicast_sock [ Upstream commit bdbbb8527b6f6a358dbcb70dac247034d665b8e4 ] In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 23990c29a7bc207bfcb3026e286e804c7cdee933 Author: Eric Dumazet Date: Wed Jan 28 05:47:11 2015 -0800 tcp: ipv4: initialize unicast_sock sk_pacing_rate [ Upstream commit 811230cd853d62f09ed0addd0ce9a1b9b0e13fb5 ] When I added sk_pacing_rate field, I forgot to initialize its value in the per cpu unicast_sock used in ip_send_unicast_reply() This means that for sch_fq users, RST packets, or ACK packets sent on behalf of TIME_WAIT sockets might be sent to slowly or even dropped once we reach the per flow limit. Signed-off-by: Eric Dumazet Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b4faf21b76b2a0cf00e5a75e73efb22a09868b18 Author: Roopa Prabhu Date: Wed Jan 28 16:23:11 2015 -0800 bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify [ Upstream commit 59ccaaaa49b5b096cdc1f16706a9f931416b2332 ] Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081 This patch avoids calling rtnl_notify if the device ndo_bridge_getlink handler does not return any bytes in the skb. Alternately, the skb->len check can be moved inside rtnl_notify. For the bridge vlan case described in 92081, there is also a fix needed in bridge driver to generate a proper notification. Will fix that in subsequent patch. v2: rebase patch on net tree Signed-off-by: Roopa Prabhu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 650a7901c0fe0c3b0e5d9c1c945f140058585a6f Author: Hannes Frederic Sowa Date: Mon Jan 26 15:11:17 2015 +0100 ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too [ Upstream commit 6e9e16e6143b725662e47026a1d0f270721cdd24 ] Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug : | [root@rhel7-5 lkundrak]# sh -x lal | + ip link add dev0 type dummy | + ip link set dev0 up | + ip link add dev1 type dummy | + ip link set dev1 up | + ip addr add 2001:db8:8086::2/64 dev dev0 | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 | + ip link del dev0 type dummy | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 | | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: 4a287eba2de3957 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: Lubomir Rintel Signed-off-by: Hannes Frederic Sowa Tested-by: Lubomir Rintel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 688ba993d1b7e3c84b1a87f39cd14b01b4716306 Author: subashab@codeaurora.org Date: Fri Jan 23 22:26:02 2015 +0000 ping: Fix race in free in receive path [ Upstream commit fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0 ] An exception is seen in ICMP ping receive path where the skb destructor sock_rfree() tries to access a freed socket. This happens because ping_rcv() releases socket reference with sock_put() and this internally frees up the socket. Later icmp_rcv() will try to free the skb and as part of this, skb destructor is called and which leads to a kernel panic as the socket is freed already in ping_rcv(). -->|exception -007|sk_mem_uncharge -007|sock_rfree -008|skb_release_head_state -009|skb_release_all -009|__kfree_skb -010|kfree_skb -011|icmp_rcv -012|ip_local_deliver_finish Fix this incorrect free by cloning this skb and processing this cloned skb instead. This patch was suggested by Eric Dumazet Signed-off-by: Subash Abhinov Kasiviswanathan Cc: Eric Dumazet Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bd1f50c627afe99e374da23b76f0d0ee83244223 Author: Herbert Xu Date: Sat Jan 24 08:02:40 2015 +1100 udp_diag: Fix socket skipping within chain [ Upstream commit 86f3cddbc3037882414c7308973530167906b7e9 ] While working on rhashtable walking I noticed that the UDP diag dumping code is buggy. In particular, the socket skipping within a chain never happens, even though we record the number of sockets that should be skipped. As this code was supposedly copied from TCP, this patch does what TCP does and resets num before we walk a chain. Signed-off-by: Herbert Xu Acked-by: Pavel Emelyanov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8c6dafeba6f8d1435f05e39142b50bc605f7a91c Author: Hannes Frederic Sowa Date: Fri Jan 23 12:01:26 2015 +0100 ipv4: try to cache dst_entries which would cause a redirect [ Upstream commit df4d92549f23e1c037e83323aff58a21b3de7fe0 ] Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov Signed-off-by: Marcelo Leitner Signed-off-by: Florian Westphal Signed-off-by: Hannes Frederic Sowa Signed-off-by: Julian Anastasov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 727ab4c06af65c1b2313c95dfcd8827318d5a438 Author: Daniel Borkmann Date: Thu Jan 22 18:26:54 2015 +0100 net: sctp: fix slab corruption from use after free on INIT collisions [ Upstream commit 600ddd6825543962fb807884169e57b580dba208 ] When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950c646 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit 1be9a950c646 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [] SyS_listen+0x9d/0xb0 [ 535.166848] [] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76 ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: Daniel Borkmann Acked-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e98d2751bec14b3279edaaa0e5f6584254e13126 Author: Eric Dumazet Date: Thu Jan 22 07:56:18 2015 -0800 netxen: fix netxen_nic_poll() logic [ Upstream commit 6088beef3f7517717bd21d90b379714dd0837079 ] NAPI poll logic now enforces that a poller returns exactly the budget when it wants to be called again. If a driver limits TX completion, it has to return budget as well when the limit is hit, not the number of received packets. Reported-and-tested-by: Mike Galbraith Signed-off-by: Eric Dumazet Fixes: d75b1ade567f ("net: less interrupt masking in NAPI") Cc: Manish Chopra Acked-by: Manish Chopra Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fa3f55df7d5ae2d978024dda5b236c645a7c7819 Author: Hagen Paul Pfeifer Date: Thu Jan 15 22:34:25 2015 +0100 ipv6: stop sending PTB packets for MTU < 1280 [ Upstream commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2 ] Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont Signed-off-by: Hagen Paul Pfeifer Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 06b5ff9f351205b2900c9629addf74a4c875b12c Author: Eric Dumazet Date: Thu Jan 15 17:04:22 2015 -0800 net: rps: fix cpu unplug [ Upstream commit ac64da0b83d82abe62f78b3d0e21cca31aea24fa ] softnet_data.input_pkt_queue is protected by a spinlock that we must hold when transferring packets from victim queue to an active one. This is because other cpus could still be trying to enqueue packets into victim queue. A second problem is that when we transfert the NAPI poll_list from victim to current cpu, we absolutely need to special case the percpu backlog, because we do not want to add complex locking to protect process_queue : Only owner cpu is allowed to manipulate it, unless cpu is offline. Based on initial patch from Prasad Sodagudi & Subash Abhinov Kasiviswanathan. This version is better because we do not slow down packet processing, only make migration safer. Reported-by: Prasad Sodagudi Reported-by: Subash Abhinov Kasiviswanathan Signed-off-by: Eric Dumazet Cc: Tom Herbert Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1d480edb0cee748ef61b4b8f7b21ab2d1c3ff0a2 Author: Willem de Bruijn Date: Thu Jan 15 13:18:40 2015 -0500 ip: zero sockaddr returned on error queue [ Upstream commit f812116b174e59a350acc8e4856213a166a91222 ] The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That structure is defined and allocated on the stack as struct { struct sock_extended_err ee; struct sockaddr_in(6) offender; } errhdr; The second part is only initialized for certain SO_EE_ORIGIN values. Always initialize it completely. An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that would return uninitialized bytes. Signed-off-by: Willem de Bruijn ---- Also verified that there is no padding between errhdr.ee and errhdr.offender that could leak additional kernel data. Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman