commit 5054319d9fe56ed1ef6c83c00ae37fdc2b277a79 Author: Greg Kroah-Hartman Date: Fri Jan 16 07:00:00 2015 -0800 Linux 3.10.65 commit 7d702b4b2b0f10cfa45f596ebc262e092c1808f0 Author: Linus Torvalds Date: Sun Jan 11 11:33:57 2015 -0800 mm: Don't count the stack guard page towards RLIMIT_STACK commit 690eac53daff34169a4d74fc7bfbd388c4896abb upstream. Commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page") made sure that we return the error properly for stack growth conditions. It also theorized that counting the guard page towards the stack limit might break something, but also said "Let's see if anybody notices". Somebody did notice. Apparently android-x86 sets the stack limit very close to the limit indeed, and including the guard page in the rlimit check causes the android 'zygote' process problems. So this adds the (fairly trivial) code to make the stack rlimit check be against the actual real stack size, rather than the size of the vma that includes the guard page. Reported-and-tested-by: Chih-Wei Huang Cc: Jay Foad Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 88b5d12c6413c7b80c3df4527b09c9d5f65c1c0a Author: Linus Torvalds Date: Tue Jan 6 13:00:05 2015 -0800 mm: propagate error from stack expansion even for guard page commit fee7e49d45149fba60156f5b59014f764d3e3728 upstream. Jay Foad reports that the address sanitizer test (asan) sometimes gets confused by a stack pointer that ends up being outside the stack vma that is reported by /proc/maps. This happens due to an interaction between RLIMIT_STACK and the guard page: when we do the guard page check, we ignore the potential error from the stack expansion, which effectively results in a missing guard page, since the expected stack expansion won't have been done. And since /proc/maps explicitly ignores the guard page (commit d7824370e263: "mm: fix up some user-visible effects of the stack guard page"), the stack pointer ends up being outside the reported stack area. This is the minimal patch: it just propagates the error. It also effectively makes the guard page part of the stack limit, which in turn measn that the actual real stack is one page less than the stack limit. Let's see if anybody notices. We could teach acct_stack_growth() to allow an extra page for a grow-up/grow-down stack in the rlimit test, but I don't want to add more complexity if it isn't needed. Reported-and-tested-by: Jay Foad Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 6bb148fb1e6a1211c55f8cc3fcd789c34483c7bb Author: Vlastimil Babka Date: Thu Jan 8 14:32:40 2015 -0800 mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed commit 9e5e3661727eaf960d3480213f8e87c8d67b6956 upstream. Charles Shirron and Paul Cassella from Cray Inc have reported kswapd stuck in a busy loop with nothing left to balance, but kswapd_try_to_sleep() failing to sleep. Their analysis found the cause to be a combination of several factors: 1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait 2. The process has been killed (by OOM in this case), but has not yet been scheduled to remove itself from the waitqueue and die. 3. kswapd checks for throttled processes in prepare_kswapd_sleep(): if (waitqueue_active(&pgdat->pfmemalloc_wait)) { wake_up(&pgdat->pfmemalloc_wait); return false; // kswapd will not go to sleep } However, for a process that was already killed, wake_up() does not remove the process from the waitqueue, since try_to_wake_up() checks its state first and returns false when the process is no longer waiting. 4. kswapd is running on the same CPU as the only CPU that the process is allowed to run on (through cpus_allowed, or possibly single-cpu system). 5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd encounters no voluntary preemption points and repeatedly fails prepare_kswapd_sleep(), blocking the process from running and removing itself from the waitqueue, which would let kswapd sleep. So, the source of the problem is that we prevent kswapd from going to sleep until there are processes waiting on the pfmemalloc_wait queue, and a process waiting on a queue is guaranteed to be removed from the queue only when it gets scheduled. This was done to make sure that no process is left sleeping on pfmemalloc_wait when kswapd itself goes to sleep. However, it isn't necessary to postpone kswapd sleep until the pfmemalloc_wait queue actually empties. To prevent processes from being left sleeping, it's actually enough to guarantee that all processes waiting on pfmemalloc_wait queue have been woken up by the time we put kswapd to sleep. This patch therefore fixes this issue by substituting 'wake_up' with 'wake_up_all' and removing 'return false' in the code snippet from prepare_kswapd_sleep() above. Note that if any process puts itself in the queue after this waitqueue_active() check, or after the wake up itself, it means that the process will also wake up kswapd - and since we are under prepare_to_wait(), the wake up won't be missed. Also we update the comment prepare_kswapd_sleep() to hopefully more clearly describe the races it is preventing. Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") Signed-off-by: Vlastimil Babka Signed-off-by: Vladimir Davydov Cc: Mel Gorman Cc: Johannes Weiner Acked-by: Michal Hocko Acked-by: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 2788d619a6d20e994dbbdcd91ba6847744bee675 Author: Jiri Olsa Date: Wed Nov 26 16:39:31 2014 +0100 perf session: Do not fail on processing out of order event commit f61ff6c06dc8f32c7036013ad802c899ec590607 upstream. Linus reported perf report command being interrupted due to processing of 'out of order' event, with following error: Timestamp below last timeslice flush 0x5733a8 [0x28]: failed to process type: 3 I could reproduce the issue and in my case it was caused by one CPU (mmap) being behind during record and userspace mmap reader seeing the data after other CPUs data were already stored. This is expected under some circumstances because we need to limit the number of events that we queue for reordering when we receive a PERF_RECORD_FINISHED_ROUND or when we force flush due to memory pressure. Reported-by: Linus Torvalds Signed-off-by: Jiri Olsa Acked-by: Ingo Molnar Cc: Andi Kleen Cc: Corey Ashford Cc: David Ahern Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Linus Torvalds Cc: Matt Fleming Cc: Namhyung Kim Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Link: http://lkml.kernel.org/r/1417016371-30249-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo [zhangzhiqiang: backport to 3.10: - adjust context - commit f61ff6c06d struct events_stats was defined in tools/perf/util/event.h while 3.10 stable defined in tools/perf/util/hist.h. - 3.10 stable there is no pr_oe_time() which used for debug. - After the above adjustments, becomes same to the original patch: https://github.com/torvalds/linux/commit/f61ff6c06dc8f32c7036013ad802c899ec590607 ] Signed-off-by: Zhiqiang Zhang Signed-off-by: Greg Kroah-Hartman commit d525563b50aa3b63c7b7cb7908810facb6dc7ac5 Author: Jiri Olsa Date: Wed Dec 10 21:23:51 2014 +0100 perf: Fix events installation during moving group commit 9fc81d87420d0d3fd62d5e5529972c0ad9eab9cc upstream. We allow PMU driver to change the cpu on which the event should be installed to. This happened in patch: e2d37cd213dc ("perf: Allow the PMU driver to choose the CPU on which to install events") This patch also forces all the group members to follow the currently opened events cpu if the group happened to be moved. This and the change of event->cpu in perf_install_in_context() function introduced in: 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()") forces group members to change their event->cpu, if the currently-opened-event's PMU changed the cpu and there is a group move. Above behaviour causes problem for breakpoint events, which uses event->cpu to touch cpu specific data for breakpoints accounting. By changing event->cpu, some breakpoints slots were wrongly accounted for given cpu. Vinces's perf fuzzer hit this issue and caused following WARN on my setup: WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150() Can't find any breakpoint slot [...] This patch changes the group moving code to keep the event's original cpu. Reported-by: Vince Weaver Signed-off-by: Jiri Olsa Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Vince Weaver Cc: Yan, Zheng Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit ac96652da2a760bbd8cb4dc74b6bf1c5d2c8babb Author: Jiri Olsa Date: Wed Dec 10 21:23:50 2014 +0100 perf/x86/intel/uncore: Make sure only uncore events are collected commit af91568e762d04931dcbdd6bef4655433d8b9418 upstream. The uncore_collect_events functions assumes that event group might contain only uncore events which is wrong, because it might contain any type of events. This bug leads to uncore framework touching 'not' uncore events, which could end up all sorts of bugs. One was triggered by Vince's perf fuzzer, when the uncore code touched breakpoint event private event space as if it was uncore event and caused BUG: BUG: unable to handle kernel paging request at ffffffff82822068 IP: [] uncore_assign_events+0x188/0x250 ... The code in uncore_assign_events() function was looking for event->hw.idx data while the event was initialized as a breakpoint with different members in event->hw union. This patch forces uncore_collect_events() to collect only uncore events. Reported-by: Vince Weaver Signed-off-by: Jiri Olsa Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Yan, Zheng Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 16e9d54b3f8158479e796188dcf45a6bfed18f03 Author: Chris Mason Date: Wed Dec 31 12:18:29 2014 -0500 Btrfs: don't delay inode ref updates during log replay commit 6f8960541b1eb6054a642da48daae2320fddba93 upstream. Commit 1d52c78afbb (Btrfs: try not to ENOSPC on log replay) added a check to skip delayed inode updates during log replay because it confuses the enospc code. But the delayed processing will end up ignoring delayed refs from log replay because the inode itself wasn't put through the delayed code. This can end up triggering a warning at commit time: WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34() Which is repeated for each commit because we never process the delayed inode ref update. The fix used here is to change btrfs_delayed_delete_inode_ref to return an error if we're currently in log replay. The caller will do the ref deletion immediately and everything will work properly. Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman commit f35fff85dc8a8ee4fea7802d44822d1f18d035ed Author: Thomas Petazzoni Date: Thu Nov 13 10:38:57 2014 +0100 ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP commit e55355453600a33bb5ca4f71f2d7214875f3b061 upstream. Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada 38x and Armada XP requires a certain number of conditions: - On Armada 370, the cache policy must be set to write-allocate. - On Armada 375, 38x and XP, the cache policy must be set to write-allocate, the pages must be mapped with the shareable attribute, and the SMP bit must be set Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none of these conditions are met. With Armada 370, the situation is worse: since the processor is single core, regardless of whether CONFIG_SMP or !CONFIG_SMP is used, the cache policy will be set to write-back by the kernel and not write-allocate. Since solving this problem turns out to be quite complicated, and we don't want to let users with a mainline kernel known to have infrequent but existing data corruptions, this commit proposes to simply disable hardware I/O coherency in situations where it is known not to work. And basically, the is_smp() function of the kernel tells us whether it is OK to enable hardware I/O coherency or not, so this commit slightly refactors the coherency_type() function to return COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate type of the coherency fabric in the other case. Thanks to this, the I/O coherency fabric will no longer be used at all in !CONFIG_SMP configurations. It will continue to be used in CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x (which are multiple cores processors), but will no longer be used on Armada 370 (which is a single core processor). In the process, it simplifies the implementation of the coherency_type() function, and adds a missing call to of_node_put(). Signed-off-by: Thomas Petazzoni Fixes: e60304f8cb7bb545e79fe62d9b9762460c254ec2 ("arm: mvebu: Add hardware I/O Coherency support") Cc: # v3.8+ Acked-by: Gregory CLEMENT Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.com Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman commit 697f52b45551255d3e151110cba150107141de56 Author: Johannes Berg Date: Wed Dec 10 15:41:28 2014 -0800 scripts/kernel-doc: don't eat struct members with __aligned commit 7b990789a4c3420fa57596b368733158e432d444 upstream. The change from \d+ to .+ inside __aligned() means that the following structure: struct test { u8 a __aligned(2); u8 b __aligned(2); }; essentially gets modified to struct test { u8 a; }; for purposes of kernel-doc, thus dropping a struct member, which in turns causes warnings and invalid kernel-doc generation. Fix this by replacing the catch-all (".") with anything that's not a semicolon ("[^;]"). Fixes: 9dc30918b23f ("scripts/kernel-doc: handle struct member __aligned without numbers") Signed-off-by: Johannes Berg Cc: Nishanth Menon Cc: Randy Dunlap Cc: Michal Marek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 344e833f18ddb40252264d38dbf7d82b760ec34a Author: Ryusuke Konishi Date: Wed Dec 10 15:54:34 2014 -0800 nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races commit 705304a863cc41585508c0f476f6d3ec28cf7e00 upstream. Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead of insert_inode_locked() and a bug of a check for dead inodes needs to be fixed. If nilfs_iget() is called from nfsd after nilfs_new_inode() calls insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode. If nilfs_iget() is called before nilfs_new_inode() calls insert_inode_locked4(), it will create an in-core inode and read its data from the on-disk inode. But, nilfs_iget() will find i_nlink equals zero and fail at nilfs_read_inode_common(), which will lead it to call iget_failed() and cleanly fail. However, this sanity check doesn't work as expected for reused on-disk inodes because they leave a non-zero value in i_mode field and it hinders the test of i_nlink. This patch also fixes the issue by removing the test on i_mode that nilfs2 doesn't need. Signed-off-by: Ryusuke Konishi Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0171a58775ee1eefeaf0f5a8fd30da50be400d16 Author: Benjamin Coddington Date: Sun Dec 7 16:05:47 2014 -0500 nfsd4: fix xdr4 inclusion of escaped char commit 5a64e56976f1ba98743e1678c0029a98e9034c81 upstream. Fix a bug where nfsd4_encode_components_esc() includes the esc_end char as an additional string encoding. Signed-off-by: Benjamin Coddington Fixes: e7a0444aef4a "nfsd: add IPv6 addr escaping to fs_location hosts" Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 11e404024797de1db3cbc2479a2f7e89b2a14ad8 Author: Rasmus Villemoes Date: Fri Dec 5 16:40:07 2014 +0100 fs: nfsd: Fix signedness bug in compare_blob commit ef17af2a817db97d42dd2ec0a425231748e23dbc upstream. Bugs similar to the one in acbbe6fbb240 (kcmp: fix standard comparison bug) are in rich supply. In this variant, the problem is that struct xdr_netobj::len has type unsigned int, so the expression o1->len - o2->len _also_ has type unsigned int; it has completely well-defined semantics, and the result is some non-negative integer, which is always representable in a long long. But this means that if the conditional triggers, we are guaranteed to return a positive value from compare_blob. In this case it could be fixed by - res = o1->len - o2->len; + res = (long long)o1->len - (long long)o2->len; but I'd rather eliminate the usually broken 'return a - b;' idiom. Reviewed-by: Jeff Layton Signed-off-by: Rasmus Villemoes Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 6a388a8350c86335b940dea0a3cc63bdbcc238ca Author: Robert Baldyga Date: Mon Nov 24 07:56:21 2014 +0100 serial: samsung: wait for transfer completion before clock disable commit 1ff383a4c3eda8893ec61b02831826e1b1f46b41 upstream. This patch adds waiting until transmit buffer and shifter will be empty before clock disabling. Without this fix it's possible to have clock disabled while data was not transmited yet, which causes unproper state of TX line and problems in following data transfers. Signed-off-by: Robert Baldyga Signed-off-by: Greg Kroah-Hartman commit 21fe2674284e19f70a2ce145f812d9329d4f1cc2 Author: Tejun Heo Date: Fri Oct 24 15:38:21 2014 -0400 writeback: fix a subtle race condition in I_DIRTY clearing commit 9c6ac78eb3521c5937b2dd8a7d1b300f41092f45 upstream. After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and tests inode->i_state locklessly to see whether it already has all the necessary I_DIRTY bits set. The comment above the barrier doesn't contain any useful information - memory barriers can't ensure "changes are seen by all cpus" by itself. And it sure enough was broken. Please consider the following scenario. CPU 0 CPU 1 ------------------------------------------------------------------------------- enters __writeback_single_inode() grabs inode->i_lock tests PAGECACHE_TAG_DIRTY which is clear enters __set_page_dirty() grabs mapping->tree_lock sets PAGECACHE_TAG_DIRTY releases mapping->tree_lock leaves __set_page_dirty() enters __mark_inode_dirty() smp_mb() sees I_DIRTY_PAGES set leaves __mark_inode_dirty() clears I_DIRTY_PAGES releases inode->i_lock Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem to lead to an immediately critical problem because requeue_inode() later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when deciding whether the inode needs to be requeued for IO and there are enough unintentional memory barriers inbetween, so while the inode ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the IO list. The lack of explicit barrier may also theoretically affect the other I_DIRTY bits which deal with metadata dirtiness. There is no guarantee that a strong enough barrier exists between I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied inode. Filesystem inode writeout path likely has enough stuff which can behave as full barrier but it's theoretically possible that the writeout may not see all the updates from ->dirty_inode(). Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note that I_DIRTY_PAGES needs a special treatment as it always needs to be cleared to be interlocked with the lockless test on __mark_inode_dirty() side. It's cleared unconditionally and reinstated after smp_mb() if the mapping still has dirty pages. Also add comments explaining how and why the barriers are paired. Lightly tested. Signed-off-by: Tejun Heo Cc: Jan Kara Cc: Mikulas Patocka Cc: Jens Axboe Cc: Al Viro Reviewed-by: Jan Kara Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit b16c4055b24b1afce2e57270003d443831350fa8 Author: Oliver Neukum Date: Thu Nov 20 14:54:35 2014 +0100 cdc-acm: memory leak in error case commit d908f8478a8d18e66c80a12adb27764920c1f1ca upstream. If probe() fails not only the attributes need to be removed but also the memory freed. Reported-by: Ahmed Tamrawi Signed-off-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman commit dd4fb6fc5dd2e85e65222f0e5e1e321dec9fc2b7 Author: Jens Axboe Date: Wed Nov 19 13:06:22 2014 -0700 genhd: check for int overflow in disk_expand_part_tbl() commit 5fabcb4c33fe11c7e3afdf805fde26c1a54d0953 upstream. We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition() with a user passed in partno value. If we pass in 0x7fffffff, the new target in disk_expand_part_tbl() overflows the 'int' and we access beyond the end of ptbl->part[] and even write to it when we do the rcu_assign_pointer() to assign the new partition. Reported-by: David Ramos Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit edefe2069bc0d8ab1fd03dd9b141cbfec98aca8f Author: Greg Kroah-Hartman Date: Fri Nov 7 08:48:15 2014 -0800 USB: cdc-acm: check for valid interfaces commit 403dff4e2c94f275e24fd85f40b2732ffec268a1 upstream. We need to check that we have both a valid data and control inteface for both types of headers (union and not union.) References: https://bugzilla.kernel.org/show_bug.cgi?id=83551 Reported-by: Simon Schubert <2+kernel@0x2c.org> Signed-off-by: Greg Kroah-Hartman commit c5bcceb0e607ef522d8b244a40d08080f04cde65 Author: Takashi Iwai Date: Mon Jan 5 13:27:33 2015 +0100 ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs commit c507de88f6a336bd7296c9ec0073b2d4af8b4f5e upstream. stac_store_hints() does utterly wrong for masking the values for gpio_dir and gpio_data, likely due to copy&paste errors. Fortunately, this feature is used very rarely, so the impact must be really small. Reported-by: Rasmus Villemoes Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit a26f3d7af44539e6fc1d7c62e03fab95efccdbf9 Author: Dan Carpenter Date: Thu Nov 27 01:34:43 2014 +0300 ALSA: hda - using uninitialized data commit 69eba10e606a80665f8573221fec589430d9d1cb upstream. In olden times the snd_hda_param_read() function always set "*start_id" but in 2007 we introduced a new return and it causes uninitialized data bugs in a couple of the callers: print_codec_info() and hdmi_parse_codec(). Fixes: e8a7f136f5ed ('[ALSA] hda-intel - Improve HD-audio codec probing robustness') Signed-off-by: Dan Carpenter Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 5c27da8b2773851c9920b458366c5294c9de2aa6 Author: Jiri Jaburek Date: Thu Dec 18 02:03:19 2014 +0100 ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC commit d70a1b9893f820fdbcdffac408c909c50f2e6b43 upstream. The Arcam rPAC seems to have the same problem - whenever anything (alsamixer, udevd, 3.9+ kernel from 60af3d037eb8c, ..) attempts to access mixer / control interface of the card, the firmware "locks up" the entire device, resulting in SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error from alsa-lib. Other operating systems can somehow read the mixer (there seems to be playback volume/mute), but any manipulation is ignored by the device (which has hardware volume controls). Signed-off-by: Jiri Jaburek Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit bdf2a0db176e1de7c93fe7b7c5a74756b976fb33 Author: Alex Williamson Date: Fri Oct 31 11:13:07 2014 -0600 driver core: Fix unbalanced device reference in drivers_probe commit bb34cb6bbd287b57e955bc5cfd42fcde6aaca279 upstream. bus_find_device_by_name() acquires a device reference which is never released. This results in an object leak, which on older kernels results in failure to release all resources of PCI devices. libvirt uses drivers_probe to re-attach devices to the host after assignment and is therefore a common trigger for this leak. Example: # cd /sys/bus/pci/ # dmesg -C # echo 1 > devices/0000\:01\:00.0/sriov_numvfs # echo 0 > devices/0000\:01\:00.0/sriov_numvfs # dmesg | grep 01:10 pci 0000:01:10.0: [8086:10ca] type 00 class 0x020000 kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_add_internal: parent: '0000:00:01.0', set: 'devices' kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0' kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0' kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_uevent_env kobject: '0000:01:10.0' (ffff8801d79cd0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0' kobject: '0000:01:10.0' (ffff8801d79cd0a8): kobject_cleanup, parent (null) kobject: '0000:01:10.0' (ffff8801d79cd0a8): calling ktype release kobject: '0000:01:10.0': free name [kobject freed as expected] # dmesg -C # echo 1 > devices/0000\:01\:00.0/sriov_numvfs # echo 0000:01:10.0 > drivers_probe # echo 0 > devices/0000\:01\:00.0/sriov_numvfs # dmesg | grep 01:10 pci 0000:01:10.0: [8086:10ca] type 00 class 0x020000 kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_add_internal: parent: '0000:00:01.0', set: 'devices' kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0' kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0' kobject: '0000:01:10.0' (ffff8801d79ce0a8): kobject_uevent_env kobject: '0000:01:10.0' (ffff8801d79ce0a8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:10.0' [no free] Signed-off-by: Alex Williamson Signed-off-by: Greg Kroah-Hartman commit 04d98e96a02e24987a53f4a879abeae66440e49a Author: Andy Lutomirski Date: Sun Dec 21 08:57:46 2014 -0800 x86, vdso: Use asm volatile in __getcpu commit 1ddf0b1b11aa8a90cef6706e935fc31c75c406ba upstream. In Linux 3.18 and below, GCC hoists the lsl instructions in the pvclock code all the way to the beginning of __vdso_clock_gettime, slowing the non-paravirt case significantly. For unknown reasons, presumably related to the removal of a branch, the performance issue is gone as of e76b027e6408 x86,vdso: Use LSL unconditionally for vgetcpu but I don't trust GCC enough to expect the problem to stay fixed. There should be no correctness issue, because the __getcpu calls in __vdso_vlock_gettime were never necessary in the first place. Note to stable maintainers: In 3.18 and below, depending on configuration, gcc 4.9.2 generates code like this: 9c3: 44 0f 03 e8 lsl %ax,%r13d 9c7: 45 89 eb mov %r13d,%r11d 9ca: 0f 03 d8 lsl %ax,%ebx This patch won't apply as is to any released kernel, but I'll send a trivial backported version if needed. [ Backported by Andy Lutomirski. Should apply to all affected versions. This fixes a functionality bug as well as a performance bug: buggy kernels can infinite loop in __vdso_clock_gettime on affected compilers. See, for exammple: https://bugzilla.redhat.com/show_bug.cgi?id=1178975 ] Fixes: 51c19b4f5927 x86: vdso: pvclock gettime support Cc: Marcelo Tosatti Acked-by: Paolo Bonzini Signed-off-by: Andy Lutomirski Signed-off-by: Greg Kroah-Hartman commit 466ad6591b4c24c5f8a28563dc756d5e2c6d025d Author: Andy Lutomirski Date: Fri Dec 19 16:04:11 2014 -0800 x86_64, vdso: Fix the vdso address randomization algorithm commit 394f56fe480140877304d342dec46d50dc823d46 upstream. The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements. The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD. This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.) It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias. In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged. As an easy test, doing this: for i in `seq 10000` do grep -P vdso /proc/self/maps |cut -d- -f1 done |sort |uniq -d used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this: 7fffdfffe000 7fffe01fe000 7fffe05fe000 7fffe07fe000 7fffe09fe000 7fffe0bfe000 7fffe0dfe000 Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses. Reviewed-by: Kees Cook Signed-off-by: Andy Lutomirski Signed-off-by: Greg Kroah-Hartman commit a0dd9ca4500ed7cb9b21b3e0014ca7beadb4e633 Author: Giedrius Statkevičius Date: Sat Dec 27 00:28:30 2014 +0200 HID: Add a new id 0x501a for Genius MousePen i608X commit 2bacedada682d5485424f5227f27a3d5d6eb551c upstream. New Genius MousePen i608X devices have a new id 0x501a instead of the old 0x5011 so add a new #define with "_2" appended and change required places. The remaining two checkpatch warnings about line length being over 80 characters are present in the original files too and this patch was made in the same style (no line break). Just adding a new id and changing the required places should make the new device work without any issues according to the bug report in the following url. This patch was made according to and fixes: https://bugzilla.kernel.org/show_bug.cgi?id=67111 Signed-off-by: Giedrius Statkevičius Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit bee704e2d767aa8936aa2dbb94674a57caa8c866 Author: Karl Relton Date: Tue Dec 16 15:37:22 2014 +0000 HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard commit da940db41dcf8c04166f711646df2f35376010aa upstream. Apple bluetooth wireless keyboard (sold in UK) has always reported zero for battery strength no matter what condition the batteries are actually in. With this patch applied (applying same quirk as other Apple keyboards), the battery strength is now correctly reported. Signed-off-by: Karl Relton Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit 94bb429ef60ffe198fbd250ce852d57d06fd02e4 Author: Dan Carpenter Date: Fri Jan 9 15:32:31 2015 +0300 HID: roccat: potential out of bounds in pyra_sysfs_write_settings() commit 606185b20caf4c57d7e41e5a5ea4aff460aef2ab upstream. This is a static checker fix. We write some binary settings to the sysfs file. One of the settings is the "->startup_profile". There isn't any checking to make sure it fits into the pyra->profile_settings[] array in the profile_activated() function. I added a check to pyra_sysfs_write_settings() in both places because I wasn't positive that the other callers were correct. Signed-off-by: Dan Carpenter Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit 32b57c08f4cca3bf821606a2f598fc397a143127 Author: Gwendal Grignou Date: Thu Dec 11 16:02:45 2014 -0800 HID: i2c-hid: prevent buffer overflow in early IRQ commit d1c7e29e8d276c669e8790bb8be9f505ddc48888 upstream. Before ->start() is called, bufsize size is set to HID_MIN_BUFFER_SIZE, 64 bytes. While processing the IRQ, we were asking to receive up to wMaxInputLength bytes, which can be bigger than 64 bytes. Later, when ->start is run, a proper bufsize will be calculated. Given wMaxInputLength is said to be unreliable in other part of the code, set to receive only what we can even if it results in truncated reports. Signed-off-by: Gwendal Grignou Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit caa853b3d80473ec7a646a7e163ddf8bc1f4ef46 Author: Jean-Baptiste Maneyrol Date: Thu Nov 20 00:46:37 2014 +0800 HID: i2c-hid: fix race condition reading reports commit 6296f4a8eb86f9abcc370fb7a1a116b8441c17fd upstream. Current driver uses a common buffer for reading reports either synchronously in i2c_hid_get_raw_report() and asynchronously in the interrupt handler. There is race condition if an interrupt arrives immediately after the report is received in i2c_hid_get_raw_report(); the common buffer is modified by the interrupt handler with the new report and then i2c_hid_get_raw_report() proceed using wrong data. Fix it by using a separate buffers for synchronous reports. Signed-off-by: Jean-Baptiste Maneyrol [Antonio Borneo: cleanup, rebase to v3.17, submit mainline] Signed-off-by: Antonio Borneo Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit b658f2ad07e5e15a8e29bb95740452bd5b6a8eea Author: Jiang Liu Date: Wed Nov 26 09:42:10 2014 +0800 iommu/vt-d: Fix an off-by-one bug in __domain_mapping() commit cc4f14aa170d895c9a43bdb56f62070c8a6da908 upstream. There's an off-by-one bug in function __domain_mapping(), which may trigger the BUG_ON(nr_pages < lvl_pages) when (nr_pages + 1) & superpage_mask == 0 The issue was introduced by commit 9051aa0268dc "intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()", which sets sg_res to "nr_pages + 1" to avoid some of the 'sg_res==0' code paths. It's safe to remove extra "+1" because sg_res is only used to calculate page size now. Reported-And-Tested-by: Sudeep Dutt Signed-off-by: Jiang Liu Acked-By: David Woodhouse Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit ccebbb7e53c8d028961340d3081d69658f8a5bef Author: Richard Weinberger Date: Thu Nov 6 16:47:49 2014 +0100 UBI: Fix double free after do_sync_erase() commit aa5ad3b6eb8feb2399a5d26c8fb0060561bb9534 upstream. If the erase worker is unable to erase a PEB it will free the ubi_wl_entry itself. The failing ubi_wl_entry must not free()'d again after do_sync_erase() returns. Signed-off-by: Richard Weinberger Signed-off-by: Artem Bityutskiy Signed-off-by: Greg Kroah-Hartman commit 398dd8fd97a3cf7777c5ee0fb1b52b8ec2edf9b8 Author: Richard Weinberger Date: Mon Oct 27 00:46:11 2014 +0100 UBI: Fix invalid vfree() commit f38aed975c0c3645bbdfc5ebe35726e64caaf588 upstream. The logic of vfree()'ing vol->upd_buf is tied to vol->updating. In ubi_start_update() vol->updating is set long before vmalloc()'ing vol->upd_buf. If we encounter a write failure in ubi_start_update() before vmalloc() the UBI device release function will try to vfree() vol->upd_buf because vol->updating is set. Fix this by allocating vol->upd_buf directly after setting vol->updating. Fixes: [ 31.559338] UBI warning: vol_cdev_release: update of volume 2 not finished, volume is damaged [ 31.559340] ------------[ cut here ]------------ [ 31.559343] WARNING: CPU: 1 PID: 2747 at mm/vmalloc.c:1446 __vunmap+0xe3/0x110() [ 31.559344] Trying to vfree() nonexistent vm area (ffffc90001f2b000) [ 31.559345] Modules linked in: [ 31.565620] 0000000000000bba ffff88002a0cbdb0 ffffffff818f0497 ffff88003b9ba148 [ 31.566347] ffff88002a0cbde0 ffffffff8156f515 ffff88003b9ba148 0000000000000bba [ 31.567073] 0000000000000000 0000000000000000 ffff88002a0cbe88 ffffffff8156c10a [ 31.567793] Call Trace: [ 31.568034] [] dump_stack+0x4e/0x7a [ 31.568510] [] ubi_io_write_vid_hdr+0x155/0x160 [ 31.569084] [] ubi_eba_write_leb+0x23a/0x870 [ 31.569628] [] vol_cdev_write+0x226/0x380 [ 31.570155] [] vfs_write+0xb5/0x1f0 [ 31.570627] [] SyS_pwrite64+0x6a/0xa0 [ 31.571123] [] system_call_fastpath+0x16/0x1b Signed-off-by: Richard Weinberger Signed-off-by: Artem Bityutskiy Signed-off-by: Greg Kroah-Hartman commit c0d9d658fa9410fa8f4a8b7eb216f236102f02f2 Author: Tony Lindgren Date: Tue Sep 16 13:50:01 2014 -0700 pstore-ram: Allow optional mapping with pgprot_noncached commit 027bc8b08242c59e19356b4b2c189f2d849ab660 upstream. On some ARMs the memory can be mapped pgprot_noncached() and still be working for atomic operations. As pointed out by Colin Cross , in some cases you do want to use pgprot_noncached() if the SoC supports it to see a debug printk just before a write hanging the system. On ARMs, the atomic operations on strongly ordered memory are implementation defined. So let's provide an optional kernel parameter for configuring pgprot_noncached(), and use pgprot_writecombine() by default. Cc: Arnd Bergmann Cc: Rob Herring Cc: Randy Dunlap Cc: Anton Vorontsov Cc: Colin Cross Cc: Olof Johansson Cc: Russell King Acked-by: Kees Cook Signed-off-by: Tony Lindgren Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman commit af74a863b9999e4d2f32140a5edc50e3de5e5a66 Author: Rob Herring Date: Fri Sep 12 11:32:24 2014 -0700 pstore-ram: Fix hangs by using write-combine mappings commit 7ae9cb81933515dc7db1aa3c47ef7653717e3090 upstream. Currently trying to use pstore on at least ARMs can hang as we're mapping the peristent RAM with pgprot_noncached(). On ARMs, pgprot_noncached() will actually make the memory strongly ordered, and as the atomic operations pstore uses are implementation defined for strongly ordered memory, they may not work. So basically atomic operations have undefined behavior on ARM for device or strongly ordered memory types. Let's fix the issue by using write-combine variants for mappings. This corresponds to normal, non-cacheable memory on ARM. For many other architectures, this change does not change the mapping type as by default we have: #define pgprot_writecombine pgprot_noncached The reason why pgprot_noncached() was originaly used for pstore is because Colin Cross had observed lost debug prints right before a device hanging write operation on some systems. For the platforms supporting pgprot_noncached(), we can add a an optional configuration option to support that. But let's get pstore working first before adding new features. Cc: Arnd Bergmann Cc: Anton Vorontsov Cc: Colin Cross Cc: Olof Johansson Cc: linux-kernel@vger.kernel.org Acked-by: Kees Cook Signed-off-by: Rob Herring [tony@atomide.com: updated description] Signed-off-by: Tony Lindgren Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman commit 9670f1a473dc2ce7620e91445c1fe3d9a275356d Author: Myron Stowe Date: Thu Oct 30 11:54:37 2014 -0600 PCI: Restore detection of read-only BARs commit 36e8164882ca6d3c41cb91e6f09a3ed236841f80 upstream. Commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") masked off low-order bits from 'l', but not from 'sz'. Both are passed to pci_size(), which compares 'base == maxbase' to check for read-only BARs. The masking of 'l' means that comparison will never be 'true', so the check for read-only BARs no longer works. Resolve this by also masking off the low-order bits of 'sz' before passing it into pci_size() as 'maxbase'. With this change, pci_size() will once again catch the problems that have been encountered to date: - AGP aperture BAR of AMD-7xx host bridges: if the AGP window is disabled, this BAR is read-only and read as 0x00000008 [1] - BARs 0-4 of ALi IDE controllers can be non-zero and read-only [1] - Intel Sandy Bridge - Thermal Management Controller [8086:0103]; BAR 0 returning 0xfed98004 [2] - Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0]; Bar 0 returning 0x00001a [3] Link: [1] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9 ("PCI: probing read-only BARs" (pre-git)) Link: [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331 Link: [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991 Reported-by: William Unruh Reported-by: Martin Lucina Signed-off-by: Myron Stowe Signed-off-by: Bjorn Helgaas CC: Matthew Wilcox Signed-off-by: Greg Kroah-Hartman commit 97a3f651913ffb8e13357913d6e93fb8fbacab3a Author: Andrew Jackson Date: Fri Dec 19 16:18:05 2014 +0000 ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap commit 3475c3d034d7f276a474c8bd53f44b48c8bf669d upstream. Flush the FIFOs when the stream is prepared for use. This avoids an inadvertent swapping of the left/right channels if the FIFOs are not empty at startup. Signed-off-by: Andrew Jackson Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit ef3a852c749491082bea51e1f0b4ee27afb2b94c Author: Jarkko Nikula Date: Mon Nov 24 15:32:36 2014 +0200 ASoC: max98090: Fix ill-defined sidetone route commit 48826ee590da03e9882922edf96d8d27bdfe9552 upstream. Commit 5fe5b767dc6f ("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets") revealed ill-defined control in a route between "STENL Mux" and DACs in max98090.c: max98090 i2c-193C9890:00: Control not supported for path STENL Mux -> [NULL] -> DACL max98090 i2c-193C9890:00: ASoC: no dapm match for STENL Mux --> NULL --> DACL max98090 i2c-193C9890:00: ASoC: Failed to add route STENL Mux -> NULL -> DACL max98090 i2c-193C9890:00: Control not supported for path STENL Mux -> [NULL] -> DACR max98090 i2c-193C9890:00: ASoC: no dapm match for STENL Mux --> NULL --> DACR max98090 i2c-193C9890:00: ASoC: Failed to add route STENL Mux -> NULL -> DACR Since there is no control between "STENL Mux" and DACs the control name must be NULL not "NULL". Signed-off-by: Jarkko Nikula Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit c60c3ab7c63c70cf53895eb661e2a3fd2c6faa49 Author: Lars-Peter Clausen Date: Wed Nov 19 18:29:02 2014 +0100 ASoC: sigmadsp: Refuse to load firmware files with a non-supported version commit 50c0f21b42dd4cd02b51f82274f66912d9a7fa32 upstream. Make sure to check the version field of the firmware header to make sure to not accidentally try to parse a firmware file with a different layout. Trying to do so can result in loading invalid firmware code to the device. Signed-off-by: Lars-Peter Clausen Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman commit 68956d1e3aa9b2c417c6b2c0fbd3dcdd31bb1c38 Author: Felix Fietkau Date: Sun Nov 30 21:52:57 2014 +0100 ath5k: fix hardware queue index assignment commit 9e4982f6a51a2442f1bb588fee42521b44b4531c upstream. Like with ath9k, ath5k queues also need to be ordered by priority. queue_info->tqi_subtype already contains the correct index, so use it instead of relying on the order of ath5k_hw_setup_tx_queue calls. Signed-off-by: Felix Fietkau Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 81cb80b578c59a52a17cfd7c7a0571be24906b15 Author: Stefano Stabellini Date: Fri Nov 21 16:56:12 2014 +0000 swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single commit 2c3fc8d26dd09b9d7069687eead849ee81c78e46 upstream. Need to pass the pointer within the swiotlb internal buffer to the swiotlb library, that in the case of xen_unmap_single is dev_addr, not paddr. Signed-off-by: Stefano Stabellini Acked-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 413706774c9eec19a0484392a278bf3aafd11d3b Author: Stephane Grosjean Date: Fri Nov 28 14:08:48 2014 +0100 can: peak_usb: fix memset() usage commit dc50ddcd4c58a5a0226038307d6ef884bec9f8c2 upstream. This patchs fixes a misplaced call to memset() that fills the request buffer with 0. The problem was with sending PCAN_USBPRO_REQ_FCT requests, the content set by the caller was thus lost. With this patch, the memory area is zeroed only when requesting info from the device. Signed-off-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 5e5740524c3c8c4127d6e0eab62c619f6650e017 Author: Stephane Grosjean Date: Fri Nov 28 13:49:10 2014 +0100 can: peak_usb: fix cleanup sequence order in case of error during init commit af35d0f1cce7a990286e2b94c260a2c2d2a0e4b0 upstream. This patch sets the correct reverse sequence order to the instructions set to run, when any failure occurs during the initialization steps. It also adds the missing unregistration call of the can device if the failure appears after having been registered. Signed-off-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit a510bea7571d651562bbdc8fec6ce2725a2ad0ab Author: Felix Fietkau Date: Sun Nov 30 20:38:41 2014 +0100 ath9k: fix BE/BK queue order commit 78063d81d353e10cbdd279c490593113b8fdae1c upstream. Hardware queues are ordered by priority. Use queue index 0 for BK, which has lower priority than BE. Signed-off-by: Felix Fietkau Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit f8510534c0878bce9c2d019c850ea507d72f8d1d Author: Felix Fietkau Date: Sun Nov 30 20:38:40 2014 +0100 ath9k_hw: fix hardware queue allocation commit ad8fdccf9c197a89e2d2fa78c453283dcc2c343f upstream. The driver passes the desired hardware queue index for a WMM data queue in qinfo->tqi_subtype. This was ignored in ath9k_hw_setuptxqueue, which instead relied on the order in which the function is called. Reported-by: Hubert Feurstein Signed-off-by: Felix Fietkau Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit e66e0fc5349525ed49935881818b2d5cc27f4468 Author: Junxiao Bi Date: Thu Dec 18 16:17:37 2014 -0800 ocfs2: fix journal commit deadlock commit 136f49b9171074872f2a14ad0ab10486d1ba13ca upstream. For buffer write, page lock will be got in write_begin and released in write_end, in ocfs2_write_end_nolock(), before it unlock the page in ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask for the read lock of journal->j_trans_barrier. Holding page lock and ask for journal->j_trans_barrier breaks the locking order. This will cause a deadlock with journal commit threads, ocfs2cmt will get write lock of journal->j_trans_barrier first, then it wakes up kjournald2 to do the commit work, at last it waits until done. To commit journal, kjournald2 needs flushing data first, it needs get the cache page lock. Since some ocfs2 cluster locks are holding by write process, this deadlock may hung the whole cluster. unlock pages before ocfs2_run_deallocs() can fix the locking order, also put unlock before ocfs2_commit_trans() to make page lock is unlocked before j_trans_barrier to preserve unlocking order. Signed-off-by: Junxiao Bi Reviewed-by: Wengang Wang Reviewed-by: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman