Commit Graph

83872 Commits

Author SHA1 Message Date
Steven Rostedt (Google)
020010fbfa eventfs: Delete eventfs_inode when the last dentry is freed
There exists a race between holding a reference of an eventfs_inode dentry
and the freeing of the eventfs_inode. If user space has a dentry held long
enough, it may still be able to access the dentry's eventfs_inode after it
has been freed.

To prevent this, have he eventfs_inode freed via the last dput() (or via
RCU if the eventfs_inode does not have a dentry).

This means reintroducing the eventfs_inode del_list field at a temporary
place to put the eventfs_inode. It needs to mark it as freed (via the
list) but also must invalidate the dentry immediately as the return from
eventfs_remove_dir() expects that they are. But the dentry invalidation
must not be called under the eventfs_mutex, so it must be done after the
eventfs_inode is marked as free (put on a deletion list).

Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 5bdcd5f533 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02 00:17:27 -04:00
Steven Rostedt (Google)
44365329f8 eventfs: Hold eventfs_mutex when calling callback functions
The callback function that is used to create inodes and dentries is not
protected by anything and the data that is passed to it could become
stale. After eventfs_remove_dir() is called by the tracing system, it is
free to remove the events that are associated to that directory.
Unfortunately, that means the callbacks must not be called after that.

     CPU0				CPU1
     ----				----
 eventfs_root_lookup() {
				 eventfs_remove_dir() {
				      mutex_lock(&event_mutex);
				      ei->is_freed = set;
				      mutex_unlock(&event_mutex);
				 }
				 kfree(event_call);

    for (...) {
      entry = &ei->entries[i];
      r = entry->callback() {
          call = data;		// call == event_call above
          if (call->flags ...)

 [ USE AFTER FREE BUG ]

The safest way to protect this is to wrap the callback with:

 mutex_lock(&eventfs_mutex);
 if (!ei->is_freed)
     r = entry->callback();
 else
     r = -1;
 mutex_unlock(&eventfs_mutex);

This will make sure that the callback will not be called after it is
freed. But now it needs to be known that the callback is called while
holding internal eventfs locks, and that it must not call back into the
eventfs / tracefs system. There's no reason it should anyway, but document
that as well.

Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=Y50WpWAE7_Q@mail.gmail.com/
Link: https://lkml.kernel.org/r/20231101172649.906696613@goodmis.org

Cc: Ajay Kaher <akaher@vmware.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02 00:16:49 -04:00
Steven Rostedt (Google)
28e12c09f5 eventfs: Save ownership and mode
Now that inodes and dentries are created on the fly, they are also
reclaimed on memory pressure. Since the ownership and file mode are saved
in the inode, if they are freed, any changes to the ownership and mode
will be lost.

To counter this, if the user changes the permissions or ownership, save
them, and when creating the inodes again, restore those changes.

Link: https://lkml.kernel.org/r/20231101172649.691841445@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-01 23:55:12 -04:00
Steven Rostedt (Google)
77a06c33a2 eventfs: Test for ei->is_freed when accessing ei->dentry
The eventfs_inode (ei) is protected by SRCU, but the ei->dentry is not. It
is protected by the eventfs_mutex. Anytime the eventfs_mutex is released,
and access to the ei->dentry needs to be done, it should first check if
ei->is_freed is set under the eventfs_mutex. If it is, then the ei->dentry
is invalid and must not be used. The ei->dentry must only be accessed
under the eventfs_mutex and after checking if ei->is_freed is set.

When the ei is being freed, it will (under the eventfs_mutex) set is_freed
and at the same time move the dentry to a free list to be cleared after
the eventfs_mutex is released. This means that any access to the
ei->dentry must check first if ei->is_freed is set, because if it is, then
the dentry is on its way to be freed.

Also add comments to describe this better.

Link: https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=SoEywQOe19fGP3uR15SGowkdK+_X85Cg@mail.gmail.com/
Link: https://lore.kernel.org/all/CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6aA@mail.gmail.com/
Link: https://lkml.kernel.org/r/20231101172649.477608228@goodmis.org

Cc: Ajay Kaher <akaher@vmware.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Beau Belgrave <beaub@linux.microsoft.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-01 23:50:22 -04:00
Steven Rostedt (Google)
db3a397209 eventfs: Have a free_ei() that just frees the eventfs_inode
As the eventfs_inode is freed in two different locations, make a helper
function free_ei() to make sure all the allocated fields of the
eventfs_inode is freed.

This requires renaming the existing free_ei() which is called by the srcu
handler to free_rcu_ei() and have free_ei() just do the freeing, where
free_rcu_ei() will call it.

Link: https://lkml.kernel.org/r/20231101172649.265214087@goodmis.org

Cc: Ajay Kaher <akaher@vmware.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-01 23:49:55 -04:00
Steven Rostedt (Google)
f2f496370a eventfs: Remove "is_freed" union with rcu head
The eventfs_inode->is_freed was a union with the rcu_head with the
assumption that when it was on the srcu list the head would contain a
pointer which would make "is_freed" true. But that was a wrong assumption
as the rcu head is a single link list where the last element is NULL.

Instead, split the nr_entries integer so that "is_freed" is one bit and
the nr_entries is the next 31 bits. As there shouldn't be more than 10
(currently there's at most 5 to 7 depending on the config), this should
not be a problem.

Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-01 23:49:32 -04:00
Steven Rostedt (Google)
9037caa09e eventfs: Fix kerneldoc of eventfs_remove_rec()
The eventfs_remove_rec() had some missing parameters in the kerneldoc
comment above it. Also, rephrase the description a bit more to have a bit
more correct grammar.

Link: https://lore.kernel.org/linux-trace-kernel/20231030121523.0b2225a7@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode");
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310052216.4SgqasWo-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-01 23:49:25 -04:00
Steven Rostedt (Google)
77bc4d4921 eventfs: Remove extra dget() in eventfs_create_events_dir()
The creation of the top events directory does a dget() at the end of the
creation in eventfs_create_events_dir() with a comment saying the final
dput() will happen when it is removed. The problem is that a dget() is
already done on the dentry when it was created with tracefs_start_creating()!
The dget() now just causes a memory leak of that dentry.

Remove the extra dget() as the final dput() in the deletion of the events
directory actually matches the one in tracefs_start_creating().

Link: https://lore.kernel.org/linux-trace-kernel/20231031124229.4f2e3fa1@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-01 23:45:05 -04:00
Steven Rostedt (Google)
29e06c1070 eventfs: Fix typo in eventfs_inode union comment
It's eventfs_inode not eventfs_indoe. There's no deer involved!

Link: https://lore.kernel.org/linux-trace-kernel/20231024131024.5634c743@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-25 21:26:26 -04:00
Steven Rostedt (Google)
a9de4eb15a eventfs: Fix WARN_ON() in create_file_dentry()
As the comment right above a WARN_ON() in create_file_dentry() states:

  * Note, with the mutex held, the e_dentry cannot have content
  * and the ei->is_freed be true at the same time.

But the WARN_ON() only has:

  WARN_ON_ONCE(ei->is_free);

Where to match the comment (and what it should actually do) is:

  dentry = *e_dentry;
  WARN_ON_ONCE(dentry && ei->is_free)

Also in that case, set dentry to NULL (although it should never happen).

Link: https://lore.kernel.org/linux-trace-kernel/20231024123628.62b88755@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-25 21:26:26 -04:00
Jiapeng Chong
64bf2f685c tracefs/eventfs: Modify mismatched function name
No functional modification involved.

fs/tracefs/event_inode.c:864: warning: expecting prototype for eventfs_remove(). Prototype was for eventfs_remove_dir() instead.

Link: https://lore.kernel.org/linux-trace-kernel/20231019031353.73846-1-jiapeng.chong@linux.alibaba.com

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6939
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-20 12:14:59 -04:00
Steven Rostedt (Google)
7e8ad67c9b eventfs: Fix failure path in eventfs_create_events_dir()
The failure path of allocating ei goes to a path that dereferences ei.
Add another label that skips over the ei dereferences to do the rest of
the clean up.

Link: https://lore.kernel.org/all/70e7bace-561c-95f-1117-706c2c220bc@inria.fr/
Link: https://lore.kernel.org/linux-trace-kernel/20231019204132.6662fef0@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-20 10:04:01 -04:00
Nathan Chancellor
b8a555dc31 eventfs: Use ERR_CAST() in eventfs_create_events_dir()
When building with clang and CONFIG_RANDSTRUCT_FULL=y, there is an error
due to a cast in eventfs_create_events_dir():

  fs/tracefs/event_inode.c:734:10: error: casting from randomized structure pointer type 'struct dentry *' to 'struct eventfs_inode *'
    734 |                 return (struct eventfs_inode *)dentry;
        |                        ^
  1 error generated.

Use the ERR_CAST() function to resolve the error, as it was designed for
this exact situation (casting an error pointer to another type).

Link: https://lore.kernel.org/linux-trace-kernel/20231018-ftrace-fix-clang-randstruct-v1-1-338cb214abfb@kernel.org

Closes: https://github.com/ClangBuiltLinux/linux/issues/1947
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-18 14:20:17 -04:00
Steven Rostedt (Google)
2819f23ac1 eventfs: Use eventfs_remove_events_dir()
The update to removing the eventfs_file changed the way the events top
level directory was handled. Instead of returning a dentry, it now returns
the eventfs_inode. In this changed, the removing of the events top level
directory is not much different than removing any of the other
directories. Because of this, the removal just called eventfs_remove_dir()
instead of eventfs_remove_events_dir().

Although eventfs_remove_dir() does the clean up, it misses out on the
dget() of the ei->dentry done in eventfs_create_events_dir(). It makes
more sense to match eventfs_create_events_dir() with a specific function
eventfs_remove_events_dir() and this specific function can then perform
the dput() to the dentry that had the dget() when it was created.

Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310051743.y9EobbUr-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-05 10:49:32 -04:00
Steven Rostedt (Google)
5790b1fb3d eventfs: Remove eventfs_file and just use eventfs_inode
Instead of having a descriptor for every file represented in the eventfs
directory, only have the directory itself represented. Change the API to
send in a list of entries that represent all the files in the directory
(but not other directories). The entry list contains a name and a callback
function that will be used to create the files when they are accessed.

struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
						const struct eventfs_entry *entries,
						int size, void *data);

is used for the top level eventfs directory, and returns an eventfs_inode
that will be used by:

struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
					 const struct eventfs_entry *entries,
					 int size, void *data);

where both of the above take an array of struct eventfs_entry entries for
every file that is in the directory.

The entries are defined by:

typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
				const struct file_operations **fops);

struct eventfs_entry {
	const char			*name;
	eventfs_callback		callback;
};

Where the name is the name of the file and the callback gets called when
the file is being created. The callback passes in the name (in case the
same callback is used for multiple files), a pointer to the mode, data and
fops. The data will be pointing to the data that was passed in
eventfs_create_dir() or eventfs_create_events_dir() but may be overridden
to point to something else, as it will be used to point to the
inode->i_private that is created. The information passed back from the
callback is used to create the dentry/inode.

If the callback fills the data and the file should be created, it must
return a positive number. On zero or negative, the file is ignored.

This logic may also be used as a prototype to convert entire pseudo file
systems into just-in-time allocation.

The "show_events_dentry" file has been updated to show the directories,
and any files they have.

With just the eventfs_file allocations:

 Before after deltas for meminfo (in kB):

   MemFree:		-14360
   MemAvailable:	-14260
   Buffers:		40
   Cached:		24
   Active:		44
   Inactive:		48
   Inactive(anon):	28
   Active(file):	44
   Inactive(file):	20
   Dirty:		-4
   AnonPages:		28
   Mapped:		4
   KReclaimable:	132
   Slab:		1604
   SReclaimable:	132
   SUnreclaim:		1472
   Committed_AS:	12

 Before after deltas for slabinfo:

   <slab>:		<objects>	[ * <size> = <total>]

   ext4_inode_cache	27		[* 1184 = 31968 ]
   extent_status	102		[*   40 = 4080 ]
   tracefs_inode_cache	144		[*  656 = 94464 ]
   buffer_head		39		[*  104 = 4056 ]
   shmem_inode_cache	49		[*  800 = 39200 ]
   filp			-53		[*  256 = -13568 ]
   dentry		251		[*  192 = 48192 ]
   lsm_file_cache	277		[*   32 = 8864 ]
   vm_area_struct	-14		[*  184 = -2576 ]
   trace_event_file	1748		[*   88 = 153824 ]
   kmalloc-1k		35		[* 1024 = 35840 ]
   kmalloc-256		49		[*  256 = 12544 ]
   kmalloc-192		-28		[*  192 = -5376 ]
   kmalloc-128		-30		[*  128 = -3840 ]
   kmalloc-96		10581		[*   96 = 1015776 ]
   kmalloc-64		3056		[*   64 = 195584 ]
   kmalloc-32		1291		[*   32 = 41312 ]
   kmalloc-16		2310		[*   16 = 36960 ]
   kmalloc-8		9216		[*    8 = 73728 ]

 Free memory dropped by 14,360 kB
 Available memory dropped by 14,260 kB
 Total slab additions in size: 1,771,032 bytes

With this change:

 Before after deltas for meminfo (in kB):

   MemFree:		-12084
   MemAvailable:	-11976
   Buffers:		32
   Cached:		32
   Active:		72
   Inactive:		168
   Inactive(anon):	176
   Active(file):	72
   Inactive(file):	-8
   Dirty:		24
   AnonPages:		196
   Mapped:		8
   KReclaimable:	148
   Slab:		836
   SReclaimable:	148
   SUnreclaim:		688
   Committed_AS:	324

 Before after deltas for slabinfo:

   <slab>:		<objects>	[ * <size> = <total>]

   tracefs_inode_cache	144		[* 656 = 94464 ]
   shmem_inode_cache	-23		[* 800 = -18400 ]
   filp			-92		[* 256 = -23552 ]
   dentry		179		[* 192 = 34368 ]
   lsm_file_cache	-3		[* 32 = -96 ]
   vm_area_struct	-13		[* 184 = -2392 ]
   trace_event_file	1748		[* 88 = 153824 ]
   kmalloc-1k		-49		[* 1024 = -50176 ]
   kmalloc-256		-27		[* 256 = -6912 ]
   kmalloc-128		1864		[* 128 = 238592 ]
   kmalloc-64		4685		[* 64 = 299840 ]
   kmalloc-32		-72		[* 32 = -2304 ]
   kmalloc-16		256		[* 16 = 4096 ]
   total = 721352

 Free memory dropped by 12,084 kB
 Available memory dropped by 11,976 kB
 Total slab additions in size:  721,352 bytes

That's over 2 MB in savings per instance for free and available memory,
and over 1 MB in savings per instance of slab memory.

Link: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home
Link: https://lore.kernel.org/linux-trace-kernel/20231004165007.43d79161@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-04 17:11:50 -04:00
Linus Torvalds
d2c5231581 Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "Fourteen hotfixes, eleven of which are cc:stable. The remainder
  pertain to issues which were introduced after 6.5"

* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  Crash: add lock to serialize crash hotplug handling
  selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
  mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
  mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
  mm, memcg: reconsider kmem.limit_in_bytes deprecation
  mm: zswap: fix potential memory corruption on duplicate store
  arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
  mm: hugetlb: add huge page size param to set_huge_pte_at()
  maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
  maple_tree: add mas_is_active() to detect in-tree walks
  nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
  mm: abstract moving to the next PFN
  mm: report success more often from filemap_map_folio_range()
  fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
2023-10-01 13:33:25 -07:00
Linus Torvalds
3b347e4032 Merge tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Make sure 32-bit applications using user events have aligned access
   when running on a 64-bit kernel.

 - Add cond_resched in the loop that handles converting enums in
   print_fmt string is trace events.

 - Fix premature wake ups of polling processes in the tracing ring
   buffer. When a task polls waiting for a percentage of the ring buffer
   to be filled, the writer still will wake it up at every event. Add
   the polling's percentage to the "shortest_full" list to tell the
   writer when to wake it up.

 - For eventfs dir lookups on dynamic events, an event system's only
   event could be removed, leaving its dentry with no children. This is
   totally legitimate. But in eventfs_release() it must not access the
   children array, as it is only allocated when the dentry has children.

* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Test for dentries array allocated in eventfs_release()
  tracing/user_events: Align set_bit() address for all archs
  tracing: relax trace_event_eval_update() execution with cond_resched()
  ring-buffer: Update "shortest_full" in polling
2023-09-30 18:19:02 -07:00
Steven Rostedt (Google)
2598bd3ca8 eventfs: Test for dentries array allocated in eventfs_release()
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.

Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ef36b4f928 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-30 16:26:04 -04:00
Linus Torvalds
25d48d570e Merge tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:

 - Handle a race between writing and shrinking block devices by
   returning EIO

 - Fix a typo in a comment

* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Spelling s/preceeding/preceding/g
  iomap: add a workaround for racy i_size updates on block devices
2023-09-30 11:01:38 -07:00
Linus Torvalds
ae21363998 Merge tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:

 - Fix NFSv4 READ corner case

* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
2023-09-30 09:44:48 -07:00
Linus Torvalds
ba77f7a63f Merge tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
 "Fix for password freeing potential oops (also for stable)"

* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  fs/smb/client: Reset password pointer to NULL
2023-09-30 09:39:23 -07:00
Pan Bian
7ee29facd8 nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails.  If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed.  However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug.  This patch moves the release
operation after unlocking and putting the page.

NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed.  However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.

[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Greg Ungerer
7c31515857 fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example).  The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).

On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary.  This
matters since start_thread() will set the ARM CPSR register as required
based on this flag.  If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.

Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it.  This macro in
the generic case sets PER_LINUX and preserves the upper bytes. 
Architectures can override this for their specific use case, and ARM does
exactly this.

The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware.  If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.

Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:45 -07:00
Linus Torvalds
9f3ebbef74 Merge tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
 "Two SMB3 server fixes for null pointer dereferences:

   - invalid SMB3 request case (fixes issue found in testing the read
     compound patch)

   - iovec error case in response processing"

* tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: check iov vector index in ksmbd_conn_write()
  ksmbd: return invalid parameter error response if smb2 request is invalid
2023-09-29 16:51:38 -07:00
Linus Torvalds
14c06b913d Merge tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A series that fixes an involved 'double watch error' deadlock in RBD
  marked for stable and two cleanups"

* tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client:
  rbd: take header_rwsem in rbd_dev_refresh() only when updating
  rbd: decouple parent info read-in from updating rbd_dev
  rbd: decouple header read-in from updating rbd_dev->header
  rbd: move rbd_dev_refresh() definition
  Revert "ceph: make members in struct ceph_mds_request_args_ext a union"
  ceph: remove unnecessary check for NULL in parse_longname()
2023-09-29 16:46:24 -07:00
Linus Torvalds
10c0b6ba25 Merge tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:

 - fix for commit 68b957f64f ("xfs: load uncached unlinked inodes into
   memory on demand") which address review comments provided by Dave
   Chinner

* tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix reloading entire unlinked bucket lists
2023-09-29 16:41:25 -07:00
Quang Le
e6e43b8aa7 fs/smb/client: Reset password pointer to NULL
Forget to reset ctx->password to NULL will lead to bug like double free

Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-28 14:49:51 -05:00
Geert Uytterhoeven
684f7e6d28 iomap: Spelling s/preceeding/preceding/g
Fix a misspelling of "preceding".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-09-28 09:26:58 -07:00
Chuck Lever
0d32a6bbb8 NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.

However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.

Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d752155 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-09-28 10:34:28 -04:00
Linus Torvalds
cac405a3bf Merge tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - delayed refs fixes:
     - fix race when refilling delayed refs block reserve
     - prevent transaction block reserve underflow when starting
       transaction
     - error message and value adjustments

 - fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and
   -Wmaybe-uninitialized

 - fix for smatch report where uninitialized data from invalid extent
   buffer range could be returned to the caller

 - fix numeric overflow in statfs when calculating lower threshold
   for a full filesystem

* tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: initialize start_slot in btrfs_log_prealloc_extents
  btrfs: make sure to initialize start and len in find_free_dev_extent
  btrfs: reset destination buffer when read_extent_buffer() gets invalid range
  btrfs: properly report 0 avail for very full file systems
  btrfs: log message if extent item not found when running delayed extent op
  btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
  btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
  btrfs: prevent transaction block reserve underflow when starting transaction
  btrfs: fix race when refilling delayed refs block reserve
2023-09-26 09:44:08 -07:00
Linus Torvalds
84422aee15 Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "This contains the usual miscellaneous fixes and cleanups for vfs and
  individual fses:

  Fixes:
   - Revert ki_pos on error from buffered writes for direct io fallback
   - Add missing documentation for block device and superblock handling
     for changes merged this cycle
   - Fix reiserfs flexible array usage
   - Ensure that overlayfs sets ctime when setting mtime and atime
   - Disable deferred caller completions with overlayfs writes until
     proper support exists

  Cleanups:
   - Remove duplicate initialization in pipe code
   - Annotate aio kioctx_table with __counted_by"

* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  overlayfs: set ctime when setting mtime and atime
  ntfs3: put resources during ntfs_fill_super()
  ovl: disable IOCB_DIO_CALLER_COMP
  porting: document superblock as block device holder
  porting: document new block device opening order
  fs/pipe: remove duplicate "offset" initializer
  fs-writeback: do not requeue a clean inode having skipped pages
  aio: Annotate struct kioctx_table with __counted_by
  direct_write_fallback(): on error revert the ->ki_pos update from buffered write
  reiserfs: Replace 1-element array with C99 style flex-array
2023-09-26 08:50:30 -07:00
Christoph Hellwig
381c043233 iomap: add a workaround for racy i_size updates on block devices
A szybot reproducer that does write I/O while truncating the size of a
block device can end up in clean_bdev_aliases, which tries to clean the
bdev aliases that it uses.  This is because iomap_to_bh automatically
sets the BH_New flag when outside of i_size.  For block devices updates
to i_size are racy and we can hit this case in a tiny race window,
leading to the eventual clean_bdev_aliases call.  Fix this by erroring
out of > i_size I/O on block devices.

Reported-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-09-25 08:55:00 -07:00
Jeff Layton
03dbab3bba overlayfs: set ctime when setting mtime and atime
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.

POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 14:53:54 +02:00
Christian Brauner
493c71926c ntfs3: put resources during ntfs_fill_super()
During ntfs_fill_super() some resources are allocated that we need to
cleanup in ->put_super() such as additional inodes. When
ntfs_fill_super() fails these resources need to be cleaned up as well.

Reported-by: syzbot+2751da923b5eb8307b0b@syzkaller.appspotmail.com
Fixes: 78a06688a4 ("ntfs3: drop inode references in ntfs_put_super()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 14:12:42 +02:00
Jens Axboe
2d1b3bbc3d ovl: disable IOCB_DIO_CALLER_COMP
overlayfs copies the kiocb flags when it sets up a new kiocb to handle
a write, but it doesn't properly support dealing with the deferred
caller completions of the kiocb. This means it doesn't get the final
write completion value, and hence will complete the write with '0' as
the result.

We could support the caller completions in overlayfs, but for now let's
just disable them in the generated write kiocb.

Reported-by: Zorro Lang <zlang@redhat.com>
Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
Fixes: 8c052fb300 ("iomap: support IOCB_DIO_CALLER_COMP")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <71897125-e570-46ce-946a-d4729725e28f@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 11:37:28 +02:00
Darrick J. Wong
537c013b14 xfs: fix reloading entire unlinked bucket lists
During review of the patcheset that provided reloading of the incore
iunlink list, Dave made a few suggestions, and I updated the copy in my
dev tree.  Unfortunately, I then got distracted by ... who even knows
what ... and forgot to backport those changes from my dev tree to my
release candidate branch.  I then sent multiple pull requests with stale
patches, and that's what was merged into -rc3.

So.

This patch re-adds the use of an unlocked iunlink list check to
determine if we want to allocate the resources to recreate the incore
list.  Since lost iunlinked inodes are supposed to be rare, this change
helps us avoid paying the transaction and AGF locking costs every time
we open any inode.

This also re-adds the shutdowns on failure, and re-applies the
restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and
re-adds a requested comment about the quotachecking code.

Retain the original RVB tag from Dave since there's no code change from
the last submission.

Fixes: 68b957f64f ("xfs: load uncached unlinked inodes into memory on demand")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-09-24 18:12:13 -07:00
Linus Torvalds
5edc6bb321 Merge tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix the "bytes" output of the per_cpu stat file

   The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
   accounting was not accurate. It is suppose to show how many used
   bytes are still in the ring buffer, but even when the ring buffer was
   empty it would still show there were bytes used.

 - Fix a bug in eventfs where reading a dynamic event directory (open)
   and then creating a dynamic event that goes into that diretory screws
   up the accounting.

   On close, the newly created event dentry will get a "dput" without
   ever having a "dget" done for it. The fix is to allocate an array on
   dir open to save what dentries were actually "dget" on, and what ones
   to "dput" on close.

* tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Remember what dentries were created on dir open
  ring-buffer: Fix bytes info in per_cpu buffer stats
2023-09-24 13:55:34 -07:00
Linus Torvalds
85eba5f175 Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three
  are cc:stable"

* tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: nommu: fix empty /proc/<pid>/maps
  filemap: add filemap_map_order0_folio() to handle order0 folio
  proc: nommu: /proc/<pid>/maps: release mmap read lock
  mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
  pidfd: prevent a kernel-doc warning
  argv_split: fix kernel-doc warnings
  scatterlist: add missing function params to kernel-doc
  selftests/proc: fixup proc-empty-vm test after KSM changes
  revert "scripts/gdb/symbols: add specific ko module load command"
  selftests: link libasan statically for tests with -fsanitize=address
  task_work: add kerneldoc annotation for 'data' argument
  mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
  sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-23 11:51:16 -07:00
Linus Torvalds
8565bdf8cd Merge tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Six smb3 client fixes, including three for stable, from the SMB
  plugfest (testing event) this week:

   - Reparse point handling fix (found when investigating dir
     enumeration when fifo in dir)

   - Fix excessive thread creation for dir lease cleanup

   - UAF fix in negotiate path

   - remove duplicate error message mapping and fix confusing warning
     message

   - add dynamic trace point to improve debugging RDMA connection
     attempts"

* tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix confusing debug message
  smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
  smb3: remove duplicate error mapping
  cifs: Fix UAF in cifs_demultiplex_thread()
  smb3: do not start laundromat thread when dir leases  disabled
  smb3: Add dynamic trace points for RDMA (smbdirect) reconnect
2023-09-23 11:34:48 -07:00
Linus Torvalds
59c376d636 Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:

 - Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
   less poorly with block device io racing with block device resizing

 - Fix a stale page data exposure bug introduced in 6.6-rc1 when
   unsharing a file range that is not in the page cache

* tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: convert iomap_unshare_iter to use large folios
  iomap: don't skip reading in !uptodate folios when unsharing a range
  iomap: handle error conditions more gracefully in iomap_to_bh
2023-09-23 09:56:40 -07:00
Linus Torvalds
3abc79dce6 Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:

 - Fix an integer overflow bug when processing an fsmap call

 - Fix crash due to CPU hot remove event racing with filesystem mount
   operation

 - During read-only mount, XFS does not allow the contents of the log to
   be recovered when there are one or more unrecognized rcompat features
   in the primary superblock, since the log might have intent items
   which the kernel does not know how to process

 - During recovery of log intent items, XFS now reserves log space
   sufficient for one cycle of a permanent transaction to execute.
   Otherwise, this could lead to livelocks due to non-availability of
   log space

 - On an fs which has an ondisk unlinked inode list, trying to delete a
   file or allocating an O_TMPFILE file can cause the fs to the shutdown
   if the first inode in the ondisk inode list is not present in the
   inode cache. The bug is solved by explicitly loading the first inode
   in the ondisk unlinked inode list into the inode cache if it is not
   already cached

   A similar problem arises when the uncached inode is present in the
   middle of the ondisk unlinked inode list. This second bug is
   triggered when executing operations like quotacheck and bulkstat. In
   this case, XFS now reads in the entire ondisk unlinked inode list

 - Enable LARP mode only on recent v5 filesystems

 - Fix a out of bounds memory access in scrub

 - Fix a performance bug when locating the tail of the log during
   mounting a filesystem

* tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
  xfs: only call xchk_stats_merge after validating scrub inputs
  xfs: require a relatively recent V5 filesystem for LARP mode
  xfs: make inode unlinked bucket recovery work with quotacheck
  xfs: load uncached unlinked inodes into memory on demand
  xfs: reserve less log space when recovering log intent items
  xfs: fix log recovery when unknown rocompat bits are set
  xfs: reload entire unlinked bucket lists
  xfs: allow inode inactivation during a ro mount log recovery
  xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
  xfs: remove CPU hotplug infrastructure
  xfs: remove the all-mounts list
  xfs: use per-mount cpumask to track nonempty percpu inodegc lists
  xfs: fix an agbno overflow in __xfs_getfsmap_datadev
  xfs: fix per-cpu CIL structure aggregation racing with dying cpus
  xfs: fix select in config XFS_ONLINE_SCRUB_STATS
2023-09-22 16:32:19 -07:00
Steven Rostedt (Google)
ef36b4f928 eventfs: Remember what dentries were created on dir open
Using the following code with libtracefs:

	int dfd;

	// create the directory events/kprobes/kp1
	tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1");

	// Open the kprobes directory
	dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY);

	// Do a lookup of the kprobes/kp1 directory (by looking at enable)
	tracefs_file_exists(NULL, "events/kprobes/kp1/enable");

	// Now create a new entry in the kprobes directory
	tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1");

	// Do another lookup to create the dentries
	tracefs_file_exists(NULL, "events/kprobes/kp2/enable"))

	// Close the directory
	close(dfd);

What happened above, the first open (dfd) will call
dcache_dir_open_wrapper() that will create the dentries and up their ref
counts.

Now the creation of "kp2" will add another dentry within the kprobes
directory.

Upon the close of dfd, eventfs_release() will now do a dput for all the
entries in kprobes. But this is where the problem lies. The open only
upped the dentry of kp1 and not kp2. Now the close is decrementing both
kp1 and kp2, which causes kp2 to get a negative count.

Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel
to crash! (due to the messed up accounting of the ref counts).

To solve this, save all the dentries that are opened in the
dcache_dir_open_wrapper() into an array, and use this array to know what
dentries to do a dput on in eventfs_release().

Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the
file->private_data, we need to also add a wrapper around dcache_readdir()
that uses the cursor assigned to the file->private_data. This is because
the dentries need to also be saved in the file->private_data. To do this
create the structure:

  struct dentry_list {
	void		*cursor;
	struct dentry	**dentries;
  };

Which will hold both the cursor and the dentries. Some shuffling around is
needed to make sure that dcache_dir_open() and dcache_readdir() only see
the cursor.

Link: https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Reported-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-22 16:58:11 -04:00
Namjae Jeon
73f949ea87 ksmbd: check iov vector index in ksmbd_conn_write()
If ->iov_idx is zero, This means that the iov vector for the response
was not added during the request process. In other words, it means that
there is a problem in generating a response, So this patch return as
an error to avoid NULL pointer dereferencing problem.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-21 14:41:06 -05:00
Namjae Jeon
f2f11fca5d ksmbd: return invalid parameter error response if smb2 request is invalid
If smb2 request from client is invalid, The following kernel oops could
happen. The patch e2b76ab8b5: "ksmbd: add support for read compound"
leads this issue. When request is invalid, It doesn't set anything in
the response buffer. This patch add missing set invalid parameter error
response.

[  673.085542] ksmbd: cli req too short, len 184 not 142. cmd:5 mid:109
[  673.085580] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  673.085591] #PF: supervisor read access in kernel mode
[  673.085600] #PF: error_code(0x0000) - not-present page
[  673.085608] PGD 0 P4D 0
[  673.085620] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  673.085631] CPU: 3 PID: 1039 Comm: kworker/3:0 Not tainted 6.6.0-rc2-tmt #16
[  673.085643] Hardware name: AZW U59/U59, BIOS JTKT001 05/05/2022
[  673.085651] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
[  673.085719] RIP: 0010:ksmbd_conn_write+0x68/0xc0 [ksmbd]
[  673.085808] RAX: 0000000000000000 RBX: ffff88811ade4f00 RCX: 0000000000000000
[  673.085817] RDX: 0000000000000000 RSI: ffff88810c2a9780 RDI: ffff88810c2a9ac0
[  673.085826] RBP: ffffc900005e3e00 R08: 0000000000000000 R09: 0000000000000000
[  673.085834] R10: ffffffffa3168160 R11: 63203a64626d736b R12: ffff8881057c8800
[  673.085842] R13: ffff8881057c8820 R14: ffff8882781b2380 R15: ffff8881057c8800
[  673.085852] FS:  0000000000000000(0000) GS:ffff888278180000(0000) knlGS:0000000000000000
[  673.085864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  673.085872] CR2: 0000000000000000 CR3: 000000015b63c000 CR4: 0000000000350ee0
[  673.085883] Call Trace:
[  673.085890]  <TASK>
[  673.085900]  ? show_regs+0x6a/0x80
[  673.085916]  ? __die+0x25/0x70
[  673.085926]  ? page_fault_oops+0x154/0x4b0
[  673.085938]  ? tick_nohz_tick_stopped+0x18/0x50
[  673.085954]  ? __irq_work_queue_local+0xba/0x140
[  673.085967]  ? do_user_addr_fault+0x30f/0x6c0
[  673.085979]  ? exc_page_fault+0x79/0x180
[  673.085992]  ? asm_exc_page_fault+0x27/0x30
[  673.086009]  ? ksmbd_conn_write+0x68/0xc0 [ksmbd]
[  673.086067]  ? ksmbd_conn_write+0x46/0xc0 [ksmbd]
[  673.086123]  handle_ksmbd_work+0x28d/0x4b0 [ksmbd]
[  673.086177]  process_one_work+0x178/0x350
[  673.086193]  ? __pfx_worker_thread+0x10/0x10
[  673.086202]  worker_thread+0x2f3/0x420
[  673.086210]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[  673.086222]  ? __pfx_worker_thread+0x10/0x10
[  673.086230]  kthread+0x103/0x140
[  673.086242]  ? __pfx_kthread+0x10/0x10
[  673.086253]  ret_from_fork+0x39/0x60
[  673.086263]  ? __pfx_kthread+0x10/0x10
[  673.086274]  ret_from_fork_asm+0x1b/0x30

Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Reported-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-21 14:41:06 -05:00
Linus Torvalds
b5cbe7c00a Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull finegrained timestamp reverts from Christian Brauner:
 "Earlier this week we sent a few minor fixes for the multi-grained
  timestamp work in [1]. While we were polishing those up after Linus
  realized that there might be a nicer way to fix them we received a
  regression report in [2] that fine grained timestamps break gnulib
  tests and thus possibly other tools.

  The kernel will elide fine-grain timestamp updates when no one is
  actively querying for them to avoid performance impacts. So a sequence
  like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
  result in timestamp f1 to be older than the final f2 timestamp even
  though f1 was last written too but the second write didn't update the
  timestamp.

  Such plotholes can lead to subtle bugs when programs compare
  timestamps. For example, the nap() function in [2] will estimate that
  it needs to wait one ns on a fine-grain timestamp enabled filesytem
  between subsequent calls to observe a timestamp change. But in general
  we don't update timestamps with more than one jiffie if we think that
  no one is actively querying for fine-grain timestamps to avoid
  performance impacts.

  While discussing various fixes the decision was to go back to the
  drawing board and ultimately to explore a solution that involves only
  exposing such fine-grained timestamps to nfs internally and never to
  userspace.

  As there are multiple solutions discussed the honest thing to do here
  is not to fix this up or disable it but to cleanly revert. The general
  infrastructure will probably come back but there is no reason to keep
  this code in mainline.

  The general changes to timestamp handling are valid and a good cleanup
  that will stay. The revert is fully bisectable"

Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner [1]
Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org [2]

* tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "fs: add infrastructure for multigrain timestamps"
  Revert "btrfs: convert to multigrain timestamps"
  Revert "ext4: switch to multigrain timestamps"
  Revert "xfs: switch to multigrain timestamps"
  Revert "tmpfs: add support for multigrain timestamps"
2023-09-21 10:15:26 -07:00
Josef Bacik
b4c639f699 btrfs: initialize start_slot in btrfs_log_prealloc_extents
Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this

  fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
  fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used
  uninitialized [-Wmaybe-uninitialized]
   4828 |                 ret = copy_items(trans, inode, dst_path, path,
	|                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   4829 |                                  start_slot, ins_nr, 1, 0);
	|                                  ~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
   4725 |         int start_slot;
	|             ^~~~~~~~~~

The compiler is incorrect, as we only use this code when ins_len > 0,
and when ins_len > 0 we have start_slot properly initialized.  However
we generally find the -Wmaybe-uninitialized warnings valuable, so
initialize start_slot to get rid of the warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21 18:52:23 +02:00
Josef Bacik
20218dfbaa btrfs: make sure to initialize start and len in find_free_dev_extent
Jens reported a compiler error when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y
that looks like this

  In function ‘gather_device_info’,
      inlined from ‘btrfs_create_chunk’ at fs/btrfs/volumes.c:5507:8:
  fs/btrfs/volumes.c:5245:48: warning: ‘dev_offset’ may be used uninitialized [-Wmaybe-uninitialized]
   5245 |                 devices_info[ndevs].dev_offset = dev_offset;
	|                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
  fs/btrfs/volumes.c: In function ‘btrfs_create_chunk’:
  fs/btrfs/volumes.c:5196:13: note: ‘dev_offset’ was declared here
   5196 |         u64 dev_offset;

This occurs because find_free_dev_extent is responsible for setting
dev_offset, however if we get an -ENOMEM at the top of the function
we'll return without setting the value.

This isn't actually a problem because we will see the -ENOMEM in
gather_device_info() and return and not use the uninitialized value,
however we also just don't want the compiler warning so rework the code
slightly in find_free_dev_extent() to make sure it's always setting
*start and *len to avoid the compiler warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21 18:52:20 +02:00
Steve French
c8ebf077fb smb3: fix confusing debug message
The message said it was an invalid mode, when it was intentionally
not set.  Fix confusing message logged to dmesg.

Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-20 19:50:05 -05:00
Paulo Alcantara
7fb77d9c87 smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
Fix missing set of cifs_open_info_data::reparse_point when SMB2_CREATE
request fails with STATUS_IO_REPARSE_TAG_NOT_HANDLED.

Fixes: 5f71ebc412 ("smb: client: parse reparse point flag in create response")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-20 16:12:09 -05:00
Steve French
6e2e27e47c smb3: remove duplicate error mapping
In status_to_posix_error STATUS_IO_REPARSE_TAG_NOT_HANDLED was mapped
to both -EOPNOTSUPP and also to -EIO but the later one (-EIO) is
ignored. Remove the duplicate.

Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-20 16:04:51 -05:00