本文主要是介绍linux驱动资源没有及时释放排查,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
linux驱动资源没有及时释放排查
之前项目过程有遇到一个问题,明明应用已经close fd了,但是再次open设备的时候会出现“device is busy”的情况。刚开始出现这个问题的时候,还以为是应用没有及时的close fd导致的异常,同时排查了内核close设备的流程,close流程如下:
// fs/open.c
SYSCALL_DEFINE1(close, unsigned int, fd)close_fd(fd)filp_close(file, files)filp->f_op->flush(filp, id)fput(filp);fput_many(file, 1)
通过上面,并没有发现有相关的 file->f_op->release(inode, file) 行为,那么这个驱动的释放,到底是在哪里进行的呢?我们再关注一下 fput_many() 函数的实现。
static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);void fput_many(struct file *file, unsigned int refs)
{// 对file句柄的计数-1并测试是否为0,返回true则是可释放if (atomic_long_sub_and_test(refs, &file->f_count)) {struct task_struct *task = current;if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {init_task_work(&file->f_u.fu_rcuhead, ____fput);if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))return;/** After this task has run exit_task_work(),* task_work_add() will fail. Fall through to delayed* fput to avoid leaking *file.*/}if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))schedule_delayed_work(&delayed_fput_work, 1); // 最后这里调度delayed_fput_work,也就是调用delayed_fput()}
}void fput(struct file *file)
{fput_many(file, 1);
}
在 delayed_fput() 函数中,最后调用到 __fput() 函数。
/* the real guts of fput() - releasing the last reference to file*/
static void __fput(struct file *file)
{struct dentry *dentry = file->f_path.dentry;struct vfsmount *mnt = file->f_path.mnt;struct inode *inode = file->f_inode;fmode_t mode = file->f_mode;if (unlikely(!(file->f_mode & FMODE_OPENED)))goto out;might_sleep();fsnotify_close(file);/** The function eventpoll_release() should be the first called* in the file cleanup chain.*/eventpoll_release(file);locks_remove_file(file);ima_file_free(file);if (unlikely(file->f_flags & FASYNC)) {if (file->f_op->fasync)file->f_op->fasync(-1, file, 0);}if (file->f_op->release)file->f_op->release(inode, file); //真正在,在这里才会进行驱动的释放if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&!(mode & FMODE_PATH))) {cdev_put(inode->i_cdev);}fops_put(file->f_op);put_pid(file->f_owner.pid);put_file_access(file);dput(dentry);if (unlikely(mode & FMODE_NEED_UNMOUNT))dissolve_on_fput(mnt);mntput(mnt);
out:file_free(file);
}
那么,回头我们的问题,为什么应用调用了close函数,驱动却没有释放呢?从上面的代码流程来看,只有一个可能,那就是这个file的引用计数不为0,还有其他地方在引用,导致无法release。
在内核搜索代码可以发现,调用 get_file() 函数,将会导致这个引用计数f_count自增。
最后分析代码发现,在open的时候, 没有用O_CLOEXEC flag,导致进程中如果出现popen或者system打开的进程将会拷贝一份当前进程的fd信息,导致资源引用计数+1,需要等待所有进程都退出后,fd的引用计数才为0。
所以针对这个问题,只需要在open节点的时候,增加O_CLOEXEC
这个标识即可。
下面附上O_CLOEXEC 这个标识的作用说明:
O_CLOEXEC (since Linux 2.6.23)Enable the close-on-exec flag for the new file descriptor. Specifying this flag permits a program to avoid additional fcntl(2) F_SETFD operations to set the FD_CLOEXEC flag.Note that the use of this flag is essential in some multithreaded programs, because using a separate fcntl(2) F_SETFD operation to set the FD_CLOEXEC flag does not suffice to avoid race conditions where one threadopens a file descriptor and attempts to set its close-on-exec flag using fcntl(2) at the same time as another thread does a fork(2) plus execve(2). Depending on the order of execution, the race may lead to thefile descriptor returned by open() being unintentionally leaked to the program executed by the child process created by fork(2). (This kind of race is in principle possible for any system call that creates a filedescriptor whose close-on-exec flag should be set, and various other Linux system calls provide an equivalent of the O_CLOEXEC flag to deal with this problem.)
这个标识,在多线程的程序中是必不可少的,避免open返回的文件描述符无意泄漏给fork创建的子进程。
这篇关于linux驱动资源没有及时释放排查的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!