I implemented a working low level fuse proxy fs, which is usable for simple directory transversal. Its a simple single threaded read only proxyfs, and doesnt have the whole suite of optimisations that fuse guys have implemented.
The low level implementation is still highly unstable, and the deadlocks are somehow really hard to debug, I had a really tough time finding the problem as every time the process went into unterruptable sleep making it impossible to kill! even gdb wouldnt respond and the only option left is to restart the pc, turned out that the kernel deadlocks in a recursive syscall to the fuse fs and I still need to find a fix for it.
Although I do realize that going through this approach could take a some effort reimplementing all of the optimizations that default fuse high level apis have implemented, so I might end up copying over source files from the libfuse project or statically compile the project with the libfuse library with modifications. But I also started to look into maintaining an inode table over the high level api and then build caching on top of that, so that the optimisations from fuse come from free in exchange of some overhead of duplicating the mappings of inodes to paths, both at the fuse level and at our fs level. The codebase is pretty modular so for now I can switch between using both of the apis very easily and currently maintaining implementation over both of them.
At the same time I have also started to benchmark different approaches. So I got monkey running over the fuse fs and standard fs and then compared them. but initial benchmarks werent very good, the numbers are really bad with proxyfs, fuse guys have an overlay server in their codebase and even running benchmark over that gives some really bad numbers. For concurrent connection in ranges like hundreds, seems like the context overhead of fuse fs really starts to push the throughput down,
I used wrk as the benchmarking tool (https://github.com/wg/wrk), running at 400 concurrent connections, and duration of 30s, using standard fs over monkey we get:
but over proxyfs we get:
and with the fusexmp_fh overlay fs in libfuse sources we get similar results to the proxyfs:
Something is going terribly wrong in the fuse world, maybe its concurrent io requests but I dont really know for now, running something like callgrind over the the program suggests that most of the time is taken in the libfuse world. I have started a discussion over the libfuse devel mailing list and looking to get their advice.
I will continue to measure the perf numbers from now. anyways I have started initial work over a caching proxyfs. currently it maintains file descriptors over longer periods of time (even after the server closing them) so that the kernel doesnt drop the caches, but I would soon land an implementation using mmap and continue the experiments. The modular structure of the project turned out to be really useful as I have multiple approaches and experiements all in the same codebase (even both using high level fuse api and the low level) and they can easily be turned on and off individually using the makefile and compiled into a fuse fs.