I made quite a lot of progress this week. I profiled the CPU usage of the entire plugin using Google's performance tools and found that the cache statistics were causing a performance regression. I had suspected this, as they lock a mutex to increment the request counters.

Since last week I had been working on a new timer abstraction in the cache plugin for running specific tasks at a fixed interval. It has landed now: it uses the timerfd functionality in Linux and registers the fd with the Monkey scheduler to get ticks at a fixed period (currently one second). I refactored all the statistics-related code into a new file, and now only thread-local counters are updated on every request; on every timer tick, the global counters are updated. As the timer only fires in a single thread (whichever reads the timerfd first), no mutexes are needed. Performance is now back to where it used to be, and there is no regression when serving requests with the plugin compared to without it, even though the plugin is now a lot more complex and does a lot more.
I also added file-eviction functionality now that the timer infrastructure is in place: idle files that are evictable (those that can be opened again, which is not the case for custom overlays added through the plugin's external API) are evicted. This should automatically reduce the plugin's footprint at idle. I also plan to move the request and pipe pools to the same technique soon, so the limits can adjust automatically based on statistics over a given period, and cool down to zero extra footprint.
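In sketch form, the tick-driven eviction pass might look like this (the entry layout and the idle threshold are made up for illustration; real code would also free the entry's pipes and cached headers):

```c
#include <time.h>
#include <stdbool.h>

struct cache_entry {
    struct cache_entry *next;
    time_t last_used;    /* updated on every cache hit */
    bool evictable;      /* false for overlays added via the external API */
};

#define IDLE_SECS 30     /* illustrative idle threshold */

/* called on each timer tick: unlink idle entries that can be re-opened
 * from disk later; returns the (possibly new) list head */
struct cache_entry *cache_evict_idle(struct cache_entry *head, time_t now)
{
    struct cache_entry **pp = &head;
    while (*pp) {
        struct cache_entry *e = *pp;
        if (e->evictable && now - e->last_used > IDLE_SECS)
            *pp = e->next;   /* unlink; real code would also release resources */
        else
            pp = &e->next;
    }
    return head;
}
```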
I fixed quite a lot of bugs, mainly around request errors and HTTP pipelined requests. I also experimented this week with sending out raw buffers directly using write() (both mmap'd buffers and plain ones), and I could only get performance regressions. I was planning to use a key-value store to maintain raw file buffers in memory, but it turns out to be slower to push buffers to the socket that way than to just splice them or use sendfile over a cached fd (an fd whose contents are likely already cached in kernel memory).
For next week, I will stabilize the entire codebase, add more testing, and make everything configurable. I will try to keep the configuration surface as small as possible and make the system adjust itself to optimal levels automatically (now that the timer infrastructure is in place, it can learn as it goes). I will continue profiling to squeeze more performance out of it, and also find and fix bugs. I have had an awesome time on this project; it has taught me so much, well beyond anything I had done before in low-level programming.
GitHub project: https://github.com/ziahamza/monkey-cache
I mostly fiddled around with timerfds this week. I wanted to add timers so that statistics could be tracked without holding a global mutex, which was causing a performance regression. I also tried to implement dynamically chosen limits for the number of cached requests and buffer pipes: depending on requests/sec, the system could be more aggressive, then fall back to low memory usage when it is cold. This introduced quite a lot of bugs and is still pending, but I should push it soon and polish it up.
Most of the work this week went into refactoring the code and polishing it further. I moved quite a lot of code out of the main cache.c, and the codebase is now quite manageable and is stabilizing. I found some bugs along the way and fixed them. File caches now have a reference count so they can be reset correctly while pending requests are still using the old file cache: the plugin deletes the file cache from the table but lets pending requests keep using the original cache until they are served.
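The reference-counting scheme can be sketched roughly like this (field and function names are hypothetical, and the real entry holds pipes, headers, and an fd):

```c
#include <stdlib.h>

struct file_cache {
    int refs;     /* pending requests still serving from this entry */
    int removed;  /* already deleted from the lookup table */
    /* ... pipes, cached headers, file fd ... */
};

/* a request starts using the entry */
struct file_cache *cache_get(struct file_cache *fc)
{
    fc->refs++;
    return fc;
}

/* a request is done: free only once the entry has been removed from
 * the table AND the last pending request has finished with it */
void cache_put(struct file_cache *fc)
{
    if (--fc->refs == 0 && fc->removed)
        free(fc);
}

/* reset: unlink from the table but let in-flight requests finish */
void cache_remove(struct file_cache *fc)
{
    fc->removed = 1;
    if (fc->refs == 0)
        free(fc);
}
```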
I also added more statistics to the webui this week, including two new real-time graphs (well, they update every second for now ;)). The first is a requests-served-per-second graph. I had wanted to add this for quite some time, and it has landed now; it is really useful to watch when stress-testing the server. The other graph shows the plugin's memory usage in pipes. I focus only on memory used by pipes, as the malloc'd memory is quite low.
I have also updated the README with more information about the API. The webui has been polished too, and now works better on lower-resolution devices like tablets. There was a bug in this regard with Bootstrap 3.0, so I reverted to 2.3.
That's it for now. To try it out, head over to the GitHub project: https://github.com/ziahamza/monkey-cache
I added cache 'reset' and 'add' APIs this week. The reset API evicts a file from the cache pool altogether. The add API creates temporary storage (by creating a tmp file and unlinking it) and then caches the data. Add can even be used to overlay a resource over another cached file.
The reset API works as follows: a GET request to /cache/reset/(uri of the file).
The add API: a POST request with the file data to /cache/add/(uri of the new file).
curl -d 'hello world' localhost:2001/cache/add/first/file # adds a new file with contents "hello world"
curl localhost:2001/first/file # should display "hello world"
curl localhost:2001/cache/reset/first/file # evicts the file from memory; requesting first/file should now return 404
The webui now has the ability to delete files through the UI. It's still pretty basic, but I plan to improve it next week.
Internally a lot changed. I added a pipe pool this week to combine the pipe consumption of requests and file caches, but it turned out to be a bad idea in the end: the pool was thread-local, and pipes ended up unevenly distributed across threads when cached files were released. In the end I decided to let requests keep the pipes in their own cache and only support the pipe pool for files. I also added more control over the memory-usage pattern by making the pipe pool and request cache pool limits settable (for now as static constants). Setting both to zero brings resource usage down to zero once everything is evicted and no request is pending. I will add an API to configure this soon.
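A sketch of what such a bounded pipe pool can look like (the limit constant and names are illustrative; real code would also have to make sure reused pipes are drained, and a limit of zero means no idle pipes are kept around):

```c
#include <unistd.h>

#define PIPE_POOL_LIMIT 8  /* illustrative; 0 shrinks the pool to nothing */

static int pool_fds[PIPE_POOL_LIMIT][2];  /* cached {read, write} fd pairs */
static int pool_len;

/* hand back a cached pipe pair, or create a fresh one */
int pipe_acquire(int fds[2])
{
    if (pool_len > 0) {
        pool_len--;
        fds[0] = pool_fds[pool_len][0];
        fds[1] = pool_fds[pool_len][1];
        return 0;
    }
    return pipe(fds);
}

/* keep the pipe for reuse while under the limit, otherwise close it */
void pipe_release(int fds[2])
{
    if (pool_len < PIPE_POOL_LIMIT) {
        pool_fds[pool_len][0] = fds[0];
        pool_fds[pool_len][1] = fds[1];
        pool_len++;
    } else {
        close(fds[0]);
        close(fds[1]);
    }
}
```

With a thread-local pool like this, releasing many cached files from one thread fills only that thread's pool, which is the imbalance described above.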
That's it for this week, folks!
GitHub repository for the project: https://github.com/ziahamza/monkey-cache
A webui has been added to monitor the cache plugin. For now it's pretty simple and just dumps the stats API every second. Both the webui and the cache plugin's JSON API are still basic; I should evolve them together now that everything is set up.
Basic mimetype support was also added. Monkey has functions to handle MIME types in mk_mimetype.c, but they are not exposed to plugins, so I wrote a pretty simple MIME-type handler that looks into the configs; for now it picks up ten MIME types and looks them up by file extension. The headers are cached, so this only happens while the header cache is still cold/empty (for now, only on the first request for a file).
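In the spirit of that handler, an extension lookup can be as simple as this sketch (the table below is illustrative, not the plugin's actual config):

```c
#include <string.h>

/* illustrative extension -> MIME type table */
static const struct { const char *ext, *mime; } mime_table[] = {
    { "html", "text/html" },
    { "css",  "text/css" },
    { "js",   "application/javascript" },
    { "png",  "image/png" },
    { "txt",  "text/plain" },
};

/* find the last '.' in the path and match the extension against the table */
const char *mime_lookup(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (!dot)
        return "application/octet-stream";  /* fallback type */
    for (size_t i = 0; i < sizeof(mime_table) / sizeof(mime_table[0]); i++)
        if (strcmp(dot + 1, mime_table[i].ext) == 0)
            return mime_table[i].mime;
    return "application/octet-stream";
}
```

Because the resulting header is cached with the file, this lookup cost is paid only while the header cache is cold.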
The API now has a path field for each cached file. I changed the API prefix from monkey-cache to just cache for now; it's easier to develop with 🙂
To view the webui, just go to /cache/webui/index.html and watch the numbers change as you request different files in another window.
To view the raw API, go to /cache/stats.
Other than that, I have refactored a lot of code out of the main plugin_30 handler, and a few bugs have been fixed. One really nasty one was due to the logger plugin expecting the HTTP status to be set, which was not the case: the cache plugin had the headers cached, so there was no need to set the status; it just dumped the headers directly from memory. It's fixed now; the cache plugin always sets the HTTP status code.
This week I have been thinking about how APIs could fit into monkey-cache. The plan is to add the APIs and then build a web interface to see the statistics and controls right in the browser (maybe updated at an interval, or in real time with WebSockets). I will probably need some authentication mechanism, which still needs a little thought, but it shouldn't be hard to add since a Monkey HTTP auth plugin is already in place and I could integrate it.
I have added a JSON library to the project. I looked into the JSON package in duda.io and adopted the library it uses, namely cJSON. It is pretty lightweight and nice.
I have also added a simple filter for the path /monkey-cache/stats, which for now dumps simple numbers about the current state of the Monkey server. It lists all the cached files with their inodes and sizes. It also gives the size of the pipes used by the cache and the total amount of memory consumed by pipes (the only significant footprint of the plugin), which covers files, cached headers, and temporary request files.
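The plugin builds its payload with cJSON; purely to illustrate what shape such a stats response might take, here is a hand-rolled sketch with made-up field names (the real endpoint's fields may differ):

```c
#include <stdio.h>
#include <stdint.h>

/* format a one-file stats payload into buf; field names are illustrative */
int stats_to_json(char *buf, size_t len,
                  uint64_t inode, uint64_t fsize, uint64_t pipe_mem)
{
    return snprintf(buf, len,
        "{\"files\":[{\"inode\":%llu,\"size\":%llu}],"
        "\"pipe_mem\":%llu}",
        (unsigned long long)inode,
        (unsigned long long)fsize,
        (unsigned long long)pipe_mem);
}
```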
That's all for now. I hope to increase my pace even further from next week: my exams are over and I am leaving university for my hometown, where I should be able to work with basically no distractions whatsoever.
This week I polished the HTTP header caching in the cache plugin. It now works pretty well, including with HTTP pipelining. For small files, a separate pipe is now created holding the cached header data along with the file data. Currently the life of the cache is infinite, but it shouldn't be hard to add limits, and I will start making them configurable along with other parameters like maximum file size, pipe sizes, etc.
Now the header data and the initial file data live in the same pipe, which is a bit more efficient than before, when the header and file contents were pushed into a temporary pipe once and then flushed.
My last exam finished today (the main reason this weekly progress report got delayed), so I should be able to pick up the pace of the project even further from now on. I hope to get the initial JSON API implementation out, along with a simple web UI to accompany it, by next week. I also want to add configuration support for all the static parameters in the codebase.