GSOC 2013 Status: caching plugin week 14

I made quite a lot of progress this week. I profiled the CPU usage of the entire plugin using Google's performance tools [1] and found that the cache statistics were the main cause of the performance regression. I had suspected this, as they lock a mutex to increment the request counters. Since last week I had been working on a new timer abstraction in the cache plugin for running specific tasks after every interval. It has landed now: it uses the timerfd functionality in Linux and registers the fd with the Monkey scheduler to get a tick after every fixed interval (currently one second). I refactored all the statistics-related code into a new file, and this time only thread-local counters are updated on every request; the global counters are updated after every timer tick. And as the timer fires in only a single thread (whichever reads the timerfd first), no mutexes are needed. Performance is now back to where it used to be, and there is no performance regression whether requests are served with the plugin or without it, even though the plugin is a lot more complex and does a lot more.

With the timer infrastructure in place, I also added file eviction, so idle files which are evictable (they can be opened again, which is not the case for custom overlays added through the plugin's external API) are evicted. This should automatically reduce the footprint of the plugin when it is idle. I am also going to shift the request and pipe pools to the new technique soon, so that the limits can adjust automatically based on statistics over a specific period and cool down to zero extra footprint.
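A rough sketch of what such an eviction pass could look like when run from a timer tick. The struct layout, the field names and the max_idle threshold are all hypothetical; the real plugin keeps more state per file.

```c
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical, simplified view of a cached file. */
struct cache_file {
    int fd;             /* open descriptor, -1 once evicted */
    int evictable;      /* 0 for overlays that cannot be reopened from disk */
    int pending;        /* requests currently using this entry */
    time_t last_access; /* updated whenever the file is served */
};

/* Close every idle, evictable entry; returns how many were evicted. */
size_t cache_evict_idle(struct cache_file *files, size_t n,
                        time_t now, time_t max_idle)
{
    size_t evicted = 0;
    for (size_t i = 0; i < n; i++) {
        struct cache_file *f = &files[i];
        if (f->fd < 0 || !f->evictable || f->pending > 0)
            continue; /* already gone, not reopenable, or still in use */
        if (now - f->last_access < max_idle)
            continue; /* used recently enough to keep */
        close(f->fd);
        f->fd = -1;
        evicted++;
    }
    return evicted;
}
```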

I fixed quite a lot of bugs, mainly with request errors and HTTP pipelined requests. I also experimented this week with sending out the raw buffers directly using write (using mmapped buffers or just raw ones), and I could only get performance regressions. I was planning to use a key-value store to maintain raw file buffers in memory, but it turns out to be slower to push buffers to the socket than to just splice them or use sendfile over a cached fd (an fd whose contents are probably cached in kernel memory).

For the next week, I will stabilize the entire codebase, add more testing and make everything configurable. I will try to keep the required configuration to a minimum and make the system adjust automatically to the optimal levels (now that the timer infrastructure is in place, it can learn as it goes). I will continue the performance profiling to squeeze more performance out of it, and also find and fix bugs. I have had an awesome time embarking on this project, which taught me so much, way beyond anything I had done before in the low-level programming arena.

Github Project: https://github.com/ziahamza/monkey-cache

 

 

GSOC 2013 Status: caching plugin week 13

I mostly fiddled around with timerfds this week. I wanted to add timers so that statistics could easily be tracked without holding a global mutex, which caused some performance regression. I also tried to dynamically decide the limits for the number of cached requests and buffer pipes. Depending on requests/sec, the system could be more aggressive and then fall back to low memory usage when it is cold. This introduced quite a lot of bugs and is still pending, but I should push it soon and polish it up.

 

github: https://github.com/ziahamza/monkey-cache

 

GSOC 2013 Status: caching plugin week 12

Most of the work this week went into refactoring the code and polishing it further. I refactored quite a lot of code out of the main cache.c, and the code base is quite manageable now and is stabilizing. I found some bugs along the way and fixed them. The file caches now have a reference count so that they are reset correctly when there are pending requests still using the old file cache. The cache plugin deletes the file cache from the table but lets the pending requests use the original cache until they are served.
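The reference counting can be sketched roughly like this. It is an illustration under my own assumptions, not the code in the repository: the struct and function names are invented, and the table holds one reference while every pending request holds another.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified file cache entry. */
struct file_cache {
    atomic_int refs; /* 1 for the table, plus 1 per pending request */
    size_t size;     /* stand-in for the real cached state */
};

/* New entry, owned by the cache table. */
struct file_cache *file_cache_new(size_t size)
{
    struct file_cache *fc = malloc(sizeof(*fc));
    if (!fc)
        return NULL;
    atomic_init(&fc->refs, 1);
    fc->size = size;
    return fc;
}

/* A request starts using the entry. */
struct file_cache *file_cache_acquire(struct file_cache *fc)
{
    atomic_fetch_add(&fc->refs, 1);
    return fc;
}

/* Drop one reference; frees the entry and returns 1 when it was the last. */
int file_cache_release(struct file_cache *fc)
{
    if (atomic_fetch_sub(&fc->refs, 1) == 1) {
        free(fc);
        return 1;
    }
    return 0;
}
```

A reset then just removes the entry from the table and releases the table's reference; the memory goes away only once the last pending request releases its own.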

I also added more statistics to the webui this week and added 2 new real-time graphs (well, they update every second for now ;)). The first one is the requests served per second graph. I had wanted to add this for quite some time, and it has landed now. It is really useful to look at when stress testing the server. The other graph shows the memory used by the plugin in pipes. I only focus on the memory used by pipes, as the malloced memory is quite low.

I have also updated the README with more information about the API. The webui has also been polished, and works better on lower resolution devices like tablets. There was a bug in this regard with Bootstrap 3.0, so I reverted to 2.3.

That's it for now. To try it out, head over to the GitHub project: https://github.com/ziahamza/monkey-cache

 

 

GSOC 2013 Status: caching plugin week 11

I added cache ‘reset’ and ‘add’ APIs this week. The reset API evicts a file out of the cache pool altogether. The add API creates temporary storage (by creating a tmp file and unlinking it) and then caches the data. Add can even be used to overlay a certain resource over another cached file.
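The "create a tmp file and unlink it" trick can be sketched as follows. This is only an illustration: the function name and the /tmp path template are my assumptions, not what the plugin uses.

```c
#include <stdlib.h>
#include <unistd.h>

/* Copy a buffer into anonymous temporary storage and return an fd
 * positioned at the start of the data, or -1 on error. */
int cache_add_buffer(const void *data, size_t len)
{
    char path[] = "/tmp/monkey-cache-XXXXXX"; /* hypothetical location */
    const char *p = data;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    /* The name disappears right away; the storage lives until fd is closed. */
    unlink(path);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned fd can then be treated like the fd of any other cached file, except that it cannot be reopened by path once closed.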

The reset API is as follows: a GET request to /cache/reset/(uri for the file)

The add API: a POST request with the file data to /cache/add/(uri for the new file)

e.g.

curl -d 'hello world' localhost:2001/cache/add/first/file # will add a new file with contents hello world

curl localhost:2001/first/file # should display hello world

curl localhost:2001/cache/reset/first/file # should evict the file out of memory, should get 404 when trying to get first/file now

The webui can now delete files through the UI. It's still pretty basic, but I plan to improve it next week.

Internally a lot of changes happened. I added a pipe pool this week, to combine the pipe consumption of requests and file caches. But it turned out to be a bad idea in the end, as it was a thread-local cache and pipes ended up unevenly divided between threads when cached files were released. In the end I decided to let requests keep the pipes in their own cache and only support the pipe pool for files. I added more control over the memory usage pattern by allowing limits to be set (static constants for now) on the pipe pool and the request cache pool. Setting both to zero will bring the resources down to zero when everything is evicted and no request is pending. I will add an API to configure this soon.

For this week, that's it, folks!

Github account for the project: https://github.com/ziahamza/monkey-cache

GSOC 2013 Status: caching plugin week 10

A webui has been added to the plugin to monitor the cache. For now it's pretty simple and just dumps the stats API every second. Both the webui and the cache plugin's JSON API are pretty simple for now; they should evolve together now that everything is set up.

Basic mimetype support was also added. Monkey has functions to handle mime types in mk_mimetype.c, but they are not exposed to plugins, so I made a pretty simple mime type handler which looks into the configs, for now picks up 10 mime types, and looks them up by file extension. The headers are cached, so this only happens while the header cache is still cold/empty (for now only on the first request for a file).
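A lookup by file extension can be as small as the sketch below. Note the assumptions: the table here is hard-coded with a handful of common types, whereas the plugin reads its types from the configs, and mime_lookup is a made-up name.

```c
#include <stddef.h>
#include <string.h>

struct mime_entry {
    const char *ext;
    const char *type;
};

/* Hypothetical table; the real handler picks its entries up from config. */
static const struct mime_entry mime_table[] = {
    { "html", "text/html" },
    { "css", "text/css" },
    { "js", "application/javascript" },
    { "png", "image/png" },
    { "jpg", "image/jpeg" },
    { "txt", "text/plain" },
};

/* Map a path to a mime type by its extension, with a generic fallback. */
const char *mime_lookup(const char *path)
{
    const char *slash = strrchr(path, '/');
    /* Only look for the dot in the last path component. */
    const char *dot = strrchr(slash ? slash : path, '.');
    if (dot) {
        for (size_t i = 0; i < sizeof(mime_table) / sizeof(mime_table[0]); i++) {
            if (strcmp(dot + 1, mime_table[i].ext) == 0)
                return mime_table[i].type;
        }
    }
    return "application/octet-stream";
}
```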

The API now has a path field for each cached file. I changed the API prefix from monkey-cache to just cache for now, which is easier to develop with 🙂

To view the webui, just go to /cache/webui/index.html and watch the numbers change as you request different files in another window.

To view the raw API, go to /cache/stats

Other than that, I have refactored a lot of code out of the main plugin_30 handler, and a few bugs have been fixed. One really nasty one was due to the logger plugin expecting the HTTP status to be set. That was not the case when the cache plugin had the headers cached: there was no need to set the status, as it just had to dump the headers directly from memory. It's fixed now; the cache plugin always sets the HTTP status code.

Github: https://github.com/ziahamza/monkey-cache

GSOC 2013 Status: caching plugin week 9

This week I have been thinking about how APIs could fit into monkey cache. The plan is to add the APIs and then build a web interface to see the statistics and controls right in the browser (maybe updated after an interval, or in real time in the case of web sockets). I would probably need to add some authentication mechanism, which still needs a little thought but shouldn't be hard to add, as a Monkey HTTP auth plugin is already in place and I could integrate with it.

I have added a JSON library to the project. I looked into the json package in duda.io and used the library that it uses, namely cJSON. It is pretty lightweight and nice.

I have also added a simple filter for the path /monkey-cache/stats which for now dumps simple numbers about the current state of the Monkey server. It lists all the cached files with their inodes and sizes. It also gives the size of the pipes used in the cache and the total amount of memory consumed by pipes (which is the only significant footprint of the plugin), which includes files, cached headers and temporary request files.

That's all for now. I hope I can increase my pace even further from next week, as my exams are over and I am leaving university for my hometown, where I should be able to work with basically no distractions whatsoever.

GSOC 2013 Status: caching plugin week 8

This week I polished the HTTP header caching in the cache plugin. It works pretty well, including with HTTP pipelining. For small files a separate pipe is now created with the cached header data along with the other file data. Currently the lifetime of the cache is infinite, but that shouldn't be hard to change, and I will start to add configurable limits, along with other parameters like the max file size, pipe sizes etc.

Now the header data and the initial file data live in the same pipe which is a bit more efficient than last time where header and file contents were put in a temporary pipe once and then were flushed.

My last exam finished today (the main reason why the weekly progress post got delayed), so I should pick up the pace of the project even further from now on. I hope to get the initial JSON API implementation out, along with a simple web UI to accompany it, by next week. I also want to add configuration support for all the static parameters in the codebase.

GSOC 2013 Status: caching plugin week 7

Sorry about the late progress post; the internet at our house went down yesterday and only got fixed today.

Not a lot of coding happened this week. One of the reasons was that my university exams started; they should finish this week. But I plan to make up for the lost time and still contribute at my normal pace as much as possible.

I did do some reading about http pipelined requests, and I have a branch for working on http header caching implemented. I want to make it generic so that arbitrary data can be added. My plan is to also add an api such that any process can add data in cache using an http api, which should also allow a lot of possibilities to open up. Having a caching infrastructure for just files may not be a big gain but it would be very useful for caching dynamic resources.

GSOC 2013 Status: caching plugin week 6

The whole caching pipeline was streamlined this week. The cache plugin now handles the lifetime of a connection the right way, so HTTP pipelined requests work transparently. Thanks to Sonny for clarifying how it is supposed to be implemented.

Currently all the cached files have their fds opened the first time they are accessed. An mmap is maintained over that fd, and since this week the cache plugin also maintains an on-demand list of pipes which are just pointers to the mapped memory (using vmsplice). If a file is already cached, then the pipes are used to splice content directly to the request. Before, only one pipe was maintained.

A request pool is also used now to maintain ready-made request structures along with temporary pipe buffers to splice data to the socket. The same pipes are used again when the request is closed and a new one comes in. So for the lifecycle of a request for a cached file, only a tee followed by a splice is called to completely handle the request, along with the syscalls for sending the headers. Compare that to the normal scenario, where open is called, followed by sendfile and then close, along with the syscalls to handle headers.
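Here is a minimal sketch of the tee-then-splice step, assuming the file body already sits in a cache pipe and the request owns a reusable temporary pipe. The function name and parameters are illustrative, not taken from the plugin.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Send up to len bytes held in the cache pipe to out_fd without draining
 * the cache pipe. tmp_rd/tmp_wr are the two ends of the request's pipe.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t serve_from_pipe(int cache_rd, int tmp_rd, int tmp_wr,
                        int out_fd, size_t len)
{
    /* tee duplicates the pipe contents; the cache pipe stays full. */
    ssize_t copied = tee(cache_rd, tmp_wr, len, 0);
    if (copied <= 0)
        return copied;

    /* splice moves the duplicated data on to the socket (or any fd). */
    ssize_t left = copied;
    while (left > 0) {
        ssize_t n = splice(tmp_rd, NULL, out_fd, NULL, (size_t)left,
                           SPLICE_F_MOVE);
        if (n <= 0)
            return -1;
        left -= n;
    }
    return copied;
}
```

Because tee does not consume its input, the same cache pipe can serve the next request with exactly the same two calls.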

Currently I am working on caching headers, but they are not streamlined yet for all cases. That should come very soon, so that the whole request lifecycle only ever requires 2 syscalls to send data if it is all cached and all the response data fits inside the pipe. Right now the plugin caches all possible files, but I will soon restrict it to cases where it has a chance to optimize, like files which, including headers, can fit in a couple of pipes, and resort to normal request handling for the others.

Weekly Progress

Work this week was pretty skewed, a lot of reading on some days and a lot of coding on others. 

I now maintain thread-specific request handles which keep track of the socket, bytes sent, cached file and other request metadata. I completely changed the old approach of using stages and now take over in stage30, using read events directly on sockets.

I also maintain a global file cache hash table based on inodes. It contains file related data including an mmap to the whole file, a custom sized pipe for storing initial chunks of file in kernel and other file statistics. Currently I use pthread rwlocks to maintain consistency across threads.

I started to implement different file serving implementations. The first one was simply a sendfile implementation, which had been there since last week. This week I added a basic mmap implementation which takes the mmap from the file cache and just writes it down the socket. It worked, but it was never on par with the sendfile implementation in terms of performance. The third implementation is an improvement over the second one, where I use the Linux zero-copy kernel APIs to push data directly from the mmap to the socket without (hopefully) copying any bytes. It gifts bytes over to a kernel buffer (or pipe) using vmsplice and then splices the data directly to the socket. Normally pipes have a max size of 16 pages (64K in total), but I change it using the Linux-specific F_SETPIPE_SZ fcntl command to a bigger size, which in turn performs better with large files. This implementation also maintains another (kind of read-only) pipe with the initial file data for every file, to improve performance for small files which fit completely inside a pipe. With this it flushes the initial file data directly from the pipe without even touching the mmaps, and for bigger files resorts to splicing from the mmaps. The third implementation (although it contains some bugs when used with multiple files, which I think is rather a Monkey bug, but I still have to debug more) is the default implementation and falls back to the first implementation if mmap doesn't work.
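The two Linux-specific calls involved, resizing a pipe and handing user memory to it, can be sketched like this. The helper names are mine and the 256K size used in the note below is an arbitrary example; the flags argument is where SPLICE_F_GIFT would go when the buffer is page-aligned mapped memory that will not be modified afterwards.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create a pipe and try to grow it to `wanted` bytes.
 * Returns the resulting capacity in bytes, or -1 if pipe() failed. */
int pipe_create_sized(int fds[2], int wanted)
{
    if (pipe(fds) < 0)
        return -1;
    /* On success F_SETPIPE_SZ returns the capacity actually set. */
    int got = fcntl(fds[1], F_SETPIPE_SZ, wanted);
    if (got < 0)
        got = fcntl(fds[1], F_GETPIPE_SZ); /* keep the default size */
    return got;
}

/* Hand a memory region (e.g. part of an mmap) to the pipe.
 * Returns the number of bytes the pipe accepted, or -1 on error. */
ssize_t pipe_fill_from_memory(int pipe_wr, void *mem, size_t len,
                              unsigned int flags)
{
    struct iovec iov = { .iov_base = mem, .iov_len = len };
    return vmsplice(pipe_wr, &iov, 1, flags);
}
```

For example, pipe_create_sized(fds, 256 * 1024) followed by pipe_fill_from_memory(fds[1], map, len, SPLICE_F_GIFT) would load the start of a mapped file into a larger pipe, from where it can be spliced to a socket. An unprivileged process can only grow a pipe up to the limit in /proc/sys/fs/pipe-max-size.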

In terms of performance, the implementation is currently on par with the default Monkey static file server (although I was hoping for more this week), despite the fact that it maintains a global hash table for keeping track of file caches and does a lot more syscalls compared to the single main sendfile syscall in Monkey. Here are some numbers:

Without cache plugin (# ab -c500 -n10000 http://localhost:2001/webui-aria2/index.html)

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001

Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes

Concurrency Level:      500
Time taken for tests:   18.938 seconds
Complete requests:      47132
Failed requests:        0
Write errors:           0
Total transferred:      1265512772 bytes
HTML transferred:       1256433587 bytes
Requests per second:    2488.75 [#/sec] (mean)
Time per request:       200.904 [ms] (mean)
Time per request:       0.402 [ms] (mean, across all concurrent requests)
Transfer rate:          65257.76 [Kbytes/sec] received

With cache plugin (# ab -c500 -n10000 http://localhost:2001/webui-aria2/index.html)

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001

Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes

Concurrency Level:      500
Time taken for tests:   19.710 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      1332800000 bytes
HTML transferred:       1325500000 bytes
Requests per second:    2536.79 [#/sec] (mean)
Time per request:       197.100 [ms] (mean)
Time per request:       0.394 [ms] (mean, across all concurrent requests)
Transfer rate:          66035.77 [Kbytes/sec] received

Basically, without the cache plugin you get 2488 req/sec and with the cache you get 2536 req/sec. Not a big boost, but it's basically comparing the kernel sendfile implementation using the file readahead cache with the plugin's performance.

I already cache small files completely in pipes which reside in kernel memory. The plan for next week is to get the entire file (given that it fits into memory) into pipes residing in the kernel and try to write them directly to sockets (basically maintaining the custom cache directly in the kernel, which I guess can't be swapped out, whereas memory mappings can). I also want to cache HTTP headers and append them to the file pipes for even lower latency, and send them all in one chunk, which should reduce the CPU time spent handling a request even further. That should be easy inside Monkey, but I need to find out how it can be done inside a plugin.

github link: https://github.com/ziahamza/monkey-cache

blog: https://ziahamza.wordpress.com/