Weekly Progress

Work this week was pretty skewed: a lot of reading on some days and a lot of coding on others.

I now maintain thread-specific request handles that keep track of the socket, bytes sent, the cached file and other request metadata. I completely changed the old approach of using stages and now take over in stage 30, using read events directly on the sockets.
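To make that concrete, here is a rough sketch of what such a per-thread request handle could look like; the struct and field names are my own illustration, not the plugin's actual definitions.

#include <sys/types.h>

struct cache_file;            /* file cache entry, sketched further below */

/* Illustrative per-thread request handle: one per in-flight request,
 * owned by the worker thread handling the connection. */
struct cache_request {
    int socket;               /* client socket the read events fire on */
    off_t bytes_sent;         /* progress tracker for partial sends    */
    struct cache_file *file;  /* entry from the global file cache      */
    /* ... other request metadata (headers-sent flag, state, ...) ...  */
};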

I also maintain a global file cache hash table keyed by inode. Each entry contains file-related data, including an mmap of the whole file, a custom-sized pipe that keeps the initial chunk of the file in kernel memory, and other file statistics. Currently I use pthread rwlocks to keep the table consistent across threads.
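For illustration, an entry in that cache could be laid out roughly like this (again, the names are assumptions on my part, not the plugin's real structs):

#include <pthread.h>
#include <sys/types.h>

/* Illustrative file cache entry, keyed by inode in the global hash table. */
struct cache_file {
    ino_t inode;        /* hash table key                             */
    int fd;             /* open file descriptor (used by sendfile)    */
    void *mmap_addr;    /* mmap() of the whole file                   */
    size_t size;        /* file length in bytes                       */
    int pipe_fds[2];    /* pipe caching the initial chunk of the file */
    /* ... other file statistics (hit count, mtime, ...) ...          */
};

/* readers/writers lock guarding the global hash table */
static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;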

I started implementing different file-serving strategies. The first is the plain sendfile implementation that was already there last week. This week I added a basic mmap implementation that takes the mapping from the file cache and simply writes it to the socket; it worked, but it was never on par with the sendfile implementation in terms of performance. The third implementation improves on the second by using the Linux zero-copy APIs to push data from the mmap to the socket without (hopefully) copying any bytes: it gifts the pages to a kernel buffer (a pipe) using vmsplice and then splices the data directly to the socket. Normally pipes have a maximum size of 16 pages (64K in total), but I enlarge them with the Linux-specific F_SETPIPE_SZ fcntl, which in turn performs better with large files.

This third implementation also maintains another (essentially read-only) pipe per file holding the initial file data, to speed up small files that fit completely inside a pipe: for those it flushes the initial data straight from the pipe without even touching the mmap, and for bigger files it falls back to splicing from the mmap. The third implementation is now the default (although it still has some bugs when serving multiple files, which I suspect is rather a Monkey bug, but I still have to debug more) and it falls back to the first implementation if mmap doesn't work.
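A stripped-down sketch of that zero-copy path might look like the following. Error handling is minimal, the 1 MiB pipe size is just an example, and it assumes the region being sent fits in the enlarged pipe (larger files would need a vmsplice/splice loop):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gift pages from the file's mmap into a pipe with vmsplice(), then
 * splice() the pipe contents into the client socket. Falls back to
 * sendfile() if the splice path fails. */
static ssize_t serve_zero_copy(int sock, int file_fd, void *map, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    /* grow the pipe beyond the default 16 pages (64K) */
    fcntl(p[1], F_SETPIPE_SZ, 1 << 20);

    struct iovec iov = { .iov_base = map, .iov_len = len };
    ssize_t in  = vmsplice(p[1], &iov, 1, SPLICE_F_GIFT);
    ssize_t out = -1;
    if (in > 0)
        out = splice(p[0], NULL, sock, NULL, in, SPLICE_F_MOVE);

    close(p[0]);
    close(p[1]);

    /* fall back to plain sendfile() if the zero-copy path failed */
    if (out < 0) {
        off_t off = 0;
        out = sendfile(sock, file_fd, &off, len);
    }
    return out;
}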

In terms of performance, the current implementation is on par with the default Monkey static file server (although I was hoping for more this week), despite the fact that it maintains a global hash table to track the file caches and makes a lot more syscalls compared to the single main sendfile syscall in Monkey. Here are some numbers:

Without cache plugin (# ab -c500 -n10000 http://localhost:2001/webui-aria2/index.html)

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001

Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes

Concurrency Level:      500
Time taken for tests:   18.938 seconds
Complete requests:      47132
Failed requests:        0
Write errors:           0
Total transferred:      1265512772 bytes
HTML transferred:       1256433587 bytes
Requests per second:    2488.75 [#/sec] (mean)
Time per request:       200.904 [ms] (mean)
Time per request:       0.402 [ms] (mean, across all concurrent requests)
Transfer rate:          65257.76 [Kbytes/sec] received

With cache plugin (# ab -c500 -n10000 http://localhost:2001/webui-aria2/index.html)

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001

Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes

Concurrency Level:      500
Time taken for tests:   19.710 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      1332800000 bytes
HTML transferred:       1325500000 bytes
Requests per second:    2536.79 [#/sec] (mean)
Time per request:       197.100 [ms] (mean)
Time per request:       0.394 [ms] (mean, across all concurrent requests)
Transfer rate:          66035.77 [Kbytes/sec] received

Basically, without the cache plugin you get 2488 req/sec and with the cache plugin you get 2536 req/sec. Not a big boost, but it is essentially comparing the kernel's sendfile implementation, with its file readahead cache, against the plugin's performance.

I already cache small files completely in pipes, which reside in kernel memory. The plan for next week is to keep the entire file (provided it fits into memory) in kernel-resident pipes and try to write them directly to the sockets; that basically means maintaining the custom cache directly in the kernel, which I guess cannot be swapped out, whereas memory mappings can. I also want to cache the HTTP headers and add them to the file pipes as well, so everything can be sent in one chunk for even lower latency; that should reduce CPU time further while handling a request. It should be easy inside Monkey itself, but I need to find out how it can be done from inside a plugin.
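As a rough idea of how serving straight from a cached pipe could work (this is my own sketch, not what the plugin does yet): tee() can duplicate the cached pipe's contents into a scratch pipe without consuming the cache, and splice() can then push the scratch pipe into the socket.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Duplicate the cached pipe into a throw-away pipe with tee(), then
 * splice() that into the socket, leaving the cached pipe untouched.
 * Assumes the cached data fits within the scratch pipe's capacity. */
static ssize_t serve_from_pipe_cache(int sock, int cached_pipe_rd, size_t len)
{
    int scratch[2];
    if (pipe(scratch) < 0)
        return -1;

    ssize_t teed = tee(cached_pipe_rd, scratch[1], len, 0);
    ssize_t sent = -1;
    if (teed > 0)
        sent = splice(scratch[0], NULL, sock, NULL, teed, SPLICE_F_MOVE);

    close(scratch[0]);
    close(scratch[1]);
    return sent;
}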

github link: https://github.com/ziahamza/monkey-cache

blog: https://ziahamza.wordpress.com/