Re: [arch-d] HTTP 2.0 + Performance improvement Idea

Rakshith Venkatesh <vrock28@gmail.com> Wed, 26 March 2014 06:21 UTC

From: Rakshith Venkatesh <vrock28@gmail.com>
Date: Wed, 26 Mar 2014 11:50:58 +0530
Message-ID: <CANw0z+U2tYkBmU4zP_HLxN1xR0p0ko08ZAsUhzeZfqMXk6c4pw@mail.gmail.com>
To: Joe Touch <touch@isi.edu>
Archived-At: http://mailarchive.ietf.org/arch/msg/architecture-discuss/DMlMQhHsfvKW7_Ic8-fpfBBPXLY
Cc: architecture-discuss@ietf.org
Subject: Re: [arch-d] HTTP 2.0 + Performance improvement Idea

I think I should explain what was done and what improvements we saw. What I
meant was: let's say the block size of a file system is 64 KB. Now, if a file
is, say, 512 GB, in the normal world I would fetch 64 KB of data sequentially
until I had read all of it. Every 64 KB of data sent across the wire could get
chunked by TCP, and the chunks would get re-ordered at the client side to
present the 64 KB of data to the application intact. Please note that I am not
referring to this re-ordering as part of my idea.

Further, what we did was divide the whole file into 64 KB zones and have
X workers waiting to fetch each zone. So workers 1 to X went ahead and read
the first X zones of data. Whichever of these X workers came back first was
assigned the task of reading zone X+1, and so on and so forth.
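As a rough sketch of the scheme (illustrative Python, not the code we
actually ran; the worker count and the read_zone helper are assumptions
made for the example):

    import concurrent.futures

    ZONE_SIZE = 64 * 1024    # 64 KB zones, matching the file-system block size
    NUM_WORKERS = 8          # the "X" workers

    def read_zone(path, zone):
        """Read one 64 KB zone at its own offset within the file."""
        with open(path, "rb") as f:
            f.seek(zone * ZONE_SIZE)
            return zone, f.read(ZONE_SIZE)

    def parallel_read(path, total_zones):
        zones = {}                        # zone index -> data read so far
        next_zone = min(NUM_WORKERS, total_zones)
        with concurrent.futures.ThreadPoolExecutor(NUM_WORKERS) as pool:
            # Workers 1 to X start on the first X zones.
            pending = {pool.submit(read_zone, path, z) for z in range(next_zone)}
            while pending:
                done, pending = concurrent.futures.wait(
                    pending, return_when=concurrent.futures.FIRST_COMPLETED)
                for future in done:
                    zone, data = future.result()
                    zones[zone] = data
                    # Whoever comes back first is handed zone X+1, and so on.
                    if next_zone < total_zones:
                        pending.add(pool.submit(read_zone, path, next_zone))
                        next_zone += 1
        return zones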

This experiment saw a minimum performance improvement of 20-30% in the time
taken to read the file. The only thing that stopped us from sending the reads
across the wire was that the client expected the data bytes to be in order.
That is, if any worker came back before the first worker, I could not send the
data that worker had fetched, because of the constraint I mentioned
previously.

This also led to a situation where I had to hold onto buffer resources even
after I had finished reading sub-portions of the file, because the fetch of
the whole file was not yet complete.
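
Concretely, the in-order constraint forces the sender side into the
following shape (a minimal sketch with illustrative names):

    class InOrderSender:
        """Holds completed zones until every earlier zone is also done."""

        def __init__(self):
            self.held = {}           # zone index -> data, waiting on a gap
            self.next_to_send = 0    # the only zone allowed on the wire next

        def zone_done(self, zone, data, send):
            self.held[zone] = data
            # Only a contiguous prefix may be sent; everything after a
            # gap stays buffered, which is where the memory is held.
            while self.next_to_send in self.held:
                send(self.held.pop(self.next_to_send))
                self.next_to_send += 1

If the wire format carried an offset tag instead, zone_done could call
send(data) immediately, and both the buffering and the reordering loop
would move to the client side.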


On Wed, Mar 26, 2014 at 10:46 AM, Joe Touch <touch@isi.edu> wrote:

>
> On Mar 25, 2014, at 6:01 PM, Rakshith Venkatesh <vrock28@gmail.com> wrote:
>
> > I am not sure if an NFS client can accept blocks out of order from a
> > server. I need to check on that.
>
> NFS should be matching read responses to requests; order should not
> matter, esp. because when using UDP the requests and responses can be
> reordered by the network.
>
> > If it does, then NFS over HTTP sounds good. The way I see it, any file
> > service protocol, such as NFS, SMB (with large MTUs), HTTP, or FTP,
> > should have a way to help the server achieve true parallelism in reading
> > a file without worrying about sending data/blocks in order.
>
> FTP has block mode that should support any order of response, but if
> they're all going over the same TCP connection it can help only with the
> source access order rather than network reordering.
>
> HTTP has chunking that should help with this (e.g., using a chunk
> extension field that indicates chunk offset).
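
A minimal sketch of the framing Joe describes here, for reference: each
HTTP/1.1 chunk would carry a chunk extension giving the byte offset of
the chunk within the file (the "offset" extension name is hypothetical;
no standard chunk extension defines one):

    def make_chunk(data: bytes, offset: int) -> bytes:
        """Frame one chunk as: chunk-size ";" "offset" "=" value CRLF
        chunk-data CRLF, per the HTTP/1.1 chunk-extension syntax."""
        header = "{:x};offset={}\r\n".format(len(data), offset).encode("ascii")
        return header + data + b"\r\n"

    # A server could then emit zones in completion order, e.g.
    # make_chunk(zone_bytes, zone_index * 65536), and a client-side
    # module could reassemble the body by offset before handing it
    # to the application.
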
>
> However, most file systems are engineered to manage files via in-order
> access, even when parallelized, so I'm not sure how much this will all
> help. I.e., I don't know whether your assumption, that overriding the
> serial access will be faster, is valid.
>
> It might be useful to first show that you can get to the components of a
> file faster, as you assume.
>
> Joe
>
>
> >
> > Rakshith
> >
> >
> > On Wed, Mar 26, 2014 at 2:47 AM, Joe Touch <touch@isi.edu> wrote:
> > You sound like you're looking for NFS; maybe you should propose an "NFS
> > over HTTP" variant where responses can be issued out of order?
> >
> > Joe
> >
> >
> > On 3/25/2014 4:56 AM, Rakshith Venkatesh wrote:
> > Hi,
> >
> > I was going through the SPDY draft, i.e. the initial HTTP 2.0 draft. I
> > had an idea which I think would require a change in its architecture,
> > and so I am dropping this mail. Here is the idea:
> >
> > An HTTP client expects the server to send data in order. (NOTE: When I
> > say a server, I am referring to an appliance to which disks are attached
> > and on which the file resides.) If there is a file of, let's say, 10GB,
> > and the client asks for this file from the server, the data has to be
> > given in order from the 1st byte till the last byte. Now let's say I
> > implement an engine at the server side to do a parallel read on this
> > huge file, fetching data at various offsets within the file. I will be
> > able to fetch the data faster for sure, but I will not be able to send
> > the data across immediately as and when I fetch it. I am expected to
> > finish all the parallel reads on the file, until I have read all the
> > bytes, and only then send them across the wire to the client.
> >
> > Now, if we can have some tag or header introduced as part of HTTP 2.0
> > which can help re-order the byte stream at the session layer, or at a
> > layer in between the application and session layers, we could
> > potentially improve the performance of file reads using HTTP: the new
> > module would look at this new tag, rearrange the data based on it, and
> > eventually present the data to HTTP so that it all looks seamless.
> >
> > So the server can just do parallel reads on the same file at various
> > offsets, without worrying about ordering, and send the chunks as they
> > are read; this new module sitting at the client side can intervene,
> > look at some form of tag/header to swap the data into place, wait until
> > all the data is received, and then present it to the application
> > protocol.
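
A minimal sketch of such a client-side module (illustrative; this
version hands over each contiguous prefix as soon as it is complete,
rather than holding everything until the whole file has arrived, and it
assumes the tagged chunks tile the stream without overlap):

    class OffsetReassembler:
        """Accepts offset-tagged chunks in any order and presents the
        application with an in-order byte stream."""

        def __init__(self, deliver):
            self.deliver = deliver    # callback into the application protocol
            self.pending = {}         # byte offset -> chunk, out of order
            self.next_offset = 0      # next byte the application expects

        def on_chunk(self, offset, data):
            self.pending[offset] = data
            # Deliver any contiguous prefix that is now complete.
            while self.next_offset in self.pending:
                chunk = self.pending.pop(self.next_offset)
                self.deliver(chunk)
                self.next_offset += len(chunk)
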
> >
> > NOTE: I am not referring to packet-reordering at TCP.
> >
> > By having this, servers can attain true parallelism in reading the file
> > and can effectively improve file transfer rates.
> >
> >
> > Thanks,
> >
> > Rakshith
> >
> >
> >
> > _______________________________________________
> > Architecture-discuss mailing list
> > Architecture-discuss@ietf.org
> > https://www.ietf.org/mailman/listinfo/architecture-discuss
> >
> >
>
>