Re: [Coin] challenges for COIN

Hi Jianfei and all,

thanks for the input — let me share some thoughts on this:

> (1) " is in-net compute a wrong direction?"

This sounds like the wrong question to ask. Apparently, we have these 
two schools of thinking at this point:

1) “continuum of cloud computing”, extending cloud computing 
compute/storage instances (plus perhaps associated infrastructure & 
management) to other parts of the network.

2) “application logic” and/or associated elaborate networking 
functionality on switches.

> Now, we have more use cases beyond DCN, such as industrial control 
> networks. But when I was checking the latest Sigcomm CCR yesterday, I 
> noticed that a “negative” paper (below), which concludes that 
> "shoehorning" application semantics into the limited capabilities of 
> programmable switch ASICs is both "unnecessary" and "harmful".
> https://ccronline.sigcomm.org/wp-content/uploads/2019/02/sigcomm-ccr-final266.pdf

I am aware of the “Thoughts on Load Distribution and the Role of 
Programmable Switches” paper, and it has some good points. OTOH, this 
is an emerging field, this particular paper is not peer-reviewed, and 
there are proponents of the approach who would characterize the pros 
and cons differently. IMO, we should avoid changing horses every time 
somebody stumbles across another use case or paper. Instead, I was 
hoping that COIN could develop some assumptions to work with and then 
validate/elaborate those assumptions as we go.

> Personally, my view is, if the solution has enough benefit, such as 
> speeding up the machine learning by 3 times and nearly no extra cost, 
> there is no reason why it won’t be used in a managed network like 
> DCN or a rack-scale network.

IMO, the question whether it is “used” or not is not even relevant, 
as this is research, and we are at a point where people are conducting 
experiments and are still learning about the potential and limitations. 
One way of seeing it is that the current application point-solutions are 
just demonstrators of the potential.

> But I do agree with this paper in some degree that functions to be 
> implemented into network devices should be less coupled with app 
> semantics, so I wonder we may need better abstraction and less 
> customized for specific applications. A good research topic?

Sure — I think we have proposed this in the past. This could be one of 
the goals/thesis of COIN, i.e., work with the assumption that it is 
feasible/valuable to identify application-independent architectural 
components and functions, and then validate or invalidate that 
assumption.

Looking at 1) and 2) together, we have to concede that there is a lot of 
computing in the network already. So instead of asking “is this a 
wrong direction”, a better question may be “are there better ways to 
build systems like that”, i.e., wrt joint optimization of 
compute/storage/network resources and also the other aspects that you 
mention below:

> (2) transport, privacy and security
> Aaron raised a question how end-to-end privacy and security and the 
> new model of in-network computation can co-exist. But even for the 
> managed networks where privacy and security is not a big problem, we 
> still need something new at the transport layer for in-network 
> compute, for the reasons of reliability to packet loss and congestion 
> control(in a multi-hop network beyond rack). We may have two design 
> choices: let the network device in the middle to react, or leave these 
> tasks to hosts...

OK, a couple of remarks:

1) Let’s please forget the notion of “managed networks where privacy 
and security is not a big problem”. I am not saying that you could not 
imagine networks where you don’t care about that (your toy data center 
etc.), but for the purpose of the work in this group I suggest that we 
assume an Internet model, i.e., a system where, even when a 
distributed application runs in a “shielded” DC, that 
application could be extended/connected to the Internet and where you want 
to be able to move components between different realms without violating 
privacy (for example).

2) I don’t agree that we only have these two choices. The question is more 
about the architectural model and what its elements are. For example, I 
am not sure the terms “host” and “network device” are 
particularly helpful in this discussion. In today’s CDNs or edge 
computing systems, the compute functions in the network are hosts, i.e., 
they terminate transport and application layer sessions. We may want to 
think about a different model though (that’s a research question), and 
applying concepts and terms from “TCP/IP 101” may not take us 
very far. If you want to explore fine-granular, very dynamic in-network 
computing, then I would list questions such as:

- how do you achieve scalable resource management (coming back to 
joint resource optimization) so that you don’t have to punt every 
instantiation, offloading, positioning decision to 
management/orchestration?
- how could the network help with certain resource usage decisions such 
as load management, “retransmission”, replication etc.?
- what do we mean by “network” anyway, i.e., what is the network 
model and what are the capacities of its elements?
- what does “end-to-end” mean in this context?
- You have picked up the notion of “end-to-end privacy” and 
“end-to-end security”. IMO that is the right direction, but the 
question may be deeper. For example, take privacy-preserving analytics 
in the network. Privacy / Security may not be a black-and-white thing 
here; instead you would be interested in controlled sharing, 
potentially role-based authorisation and access control etc. I agree 
that “end-to-end” may have a data security notion, but it’s more than 
just locking all the data into per-application silos.

> (3) what should be COIN's unique role in the edge/fog compute or 
> “cloud continuum”(a good term suggested by Marie-Jose to me)
> "Edge" looks promising when nowadays people think about low latency 
> services or offloading some tasks from terminals to the edge/cloud. 
> How can we make difference by leverage the expertise in this 
> community, given that a lot of existing works (open source projects, 
> workshops and events)? It is straightforward that networks are crucial 
> for distributed systems, by definition. But deriving a new network 
> architecture from this, requires re-thinking the relationship or 
> demarcation between compute, data and networking. This could be a big 
> win, but a big challenge as well.

IMO, one interesting way of thinking about this is *not* to say “we 
need networking for distributed systems”, but instead to conceive of 
“the network” as a distributed system (investigating the questions I 
alluded to above). This would be one differentiator from most of the 
mainstream work.

For that, looking at emerging approaches for building and programming, 
for example, distributed ML, could be interesting, i.e., systems such as 
Ray.

> (4) can we have a good abstraction to cover different use cases(from 
> DCN, industrial net to edge/fog etc) and different hardware(from 
> programmable switch, NPU to CPU etc)?
> The easiest way is always to catalogue them into groups and address 
> separately, but then, we may miss an opportunity to get a common and 
> generic model and treat each case just an specific case in this model.

I am hoping a COIN RG could be a home for people to conduct experiments 
in these different areas that would help to assess (and better define) 
the “common abstraction” concept after some time. Again, this is 
research + community-driven, i.e., you cannot really force people. So, 
the challenge is to provide a promising theme and framework, asking the 
right questions etc, but you cannot promise/control an exact outcome.

> Well, hopefully these questions are enough to have fun in COIN...

Absolutely — I would be interested to see more views and then discuss 
in Prague.

Thanks,
Dirk

> He Jianfei（Jeffrey）
> Innovation . Relevance . Rigor
> Huawei Technologies Co.,Ltd.