(i). Page 2, Section 1: It seems that much of the discussion in the draft is about monitoring applications (e.g., Sections 3.1, 3.2). However, Section 3.2 talks about monitoring to improve fairness in wireless networks. More detail on this would be helpful. I just read the references and I think I understand the use case. A couple of comments on the bullets on page 3:

(a). I assume we want to do model-free RL. That is, a Markov Decision Process (MDP) (S,A,P,R,\gamma) where we don't know P (the probability/state-transition matrix) or R (the reward function). This still requires that we know the state space S and which actions a \in A we can take in a state s \in S, and possibly some estimate of the reward function if it isn't inherent (an example of an inherent reward function is a video game, where the reward is your score). Most of the work then is in estimating the expected discounted reward, and there is a ton of literature/code on this; a minimal sketch is appended at the end of these comments.

(b). We might need a definition of goal states as well.

(c). In networks my guess is that we really have a Partially Observable MDP (POMDP), which introduces another set of issues.

(ii). Section 3.1: It might be useful to describe what "optimal paths" are. For example, are paths "trajectories" in the state space?

(iii). Section 3.3: It's not clear how RL would be applied to network issues such as latency, etc. A bit of discussion of that would be helpful.

(iv). As I mentioned, the emergence of 2-player games (minimax and others) in ML is really interesting. Examples include variational autoencoders, GANs, AlphaGo, and others.

(v). Section 5.2: What is the reward function? Also, as I mentioned above, one needs to know the state space S and the action space A. In addition, there is a classic tradeoff between exploitation and exploration which gates learning; that is indirectly alluded to in this section, but it might be worth explicitly explaining that tradeoff and how it is managed here (the sketch appended at the end of these comments shows the simplest epsilon-greedy form). Optimal paths are mentioned again; it might be worthwhile defining exactly what an optimal path is (for example, is it a path (trajectory) in the state space or something different?). The Distance-and-Frequency technique also isn't defined: it is "based on Euclidean distance", but between what points, and what do those points represent?

(vi). Section 5.4: This section makes it seem as if paths are trajectories in the state space. Is that correct? It might be useful to describe how the agents communicate, how the distributed RL algorithm works, what its properties are, etc. Also, what are the privacy and security implications of the distributed environment (how much information is exchanged, what is its sensitivity, how is it protected, ...)? "The agents have limited resources and incomplete knowledge of their environments." -- Does this mean that the model is a POMDP?

(vii). Cluttered-index-based scheme: This is not really described; it might be helpful to give an overview of what the Cluttered-index-based scheme is and what its properties are.

(viii). Section 6: "...shown in figure 1, where the architecture is combined with a hybrid architecture making use of both a master / slave architecture and a peer-to-peer." This is hard to understand. Which is the "architecture" and which is the "hybrid architecture"? Here it would be useful to understand the distributed RL algorithm that this architecture is supporting.

(ix). Figure 3 is really hard to understand. I can't really work out the algorithm and its properties from it. In addition:

(a). In the "Do optimized exploration..." box:
     (3). How is R calculated?
     (4). Where does the Policy come from (or is it just epsilon-greedy)?
     (6). Not sure what "update the learning model" means. What is updated?
     (7). Where does Sn come from? ((4) in the above box?)

Finally, we might consider Evolution Strategies [0,1] as a more black-box approach to RL (it doesn't require gradients, is highly parallelizable, etc.); a rough sketch of the basic update is also appended below. Code is here: https://github.com/openai/evolution-strategies-starter

[0] https://blog.openai.com/evolution-strategies/
[1] https://arxiv.org/abs/1703.03864
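For (i)(a), it may help the draft to be explicit about the quantity being estimated. The standard definitions of the discounted return and the action-value function (textbook material, not taken from the draft) are:

    G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
    \qquad
    Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]

so the draft needs to say what R and \gamma mean for the monitoring use case.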
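Also for (i)(a) and the exploration/exploitation point in (v), here is a minimal Python sketch of model-free tabular Q-learning with an epsilon-greedy policy. The environment interface (env.reset / env.step / env.actions), the state and action spaces, and the reward signal are hypothetical placeholders that the draft would have to define; this is only meant to show where S, A, R, and the exploration knob appear, not to suggest an implementation.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Model-free tabular Q-learning: estimates Q(s, a), the expected
        # discounted return, without knowing the transition matrix P or the
        # reward function R. 'env' is a hypothetical environment with
        # reset()/step()/actions() methods; nothing here is taken from the draft.
        Q = defaultdict(float)                       # Q[(state, action)] -> estimate
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # Exploration/exploitation tradeoff: epsilon-greedy policy.
                if random.random() < epsilon:
                    action = random.choice(env.actions(state))                     # explore
                else:
                    action = max(env.actions(state), key=lambda a: Q[(state, a)])  # exploit
                next_state, reward, done = env.step(action)   # reward must be observable
                # TD update toward reward + gamma * max_a' Q(next_state, a')
                best_next = max((Q[(next_state, a)] for a in env.actions(next_state)),
                                default=0.0)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

The epsilon parameter (often decayed over time) is the exploration/exploitation tradeoff referred to in (v); the draft should say what plays the role of env, what a state is, and where the observed reward comes from.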
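For the Evolution Strategies suggestion in (ix): a rough sketch of the basic update described in [0,1], just to show why it is black-box and parallelizable. The evaluate function (episode return of a parameter vector) is a placeholder, and this omits the antithetic sampling and rank normalization used in the reference code; see the repository linked above for the real implementation.

    import numpy as np

    def evolution_strategies(evaluate, theta, iterations=200, pop_size=50,
                             sigma=0.1, lr=0.01):
        # Black-box ES in the spirit of [0,1]: perturb the 1-D parameter vector
        # theta with Gaussian noise, score each perturbation with the
        # gradient-free return 'evaluate' (a placeholder supplied by the user),
        # and step theta along the noise directions weighted by their returns.
        for _ in range(iterations):
            noise = np.random.randn(pop_size, theta.size)     # one perturbation per worker
            returns = np.array([evaluate(theta + sigma * eps) for eps in noise])  # independent evaluations
            returns = (returns - returns.mean()) / (returns.std() + 1e-8)         # normalize for stability
            theta = theta + (lr / (pop_size * sigma)) * (noise.T @ returns)       # estimated gradient ascent step
        return theta

Since evaluate only needs to return a scalar, the pop_size evaluations are independent and can be farmed out to separate workers, which is where the parallelism comes from.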