The problems and the future of the web and a formal internet technology proposal

Raphaël Hendricks <rhendricks@netcmail.com> Fri, 01 January 2021 07:37 UTC

Return-Path: <rhendricks@netcmail.com>
X-Original-To: ietf@ietfa.amsl.com
Delivered-To: ietf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8E77A3A0EA6 for <ietf@ietfa.amsl.com>; Thu, 31 Dec 2020 23:37:21 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.569
X-Spam-Level:
X-Spam-Status: No, score=0.569 tagged_above=-999 required=5 tests=[ADVANCE_FEE_3_NEW=2.666, BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=netcmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yW3rsXyC__U7 for <ietf@ietfa.amsl.com>; Thu, 31 Dec 2020 23:37:17 -0800 (PST)
Received: from msg-1.mailo.com (msg-1.mailo.com [213.182.54.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id EEF613A0EA5 for <ietf@ietf.org>; Thu, 31 Dec 2020 23:37:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=netcmail.com; s=mailo; t=1609486552; bh=MFEZ0E3AqcIrjFeZQ8VdwTdWP5lh6cWSY3VMYlziXXY=; h=X-EA-Auth:Mime-Version:Content-Type:Message-Id:Cc: Content-Transfer-Encoding:From:Subject:Date:To:X-Mailer; b=RtEgvnwz6StH7rvulQsdB0sZLthQ4W2t3LX3lPIoZN/G42PAp1UGYwgCB3FU8DbSA ca8DE8srHQbmBkEMNy9FlItG48jalohxk0ikWIsdV6BY3CtLBIHO93pTCbtYpsqLXb UWpbz0l7Hz9Oj607r3ZBTAcNlmYa7/l9W9Vi+VfQ=
Received: by b-1.in.mailobj.net [192.168.90.11] with ESMTP via ip-206.mailobj.net [213.182.55.206] Fri, 1 Jan 2021 08:35:52 +0100 (CET)
X-EA-Auth: KcyVJLT/pcVBk9Ho4qQloBTUbl3JhCaArhujfUD0vgofcEf6nUY1v36mh/ipBENUoOQQlyBIIrfGxykmWoZBepPYyVBOe9fWmXxy3qC0JuA=
Mime-Version: 1.0 (Apple Message framework v753.1)
Content-Type: text/plain; charset="ISO-8859-1"; delsp="yes"; format="flowed"
Message-Id: <51AEC656-F7E6-40F5-8896-942C03B9ED29@netcmail.com>
Cc: project-admin@oasis-open.org
Content-Transfer-Encoding: quoted-printable
From: Raphaël Hendricks <rhendricks@netcmail.com>
Subject: The problems and the future of the web and a formal internet technology proposal
Date: Fri, 01 Jan 2021 02:37:13 -0500
To: Sir Timothy Berners-Lee <cos@timbl.com>, Liam Quin <liam@w3.org>, Ivan Herman <ivan@w3.org>, Eric Prud'hommeau <eric@w3.org>, ietf@ietf.org, xml-dev@lists.xml.org, public-exploresemdata@w3.org, public-philoweb@w3.org, public-web-copyright@w3.org, public-dntrack@w3.org
X-Mailer: Apple Mail (2.753.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/j8uphKr9alY6yKjBoK5LsXkJ9a0>
X-Mailman-Approved-At: Mon, 04 Jan 2021 07:21:42 -0800
X-BeenThere: ietf@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: IETF-Discussion <ietf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ietf>, <mailto:ietf-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/ietf/>
List-Post: <mailto:ietf@ietf.org>
List-Help: <mailto:ietf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf>, <mailto:ietf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Jan 2021 08:26:38 -0000

The last decade has brought significant change to internet technologies and, with it, some new unaddressed issues. In 2005, there was a schism between two groups over their ideas for the future of one specific internet technology, namely the web. I am addressing this message to several concerned groups and individuals about some serious problems which have plagued the web as it is. In this message, I go over the problems and make a formal proposal to replace the World Wide Web in its current form with two new technological platforms, as well as to replace the World Wide Web Consortium with two new consortiums, one for each new platform. This message will be of particular interest to XML people, Semantic Web people and anti-DRM people, but also, to some degree, to privacy advocates and accessibility advocates, as all these points are addressed. It will also be of interest to the IETF, since I am suggesting some changes in the administrative structure of some internet technologies. I encourage everyone reading this message to circulate it as widely as possible; the more people read it, the better.

Message for Sir Timothy Berners-Lee: your ideas for basing the future of the web on XML, XPath, properly structured data and documents, the semantic web, and the openness principle were really great; it is a shame that such a vision was never implemented. I am reviving your ideas in this proposal, for one of the two platforms, because I believe they are worth another consideration.

On one hand, there were the usual W3C working groups working to further the traditional web principles, namely accessibility, ease of indexing and referencing, separation of content, meaning and presentation, making sure that content is not limited to a specific use by giving it a proper structure, allowing easy archiving, implementing mandatory validation, and so on. All this was being done through work on the XHTML 2.0 draft, RDF/RDFa, the semantic web in general (which would allow better indexing and content reuse and which would be mostly based on the previously mentioned languages), XSLT, XForms, XML in general, XInclude, and so on. They were working to reduce the need for ECMAScript and, more importantly, Javascript (partly through technologies such as XForms, XPath and XML Events), which is a good thing, as scripting works against the previously stated goals and should only be used as a last resort (on top of that, Javascript is a proprietary language (owned by Mozilla), which makes it an even worse choice, while ECMAScript, which is only a subset of Javascript, is at least a standard (ECMA-262), which makes it a lesser evil). Work was being done to advance the development of aural style sheets, as these groups had rightfully understood that content may not be used exclusively by generating its visual rendering. Those working groups were also trying to make a clean break with the tag-soup era. In such a vision there was, of course, no place for DRM, as it goes against the openness which the above stated goals try to achieve. It is well known that information wants to be free. The attempt to switch to XML and the semantic web can be seen as an attempt to go back to the original vision Sir Timothy had when he chose to connect his web technology to the wider internet in 1991: making information freely available (think free speech, not free beer) and easily usable for any purpose, without use restrictions. The tag-soup era that preceded can be seen as a move away from that vision.

On the other hand, there were some entities which were opposed to the switch to XML and the semantic web. They stated that those approaches were too document-centric and that they wanted to create a technology better adapted to webshops, forums and so on. They wanted to support interactivity better than what was proposed with the switch to XML and the semantic web; they wanted to support client-side programming (as opposed to the move away from scripting); they wanted to support client-side dynamically updated content and dynamic capabilities (as opposed to server-side dynamic capabilities with a static-only client side, except for form validation through XForms, XML-to-XML conversion through XSLT and timing/events through XML Events and SMIL Animation, or where client-side dynamic capabilities are declarative only, with no imperative support). They were also concerned with the fact that the strict well-formedness and validation requirements of the XML technology made integration with several server-side programming languages problematic, to say the least. Everyone has seen that XHTML web page which was valid nine times out of ten and invalid one time out of ten because its dynamic content would at times make the page invalid or ill-formed. This was due to most of those server-side languages being based either on the approach of inserting program code within an incomplete page to be enriched, which yields a page whose validity cannot be wholly verified, or on the concept of generating the page source as markup text (a sequence of characters), in which case only samples of the script's output can have their validity tested but validity cannot be guaranteed; this instead of generating the page structure (through DOM Core methods or XPath-based technologies). PHP, the most popular server-side language at the time the W3C was trying to switch the web to XML, was particularly affected by this, and languages such as Perl, Python and ASP were not in a much better state.

Now, before someone points it out: yes, there have been cases where careful programmers were able to use languages such as PHP to implement proper server-side dynamic capabilities, by having PHP files with no XML content and, for each page, a small user-modifiable XML config file serving to map variables to XML attribute values or tag content, one or more XML templates, an XSLT module to allow calling an XSLT interpreter from the main language, and an XSLT transform sheet applied to the template and used to generate the well-formed, perfectly valid XHTML page. But these cases were the exception rather than the norm, while pages which were usually valid but without a guarantee that the dynamic features would not break the validity (which would occasionally happen) were the norm. There were server-side technologies with proper XML support and integration (such as, for advanced capabilities, JSP/Servlets (and debatably Ruby Electric XML), for intermediate capabilities, server-side Javascript, and for rudimentary capabilities, XInclude), but many server-side programmers did not want to switch to those languages; part of the resistance against switching the web to XML certainly comes from there, though not all of it.

The entities which were opposed to the switch to XML and the semantic web did not want the WWW to remain a web of documents interconnected with hypertext links (and extended with form validation through XForms, XML-to-XML conversion through XSLT and timing/events through XML Events and SMIL Animation); this is the too-document-centric criticism. They wanted to turn the web into a technology to run software in the browser. These entities were mostly Google and several major browser makers, and they went on to establish the WHATWG to create the HTML5 specification and to promote the XML-RPC, JSON and JSON-RPC formats, all of which would serve as a basis to implement that change.
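
To illustrate the difference described above between generating markup as text and generating the page structure, here is a minimal sketch in Python (standing in for any server-side language; the element and the input value are invented for illustration):

  # Concatenating markup as text gives no well-formedness guarantee,
  # while building the document as a tree does.
  import xml.etree.ElementTree as ET

  user_input = 'Fish & chips <on sale'   # hypothetical dynamic content

  # Text-concatenation approach: validity depends on every runtime value.
  page_as_text = "<p>Today's special: " + user_input + "</p>"
  # -> "<p>Today's special: Fish & chips <on sale</p>"  (ill-formed XML)

  # Tree-building approach: the serializer escapes content, so the output
  # is well-formed for any input value.
  p = ET.Element("p")
  p.text = "Today's special: " + user_input
  page_as_tree = ET.tostring(p, encoding="unicode")
  # -> "<p>Today's special: Fish &amp; chips &lt;on sale</p>"

  print(page_as_text)
  print(page_as_tree)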

The WHATWG idea of turning the web into a platform to run software in the browser was a stupid and extremely bad idea. It needs to be stated, however, that there are legitimate reasons to wish to run software operating in a client-server split model; that is not the problem. The problem is that bastardizing the WWW and the technologies supporting it is not the proper way to answer this legitimate need. The proper way to meet this need is to create a dedicated, standardized platform to run client software connecting to server software over the internet, but with that platform being separate from the web. There is a need for a technology to make openly available hyperlinked documents, with the documents using XML markup to indicate their structure, using RDF or RDFa to indicate semantic information, using SMIL Animation and XML Events to generate declarative animations, and using XForms to validate user-input data against XPath expressions or an XML Schema before using that data to further the content available as hyperlinked documents. There is also a need for a technology to run client-side software that connects to remote software running on a server; this type of technology, with client software connecting to software running on a server, is the right way to handle transactional operations such as banking, stock buying and online shopping; it is also well suited to highly interactive applications, such as playing networked computer games. There should be two different platforms for two different uses, not one general-purpose monster.

In cases such as real-time stock buying, having a remotely executable application should not be seen as incompatible with having a real-time XML feed with the raw data, with a basic stylesheet which allows easier reading for those who prefer to read the raw data; the objective should always be to make the raw data available, where possible, in parallel with having the data accessible through the client-side application. For example, one may wish to access the raw standard bus schedules of a city, or the raw real-time bus data feed (with a basic stylesheet to ease reading), rather than have to go through the official bus-operator-supplied online interface, and one should not be prohibited from doing so; the raw data could then also be used for purposes other than online reading (for example, statistics).
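
As a minimal illustration of that kind of reuse, assuming a hypothetical raw feed format (all element and attribute names below are invented), a few lines of Python are enough to turn the same feed a reader could view with a basic stylesheet into a statistic:

  # Compute an average delay from a hypothetical raw real-time bus feed.
  import xml.etree.ElementTree as ET

  feed = """<feed>
    <vehicle line="10" delay-minutes="2"/>
    <vehicle line="10" delay-minutes="5"/>
    <vehicle line="12" delay-minutes="0"/>
  </feed>"""

  root = ET.fromstring(feed)
  delays = [int(v.get("delay-minutes")) for v in root.findall("vehicle")]
  print("average delay:", sum(delays) / len(delays), "minutes")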

As stated above, there is a need for a technology to make openly available hyperlinked documents, with the documents using XML markup to indicate their structure, using RDF or RDFa to indicate semantic information, using SMIL Animation and XML Events to generate declarative animations, and using XForms to validate user-input data against XPath expressions or an XML Schema before using that data to further the content available as hyperlinked documents. This allows easy indexing, using RDF/RDFa to derive meaning, as opposed to having to extract the information from human-readable documents containing no extra annotation for computers; it would allow many search engines to be implemented easily, bringing competition to the field, as opposed to an oligopoly made of the few companies which have access to the advanced technology needed to extract data from human-readable content without machine annotations, technology which nonetheless still produces search results inferior to those of a simpler search engine indexing only documents that contain semantic information. Having a unified URL+URN=URI framework is important in information science and archiving; abolishing the URI approach to go back to old-style URLs, as was done with HTML5 (they even re-introduced URLs starting with the javascript: pseudo-protocol designation, which is beyond the limits of decency), is plain stupid.

It is essential that formatting and presentation information not be used to convey the structure of the document: using presentation information to convey structure limits the use of the data inside the document to the use originally envisioned by the author, and the data then cannot easily be used otherwise. A user may wish to turn off stylesheets for various reasons (for example because the user has poor vision), and the document structure should still be easy to grasp when seen with the user-defined stylesheet inside the browser; this is not possible if the document content is not properly structured. Not surprisingly, XHTML documents tend to be much better structured than HTML documents since, in XML, the structure is everything. It is also baffling that none of the major websites uses aural stylesheets and that most browsers do not support them, even though page content is meant to be independent of the rendering method and having an audio rendering of a page is a legitimate use of the content; aural stylesheets should be used and supported as standard just as widely as visual stylesheets.

Moreover, having content not tied to an expected use allows unexpected new uses to come up later. For example, if various sellers put up catalogs of their products online and encode the data about the products properly, in a generic fashion, then even if the catalogs were meant to be read by humans, they can easily be used by price-comparison tools or by statistical tools studying the historic evolution of prices. If the catalog documents put online are only human-readable, with too poor a structure to be easily analysed by computers, such extra uses cannot come up. Having documents which are only human-readable, with a structure insufficient for analysis by computers, has a DRM-lite effect by restricting the uses of those documents. It is therefore obvious that having content that is not made for one specific use is highly beneficial. Such a technology would likely be deployed for uses such as publishing documentation, government websites, university/faculty/department/professor websites, company product documentation, blogs, amateur websites, digital document archives (using XForms for searching) and so on. It could also carry content which is available against a payment but where, after supplying the payment, the available content is not usage-restricted. It would not be used in cases where the content maker wants to restrict the usage of the content, but this is a good thing, as such usage-restricted content has no place on such a platform.
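
A minimal sketch of the price-comparison case, assuming sellers publish their catalogs in some structured, machine-readable markup (the element names below are invented for illustration; a real deployment would use an agreed vocabulary such as RDF/RDFa):

  # Compare prices for the same product across structured seller catalogs.
  import xml.etree.ElementTree as ET

  catalogs = {
      "seller-a": """<catalog>
          <product><name>Kettle</name><price currency="EUR">29.90</price></product>
      </catalog>""",
      "seller-b": """<catalog>
          <product><name>Kettle</name><price currency="EUR">24.50</price></product>
      </catalog>""",
  }

  best = None
  for seller, xml_text in catalogs.items():
      root = ET.fromstring(xml_text)
      for product in root.findall("product"):
          if product.findtext("name") == "Kettle":
              price = float(product.findtext("price"))
              if best is None or price < best[1]:
                  best = (seller, price)

  print("cheapest kettle:", best)
  # No such comparison is possible when each catalog is only presentation
  # markup with the price buried in free-form text.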

A technology to run client-side software connecting to server-side software over the internet, supplying a standard method for that software to be transferred to, deployed on and run on the client, is highly useful. Such a technology should not use a page-based content structure. Making pages out of content only makes sense if the content is constant and indexable, which is obviously not the case for dynamically generated or dynamically updated content. Such a technology should not use ECMAScript or Javascript as its language, since they are inappropriate for this use: Javascript was originally created as a scripting language, not a programming language; it was meant to add a bit of dynamic capability to web pages, not to write software (hence its reputation as a hackish language among those who use it to write real software; Javascript is a good scripting language but a poor programming language). On a platform allowing client-side software to connect to server-side software, the client-side software should be written in a proper programming language. Cases such as real-time stock prices and real-time bus/tramway locations are good examples of time-varying content where client-side dynamic capabilities make sense but putting the content in permanent static pages does not. Cases such as inventory management, online banking and product purchasing are good examples of highly transactional content where client-side dynamic capabilities are useful but putting the content in pages for indexing or not-yet-invented future uses is of limited use. Cases such as running networked or non-networked computer games are good examples of highly interactive content where adequate client-side software capabilities, with hardware acceleration accessible to the client-side software, are useful, but trying to structure the content into pages makes no sense and the content cannot truly be indexed. All these examples show that there are cases where a technology distinct from the one used to create a web of hyperlinked, structured and semantically encoded documents is badly needed, and that such a need does not nullify the need for the previously stated web of hyperlinked, structured and semantically encoded documents, since the use cases for each are highly different.

The HTML5/Javascript/JSON-RPC approach is a real problem. It meets neither of the two needs properly. For the structured and, perhaps, semantically encoded, hyperlinked document need, HTML5/Javascript/JSON-RPC is a disaster. The RPC capabilities make the dynamic content impossible to index. JSON data does not allow content indexing and reuse as easily as XML data. The reliance on Javascript makes the documents harder to analyze and index, and ties the content to a single use, namely online viewing (often visual only, without even supporting audio rendering).

Cookies, while not strictly part of HTML5 (they are part of HTTP/HTTPS), are nonetheless a feature coming from the tag-soup era which should be phased out; instead of that, HTML5 introduced an enhanced version of the concept in the form of webstorage. Cookies are almost exclusively used for three things, namely sessions, inter-website tracking and intra-website analytics. Session cookies are an unnecessary and absurd mechanism to implement sessions, less safe and less sensible than the built-in HTTP session mechanism, which should be used instead. Intra-website analytics may be a privacy issue; webmasters requiring intra-website analytics should either limit themselves to statistics which do not require users to be traced, or be upfront about the fact that users must open a session, use the HTTP session mechanism for this, and be clear that usage data will be logged and analyzed. Doing otherwise is dishonest and a lack of transparency; in a way, it is requiring users to create an identifier without being open about it and making them open sessions through the back door. Inter-website tracking cookies are a privacy breach and represent a practice which needs to be abolished. The few other cases where cookies are used, namely where keeping variables from one page to the next is necessary, can easily be handled by using address rewriting instead of cookies (see the sketch below). Webstorage brings the same problems as cookies, only to a greater extent.
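
The address-rewriting technique mentioned above can be sketched as follows; this is only an illustration, with the parameter name "sid" chosen arbitrarily, not a proposed mechanism:

  # Carry per-visit state in the links of the generated page instead of
  # in a cookie, by rewriting every href to include the state parameter.
  import xml.etree.ElementTree as ET
  from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

  def rewrite_links(page_xml: str, session_id: str) -> str:
      root = ET.fromstring(page_xml)
      for a in root.iter("a"):
          href = a.get("href")
          if href is None:
              continue
          parts = urlparse(href)
          query = dict(parse_qsl(parts.query))
          query["sid"] = session_id
          a.set("href", urlunparse(parts._replace(query=urlencode(query))))
      return ET.tostring(root, encoding="unicode")

  page = '<body><a href="/catalog?page=2">next page</a></body>'
  print(rewrite_links(page, "abc123"))
  # -> <body><a href="/catalog?page=2&amp;sid=abc123">next page</a></body>
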
The inclusion of DRM in HTML5 is the antithesis of the openness goal. It intentionally restricts usage, whereas the goal of maximum openness requires implementing technology to facilitate ease of access and to facilitate new and unexpected usage of the content, removing as much technological-limitation-derived usage restriction as possible. DRM also makes archiving problematic. While not part of HTML5, the HTTP/2 protocol often used to transfer that content breaks the protocol layering principle and is a problem because of this. The WHATWG even had the indecency to reintroduce an element coming from the tag-soup era which had never been part of any standard HTML or XHTML version, namely the embed tag. From all these elements, it is obvious that the HTTP/2 / HTML5 / JSON / Javascript / JSON-RPC web infrastructure is ill-suited to answer the need for an open and privacy-respecting web of structured and, perhaps, semantically encoded, hyperlinked documents described previously. If a news/blog/text-plus-image website cannot be read without enabling Javascript, it fails; unfortunately, such cases have become common, and people now create "webapps" which need to be "run" just to access a page of content. This is the kind of situation brought about by HTML5/JSON/Javascript/JSON-RPC. Interest in XML did not die when the W3C decided to base the future of the web on HTML5: the companies and other entities which were previously behind the XML effort moved most of their XML work to the OASIS group, which is concerned with the use of XML for document-centric and data-centric purposes (but not published on the web); this is proof that interest in the XML structured/data-centric approach continued after the W3C decision. For the other need, that of running software on the client side which connects to software running on the server (which is what "web applications" are trying to do), HTML5/JSON/Javascript is just as badly suited. Trying to turn HTML/CSS/XHTML/XML into an application-writing platform was trying to turn a squirrel into a dinosaur. As stated previously, a page-based structure is not adapted to this use, yet HTML5 retains the page approach. HTML5/JSON/Javascript/JSON-RPC is no longer a proper squirrel, nor did it turn into a proper dinosaur; it is some sort of ugly chimera.

When creating the new HTML working group tasked with developing HTML5, Sir Timothy Berners-Lee said:
 > Unlike the previous one, this one will be chartered to do
 > incremental improvements to HTML, as also in parallel xHTML.
In fact, no real effort has been put into XHTML in the last 12 years; while there is a theoretical XHTML5, it exists only on paper.
About forms, he said:
 > The plan is, informed by Webforms, to extend HTML forms. At
 > the same time, there is a work item to look at how HTML forms
 > (existing and extended) can be thought of as XForm
 > equivalents, to allow an easy escalation path. A goal would
 > be to have an HTML forms language which is a superset of the
 > existing HTML language, and a subset of a XForms language
 > wit [sic] added HTML compatibility.
None of this has ever happened; in fact the XForms 2.0 specification was never completed, and the most recent draft is from 2010. The attempt to use a new HTML working group to gradually move the standard to a point where the switch to XML would be easier never worked. The WHATWG acted as if they were the only ones in charge, refusing to address concerns raised by the other W3C people, which led to long fights in which the WHATWG would keep fighting until the other side gave up; they acted as though the only role of the W3C was to approve what the WHATWG was doing, giving it legitimacy, and they have sadly succeeded; Ian Hickson is particularly guilty in this regard.

Once it is established that there is a need to put an end to the web as it is and to create two successor platforms (one, a new and reborn web based on XML/XPath/RDF/RDFa for data structurally and, eventually, semantically encoded, and the other, a remote application execution platform based on some appropriate programming technology), one may wonder why the change should be done now. Some, especially those whose web development falls into the second use, that of a remote application execution platform, and probably not those whose web production falls into the first use, may think that the current web is usable enough for that purpose. To address this, one must first remember that moving away from XML/XPath also means moving away from the structural and semantic encoding technology, which is problematic for the first of the two needs; second, for the remote application execution need, the ill-suited nature of the current web, with the amount of workarounds it requires, means a huge amount of annually wasted manpower worldwide. The number of man-hours lost annually worldwide to the problem of trying to get the web technologies to do something for which they are not suited is probably in the millions. The other big reason for making the change now is that the underlying internet infrastructure is about to change, and it would be best to design the remote application execution platform around the new upcoming infrastructure. That infrastructure change is the soon-to-be widespread deployment of edge computing. The deployment of edge computing will shift the approach from client/server operation to client / edge-server / remote-server operation; the adoption of 4.5G and 5G telephony will popularize the use of edge computing, which is expected to play a significant rôle on those networks. Of course, some more workarounds could probably be found to operate the current web platform with the newly deployed edge-servers; however, the proper solution is a technological redesign, and it really would make sense to design the remote application execution platform for this model (client / edge-server / remote-server) from the beginning.
There should be a standardized method for a client terminal to send a request to the closest edge-server with the identity of the service, available from a remote-server, for which the edge-server would then download its portion of the software (the part to be run on the edge-server), as well as the client's portion, through a standardized mechanism; the standardized technology should also define how the edge-server is to send to the client its portion of the software (the part to be run on the client) coming from the remote-server, and finally it should define the format of the software code. This would make the remote application execution platform adapted to the presence of edge computing. For the other platform, that of a new and reborn web based on XML/XPath/RDF/RDFa for data structurally and, eventually, semantically encoded, there is no use for edge-servers; it is best to have the client computer contact the sole remote server directly, with that server doing most of the processing and simply serving the data to the client computer, the only processing done on the client being for XSLT/XPath/XForms/SMIL-Animation/XML-Events.

The current World Wide Web Consortium should be replaced by two new groups, one to develop each of the successor platforms. The World Wide Web Consortium, once a great organization, has turned into an ugly three-headed monster: one head is the semantic web / XML / RDF people, the second head is the WHATWG people trying to turn the web into a remote application execution framework, and the third and final head is the copyright industry. The first new consortium, developing the structured and semantic web based on XML / XPath / RDF / RDFa, should be a joint IETF/OASIS consortium, since the IETF is generally committed to the openness of technologies and OASIS is where most work around XML has happened since XML started leaving the web; this would help ensure good integration with other XML technologies, and the proximity of many XML people (from OASIS) would help jumpstart XML uptake. Of course, Sir Timothy Berners-Lee would be the proper chair for that consortium. The second consortium, concerned with creating a remote application execution platform, should be a joint consortium between the IETF and a second group, the WHATWG being an option but the Khronos Group being preferable (the platform's function would fit well within its "connecting software to silicon" mandate); even the Object Management Group could be chosen. The following two paragraphs contain formal proposals for the two platforms which badly need to succeed the current WWW.

In creating the platform for the structured and semantic web, based on XML / XPath / RDF / RDFa, there are at least two sensible choices as a basis for the central language of the platform (meant to finally replace HTML). One is to resurrect the XHTML 2.0 working draft since, after all, the people behind it were competent and did good work, and XHTML 2.0 was written with the very same objective that this platform would pursue. The second obvious sensible choice is to break down some XML-based standards from OASIS, such as DocBook, into modules, taking all the modules which cover needed functionality and completing the language with new modules; this would allow tight integration with the other XML standards developed at OASIS (the main XML development group at this time). Of course, a fully new language could also be designed, but it would provide neither of the advantages of the previous two approaches. It is probably best not to call this new XML-based language XHTML: when people see the letter sequence HTML (regardless of the leading X), they expect the language to be compatible with HTML4 and will fight anything which isn't (for the XHTML 2.0 draft, it was likely a mistake not to have changed the name). Of course, the new consortium developing the platform should take up the development of the core XML, XPath, RDF/RDFa (and even XForms and XInclude) languages. Since so many people seem to be obsessed with support for scripting/programming, it is probably a good idea to develop a complementary, fully declarative scripting language, based on XPath, XML Events and SMIL Animation, with some extra XML markup, to allow scripting which is fully XML/XPath-based and to avoid seeing external languages grafted on top of the platform, as happened to HTML with Javascript. It is important that there be no cookies/webstorage; there should, however, be the option to use the protocol session mechanism, either that of HTTP or that of a new XML-based protocol (see the later part of this section).

It would also be beneficial to define standardized styling mechanisms, both for visual stylesheets and for aural stylesheets. The visual styling mechanism should get rid of the GUI-building components available in CSS3. For the aural stylesheet mechanism, one option is to use the already existing, but never implemented, W3C aural stylesheets; this would have the drawback of not being XML/XPath-based. A better approach would be to start with an XML reformulation of the W3C aural stylesheets and replace the current selectors with XPath-based selectors; this would give a language using XML for the styling definitions and XPath for the selectors. For the visual stylesheets, there are several possible options. One is to use CSS3 with the GUI-building parts removed, which would again have the drawback of not being fully XML/XPath-based. Another option is to combine an XML reformulation of the CSS3 styling definitions with XPath selectors, which would supply a fully XML/XPath solution. Another option is to use XSLT/XSL-FO, and yet another is to use XSLT to generate SVG data for rendering (there is unfortunately some overlap in capabilities between SVG and XSL-FO, even though XML languages should ideally import modules from other languages instead of reimplementing functionality). Of course, a fully new XML and/or XPath styling language could be created, but the previous approaches would allow better integration with existing technologies than a new language would. The visual styling mechanism should mandate the definition of two styling types, paper-like and video-like, chosen automatically based on a browser setting; forcing all users to have a paper-like rendering, as is currently done on the web, is not the best option. A video-like style uses pale text on a dark or coloured background with characters based on thick lines, with no serifs or limited serifs (terminal-font-like), may optionally use character outlines for readability on any image background, and can be easier to read for some users with impaired sight as well as better suited to some display types; a paper-like style uses dark text on a pale background with characters of varying width and varying serif styles. Most webpages and current GUIs can be described as paper-like, while some vintage GUIs as well as most CLIs can be described as video-like. Mandating both in a visual stylesheet would allow the user to choose which to use through a simple browser setting.
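
As an illustration of the XPath-selectors idea, here is a minimal sketch in which style rules are simply pairs of an XPath expression and a set of property values; the rule format and the property names are invented for illustration, not a proposed syntax:

  # Match XPath-based style rules against a document tree.
  import xml.etree.ElementTree as ET

  document = ET.fromstring("""
  <article>
    <title>On structured documents</title>
    <section>
      <title>Introduction</title>
      <p>Structure first, presentation second.</p>
    </section>
  </article>""")

  rules = [
      ("./title",          {"font-size": "200%", "voice-family": "announcer"}),
      (".//section/title", {"font-size": "150%"}),
      (".//p",             {"font-size": "100%"}),
  ]

  for xpath, properties in rules:
      for node in document.findall(xpath):
          print(f"{node.tag} ({node.text.strip()!r}) <- {properties}")
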
It is also worth considering whether the consortium should develop, for the platform, a new XML-based protocol to replace HTTP(S), and especially the SPDY-based HTTP/2, which is problematic; one possibility would be to define SOAP-over-TCP, similarly to the way OASIS defined SOAP-over-UDP. Again, it would be a good idea not to call this HTTP(S), to avoid raising false expectations. Combining SOAP-over-TCP and an XML session mechanism with XHTML2 / RDFa, XForms, XSLT, XSL-FO or SVG, XML Events and SMIL Animation would finally allow the whole stack to be XML/XPath only, with no other technologies. Of the two platforms, the one for hyperlinked, structured, possibly semantically encoded documents based on XML/XPath is the one that should keep the name "the Web" or the "World Wide Web", since it would implement what the Web was meant to be. The other should be called something else, for example Online Service Platform (OSP).
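
Purely as an illustration of what SOAP-over-TCP could look like on the wire, here is a minimal sketch that builds a SOAP 1.2 envelope and writes it to a TCP connection; the request element, the service identifier and the length-prefix framing are all assumptions made for the example, not part of any existing specification:

  # Build a SOAP 1.2 envelope and send it over a plain TCP connection.
  import socket
  import struct
  import xml.etree.ElementTree as ET

  SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"

  envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
  body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
  request = ET.SubElement(body, "GetDocument")       # hypothetical request
  request.set("href", "urn:example:catalogue:2021")  # hypothetical identifier

  payload = ET.tostring(envelope, encoding="utf-8")

  def send_envelope(host: str, port: int, data: bytes) -> None:
      with socket.create_connection((host, port)) as conn:
          # 4-byte length prefix: an arbitrary framing chosen for this sketch.
          conn.sendall(struct.pack("!I", len(data)) + data)

  # send_envelope("server.example", 8443, payload)  # not run here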

The other platform, implementing remote software execution, should be integrated with edge computing. There should be a standardized method for the client to contact the edge-server and indicate which service is to be accessed from a remote-server. The standard should specify the mechanism for downloading the edge-server code and client code from the remote-server to the edge-server (and mechanisms for caching both on the edge-server when possible), and the mechanism for downloading the client code from the edge-server (previously received from the remote-server) to the client, on a per-module basis instead of all at once; this module mechanism could also be used to transmit, as modules, data to be used by the software running on the client. There should be a mechanism for the client to request that the edge-server open, on its behalf, a session on the remote-server, transmitting its identity to the remote-server; this would allow a given client operator to have a permanent account on the remote-server, even though the operator may use different clients connecting to different edge-servers over time.

Since edge-servers are being used, this is where the bulk of the processing workload should lie. The client processing workload should be mostly limited to rendering and handling user interaction, with a few extras here and there; this reduces power usage at the client (useful for portable devices) and also reduces the processing power needed at the client, except for audio/graphics rendering circuits (which reduces the manufacturing cost of client devices). The remote-server processing workload should be limited to that which cannot be done on the edge-server, such as storing user data between sessions, serving the edge-server and client code to the edge-servers, processing that which only needs to be calculated once before being sent to all the connected edge-servers, and relaying data between the connected edge-servers; pushing most of the load onto the edge-server allows lower-latency operation for the client.

There should also be a mechanism for the client to indicate its device class to the edge-server, and that information should be available to the software downloaded from the remote-server and running on the edge-server; sensible device classes should include at least the following: small_touchscreen, big_touchscreen, pointer_based, remotecontrol_based and maybe others. There should be two more flags accompanying the device class. The first is the presence or absence of a keyboard; this would allow the software to modify its interface when no keyboard is available, so as to reduce the need for the (hard to use) on-screen keyboard to the minimum and to increase the reliance on the keyboard when a physical keyboard is available. The second is the presence or absence of a joystick/joypad; some game software may require a device class other than small_touchscreen, as well as either a pointer_based device class or another class accompanied by a keyboard or a joystick/joypad, to be playable, and may need to check for this. When the client connects to the edge-server, there should be a method for the client to transmit a parameter indicating its preferred colour scheme and text style (a parameter made available to, and used by, the software coming from remote-servers and running on the edge-server and client), with two options available: video-like and paper-like. Forcing all users to have a paper-like rendering, as is currently done on the web, is not the best option. As described for the other platform, a video-like style uses pale text on a dark or coloured background with thick-lined, terminal-like characters, while a paper-like style uses dark text on a pale background with characters of varying width and varying serif styles; most operators of remotecontrol_based clients would likely opt for the video-like mode, operators of other device classes would use either, and users may have varying reasons, as stated previously, to choose either mode.
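
As an illustration of the kind of standardized request described above, here is a minimal sketch of a client-to-edge-server message carrying the service identity, the device class, the input-device flags and the preferred rendering style; every field name and value is hypothetical, and the XML encoding is used only because it is convenient here, the actual wire format being for the standard to define:

  # A hypothetical client -> edge-server service request message.
  from dataclasses import dataclass
  import xml.etree.ElementTree as ET

  @dataclass
  class ServiceRequest:
      service_id: str          # identity of the service on the remote-server
      device_class: str        # e.g. "small_touchscreen", "pointer_based", ...
      has_keyboard: bool
      has_joypad: bool
      rendering_style: str     # "paper-like" or "video-like"

      def to_xml(self) -> bytes:
          root = ET.Element("service-request")
          ET.SubElement(root, "service").text = self.service_id
          device = ET.SubElement(root, "device")
          device.set("class", self.device_class)
          device.set("keyboard", str(self.has_keyboard).lower())
          device.set("joypad", str(self.has_joypad).lower())
          ET.SubElement(root, "rendering-style").text = self.rendering_style
          return ET.tostring(root, encoding="utf-8")

  request = ServiceRequest(
      service_id="urn:example:transit-tickets",
      device_class="small_touchscreen",
      has_keyboard=False,
      has_joypad=False,
      rendering_style="video-like",
  )
  print(request.to_xml().decode("utf-8"))
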
Edge-server operators could maintain a list of problematic remote-server operators, used as a blacklist, which can help prevent client operators from being defrauded by unknowingly connecting to fraudulent online services. The platform should include a standardized payment mechanism for services operated on a commercial basis. When a client operator opens a transaction requiring a payment with a remote-server operator, there should be a mechanism for the edge-server operator to bill the client; the edge-server operator could then act as an escrow and wait until the service has been supplied in a satisfactory manner before transferring the funds to the service provider / remote-server operator; this would push the service providers / remote-server operators and client operators to behave properly, as opposed to the wild west that is the current state of online business. For cases where client devices need to have managed or limited payment-initiating capabilities, such as in internet cafés (where a client operator would first need to pay the café employee to make funds available before initiating a payment), there should be a mechanism for one client device to manage the payment-initiating capabilities of other client devices. When remote-servers announce the services available from them, there should be a mechanism to indicate whether the service is fully free, fully paid, or partly free and partly paid; it should also indicate whether the free part carries advertising and whether the paid part carries advertising. The mechanism should also allow indicating to which standardized category the service belongs; there should be at least the following six categories (and maybe others): online shopping, for-profit transactional accounts (banking, commercial utilities, etc.), non-commercial accounts (accounts at municipalities, provinces, countries, NGOs), non-commercial media, commercial media and other. This would allow easy classification and discovery of services.
standard should specify some programming languages for execution on  
the edge-servers and clients. There should be several interpreted  
programming languages supported as standard and an intermediate  
representation language to support software written in other  
languages and compiled to the intermediate representation (this will  
avoid having one or more of the interpreted programming languages  
serving as a de facto intermediate language which is inefficient).  
For maximum portability and interoperatability, the intermediate  
representation language should be text based rather than binary, it  
should be endian-neutral by implementing program adressing using  
labels instead of hard adresses and data adressing by using named  
variables instead of hard adresses, by supporting strings as a native  
type and having numerical values (signed/unsigned ints/floats of  
varying lengths) specified in hexadecimal encoded big endian format  
(the most readable) and converted to the native binary format of the  
appropriate endianness by the back-end compiler running on the client  
and which generates the binary which is to run on the client-hosted  
virtual machine. The intermediate representation language should  
ideally be statically and strongly typed, with type handling left to  
the frontend compiler (the feaseability of handling type definitions  
and conversions in a compiler has been shown with the Nim and Crystal  
compilers). On the other hand, the intermediate representation  
language should be garbage collected as leaving the memory management  
to the software developper or the front-end compiler risks corrupting  
the memory of client-device (unless there is a garantee that the  
client has memory protection); the client should handle memory  
through compile-time garbage collection (in the back-end compiler  
producing the binary code for the virtual machine) or run-time  
garbage collection (inside the virtual machine). There should also be  
a mechanism for loading shaders on the client GPU from within the  
intermediate representation language, the best shader format probably  
being a slightly modified SPIR-V assembly, which would be endian- 
neutral, again by handling code adressing through labels instead of  
real addresses, by handling data adressing by using named variables  
instead of real addresses and by having numerical values written in  
big-endian format (the most readable) in the transfered code and  
converted to the final endianness by the client device before  
assembling the shader. For the interpreted languages, endian- 
neutralness can be handled by first choosing languages which do not  
allow direct manipulation of addresses, which is the case of most  
high engough languages and second by again having numerical values  
written in big-endian format (the most readable) in the transfered  
code and converted to the final endianness by the client device  
before interpreting the software code. The interpreted programming  
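
The endian-neutrality convention described above for numerical values can be illustrated with a short sketch; the literal format (a bare hexadecimal string) is an assumption made for the example:

  # Numeric literals travel as big-endian hexadecimal text; the back-end on
  # the client converts them to its native byte order.
  import struct
  import sys

  def decode_u32_literal(hex_literal: str) -> bytes:
      """Turn a big-endian hex literal (e.g. "0000012C") into native-order bytes."""
      value = int(hex_literal, 16)            # byte-order independent
      return value.to_bytes(4, sys.byteorder) # native order for the local VM

  native = decode_u32_literal("0000012C")     # 300, as written in the transferred code
  print(sys.byteorder, native.hex())
  # A little-endian client prints "little 2c010000";
  # a big-endian client prints "big 0000012c".

  # Floats are handled the same way: read as big-endian, repack in native order.
  def decode_f32_literal(hex_literal: str) -> bytes:
      big_endian = bytes.fromhex(hex_literal)
      (value,) = struct.unpack(">f", big_endian)   # read as big-endian float
      return struct.pack("=f", value)              # repack in native order

  print(decode_f32_literal("3F800000").hex())      # 1.0 as a native-order float
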
The interpreted programming languages should have the choice of loading the same type of GPU shader as the intermediate representation language or of using a mid-level library. Hardware-accelerated OpenMAX DL (with a generic DCT/IDCT extension not tied to a particular use, unlike the current versions, which cover only JPEG, MPEG-4 AVC and MPEG-4 SP), OpenSL ES, Vulkan and OpenVG should be available. As for the choice of interpreted programming languages, the following list might be a sensible choice: Python, which has become the high-level interpreted programming language of choice in the unix-like OS community; Ruby, which positions itself as the competitor to Python and is used by those who wish to avoid Python; ISLisp, as programmers who do not identify with the unix culture are often adepts of Lisp, and ISLisp is lightweight (and as such better suited to this use case) and consists of the common subset of the major Lisp variants; and finally Mercury, as it would put a purely declarative language on the list as an alternative to the imperative or hybrid languages, it allows the use of three declarative programming paradigms (logic, functional and the declarative sub-variant of object-oriented) and, unlike most purely declarative languages, it has some uptake in industry, outside the academic world. Of course, other languages could be chosen. It may make sense to standardize on the same interpreted programming languages and intermediate representation language for the software running on the edge-servers as for that running on the clients, as this would ease the development process. Big companies may use remote-servers, edge-servers (one per site) and on-site clients as an alternative to networked desktop computers, or they can let telecommuters use their own clients and associated edge-servers to connect to the company remote-server and work on it; this edge-server-and-client approach has the potential to replace part of the desktop market.

As a last point, the case of DRM. DRM is fundamentally wrong and constitutes a stupid and useless idea; however, if the copyright industry is going to force it onto a platform somewhere, it should be on this platform rather than elsewhere, and it should definitely not be allowed on the other platform described previously (the one for hyperlinked, structured and semantically encoded documents), where openness is paramount. While trying to protect the "intellectual property" of the copyright industry, DRM, when implemented on user-owned client devices, violates the physical property rights of the user. Client devices can come in two forms, user-owned and user-rented: when the device is owned by the user, the user should be in control of it; when it is rented, the owner renting out the device defines the device-use limitations, so having DRM on user-rented devices is a lesser evil. It could be decided that DRM on user-owned devices is prohibited while still allowing DRM on user-rented devices (the copyright industry would be free to make some content only available on rented devices if they really want to). While this is not a technical decision but a legal one, and as such out of scope for the people reading this, the various entities involved in the process can make it their official position that DRM on user-owned devices should be prohibited and, as such, help push for this legal concept. A broader deployment of rented client devices would probably resonate well with the public in this day and age: this is the time of XYZ-as-a-service all over the place, so having "service access as a service" would bring the concept to its ultimate level.

May XML live on until the end of time.

Raphaël Hendricks