Re: [Din] distributed search engine

Jon Crowcroft <Jon.Crowcroft@cl.cam.ac.uk> Sat, 03 July 2021 11:11 UTC

From: Jon Crowcroft <Jon.Crowcroft@cl.cam.ac.uk>
To: Stan Srednyak <stan.sredn@gmail.com>
cc: Din@irtf.org, Jon Crowcroft <Jon.Crowcroft@cl.cam.ac.uk>
In-reply-to: <CAE-786g_VpQLXkjXhRGuQkK+qes-RzLRL4FJ9ViSatHkiCwS-w@mail.gmail.com>
References: <CAE-786g_VpQLXkjXhRGuQkK+qes-RzLRL4FJ9ViSatHkiCwS-w@mail.gmail.com>
Comments: In-reply-to Stan Srednyak <stan.sredn@gmail.com> message dated "Fri, 02 Jul 2021 14:09:59 -0400."
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <25364.1625310681.1@ely.cl.cam.ac.uk>
Date: Sat, 03 Jul 2021 12:11:21 +0100
Message-Id: <E1lzdYP-0007vS-UW@mta1.cl.cam.ac.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/din/pyuliky0bfILzv2WtgxXo-ujQdU>
Subject: Re: [Din] distributed search engine

would be nice - in the meantime, you can use
https://brave.com/search/
or duckduckgo
and not have adverts if you don't like them...
> hi DINRG,
> 
> It seems that there is high demand for the development of a decentralized
> Internet. While decentralization must address many issues, social networks
> in particular, it is of utmost importance to develop decentralized search
> engines, as search engines are the gateway to the web. The search engine
> industry has been monopolized by a few large companies. While this
> centralization has had many negative impacts (e.g., manipulation of the
> rank), one that is particularly conspicuous is the secrecy of the ranking
> algorithms. It is highly desirable to have open ranking algorithms and to
> allow users to choose from a variety of algorithms.
> 
> Some time ago I started working on the design and implementation of a
> decentralized search engine. The basic idea of my approach is that, while
> the computational power needed to realize a distributed search engine is
> immense, it is possible to design a communication protocol that
> orchestrates data collection, analysis, ranking, and the serving of search
> queries to users, and splits the workload among participating nodes. The
> participating nodes in my design are the computers of ordinary internet
> users. I have developed the algorithms needed to realize search and
> ranking operations on a distributed network of user computers. One of the
> challenges lies in achieving acceptable latency (<1 second) in serving
> search transactions. According to my estimates, it is possible to achieve
> latency comparable to that of existing search engines. In addition, I have
> shown that it is possible to guarantee the computation and delivery of the
> true rank (the one actually requested by the user) to end users. Of
> course, there is the obvious problem that in decentralized architectures
> nodes may try to manipulate the rank and rank some pages unjustifiably
> high or low. Nonetheless, it is possible to design a network communication
> protocol such that malicious nodes are highly unlikely to manipulate the
> rank, as long as their total fraction is below a certain threshold.
> 
> Some of the details of the project can be found at https://rorur.com. To
> incentivize people to maintain "search nodes" (analogous to Ethereum
> nodes), I have proposed an architecture that allows individuals and
> companies to advertise on this network, much as is done on conventional
> search engines, with the difference that the revenue is distributed to the
> node maintainers. There are some details on how to achieve this in a
> secure fashion, and some of them can be found on the site linked above.
> 
> I will be rolling out the first stage of this project quite soon, and I
> would like to know if there is any interest in it. Of course, this has to
> be a collaborative project. It is impossible to run it on individual
> hardware (although it is possible to deploy it in a centralized data
> center). There will be several stages in the deployment, in particular
> several versions of the communication protocol. I will describe these
> stages in detail in a forthcoming publication. There are various roles in
> which you can participate in the project, from maintaining a node, to
> software development, to algorithm design. I will be very happy to hear
> from you.
> 
> I think this project ties in really well with the spirit of this group
> and, more generally, with the spirit of the IETF. As explained in the white
> paper, this project, if successful, can lead to the transformation of the
> web into a "knowledge system". A longer discussion is needed here, but in
> brief, it may allow for the creation of personalized search and personal
> knowledge graphs. It can also be instrumental in creating a more robust
> Internet infrastructure. I will try to develop this project in close
> collaboration with the IETF community, because the issues it addresses
> have to do with fundamental aspects of the web, at the level of protocols
> and data routing. If I am not mistaken in my calculations, distributed
> search operations can be added on top of the standard protocol stack and
> thus become part of everyday web operation.
> 
> 
> 
> best regards,
> Stan Srednyak