[Din] distributed search engine

Stan Srednyak <stan.sredn@gmail.com> Fri, 02 July 2021 18:10 UTC

MIME-Version: 1.0
From: Stan Srednyak <stan.sredn@gmail.com>
Date: Fri, 2 Jul 2021 14:09:59 -0400
Message-ID: <CAE-786g_VpQLXkjXhRGuQkK+qes-RzLRL4FJ9ViSatHkiCwS-w@mail.gmail.com>
To: Din@irtf.org
Content-Type: multipart/alternative; boundary="000000000000c736ee05c627ddc3"
Archived-At: <https://mailarchive.ietf.org/arch/msg/din/cv-OVx3xb8xgoOCaWDNj2ae9Ikw>
Subject: [Din] distributed search engine


There appears to be strong demand for a decentralized Internet. While
decentralization must address many areas, social networks in particular,
it is especially important to develop decentralized search engines, since
search engines are the gateway to the web. The search engine industry has
been monopolized by a few large companies. This centralization has had
many negative effects (e.g., manipulation of rankings), but one that is
particularly conspicuous is the secrecy of the ranking algorithms. It is
highly desirable to have open ranking algorithms and to let users choose
among a variety of them.

Some time ago I started working on the design and implementation of a
decentralized search engine. The basic idea of my approach is that,
although the computational power needed to realize a distributed search
engine is immense, it is possible to design a communication protocol that
orchestrates data collection, analysis, ranking, and the serving of search
queries, and that splits the workload among participating nodes. The
participating nodes in my design are the computers of ordinary Internet
users. I have developed the algorithms needed to perform search and
ranking operations on a distributed network of user computers. One of the
challenges lies in achieving acceptable latency (<1 second) in serving
search transactions. According to my estimates, it is possible to achieve
latency comparable to that of existing search engines.

In addition, I have shown that it is possible to guarantee the computation
and delivery of the true rank (the one actually requested by the user) to
end users. Of course, there is the obvious problem that in decentralized
architectures nodes may try to manipulate the rank and rank some pages
unjustifiably high or low. Nonetheless, it is possible to design the
network communication protocol so that it is highly improbable that
malicious nodes can manipulate the rank, as long as their total fraction
of the network is below a certain threshold.
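To give a rough feel for why a bound on the fraction of malicious nodes
can make rank manipulation improbable, here is a minimal sketch in Python.
It is not the protocol of this project. It assumes a simplified model in
which each rank computation is replicated on k nodes sampled independently
and uniformly at random, the result is decided by majority vote, and a
fraction f of all nodes is malicious. The function name and the parameters
k and f are hypothetical and chosen only for this illustration.

```python
from math import comb


def majority_capture_probability(k: int, f: float) -> float:
    """Probability that malicious nodes form a strict majority among k
    independently sampled nodes, when a fraction f of all nodes is
    malicious (simplified model, assumed only for this illustration)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    smallest_majority = k // 2 + 1
    # Upper tail of the binomial distribution Binomial(k, f).
    return sum(
        comb(k, i) * f**i * (1.0 - f) ** (k - i)
        for i in range(smallest_majority, k + 1)
    )


if __name__ == "__main__":
    # With 20% malicious nodes, the capture probability shrinks as the
    # number of replicas grows.
    for replicas in (5, 15, 45):
        p = majority_capture_probability(replicas, 0.2)
        print(f"k={replicas:2d}  P(malicious majority)={p:.3e}")
```

Under these assumptions the probability falls off quickly with k whenever
f is below one half, which is the kind of threshold behavior referred to
above. A real design would also have to account for colluding nodes and
non-uniform sampling, which this sketch ignores.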

Some details of the project can be found at https://rorur.com. To
incentivize people to maintain "search nodes" (analogous to Ethereum
nodes), I have proposed an architecture that allows individuals and
companies to advertise on this network, much as they do on conventional
search engines, with the difference that the revenue is distributed to
the node maintainers. There are some details on how to achieve this
securely, and some of them can be found on the site linked above.

I will be rolling out the first stage of this project quite soon, and I
would like to know whether there is any interest in it. Of course, this
has to be a collaborative project: it cannot be run on a single
individual's hardware (although it could be deployed in a centralized
data center). There will be several stages in the deployment and, in
particular, several versions of the communication protocol. I will
describe these stages in a forthcoming publication. There are various
ways to participate in the project, from maintaining a node to software
development to algorithm design. I will be very happy to hear from you.

I think this project ties in really well with the spirit of this group
and, more generally, with the spirit of the IETF. As explained in the
white paper, this project, if successful, can lead to the transformation
of the web into a "knowledge system". A longer discussion is needed here,
but in brief, it may allow for the creation of personalized search and
personal knowledge graphs. It can also be instrumental in creating a more
robust Internet infrastructure. I will try to develop this project in
close collaboration with the IETF community, because the issues it
addresses concern fundamental aspects of the web, at the level of
protocols and data routing. If I am not mistaken in my calculations,
distributed search operations can be added on top of the standard protocol
stack and thus become part of everyday web operation.

Best regards,
Stan Srednyak