Re: [Blockchain-interop] Gatweay Crash Recovery Discussion #1

Thomas Hardjono <hardjono@mit.edu> Mon, 07 December 2020 14:06 UTC

From: Thomas Hardjono <hardjono@mit.edu>
To: Rafael Belchior <rafael.belchior@tecnico.ulisboa.pt>, Martin Hargreaves <martin.hargreaves@quant.network>
CC: "blockchain-interop@ietf.org" <blockchain-interop@ietf.org>
Date: Mon, 07 Dec 2020 14:06:05 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/blockchain-interop/myIMwjU3KyfbZuH9HopirbXMs5k>

We may have to consider the resumption at 2 levels, namely:

(a) Asset-transfer protocol (ODAP) resumption (i.e. which step of the transfer protocol do G1 and G2 restart from);

(b) TLS session resumption (i.e. can G1 and G2 re-use the existing TLS parameters, or should they establish a new TLS session).


A sketch would look something like this (assuming G2 crashed):

G2 ---> G1:  Transfer-Resume [SessionID; crash-error-code]

G1 ---> G2:  Transfer-Continue[SessionID; protocol-step-X; TLS-resume]

G2 <---> G1: (re-establish TLS channel)

G2 ---> G1:  (protocol-step-X)


It would be nice if the recovery protocol worked both ways (reversible), in that it does not matter whether it's G1 or G2 that crashes (the recovery protocol runs the same).
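
To make the sketch above concrete, here is a minimal Python illustration of the two recovery messages and of the "reversible" property. It is only a sketch under stated assumptions: the class names, field names, the in-memory step log, and the rule that the surviving gateway proposes the step after the last one it logged are all hypothetical and are not taken from ODAP or from any draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TransferResume:
    """Sent by the gateway that crashed, once it is back up (hypothetical layout)."""
    session_id: str
    crash_error_code: str


@dataclass(frozen=True)
class TransferContinue:
    """Reply from the gateway that stayed up (hypothetical layout)."""
    session_id: str
    protocol_step: int   # step X that both sides restart from
    tls_resume: bool     # True: re-use TLS parameters; False: new TLS session


class Gateway:
    """Role-symmetric gateway: the same code runs whether G1 or G2 crashed."""

    def __init__(self, name: str) -> None:
        self.name = name
        # session_id -> last protocol step this gateway logged as completed
        self.last_step: Dict[str, int] = {}

    def log_step(self, session_id: str, step: int) -> None:
        self.last_step[session_id] = step

    def on_recover(self, session_id: str, error_code: str) -> TransferResume:
        # Level (a): announce that this session should be resumed.
        return TransferResume(session_id, error_code)

    def on_resume(self, msg: TransferResume,
                  tls_still_valid: bool) -> Optional[TransferContinue]:
        step = self.last_step.get(msg.session_id)
        if step is None:
            return None  # this gateway has no record of the session
        # Assumption: restart from the step after the last one logged here.
        # Level (b): tell the peer whether the old TLS parameters can be re-used.
        return TransferContinue(msg.session_id, step + 1, tls_still_valid)
```

With both gateways having logged step 3 of session "s-1", a crash of G2 leads to `g1.on_resume(g2.on_recover("s-1", "CRASH"), tls_still_valid=False)`, and a crash of G1 leads to the mirrored call with the roles swapped. Neither class knows which role it plays, which is the reversibility asked for above.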



-- thomas --


________________________________________
From: Rafael Belchior [rafael.belchior@tecnico.ulisboa.pt]
Sent: Monday, December 7, 2020 2:17 AM
To: Martin Hargreaves
Cc: Thomas Hardjono; blockchain-interop@ietf.org
Subject: Re: [Blockchain-interop] Gatweay Crash Recovery Discussion #1

Hello Martin,
Yes, the crash recovery protocol ought to support recovery
messages that let all parties establish the current
state of an asset transfer.

We can specify this in the "3.3 Recovery procedure" section.
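
As a starting point for that section, the flow discussed in this thread (a recovered gateway scans its log for uncompleted transfers, offers to continue them, and each counterparty answers proceed, back out, or unknown) might look roughly like the Python sketch below. The log entry layout, the function names, and the policy "proceed only if both sides agree on the phase" are assumptions made for illustration, not part of any specification.

```python
from enum import Enum
from typing import Dict, List, NamedTuple


class LogEntry(NamedTuple):
    session_id: str
    phase: int
    completed: bool   # True once the whole transfer has finished


class Decision(Enum):
    PROCEED = "proceed"
    BACK_OUT = "back-out"
    UNKNOWN = "unknown-transaction"


def uncompleted_transfers(log: List[LogEntry]) -> Dict[str, int]:
    """Return {session_id: last logged phase} for transfers never marked completed.

    The log is assumed to be in append order.
    """
    last_phase: Dict[str, int] = {}
    done = set()
    for entry in log:
        last_phase[entry.session_id] = entry.phase
        if entry.completed:
            done.add(entry.session_id)
    return {sid: ph for sid, ph in last_phase.items() if sid not in done}


def evaluate_offer(offer: Dict[str, int],
                   own_state: Dict[str, int]) -> Dict[str, Decision]:
    """Counterparty's answer to a recovery message listing sessions and phases."""
    answers: Dict[str, Decision] = {}
    for sid, phase in offer.items():
        if sid not in own_state:
            answers[sid] = Decision.UNKNOWN      # transaction not recognised
        elif own_state[sid] == phase:
            answers[sid] = Decision.PROCEED      # both sides agree on the phase
        else:
            answers[sid] = Decision.BACK_OUT     # assumed policy on disagreement
    return answers
```

The recovered gateway would send the result of `uncompleted_transfers(log)` as its recovery message, and each counterparty would reply with the result of `evaluate_offer(...)` computed against its own view of the sessions.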

Cheers,
Rafael


On 2020-12-02 14:00, Martin Hargreaves wrote:
> Hi both,
>
> In terms of protocol messaging, how should we support this?
>
> It sounds like, on recovery, the recovered gateway scans its logs and
> finds a set of uncompleted transactions, then needs to send some kind
> of "recovery message" to its counterparties: an offer to continue
> processing, with a list of transactions and the phase in each
> transaction to pick up from.
>
> The gateways that previously saw it time out as it crashed can then
> evaluate these and respond as to whether they wish to proceed with,
> back out of, or indeed don't recognise the transactions.
>
> What do you think?
>
> Thanks
>
> Martin
>
>> -----Original Message-----
>> From: Blockchain-interop <blockchain-interop-bounces@ietf.org> On Behalf Of Rafael Belchior
>> Sent: Wednesday, December 2, 2020 1:07 PM
>> To: Thomas Hardjono <hardjono@mit.edu>
>> Cc: blockchain-interop@ietf.org
>> Subject: Re: [Blockchain-interop] Gatweay Crash Recovery Discussion #1
>>
>> Thomas,
>> Yes, those assumptions are reasonable: (i) we can also consider the
>> case where a public-private key pair is lost, and thus a new pair
>> needs to be generated; (ii) I think we can assume a new SSL connection
>> is created. Do you envision problems with this?
>>
>> I think it makes sense for the gateway to resume operations from the
>> last message before the crash because this mode is blocking, in
>> principle: we can resume from the last event.
>>
>> If anyone thinks this could be improved, please do not hesitate to
>> provide feedback :)
>>
>> Cheers,
>> Rafael
>>
>>
>> On 2020-12-02 00:17, Thomas Hardjono wrote:
>> > Thanks Rafael.
>> >
>> > Inline:
>> >
>> >>>> > -- Which mode (self-healing mode, or primary-backup mode) do you
>> >>>> > recommend?  (Which one would be the simplest approach for now, and
>> >>>> > what assumptions would we need to make).
>> >>>>
>> >>>> The self-healing mode is simpler, as the same machine eventually
>> >>>> recovers, continuing its operations from the latest log entry. It
>> >>>> does not, in principle, need to read from the log storage API.
>> >>>> However, we are assuming it eventually recovers, and while this
>> >>>> happens the system is down, which hurts availability.
>> >
>> > For this scenario, there may need to be some assumptions about the
>> > gateway node.
>> >
>> > For example, (i) the gateway recovers to 100% without any internal
>> > losses (e.g. loss of private-keys); (ii) that the SSL connection has a
>> > longer life-time than the duration of crash (unavailability); etc.
>> >
>> > For short-duration unavailabilities, would the gateway pick up
>> > (restart) from the last message before the crash, or does it start
>> > from the beginning of the Phase?
>> >
>> > Best
>> >
>> >
>> > -- thomas --
>> >
>> >
>> >
>> >
>> >> ________________________________________
>> >> From: Blockchain-interop [blockchain-interop-bounces@ietf.org] on
>> >> behalf of Rafael Belchior
>> >> [rafael.belchior=40tecnico.ulisboa.pt@dmarc.ietf.org]
>> >> Sent: Monday, November 30, 2020 11:49 AM
>> >> To: blockchain-interop@ietf.org
>> >> Subject: [Blockchain-interop] Gatweay Crash Recovery Discussion #1
>> >>
>> >> Dear All,
>> >> Attached are the slides from the first discussion on the crash
>> >> recovery mechanism for gateways, which took place during the last
>> >> meeting.
>> >>
>> >>
>> >> Cheers,
>> >>
>> >> --
>> >> Rafael Belchior
>> >> Ph.D. student in Computer Science and Engineering, Blockchain -
>> >> Técnico Lisboa https://rafaelapb.github.io/
>> >> https://www.linkedin.com/in/rafaelpbelchior/
>> >
>>

--
Rafael Belchior
Ph.D. student in Computer Science and Engineering, Blockchain - Técnico
Lisboa
https://rafaelapb.github.io/
https://www.linkedin.com/in/rafaelpbelchior/