Re: Advice on NID for media fingerprint

Jonas Oberg <> Tue, 03 May 2016 18:21 UTC

Date: Tue, 3 May 2016 20:21:25 +0200
From: Jonas Oberg <>
To: Sean Leonard <>
Subject: Re: Advice on NID for media fingerprint
List-Id: discussion of new namespace identifiers for URNs <>

Hi Dale, Sean,

many thanks for your feedback, it's much appreciated. The collision rate
of any algorithm in this line of work is almost bound to be higher than
that of, say, MD5 or SHA-1. We cannot exclude collisions, though some
algorithms are less likely to cause them than others.

Ultimately, the uniqueness is application-dependent: for some uses, a
fingerprint that yields more collisions may be preferable, when having
two or more works identified by the same fingerprint provides useful
information about the relation between those works.

For other uses, a more discriminating fingerprint may be chosen if it's
more important to distinguish the individual works. We took inspiration
from RFC 1737 in this, where the authors in section 5 leave open the
possibility of letting the naming authority determine to what
extent it should distinguish resources from each other.

However, your point is well taken, and it could indeed be that we
should consider another scheme for this. A URI scheme does not seem
relevant or useful, but perhaps we can simply make do with a defined
URL scheme instead :-)


On Mon, May 02, 2016 at 05:23:05PM -0700, Sean Leonard wrote:
> Yep. Ditto.
> With URNs, resources/things named by the URN can’t collide. The namespace is supposed to guarantee that collisions (i.e., reassignment or multiple-assignment) does not occur.
> So URN is inappropriate.
> You could register a URI scheme, which has a lower barrier to entry (in some respects). However, it’s not even clear that you need these identifiers to fit in a URI protocol slot. If you don’t need to shoehorn the identifier into a URI protocol slot, don’t bother.
> Best regards,
> Sean
> > On Apr 25, 2016, at 10:53 AM, Dale R. Worley <> wrote:
> > 
> > Certainly the idea is interesting, but I'm not sure it qualifies as a
> > system of "names" -- in a sample of 100,000 images, 1% of the images had
> > the same blockhash as one or more other images.  (There are something
> > like 30 million images in Wikipedia, which would imply over 300,000
> > collisions.)
> > 
> > Dale
> > 

Jonas Öberg, Executive Director
Free Software Foundation Europe |