Re: #428 Accept-Language ordering for identical qvalues

Mark Nottingham <mnot@mnot.net> Mon, 21 January 2013 02:08 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <50FCA047.8010101@treenet.co.nz>
Date: Mon, 21 Jan 2013 13:06:47 +1100
Cc: ietf-http-wg@w3.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <316F5F01-1C9F-4077-B400-68CDE6B391CA@mnot.net>
References: <em144175d2-e44d-4209-b5a2-f2dbf14d99d4@bombed> <50FCA047.8010101@treenet.co.nz>
To: Amos Jeffries <squid3@treenet.co.nz>
Subject: Re: #428 Accept-Language ordering for identical qvalues
Archived-At: <http://www.w3.org/mid/316F5F01-1C9F-4077-B400-68CDE6B391CA@mnot.net>

That's interesting, thanks. 

One thing to add: even if the client includes a q=0, the server can still ignore it.

Cheers,

P.S. If you are able (considering privacy issues, etc.) and want to dump such data in a usable format, feel free to ask for a repository on the GitHub account.



On 21/01/2013, at 12:56 PM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> On 21/01/2013 12:30 p.m., Adrien W. de Croy wrote:
>>  ------ Original Message ------
>> From: "James M Snell" <jasnell@gmail.com <mailto:jasnell@gmail.com>>
>>> 
>>> +1.. in fact, for 2.0, I'd very much like to get rid of q-values entirely and depend entirely on order.
>>> 
>> same here.
>> The idea may have been laudable in 1998, but really, how can a web server tell if some resource is 80% better than another? A human needs to tell it, and humans have enough trouble with other things.
>> the q=0 option would need to be turned into a Naccept-* header or something.   But does anyone even use it outside of testing for 406 responses which never come?
> 
> My collection of 2 years worth of language headers says no.
> 
> Of 2018 unique Accept-Language header field-values;
>  1532 are using q-values in a strictly sorted list
>  491 are not using q-values
>  14 are using "q=0.0".
>  5 are using q-values and non-qvalues without ordering the sent list (1 looks otherwise normal, the others are using puny-codes)
> 
> The 14 are also unique in being very long and having multiple entries with equal q-values. They are still, without exception, strictly ordered, with the entries having no q-value listed first (as if q=1.0 was used for the sort but omitted when sending). They also contain a number of oddities, such as multiple entries for the same language code with differing q-values.
> 
> NP: Of those 14 odd A-L headers noted above I have UA details on 8 of them. All claim to be Firefox but the Gecko dates do not line up with other info on those versions (the 11.0 was released some years before 3.5.9 on the same OS) so the whole input is a bit suspect.
> 
> 
> The 5 unordered-list cases have puny-code values with no q-value listed after an otherwise normal series of languages. Like so:
> "en-us,en;q=0.5,x-ns1qHkbtrt8Nhv,x-ns2E1e0Nnym7b6"
> 
> I have a few cases of q-value ordered list followed by wildcard "*" with no q-value. Sender obviously assuming the list is ordered.
> 
> 
> 
> Broken down by UA, which I started ~6 months ago at Julien's suggestion, I have 54289 distinct UAs visiting, of which:
>  21756 are not sending A-L header at all
>  19621 unique UA are using a single language code with no q-value
>  12495 unique UA are using q-values as above.
>  8 are sending only wildcard "*" or "*/*"
> 
> The remaining ~400 roughly match up with the 491 A-L field-values not using q-values. They are older agents (Windows 98, NT, 2k stand out), agents sending the same language multiple times (VoilaBot variants and Safari there), or agents sending sub-language variants with the generic form last, e.g. "en-GB,en", "en-US,en", "en-US,en,*" (tablets and Mobile Safari mostly). They are obviously assuming sorted lists, even back into the Windows 98 ones.
> 
> There are also a few bots sending exactly 2 puny-code entries.
> 
> 
> Amos
> 
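
For anyone wanting to repeat this kind of breakdown on their own logs, the classification described in the quoted message could be sketched roughly as follows. This is only an illustration, not the script used for the numbers above: the function names and category labels are hypothetical, an omitted q-value is assumed to sort as q=1.0, and "sorted" is taken to mean non-increasing q-values in header order.

```python
def parse_accept_language(value):
    """Split an Accept-Language field-value into (language_range, q) pairs.

    q is None when the entry carries no q parameter. Order is preserved.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        q = None
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw.strip())
                except ValueError:
                    q = None  # malformed q-value: treat as omitted (assumption)
        entries.append((parts[0], q))
    return entries


def classify(value):
    """Return one hypothetical category label for a field-value."""
    qs = [q for _, q in parse_accept_language(value)]
    if all(q is None for q in qs):
        return "no-qvalues"
    if any(q == 0.0 for q in qs):
        return "has-q0"
    # Assumption: an omitted q-value counts as 1.0 for ordering purposes.
    effective = [1.0 if q is None else q for q in qs]
    if all(a >= b for a, b in zip(effective, effective[1:])):
        return "sorted"
    return "unordered"


print(classify("en-us,en;q=0.5,x-ns1qHkbtrt8Nhv,x-ns2E1e0Nnym7b6"))
```

The categories here are mutually exclusive by construction (the q=0 check comes before the ordering check), which may not match how the counts in the quoted message were tallied.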

--
Mark Nottingham   http://www.mnot.net/