Re: #428 Accept-Language ordering for identical qvalues

Amos Jeffries <> Thu, 24 January 2013 08:39 UTC

On 23/01/2013 2:53 a.m., Julian Reschke wrote:
> On 2013-01-22 14:40, Nicholas Shanks wrote:
>> On 17 January 2013 09:14, Julian Reschke wrote:
>>> On 2013-01-17 09:59, Roy T. Fielding wrote:
>>>> than there are servers that implement language negotiation and
>>>> actually want to resolve ties at random.
>>> They do not "want" to resolve at random; they do so because they have
>>> implemented what the spec says. There's no reason to create an 
>>> ordered list
>>> structure when the spec says that an unordered list is sufficient.
>> I think no implication of randomness should be permitted by the 
>> specifications.
>> They should instead require that a deterministic process be used, and
>> that, other than requests to services which explicitly exist to
>> provide random results (e.g. Wikipedia's "Random Page" link), the same
>> request should generate the same result providing nothing pertinent to
>> the resource has changed on the server.
>> Someone, I don't recall who, gave the example of a home page loading
>> blog posts via AJAX, where the blog posts are available in two
>> languages. Random selection between the variants, where (q * qs)
>> values are equal for both languages, or are being ignored, would

That would be me. Take note of the Androids below...

> Can you please give an example of clients sending these kind of header 
> field values?
> Clients that care can provide different qvalues, and as a matter of 
> fact, they do.

Uhm. Let's see... where shall I start?
  First, an overview of what happens when agents "care" enough to send
q-values.
  Then a small sample of the 513 agents I have on record that send no
q-values at all.
  Judge for yourself which ones are better interpreted as sorted lists.

For starters, to be completely fair: the majority of agents I have on
record (~54% of unique language:agent pair entries) *do* send q-values
properly, in accordance with the specification, and that same 54% of
unique agent entries is effectively 'voting' for the list to be ordered.
I present the subset below to show the kinds of complexity and confusion
we introduce when we rely solely on q-values to provide ordering
semantics in the list.
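To make the "ordered list" reading concrete, here is a minimal Python
sketch of one deterministic interpretation. The helper names
(parse_accept_language, rank_languages) are hypothetical and not taken
from any real server. It treats a missing q-value as q=1.0, drops q=0
entries as "not acceptable", lowercases tags (they are case-insensitive),
and relies on Python's stable sort so that entries sharing a q-value keep
the order the agent sent them in. RFC 2616 does not mandate that
tie-break; it is an assumption of the sketch.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value into (tag, q) pairs, in header order."""
    entries: List[Tuple[str, float]] = []
    for item in header.split(","):
        tag, *params = [p.strip() for p in item.split(";")]
        if not tag:
            continue  # ignore empty items, e.g. from a trailing comma
        q = 1.0  # RFC 2616: a missing q-value means q=1.0
        for param in params:
            name, sep, value = param.partition("=")
            if sep and name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # assumption: keep the default on a malformed qvalue
        if q > 0:  # q=0 means "not acceptable"
            entries.append((tag.lower(), q))
    return entries


def rank_languages(header: str) -> List[str]:
    """Order tags by descending q; sorted() is stable, so ties keep header order."""
    entries = parse_accept_language(header)
    return [tag for tag, _ in sorted(entries, key=lambda e: -e[1])]
```

For example, with "ru, uk;q=0.8, be;q=0.8, en;q=0.7, *;q=0.01" this
returns ru, uk, be, en, *: "uk" beats "be" only because it was listed
first.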

WebKit ...

cs, en-us; 0.9, de-de; 0.8, ru-ru; 0.7
  - Mozilla/5.0 (X11; U; Linux; cs-CZ) AppleWebKit/532.4 (KHTML, like 
Gecko) Arora/0.10.1 Safari/532.4
  + do we consider that a list with q-values or not?
  + notice also how it is a much more "up to date" version than the
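One way to read the Arora header above: a strict parser only recognizes
parameters spelled "q=<value>", so each bare "0.9" is ignored and every
entry falls back to q=1.0. The hypothetical Python sketch below
(strict_entries and tie_groups are illustrative names) groups tags that
share a q-value, showing that nothing but header order then separates the
four languages.

```python
from itertools import groupby
from typing import List, Tuple


def strict_entries(header: str) -> List[Tuple[str, float]]:
    """Parse (tag, q) pairs, honouring only parameters written as 'q=<value>'.

    A bare '0.9' is not a qvalue in the RFC 2616 grammar, so it is ignored
    and the entry keeps the default q=1.0.
    """
    out: List[Tuple[str, float]] = []
    for item in header.split(","):
        tag, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, sep, value = param.partition("=")
            if sep and name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # malformed value: keep the default
        if tag:
            out.append((tag.lower(), q))
    return out


def tie_groups(header: str) -> List[List[str]]:
    """Group tags sharing a q-value: highest q first, header order within a group."""
    ordered = sorted(strict_entries(header), key=lambda e: -e[1])
    return [[tag for tag, _ in grp] for _, grp in groupby(ordered, key=lambda e: e[1])]
```

Run on "cs, en-us; 0.9, de-de; 0.8, ru-ru; 0.7" this yields a single
four-way group, [["cs", "en-us", "de-de", "ru-ru"]].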

en;q=1.0, en;q=0.5, zh-cn, zh;q=0.5, en;q=0.5
  - Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/300.21.012; 
Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like 
Gecko) Safari/413
  + Nokia Symbian and SonyEricsson agents derived from WebKit/4XX-532
across the board seem to send one primary language at q=1.0, followed by
a list of others all sharing q=0.5 or carrying no q-value at all, as
seen above.

cs-CZ, en-US
  - Mozilla/5.0 (Linux; U; Android 2.2; cs-cz; HTC Legend Build/FRF91) 
AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
  + Starting with WebKit/533, all the mobiles seem to have moved to this
two-language model: some local language, then "en-US".

da-DK, en-US
  - Mozilla/5.0 (Linux; U; Android 4.0.4; da-dk; GT-P5110 Build/IMM76D) 
AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30

  - Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; Valve Steam Client; 
) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.444.0 Safari/534.1

th-TH, en-US
  - Mozilla/5.0 (Linux; U; Android 4.0.3; th-th; A1 Build/IML74K) 
AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30

... and then we have iTunes. A massive "WTF?" going out to the iTunes 
developers if anyone is reading.

  -  iTunes-iPad/5.1.1 (2; 32GB; dt:74)

  - iTunes-iPhone/5.0 (4; 16GB)

  - iTunes-iPhone/4.3.5 (3; 16GB)

... spiders are mostly doing a remarkably good job. At least it looks 
that way until the q-values get involved.

  - Baiduspider+(+

  - Mozilla/5.0 (compatible; Steeler/3.5;

  ru, uk;q=0.8, be;q=0.8, en;q=0.7, *;q=0.01
  - Mozilla/5.0 (compatible; YandexBot/3.0; +
  + q=0.8 - Ukrainian or Belarusian?

  - TosCrawler/Nutch-1.5.1 
(; <dc-crawler at ml 
dot toshiba dot co dot jp>)
  + q=1.0 - English US or British? (not so much trouble for humans, but
for a search engine it might cause indexing trouble).

I don't know whether you would call some of the major search engine bots
popular, or even a "fixable problem".

I host a translation server, so it is likely that the entries below come
from actual users working on text translation. You know, the kind of
person who *really* objects to getting a randomly wrong language
displayed. These people are also highly knowledgeable about language
codes and what they mean, so if they entered these manually it was for a
specific reason, according to how they or their tool's author
interpreted the Accept-Language specs.

Note how the first entries have no q-value and are *sorted* as if they
were q=1.0, which, remember, is what the spec says to do when no q-value
is supplied: treat it as q=1.0.

  - Mozilla/5.0 (X11; Linux x86_64; rv:10.0.6) Gecko/20100101 
Firefox/10.0.6 Iceweasel/10.0.6
  + q=1.0 - Catalan Valencian or Spanish Catalan?
  + q=0.9 - Spanish or English? Generic or nationalized grammar?
  + q=0.8 - Spanish or Catalan Andorran or English or German or Catalan 
  + q=0.6 - want to try again with German or Catalan Generic?
  + q=0.5 - Spanish or Australian English or French?
  + q=0.4 - what about French or Russian?
  + q=0.3 - Argentine Spanish or Japanese?
  + q=0.1 - Spanish or Dutch?

  - Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv: 
Gecko/20110303 Firefox/3.6.15
  + q=0.9 - English Generic or US-centric ?
  + q=0.8 - Dutch or English?
  + q=0.5 - German or Turkish?
  + q=0.3 - Dutch or German?
  + q=0.2 - English or Polish?
  + q=0.1 - German or English?
  + q=0.1 - oops Cancel that q=0.9 US English option.
  + q=0.0 - oops Cancel that q=0.9 generic English option.

  + I skip q=1.0 (none), q=0.7, q=0.6 and q=0.4 because the alternatives
sharing each of those q-values are, by the ISO definitions, semantically
equivalent aliases for the same language. So any selection algorithm
other than "if-it-exists" is a waste of CPU cycles, but not a user
problem.
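The "if-it-exists" selection mentioned above can be sketched as follows.
This is a hypothetical Python helper (pick_variant), assuming the
Accept-Language value has already been parsed into (tag, q) pairs in
header order. It walks the preferences by descending q, breaks ties by
header order via a stable sort, and returns the first tag the server
actually has, so an identical request always gets the same variant.
Matching here is exact and case-insensitive only; prefix fallback such as
"en-us" to "en" is deliberately left out.

```python
from typing import Iterable, List, Optional, Tuple


def pick_variant(prefs: List[Tuple[str, float]], available: Iterable[str],
                 default: Optional[str] = None) -> Optional[str]:
    """Return the first available language in preference order.

    prefs: (tag, q) pairs in the order the agent sent them.
    sorted() is stable, so equal q-values fall back to header order and
    the same request always yields the same variant.
    """
    have = {tag.lower(): tag for tag in available}  # case-insensitive lookup
    for tag, q in sorted(prefs, key=lambda p: -p[1]):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        match = have.get(tag.lower())
        if match is not None:
            return match  # first hit wins: equivalent aliases need no more work
    return default
```

With prefs [("uk", 0.8), ("be", 0.8)] and both variants available, it
returns "uk" regardless of the order of the available list.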

Only a few agents send "q=1.0". By my interpretation of RFC 2616, these
few are the "correct" users of q-values when q=1:

  - w3m/0.5.2
  Also the YoudaoBot spider, with a mix of language codes. It seems to be
deliberately trying to fetch different translations for some reason.

en-us;q=1.0, es-ve;q=0.5
  - Mozilla/4.1 (U; BREW 3.1.5; en-US; Teleca/Q05A/INT)
  - NetFront/3.5.1 (BREW; U; en-us; LG; NetFront/3.5.1/AMB) 
Sprint LN510 MMP/2.0 Profile/MIDP-2.1 Configuration/CLDC-1.1
  There are a few other variations of this "NetFront/" framework from
Samsung and LG mobile devices.

The rest (~50 unique agent:language pairs) using q=1.0 somewhere in the
Accept-Language header are all WebKit-derived agents. We already covered
how well they handle q-values.

There are still a fair few browser agents around with no q-values.

  - Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv: 
Gecko/2008070208 Firefox/3.0.1

  - Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv: 
Gecko/20100401 Firefox/3.6.3

  - Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20081217 
Firefox/ Novarra-Vision/8.0

ru, en-US, en
  - Mozilla/5.0 (compatible; Konqueror/4.4; Linux) KHTML/4.4.5 (like Gecko)

ru, uk, en-US, en
  - Mozilla/5.0 (compatible; Konqueror/4.4; FreeBSD) KHTML/4.4.3 (like Gecko)