From: chris on
Hi all,

Given this data:

0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
0 - TCGTTGGTAACAATATCTAC-TTT-CT
3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT


sort v5.97 (as per CentOS 5.4) gives this:
> $ sort -k2 file
0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
0 - TCGTTGGTAACAATATCTAC-TTT-CT
0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T

i.e. it's sorting on column 3 not 2.

sort v5.93 (as per Mac OS 10.5.8) gives:
0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
0 - TCGTTGGTAACAATATCTAC-TTT-CT
0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T

Which looks like it's sorting column 2 then 3. Anyone else seen this and
is it a bug?
Cheers,

Chris
From: Gordon Henderson on
In article <op.vb4nfilos4ghqh(a)caterpillar.compbio.dundee.ac.uk>,
chris <ithinkiam(a)gmail.com> wrote:
>Hi all,
>
>Which looks like it's sorting column 2 then 3. Anyone else seen this and
>is it a bug?

It's not a bug, but a feature.

See: http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

where it says:

# If you use bash or some other Bourne-based shell,
export LC_ALL=POSIX

# If you use a C-shell,
setenv LC_ALL POSIX
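
If exporting the variable for the whole session is more than you want, the
assignment can also be prefixed to a single command. A minimal sketch, assuming
a Bourne-style shell; the four sample lines are cut down from the data in the
original post, and "C" is just another name for the POSIX locale:

```shell
# Prefixing the assignment changes the locale for this one sort only;
# the rest of the shell session keeps its usual locale.
sorted=$(printf '%s\n' '0 - TCGT' '1 - CTAT' '0 + AGAA' '2 + ATTA' |
    LC_ALL=C sort -k2)

# In the C locale comparison is byte by byte, so '+' sorts before '-'.
printf '%s\n' "$sorted"
```

With byte-order comparison the two '+' lines should come out ahead of the two
'-' lines.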

Gordon
From: chris on
On Mon, 03 May 2010 13:25:22 +0100, Gordon Henderson
<gordon+usenet(a)drogon.net> wrote:

> In article <op.vb4nfilos4ghqh(a)caterpillar.compbio.dundee.ac.uk>,
> chris <ithinkiam(a)gmail.com> wrote:
>> Hi all,
>>
>> Which looks like it's sorting column 2 then 3. Anyone else seen this and
>> is it a bug?
>
> It's not a bug, but a feature.
> See:
> http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>
> where it says:
>
> # If you use bash or some other Bourne-based shell,
> export LC_ALL=POSIX
> # If you use a C-shell,
> setenv LC_ALL POSIX
>
> Gordon

LOCALES strikes again! Thanks Gordon.
From: Tom Anderson on
On Mon, 3 May 2010, Gordon Henderson wrote:

> In article <op.vb4nfilos4ghqh(a)caterpillar.compbio.dundee.ac.uk>,
> chris <ithinkiam(a)gmail.com> wrote:
>> Hi all,
>>
>> Which looks like it's sorting column 2 then 3. Anyone else seen this and
>> is it a bug?
>
> It's not a bug, but a feature.
>
> See: http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

I don't get it. I tried that locally, and got the same problem as Chris,
and the solution there fixed it. But I don't understand why. Why does the
locale affect sorting of a column containing only + and -? Is it something
to do with how the columns are defined? Is it that the collation sequence
for en_GB.UTF-8 sorts '-' and '+' equally, and so sort falls back to
comparing the whole line? If the latter, is that not an astonishing bug?
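
On the "how the columns are defined" part: a key written as -k2 starts at
field 2 and, with no end position given, runs to the end of the line, so
column 3 takes part in the comparison. A minimal sketch of that, assuming a
sort that supports -s (stable, which switches off the last-resort whole-line
comparison), as GNU sort does; the two input lines are made up for
illustration:

```shell
# Two made-up lines that tie on column 2 but differ in column 3.
input=$(printf '%s\n' '0 + B' '1 + A')

# Key limited to field 2: the lines tie, and -s keeps them in input order.
field_only=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k2,2)

# Key from field 2 to end of line: column 3 now decides the order.
to_eol=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k2)

printf 'k2,2:\n%s\nk2:\n%s\n' "$field_only" "$to_eol"
```

This says nothing about how en_GB.UTF-8 weights '+' and '-'; it only shows
that a bare -k2 compares more than column 2.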

tom

--
Science is the outcome of being prepared to live without certainty and
therefore a mark of maturity. -- AC Grayling
From: chris on
On Mon, 03 May 2010 13:32:09 +0100, chris <ithinkiam(a)gmail.com> wrote:

> On Mon, 03 May 2010 13:25:22 +0100, Gordon Henderson
> <gordon+usenet(a)drogon.net> wrote:
>
>> In article <op.vb4nfilos4ghqh(a)caterpillar.compbio.dundee.ac.uk>,
>> chris <ithinkiam(a)gmail.com> wrote:
>>> Hi all,
>>>
>>> Which looks like it's sorting column 2 then 3. Anyone else seen this
>>> and
>>> is it a bug?
>>
>> It's not a bug, but a feature.
>> See:
>> http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>>
>> where it says:
>>
>> # If you use bash or some other Bourne-based shell,
>> export LC_ALL=POSIX
>> # If you use a C-shell,
>> setenv LC_ALL POSIX
>>
>> Gordon
>
> LOCALES strikes again! Thanks Gordon.

Wait a sec! This issue initially cropped up with a multi-column sort and I
thought I'd whittled it down to a 'simple' example. However, the original
problem is still not solved.

Given this file:
2 20140192 +
0 25394313 +
0 17128576 -
1 19332581 -
2 5214084 -
0 9019334 -
2 1232272 -
2 11075440 -
3 242532 +
3 7434705 -
1 19397725 -
1 8621880 +
2 17445849 -
1 6685383 -
4 15377341 +
1 14265470 +
3 796183 +
3 13285233 -
2 5241794 -
0 2370091 +


I want to sort on -k1n -k3 -k2n, but it still doesn't work, even with
LC_ALL=POSIX. I can sort on columns 1 and 3, or 1 and 2, but all three gives:
> $ sort -k1n -k3 -nk2 file

0 2370091 +
0 9019334 -
0 17128576 -
0 25394313 +
1 6685383 -
1 8621880 +
1 14265470 +
1 19332581 -
1 19397725 -
2 1232272 -
2 5214084 -
2 5241794 -
2 11075440 -
2 17445849 -
2 20140192 +
3 242532 +
3 796183 +
3 7434705 -
3 13285233 -
4 15377341 +

It's sorting cols 1 and 2, but not 3. What's wrong here?
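
A possible explanation, going by the GNU sort documentation: in -nk2 the -n is
read as a global option, and a key with no ordering options of its own (here
-k3) inherits the global ones. Column 3 would then be compared numerically,
where '+' and '-' both count as 0, so every line ties on it and column 2
decides. A sketch of a command that pins each key to a single field and
attaches n only to the numeric keys, using a subset of the rows above:

```shell
# A few rows from the file above (a subset, to keep the example short).
data='2 20140192 +
0 25394313 +
0 17128576 -
0 9019334 -
0 2370091 +
1 8621880 +
1 6685383 -'

# -k1,1n : column 1 only, numeric
# -k3,3  : column 3 only, plain byte comparison ('+' before '-' in the C locale)
# -k2,2n : column 2 only, numeric
result=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1n -k3,3 -k2,2n)
printf '%s\n' "$result"
```

Within each value of column 1 the '+' rows should now come before the '-'
rows, each group ordered numerically by column 2.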