

sort by multiple columns

<slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>

https://www.novabbs.com/devel/article-flat.php?id=6113&group=comp.unix.shell#6113
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: t-use...@gmx.net (Martin Τrautmann)
Newsgroups: comp.unix.shell
Subject: sort by multiple columns
Date: Wed, 19 Apr 2023 09:27:12 +0200
Organization: slrn user
Lines: 85
Message-ID: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
Reply-To: traut@gmx.de
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="cff29aa190cfd830722c48fe2b79833e";
logging-data="4164536"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19FSCCpq2Iws9egHf0m0YW/"
User-Agent: slrn/1.0.3 (Darwin)
Cancel-Lock: sha1:2ZbAhumBKEzm5DG6KowCOG1kyYw=
X-No-Archive: Yes
 by: Martin Τrautmann - Wed, 19 Apr 2023 07:27 UTC

Hi all,

how do I sort by multiple columns?

Example:
+++
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
+++

I want to sort
* first by column 4, numerical,
* second by column 2
* third by column 3

So the result should be
+++
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
+++

I tried both
sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
and
sort -k4 -t";" -n -k2,2 -k3,3
and some permutations and reversed orders, without success.
The sort by column 4 just gets lost or re-sorted.

I'm not sure about the man page
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)

So I tried relative positions with
-k3,1
as well, without success.

How do I apply the sort syntax properly?

Thanks
Martin

Re: sort by multiple columns

<G+BrL7i7U29f8chHB@bongo-ra.co>

https://www.novabbs.com/devel/article-flat.php?id=6114&group=comp.unix.shell#6114
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Wed, 19 Apr 2023 08:43:15 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <G+BrL7i7U29f8chHB@bongo-ra.co>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 19 Apr 2023 08:43:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="67ec64f7ee191c07f019d55e7cda97a9";
logging-data="4186242"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18iL4fHEppmEHHJ2PW1Tsdn"
Cancel-Lock: sha1:m/XoMpyID187uQmeEnheHFzArZQ=
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
In-Reply-To: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
 by: Spiros Bousbouras - Wed, 19 Apr 2023 08:43 UTC

On Wed, 19 Apr 2023 09:27:12 +0200
Martin Trautmann <t-usenet@gmx.net> wrote:
>
> Hi all,
>
> how do I sort by multiple columns?
>
> Example:
> +++
> Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
> Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
> Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
> Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
> Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
> Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
> Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
> Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
> Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
> Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
> Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
> Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
> Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
> Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
> Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
> Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
> Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
> Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
> Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
> Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
> Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
> Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
> Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
> Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
> +++
>
> I want to sort
> * first by column 4, numerical,
> * second by column 2
> * third by column 3
>
> So the result should be
> +++
> Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
[...]
> Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354

Why are these 2 lines sorted this way ? Column 4 is the same ("1" in
both) so it boils down to how "D9444" and "D9386" get sorted. What
comes first and why ? It seems to me that "D9386" comes earlier than
"D9444" .

Your locale may also turn out to be relevant so you should mention
that.

Unrelated but the first letter of your last name is unicode codepoint
3A4 which is the Greek upper case tau. Was this intentional or an
accident ?

Re: sort by multiple columns

<u1o9km$3voat$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=6115&group=comp.unix.shell#6115
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Wed, 19 Apr 2023 10:44:05 +0200
Organization: A noiseless patient Spider
Lines: 100
Message-ID: <u1o9km$3voat$1@dont-email.me>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 19 Apr 2023 08:44:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d338c1ef2fe268c3af380f48faa85f53";
logging-data="4186461"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+yEFI2XPKlq97l2/0QUVvh"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:r8dnRR5RfuxKy9orT5CbB7coS3I=
In-Reply-To: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
 by: Janis Papanagnou - Wed, 19 Apr 2023 08:44 UTC

On 19.04.2023 09:27, Martin Τrautmann wrote:
>
> Hi all,
>
> how do I sort by multiple columns?
>
> Example:
> +++
> Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
> Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
> Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
> Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
> Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
> Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
> Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
> Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
> Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
> Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
> Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
> Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
> Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
> Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
> Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
> Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
> Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
> Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
> Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
> Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
> Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
> Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
> Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
> Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
> +++
>
> I want to sort
> * first by column 4, numerical,
> * second by column 2
> * third by column 3

From that specification I'd write

sort -t\; -k4n -k2 -k3

but your expected data below doesn't follow your own spec. So the
specification probably needs a correction.

(Option -s for a "stable sort" may also be part of your solution.)

Janis

>
> So the result should be
> +++
> Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
> Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
> Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
> Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
> Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
> Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
> Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
> Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
> Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
> Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
> Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
> Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
> Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
> Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
> Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
> Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
> Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
> Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
> Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
> Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
> Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
> Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
> Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
> Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
> +++
>
> I tried both
> sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
> and
> sort -k4 -t";" -n -k2,2 -k3,3
> and some permutations and reversed orders, without success.
> The sort by column 4 just gets lost or re-sorted.
>
> I'm not sure about the man page
> -k, --key=POS1[,POS2]
> start a key at POS1, end it at POS2 (origin 1)
>
> So I tried relative positions with
> -k3,1
> as well, without success.
>
> How do I apply the sort syntax properly?
>
> Thanks
> Martin
>

Re: sort by multiple columns

<u1odjd$cda$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=6116&group=comp.unix.shell#6116
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Wed, 19 Apr 2023 11:51:41 +0200
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <u1odjd$cda$1@dont-email.me>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<u1o9km$3voat$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 19 Apr 2023 09:51:41 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d338c1ef2fe268c3af380f48faa85f53";
logging-data="12714"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19enmUZHVMjOdCrZzurgifo"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:PjW7RNMcdxgaWJHFr6eNEFsAbL8=
In-Reply-To: <u1o9km$3voat$1@dont-email.me>
 by: Janis Papanagnou - Wed, 19 Apr 2023 09:51 UTC

On 19.04.2023 10:44, Janis Papanagnou wrote:
> On 19.04.2023 09:27, Martin Τrautmann wrote:
>>
>> Hi all,
>>
>> how do I sort by multiple columns?
>>[...]
>>
>> I want to sort
>> * first by column 4, numerical,
>> * second by column 2
>> * third by column 3
>
> From that specification I'd write
>
> sort -t\; -k4n -k2 -k3

Oops... - make that

sort -t\; -k4,4n -k2,2 -k3,3

>
> but your expected data below doesn't follow your own spec. So the
> specification probably needs a correction.

You probably meant something like

sort -t\; -k3,3 -k4,4n -k2,2

Janis
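
The correction from an open-ended key to a bounded one matters because a
key given without an end position runs from its start field to the end of
the line. A minimal sketch of the difference, using two made-up rows (not
from the thread's data) and LC_ALL=C as an assumption to keep the byte
order reproducible:

```shell
# Two hypothetical rows that tie on field 2 but differ in fields 3 and 4.
data='a;X;zz;2
a;X;aa;10'

# -k2 without an end position: the key runs from field 2 to the end of the
# line, so "X;aa;10" sorts before "X;zz;2" and -k4,4n is never consulted.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -t';' -k2 -k4,4n)

# -k2,2 restricts the key to field 2 alone; the tie is then broken
# numerically by field 4, so the row with 2 comes before the row with 10.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -t';' -k2,2 -k4,4n)

printf 'open-ended key:\n%s\n\nbounded key:\n%s\n' "$open_ended" "$bounded"
```

With the open-ended key the row ending in 10 comes first; with the bounded
key the row ending in 2 comes first.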

Re: sort by multiple columns

<slrnu3vh6r.34b.t-usenet@ID-685.user.individual.de>

https://www.novabbs.com/devel/article-flat.php?id=6117&group=comp.unix.shell#6117
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: t-use...@gmx.net (Martin Τrautmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Wed, 19 Apr 2023 12:39:22 +0200
Organization: slrn user
Lines: 27
Message-ID: <slrnu3vh6r.34b.t-usenet@ID-685.user.individual.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<G+BrL7i7U29f8chHB@bongo-ra.co>
Reply-To: traut@gmx.de
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="3785da7d7584bd3c983af5dc90de9e87";
logging-data="24828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18BIBDuRC0dzynEkOPwE8HT"
User-Agent: slrn/1.0.3 (Darwin)
Cancel-Lock: sha1:ELB8WQrxF9iriyAN3RvuxNU+gaI=
X-No-Archive: Yes
 by: Martin Τrautmann - Wed, 19 Apr 2023 10:39 UTC

On Wed, 19 Apr 2023 08:43:15 -0000 (UTC), Spiros Bousbouras wrote:
>> So the result should be
>> +++
>> Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
> [...]
>> Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
>
> Why are these 2 lines sorted this way ? Column 4 is the same ("1" in
> both) so it boils down to how "D9444" and "D9386" get sorted. What
> comes first and why ? It seems to me that "D9386" comes earlier than
> "D9444" .

You took an example where both D9444 / D9386 and "Auf der Lindenstätte"
/ "Lindenstätte" differ.

> Your locale may also turn out to be relevant so you should mention
> that.

LANG=en_US.UTF-8
LC_ALL=en_US.UTF-8
LC_CTYPE=UTF-8

> Unrelated but the first letter of your last name is unicode codepoint
> 3A4 which is the Greek upper case tau. Was this intentional or an
> accident ?

Unrelated, but it's a check for proper UTF-8 handling within headers.

Re: sort by multiple columns

<slrnu3vhbp.34b.t-usenet@ID-685.user.individual.de>

https://www.novabbs.com/devel/article-flat.php?id=6118&group=comp.unix.shell#6118
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: t-use...@gmx.net (Martin Τrautmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Wed, 19 Apr 2023 12:42:01 +0200
Organization: slrn user
Lines: 12
Message-ID: <slrnu3vhbp.34b.t-usenet@ID-685.user.individual.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<u1o9km$3voat$1@dont-email.me> <u1odjd$cda$1@dont-email.me>
Reply-To: traut@gmx.de
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="3785da7d7584bd3c983af5dc90de9e87";
logging-data="24828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19dj2NXBrYetaiW+NckYWHL"
User-Agent: slrn/1.0.3 (Darwin)
Cancel-Lock: sha1:RY4eMTFtcN9QGvetjYXJBas2ieo=
X-No-Archive: Yes
 by: Martin Τrautmann - Wed, 19 Apr 2023 10:42 UTC

On Wed, 19 Apr 2023 11:51:41 +0200, Janis Papanagnou wrote:
> On 19.04.2023 10:44, Janis Papanagnou wrote:
>> On 19.04.2023 09:27, Martin Τrautmann wrote:
> You probably meant something like
>
> sort -t\; -k3,3 -k4,4n -k2,2

Wow, that's just perfect. I did not know I could attach the n directly to
the key option.

Thanks!
Martin
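
Attaching n to a single key differs from a stand-alone -n: per POSIX, an
ordering option given on its own applies to every key that has no
modifier of its own, so text fields get compared as numbers and all
evaluate to zero. A small sketch with three made-up rows (the data and
LC_ALL=C are assumptions for illustration only):

```shell
# Hypothetical rows: field 2 is text, field 3 is a number.
data='x;b;10
x;a;9
y;a;10'

# Global -n: both keys inherit it. Field 2 ("a", "b") evaluates to 0 for
# every row, so only field 3 and the last-resort whole-line comparison
# decide the order.
global_n=$(printf '%s\n' "$data" | LC_ALL=C sort -t';' -n -k2,2 -k3,3)

# n attached to the third field only: field 2 is compared as text first,
# then field 3 numerically within each group.
attached_n=$(printf '%s\n' "$data" | LC_ALL=C sort -t';' -k2,2 -k3,3n)

printf 'global -n:\n%s\n\nattached n:\n%s\n' "$global_n" "$attached_n"
```

Only the second form groups the two "a" rows together before ordering
them by number.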

Re: sort by multiple columns

<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>

https://www.novabbs.com/devel/article-flat.php?id=6119&group=comp.unix.shell#6119
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nn.throt...@xoxy.net (Helmut Waitzmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sat, 22 Apr 2023 03:33:43 +0200
Organization: A noiseless patient Spider
Lines: 58
Sender: Helmut Waitzmann <12f7e638@mail.de>
Message-ID: <83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>, Helmut Waitzmann <12f7e638@mail.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="e48fcfe179269963f68fe5e14e9d0fe2";
logging-data="3257566"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/S36i2kDtyi+/4njiROOU0NWIIBplWtf0="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Cancel-Lock: sha1:2kB36KibuBvKt37c/3SWEXaLPNs=
sha1:TkG6GmgE7mMhnTOatH8FfGOUyh8=
Mail-Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>, Helmut Waitzmann <12f7e638@mail.de>
Mail-Copies-To: nobody
 by: Helmut Waitzmann - Sat, 22 Apr 2023 01:33 UTC

> Martin Τραωτμανν <t-usenet@gmx.net>:
>
> how do I sort by multiple columns?
>

[An example text…]

> I want to sort
> * first by column 4, numerical,
> * second by column 2
> * third by column 3

[…with sorted result]

The sorted result of your example has apparently been sorted
according to the following description:

First, group the lines sorted by column 3, that is, sort the
lines in a manner that results in alphabetically ascending values
in column 3.

Then, within each group of lines that share a common value in
column 3, sort the lines independently so that the values in
column 2 are in alphabetically ascending order.

Then, within each group of lines that share common values in
columns 3 and 2, sort the lines independently so that the values
in column 4 are in numerically ascending order.

Finally, each group of lines that compare equal in columns 3, 2,
and 4 under the criteria above is sorted by a default criterion
that compares the whole line.

This can be achieved using the following command line:

sort -t ';' -k 3,3 -k 2,2 -k 4,4n

You might read the description of the "sort" utility in the POSIX
standard
(<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#top>),
especially the last paragraph in the "OPTIONS" section
(<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04>): 
"When there are multiple key fields, later keys shall be compared
only after all earlier keys compare equal.  Except when the -u
option is specified, lines that otherwise compare equal shall be
ordered as if none of the options -d, -f, -i, -n, or -k were
present (but with -r still in effect, if it was specified) and
with all bytes in the lines significant to the comparison.  The
order in which lines that still compare equal are written is
unspecified."
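
The key order described above can be tried on a handful of rows taken
from the original example. This is only a sketch: the five-row subset,
the shuffled input order, and LC_ALL=C (byte-order comparison instead of
the poster's en_US.UTF-8 locale) are assumptions chosen to make the
outcome reproducible.

```shell
# Five rows from the thread's sample data, deliberately out of order.
data='Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109'

# Column 3 as text, then column 2 as text, then column 4 as a number.
# Each later key is consulted only when all earlier keys compare equal.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -t ';' -k 3,3 -k 2,2 -k 4,4n)

# Show only columns 2 and 4 to make the resulting order easy to read.
summary=$(printf '%s\n' "$sorted" | cut -d';' -f2,4)
printf '%s\n' "$summary"
```

The two "Auf der Lindenstätte" rows come first, followed by D9386 and
then the T2960 rows in numeric order (6 before 12).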

Re: sort by multiple columns

<slrnu471bk.34b.t-usenet@ID-685.user.individual.de>

https://www.novabbs.com/devel/article-flat.php?id=6120&group=comp.unix.shell#6120
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: t-use...@gmx.net (Martin Τrautmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sat, 22 Apr 2023 08:57:55 +0200
Organization: slrn user
Lines: 55
Message-ID: <slrnu471bk.34b.t-usenet@ID-685.user.individual.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
Reply-To: traut@gmx.de
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="3a52ccce8375e42a828f965b8cd6f9ee";
logging-data="3343190"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19H+k24itnHF4XWU52CTEnh"
User-Agent: slrn/1.0.3 (Darwin)
Cancel-Lock: sha1:fiS5H8lIr6YZLcvfUpumfeBk6L0=
X-No-Archive: Yes
 by: Martin Τrautmann - Sat, 22 Apr 2023 06:57 UTC

On Sat, 22 Apr 2023 03:33:43 +0200, Helmut Waitzmann wrote:
>> I want to sort
>> * first by column 4, numerical,
>> * second by column 2
>> * third by column 3
>
> […with sorted result]
>
>
> The sorted result of your example has apparently been sorted
> according to the following description:
>
> First, group the lines sorted by column 3, that is, sort the
> lines in a manner that results in alphabetically ascending values
> in column 3.

That is a question of how sort works.

If I want to pre-sort by column 3 first and then sub-sort by column 2,
that's fine. But when I pipe one sort into another, the second sort
destroys the order produced by the first. That's why I listed my sorts
in reversed order in the pipe example.

If all sorts can be done within a single command, the direct order works
better. I had not been aware of the direct -k4,4n option, and I could not
get the -n option to apply as desired.
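
A piped chain in reversed key order only preserves the earlier ordering
if every stage is a stable sort. The sketch below assumes a sort that
offers the -s (stable) option, as GNU and BSD sort do; -s is not
guaranteed by the POSIX text quoted in this thread. The shortened rows
and LC_ALL=C are likewise assumptions for illustration.

```shell
# Hypothetical four-column rows modelled on the thread's data.
data='Borgentreich;T2960;Lindenstätte;12
Borgentreich;D9386;Lindenstätte;1
Borgentreich;D9444;Auf der Lindenstätte;2
Borgentreich;T2960;Lindenstätte;6
Borgentreich;D9444;Auf der Lindenstätte;1'

# Least significant key first; each later stage uses -s so that rows
# tying on its key keep the order produced by the previous stage.
piped=$(printf '%s\n' "$data" |
    LC_ALL=C sort -s -t';' -k4,4n |
    LC_ALL=C sort -s -t';' -k2,2 |
    LC_ALL=C sort -s -t';' -k3,3)

# The same ordering expressed as one command, most significant key first.
single=$(printf '%s\n' "$data" | LC_ALL=C sort -t';' -k3,3 -k2,2 -k4,4n)

if [ "$piped" = "$single" ]; then
    echo "pipeline and single command agree"
fi
```

The two results agree here because no two rows tie on all three keys;
rows that do tie would be left in input order by the stable pipeline but
ordered by a whole-line comparison in the single command.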

> You might read the description of the "sort" utility in the POSIX
> standard
> (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#top>),
> especially the last paragraph in the "OPTIONS" section
> (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04>): 
> "When there are multiple key fields, later keys shall be compared
> only after all earlier keys compare equal.  Except when the -u
> option is specified, lines that otherwise compare equal shall be
> ordered as if none of the options -d, -f, -i, -n, or -k were
> present (but with -r still in effect, if it was specified) and
> with all bytes in the lines significant to the comparison.  The
> order in which lines that still compare equal are written is
> unspecified."

This description is much better than my man and info pages for sort, but
unfortunately I can't be sure that the POSIX description actually applies
to my local sort implementation:
sort 5.93 November 2005

AUTHOR
Written by Mike Haertel and Paul Eggert.

REPORTING BUGS
Report bugs to <bug-coreutils@gnu.org>.

COPYRIGHT
Copyright (C) 2005 Free Software Foundation, Inc.

Re: sort by multiple columns

<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>

https://www.novabbs.com/devel/article-flat.php?id=6121&group=comp.unix.shell#6121
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nn.throt...@xoxy.net (Helmut Waitzmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sun, 23 Apr 2023 03:33:47 +0200
Organization: A noiseless patient Spider
Lines: 62
Sender: Helmut Waitzmann <12f7e638@mail.de>
Message-ID: <834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
<slrnu471bk.34b.t-usenet@ID-685.user.individual.de>
Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="6cc1fc411869ce73cca638c7a6ab0999";
logging-data="3701341"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19CAfidRM0a4DjlYR+3OgDbhGZ4LfRcZ6Q="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Cancel-Lock: sha1:NKXCDmCCDtRSFXFBysiEiXuV96E=
sha1:0Y03zMDt47WTyABg/KV+Batik2o=
Mail-Copies-To: nobody
Mail-Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>
 by: Helmut Waitzmann - Sun, 23 Apr 2023 01:33 UTC

Martin Τrautmann <t-usenet@gmx.net>:
> On Sat, 22 Apr 2023 03:33:43 +0200, Helmut Waitzmann wrote:
>>> I want to sort
>>> * first by column 4, numerical,
>>> * second by column 2
>>> * third by column 3
>>
>> […with sorted result]
>>
>>
>> The sorted result of your example has apparently been sorted
>> according to the following description:
>>
>> First, group the lines sorted by column 3, that is, sort the
>> lines in a manner that results in alphabetically ascending
>> values in column 3.
>
> That's a question of how the sort works.
>
>
> If I want to pre-sort by column 3 first, then sub-sort by column
> 2, that's fine. But when I pipe one sort into the other, the
> second sort will destroy the order of the first. That's why I
> gave my sort keys in reverse order in the pipe example.

That won't help, either:  A sorting pipe using (a standard)
"sort" won't solve the problem, because one cannot tell (a
standard) "sort" to sort on the given key option only.  Each
sort in the pipe performs a complete sort of its own (according
to its sort criteria):  lines that compare equal on the key are
ordered by comparing the whole lines, not by keeping their
incoming order.

With GNU‐"sort", though, a sorting pipe can solve the problem, if
one applies the "--stable" option to each (except the first) of
the "sort" invocations.  Then the command

sort --stable -t ';' -n -k 4,4 |
sort --stable -t ';' -k 2,2 |
sort --stable -t ';' -k 3,3

will do the job.  (Unfortunately the "--stable" option is not
part of the POSIX standard.)
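
As a small sketch of the stable-pipeline idea, the following compares
the pipe with a single invocation on made-up data.  The sample lines
are hypothetical, "-s" is the short form of GNU "sort"'s "--stable",
and LC_ALL=C is only there to make the comparison order predictable;
none of this is taken from the original example.

```shell
# Hypothetical sample data: four ';'-separated fields per line.
data='a;x;m;10
b;y;m;2
c;x;k;10
d;x;m;2
e;y;k;2'

# Pipeline: least significant key first, most significant key last.
# -s (--stable) keeps lines with equal keys in their incoming order.
piped=$(printf '%s\n' "$data" |
  LC_ALL=C sort -s -t ';' -k 4,4n |
  LC_ALL=C sort -s -t ';' -k 2,2 |
  LC_ALL=C sort -s -t ';' -k 3,3)

# Single invocation: most significant key first.
single=$(printf '%s\n' "$data" |
  LC_ALL=C sort -s -t ';' -k 3,3 -k 2,2 -k 4,4n)

printf '%s\n' "$piped"
[ "$piped" = "$single" ] && echo "identical"
```

Note that the pipe applies the least significant key first and the
most significant key last, the reverse of the order in which a single
command lists its "-k" options.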

[A quote from the "sort" description in the POSIX standard]

> This description is much better than my man and info sort
>

Yes, that's my experience, too.  I tend to read not only the
manual page or info documentation but also look into the
corresponding POSIX description (if the utility is part of the
POSIX standard), and then check whether the manual page or info
documentation conflicts with the POSIX description.

> - but unfortunately I can't be sure that the POSIX info actually
> does work on my local sort implementation: sort 5.93 November
> 2005

Yes, that might happen.  In practice, GNU tries to follow the
POSIX standard.

The size of pipes (Was: sort by multiple columns)

<u23fpe$2opsm$1@news.xmission.com>

https://www.novabbs.com/devel/article-flat.php?id=6122&group=comp.unix.shell#6122

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gaze...@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.unix.shell
Subject: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 14:36:30 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <u23fpe$2opsm$1@news.xmission.com>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de> <834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de> <slrnu4a5im.34b.t-usenet@ID-685.user.individual.de> <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>
Injection-Date: Sun, 23 Apr 2023 14:36:30 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="2910102"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Sun, 23 Apr 2023 14:36 UTC

In article <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>,
David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
....
>Keep in mind. When sorting a file, the last line in the input may end up
>becoming the first line in the output. The sort can not write anything to
>the pipe or output file until it's sorted the entire input. With a pipe,
>the temporary file is in RAM rather than being a named file on disk.

This actually raises an interesting point. Pipes are not infinite in size,
and they could, theoretically block if enough is written on the write end
without anything being read from the read end. Though the limits are
likely very large nowadays on modern systems, I think the original
implementation was only 4096 bytes and the standards today (POSIX) may not
guarantee anything more than that (haven't checked).

For most programs, this is rarely a concern, since most pipelines write and
read more or less simultaneously in real time, but sort is an edge case for
the reason you explain above.

Something to keep in mind if you ever decide to sort very large files in a
pipeline. And it is probably a better idea not to do so; to sort it all at
once, using multiple key specifications on the command line.

--
Rich people pay Fox people to convince middle class people to blame poor people.

(John Fugelsang)

Re: The size of pipes (Was: sort by multiple columns)

<u23gql$3rkl5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=6123&group=comp.unix.shell#6123

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 16:54:12 +0200
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <u23gql$3rkl5$1@dont-email.me>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 23 Apr 2023 14:54:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="179f52437eefbb5747dfb2f8e88f4c7c";
logging-data="4051621"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18aSexXz2KX1hpbEgbqZxCR"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:HnV3HUiKg+1rwEVUP7ZXKSD/VhM=
In-Reply-To: <u23fpe$2opsm$1@news.xmission.com>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sun, 23 Apr 2023 14:54 UTC

On 23.04.2023 16:36, Kenny McCormack wrote:
> [...]
>
> For most programs, this is rarely a concern, since most pipelines write and
> read more or less simultaneously in real time, but sort is an edge case for
> the reason you explain above.

Note also that quite a few sorting operations happen implicitly
(e.g. in 'ls', in the shell's '*' glob/pattern expansion, etc.).
For example, don't expect find | xargs ls to provide sorted
output.

>
> Something to keep in mind if you ever decide to sort very large files in a
> pipeline. [...]

However some instance of sort is implemented (in memory, with
temporary files, or whatever), my expectation is that
whatever | sort
will have to produce sorted output. Isn't that guaranteed?

Janis

Re: The size of pipes (Was: sort by multiple columns)

<20230423084531.125@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=6124&group=comp.unix.shell#6124

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 15:51:53 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <20230423084531.125@kylheku.com>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net>
<u23fpe$2opsm$1@news.xmission.com> <u23gql$3rkl5$1@dont-email.me>
<u23ito$2osbe$1@news.xmission.com>
Injection-Date: Sun, 23 Apr 2023 15:51:53 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="fdee22ba1634897cb943c668e9e98311";
logging-data="4067509"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19KtNrpMblx8PzAiOfv/vHgAWu7vM+0bPU="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:rXGLwcZaHgBCnMUK9sms8UWWq6Y=
 by: Kaz Kylheku - Sun, 23 Apr 2023 15:51 UTC

On 2023-04-23, Kenny McCormack <gazelle@shell.xmission.com> wrote:
> The bad case would be if a program produced a ton of output, but the reader
> didn't read any of it. I'll have to think some more as to whether or not
> that applies here.

Limited pipe sizes cause two potential problems:

- deadlock: programs that both read and write a pipe may work when
tested with small messages, but lock up on larger ones.

- atomicity of writes: a write of a number of bytes smaller
than the pipe size can be read all together on the other end,
so the reading end will work correctly without checking for
a short read. When the message size exceeds the pipe size,
that breaks.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: The size of pipes (Was: sort by multiple columns)

<0fw8oLb25z6qFt02a@bongo-ra.co>

https://www.novabbs.com/devel/article-flat.php?id=6125&group=comp.unix.shell#6125

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 15:51:42 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <0fw8oLb25z6qFt02a@bongo-ra.co>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de> <834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de> <slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 23 Apr 2023 15:51:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2494b4dfcde8cbd5f394c874115563db";
logging-data="4070589"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180DQVIkypGhcs2e7/CSXyW"
Cancel-Lock: sha1:bdRzLEoEC9eiKH14uTS4bw3gTc0=
In-Reply-To: <u23fpe$2opsm$1@news.xmission.com>
X-Server-Commands: nowebcancel
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Sun, 23 Apr 2023 15:51 UTC

On Sun, 23 Apr 2023 14:36:30 -0000 (UTC)
gazelle@shell.xmission.com (Kenny McCormack) wrote:
> In article <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>,
> David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
> ...
> >Keep in mind. When sorting a file, the last line in the input may end up
> >becoming the first line in the output. The sort can not write anything to
> >the pipe or output file until it's sorted the entire input. With a pipe,
> >the temporary file is in RAM rather than being a named file on disk.
>
> This actually raises an interesting point. Pipes are not infinite in size,
> and they could, theoretically block if enough is written on the write end
> without anything being read from the read end. Though the limits are
> likely very large nowadays on modern systems, I think the original
> implementation was only 4096 bytes and the standards today (POSIX) may not
> guarantee anything more than that (haven't checked).

I tried to find an argument which you can give to getconf to get the
answer to that but I didn't see anything. I don't think POSIX gives a constant
(in some C header) to get the answer to that. There is PIPE_BUF but this is
for atomic writes rather than total pipe capacity.
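
For what it's worth, PIPE_BUF can be queried from the shell, but it
needs a path operand because it is a per-file limit. A minimal sketch,
assuming a getconf utility is installed and using "/" as an arbitrary
path:

```shell
# PIPE_BUF is a pathconf-style variable, so getconf takes a path operand.
# "/" is an arbitrary choice; the value may differ per file system.
pipe_buf=$(getconf PIPE_BUF /)
echo "writes of up to $pipe_buf bytes to a pipe are atomic"
```

POSIX only requires this value to be at least 512 bytes; it says
nothing about how much data a pipe can hold in total.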

> For most programs, this is rarely a concern, since most pipelines write and
> read more or less simultaneously in real time, but sort is an edge case for
> the reason you explain above.
>
> Something to keep in mind if you ever decide to sort very large files in a
> pipeline. And it is probably a better idea not to do so; to sort it all at
> once, using multiple key specifications on the command line.

I don't see the problem. If sort is on the left of a pipe then it will
sort its whole input and then all it will do is write to the pipe. If sort
is on the right of a pipe then in the beginning it will only do reading
until it has read everything and then do the sorting. Obviously if you
have process1 | process2 and one side does reading or writing (whatever
applies) much slower than the other side then the fast side will block but
there's nothing special about sort in that respect. On the contrary, by
the nature of what it does, sort will only do reading or writing during
part of its operation.

--
Fans of both doomsday scenario movies and movies that show close-ups of Willem
Dafoe's pubic region should walk away eerily pleased from this one.
https://www.imdb.com/review/rw2553866/

Re: The size of pipes

<oh2ghj-veh.ln1@mail.home.palmen-it.de>

https://www.novabbs.com/devel/article-flat.php?id=6126&group=comp.unix.shell#6126

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: fel...@palmen-it.de (Felix Palmen)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes
Date: Sun, 23 Apr 2023 18:21:44 +0200
Organization: palmen-it.de
Lines: 29
Message-ID: <oh2ghj-veh.ln1@mail.home.palmen-it.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de> <834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de> <slrnu4a5im.34b.t-usenet@ID-685.user.individual.de> <op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
Injection-Date: Sun, 23 Apr 2023 18:21:44 +0200
Injection-Info: dont-email.me; posting-host="b356e10f704f0fe7f2bc4230b5cdf758";
logging-data="4080716"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19iFjp5ZryKNgk9MBEkV+Ct"
User-Agent: tin/2.6.2-20221225 ("Pittyvaich") (FreeBSD/13.2-RELEASE (amd64)) tinews.pl/1.1.61
Cancel-Lock: sha256:g9iDMAlauFDiK2bYX/uV+xnu2NcnrVEg/n68/ADxqYQ=
sha1:M1ex5JOrdO5sQ/UnLoKsDbjNmks=
X-PGP-Hash: SHA256
X-PGP-Sig: GnuPG-v2 From,Newsgroups,Subject,Date,Injection-Date,Message-ID
iNUEARYIAH0WIQRpNhPVW79IN7ISOsxUreAGmHnyMQUCZEVbGF8UgAAAAAAuAChp
c3N1ZXItZnByQG5vdGF0aW9ucy5vcGVucGdwLmZpZnRoaG9yc2VtYW4ubmV0Njkz
NjEzRDU1QkJGNDgzN0IyMTIzQUNDNTRBREUwMDY5ODc5RjIzMQAKCRBUreAGmHny
MQ6uAQC+D2gUKb27tQE4fkItmID5CrxZUk2e9TR7gq6GFBbH1QEAyh1Kwl4aUTLN
Oui/23L3GEqiZd1YDFw95vgOMn7UxQk=
=BsyS
X-PGP-Key: 693613D55BBF4837B2123ACC54ADE0069879F231
 by: Felix Palmen - Sun, 23 Apr 2023 16:21 UTC

* Kenny McCormack <gazelle@shell.xmission.com>:
> David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
> ...
>>Keep in mind. When sorting a file, the last line in the input may end up
>>becoming the first line in the output. The sort can not write anything to
>>the pipe or output file until it's sorted the entire input. With a pipe,
>>the temporary file is in RAM rather than being a named file on disk.
>
> This actually raises an interesting point. Pipes are not infinite in size,
> and they could, theoretically block if enough is written on the write end
> [...]
> Something to keep in mind if you ever decide to sort very large files in a
> pipeline. And it is probably a better idea not to do so; to sort it all at
> once, using multiple key specifications on the command line.

This won't be a concern here. You need the whole data to sort something,
so the sort utility must read until EOF anyways before doing its work.
So, the real concern is whether you'll have enough RAM.

The only alternative would be to sort on the file contents. I don't know
whether some sort utility can do that (it certainly would create other
issues when sorting by "text lines" of very different lengths), but
that's not possible with pipes anyways, they can't be seeked.

--
Dipl.-Inform. Felix Palmen <felix@palmen-it.de> ,.//..........
{web} http://palmen-it.de {jabber} [see email] ,//palmen-it.de
{pgp public key} http://palmen-it.de/pub.txt // """""""""""
{pgp fingerprint} 6936 13D5 5BBF 4837 B212 3ACC 54AD E006 9879 F231

Re: The size of pipes (Was: sort by multiple columns)

<op.13u3yypja3w0dxdave@hodgins.homeip.net>

https://www.novabbs.com/devel/article-flat.php?id=6127&group=comp.unix.shell#6127

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dwhodg...@nomail.afraid.org (David W. Hodgins)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 12:26:48 -0400
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <op.13u3yypja3w0dxdave@hodgins.homeip.net>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="b43ae20efee7944fb0f321ac6b22f8a7";
logging-data="4083121"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9tNWvxGAAyFK014F1g1ebBRpLYM4xz04="
User-Agent: Opera Mail/12.16 (Linux)
Cancel-Lock: sha1:Jg/mJcNdc//QVHXt0JwTxNePr3I=
 by: David W. Hodgins - Sun, 23 Apr 2023 16:26 UTC

On Sun, 23 Apr 2023 10:36:30 -0400, Kenny McCormack <gazelle@shell.xmission.com> wrote:
> This actually raises an interesting point. Pipes are not infinite in size,
> and they could, theoretically block if enough is written on the write end
> without anything being read from the read end. Though the limits are
> likely very large nowadays on modern systems, I think the original
> implementation was only 4096 bytes and the standards today (POSIX) may not
> guarantee anything more than that (haven't checked).

Just tested "sort bigfile|hexdump|less". htop shows it's using 917M of ram
and 2.5GB of virtual storage (reserved, not all used) to sort a 730M input
file.

After ending the less output ...
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15955        5715        2376         361        7863        9548
Swap:         32761           2       32758

There may be versions of sort that still limit how much RAM they can use, but
the version from the coreutils package is not one of them. Its only limit is
based on the amount of RAM and swap space available, and what the OOM killer
can make available if you do start to run out.

Also note it has options such as "--temporary-directory=DIR" to use disk files
for temporary storage instead of ram.
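
A rough sketch of how temporary files come into play with GNU sort: "-S"
(--buffer-size) caps the in-memory buffer and "-T" is the short form of
--temporary-directory. The 1M buffer, the 200000 generated lines, and the
throwaway directory are arbitrary assumptions, chosen so that the input is
likely larger than the buffer.

```shell
# Throwaway directory for sort's temporary files (hypothetical location).
tmpdir=$(mktemp -d)

# Generate 200000 numbers in descending order, then sort them numerically
# with a deliberately small buffer so that sort may spill into $tmpdir.
sorted=$(awk 'BEGIN { for (i = 200000; i >= 1; i--) print i }' |
  sort -n -S 1M -T "$tmpdir")

printf '%s\n' "$sorted" | head -n 3
rm -rf "$tmpdir"
```

The result is the same whether or not temporary files were needed; the two
options only influence where the intermediate data lives.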

Regards, Dave Hodgins

Re: The size of pipes (Was: sort by multiple columns)

<u23nc6$3sq0j$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=6128&group=comp.unix.shell#6128

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 18:45:57 +0200
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <u23nc6$3sq0j$1@dont-email.me>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
<0fw8oLb25z6qFt02a@bongo-ra.co> <u23lqe$3sg1f$1@dont-email.me>
<u23mqv$3shd2$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 23 Apr 2023 16:45:58 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="179f52437eefbb5747dfb2f8e88f4c7c";
logging-data="4089875"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+HZotFvN5zLdIEhK7Yp4Su"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:BAuM6JjvIr8V8hTtXErl8DITi0A=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <u23mqv$3shd2$1@dont-email.me>
 by: Janis Papanagnou - Sun, 23 Apr 2023 16:45 UTC

On 23.04.2023 18:36, Richard Harnden wrote:
>
> My man page says:

Thanks for that, since my man page doesn't say anything about the
algorithms. Now we have some clue what 'sort' on Unix does; and it
seems that hybrid sorting algorithms aren't implemented; which is
really strange since Quicksort implementations usually use Linear
Sort for small partitions, and upthread I already spoke about the
Mergesort/Heapsort hybrid. (Room for improvement? Or are they just
presuming that everything is doable with an arbitrarily large virtual
memory? Who knows.)

>
> --radixsort
> Try to use radix sort, if the sort specifications allow.
> The radix sort can only be used for trivial locales (C and
> POSIX), and it cannot be used for numeric or month sort.
> Radix sort is very fast and stable.
>
> --mergesort
> Use mergesort. This is a universal algorithm that can
> always be used, but it is not always the fastest.
>
> --qsort
> Try to use quick sort, if the sort specifications allow.
> This sort algorithm cannot be used with -u and -s.
>
> --heapsort
> Try to use heap sort, if the sort specifications allow.
> This sort algorithm cannot be used with -u and -s.
>

Janis

Re: The size of pipes

<llzcGltkDoq262Vmi@bongo-ra.co>

https://www.novabbs.com/devel/article-flat.php?id=6129&group=comp.unix.shell#6129

Path: i2pn2.org!rocksolid2!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes
Date: Sun, 23 Apr 2023 18:03:36 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <llzcGltkDoq262Vmi@bongo-ra.co>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de> <834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de> <slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com> <oh2ghj-veh.ln1@mail.home.palmen-it.de>
<u23mmi$3slm9$1@dont-email.me> <1n4ghj-rti.ln1@mail.home.palmen-it.de> <u23p59$3t549$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 23 Apr 2023 18:03:36 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2494b4dfcde8cbd5f394c874115563db";
logging-data="4115130"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+CN87ynkGc3saUtWIJSAcu"
Cancel-Lock: sha1:PBCVad7oU2ObxYPIj5HDUTYr+As=
In-Reply-To: <u23p59$3t549$1@dont-email.me>
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Sun, 23 Apr 2023 18:03 UTC

On Sun, 23 Apr 2023 19:16:25 +0200
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> On 23.04.2023 18:58, Felix Palmen wrote:
> > * Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
> >> On 23.04.2023 18:21, Felix Palmen wrote:
> >>>
> >>> This won't be a concern here. You need the whole data to sort something,
> >>> so the sort utility must read until EOF anyways before doing its work.
>
> s/doing/finishing/
>
> >> See my recent reply on a different view.
> >
> > So, even if it starts working on "chunks", this won't change anything:
> > the data from the pipe must be read in order to work with it, so the
> > size of the pipe won't be a problem here.
> >
> > It seems the idea assuming this was that the whole data to be sorted
> > must fit into the pipe buffer. But this isn't the case.
>
> It boils down to this; sorting can _start_ sorting with fewer data
> (something like a pipe-full), it can also _continue_ sorting with
> more parts of data, and to _finish_ sorting it naturally must have
> had all data available.

I think Kenny was worried in <u23fpe$2opsm$1@news.xmission.com>
and <u23ito$2osbe$1@news.xmission.com> about a deadlock situation where
no progress gets made because of low pipe capacity. I can't think of
a scenario where this can happen even if sort interleaves sorting and
reading from a pipe.

Re: The size of pipes (Was: sort by multiple columns)

<kalg1gF3o61U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=6130&group=comp.unix.shell#6130

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!lilly.ping.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: jpstew...@personalprojects.net (John-Paul Stewart)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 15:42:00 -0400
Lines: 22
Message-ID: <kalg1gF3o61U1@mid.individual.net>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net XSJbMX4Z6MntSnOCJntlsQnKthyWWzelpDAdJ9qnXV7zXYXufW
Cancel-Lock: sha1:34S9uUIZ1I5GFDRXM0yWD0Kz16U=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.9.0
Content-Language: en-CA
In-Reply-To: <u23fpe$2opsm$1@news.xmission.com>
 by: John-Paul Stewart - Sun, 23 Apr 2023 19:42 UTC

On 4/23/23 10:36, Kenny McCormack wrote:
> This actually raises an interesting point. Pipes are not infinite in size,
> and they could, theoretically block if enough is written on the write end
> without anything being read from the read end. Though the limits are
> likely very large nowadays on modern systems, I think the original
> implementation was only 4096 bytes and the standards today (POSIX) may not
> guarantee anything more than that (haven't checked).

FWIW, the pipe(7) manpage from Debian GNU/Linux has a "Pipe capacity"
section that says in part:

Before Linux 2.6.11, the capacity of a pipe was the same as the
system page size (e.g., 4096 bytes on i386). Since Linux
2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a
system with a page size of 4096 bytes). Since Linux 2.6.35,
the default pipe capacity is 16 pages, but the capacity can be
queried and set using the fcntl(2) F_GETPIPE_SZ and
F_SETPIPE_SZ operations. See fcntl(2) for more information.

So pipes on Linux aren't very large at all. I don't know how other Unix
systems compare.
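
fcntl(2) cannot be called directly from the shell, but on Linux the upper
limit that an unprivileged process may request with F_SETPIPE_SZ is exposed
under /proc. A minimal Linux-only sketch; the variable names are made up
for illustration, and on other systems the file simply does not exist:

```shell
# Linux-specific: upper limit (in bytes) that an unprivileged process may
# request with F_SETPIPE_SZ. The file does not exist on other systems.
limit_file=/proc/sys/fs/pipe-max-size
if [ -r "$limit_file" ]; then
  pipe_max=$(cat "$limit_file")
else
  pipe_max=unknown
fi
echo "largest pipe capacity settable without privileges: $pipe_max"
```

This is the ceiling for resizing a pipe, not the default capacity quoted
above.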

Re: sort by multiple columns

<83o7nel6kp.fsf@helmutwaitzmann.news.arcor.de>

https://www.novabbs.com/devel/article-flat.php?id=6131&group=comp.unix.shell#6131

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nn.throt...@xoxy.net (Helmut Waitzmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sun, 23 Apr 2023 21:52:22 +0200
Organization: A noiseless patient Spider
Lines: 72
Sender: Helmut Waitzmann <12f7e638@mail.de>
Message-ID: <83o7nel6kp.fsf@helmutwaitzmann.news.arcor.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
<slrnu471bk.34b.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>, Helmut Waitzmann <12f7e638@mail.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="d525b036517d0b2582df1f879b18ae91";
logging-data="4149685"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+j6ALpPr6miaRlGfViphQniFTutK0brzY="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Cancel-Lock: sha1:NMmpIUO2vIuEKYcXPojxQsMo7jI=
sha1:tZ7gsuWv30da8oz7qBG/RoLQads=
Mail-Copies-To: nobody
Mail-Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>, Helmut Waitzmann <12f7e638@mail.de>
 by: Helmut Waitzmann - Sun, 23 Apr 2023 19:52 UTC

Martin Τrautmann <t-usenet@gmx.net>:
> On Sun, 23 Apr 2023 03:33:47 +0200, Helmut Waitzmann wrote:
>>> If I want to pre-sort by column 3 first, then sub-sort by
>>> column 2, that's fine. But when I pipe one sort into the other,
>>> the second sort will destroy the order of the first. That's why
>>> I gave my sort keys in reverse order in the pipe example.
>>
>> That won't help, either:  A sorting pipe using (a standard)
>> "sort" won't solve the problem, because one cannot tell (a
>> standard) "sort" to do a sort on the given key option only. 
>> Each sort in the pipe will be total (according to its sort
>> criteria) of its own.
>
> That was my problem - I expected that a pipe through several
> sorts would keep the order. I don't know why it doesn't.

Look at these sample lines:

1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2

To have this sequence of lines sorted in such a way that the
first field is sorted in ascending numeric order while the second
is sorted in descending numeric order, one could specify the two
sort criteria at once:

sort -t ';' -k 1nb,1 -k 2nr,2

What would the command line look like if one used two "sort"
invocations, each of them getting only one "-k" option
(replacing the "???" by the appropriate sort key specifications)?

first=??? ; second=???
sort -t ';' -k "$first" |
sort -t ';' -k "$second"

Or (if it's easier to understand, but it's equivalent) use an
intermediate file rather than a pipe:

first=??? ; second=???
sort -t ';' -k "$first" > file &&
sort -t ';' -k "$second" -- file

Try to answer the following questions:

Would the variable assignments

first=2nr,2
second=1nb,1

yield the correct result?  Why or why not?  Would they work if
one adds the GNU‐"sort" "--stable" option to the second "sort"
invocation?  Why or why not?

When using the variant with the intermediate file, after having
run the first "sort" invocation, you might examine the
intermediate file and try to predict what would be the outcome of
the second "sort" invocation.
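
One way to explore these questions is to run all three variants on
the sample lines and compare the results.  A sketch, assuming a
"sort" that supports "-s" (the short form of "--stable") and using
LC_ALL=C for a predictable last-resort comparison; the variable
names are made up for illustration:

```shell
# The sample lines from above.
lines='1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2'

# Reference: both sort criteria in a single invocation.
wanted=$(printf '%s\n' "$lines" |
  LC_ALL=C sort -t ';' -k 1nb,1 -k 2nr,2)

# Two invocations, second one without --stable.
plain=$(printf '%s\n' "$lines" |
  LC_ALL=C sort -t ';' -k 2nr,2 |
  LC_ALL=C sort -t ';' -k 1nb,1)

# Two invocations, second one with --stable.
stable=$(printf '%s\n' "$lines" |
  LC_ALL=C sort -t ';' -k 2nr,2 |
  LC_ALL=C sort -s -t ';' -k 1nb,1)

[ "$plain" = "$wanted" ] || echo "plain pipeline differs"
[ "$stable" = "$wanted" ] && echo "stable pipeline matches"
```

Without "-s", the second "sort" breaks ties on field 1 by comparing
whole lines, which puts the second field back into ascending order;
with "-s", the descending order produced by the first "sort"
survives.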

Re: The size of pipes (Was: sort by multiple columns)

<op.13vd5llna3w0dxdave@hodgins.homeip.net>

https://www.novabbs.com/devel/article-flat.php?id=6132&group=comp.unix.shell#6132

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dwhodg...@nomail.afraid.org (David W. Hodgins)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 16:06:47 -0400
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <op.13vd5llna3w0dxdave@hodgins.homeip.net>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
<kalg1gF3o61U1@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="b43ae20efee7944fb0f321ac6b22f8a7";
logging-data="4152531"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+PBSxFcJS7JFkIEthFOf3Z2CEIdPYByWw="
User-Agent: Opera Mail/12.16 (Linux)
Cancel-Lock: sha1:n5NOKJhnEdFXw5rTjVpRef/bG3o=
 by: David W. Hodgins - Sun, 23 Apr 2023 20:06 UTC

On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart <jpstewart@personalprojects.net> wrote:
> So pipes on Linux aren't very large at all. I don't know how other Unix
> systems compare.

The pipe only has to store a minimum of one buffer of data. If the process
writing data to the pipe is faster than the one reading it, then the writing
process will block while it waits for the reading process to catch up.
Likewise, if the reading process is faster, it will just block while it waits
for the data to be ready.

Having more buffers will speed things up only if the processes run at different
speeds, with the slower one being inconsistent in its speed.

A good example of that is sort somefile | less.

Suppose the user presses page down repeatedly. Each time the faster sort process
has written enough data to fill the buffers, it is blocked from writing until
the page down key is pressed and the less command reads the data for the next
screenful, freeing up some of the buffer space.

Note that when I write that the sort command is faster, I mean that by the time
the first screenful shows up in less, all of the data has been sorted; it just
needs to be written to the output. Until the data is sorted, the less command is
blocked, waiting for input.

Regards, Dave Hodgins
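
The blocking behaviour described above can be observed with a rough sketch like the following. A writer pushes 4 MiB into a pipe whose reader sleeps for two seconds before draining it. The sketch assumes that the pipe buffer is smaller than 4 MiB (as with the Linux default), and that "mktemp" and "date +%s" are available; the names "log", "start" and "elapsed" are made up for the illustration.

```shell
#!/bin/sh
# Writer: pushes 4 MiB into the pipe, then logs how many seconds that took.
# Reader: waits 2 seconds before it starts to drain the pipe.
log=$(mktemp)
start=$(date +%s)
{
    dd if=/dev/zero bs=65536 count=64 2>/dev/null
    echo $(( $(date +%s) - start )) > "$log"
} | {
    sleep 2
    cat > /dev/null
}
elapsed=$(cat "$log")
rm -f -- "$log"
echo "writer needed about ${elapsed}s to hand over its data"
```

Writing 4 MiB of zeros normally takes far less than a second; the extra time is the writer sitting blocked on a full pipe until the reader wakes up.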

Re: The size of pipes

<op.13u7oirpa3w0dxdave@hodgins.homeip.net>

https://www.novabbs.com/devel/article-flat.php?id=6133&group=comp.unix.shell#6133

Path: i2pn2.org!rocksolid2!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dwhodg...@nomail.afraid.org (David W. Hodgins)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes
Date: Sun, 23 Apr 2023 13:46:56 -0400
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <op.13u7oirpa3w0dxdave@hodgins.homeip.net>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net> <u23fpe$2opsm$1@news.xmission.com>
<oh2ghj-veh.ln1@mail.home.palmen-it.de> <u23mmi$3slm9$1@dont-email.me>
<1n4ghj-rti.ln1@mail.home.palmen-it.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="b43ae20efee7944fb0f321ac6b22f8a7";
logging-data="4110169"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19G/qW4+cOX5m1Q+LJFHAYb+pdZnWYRzvU="
User-Agent: Opera Mail/12.16 (Linux)
Cancel-Lock: sha1:y5aFkmDW3/aTPNCwBTHAFtuorxQ=
 by: David W. Hodgins - Sun, 23 Apr 2023 17:46 UTC

On Sun, 23 Apr 2023 12:58:41 -0400, Felix Palmen <felix@palmen-it.de> wrote:
> It seems the idea assuming this was that the whole data to be sorted
> must fit into the pipe buffer. But this isn't the case.

As the last line of the input file(s) may be the first line of the final output,
all of the data must be sorted before anything is written to the pipe.

Either all of the data has to fit in ram, or it has to be sorted in chunks,
with those chunks stored on disk and then merged to
produce the output.

The coreutils package's sort can use temporary (unnamed) files as needed in the
directory specified by the TMPDIR environment variable (/tmp on most systems).
They won't show up in ls as they are unnamed.

It's not clear from the man page if it will always use temporary files or only
if instructed to. So checking the source ...
https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/sort.c;h=8ca7a88c48ec07eccd952b14739e427721466c5d;hb=HEAD

If I'm reading it right, it always uses temporary files doing a sort/merge.
Given that it started in 1988, it's not surprising that it's designed to work
in a low ram environment.

So if you're in a low ram environment, either ensure the $TMPDIR directory is
not in ram, or include the --temporary-directory=DIR option to specify another
directory that is on a disk file system with enough free space.

Regards, Dave Hodgins
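
Both points can be tried out with a small sketch, assuming a "sort" that accepts the "--temporary-directory" long option mentioned above (GNU sort does) and a system that provides "mktemp -d". The directory name "scratch" and the sample numbers are made up for the illustration. With an input this small, sort will most likely not need any temporary file at all; the option only tells it where to put them if it does.

```shell
#!/bin/sh
# Scratch directory for sort's temporary files (removed at the end).
scratch=$(mktemp -d)

# The smallest value arrives last, one second after the others, yet it
# is the first line of the output: sort cannot emit anything earlier.
first=$(
    { printf '%s\n' 30 20; sleep 1; printf '%s\n' 10; } |
    sort -n --temporary-directory="$scratch" |
    head -n 1
)
echo "first output line: $first"
rm -rf -- "$scratch"
```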

Re: The size of pipes (Was: sort by multiple columns)

<u24561$3tggh$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=6134&group=comp.unix.shell#6134

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.unix.shell
Subject: Re: The size of pipes (Was: sort by multiple columns)
Date: Sun, 23 Apr 2023 20:41:37 -0000 (UTC)
Organization: The Pitcher Digital Freehold
Lines: 48
Message-ID: <u24561$3tggh$1@dont-email.me>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<op.13uwd4i8a3w0dxdave@hodgins.homeip.net>
<u23fpe$2opsm$1@news.xmission.com> <kalg1gF3o61U1@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 23 Apr 2023 20:41:37 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a3e7dc56dabd77e308545b066ea7b706";
logging-data="4112913"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oWSzs16gJZgrl5a+1RLMR47oOA6kUB3c="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:OL1R55bQIjEuuzY7xgN06GaCO/4=
 by: Lew Pitcher - Sun, 23 Apr 2023 20:41 UTC

On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart wrote:

> On 4/23/23 10:36, Kenny McCormack wrote:
>> This actually raises an interesting point. Pipes are not infinite in size,
>> and they could, theoretically block if enough is written on the write end
>> without anything being read from the read end. Though the limits are
>> likely very large nowadays on modern systems, I think the original
>> implementation was only 4096 bytes and the standards today (POSIX) may not
>> guarantee anything more than that (haven't checked).
>
> FWIW, the pipe(7) manpage from Debian GNU/Linux has a "Pipe capacity"
> section that says in part:
>
> Before Linux 2.6.11, the capacity of a pipe was the same as the
> system page size (e.g., 4096 bytes on i386). Since Linux
> 2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a
> system with a page size of 4096 bytes). Since Linux 2.6.35,
> the default pipe capacity is 16 pages, but the capacity can be
> queried and set using the fcntl(2) F_GETPIPE_SZ and
> F_SETPIPE_SZ operations. See fcntl(2) for more information.

And fcntl(2) says
F_SETPIPE_SZ (int; since Linux 2.6.35)
Change the capacity of the pipe referred to by fd to be at least
arg bytes. An unprivileged process can adjust the pipe capacity
to any value between the system page size and the limit defined
in /proc/sys/fs/pipe-max-size (see proc(5)).

On my Linux (untuned 4.4.301 kernel), /proc/sys/fs/pipe-max-size
is set to
16:35 $ cat /proc/sys/fs/pipe-max-size
1048576
or 1 MiB

> So pipes on Linux aren't very large at all.

... unless you tune them upward.

> I don't know how other Unix systems compare.

I've seen some studies; Linux pipe buffer sizes seem comparable to
other systems, whose default sizes range from 20K to 64K and
top out at about 1 MB.

HTH
--
Lew Pitcher
"In Skills We Trust"

Re: sort by multiple columns

<83jzy2l4tb.fsf@helmutwaitzmann.news.arcor.de>

https://www.novabbs.com/devel/article-flat.php?id=6135&group=comp.unix.shell#6135

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nn.throt...@xoxy.net (Helmut Waitzmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sun, 23 Apr 2023 22:30:24 +0200
Organization: A noiseless patient Spider
Lines: 38
Sender: Helmut Waitzmann <12f7e638@mail.de>
Message-ID: <83jzy2l4tb.fsf@helmutwaitzmann.news.arcor.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
<slrnu471bk.34b.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
<83o7nel6kp.fsf@helmutwaitzmann.news.arcor.de>
Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>, Helmut Waitzmann <12f7e638@mail.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="d525b036517d0b2582df1f879b18ae91";
logging-data="4162944"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+hdXNq46tQu2GzuxKmOFOkjGBBkFXExeQ="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Cancel-Lock: sha1:gM923Ez37lx2MvfkEdO93c4egfA=
sha1:k12g9vdqols/nBSbOXXtjuasuNc=
Mail-Copies-To: nobody
Mail-Reply-To: Helmut Waitzmann Anti-Spam-Ticket.b.qc3c <oe.throttle@xoxy.net>, Helmut Waitzmann <12f7e638@mail.de>
 by: Helmut Waitzmann - Sun, 23 Apr 2023 20:30 UTC

Helmut Waitzmann <nn.throttle@xoxy.net>:
> Look at these sample lines:
>
>
> 1;0
> 1;1
> 1;2
> 0;0
> 0;1
> 0;2
> 2;0
> 2;1
> 2;2
>
>
> To have this sequence of lines sorted in such a way that the
> first field is sorted in ascending numeric order while the
> second is sorted in descending numeric order,

I'm sorry, that is quite a misleading description.  What I wanted
to say is that the sequence of lines should be sorted to look
like

0;2
0;1
0;0
1;2
1;1
1;0
2;2
2;1
2;0

and to achieve this…

> one could specify the two sort criteria at once:
>
> sort -t ';' -k 1nb,1 -k 2nr,2
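
As a quick check, the sample lines quoted above can be fed to that command directly. This is only a minimal sketch; the variable name "sorted" is made up for the illustration.

```shell
#!/bin/sh
# Field 1 ascending (numeric), field 2 descending (numeric), in one pass.
sorted=$(printf '%s\n' '1;0' '1;1' '1;2' '0;0' '0;1' '0;2' '2;0' '2;1' '2;2' |
    sort -t ';' -k 1nb,1 -k 2nr,2)
printf '%s\n' "$sorted"
```

The second "-k" option is consulted only for lines that compare equal on the first, which is why a single invocation can do what two independent invocations cannot.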

Re: sort by multiple columns

<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>

https://www.novabbs.com/devel/article-flat.php?id=6136&group=comp.unix.shell#6136

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: t-use...@gmx.net (Martin Τrautmann)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sun, 23 Apr 2023 13:28:22 +0200
Organization: slrn user
Lines: 14
Message-ID: <slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
<slrnu471bk.34b.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
Reply-To: traut@gmx.de
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="c9db1d7d6ae6cfefe1f1662fd6f4d0fe";
logging-data="3987318"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX196mTJt7j7pfEmizPPvBtzg"
User-Agent: slrn/1.0.3 (Darwin)
Cancel-Lock: sha1:KaIwpXxse9DesR+/NfKcL1RVkZQ=
X-No-Archive: Yes
 by: Martin Τrautmann - Sun, 23 Apr 2023 11:28 UTC

On Sun, 23 Apr 2023 03:33:47 +0200, Helmut Waitzmann wrote:
>> If I want to pre-sort by 3 first, then sub-sort by column 2,
>> that's fine. But when I pipe one sort to the other, the second
>> sort will destroy the sort before. That's why I had my sort
>> order in reversed order, using a pipe example.
>
> That won't help, either:  A sorting pipe using (a standard)
> "sort" won't solve the problem, because one cannot tell (a
> standard) "sort" to do a sort on the given key option only.  Each
> sort in the pipe will be total (according to its sort criteria)
> of its own.

That was my problem - I expected that a pipe through several sorts would
keep the order. I don't know why it doesn't.

Re: sort by multiple columns

<op.13uwd4i8a3w0dxdave@hodgins.homeip.net>

https://www.novabbs.com/devel/article-flat.php?id=6137&group=comp.unix.shell#6137

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: dwhodg...@nomail.afraid.org (David W. Hodgins)
Newsgroups: comp.unix.shell
Subject: Re: sort by multiple columns
Date: Sun, 23 Apr 2023 09:43:06 -0400
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>
References: <slrnu3v5vd.m2.t-usenet@ID-685.user.individual.de>
<83fs8sn1jc.fsf@helmutwaitzmann.news.arcor.de>
<slrnu471bk.34b.t-usenet@ID-685.user.individual.de>
<834jp7mlfo.fsf@helmutwaitzmann.news.arcor.de>
<slrnu4a5im.34b.t-usenet@ID-685.user.individual.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="b43ae20efee7944fb0f321ac6b22f8a7";
logging-data="4027490"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+4+K81wR56Hl07apd0klrKJZ5ZNqoG5GY="
User-Agent: Opera Mail/12.16 (Linux)
Cancel-Lock: sha1:hy87w98+SQGz0rmSNnk2zoD6AoM=
 by: David W. Hodgins - Sun, 23 Apr 2023 13:43 UTC

On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
> That was my problem - I expected that a pipe through several sorts would
> keep the order. I don't know why it doesn't.

It may be easier to understand if you use temporary files instead of pipes.

Sort the input file by column 4, numerically, creating a first temporary file.
Sort the first temporary file by column 2, creating a second temporary file.
Sort the second temporary file by column 3, creating the output.

The last sort doesn't know that the prior two sorts have been done. It just
looks at the file it's given and sorts it by column 3.

Using a pipe just takes the output of the first and second sort and uses it
directly as input for the next sort. All the pipe does is eliminate the
need for a temporary file.

Keep in mind: when sorting a file, the last line in the input may end up becoming
the first line in the output. The sort cannot write anything to the pipe or
output file until it has sorted the entire input. With a pipe, the temporary
file is in ram rather than being a named file on disk.

Regards, Dave Hodgins
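
The equivalence of the pipe and the temporary file can be checked with a short sketch. The four sample lines and the names "input", "via_pipe", "via_file" and "tmp" are made up for the illustration, and the C locale is forced so that ties are broken bytewise.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
tmp=$(mktemp)

# Four sample lines: a letter in column 1, a number in column 2.
input() { printf '%s\n' 'b 2' 'a 2' 'b 1' 'a 1'; }

# Variant 1: the second sort reads the first sort's output from a pipe.
via_pipe=$(input | sort -k 2nr,2 | sort -k 1,1)

# Variant 2: the same two sorts, connected by a temporary file.
input | sort -k 2nr,2 > "$tmp"
via_file=$(sort -k 1,1 -- "$tmp")
rm -f -- "$tmp"

if [ "$via_pipe" = "$via_file" ]; then
    echo "pipe and temporary file give the same result:"
fi
printf '%s\n' "$via_pipe"
```

In both variants the descending order of column 2 produced by the first sort is gone afterwards, because the second sort orders lines with equal column 1 by comparing the whole line.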
