Re: Pipe cleanup of text - help needed

Path: i2pn2.org!i2pn.org!aioe.org!F7FIqN6dkowTZ1CLxZIWTQ.user.46.165.242.75.POSTED!not-for-mail
From: Grant.Ta...@f1.n221.z2.fidonet.fi (Grant Taylor)
Newsgroups: uk.comp.os.linux
Subject: Re: Pipe cleanup of text - help needed
Date: Sun, 01 Aug 2021 18:57:40 +0200
Organization: rbb soupgate
Message-ID: <2596511257@f1.n221.z2.fidonet.fi>
References: <772741015@f0.n0.z0.fidonet.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="19565"; posting-host="F7FIqN6dkowTZ1CLxZIWTQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-MailConverter: SoupGate-OS/2 v1.20
X-Notice: Filtered by postfilter v. 0.9.2
X-Comment-To: All
 by: Grant Taylor - Sun, 1 Aug 2021 16:57 UTC

On 8/1/21 11:02 AM, Java Jive wrote:
> I want to clean this up so that only the first and last of each
> section are output, separated by a single line containing just '...'.

It's not just the /first/ and /last/ line of a group. There also seems to
be a minimum number of lines involved. E.g. "Genealogy Of Job" is
only two lines, but you aren't inserting "..." between the first and
last member of the group.

> Can anyone suggest a way of doing this by piping the output through
> awk or sed on the fly, rather than having to write a program to
> post-process the index?

I don't see a way to do this in the 90 seconds that I've looked at it.
However I do see a thread that might be worth pulling at. Maybe someone
else, perhaps the OP, will see the next step.

I would be inclined to drop the last item (term?) from the base file name,
with the intention of turning this:

Unknown Person's Notebook - End 0
Unknown Person's Notebook - End 1
Unknown Person's Notebook - End 2
Unknown Person's Notebook - End 3
Unknown Person's Notebook - End 4
Unknown Person's Notebook - End 5

Into this:

Unknown Person's Notebook - End
Unknown Person's Notebook - End
Unknown Person's Notebook - End
Unknown Person's Notebook - End
Unknown Person's Notebook - End
Unknown Person's Notebook - End

This seems like something you could run through uniq (-c) to get a
start at finding "duplicate" / incremental parts ~> bases of file names.
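
A rough sketch of that idea, in case it helps. The sample listing and the
strip_last_term name are made up for illustration, and it assumes the
"last term" is whatever follows the final run of whitespace on the line:

```shell
# Hypothetical helper: drop the final whitespace-separated term from each
# line. Lines consisting of a single term are left untouched.
strip_last_term() {
    sed 's/[[:space:]][[:space:]]*[^[:space:]][^[:space:]]*$//'
}

# Made-up sample listing, already sorted.
printf '%s\n' \
    "Genealogy Of Job 1" \
    "Genealogy Of Job 2" \
    "Unknown Person's Notebook - End 0" \
    "Unknown Person's Notebook - End 1" \
    "Unknown Person's Notebook - End 2" \
    "Unknown Person's Notebook - End 3" \
    "Unknown Person's Notebook - End 4" \
    "Unknown Person's Notebook - End 5" |
    strip_last_term | uniq -c
# Prints each base name once, prefixed by its count (2 and 6 here).
```

The counts only tell you which groups are long enough to abbreviate; they
don't do the abbreviating themselves.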

You could probably use that as information to drive a decision to
truncate the output or not.

I feel like this may need multiple passes through the input; one to
identify when things need to be abbreviated / truncated and another as
the source of the data to be abbreviated / truncated or not. This means
that it's not exactly conducive to a typical STDIN -> STDOUT style filter.
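
For what it's worth, here is how the two-pass idea might look in awk,
reading the same file twice. Treat it as a sketch: collapse_two_pass is a
made-up name, the minimum group size of 3 is a guess at what the OP wants,
and it assumes the listing is in a file (or a temp file) rather than a pipe:

```shell
# collapse_two_pass FILE [MIN]
# Pass 1 counts lines per base name. Pass 2 prints the lines, abbreviating
# groups of at least MIN lines (MIN defaults to 3, an assumed threshold).
collapse_two_pass() {
    awk -v min="${2:-3}" '
        # Base name: the line with its last whitespace-separated term removed.
        function base(s) { sub(/[ \t]+[^ \t]+$/, "", s); return s }

        NR == FNR { count[base($0)]++; next }    # first pass: count only

        {
            b = base($0)
            if (count[b] < min) { print; next }  # short group: keep every line
            seen[b]++
            if (seen[b] == 1) {
                print                            # first member of the group
            } else if (seen[b] == count[b]) {
                print "..."                      # stands in for the middle
                print                            # last member of the group
            }
        }
    ' "$1" "$1"
}

# Made-up sample listing written to a temp file so it can be read twice.
tmp=$(mktemp)
printf '%s\n' \
    "Genealogy Of Job 1" \
    "Genealogy Of Job 2" \
    "Unknown Person's Notebook - End 0" \
    "Unknown Person's Notebook - End 1" \
    "Unknown Person's Notebook - End 2" \
    "Unknown Person's Notebook - End 3" \
    "Unknown Person's Notebook - End 4" \
    "Unknown Person's Notebook - End 5" > "$tmp"
collapse_two_pass "$tmp"
rm -f "$tmp"
```

With that sample the two "Genealogy Of Job" lines come through untouched,
and the notebook group shrinks to its first line, "...", and its last line.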

The next thing to think about is trying to leverage sed's hold space and
doing a comparison of the current line to the hold space. -- I don't
do this often enough to know how to do this. But, this probably does
have the advantage of being able to do this in a single pass.
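
I can't offer the sed version, but the same single-pass idea can be
sketched in awk, with awk variables standing in for sed's hold space. To be
clear, this is awk and not sed. The collapse_one_pass name and the default
minimum of 3 lines are assumptions, and it relies on sorted input:

```shell
# collapse_one_pass [MIN]
# Single-pass filter: buffer the current group, and when the base name
# changes, print it whole or abbreviated (MIN defaults to 3, an assumption).
collapse_one_pass() {
    awk -v min="${1:-3}" '
        # Base name: the line with its last whitespace-separated term removed.
        function base(s) { sub(/[ \t]+[^ \t]+$/, "", s); return s }

        # Print the buffered group, abbreviated if it is long enough.
        function emit(   i) {
            if (n >= min) {
                print line[1]; print "..."; print line[n]
            } else {
                for (i = 1; i <= n; i++) print line[i]
            }
            n = 0
        }

        {
            b = base($0)
            if (NR > 1 && b != prev) emit()   # base changed: group is complete
            line[++n] = $0
            prev = b
        }

        END { emit() }                        # do not forget the final group
    '
}

# Made-up sample listing, piped straight through.
printf '%s\n' \
    "Genealogy Of Job 1" \
    "Genealogy Of Job 2" \
    "Unknown Person's Notebook - End 0" \
    "Unknown Person's Notebook - End 1" \
    "Unknown Person's Notebook - End 2" \
    "Unknown Person's Notebook - End 3" |
    collapse_one_pass
```

This works as a plain STDIN -> STDOUT filter, at the cost of holding one
whole group in memory at a time.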

Seeing as how this plays on comparing adjacent lines of text, it will
almost certainly be predicated on the list being sorted.

However, you can't blindly strip off the file extension (and last part
of the name), lest you combine file-1.png, file-2.jpg, and file-3.gif.

You really seem to be talking about something that can dynamically allow
for one element in a series of file (base) names to differ and
conditionally truncate them. But you don't want to truncate
file-1.{png,jpg,gif} where the base name is the same and the extension
is the only part that differs.
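
One possible way to build a grouping key that respects the extension,
sketched in awk. The group_key name is made up, and it assumes the varying
element is a run of digits at the end of the base name, which may not
match the OP's real file names:

```shell
# Hypothetical helper: turn each name into a grouping key by replacing the
# digits at the end of the base name with "#", keeping the extension.
group_key() {
    awk '{
        ext = ""
        stem = $0
        if (match(stem, /\.[^. ]+$/)) {       # last dot-suffix, if any
            ext = substr(stem, RSTART)
            stem = substr(stem, 1, RSTART - 1)
        }
        sub(/[0-9]+$/, "#", stem)             # counter at the end of the stem
        print stem ext
    }'
}

# Made-up sample names.
printf '%s\n' file-1.png file-2.png file-3.jpg | group_key
# prints:
#   file-#.png
#   file-#.png
#   file-#.jpg
```

Because the extension stays in the key, file-1.png and file-2.jpg land in
different groups, and file-1.{png,jpg,gif} give three distinct keys, so
none of them would be truncated.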

This seems like a non-trivial problem for simply parsing text.

--
Grant. . . .
unix || die
