Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

What causes the mysterious death of everyone?


aus+uk / uk.comp.os.linux / Re: while loop taking input from file via iconv

SubjectAuthor
* while loop taking input from file via iconvJava Jive
+* Re: while loop taking input from file via iconvJ.O. Aho
|`* Re: while loop taking input from file via iconvMike Easter
| `* Re: while loop taking input from file via iconvJ.O. Aho
|  `* Re: while loop taking input from file via iconvMike Easter
|   +- Re: while loop taking input from file via iconvMike Easter
|   +- Re: while loop taking input from file via iconvAragorn
|   `* Re: while loop taking input from file via iconvMartin Gregorie
|    `* Re: while loop taking input from file via iconvMike Easter
|     `* Re: while loop taking input from file via iconvMartin Gregorie
|      +- Re: while loop taking input from file via iconvMike Easter
|      +* Re: while loop taking input from file via iconvWilliam Unruh
|      |+- Re: while loop taking input from file via iconvMike Easter
|      |+* Re: while loop taking input from file via iconvMartin Gregorie
|      ||+* Re: while loop taking input from file via iconvRichard Kettlewell
|      |||`* Re: while loop taking input from file via iconvMartin Gregorie
|      ||| +- Re: while loop taking input from file via iconvRichard Kettlewell
|      ||| +- Re: while loop taking input from file via iconvStéphane CARPENTIER
|      ||| `- Re: while loop taking input from file via iconvJasen Betts
|      ||`* Re: while loop taking input from file via iconvPaul
|      || `* Re: while loop taking input from file via iconvChris Elvidge
|      ||  `- Re: while loop taking input from file via iconvPaul
|      |`- Re: while loop taking input from file via iconvAragorn
|      `* Re: while loop taking input from file via iconvSpiros Bousbouras
|       `- Re: while loop taking input from file via iconvMartin Gregorie
+* Re: while loop taking input from file via iconvJasen Betts
|+- Re: while loop taking input from file via iconvPaul
|`* Re: while loop taking input from file via iconvSpiros Bousbouras
| `* Re: while loop taking input from file via iconvJasen Betts
|  `- Re: while loop taking input from file via iconvJava Jive
+* Re: while loop taking input from file via iconvSpiros Bousbouras
|+- Re: while loop taking input from file via iconvSpiros Bousbouras
|`* Re: while loop taking input from file via iconvJava Jive
| +* Re: while loop taking input from file via iconvMartin Gregorie
| |`- Re: while loop taking input from file via iconvJava Jive
| `- Re: while loop taking input from file via iconvStéphane CARPENTIER
`* Character Encoding (Was: while loop taking input from file via iconvJava Jive
 +* Re: Character Encoding (Was: while loop taking input from file via iconv )Spiros Bousbouras
 |`* Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 | `- Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 +* Re: Character Encoding (Was: while loop taking input from file viaPaul
 |`* Re: Character Encoding (Was: while loop taking input from file viaPaul
 | `- Re: Character Encoding (Was: while loop taking input from file viaJ.O. Aho
 +* Re: Character Encoding (Was: while loop taking input from file viajak
 |`* Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 | +* Re: Character Encoding (Was: while loop taking input from file via iconv )Spiros Bousbouras
 | |`* Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 | | `* Re: Character Encoding (Was: while loop taking input from file viaMartin Gregorie
 | |  `* Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 | |   `* Re: Character Encoding (Was: while loop taking input from file viaMartin Gregorie
 | |    `- Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 | `- Re: Character Encoding (Was: while loop taking input from file viajak
 +* Re: Character Encoding (Was: while loop taking input from file viaAndy Burns
 |`* Re: Character Encoding (Was: while loop taking input from file viaPaul
 | `* Re: Character Encoding (Was: while loop taking input from file viaJava Jive
 |  +- Re: Character Encoding (Was: while loop taking input from file viajak
 |  +- Re: Character Encoding (Was: while loop taking input from file viaAndy Burns
 |  +- Re: Character Encoding (Was: while loop taking input from file viajak
 |  `- Re: Character Encoding (Was: while loop taking input from file viaJasen Betts
 `* Re: Character Encoding (Was: while loop taking input from file viaJava Jive
  `- Re: Character Encoding (Was: while loop taking input from file viajak

Pages:123
Re: while loop taking input from file via iconv

<lRQnLHehKOavRC3s7@bongo-ra.co>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=409&group=uk.comp.os.linux#409

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!OC6U9UkZn9R/lnxSpxG5YA.user.46.165.242.91.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 05:23:18 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <lRQnLHehKOavRC3s7@bongo-ra.co>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net> <ino8oeFueohU1@mid.individual.net>
<inpcg5F6m4oU1@mid.individual.net> <inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="9884"; posting-host="OC6U9UkZn9R/lnxSpxG5YA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-Notice: Filtered by postfilter v. 0.9.2
X-Server-Commands: nowebcancel
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Sun, 15 Aug 2021 05:23 UTC

On Sat, 14 Aug 2021 20:53:08 -0000 (UTC)
Martin Gregorie <martin@mydomain.invalid> wrote:
> On Sat, 14 Aug 2021 12:57:09 -0700, Mike Easter wrote:
> > The commands were designed to be very powerful; the greater the power
> > the greater the 'responsibility'. Sometimes when one sees a man result
> > he feels like he is drowning.
> >
> The other thing that needs to be taught is that each command is designed
> to do one thing and to do it well, which is why there are so *many*
> commands. hence the need to understand 'apropos', or better,
>
> apropos 'action name' | less
>
> as an aid to finding the command they want and than
>
> man commandname
>
> to see how to use it.
>
> > Of course, being thrown into the water and needing to swim back out is
> > one way to learn to swim :-)
> >
> .. and sadly there isn't a lot else. If the student is capable of using
> command lines on another system, then something like 'Linux in a
> Nutshell' may be helpful, but that's about the only decent book I know
> unless there's a 'Linux for Dummies' available and they're not put off by
> the title.

If I go on amazon and search for "Linux command line" I see many books and
they tend to have high rating average. Either you don't consider them decent
or you haven't performed any such search in a long time even just for
curiosity.

--
vlaho.ninja/prog

Re: while loop taking input from file via iconv

<87fsvbma46.fsf@LkoBDZeT.terraraq.uk>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=410&group=uk.comp.os.linux#410

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!nntp.terraraq.uk!.POSTED.nntp.terraraq.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 08:37:45 +0100
Organization: terraraq NNTP server
Message-ID: <87fsvbma46.fsf@LkoBDZeT.terraraq.uk>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net>
<ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net>
<inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me>
<sf9heo$a6v$1@dont-email.me> <sf9l1q$q7c$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: mantic.terraraq.uk; posting-host="nntp.terraraq.uk:2a00:1098:0:86:1000:3f:0:2";
logging-data="4938"; mail-complaints-to="usenet@mantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:Uxc9GaKwZ5rYA2J9k0/k+pS/REE=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Sun, 15 Aug 2021 07:37 UTC

Martin Gregorie <martin@mydomain.invalid> writes:
> William Unruh wrote:
>
>> Says someone, apparently, who has never looked at the command "find", or
>> may other commands. billions of different option combinations, some of
>> which work, others of which do not.
>
> Quite right: I don't use it because 'locate' is *much* faster and easier
> to use, especially if updatedb is run overnight by a cronjob

Bad comparison, since it doesn’t do the same thing.

--
https://www.greenend.org.uk/rjk/

Re: while loop taking input from file via iconv

<sfao26$2mr$1@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=412&group=uk.comp.os.linux#412

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 05:51:04 -0400
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <sfao26$2mr$1@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net> <ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net> <inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me> <inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me> <sf9heo$a6v$1@dont-email.me> <sf9l1q$q7c$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 15 Aug 2021 09:51:03 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="b49be95aa323b476bf5cc879910381ad";
logging-data="2779"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19cIMRG45zZ/QRdCEOVnHJp399KD9/1kF0="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:FHqloL/yxch/gTRqemC5QJ5k/zA=
In-Reply-To: <sf9l1q$q7c$2@dont-email.me>
 by: Paul - Sun, 15 Aug 2021 09:51 UTC

Martin Gregorie wrote:
> On Sat, 14 Aug 2021 22:52:08 +0000, William Unruh wrote:
>
>> Says someone, apparently, who has never looked at the command "find", or
>> may other commands. billions of different option combinations, some of
>> which work, others of which do not.
>>
> Quite right: I don't use it because 'locate' is *much* faster and easier
> to use, especially if updatedb is run overnight by a cronjob
>
> Similarly, 'apropos' is nearly as fast 'locate' since it only has to scan
> the contents of /usr/share/man/* - and in addition, because its scanning
> the first line of each manpage, it also matches words of phrases
> describing what a program does, so often searching on a word or phrase
> describing what a program does means you a suitable program without
> knowing its name:
>
> $ apropos 'free space'
> e2freefrag (8) - report free space fragmentation information
> xfs_spaceman (8) - show free space information about an XFS filesystem
>
> $ apropos 'space used'
> space used: nothing appropriate.
> $ apropos 'space usage'
> df (1) - report file system disk space usage
> du (1) - estimate file space usage
> du (1p) - estimate file space usage
>
> ... and its no use complaining about how well or badly a manpage is
> written: often the only way to fit would be to submit a manpage patch.
>
> Yes, I know some Linux manpages are pretty bad. However, others (those
> for bash, sort and awk to name but a few) are excellent and most are
> usable.

This is why you keep a "notes" file.

find /media/somedisk -type d -exec ls -al -1 -d {} + > directories.txt
find /media/somedisk -type f -exec ls -al -1 {} + > filelist.txt

Any time you put some effort into crafting one, you
record it for the future.

./ffmpeg -hwaccel nvdec -i "fedora.mkv" -y -acodec aac -vcodec h264_nvenc -crf 23 "output2.mp4"

16.3x speed, 488FPS

Part of the fun is making them cryptic, so you can't understand them later.

Paul

Character Encoding (Was: while loop taking input from file via iconv )

<sfavf7$r6h$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=417&group=uk.comp.os.linux#417

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!8YXKAhSo8fMBpI0CH1QWtw.user.46.165.242.75.POSTED!not-for-mail
From: jav...@evij.com.invalid (Java Jive)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Character Encoding (Was: while loop taking input from file via iconv
)
Date: Sun, 15 Aug 2021 12:57:24 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sfavf7$r6h$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="27857"; posting-host="8YXKAhSo8fMBpI0CH1QWtw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101
Thunderbird/68.4.2
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: Java Jive - Sun, 15 Aug 2021 11:57 UTC

On 13/08/2021 20:28, Java Jive wrote:
> I have the following lines in a shell script ...
>
> while [ -n "${LINE}" ]
>     do
>         if [ -n "${LINE} ]
>             then
>                 # Do processing
>         fi
>     done < "${DATA}"
>
> .... and this works fine for all but two lines in the data file, which
> contain accented characters. A file erroneously named with an e acute
> needs to be renamed to have an e grave, and a filename containing an e
> umlaut needs to be moved to a new location and given a new name.

Uggghhh! The reason for this disgust will become clear shortly!

This is a follow up question about character encodings ...

Previously I have released to my family two versions of the same archive
of family documents going back to the reign of Queen Anne, some items
possibly a little earlier. These documents were scanned (1o for
original scan) and then put through four possible stages of post-processing:
2n Contrast 'normalised' using pnnorm
3t Textcleaned
4nt n followed by 3
5tn t followed by n

For each document, the best result was copied into the main archive,
while the above preprocessing stages were left in an '_all'
sub-directory structure, with five subdirectories named as above, each
of which having beneath it a directory tree mirroring the main archive.

The main version of the archive, which most family members seem to have
downloaded, only included the main archive and didn't include the _all
subdirectory with all the pre-processing results, the full version
included this directory. IIRC, the former was compressed by WinZip from
the archive as it existed on a Windows PC at the time, but WinZip threw
a wobbly over the size of the full archive, so for that I had to use 7zip.

Now the crunch, when I unzip these on a Linux machine, I see different
bastardisations of accented characters. So, for example where the full
7zip archive when extracted shows an e acute correctly in both a console
and a file manager listing ...
"Chat Botté, Le" [e is correctly acute]
.... (if you're wondering, a French children's picture book version of
apparently 'Puss In Boots'), while with the WinZip main archive a
console listing shows a very odd character sequence instead of the e
acute ...
"Chat Bott'$'\302\202'', Le"
.... and a file manager listing has a graphic character resembling a 2x2
matrix, concerning which note that while \302 octal = \xC2 hex, and
\202 octal = \x82 hex, only the second of these and not the first
appears in the symbol:
|00|
|82|

My problem is that I can't find a search term to trap this strange
character to correct it, for example the following, and a few similar
that I've tried, don't work because they don't find the directory:
mv "Chat Bott'$'\302\202'', Le" "Chat Botté, Le"
mv Chat\ Bott\'$\'\\302\\202\'\',\ Le "Chat Botté, Le"

I could use a glob wildcard character such as '?', but currently all the
filenames are within quotes, where globbing doesn't seem to work, and it
would be a hell of a business removing the quotes, because many names in
the archive use many characters that would each need to be anticipated
and escaped for in an unquoted filename, such as spaces, ampersands,
brackets, etc.

Can anyone suggest a sequence that will find the file, when put inside
quotes as the filename in the controlling data file mentioned previously
in the thread, so that it can just be treated like all the other lines?
As someone here suggested the data file is now stored as UTF-8 rather
than ANSI as it was formerly, and some example lines are given below in
a form for easier readability in a ng - in reality the fields are tab
separated but here are separated by double spacing and have been further
abbreviated to keep them from wrapping; leading symbols such as '+' and
'=' have special meanings for the program doing the work; and, yes, the
commands are basically DOS commands which for Linux are translated to
their bash equivalents:

=ATTRIB -R "./F H/Close/Sts Mary & John Churchyard Monuments.pdf"
=RD "./F H /_all/1o/Blessig & Heyder"
REN "./Chat Bott'$'\302\202'', Le" "Chat Botté, Le"
MOVE "./Photo - D & M Close.png" "./Photos/D & M Close.png"
[etc]

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

Re: while loop taking input from file via iconv

<sfb1i6$r12$1@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=419&group=uk.comp.os.linux#419

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mar...@mydomain.invalid (Martin Gregorie)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 12:33:10 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <sfb1i6$r12$1@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net>
<ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net>
<inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me>
<sf9heo$a6v$1@dont-email.me> <sf9l1q$q7c$2@dont-email.me>
<87fsvbma46.fsf@LkoBDZeT.terraraq.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 15 Aug 2021 12:33:10 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bf18e8faaee48edb5abd87ea303ca2ea";
logging-data="27682"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/wiQOu1ffw8M0iMG6l7CE+t0SsCuhkCeo="
User-Agent: Pan/0.146 (Hic habitat felicitas; 8107378
git@gitlab.gnome.org:GNOME/pan.git)
Cancel-Lock: sha1:aErdjcZZfzGzwvQmV9ceBNnNGCE=
 by: Martin Gregorie - Sun, 15 Aug 2021 12:33 UTC

On Sun, 15 Aug 2021 08:37:45 +0100, Richard Kettlewell wrote:

> Martin Gregorie <martin@mydomain.invalid> writes:
>> William Unruh wrote:
>>
>>> Says someone, apparently, who has never looked at the command "find",
>>> or may other commands. billions of different option combinations, some
>>> of which work, others of which do not.
>>
>> Quite right: I don't use it because 'locate' is *much* faster and
>> easier to use, especially if updatedb is run overnight by a cronjob
>
> Bad comparison, since it doesn’t do the same thing.

Both look for filenames, but 'find' can be restricted to a directory
structure - thats about the difference I can see in a quick manpage scan.

--
--
Martin | martin at
Gregorie | gregorie dot org

Re: while loop taking input from file via iconv

<sfb2dc$r12$2@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=420&group=uk.comp.os.linux#420

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mar...@mydomain.invalid (Martin Gregorie)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 12:47:40 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <sfb2dc$r12$2@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net>
<ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net>
<inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me>
<lRQnLHehKOavRC3s7@bongo-ra.co>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 15 Aug 2021 12:47:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bf18e8faaee48edb5abd87ea303ca2ea";
logging-data="27682"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX197pI8Tu9qfGSOkfkJBGqWLgaLs9vXp6ew="
User-Agent: Pan/0.146 (Hic habitat felicitas; 8107378
git@gitlab.gnome.org:GNOME/pan.git)
Cancel-Lock: sha1:ba0eCIlcjNbvt3w1G7/+SAs2JVg=
 by: Martin Gregorie - Sun, 15 Aug 2021 12:47 UTC

On Sun, 15 Aug 2021 05:23:18 +0000, Spiros Bousbouras wrote:

> On Sat, 14 Aug 2021 20:53:08 -0000 (UTC)
> Martin Gregorie <martin@mydomain.invalid> wrote:
>> On Sat, 14 Aug 2021 12:57:09 -0700, Mike Easter wrote:
>> > The commands were designed to be very powerful; the greater the power
>> > the greater the 'responsibility'. Sometimes when one sees a man
>> > result he feels like he is drowning.
>> >
>> The other thing that needs to be taught is that each command is
>> designed to do one thing and to do it well, which is why there are so
>> *many* commands. hence the need to understand 'apropos', or better,
>>
>> apropos 'action name' | less
>>
>> as an aid to finding the command they want and than
>>
>> man commandname
>>
>> to see how to use it.
>>
>> > Of course, being thrown into the water and needing to swim back out
>> > is one way to learn to swim :-)
>> >
>> .. and sadly there isn't a lot else. If the student is capable of using
>> command lines on another system, then something like 'Linux in a
>> Nutshell' may be helpful, but that's about the only decent book I know
>> unless there's a 'Linux for Dummies' available and they're not put off
>> by the title.
>
> If I go on amazon and search for "Linux command line" I see many books
> and they tend to have high rating average. Either you don't consider
> them decent or you haven't performed any such search in a long time even
> just for curiosity.

Back in 2003 when I was moving my home systems from OS9/68000 to Linux I
got a copy of 'UNIX in Nutshell', which contained most of what I needed'
except for sysadmin stuff - and I couldn't find anything useful for
system administration and hor the kernel is organised until I settled on
'Debian Reference':

https://www.debian.org/doc/manuals/debian-reference/

Since then I've added Fedora documentation:
https://fedoraproject.org/wiki/Category:Documentation?rd=Docs

and stuff on Systemd

https://www.freedesktop.org/software/systemd/man/systemd.html#

--
--
Martin | martin at
Gregorie | gregorie dot org

Re: while loop taking input from file via iconv

<87a6linab6.fsf@LkoBDZeT.terraraq.uk>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=421&group=uk.comp.os.linux#421

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.nntp.terraraq.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 13:48:13 +0100
Organization: terraraq NNTP server
Message-ID: <87a6linab6.fsf@LkoBDZeT.terraraq.uk>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net>
<ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net>
<inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me>
<sf9heo$a6v$1@dont-email.me> <sf9l1q$q7c$2@dont-email.me>
<87fsvbma46.fsf@LkoBDZeT.terraraq.uk> <sfb1i6$r12$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: mantic.terraraq.uk; posting-host="nntp.terraraq.uk:2a00:1098:0:86:1000:3f:0:2";
logging-data="9377"; mail-complaints-to="usenet@mantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:CmMesnaFWNkHebfAnObNkcD9aVA=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Sun, 15 Aug 2021 12:48 UTC

Martin Gregorie <martin@mydomain.invalid> writes:
> Richard Kettlewell wrote:
>> Martin Gregorie <martin@mydomain.invalid> writes:
>>> William Unruh wrote:

>>>> Says someone, apparently, who has never looked at the command
>>>> "find", or may other commands. billions of different option
>>>> combinations, some of which work, others of which do not.
>>>
>>> Quite right: I don't use it because 'locate' is *much* faster and
>>> easier to use, especially if updatedb is run overnight by a cronjob
>>
>> Bad comparison, since it doesn’t do the same thing.
>
> Both look for filenames,

No. I mean, that’s one thing find can do, but it’s nowhere near all of
it.

> but 'find' can be restricted to a directory structure - thats about
> the difference I can see in a quick manpage scan.

find has numerous ways of selecting which files to act on and what to do
with them. It also differs more fundamentally in that it traverses the
real directory structure, rather than an incomplete snapshot made at
some point in the past.

--
https://www.greenend.org.uk/rjk/

Re: Character Encoding (Was: while loop taking input from file via iconv )

<0NDDCyPcV8G14Fiq6@bongo-ra.co>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=422&group=uk.comp.os.linux#422

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!OC6U9UkZn9R/lnxSpxG5YA.user.46.165.242.91.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via iconv )
Date: Sun, 15 Aug 2021 12:58:39 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <0NDDCyPcV8G14Fiq6@bongo-ra.co>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="11863"; posting-host="OC6U9UkZn9R/lnxSpxG5YA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
X-Notice: Filtered by postfilter v. 0.9.2
 by: Spiros Bousbouras - Sun, 15 Aug 2021 12:58 UTC

On Sun, 15 Aug 2021 12:57:24 +0100
Java Jive <java@evij.com.invalid> wrote:
> Now the crunch, when I unzip these on a Linux machine, I see different
> bastardisations of accented characters. So, for example where the full
> 7zip archive when extracted shows an e acute correctly in both a console
> and a file manager listing ...
> "Chat Botté, Le" [e is correctly acute]
> ... (if you're wondering, a French children's picture book version of
> apparently 'Puss In Boots'), while with the WinZip main archive a
> console listing shows a very odd character sequence instead of the e
> acute ...
> "Chat Bott'$'\302\202'', Le"
> ... and a file manager listing has a graphic character resembling a 2x2
> matrix, concerning which note that while \302 octal = \xC2 hex, and
> \202 octal = \x82 hex, only the second of these and not the first
> appears in the symbol:
> |00|
> |82|

You aren't going to get anywhere with using high level tools for this. You
need to go low level and see the values of the actual bytes in the filenames.
So for example something like

ls *Chat* | od -A n -t x1

which will show the bytes in hexadecimal.

> My problem is that I can't find a search term to trap this strange
> character to correct it, for example the following, and a few similar
> that I've tried, don't work because they don't find the directory:
> mv "Chat Bott'$'\302\202'', Le" "Chat Botté, Le"
> mv Chat\ Bott\'$\'\\302\\202\'\',\ Le "Chat Botté, Le"

What directory ? Your post says that some files have strange names. Do also
some directories have strange names ? In any case , the commands above do not
show a directory separator.

--
Who is the poster boy for posters ?

Re: while loop taking input from file via iconv

<20210815151058.4d860435@nx-74205>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=424&group=uk.comp.os.linux#424

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: thoron...@telenet.be (Aragorn)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 15:10:58 +0200
Organization: A noiseless patient Strider
Lines: 38
Message-ID: <20210815151058.4d860435@nx-74205>
References: <sf6h49$15o3$1@gioia.aioe.org>
<ino78bFu5pcU1@mid.individual.net>
<ino8oeFueohU1@mid.individual.net>
<inpcg5F6m4oU1@mid.individual.net>
<inq7evFc25gU1@mid.individual.net>
<sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net>
<sf9afk$q7c$1@dont-email.me>
<sf9heo$a6v$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="63a1de43b084357f66c97d51f5bb615a";
logging-data="17262"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ZJUn6/MP9/Wb/ZU/LmRR4"
Cancel-Lock: sha1:eEf65UuallTlk67w4DyhPeDl0GQ=
X-Newsreader: Claws Mail 4.0.0 (GTK+ 3.24.30; x86_64-pc-linux-gnu)
 by: Aragorn - Sun, 15 Aug 2021 13:10 UTC

On 14.08.2021 at 22:52, William Unruh scribbled:

> On 2021-08-14, Martin Gregorie <martin@mydomain.invalid> wrote:
>
> > apropos 'action name' | less
> >
> > as an aid to finding the command they want and than
> >
> > man commandname
>
> Unfortunately man is really good for reminding someone what various
> things mean, but is pretty bad at teaching anyone how to use the
> command. A much much larger section for each man page with explicity
> examples would go a long way to making "man" useful for newbies (and
> oldies as well).

True, but this is what the GNU "info" command was (supposedly) intended
for. Well, that, and a more markup-style browsing experience, with
embedded links to other locally installed pages.

But then again, not all distributions install "info", and conversely,
I have already come across a distribution -- I seem to remember that it
was an earlier release of either PCLinuxOS or Mageia, but I'm not sure
anymore -- that installed the "info" pages by default but not the "man"
pages. And that was odd, because every UNIX system should at the very
least have "man" installed.

On my Manjaro system here, I have both "man" and "info" installed. The
latter did not come installed by default, but I either way find myself
looking far more at "man" pages than at "info" pages, as the "info"
system is actually quite bulky and a bit overkill -- sort of like the
difference between the simplicity of how the original GRUB worked and
the complexity of how GRUB2 works.

--
With respect,
= Aragorn =

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfba62$1df1$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=425&group=uk.comp.os.linux#425

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!8YXKAhSo8fMBpI0CH1QWtw.user.46.165.242.75.POSTED!not-for-mail
From: jav...@evij.com.invalid (Java Jive)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Sun, 15 Aug 2021 16:00:15 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sfba62$1df1$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<0NDDCyPcV8G14Fiq6@bongo-ra.co>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="46561"; posting-host="8YXKAhSo8fMBpI0CH1QWtw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101
Thunderbird/68.4.2
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: Java Jive - Sun, 15 Aug 2021 15:00 UTC

On 15/08/2021 13:58, Spiros Bousbouras wrote:
>
> On Sun, 15 Aug 2021 12:57:24 +0100
> Java Jive <java@evij.com.invalid> wrote:
>>
>> Now the crunch, when I unzip these on a Linux machine, I see different
>> bastardisations of accented characters. So, for example where the full
>> 7zip archive when extracted shows an e acute correctly in both a console
>> and a file manager listing ...
>> "Chat Botté, Le" [e is correctly acute]
>> ... (if you're wondering, a French children's picture book version of
>> apparently 'Puss In Boots'), while with the WinZip main archive a
>> console listing shows a very odd character sequence instead of the e
>> acute ...
>> "Chat Bott'$'\302\202'', Le"
>> ... and a file manager listing has a graphic character resembling a 2x2
>> matrix, concerning which note that while \302 octal = \xC2 hex, and
>> \202 octal = \x82 hex, only the second of these and not the first
>> appears in the symbol:
>> |00|
>> |82|
>
> You aren't going to get anywhere with using high level tools for this. You
> need to go low level and see the values of the actual bytes in the filenames.
> So for example something like
>
> ls *Chat* | od -A n -t x1
>
> which will show the bytes in hexadecimal.

Thanks again, will look into that.

>> My problem is that I can't find a search term to trap this strange
>> character to correct it, for example the following, and a few similar
>> that I've tried, don't work because they don't find the directory:
>> mv "Chat Bott'$'\302\202'', Le" "Chat Botté, Le"
>> mv Chat\ Bott\'$\'\\302\\202\'\',\ Le "Chat Botté, Le"
>
> What directory ? Your post says that some files have strange names. Do also
> some directories have strange names ? In any case , the commands above do not
> show a directory separator.

As part of my manual investigations of the problem, I changed to the
directory of which the problem directory is a direct sub-directory, to
allow experimentation without having to type tediously extended pathnames.

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

Re: while loop taking input from file via iconv

<sfbaq8$vm0$1@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=426&group=uk.comp.os.linux#426

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: chr...@mshome.net (Chris Elvidge)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 16:11:04 +0100
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <sfbaq8$vm0$1@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net>
<ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net>
<inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me>
<inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me>
<sf9heo$a6v$1@dont-email.me> <sf9l1q$q7c$2@dont-email.me>
<sfao26$2mr$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 15 Aug 2021 15:11:04 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fa2752ff8aae1d8b2339d113adcc5d53";
logging-data="32448"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ekSYsGVG79uIaWcOLoyuVw4+vlmdPeCU="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
Thunderbird/52.2.1 Lightning/5.4
Cancel-Lock: sha1:u/5q2lIw1gMGbb5ZS/pO59b9rR0=
In-Reply-To: <sfao26$2mr$1@dont-email.me>
Content-Language: en-GB
 by: Chris Elvidge - Sun, 15 Aug 2021 15:11 UTC

On 15/08/2021 10:51 am, Paul wrote:
> Martin Gregorie wrote:
>> On Sat, 14 Aug 2021 22:52:08 +0000, William Unruh wrote:
>>
>>> Says someone, apparently, who has never looked at the command "find", or
>>> may other commands. billions of different option combinations, some of
>>> which work, others of which do not.
>>>
>> Quite right: I don't use it because 'locate' is *much* faster and
>> easier to use, especially if updatedb is run overnight by a cronjob
>>
>> Similarly, 'apropos' is nearly as fast 'locate' since it only has to
>> scan the contents of /usr/share/man/* - and in addition, because its
>> scanning the first line of each manpage, it also matches words of
>> phrases describing what a program does, so often searching on a word
>> or phrase describing what a program does means you a suitable program
>> without knowing its name:
>>
>> $ apropos 'free space'
>> e2freefrag (8) - report free space fragmentation information
>> xfs_spaceman (8) - show free space information about an XFS
>> filesystem
>>
>> $ apropos 'space used'
>> space used: nothing appropriate.
>> $ apropos 'space usage'
>> df (1) - report file system disk space usage
>> du (1) - estimate file space usage
>> du (1p) - estimate file space usage
>>
>> ... and its no use complaining about how well or badly a manpage is
>> written: often the only way to fit would be to submit a manpage patch.
>> Yes, I know some Linux manpages are pretty bad. However, others (those
>> for bash, sort and awk to name but a few) are excellent and most are
>> usable.
>
> This is why you keep a "notes" file.
>
> find /media/somedisk -type d -exec ls -al -1 -d {} + > directories.txt
> find /media/somedisk -type f -exec ls -al -1 {} + > filelist.txt
>
> Any time you put some effort into crafting one, you
> record it for the future.
>
> ./ffmpeg -hwaccel nvdec -i "fedora.mkv" -y -acodec aac -vcodec
> h264_nvenc -crf 23 "output2.mp4"
>
> 16.3x speed, 488FPS
>
> Part of the fun is making them cryptic, so you can't understand them later.
>
> Paul
>

Why not:
find /media/somedisk -type f -ls > filelist.txt

Saves a process (or two)

--
Chris Elvidge
England

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfbbbg$1ulg$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=427&group=uk.comp.os.linux#427

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!8YXKAhSo8fMBpI0CH1QWtw.user.46.165.242.75.POSTED!not-for-mail
From: jav...@evij.com.invalid (Java Jive)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Sun, 15 Aug 2021 16:20:14 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sfbbbg$1ulg$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<0NDDCyPcV8G14Fiq6@bongo-ra.co> <sfba62$1df1$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="64176"; posting-host="8YXKAhSo8fMBpI0CH1QWtw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101
Thunderbird/68.4.2
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: Java Jive - Sun, 15 Aug 2021 15:20 UTC

On 15/08/2021 16:00, Java Jive wrote:
>
> On 15/08/2021 13:58, Spiros Bousbouras wrote:
>>
>> You aren't going to get anywhere with using high level tools for this.
>> You
>> need to go low level and see the values of the actual bytes in the
>> filenames.
>> So for example something like
>>
>>      ls *Chat* | od -A n -t x1
>>
>> which will show the bytes in hexadecimal.
>
> Thanks again, will look into that.

As I suspected from the octal, it's C2 82 ...

For example the first line is (though really I need a fixed font to show
this):
43 68 61 74 20 42 6f 74 74 c2 82 2c 20 4c 65 20
C h a t B o t t , L e

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

Re: while loop taking input from file via iconv

<slrnshie7r.2ge.sc@scarpet42p.localdomain>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=428&group=uk.comp.os.linux#428

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!cleanfeed3-b.proxad.net!nnrp1-2.free.fr!not-for-mail
Newsgroups: alt.os.linux,uk.comp.os.linux
From: sc...@fiat-linux.fr (Stéphane CARPENTIER)
Subject: Re: while loop taking input from file via iconv
References: <sf6h49$15o3$1@gioia.aioe.org>
<ino78bFu5pcU1@mid.individual.net> <ino8oeFueohU1@mid.individual.net>
<inpcg5F6m4oU1@mid.individual.net> <inq7evFc25gU1@mid.individual.net>
<sf92mi$8vh$1@dont-email.me> <inqlgmFer5pU1@mid.individual.net>
<sf9afk$q7c$1@dont-email.me> <sf9heo$a6v$1@dont-email.me>
<sf9l1q$q7c$2@dont-email.me> <87fsvbma46.fsf@LkoBDZeT.terraraq.uk>
<sfb1i6$r12$1@dont-email.me>
Organization: Mulots' Killer
User-Agent: slrn/1.0.3 (Linux)
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Message-ID: <slrnshie7r.2ge.sc@scarpet42p.localdomain>
Date: 15 Aug 2021 15:55:39 GMT
Lines: 29
NNTP-Posting-Date: 15 Aug 2021 17:55:39 CEST
NNTP-Posting-Host: 78.201.248.7
X-Trace: 1629042939 news-4.free.fr 12687 78.201.248.7:51160
X-Complaints-To: abuse@proxad.net
 by: Stéphane CARPENTIER - Sun, 15 Aug 2021 15:55 UTC

Le 15-08-2021, Martin Gregorie <martin@mydomain.invalid> a écrit :
> On Sun, 15 Aug 2021 08:37:45 +0100, Richard Kettlewell wrote:
>
>> Martin Gregorie <martin@mydomain.invalid> writes:
>>> William Unruh wrote:
>>>
>>>> Says someone, apparently, who has never looked at the command "find",
>>>> or may other commands. billions of different option combinations, some
>>>> of which work, others of which do not.
>>>
>>> Quite right: I don't use it because 'locate' is *much* faster and
>>> easier to use, especially if updatedb is run overnight by a cronjob
>>
>> Bad comparison, since it doesn’t do the same thing.
>
> Both look for filenames, but 'find' can be restricted to a directory
> structure - thats about the difference I can see in a quick manpage scan.

Your search looks very quick. I don't use locate but I found its
manpage on Internet. From it, I don't see how to use locate to know
which files have been created recently. For example, I see a lot of
things find can do and not locate.

You can use find to look for files in an usb stick too. I guess running
updatedb on an usb stick may become ugly.

--
Si vous avez du temps à perdre :
https://scarpet42.gitlab.io

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfbm4l$tsp$1@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=439&group=uk.comp.os.linux#439

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Sun, 15 Aug 2021 14:24:22 -0400
Organization: A noiseless patient Spider
Lines: 163
Message-ID: <sfbm4l$tsp$1@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 15 Aug 2021 18:24:21 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="b49be95aa323b476bf5cc879910381ad";
logging-data="30617"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AoUS86Z6r9bWhsfrN+7tk3qC8sIWaOfo="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:qFncyEBN3up+WjduMoTIHrIxyUs=
In-Reply-To: <sfavf7$r6h$1@gioia.aioe.org>
 by: Paul - Sun, 15 Aug 2021 18:24 UTC

Java Jive wrote:
> On 13/08/2021 20:28, Java Jive wrote:
>> I have the following lines in a shell script ...
>>
>> while [ -n "${LINE}" ]
>> do
>> if [ -n "${LINE} ]
>> then
>> # Do processing
>> fi
>> done < "${DATA}"
>>
>> .... and this works fine for all but two lines in the data file, which
>> contain accented characters. A file erroneously named with an e acute
>> needs to be renamed to have an e grave, and a filename containing an e
>> umlaut needs to be moved to a new location and given a new name.
>
> Uggghhh! The reason for this disgust will become clear shortly!
>
> This is a follow up question about character encodings ...
>
> Previously I have released to my family two versions of the same archive
> of family documents going back to the reign of Queen Anne, some items
> possibly a little earlier. These documents were scanned (1o for
> original scan) and then put through four possible stages of
> post-processing:
> 2n Contrast 'normalised' using pnnorm
> 3t Textcleaned
> 4nt n followed by 3
> 5tn t followed by n
>
> For each document, the best result was copied into the main archive,
> while the above preprocessing stages were left in an '_all'
> sub-directory structure, with five subdirectories named as above, each
> of which having beneath it a directory tree mirroring the main archive.
>
> The main version of the archive, which most family members seem to have
> downloaded, only included the main archive and didn't include the _all
> subdirectory with all the pre-processing results, the full version
> included this directory. IIRC, the former was compressed by WinZip from
> the archive as it existed on a Windows PC at the time, but WinZip threw
> a wobbly over the size of the full archive, so for that I had to use 7zip.
>
> Now the crunch, when I unzip these on a Linux machine, I see different
> bastardisations of accented characters. So, for example where the full
> 7zip archive when extracted shows an e acute correctly in both a console
> and a file manager listing ...
> "Chat Botté, Le" [e is correctly acute]
> ... (if you're wondering, a French children's picture book version of
> apparently 'Puss In Boots'), while with the WinZip main archive a
> console listing shows a very odd character sequence instead of the e
> acute ...
> "Chat Bott'$'\302\202'', Le"
> ... and a file manager listing has a graphic character resembling a 2x2
> matrix, concerning which note that while \302 octal = \xC2 hex, and
> \202 octal = \x82 hex, only the second of these and not the first
> appears in the symbol:
> |00|
> |82|
>
> My problem is that I can't find a search term to trap this strange
> character to correct it, for example the following, and a few similar
> that I've tried, don't work because they don't find the directory:
> mv "Chat Bott'$'\302\202'', Le" "Chat Botté, Le"
> mv Chat\ Bott\'$\'\\302\\202\'\',\ Le "Chat Botté, Le"
>
> I could use a glob wildcard character such as '?', but currently all the
> filenames are within quotes, where globbing doesn't seem to work, and it
> would be a hell of a business removing the quotes, because many names in
> the archive use many characters that would each need to be anticipated
> and escaped for in an unquoted filename, such as spaces, ampersands,
> brackets, etc.
>
> Can anyone suggest a sequence that will find the file, when put inside
> quotes as the filename in the controlling data file mentioned previously
> in the thread, so that it can just be treated like all the other lines?
> As someone here suggested the data file is now stored as UTF-8 rather
> than ANSI as it was formerly, and some example lines are given below in
> a form for easier readability in a ng - in reality the fields are tab
> separated but here are separated by double spacing and have been further
> abbreviated to keep them from wrapping; leading symbols such as '+' and
> '=' have special meanings for the program doing the work; and, yes, the
> commands are basically DOS commands which for Linux are translated to
> their bash equivalents:
>
> =ATTRIB -R "./F H/Close/Sts Mary & John Churchyard Monuments.pdf"
> =RD "./F H /_all/1o/Blessig & Heyder"
> REN "./Chat Bott'$'\302\202'', Le" "Chat Botté, Le"
> MOVE "./Photo - D & M Close.png" "./Photos/D & M Close.png"
> [etc]
>

https://stackoverflow.com/questions/4177783/xc3-xa9-and-other-codes/4177813#4177813

It looks like perhaps this "text string" for the filename,
went through some web encoding at some point. With a hex
editor, I can change C3 A9 to E9 hex, and the character in
the hex editor (on the right hand side) looks visually correct.

https://i.postimg.cc/TP57bLD9/C3-A9-to-E9.gif

You could do such an operation, in Perl, right on the
file system.

*********************** rename2.ps *************************
printf("this is a test\n");

$start = "Chat Bott";
$finish = ", Le";
$naughty1 = <\x{C3}\x{A9}> ;
$naughty2 = <\x{E9}> ;

$x = $start.$finish ;
$y = $start.$naughty1.$finish ;
$z = $start.$naughty2.$finish ;

open(OUT, ">>$x") || die("Cannot create X");
close(OUT);

open(OUT, ">>$y") || die("Cannot create Y");
close(OUT);

open(OUT, ">>$z") || die("Cannot create Z");
close(OUT);

use Cwd;

$c = getcwd ;

printf("Making a mess in %s\n", $c );

#rename( $y , $z );

exit(0);
*********************** end of rename2.ps *************************

I ran this in Windows 11, by double-clicking the file. I
could not run it using one of their terminals. I just thought
it was mildly amusing as to what the filenames looked like.

The idea of the script above, is you run it multiple times,
commenting out a line here or there, while you do your tests.
For example, comment out the creation of file $z and
enable the rename(y,z) command near the bottom, to see
if the created $y can be renamed to the presumed operational $z value.

https://i.postimg.cc/gksLyGFL/rename2-output.gif [Picture]

So far, I only tested it as copy/pasted above. I haven't
tested the rename.

Then, you'd need to pick up a recursive tree ("find-next-file")
type pattern, and look for a filename with $naughty1 in it,
and rename it somehow. Maybe something like one of the
examples here. You would probably need to look for a
substring of $naughty1, in the filenames returned.

https://stackoverflow.com/questions/5089680/how-to-find-files-folders-recursively-in-perl-script

File renaming, is the only thing I've done with Perl :-)
I'll never be a Perl person I guess.

Paul

Re: while loop taking input from file via iconv

<sfbmca$tsp$2@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=440&group=uk.comp.os.linux#440

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Date: Sun, 15 Aug 2021 14:28:28 -0400
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <sfbmca$tsp$2@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <ino78bFu5pcU1@mid.individual.net> <ino8oeFueohU1@mid.individual.net> <inpcg5F6m4oU1@mid.individual.net> <inq7evFc25gU1@mid.individual.net> <sf92mi$8vh$1@dont-email.me> <inqlgmFer5pU1@mid.individual.net> <sf9afk$q7c$1@dont-email.me> <sf9heo$a6v$1@dont-email.me> <sf9l1q$q7c$2@dont-email.me> <sfao26$2mr$1@dont-email.me> <sfbaq8$vm0$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 15 Aug 2021 18:28:27 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="b49be95aa323b476bf5cc879910381ad";
logging-data="30617"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AoPWu+okFlR3w0sF+AkGTP1iHO8+3p0I="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:lucvPhTSt94X+2iKpwsAfvLA1uU=
In-Reply-To: <sfbaq8$vm0$1@dont-email.me>
 by: Paul - Sun, 15 Aug 2021 18:28 UTC

Chris Elvidge wrote:
> On 15/08/2021 10:51 am, Paul wrote:
>> Martin Gregorie wrote:
>>> On Sat, 14 Aug 2021 22:52:08 +0000, William Unruh wrote:
>>>
>>>> Says someone, apparently, who has never looked at the command
>>>> "find", or
>>>> may other commands. billions of different option combinations, some of
>>>> which work, others of which do not.
>>>>
>>> Quite right: I don't use it because 'locate' is *much* faster and
>>> easier to use, especially if updatedb is run overnight by a cronjob
>>>
>>> Similarly, 'apropos' is nearly as fast 'locate' since it only has to
>>> scan the contents of /usr/share/man/* - and in addition, because its
>>> scanning the first line of each manpage, it also matches words of
>>> phrases describing what a program does, so often searching on a word
>>> or phrase describing what a program does means you a suitable program
>>> without knowing its name:
>>>
>>> $ apropos 'free space'
>>> e2freefrag (8) - report free space fragmentation information
>>> xfs_spaceman (8) - show free space information about an XFS
>>> filesystem
>>>
>>> $ apropos 'space used'
>>> space used: nothing appropriate.
>>> $ apropos 'space usage'
>>> df (1) - report file system disk space usage
>>> du (1) - estimate file space usage
>>> du (1p) - estimate file space usage
>>>
>>> ... and its no use complaining about how well or badly a manpage is
>>> written: often the only way to fit would be to submit a manpage patch.
>>> Yes, I know some Linux manpages are pretty bad. However, others
>>> (those for bash, sort and awk to name but a few) are excellent and
>>> most are usable.
>>
>> This is why you keep a "notes" file.
>>
>> find /media/somedisk -type d -exec ls -al -1 -d {} + >
>> directories.txt
>> find /media/somedisk -type f -exec ls -al -1 {} + > filelist.txt
>>
>> Any time you put some effort into crafting one, you
>> record it for the future.
>>
>> ./ffmpeg -hwaccel nvdec -i "fedora.mkv" -y -acodec aac -vcodec
>> h264_nvenc -crf 23 "output2.mp4"
>>
>> 16.3x speed, 488FPS
>>
>> Part of the fun is making them cryptic, so you can't understand them
>> later.
>>
>> Paul
>>
>
> Why not:
> find /media/somedisk -type f -ls > filelist.txt
>
> Saves a process (or two)

At the time, this was a speed optimization.

I have no idea today, whether it's a good idea or not,
but I still use that.

It's also possible it might blow up on a pathological
file tree of some sort ("line too long").

Paul

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfbml8$84k$1@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=441&group=uk.comp.os.linux#441

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Sun, 15 Aug 2021 14:33:13 -0400
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <sfbml8$84k$1@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org> <sfbm4l$tsp$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 15 Aug 2021 18:33:13 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="b49be95aa323b476bf5cc879910381ad";
logging-data="8340"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX185MjHfC5IeWizqVm3hGdR7/EWLDx+kySw="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:f0WN8JcJMhXhh8Y88grLTSUWYbg=
In-Reply-To: <sfbm4l$tsp$1@dont-email.me>
 by: Paul - Sun, 15 Aug 2021 18:33 UTC

Paul wrote:

>
> I ran this in Windows 11

Now, before everyone gets on my case about where I ran it,
I needed to be able to see what the users see when they
unpack their 7zip in Windows, and whether the filename
looks as intended.

If I did the test purely in Linux, against an NTFS file
system, who knows whether the text string display would
look just like it does on Windows. I'm not a character
set expert and cannot predict what those look like on
the Linux side. It's unlikely at the moment, that
Linux will even mount that file system (MFTMIRR) :-/ Thanks
to Microsoft. Only Fedora could mount it without whining.

It's hardly easy to do anything in a heterogenous
environment now. Like pulling teeth with dull pliers.

Paul

Re: while loop taking input from file via iconv

<sfbrbd$129$1@gonzo.revmaps.no-ip.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=442&group=uk.comp.os.linux#442

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx41.iad.POSTED!not-for-mail
From: use...@revmaps.no-ip.org (Jasen Betts)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: while loop taking input from file via iconv
Organization: JJ's own news server
Message-ID: <sfbrbd$129$1@gonzo.revmaps.no-ip.org>
References: <sf6h49$15o3$1@gioia.aioe.org>
<ino78bFu5pcU1@mid.individual.net> <ino8oeFueohU1@mid.individual.net>
<inpcg5F6m4oU1@mid.individual.net> <inq7evFc25gU1@mid.individual.net>
<sf92mi$8vh$1@dont-email.me> <inqlgmFer5pU1@mid.individual.net>
<sf9afk$q7c$1@dont-email.me> <sf9heo$a6v$1@dont-email.me>
<sf9l1q$q7c$2@dont-email.me> <87fsvbma46.fsf@LkoBDZeT.terraraq.uk>
<sfb1i6$r12$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 15 Aug 2021 19:53:17 -0000 (UTC)
Injection-Info: gonzo.revmaps.no-ip.org; posting-host="localhost:127.0.0.1";
logging-data="1097"; mail-complaints-to="usenet@gonzo.revmaps.no-ip.org"
User-Agent: slrn/1.0.3 (Linux)
X-Face: ?)Aw4rXwN5u0~$nqKj`xPz>xHCwgi^q+^?Ri*+R(&uv2=E1Q0Zk(>h!~o2ID@6{uf8s;a
+M[5[U[QT7xFN%^gR"=tuJw%TXXR'Fp~W;(T"1(739R%m0Yyyv*gkGoPA.$b,D.w:z+<'"=-lVT?6
{T?=R^:W5g|E2#EhjKCa+nt":4b}dU7GYB*HBxn&Td$@f%.kl^:7X8rQWd[NTc"P"u6nkisze/Q;8
"9Z{peQF,w)7UjV$c|RO/mQW/NMgWfr5*$-Z%u46"/00mx-,\R'fLPe.)^
Lines: 26
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Sun, 15 Aug 2021 20:00:45 UTC
Date: Sun, 15 Aug 2021 19:53:17 -0000 (UTC)
X-Received-Bytes: 2540
 by: Jasen Betts - Sun, 15 Aug 2021 19:53 UTC

On 2021-08-15, Martin Gregorie <martin@mydomain.invalid> wrote:
> On Sun, 15 Aug 2021 08:37:45 +0100, Richard Kettlewell wrote:
>
>> Martin Gregorie <martin@mydomain.invalid> writes:
>>> William Unruh wrote:
>>>
>>>> Says someone, apparently, who has never looked at the command "find",
>>>> or may other commands. billions of different option combinations, some
>>>> of which work, others of which do not.
>>>
>>> Quite right: I don't use it because 'locate' is *much* faster and
>>> easier to use, especially if updatedb is run overnight by a cronjob
>>
>> Bad comparison, since it doesn’t do the same thing.
>
> Both look for filenames, but 'find' can be restricted to a directory
> structure - thats about the difference I can see in a quick manpage scan.

find can search accoding to user, group, permission, age, size, type ...
find can launch other commands, and delete files.

locate only looks at filenames.

--
Jasen.

Re: Character Encoding (Was: while loop taking input from file via iconv )

<intb9vF13nqU1@mid.individual.net>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=443&group=uk.comp.os.linux#443

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: use...@example.net (J.O. Aho)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Sun, 15 Aug 2021 22:21:19 +0200
Lines: 34
Message-ID: <intb9vF13nqU1@mid.individual.net>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<sfbm4l$tsp$1@dont-email.me> <sfbml8$84k$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net OXipZDF46LbCgOYPrG3kkgkqxR5XkeVWbAj4NKR8l+QcPKqafR
Cancel-Lock: sha1:myiSX7LdF88pWHaxlwwLioILAVE=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
In-Reply-To: <sfbml8$84k$1@dont-email.me>
Content-Language: en-US-large
 by: J.O. Aho - Sun, 15 Aug 2021 20:21 UTC

On 15/08/2021 20.33, Paul wrote:

> If I did the test purely in Linux, against an NTFS file
> system, who knows whether the text string display would
> look just like it does on Windows. I'm not a character
> set expert and cannot predict what those look like on
> the Linux side.

As long the systems has the same charset, then there shouldn't be any
differences, this do not just apply to Linux but other operating systems
as microsoft windows.

> It's unlikely at the moment, that
> Linux will even mount that file system (MFTMIRR) :-/ Thanks
> to Microsoft. Only Fedora could mount it without whining.

Much depends on the ntfs module loaded, the current in kernel ntfs
support is crappy and still used by some distributions, but most do have
support for the ntfs-3g driver, just you may install it manually.
The good news is that this driver will be in the kernel in a near future.

Mounting BitLock encrypted file systems can also be done on the Linux,
just in case you need to access files from your work computers harddrive.

Had been nice to see an in kernel exFat support too, but I doubt
microsoft has need of that in their Linux distributions, so I doubt they
will provide a driver.

--

//Aho

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfc0r9$1709$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=444&group=uk.comp.os.linux#444

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!f3Ja+IUlF3LLNCdyvqay1w.user.46.165.242.91.POSTED!not-for-mail
From: nos...@please.ty (jak)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Sun, 15 Aug 2021 23:27:02 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sfc0r9$1709$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="39945"; posting-host="f3Ja+IUlF3LLNCdyvqay1w.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: it
 by: jak - Sun, 15 Aug 2021 21:27 UTC

Il 15/08/2021 13:57, Java Jive ha scritto:
> On 13/08/2021 20:28, Java Jive wrote:
>> I have the following lines in a shell script ...
>>
>> while [ -n "${LINE}" ]
>>      do
>>          if [ -n "${LINE} ]
>>              then
>>                  # Do processing
>>          fi
>>      done < "${DATA}"
>>
>> .... and this works fine for all but two lines in the data file, which
>> contain accented characters. A file erroneously named with an e acute
>> needs to be renamed to have an e grave, and a filename containing an e
>> umlaut needs to be moved to a new location and given a new name.
>
> Uggghhh!  The reason for this disgust will become clear shortly!
>
> This is a follow up question about character encodings ...
>
> Previously I have released to my family two versions of the same archive
> of family documents going back to the reign of Queen Anne, some items
> possibly a little earlier.  These documents were scanned (1o for
> original scan) and then put through four possible stages of
> post-processing:
>     2n    Contrast 'normalised' using pnnorm
>     3t    Textcleaned
>     4nt    n followed by 3
>     5tn    t followed by n
>
> For each document, the best result was copied into the main archive,
> while the above preprocessing stages were left in an '_all'
> sub-directory structure, with five subdirectories named as above, each
> of which having beneath it a directory tree mirroring the main archive.
>
> The main version of the archive, which most family members seem to have
> downloaded, only included the main archive and didn't include the _all
> subdirectory with all the pre-processing results, the full version
> included this directory.  IIRC, the former was compressed by WinZip from
> the archive as it existed on a Windows PC at the time, but WinZip threw
> a wobbly over the size of the full archive, so for that I had to use 7zip.
>
> Now the crunch, when I unzip these on a Linux machine, I see different
> bastardisations of accented characters.  So, for example where the full
> 7zip archive when extracted shows an e acute correctly in both a console
> and a file manager listing ...
>     "Chat Botté, Le"    [e is correctly acute]
> ... (if you're wondering, a French children's picture book version of
> apparently 'Puss In Boots'), while with the WinZip main archive a
> console listing shows a very odd character sequence instead of the e
> acute ...
>     "Chat Bott'$'\302\202'', Le"
> ... and a file manager listing has a graphic character resembling a 2x2
> matrix, concerning which note that while \302 octal = \xC2 hex,  and
> \202 octal = \x82 hex, only the second of these and not the first
> appears in the symbol:
>     |00|
>     |82|
>
> My problem is that I can't find a search term to trap this strange
> character to correct it, for example the following, and a few similar
> that I've tried, don't work because they don't find the directory:
>     mv "Chat Bott'$'\302\202'', Le"    "Chat Botté, Le"
>     mv Chat\ Bott\'$\'\\302\\202\'\',\ Le "Chat Botté, Le"
>
> I could use a glob wildcard character such as '?', but currently all the
> filenames are within quotes, where globbing doesn't seem to work, and it
> would be a hell of a business removing the quotes, because many names in
> the archive use many characters that would each need to be anticipated
> and escaped for in an unquoted filename, such as spaces, ampersands,
> brackets, etc.
>
> Can anyone suggest a sequence that will find the file, when put inside
> quotes as the filename in the controlling data file mentioned previously
> in the thread, so that it can just be treated like all the other lines?
> As someone here suggested the data file is now stored as UTF-8 rather
> than ANSI as it was formerly, and some example lines are given below in
> a form for easier readability in a ng  -  in reality the fields are tab
> separated but here are separated by double spacing and have been further
> abbreviated to keep them from wrapping; leading symbols such as '+' and
> '=' have special meanings for the program doing the work; and, yes, the
> commands are basically DOS commands which for Linux are translated to
> their bash equivalents:
>
> =ATTRIB -R  "./F H/Close/Sts Mary & John Churchyard Monuments.pdf"
> =RD  "./F H /_all/1o/Blessig & Heyder"
> REN  "./Chat Bott'$'\302\202'', Le"  "Chat Botté, Le"
> MOVE  "./Photo - D & M Close.png" "./Photos/D & M Close.png"
> [etc]
>

Hi,
you could use the find command looking for filenames as a regular
expression, then use the command you need on them.
In this example I search for files with the extension ".o", display the
name with the command 'echo' and display it again converted to
uppercase:

find . -iregex ".*\.o$" -exec bash -c "echo -n original: {} && echo \"
modified: {}\" | tr [a-z] [A-Z]}" \;

There should be everything you need.

cheers

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfdaqc$1hq3$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=451&group=uk.comp.os.linux#451

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!8YXKAhSo8fMBpI0CH1QWtw.user.46.165.242.75.POSTED!not-for-mail
From: jav...@evij.com.invalid (Java Jive)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Mon, 16 Aug 2021 10:23:23 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sfdaqc$1hq3$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<sfc0r9$1709$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="51011"; posting-host="8YXKAhSo8fMBpI0CH1QWtw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101
Thunderbird/68.4.2
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: Java Jive - Mon, 16 Aug 2021 09:23 UTC

On 15/08/2021 22:27, jak wrote:
>
> Il 15/08/2021 13:57, Java Jive ha scritto:
>>
>> Can anyone suggest a sequence that will find the file, when put inside
>> quotes as the filename in the controlling data file mentioned
>> previously in the thread, so that it can just be treated like all the
>> other lines? As someone here suggested the data file is now stored as
>> UTF-8 rather than ANSI as it was formerly, and some example lines are
>> given below in a form for easier readability in a ng  -  in reality
>> the fields are tab separated but here are separated by double spacing
>> and have been further abbreviated to keep them from wrapping; leading
>> symbols such as '+' and '=' have special meanings for the program
>> doing the work; and, yes, the commands are basically DOS commands
>> which for Linux are translated to their bash equivalents:
>>
>> =ATTRIB -R  "./F H/Close/Sts Mary & John Churchyard Monuments.pdf"
>> =RD  "./F H /_all/1o/Blessig & Heyder"
>> REN  "./Chat Bott'$'\302\202'', Le"  "Chat Botté, Le"
>> MOVE  "./Photo - D & M Close.png" "./Photos/D & M Close.png"
>> [etc]
>>
>
> Hi,
> you could use the find command looking for filenames as a regular
> expression, then use the command you need on them.
> In this example I search for files with the extension ".o", display the
> name with the command 'echo' and display it again converted to
> uppercase:
>
>  find . -iregex ".*\.o$" -exec bash -c "echo -n original: {} && echo \"
>    modified: {}\" | tr [a-z] [A-Z]}" \;
>
> There should be everything you need.

Thanks but no, that doesn't work. I had considered, before the script
works through the data file, of running a pre-process to find and rename
all these characters, but neither find nor ls will actually find the
erroneous characters *DIRECTLY*. The best either can do is find the
characters either side, but that means I have to know in advance where
all the problems are, and I'm not sure yet that I do. Really, if I'm
going to go down that road, I need a way of searching the entire archive
structure directly for affected files and renaming them, as a separate
process from working through the data file.

So, for example, this works because I'm specifying and finding the
neighbouring characters of one known instance, not because ls is finding
the oddball characters directly ...
ls Chat\ Bott?,\ Le | sed 's~\xc2\x82~é~g'
.... whereas these don't, with neither single nor double backslashes nor
various other combinations that I've tried, because neither find nor ls
seem able to find the oddball characters directly:
find . -regex ".*\\xc2\\x82.*"
ls -R *\\xc2\\x82*
ls -R *'$'\\302\\202''*

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

Re: Character Encoding (Was: while loop taking input from file via iconv )

<zlhr6A1+YDCIh3bGX@bongo-ra.co>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=453&group=uk.comp.os.linux#453

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!OC6U9UkZn9R/lnxSpxG5YA.user.46.165.242.91.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via iconv )
Date: Mon, 16 Aug 2021 12:47:01 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <zlhr6A1+YDCIh3bGX@bongo-ra.co>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org> <sfc0r9$1709$1@gioia.aioe.org>
<sfdaqc$1hq3$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="41736"; posting-host="OC6U9UkZn9R/lnxSpxG5YA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-Notice: Filtered by postfilter v. 0.9.2
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Mon, 16 Aug 2021 12:47 UTC

On Mon, 16 Aug 2021 10:23:23 +0100
Java Jive <java@evij.com.invalid> wrote:
> So, for example, this works because I'm specifying and finding the
> neighbouring characters of one known instance, not because ls is finding
> the oddball characters directly ...
> ls Chat\ Bott?,\ Le | sed 's~\xc2\x82~é~g'
> ... whereas these don't, with neither single nor double backslashes nor
> various other combinations that I've tried, because neither find nor ls
> seem able to find the oddball characters directly:
> find . -regex ".*\\xc2\\x82.*"
> ls -R *\\xc2\\x82*
> ls -R *'$'\\302\\202''*

Try ls -R *$'\302\202'*

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfdsve$9bg$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=454&group=uk.comp.os.linux#454

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!8YXKAhSo8fMBpI0CH1QWtw.user.46.165.242.75.POSTED!not-for-mail
From: jav...@evij.com.invalid (Java Jive)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Mon, 16 Aug 2021 15:33:15 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sfdsve$9bg$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<sfc0r9$1709$1@gioia.aioe.org> <sfdaqc$1hq3$1@gioia.aioe.org>
<zlhr6A1+YDCIh3bGX@bongo-ra.co>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="9584"; posting-host="8YXKAhSo8fMBpI0CH1QWtw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101
Thunderbird/68.4.2
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: Java Jive - Mon, 16 Aug 2021 14:33 UTC

On 16/08/2021 13:47, Spiros Bousbouras wrote:
> On Mon, 16 Aug 2021 10:23:23 +0100
> Java Jive <java@evij.com.invalid> wrote:
>> So, for example, this works because I'm specifying and finding the
>> neighbouring characters of one known instance, not because ls is finding
>> the oddball characters directly ...
>> ls Chat\ Bott?,\ Le | sed 's~\xc2\x82~é~g'
>> ... whereas these don't, with neither single nor double backslashes nor
>> various other combinations that I've tried, because neither find nor ls
>> seem able to find the oddball characters directly:
>> find . -regex ".*\\xc2\\x82.*"
>> ls -R *\\xc2\\x82*
>> ls -R *'$'\\302\\202''*
>
> Try ls -R *$'\302\202'*

No luck with that either ...
ls: cannot access '*'$'\302\202''*': No such file or directory

I think the trouble with all these methods is that they are specifying a
succession of two characters, where as far as unicode is concerned the
oddballs are single characters, so I fear they will never match, no
matter what magic incantation is used.

So I've been looking at putting in the following as a hack around. It's
designed to search for wildcards in the file name coming from the data
file, and if one is found, escape all the other 'dodgy' characters in it
and use it without quotes, but, although everything *LOOKS* as though it
should work, it gives an error message at the final file testing if
statement:

Before, works except for filenames containing wildcard characters:

if [ -n "${Debug}" ]
then
echo "CE3 = ${CE3}"
fi
if [ "${CE3/./}" != "${CE3}" ]
then
# Is file spec
if [ ! -f "${CE3}" ] # Note quotes
then
if [ -n "${Debug}" ]
then
echo "WARNING - File '${CE3}' does not exist!"
fi
Result=1
CE2=""
fi
else
# Is path spec
if [ ! -d "${CE3}" ] # Note quotes
then
if [ -n "${Debug}" ]
then
echo "WARNING - Directory '${CE3}' does not exist!"
fi
Result=1
CE2=""
fi
fi

After, try to escape wildcard containing filenames:

# Need to remove any enclosing quotes at this stage
while [ "${CE3:0:1}" == "'" ] && [ "${CE3: -1:1}" == "'" ]
do
CE3="${CE3:1:${#CE3}-2}"
done
while [ "${CE3:0:1}" == "\"" ] && [ "${CE3: -1:1}" == "\"" ]
do
CE3="${CE3:1:${#CE3}-2}"
done
# Check for wildcard chars
if [ "${CE3/\?/}" != "${CE3}" ] || [ "${CE3/\*/}" != "${CE3}" ]
then
# Wildcards, cannot quote, so escape difficult characters
CE3=$(echo "${CE3}" | sed "s~\([ #&'(),;-]}\)~\\\\\1~g")
else
# No wildcards, enclose in quotes
CE3="\"${CE3}\""
fi
if [ -n "${Debug}" ]
then
echo "CE3 = ${CE3}"
# Example output here seems correct, for example ...
# CE3 = ... /Newscuttings\ \-\ Wedding\ Of\ Zo?\ <Surname>.png
fi
if [ "${CE3/./}" != "${CE3}" ]
then
# Is file spec
if [ ! -f ${CE3} ] # Note no quotes
# ... but errors here: <scriptname>: line <num>: [: too many arguments
then
if [ -n "${Debug}" ]
then
echo "WARNING - File '${CE3}' does not exist!"
fi
Result=1
CE2=""
fi
else
# Is path spec
if [ ! -d ${CE3} ] # Note no quotes
# ... and here: <scriptname>: line <num>: [: too many arguments
then
if [ -n "${Debug}" ]
then
echo "WARNING - Directory '${CE3}' does not exist!"
fi
Result=1
CE2=""
fi
fi

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfe1v0$u60$1@dont-email.me>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=455&group=uk.comp.os.linux#455

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mar...@mydomain.invalid (Martin Gregorie)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Mon, 16 Aug 2021 15:58:25 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <sfe1v0$u60$1@dont-email.me>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<sfc0r9$1709$1@gioia.aioe.org> <sfdaqc$1hq3$1@gioia.aioe.org>
<zlhr6A1+YDCIh3bGX@bongo-ra.co> <sfdsve$9bg$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 16 Aug 2021 15:58:25 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ff94f311d8e737fbb646f0f4b3ea57fe";
logging-data="30912"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18YjJ31jAorKYosmnUODLOtfHdeD5PnRak="
User-Agent: Pan/0.146 (Hic habitat felicitas; 8107378
git@gitlab.gnome.org:GNOME/pan.git)
Cancel-Lock: sha1:N+tESgQ5INGOoINhDo2lgwqeygI=
 by: Martin Gregorie - Mon, 16 Aug 2021 15:58 UTC

On Mon, 16 Aug 2021 15:33:15 +0100, Java Jive wrote:

> No luck with that either ...
> ls: cannot access '*'$'\302\202''*': No such file or directory
>
Might be worth writing a noddy Java program to see if it can resolve your
problem character codes.

The Java 'char' primitive can hold multibyte character values. and the
Character() class provides methods to recognise character types, lengths,
and non-Unicode characters.

--
--
Martin | martin at
Gregorie | gregorie dot org

Re: Character Encoding (Was: while loop taking input from file via iconv )

<sfe3mo$1h6t$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=456&group=uk.comp.os.linux#456

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!8YXKAhSo8fMBpI0CH1QWtw.user.46.165.242.75.POSTED!not-for-mail
From: jav...@evij.com.invalid (Java Jive)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Mon, 16 Aug 2021 17:28:06 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sfe3mo$1h6t$1@gioia.aioe.org>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
<sfc0r9$1709$1@gioia.aioe.org> <sfdaqc$1hq3$1@gioia.aioe.org>
<zlhr6A1+YDCIh3bGX@bongo-ra.co> <sfdsve$9bg$1@gioia.aioe.org>
<sfe1v0$u60$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="50397"; posting-host="8YXKAhSo8fMBpI0CH1QWtw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101
Thunderbird/68.4.2
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: Java Jive - Mon, 16 Aug 2021 16:28 UTC

On 16/08/2021 16:58, Martin Gregorie wrote:
> On Mon, 16 Aug 2021 15:33:15 +0100, Java Jive wrote:
>
>> No luck with that either ...
>> ls: cannot access '*'$'\302\202''*': No such file or directory
>>
> Might be worth writing a noddy Java program to see if it can resolve your
> problem character codes.
>
> The Java 'char' primitive can hold multibyte character values. and the
> Character() class provides methods to recognise character types, lengths,
> and non-Unicode characters.

But I can't be sure that any of the target machines will have Java,
Perl, or Python installed. This has to be achieved with what will
normally be installed on a Linux or MacOS box.

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

Re: Character Encoding (Was: while loop taking input from file via iconv )

<inviqqFeqbiU1@mid.individual.net>

  copy mid

https://www.novabbs.com/aus+uk/article-flat.php?id=457&group=uk.comp.os.linux#457

  copy link   Newsgroups: alt.os.linux uk.comp.os.linux
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!news2.arglkargh.de!news.karotte.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: use...@andyburns.uk (Andy Burns)
Newsgroups: alt.os.linux,uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )
Date: Mon, 16 Aug 2021 17:42:01 +0100
Lines: 13
Message-ID: <inviqqFeqbiU1@mid.individual.net>
References: <sf6h49$15o3$1@gioia.aioe.org> <sfavf7$r6h$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net B1tWp62+VftxxS32Nf+NJQX9XNrQRZYos0mptv4dV9r49xifu6
Cancel-Lock: sha1:4UB8fwr9Iv4lPs7Huwrm9movqGU=
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Content-Language: en-GB
In-Reply-To: <sfavf7$r6h$1@gioia.aioe.org>
 by: Andy Burns - Mon, 16 Aug 2021 16:42 UTC

Java Jive wrote:

> console listing shows a very odd character sequence instead of the e
> acute ...
>     "Chat Bott'$'\302\202'', Le"

Are you sure the filename is exactly as you say/think? What does

ls -b

show?

Pages:123
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor