From: jak...@f1.n221.z2.fidonet.fi (jak)
Newsgroups: uk.comp.os.linux
Subject: Re: Character Encoding (Was: while loop taking input from file via iconv )
Date: Tue, 17 Aug 2021 23:37:10 +0200
Organization: rbb soupgate
Message-ID: <1362714765@f1.n221.z2.fidonet.fi>
References: <2520393506@f0.n0.z0.fidonet.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Subject: Re: Character Encoding (Was: while loop taking input from file via
iconv )

On 17/08/2021 14:52, Java Jive wrote:
> On 15/08/2021 12:57, Java Jive wrote:
>>
>> Can anyone suggest a sequence that will find the file, when put inside
>> quotes as the filename in the controlling data file mentioned
>> previously in the thread, so that it can just be treated like all the
>> other lines? As someone here suggested the data file is now stored as
>> UTF-8 rather than ANSI as it was formerly, and some example lines are
>> given below in a form for easier readability in a ng  -  in reality
>> the fields are tab separated but here are separated by double spacing
>> and have been further abbreviated to keep them from wrapping; leading
>> symbols such as '+' and '=' have special meanings for the program
>> doing the work; and, yes, the commands are basically DOS commands
>> which for Linux are translated to their bash equivalents:
>>
>> =ATTRIB -R  "./F H/Close/Sts Mary & John Churchyard Monuments.pdf"
>> =RD  "./F H /_all/1o/Blessig & Heyder"
>> REN  "./Chat Bott'$'\302\202'', Le"  "Chat Botté, Le"
>> MOVE  "./Photo - D & M Close.png" "./Photos/D & M Close.png"
>> [etc]
>
> I've completely fixed the problem with the following code inserted
> before processing the data file.  Thanks for all the help here that
> enabled me to do this.  It'll wrap of course, sorry can't help that,
> beyond reducing the tabs to two spaces:
>
> # Search for WinZip's botched accented characters
> # in the main download of v1: MacFarlane-Main.zip
> # 35 pathnames affected, botched characters are:
> #    Intended    Stored incorrectly as
> #    Char        Octal        Hex
> #    é (acute)    \302\202    \xC2\x82
> #    ë (diaeresis)    \302\211    \xC2\x89
> #    è (grave)    \302\212    \xC2\x8A
> #    Á (acute)    µ
>
> OLDIFS=${IFS} # Normally IFS=$' \t\n'
> IFS=$'\n'
> LASTREN=""
> for A in $(ls -1bR | grep -E '(:|µ|\\[0-7]{3}\\[0-7]{3})')
>   do
>     if [ -n "${Debug}" ]
>       then
>         echo "A = \"${A}\""
>     fi
>     if [ "${A: -1}" == ":" ]
>       then
>         THISDIR="${A/:/}"
>         if [ "${THISDIR}" == "${LASTREN/ -> .*/}" ]
>           then
>             THISDIR="${LASTREN/.* -> /}"
>         fi
>         if [ -n "${Debug}" ]
>           then
>             echo "THISDIR = \"${THISDIR}\""
>         fi
>       else
>         SC="${A}"
>         DS="${A}"
>         while [ -n "$(echo \"${SC}\" | grep -E '(µ|\\[0-7]{3}\\[0-7]{3})')" ]
>           do
>             case $(echo "${SC}" | sed -E 's~^.*(µ|\\[0-7]{3}\\[0-7]{3}).*$~\1~') in
>               "µ")         # A acute
>                     SC="${SC//µ/?}"
>                     DS="${DS//µ/Á}"
>                     ;;
>               "\302\202")  # e acute
>                     SC="${SC//\\302\\202/?}"
>                     DS="${DS//\\302\\202/é}"
>                     ;;
>               "\302\211")  # e diaeresis
>                     SC="${SC//\\302\\211/?}"
>                     DS="${DS//\\302\\211/ë}"
>                     ;;
>               "\302\212")  # e grave
>                     SC="${SC//\\302\\212/?}"
>                     DS="${DS//\\302\\212/è}"
>                     ;;
>             esac
>           done
>
>         DS="${DS//\\/}"
>         pushd "${THISDIR}"
>         echo "mv ${SC} \"${DS}\""
>         if [ -z "${Dummy}" ]
>           then
>             mv ${SC} "${DS}"
>         fi
>         popd
>
>         # Remember rename in case it's a directory containing others
>         LASTREN="${THISDIR}/${A//\\ / } -> ${THISDIR}/${DS}"
>         if [ -n "${Debug}" ]
>           then
>             echo "LASTREN = \"${LASTREN}\""
>         fi
>
>     fi
>   done
> IFS=${OLDIFS}
>

Since I had also tried writing my own version of the shell script, here it is.

These are the files I created for testing:

$ ls -1 jak/foo*
'jak/foo'$'\302\202'
'jak/foo'$'\302\202\302\202'
'jak/foo'$'\302\202\302\211'
'jak/foo'$'\302\212\302\202'
'jak/foo'$'\302\212\302\202''foo'
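
For anyone who wants to reproduce the test, files with these names could be created roughly like this. It is only a sketch: the helper name make_test_files is made up here, and it builds the two-byte sequences with printf octal escapes rather than typing them.

```shell
#!/usr/bin/env bash
# make_test_files DIR  (hypothetical helper, not part of the original script)
# Creates empty files in DIR whose names contain the mis-stored byte pairs.
make_test_files() {
    local dir=$1 fmt
    mkdir -p "$dir" || return 1
    # Each string is used as a printf format, so \302\202 becomes two raw bytes.
    for fmt in 'foo\302\202' \
               'foo\302\202\302\202' \
               'foo\302\202\302\211' \
               'foo\302\212\302\202' \
               'foo\302\212\302\202foo'
    do
        touch "$dir/$(printf "$fmt")" || return 1
    done
}
```

Calling make_test_files jak and then ls -1b jak should show the names with the same octal escapes as in the listing above.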

This is the result of the script:

$ ./renbadch
mv "./jak/foo\302\202" "./jak/fooé"
mv "./jak/foo\302\202\302\202" "./jak/fooéé"
mv "./jak/foo\302\202\302\211" "./jak/fooéë"
mv "./jak/foo\302\212\302\202" "./jak/fooèé"
mv "./jak/foo\302\212\302\202foo" "./jak/fooèéfoo"
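
To see where the é comes from: the script ignores the first escape of each pair (\302) and treats the second one (octal 202, 211, 212) as a single byte in a DOS code page, which iconv then converts to UTF-8. The sketch below does that for one pair only. It assumes an iconv that knows CP863, and the function name pair_to_char is invented for the illustration.

```shell
#!/usr/bin/env bash
# pair_to_char ESCAPED_PAIR  (hypothetical helper)
# Takes one pair as printed by ls -b, e.g. '\302\202', keeps only the
# second escape and converts that single byte from CP863 to UTF-8.
pair_to_char() {
    local pair=$1
    local octal=${pair##*\\}    # digits after the last backslash, e.g. 202
    # "\\$octal" expands to \202, which printf turns into one raw byte.
    printf "\\$octal" | iconv -f CP863 -t UTF-8
}
```

With an iconv that supports CP863, pair_to_char '\302\211' should print ë, which matches the table in the quoted script.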

This is the code:

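#! /usr/bin/bash

# Text before a pair of octal escapes (as printed by ls -b), then the two escapes.
regex='([^\\]*[^0-7]*)(\\[0-7]{3})(\\[0-7]{3})'

while read -r ll
do
    orig=$ll
    transl=""
    while [[ $ll =~ $regex ]]
    do
        start=${BASH_REMATCH[1]}
        # Second escape without its backslash, e.g. 202
        goodch=$(printf %d "${BASH_REMATCH[3]:1}")
        # Emit that byte and convert it from CP863 to UTF-8
        newch=$(echo -e "\0${goodch}" | iconv -f 'CP863' -t 'UTF-8')
        transl="${transl}${start}${newch}"
        # Continue with whatever follows the pair just handled
        m=${BASH_REMATCH[0]}
        ll=${ll##*"$m"}
    done
    # Only print the command; nothing is renamed here
    echo "mv \"${orig}\" \"${transl}${ll}\""
done < <(find . -type f -exec ls -1b {} + | grep -E '\\[0-7]{3}\\[0-7]{3}')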
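
Note that the script only prints the mv commands, and the source names in them are still in ls -b escaped form, so they cannot be pasted into a shell as they stand. One possible way to act on them is sketched below. It assumes that octal escapes are the only escapes in the name (ls -b also escapes spaces and backslashes, which this does not handle), and unescape_octal and rename_escaped are names made up for the sketch.

```shell
#!/usr/bin/env bash
# unescape_octal NAME  (hypothetical helper)
# Turns \ooo escapes, as printed by ls -b, back into raw bytes.
unescape_octal() {
    # printf %b expects octal escapes written as \0ooo, so insert the 0 first.
    printf '%b' "$(printf '%s' "$1" | sed 's/\\\([0-7]\{3\}\)/\\0\1/g')"
}

# rename_escaped ESCAPED_OLD NEW  (hypothetical helper)
# Renames the file that ls -b showed as ESCAPED_OLD to NEW (NEW is literal).
rename_escaped() {
    local old
    old=$(unescape_octal "$1") || return 1
    mv -- "$old" "$2"
}
```

Inside the loop one could then call rename_escaped "${orig}" "${transl}${ll}" instead of the echo, preferably after a dry run that only prints.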
