Subject                               Author
* Re: Dialog signature delimiter      bill
`* Re: Dialog signature delimiter     Bernd Rose
 `* Re: Dialog signature delimiter    bill
  `* Re: Dialog signature delimiter   Bernd Rose
   `- Re: Dialog signature delimiter  bill

Re: Dialog signature delimiter

<s5nsda$1lat$1@gioia.aioe.org>

https://www.novabbs.com/computers/article-flat.php?id=359&group=news.software.readers#359
Path: i2pn2.org!i2pn.org!aioe.org!tKMeCZpOoHFkQLaFKeAqow.user.gioia.aioe.org.POSTED!not-for-mail
From: bil...@spam.invalid (bill)
Newsgroups: news.software.readers
Subject: Re: Dialog signature delimiter
Date: Wed, 21 Apr 2021 02:42:28 +0200
Organization: Aioe.org NNTP Server
Lines: 39
Message-ID: <s5nsda$1lat$1@gioia.aioe.org>
References: <s5ilpb$kd9$1@gioia.aioe.org> <s5ir5o$7qt$1@tncsrv09.home.tnetconsulting.net> <s5irtv$ene$1@gioia.aioe.org> <s5iu6t$84h$1@tncsrv09.home.tnetconsulting.net> <w1ab8v93kvce$.dlg@b.rose.tmpbox.news.arcor.de> <s5jvec$5s7$1@gioia.aioe.org> <17rf3wjx02quj$.dlg@b.rose.tmpbox.news.arcor.de> <s5klfl$1ue5$1@gioia.aioe.org> <gr2vcjkxmucj.dlg@b.rose.tmpbox.news.arcor.de>
NNTP-Posting-Host: tKMeCZpOoHFkQLaFKeAqow.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: 40tude_Dialog/2.0.15.41 (51e03d8d.9.298)
X-Notice: Filtered by postfilter v. 0.9.2

On Mon, 19 Apr 2021 22:01:33 +0200, Bernd Rose wrote:

> Notepad++ doesn't come into play, here.

Notepad++ solves the problem, but it's an extra step I was trying to remove.
It replaces the curly quote, HTML "&#x201C;" (U+201C), with the straight
quote, HTML "&quot;".
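Once the text is decoded to Unicode, the conversion Notepad++ performs here boils down to a character-for-character mapping. A minimal Python sketch, assuming the input is already a decoded Unicode string; the table `STRAIGHTEN` and the function name are illustrative and are not taken from Notepad++'s shortcuts.xml:

```python
# Illustrative mapping of "smart" punctuation to plain ASCII.
# U+201C/U+201D are the curly double quotes (HTML &#x201C; / &#x201D;),
# U+2018/U+2019 the curly single quotes; '"' is the straight quote (&quot;).
STRAIGHTEN = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones; every other character is kept."""
    return text.translate(STRAIGHTEN)


print(straighten_quotes("\u201cHello,\u201d she said. It\u2019s fine."))
# prints: "Hello," she said. It's fine.
```

Note that this works on characters, not bytes, which is exactly the distinction the rest of the thread turns on.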

> You send a message from Dialog Editor.

Thanks. I was hoping Dialog could perhaps be set to edit with Notepad++,
or that Dialog could post-process the text to substitute characters as defined.

> And depending on the characters found (in the current message),
> Dialog checks its encoding settings and formats the text to 7-Bit-ASCII,
> Windows-1250 or its siblings, CP850 or its siblings, UTF-7, UTF-8,...

The problem is one of formatting: the punctuation isn't consistent when it
comes from multiple sources (some of which use these "pretty" characters
while others don't).

It is best solved in the text editor, or cleaned up just before the
message is sent.

> Every message /may/ be sent with a different encoding.

The encoding isn't the problem so much as the inconsistent punctuation.

Some text pasted from web pages has fancy HTML punctuation, while other
text does not.

I couldn't get the syntax to work even for the simplest conversion (the
HTML smart quote to an HTML straight quote), so I'll just give up, as
Dialog seems like a nice editor even without this feature.

That is the whole rule I was after: replace "&#x201C;" with "&quot;".

--
Why do pencils shave?
To look sharp.

Re: Dialog signature delimiter

<1bdu7zohmx63b$.dlg@b.rose.tmpbox.news.arcor.de>

https://www.novabbs.com/computers/article-flat.php?id=361&group=news.software.readers#361
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder5.news.weretis.net!news.solani.org!.POSTED!not-for-mail
From: b.rose.t...@arcor.de (Bernd Rose)
Newsgroups: news.software.readers
Subject: Re: Dialog signature delimiter
Date: Wed, 21 Apr 2021 06:21:08 +0200
Message-ID: <1bdu7zohmx63b$.dlg@b.rose.tmpbox.news.arcor.de>
References: <s5ilpb$kd9$1@gioia.aioe.org> <s5ir5o$7qt$1@tncsrv09.home.tnetconsulting.net> <s5irtv$ene$1@gioia.aioe.org> <s5iu6t$84h$1@tncsrv09.home.tnetconsulting.net> <w1ab8v93kvce$.dlg@b.rose.tmpbox.news.arcor.de> <s5jvec$5s7$1@gioia.aioe.org> <17rf3wjx02quj$.dlg@b.rose.tmpbox.news.arcor.de> <s5klfl$1ue5$1@gioia.aioe.org> <gr2vcjkxmucj.dlg@b.rose.tmpbox.news.arcor.de> <s5nsda$1lat$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Injection-Info: solani.org;
logging-data="1074"; mail-complaints-to="abuse@news.solani.org"
User-Agent: 40tude_Dialog/2.0.15.41 (a162f0b9.252.344)
Cancel-Lock: sha1:wGZ+idF7ivIAz22V+BdC60hDuBY=
X-User-ID: eJwNycEBwCAIA8CVBAnIOGhg/xHa+x62i78whxsG03q4uRNSog2PgTBJreYlePwp7Eg+Nhbu0KScuZn9V6FD7WbWxKL0+QDobhqX

On Wed, 21st Apr 2021 02:42:28 +0200, bill wrote:

>> Every message /may/ be sent with a different encoding.
>
> The encoding isn't the problem so much as the inconsistent punctuation.

Best-fit-selected charset and encoding /are/ your problems.

Internally, 40tude Dialog uses 16-bit Unicode for its edit buffer. So
whatever you paste into the edit window is translated to Unicode
characters first (even the simplest 7-bit ASCII characters). But the
OnBeforeSending script fires /after/ the charset is selected, the
text is converted to the best-fit target charset, and the message is
already encoded.
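To see why it matters that the script only gets the already-encoded message, here is a small Python illustration. Python's codec names stand in for Dialog's charset list, and nothing here calls Dialog; `encodings_of` is a hypothetical helper. The single character U+201C becomes different bytes, or nothing encodable at all, depending on the charset chosen:

```python
def encodings_of(ch: str, charsets: list) -> dict:
    """Map each charset name to the bytes for `ch`, or None if it cannot be encoded."""
    result = {}
    for cs in charsets:
        try:
            result[cs] = ch.encode(cs)
        except UnicodeEncodeError:
            result[cs] = None  # this charset has no representation for ch
    return result


LEFT_QUOTE = "\u201c"  # the curly opening quote, HTML &#x201C;
for cs, data in encodings_of(LEFT_QUOTE, ["us-ascii", "iso-8859-1", "windows-1252", "utf-8"]).items():
    print(f"{cs:13} {data!r}")
```

With these four charsets the sketch reports no encoding for us-ascii and iso-8859-1, b'\x93' for windows-1252 and the three bytes b'\xe2\x80\x9c' for UTF-8: one logical replacement, several different byte patterns to look for.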

Therefore, any replacement rule in OnBeforeSending needs to take into
account /every/ charset and encoding style not explicitly forbidden in
your 40tude Dialog settings. As I already said, the Umlaut script
makes assumptions about these settings that are inadvisable in the
first place, unless the people using it with 40tude Dialog carefully
write and copy/paste only a narrow, closely defined set of characters
that always maps to /one and the same/ charset.

Bernd

Re: Dialog signature delimiter

<s5q5i8$mc2$1@gioia.aioe.org>

https://www.novabbs.com/computers/article-flat.php?id=365&group=news.software.readers#365
Path: i2pn2.org!i2pn.org!aioe.org!tKMeCZpOoHFkQLaFKeAqow.user.gioia.aioe.org.POSTED!not-for-mail
From: bil...@spam.invalid (bill)
Newsgroups: news.software.readers
Subject: Re: Dialog signature delimiter
Date: Wed, 21 Apr 2021 23:30:57 +0200
Organization: Aioe.org NNTP Server
Lines: 55
Message-ID: <s5q5i8$mc2$1@gioia.aioe.org>
References: <s5ilpb$kd9$1@gioia.aioe.org> <s5ir5o$7qt$1@tncsrv09.home.tnetconsulting.net> <s5irtv$ene$1@gioia.aioe.org> <s5iu6t$84h$1@tncsrv09.home.tnetconsulting.net> <w1ab8v93kvce$.dlg@b.rose.tmpbox.news.arcor.de> <s5jvec$5s7$1@gioia.aioe.org> <17rf3wjx02quj$.dlg@b.rose.tmpbox.news.arcor.de> <s5klfl$1ue5$1@gioia.aioe.org> <gr2vcjkxmucj.dlg@b.rose.tmpbox.news.arcor.de> <s5nsda$1lat$1@gioia.aioe.org> <1bdu7zohmx63b$.dlg@b.rose.tmpbox.news.arcor.de>
NNTP-Posting-Host: tKMeCZpOoHFkQLaFKeAqow.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: 40tude_Dialog/2.0.15.41 (51e03d8d.9.298)
X-Notice: Filtered by postfilter v. 0.9.2

On Wed, 21 Apr 2021 06:21:08 +0200, Bernd Rose wrote:

> Best-fit-selected charset and encoding /are/ your problems.

It's complicated, so I don't know if I understand it yet, but I think I
understand your final point, which is that I'm not the one to make it work. ;-/
I can always stick with Notepad++ as my intermediary message editor
whenever I'm pasting together cites from multiple websites that have
inconsistent punctuation (e.g., smart quotes plus straight quotes), because
Notepad++ can convert curly quotes to straight quotes (and lots of other
characters that I added one by one to the shortcuts.xml file as I found them).

In trying to understand your advice, I had to look up what Unicode was:
essentially a single character set covering well over a hundred thousand
characters, with encodings such as UTF-8 and UTF-16 for storing them.

What I gather from what you told me is that when I paste text copied
from various web pages into Dialog, the internal Dialog editor translates
everything I paste into 16-bit Unicode characters.

For example, a curly quote ends up as a 16-bit Unicode curly quote,
and a straight quote ends up as a 16-bit Unicode straight quote.

At this point the text is inconsistent (curly mixed with straight), even
though, from what you said, the encoding is consistent (i.e., it's all Unicode now).

Normally, just before sending, I cut and paste this into Notepad++ and
then run the conversion of all odd characters to basic characters, which
gives me the consistency that I want. Then I cut and paste that consistent
text back into Dialog and send the message.

Since you said the OnBeforeSending script fires up /after/ the message is
already encoded, I was just hoping the OnBeforeSending script could convert
the now-Unicode curly quotes to straight quotes in Unicode.

I admit I don't understand why Dialog can't do that, but I do understand your
overall advice, which is that it's very difficult and therefore not a task
I'm equipped to do (since I don't even understand yet why it can't work).

I'll just give up, but I do appreciate your advice (I think I'm way out
of my league, as I don't understand why the encoding matters when it's
all Unicode at the time I need to convert curly quotes to straight
quotes).

Nonetheless, it's OK to have Notepad++ in the middle, doing the
conversion of the Unicode curly quotes to Unicode straight quotes
for me.

-1- I cut and paste into the Dialog editor from a variety of web pages.
-2- Just before sending, I cut and paste the Dialog message into Notepad++.
-3- Notepad++ converts Unicode curly quotes to Unicode straight quotes.
-4- I paste the text, now with consistent punctuation, back into Dialog and send it off.
--
Why can't pencils move?
Because they are stationery

Re: Dialog signature delimiter

<1dcrggb5j969v$.dlg@b.rose.tmpbox.news.arcor.de>

https://www.novabbs.com/computers/article-flat.php?id=373&group=news.software.readers#373
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder5.news.weretis.net!news.solani.org!.POSTED!not-for-mail
From: b.rose.t...@arcor.de (Bernd Rose)
Newsgroups: news.software.readers
Subject: Re: Dialog signature delimiter
Date: Thu, 22 Apr 2021 20:50:23 +0200
Message-ID: <1dcrggb5j969v$.dlg@b.rose.tmpbox.news.arcor.de>
References: <s5ilpb$kd9$1@gioia.aioe.org> <s5ir5o$7qt$1@tncsrv09.home.tnetconsulting.net> <s5irtv$ene$1@gioia.aioe.org> <s5iu6t$84h$1@tncsrv09.home.tnetconsulting.net> <w1ab8v93kvce$.dlg@b.rose.tmpbox.news.arcor.de> <s5jvec$5s7$1@gioia.aioe.org> <17rf3wjx02quj$.dlg@b.rose.tmpbox.news.arcor.de> <s5klfl$1ue5$1@gioia.aioe.org> <gr2vcjkxmucj.dlg@b.rose.tmpbox.news.arcor.de> <s5nsda$1lat$1@gioia.aioe.org> <1bdu7zohmx63b$.dlg@b.rose.tmpbox.news.arcor.de> <s5q5i8$mc2$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Injection-Info: solani.org;
logging-data="6832"; mail-complaints-to="abuse@news.solani.org"
User-Agent: 40tude_Dialog/2.0.15.41 (6ddb0c0c.108.456)
Cancel-Lock: sha1:KgbTC0BZR/SYj2enSoDXYGwalGg=
X-User-ID: eJwNysEBwCAIA8CVBAyScQRh/xHaex/Mxetsh28MJqx17VbfXJx+rDZDsYGgUfie9N/HihWcvJK0bPBePapJROcZvqpYFh/Drhn7

On Wed, 21st Apr 2021 23:30:57 +0200, bill wrote:

[Multi-source copy & paste: process description snipped; assumptions correct]
> Since you said the OnBeforeSending script fires up /after/ the message is
> already encoded, I was just hoping the OnBeforeSending script could convert
> the now-Unicode curly quotes to straight quotes in Unicode.

By the time OnBeforeSending fires, the text may be encoded in just about
any imaginable charset, but /not/ in 16-bit Unicode.

As you type (or Copy&Paste), 40tude Dialog continuously analyzes the text
buffer. It checks which characters the text buffer contains and
compares them against the list of permitted charsets:
Settings -> General_Settings -> Charsets -> Use_best_matching_charset_out_of

It internally reduces the list of charsets to those that have the /least/
number of (distinct) characters contained in the text buffer that cannot
be encoded. As long as you leave Unicode-encoding charsets like UTF-7 and
UTF-8 in this list, the number of unencodable characters should be zero.
From the remaining list of charsets, it picks the first.
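The selection rule just described can be sketched in Python as follows. This is one reading of the paragraph above, not Dialog's actual code; the function name and the use of Python codec names in place of Dialog's charset list are assumptions:

```python
def best_fit_charset(text: str, permitted: list) -> str:
    """Pick the first permitted charset with the fewest distinct unencodable chars.

    Sketch only: `permitted` must be non-empty and in the user's preference order.
    """
    def unencodable(cs: str) -> int:
        # Count the distinct characters of `text` that `cs` cannot represent.
        count = 0
        for ch in set(text):
            try:
                ch.encode(cs)
            except UnicodeEncodeError:
                count += 1
        return count

    counts = [(unencodable(cs), cs) for cs in permitted]
    fewest = min(n for n, _ in counts)
    # Among the charsets tied for the fewest failures, the first in list order wins.
    return next(cs for n, cs in counts if n == fewest)


PERMITTED = ["us-ascii", "iso-8859-1", "utf-8"]
print(best_fit_charset("plain text", PERMITTED))          # us-ascii
print(best_fit_charset("Gr\u00fc\u00dfe", PERMITTED))     # iso-8859-1 ("Grüße")
print(best_fit_charset("\u201equoted\u201c", PERMITTED))  # utf-8 („quoted“)
```

With this list order, the sketch reproduces the progression described below: "us-ascii" for plain text, "iso-8859-1" once an umlaut appears, and UTF-8 once curly quotes are added.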

When you watch the "Status" field in the upper area of the Compose Editor
window, you see the charset adjusting to the text you type. Up to this point,
my current message shows "us-ascii", because it contains no special characters
at all. When I type the German umlaut ä, though, the charset listed in the
Status field immediately changes to "iso-8859-1" (aka Latin-1), which is
the first completely fitting charset in /my/ charset list. In your case,
a different charset might have been selected. If I insert curly quotation
marks („…“), the charset listed in Status changes again (to UTF-8, in
my case).

OnBeforeSending would have encountered "us-ascii" formatted text had I
sent only the first couple of lines, "iso-8859-1" text later on, and
finally UTF-8.

If the script contained a replacement rule from ä to ae consistent with
"iso-8859-1", it would say: replace byte code 0xE4 with ae. Consistent
with UTF-8, OTOH, it would need to replace the 2-byte sequence 0xC3 0xA4.
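The difference is easy to check in Python. The sketch below builds the ä-to-ae rule once per charset and shows what happens when the rule written for the wrong charset is applied to UTF-8 bytes. `RULES` and `apply_rule` are illustrative names, and the CJK character 中 is just an example whose UTF-8 form happens to start with the byte 0xE4:

```python
# The same logical rule "ä" -> "ae", expressed as bytes for two charsets.
RULES = {cs: ("\u00e4".encode(cs), b"ae") for cs in ("iso-8859-1", "utf-8")}
# RULES["iso-8859-1"] == (b'\xe4', b'ae'); RULES["utf-8"] == (b'\xc3\xa4', b'ae')


def apply_rule(data: bytes, charset: str) -> bytes:
    """Blind byte replacement using the rule written for `charset`."""
    old, new = RULES[charset]
    return data.replace(old, new)


utf8_body = "B\u00e4r \u4e2d".encode("utf-8")  # "Bär 中" -> b'B\xc3\xa4r \xe4\xb8\xad'

print(apply_rule(utf8_body, "utf-8"))        # b'Baer \xe4\xb8\xad' (correct)
broken = apply_rule(utf8_body, "iso-8859-1")
print(broken)                                # b'B\xc3\xa4r ae\xb8\xad'
```

With the Latin-1 rule, ä is left alone while 0xE4, the lead byte of 中 in UTF-8, is replaced, leaving two orphaned continuation bytes: the kind of garbage the following paragraph warns about.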

To avoid creating garbage with the replacements, OnBeforeSending would need
to detect the current charset /and/ have replacement rules for every charset
not explicitly disabled. Moreover, the script code would need to detect and
spare all header lines and any encoded attachments of the message.
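For comparison, here is the minimum a charset-aware replacement has to do, sketched in Python rather than in Dialog's script language. The function name is hypothetical, and the sketch assumes a single-part text/plain message whose body is sent as 7bit or 8bit (no quoted-printable, base64, or attachments) and whose charset is declared in a Content-Type header:

```python
import re

# Simplification: takes the first "charset=" found anywhere in the header block.
CHARSET_RE = re.compile(rb'charset="?([A-Za-z0-9._-]+)"?', re.IGNORECASE)


def replace_in_body(raw: bytes, replacements: dict) -> bytes:
    """Apply character replacements to the body only; headers pass through unchanged.

    Assumes a single-part text/plain message with a 7bit/8bit body
    (no quoted-printable, no base64, no attachments).
    """
    # The header block ends at the first empty line (CRLF or LF line endings).
    split = re.search(rb"\r?\n\r?\n", raw)
    if split is None:
        return raw  # no header/body separator: leave the message untouched
    head, body = raw[:split.end()], raw[split.end():]

    found = CHARSET_RE.search(head)
    charset = found.group(1).decode("ascii") if found else "us-ascii"

    text = body.decode(charset)           # bytes -> Unicode via the declared charset
    for old, new in replacements.items():
        text = text.replace(old, new)     # replace characters, not byte patterns
    return head + text.encode(charset)    # re-encode in the same charset
```

Because the replacement runs on decoded characters, one rule table serves every charset, and the header block is passed through byte for byte. Replacements must only produce characters the declared charset can hold (straight quotes always qualify); multipart bodies and transfer encodings are deliberately left out.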

One solution to the charset detection problem would be to disable all
but one charset (from the list of permitted charsets). If you ensure that
you always type plain characters from the lower 7-bit ASCII range,
this would be fine. But that is extremely unlikely. Therefore, the
fixed charset would need to be ready for virtually any character. This
leaves just UTF-7 and UTF-8. (Both have their own problems when used in
40tude Dialog with very high Unicode character planes. But that's a
different story and solvable with a set of specific scripts.)

But if you set UTF-7 or UTF-8 as your only permitted charset, you post
most of your messages with unnecessary overhead. This is, historically,
massively frowned upon and will get you loads of "friendly suggestions"
to configure your newsreader more appropriately.

Apart from such hints, you'd still need to set up all those replacement
rules. To make things worse, both UTF-7 and UTF-8 are variable-width
character encoding schemes. Your replacement rules would not only need
to replace sometimes one, sometimes several bytes for a single character;
the replacement function would also need to evaluate the whole text stream
so as not to mix up the start positions of multi-byte sequences.
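UTF-7 (RFC 2152) makes the point concrete: non-ASCII characters are written as base64-encoded UTF-16 between '+' and '-', so the bytes for one ä depend on its neighbours. A short Python check, assuming Python's built-in "utf-7" codec as a stand-in for Dialog's encoder:

```python
ae_umlaut = "\u00e4"  # "ä"

single = ae_umlaut.encode("utf-7")        # one ä on its own
double = (ae_umlaut * 2).encode("utf-7")  # two ä in a single base64 run
print(single, double)

# A byte rule written for a lone "ä" cannot find either character inside the run.
print(single in double)  # False

# Decoding the whole stream first, then replacing characters, works.
fixed = double.decode("utf-7").replace(ae_umlaut, "ae").encode("utf-7")
print(fixed)  # b'aeae'
```

The same reasoning applies to UTF-8, where a byte-level rule is only safe if it matches complete, correctly aligned sequences.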

I strongly suggest leaving this idea be...
Bernd

Re: Dialog signature delimiter

<s5shs8$1c2t$1@gioia.aioe.org>

https://www.novabbs.com/computers/article-flat.php?id=374&group=news.software.readers#374
Path: i2pn2.org!i2pn.org!aioe.org!tKMeCZpOoHFkQLaFKeAqow.user.gioia.aioe.org.POSTED!not-for-mail
From: bil...@spam.invalid (bill)
Newsgroups: news.software.readers
Subject: Re: Dialog signature delimiter
Date: Thu, 22 Apr 2021 21:13:21 +0200
Organization: Aioe.org NNTP Server
Lines: 123
Message-ID: <s5shs8$1c2t$1@gioia.aioe.org>
References: <s5ilpb$kd9$1@gioia.aioe.org> <s5ir5o$7qt$1@tncsrv09.home.tnetconsulting.net> <s5irtv$ene$1@gioia.aioe.org> <s5iu6t$84h$1@tncsrv09.home.tnetconsulting.net> <w1ab8v93kvce$.dlg@b.rose.tmpbox.news.arcor.de> <s5jvec$5s7$1@gioia.aioe.org> <17rf3wjx02quj$.dlg@b.rose.tmpbox.news.arcor.de> <s5klfl$1ue5$1@gioia.aioe.org> <gr2vcjkxmucj.dlg@b.rose.tmpbox.news.arcor.de> <s5nsda$1lat$1@gioia.aioe.org> <1bdu7zohmx63b$.dlg@b.rose.tmpbox.news.arcor.de> <s5q5i8$mc2$1@gioia.aioe.org> <1dcrggb5j969v$.dlg@b.rose.tmpbox.news.arcor.de>
NNTP-Posting-Host: tKMeCZpOoHFkQLaFKeAqow.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: 40tude_Dialog/2.0.15.41 (51e03d8d.9.298)
X-Notice: Filtered by postfilter v. 0.9.2

On Thu, 22 Apr 2021 20:50:23 +0200, Bernd Rose wrote:

> By the time OnBeforeSending fires, the text may be encoded in just about
> any imaginable charset, but /not/ in 16-bit Unicode.

Oh my. OK. That's my fault for not understanding that key complication!
I apologize; this is all very confusing to me.
It just goes to prove I'm way out of my league and need to let it rest.

> As you type (or Copy&Paste), 40tude Dialog continuously analyzes the text
> buffer. It checks which characters the text buffer contains and
> compares them against the list of permitted charsets:
> Settings -> General_Settings -> Charsets -> Use_best_matching_charset_out_of

Oh. I see. I think. OK. That's why it's complicated, I guess.

The character set in Dialog's editor can keep changing as I paste from
various web sites (depending on what's used by those various web sites).

> It internally reduces the list of charsets to those that have the /least/
> number of (distinct) characters contained in the text buffer that cannot
> be encoded. As long as you leave Unicode-encoding charsets like UTF-7 and
> UTF-8 in this list, the number of unencodable characters should be zero.
> From the remaining list of charsets, it picks the first.

Interesting that it picks the first out of the remainders.
Nice to know.

> When you watch the "Status" field in the upper area of the Compose Editor
> window, you see the charset adjusting to the text you type. Up to this point,
> my current message shows "us-ascii", because it contains no special characters
> at all.

I had not even noticed that "Status" field just below the Subject,
Newsgroups, and Email-to fields! Thanks for pointing that out.

Mine says "Not yet sent (Charset utf-8)" at this very instant.

> In your case, a different charset might have been selected.

Thanks for explaining that it depends on what is in my list.
And in what order.

I just removed all your curly quotes and the Status changed instantly to
"Not yet sent (Charset iso8859-1)" as you implied it would.

Just now I removed the German characters you typed, and the Status field
instantly changed to "Not yet sent (Charset us-ascii)" (as you said it
would).

> To avoid creating garbage with the replacements, OnBeforeSending would need
> to detect the current charset /and/ have replacement rules for every charset
> not explicitly disabled.

Yikes. I get it now.

I had not realized the Dialog editor's character set kept changing depending
on the characters in the actual message (and further depending on
what was available in the encoding list, and further still on the order of
that list).

You're right. It's way out of my league for sure!

> Moreover, the script code would need to detect and
> spare all header lines and any encoded attachments of the message.

Aurgh. Even worse. I see. It gets more complex by the minute. :-0

> One solution to the charset detection problem would be to disable all
> but one charset (from the list of permitted charsets). If you ensure that
> you always type plain characters from the lower 7-bit ASCII range,
> this would be fine. But that is extremely unlikely.

Hmmmmmmm... I do NOT need fancy characters.
I only need what I can type on a standard keyboard.
I wonder if that might work for me?

> Therefore, the
> fixed charset would need to be ready for virtually any character. This
> leaves just UTF-7 and UTF-8. (Both have their own problems when used in
> 40tude Dialog with very high Unicode character planes. But that's a
> different story and solvable with a set of specific scripts.)

I don't know what a "character plane" is, but I could get away with just the
English alphabet plus basic punctuation.

> But if you set UTF-7 or UTF-8 as your only permitted charset, you post
> most of your messages with unnecessary overhead. This is, historically,
> massively frowned upon and will get you loads of "friendly suggestions"
> to configure your newsreader more appropriately.

Oops. Oh well. There goes that idea. ;-/

> Apart from such hints, you'd still need to set up all those replacement
> rules. To make things worse, both UTF-7 and UTF-8 are variable-width
> character encoding schemes. Your replacement rules would not only need
> to replace sometimes one, sometimes several bytes for a single character;
> the replacement function would also need to evaluate the whole text stream
> so as not to mix up the start positions of multi-byte sequences.

Ayayyai. I see. It's too complex for me.
Way too complex for me.

There's no way it would be worth the effort.
Even if I knew how to program.

> I strongly suggest leaving this idea be...

Wow. Thank you for that detailed explanation.
I strongly agree with you.

I learned a lot about how Dialog works (I didn't even know about that Status
field), which says it is sending this (so far) as "Not yet sent (Charset
us-ascii)", and I'll leave it at that.

Let's drop this train of thought.
Thanks for all your patient advice (I accept everything you said!)

Regards,
bill
--
There were so many pencils that they made a whole state.
It was named Pencilvania.
