devel / comp.lang.c++ / unicode string manipulation and file I/O

Subject (Author)
* unicode string manipulation and file I/O (MarioCCCP)
+* Re: unicode string manipulation and file I/O (Bonita Montero)
|`- Re: unicode string manipulation and file I/O (MarioCCCP)
`* Re: unicode string manipulation and file I/O (James Kuyper)
 `- Re: unicode string manipulation and file I/O (MarioCCCP)

unicode string manipulation and file I/O

From: NoliMihi...@libero.it (MarioCCCP)
Newsgroups: comp.lang.c++
Subject: unicode string manipulation and file I/O
Date: Tue, 3 Oct 2023 18:37:28 +0200
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <ufhg0a$3l3bg$1@dont-email.me>
Reply-To: MarioCCCP@CCCP.MIR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 3 Oct 2023 16:37:32 -0000 (UTC)
Content-Language: en-GB, it-IT
 by: MarioCCCP - Tue, 3 Oct 2023 16:37 UTC

I have been away from C++ for too long and I am very
confused (particularly by the differences between wchar_t
strings and Unicode strings, like UTF-8).

I need to load / save text files of non-ASCII strings
(containing actual Unicode code points, not represented as
"entities" but as actual characters of variable size).

Does the standard library contain suitable functions for
UTF-8 Unicode string manipulation and for file input/output?

I am trying to use QString from Qt ... but frustratingly
it produces compile errors IN THEIR headers (not yet in my
own code), and I can't find what I am missing. So I was also
looking for a more standard and portable solution.

Thanks for any advice. Please also mention the relevant
headers, if any.

Thanks again

--
1) Resist, resist, resist.
2) If everybody pays taxes, taxes are paid by everybody
MarioCPPP

Re: unicode string manipulation and file I/O

From: Bonita.M...@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c++
Subject: Re: unicode string manipulation and file I/O
Date: Tue, 3 Oct 2023 18:40:13 +0200
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <ufhg5a$3l5i7$1@dont-email.me>
References: <ufhg0a$3l3bg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 3 Oct 2023 16:40:10 -0000 (UTC)
Content-Language: de-DE
In-Reply-To: <ufhg0a$3l3bg$1@dont-email.me>
 by: Bonita Montero - Tue, 3 Oct 2023 16:40 UTC

On 03.10.2023 at 18:37, MarioCCCP wrote:
>
>
> I have been away from C++ for too long and I am very confused
> (particularly with the differences between wchar_t strings and unicode
> strings, like UTF-8).
>
> I'd need to load / save text files of non-Ascii strings (containing
> actual unicode codepoints not represented as "entities" but as actual
> characters, of variable size).
>
> does the standard library contain suitable functions for UTF-8 unicode
> string manipulation and input/output on files ?
>
> I am trying to use QString from QT ... but frustratingly produces
> compile errors IN THEIR headers (not yet in my own code), and I can't
> find what I am missing. So I was also looking for some more standard and
> portable solution.
>
> Tnx for any advice. Pls also mention relevant headers, if any

Maybe this helps:
https://stackoverflow.com/questions/4775437/read-unicode-utf-8-file-into-wstring
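
A minimal, portable alternative sketch (not taken from the linked answer): if the program only needs to load and save UTF-8 text, it can keep the bytes in a std::string and never transcode them. Narrow std::ifstream/std::ofstream streams do not transcode char data, and binary mode additionally prevents newline translation on platforms such as Windows, so the bytes round-trip exactly. The helper names read_utf8_file() and write_utf8_file() are hypothetical. Note that std::wstring_convert and the std::codecvt_utf8 facet were deprecated in C++17.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string, byte for byte.
// No transcoding happens, so UTF-8 content is kept exactly as stored.
std::string read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: write the bytes of a UTF-8 std::string unchanged.
void write_utf8_file(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    if (!out) throw std::runtime_error("write failed for " + path);
}
```

With this approach, find() and substr() work on bytes: indices are byte offsets, but because UTF-8 is self-synchronizing, searching valid UTF-8 text for a valid UTF-8 substring never produces a match that starts or ends in the middle of a character.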

Re: unicode string manipulation and file I/O

From: NoliMihi...@libero.it (MarioCCCP)
Newsgroups: comp.lang.c++
Subject: Re: unicode string manipulation and file I/O
Date: Tue, 3 Oct 2023 20:14:30 +0200
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <ufhlm6$3mapv$1@dont-email.me>
References: <ufhg0a$3l3bg$1@dont-email.me> <ufhg5a$3l5i7$1@dont-email.me>
Reply-To: MarioCCCP@CCCP.MIR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 3 Oct 2023 18:14:33 -0000 (UTC)
Content-Language: en-GB, it-IT
In-Reply-To: <ufhg5a$3l5i7$1@dont-email.me>
 by: MarioCCCP - Tue, 3 Oct 2023 18:14 UTC

On 03/10/23 18:40, Bonita Montero wrote:
> On 03.10.2023 at 18:37, MarioCCCP wrote:
>>
>>
>> I have been away from C++ for too long and I am very
>> confused (particularly with the differences between
>> wchar_t strings and unicode strings, like UTF-8).
>>
>> I'd need to load / save text files of non-Ascii strings
>> (containing actual unicode codepoints not represented as
>> "entities" but as actual characters, of variable size).
>>
>> does the standard library contain suitable functions for
>> UTF-8 unicode string manipulation and input/output on files ?
>>
>> I am trying to use QString from QT ... but frustratingly
>> produces compile errors IN THEIR headers (not yet in my
>> own code), and I can't find what I am missing. So I was
>> also looking for some more standard and portable solution.
>>
>> Tnx for any advice. Pls also mention relevant headers, if any
>
> Maybe this helps:
> https://stackoverflow.com/questions/4775437/read-unicode-utf-8-file-into-wstring
>
>

I'll have a look, thanks!
Ciao

Re: unicode string manipulation and file I/O

From: jameskuy...@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c++
Subject: Re: unicode string manipulation and file I/O
Date: Tue, 3 Oct 2023 23:28:21 -0400
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <ufim4l$s1$1@dont-email.me>
References: <ufhg0a$3l3bg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 4 Oct 2023 03:28:22 -0000 (UTC)
In-Reply-To: <ufhg0a$3l3bg$1@dont-email.me>
Content-Language: en-US
 by: James Kuyper - Wed, 4 Oct 2023 03:28 UTC

On 10/3/23 12:37, MarioCCCP wrote:
>
>
> I have been away from C++ for too long and I am very confused
> (particularly with the differences between wchar_t strings and unicode
> strings, like UTF-8).

The execution character set has a multibyte encoding that is used by
most standard library routines that take arguments as strings of
[[un]signed] char. It could use UTF-8 (or even UTF-16, if CHAR_BIT >=
16), but it is not required to do so.
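
For instance, assuming the multibyte encoding is UTF-8, a std::string simply holds the encoded bytes, so size() counts bytes, not characters. A minimal sketch with a hypothetical helper count_code_points(): every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that are not continuation bytes counts code points (assuming well-formed input). Code points are still not the same as user-perceived characters; an accented letter may be stored as a base letter plus a combining mark.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in a std::string that is
// assumed to hold well-formed UTF-8. Each code point has exactly one
// lead byte; continuation bytes look like 10xxxxxx and are skipped.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For example, "h\xC3\xA9llo" ("héllo") has size() 6 but 5 code points.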

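wchar_t is intended to use a fixed-length encoding capable of encoding
every supported character. It could have a Unicode encoding, either
UTF-32, or UCS-2 if the set of supported characters is sufficiently
restricted, but it is not required to. (In practice, Windows uses a
16-bit wchar_t holding UTF-16, so a single wchar_t there cannot hold
every Unicode character.)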

The C++ standard incorporates by reference functions for converting
between those encodings:
The <cstdlib> header declares std::mbstowcs(), std::wcstombs(),
std::mbtowc(), and std::wctomb(); the <cwchar> header declares the
restartable versions std::mbrtowc(), std::wcrtomb(), std::mbsrtowcs(),
and std::wcsrtombs(). Their definitions are not in the C++ standard
itself, but are incorporated by reference from the C standard.
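
A minimal sketch of the restartable multibyte-to-wchar_t conversion with std::mbrtowc(), assuming a stateless multibyte encoding such as UTF-8. The encoding used is that of the current LC_CTYPE locale, which is "C" at program start; a program usually calls std::setlocale(LC_ALL, "") to adopt the user's environment. The helper name multibyte_to_wide() is hypothetical.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a multibyte string, encoded in the current
// LC_CTYPE locale's encoding, to a std::wstring using std::mbrtowc().
// Assumes a stateless multibyte encoding such as UTF-8.
std::wstring multibyte_to_wide(const std::string& mb) {
    std::wstring out;
    std::mbstate_t state{};          // initial conversion state
    const char* p = mb.data();
    std::size_t left = mb.size();
    while (left > 0) {
        wchar_t wc = 0;
        std::size_t r = std::mbrtowc(&wc, p, left, &state);
        if (r == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multibyte sequence");
        if (r == static_cast<std::size_t>(-2))
            throw std::runtime_error("truncated multibyte sequence");
        if (r == 0)
            r = 1;                   // embedded null byte decoded to L'\0'
        out.push_back(wc);
        p += r;
        left -= r;
    }
    return out;
}
```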

char8_t, char16_t, and char32_t are newer distinct character types (not
typedefs; char16_t and char32_t since C++11, char8_t since C++20) used
to store strings in UTF-8, UTF-16, or UTF-32 format.
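
The conversion between UTF-8 and UTF-32 is also simple enough to sketch by hand, without depending on the current locale. The hypothetical helpers below keep UTF-8 bytes in std::string and code points in std::u32string (a string of char32_t), and reject surrogates, overlong forms, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("not a Unicode scalar value");
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode UTF-8 bytes into code points, throwing on
// malformed input.
std::u32string utf8_to_utf32(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = static_cast<char32_t>(b & 0x1F); }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = static_cast<char32_t>(b & 0x0F); }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = static_cast<char32_t>(b & 0x07); }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (len > in.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");
        out += cp;
        i += len;
    }
    return out;
}
```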

The <cuchar> header declares routines for converting between the
multibyte encoding and the UTF-N encodings: std::mbrtoc16() and
std::c16rtomb(), std::mbrtoc32() and std::c32rtomb(), and, since C++20,
std::mbrtoc8() and std::c8rtomb(). The char16_t and char32_t versions
are also incorporated by reference from the C standard.
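
A minimal sketch of the char32_t-to-multibyte direction with std::c32rtomb(), assuming the current LC_CTYPE locale's multibyte encoding can represent the characters (a UTF-8 locale can represent all of them). MB_LEN_MAX from <climits> bounds the number of bytes one call can write. The helper name u32_to_multibyte() is hypothetical, and library support for <cuchar> has historically varied between platforms.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cstddef>
#include <cuchar>
#include <cwchar>    // std::mbstate_t
#include <stdexcept>
#include <string>

// Hypothetical helper: encode char32_t text in the multibyte encoding of the
// current LC_CTYPE locale using std::c32rtomb().
std::string u32_to_multibyte(const std::u32string& s) {
    std::string out;
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];            // large enough for any one multibyte character
    for (char32_t c : s) {
        std::size_t n = std::c32rtomb(buf, c, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("character not representable in this locale");
        out.append(buf, n);
    }
    return out;
}
```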

Combining those routines, you can use multi-byte strings as an
intermediary to convert between any pair of the other encodings
mentioned above.
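
A sketch of that pivot for wchar_t to char32_t, assuming a stateless multibyte encoding: each wide character is written to a small buffer with std::wcrtomb() and then read back with std::mbrtoc32(). Both steps use the current LC_CTYPE locale, so characters outside that locale's repertoire cannot pass through the pivot (in the "C" locale this typically means only ASCII gets through). The helper name wide_to_u32() is hypothetical.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert wchar_t text to char32_t text, using the
// multibyte encoding of the current LC_CTYPE locale as the intermediary.
std::u32string wide_to_u32(const std::wstring& ws) {
    std::u32string out;
    std::mbstate_t to_mb{};    // conversion state for wcrtomb()
    std::mbstate_t to_c32{};   // conversion state for mbrtoc32()
    char buf[MB_LEN_MAX];
    for (wchar_t wc : ws) {
        // Step 1: one wide character -> its multibyte sequence.
        std::size_t n = std::wcrtomb(buf, wc, &to_mb);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("wide character not representable in this locale");
        // Step 2: that multibyte sequence -> char32_t value(s).
        const char* p = buf;
        while (n > 0) {
            char32_t c32 = 0;
            std::size_t r = std::mbrtoc32(&c32, p, n, &to_c32);
            if (r == static_cast<std::size_t>(-1))
                throw std::runtime_error("invalid multibyte sequence");
            if (r == static_cast<std::size_t>(-2))
                break;               // incomplete; the bytes are kept in to_c32
            if (r == static_cast<std::size_t>(-3)) {
                out.push_back(c32);  // extra value from earlier input, no bytes consumed
                continue;
            }
            if (r == 0)
                r = 1;               // the null character
            out.push_back(c32);
            p += r;
            n -= r;
        }
    }
    return out;
}
```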

Re: unicode string manipulation and file I/O

From: NoliMihi...@libero.it (MarioCCCP)
Newsgroups: comp.lang.c++
Subject: Re: unicode string manipulation and file I/O
Date: Wed, 4 Oct 2023 18:24:25 +0200
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <ufk3js$cj06$1@dont-email.me>
References: <ufhg0a$3l3bg$1@dont-email.me> <ufim4l$s1$1@dont-email.me>
Reply-To: MarioCCCP@CCCP.MIR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 4 Oct 2023 16:24:33 -0000 (UTC)
Content-Language: en-GB, it-IT
In-Reply-To: <ufim4l$s1$1@dont-email.me>
 by: MarioCCCP - Wed, 4 Oct 2023 16:24 UTC

On 04/10/23 05:28, James Kuyper wrote:
> On 10/3/23 12:37, MarioCCCP wrote:
>>
>> I have been away from C++ for too long and I am very confused
>> (particularly with the differences between wchar_t strings and unicode
>> strings, like UTF-8).
> The execution character set has a multibyte encoding that is used by
> most standard library routines that take arguments as strings of
> [[un]signed] char. It could use UTF-8 (or even UTF-16, if CHAR_BIT >=
> 16), but it is not required to do so.
>
> wchar_t uses a fixed-length encoding capable of encoding every supported
> character. It could have a Unicode encoding, either UTF-32, or UCS-2 if
> the set of supported characters is sufficiently restricted, but it is
> not required to.
>
> The C++ standard incorporates by reference functions for converting
> between those encodings:
> The <cstdlib> header declares std::mbstowcs(), std::wcstombs(),
> std::mbtowc(), and std::wctomb(); the <cwchar> header declares the
> restartable versions std::mbrtowc(), std::wcrtomb(), std::mbsrtowcs(),
> and std::wcsrtombs(). Their definitions are not in the C++ standard
> itself, but are incorporated by reference from the C standard.
>
> char8_t, char16_t, and char32_t are newer distinct character types (not
> typedefs; char16_t and char32_t since C++11, char8_t since C++20) used
> to store strings in UTF-8, UTF-16, or UTF-32 format.
>
> The <cuchar> header declares routines for converting between the
> multibyte encoding and the UTF-N encodings: std::mbrtoc16() and
> std::c16rtomb(), std::mbrtoc32() and std::c32rtomb(), and, since C++20,
> std::mbrtoc8() and std::c8rtomb(). The char16_t and char32_t versions
> are also incorporated by reference from the C standard.
>
> Combining those routines, you can use multi-byte strings as an
> intermediary to convert between any pair of the other encodings
> mentioned above.

Thanks for the detailed compendium. I'll try to go through all
of this!
Ciao
