devel / comp.lang.forth / Re: Counting frequencies of unique words

Re: Counting frequencies of unique words

<610e7c57$0$693$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=14134&group=comp.lang.forth#14134
 by: Doug Hoffman - Sat, 7 Aug 2021 12:28 UTC

On 3/27/21 7:41 AM, P Falth wrote:
> When a char
> is read from the input buffer it is immediately converted to lower case
> ( a simple $20 OR instead of a UTF8 conversion )

For ascii 91 thru 95 I get something that looks wrong using $20 OR.
What am I missing?

Btw, it made no difference in the unique words count in the
kjvbible.txt file. Maybe that's why you used it.

Thanks.
-Doug

: go ( c --) dup emit $20 or emit ;
91 go [{
92 go \|
93 go ]}
94 go ^~
95 go _

Re: Counting frequencies of unique words

<db4a810d-57a8-4b78-8a2c-f03a44e970a4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14135&group=comp.lang.forth#14135
 by: NN - Sat, 7 Aug 2021 12:58 UTC

On Saturday, 7 August 2021 at 13:28:09 UTC+1, Doug Hoffman wrote:
> On 3/27/21 7:41 AM, P Falth wrote:
> > When a char
> > is read from the input buffer it is immediately converted to lower case
> > ( a simple $20 OR instead of a UTF8 conversion )
>
> For ascii 91 thru 95 I get something that looks wrong using $20 OR.
> What am I missing?
>
> Btw, it made no difference in the unique words count in the
> kjvbible.txt file. Maybe that's why you used it.
>
> Thanks.
> -Doug
>
> : go ( c --) dup emit $20 or emit ;
> 91 go [{
> 92 go \|
> 93 go ]}
> 94 go ^~
> 95 go _

'$20 OR' always sets bit 5 (value 32), so it can only convert toward lower case, which is why he chose lower case.

I use :-

: ?upper ( c -- f ) 'A' 'Z' 1+ within ;
: ?lower ( c -- f ) 'a' 'z' 1+ within ;

: toupper ( c -- c ) dup ?lower if 32 xor then ;
: tolower ( c -- c ) dup ?upper if 32 xor then ;

: test 127 32 do cr i . space i emit space i toupper emit loop ; test
: test 127 32 do cr i . space i emit space i tolower emit loop ; test
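
To see exactly where the unconditional `$20 OR` and a range-checked conversion part ways, here is a minimal sketch. The names `or-lower`, `safe-lower` and `.diffs` are made up for this illustration and appear in neither post; it assumes a Forth 2012 system that accepts `'A'` and `$20` literals.

```forth
\ Sketch only: OR-LOWER, SAFE-LOWER and .DIFFS are hypothetical names.
: or-lower   ( c -- c' ) $20 or ;                                \ unconditional
: safe-lower ( c -- c' ) dup 'A' 'Z' 1+ within if $20 or then ;  \ letters only

\ Print every 7-bit code on which the two conversions disagree.
: .diffs ( -- )
  128 0 do
    i or-lower i safe-lower <> if i . then
  loop ;
.diffs
```

The two disagree only on codes whose bit 5 is clear and which are not upper-case letters: the control codes 0 to 31, `@` (64), and 91 to 95. On input where those codes never appear inside a word, the shortcut should give the same word counts.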

Re: Counting frequencies of unique words

<f926f776-32e6-4fb7-9a0a-d9c3754d5ab3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14136&group=comp.lang.forth#14136
 by: P Falth - Sat, 7 Aug 2021 13:04 UTC

On Saturday, 7 August 2021 at 14:28:09 UTC+2, Doug Hoffman wrote:
> On 3/27/21 7:41 AM, P Falth wrote:
> > When a char
> > is read from the input buffer it is immediately converted to lower case
> > ( a simple $20 OR instead of a UTF8 conversion )
>
> For ascii 91 thru 95 I get something that looks wrong using $20 OR.
> What am I missing?
>
> Btw, it made no difference in the unique words count in the
> kjvbible.txt file. Maybe that's why you used it.

Yes, I checked that and found that it worked.

Peter

>
> Thanks.
> -Doug
>
> : go ( c --) dup emit $20 or emit ;
> 91 go [{
> 92 go \|
> 93 go ]}
> 94 go ^~
> 95 go _

Re: Counting frequencies of unique words

<sep2c9$iqt$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=14146&group=comp.lang.forth#14146
 by: Gerry Jackson - Sun, 8 Aug 2021 16:56 UTC

On 07/08/2021 13:58, NN wrote:
> On Saturday, 7 August 2021 at 13:28:09 UTC+1, Doug Hoffman wrote:
>> On 3/27/21 7:41 AM, P Falth wrote:
>>> When a char
>>> is read from the input buffer it is immediately converted to lower case
>>> ( a simple $20 OR instead of a UTF8 conversion )
>>
>> For ascii 91 thru 95 I get something that looks wrong using $20 OR.
>> What am I missing?
>>
>> Btw, it made no difference in the unique words count in the
>> kjvbible.txt file. Maybe that's why you used it.
>>
>> Thanks.
>> -Doug
>>
>> : go ( c --) dup emit $20 or emit ;
>> 91 go [{
>> 92 go \|
>> 93 go ]}
>> 94 go ^~
>> 95 go _
>
> '$20 OR' always sets bit 5 (value 32), so it can only convert toward lower case, which is why he chose lower case.
>
> I use :-
>
> : ?upper ( c -- f ) 'A' 'Z' 1+ within ;
> : ?lower ( c -- f ) 'a' 'z' 1+ within ;
>
> : toupper ( c -- c ) dup ?lower if 32 xor then ;
> : tolower ( c -- c ) dup ?upper if 32 xor then ;
>
> : test 127 32 do cr i . space i emit space i toupper emit loop ; test
> : test 127 32 do cr i . space i emit space i tolower emit loop ; test
>

Similar, but a bit more efficient, is:

: up/down ( ch1 n -- ch1|ch2 ) over + #26 u< bl and xor ; \ n is the negated first letter of the range to convert
: ch>lower ( ch1 -- ch1|ch2 ) [ 'A' negate ] literal up/down ;
: ch>upper ( ch1 -- ch1|ch2 ) [ 'a' negate ] literal up/down ;

: bounds ( ca u -- ca+u ca ) over + swap ;
: set-case ( ca u xt -- ) \ xt for ch>upper or ch>lower
-rot bounds ?do i c@ over execute i c! loop drop
;

--
Gerry
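
A short usage sketch for the words above, repeated here so it loads on its own. `demo-buf` and the test string are hypothetical, `rot rot` stands in for `-rot`, and `bounds` is inlined as `over + swap` in case the host system already defines it. The idea behind `up/down` is that adding the negated first letter and doing one unsigned compare against 26 covers both ends of the range check; the resulting flag ANDed with `bl` gives 32 or 0, which is then XORed into the character.

```forth
\ Definitions repeated from the post so this sketch is self-contained.
: up/down  ( ch1 n -- ch1|ch2 ) over + #26 u< bl and xor ;
: ch>lower ( ch1 -- ch1|ch2 ) [ 'A' negate ] literal up/down ;
: ch>upper ( ch1 -- ch1|ch2 ) [ 'a' negate ] literal up/down ;
: set-case ( ca u xt -- )  \ apply xt to each char of the string in place
  rot rot over + swap ?do i c@ over execute i c! loop drop ;

\ DEMO-BUF is a hypothetical scratch buffer for this example.
create demo-buf 16 allot
s" Forth [2012]" demo-buf swap move    \ copy the 12 chars into the buffer
demo-buf 12 ' ch>lower set-case        \ lower-case the buffer in place
demo-buf 12 type                       \ letters lowered, [ ] and digits untouched
```

Unlike a bare `$20 OR`, this leaves `[` and `]` alone, because 91 - 65 = 26 fails the `#26 u<` test.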

