computers / comp.compression / One Way To Output LZ77 codes

Subject  Author
* One Way To Output LZ77 codes  Gerald Tamayo
+* Re: One Way To Output LZ77 codes  Harry Potter
|`* Re: One Way To Output LZ77 codes  Harry Potter
| `* Re: One Way To Output LZ77 codes  Harry Potter
|  +- Re: One Way To Output LZ77 codes  Gerald R. Tamayo
|  `* Re: One Way To Output LZ77 codes  Gerald R. Tamayo
|   `* Re: One Way To Output LZ77 codes  Harry Potter
|    `* Re: One Way To Output LZ77 codes  Gerald R. Tamayo
|     `* Re: One Way To Output LZ77 codes  Harry Potter
|      `- Re: One Way To Output LZ77 codes  Gerald R. Tamayo
+* One Way To Output LZ77 codes  Gerald Tamayo
|`* One Way To Output LZ77 codes  Gerald Tamayo
| `- One Way To Output LZ77 codes  Gerald Tamayo
`* Re: One Way To Output LZ77 codes  Harry Potter
 `* Re: One Way To Output LZ77 codes  Harry Potter
  +- Re: One Way To Output LZ77 codes  Matthias Waldhauer
  +- Re: One Way To Output LZ77 codes  Gerald Tamayo
  +- Re: One Way To Output LZ77 codes  Gerald Tamayo
  `- Re: One Way To Output LZ77 codes  Gerald Tamayo

Subject: Re: One Way To Output LZ77 codes
From: Matthias Waldhauer
Newsgroups: comp.compression
Date: Wed, 26 Aug 2020 07:54 UTC
References: 1 2 3
Harry Potter wrote on Friday, 27 March 2020 at 01:04:32 UTC+1:
On Thursday, March 26, 2020 at 5:35:49 PM UTC-4, Harry Potter wrote:
I like your idea but admit that I don't fully understand it. :( Can you post some P-code to describe the technique? :)
Never mind: I get it. :) I'm applying your ideas now, but I'm running into some problems with your technique. :(

What's your progress so far?

On encode.su I wrote some thoughts about it and about where it might lose efficiency. Some ideas for improvement are there too. In my humble view, the main benefit might come with text compression: text contains a lot of reused strings, and removing the length encoding for all of them could offset the cost of needing pointers into the decompressed block even for the first occurrence of each string.

M.


Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Fri, 11 Sep 2020 20:28 UTC
References: 1
I like the sound of your technique, but I don't fully understand how it works.  How do I specify where to insert an offset block? 



Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Fri, 15 Jan 2021 19:26 UTC
References: 1 2
Hi, again!  I just figured out how to use your technique: I missed the part where you stated how to write the compressed block pointers.  I get it now, and, well, your technique worked!  :)  I had to kill a few of my techniques, but it was well worth it: I'm doing much better than before.  Thank you!


Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Wed, 17 Feb 2021 18:16 UTC
References: 1 2 3
I was wrong.  I found a bug in my software, and now I'm doing worse than without your technique, and I couldn't get the earlier numbers back.  :(


Subject: Re: One Way To Output LZ77 codes
From: Gerald R. Tamayo
Newsgroups: comp.compression
Date: Wed, 17 Mar 2021 09:17 UTC
References: 1 2 3 4
Perhaps this simple illustration will help:

Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)

aacbbdeaab <- input source
0123456789 <- index

Compress:

Encode all strings first:

(size of string, string, number of occurrences of the string), [positions in file or block]

aa: (2, aa, 2), [0, 7]
bb: (2, bb, 1), [3]

Encode literals last: [cdeb]

Decompress:

Decode all strings first;
Decode literals.
(The strings and literals end up in their correct order.)
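
The two decoding passes in this illustration can be sketched in Python as follows. This is only a minimal sketch, not the author's implementation: the function name `rllz_decode`, the `(string, positions)` record shape, and the assumption that the decoder already knows the block size are all made up for the example.

```python
def rllz_decode(block_size, records, literals):
    """Rebuild a block from string records plus a deferred literal array.

    records:  list of (string, positions) pairs (hypothetical shape); the
              length is implied by the string, so no occurrence carries one.
    literals: the characters that fill whatever positions stay unwritten.
    """
    out = [None] * block_size          # None marks a "hole"

    # Pass 1: write every occurrence of every string at its position.
    for string, positions in records:
        for pos in positions:
            out[pos:pos + len(string)] = string

    # Pass 2: fill the holes left to right with the literals, in order.
    lit = iter(literals)
    for i, ch in enumerate(out):
        if ch is None:
            out[i] = next(lit)

    return "".join(out)


# The illustration above: "aa" at 0 and 7, "bb" at 3, literals "cdeb".
records = [("aa", [0, 7]), ("bb", [3])]
decoded = rllz_decode(10, records, "cdeb")
print(decoded)  # aacbbdeaab
```

No flag bit and no hole size is read anywhere: the holes are simply whatever pass 1 left untouched.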


Subject: Re: One Way To Output LZ77 codes
From: Gerald R. Tamayo
Newsgroups: comp.compression
Date: Wed, 17 Mar 2021 09:45 UTC
References: 1 2 3 4
Perhaps this simple illustration will help:


Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)

aacbbdeaabb   <- input source
012345678910 <- index (positions 0 to 10)

encode all strings first:

(size of string, string, number of occurrences of the string), [positions in file or block]

aa: (2, aa, 2), [0, 7]
bb: (2, bb, 2), [3, 9]

encode literals last: [cde]

Decompress:

Decode all strings first;
Decode literals.
(The strings and literals end up in their correct order.)
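
For the encoding side of this illustration, a rough Python sketch might look like the one below. It assumes the repeated strings have already been chosen (how to choose them is not covered), and the name `rllz_encode` and the tuple layout are hypothetical, picked only to mirror the "(size, string, count), [positions]" notation used above.

```python
def rllz_encode(data, strings):
    """Split a block into string records and a deferred literal array.

    strings: the repeated strings already chosen for this block (an
             assumption of this sketch; selection is not shown here).
    Returns (records, literals), where each record is a
    (length, string, count, positions) tuple.
    """
    covered = [False] * len(data)
    records = []

    for s in strings:
        positions = []
        start = data.find(s)
        while start != -1:
            span = range(start, start + len(s))
            if not any(covered[i] for i in span):
                # Claim this occurrence and continue after it.
                for i in span:
                    covered[i] = True
                positions.append(start)
                start = data.find(s, start + len(s))
            else:
                # Overlaps something already encoded; try one step further.
                start = data.find(s, start + 1)
        records.append((len(s), s, len(positions), positions))

    # Literals are whatever no string covers, kept in their original order.
    literals = "".join(ch for ch, used in zip(data, covered) if not used)
    return records, literals


records, literals = rllz_encode("aacbbdeaabb", ["aa", "bb"])
print(records)   # [(2, 'aa', 2, [0, 7]), (2, 'bb', 2, [3, 9])]
print(literals)  # cde
```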


Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Thu, 2 Sep 2021 18:17 UTC
References: 1 2 3 4 5
I did that, then made some performance optimizations, and am doing *worse* than without your technique.  Maybe I'm doing something wrong.  I need to try it again.  :)


Subject: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Wed, 12 Feb 2020 16:35 UTC
Reduced Length LZ (RLLZ)

Let me share one way of outputting LZ77 codes, which may improve LZ77/LZSS or LZW.

LZ77:

LZ77 coding transmits <offset, length> codes. Often during compression, the same matching string is encoded again and again, each time with an <offset> and the same <length> code. We can avoid transmitting the <length> code for many strings, and also avoid outputting a bit to identify a literal (as in LZSS), as follows:

Read the whole input or block and gather the duplicated or similar strings. For each one, encode <the whole string> (e.g., <length of the string (n>=2)> plus the actual string of characters) only once, then <the number of occurrences of that string (i.e., the number of offset codes that follow)>, and then encode its occurrences in the input block by transmitting only their <offset> codes. (Or you can use an escape code to end the offset codes, perhaps BLOCK_SIZE.) Do this for all duplicated strings. This is, in effect, a "Reduced Length LZ (RLLZ)".
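
The post does not say how the duplicated strings are gathered, so the following Python sketch is only one naive possibility, stated under assumptions: it looks at strings of a single fixed length `n`, and the name `gather_repeats` is invented for the example. A practical encoder would need longer and variable-length matches, and would have to resolve overlaps between different strings.

```python
from collections import defaultdict


def gather_repeats(block, n=2):
    """Return {string: [positions]} for the length-n strings that occur
    at least twice in the block without overlapping themselves."""
    seen = defaultdict(list)
    for i in range(len(block) - n + 1):
        s = block[i:i + n]
        # Skip an occurrence that overlaps the previous one of the same string.
        if seen[s] and i < seen[s][-1] + n:
            continue
        seen[s].append(i)
    # Strings seen only once are not worth a record; they stay literals.
    return {s: pos for s, pos in seen.items() if len(pos) >= 2}


print(gather_repeats("aacbbdeaabb"))  # {'aa': [0, 7], 'bb': [3, 9]}
```

Each entry then maps directly onto one record: the string once, the number of positions, and the positions themselves as the <offset> codes.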

The literals are output last, *without bit flags*, since they fall into the positions of the block buffer that are not covered (or "not activated") by encoded strings. So a single array of literals can be output, either at the end of the file or, since block-based coding is more practical for shorter offset codes, at the end of each appropriately sized block of <offsets>.

During decoding, the strings are written first into the output or write buffer using the offsets, and the literals "fill in" the unwritten positions or "holes" in the write buffer. This makes for a very compact LZ.


                ***

There is no need to output bit flags for literals, or the number of following consecutive literals that some algorithms require.

Then the completely filled write buffer is written to file.

That is, after gathering all the distinct repeated strings in a block:

1) no <length> code is transmitted for the later occurrences of the same string (but, in the simplest form, you have to output the number of succeeding occurrences);

2) transmitting the literals last (in the output buffer) means there is no need to transmit *bit flags* for literals (or matches). This is the novel or surprising idea here: deferred literal output;

3) if you know the LZ-Tamayo (LZT, 2008) algorithm, which demonstrated complete exclusion of the <length> code, this might as well be "LZ-Tamayo2". It should improve LZ77/LZSS/LZW-based compressors. Decoding is also straightforward.

Sorry that I am releasing this only now, a decade after LZT (2008). I stopped coding compression in 2010 or 2011. (Note: it seems there was already an "LZT" before I named my algorithm: LZ-Tischer.)
***

I had ideas similar to "rep-codes" in the 2000s. My compression ideas date from the late 90s, when I became interested in data compression again.

What I call "holes" here others call "gaps", even in LZW. But deferring the transmission of literals avoids the bit flags for both literals and matches, which are still used in some explanations of ROLZ. Other algorithms still need to output the number of following literals, or literal_length. "Deferred" means that this is not an "online" algorithm.

-- Gerald R. Tamayo

(reposted, edited from Sept. 2018)



Subject: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Wed, 12 Feb 2020 16:42 UTC
References: 1
There is no hole "size" to transmit; there are just characters written into holes. Whatever the sizes of those holes (1 char, 2 chars, or n chars), the literals are simply written one by one into them.

Decoder output buffer (one appropriately sized block):

[STRING..STRING....STRING.STRING.....STRING.STRING]
(2 holes, 4 holes, 1 hole, 5 holes, 1 hole)
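
To make the point concrete, here is a small Python sketch that treats the diagram above as data. It is illustrative only: `.` stands for a hole purely because the diagram draws it that way (a real decoder would track unwritten positions separately, since `.` could be data), and `fill_holes` is a made-up helper name.

```python
from itertools import groupby

layout = "STRING..STRING....STRING.STRING.....STRING.STRING"

# The run lengths of the holes can be derived from the buffer itself,
# so they never need to be transmitted.
hole_runs = [
    len(list(run))
    for is_hole, run in groupby(layout, key=lambda c: c == ".")
    if is_hole
]
print(hole_runs)  # [2, 4, 1, 5, 1]


def fill_holes(layout, literals):
    """Drop the literals into the holes one by one, left to right."""
    lit = iter(literals)
    return "".join(next(lit) if c == "." else c for c in layout)


# Thirteen holes in total, so thirteen literals are consumed in order.
print(fill_holes(layout, "abcdefghijklm"))
```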



Subject: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Wed, 12 Feb 2020 16:51 UTC
References: 1 2
The literals "nicely fit" into the holes in the write buffer, and, like the strings, they land in their *correct positions* in the file. Simply put, we still preserve the order or sequence of literals and strings.



Subject: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Thu, 13 Feb 2020 05:29 UTC
References: 1 2 3
Edit: the escape code should be <BLOCK_SIZE-1>, not BLOCK_SIZE:

<the number of the same matching strings (i.e., number of following offset codes)>, and its succeeding occurrences in the input block by transmitting only the said <offset> codes. (Or you can use an escape code to end the offset codes, perhaps <BLOCK_SIZE-1>.)
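
The escape-code variant could be sketched in Python as below. The post does not explain why BLOCK_SIZE-1 is suitable; one plausible reading, which is an assumption here, is that a string of at least two characters can never start at the last position of a block, so that value is free to act as a terminator. `BLOCK_SIZE = 16`, `write_offsets` and `read_offsets` are illustrative names and values only.

```python
BLOCK_SIZE = 16            # illustrative only; real blocks would be far larger
ESCAPE = BLOCK_SIZE - 1    # assumed never to be a valid start offset (length >= 2)


def write_offsets(offsets):
    """Emit one string's offsets followed by the escape code (no count)."""
    assert all(0 <= o < ESCAPE for o in offsets)
    return list(offsets) + [ESCAPE]


def read_offsets(codes, start=0):
    """Read offsets from codes[start:] up to the escape code.

    Returns (offsets, index of the first code after the escape)."""
    end = codes.index(ESCAPE, start)
    return codes[start:end], end + 1


# Two strings' offset lists back to back, each closed by the escape code.
stream = write_offsets([0, 7]) + write_offsets([3, 9])
first, nxt = read_offsets(stream)
second, nxt = read_offsets(stream, nxt)
print(first, second)  # [0, 7] [3, 9]
```

Compared with sending a count up front, this costs one extra offset-sized code per string but needs no separate count field.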

-- Gerald


Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Thu, 26 Mar 2020 21:35 UTC
References: 1
I like your idea but admit that I don't fully understand it.  :(  Can you post some P-code to describe the technique?  :)


Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Fri, 27 Mar 2020 00:04 UTC
References: 1 2
On Thursday, March 26, 2020 at 5:35:49 PM UTC-4, Harry Potter wrote:
I like your idea but admit that I don't fully understand it.  :(  Can you post some P-code to describe the technique?  :)

Never mind: I get it.  :)   I'm applying your ideas now, but I'm running into some problems with your technique.  :(


Subject: Re: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Mon, 29 Jun 2020 11:45 UTC
References: 1 2 3
Do not write the codes to the output file immediately, as most LZ compressors do. Buffer them, so that you can output the strings to the output buffer first, then the literals. Then write the whole buffer to the output file. The original sequence or order of the literals is still preserved.


Subject: Re: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Mon, 29 Jun 2020 11:48 UTC
Do not write the codes to the output file immediately, as most LZ compressors do. Buffer them instead: write the strings to the output buffer first, then the literals, and only then write the whole buffer to the output file. The original order of the literals and strings is still preserved.


Subject: Re: One Way To Output LZ77 codes
From: Gerald Tamayo
Newsgroups: comp.compression
Date: Mon, 29 Jun 2020 11:50 UTC
Do not write the codes to the output file immediately, as most LZ compressors do. Buffer them instead: write the strings to the output buffer first, then the literals, and only then write the whole buffer to the output file. The original order of the literals and strings is still preserved.


Subject: Re: One Way To Output LZ77 codes
From: Gerald R. Tamayo
Newsgroups: comp.compression
Date: Sat, 2 Oct 2021 13:19 UTC

I did that, then some performance optimizations, and am doing *worse* than without your technique. Maybe I'm doing something wrong. I need to try it again. :)

Maybe the following improved example will help.

Here is a simple illustration of RLLZ, to be complete:

Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)

a a c b c d e a a b c   <- input source
0 1 2 3 4 5 6 7 8 9 10  <- index

Compress:

Encode all duplicated strings first. The format is very simple:
(<size of string>, <string>, <number of "next" occurrences of string>), [positions in file or block (no match lengths needed)]

aa: (2, aa, 1), [0, 7]
bc: (2, bc, 1), [3, 9]

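Better (store the number of "next" occurrences minus one, since a duplicated string always has at least one):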
aa: (2, aa, 1-1=0), [0, 7]
bc: (2, bc, 1-1=0), [3, 9]

Encode literals last: [cde]


Decompress:

Decode all strings first, copying each one to its positions in the write buffer.
Then read the literals and insert them, in order, into the holes left between the strings. Whatever the sizes of those holes and wherever they fall in the write buffer, the literals fill them.

The strings and literals are now in their correct original order. Write the buffer to the actual file.
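
The two decompression passes above can be sketched in C as follows. This is only an illustration under stated assumptions: the `RllzString` record, the `rllz_decode` function, the separate `filled` map, and the requirement that the decoder already knows the output length are all hypothetical and not part of the posted description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one decoded string record. */
typedef struct {
    const char *text;     /* the duplicated string */
    size_t len;           /* its length */
    const size_t *pos;    /* every position where it occurs */
    size_t npos;          /* number of positions */
} RllzString;

/* Rebuild out[0..out_len). filled must have room for out_len bytes.
   Returns 0 on success, -1 if the records and literals do not fit. */
static int rllz_decode(const RllzString *strs, size_t nstrs,
                       const char *lits, size_t nlits,
                       char *out, unsigned char *filled, size_t out_len)
{
    size_t i, j, k, next = 0;

    assert(out != NULL && filled != NULL);
    memset(filled, 0, out_len);

    /* Pass 1: copy every string to each of its positions. */
    for (i = 0; i < nstrs; ++i) {
        for (j = 0; j < strs[i].npos; ++j) {
            size_t p = strs[i].pos[j];
            if (p > out_len || strs[i].len > out_len - p) return -1;
            memcpy(out + p, strs[i].text, strs[i].len);
            memset(filled + p, 1, strs[i].len);
        }
    }

    /* Pass 2: the remaining holes take the literals, in order. */
    for (k = 0; k < out_len; ++k) {
        if (!filled[k]) {
            if (next >= nlits) return -1;
            out[k] = lits[next++];
        }
    }
    return next == nlits ? 0 : -1;
}
```

With the records `aa` at [0, 7] and `bc` at [3, 9], the literals `cde`, and an output length of 11, the holes are at indexes 2, 5 and 6, so the call reconstructs `aacbcdeaabc`.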



Subject: Re: One Way To Output LZ77 codes
From: Harry Potter
Newsgroups: comp.compression
Date: Sat, 2 Oct 2021 14:10 UTC
Maybe there's a problem with my code.  (?)  I pasted the relevant code here.  It uses Assembler, but I believe the original C code is still in the file.
--------------------------------
void CompBin01 (void)
{
//signed char c2; //Holds value for len. current
// int c2;
// unsigned /*j1,*/ j3, j4, j5;
// unsigned i, j, j2, k, m, n, o=0;
//vz.InPos=0;
k=0; vz.InPos=0;
for (; vz.InPos<vz.InEnd; )
{
//printc ('.');
printu (vz.InPos); printcr ();
/* if (Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7))) {
//getkey ();
for (i=j=m=0; i<NumRLLZEnt; ++i) {
if (RLLZBuf[i].len>j && !memcmp(&InBuffer[vz.InPos], &InBuffer[RLLZBuf[i].loc], RLLZBuf[i].len)) {
j=RLLZBuf[i].len; m=i;
}
}
//if (j==0) {CompBin01U (); ++vz.InPos; continue;}
if (j==0) {WriteB (InBuffer[vz.InPos]); ++vz.InPos; continue;}
//CompBin01Ue ();
++backcol;
vz.InPos+=j;
} else {
//CompBin01U (); ++vz.InPos;
WriteB (InBuffer[vz.InPos]); ++vz.InPos;
}*/
//if (!(Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)))) WriteB (InBuffer[vz.InPos]);
//else ++backcol;
// if (!(Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)))) {
// WriteB (InBuffer[vz.InPos]);
// //++backcol;
// }
// ++vz.InPos;
if ((Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)))) {
//CompBin01Ue ();
writebuffer ();
for (; Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)); ++vz.InPos);
//++backcol;
} else {
CompBin01U ();
++vz.InPos;
}
}
writeoutlast ();
}
static unsigned p;

void GetRLLZBlocks (void)
{
unsigned i, j, k, l, m, n, o;
unsigned q;
struct RLLZBuf* r;
for (i=NumRLLZEnt=0; i<vz.InEnd; ) {
printu (i); printcr ();
if ((lenc=GetLZW (i))>=4) {
o=techLZW.pos-(unsigned)&InBuffer;
//if (GetLZW(i+1)>=lenc+1) {++i; continue;}
for (j=0; j<NumRLLZEnt; ++j) {
if (RLLZBuf[j].len==lenc &&
!memcmp (&InBuffer[o], &InBuffer[RLLZBuf[j].loc], RLLZBuf[j].len))
break;
}
if (j==NumRLLZEnt) {
if (NumRLLZEnt>=1000) {
//++backcol;
++i; continue;
}
RLLZBuf[j].loc=o;
RLLZBuf[j].len=lenc;
for (k=0; k<lenc; k++) {
//Buffer3[(o+k)>>3]|=1<<(o+k&7);
}
RLLZBuf[j].numoccur=1;
++NumRLLZEnt;
}
++RLLZBuf[j].numoccur;
for (k=0; k<lenc; k++) {
//++backcol;
//Buffer3[(i+k)>>3]|=1<<(i+k&7);
}
i+=techLZW.size;
} else {
a: ++i;
}
}
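for (i=0; i<NumRLLZEnt; ) {
//++backcol;
if (RLLZBuf[i].numoccur<5-(RLLZBuf[i].len>=5?2:0)) {
// Too few occurrences: drop this entry by shifting the later entries down.
// Stop one short of the end so RLLZBuf[j+1] never reads past the last entry.
for (j=i; j+1<NumRLLZEnt; ++j) {
memcpy (&RLLZBuf[j], &RLLZBuf[j+1], sizeof (RLLZBuf[0]));
} --NumRLLZEnt;
} else {
++i; // Only advance when the entry at i was kept.
}
}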
prints ("rllz="); printu (NumRLLZEnt); printcr (); getkey ();
WriteDist (NumRLLZEnt, 1024);
for (o=0; o<NumRLLZEnt; ++o) {
i=2;
lenc=RLLZBuf[o].len;
if (lenc<6) {
WriteDist (lenc-2+1-i, 6-i);
} else if (lenc< 11) {
WriteDist (0, 6-i);
//writeoutf_10 ();
WriteDist (lenc-6,  11-6);
} else if (lenc<18) {
WriteDist (5-i, 6-i);
//writeoutf_10 ();
WriteDist ((lenc- 11), (22- 11));
} else {
WriteDist (5-i, 6-i);
WriteDist ((18- 11)+(lenc-18)%4, (22- 11));
//writeoutf_01 ();
//WriteDist (11-7+((lenc-11)&5), 17-7);
WriteDist ((lenc-18)/4, (38-18+3)/4);
//WriteDist ((lenc-18), (57-18));
}
for (j=0; j<lenc; ++j) {
WriteB (InBuffer[RLLZBuf[o].loc+j]);
} k=0; m=-1;

for (l=n=0; l<vz.InEnd; )
{
//++backcol;
lenc=0; p=-1;
//if (Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7))) {
//if (GetLZW(l)>=4) {
//if (!(Buffer3[l]>>(l>>3)&1<<(i&7))) {++l; continue;}
//if (InBuffer[l]!=InBuffer)
//r=&RLLZBuf[0];
cin=&InBuffer[l];
/*for (j=0; j<NumRLLZEnt; ++j) {

if ((q=r->len)>lenc &&
*cin==InBuffer[i=r->loc] &&
//(Buffer3[l>>3]&1<<(l&7)) &&
memcmp2(&InBuffer[i], q))
{lenc=q; p=j;}
++r;
}*/
__asm__ (
"\tlda\t#0\n"
"\tsta\t_i\n"
//"\tlda\tNumRLLZEnt+1\n"
"\tsta\t_i+1\n"
"\tlda\t#<_RLLZBuf\n"
"\tsta\tptr2\n"
"\tlda\t#>_RLLZBuf\n"
"\tsta\tptr2+1\n"

// "\tlda\t_cin\n"
// "\tsta\tptr4\n"
// "\tlda\t_cin+1\n"
// "\tsta\tptr4+1\n"

"@lp1:\n"
"\tldy\t#2\n"
"\tlda\t_lenc\n"
"\tcmp\t(ptr2),y\n"
"\tbeq\t@skip\n"
"\tbcs\t@skip\n"
"\tlda\t#<_InBuffer\n"
"\tldy\t#0\n"
"\tclc\n"
"\tadc\t(ptr2),y\n"
"\tsta\tptr3\n"
"\tiny\n"
"\tlda\t#>_InBuffer\n"
"\tadc\t(ptr2),y\n"
"\tsta\tptr3+1\n"

"\tldy\t#0\n"
"\tlda\t(_cin),y\n"
"\tcmp\t(ptr3),y\n"
"\tbne\t@skip\n"

"\tldy\t#2\n"
"\tlda\t(ptr2),y\n"
"\ttax\n"
"\ttay\n"
"\tdey\n"
"@lp2:\n"
"\tlda\t(_cin),y\n"
"\tcmp\t(ptr3),y\n"
"\tbne\t@skip\n"
"\tdey\n"
"\tbne\t@lp2\n"

"\tlda\t_i\n"
"\tsta\t_p\n"
"\tlda\t_i+1\n"
"\tsta\t_p+1\n"
"\tstx\t_lenc\n"

"@skip:\n"
"\tlda\tptr2\n"
"\tclc\n"
"\tadc\t#8\n"
"\tsta\tptr2\n"
"\tbcc\t@skip2\n"
"\tinc\tptr2+1\n"
"@skip2:\n"
"\tinc\t_i\n"
"\tbne\t@skip3\n"
"\tinc\t_i+1\n"
"@skip3:\n"
"\tlda\t_i+1\n"
"\tcmp\t_NumRLLZEnt+1\n"
"\tbcc\t@lp1\n"
"\tlda\t_i\n"
"\tcmp\t_NumRLLZEnt\n"
"\tbcc\t@lp1\n"
);
if (lenc && p==o) {
//++backcol;
for (i=0; i<lenc; i++) {
Buffer2[(l+i)>>3]|=1<<((l+i)&7);
//++backcol;
}
WriteDist (l-k, vz.InEnd-k+1);
//k=l; l+=lenc;
k=l+lenc; l+=lenc;
} else {
++l;
}
} WriteDist (vz.InEnd-k, vz.InEnd+1-k);
printu (o); printcr ();
}
//printf (">>> %X\n", &Buffer2); getkey ();
}
--------------------------------
WriteDist() writes the information using as few bits as possible.
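
The body of WriteDist() is not shown in the post, so the following is only a guess at one way such a routine could work, assuming the call shape `WriteDist (value, range)` with `value < range` that the calls above suggest. The sketch writes the value in the smallest fixed number of bits that can hold every value in `0..range-1`. The names `BitWriter`, `bw_init`, `bits_for_range` and `write_dist_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical MSB-first bit writer over a caller-supplied buffer. */
typedef struct {
    unsigned char *buf;
    size_t cap;       /* buffer size in bytes */
    size_t bitpos;    /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw, unsigned char *buf, size_t cap)
{
    bw->buf = buf;
    bw->cap = cap;
    bw->bitpos = 0;
    memset(buf, 0, cap);
}

/* Smallest bit count that can represent every value in 0..range-1. */
static unsigned bits_for_range(unsigned long range)
{
    unsigned bits = 0;
    unsigned long v = range - 1;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

/* Write value (0 <= value < range) in bits_for_range(range) bits.
   Returns 0 on success, -1 if the buffer is full. */
static int write_dist_sketch(BitWriter *bw, unsigned long value, unsigned long range)
{
    unsigned bits, i;

    assert(range >= 1 && value < range);
    bits = bits_for_range(range);
    if (bw->bitpos + bits > bw->cap * 8) return -1;
    for (i = bits; i-- > 0; ) {
        if ((value >> i) & 1UL)
            bw->buf[bw->bitpos >> 3] |= (unsigned char)(0x80u >> (bw->bitpos & 7));
        ++bw->bitpos;
    }
    return 0;
}
```

Under this assumption a call such as `WriteDist (NumRLLZEnt, 1024)` would cost 10 bits, and a position written as `WriteDist (l-k, vz.InEnd-k+1)` gets cheaper as `k` approaches the end of the input. A truncated binary code could save a further fraction of a bit when the range is not a power of two.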


Subject: Re: One Way To Output LZ77 codes
From: Gerald R. Tamayo
Newsgroups: comp.compression
Date: Sat, 2 Oct 2021 15:54 UTC
Maybe the following improved example will help.

Here is a simple illustration of RLLZ, to be complete:

Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)

a a c b c d e a a b c   <- input source
0 1 2 3 4 5 6 7 8 9 10  <- index

Compress:

Encode all duplicated strings first. The format is very simple:
(<size of string>, <string>, <number of "next" occurrences of string>), [positions in file or block (no match lengths needed)],
   :

aa: (2, aa, 1), [0, 7]
bc: (2, bc, 1), [3, 9]

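Better (store the number of "next" occurrences minus one, since a duplicated string always has at least one):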
aa: (2, aa, 1-1=0), [0, 7]
bc: (2, bc, 1-1=0), [3, 9]

Then encode an end-of-stream code to close the string records: (0, , );

Encode literals last: [cde].


Decompress:

Decode all strings first, copying each one to its positions in the write buffer.
Then read the literals and insert them, in order, into the holes left between the strings. Whatever the sizes of those holes and wherever they fall in the write buffer, the literals fill them.

The strings and literals are now in their correct original order. Write the buffer to the actual file.
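
As a rough sketch of the stream layout described in the compress steps of this example, the C function below serializes the string records, the end-of-stream code, and the literals into a byte buffer. It is an illustration only: `RllzEntry` and `rllz_serialize` are hypothetical names, finding the duplicated strings is left out, and every field is stored as one whole byte (so lengths, counts and positions must stay below 256), where a real coder would pack the fields into bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one duplicated string. */
typedef struct {
    const unsigned char *text;  /* the string itself */
    size_t len;                 /* 1..255 in this sketch */
    const unsigned char *pos;   /* positions, each below 256 */
    size_t npos;                /* total occurrences, at least 2 */
} RllzEntry;

/* Layout per record: len, text bytes, (number of "next" occurrences - 1),
   positions. Then a single 0 byte as the end-of-stream code, then the
   literals. Returns the number of bytes written, or 0 if out is too small. */
static size_t rllz_serialize(const RllzEntry *e, size_t n,
                             const unsigned char *lits, size_t nlits,
                             unsigned char *out, size_t cap)
{
    size_t i, w = 0;

    for (i = 0; i < n; ++i) {
        assert(e[i].len >= 1 && e[i].len <= 255);
        assert(e[i].npos >= 2 && e[i].npos <= 257);
        if (w + 2 + e[i].len + e[i].npos > cap) return 0;
        out[w++] = (unsigned char)e[i].len;
        memcpy(out + w, e[i].text, e[i].len);
        w += e[i].len;
        /* npos - 1 "next" occurrences, stored minus one. */
        out[w++] = (unsigned char)(e[i].npos - 2);
        memcpy(out + w, e[i].pos, e[i].npos);
        w += e[i].npos;
    }
    if (w + 1 + nlits > cap) return 0;
    out[w++] = 0;                       /* end of string records: (0, , ) */
    memcpy(out + w, lits, nlits);
    w += nlits;
    return w;
}
```

Because a record's size field is never 0, the single 0 byte is enough for a decoder to tell where the string records stop and the literals begin.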

