devel / comp.lang.tcl / untar file by file in a loop

untar file by file in a loop

 by: Alexandru - Tue, 1 Nov 2022 07:39 UTC

I have a procedure that unpacks files given by a list of file paths from an archive like this:

proc ::meshparts::AssemblyArchiveUnpack {zipfile {paths {}} {targetpaths {}}} {
    set f [open $zipfile rb]
    fconfigure $f -encoding binary -translation lf -eofchar {}
    zlib push gunzip $f
    if {[llength $paths]==0} {
        set result [tar::untar $f -chan]
    } else {
        foreach path $paths targetpath $targetpaths {
            set dir [file dirname $targetpath]
            set code [catch {file mkdir $dir} err]
            if {$code} {
                ::meshparts::message "*** [mc {%1$s} $err]" -errorlog 0
                continue
            }
            set result [tar::untar $f -file $path -dir $dir -chan]
            seek $f 0
        }
    }
    close $f
    return 1
}

The main part is the foreach:

foreach path $paths targetpath $targetpaths {
    set dir [file dirname $targetpath]
    set code [catch {file mkdir $dir} err]
    if {$code} {
        ::meshparts::message "*** [mc {%1$s} $err]" -errorlog 0
        continue
    }
    set result [tar::untar $f -file $path -dir $dir -chan]
    seek $f 0
}

It can be further reduced to:

foreach path $paths targetpath $targetpaths {
    set dir [file dirname $targetpath]
    set result [tar::untar $f -file $path -dir $dir -chan]
    seek $f 0
}

The problem is that it only works for the first file in the list.
The second file is not unpacked, and if a third file is given I get the error:

*** START OF ERROR MESSAGE ***
can't read "name": no such variable
can't read "name": no such variable
while executing
"set $x"
(procedure "readHeader" line 5)
invoked from within
"readHeader [read $fh 512]"
(procedure "tar::untar" line 24)
invoked from within
"tar::untar $f -file $path -dir $dir -chan"

To me, it looks like the untar procedure has a bug.
I added the "seek $f 0" command while trying to make it work.
No success so far.
I think that, while the read channel stays open, the untar procedure reads until the end of the file, so the next untar command does not find the needed file.
But then the "seek $f 0" should actually solve the problem.
But it doesn't.

Here is the untar procedure, maybe some trained eyes can see the issue better than me.

proc ::tar::untar {tar args} {
    set nooverwrite 0
    set data 0
    set nomtime 0
    set noperms 0
    set chan 0
    parseOpts {dir 1 file 1 glob 1 nooverwrite 0 nomtime 0 noperms 0 chan 0} $args
    if {![info exists dir]} {set dir [pwd]}
    set pattern *
    if {[info exists file]} {
        set pattern [string map {* \\* ? \\? \\ \\\\ \[ \\\[ \] \\\]} $file]
    } elseif {[info exists glob]} {
        set pattern $glob
    }

    set ret {}
    if {$chan} {
        set fh $tar
    } else {
        set fh [::open $tar]
        fconfigure $fh -encoding binary -translation lf -eofchar {}
    }
    while {![eof $fh]} {
        array set header [readHeader [read $fh 512]]
        HandleLongLink $fh header
        if {$header(name) == ""} break
        if {$header(prefix) != ""} {append header(prefix) /}
        set name [string trimleft $header(prefix)$header(name) /]
        if {![string match $pattern $name] || ($nooverwrite && [file exists $name])} {
            seekorskip $fh [expr {$header(size) + [pad $header(size)]}] current
            continue
        }

        if {$dir!=""} {
            if {[::tar::isabsolute $name]} {
                set name [file join $dir [file tail $name]]
            } else {
                set name [file join $dir $name]
            }
        }
        if {![file isdirectory [file dirname $name]]} {
            file mkdir [file dirname $name]
            lappend ret [file dirname $name] {}
        }
        if {[string match {[0346]} $header(type)]} {
            if {[catch {::open $name w+} new]} {
                # sometimes if we dont have write permission we can still delete
                catch {file delete -force $name}
                set new [::open $name w+]
            }
            fconfigure $new -encoding binary -translation lf -eofchar {}
            fcopy $fh $new -size $header(size)
            close $new
            lappend ret $name $header(size)
        } elseif {$header(type) == 5} {
            file mkdir $name
            lappend ret $name {}
        } elseif {[string match {[12]} $header(type)] && $::tcl_platform(platform) == "unix"} {
            catch {file delete $name}
            if {![catch {file link [string map {1 -hard 2 -symbolic} $header(type)] $name $header(linkname)}]} {
                lappend ret $name {}
            }
        }
        seekorskip $fh [pad $header(size)] current
        if {![file exists $name]} continue

        if {$::tcl_platform(platform) == "unix"} {
            if {!$noperms} {
                catch {file attributes $name -permissions 0[string range $header(mode) 2 end]}
            }
            catch {file attributes $name -owner $header(uid) -group $header(gid)}
            catch {file attributes $name -owner $header(uname) -group $header(gname)}
        }
        if {!$nomtime} {
            file mtime $name $header(mtime)
        }
    }
    if {!$chan} {
        close $fh
    }
    return $ret
}

Re: untar file by file in a loop

 by: Rich - Tue, 1 Nov 2022 14:57 UTC

Alexandru <alexandru.dadalau@meshparts.de> wrote:
> I have a procedure that unpacks files given by a list of file paths from an archive like this:
>
> proc ::meshparts::AssemblyArchiveUnpack {zipfile {paths {}} {targetpaths {}}} {

This naming invites confusion for yourself in the future. A zip file is not a tar
file, and a tar file is not a zip file (zip and tar are two very
different formats). Naming the variable 'zipfile'
implies a "zip", not a "tar", at first glance.

> set f [open $zipfile rb]
> fconfigure $f -encoding binary -translation lf -eofchar {}
> zlib push gunzip $f
> if {[llength $paths]==0} {
> set result [tar::untar $f -chan]
> } else {
> foreach path $paths targetpath $targetpaths {
> set dir [file dirname $targetpath]
> set code [catch {file mkdir $dir} err]
> if {$code} {
> ::meshparts::message "*** [mc {%1$s} $err]" -errorlog 0
> continue
> }
> set result [tar::untar $f -file $path -dir $dir -chan]
> seek $f 0
> }
> }
> close $f
> return 1
> }

If your tar file is indeed gzipped, implied by this:
> zlib push gunzip $f
then simply doing this:
> seek $f 0
will not work, because just seeking to the beginning does not reset the
gunzip state to what it was when the file was first opened, which is
most likely why things are failing for you.

Try closing and reopening the file inside the loop. If that works,
then this was the cause.
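
A minimal sketch of that test, assuming Tcl 8.6. The proc name unpackEach and its optional untarCmd parameter are made up for illustration; the parameter only exists so the extraction command can be swapped out, and the default assumes Tcllib's tar package has already been loaded with "package require tar".

```tcl
# Hypothetical helper: reopen the gzipped tar for every requested member so
# each extraction call starts with a fresh gunzip state at offset 0.
# The default command assumes Tcllib's tar package is already loaded.
proc unpackEach {archive paths targetpaths {untarCmd ::tar::untar}} {
    foreach path $paths targetpath $targetpaths {
        set dir [file dirname $targetpath]
        file mkdir $dir
        set f [open $archive rb]
        zlib push gunzip $f
        try {
            {*}$untarCmd $f -file $path -dir $dir -chan
        } finally {
            close $f   ;# closing the channel also discards the gunzip state
        }
    }
}
```

If every listed file is extracted with this version, the stale gunzip state was the cause. The archive is still decompressed from the start once per file, so this is a diagnostic rather than a fast solution.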

> For me, It looks like the untar procedure has a bug.

Looks to me like you are creating the problem by trying to seek around
inside gzipped data. To make that work, you would also have to reset the
gunzip decompression state to exactly the state it had at that file
offset.

If you can't formulate a glob pattern for the set of files you want to
extract, then you'll have to do one of four things:

1) unpack the entire tar file into a temporary location, then move out
the files of interest and delete the unwanted files

2) close and reopen the file inside the loop around tar::untar. But you
are still left with scanning all of the preceding tar data up to the
file of interest, which means you are quite close to O(N^2)
complexity here

3) Create your own 'untar' by making calls into the tar module
internals to read file headers, decide if the header is for a file of
interest, and extract the file if so. This, however, does mean you are
calling procs that are not documented as part of the visible API of the
tar module, so should the internals change, your code would break until
you adapted it. This method, however, does give you the most efficient
extraction, because only a single pass over the tar file is needed.

4) Extend the tar module's untar proc to take an additional parameter
that is a list of filenames to match tar entries against and extract
each when found, and consider contributing the changes back to Tcllib.
This has the same benefits as #3, with the added benefit that, if
accepted, your change becomes part of the documented API and so is less
likely to change "out from under you" in the future.
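
The single-pass idea behind options 3 and 4 can be sketched without touching the tar module at all. This is not the Tcllib implementation: extractWanted is a hypothetical name, it reads only the classic header fields (member name in bytes 0..99, octal size in bytes 124..135), it ignores the prefix field and long-name entries, and it holds each member in memory. It assumes Tcl 8.6 and a channel that already has the gunzip transformation pushed.

```tcl
# Hypothetical single-pass reader: walk a tar stream front to back and
# return a dict of name -> contents for the requested member names only.
proc extractWanted {chan wanted} {
    set found [dict create]
    while {[dict size $found] < [llength $wanted]} {
        set hdr [read $chan 512]
        if {[string length $hdr] < 512} break
        set name [string trimright [string range $hdr 0 99] \0]
        if {$name eq ""} break                  ;# all-zero end-of-archive block
        set size 0
        scan [string trim [string range $hdr 124 135] " \0"] %lo size
        # Data is padded to a multiple of 512 bytes. On a gunzip stream it
        # cannot be skipped with seek, so it is read even when unwanted.
        set block [read $chan [expr {(($size + 511) / 512) * 512}]]
        if {$name in $wanted} {
            dict set found $name [string range $block 0 [expr {$size - 1}]]
        }
    }
    return $found
}
```

The loop stops as soon as every requested name has been seen, so the cost is one pass over the archive regardless of how many files are requested.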

Re: untar file by file in a loop

 by: Alexandru - Tue, 1 Nov 2022 15:35 UTC

Thanks Rich,

I must admit I still don't understand how "read" can work on the channel but "seek" cannot.
I'll just follow your advice and see if I can add a -files option to the "untar" procedure and propose a change on GitHub (your option 4).

Option 2 is of course a "no go". I can already see that the time needed to open the archive and find one file is huge. Doing this for multiple files would be a deal breaker.

BTW: I know tar and zip are different formats. I have this habit of calling all types of archives a zip file.

Regards
Alexandru

Re: untar file by file in a loop

 by: Schelte - Tue, 1 Nov 2022 15:59 UTC

On 01/11/2022 16:35, Alexandru wrote:
> Option 2 is of course a "no go".
Instead of closing/reopening, you can also pop the gunzip channel
transformation, seek to the beginning, and then push the transformation
again. But I doubt that will make a big difference in performance.
Parsing the file multiple times is what makes it slow. Closing/opening
the file is probably negligible in comparison.
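
A minimal sketch of the pop/seek/push sequence, assuming Tcl 8.6 (where "chan pop" removes the topmost channel transformation). The helper name rewindGunzip is made up for illustration.

```tcl
# Hypothetical helper: restart a gzip-compressed read channel from the top.
proc rewindGunzip {chan} {
    chan pop $chan          ;# remove the gunzip transformation and its state
    seek $chan 0            ;# rewind the underlying compressed file
    zlib push gunzip $chan  ;# push a fresh decompressor
}
```

In the loop from the original post this would take the place of the bare "seek $f 0". Each iteration still decompresses the archive from the beginning.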

Schelte.

Re: untar file by file in a loop

 by: Robert Heller - Tue, 1 Nov 2022 16:44 UTC

At Tue, 1 Nov 2022 08:35:56 -0700 (PDT) Alexandru <alexandru.dadalau@meshparts.de> wrote:

> Thanks Rich,
>
> I must admit I still don't understand how "read" can work on the channel but "seek" cannot.
> I'll just follow your advice and see if I can add a -files option to the "untar" procedure and propose a change on GitHub (your option 4).
>

When you "read" a compressed tar file, you are not actually reading the tar
file itself, but the output of a pipeline from gunzip (or something like
gunzip). You can't seek on a pipeline. I don't know if this is an actual pipe
device or a 'faked' pipe using VFS hackery, but it does not matter which: the
effect is the same.

> Option 2 is of course a "no go". I can already see that the time needed to open the archive and find one file is huge. Doing this for multiple files would be a deal breaker.
>
> BTW: I know tar and zip are different formats. I have this habit of calling all types of archives a zip file.

Confusing tar and zip is probably what is getting you into a lot of
trouble, especially if you are treating a gzipped tar file as a zip file.

Some important things to understand about tar and zip files:

Tar was originally designed for *tapes* (yes, those reels of plastic film
coated with iron oxide). Hardly anybody uses tapes anymore. Tar files don't have
compressed elements; the whole tar file gets compressed as a single blob. Tar
files are meant to be read and written sequentially and not randomly accessed.

*Zip* files contain an *uncompressed* table of contents, and each member element
is separately compressed (or not). Zip files were specifically designed to be
randomly accessed -- one can seek to the end and read the TOC and then seek to
specific files in the Zip archive and extract (and uncompress) them, in any
order you like.

>
> Regards
> Alexandru
>
>

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
heller@deepsoft.com -- Webhosting Services

Re: untar file by file in a loop

 by: Rich - Tue, 1 Nov 2022 16:47 UTC

Alexandru <alexandru.dadalau@meshparts.de> wrote:
> Thanks Rich,
>
> I must admit I still don't understand how "read" can work on the
> channel but "seek" cannot.

The seek works. You move the file pointer back and start reading from
a different offset.

But, your file is a gzip file. The gzip compressed format needs to be
read from the front, because to unpack byte X, you need the gzip
compression state that was created by unpacking bytes 0 through X-1.

If you are at offset Y, you have the gzip compression state created
from 0 through Y-1. If you now seek to X, you'll get the wrong result
from trying to decompress X using the gzip state of 0 through Y-1.
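
A small illustration of why the decoder has to start at the front, assuming Tcl 8.6. It uses the one-shot "zlib gzip" and "zlib gunzip" commands on an in-memory string rather than a channel transformation, and the payload and the cut-off of 12 bytes are arbitrary choices for the demo.

```tcl
# Compress a payload, then try to decompress it from the start and from
# a position inside the compressed data.
set payload [string repeat "tar block data\n" 200]
set gz [zlib gzip $payload]

# From byte 0 the decoder sees the gzip header and builds its state as it goes.
set whole [zlib gunzip $gz]

# Starting 12 bytes in, the decoder has neither the header nor the state
# that the skipped bytes would have produced, so decoding fails.
set failed [catch {zlib gunzip [string range $gz 12 end]} err]

puts "from start ok: [expr {$whole eq $payload}]"
puts "from middle failed: $failed"
```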

> I'll just follow your advice and see if I can add a -files option to
> the "untar" procedure and propose a change on GitHub (your option 4).
>
> Option 2 is of course a "no go". I can already see that the time
> needed to open the archive and find one file is huge. Doing this
> for multiple files would be a deal breaker.

Tar is not zip. The expanded acronym gives a clue: (T)ape (Ar)chive.
It was created (originally) to package files onto magnetic tape. As
tape does not have "random seek ability", tar contains no features to
allow random access within the tar file. You have to either read it
from the start in a linear manner, or pre-index it once up front (by
reading it from front to back in a linear manner) and then use your
index to randomly grab files out.
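
A rough sketch of such a pre-index, assuming Tcl 8.6 and a plain, uncompressed tar on a seekable channel (it does not apply to a gunzip stream). The proc names tarIndex and tarFetch are made up for illustration. Only the classic header fields are read (name in bytes 0..99, octal size in bytes 124..135); the prefix field and long-name entries are ignored.

```tcl
# Hypothetical indexer: one linear pass that records where each member's
# data starts. Returns a dict of name -> {dataOffset size}.
proc tarIndex {chan} {
    set index [dict create]
    while {1} {
        set hdr [read $chan 512]
        if {[string length $hdr] < 512} break
        set name [string trimright [string range $hdr 0 99] \0]
        if {$name eq ""} break                  ;# all-zero end-of-archive block
        set size 0
        scan [string trim [string range $hdr 124 135] " \0"] %lo size
        dict set index $name [list [tell $chan] $size]
        # Member data is padded to a multiple of 512 bytes; skip over it.
        seek $chan [expr {(($size + 511) / 512) * 512}] current
    }
    return $index
}

# Fetch one member through the index without rescanning the archive.
proc tarFetch {chan index name} {
    lassign [dict get $index $name] offset size
    seek $chan $offset start
    return [read $chan $size]
}
```

After the single indexing pass, members can be fetched in any order, which is roughly what the table of contents in a zip file provides out of the box.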

Zip files include index data as part of the format, so one can directly
access a single file in a zip without having to read the whole file
from the front in order to do so.

> BTW: I know tar and zip are different formats. I have this habit of
> calling all types of archives a zip file.

Which is fine, but it confuses others who call a tar file a tar file
and a zip file a zip file because they are two very different file
formats.

Re: untar file by file in a loop

 by: Rich - Tue, 1 Nov 2022 16:48 UTC

Schelte <nospam@wanadoo.nl> wrote:
> On 01/11/2022 16:35, Alexandru wrote:
>> Option 2 is of course a "no go".
> Instead of closing/reopening, you can also pop the gunzip channel
> transformation, seek to the beginning, and then push the transformation
> again.

Ah, that would reset the gzip state as well. I forgot about that
option.

> But I doubt that will make a big difference in performance. Parsing
> the file multiple times is what makes it slow. Closing/opening the
> file is probably negligible in comparison.

Agreed.

Re: untar file by file in a loop

 by: Alexandru - Tue, 1 Nov 2022 18:44 UTC

Rich wrote on Tuesday, 1 November 2022 at 17:48:36 UTC+1:
> Schelte <nos...@wanadoo.nl> wrote:
> > On 01/11/2022 16:35, Alexandru wrote:
> >> Option 2 is of course a "no go".
> > Instead of closing/reopening, you can also pop the gunzip channel
> > transformation, seek to the beginning, and then push the transformation
> > again.
> Ah, that would reset the gzip state as well. I forgot about that
> option.
> > But I doubt that will make a big difference in performance. Parsing
> > the file multiple times is what makes it slow. Closing/opening the
> > file is probably negligible in comparison.
> Agreed.

Thanks all for the help.
I added the -files and -dirs options to the untar procedure and committed the changes:
https://github.com/Meshparts/tcllib/blob/master/modules/tar/tar.tcl
