devel / comp.lang.tcl / Can Tcl scan faster than find?

Subject                               Author
* Can Tcl scan faster than find?      Luc
+* Can Tcl scan faster than find?     Ralf Fassel
|`* Can Tcl scan faster than find?    Rich
| `* Can Tcl scan faster than find?   Luc
|  +- Can Tcl scan faster than find?  Rich
|  `- Can Tcl scan faster than find?  Robert Heller
+* Can Tcl scan faster than find?     briang
|`* Can Tcl scan faster than find?    Luc
| +- Can Tcl scan faster than find?   briang
| `* Can Tcl scan faster than find?   Rich
|  `- Can Tcl scan faster than find?  Robert Heller
`- Can Tcl scan faster than find?     blacksqr

Can Tcl scan faster than find?

<20221208190118.52a4f004@lud1.home>

 by: Luc - Thu, 8 Dec 2022 22:01 UTC

I have this application that is divided between a shell script
and a Tcl script.

The shell script uses `find' to scan the entire hard disk and output
the full path of every single file into a catalog file. It has to be
run from time to time to update the catalog.

The Tcl script has a very quick'n'dirty GUI that accepts a string
as input, finds matches in the catalog and shows all the matches,
with the matched string highlighted.

It's a very old application of mine that I want to improve.

The first version of it did everything in one Tcl script, but
I remember when I replaced the Tcl proc with a shell script to scan
the hard disk because `find' was a lot faster than my Tcl code.

Of course, maybe my code was bad, but it was just a matter of going
into every directory found and globbing it. There wasn't a lot of
opportunity for screwing up.
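
For reference, the "go into every directory and glob it" approach can be sketched roughly as below. This is only a minimal sketch, not the code from the original application: the proc name scanTree is made up for illustration, and it assumes Tcl 8.5 or later. It uses an explicit work list instead of recursion and does not follow symbolic links.

```tcl
# Hypothetical helper: return the path of every non-directory below root.
proc scanTree {root} {
    set result {}
    set todo [list $root]
    while {[llength $todo] > 0} {
        # Take the next directory off the work list.
        set todo [lassign $todo dir]
        # A plain "*" skips dot-files on Unix, so ask for hidden entries too.
        set entries [glob -nocomplain -directory $dir -- *]
        lappend entries {*}[glob -nocomplain -directory $dir -types hidden -- *]
        foreach path $entries {
            set tail [file tail $path]
            if {$tail eq "." || $tail eq ".."} continue
            # file type uses lstat, so a symlink is reported as "link"
            # and is not descended into.
            if {[catch {file type $path} type]} continue
            if {$type eq "directory"} {
                lappend todo $path
            } else {
                lappend result $path
            }
        }
    }
    return $result
}
```

Calling scanTree with a starting directory returns a Tcl list of paths, which could then be written to the catalog file with one path per line.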

Anyway, my question is, do you think it's possible to write Tcl code
that can rival `find' in speed?

--
Luc
>>

Re: Can Tcl scan faster than find?

<yga7cz0q236.fsf@akutech.de>

 by: Ralf Fassel - Fri, 9 Dec 2022 10:54 UTC

* Luc <no@no.no>
| Anyway, my question is, do you think it's possible to write Tcl code
| that can rival `find' in speed?

There is the fileutil package in tcllib:

https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tcllib/files/modules/fileutil/fileutil.md

which contains

::fileutil::find ?basedir ?filtercmd??

An implementation of the unix command find. Adapted from the Tcler's
Wiki. Takes at most two arguments, the path to the directory to start
searching from and a command to use to evaluate interest in each
file. [...]

Maybe give it a try? Note that the command returns only after all files
have been found, so for a 'live' application you would start it in a
separate thread and communicate the files via the filtercmd to the main
thread (or play around with 'update' in the filtercmd).

Somehow I doubt that a script based solution will be faster than one
in C (though the disk IO should be the limiting factor here).

R'

Re: Can Tcl scan faster than find?

<tmvmad$16vil$1@dont-email.me>

 by: Rich - Fri, 9 Dec 2022 16:04 UTC

Ralf Fassel <ralfixx@gmx.de> wrote:
> * Luc <no@no.no>
> | Anyway, my question is, do you think it's possible to write Tcl code
> | that can rival `find' in speed?
>
> Somehow I doubt that a script based solution will be faster than one
> in C (though the disk IO should be the limiting factor here).

Agreed. I also doubt a TCL variant will be faster than the
/usr/bin/find utility for identical scans.

And disk IO, esp. if using mechanical disks where seek times dominate
for "scan a directory hierarchy" runs, is going to be the ultimate
limiting factor. This fact will likely be what would make it appear
that a TCL and a /usr/bin/find scan were close in time -- both spent a
majority (as in 98%+) of their runtime waiting for disk head seeks to
complete.

Running on an SSD would remove the seek time overhead, and likely
result in /usr/bin/find surpassing a TCL solution by a substantial
margin.

Re: Can Tcl scan faster than find?

<20221209173652.1dbb11c5@lud1.home>

 by: Luc - Fri, 9 Dec 2022 20:36 UTC

On Fri, 9 Dec 2022 16:04:29 -0000 (UTC), Rich wrote:

> Ralf Fassel <ralfixx@gmx.de> wrote:

> And disk IO, esp. if using mechanical disks where seek times dominate
> for "scan a directory hierarchy" runs, is going to be the ultimate
> limiting factor. This fact will likely be what would make it appear
> that a TCL and a /usr/bin/find scan were close in time -- both spent a
> majority (as in 98%+) of their runtime waiting for disk head seeks to
> complete.
>
> Running on an SSD would remove the seek time overhead, and likely
> result in /usr/bin/find surpassing a TCL solution by a substantial
> margin.

The disk I/O bottleneck is not very relevant because I am not as concerned
with how long it's going to take as I am with how much LONGER than `find'
it's going to take.

I intend to release the end product as an application so it's not just for
me, and people are expected to understand that scanning the entire HD is
going to take some time. The core of the issue here is whether it's still
worth trying to do everything in Tcl or I should just accept the facts of
life and do some [exec find] thing.

I'm also considering the option of collecting additional data on every
file such as size, date and permissions, up to the user. For that I would
feel a lot more comfortable using pure Tcl. The current code has none of
that but it occurs to me that some people may want it.
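
As an illustration of the pure-Tcl route for size, date and permissions, here is a minimal sketch built on the core [file lstat] command. The proc name fileInfo and the dict keys are assumptions made for this example only; the permission bits are only meaningful on Unix-like systems.

```tcl
# Hypothetical helper: collect basic metadata for one path.
proc fileInfo {path} {
    # lstat describes a symlink itself instead of the file it points to.
    file lstat $path st
    return [dict create \
        path  $path \
        size  $st(size) \
        mtime $st(mtime) \
        type  $st(type) \
        perms [format %o [expr {$st(mode) & 0xFFF}]]]
}
```

The mtime value is seconds since the epoch, so it can be passed to [clock format] for display. Each call costs one extra system call per file, which is part of what a timing test would need to measure.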

So yeah, I guess I have to run some tests on that ::fileutil:: command
and see how well it performs against my Tcl code and `find'.
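
A rough way to run such a test, sketched under some assumptions: a Unix-like system with `find' on the PATH, Tcl 8.5 or later, and the made-up proc names tclScan, findScan and timeScan. Both scans list everything that is not a directory, so the counts should agree on the same tree. Run each scan more than once, because the first pass warms the OS cache and makes the second look faster.

```tcl
# Hypothetical pure-Tcl scanner: every non-directory below root.
proc tclScan {root} {
    set out {}
    set todo [list $root]
    while {[llength $todo] > 0} {
        set todo [lassign $todo dir]
        set entries [glob -nocomplain -directory $dir -- *]
        lappend entries {*}[glob -nocomplain -directory $dir -types hidden -- *]
        foreach p $entries {
            if {[file tail $p] in {. ..}} continue
            if {[catch {file type $p} type]} continue
            if {$type eq "directory"} {lappend todo $p} else {lappend out $p}
        }
    }
    return $out
}

# Same listing through the external find; breaks on names containing newlines.
proc findScan {root} {
    set fp [open |[list find $root ! -type d 2>/dev/null] r]
    set paths [split [string trimright [read $fp] \n] \n]
    # find exits non-zero on unreadable directories; ignore that here.
    catch {close $fp}
    return $paths
}

# Run one scanner and return {number-of-paths elapsed-milliseconds}.
proc timeScan {cmd root} {
    set t0 [clock milliseconds]
    set n [llength [{*}$cmd $root]]
    return [list $n [expr {[clock milliseconds] - $t0}]]
}
```

For example, [timeScan tclScan /usr] followed by [timeScan findScan /usr] gives two count/time pairs that can be compared directly.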

Thank you all.

--
Luc
>>

Re: Can Tcl scan faster than find?

<tn07cp$1am3a$1@dont-email.me>

 by: Rich - Fri, 9 Dec 2022 20:55 UTC

Luc <no@no.no> wrote:
> On Fri, 9 Dec 2022 16:04:29 -0000 (UTC), Rich wrote:
>
>> Ralf Fassel <ralfixx@gmx.de> wrote:
>
>> And disk IO, esp. if using mechanical disks where seek times dominate
>> for "scan a directory hierarchy" runs, is going to be the ultimate
>> limiting factor. This fact will likely be what would make it appear
>> that a TCL and a /usr/bin/find scan were close in time -- both spent a
>> majority (as in 98%+) of their runtime waiting for disk head seeks to
>> complete.
>>
>> Running on an SSD would remove the seek time overhead, and likely
>> result in /usr/bin/find surpassing a TCL solution by a substantial
>> margin.
>
>
> The disk I/O bottleneck is not very relevant because I am not as concerned
> with how long it's going to take as I am with how much LONGER than `find'
> it's going to take.

If you want to quantify "how much longer" then your only option may be
to run tests. About all any of us can say without actually testing is
"TCL is likely to be slower".

> I intend to release the end product as an application so it's not just for
> me, and people are expected to understand that scanning the entire HD is
> going to take some time. The core of the issue here is whether it's still
> worth trying to do everything in Tcl or I should just accept the facts of
> life and do some [exec find] thing.

Do you plan to make the end product cross-platform (i.e., run on
Linux, Windows, and Mac)? If yes, then you'd want to write it all in
Tcl, even if slower, because there is no equivalent to 'find' on Windows
(at least not in the default MS install) and while there is one on Mac,
the BSD vs. GNU differences might require two different
processing loops.

> I'm also considering the option of collecting additional data on every
> file such as size, date and permissions, up to the user. For that I would
> feel a lot more comfortable using pure Tcl. The current code has none of
> that but it occurs to me that some people may want it.

GNU find has the ability to output much of this with its "-printf"
option, which might make find even faster than TCL -- but then you /do/
still have to parse the output in TCL, possibly negating the
difference. But that option to find may not exist on Mac, and there is
no Unix-style 'find' on Windows by default.
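
To make the parsing cost concrete, here is a minimal sketch that assumes GNU find and its -printf directives %s (size in bytes), %T@ (modification time in seconds since the epoch), %m (octal permission bits) and %p (path). The proc names parseFindLine and findWithMetadata are invented for this example, and file names containing newlines are not handled.

```tcl
# Hypothetical parser for one line of "size<TAB>mtime<TAB>mode<TAB>path".
proc parseFindLine {line} {
    set parts [split $line \t]
    lassign $parts size mtime mode
    # Re-join the remainder so a tab inside the path survives.
    set path [join [lrange $parts 3 end] \t]
    return [dict create path $path size $size \
                mtime [expr {int($mtime)}] mode $mode]
}

# Hypothetical driver: run GNU find and return a list of dicts.
proc findWithMetadata {root} {
    # Braces keep the backslashes literal; find itself expands \t and \n.
    set fmt {%s\t%T@\t%m\t%p\n}
    set fp [open |[list find $root ! -type d -printf $fmt 2>/dev/null] r]
    set records {}
    while {[gets $fp line] >= 0} {
        lappend records [parseFindLine $line]
    }
    # find exits non-zero on unreadable directories; ignore that here.
    catch {close $fp}
    return $records
}
```

On BSD or macOS find, which has no -printf, this would fail, so a portable application would still need a pure-Tcl fallback.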

Re: Can Tcl scan faster than find?

<cOWdnW2thflNMA7-nZ2dnZfqnPednZ2d@giganews.com>

 by: Robert Heller - Fri, 9 Dec 2022 21:51 UTC

At Fri, 9 Dec 2022 17:36:52 -0300 Luc <no@no.no> wrote:

>
> On Fri, 9 Dec 2022 16:04:29 -0000 (UTC), Rich wrote:
>
> > Ralf Fassel <ralfixx@gmx.de> wrote:
>
> > And disk IO, esp. if using mechanical disks where seek times dominate
> > for "scan a directory hierarchy" runs, is going to be the ultimate
> > limiting factor. This fact will likely be what would make it appear
> > that a TCL and a /usr/bin/find scan were close in time -- both spent a
> > majority (as in 98%+) of their runtime waiting for disk head seeks to
> > complete.
> >
> > Running on an SSD would remove the seek time overhead, and likely
> > result in /usr/bin/find surpassing a TCL solution by a substantial
> > margin.
>
>
> The disk I/O bottleneck is not very relevant because I am not as concerned
> with how long it's going to take as I am with how much LONGER than `find'
> it's going to take.
>
> I intend to release the end product as an application so it's not just for
> me, and people are expected to understand that scanning the entire HD is
> going to take some time. The core of the issue here is whether it's still
> worth trying to do everything in Tcl or I should just accept the facts of
> life and do some [exec find] thing.

More likely:

set fp [open "|find ..." r];# replace '...' with find's params and opts

fileevent $fp readable [list processfile $fp]

## called by the event loop each time find has a line ready to read
proc processfile {fp} {
    if {[gets $fp pathname] >= 0} {
        # process pathname (eg using "file <command> $pathname ..." as desired)
    } else {
        # end of find's output: close the pipe and finish
        catch {close $fp}
        exit; # or whatever
    }
}

vwait forever;# don't forget this at the end (if Tk is not in play).

>
> I'm also considering the option of collecting additional data on every
> file such as size, date and permissions, up to the user. For that I would
> feel a lot more comfortable using pure Tcl. The current code has none of
> that but it occurs to me that some people may want it.
>
> So yeah, I guess I have to run some tests on that ::fileutil:: command
> and see how well it performs against my Tcl code and `find'.
>
> Thank you all.
>
>

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
heller@deepsoft.com -- Webhosting Services

Re: Can Tcl scan faster than find?

<32f8fb62-2570-4905-a043-8be444e9f90fn@googlegroups.com>

 by: briang - Sat, 10 Dec 2022 23:05 UTC

On Thursday, December 8, 2022 at 2:01:23 PM UTC-8, Luc wrote:
> I have this application that is divided between a shell script
> and a Tcl script.
>
> The shell script uses `find' to scan the entire hard disk and output
> the full path of every single file into a catalog file. It has to be
> run from time to time to update the catalog.
>
> The Tcl script has a very quick'n'dirty GUI that accepts a string
> as input, finds matches in the catalog and shows all the matches,
> with the matched string highlighted.
>
> It's a very old application of mine that I want to improve.
>
> The first version of it did everything in one Tcl script, but
> I remember when I replaced the Tcl proc with a shell script to scan
> the hard disk because `find' was a lot faster than my Tcl code.
>
> Of course, maybe my code was bad, but it was just a matter of going
> into every directory found and globbing it. There wasn't a lot of
> opportunity for screwing up.
>
> Anyway, my question is, do you think it's possible to write Tcl code
> that can rival `find' in speed?
>
> --
> Luc
> >>

I doubt you'll be able to best the speed of find. I have written a utility in Tcl that scans the entire hard drive. I used a threaded model to try and take advantage of I/O latency, since it also gathers file size info. My assumption is that the OS will optimize its operations and suspend the thread(s) until the data is ready. I have not timed it or compared to "find", but it is able to scan ~0.5TB fast enough for me. It's not quick, nor does it take "forever." It also runs on all platforms.

It scans the starting dir for files and subdirectories, and farms the subdirectories out to another thread from a pool. The thread jobs get queued as worker threads become available. This is done recursively. The results in the worker thread are queued back to the main thread via a non-blocking callback command, making the worker thread quickly available for the next job.
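
The general shape of such a design can be sketched with the tpool commands of the Tcl Thread extension. This is not Brian's utility: it is a simplified variant in which the main thread collects results with tpool::wait instead of a callback, and the names workerInit, scanDir and poolScan are invented for the example. It assumes the Thread package is installed and Tcl 8.5 or later.

```tcl
package require Thread

# Code loaded into every worker thread: list one directory, no recursion.
set workerInit {
    proc scanDir {dir} {
        set files {}
        set dirs {}
        set entries [glob -nocomplain -directory $dir -- *]
        lappend entries {*}[glob -nocomplain -directory $dir -types hidden -- *]
        foreach p $entries {
            if {[file tail $p] in {. ..}} continue
            if {[catch {file type $p} type]} continue
            if {$type eq "directory"} {lappend dirs $p} else {lappend files $p}
        }
        return [list $files $dirs]
    }
}

# Hypothetical driver: one pool job per directory, subdirectories re-posted.
proc poolScan {root {workers 4}} {
    set pool [tpool::create -maxworkers $workers -initcmd $::workerInit]
    set pending [list [tpool::post $pool [list scanDir $root]]]
    set all {}
    while {[llength $pending] > 0} {
        # Blocks until at least one job is done; pending is updated
        # to hold the jobs that are still running.
        foreach job [tpool::wait $pool $pending pending] {
            lassign [tpool::get $pool $job] files dirs
            lappend all {*}$files
            foreach d $dirs {
                lappend pending [tpool::post $pool [list scanDir $d]]
            }
        }
    }
    tpool::release $pool
    return $all
}
```

Whether this beats a single-threaded scan depends on the storage device and the OS cache, so it would need to be timed like any other variant.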

-Brian

Re: Can Tcl scan faster than find?

<20221210203002.169a931c@lud1.home>

 by: Luc - Sat, 10 Dec 2022 23:30 UTC

On Sat, 10 Dec 2022 15:05:54 -0800 (PST), briang wrote:

> I doubt you'll be able to best the speed of find. I have written a
> utility in Tcl that scans the entire hard drive. I used a threaded model
> to try and take advantage of I/O latency, since it also gathers file size
> info. My assumption is that the OS will optimize its operations and
> suspend the thread(s) until the data is ready. I have not timed it or
> compared to "find", but it is able to scan ~0.5TB fast enough for me.
> It's not quick, nor does it take "forever." It also runs on all
> platforms.
>
> It scans the starting dir for files and subdirectories, and farms the
> subdirectories out to another thread from a pool. The thread jobs get
> queued as worker threads become available. This is done recursively. The
> results in the worker thread are queued back to the main thread via a
> non-blocking callback command, making the worker thread quickly available
> for the next job.
>
> -Brian

Interesting, but I wonder how effective that concept of threads really is.
The CPU may support multiple threads, but does the hard disk?

--
Luc
>>

Re: Can Tcl scan faster than find?

<95a83ab1-de26-4d19-b0fe-275a5a9312efn@googlegroups.com>

 by: briang - Sun, 11 Dec 2022 00:00 UTC

On Saturday, December 10, 2022 at 3:30:08 PM UTC-8, Luc wrote:
> On Sat, 10 Dec 2022 15:05:54 -0800 (PST), briang wrote:
>
> > I doubt you'll be able to best the speed of find. I have written a
> > utility in Tcl that scans the entire hard drive. I used a threaded model
> > to try and take advantage of I/O latency, since it also gathers file size
> > info. My assumption is that the OS will optimize its operations and
> > suspend the thread(s) until the data is ready. I have not timed it or
> > compared to "find", but it is able to scan ~0.5TB fast enough for me.
> > It's not quick, nor does it take "forever." It also runs on all
> > platforms.
> >
> > It scans the starting dir for files and subdirectories, and farms the
> > subdirectories out to another thread from a pool. The thread jobs get
> > queued as worker threads become available. This is done recursively. The
> > results in the worker thread are queued back to the main thread via a
> > non-blocking callback command, making the worker thread quickly available
> > for the next job.
> >
> > -Brian
> Interesting, but I wonder how effective that concept of threads really is.
> The CPU may support multiple threads, but does the hard disk?
>
> --
> Luc
> >>
Yes, they do.

-Brian

Re: Can Tcl scan faster than find?

<tn3gbo$1p6o1$1@dont-email.me>

 by: Rich - Sun, 11 Dec 2022 02:47 UTC

Luc <no@no.no> wrote:
> On Sat, 10 Dec 2022 15:05:54 -0800 (PST), briang wrote:
>
>> I doubt you'll be able to best the speed of find. I have written a
>> utility in Tcl that scans the entire hard drive. I used a threaded model
>> to try and take advantage of I/O latency, since it also gathers file size
>> info. My assumption is that the OS will optimize its operations and
>> suspend the thread(s) until the data is ready. I have not timed it or
>> compared to "find", but it is able to scan ~0.5TB fast enough for me.
>> It's not quick, nor does it take "forever." It also runs on all
>> platforms.
>>
>> It scans the starting dir for files and subdirectories, and farms the
>> subdirectories out to another thread from a pool. The thread jobs get
>> queued as worker threads become available. This is done recursively. The
>> results in the worker thread are queued back to the main thread via a
>> non-blocking callback command, making the worker thread quickly available
>> for the next job.
>>
>> -Brian
>
> Interesting, but I wonder how effective that concept of threads really is.
> The CPU may support multiple threads, but does the hard disk?

Yes. Look up Native Command Queuing:
https://en.wikipedia.org/wiki/NCQ

For a mechanical drive, there is only one head arm, so ultimately the
"threads" serialize on that fact, but the drive can readjust ordering
to minimize head arm seeks.

For an SSD, since there is no head arm, there is no head arm seek
time, and depending upon the internal flash memory design, the
'threads' could possibly perform parallel reads from different areas of
the flash.

Re: Can Tcl scan faster than find?

<eSqdnVvm78en-wj-nZ2dnZfqnPudnZ2d@giganews.com>

 by: Robert Heller - Sun, 11 Dec 2022 05:10 UTC

At Sun, 11 Dec 2022 02:47:20 -0000 (UTC) Rich <rich@example.invalid> wrote:

>
> Luc <no@no.no> wrote:
> > On Sat, 10 Dec 2022 15:05:54 -0800 (PST), briang wrote:
> >
> >> I doubt you'll be able to best the speed of find. I have written a
> >> utility in Tcl that scans the entire hard drive. I used a threaded model
> >> to try and take advantage of I/O latency, since it also gathers file size
> >> info. My assumption is that the OS will optimize its operations and
> >> suspend the thread(s) until the data is ready. I have not timed it or
> >> compared to "find", but it is able to scan ~0.5TB fast enough for me.
> >> It's not quick, nor does it take "forever." It also runs on all
> >> platforms.
> >>
> >> It scans the starting dir for files and subdirectories, and farms the
> >> subdirectories out to another thread from a pool. The thread jobs get
> >> queued as worker threads become available. This is done recursively. The
> >> results in the worker thread are queued back to the main thread via a
> >> non-blocking callback command, making the worker thread quickly available
> >> for the next job.
> >>
> >> -Brian
> >
> > Interesting, but I wonder how effective that concept of threads really is.
> > The CPU may support multiple threads, but does the hard disk?
>
> Yes. Look up Native Command Queuing:
> https://en.wikipedia.org/wiki/NCQ
>
> For a mechanical drive, there is only one head arm, so ultimately the
> "threads" serialize on that fact, but the drive can readjust ordering
> to minimize head arm seeks.
>
> For an SSD, since there is no head arm, there is no head arm seek
> time, and depending upon the internal flash memory design, the
> 'threads' could possibly perform parallel reads from different areas of
> the flash.

I would expect that at the application level, disk I/O might not be tied
*directly* to physical "disk" I/O, but rather be accessing the RAM-based disk
cache buffers. That is, the *kernel* might be reading large parts of the disk
(whole tracks) into RAM buffers. Depending on how the data is laid out on the
"disk", it *might* be possible to effectively access multiple parts of the disk
"concurrently" with different threads.

>
>

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
heller@deepsoft.com -- Webhosting Services

Re: Can Tcl scan faster than find?

<35a56af1-d3c2-4e14-a52a-88619a795d4fn@googlegroups.com>

 by: blacksqr - Mon, 19 Dec 2022 17:04 UTC

On Thursday, December 8, 2022 at 4:01:23 PM UTC-6, Luc wrote:
> Anyway, my question is, do you think it's possible to write Tcl code
> that can rival `find' in speed?
>
> --
> Luc
> >>

I wrote a Tcl program called globfind a while back (https://wiki.tcl-lang.org/page/globfind) which I tried to optimize for speed in searches of large filesystem spaces. I got a performance improvement of about three times over Tcllib's fileutil::find, but it's still slower than GNU find. A large pattern-match search using globfind requires about 150% of the time GNU find takes.

