Program to Split Single-Page HTML Documentation

Message-ID: <64c73d4a@news.ausics.net>
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i686))
NNTP-Posting-Host: news.ausics.net
Date: 31 Jul 2023 14:49:14 +1000
Organization: Ausics - https://www.ausics.net
Lines: 31
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Mon, 31 Jul 2023 04:49 UTC

I don't like browsing huge single HTML pages of documentation. Does
anyone know of a program or script (preferably for Linux) that can
scan a big software manual's single HTML page and automatically
break it up according to the contents section and the corresponding
anchor links?

Basically I want something to turn this:
http://www.gnu.org/software/coreutils/manual/coreutils.html

into this:
http://www.gnu.org/software/coreutils/manual/html_node/index.html

But without the Texinfo source like GNU software (usually) uses.
Just from the HTML itself. I also want it to output static HTML, so
no solutions using Javascript or browser add-ons.

One option might be to use csplit to break it up at common section
separator patterns, then a simple script renames the new files
according to their heading text. But I'd like to have HTML
navigation links, ideally including converting existing anchor
links inside the document.
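For illustration, the csplit-then-rename idea might look like this in Python. It is only a sketch under assumptions that are not in the post: sections are taken to start at `<h2>` tags (real manuals vary), and the helper names `split_at_headings` and `slugify` are made up. It writes nothing to disk, and it does not add navigation links or fix anchor links.

```python
import re

def slugify(text: str) -> str:
    """Heading text -> file-name-safe slug, e.g. 'Basic Usage' -> 'basic-usage'."""
    text = re.sub(r"<[^>]+>", "", text)  # drop inline tags such as <code> inside the heading
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "section"

def split_at_headings(html: str, level: int = 2) -> list[tuple[str, str]]:
    """Cut html just before every <hN> tag; return (file_name, chunk) pairs in order."""
    # Zero-width lookahead keeps each heading at the top of its own chunk (Python 3.7+).
    parts = [p for p in re.split(rf"(?=<h{level}[\s>])", html, flags=re.IGNORECASE) if p]
    heading = re.compile(rf"<h{level}[^>]*>(.*?)</h{level}>", re.IGNORECASE | re.DOTALL)
    chunks = []
    for i, part in enumerate(parts):
        m = heading.search(part)
        name = slugify(m.group(1)) if m else "preamble"
        chunks.append((f"{i:02d}-{name}.html", part))
    return chunks
```

The numeric prefix keeps the files in document order when listed; writing each chunk out under its name is left to the caller.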

A prime target would be the Raspberry Pi configuration
documentation, which has convinced me of the merit of multi-page
docs by how confusing it has become for me since they switched to a
single-page layout:
https://www.raspberrypi.com/documentation/computers/configuration.html

--
__ __
#_ < |\| |< _#

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
From: fre...@mouse-potato.com (Bud Frede)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Organization: Wossamotta U.
References: <64c73d4a@news.ausics.net>
X-No-Archive: Yes
X-Clacks-Overhead: GNU Terry Pratchett
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
Lines: 36
Message-ID: <FHNxM.38394$fRmf.32626@fx02.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 31 Jul 2023 12:21:57 UTC
Date: Mon, 31 Jul 2023 08:21:35 -0400
X-Received-Bytes: 2204
 by: Bud Frede - Mon, 31 Jul 2023 12:21 UTC

Computer Nerd Kev <not@telling.you.invalid> writes:

> I don't like browsing huge single HTML pages of documentation. Does
> anyone know of a program or script (preferably for Linux) that can
> scan a big software manual's single HTML page and automatically
> break it up according to the contents section and the corresponding
> anchor links?
>
> Basically I want something to turn this:
> http://www.gnu.org/software/coreutils/manual/coreutils.html
>
> into this:
> http://www.gnu.org/software/coreutils/manual/html_node/index.html
>
> But without the Texinfo source like GNU software (usually) uses.
> Just from the HTML itself. I also want it to output static HTML, so
> no solutions using Javascript or browser add-ons.
>
> One option might be to use csplit to break it up at common section
> separator patterns, then a simple script renames the new files
> according to their heading text. But I'd like to have HTML
> navigation links, ideally including converting existing anchor
> links inside the document.
>
> A prime target would be the Raspberry Pi configuration
> documentation, which has convinced me of the merit of multi-page
> docs by how confusing it has become for me since they switched to a
> single-page layout:
> https://www.raspberrypi.com/documentation/computers/configuration.html

I've never used it myself, but I seem to remember talking to someone in
the past who used htmldoc to do this kind of thing?

https://www.msweet.org/htmldoc/

Re: Program to Split Single-Page HTML Documentation

Message-ID: <64c89ecd@news.ausics.net>
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <FHNxM.38394$fRmf.32626@fx02.iad>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i686))
NNTP-Posting-Host: news.ausics.net
Date: 1 Aug 2023 15:57:33 +1000
Organization: Ausics - https://www.ausics.net
Lines: 26
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Tue, 1 Aug 2023 05:57 UTC

In comp.os.linux.misc Bud Frede <frede@mouse-potato.com> wrote:
>> A prime target would be the Raspberry Pi configuration
>> documentation, which has convinced me of the merit of multi-page
>> docs by how confusing it has become for me since they switched to a
>> single-page layout:
>> https://www.raspberrypi.com/documentation/computers/configuration.html
>
> I've never used it myself, but I seem to remember talking to someone in
> the past who used htmldoc to do this kind of thing?
>
> https://www.msweet.org/htmldoc/

Thanks for that. I've been to that website before but it's not
until you look in the README that it mentions the option of HTML as
an output format as well as for input. In fact the "htmlsep"
format option does exactly what I wanted:

mkdir rpi_config
htmldoc -t htmlsep -d ./rpi_config 'https://www.raspberrypi.com/documentation/computers/configuration.html'

It did pass through a few broken relative links that pointed to
other pages at the website, but I think I can live with that.
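One way to see exactly which links were passed through broken is to scan the output directory afterwards. A minimal sketch, assuming double-quoted `href` attributes and a flat output directory such as `./rpi_config`; `broken_local_links` is a hypothetical helper, not part of HTMLDOC.

```python
import os
import re
from urllib.parse import urlsplit

# Assumes double-quoted href attributes.
HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)

def broken_local_links(out_dir: str) -> list[tuple[str, str]]:
    """List (page, href) pairs whose relative link target is not present in out_dir."""
    broken = []
    for name in sorted(os.listdir(out_dir)):
        if not name.endswith(".html"):
            continue
        with open(os.path.join(out_dir, name), encoding="utf-8", errors="replace") as f:
            text = f.read()
        for href in HREF_RE.findall(text):
            parts = urlsplit(href)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # absolute URL, or a same-page "#fragment" link
            # Root-relative paths ("/documentation/...") are looked up under out_dir too,
            # so they show up as missing unless that tree was also saved locally.
            target = os.path.join(out_dir, parts.path.lstrip("/"))
            if not os.path.exists(target):
                broken.append((name, href))
    return broken
```

For example, `for page, href in broken_local_links("rpi_config"): print(page, href)` lists each offending link next to the file it appears in.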

--
__ __
#_ < |\| |< _#

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx01.iad.POSTED!not-for-mail
From: fre...@mouse-potato.com (Bud Frede)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Organization: Wossamotta U.
References: <64c73d4a@news.ausics.net> <FHNxM.38394$fRmf.32626@fx02.iad>
<64c89ecd@news.ausics.net>
X-No-Archive: Yes
X-Clacks-Overhead: GNU Terry Pratchett
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
Lines: 27
Message-ID: <tN5yM.22979$Wk53.22486@fx01.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Tue, 01 Aug 2023 11:13:29 UTC
Date: Tue, 01 Aug 2023 07:13:24 -0400
X-Received-Bytes: 1834
 by: Bud Frede - Tue, 1 Aug 2023 11:13 UTC

Computer Nerd Kev <not@telling.you.invalid> writes:

> In comp.os.linux.misc Bud Frede <frede@mouse-potato.com> wrote:
>>> A prime target would be the Raspberry Pi configuration
>>> documentation, which has convinced me of the merit of multi-page
>>> docs by how confusing it has become for me since they switched to a
>>> single-page layout:
>>> https://www.raspberrypi.com/documentation/computers/configuration.html
>>
>> I've never used it myself, but I seem to remember talking to someone in
>> the past who used htmldoc to do this kind of thing?
>>
>> https://www.msweet.org/htmldoc/
>
> Thanks for that. I've been to that website before but it's not
> until you look in the README that it mentions the option of HTML as
> an output format as well as for input. In fact the "htmlsep"
> format option does exactly what I wanted:
>
> mkdir rpi_config
> htmldoc -t htmlsep -d ./rpi_config 'https://www.raspberrypi.com/documentation/computers/configuration.html'
>
> It did pass through a few broken relative links that pointed to
> other pages at the website, but I think I can live with that.

I'm glad I could help out! :-)

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: robin_li...@es.invalid (Carlos E.R.)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Date: Tue, 1 Aug 2023 13:21:18 +0200
Lines: 40
Message-ID: <ee6npjxig4.ln2@Telcontar.valinor>
References: <64c73d4a@news.ausics.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net b2NIb5KjKiKqTjzZ9ELveQhXpVppp4WElMR4yXPhmxgrsfi9E9
X-Orig-Path: Telcontar.valinor!not-for-mail
Cancel-Lock: sha1:7N6Yhu36QeRJlDwIdoVYwWESaLY= sha256:82PXpebOWx4gdsX+KjOtYqEVXzeBpUD+abjiy0Z83P4=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.9.1
Content-Language: es-ES, en-CA
In-Reply-To: <64c73d4a@news.ausics.net>
 by: Carlos E.R. - Tue, 1 Aug 2023 11:21 UTC

On 2023-07-31 06:49, Computer Nerd Kev wrote:
> I don't like browsing huge single HTML pages of documentation. Does
> anyone know of a program or script (preferably for Linux) that can
> scan a big software manual's single HTML page and automatically
> break it up according to the contents section and the corresponding
> anchor links?
>
> Basically I want something to turn this:
> http://www.gnu.org/software/coreutils/manual/coreutils.html
>
> into this:
> http://www.gnu.org/software/coreutils/manual/html_node/index.html
>
> But without the Texinfo source like GNU software (usually) uses.
> Just from the HTML itself. I also want it to output static HTML, so
> no solutions using Javascript or browser add-ons.
>
> One option might be to use csplit to break it up at common section
> separator patterns, then a simple script renames the new files
> according to their heading text. But I'd like to have HTML
> navigation links, ideally including converting existing anchor
> links inside the document.
>
> A prime target would be the Raspberry Pi configuration
> documentation, which has convinced me of the merit of multi-page
> docs by how confusing it has become for me since they switched to a
> single-page layout:
> https://www.raspberrypi.com/documentation/computers/configuration.html
>

Interesting. I find multi-page docs confusing, I prefer single page. I
can do searches on the whole thing.

No, sorry, I do not know a tool to do the splitting automatically. I
would simply use an html editor and save chunks manually. LO could
possibly do it. Otherwise, I would try Composer in Seamonkey.

--
Cheers, Carlos.

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: hr.use...@email.de (Helmut Richter)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Date: Tue, 1 Aug 2023 14:31:24 +0200
Lines: 45
Message-ID: <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de>
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="8323329-1135635514-1690892490=:2060"
X-Trace: individual.net v+eXyssBeKnvq8tmyhOZzwf4Io/DrZwlxoIJ1HGer1OiqHisCM
X-Orig-Path: kiboko2!hr.usenet
Cancel-Lock: sha1:XIwekMJFSEkiBJZjGrBXiv8XPKk= sha256:iFrEFTOx56jFbyruh20BzMkw+FUKhPQBRv8Qqw3VSiY=
In-Reply-To: <ee6npjxig4.ln2@Telcontar.valinor>
Content-ID: <d5484126-8a80-c98b-86a6-322e1d8a7de7@web.de>
 by: Helmut Richter - Tue, 1 Aug 2023 12:31 UTC

On Tue, 1 Aug 2023, Carlos E.R. wrote:

> No, sorry, I do not know a tool to do the splitting automatically. I would
> simply use an html editor and save chunks manually. LO could possibly do it.
> Otherwise, I would try Composer in Seamonkey.

Web pages are made by lots of tools which yield readable or unreadable
HTML code. Things that make code unreadable are

– CSS for distributing text over the screen (this has decreased a
little since Web designers started to reckon with smartphones, which have
no screen in the classic sense, i.e. one at least 25 cm wide). CSS that
works locally (fonts, sizes, ...) is less of a problem; you only have to
make sure to repeat the declarations in each portion,

– a converter from something else, e.g. Word or PDF, to HTML, especially if
it is expected to convert also vice versa,

– many tools working simultaneously (a CMS, a WYSIWYG editor, a
converter of text formats, macro techniques like SSI, and – most
prominently – server side scripting like PHP and others) not
knowing which other tools are underway as well.

If you look at the HTML code and cannot understand anything, you
cannot expect any tool to understand more. In simple cases, a tool
like HTML Tidy (https://www.w3.org/People/Raggett/tidy/) might improve
the situation, but a tool that works on everything is hardly conceivable.

Consequently, the proposed tool HTMLdoc excludes virtually everything:

While it currently does not support many things in “the modern web”
such as Cascading Style Sheets (CSS), forms, full Unicode, and Emoji
characters, ...

--
Helmut Richter

Re: Program to Split Single-Page HTML Documentation

Message-ID: <64c9941b@news.ausics.net>
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 2 Aug 2023 09:24:11 +1000
Organization: Ausics - https://www.ausics.net
Lines: 19
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Tue, 1 Aug 2023 23:24 UTC

In comp.infosystems.www.misc Helmut Richter <hr.usenet@email.de> wrote:
> Consequently, the proposed tool HTMLdoc excludes virtually everything:
>
> While it currently does not support many things in "the modern web"
> such as Cascading Style Sheets (CSS), forms, full Unicode, and Emoji
> characters, ...

Well no, it doesn't exclude most software documentation because for
whatever reason the HTML in much of that has remained relatively
sane. Some CSS is creeping in, but a tool that ignores it still
produces clear text with some formatting. So far I've tested it
with two documentation pages published in 2023 and it's understood
the HTML fine (except for that out-of-page link problem in the RPi
doc). Plus often I'm looking at docs published 10-20 years ago
anyway.

--
__ __
#_ < |\| |< _#

Re: Program to Split Single-Page HTML Documentation

Message-ID: <64c997ae@news.ausics.net>
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 2 Aug 2023 09:39:26 +1000
Organization: Ausics - https://www.ausics.net
Lines: 21
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Tue, 1 Aug 2023 23:39 UTC

In comp.infosystems.www.misc Carlos E.R. <robin_listas@es.invalid> wrote:
> Interesting. I find multi-page docs confusing, I prefer single page. I
> can do searches on the whole thing.

Having both is really the ideal for me. But I read the multi-page
version and resort to the single-page one if I'm reduced to text
searching the whole document. On the other hand, it's often useful
to do text searches limited to particular sections/pages too.

If I set out to read a manual from start to finish then the
single-page version is good. But for reference use it's too easy
to lose my place in them. I usually open different parts in
multiple tabs. Five minutes after I've learnt something important
from a manual I've usually forgotten it again, but I remember the
tab or what the layout of the page looked like, so I can get back
to it quickly. That doesn't work so well on a massive single
webpage.

--
__ __
#_ < |\| |< _#

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: hr.use...@email.de (Helmut Richter)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Date: Wed, 2 Aug 2023 22:29:32 +0200
Lines: 57
Message-ID: <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de>
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Trace: individual.net 6sqN0WfBpek7KLKlOEgKPgL1Qa1NGr1TF3dz7o3qJhj8bumva8
X-Orig-Path: kiboko2!hr.usenet
Cancel-Lock: sha1:rYrV8O8nr0ypo6YHoafVjDdeHdE= sha256:cEShpvqOq37DOORoloiKowddCx35wfDwcxUP3HKOkNk=
In-Reply-To: <64c9941b@news.ausics.net>
 by: Helmut Richter - Wed, 2 Aug 2023 20:29 UTC

On Wed, 2 Aug 2023, Computer Nerd Kev wrote:

> In comp.infosystems.www.misc Helmut Richter <hr.usenet@email.de> wrote:
> > Consequently, the proposed tool HTMLdoc excludes virtually everything:
> >
> > While it currently does not support many things in "the modern web"
> > such as Cascading Style Sheets (CSS), forms, full Unicode, and Emoji
> > characters, ...
>
> Well no, it doesn't exclude most software documentation because for
> whatever reason the HTML in much of that has remained relatively
> sane. Some CSS is creeping in, but a tool that ignores it still
> produces clear text with some formatting. So far I've tested it
> with two documentation pages published in 2023 and it's understood
> the HTML fine (except for that out-of-page link problem in the RPi
> doc). Plus often I'm looking at docs published 10-20 years ago
> anyway.

Out-of-page links are trivial: replace each link "#xyz" by "subpage#xyz".
It is known to which subpage each link belongs, at least if you go over
the text in two passes. This is a procedure which I apply to all my web
pages, which are written as one document, and split into pieces later.
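The two-pass procedure could be sketched in Python as follows. This is an illustration, not Helmut's actual tooling, and it assumes the document has already been cut into chunks held in a dict (a hypothetical `pages` mapping subpage file name to HTML, in document order) and that attributes are double-quoted.

```python
import re

# Anchor definitions (id="..." or old-style name="...") and same-page links (href="#...").
# Assumes double-quoted attributes; a real tool would use an HTML parser.
ANCHOR_RE = re.compile(r'(?<![\w-])(?:id|name)="([^"]+)"')
FRAGMENT_LINK_RE = re.compile(r'href="#([^"]+)"')

def rewrite_fragment_links(pages: dict[str, str]) -> dict[str, str]:
    """pages maps subpage file name -> HTML chunk; returns chunks with fixed links."""
    # Pass 1: record which subpage each anchor ended up in (first definition wins).
    owner: dict[str, str] = {}
    for name, chunk in pages.items():
        for anchor in ANCHOR_RE.findall(chunk):
            owner.setdefault(anchor, name)

    # Pass 2: turn "#xyz" into "subpage#xyz" whenever xyz now lives on another subpage.
    def fixed(chunk: str, current: str) -> str:
        def repl(m: re.Match) -> str:
            target = owner.get(m.group(1))
            if target is None or target == current:
                return m.group(0)  # unknown anchor, or still on this page: leave it
            return f'href="{target}#{m.group(1)}"'
        return FRAGMENT_LINK_RE.sub(repl, chunk)

    return {name: fixed(chunk, name) for name, chunk in pages.items()}
```

The first pass has to see every chunk before any link is rewritten, which is why a single pass over the text is not enough.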

Another automatic change is to close and reopen each open tag:

<tag1>
a...
<tag2>
b...
---------------------------- cut here --------------
c...
</tag2>
d...
</tag1>

becomes

<tag1>
a...
<tag2>
b...
</tag2>
</tag1>
--------------------------was cut here ---------------
<tag1>
<tag2>
c...
</tag2>
d...
</tag1>

It is not obvious which attributes should be repeated in the second copy.
Mostly it is safer to repeat them, at least "class" and "style" attributes
and "id" attributes if they are referred to in style sheets.
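The close-and-reopen step can be sketched with Python's standard `html.parser`. Assumptions not in the post: the cut position falls on a tag boundary, the HTML closes its elements explicitly (implied end tags such as an unclosed `<p>` are not modelled), and only `class` and `style` are repeated on the reopened tags. `cut` and `OpenTagTracker` are illustrative names.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Attributes copied onto the reopened tags; "id" is left out to avoid duplicate ids.
REPEAT_ATTRS = {"class", "style"}

class OpenTagTracker(HTMLParser):
    """Keeps a stack of the elements still open at the end of the text fed so far."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.stack = []  # (tag, attrs) pairs, outermost first

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching start tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

def cut(doc: str, pos: int) -> tuple[str, str]:
    """Split doc at pos, closing open elements before the cut and reopening them after."""
    tracker = OpenTagTracker()
    tracker.feed(doc[:pos])
    closers = "".join(f"</{tag}>" for tag, _ in reversed(tracker.stack))
    openers = ""
    for tag, attrs in tracker.stack:
        kept = "".join(f' {k}="{escape(v)}"' for k, v in attrs
                       if k in REPEAT_ATTRS and v is not None)
        openers += f"<{tag}{kept}>"
    return doc[:pos] + closers, openers + doc[pos:]
```

Which attributes to repeat is exactly the judgment call described above; adding `"id"` to `REPEAT_ATTRS` would suit pages whose style sheets select on ids.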

--
Helmut Richter

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: hr.use...@email.de (Helmut Richter)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Date: Thu, 3 Aug 2023 11:44:04 +0200
Lines: 18
Message-ID: <c9c81087-9545-e093-51a9-84a38531e9c7@email.de>
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net> <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Trace: individual.net aNi/nzFs57E9EC4F/+MDxgz8k4kZdk0LUJfspj7T0uWx6TXFrN
X-Orig-Path: kiboko2!hr.usenet
Cancel-Lock: sha1:Zw7gv5GaPJLoFD29MazWb3BAR7U= sha256:EPOu8wKM1+0zBR2Pm1MwtfUBhFFNBoP42X2xdbPQVWE=
In-Reply-To: <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de>
 by: Helmut Richter - Thu, 3 Aug 2023 09:44 UTC

On Wed, 2 Aug 2023, Helmut Richter wrote:

> Out-of-page links are trivial: replace each link "#xyz" by "subpage#xyz".
> It is known to which subpage each link belongs, at least if you go over
> the text in two passes. This is a procedure which I apply to all my web
> pages, which are written as one document, and split into pieces later.

It might be interesting to see an example of the TOC (table of contents)
of such a split article (https://hhr-m.de/sw-fibel/contents.html). It
contains all anchors in the whole article, i.e. all possible link
targets, although apart from the TOC itself not all of them are actually
linked to. The link structure might be even easier to see if you look
at the source code of that web page, which is fairly readable.
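A TOC of that kind, listing every anchor under the subpage that holds it, could be generated roughly like this. Hypothetical sketch: it assumes a dict `pages` mapping each subpage file name to its HTML chunk, only finds double-quoted `id` attributes, and uses the raw id as link text because headings are not parsed.

```python
import re
from html import escape

ANCHOR_RE = re.compile(r'(?<![\w-])id="([^"]+)"')

def build_toc(pages: dict[str, str]) -> str:
    """Emit a nested <ul> with every id anchor grouped under its subpage."""
    lines = ["<ul>"]
    for name, chunk in pages.items():
        lines.append(f'<li><a href="{escape(name)}">{escape(name)}</a>')
        anchors = ANCHOR_RE.findall(chunk)
        if anchors:
            lines.append("<ul>")
            lines.extend(f'<li><a href="{escape(name)}#{escape(a)}">{escape(a)}</a></li>'
                         for a in anchors)
            lines.append("</ul>")
        lines.append("</li>")
    lines.append("</ul>")
    return "\n".join(lines)
```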

Sorry, the article itself is in German, a tutorial for learning Swahili.

--
Helmut Richter

Re: Program to Split Single-Page HTML Documentation

Message-ID: <64cce9fd@news.ausics.net>
Cancel-Lock: sha1:JYwVKyMD2nA4PbndWzDNh2zs2gc=
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net> <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de> <c9c81087-9545-e093-51a9-84a38531e9c7@email.de>
User-Agent: tin/2.1.1-20120623 ("Mulindry") (UNIX) (Linux/3.9.6 (i686))
NNTP-Posting-Host: news.ausics.net
Date: 4 Aug 2023 22:07:25 +1000
Organization: Ausics - https://www.ausics.net
Lines: 46
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mixmin.net!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Fri, 4 Aug 2023 12:07 UTC

In comp.os.linux.misc Helmut Richter <hr.usenet@email.de> wrote:
> On Wed, 2 Aug 2023, Helmut Richter wrote:
>> Out-of-page links are trivial: replace each link "#xyz" by "subpage#xyz".
>> It is known to which subpage each link belongs, at least if you go over
>> the text in two passes. This is a procedure which I apply to all my web
>> pages, which are written as one document, and split into pieces later.
>
> It might be interesting to see an example of the TOC (table of contents)
> of such a split article (https://hhr-m.de/sw-fibel/contents.html). It
> contains all anchors in the whole article, which are possible but not
> necessarily used (except from the TOC, of course) link targets. The link
> structure might be still better visible if you look into the source code
> of that web page which is fairly readable.

I think you misunderstood the problem. Perhaps I should have
explained that I would prefer it to rewrite relative links to other
webpages as absolute links.

As it is, a link like this:
<a href="/documentation/computers/processors.html#bcm2835">BCM2835</a>

From here:
https://www.raspberrypi.com/documentation/computers/configuration.html

Doesn't work when converted unless the processors.html page is also
saved locally. Seeing as the program saw the source URL, I would
have liked it to be smart enough to turn such relative links into
absolute links when the link destination is another webpage.

This has fixed many of those relative links which had a directory
path:
for page in *.html; do sed -i \
's/<a href="\//<a href="https:\/\/www.raspberrypi.com\//g' $page; done

Pre-processing the page to rewrite relative links to other pages in
the same directory when the path isn't in the href, before running
HTMLDOC, would fix the rest.

Such as this:
<a href="config_txt.html#video-options">

It's not a major complaint.
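The pre-processing step described above could be sketched in Python with `urllib.parse.urljoin`, which resolves both kinds of relative link (`/documentation/...` and `config_txt.html#...`) against the page's own URL. Assumptions: double-quoted `href` attributes, only `<a>` links are touched (image `src` attributes are left alone), and the page is saved locally first so that the rewritten copy is what gets passed to HTMLDOC.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Matches the href value of <a> tags; assumes double-quoted attributes.
HREF_RE = re.compile(r'(<a\s[^>]*?href=")([^"]*)(")', re.IGNORECASE)

def absolutize_links(html: str, base_url: str) -> str:
    """Make links to other pages absolute so they keep working from file:// copies."""
    page = urldefrag(base_url).url

    def fix(m: re.Match) -> str:
        href = m.group(2)
        if href.startswith("#") or urlsplit(href).scheme:
            return m.group(0)  # same-page anchor, or already absolute (https:, mailto:)
        absolute = urljoin(base_url, href)
        target, frag = urldefrag(absolute)
        if target == page:
            # A link back to the page being split ("configuration.html#x"): keep it in-page.
            return (m.group(1) + "#" + frag + m.group(3)) if frag else m.group(0)
        return m.group(1) + absolute + m.group(3)

    return HREF_RE.sub(fix, html)
```

For example, run `absolutize_links(text, "https://www.raspberrypi.com/documentation/computers/configuration.html")` on the saved page, then give the rewritten file to the same `htmldoc -t htmlsep -d ./rpi_config` command. Same-page `#...` links are left untouched so the splitter can still resolve them.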

--
__ __
#_ < |\| |< _#

Re: Program to Split Single-Page HTML Documentation

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.22.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!nntp.earthlink.com!news.earthlink.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 05 Aug 2023 04:45:56 +0000
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net> <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de> <c9c81087-9545-e093-51a9-84a38531e9c7@email.de> <64cce9fd@news.ausics.net>
From: 23k...@bfxw9.net (23k.304)
Organization: feather germanium
Date: Sat, 5 Aug 2023 00:45:55 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
MIME-Version: 1.0
In-Reply-To: <64cce9fd@news.ausics.net>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Message-ID: <S4GdneJULuqZSVD5nZ2dnZfqn_SdnZ2d@earthlink.com>
Lines: 58
X-Usenet-Provider: http://www.giganews.com
NNTP-Posting-Host: 68.212.127.105
X-Trace: sv3-EqwcdSN0EtgRUNuziX/0mcNpHcOi6opxZ5FSIB6veI/A+URYDDZzI0cAkyO6jPFOvGaE1rl+A+ZghnG!taANjJZfckjj4SUYmHiAdY3VG+ZOfakvB5big8K+MRORR9kB6jK/p2SFKRmGy3elDZnyp0JKh3te!T3Dxav0+lLv6LBbXXPR1/A==
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: 23k.304 - Sat, 5 Aug 2023 04:45 UTC

On 8/4/23 8:07 AM, Computer Nerd Kev wrote:
> In comp.os.linux.misc Helmut Richter <hr.usenet@email.de> wrote:
>> On Wed, 2 Aug 2023, Helmut Richter wrote:
>>> Out-of-page links are trivial: replace each link "#xyz" by "subpage#xyz".
>>> It is known to which subpage each link belongs, at least if you go over
>>> the text in two passes. This is a procedure which I apply to all my web
>>> pages, which are written as one document, and split into pieces later.
>>
>> It might be interesting to see an example of the TOC (table of contents)
>> of such a split article (https://hhr-m.de/sw-fibel/contents.html). It
>> contains all anchors in the whole article, which are possible but not
>> necessarily used (except from the TOC, of course) link targets. The link
>> structure might be still better visible if you look into the source code
>> of that web page which is fairly readable.
>
> I think you misunderstood the problem. Perhaps I should have
> explained that I would prefer it to rewrite relative links to other
> webpages as absolute links.
>
> As it is, a link like this:
> <a href="/documentation/computers/processors.html#bcm2835">BCM2835</a>
>
> From here:
> https://www.raspberrypi.com/documentation/computers/configuration.html
>
> Doesn't work when converted unless the processors.html page is also
> saved locally. Seeing as the program saw the source URL, I would
> have liked it to be smart enough to turn such relative links into
> absolute links when the link destination is another webpage.
>
> This has fixed many of those relative links which had a directory
> path:
> for page in *.html; do sed -i \
> 's/<a href="\//<a href="https:\/\/www.raspberrypi.com\//g' $page; done
>
> Pre-processing the page to rewrite relative links to other pages in
> the same directory when the path isn't in the href, before running
> HTMLDOC, would fix the rest.
>
> Such as this:
> <a href="config_txt.html#video-options">
>
> It's not a major complaint.

Ummm ... are you trying to do this STATICALLY, on
pre-existing HTML, or DYNAMICALLY, as users actually
access the pages ???

For the first case, a little Python will do wonders.
Identify tags, where you need to insert the absolute
parts of the paths, do it. Python is great with text
strings.

The second case ... not as easy. There you'd be best
off using PHP. Yea, JavaScript will do it too, but
it's always something<dot>something<dot>something<dot>
something<dot>something<dot> ... functional but it
gets UGLY/OPAQUE real quick :-)

Re: Program to Split Single-Page HTML Documentation

Message-ID: <64ce0679@news.ausics.net>
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net> <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de> <c9c81087-9545-e093-51a9-84a38531e9c7@email.de> <64cce9fd@news.ausics.net> <S4GdneJULuqZSVD5nZ2dnZfqn_SdnZ2d@earthlink.com>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i686))
NNTP-Posting-Host: news.ausics.net
Date: 5 Aug 2023 18:21:13 +1000
Organization: Ausics - https://www.ausics.net
Lines: 39
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Sat, 5 Aug 2023 08:21 UTC

In comp.os.linux.misc 23k.304 <23k304@bfxw9.net> wrote:
> On 8/4/23 8:07 AM, Computer Nerd Kev wrote:
>> This has fixed many of those relative links which had a directory
>> path:
>> for page in *.html; do sed -i \
>> 's/<a href="\//<a href="https:\/\/www.raspberrypi.com\//g' $page; done
>>
>> Pre-processing the page to rewrite relative links to other pages in
>> the same directory when the path isn't in the href, before running
>> HTMLDOC, would fix the rest.
>>
>> Such as this:
>> <a href="config_txt.html#video-options">
>>
>> It's not a major complaint.
>
> Ummm ... are you trying to do this STATICALLY, on
> pre-existing HTML, or DYNAMICALLY, as users actually
> access the pages ???

Statically, as stated in the first post.

Also no users besides me, browsing locally with file:// URLs.

> For the first case, a little Python will do wonders.
> Identify tags, where you need to insert the absolute
> parts of the paths, do it. Python is great with text
> strings.

Well, sed will do that too with a smarter regex (using capture
groups), and I don't like Python for reasons we've already
argued about. As it happens there are so few links without an
absolute path in that doc (I thought there were none until browsing
around after that first try) that I figured it wasn't worth
bothering with. Other docs may sufficiently motivate me eventually.
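A minimal sketch of that "smarter regex" pre-processing step, to run on the saved page before HTMLDOC. It assumes POSIX sh and a sed that accepts -E (BSD sed and current GNU sed do; older GNU sed spells it -r), and that href is the first attribute after `<a `, double-quoted, as in the examples quoted above. BASE_URL and the function names absolutize_links, absolutize_files and list_relative_links are hypothetical, and BASE_URL assumes the page was saved from the /documentation/computers/ directory.

```shell
#!/bin/sh
# Hypothetical base URL: the directory the single-page doc was saved from.
# It must not contain '|' or '&', which are special in the sed replacement.
BASE_URL='https://www.raspberrypi.com/documentation/computers/'

# absolutize_links: filter HTML from stdin to stdout.
#  Rule 1: root-relative   <a href="/..."        -> https://www.raspberrypi.com/...
#          ([^/] skips protocol-relative //host links)
#  Rule 2: same-directory  <a href="page.html#x" -> ${BASE_URL}page.html#x
# Fragment-only (#x), absolute (scheme:...) and ../ links are left untouched,
# because rule 2's capture group only accepts a bare file name ending in .html.
absolutize_links() {
    sed -E \
        -e 's|<a href="/([^/])|<a href="https://www.raspberrypi.com/\1|g' \
        -e "s|<a href=\"([A-Za-z0-9_.-]+\\.html[^\"]*)\"|<a href=\"${BASE_URL}\\1\"|g"
}

# absolutize_files FILE...: apply absolutize_links to each file in place,
# via a temporary file rather than the non-portable sed -i.
absolutize_files() {
    for page in "$@"; do
        absolutize_links < "$page" > "$page.tmp" && mv "$page.tmp" "$page"
    done
}

# list_relative_links [FILE...]: print <a href="..." targets that are
# neither fragment-only nor absolute, to spot links the rules missed.
list_relative_links() {
    grep -o '<a href="[^"#][^"]*"' "$@" | grep -v 'href="[A-Za-z][A-Za-z0-9+.-]*:'
}
```

For example, `absolutize_files configuration.html` followed by `list_relative_links configuration.html` shows whatever is still relative (grep -o is a common GNU/BSD extension rather than POSIX). Writing to a temporary file instead of using sed -i sidesteps the GNU/BSD difference in how -i takes its backup suffix.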

--
__ __
#_ < |\| |< _#

Re: Program to Split Single-Page HTML Documentation

<969b4853-bf-13e6-2b79-3e62642e7f1b@email.de>

https://www.novabbs.com/computers/article-flat.php?id=11450&group=comp.os.linux.misc#11450

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: hr.use...@email.de (Helmut Richter)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Date: Sat, 5 Aug 2023 10:56:57 +0200
Lines: 38
Message-ID: <969b4853-bf-13e6-2b79-3e62642e7f1b@email.de>
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net> <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de> <c9c81087-9545-e093-51a9-84a38531e9c7@email.de> <64cce9fd@news.ausics.net>
<S4GdneJULuqZSVD5nZ2dnZfqn_SdnZ2d@earthlink.com> <64ce0679@news.ausics.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Trace: individual.net ihsCQ0V6uCDP3GR4y9+23QM+3WDZIe/mOxsdXXNpaLa3P/FdWY
X-Orig-Path: kiboko2!hr.usenet
Cancel-Lock: sha1:43NfjS1GVYv6okdDX5bcV9o19FA= sha256:JrJSQhAN1su3ooF6GS8KwooU3wxYO3rj5LL8skHSQuY=
In-Reply-To: <64ce0679@news.ausics.net>
 by: Helmut Richter - Sat, 5 Aug 2023 08:56 UTC

On Sat, 5 Aug 2023, Computer Nerd Kev wrote:

> In comp.os.linux.misc 23k.304 <23k304@bfxw9.net> wrote:
> > On 8/4/23 8:07 AM, Computer Nerd Kev wrote:
> >> This has fixed many of those relative links which had a directory
> >> path:
> >> for page in *.html; do sed -i \
> >> 's/<a href="\//<a href="https:\/\/www.raspberrypi.com\//g' $page; done
> >>
> >> Pre-processing the page to rewrite relative links to other pages in
> >> the same directory when the path isn't in the href, before running
> >> HTMLDOC, would fix the rest.
> >>
> >> Such as this:
> >> <a href="config_txt.html#video-options">
> >>
> >> It's not a major complaint.
> >
> > Ummm ... are you trying to do this STATICALLY, on
> > pre-existing HTML, or DYNAMICALLY, as users actually
> > access the pages ???
>
> Statically, as stated in the first post.

Yes, there is nothing dynamic in it.

I had still another idea: leave the links as is, and map them to the
correct URLs by a rewrite in the server. The table of necessary rewrite
statements is static but can be modified if necessary. I discarded that
because it is too complex for too little benefit, if any.

> Also no users besides me, browsing locally with file:// URLs.

Of course, the rewrite is not an option in this case as there is no
server.

--
Helmut Richter

Re: Program to Split Single-Page HTML Documentation

<oTudnXSkQvkWiFL5nZ2dnZfqn_idnZ2d@earthlink.com>

https://www.novabbs.com/computers/article-flat.php?id=11469&group=comp.os.linux.misc#11469

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!69.80.99.27.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!nntp.earthlink.com!news.earthlink.com.POSTED!not-for-mail
NNTP-Posting-Date: Sun, 06 Aug 2023 03:37:47 +0000
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor> <423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net> <d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de> <c9c81087-9545-e093-51a9-84a38531e9c7@email.de> <64cce9fd@news.ausics.net> <S4GdneJULuqZSVD5nZ2dnZfqn_SdnZ2d@earthlink.com> <64ce0679@news.ausics.net> <969b4853-bf-13e6-2b79-3e62642e7f1b@email.de>
From: 23k...@bfxw9.net (23k.304)
Organization: feather germanium
Date: Sat, 5 Aug 2023 23:37:47 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
MIME-Version: 1.0
In-Reply-To: <969b4853-bf-13e6-2b79-3e62642e7f1b@email.de>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Message-ID: <oTudnXSkQvkWiFL5nZ2dnZfqn_idnZ2d@earthlink.com>
Lines: 42
X-Usenet-Provider: http://www.giganews.com
NNTP-Posting-Host: 68.212.127.105
X-Trace: sv3-Ne7gTE2Ps0+9n3PUGdKFsMxwkzJfJGNeUXCTerpL7HcEeNSIUsT04FMhpH9sdBAqHWpq+GHh8qMXS0K!JR44wEu9n/ULmUxFYkgh4lkloPCK1ONizif4C3m7F2rDzF8pXIG75NEGgshZWdIWGA6oQj5+LTnK!xtt3TvRPHZgs4S5iU542Gw==
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: 23k.304 - Sun, 6 Aug 2023 03:37 UTC

On 8/5/23 4:56 AM, Helmut Richter wrote:
> On Sat, 5 Aug 2023, Computer Nerd Kev wrote:
>
>> In comp.os.linux.misc 23k.304 <23k304@bfxw9.net> wrote:
>>> On 8/4/23 8:07 AM, Computer Nerd Kev wrote:
>>>> This has fixed many of those relative links which had a directory
>>>> path:
>>>> for page in *.html; do sed -i \
>>>> 's/<a href="\//<a href="https:\/\/www.raspberrypi.com\//g' $page; done
>>>>
>>>> Pre-processing the page to rewrite relative links to other pages in
>>>> the same directory when the path isn't in the href, before running
>>>> HTMLDOC, would fix the rest.
>>>>
>>>> Such as this:
>>>> <a href="config_txt.html#video-options">
>>>>
>>>> It's not a major complaint.
>>>
>>> Ummm ... are you trying to do this STATICALLY, on
>>> pre-existing HTML, or DYNAMICALLY, as users actually
>>> access the pages ???
>>
>> Statically, as stated in the first post.
>
> Yes, there is nothing dynamic in it.
>
> I had still another idea: leave the links as is, and map them to the
> correct URLs by a rewrite in the server. The table of necessary rewrite
> statements is static but can be modified if necessary. I discarded that
> because it is too complex for too little benefit, if any.
>
>> Also no users besides me, browsing locally with file:// URLs.
>
> Of course, the rewrite is not an option in this case as there is no
> server.

It's also more "hidden" ... which can getcha when you,
or somebody, has to debug/rewrite in a couple of years.

Re: Program to Split Single-Page HTML Documentation

<oTudnXekQvmQi1L5nZ2dnZfqn_gAAAAA@earthlink.com>

https://www.novabbs.com/computers/article-flat.php?id=11470&group=comp.os.linux.misc#11470

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!nntp.earthlink.com!news.earthlink.com.POSTED!not-for-mail
NNTP-Posting-Date: Sun, 06 Aug 2023 03:39:57 +0000
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <ee6npjxig4.ln2@Telcontar.valinor>
<423fa31-6dd-a7a8-7758-4eca1c952b4@email.de> <64c9941b@news.ausics.net>
<d9ddb265-e637-7a8-b6e6-b1d7a6281a3c@email.de>
<c9c81087-9545-e093-51a9-84a38531e9c7@email.de> <64cce9fd@news.ausics.net>
<S4GdneJULuqZSVD5nZ2dnZfqn_SdnZ2d@earthlink.com> <64ce0679@news.ausics.net>
From: 23k...@bfxw9.net (23k.304)
Organization: feather germanium
Date: Sat, 5 Aug 2023 23:39:57 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
MIME-Version: 1.0
In-Reply-To: <64ce0679@news.ausics.net>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Message-ID: <oTudnXekQvmQi1L5nZ2dnZfqn_gAAAAA@earthlink.com>
Lines: 42
X-Usenet-Provider: http://www.giganews.com
NNTP-Posting-Host: 68.212.127.105
X-Trace: sv3-BFQWeZ/vbMc4b644tf+Zhr/sCVOiBEZuMT01p5rk+0ueatEjMckcymkghvxRBoVGWvH+jTNmLEkwzRC!zitujsFM+O8A91dVR24fQe9itBDUK6/LtPy5U2GYDkena6uMaz7fRL5xTSLSC0SHOc9hl668xrPH!SGuzeZ7XaTpZSzgezp1l6A==
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: 23k.304 - Sun, 6 Aug 2023 03:39 UTC

On 8/5/23 4:21 AM, Computer Nerd Kev wrote:
> In comp.os.linux.misc 23k.304 <23k304@bfxw9.net> wrote:
>> On 8/4/23 8:07 AM, Computer Nerd Kev wrote:
>>> This has fixed many of those relative links which had a directory
>>> path:
>>> for page in *.html; do sed -i \
>>> 's/<a href="\//<a href="https:\/\/www.raspberrypi.com\//g' $page; done
>>>
>>> Pre-processing the page to rewrite relative links to other pages in
>>> the same directory when the path isn't in the href, before running
>>> HTMLDOC, would fix the rest.
>>>
>>> Such as this:
>>> <a href="config_txt.html#video-options">
>>>
>>> It's not a major complaint.
>>
>> Ummm ... are you trying to do this STATICALLY, on
>> pre-existing HTML, or DYNAMICALLY, as users actually
>> access the pages ???
>
> Statically, as stated in the first post.
>
> Also no users besides me, browsing locally with file:// URLs.
>
>> For the first case, a little Python will do wonders.
>> Identify tags, where you need to insert the absolute
>> parts of the paths, do it. Python is great with text
>> strings.
>
> Well Sed will do that too with a smarter regex (using selection
> brackets), and I don't like Python for reasons we've already
> argued about. As it happens there are so few links without an
> absolute path in that doc (I thought there were none until browsing
> around after that first try) that I figured it wasn't worth
> bothering with. Other docs may sufficiently motivate me eventually.

I absolutely hate regex. To me a few clear lines of Python
are much more agreeable and obvious to all.

But for static, existing pages, either WILL get you there
without much drama.

Re: Program to Split Single-Page HTML Documentation

<+iv*eVenz@news.chiark.greenend.org.uk>

https://www.novabbs.com/computers/article-flat.php?id=11533&group=comp.os.linux.misc#11533

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsfeed.xs3.de!callisto.xs3.de!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo)
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
Subject: Re: Program to Split Single-Page HTML Documentation
Date: 07 Aug 2023 18:51:14 +0100 (BST)
Organization: University of Cambridge, England
Message-ID: <+iv*eVenz@news.chiark.greenend.org.uk>
References: <64c73d4a@news.ausics.net>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="11817"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Mon, 7 Aug 2023 17:51 UTC

In comp.infosystems.www.misc Computer Nerd Kev <not@telling.you.invalid> wrote:
> A prime target would be the Raspberry Pi configuration
> documentation, which has convinced me of the merit of multi-page
> docs by how confusing it has become for me since they switched to a
> single-page layout:
> https://www.raspberrypi.com/documentation/computers/configuration.html

It doesn't answer the general question, but just to note that documentation
is generated from Asciidoc, and the source files can be found in their repo:
https://github.com/raspberrypi/documentation

Presumably there's a way to build Asciidoc to generate multi-page HTML, but
if not you could just read the .adoc files - github makes a fair stab at
rendering them.

Theo

Re: Program to Split Single-Page HTML Documentation

<64d178df@news.ausics.net>

https://www.novabbs.com/computers/article-flat.php?id=11539&group=comp.os.linux.misc#11539

Message-ID: <64d178df@news.ausics.net>
From: not...@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Program to Split Single-Page HTML Documentation
Newsgroups: comp.infosystems.www.misc,comp.os.linux.misc
References: <64c73d4a@news.ausics.net> <+iv*eVenz@news.chiark.greenend.org.uk>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 8 Aug 2023 09:06:08 +1000
Organization: Ausics - https://www.ausics.net
Lines: 30
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Mon, 7 Aug 2023 23:06 UTC

In comp.os.linux.misc Theo <theom+news@chiark.greenend.org.uk> wrote:
> In comp.infosystems.www.misc Computer Nerd Kev <not@telling.you.invalid> wrote:
>> A prime target would be the Raspberry Pi configuration
>> documentation, which has convinced me of the merit of multi-page
>> docs by how confusing it has become for me since they switched to a
>> single-page layout:
>> https://www.raspberrypi.com/documentation/computers/configuration.html
>
> It doesn't answer the general question, but just to note that documentation
> is generated from Asciidoc, and the source files can be found in their repo:
> https://github.com/raspberrypi/documentation

That's a good point. But I'm doubtful that Asciidoc converters will
easily give me the "previous contents next" links at the top and
bottom of pages, like published page-split HTML docs usually have.
The suggested HTMLDOC program generates those navigation links, so
I'll stick with it even for the RPi docs.

> Presumably there's a way to build Asciidoc to generate multi-page HTML, but
> if not you could just read the .adoc files - github makes a fair stab at
> rendering them.

GitHub unfortunately seems to have become a Javascript-only
solution now. Files and directories in repos can no longer be
viewed without Javascript (except for the top directory and the
readme, on the first page). Very disappointing.
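Since the web view needs JavaScript, a shallow clone of the repository gives the same .adoc sources locally, readable in any pager. A minimal sketch, assuming git and find are installed; list_adoc is a hypothetical helper name, and nothing is assumed about where in the repository the configuration pages live.

```shell
#!/bin/sh
# list_adoc DIR: print every AsciiDoc source file under DIR, sorted,
# so the sources can be read in a pager or editor without a browser.
list_adoc() {
    find "$1" -type f -name '*.adoc' | sort
}
```

Usage: run `git clone --depth 1 https://github.com/raspberrypi/documentation` (--depth 1 fetches only the latest commit), then `list_adoc documentation | less` to pick a file to read.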

--
__ __
#_ < |\| |< _#
