Thu 13 Apr: TeX Hour: unlatex; Fermi estimate for reprocessing the arXiv: 6:30 to 7:30 BST

Newsgroups: comp.text.tex
Date: Wed, 12 Apr 2023 12:32:09 -0700 (PDT)
From: jfine2...@gmail.com (Jonathan Fine)
 by: Jonathan Fine - Wed, 12 Apr 2023 19:32 UTC

Hi

Well, next week, on Monday 17 April 1:00-5:00pm Eastern Time, we have the very first arXiv Access Forum. I'm looking forward to that. I'm most grateful to the organisers, and I hope they're not overwhelmed. They have a massive responsibility. As do the esteemed presenters and panelists, and the many participants.

Tomorrow's TeX Hour (Thursday 6:30 to 7:30pm UK Summer Time) is about my emerging unlatex tool for reprocessing TeX documents, to provide more accessible outputs. I'm close to creating in Python an equivalent to TeX's internal boxes. This involves an interesting parser + builder combination, linked by a stream of control symbols, constructors and leaf nodes.
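
To make the parser + builder idea concrete, here is a toy sketch in Python. It is not unlatex itself, and it may not match how unlatex works. The event names (OPEN, CLOSE, LEAF), the Box class, the parse and build functions, and the brace-only input syntax are all assumptions made up for illustration. Roughly, OPEN plays the part of a constructor, CLOSE of a control symbol, and LEAF of a leaf node.

```python
from dataclasses import dataclass, field

# Hypothetical event kinds for the stream that links parser and builder.
OPEN, CLOSE, LEAF = "open", "close", "leaf"


@dataclass
class Box:
    """A toy stand-in for a TeX box: a kind plus a list of children."""
    kind: str
    children: list = field(default_factory=list)


def parse(text):
    """Turn a string into a stream of (kind, value) events.

    '{' opens a group, '}' closes it, any other character is a leaf.
    """
    for ch in text:
        if ch == "{":
            yield (OPEN, "group")
        elif ch == "}":
            yield (CLOSE, None)
        else:
            yield (LEAF, ch)


def build(events):
    """Consume the event stream and build a tree of Box objects."""
    root = Box("root")
    stack = [root]  # innermost open box is at the end
    for kind, value in events:
        if kind == OPEN:
            box = Box(value)
            stack[-1].children.append(box)
            stack.append(box)
        elif kind == CLOSE:
            if len(stack) == 1:
                raise ValueError("unbalanced close")
            stack.pop()
        else:  # LEAF
            stack[-1].children.append(value)
    if len(stack) != 1:
        raise ValueError("unclosed group")
    return root


tree = build(parse("a{bc}d"))
```

Here tree is a root box holding the leaf "a", a nested group box holding "b" and "c", and the leaf "d". The point of the split is that the parser never needs to know what the boxes look like, and the builder never needs to know the input syntax.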

Here are some arXiv stats (in round numbers):

Total number of submissions: 2.25 million.
Downloads per month: 25 million.
Seconds in a month: 2.6 million.
Registered for arXiv Access Forum: 2,000 people.

Why seconds in a month? Well, it's approximately equal to the total number of submissions. So we can make a Fermi estimate as to how long it will take to reprocess the entire arXiv to get accessible outputs (assuming suitable software).

Suppose we have a desktop PC with 12 cores, so 24 threads, of which about 20 are doing useful work. On such a machine, if not bottlenecked, we could do the whole lot in a month provided each item takes only about 20 seconds to process. The download might take a while, and the electricity would be about £150 (or $150).
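
The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the three constants are the round numbers from this post, the function names are my own, and it assumes perfect parallelism with no I/O or download bottleneck.

```python
SUBMISSIONS = 2_250_000        # total arXiv submissions (round number from the post)
SECONDS_PER_MONTH = 2_600_000  # seconds in a month (round number from the post)
WORKERS = 20                   # threads doing useful work (from the post)


def months_to_reprocess(seconds_per_item, workers=WORKERS):
    """Wall-clock months to process every submission, assuming perfect parallelism."""
    total_cpu_seconds = SUBMISSIONS * seconds_per_item
    return total_cpu_seconds / (workers * SECONDS_PER_MONTH)


def seconds_budget_per_item(months=1.0, workers=WORKERS):
    """Largest per-item time that still finishes within the given number of months."""
    return months * workers * SECONDS_PER_MONTH / SUBMISSIONS


if __name__ == "__main__":
    print(f"At 20 s per item: {months_to_reprocess(20):.2f} months")
    print(f"Budget for one month: {seconds_budget_per_item():.1f} s per item")
```

With these numbers, 20 seconds per item comes to about 0.87 of a month, and the one-month budget is about 23 seconds per item.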

Harder is to make a Fermi estimate for creating suitable software, and yet harder is writing and testing the software. Also very important is field testing of its outputs for accessibility.

Here's the URL for Monday's arXiv forum: https://accessibility2023.arxiv.org/
The TeX Hour zoom URL: https://us02web.zoom.us/j/78551255396?pwd=cHdJN0pTTXRlRCtSd1lCTHpuWmNIUT09
The home page for tomorrow's TeX Hour: https://texhour.github.io/2023/04/13/unlatex-results-prospects/

wishing you happy arXiving

Jonathan
