
Does the theory and algorithms of compiler design also apply to data formats?

From: coste...@mitre.org (Roger L Costello)
Newsgroups: comp.compilers
Subject: Does the theory and algorithms of compiler design also apply to data formats?
Date: Sat, 22 Jan 2022 23:54:30 +0000
Organization: Compilers Central
Message-ID: <22-01-100@comp.compilers>

Hello Compiler Experts!

The books that I've read always talk about applying compiler theory and
algorithms to programming languages. But there are other kinds of languages
such as XML, JSON, Comma-Separated-Values (CSV). And aren't data formats such
as JPEG, PowerPoint (ppt), Excel (xls) also languages? Do the rich theory
and vast algorithms of compilers apply to these non-programming languages? Has
anyone created a Bison parser for JPEG? For JSON? For CSV?

/Roger
[You could, but for the most part their syntax is so simple that a
formal parser would be overkill. For example, JSON has a handful of
atoms and only two data structures, a sequential list and a key:value
object. Everything else is the semantics. The Microsoft formats like
docx, xlsx, and pptx are in fact zip files containing XML files. Unzip
one and take a look.
Also look at XDR, a widely used network data format, and rpcgen, which compiles
an XDR description into code to read and write it. -John]
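To make the moderator's point concrete, here is a minimal recursive-descent parser for JSON, sketched in Python. The grammar really is a handful of atoms (strings, numbers, true, false, null) plus arrays and objects, so one method per production is enough and no parser generator is needed. The class and function names are hypothetical, `\uXXXX` string escapes are deliberately left out, and real code should simply use the standard `json` module.

```python
import re

# Hypothetical sketch, not the stdlib json module.
# JSON number syntax (RFC 8259): optional minus, int part, fraction, exponent.
NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")
ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
           "n": "\n", "r": "\r", "t": "\t"}


class JSONSketchParser:
    """Recursive descent: one method per production of the JSON grammar."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        # Skip insignificant whitespace, then return the next char ("" at end).
        while self.pos < len(self.text) and self.text[self.pos] in " \t\r\n":
            self.pos += 1
        return self.text[self.pos:self.pos + 1]

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def value(self):
        # value := object | array | string | number | true | false | null
        ch = self.peek()
        if ch == "{":
            return dict(self.sequence("{", "}", self.member))
        if ch == "[":
            return self.sequence("[", "]", self.value)
        if ch == '"':
            return self.string()
        for word, result in (("true", True), ("false", False), ("null", None)):
            if self.text.startswith(word, self.pos):
                self.pos += len(word)
                return result
        m = NUMBER.match(self.text, self.pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {self.pos}")
        self.pos = m.end()
        is_float = m.group(1) or m.group(2)  # has a fraction or an exponent
        return float(m.group()) if is_float else int(m.group())

    def sequence(self, open_, close, item):
        # Shared rule for arrays and objects: open ( item (',' item)* )? close
        self.expect(open_)
        items = []
        if self.peek() != close:
            items.append(item())
            while self.peek() == ",":
                self.pos += 1
                items.append(item())
        self.expect(close)
        return items

    def member(self):
        # member := string ':' value
        key = self.string()
        self.expect(":")
        return key, self.value()

    def string(self):
        # string := '"' char* '"'   (unicode escapes are not handled here)
        self.expect('"')
        out = []
        while self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            if ch == '"':
                return "".join(out)
            if ch != "\\":
                out.append(ch)
                continue
            esc = self.text[self.pos:self.pos + 1]
            if esc not in ESCAPES:
                raise SyntaxError(f"unsupported escape at offset {self.pos}")
            out.append(ESCAPES[esc])
            self.pos += 1
        raise SyntaxError("unterminated string")


def parse(text):
    """Parse a complete JSON text, rejecting trailing garbage."""
    parser = JSONSketchParser(text)
    result = parser.value()
    if parser.peek():
        raise SyntaxError(f"trailing input at offset {parser.pos}")
    return result
```

For example, `parse('{"a": [1, 2.5, true, null]}')` yields `{'a': [1, 2.5, True, None]}`. Arrays and objects share one `sequence` rule because they differ only in their brackets and item production.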

Re: Does the theory and algorithms of compiler design also apply to data formats?

From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Does the theory and algorithms of compiler design also apply to data formats?
Date: Sat, 22 Jan 2022 20:33:49 -0800 (PST)
Organization: Compilers Central
Message-ID: <22-01-102@comp.compilers>
References: <22-01-100@comp.compilers>

On Saturday, January 22, 2022 at 5:54:52 PM UTC-8, Roger L Costello wrote:

> The books that I've read always talk about applying compiler theory and
> algorithms to programming languages. But there are other kinds of languages
> such as XML, JSON, Comma-Separated-Values (CSV). And aren't data formats such
> as JPEG, PowerPoint (ppt), Excel (xls) also languages? Do the rich theory
> and vast algorithms of compilers apply to these non-programming languages? Has
> anyone created a Bison parser for JPEG? For JSON? For CSV?

In the cases where a data format has enough structure to be parsable with
compiler tools, it is usually named a programming language. (Unless you
define programming language as only something that can be converted
into executable object code for actual hardware.)

JPEG files are actually EXIF files containing JPEG image data.
The EXIF part contains other information such as date, time, shutter
speed, and pretty much anything related to the camera and settings
that one could think of.
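At the container level a JPEG file is just a flat sequence of marker segments: an SOI marker (`FF D8`), then segments made of `FF`, a marker byte and a two-byte big-endian length that counts itself but not the marker. Exif metadata sits in an APP1 segment (marker `FF E1`) whose payload begins with `Exif\0\0`. The sketch below (function names are hypothetical) assumes a well-formed file, stops at the start-of-scan marker (`FF DA`) where entropy-coded image data begins, and ignores fill bytes and standalone markers.

```python
import struct

# Marker bytes that follow 0xFF (hypothetical sketch; names are illustrative).
SOS, APP1 = 0xDA, 0xE1


def jpeg_segments(data):
    """Return (marker, payload) pairs for the header segments of a JPEG file.

    Assumes a well-formed file and stops at SOS, after which the
    entropy-coded image data follows.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments, pos = [], 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, data[pos + 4:pos + 2 + length]))
        if marker == SOS:
            break
        pos += 2 + length
    return segments


def has_exif(data):
    # Exif metadata lives in an APP1 segment whose payload starts "Exif\0\0".
    return any(marker == APP1 and payload.startswith(b"Exif\x00\x00")
               for marker, payload in jpeg_segments(data))
```

A loop over length-prefixed records is all the "parsing" this layer needs; the interesting structure (the TIFF-style directory inside the Exif payload, the compressed image data) is interpreted by format-specific code rather than by a grammar.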

Many data formats are simply the most direct way to write out the internal
data structures of some program.
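As an illustration of that kind of format, here is a hypothetical fixed-layout record (the `Sample` fields and the layout string are invented for this sketch) written out more or less as a program holds it, using Python's `struct` module. The whole "format" is the field order, sizes and byte order; there is nothing for a grammar to describe.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: little-endian u32 id, f64 value, 8-byte ASCII name.
RECORD = struct.Struct("<Id8s")


@dataclass
class Sample:
    ident: int
    value: float
    name: str


def dump(samples):
    # The file is just the records laid end to end ("8s" pads with NULs).
    return b"".join(RECORD.pack(s.ident, s.value, s.name.encode("ascii"))
                    for s in samples)


def load(blob):
    # Reading is the exact inverse: slice fixed-size records and unpack.
    return [Sample(i, v, n.rstrip(b"\x00").decode("ascii"))
            for i, v, n in RECORD.iter_unpack(blob)]
```

Each record here occupies `RECORD.size` (20) bytes, so two samples dump to 40 bytes and `load(dump(xs)) == xs` for short ASCII names.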

PostScript is a programming language designed for controlling
printers, but it does have many of the characteristics of a more
general purpose language. It is mostly meant to be written by
programs, but can be written by people. Some PostScript
programs contain macros to parse data inside the file and
format it for output, such as plots.
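To show how a "data file" in this style is executed rather than parsed into a tree, here is a toy postfix interpreter in the spirit of PostScript. It is not PostScript: it models only `add`, `sub`, `mul`, `dup`, `exch`, `pop`, `def`, `/name` literals and name lookup, with no procedures, strings or graphics operators, and the `run` function is a hypothetical name.

```python
def run(source):
    """Execute a toy PostScript-like program and return the operand stack.

    Hypothetical sketch: whitespace-separated tokens only, no procedures.
    """
    stack, names = [], {}
    arithmetic = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    for tok in source.split():
        if tok in arithmetic:
            b, a = stack.pop(), stack.pop()   # top of stack is the 2nd operand
            stack.append(arithmetic[tok](a, b))
        elif tok == "dup":
            stack.append(stack[-1])
        elif tok == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif tok == "pop":
            stack.pop()
        elif tok == "def":
            value, key = stack.pop(), stack.pop()
            names[key] = value
        elif tok.startswith("/"):
            stack.append(tok[1:])             # literal name: pushed, not run
        elif tok in names:
            stack.append(names[tok])          # executable name: look it up
        else:
            stack.append(float(tok) if "." in tok else int(tok))
    return stack
```

For instance, `run("/x 3 def x x mul 1 sub")` leaves `[8]` on the stack: the file defines a name and then computes with it, so the line between data and program is already gone.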

TeX is a document description language that also has
many general language features. It is pretty much not
parsable with compiler tools, as just about everything
can be changed inside the program, such as which
characters are letters. Since changes take effect
right away, the parser can't do much lookahead.
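A toy tokenizer shows why. Below, the set of "letter" characters lives in a mutable table, and a hypothetical `\makeletter{c}` command (invented for this sketch; real TeX uses `\catcode`, and LaTeX's `\makeatletter` does this for `@`) changes it in the middle of the input. Because the change applies to the very next characters, the tokenizer must execute commands as it reads; a separate lexing pass that ran ahead would split control-sequence names wrongly.

```python
import string


def tokenize(text):
    """Toy TeX-like tokenizer whose notion of 'letter' can change mid-stream.

    Hypothetical command: \\makeletter{c} adds c to the letter class at once.
    Spaces, groups and the other TeX categories are not modeled.
    """
    letters = set(string.ascii_letters)  # mutable category table
    tokens, i = [], 0
    while i < len(text):
        if text[i] != "\\":
            tokens.append(("char", text[i]))
            i += 1
            continue
        j = i + 1
        # A control word is '\' plus the longest run of *current* letters.
        while j < len(text) and text[j] in letters:
            j += 1
        if j == i + 1:
            j += 1  # control symbol: '\' plus a single non-letter
        name = text[i + 1:j]
        tokens.append(("cs", name))
        i = j
        # Execute the command now, so it affects the very next characters.
        arg = text[i:i + 3]
        if name == "makeletter" and len(arg) == 3 and arg[0] == "{" and arg[2] == "}":
            letters.add(arg[1])
            i += 3
    return tokens
```

The same input `\foo@bar` becomes the control word `foo` followed by four characters, or the single control word `foo@bar`, depending on a command that appeared earlier in the same stream.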

Metafont is a language, meant to be used with TeX,
for designing fonts. It looks and works more
like a programming language, though with some features
that usual programming languages don't have. Among
others, beyond the usual assignment statement, it has
equations that define relationships between variables more generally.
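For example, in Metafont one can state `x2 - x1 = 10;` and `x1 + x2 = 30;` and let the system work out both unknowns, instead of assigning them in order. Metafont itself solves linear equations incrementally as they are read; the batch sketch below only imitates the idea with exact Gaussian elimination over `fractions.Fraction`. The `solve` function and its coefficient-dictionary input are hypothetical and are not Metafont syntax.

```python
from fractions import Fraction


def solve(equations, names):
    """Solve linear equations given as ({var: coeff}, constant) pairs.

    Hypothetical sketch: each pair means sum(coeff * var) == constant.
    Assumes exactly one solution; raises ValueError otherwise.
    """
    n = len(names)
    rows = [[Fraction(coeffs.get(v, 0)) for v in names] + [Fraction(const)]
            for coeffs, const in equations]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError(f"{names[col]} is not determined")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Normalize the pivot row, then eliminate this column everywhere else.
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # Any extra (redundant) rows must have reduced to 0 == 0.
    for row in rows[n:]:
        if row[-1] != 0:
            raise ValueError("inconsistent equations")
    return {v: rows[i][-1] for i, v in enumerate(names)}
```

With the two equations above, `solve([({"x2": 1, "x1": -1}, 10), ({"x1": 1, "x2": 1}, 30)], ["x1", "x2"])` gives `x1 = 10` and `x2 = 20`, regardless of the order in which the equations are listed.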

In all these cases, and I am sure more, the difference
between data and program blurs just enough.

Re: Does the theory and algorithms of compiler design also apply to data formats?

From: matt.tim...@gmail.com (matt.ti...@gmail.com)
Newsgroups: comp.compilers
Subject: Re: Does the theory and algorithms of compiler design also apply to data formats?
Date: Sun, 23 Jan 2022 06:58:02 -0800 (PST)
Organization: Compilers Central
Message-ID: <22-01-104@comp.compilers>
References: <22-01-100@comp.compilers>

On Saturday, 22 January 2022 at 20:54:52 UTC-5, Roger L Costello wrote:
> Hello Compiler Experts!
>
> The books that I've read always talk about applying compiler theory and
> algorithms to programming languages. But there are other kinds of languages
> such as XML, JSON, Comma-Separated-Values (CSV). And aren't data formats such
> as JPEG, PowerPoint (ppt), Excel (xls) also languages? Do the rich theory
> and vast algorithms of compilers apply to these non-programming languages? Has
> anyone created a Bison parser for JPEG? For JSON? For CSV?

As the moderator indicates, these kinds of data formats are designed to be
simple, and so it's not usually useful to use grammar-based parser generators
for the data format itself.

SGML is a notable exception to this. The standard that defines it is large
and its grammar is complicated. It wouldn't be crazy to use a parser
generator for XML either.

For a lot of these data formats, though, you can apply schemas of some sort to
the data (SGML DTDs, XML schema, JSON schema, etc.), and when the data is
anticipated to represent a *document*, as in SGML or XML, these schemas are
basically a graph of nested regular expressions much like a grammar, and a lot
of parsing theory applies.
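For example, a DTD content model such as `(title, para+, note?)` is literally a regular expression over child element names. The sketch below translates such a model into a Python regex and checks an element's children by spelling their names as a string. The helpers `content_model_to_regex` and `valid` are hypothetical and handle only names, `,`, `|`, `?`, `*`, `+` and parentheses (no `#PCDATA`, no SGML `&` connector); real validators typically compile content models into finite automata instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helpers: a sketch of DTD content models as regexes.
TOKENS = re.compile(r"[\w.-]+|[(),|?*+]")


def content_model_to_regex(model):
    """Translate a DTD-style content model into a regex over child names.

    Handles element names, ',', '|', '?', '*', '+' and parentheses only
    (no #PCDATA, no SGML '&' connector, no inclusions or exclusions).
    """
    out = []
    for tok in TOKENS.findall(model):
        if tok == "(":
            out.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation
        elif tok in (")", "|", "?", "*", "+"):
            out.append(tok)
        else:
            out.append(f"(?:{re.escape(tok)} )")  # one child name + separator
    return re.compile("".join(out))


def valid(element, model):
    # Spell the children as "title para para ", then match the whole string.
    children = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(children) is not None
```

A `<section>` whose children are `title, para, para, note` matches `(title, para+, note?)`, while one that starts with a `para` does not.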

Furthermore, document *processing*, as in generating a printed manual from the
structured document that defines its parts, involves applying rules to
structures that are recognized in the content. This is syntax-directed
translation (https://en.wikipedia.org/wiki/Syntax-directed_translation), and
all the related compiler theory applies. In some ways it is easier, because
the content you're translating is a tree instead of flat text, but in some
ways it is more difficult, because the job is to implement a manual human
process instead of a language that was designed to be parsed.
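A small sketch of syntax-directed translation over a document tree: each element type gets a rule, the section number is passed down as an inherited attribute, and each call returns its translated text as a synthesized attribute. The element names (`manual`, `section`, `title`, `para`), the `translate` function and the numbered plain-text output are assumptions for illustration; the tree comes from Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def translate(node, number=""):
    """Hypothetical syntax-directed translation to numbered plain text.

    `number` is an inherited attribute (the enclosing section number);
    the returned string is the synthesized attribute.
    """
    if node.tag == "title":
        label = f"{number} " if number else ""
        return label + "".join(node.itertext()).strip() + "\n"
    if node.tag == "para":
        # Normalize whitespace and indent paragraph text.
        return "    " + " ".join("".join(node.itertext()).split()) + "\n"
    # Container rule (manual, section, anything else): number child sections.
    out, count = [], 0
    for child in node:
        if child.tag == "section":
            count += 1
            child_number = f"{number}.{count}" if number else str(count)
            out.append(translate(child, child_number))
        else:
            out.append(translate(child, number))
    return "".join(out)
```

Given a `<manual>` with a title and two sections, the first containing a nested subsection, the output numbers the headings `1`, `1.1` and `2`, exactly as attribute evaluation in a compiler would propagate context down and results up.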

Re: Does the theory and algorithms of compiler design also apply to data formats?

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.compilers
Subject: Re: Does the theory and algorithms of compiler design also apply to data formats?
Date: Sun, 23 Jan 2022 21:05:40 -0000 (UTC)
Organization: news.netcologne.de
Message-ID: <22-01-108@comp.compilers>
References: <22-01-100@comp.compilers> <22-01-102@comp.compilers>

gah4 <gah4@u.washington.edu> wrote:

> In the cases where a data format has enough structure to be parsable with
> compiler tools, it is usually named a programming language.

I think STEP (the CAD graphics format) is an exception.

A language called EXPRESS (specified in something like BNF) is used
to specify a "schema", and this specification can then be used to
write parsers for the actual file. All of this is specified in
standards which are quite expensive.
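For a flavour of the data side, a STEP exchange file (ISO 10303-21, the "Part 21" clear-text encoding) stores entity instances such as `#10=CARTESIAN_POINT('',(0.,0.,1.5));`. Reading one such line is a small recursive-descent job. The sketch below handles only a subset (integers, reals, strings without doubled-quote escapes, `#n` references, `$` for unset, `*` for derived, enumerations like `.T.`, and nested lists), and does no EXPRESS schema checking, which is where the real complexity lives. `tokenize` and `parse_instance` are hypothetical names.

```python
import re

# Hypothetical sketch: a subset of Part 21 token classes, tried in order.
TOKEN = re.compile(r"""\s*(?:
    (?P<ref>\#\d+)
  | (?P<num>[+-]?\d+(?:\.\d*)?(?:[Ee][+-]?\d+)?)
  | (?P<str>'[^']*')
  | (?P<enum>\.[A-Z_][A-Z0-9_]*\.)
  | (?P<kw>[A-Z_][A-Z0-9_]*)
  | (?P<punct>[=(),;$*])
)""", re.VERBOSE)


def tokenize(line):
    tokens, pos, line = [], 0, line.rstrip()
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("eof", ""))  # sentinel so lookahead never runs off the end
    return tokens


def parse_instance(line):
    """Parse '#id=ENTITY(params);' into (id, entity_name, params)."""
    toks, pos = tokenize(line), 0

    def take(kind, value=None):
        nonlocal pos
        k, v = toks[pos]
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind}, got {v!r}")
        pos += 1
        return v

    def param():
        nonlocal pos
        kind, v = toks[pos]
        if (kind, v) == ("punct", "("):
            return plist()
        pos += 1
        if kind == "num":
            return float(v) if "." in v or "e" in v.lower() else int(v)
        if kind == "str":
            return v[1:-1]
        if kind == "ref":
            return ("ref", int(v[1:]))
        if kind == "enum":
            return ("enum", v.strip("."))
        if kind == "punct" and v in ("$", "*"):
            return None if v == "$" else "*"  # unset vs. derived attribute
        raise SyntaxError(f"unexpected {v!r}")

    def plist():
        # plist := '(' ( param (',' param)* )? ')'
        nonlocal pos
        take("punct", "(")
        items = []
        if toks[pos] != ("punct", ")"):
            items.append(param())
            while toks[pos] == ("punct", ","):
                pos += 1
                items.append(param())
        take("punct", ")")
        return items

    ident = int(take("ref")[1:])
    take("punct", "=")
    name = take("kw")
    params = plist()
    take("punct", ";")
    return ident, name, params
```

`parse_instance("#10=CARTESIAN_POINT('',(0.,0.,1.5));")` returns `(10, 'CARTESIAN_POINT', ['', [0.0, 0.0, 1.5]])`. The syntax is easy; knowing that the second attribute must be a list of three coordinates is the job of the EXPRESS schema.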

When I had occasion to write out CAD data from programs I wrote
myself, I looked at this workflow for an hour and decided to use
IGES instead.
