Subject                               Author
* Re: Compute working days            Bruno Lirio
+- Re: Compute working days           Mark Lawrence
+- Re: Compute working days           MRAB
+- what to do with multiple BOMs      Robin Becker
`- Re: what to do with multiple BOMs  MRAB

Re: Compute working days

<89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14663&group=comp.lang.python#14663

Newsgroups: comp.lang.python
Date: Wed, 18 Aug 2021 12:57:06 -0700 (PDT)
In-Reply-To: <61bd7316-a3a5-40a7-a27b-4c74cda1a819@c11g2000yqj.googlegroups.com>
References: <mailman.1839.1237036441.11746.python-list@python.org> <61bd7316-a3a5-40a7-a27b-4c74cda1a819@c11g2000yqj.googlegroups.com>
Message-ID: <89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
Subject: Re: Compute working days
From: brunobrl...@gmail.com (Bruno Lirio)
 by: Bruno Lirio - Wed, 18 Aug 2021 19:57 UTC

On Saturday, March 14, 2009 at 13:59:41 UTC-3, Casey wrote:
> How about:
> from datetime import date, timedelta
> # Define the weekday mnemonics to match the date.weekday function
> (MON, TUE, WED, THU, FRI, SAT, SUN) = range(7)
> def workdays(start_date, end_date, whichdays=(MON,TUE,WED,THU,FRI)):
>     '''
>     Calculate the number of working days between two dates inclusive
>     (start_date <= end_date).
>     The actual working days can be set with the optional whichdays
>     parameter
>     (default is MON-FRI)
>     '''
>     delta_days = (end_date - start_date).days + 1
>     full_weeks, extra_days = divmod(delta_days, 7)
>     # num_workdays = how many days/week you work * total # of weeks
>     num_workdays = (full_weeks + 1) * len(whichdays)
>     # subtract out any working days that fall in the 'shortened week'
>     for d in range(1, 8 - extra_days):
>         if (end_date + timedelta(d)).weekday() in whichdays:
>             num_workdays -= 1
>     return num_workdays
Could it include the holidays in Brazil?
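Casey's function, with the indentation lost in quoting restored, can be checked against a plain day-by-day count. The trick is that it counts full_weeks + 1 whole weeks of working days, then subtracts the working days that fall in the padding after end_date. This is an illustrative cross-check, not part of the original thread; the helper `workdays_brute_force` and the date range tested are assumptions chosen for the example.

```python
from datetime import date, timedelta

# Weekday mnemonics matching date.weekday() (Monday == 0).
(MON, TUE, WED, THU, FRI, SAT, SUN) = range(7)


def workdays(start_date, end_date, whichdays=(MON, TUE, WED, THU, FRI)):
    """Casey's closed-form count of working days, start/end inclusive."""
    delta_days = (end_date - start_date).days + 1
    full_weeks, extra_days = divmod(delta_days, 7)
    # Count one extra (partial) week as if it were a full week...
    num_workdays = (full_weeks + 1) * len(whichdays)
    # ...then remove the working days that fall after end_date in that week.
    for d in range(1, 8 - extra_days):
        if (end_date + timedelta(d)).weekday() in whichdays:
            num_workdays -= 1
    return num_workdays


def workdays_brute_force(start_date, end_date, whichdays=(MON, TUE, WED, THU, FRI)):
    """Hypothetical reference: test every day in the range one by one."""
    n = (end_date - start_date).days + 1
    return sum((start_date + timedelta(i)).weekday() in whichdays for i in range(n))


# Compare the two for every start <= end pair over a few months.
base = date(2021, 8, 1)
for i in range(60):
    for j in range(i, 120):
        s, e = base + timedelta(i), base + timedelta(j)
        assert workdays(s, e) == workdays_brute_force(s, e)

print(workdays(date(2021, 8, 16), date(2021, 8, 20)))  # Mon..Fri -> 5
```

The closed form does a constant amount of work regardless of the span, which is its advantage over the day-by-day loop.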

Re: Compute working days

<579a4be2-2a2e-48b2-bd8b-81076380254cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14664&group=comp.lang.python#14664

Newsgroups: comp.lang.python
Date: Wed, 18 Aug 2021 13:41:16 -0700 (PDT)
In-Reply-To: <89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
References: <mailman.1839.1237036441.11746.python-list@python.org>
<61bd7316-a3a5-40a7-a27b-4c74cda1a819@c11g2000yqj.googlegroups.com> <89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
Message-ID: <579a4be2-2a2e-48b2-bd8b-81076380254cn@googlegroups.com>
Subject: Re: Compute working days
From: breamore...@gmail.com (Mark Lawrence)
 by: Mark Lawrence - Wed, 18 Aug 2021 20:41 UTC

On Wednesday, August 18, 2021 at 8:57:20 PM UTC+1, Bruno Lirio wrote:
> On Saturday, March 14, 2009 at 13:59:41 UTC-3, Casey wrote:
> > How about:
> > [workdays() code snipped; quoted in full in Bruno Lirio's message]
> Could it include the holidays in Brazil?

If you know what the holidays in Brazil are, you can obviously extend this as needed, but it's a bit weird responding to a thread that's 12 years old :)

Re: Compute working days

<mailman.431.1629325996.4164.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=14665&group=comp.lang.python#14665

From: pyt...@mrabarnett.plus.com (MRAB)
Newsgroups: comp.lang.python
Subject: Re: Compute working days
Date: Wed, 18 Aug 2021 23:33:02 +0100
Message-ID: <mailman.431.1629325996.4164.python-list@python.org>
References: <mailman.1839.1237036441.11746.python-list@python.org>
<61bd7316-a3a5-40a7-a27b-4c74cda1a819@c11g2000yqj.googlegroups.com>
<89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
<dc7e79cd-7d54-cb57-51aa-356abe357b57@mrabarnett.plus.com>
In-Reply-To: <89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
 by: MRAB - Wed, 18 Aug 2021 22:33 UTC

On 2021-08-18 20:57, Bruno Lirio wrote:
> On Saturday, March 14, 2009 at 13:59:41 UTC-3, Casey wrote:
>> How about:
>> [workdays() code snipped; quoted in full in Bruno Lirio's message]
> Could it include the holidays in Brazil?
>
Yes. The algorithm calculates the number of working days assuming no
holidays, so you'd then subtract any working days between the start and
end dates that are actually holidays.
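A minimal sketch of that suggestion: count with Casey's `workdays()`, then subtract each holiday that lands on a working day inside the range. The helper name `workdays_excluding` and the holiday set are hypothetical; a real Brazilian calendar would have to be supplied from an authoritative source. The single date used here, 7 September (Brazil's Independence Day), is only an example.

```python
from datetime import date, timedelta

(MON, TUE, WED, THU, FRI, SAT, SUN) = range(7)


def workdays(start_date, end_date, whichdays=(MON, TUE, WED, THU, FRI)):
    """Casey's count of working days, inclusive, ignoring holidays."""
    delta_days = (end_date - start_date).days + 1
    full_weeks, extra_days = divmod(delta_days, 7)
    num_workdays = (full_weeks + 1) * len(whichdays)
    for d in range(1, 8 - extra_days):
        if (end_date + timedelta(d)).weekday() in whichdays:
            num_workdays -= 1
    return num_workdays


def workdays_excluding(start_date, end_date, holidays,
                       whichdays=(MON, TUE, WED, THU, FRI)):
    """Hypothetical helper: subtract holidays that fall on a working day
    within [start_date, end_date]. Holidays on non-working days or outside
    the range do not change the count."""
    count = workdays(start_date, end_date, whichdays)
    for h in set(holidays):  # set() so a date listed twice counts once
        if start_date <= h <= end_date and h.weekday() in whichdays:
            count -= 1
    return count


# Illustrative only: populate this from an authoritative holiday calendar.
example_holidays = {date(2021, 9, 7)}
print(workdays_excluding(date(2021, 9, 6), date(2021, 9, 10), example_holidays))  # 4
```

Checking `h.weekday() in whichdays` matters: a holiday that falls on a Saturday was never counted as a working day, so it must not be subtracted.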

what to do with multiple BOMs

<mailman.432.1629395139.4164.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=14671&group=comp.lang.python#14671

From: rob...@reportlab.com (Robin Becker)
Newsgroups: comp.lang.python
Subject: what to do with multiple BOMs
Date: Thu, 19 Aug 2021 14:07:43 +0100
Message-ID: <mailman.432.1629395139.4164.python-list@python.org>
References: <mailman.1839.1237036441.11746.python-list@python.org>
<61bd7316-a3a5-40a7-a27b-4c74cda1a819@c11g2000yqj.googlegroups.com>
<89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
<dc7e79cd-7d54-cb57-51aa-356abe357b57@mrabarnett.plus.com>
<ce4a146a-09c0-dca7-3caa-501867a0fa10@everest.reportlab.co.uk>
In-Reply-To: <dc7e79cd-7d54-cb57-51aa-356abe357b57@mrabarnett.plus.com>
 by: Robin Becker - Thu, 19 Aug 2021 13:07 UTC

Channeling Unicode text experts and XML people:

I have an XML entity whose initial bytes are ff fe ff fe, which the file command identifies as UTF-16, little-endian text.

I agree, but what should be done about the additional BOM?

A test output made many years ago seems to keep the extra BOM. The XML context is:

XML file 014.xml:
<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
<!ENTITY e SYSTEM "014.ent">
]>
<doc>&e;</doc>

The entity file 014.ent is BOM + BOM + "data":

b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'

The old saved test output from processing 014.xml is:

b'<doc>\xef\xbb\xbfdata</doc>'

which implies that the extra BOM in the entity has been kept and re-encoded as a UTF-8 BOM.
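Python's standard codecs reproduce that saved output exactly, which illustrates the mechanism. The `utf-16` codec consumes only the first BOM (using it to choose the byte order) and decodes the second ff fe pair as the character U+FEFF, which UTF-8 then encodes as ef bb bf. This sketch only shows how such output can arise; it does not settle which output the XML test intends.

```python
# Entity bytes from the post: UTF-16LE BOM, a second BOM, then "data".
entity = b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'

# 'utf-16' strips only the first BOM; the second survives as U+FEFF.
text = entity.decode('utf-16')
print(repr(text))  # '\ufeffdata'

# Re-encoding as UTF-8 turns U+FEFF into the bytes EF BB BF.
out = b'<doc>' + text.encode('utf-8') + b'</doc>'
print(out)  # b'<doc>\xef\xbb\xbfdata</doc>'
```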

I think the test file is wrong and that the multiple BOM characters in the entity should have been removed.
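If the extra BOMs are to be removed, as argued here, one minimal way to do it in Python is to decode with the `utf-16` codec and then strip any remaining leading U+FEFF characters. `decode_stripping_boms` is a hypothetical helper written for this sketch. Whether an XML processor should behave this way is exactly the open question.

```python
# Entity bytes from the post: UTF-16LE BOM, a second BOM, then "data".
entity = b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'


def decode_stripping_boms(raw):
    """Hypothetical helper: decode UTF-16 and drop every leading U+FEFF.

    The 'utf-16' codec consumes the first BOM to pick the byte order;
    lstrip() then removes any further U+FEFF characters at the start.
    A U+FEFF occurring later in the text is left alone.
    """
    return raw.decode('utf-16').lstrip('\ufeff')


text = decode_stripping_boms(entity)
print(repr(text))                                   # 'data'
print(b'<doc>' + text.encode('utf-8') + b'</doc>')  # b'<doc>data</doc>'
```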

Am I right?
--
Robin Becker

Re: what to do with multiple BOMs

<mailman.433.1629399343.4164.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=14673&group=comp.lang.python#14673

From: pyt...@mrabarnett.plus.com (MRAB)
Newsgroups: comp.lang.python
Subject: Re: what to do with multiple BOMs
Date: Thu, 19 Aug 2021 19:52:35 +0100
Message-ID: <mailman.433.1629399343.4164.python-list@python.org>
References: <mailman.1839.1237036441.11746.python-list@python.org>
<61bd7316-a3a5-40a7-a27b-4c74cda1a819@c11g2000yqj.googlegroups.com>
<89bfa317-950a-46b7-b5b9-ec855329dbc4n@googlegroups.com>
<dc7e79cd-7d54-cb57-51aa-356abe357b57@mrabarnett.plus.com>
<ce4a146a-09c0-dca7-3caa-501867a0fa10@everest.reportlab.co.uk>
<cd847405-ee6a-dd60-cdd5-4b0a19487d7f@mrabarnett.plus.com>
In-Reply-To: <ce4a146a-09c0-dca7-3caa-501867a0fa10@everest.reportlab.co.uk>
 by: MRAB - Thu, 19 Aug 2021 18:52 UTC

On 2021-08-19 14:07, Robin Becker wrote:
> [...]
> I think the test file is wrong and that multiple BOM chars in the entity should have been removed.
>
> Am I right?
>
The use of a BOM b'\xef\xbb\xbf' at the start of a UTF-8 file is a
Windows thing. It's not used on non-Windows systems. Putting it in the
middle, e.g. b'<doc>\xef\xbb\xbfdata</doc>', just looks wrong.

It looks like the contents of a UTF-8 file, with a BOM because it
originated on a Windows system, were read in without stripping the BOM
first.
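On the reading side, Python's standard `utf-8-sig` codec removes a single leading UTF-8 BOM (`codecs.BOM_UTF8`, the bytes ef bb bf) if one is present, while plain `utf-8` keeps it as the character U+FEFF. A short sketch:

```python
import codecs

raw = codecs.BOM_UTF8 + b'data'  # b'\xef\xbb\xbfdata'

# Plain 'utf-8' keeps the BOM as the character U+FEFF...
print(repr(raw.decode('utf-8')))      # '\ufeffdata'
# ...whereas 'utf-8-sig' strips one leading BOM if present.
print(repr(raw.decode('utf-8-sig')))  # 'data'
# Files can be read the same way, e.g. open(path, encoding='utf-8-sig').
```

Since `utf-8-sig` also decodes BOM-less input unchanged, it is a safe default when a file may or may not have come from Windows.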
