Re: Encoding of Traditional Chinese using ISO IR-100

<2d896a5f-c483-4438-9151-66718b1b5c0dn@googlegroups.com>

Newsgroups: comp.protocols.dicom
Date: Fri, 28 Apr 2023 09:50:50 -0700 (PDT)
 by: David Gobbi - Fri, 28 Apr 2023 16:50 UTC

Are you sure that these Chinese characters aren't just encoded in utf-8 in the DICOM file?

If they are, then notepad++ will probably be able to autodetect that they are utf-8 and display them properly, while applications that strictly enforce "ISO_IR 100" will display garbage.

If you can post a hex dump of one of the data elements that contains the characters, I could inspect it to see what encoding is used.
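To illustrate the "garbage" case described above, here is a small sketch using only the Python standard library. The sample text is an illustration, not taken from the file in question, and `latin-1` stands in for a reader that strictly applies ISO_IR 100 (ISO 8859-1).

```python
# Illustrative text only; not taken from the DICOM file discussed here.
sample = "醫院"

# What a writer that (incorrectly) stores UTF-8 would put in the element.
raw = sample.encode("utf-8")

# What a reader that strictly applies ISO_IR 100 (ISO 8859-1) would show:
# every byte becomes one Latin-1 character, so the text turns into mojibake.
as_latin1 = raw.decode("latin-1")

print(raw.hex(" "))
print(repr(as_latin1))

# The bytes themselves are intact, so the damage is reversible:
recovered = as_latin1.encode("latin-1").decode("utf-8")
print(recovered == sample)
```

The round trip at the end is why a hex dump is enough to identify the encoding: the display is wrong, but the stored bytes are unchanged.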

Re: Encoding of Traditional Chinese using ISO IR-100

<e75cccc4-4855-41d9-84a8-ff542d3ae520n@googlegroups.com>

 by: Sebastian Meyer - Fri, 28 Apr 2023 17:18 UTC

Hi David,
here is a dump of two of the affected tags, created with dcmdump:
>dcmdump +Qn +L +P InstitutionName +P StudyDescription "chinese_org.dcm"
(0008,0080) LO [&#169;&#201;&#164;&#175;&#186;&#238;&#166;X&#194;&#229;&#176;|] # 14, 1 InstitutionName
(0008,1030) LO [Low Dose Lung CT &#167;C&#190;&#175;&#182;q&#170;&#205;&#179;&#161;&#192;&#203;&#172;d(&#174;&#231;&#182;&#233;&#165;&#171;&#184;&#201;&#167;U)] # 44, 1 StudyDescription

Thanks for your help!
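The `&#NNN;` sequences in the dump above appear to be how dcmdump's `+Qn` option writes non-ASCII bytes, with plain ASCII bytes such as `X` and `|` printed as-is. A minimal sketch for turning such a value back into raw bytes, assuming every `&#NNN;` is a single byte value (0 to 255) and everything else is ASCII; the function name is hypothetical:

```python
import re


def dump_value_to_bytes(text: str) -> bytes:
    """Rebuild raw bytes from a value that uses &#NNN; for non-ASCII bytes."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"&#(\d+);", text):
        # Literal ASCII characters between references are bytes too.
        out += text[pos:match.start()].encode("ascii")
        out.append(int(match.group(1)))  # raises ValueError if > 255
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


# InstitutionName value copied from the dump above.
institution = "&#169;&#201;&#164;&#175;&#186;&#238;&#166;X&#194;&#229;&#176;|"
raw = dump_value_to_bytes(institution)
print(raw.hex(" "))
```

Keeping the literal ASCII characters matters for multi-byte encodings, where an ASCII-range byte can be the second half of a character.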

On Friday, 28 April 2023 at 18:50:53 UTC+2, David Gobbi wrote:
> Are you sure that these Chinese characters aren't just encoded in utf-8 in the DICOM file?
>
> If they are, then notepad++ will probably be able to autodetect that they are utf-8 and display them properly, while applications that strictly enforce "ISO_IR 100" will display garbage.
>
> If you can post a hex dump of one of the data elements that contains the characters, I could inspect it to see what encoding is used.

Re: Encoding of Traditional Chinese using ISO IR-100

<393f6adf-483b-4c17-9d59-e7161f119b73n@googlegroups.com>

 by: David Gobbi - Fri, 28 Apr 2023 17:46 UTC

The encoding is Big5. I'm surprised that Notepad++ could autodetect it. Here is how I checked the encoding in Python:

# 88 ('X') and 124 ('|') from the dump are Big5 trail bytes, so they belong in the list
b = bytearray([169,201,164,175,186,238,166,88,194,229,176,124])
print(b.decode('big5'))
怡仁綜合醫院

I tried UTF-8 first, then Big5, and voila, that's what it was.
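The trial-and-error described here can be sketched as a loop over candidate encodings with strict decoding. The candidate list and the function name are assumptions for illustration. The byte list is the InstitutionName value from the dump, including 0x58 (`X`) and 0x7C (`|`), which are the second bytes of two-byte Big5 characters.

```python
def first_strict_decoding(raw: bytes, candidates=("utf-8", "big5", "gb18030")):
    """Return (encoding, text) for the first candidate that decodes without error."""
    for name in candidates:
        try:
            return name, raw.decode(name)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue
    return None, None


# InstitutionName bytes from the dcmdump output earlier in the thread.
raw = bytes([169, 201, 164, 175, 186, 238, 166, 88, 194, 229, 176, 124])

encoding, text = first_strict_decoding(raw)
print(encoding, text)
```

A clean strict decode is evidence rather than proof: several multi-byte encodings can accept the same bytes, and single-byte ones such as Latin-1 never fail, so the result still needs a human check that the text makes sense.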

Although Big5 isn't part of the DICOM standard, I have a tool that can dump these files:
https://github.com/dgobbi/vtk-dicom/wiki/dicomdump
https://github.com/dgobbi/vtk-dicom/releases/tag/v0.8.14

dicomdump --charset big5 chinese_org.dcm

Re: Encoding of Traditional Chinese using ISO IR-100

<9842ef35-25d5-4e77-8a7e-6fafda312e2fn@googlegroups.com>

 by: David Gobbi - Fri, 28 Apr 2023 17:53 UTC

On Friday, 28 April 2023 at 11:46:49 UTC-6, David Gobbi wrote:

> dicomdump --charset big5 chinese_org.dcm

Ah, I forgot that for this to work, the original CharacterSet has to be removed first, e.g.

dcmodify -e 0008,0005 chinese_org.dcm
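Removing (0008,0005) only helps a tool that is told the charset by hand. The file still carries Big5 bytes, and Big5 is not a DICOM-defined character set. One possible follow-up, which is not something proposed in this thread, would be to transcode the affected values to UTF-8, for which DICOM defines the term ISO_IR 192. Below is a minimal sketch of the byte-level conversion only, using the Python standard library. The function name is hypothetical, and reading or writing the DICOM file itself is left out.

```python
def big5_value_to_utf8(raw: bytes) -> bytes:
    """Transcode one Big5 text value to UTF-8, padded to even length."""
    text = raw.decode("big5").rstrip(" ")  # drop any existing space padding
    utf8 = text.encode("utf-8")
    if len(utf8) % 2:
        utf8 += b" "  # DICOM text values are space-padded to an even length
    return utf8


# InstitutionName bytes from the dcmdump output earlier in the thread.
raw = bytes([0xA9, 0xC9, 0xA4, 0xAF, 0xBA, 0xEE, 0xA6, 0x58, 0xC2, 0xE5, 0xB0, 0x7C])
converted = big5_value_to_utf8(raw)
print(converted.decode("utf-8"), len(converted))
```

Every text element in the data set would need the same treatment before (0008,0005) could honestly be set to ISO_IR 192.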

Re: Encoding of Traditional Chinese using ISO IR-100

<30dd1659-02f5-4e72-a0a6-7b0cfcb84d1fn@googlegroups.com>

 by: Sebastian Meyer - Fri, 28 Apr 2023 18:07 UTC

Thank you so much!

I was surprised by this capability of notepad++ as well. Then I learned that the primary author has Taiwanese roots: https://donho.github.io/

On Friday, 28 April 2023 at 19:53:58 UTC+2, David Gobbi wrote:
> On Friday, 28 April 2023 at 11:46:49 UTC-6, David Gobbi wrote:
>
> > dicomdump --charset big5 chinese_org.dcm
>
> Ah, I forgot that for this to work, the original CharacterSet has to be removed first, e.g.
>
> dcmodify -e 0008,0005 chinese_org.dcm
