Re: Some advice required [OT]

Some advice required [OT]

<7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6438&group=comp.lang.ada#6438

X-Received: by 2002:a37:bd05:: with SMTP id n5mr11302562qkf.293.1640596910990;
Mon, 27 Dec 2021 01:21:50 -0800 (PST)
X-Received: by 2002:a05:6902:703:: with SMTP id k3mr19954981ybt.31.1640596910735;
Mon, 27 Dec 2021 01:21:50 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 01:21:50 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=213.166.55.173; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 213.166.55.173
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
Subject: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Mon, 27 Dec 2021 09:21:50 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 83

by: Laurent - Mon, 27 Dec 2021 09:21 UTC

Hi all

My problem is not directly related to Ada but on how to solve it in general.

Also writing via the web interface of google groups :(

I have to do statistics on the results of antimicrobial susceptibility testings.
I have to keep only one strain/patient and the most resistant one.
Until now I have been doing it manually by staring for hours at Excel sheets.
I am trying to get it automated but I don't know how to solve my problem.

I have tried calculating a checksum from the results but I have cases
which are unclear/collide.

The result for a strain is one row, the results are in columns.

I treat the results in blocks of 3.
S has a value of 1, I =2 and R=3, empty cells = 0

Without weights, SRS (1+3+1), RSS (3+1+1) and SSR (1+1+3) all give 5.
I would have to keep all 3 because they are different.

I thought that weighting the position would solve the collisions
but nope.
The first cell has a weight of 1, the 2nd of 2 and the 3rd of 3.
S has a value of 1, I = 2 and R = 3, empty cells = 0
With weights, RRS (1*3+2*3+3*1) and SSR (1*1+2*1+3*3) both give 12.

Is there a better way of doing this?

What I have so far as VBA code:

Public Function Test_Checksum(rng_Range As Range) As String

    Dim rng_Cell As Range
    Dim int_Counter As Integer   ' position inside the current block of 3 (1..3)
    Dim str_Result As String
    Dim i As Long                ' weighted sum of the current block

    int_Counter = 1

    For Each rng_Cell In rng_Range

        If rng_Cell.Value = "S" Then
            i = i + 1 * int_Counter
        ElseIf rng_Cell.Value = "I" Then
            i = i + 2 * int_Counter
        ElseIf rng_Cell.Value = "R" Then
            i = i + 3 * int_Counter
        End If
        ' an empty cell adds 0

        If int_Counter = 3 Then
            ' block complete: append its sum as two digits, start a new block
            str_Result = str_Result & Format(i, "00")
            i = 0
            int_Counter = 0
        End If

        int_Counter = int_Counter + 1

    Next rng_Cell

    ' if the loop ends in the middle of a block, append the partial sum
    If int_Counter <> 1 Then
        str_Result = str_Result & Format(i, "00")
    End If

    Test_Checksum = str_Result
End Function
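
For readers without Excel, the collision described above can be reproduced with a minimal Python sketch of the same triplet scoring. The names `VALUE` and `triplet_score` are made up for this illustration, and a profile is assumed to be a plain string with one character per cell.

```python
# Numeric value per result; any other character (an empty cell) counts as 0.
VALUE = {"S": 1, "I": 2, "R": 3}

def triplet_score(results):
    """Weighted sum per block of 3 cells, each block written as two digits."""
    parts = []
    for start in range(0, len(results), 3):
        block = results[start:start + 3]
        # Position weights 1, 2, 3 inside each block.
        total = sum(VALUE.get(cell, 0) * weight
                    for weight, cell in enumerate(block, start=1))
        parts.append(f"{total:02d}")
    return "".join(parts)

print(triplet_score("RRS"))  # 12
print(triplet_score("SSR"))  # 12, the same score for a different profile
```

Two different profiles map to the same score because a sum throws away which cell contributed what, so no choice of small weights can make such a score unique.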

Thank you very much

Kind regards

Laurent

Re: Some advice required [OT]

<j2tlk8FneraU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=6439&group=comp.lang.ada#6439

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 13:16:24 +0200
Organization: Tidorum Ltd
Lines: 23
Message-ID: <j2tlk8FneraU1@mid.individual.net>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net +V6k4na1cw+MOoT5tJgSkg/RNRzXzuo2aKMrgM2zEhT1gRpT7M
Cancel-Lock: sha1:nZE+hDCqsglejxnw5AKb2uLxc2c=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0)
Gecko/20100101 Thunderbird/78.14.0
In-Reply-To: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
Content-Language: en-US

by: Niklas Holsti - Mon, 27 Dec 2021 11:16 UTC

On 2021-12-27 11:21, Laurent wrote:
> Hi all
>
> My problem is not directly related to Ada but on how to solve it in general.
>
> Also writing via the web interface of google groups :(
>
> I have to do statistics on the results of antimicrobial susceptibility testings.
> I have to keep only one strain/patient and the most resistant one.
> Until now I have been doing it manually by staring for hours at Excel sheets.
> I am trying to get it automated but I don't know how to solve my problem.

[ problem description snipped ]

Sorry, but I found your problem description impossible to understand.
Try to describe more clearly the experiment that is done, the structure
of the data the experiment provides (the meaning of the Excel rows and
columns), and the statistic you want to compute.

Also, if you do not intend to implement the solution in Ada, this is not
the right group to discuss it.

Re: Some advice required [OT]

<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6440&group=comp.lang.ada#6440

X-Received: by 2002:a05:620a:2596:: with SMTP id x22mr11606666qko.408.1640608146984;
Mon, 27 Dec 2021 04:29:06 -0800 (PST)
X-Received: by 2002:a05:6902:1246:: with SMTP id t6mr21398578ybu.305.1640608146764;
Mon, 27 Dec 2021 04:29:06 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 04:29:06 -0800 (PST)
In-Reply-To: <j2tlk8FneraU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=213.166.55.173; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 213.166.55.173
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Mon, 27 Dec 2021 12:29:06 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 66

by: Laurent - Mon, 27 Dec 2021 12:29 UTC

On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:

> Sorry, but I found your problem description impossible to understand.
> Try to describe more clearly the experiment that is done, the structure
> of the data the experiment provides (the meaning of the Excel rows and
> columns), and the statistic you want to compute.

Sorry tried to keep it short, was too short.

Columns are the antimicrobial drugs
Rows are the microorganism.

So every cell contains a result of S, I, R or simply an empty cell

S = Susceptible
I = Intermediate
R = Resistant

empty cell <S<I<R

If a patient has 3 strains of the same microorganism but with different resistance profiles
I have to find the most resistant one. Or if they are different I keep them all.

I have no idea how to explain what I am doing to the compiler.
Why I would choose result from strain B over the result from strain A.

strain A: SSSRSS
strain B: SSRRRS

Simply counting the number of S, I and R doesn't work. Checksum with/without weight for the column number doesn't
work either.

Even if I get a correct result I have still the same problem as before why result B over result A.

Thought about building a tree for every family of drugs but the problem will again be the same.
How to decide which result to choose.
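
One way to make "why B over A" explicit is a drug-by-drug comparison using the ordering empty < S < I < R given above. The following Python sketch is only an assumption about the intended rule, not a confirmed method: a profile wins if it is at least as resistant for every drug. The names `RANK` and `compare` are hypothetical, and "_" stands for an empty cell.

```python
# Ordering from the post: empty cell < S < I < R ("_" marks an empty cell here).
RANK = {"_": 0, "S": 1, "I": 2, "R": 3}

def compare(a, b):
    """Compare two profiles drug by drug.

    Returns "a" or "b" if that profile is at least as resistant for every
    drug and more resistant for at least one, "equal" if they match, and
    "different" if each one is more resistant somewhere.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same drugs")
    pairs = [(RANK[x], RANK[y]) for x, y in zip(a, b)]
    a_covers_b = all(x >= y for x, y in pairs)
    b_covers_a = all(y >= x for x, y in pairs)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a"
    if b_covers_a:
        return "b"
    return "different"

print(compare("SSSRSS", "SSRRRS"))  # b
print(compare("SSR", "RRS"))        # different
```

With this rule no single number is needed: strain B is chosen because it is never less resistant than strain A, and two profiles that disagree in both directions are reported as different.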

Would be easier to attach the Excel file directly.

> Also, if you do not intend to implement the solution in Ada, this is not
> the right group to discuss it.

I would very much prefer to solve it in Ada, but at work I am stuck with Excel and VBA, which is still better than
doing it manually. After a few hours staring at a screen with thousands of rows of results...
If I get an Ada solution I can adapt it. VBA just has limited or no access types/pointers, which shouldn't be required?

I know this is the wrong group to discuss, unfortunately I don't know of any place where I would get usable advice.
Here at least I know that there are no trolls or whatever. Perhaps some miscommunication because I thought my
explanation was clear but it wasn't because of missing context.

I have been lurking in this group for some time. I just gave up on using Ada.

Thanks

Laurent

Re: Some advice required [OT]

<87sfue8a0v.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=6441&group=comp.lang.ada#6441

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 13:14:40 +0000
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <87sfue8a0v.fsf@bsb.me.uk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net>
<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="43c94de768bae35691324aa39c32d313";
logging-data="16133"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/hYf9QYqJkv/0uqDM2unT7/uwVMLfbEqk="
Cancel-Lock: sha1:kyTAH9i83mQbQOzcKu+EvsvL5BM=
sha1:6+WaJPa9FY1r+Qqka4w6lLKiZWg=
X-BSB-Auth: 1.28d44797039971f8b356.20211227131440GMT.87sfue8a0v.fsf@bsb.me.uk

by: Ben Bacarisse - Mon, 27 Dec 2021 13:14 UTC

Laurent <lutgenl@icloud.com> writes:

> On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
>
>> Sorry, but I found your problem description impossible to understand.
>> Try to describe more clearly the experiment that is done, the structure
>> of the data the experiment provides (the meaning of the Excel rows and
>> columns), and the statistic you want to compute.
>
> Sorry tried to keep it short, was too short.
>
> Columns are the antimicrobial drugs
> Rows are the microorganism.
>
> So every cell contains a result of S, I, R or simply an empty cell
>
> S = Sensible
> I = Intermediate
> R = Resistant
>
> empty cell <S<I<R
>
> If a patient has 3 strains of the same microorganism but with
> different resistance profiles I have to find the most resistant
> one. Or if they are different I keep them all.
>
> I have no idea how to explain what I am doing to the compiler.

I think when you can explain it to people, you'll be able to code it. I
am still struggling to understand what you need.

> Why I would choose result from strain B over the result from strain A.
>
> strain A: SSSRSS
> strain B: SSRRRS

Let's space it out

          drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
strain A    S       S       S       R       S       S
strain B    S       S       R       R       R       S

You want to choose B because it is resistant to more drugs, yes?

I think, from the ordering you give, you need a measure that treats an R
as "more important" than any "I" which is "more important" than an "S".
(We will come to empty cells later.)

I think you need to treat the number of Rs, Is and Ss like digits in a
number. In base 10, the strains score

          R  I  S
strain A  1  0  5  = 105
strain B  3  0  3  = 303

Now, in fact, you don't need to use base 10. The smallest base you can
use is one more than the maximum number of test results. If there can
be up to 16 tests (say) the score is

n(R)*17*17 + n(I)*17 + n(S).
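
As a rough illustration, the count-based score can be sketched in Python. The digit order here follows the importance stated in the prose (R first, then I, then S), which is an assumption of this sketch; `count_score` and `max_tests` are hypothetical names.

```python
def count_score(profile, max_tests=16):
    """Treat the counts of R, I and S as digits of a number in base max_tests + 1."""
    if len(profile) > max_tests:
        raise ValueError("profile has more results than max_tests")
    base = max_tests + 1  # one more than the largest possible count
    n_r = profile.count("R")
    n_i = profile.count("I")
    n_s = profile.count("S")
    return (n_r * base + n_i) * base + n_s

print(count_score("SSSRSS"))  # 294
print(count_score("SSRRRS"))  # 870
print(count_score("SR") == count_score("RS"))  # True
```

The last line shows the limitation: a score built from counts cannot tell apart profiles that contain the same results in different columns.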

If this suits your needs, we can consider empty cells later on. It's
not at all clear to me how to compare

strain C R____
strain D RRSSSS

Strain C is "less resistant" but only because there is not enough
information. In fact it seems more serious as it is resistant to all
tested drugs.

And then what about

strain D SR
strain E RS

Do you need to weight the drugs to break ties? I.e. is drug x more
important than drug y if x < y?

--
Ben.

Re: Some advice required [OT]

<lytueuufsz.fsf@pushface.org>

https://www.novabbs.com/devel/article-flat.php?id=6445&group=comp.lang.ada#6445

Path: i2pn2.org!i2pn.org!aioe.org!8nKyDL3nVTTIdBB8axZhRA.user.46.165.242.75.POSTED!not-for-mail
From: sim...@pushface.org (Simon Wright)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 17:18:52 +0000
Organization: Aioe.org NNTP Server
Message-ID: <lytueuufsz.fsf@pushface.org>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: gioia.aioe.org; logging-data="49388"; posting-host="8nKyDL3nVTTIdBB8axZhRA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (darwin)
X-Notice: Filtered by postfilter v. 0.9.2
Cancel-Lock: sha1:4ZMawIgHkecrElfOGECxlpSMuq8=

by: Simon Wright - Mon, 27 Dec 2021 17:18 UTC

Laurent <lutgenl@icloud.com> writes:

> Until now I have been doing it manually by staring for hours at Excel
> sheets. I am trying to get it automated but I don't know how to solve
> my problem.

You must go through some mental process while staring at the
spreadsheets; what's that process? It can't involve checksums!

In a post below, you said you had to choose the most resistant, or if
different all of them, which doesn't make sense. Are you perhaps
thinking of ties? in which case you must have some notion of scoring
profiles so you can determine which profiles come equal-first.

Does RSSSSS score higher or lower than SIIIII?

Re: Some advice required [OT]

<cjsjsg1r74m5euhqmsd64lre5rc43dpf2n@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=6446&group=comp.lang.ada#6446

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!buffer1.nntp.dca1.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 27 Dec 2021 11:41:48 -0600
From: wlfr...@ix.netcom.com (Dennis Lee Bieber)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 12:41:43 -0500
Organization: IISS Elusive Unicorn
Message-ID: <cjsjsg1r74m5euhqmsd64lre5rc43dpf2n@4ax.com>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
User-Agent: ForteAgent/8.00.32.1272
X-No-Archive: yes
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 50
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-53bovxQPVSBV2FM33I59r1EQnefoLQCzNsYX+/bwYR/fSnxOQRPxwREipWuuq5r+u9LtMgPfnwc5PXa!nwfA8qu+j0sMaltj6IoDpH/3DG42JGguSqXo1nC0Ug3mP/eJpjbnVB7AArszOvE2B3Qlkxw2
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 3346

by: Dennis Lee Bieber - Mon, 27 Dec 2021 17:41 UTC

On Mon, 27 Dec 2021 04:29:06 -0800 (PST), Laurent <lutgenl@icloud.com>
declaimed the following:

>Why I would choose result from strain B over the result from strain A.
>
>strain A: SSSRSS
>strain B: SSRRRS
>
>Simply counting the number of S, I and R doesn't work. Checksum with/without weight for the column number doesn't
>work either.

I wouldn't expect a checksum to be of any use, since the idea of most
checksums (and CRCs) is to be able to verify that a data sequence has not
been corrupted. Checksums don't "rank" data.

>
>Even if I get a correct result I have still the same problem as before why result B over result A.
>

Unfortunately, until you CAN describe why one result is preferred over
another, no one will be able to suggest algorithm(s) that may work (of
course, once you can explain it, you may not need assistance translating it
to code). For all we know, the cost of the various compounds might be a
factor affecting which of two similar result rows might be desired.

Actually, I'm still perplexed at the idea that the solution is picking
microbe strains that are most resistant to drugs -- unless one is trying to
reduce test candidates for yet undeveloped drugs ("if our new concoction
kills this strain, /then/ we will test it against the rest of the
strains").

I'm tempted to suggest R (or other statistical software) and
experimenting with various presentation/partitioning operations to see if
something reasonable pops out. Your data is NOT numerical (so don't bother
assigning numbers to your <null>SIR -- after all, you could just as easily
assign the ordinal position in the ASCII alphabet to them), so statistical
operations that work on non-numeric "factors" make as much, if not more,
sense. (I've only toyed with R, so I don't know if it has partitioning
ability for three factors -- it would be a bit tedious to have to specify, say,

compound-X factor = R
(true)          (false)
                compound-X factor = S
                (true)          (false)
)

--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Re: Some advice required [OT]

<31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6447&group=comp.lang.ada#6447

X-Received: by 2002:ad4:5cef:: with SMTP id iv15mr16017806qvb.82.1640629462328;
Mon, 27 Dec 2021 10:24:22 -0800 (PST)
X-Received: by 2002:a25:d086:: with SMTP id h128mr10876209ybg.646.1640629462087;
Mon, 27 Dec 2021 10:24:22 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 10:24:21 -0800 (PST)
In-Reply-To: <87sfue8a0v.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=78.141.135.179; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 78.141.135.179
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Mon, 27 Dec 2021 18:24:22 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 115

by: Laurent - Mon, 27 Dec 2021 18:24 UTC

On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
> Laurent <lut...@icloud.com> writes:
>
> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> >
> >> Sorry, but I found your problem description impossible to understand.
> >> Try to describe more clearly the experiment that is done, the structure
> >> of the data the experiment provides (the meaning of the Excel rows and
> >> columns), and the statistic you want to compute.
> >
> > Sorry tried to keep it short, was too short.
> >
> > Columns are the antimicrobial drugs
> > Rows are the microorganism.
> >
> > So every cell contains a result of S, I, R or simply an empty cell
> >
> > S = Sensible
> > I = Intermediate
> > R = Resistant
> >
> > empty cell <S<I<R
> >
> > If a patient has 3 strains of the same microorganism but with
> > different resistance profiles I have to find the most resistant
> > one. Or if they are different I keep them all.
> >
> > I have no idea how to explain what I am doing to the compiler.
> I think when you can explain it to people, you'll be able to code it. I
> am still struggling to understand what you need.
> > Why I would choose result from strain B over the result from strain A.
> >
> > strain A: SSSRSS
> > strain B: SSRRRS
> Let's space it out
>
>           drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
> strain A    S       S       S       R       S       S
> strain B    S       S       R       R       R       S
>
> You want to choose B because it is resistant to more drugs, yes?
>

Yes indeed

> I think, from the ordering you give, you need a measure that treats an R
> as "more important" than any "I" which is "more important" than an "S".
> (We will come to empty cells later.)
>
> I think you need to treat the number of Rs, Is and Ss like digits in a
> number. In base 10, the strains score
>
>           R  I  S
> strain A  1  0  5  = 105
> strain B  3  0  3  = 303
>
> Now, in fact, you don't need to use base 10. The smallest base you can
> use is one more than the maximum number of test results. If there can
> be up to 16 tests (say) the score is
>
> n(R)*17*17 + n(I)*17 + n(S).
>
> If this suits your needs, we can consider empty cells later on. It's
> not at all clear to me how to compare
>
> strain C R____
> strain D RRSSSS
>
> Strain C is "less resistant" but only because there is not enough
> information. In fact it seems more serious as it is resistant to all
> tested drugs.
>

Strain C is probably garbage and I would remove it. With a bit of luck I will have another result with the same sample ID that is complete.

> And then what about
>
> strain D SR
> strain E RS
>

Yes those are the cases which are annoying me.

That's why I came up with the idea of multiplying the value of the result (S=1, I=2 and R=3) with the position of the value.
Tried it with triplets but there will still be cases where different results will give the same numeric value.
Ignoring empty cells for the moment.

Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12 will be the same numerical value but they are different resistance profiles
In this case I would keep both.

How can I prevent that from happening?

The results are way longer than only 3 values so the possibilities for collisions are higher.

R R R R R S R R R S S S R S S => numeric:1812180608
R R R R R S R R R R S S S S S => numeric:1812180806

I have to keep both and that was an easy one. Only 2 to compare not 5.

A lot of R in common, only 2 differences.
Ok, now I have the same result in another representation.
Still no idea how to explain to the compiler why they are different.

> Do you need to weight the drugs to break ties? I.e. is drug x more
> important than drug y if x < y?
>
> --
> Ben.

Yes there is a hierarchy in the drugs but that information is not available in the exported results I work with.
I know that because it was part of my training as a medical technical assistant in a lab.
I was hoping to not have to recreate some form of expert system.

Thanks

Laurent

Re: Some advice required [OT]

<1b900321-0ef5-4a93-9816-db7ea55d49bfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6448&group=comp.lang.ada#6448

X-Received: by 2002:a05:622a:115:: with SMTP id u21mr15579870qtw.472.1640629845184;
Mon, 27 Dec 2021 10:30:45 -0800 (PST)
X-Received: by 2002:a05:6902:727:: with SMTP id l7mr22781787ybt.115.1640629845000;
Mon, 27 Dec 2021 10:30:45 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 10:30:44 -0800 (PST)
In-Reply-To: <lytueuufsz.fsf@pushface.org>
Injection-Info: google-groups.googlegroups.com; posting-host=78.141.135.179; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 78.141.135.179
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <lytueuufsz.fsf@pushface.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1b900321-0ef5-4a93-9816-db7ea55d49bfn@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Mon, 27 Dec 2021 18:30:45 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 30

by: Laurent - Mon, 27 Dec 2021 18:30 UTC

On Monday, 27 December 2021 at 18:18:54 UTC+1, Simon Wright wrote:
> Laurent <lut...@icloud.com> writes:
>
> > Until now I have been doing it manually by staring for hours at Excel
> > sheets. I am trying to get it automated but I don't know how to solve
> > my problem.
> You must go through some mental process while staring at the
> spreadsheets; what's that process? It can't involve checksums!
>

Perhaps not calculating but somehow estimating which has the most R's and considering
the positions they are at. Considering the family of drug it belongs to.

With certain microorganisms it works quite well because they are not very variable.
Others are a real pain. Mostly finding the least resistant with the most in common of
the more resistant and then deleting one after another. Then comparing the leftover ones.

Takes time, high risk of messing up things because I begin to see things which are not there...

> In a post below, you said you had to choose the most resistant, or if
> different all of them, which doesn't make sense. Are you perhaps
> thinking of ties? in which case you must have some notion of scoring
> profiles so you can determine which profiles come equal-first.
>
> Does RSSSSS score higher or lower than SIIIII?

Those would be 2 different strains, and I would keep both.
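
The manual process described in this post (delete the less resistant profiles that a more resistant one covers, then keep whatever is left) can be sketched in Python. This is an interpretation of the rule under stated assumptions, not a confirmed procedure: all profiles of one patient and microorganism have the same length, "_" marks an empty cell, and the names `RANK`, `covers` and `profiles_to_keep` are hypothetical.

```python
# Ordering used in the thread: empty cell < S < I < R.
RANK = {"_": 0, "S": 1, "I": 2, "R": 3}

def covers(a, b):
    """True if profile a is at least as resistant as profile b for every drug."""
    return all(RANK[x] >= RANK[y] for x, y in zip(a, b))

def profiles_to_keep(profiles):
    """Drop duplicates and every profile that another profile covers."""
    unique = list(dict.fromkeys(profiles))  # exact duplicates removed, order kept
    return [p for p in unique
            if not any(q != p and covers(q, p) for q in unique)]

print(profiles_to_keep(["SSSRSS", "SSRRRS"]))  # ['SSRRRS']
print(profiles_to_keep(["RSSSSS", "SIIIII"]))  # ['RSSSSS', 'SIIIII']
```

Under this reading, "keep both" falls out naturally: RSSSSS and SIIIII are each more resistant than the other for at least one drug, so neither can replace the other.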

Thanks

Laurent

Re: Some advice required [OT]

<j2ugi8FshmlU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=6449&group=comp.lang.ada#6449

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 20:56:07 +0200
Organization: Tidorum Ltd
Lines: 20
Message-ID: <j2ugi8FshmlU1@mid.individual.net>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net>
<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<cjsjsg1r74m5euhqmsd64lre5rc43dpf2n@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 9H3nyK7DY1aTRScViJCGyQf7Pzp2ZmNTGmTPliCSpccagWKkwk
Cancel-Lock: sha1:X3ch0cQ45PxRBdAH00ax4zpO8fc=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0)
Gecko/20100101 Thunderbird/78.14.0
In-Reply-To: <cjsjsg1r74m5euhqmsd64lre5rc43dpf2n@4ax.com>
Content-Language: en-US

by: Niklas Holsti - Mon, 27 Dec 2021 18:56 UTC

On 2021-12-27 19:41, Dennis Lee Bieber wrote:
> On Mon, 27 Dec 2021 04:29:06 -0800 (PST), Laurent
> <lutgenl@icloud.com> declaimed the following:
>
>> Why I would choose result from strain B over the result from strain
>> A.
>>
>> strain A: SSSRSS strain B: SSRRRS
>>
>> Simply counting the number of S, I and R doesn't work. Checksum
>> with/without weight for the column number doesn't work either.
>
> I wouldn't expect a checksum to be of any use, since the idea of
> most checksums (and CRCs) is to be able to verify that a data
> sequence has not been corrupted. Checksums don't "rank" data.

I believe that Laurent does not mean "checksum" in its usual meaning,
but a numerical "score" computed as a sum of terms multiplied by
weights. Whether such a score can solve Laurent's problem is not clear.

Re: Some advice required [OT]

<d5782f58-e541-4224-8b9f-a173cb9c9054n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6450&group=comp.lang.ada#6450

X-Received: by 2002:a37:6113:: with SMTP id v19mr13152070qkb.333.1640634281350;
Mon, 27 Dec 2021 11:44:41 -0800 (PST)
X-Received: by 2002:a25:ba05:: with SMTP id t5mr23920147ybg.675.1640634281115;
Mon, 27 Dec 2021 11:44:41 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 11:44:40 -0800 (PST)
In-Reply-To: <j2ugi8FshmlU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=78.141.135.179; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 78.141.135.179
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<cjsjsg1r74m5euhqmsd64lre5rc43dpf2n@4ax.com> <j2ugi8FshmlU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d5782f58-e541-4224-8b9f-a173cb9c9054n@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Mon, 27 Dec 2021 19:44:41 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 25

by: Laurent - Mon, 27 Dec 2021 19:44 UTC

On Monday, 27 December 2021 at 19:56:10 UTC+1, Niklas Holsti wrote:
> On 2021-12-27 19:41, Dennis Lee Bieber wrote:
> > On Mon, 27 Dec 2021 04:29:06 -0800 (PST), Laurent
> > <lut...@icloud.com> declaimed the following:
> >
> >> Why I would choose result from strain B over the result from strain
> >> A.
> >>
> >> strain A: SSSRSS strain B: SSRRRS
> >>
> >> Simply counting the number of S, I and R doesn't work. Checksum
> >> with/without weight for the column number doesn't work either.
> >
> > I wouldn't expect a checksum to be of any use, since the idea of
> > most checksums (and CRCs) is to be able to verify that a data
> > sequence has not been corrupted. Checksums don't "rank" data.
>
>
> I believe that Laurent does not mean "checksum" in its usual meaning,
> but a numerical "score" computed as a sum of terms multiplied by
> weights. Whether such a score can solve Laurent's problem is not clear.

I used the algorithm for calculating the checksum for the GTIN in barcodes as
starting point so I got stuck on the term checksum.
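
For context, the GTIN check digit mentioned here is computed with alternating weights of 3 and 1, starting with 3 at the rightmost data digit. A minimal Python sketch follows; `gtin_check_digit` is a hypothetical name, and the input is assumed to be the data digits without the check digit.

```python
def gtin_check_digit(data_digits):
    """Check digit for a GTIN, given its data digits as a string."""
    total = 0
    # Weights alternate 3, 1, 3, 1, ... starting from the rightmost data digit.
    for offset, ch in enumerate(reversed(data_digits)):
        weight = 3 if offset % 2 == 0 else 1
        total += int(ch) * weight
    # The check digit brings the total up to the next multiple of 10.
    return (10 - total % 10) % 10

print(gtin_check_digit("400638133393"))  # 1
```

A check digit like this is designed to catch typing errors, not to give every input its own value, which is why a score modelled on it produces collisions between different profiles.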

Half of the problem is finding the correct word with which to feed google to find an answer.

Re: Some advice required [OT]

<k05ksg9487nd13shdjgogl2j9eg6tq5c85@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=6451&group=comp.lang.ada#6451

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!buffer2.nntp.dca1.giganews.com!buffer1.nntp.dca1.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 27 Dec 2021 13:51:57 -0600
From: wlfr...@ix.netcom.com (Dennis Lee Bieber)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 14:51:55 -0500
Organization: IISS Elusive Unicorn
Message-ID: <k05ksg9487nd13shdjgogl2j9eg6tq5c85@4ax.com>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com> <87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
User-Agent: ForteAgent/8.00.32.1272
X-No-Archive: yes
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 50
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-GNQYqzHI/q7HuUac0UYanENolApOpb1lhnsw2iQrYdpByAscgufctqdbKBcu3cJITHIw/Ei4QVWO7r9!F0XIkjKlNSn5l0/9qtz3bxorh16PtGqpBbh4XZq+f+H+KsGkwHLCiS9Cw3Uuzc2DK/DZeL16
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 3588

by: Dennis Lee Bieber - Mon, 27 Dec 2021 19:51 UTC

On Mon, 27 Dec 2021 10:24:21 -0800 (PST), Laurent <lutgenl@icloud.com>
declaimed the following:

>
>Yes those are the cases which are annoying me.
>
>That's why I came up withe idea of multiplying the value of the result (S=1, I=2 and R=3) with the position of the value.
>Tried it with triplets but there will still be cases where different results will give the same numeric value.
>Ignoring empty cells for the moment.
>
Multiplying by column position inherently gives priority to the column
with the highest position. If the columns are, of themselves, not
significant, your algorithm needs to ignore column (reordering the columns
should not make a change in final selection). You could just about as
easily do a multi-column sort giving priority to the most significant
column.
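
A minimal Python sketch of the multi-column sort idea, assuming the columns are already listed from most to least significant (the thread says that hierarchy is not in the export, so this ordering is an assumption). The names RANK and sort_key, and the "_" marker for an empty cell, are illustrative only.

```python
# Rank of each result, following the ordering given in the thread:
# empty cell < S < I < R. An empty cell is written "_" here (an assumption).
RANK = {"_": 0, "S": 1, "I": 2, "R": 3}


def sort_key(profile):
    """Turn a profile such as "SSR" into a tuple that compares column by column."""
    return tuple(RANK[result] for result in profile)


profiles = ["SSR", "RRS", "SRS", "RSS"]

# Tuples compare lexicographically: the first column has priority, then the
# second, and so on. Two different profiles never produce the same key.
ranked = sorted(profiles, key=sort_key, reverse=True)
print(ranked)  # → ['RRS', 'RSS', 'SRS', 'SSR']
```

Because the key keeps every column, it cannot collide the way a single weighted sum can, but the result still depends entirely on the column order being meaningful.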

>Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12 will be the same numerical value but they are different resistance profiles
>I would in this case keep both.

So far as I can make out -- ANY collision qualifies as "different
resistance profiles". In that example, the count of Ss vs the count of Rs
differ, but...

>The results are way longer than only 3 values so the possibilities for collisions are higher.
>
>R R R R R S R R R S S S R S S => numeric:1812180608
>R R R R R S R R R R S S S S S => numeric:1812180806
>
>I have to keep both and that was an easy one. Only 2 to compare not 5.
>

In this example the counts of Ss and Rs are the same between the
two. And, again, you've applied an arbitrary ranking of the columns
(changing the order of the columns will tend to produce wildly different
sums).

>Yes there is a hierarchy in the drugs but that information is not available in the exported results I work with.

In that situation I would complain to the provider that the exported
data is incompletely defined. At the very least, the columns should be in
ascending (or descending) order of significance, justifying use of column
position as a weight (even better would be to have a row of the data
containing the weight to be used for a given column, which makes the column
order irrelevant).
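
The difference between weighting by column position and weighting by an explicit weight row can be sketched in Python as below. The drug names and weight values are invented for the example; the thread gives no real weights.

```python
RANK = {"S": 1, "I": 2, "R": 3}


def position_score(profile):
    """Weight each result by its 1-based column position (the scheme quoted above)."""
    return sum(pos * RANK[r] for pos, r in enumerate(profile, start=1))


def weighted_score(results, weights):
    """Weight each result by a per-drug weight, looked up by drug name."""
    return sum(weights[drug] * RANK[r] for drug, r in results.items())


# Position weights collide for the two profiles from the thread.
print(position_score("SSR"), position_score("RRS"))  # → 12 12

# Hypothetical weight row: one weight per drug, independent of column order.
weights = {"drug_a": 100, "drug_b": 10, "drug_c": 1}
strain = {"drug_a": "S", "drug_b": "S", "drug_c": "R"}
reordered = {"drug_c": "R", "drug_a": "S", "drug_b": "S"}
print(weighted_score(strain, weights), weighted_score(reordered, weights))  # → 113 113
```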

--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Re: Some advice required [OT]

<87fsqd7oz8.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=6452&group=comp.lang.ada#6452

Newsgroups: comp.lang.ada

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 20:49:15 +0000
Organization: A noiseless patient Spider
Lines: 100
Message-ID: <87fsqd7oz8.fsf@bsb.me.uk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net>
<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk>
<31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="43c94de768bae35691324aa39c32d313";
logging-data="12127"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX197DJVR+KIc5toxLfjgqmtYIMOHjTAonlU="
Cancel-Lock: sha1:AGcdZrBYi/GFCmoePcJxgycJR/A=
sha1:d6m/7tZuJ0OaB5wdaLVIzmey2h0=
X-BSB-Auth: 1.69f5f7f54a69cca68d5c.20211227204916GMT.87fsqd7oz8.fsf@bsb.me.uk

by: Ben Bacarisse - Mon, 27 Dec 2021 20:49 UTC

Laurent <lutgenl@icloud.com> writes:

> On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
>> Laurent <lut...@icloud.com> writes:
>>
>> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
>> >
>> >> Sorry, but I found your problem description impossible to understand.
>> >> Try to describe more clearly the experiment that is done, the structure
>> >> of the data the experiment provides (the meaning of the Excel rows and
>> >> columns), and the statistic you want to compute.
>> >
>> > Sorry tried to keep it short, was too short.
>> >
>> > Columns are the antimicrobial drugs
>> > Rows are the microorganism.
>> >
>> > So every cell contains a result of S, I, R or simply an empty cell
>> >
>> > S = Sensible
>> > I = Intermediate
>> > R = Resistant
>> >
>> > empty cell <S<I<R
>> >
>> > If a patient has 3 strains of the same microorganism but with
>> > different resistance profiles I have to find the most resistant
>> > one. Or if they are different I keep them all.
>> >
>> > I have no idea how to explain what I am doing to the compiler.
>> I think when you can explain it to people, you'll be able to code it. I
>> am still struggling to understand what you need.
>> > Why I would choose result from strain B over the result from strain A.
>> >
>> > strain A: SSSRSS
>> > strain B: SSRRRS
>> Let's space it out
>>
>>           drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
>> strain A    S       S       S       R       S       S
>> strain B    S       S       R       R       R       S
>>
>> You want to choose B because it has is resistant to more drugs, yes?
>>
>
> Yes indeed
>
>> I think, from the ordering you give, you need a measure that treats an R
>> as "more important" that any "I" which is "more important" than an "S".
>> (We will come to empty cells later.)
>>
>> I think you need to treat the number of Rs, Is and Ss like digits in a
>> number. In base 10, the strains score
>>
>>           R  S  I
>> strain A  1  5  0  = 150
>> strain B  3  3  0  = 330
>>
>> Now, in fact, you don't need to use base 10. The smallest base you can
>> use is one more than the maximum number of test results. If there can
>> be up to 16 tests (say) the score is
>>
>> n(R)*17*17 + n(S)*17 + n(I).
>>
>> If this suits your needs, we can consider empty cells later on. It's
>> not at all clear to me how to compare
>>
>> strain C R____
>> strain D RRSSSS
>>
>> Strain C is "less resistant" but only because there is not enough
>> information. In fact it seems more serious as it is resistant to all
>> tested drugs.
>>
>
> Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
>
>> And then what about
>>
>> strain D SR
>> strain E RS
>>
>
> Yes those are the cases which are annoying me.
>
> That's why I came up withe idea of multiplying the value of the result
> (S=1, I=2 and R=3) with the position of the value. Tried it with
> triplets but there will still be cases where different results will
> give the same numeric value. Ignoring empty cells for the moment.
>
> Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
> will be the same numerical value but they are different resistance
> profiles I would in this case keep both.
>
> How to prevent that from happening.

Can you first say why the suggestion I made is not helpful?

--
Ben.

Re: Some advice required [OT]

<e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6453&group=comp.lang.ada#6453

Newsgroups: comp.lang.ada

X-Received: by 2002:a05:6214:d05:: with SMTP id 5mr16675197qvh.46.1640642995174;
Mon, 27 Dec 2021 14:09:55 -0800 (PST)
X-Received: by 2002:a25:7791:: with SMTP id s139mr1654011ybc.26.1640642994980;
Mon, 27 Dec 2021 14:09:54 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 14:09:54 -0800 (PST)
In-Reply-To: <87fsqd7oz8.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=78.141.135.179; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 78.141.135.179
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Mon, 27 Dec 2021 22:09:55 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 153

by: Laurent - Mon, 27 Dec 2021 22:09 UTC

On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
> Laurent <lut...@icloud.com> writes:
>
> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
> >> Laurent <lut...@icloud.com> writes:
> >>
> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> >> >
> >> >> Sorry, but I found your problem description impossible to understand.
> >> >> Try to describe more clearly the experiment that is done, the structure
> >> >> of the data the experiment provides (the meaning of the Excel rows and
> >> >> columns), and the statistic you want to compute.
> >> >
> >> > Sorry tried to keep it short, was too short.
> >> >
> >> > Columns are the antimicrobial drugs
> >> > Rows are the microorganism.
> >> >
> >> > So every cell contains a result of S, I, R or simply an empty cell
> >> >
> >> > S = Sensible
> >> > I = Intermediate
> >> > R = Resistant
> >> >
> >> > empty cell <S<I<R
> >> >
> >> > If a patient has 3 strains of the same microorganism but with
> >> > different resistance profiles I have to find the most resistant
> >> > one. Or if they are different I keep them all.
> >> >
> >> > I have no idea how to explain what I am doing to the compiler.
> >> I think when you can explain it to people, you'll be able to code it. I
> >> am still struggling to understand what you need.
> >> > Why I would choose result from strain B over the result from strain A.
> >> >
> >> > strain A: SSSRSS
> >> > strain B: SSRRRS
> >> Let's space it out
> >>
> >>           drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
> >> strain A    S       S       S       R       S       S
> >> strain B    S       S       R       R       R       S
> >>
> >> You want to choose B because it has is resistant to more drugs, yes?
> >>
> >
> > Yes indeed
> >
> >> I think, from the ordering you give, you need a measure that treats an R
> >> as "more important" that any "I" which is "more important" than an "S".
> >> (We will come to empty cells later.)
> >>
> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> >> number. In base 10, the strains score
> >>
> >>           R  S  I
> >> strain A  1  5  0  = 150
> >> strain B  3  3  0  = 330
> >>
> >> Now, in fact, you don't need to use base 10. The smallest base you can
> >> use is one more than the maximum number of test results. If there can
> >> be up to 16 tests (say) the score is
> >>
> >> n(R)*17*17 + n(S)*17 + n(I).
> >>
> >> If this suits your needs, we can consider empty cells later on. It's
> >> not at all clear to me how to compare
> >>
> >> strain C R____
> >> strain D RRSSSS
> >>
> >> Strain C is "less resistant" but only because there is not enough
> >> information. In fact it seems more serious as it is resistant to all
> >> tested drugs.
> >>
> >
> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
> >
> >> And then what about
> >>
> >> strain D SR
> >> strain E RS
> >>
> >
> > Yes those are the cases which are annoying me.
> >
> > That's why I came up withe idea of multiplying the value of the result
> > (S=1, I=2 and R=3) with the position of the value. Tried it with
> > triplets but there will still be cases where different results will
> > give the same numeric value. Ignoring empty cells for the moment.
> >
> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
> > will be the same numerical value but they are different resistance
> > profiles I would in this case keep both.
> >
> > How to prevent that from happening.
> Can you first say why the suggestion I made is not helpful?
>
> --
> Ben.

You mean that one:

> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> >> number. In base 10, the strains score
> >>
> >>           R  S  I
> >> strain A  1  5  0  = 150
> >> strain B  3  3  0  = 330
> >>

Different resistance profiles same result:

S S S S S S R S S S S S S S S
score=1 14 0

S S S S S S S S S S S S R S S
score=1 14 0

R R R R R S R R R S S S S S S
score = 8 7 0

R R R R S S R R R S S S R S S
score = 8 7 0

I found 6 of those cases in 69 possible duplicates.
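
The collisions listed above can be reproduced with a short Python sketch of the count-based score. It follows the quoted formula n(R)*17*17 + n(S)*17 + n(I) and its assumption of at most 16 tests; the function name count_score is illustrative.

```python
def count_score(profile, max_tests=16):
    """Score a profile from its counts only, as digits in base max_tests + 1."""
    base = max_tests + 1
    n_r = profile.count("R")
    n_s = profile.count("S")
    n_i = profile.count("I")
    return n_r * base * base + n_s * base + n_i


a = "S S S S S S R S S S S S S S S".replace(" ", "")
b = "S S S S S S S S S S S S R S S".replace(" ", "")
c = "R R R R R S R R R S S S S S S".replace(" ", "")
d = "R R R R S S R R R S S S R S S".replace(" ", "")

# Different profiles, identical counts, identical scores.
print(count_score(a), count_score(b))  # → 527 527
print(count_score(c), count_score(d))  # → 2431 2431
```

Any score built only from the counts will behave this way, since it discards which drug each result belongs to.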

> >> Now, in fact, you don't need to use base 10. The smallest base you can
> >> use is one more than the maximum number of test results. If there can
> >> be up to 16 tests (say) the score is
> >>
> >> n(R)*17*17 + n(S)*17 + n(I).
> >>

The maximum is probably 20. More drugs don't fit onto one antimicrobial susceptibility test (AST) card.

Just scoring the counts together doesn't always work because of those cases, as you said yourself:
> >> And then what about
> >> strain D SR
> >> strain E RS

So I jumped to the conclusion that I need to add a weight for the position.

That's the solution I have figured out myself so far. But it suffers from the same problem, though perhaps
less often.

In the data I am testing I have 264 rows with results but only 69 are possible duplicates.
None of those produced a collision. So I have no idea how common that problem actually is.

Have to check that when I am back at work tomorrow.

Thanks

Laurent

Re: Some advice required [OT]

<87y2456071.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=6454&group=comp.lang.ada#6454

Newsgroups: comp.lang.ada

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 00:29:54 +0000
Organization: A noiseless patient Spider
Lines: 156
Message-ID: <87y2456071.fsf@bsb.me.uk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net>
<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk>
<31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk>
<e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="006a31cbf3e05dab45838058281df2a3";
logging-data="26737"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/VEHGYNF7RRA8X0p/wfa6lGCTQnDSM5aA="
Cancel-Lock: sha1:GMX0Gf5HDh5CTgTex65RZgGdgZ8=
sha1:VWlpvvPFxynu48+RCyk3a9GFzK0=
X-BSB-Auth: 1.58a2ef3709a584525dd7.20211228002954GMT.87y2456071.fsf@bsb.me.uk

by: Ben Bacarisse - Tue, 28 Dec 2021 00:29 UTC

Laurent <lutgenl@icloud.com> writes:

> On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
>> Laurent <lut...@icloud.com> writes:
>>
>> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
>> >> Laurent <lut...@icloud.com> writes:
>> >>
>> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
>> >> >
>> >> >> Sorry, but I found your problem description impossible to understand.
>> >> >> Try to describe more clearly the experiment that is done, the structure
>> >> >> of the data the experiment provides (the meaning of the Excel rows and
>> >> >> columns), and the statistic you want to compute.
>> >> >
>> >> > Sorry tried to keep it short, was too short.
>> >> >
>> >> > Columns are the antimicrobial drugs
>> >> > Rows are the microorganism.
>> >> >
>> >> > So every cell contains a result of S, I, R or simply an empty cell
>> >> >
>> >> > S = Sensible
>> >> > I = Intermediate
>> >> > R = Resistant
>> >> >
>> >> > empty cell <S<I<R
>> >> >
>> >> > If a patient has 3 strains of the same microorganism but with
>> >> > different resistance profiles I have to find the most resistant
>> >> > one. Or if they are different I keep them all.
>> >> >
>> >> > I have no idea how to explain what I am doing to the compiler.
>> >> I think when you can explain it to people, you'll be able to code it. I
>> >> am still struggling to understand what you need.
>> >> > Why I would choose result from strain B over the result from strain A.
>> >> >
>> >> > strain A: SSSRSS
>> >> > strain B: SSRRRS
>> >> Let's space it out
>> >>
>> >>           drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
>> >> strain A    S       S       S       R       S       S
>> >> strain B    S       S       R       R       R       S
>> >>
>> >> You want to choose B because it has is resistant to more drugs, yes?
>> >>
>> >
>> > Yes indeed
>> >
>> >> I think, from the ordering you give, you need a measure that treats an R
>> >> as "more important" that any "I" which is "more important" than an "S".
>> >> (We will come to empty cells later.)
>> >>
>> >> I think you need to treat the number of Rs, Is and Ss like digits in a
>> >> number. In base 10, the strains score
>> >>
>> >>           R  S  I
>> >> strain A  1  5  0  = 150
>> >> strain B  3  3  0  = 330
>> >>
>> >> Now, in fact, you don't need to use base 10. The smallest base you can
>> >> use is one more than the maximum number of test results. If there can
>> >> be up to 16 tests (say) the score is
>> >>
>> >> n(R)*17*17 + n(S)*17 + n(I).
>> >>
>> >> If this suits your needs, we can consider empty cells later on. It's
>> >> not at all clear to me how to compare
>> >>
>> >> strain C R____
>> >> strain D RRSSSS
>> >>
>> >> Strain C is "less resistant" but only because there is not enough
>> >> information. In fact it seems more serious as it is resistant to all
>> >> tested drugs.
>> >>
>> >
>> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
>> >
>> >> And then what about
>> >>
>> >> strain D SR
>> >> strain E RS
>> >>
>> >
>> > Yes those are the cases which are annoying me.
>> >
>> > That's why I came up withe idea of multiplying the value of the result
>> > (S=1, I=2 and R=3) with the position of the value. Tried it with
>> > triplets but there will still be cases where different results will
>> > give the same numeric value. Ignoring empty cells for the moment.
>> >
>> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
>> > will be the same numerical value but they are different resistance
>> > profiles I would in this case keep both.
>> >
>> > How to prevent that from happening.
>> Can you first say why the suggestion I made is not helpful?
>>
>> --
>> Ben.
>
> You mean that one:
>
>> >> I think you need to treat the number of Rs, Is and Ss like digits in a
>> >> number. In base 10, the strains score
>> >>
>> >>           R  S  I
>> >> strain A  1  5  0  = 150
>> >> strain B  3  3  0  = 330
>> >>
>
> Different resistance profiles same result:

I don't yet understand the requirements so I am taking it in stages.
The first requirement seemed to be "more or less resistant". To do that
you can use digits in a large enough base but this will make the number
of Rs, Ss and Is paramount. Is that acceptable as a first step?

In order to help people to be able to make further suggestions, maybe
you could give the relative ordering you would like to see between the
following sets of profiles. For example, between SSR, SRS and RSS, I
think the order you want is RSS > SRS > SSR.

1: SSR, SRS, RSS

2: RSI, RIS, SRI, SIR, IRS, ISR

3: SSSR, SSRS, SRSS, RSSS

4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR

It's possible you could make do with an extra field (or digits) that
gives some measure of the relative ordering between otherwise similar
sequences. For example, using base 10 (for convenience of arithmetic)
both RRSSI and RSRSI would score 212xx but the last xx would reflect the
positioning of the results in the sequence. There are lots of ways to do
this. One way would be to use, as you were thinking, some sort of weighted
count. Using S=0, I=1 and R=2 with weights

54321
RRSSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+4) + 0*(3+2) + 1*1 = 21219
RSRSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+3) + 0*(4+2) + 1*1 = 21217
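
The two worked scores above can be checked with a small Python sketch. It assumes, as the arithmetic does, that the leading digits are the counts of R, I and S in that order, and that the weights run from the profile length down to 1. The name combined_score is illustrative.

```python
VALUE = {"S": 0, "I": 1, "R": 2}


def combined_score(profile):
    """Counts of R, I and S as leading base-10 digits, plus a positional tie-breaker."""
    n = len(profile)
    counts = (profile.count("R") * 10000
              + profile.count("I") * 1000
              + profile.count("S") * 100)
    # Weights n, n-1, ..., 1 from left to right (5 4 3 2 1 for five results).
    positional = sum(VALUE[r] * (n - i) for i, r in enumerate(profile))
    return counts + positional


print(combined_score("RRSSI"))  # → 21219
print(combined_score("RSRSI"))  # → 21217
```

Base 10 only holds up while every count stays below 10 and the positional sum stays below 100; longer profiles would need wider fields or a larger base.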

If you absolutely must never get duplicate numbers, but you still want
to preserve a strict specified ordering, I think you will have much more
work to do.

Getting a unique number for each case is trivial (but the ordering will
be wrong) and getting an ordering that rates every R > every S > every I
is also trivial, but there will be lots of duplicates. It's finding the
balance that's going to be hard.

--
Ben.

Re: Some advice required [OT]

<sqdrnc$chv$1@franka.jacob-sparre.dk>

https://www.novabbs.com/devel/article-flat.php?id=6455&group=comp.lang.ada#6455

Newsgroups: comp.lang.ada

Path: i2pn2.org!i2pn.org!paganini.bofh.team!newsfeed.xs3.de!callisto.xs3.de!news.jacob-sparre.dk!franka.jacob-sparre.dk!pnx.dk!.POSTED.rrsoftware.com!not-for-mail
From: ran...@rrsoftware.com (Randy Brukardt)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Mon, 27 Dec 2021 20:10:50 -0600
Organization: JSA Research & Innovation
Lines: 45
Message-ID: <sqdrnc$chv$1@franka.jacob-sparre.dk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
Injection-Date: Tue, 28 Dec 2021 02:10:52 -0000 (UTC)
Injection-Info: franka.jacob-sparre.dk; posting-host="rrsoftware.com:24.196.82.226";
logging-data="12863"; mail-complaints-to="news@jacob-sparre.dk"
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.7246

by: Randy Brukardt - Tue, 28 Dec 2021 02:10 UTC

"Laurent" <lutgenl@icloud.com> wrote in message
news:49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com...
On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:

...
>> Also, if you do not intend to implement the solution in Ada, this is not
>> the right group to discuss it.

>I would very much prefer to solve it in Ada but at work I am stuck with Excel
>and VBA which is better than doing it manually. After a few hours starring at
>a screen with thousand of rows of results... If I get an Ada solution I can
>adapt it. Just limited to no access/pointers in VBA which shouldn't be
>required?

Hybrid Ada-spreadsheet solutions are possible. It's quite easy to read/write
.csv files in Ada, and those can be easily imported/exported from any
spreadsheet program (I've been using Libreoffice Calc, but Excel is
similar).
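
The round trip through .csv can be sketched as follows. The sketch is in Python rather than Ada, purely as an illustration; the column names and sample rows are invented, and an in-memory string stands in for the exported file.

```python
import csv
import io

# Stand-in for a .csv file exported from the spreadsheet (invented sample data).
exported = "strain,drug 1,drug 2,drug 3\nA,S,S,R\nB,R,R,S\n"

# Read the rows; each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(exported)))

# Write a new .csv with one derived column, ready to be opened in the spreadsheet.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["strain", "resistant_count"])
for row in rows:
    resistant = sum(1 for col, val in row.items() if col != "strain" and val == "R")
    writer.writerow([row["strain"], resistant])

result = out.getvalue()
print(result)
```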

For example, the ACATS grading tools essentially work by expecting the
vendor (or a third party) to provide a tool that converts compilation
results into a .csv file. The .csv file(s) are then read by the grading tool
and compared to required results to provide a grade. But it also can be read
into a spreadsheet for sanity checking as well as additional analysis.

Similarly (and probably more useful to you), I've used spreadsheet data for
various traffic in AdaIC (retrieved from Google) as input to Ada programs
that analyze the data to provide information that Google is unable to (in
particular, usage of the various Ada standards, which are split up into
usage of several hundred separate files). I then take the results of the Ada
program (which is also a .csv file), open that, and paste the results into a
previously created spreadsheet that generates charts for showing to
management. (Even highly skilled programmers don't like looking through
columns of numbers for trends. :-)

But you do have to be able to describe the results that you are looking for.
Having read the entire thread, I'm more confused than when I started. :-) I
suspect when you can describe your problem algorithmically, the solution
will be obvious. Good luck finding a solution.

Randy.

Re: Some advice required [OT]

<b50bf401-87e7-4352-b517-7fe6b6ded42dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6456&group=comp.lang.ada#6456

Newsgroups: comp.lang.ada

X-Received: by 2002:ac8:4a0e:: with SMTP id x14mr17649753qtq.345.1640671369779;
Mon, 27 Dec 2021 22:02:49 -0800 (PST)
X-Received: by 2002:a05:6902:703:: with SMTP id k3mr24744633ybt.31.1640671369506;
Mon, 27 Dec 2021 22:02:49 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 22:02:49 -0800 (PST)
In-Reply-To: <sqdrnc$chv$1@franka.jacob-sparre.dk>
Injection-Info: google-groups.googlegroups.com; posting-host=213.166.55.173; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 213.166.55.173
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<sqdrnc$chv$1@franka.jacob-sparre.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b50bf401-87e7-4352-b517-7fe6b6ded42dn@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Tue, 28 Dec 2021 06:02:49 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 50

by: Laurent - Tue, 28 Dec 2021 06:02 UTC

On Tuesday, 28 December 2021 at 03:10:54 UTC+1, Randy Brukardt wrote:
> "Laurent" <lut...@icloud.com> wrote in message
> news:49538254-21ed-4fd0...@googlegroups.com...
> On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> ...
> >> Also, if you do not intend to implement the solution in Ada, this is not
> >> the right group to discuss it.
>
> >I would very much prefer to solve it in Ada but at work I am stuck with Excel
> >and VBA which is better than doing it manually. After a few hours starring at
> >a screen with thousand of rows of results... If I get an Ada solution I can
> >adapt it. Just limited to no access/pointers in VBA which shouldn't be
> >required?
> Hybrid Ada-spreadsheet solutions are possible. It's quite easy to read/write
> .csv files in Ada, and those can be easily imported/exported from any
> spreadsheet program (I've been using Libreoffice Calc, but Excel is
> similar).
>
> For an example, the ACATS grading tools essentially work by expecting the
> vendor (or a third party) to provide a tool that converts compilation
> results into a .csv file. The .csv file(s) are then read by the grading tool
> and compared to required results to provide a grade. But it also can be read
> into a spreadsheet for sanity checking as well as additional analysis.
>
> Similarly (and probably more useful to you), I've used spreadsheet data for
> various traffic in AdaIC (retrieved from Google) as input to Ada programs
> that analyze the data to provide information that Google is unable to (in
> particular, usage of the various Ada standards, which are split up into
> usage of several hundred separate files). I then take the results of the Ada
> program (which is also a .csv file), open that, and paste the results into a
> previously created spreadsheet that generates charts for showing to
> management. (Even highly skilled programmers don't like looking through
> columns of numbers for trends. :-)
>
> But you do have to be able to describe the results that you are looking for.
> Having read the entire thread, I'm more confused than I started. :-) I
> suspect when you can describe your problem algorithmically, the solution
> will be obvious. Good luck finding a solution.
>
> Randy.

The problem is not that I don't want to use Ada. We are using Citrix, so I am stuck with the programs
the IT department allows me to use. It was already a chore to get MS Access made available.

I could send all the data home, but then I would have to be very careful not to leave any patient
information floating around somewhere. Otherwise finding a solution to automate
this will be the least of my problems.

Thanks for the motivation

Re: Some advice required [OT]

<7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6457&group=comp.lang.ada#6457

Newsgroups: comp.lang.ada

X-Received: by 2002:a05:620a:4495:: with SMTP id x21mr14074376qkp.633.1640677711877;
Mon, 27 Dec 2021 23:48:31 -0800 (PST)
X-Received: by 2002:a25:bcc3:: with SMTP id l3mr725717ybm.148.1640677711691;
Mon, 27 Dec 2021 23:48:31 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 27 Dec 2021 23:48:31 -0800 (PST)
In-Reply-To: <87y2456071.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=213.166.55.173; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 213.166.55.173
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk> <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
<87y2456071.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Tue, 28 Dec 2021 07:48:31 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 200

by: Laurent - Tue, 28 Dec 2021 07:48 UTC

On Tuesday, 28 December 2021 at 01:29:57 UTC+1, Ben Bacarisse wrote:
> Laurent <lut...@icloud.com> writes:
>
> > On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
> >> Laurent <lut...@icloud.com> writes:
> >>
> >> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
> >> >> Laurent <lut...@icloud.com> writes:
> >> >>
> >> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> >> >> >
> >> >> >> Sorry, but I found your problem description impossible to understand.
> >> >> >> Try to describe more clearly the experiment that is done, the structure
> >> >> >> of the data the experiment provides (the meaning of the Excel rows and
> >> >> >> columns), and the statistic you want to compute.
> >> >> >
> >> >> > Sorry tried to keep it short, was too short.
> >> >> >
> >> >> > Columns are the antimicrobial drugs
> >> >> > Rows are the microorganism.
> >> >> >
> >> >> > So every cell contains a result of S, I, R or simply an empty cell
> >> >> >
> >> >> > S = Sensible
> >> >> > I = Intermediate
> >> >> > R = Resistant
> >> >> >
> >> >> > empty cell <S<I<R
> >> >> >
> >> >> > If a patient has 3 strains of the same microorganism but with
> >> >> > different resistance profiles I have to find the most resistant
> >> >> > one. Or if they are different I keep them all.
> >> >> >
> >> >> > I have no idea how to explain what I am doing to the compiler.
> >> >> I think when you can explain it to people, you'll be able to code it. I
> >> >> am still struggling to understand what you need.
> >> >> > Why I would choose result from strain B over the result from strain A.
> >> >> >
> >> >> > strain A: SSSRSS
> >> >> > strain B: SSRRRS
> >> >> Let's space it out
> >> >>
> >> >> drug 1 drug 2 drug 3 drug 4 drug 5 drug 6
> >> >> strain A S S S R S S
> >> >> strain B S S R R R S
> >> >>
> >> >> You want to choose B because it is resistant to more drugs, yes?
> >> >>
> >> >
> >> > Yes indeed
> >> >
> >> >> I think, from the ordering you give, you need a measure that treats an R
> >> >> as "more important" than any "I", which is "more important" than an "S".
> >> >> (We will come to empty cells later.)
> >> >>
> >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> >> >> number. In base 10, the strains score
> >> >>
> >> >> R S I
> >> >> strain A 1 5 0 = 150
> >> >> strain B 3 3 0 = 330
> >> >>
> >> >> Now, in fact, you don't need to use base 10. The smallest base you can
> >> >> use is one more than the maximum number of test results. If there can
> >> >> be up to 16 tests (say) the score is
> >> >>
> >> >> n(R)*17*17 + n(S)*17 + n(I).
> >> >>
> >> >> If this suits your needs, we can consider empty cells later on. It's
> >> >> not at all clear to me how to compare
> >> >>
> >> >> strain C R____
> >> >> strain D RRSSSS
> >> >>
> >> >> Strain C is "less resistant" but only because there is not enough
> >> >> information. In fact it seems more serious as it is resistant to all
> >> >> tested drugs.
> >> >>
> >> >
> >> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
> >> >
> >> >> And then what about
> >> >>
> >> >> strain D SR
> >> >> strain E RS
> >> >>
> >> >
> >> > Yes those are the cases which are annoying me.
> >> >
> >> > That's why I came up with the idea of multiplying the value of the result
> >> > (S=1, I=2 and R=3) with the position of the value. Tried it with
> >> > triplets but there will still be cases where different results will
> >> > give the same numeric value. Ignoring empty cells for the moment.
> >> >
> >> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
> >> > will be the same numerical value but they are different resistance
> >> > profiles I would in this case keep both.
> >> >
> >> > How to prevent that from happening.
> >> Can you first say why the suggestion I made is not helpful?
> >>
> >> --
> >> Ben.
> >
> > You mean that one:
> >
> >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> >> >> number. In base 10, the strains score
> >> >>
> >> >> R S I
> >> >> strain A 1 5 0 = 150
> >> >> strain B 3 3 0 = 330
> >> >>
> >
> > Different resistance profiles same result:
> I don't yet understand the requirements so I am taking it in stages.
> The first requirement seemed to be "more or less resistant". To do that
> you can use digits in a large enough base but this will make the number
> of Rs, Ss and Is paramount. Is that acceptable as a first step?
>

The requirement is one strain per microorganism per patient:
the most resistant one, or all of them if they have different profiles.

SRS vs RRS => last one, more Rs

SRS vs RSR = both, different profiles
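
One possible reading of these two examples is a per-drug comparison: a
profile is dropped only when another profile is at least as resistant for
every drug. The following is a minimal Python sketch of that reading, not
something proposed in this thread. The names RANK, dominates and keep are
hypothetical, profiles are assumed to be equal-length strings, and "_" is
assumed to stand for an empty cell.

```python
# Ordering taken from the thread: empty cell < S < I < R.
# "_" as the marker for an empty cell is an assumption.
RANK = {"_": 0, "S": 1, "I": 2, "R": 3}


def dominates(a, b):
    """Return True if profile a is at least as resistant as b for every drug."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same drugs")
    return all(RANK[x] >= RANK[y] for x, y in zip(a, b))


def keep(profiles):
    """Drop every profile that some other, different profile dominates."""
    unique = list(dict.fromkeys(profiles))  # remove exact duplicates, keep order
    return [p for p in unique
            if not any(q != p and dominates(q, p) for q in unique)]


print(keep(["SRS", "RRS"]))  # ['RRS']
print(keep(["SRS", "RSR"]))  # ['SRS', 'RSR']
```

Under this reading no single score is needed: profiles that differ in both
directions simply both stay in the result.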

> In order to help people to be able to make further suggestions, maybe
> you could give the relative ordering you would like to see between the
> following sets of profiles. For example, between SSR, SRS and RSS, I
> think the order you want is RSS > SRS > SSR.
>
> 1: SSR, SRS, RSS
>
> 2: RSI, RIS, SRI, SIR, IRS, ISR
>
> 3: SSSR, SSRS, SRSS, RSSS
>
> 4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR
>

The order of the results is given by the ID of the drug in the extraction tool.
I could probably order them by family and hierarchy of potency, but
would that make a difference?

> It's possible you could make do with an extra field (or digits) that
> gives some measure of the relative ordering between otherwise similar
> sequences. For example, using base 10 (for convenience of arithmetic)
> both RRSSI and RSRSI would score 212xx but the last xx would reflect the
> positioning of the results in the sequence. There are lots of ways to do
> this. One way would be to use, as you were thinking, some sort of weighted
> count. Using S=0, I=1 and R=2 with weights
>
> 54321
> RRSSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+4) + 0*(3+2) + 1*1 = 21219
> RSRSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+3) + 0*(4+2) + 1*1 = 21217
>

So to be sure that I am following:

2*(5+4) = value of R (=2) * position of R(@5 and @4)
2*(5+3) = value of R (=2) * position of R(@5 and @3)

0*(3+2) = value of S (=0) * position of S(@3 and @2)
0*(4+2) = value of S (=0) * position of S(@4 and @2)

1*1 = value of I (=1) * position of I (@1)

2*10000 + 1*1000 + 2*100 is just used as padding? So 212 could be any other
number?

But in this example I would have to keep both, as drugs 5, 2 and 1 are common
to both results but the results for drugs 4 and 3 differ.

The score would be completely misleading.

So if my table has a width of 20 columns, the first column would be
10^20, the next 10^19, and so on, give or take a few zeros?

I would have to implement it and see what I get as a result.
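
For what it is worth, the weighted count quoted above can be tried out in a
few lines. This is only a sketch in Python, assuming five-drug profiles made
up of S, I and R as in the example. weighted_score and VALUE are hypothetical
names, and the fixed multipliers 10000, 1000 and 100 are taken straight from
the example rather than scaled to the table width.

```python
# Values used in the quoted example: S=0, I=1, R=2.
VALUE = {"S": 0, "I": 1, "R": 2}


def weighted_score(profile):
    """Count digits (R, I, S) followed by a position-weighted sum."""
    n = len(profile)
    # Leading digits: number of Rs, then Is, then Ss.
    count_part = (profile.count("R") * 10000
                  + profile.count("I") * 1000
                  + profile.count("S") * 100)
    # Trailing part: the leftmost result has weight n, the rightmost weight 1.
    position_part = sum(VALUE[r] * (n - i) for i, r in enumerate(profile))
    return count_part + position_part


print(weighted_score("RRSSI"))  # 21219
print(weighted_score("RSRSI"))  # 21217
```

The two profiles get distinct scores, which orders them but does not by
itself say that both should be kept. Empty cells and wider tables are not
handled here.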

> If you absolutely must never get duplicate numbers, but you still want
> to preserve a strict specified ordering, I think you will have much more
> work to do.
>
> Getting a unique number for each case is trivial (but the ordering will
> be wrong) and getting an ordering that rates every R > every S > every I
> is also trivial, but there will be lots of duplicates. It's finding the
> balance that's going to be hard.
>
> --
> Ben.

Re: Some advice required [OT]

<875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6458&group=comp.lang.ada#6458

X-Received: by 2002:a05:622a:1654:: with SMTP id y20mr17332033qtj.374.1640682350011;
Tue, 28 Dec 2021 01:05:50 -0800 (PST)
X-Received: by 2002:a25:403:: with SMTP id 3mr6571035ybe.696.1640682349830;
Tue, 28 Dec 2021 01:05:49 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Tue, 28 Dec 2021 01:05:49 -0800 (PST)
In-Reply-To: <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=213.166.55.173; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 213.166.55.173
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk> <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
<87y2456071.fsf@bsb.me.uk> <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Tue, 28 Dec 2021 09:05:50 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 198

by: Laurent - Tue, 28 Dec 2021 09:05 UTC

On Tuesday, 28 December 2021 at 08:48:32 UTC+1, Laurent wrote:
> On Tuesday, 28 December 2021 at 01:29:57 UTC+1, Ben Bacarisse wrote:
> > Laurent <lut...@icloud.com> writes:
> >
> > > On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
> > >> Laurent <lut...@icloud.com> writes:
> > >>
> > >> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
> > >> >> Laurent <lut...@icloud.com> writes:
> > >> >>
> > >> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> > >> >> >
> > >> >> >> Sorry, but I found your problem description impossible to understand.
> > >> >> >> Try to describe more clearly the experiment that is done, the structure
> > >> >> >> of the data the experiment provides (the meaning of the Excel rows and
> > >> >> >> columns), and the statistic you want to compute.
> > >> >> >
> > >> >> > Sorry tried to keep it short, was too short.
> > >> >> >
> > >> >> > Columns are the antimicrobial drugs
> > >> >> > Rows are the microorganism.
> > >> >> >
> > >> >> > So every cell contains a result of S, I, R or simply an empty cell
> > >> >> >
> > >> >> > S = Susceptible
> > >> >> > I = Intermediate
> > >> >> > R = Resistant
> > >> >> >
> > >> >> > empty cell <S<I<R
> > >> >> >
> > >> >> > If a patient has 3 strains of the same microorganism but with
> > >> >> > different resistance profiles I have to find the most resistant
> > >> >> > one. Or if they are different I keep them all.
> > >> >> >
> > >> >> > I have no idea how to explain what I am doing to the compiler.
> > >> >> I think when you can explain it to people, you'll be able to code it. I
> > >> >> am still struggling to understand what you need.
> > >> >> > Why I would choose result from strain B over the result from strain A.
> > >> >> >
> > >> >> > strain A: SSSRSS
> > >> >> > strain B: SSRRRS
> > >> >> Let's space it out
> > >> >>
> > >> >> drug 1 drug 2 drug 3 drug 4 drug 5 drug 6
> > >> >> strain A S S S R S S
> > >> >> strain B S S R R R S
> > >> >>
> > >> >> You want to choose B because it is resistant to more drugs, yes?
> > >> >>
> > >> >
> > >> > Yes indeed
> > >> >
> > >> >> I think, from the ordering you give, you need a measure that treats an R
> > >> >> as "more important" than any "I", which is "more important" than an "S".
> > >> >> (We will come to empty cells later.)
> > >> >>
> > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> > >> >> number. In base 10, the strains score
> > >> >>
> > >> >> R S I
> > >> >> strain A 1 5 0 = 150
> > >> >> strain B 3 3 0 = 330
> > >> >>
> > >> >> Now, in fact, you don't need to use base 10. The smallest base you can
> > >> >> use is one more than the maximum number of test results. If there can
> > >> >> be up to 16 tests (say) the score is
> > >> >>
> > >> >> n(R)*17*17 + n(S)*17 + n(I).
> > >> >>
> > >> >> If this suits your needs, we can consider empty cells later on. It's
> > >> >> not at all clear to me how to compare
> > >> >>
> > >> >> strain C R____
> > >> >> strain D RRSSSS
> > >> >>
> > >> >> Strain C is "less resistant" but only because there is not enough
> > >> >> information. In fact it seems more serious as it is resistant to all
> > >> >> tested drugs.
> > >> >>
> > >> >
> > >> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
> > >> >
> > >> >> And then what about
> > >> >>
> > >> >> strain D SR
> > >> >> strain E RS
> > >> >>
> > >> >
> > >> > Yes those are the cases which are annoying me.
> > >> >
> > >> > That's why I came up with the idea of multiplying the value of the result
> > >> > (S=1, I=2 and R=3) with the position of the value. Tried it with
> > >> > triplets but there will still be cases where different results will
> > >> > give the same numeric value. Ignoring empty cells for the moment.
> > >> >
> > >> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
> > >> > will be the same numerical value but they are different resistance
> > >> > profiles I would in this case keep both.
> > >> >
> > >> > How to prevent that from happening.
> > >> Can you first say why the suggestion I made is not helpful?
> > >>
> > >> --
> > >> Ben.
> > >
> > > You mean that one:
> > >
> > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> > >> >> number. In base 10, the strains score
> > >> >>
> > >> >> R S I
> > >> >> strain A 1 5 0 = 150
> > >> >> strain B 3 3 0 = 330
> > >> >>
> > >
> > > Different resistance profiles same result:
> > I don't yet understand the requirements so I am taking it in stages.
> > The first requirement seemed to be "more or less resistant". To do that
> > you can use digits in a large enough base but this will make the number
> > of Rs, Ss and Is paramount. Is that acceptable as a first step?
> >
> The requirement is one strain per microorganism per patient:
> the most resistant one, or all of them if they have different profiles.
>
> SRS vs RRS => last one, more Rs
>
> SRS vs RSR = both, different profiles
> > In order to help people to be able to make further suggestions, maybe
> > you could give the relative ordering you would like to see between the
> > following sets of profiles. For example, between SSR, SRS and RSS, I
> > think the order you want is RSS > SRS > SSR.
> >
> > 1: SSR, SRS, RSS
> >
> > 2: RSI, RIS, SRI, SIR, IRS, ISR
> >
> > 3: SSSR, SSRS, SRSS, RSSS
> >
> > 4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR
> >
> The order of the results is given by the ID of the drug in the extraction tool.
> I could probably order them by family and hierarchy of potency, but
> would that make a difference?
> > It's possible you could make do with an extra field (or digits) that
> > gives some measure of the relative ordering between otherwise similar
> > sequences. For example, using base 10 (for convenience of arithmetic)
> > both RRSSI and RSRSI would score 212xx but the last xx would reflect the
> > positioning of the results in the sequence. There are lots of ways to do
> > this. One way would be to use, as you were thinking, some sort of weighted
> > count. Using S=0, I=1 and R=2 with weights
> >
> > 54321
> > RRSSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+4) + 0*(3+2) + 1*1 = 21219
> > RSRSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+3) + 0*(4+2) + 1*1 = 21217
> >
> So to be sure that I am following:
>
> 2*(5+4) = value of R (=2) * position of R(@5 and @4)
> 2*(5+3) = value of R (=2) * position of R(@5 and @3)
>
> 0*(3+2) = value of S (=0) * position of S(@3 and @2)
> 0*(4+2) = value of S (=0) * position of S(@4 and @2)
>
> 1*1 = value of I (=1) * position of I (@1)
>
> 2*10000 + 1*1000 + 2*100 is just used as padding? So 212 could be any other
> number?
>

Re: Some advice required [OT]

<f35c6d2f-c495-40ae-9f01-dfe0870ea063n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6461&group=comp.lang.ada#6461

X-Received: by 2002:a05:620a:430e:: with SMTP id u14mr15111425qko.286.1640696093624;
Tue, 28 Dec 2021 04:54:53 -0800 (PST)
X-Received: by 2002:a25:6884:: with SMTP id d126mr73431ybc.355.1640696091917;
Tue, 28 Dec 2021 04:54:51 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Tue, 28 Dec 2021 04:54:51 -0800 (PST)
In-Reply-To: <875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=213.166.55.173; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 213.166.55.173
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk> <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
<87y2456071.fsf@bsb.me.uk> <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
<875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f35c6d2f-c495-40ae-9f01-dfe0870ea063n@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Tue, 28 Dec 2021 12:54:53 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 240

by: Laurent - Tue, 28 Dec 2021 12:54 UTC

On Tuesday, 28 December 2021 at 10:05:50 UTC+1, Laurent wrote:
> On Tuesday, 28 December 2021 at 08:48:32 UTC+1, Laurent wrote:
> > On Tuesday, 28 December 2021 at 01:29:57 UTC+1, Ben Bacarisse wrote:
> > > Laurent <lut...@icloud.com> writes:
> > >
> > > > On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
> > > >> Laurent <lut...@icloud.com> writes:
> > > >>
> > > >> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
> > > >> >> Laurent <lut...@icloud.com> writes:
> > > >> >>
> > > >> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> > > >> >> >
> > > >> >> >> Sorry, but I found your problem description impossible to understand.
> > > >> >> >> Try to describe more clearly the experiment that is done, the structure
> > > >> >> >> of the data the experiment provides (the meaning of the Excel rows and
> > > >> >> >> columns), and the statistic you want to compute.
> > > >> >> >
> > > >> >> > Sorry tried to keep it short, was too short.
> > > >> >> >
> > > >> >> > Columns are the antimicrobial drugs
> > > >> >> > Rows are the microorganism.
> > > >> >> >
> > > >> >> > So every cell contains a result of S, I, R or simply an empty cell
> > > >> >> >
> > > >> >> > S = Susceptible
> > > >> >> > I = Intermediate
> > > >> >> > R = Resistant
> > > >> >> >
> > > >> >> > empty cell <S<I<R
> > > >> >> >
> > > >> >> > If a patient has 3 strains of the same microorganism but with
> > > >> >> > different resistance profiles I have to find the most resistant
> > > >> >> > one. Or if they are different I keep them all.
> > > >> >> >
> > > >> >> > I have no idea how to explain what I am doing to the compiler.
> > > >> >> I think when you can explain it to people, you'll be able to code it. I
> > > >> >> am still struggling to understand what you need.
> > > >> >> > Why I would choose result from strain B over the result from strain A.
> > > >> >> >
> > > >> >> > strain A: SSSRSS
> > > >> >> > strain B: SSRRRS
> > > >> >> Let's space it out
> > > >> >>
> > > >> >> drug 1 drug 2 drug 3 drug 4 drug 5 drug 6
> > > >> >> strain A S S S R S S
> > > >> >> strain B S S R R R S
> > > >> >>
> > > >> >> You want to choose B because it is resistant to more drugs, yes?
> > > >> >>
> > > >> >
> > > >> > Yes indeed
> > > >> >
> > > >> >> I think, from the ordering you give, you need a measure that treats an R
> > > >> >> as "more important" than any "I", which is "more important" than an "S".
> > > >> >> (We will come to empty cells later.)
> > > >> >>
> > > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> > > >> >> number. In base 10, the strains score
> > > >> >>
> > > >> >> R S I
> > > >> >> strain A 1 5 0 = 150
> > > >> >> strain B 3 3 0 = 330
> > > >> >>
> > > >> >> Now, in fact, you don't need to use base 10. The smallest base you can
> > > >> >> use is one more than the maximum number of test results. If there can
> > > >> >> be up to 16 tests (say) the score is
> > > >> >>
> > > >> >> n(R)*17*17 + n(S)*17 + n(I).
> > > >> >>
> > > >> >> If this suits your needs, we can consider empty cells later on. It's
> > > >> >> not at all clear to me how to compare
> > > >> >>
> > > >> >> strain C R____
> > > >> >> strain D RRSSSS
> > > >> >>
> > > >> >> Strain C is "less resistant" but only because there is not enough
> > > >> >> information. In fact it seems more serious as it is resistant to all
> > > >> >> tested drugs.
> > > >> >>
> > > >> >
> > > >> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
> > > >> >
> > > >> >> And then what about
> > > >> >>
> > > >> >> strain D SR
> > > >> >> strain E RS
> > > >> >>
> > > >> >
> > > >> > Yes those are the cases which are annoying me.
> > > >> >
> > > >> > That's why I came up with the idea of multiplying the value of the result
> > > >> > (S=1, I=2 and R=3) with the position of the value. Tried it with
> > > >> > triplets but there will still be cases where different results will
> > > >> > give the same numeric value. Ignoring empty cells for the moment.
> > > >> >
> > > >> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
> > > >> > will be the same numerical value but they are different resistance
> > > >> > profiles I would in this case keep both.
> > > >> >
> > > >> > How to prevent that from happening.
> > > >> Can you first say why the suggestion I made is not helpful?
> > > >>
> > > >> --
> > > >> Ben.
> > > >
> > > > You mean that one:
> > > >
> > > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> > > >> >> number. In base 10, the strains score
> > > >> >>
> > > >> >> R S I
> > > >> >> strain A 1 5 0 = 150
> > > >> >> strain B 3 3 0 = 330
> > > >> >>
> > > >
> > > > Different resistance profiles same result:
> > > I don't yet understand the requirements so I am taking it in stages.
> > > The first requirement seemed to be "more or less resistant". To do that
> > > you can use digits in a large enough base but this will make the number
> > > of Rs, Ss and Is paramount. Is that acceptable as a first step?
> > >
> > The requirement is one strain per microorganism per patient:
> > the most resistant one, or all of them if they have different profiles.
> >
> > SRS vs RRS => last one, more Rs
> >
> > SRS vs RSR = both, different profiles
> > > In order to help people to be able to make further suggestions, maybe
> > > you could give the relative ordering you would like to see between the
> > > following sets of profiles. For example, between SSR, SRS and RSS, I
> > > think the order you want is RSS > SRS > SSR.
> > >
> > > 1: SSR, SRS, RSS
> > >
> > > 2: RSI, RIS, SRI, SIR, IRS, ISR
> > >
> > > 3: SSSR, SSRS, SRSS, RSSS
> > >
> > > 4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR
> > >
> > The order of the results is given by the ID of the drug in the extraction tool.
> > I could probably order them by family and hierarchy of potency, but
> > would that make a difference?
> > > It's possible you could make do with an extra field (or digits) that
> > > gives some measure of the relative ordering between otherwise similar
> > > sequences. For example, using base 10 (for convenience of arithmetic)
> > > both RRSSI and RSRSI would score 212xx but the last xx would reflect the
> > > positioning of the results in the sequence. There are lots of ways to do
> > > this. One way would be to use, as you were thinking, some sort of weighted
> > > count. Using S=0, I=1 and R=2 with weights
> > >
> > > 54321
> > > RRSSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+4) + 0*(3+2) + 1*1 = 21219
> > > RSRSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+3) + 0*(4+2) + 1*1 = 21217
> > >
> > So to be sure that I am following:
> >
> > 2*(5+4) = value of R (=2) * position of R(@5 and @4)
> > 2*(5+3) = value of R (=2) * position of R(@5 and @3)
> >
> > 0*(3+2) = value of S (=0) * position of S(@3 and @2)
> > 0*(4+2) = value of S (=0) * position of S(@4 and @2)
> >
> > 1*1 = value of I (=1) * position of I (@1)
> >
> > 2*10000 + 1*1000 + 2*100 is just used as padding? So 212 could be any other
> > number?
> >
> Eh forget the last sentence, brain fart: I have 2 R's so 2*10000, 1 I so 1*1000 and 2 S's so 2*100
> > But in this example I would have to keep both, as drugs 5, 2 and 1 are common
> > to both results but the results for drugs 4 and 3 differ.
> >
> > The score would be completely misleading.
> >
> > So if my table has a width of 20 columns, the first column would be
> > 10^20, the next 10^19, and so on, give or take a few zeros?
> >
> > I would have to implement it and see what I get as a result.
> > > If you absolutely must never get duplicate numbers, but you still want
> > > to preserve a strict specified ordering, I think you will have much more
> > > work to do.
> > >
> > > Getting a unique number for each case is trivial (but the ordering will
> > > be wrong) and getting an ordering that rates every R > every S > every I
> > > is also trivial, but there will be lots of duplicates. It's finding the
> > > balance that's going to be hard.
> > >
> > > --
> > > Ben.
> > I have prepared a cleaned-up Excel workbook with only the duplicates which
> > pose problems. The ones I would keep have an orange ID.
> > I could upload it to GitHub, if that helps in understanding the different cases.
> >
> > Thanks for your patience
> >
> > Laurent

Re: Some advice required [OT]

<87sfuc975y.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=6462&group=comp.lang.ada#6462

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 13:43:21 +0000
Organization: A noiseless patient Spider
Lines: 267
Message-ID: <87sfuc975y.fsf@bsb.me.uk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net>
<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk>
<31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk>
<e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
<87y2456071.fsf@bsb.me.uk>
<7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="006a31cbf3e05dab45838058281df2a3";
logging-data="31505"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Wxt2oPzyytv1MC8y7vPEWV4g15SoO94k="
Cancel-Lock: sha1:Wj/hNhfT4D9tL4caiHf5my4pEng=
sha1:UI0nfC1UOQqTjJlsSOUa0Q8jFpc=
X-BSB-Auth: 1.cb11ca0c0eb35645b59c.20211228134321GMT.87sfuc975y.fsf@bsb.me.uk

by: Ben Bacarisse - Tue, 28 Dec 2021 13:43 UTC

Laurent <lutgenl@icloud.com> writes:

> On Tuesday, 28 December 2021 at 01:29:57 UTC+1, Ben Bacarisse wrote:
>> Laurent <lut...@icloud.com> writes:
>>
>> > On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
>> >> Laurent <lut...@icloud.com> writes:
>> >>
>> >> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
>> >> >> Laurent <lut...@icloud.com> writes:
>> >> >>
>> >> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
>> >> >> >
>> >> >> >> Sorry, but I found your problem description impossible to understand.
>> >> >> >> Try to describe more clearly the experiment that is done, the structure
>> >> >> >> of the data the experiment provides (the meaning of the Excel rows and
>> >> >> >> columns), and the statistic you want to compute.
>> >> >> >
>> >> >> > Sorry tried to keep it short, was too short.
>> >> >> >
>> >> >> > Columns are the antimicrobial drugs
>> >> >> > Rows are the microorganism.
>> >> >> >
>> >> >> > So every cell contains a result of S, I, R or simply an empty cell
>> >> >> >
>> >> >> > S = Susceptible
>> >> >> > I = Intermediate
>> >> >> > R = Resistant
>> >> >> >
>> >> >> > empty cell <S<I<R
>> >> >> >
>> >> >> > If a patient has 3 strains of the same microorganism but with
>> >> >> > different resistance profiles I have to find the most resistant
>> >> >> > one. Or if they are different I keep them all.
>> >> >> >
>> >> >> > I have no idea how to explain what I am doing to the compiler.
>> >> >> I think when you can explain it to people, you'll be able to code it. I
>> >> >> am still struggling to understand what you need.
>> >> >> > Why I would choose result from strain B over the result from strain A.
>> >> >> >
>> >> >> > strain A: SSSRSS
>> >> >> > strain B: SSRRRS
>> >> >> Let's space it out
>> >> >>
>> >> >> drug 1 drug 2 drug 3 drug 4 drug 5 drug 6
>> >> >> strain A S S S R S S
>> >> >> strain B S S R R R S
>> >> >>
>> >> >> You want to choose B because it is resistant to more drugs, yes?
>> >> >>
>> >> >
>> >> > Yes indeed
>> >> >
>> >> >> I think, from the ordering you give, you need a measure that treats an R
>> >> >> as "more important" than any "I", which is "more important" than an "S".
>> >> >> (We will come to empty cells later.)
>> >> >>
>> >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
>> >> >> number. In base 10, the strains score
>> >> >>
>> >> >> R S I
>> >> >> strain A 1 5 0 = 150
>> >> >> strain B 3 3 0 = 330
>> >> >>
>> >> >> Now, in fact, you don't need to use base 10. The smallest base you can
>> >> >> use is one more than the maximum number of test results. If there can
>> >> >> be up to 16 tests (say) the score is
>> >> >>
>> >> >> n(R)*17*17 + n(S)*17 + n(I).
>> >> >>
>> >> >> If this suits your needs, we can consider empty cells later on. It's
>> >> >> not at all clear to me how to compare
>> >> >>
>> >> >> strain C R____
>> >> >> strain D RRSSSS
>> >> >>
>> >> >> Strain C is "less resistant" but only because there is not enough
>> >> >> information. In fact it seems more serious as it is resistant to all
>> >> >> tested drugs.
>> >> >>
>> >> >
>> >> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
>> >> >
>> >> >> And then what about
>> >> >>
>> >> >> strain D SR
>> >> >> strain E RS
>> >> >>
>> >> >
>> >> > Yes those are the cases which are annoying me.
>> >> >
>> >> > That's why I came up with the idea of multiplying the value of the result
>> >> > (S=1, I=2 and R=3) with the position of the value. Tried it with
>> >> > triplets but there will still be cases where different results will
>> >> > give the same numeric value. Ignoring empty cells for the moment.
>> >> >
>> >> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
>> >> > will be the same numerical value but they are different resistance
>> >> > profiles I would in this case keep both.
>> >> >
>> >> > How to prevent that from happening.
>> >> Can you first say why the suggestion I made is not helpful?
>> >>
>> >> --
>> >> Ben.
>> >
>> > You mean that one:
>> >
>> >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
>> >> >> number. In base 10, the strains score
>> >> >>
>> >> >> R S I
>> >> >> strain A 1 5 0 = 150
>> >> >> strain B 3 3 0 = 330
>> >> >>
>> >
>> > Different resistance profiles same result:
>>
>> I don't yet understand the requirements so I am taking it in stages.
>> The first requirement seemed to be "more or less resistant". To do that
>> you can use digits in a large enough base but this will make the number
>> of Rs, Ss and Is paramount. Is that acceptable as a first step?
>
> The requirement is one strain per microorganism per patient:
> the most resistant one, or all of them if they have different profiles.
>
> SRS vs RRS => last one, more Rs
>
> SRS vs RSR = both, different profiles

I think this is a "yes" to my question. The trouble is you speak in the
subject domain (as one would expect) but I have to speak in the computer
science domain, because that's all I know.

You speak of giving profiles a score. To me, that means giving a
profile some numeric value (actually it need not be numeric, but let's
stick with numbers for the moment). The score orders the profiles --
some score high (= very resistant) and some score lower.

>> In order to help people to be able to make further suggestions, maybe
>> you could give the relative ordering you would like to see between the
>> following sets of profiles. For example, between SSR, SRS and RSS, I
>> think the order you want is RSS > SRS > SSR.
>>
>> 1: SSR, SRS, RSS
>>
>> 2: RSI, RIS, SRI, SIR, IRS, ISR
>>
>> 3: SSSR, SSRS, SRSS, RSSS
>>
>> 4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR
>
> The order of the results is given by the ID of the drug in the extraction tool.
> I could probably order them by family and hierarchy of potency, but
> would that make a difference?

I am referring to the order you want the score to produce. You want, I
think, the score for a profile with more Rs to be higher than any score
for a profile with fewer. Using x for an S or an I or a missing result,
you want all of

RRRxxx xRRxRx xxRxRR

and so on to score higher than any of

RRxxxx RxxRxx xxRxRx

Three Rs beat two Rs no matter where they are. Similarly, when the
number of Rs is the same, you want a profile with more Is to "beat"
(score higher than) any profile with fewer.

There is a standard way to do this which can result in a pure number,
but you can also think of it as a short sequence of numbers (three in
this case) where the first is more important than the second, which is
more important than the third.

So IIISR is given the sequence (1,1,3) (1 R, 1 S and 3 Is). As a base
10 number, that's 113. In base 100 it's 10103. Bigger bases allow one
to separate larger counts.


Re: Some advice required [OT]

<87mtkk96in.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=6463&group=comp.lang.ada#6463

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 13:57:20 +0000
Organization: A noiseless patient Spider
Lines: 265
Message-ID: <87mtkk96in.fsf@bsb.me.uk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net>
<49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk>
<31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk>
<e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
<87y2456071.fsf@bsb.me.uk>
<7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
<875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com>
<f35c6d2f-c495-40ae-9f01-dfe0870ea063n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="006a31cbf3e05dab45838058281df2a3";
logging-data="31505"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jxoqaaiEP30hJv0EP/eWAKg1XF6fiYZ4="
Cancel-Lock: sha1:hSt3HT7HMyC7VcBZzdwDxeYHWMA=
sha1:SEvsZ73463mnrxxUtIeb7wvWuYw=
X-BSB-Auth: 1.19cb940709dfe093451e.20211228135720GMT.87mtkk96in.fsf@bsb.me.uk

by: Ben Bacarisse - Tue, 28 Dec 2021 13:57 UTC

Laurent <lutgenl@icloud.com> writes:

> On Tuesday, 28 December 2021 at 10:05:50 UTC+1, Laurent wrote:
>> On Tuesday, 28 December 2021 at 08:48:32 UTC+1, Laurent wrote:
>> > On Tuesday, 28 December 2021 at 01:29:57 UTC+1, Ben Bacarisse wrote:
>> > > Laurent <lut...@icloud.com> writes:
>> > >
>> > > > On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
>> > > >> Laurent <lut...@icloud.com> writes:
>> > > >>
>> > > >> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
>> > > >> >> Laurent <lut...@icloud.com> writes:
>> > > >> >>
>> > > >> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
>> > > >> >> >
>> > > >> >> >> Sorry, but I found your problem description impossible to understand.
>> > > >> >> >> Try to describe more clearly the experiment that is done, the structure
>> > > >> >> >> of the data the experiment provides (the meaning of the Excel rows and
>> > > >> >> >> columns), and the statistic you want to compute.
>> > > >> >> >
>> > > >> >> > Sorry tried to keep it short, was too short.
>> > > >> >> >
>> > > >> >> > Columns are the antimicrobial drugs
>> > > >> >> > Rows are the microorganism.
>> > > >> >> >
>> > > >> >> > So every cell contains a result of S, I, R or simply an empty cell
>> > > >> >> >
>> > > >> >> > S = Susceptible
>> > > >> >> > I = Intermediate
>> > > >> >> > R = Resistant
>> > > >> >> >
>> > > >> >> > empty cell <S<I<R
>> > > >> >> >
>> > > >> >> > If a patient has 3 strains of the same microorganism but with
>> > > >> >> > different resistance profiles I have to find the most resistant
>> > > >> >> > one. Or if they are different I keep them all.
>> > > >> >> >
>> > > >> >> > I have no idea how to explain what I am doing to the compiler.
>> > > >> >> I think when you can explain it to people, you'll be able to code it. I
>> > > >> >> am still struggling to understand what you need.
>> > > >> >> > Why I would choose result from strain B over the result from strain A.
>> > > >> >> >
>> > > >> >> > strain A: SSSRSS
>> > > >> >> > strain B: SSRRRS
>> > > >> >> Let's space it out
>> > > >> >>
>> > > >> >>            drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
>> > > >> >> strain A   S       S       S       R       S       S
>> > > >> >> strain B   S       S       R       R       R       S
>> > > >> >>
>> > > >> >> You want to choose B because it is resistant to more drugs, yes?
>> > > >> >>
>> > > >> >
>> > > >> > Yes indeed
>> > > >> >
>> > > >> >> I think, from the ordering you give, you need a measure that treats an R
>> > > >> >> as "more important" than any "I" which is "more important" than an "S".
>> > > >> >> (We will come to empty cells later.)
>> > > >> >>
>> > > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
>> > > >> >> number. In base 10, the strains score
>> > > >> >>
>> > > >> >>            R  S  I
>> > > >> >> strain A   1  5  0  = 150
>> > > >> >> strain B   3  3  0  = 330
>> > > >> >>
>> > > >> >> Now, in fact, you don't need to use base 10. The smallest base you can
>> > > >> >> use is one more than the maximum number of test results. If there can
>> > > >> >> be up to 16 tests (say) the score is
>> > > >> >>
>> > > >> >> n(R)*17*17 + n(S)*17 + n(I).
>> > > >> >>
>> > > >> >> If this suits your needs, we can consider empty cells later on. It's
>> > > >> >> not at all clear to me how to compare
>> > > >> >>
>> > > >> >> strain C R____
>> > > >> >> strain D RRSSSS
>> > > >> >>
>> > > >> >> Strain C is "less resistant" but only because there is not enough
>> > > >> >> information. In fact it seems more serious as it is resistant to all
>> > > >> >> tested drugs.
>> > > >> >>
>> > > >> >
>> > > >> > Strain C is probably garbage and I would remove it. With a bit of luck I will have the result with the same sample Id which would be complete.
>> > > >> >
>> > > >> >> And then what about
>> > > >> >>
>> > > >> >> strain D SR
>> > > >> >> strain E RS
>> > > >> >>
>> > > >> >
>> > > >> > Yes those are the cases which are annoying me.
>> > > >> >
>> > > >> > That's why I came up with the idea of multiplying the value of the result
>> > > >> > (S=1, I=2 and R=3) with the position of the value. Tried it with
>> > > >> > triplets but there will still be cases where different results will
>> > > >> > give the same numeric value. Ignoring empty cells for the moment.
>> > > >> >
>> > > >> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
>> > > >> > will be the same numerical value but they are different resistance
>> > > >> > profiles I would in this case keep both.
>> > > >> >
>> > > >> > How to prevent that from happening.
>> > > >> Can you first say why the suggestion I made is not helpful?
>> > > >>
>> > > >> --
>> > > >> Ben.
>> > > >
>> > > > You mean that one:
>> > > >
>> > > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
>> > > >> >> number. In base 10, the strains score
>> > > >> >>
>> > > >> >> R S I
>> > > >> >> strain A 1 5 0 = 150
>> > > >> >> strain B 3 3 0 = 330
>> > > >> >>
>> > > >
>> > > > Different resistance profiles same result:
>> > > I don't yet understand the requirements so I am taking it in stages.
>> > > The first requirement seemed to be "more or less resistant". To do that
>> > > you can use digits in a large enough base but this will make the number
>> > > of Rs, Ss and Is paramount. Is that acceptable as a first step?
>> > >
>> > The requirements are one strain of a certain microorganism/patient
>> > The most resistant one or if they have different profiles
>> >
>> > SRS vs RRS => last one, more Rs
>> >
>> > SRS vs RSR = both, different profiles
>> > > In order to help people to be able to make further suggestions, maybe
>> > > you could give the relative ordering you would like to see between the
>> > > following sets of profiles. For example, between SSR, SRS and RSS, I
>> > > think the order you want is RSS > SRS > SSR.
>> > >
>> > > 1: SSR, SRS, RSS
>> > >
>> > > 2: RSI, RIS, SRI, SIR, IRS, ISR
>> > >
>> > > 3: SSSR, SSRS, SRSS, RSSS
>> > >
>> > > 4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR
>> > >
>> > The order of the results is given by the ID of the drug in the extraction tool.
>> > I could probably order them by family and hierarchy of potency but
>> > would that make a difference?
>> > > It's possible you could make do with an extra field (or digits) that
>> > > gives some measure of the relative ordering between otherwise similar
>> > > sequences. For example, using base 10 (for convenience of arithmetic)
>> > > both RRSSI and RSRSI would score 212xx but the last xx would reflect the
>> > > positioning of the results in the sequence. There are lots of ways to do
>> > > this. One way would be to use, as you were thinking, some sort of weighted
>> > > count. Using S=0, I=1 and R=2 with weights
>> > >
>> > > 54321
>> > > RRSSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+4) + 0*(3+2) + 1*1 = 21219
>> > > RSRSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+3) + 0*(4+2) + 1*1 = 21217
>> > >
>> > So to be sure that I am following:
>> >
>> > 2*(5+4) = value of R (=2) * position of R(@5 and @4)
>> > 2*(5+3) = value of R (=2) * position of R(@5 and @3)
>> >
>> > 0*(3+2) = value of S (=0) * position of S(@3 and @2)
>> > 0*(4+2) = value of S (=0) * position of S(@4 and @2)
>> >
>> > 1*1 = value of I (=1) * position of I (@1)
>> >
>> > 2*10000 + 1*1000 + 2*100 Is just used as padding? So 212 could be any other
>> > number?
>> >
>> Eh forget the last sentence, brain fart: I have 2 R's so 2*10000, 1 I so 1*1000 and 2 S's so 2*100
>> > But in this example I would have to keep both as drug 5,2 and 1 are common
>> > to both results but 4 and 3 are unique.
>> >
>> > The score would be completely misleading.
>> >
>> > So if my table has a width of 20 columns the first column would be
>> > 10^20, the next 10^19,.... +/- a few 0s off?
>> >
>> > I would have to implement it and see what I get as result.
>> > > If you absolutely must never get duplicate numbers, but you still want
>> > > to preserve a strict specified ordering, I think you will have much more
>> > > work to do.
>> > >
>> > > Getting a unique number for each case is trivial (but the ordering will
>> > > be wrong) and getting an ordering that rates every R > every S > every I
>> > > is also trivial, but there will be lots of duplicates. It's finding the
>> > > balance that's going to be hard.
>> > >
>> > > --
>> > > Ben.
>> > I have prepared a cleaned up Excel workbook with only the duplicates which
>> > pose problems. The ones I would keep have an orange ID.
>> > I could upload it to Github. If that helps understanding the different cases.
>> >
>> > Thanks for your patience
>> >
>> > Laurent
>
> Ben,


Re: Some advice required [OT]

<fefmsgl4cpearv10nkdm07kirrn5gete3e@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=6464&group=comp.lang.ada#6464

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!buffer2.nntp.dca1.giganews.com!buffer1.nntp.dca1.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Tue, 28 Dec 2021 10:49:29 -0600
From: wlfr...@ix.netcom.com (Dennis Lee Bieber)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 11:49:30 -0500
Organization: IISS Elusive Unicorn
Message-ID: <fefmsgl4cpearv10nkdm07kirrn5gete3e@4ax.com>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com> <87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com> <87fsqd7oz8.fsf@bsb.me.uk> <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com> <87y2456071.fsf@bsb.me.uk> <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
User-Agent: ForteAgent/8.00.32.1272
X-No-Archive: yes
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 22
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-Ea20bVePMpgU5kOFMuOvn2AGiF7nvyMN5q9e+cQrhloJm+WG9CK3Aim8uEhnJDKs0XktdmV5WDgyaAT!ZfA4yMncD7pmmE0HL6znB3MkAjgxA2TXFHM5l9ZZPgauywl6OCHq29cAzU/OqBw+k5c3FqQ0
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 2146

by: Dennis Lee Bieber - Tue, 28 Dec 2021 16:49 UTC

On Mon, 27 Dec 2021 23:48:31 -0800 (PST), Laurent <lutgenl@icloud.com>
declaimed the following:

>
>The requirements are one strain of a certain microorganism/patient
>The most resistant one or if they have different profiles
>
>SRS vs RRS => last one, more Rs
>
>SRS vs RSR = both, different profiles
>

Which is still inconclusive (at least as I view it) -- your second
example ALSO fits the "last one, more Rs" constraint. You have yet to define
how the first doesn't qualify as "different profiles". Both examples are
"1R, 2S" vs "2R, 1S".

--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

Re: Some advice required [OT]

<e906a70c-6550-45b9-9eca-eae4133eba7fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=6465&group=comp.lang.ada#6465

X-Received: by 2002:ac8:4e96:: with SMTP id 22mr20073238qtp.76.1640715578450;
Tue, 28 Dec 2021 10:19:38 -0800 (PST)
X-Received: by 2002:a25:9003:: with SMTP id s3mr9937083ybl.323.1640715578181;
Tue, 28 Dec 2021 10:19:38 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Tue, 28 Dec 2021 10:19:37 -0800 (PST)
In-Reply-To: <87mtkk96in.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=78.141.135.179; posting-account=sDyr7QoAAAA7hiaifqt-gaKY2K7OZ8RQ
NNTP-Posting-Host: 78.141.135.179
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com>
<j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com>
<87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com>
<87fsqd7oz8.fsf@bsb.me.uk> <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com>
<87y2456071.fsf@bsb.me.uk> <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
<875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com> <f35c6d2f-c495-40ae-9f01-dfe0870ea063n@googlegroups.com>
<87mtkk96in.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e906a70c-6550-45b9-9eca-eae4133eba7fn@googlegroups.com>
Subject: Re: Some advice required [OT]
From: lutg...@icloud.com (Laurent)
Injection-Date: Tue, 28 Dec 2021 18:19:38 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 328

by: Laurent - Tue, 28 Dec 2021 18:19 UTC

On Tuesday, 28 December 2021 at 14:57:22 UTC+1, Ben Bacarisse wrote:
> Laurent <lut...@icloud.com> writes:
>
> >
> > Ben,
> Posts crossed. You should probably ignore my last as it was written
> before I saw this one.
> > I have implemented your solution but I don't understand the reason why S would have a value of 0?
> > I then don't need to take care of the S'es because the result will always be 0. Not that it changes a lot
> >
> > Because I still couldn't choose the profile of interest only based on the numbers.
> >
> > Profile     Ben's solution   Mine
> > R R S S I   212 11           212 1205
> > R S R S I   212 13           212 1405
> > R R R S I   311 17           311 1805
> > R S R R I   311 21           311 1407
> > S R R R I   311 23           311 1607
> >
> > 311 17 and 311 23 being the most likely but unclear where the
> > difference might be.
> This is what is so frustrating for me. What do you mean, most likely?
> What do you mean by what the difference might be? Can you describe to
> me, as a human being, which you would choose and tell me how you
> decided? If you can't do that then all you are doing is trying random
> schemes until something pops up that looks right for some specific set of
> data!


Re: Some advice required [OT]

<sqgnlh$sul$1@franka.jacob-sparre.dk>

https://www.novabbs.com/devel/article-flat.php?id=6466&group=comp.lang.ada#6466

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsfeed.xs3.de!callisto.xs3.de!news.jacob-sparre.dk!franka.jacob-sparre.dk!pnx.dk!.POSTED.rrsoftware.com!not-for-mail
From: ran...@rrsoftware.com (Randy Brukardt)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 21:58:23 -0600
Organization: JSA Research & Innovation
Lines: 16
Message-ID: <sqgnlh$sul$1@franka.jacob-sparre.dk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com> <sqdrnc$chv$1@franka.jacob-sparre.dk> <b50bf401-87e7-4352-b517-7fe6b6ded42dn@googlegroups.com>
Injection-Date: Wed, 29 Dec 2021 04:20:01 -0000 (UTC)
Injection-Info: franka.jacob-sparre.dk; posting-host="rrsoftware.com:24.196.82.226";
logging-data="29653"; mail-complaints-to="news@jacob-sparre.dk"
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.7246

by: Randy Brukardt - Wed, 29 Dec 2021 03:58 UTC

"Laurent" <lutgenl@icloud.com> wrote in message
news:b50bf401-87e7-4352-b517-7fe6b6ded42dn@googlegroups.com...
....
> The problem is not that I don't want to use Ada. We are using Citrix so I
> am stuck with the programs the IT department allows me to use. It was
> already a chore to get MS Access made available.

Understood. But this is an Ada group, and we're not very motivated to talk
about stuff that can't be written in Ada. Besides, if you had an Ada
solution, you probably could figure out how to transcribe it into some other
language.

Randy.

Re: Some advice required [OT]

<sqgnli$sul$2@franka.jacob-sparre.dk>

https://www.novabbs.com/devel/article-flat.php?id=6467&group=comp.lang.ada#6467

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!gandalf.srv.welterde.de!news.jacob-sparre.dk!franka.jacob-sparre.dk!pnx.dk!.POSTED.rrsoftware.com!not-for-mail
From: ran...@rrsoftware.com (Randy Brukardt)
Newsgroups: comp.lang.ada
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 22:20:00 -0600
Organization: JSA Research & Innovation
Lines: 56
Message-ID: <sqgnli$sul$2@franka.jacob-sparre.dk>
References: <7bede061-4b0f-4029-beb1-1056637e57d6n@googlegroups.com> <j2tlk8FneraU1@mid.individual.net> <49538254-21ed-4fd0-8316-1bccc7d3c635n@googlegroups.com> <87sfue8a0v.fsf@bsb.me.uk> <31332c61-a370-43a5-bbe0-efe338ee6d8fn@googlegroups.com> <87fsqd7oz8.fsf@bsb.me.uk> <e5ab38f4-b456-4e99-b702-be4f88e8b5c1n@googlegroups.com> <87y2456071.fsf@bsb.me.uk> <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com> <fefmsgl4cpearv10nkdm07kirrn5gete3e@4ax.com>
Injection-Date: Wed, 29 Dec 2021 04:20:02 -0000 (UTC)
Injection-Info: franka.jacob-sparre.dk; posting-host="rrsoftware.com:24.196.82.226";
logging-data="29653"; mail-complaints-to="news@jacob-sparre.dk"
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.7246

by: Randy Brukardt - Wed, 29 Dec 2021 04:20 UTC

"Dennis Lee Bieber" <wlfraed@ix.netcom.com> wrote in message
news:fefmsgl4cpearv10nkdm07kirrn5gete3e@4ax.com...
> On Mon, 27 Dec 2021 23:48:31 -0800 (PST), Laurent <lutgenl@icloud.com>
> declaimed the following:
>
>>
>>The requirements are one strain of a certain microorganism/patient
>>The most resistant one or if they have different profiles
>>
>>SRS vs RRS => last one, more Rs
>>
>>SRS vs RSR = both, different profiles
>>
>
> Which is still inconclusive (at least as I view it) -- your second
> example ALSO fits the "last one, more Rs" constraint. You have yet to
> define how the first doesn't qualify as "different profiles". Both
> examples are "1R, 2S" vs "2R, 1S".

Let me try. I think he is saying that when one compares two profiles, one
compares each position with the relation I < S < R. Then, if you get the
same order for every position (either >= or <=), then (and only then), the
profile with more R's is the one you keep (or more S's if there are the same
number of R's). If you don't get the same order for each, then you keep
both.

So, for any pair of profiles, you can get a result of "<", ">", or
incomparable. It should be easy enough to write a function to determine this
result.
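
A minimal sketch of such a comparison function, in Python purely for
illustration (the thread itself contains no code). The function name
compare and the table RANK are hypothetical. One assumption needs
flagging: the default ranking follows the original poster's
"empty cell < S < I < R", which differs from the I < S < R written above,
so swap the values for S and I if that is what is intended. Empty cells
are assumed to be written as "_" or a space.

```python
# Assumed ranking, from the original poster's "empty cell < S < I < R".
RANK = {"_": 0, " ": 0, "S": 1, "I": 2, "R": 3}


def compare(a, b, rank=RANK):
    """Compare two profiles position by position.

    Returns ">" if a is at least as resistant as b everywhere and more
    resistant somewhere, "<" for the reverse, "=" if every position ties,
    and None if the profiles are incomparable (each is ahead somewhere).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same drugs")
    a_ahead = any(rank[x] > rank[y] for x, y in zip(a, b))
    b_ahead = any(rank[x] < rank[y] for x, y in zip(a, b))
    if a_ahead and b_ahead:
        return None
    if a_ahead:
        return ">"
    if b_ahead:
        return "<"
    return "="


print(compare("SRS", "RRS"))   # <     keep the last one, more Rs
print(compare("SRS", "RSR"))   # None  different profiles, keep both
```

When every position agrees in direction, the profile that is ahead cannot
have fewer Rs than the other, so no separate count of Rs is needed in this
sketch.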

The problem I see is that I don't think there is any way to do this across
all of the data short of comparing all of the pairs. The compare function
needs to be a "strict weak ordering" in order that sorting and the like be
meaningful between data sets. The issue here is that "incomparable" is not
transitive: A can be incomparable with B, and B with C, while A > C holds.

OTOH, I wouldn't worry about that unless the data set is large. Computers
are fast these days, and brute force approaches are much easier to figure
out.

So I suggest the OP write a function to compare two sets of data; if the
result is that the sets are comparable, eliminate the set that is less
interesting. Then apply that function to every pair in the data until there
are no further eliminations. (Probably one would "eliminate" a set by
marking that it is less interesting than some other set, as opposed to
deleting it outright. I'd probably just use a spreadsheet cell pointing to
the more interesting set.)
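
The brute-force pass described above might look like the following Python
sketch (Python is used only for illustration; all names are hypothetical).
It assumes the ranking "empty cell < S < I < R" from the original post,
with "_" standing for an empty cell, and it marks each less interesting
profile with the index of a profile that beats it rather than deleting it.

```python
# Assumed ranking, from the original poster's "empty cell < S < I < R".
RANK = {"_": 0, "S": 1, "I": 2, "R": 3}


def dominates(a, b, rank=RANK):
    """True if a is >= b at every position and strictly greater at one."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same drugs")
    pairs = list(zip(a, b))
    return (all(rank[x] >= rank[y] for x, y in pairs)
            and any(rank[x] > rank[y] for x, y in pairs))


def mark_less_interesting(profiles, rank=RANK):
    """Map the index of each beaten profile to the index of one that beats it."""
    beaten_by = {}
    for i, a in enumerate(profiles):
        for j, b in enumerate(profiles):
            if i != j and dominates(b, a, rank):
                beaten_by[i] = j
                break
    return beaten_by


def keep(profiles, rank=RANK):
    """Return the profiles that no other profile beats, in input order."""
    beaten_by = mark_less_interesting(profiles, rank)
    return [p for i, p in enumerate(profiles) if i not in beaten_by]


strains = ["SRS", "RRS", "RSR"]
print(mark_less_interesting(strains))   # {0: 1}
print(keep(strains))                    # ['RRS', 'RSR']
```

Every pair is compared, so the cost grows with the square of the number of
strains per patient, which should not matter for a handful of strains.
Identical profiles do not beat each other in this sketch, so exact
duplicates would all be kept and need a separate de-duplication step.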

If it was me, I'd do that in Ada first, to get the algorithm right. Then
translate it into whatever other languages (maybe even spreadsheet
formulas).

Randy.
