Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

A computer without COBOL and Fortran is like a piece of chocolate cake without ketchup and mustard.


devel / comp.databases.theory / Re: OLTP Transaction

SubjectAuthor
* No CASCADE in Commercial SQL & Benchmark ConsiderationsDerek Ignatius Asirvadem
+* Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsNicola
|`* Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsDerek Ignatius Asirvadem
| +- Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsDerek Ignatius Asirvadem
| `* Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsNicola
|  `* Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsDerek Ignatius Asirvadem
|   `* Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsNicola
|    `* Batch TransactionDerek Ignatius Asirvadem
|     +- Re: Batch TransactionDerek Ignatius Asirvadem
|     +- Re: Batch TransactionDerek Ignatius Asirvadem
|     `* Re: Batch TransactionNicola
|      `* Re: Batch TransactionDerek Ignatius Asirvadem
|       `* Re: Batch TransactionNicola
|        `* Re: Batch TransactionDerek Ignatius Asirvadem
|         +* Re: Batch TransactionDerek Ignatius Asirvadem
|         |`- Re: Batch TransactionDerek Ignatius Asirvadem
|         `* Re: Batch TransactionNicola
|          `* OLTP TransactionDerek Ignatius Asirvadem
|           +* Re: OLTP TransactionDerek Ignatius Asirvadem
|           |`- Re: OLTP TransactionDerek Ignatius Asirvadem
|           `* Re: OLTP TransactionNicola
|            `* Re: OLTP TransactionDerek Ignatius Asirvadem
|             `* Re: OLTP TransactionNicola
|              +* Re: OLTP TransactionDaniel Loth
|              |`* Re: OLTP TransactionNicola
|              | `* Re: OLTP TransactionDerek Ignatius Asirvadem
|              |  `* Re: OLTP TransactionDaniel Loth
|              |   +- Re: OLTP TransactionDaniel Loth
|              |   `* Re: OLTP TransactionNicola
|              |    +- Re: OLTP TransactionDaniel Loth
|              |    `* OLTP Transaction, ACID DefinitionDerek Ignatius Asirvadem
|              |     `- Redefined terms for AcademicsDerek Ignatius Asirvadem
|              `* Re: OLTP TransactionDerek Ignatius Asirvadem
|               `* Re: OLTP TransactionNicola
|                +- Re: OLTP TransactionNicola
|                `* Re: OLTP TransactionDerek Ignatius Asirvadem
|                 `- Re: OLTP TransactionNicola
`- Re: No CASCADE in Commercial SQL & Benchmark ConsiderationsDerek Ignatius Asirvadem

Pages:12
Re: OLTP Transaction

<f65ca597-d1dc-462f-9671-4d3e7dadf6e8n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=145&group=comp.databases.theory#145

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:ad4:4d92:: with SMTP id cv18mr5021031qvb.60.1624535782274; Thu, 24 Jun 2021 04:56:22 -0700 (PDT)
X-Received: by 2002:a05:6808:98e:: with SMTP id a14mr3713283oic.139.1624535781998; Thu, 24 Jun 2021 04:56:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Thu, 24 Jun 2021 04:56:21 -0700 (PDT)
In-Reply-To: <sb1jg9$19nq$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=199.229.250.82; posting-account=bFMNewoAAAAHC6b_JPlV7XvI31zIuG5T
NNTP-Posting-Host: 199.229.250.82
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com> <sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com> <sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com> <sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com> <saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com> <saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com> <sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com> <sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com> <sb1jg9$19nq$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f65ca597-d1dc-462f-9671-4d3e7dadf6e8n@googlegroups.com>
Subject: Re: OLTP Transaction
From: derek.as...@gmail.com (Derek Ignatius Asirvadem)
Injection-Date: Thu, 24 Jun 2021 11:56:22 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 203
 by: Derek Ignatius Asirv - Thu, 24 Jun 2021 11:56 UTC

Nicola

> On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
> > On 2021-06-24, Derek Ignatius Asirvadem wrote:
> >
> > The report connection would normally do SELECT [at] READ_COMMITTED
> > which holds ReadLocks for the duration of the Statement,
>
> Ok. So, I am back to a question I have already posed to you:
>
> - OLAP queries run at READ COMMITTED;
> - Each OLAP query is a single SELECT statement;

1. Well, I guarantee that in my databases, because it is 100% Predicates.
Even high-end massive report tools do that, noting the intermediate “universe” that maps an RFS into something readable. And then it executes a single SELECT.
But boffins writing SQL on a non-SQL suite of programs, do not have a hope in hell. So I can see that what is vanilla to us, is news to you.

2. Not only OLAP, any SELECT runs at READ COMMITTED.
OLAP query do not *have* to be a single SELECT statement

> - Each SELECT holds read locks for the duration of the statement.

Yes.
SQL/ACID requirement, that MV-non-CC suites cannot support.

> But the duration of the statement may be significant

Why ???

I don’t have a single report in my entire SQL life that exceeds 12 seconds, and some of them are “complex” (20+ tables; subqueries; derived tables) and massive (tables have billions of rows). I kinda pride myself on sub-second response times, even for reports.

In the course of my operations, sure, I have replaced reports that run in 30 mins with a Rdb+single-SELECT that runs in under 10 seconds. Many, many times.

>>>>
On one occasion I got thrown out of the building by the big boss because his report that ran in 15 mins on Friday, after I re-built the DataStructures on the server, ran in under 1 second on Monday. He was sure that my project failed, that the report was wrong. I begged him to compare the report content, but he would not. I had to buy his secretary a box of Lindt chocolate, get her to print out both Friday and Monday reports, and get her to show them to him personally. The ungrateful sob did not even apologise, he just had security re-instate my security pass.
<<<<

I accept that in the MS world, where they are used to longer times, sure, their reports would go to 90 mins. I have replaced 2 of those and brought it down to under 10 secs.

> (minutes, hours, ...)

Not in the commercial SQL world.

Sure, in idiotic suites of programs, written by 10,000 undergrads spread across the planet, all singing the Stonebraker mantra to maintain their collective fantasy, hours and days are “normal”. That is the price of NOT having an architecture, of writing filth and pretending that it is a server. 1,000 or 2,000 TIMES slower than a commercial SQL Platform is “normal”.

So no, the issue simply does not apply to us. But I accept that is “normal” for the academics and freeware users. Just wait until you get to the benchmark that you were planning.

Take a look at this Benchmark I did against Oracle (MV-non-CC). Where Sybase returned in 2.16 secs, Oracle had to abandon the benchmark after 120 MINUTES.
__ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf

> Don't such queries create contention (other transactions cannot
> update the data read by the OLAP query until the query is over)?

Correct.
So what (see above).
No one cares about waiting a few seconds.

1. Did you not read the /Sybase Lock Manager/ link, page 2 table at bottom left ?

2. I have stated that if you want to get into this, there is a Server Monitoring report that just happens to expose this problem, and the considerations, and with more granularity than “contention” ... but you have not taken it up.
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf

The Read Locks are precisely this, SELECT queries (not OLAP, but that makes no difference) contending with OLTP transactions. Do you not care, that after fixing a SAN Config problem, the Read Locks dropped by 91%, or what that means at the coal face. Or stated in converse, why Read Locks were 1,100% higher when the SAN was mis-configured.

In any case, take an extremely quick look at the report, in [§ Selection] at the top, just three lines:
__ Write Locks = up 10%
__ Read Locks = down 91%
__ Commits = no change (confirming the load is the same)

At minimum, I am confirming that (a) OLAP & SELECT queries *DO* hold Read Locks for the duration of the Statement, (b) that that is so fleeting; so irrelevant, in a normal commercial SQL server (Server Architecture).

> > Separately. What “rows that were not there [when the transaction
> > started]” ??? How can a transaction change a row that is not there
> > ???
>
> Of course it can't.
>
> > Or even have knowledge about a row that is not there ???
>
> Of course it can't. But it can see something that was not there before,
> and it appears at a later time. I explain what I mean with an example:
>
> Time Transaction Operation
> 0 T1,T2 start transaction;
> 1 T1 select sum(X) from R;
> 2 T2 insert into R(X) values (100);
> 3 T2 commit;
> 4 T1 select sum(X) from R;
>
> At READ COMMITTED, T1 at time 4 will see the record inserted by T2 at
> time 2, even if the record did non exist when T1 began. The SELECT at
> time 1 and time 4 would then return different results.

Sure. But that is the classic demonstration for requiring ISOLATION LEVEL REPEATABLE_READ, which requires a Transaction to be open.

1.
It is still a matter of understanding on your part.
(The PusGres link you gave is pure mind-numbing filth. You are still thinking about a SELECT that holds “transactional” locks. It does not (except in that oozing filth). )
Change all occs of [T1] to [S1], signifying that it is a SELECT, not a Transaction.
Remove T1 at Time[0] because we do not start a Transaction for a SELECT
If READ_COMMITTED is used (default on DB2; Sybase; MS), a SELECT [no Transaction]:
__ at Time[1] does not obtain the inserted row
__ at Time[4] obtains the inserted row

A SELECT does not have to be repeated, period, full stop, end of story.

2.
But if a SELECT does get repeated, then the context must be some Logical Unit of Work; a Transaction.
Since it is now a Transaction, ISOLATION LEVEL SERIALIZABLE applies (you do not get to choose).
(Using your original example, not the changes ala [1])
Add Time[5] T1 commit
The result is, T2 does not start until Time[5].
The inserted row does not get inserted until the T1 Transaction completes.

It would be an exceedingly stupid thing to do, repeating a SELECT in a Transaction, instead of saving the value at the first SELECT. But academically, as a theoretical possibility, sure. And SQL/ACID handles it, no problem at all.

> Of course, this is not an issue if the only queries you run at READ
> COMMITTED are single statements—provided that the system guarantees
> statement-level consistency

Yes.
(Although the example does not suffice, I do know what you mean.)

> (it seems that some don't:
> https://stackoverflow.com/questions/4772475/read-committed-isolation-level-in-sql-server-for-a-single-statement).

Come on.

1. Can you not tell the difference between a platform that provides Statement level consistency, and has done so for 3 decades (at the time of the question), and a bug, that in some cases it fails ? A bug does not prove that Statement-level consistency is not provided by the product, it proves only the existence of a bug. You are grasping at straws.

2. Both the seeker and the two responders are idiots. They are genuflecting to the (then) new MV (definitely no CC in that, but the pre-existing CC in the pre-existing “1PL”), the awe of the mystical and magical. With no understanding whatsoever.

3. The answer is actually quite wrong for Sybase & DB2, and MS/SQL (I seriously doubt that MS/SQL has changed for the answer to be correct).
> At read committed level shared locks are released as soon as the data is read (not at the end of the transaction or even the end of the statement)
No, at read committed level shared locks are released at the end of the statement.
The reference to “end of transaction” is stupid, good for confusing the confused further.

Cheers
Derek

Re: OLTP Transaction

<sb5dkr$102h$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=148&group=comp.databases.theory#148

  copy link   Newsgroups: comp.databases.theory
Path: i2pn2.org!i2pn.org!aioe.org!Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org.POSTED!not-for-mail
From: nic...@nohost.org (Nicola)
Newsgroups: comp.databases.theory
Subject: Re: OLTP Transaction
Date: Fri, 25 Jun 2021 20:17:31 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 316
Message-ID: <sb5dkr$102h$1@gioia.aioe.org>
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org>
<792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org>
<6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org>
<88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org>
<17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org>
<247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org>
<bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org>
<2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org>
<f65ca597-d1dc-462f-9671-4d3e7dadf6e8n@googlegroups.com>
NNTP-Posting-Host: Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: slrn/1.0.3 (Darwin)
X-Notice: Filtered by postfilter v. 0.9.2
 by: Nicola - Fri, 25 Jun 2021 20:17 UTC

On 2021-06-24, Derek Ignatius Asirvadem <derek.asirvadem@gmail.com> wrote:
>> On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
>> > On 2021-06-24, Derek Ignatius Asirvadem wrote:
>> >
>> > The report connection would normally do SELECT [at] READ_COMMITTED
>> > which holds ReadLocks for the duration of the Statement,
>>
>> Ok. So, I am back to a question I have already posed to you:
>>
>> - OLAP queries run at READ COMMITTED;
>> - Each OLAP query is a single SELECT statement;
>
> 1. Well, I guarantee that in my databases, because it is 100% Predicates.
> Even high-end massive report tools do that, noting the intermediate
> “universe” that maps an RFS into something readable. And then it
> executes a single SELECT.

I did not mean to imply otherwise. I am perfectly aware of how
expressive SQL can be on well-designed databases.

> 2. Not only OLAP, any SELECT runs at READ COMMITTED.

Yes. I focused on OLAP to emphasize that I have "complex" queries in
mind.

>> - Each SELECT holds read locks for the duration of the statement.
>
> Yes.
> SQL/ACID requirement,

Fine.

>that MV-non-CC suites cannot support.

Trivially, as MVCC does not use locks.

>> But the duration of the statement may be significant
>
> Why ???
>
> I don’t have a single report in my entire SQL life that exceeds 12
> seconds, and some of them are “complex” (20+ tables; subqueries;
> derived tables) and massive (tables have billions of rows). I kinda
> pride myself on sub-second response times, even for reports.

Fair (and funny anecdote you have there).

>> (minutes, hours, ...)
>
> Not in the commercial SQL world.
>
> Sure, in idiotic suites of programs, written by 10,000 undergrads
> spread across the planet, all singing the Stonebraker mantra to
> maintain their collective fantasy, hours and days are “normal”. That
> is the price of NOT having an architecture, of writing filth and
> pretending that it is a server. 1,000 or 2,000 TIMES slower than
> a commercial SQL Platform is “normal”.
>
> So no, the issue simply does not apply to us. But I accept that is
> “normal” for the academics and freeware users. Just wait until you
> get to the benchmark that you were planning.
>
> Take a look at this Benchmark I did against Oracle (MV-non-CC). Where
> Sybase returned in 2.16 secs, Oracle had to abandon the benchmark
> after 120 MINUTES.
> __ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf

I remember finding that benchmark some time ago and trying to
reproduce it in PostgreSQL, based on the DDL I found here:

https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/DDL%20101201.txt

and the query found here:

https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Subquery%20SUM%20ShowPlan.txt

My only noteworthy remarks at the time were:

1. As bad as Oracle might be, it can't be so slow. It sounds like the
benchmark was performed by Sybase people and/or no effort was made to
optimize on the Oracle side.

Anyway, I don't care about Oracle.

2. 488MB for the 2M records of CustomerTransaction seems quite a lot.
I've got ~147MB, including indexes (well, one index), in PostgreSQL.

I have just repeated my experiment. The above query takes ~2.4s with
cold data and ~800ms with hot data on my laptop. Ok, that's flash
storage and a laptop one decade newer than the one in your document. But
it's stock PostgreSQL with default parameters (which are extremely
conservative), with no physical tuning applied. So, all in all it's not
so bad for the price.

For the sake of completeness, and possibly for comparison and further
discussion, at the end of this message I report my benchmark.

>> Don't such queries create contention (other transactions cannot
>> update the data read by the OLAP query until the query is over)?
>
> Correct.
> So what (see above).
> No one cares about waiting a few seconds.

There are many applications where waiting more than a few hundreds
milliseconds might be problematic. For example, waiting ten seconds
makes a web app unacceptably slow.

> 1. Did you not read the /Sybase Lock Manager/ link, page 2 table at bottom left ?

Yes.

> 2. I have stated that if you want to get into this, there is a Server
> Monitoring report that just happens to expose this problem, and the
> considerations, and with more granularity than “contention” ... but
> you have not taken it up.
> __ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
>
> The Read Locks are precisely this, SELECT queries (not OLAP, but that
> makes no difference) contending with OLTP transactions. Do you not
> care, that after fixing a SAN Config problem, the Read Locks dropped
> by 91%, or what that means at the coal face. Or stated in converse,
> why Read Locks were 1,100% higher when the SAN was mis-configured.

I have no doubt that, given your experience, you are able to achieve
great improvements on every system you touch. My questions are for
better understanding on my part, not to tell you that you should run
things differently.

Nicola

--------------------------------------------------------------------------------
--
-- Benchmark for Tony Andrews: DDL for Tables Used
-- PerformanceDBA 28 Nov 2010
-- Adapted by NV for PostgreSQL on 2021 Jun 25
--
-- Original code from:
--
-- https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/

-- NOTE: data types are inferred (not given in the original code).
create domain CustomerId as integer;
create domain _DescriptionSmall as text;
create domain _Description as text;
create domain "_Date" as date;
create domain _UserStamp as timestamp;
create domain _TimeStampSmall as timestamp;
create domain _IntSmall as smallint;
create domain _MoneySmall as numeric(7,2);
create domain TransactionCode as char
constraint valid_transaction_code
check (value in ('C','D'));

-- Population: 1000 Random CustomerIds
create table REF_Customer (
CustomerId CustomerId not null , -- SMALLINT
ShortName _DescriptionSmall not null ,
FullName _Description not null ,
CreatedDate "_Date" not null ,
UserStamp _UserStamp not null ,
TimeStamp _TimeStampSmall not null
);

alter table REF_Customer
add constraint U_CustomerId
primary key (CustomerId);

alter table REF_Customer
add constraint UC_ShortName
unique (ShortName);

-- Population: 2M rows
-- distributed evenly across the 1000 CustomerIds
-- 2000 per CustomerId: 1000 Credits; 1000 Debits
create table CustomerTransaction (
CustomerId CustomerId not null , -- SMALLINT
SequenceNo _IntSmall not null ,
Date "_Date" not null ,
TransactionCode TransactionCode not null , -- CHAR(1)
Amount _MoneySmall not null
);

alter table CustomerTransaction
add constraint UC_PK
primary key (CustomerId, SequenceNo);

-- Insert some randomly generated data, distributed as specified:
with words(word) as (
select unnest(string_to_array(pg_read_file('/usr/share/dict/words')::text,E'\n'))
) insert into REF_Customer(CustomerId, ShortName, FullName, CreatedDate, UserStamp, TimeStamp)
select row_number() over (),
word as ShortName,
word || word || word || word as FullName,
(now() - '1 day'::interval * round(random() * 3650))::date as CreateDate,
now() - '1 day'::interval * round(random() * 3650) as UserStamp,
now() - '1 day'::interval * round(random() * 3650) as TimeStamp
from words
order by random()
limit 1000;

-- Insert 2M rows, 2000 per customer, uniformly distributed over {C,D}
insert into CustomerTransaction(CustomerId, SequenceNo, Date, TransactionCode, Amount)
select CustomerId,
generate_series(1,2000) as SequenceNo,
(now() - '1 day'::interval * round(random() * 3650))::date as Date,
case when random() < 0.5 then 'C' else 'D' end as TransactionCode,
random() * 10000 as Amount
from REF_Customer;

analyze; -- Update the stats

--------------------------------------------------------------------------------

-- Main PostgreSQL settings:
-- shared_buffers = 128MB
-- maintenance_work_mem = 64MB
-- temp_buffers = 8MB
-- work_mem = 4MB

select * from ref_customer limit 3;
┌────────────┬───────────┬──────────────────────────────────────┬─────────────┬────────────────────────────┬────────────────────────────┐
│ customerid │ shortname │ fullname │ createddate │ userstamp │ timestamp │
├────────────┼───────────┼──────────────────────────────────────┼─────────────┼────────────────────────────┼────────────────────────────┤
│ 182577 │ skippet │ skippetskippetskippetskippet │ 2021-06-25 │ 2021-06-25 21:00:59.401317 │ 2021-06-25 21:00:59.401317 │
│ 57408 │ downness │ downnessdownnessdownnessdownness │ 2021-06-25 │ 2021-06-25 21:00:59.401317 │ 2021-06-25 21:00:59.401317 │
│ 132294 │ orocratic │ orocraticorocraticorocraticorocratic │ 2021-06-25 │ 2021-06-25 21:00:59.401317 │ 2021-06-25 21:00:59.401317 │
└────────────┴───────────┴──────────────────────────────────────┴─────────────┴────────────────────────────┴────────────────────────────┘


Click here to read the complete article
Re: OLTP Transaction

<sb5e3l$16gu$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=149&group=comp.databases.theory#149

  copy link   Newsgroups: comp.databases.theory
Path: i2pn2.org!i2pn.org!aioe.org!Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org.POSTED!not-for-mail
From: nic...@nohost.org (Nicola)
Newsgroups: comp.databases.theory
Subject: Re: OLTP Transaction
Date: Fri, 25 Jun 2021 20:25:25 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 20
Message-ID: <sb5e3l$16gu$1@gioia.aioe.org>
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org>
<792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org>
<6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org>
<88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org>
<17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org>
<247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org>
<bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org>
<2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org>
<f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com>
NNTP-Posting-Host: Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org
X-Complaints-To: abuse@aioe.org
User-Agent: slrn/1.0.3 (Darwin)
X-Notice: Filtered by postfilter v. 0.9.2
 by: Nicola - Fri, 25 Jun 2021 20:25 UTC

On 2021-06-24, Daniel Loth <daniel.j.loth@gmail.com> wrote:
> Hi Nicola,

Hi Daniel,
great to see someone else diving in! I hope you will last :)

Thanks for your comments. In particular:

> Importantly, and often overlooked as far as MVCC disadvantages go, the
> data you read in an hours-long query running under MVCC is arbitrarily
> old data. If your query takes 2 hours to complete then the very last
> row processed by your query is potentially 2 hours out of date.

Sure, that's a fair criticism of MVCC.

But if Derek can keep all his queries under 12 seconds with an old (as
in "mature") 2PC system, don't you think that a modern MVCC system can
achieve the same or better performance under the same load?

Nicola

Re: OLTP Transaction

<sb5ihs$vja$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=150&group=comp.databases.theory#150

  copy link   Newsgroups: comp.databases.theory
Path: i2pn2.org!i2pn.org!aioe.org!Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org.POSTED!not-for-mail
From: nic...@nohost.org (Nicola)
Newsgroups: comp.databases.theory
Subject: Re: OLTP Transaction
Date: Fri, 25 Jun 2021 21:41:16 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 28
Message-ID: <sb5ihs$vja$1@gioia.aioe.org>
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org>
<792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org>
<6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org>
<88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org>
<17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org>
<247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org>
<bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org>
<2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org>
<f65ca597-d1dc-462f-9671-4d3e7dadf6e8n@googlegroups.com>
<sb5dkr$102h$1@gioia.aioe.org>
NNTP-Posting-Host: Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org
X-Complaints-To: abuse@aioe.org
User-Agent: slrn/1.0.3 (Darwin)
X-Notice: Filtered by postfilter v. 0.9.2
 by: Nicola - Fri, 25 Jun 2021 21:41 UTC

On 2021-06-25, Nicola <nicola@nohost.org> wrote:
> I have just repeated my experiment. The above query takes ~2.4s with
> cold data and ~800ms with hot data on my laptop.

There's a bias in my test, because, the way it is generated, the
transaction data is physically sorted by CustomerID. After correcting
the bias, I'm still around 3s anyway for "Subquery SUM", with no
physical optimization. This query performs better:

https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Inline%20SUM%20ShowPlan.txt

taking ~1.2s.

To insert data in random order:

insert into CustomerTransaction(CustomerId, SequenceNo, Date, TransactionCode, Amount)
with T as (
select CustomerId,
generate_series(1,2000) as SequenceNo,
(now() - '1 day'::interval * round(random() * 3650))::date as Date,
case when random() < 0.5 then 'C' else 'D' end as TransactionCode,
random() * 10000 as Amount
from REF_Customer
)
select * from T order by random();

Nicola

Re: OLTP Transaction

<2b645c82-edd2-4cd8-9277-5b72c7cfe255n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=151&group=comp.databases.theory#151

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:a0c:bf4b:: with SMTP id b11mr14929667qvj.11.1624679787304; Fri, 25 Jun 2021 20:56:27 -0700 (PDT)
X-Received: by 2002:a05:6830:1ac4:: with SMTP id r4mr12164915otc.36.1624679787024; Fri, 25 Jun 2021 20:56:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Fri, 25 Jun 2021 20:56:26 -0700 (PDT)
In-Reply-To: <sb5dkr$102h$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=199.229.250.138; posting-account=bFMNewoAAAAHC6b_JPlV7XvI31zIuG5T
NNTP-Posting-Host: 199.229.250.138
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com> <sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com> <sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com> <sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com> <saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com> <saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com> <sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com> <sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com> <sb1jg9$19nq$1@gioia.aioe.org> <f65ca597-d1dc-462f-9671-4d3e7dadf6e8n@googlegroups.com> <sb5dkr$102h$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2b645c82-edd2-4cd8-9277-5b72c7cfe255n@googlegroups.com>
Subject: Re: OLTP Transaction
From: derek.as...@gmail.com (Derek Ignatius Asirvadem)
Injection-Date: Sat, 26 Jun 2021 03:56:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 225
 by: Derek Ignatius Asirv - Sat, 26 Jun 2021 03:56 UTC

Nicola

Great post.

> On Saturday, 26 June 2021 at 06:17:37 UTC+10, Nicola wrote:
> > On 2021-06-24, Derek Ignatius Asirvadem wrote:
> >> On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
> >> > On 2021-06-24, Derek Ignatius Asirvadem wrote:

> > 2. Not only OLAP, any SELECT runs at READ COMMITTED.
>
> Yes. I focused on OLAP to emphasize that I have "complex" queries in
> mind.
>
> >> - Each SELECT holds read locks for the duration of the statement.
> >
> > Yes.
> > SQL/ACID requirement,
>
> Fine.
>
> >that MV-non-CC suites cannot support.
>
> Trivially, as MVCC does not use locks.

Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!

Or are you just stating that because it does not use locks, it holds no locks for the duration of the statement ? Pffft.

> > Take a look at this Benchmark I did against Oracle (MV-non-CC). Where
> > Sybase returned in 2.16 secs, Oracle had to abandon the benchmark
> > after 120 MINUTES.
> > __ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf
> I remember finding that benchmark some time ago

Nice to know.

> and trying to
> reproduce it in PostgreSQL, based on the DDL I found here:
>
> https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/DDL%20101201.txt
>
> and the query found here:
>
> https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Subquery%20SUM%20ShowPlan.txt
>
> My only noteworthy remarks at the time were:
>
> 1. As bad as Oracle might be, it can't be so slow. It sounds like the
> benchmark was performed by Sybase people and/or no effort was made to
> optimize on the Oracle side.

No, no. Of course it was me on the Sybase side. On the Oracle side it was the famous Tony Andrews himself. The Oracle world is huge compared to the Sybase world. He is a hugely famous Oracle expert. He engaged me on StackOverflow about time series data, that needed a subquery, which I said Oracle cacks itself, which lead to the benchmark.
__ https://stackoverflow.com/questions/4083464/design-database-relating-to-time-attribute
__ https://stackoverflow.com/questions/4375192/performance-of-different-approaches-to-time-based-data

On the contrary, the Oracle side was heavily optimised, both logical and physical. As noted, TA even duplicated the CustomerTransaction table, to elevate the speed. And I had to downgrade the config on the Sybase side. There were a many of those, only the huge one is mentioned in the doc.

Once he lost out on the Subquery SUM() = ABANDONED, he did not finish the remaining tests (we had agreed to 5 tests per benchmark), hence the empty slots Inline View SUM() on the spreadsheet.

Separate to the ABANDONED after 120 MINUTES, the other tests were

1 Subquery COUNT ___________ 3.0 x slower than Sybase
3 InlineView COUNT __________ 3.6 x slower than Sybase
3 InlineView COUNT Improved 4.8 x slower than Sybase
1 Subquery SUM _____________ Abandoned after 120 mins (Sybase executed in 2..16 secs)
3 InlineView SUM ____________ 26.4 x slower than Sybase
3 InlineView SUM Improved __ 36.4 x slower than Sybase

Not being honest, he tried posting falsities here. Which I had to destroy unfortunately:
__ https://groups.google.com/g/comp.databases.theory/c/IkJyS0MTzUs

> Anyway, I don't care about Oracle.

No problem. I referred to it because it is the oldest and most mature MV-non-CC, and it too, does not comply with SQL or ACID.

> 2. 488MB for the 2M records of CustomerTransaction seems quite a lot.
> I've got ~147MB, including indexes (well, one index), in PostgreSQL.

Yes, that figure is odd. Even with the duplicated table. I can’t check it now as that disk image is archived, and my current laptop is two generations newer.

> I have just repeated my experiment. The above query takes ~2.4s with
> cold data and ~800ms with hot data on my laptop. Ok, that's flash
> storage and a laptop one decade newer than the one in your document. But
> it's stock PostgreSQL with default parameters (which are extremely
> conservative), with no physical tuning applied.

Excellent. The ‘hot’ figure is the one we want (test is logical not physical I/O).

And considering the next post, re Inline SUM() at 1.2 secs.

Excellent, it is approaching Sybase performance, noting the fact that this is SSD, and the benchmark 11 years ago was on HDD.

> So, all in all it's not
> so bad for the price.

All in all ? Nah. It is hopeless for the price. No one takes responsibility. Literally the definition of insanity (the premise of the Mental Health Act is that the person is not responsible for their actions).

Put another way, with technical precision, you pay zero for non-SQL and non-ACID. And the thing that you paid nothing for has great performance in single user mode.

Now if we bring it back to the context of locking vs MV-non-CC, run:
__ 10 x OLTP Transactions doing UPDATE CustomerTransaction randomly, and
__ 5 of those SELECTs
__ concurrently.

> For the sake of completeness, and possibly for comparison and further
> discussion, at the end of this message I report my benchmark.
> >> Don't such queries create contention (other transactions cannot
> >> update the data read by the OLAP query until the query is over)?
> >
> > Correct.
> > So what (see above).
> > No one cares about waiting a few seconds.
>
> There are many applications where waiting more than a few hundreds
> milliseconds might be problematic.

The context of my statement there is the OLAP and SELECTs. Outside that context, yes, we want millisec response. Since 2007, microsec resolution and metrics.

> For example, waiting ten seconds
> makes a web app unacceptably slow.

The metric at the bank is 2 secs.
The SG metric is 1 sec.

> > 2. I have stated that if you want to get into this, there is a Server
> > Monitoring report that just happens to expose this problem, and the
> > considerations, and with more granularity than “contention” ... but
> > you have not taken it up.
> > __ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
> >
> > The Read Locks are precisely this, SELECT queries (not OLAP, but that
> > makes no difference) contending with OLTP transactions. Do you not
> > care, that after fixing a SAN Config problem, the Read Locks dropped
> > by 91%, or what that means at the coal face. Or stated in converse,
> > why Read Locks were 1,100% higher when the SAN was mis-configured.
>
> I have no doubt that, given your experience, you are able to achieve
> great improvements on every system you touch.

Thanks.

> My questions are for
> better understanding on my part, not to tell you that you should run
> things differently.

But that is not what I meant, what >>I<< did is not relevant. Given your [then] concern re Read Locks, and such being held for some duration, and such contending with OLTP Transactions, you may be interested in what that actually looks like, in terms of metrics in a production server.

Admittedly that is a top-down view. Daniel has given a bottom-up view, and a further level of detail. I would not go there (further levels), not only are there many levels, and differences due to each implementation flavour, it is bottom-up, not good for understanding. I am trying to get you to think top-down, for understanding.

>> Integrity within a ResultSet
Meaning:
1. (yes) no phantoms; no anomalies
where that would normally be obtained via REPEATABLE_READ, which demands a Transaction, and implies several SELECTS, I provide it in a single SELECT at READ_COMMITTED, no Transaction.

2. I provide Predicate level Integrity in the ResultSet, somewhat beyond [1]. Yes, the methods exploit platform-specific features, but it can be readily translated to any other Commercial RDBMS. This has more to do with performance issues, less to do with SQL.

=========
To proceed with this thread then:
__ generally an understanding of OLTP, that is the LARGE SHARED DATA BANK context, rather than mere speed in a single-user context,
__ hopefully resolve my declaration that PissGriss does not support ACID
__ (ie. any MV-non-CC system cannot support ACID)
____ and therefore cannot provide OLTP
__ the two items that remain from my side:
___1 there are two major occurrences in OLTP that have not been mentioned yet, that must be covered before we can consider this subject complete (MV-non-CC types are totally ignorant about this)
___2 Optimistic Locking. We are dancing around it, without confirming its absolute need in OLTP.

Although you have no further questions re the Batch Transaction, that understanding is not complete, it can be completed only when the above outstanding elements are completed.

Cheers
Derek

Re: OLTP Transaction

<88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=152&group=comp.databases.theory#152

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:ac8:7b58:: with SMTP id m24mr12680994qtu.268.1624680601395;
Fri, 25 Jun 2021 21:10:01 -0700 (PDT)
X-Received: by 2002:a9d:471:: with SMTP id 104mr12472015otc.48.1624680601126;
Fri, 25 Jun 2021 21:10:01 -0700 (PDT)
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Fri, 25 Jun 2021 21:10:00 -0700 (PDT)
In-Reply-To: <sb5e3l$16gu$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=199.229.250.138; posting-account=bFMNewoAAAAHC6b_JPlV7XvI31zIuG5T
NNTP-Posting-Host: 199.229.250.138
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org> <f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com>
<sb5e3l$16gu$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
Subject: Re: OLTP Transaction
From: derek.as...@gmail.com (Derek Ignatius Asirvadem)
Injection-Date: Sat, 26 Jun 2021 04:10:01 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Derek Ignatius Asirv - Sat, 26 Jun 2021 04:10 UTC

> On Saturday, 26 June 2021 at 06:25:28 UTC+10, Nicola wrote:
> > On 2021-06-24, Daniel Loth wrote:
>
> Hi Daniel,
> great to see someone else diving in! I hope you will last :)

Yes !

> But if Derek can keep all his queries under 12 seconds with an old (as
> in "mature") 2PC system

Sorry if I wasn't clear. Almost all my SELECTs, complex or not, execute sub-second. Almost all even at customer sites. The problem is, on their production system, even though I am hired to reconfigure it, I rarely get the power to change everything that requires correction, there is always some area (eg. SAN config) or some config parm (eg. auditing set up) that I cannot change. Thus I have a few SELECTs on customer systems, that take more than 1 sec, which I remember them well, the worst one being 12 secs.

> 2PC
No.
Not even "1PL"; "2PL"; "C2PL"; "S2PL"; "SS2PL", which are Straw Man understandings of Ordinary Locking.
I am trying to give you the real deal on Ordinary Locking, to remove you from the filth that the MV-non-CC mob declare it to be.

Cheers
Derek

Re: OLTP Transaction

<0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=153&group=comp.databases.theory#153

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:ac8:4a18:: with SMTP id x24mr12800365qtq.239.1624695160908; Sat, 26 Jun 2021 01:12:40 -0700 (PDT)
X-Received: by 2002:a9d:7c96:: with SMTP id q22mr5288213otn.105.1624695160637; Sat, 26 Jun 2021 01:12:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Sat, 26 Jun 2021 01:12:40 -0700 (PDT)
In-Reply-To: <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=193.119.51.108; posting-account=S056qgoAAAASvuIiqcdQPeaQ70w-ZkVa
NNTP-Posting-Host: 193.119.51.108
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com> <sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com> <sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com> <sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com> <saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com> <saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com> <sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com> <sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com> <sb1jg9$19nq$1@gioia.aioe.org> <f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com> <sb5e3l$16gu$1@gioia.aioe.org> <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com>
Subject: Re: OLTP Transaction
From: daniel.j...@gmail.com (Daniel Loth)
Injection-Date: Sat, 26 Jun 2021 08:12:40 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 196
 by: Daniel Loth - Sat, 26 Jun 2021 08:12 UTC

Hi Derek, Nicola,

> But if Derek can keep all his queries under 12 seconds with an old (as
> in "mature") 2PC system

So having gone through much of the content that Derek has proffered over time (much of it on StackOverflow), I believe this is mostly owing to the meticulous normalisation that his databases implement.

Your ordinary database out in industry is often very sloppy.
For example, one of our databases at work has at least one table with at least 70 columns, and those columns are sparsely populated (that is, they are nullable and quite often null).
Suppose each of those 70 columns are, on average, 8 bytes each. Each row is therefore 560 bytes wide.

If we keep it simple and assume that the entire 8KiB of a data page is available for row data then we can fit 14 rows of 560 bytes on each data page. So that's a low data density to begin with.

Next, consider how many ways in which you might wish to query those 70 columns. There are many permutations, and supporting even a modest number of those permutations would involve a menagerie of indexes.

On each insert, all of these indexes must be updated.

On each update, all indexes that contain the modified column values must be updated.
If an update causes a page split in the clustered index, resulting in rows being moved to new pages, then this will necessitate updating all non-clustered indexes (because non-clustered indexes contain the clustered index key).

In the specific case of Postgres, I understand they had a write amplification issue a number of years ago. This meant that all updates involved updating all indexes, even where those indexes didn't strictly require an update.
Uber famously blogged about it as their reason for abandoning Postgres. You can read more about it here: https://eng.uber.com/postgres-to-mysql-migration/

For what it's worth, I accept that this is an implementation detail of Postgres and not something inherent in MVCC. Had they designed their MVCC in another way, they might have avoided the problem. I couldn't say with certainty though just how much work they'd have to do to fix it. Who knows, maybe they have fixed it - Uber's blog post was published in 2016.

Finally, on each delete, all indexes must once again be updated.

Now let's consider what that index count might look like. In practice, it's not that unusual to see an un-normalised SQL Server database table with 10 or so indexes.
If you defer to consultants such as Brent Ozar (of 'Microsoft Certified Master of SQL Server' fame), they'll tell you that you should aim for 5 indexes.

Of course, in our table with 70 columns, it's extraordinarily unlikely that we'll be able to satisfy all of the desirable data access paths with indexes. So a trade-off is made.
We can do one of two things:

1. We can employ more indexes. Each index has its associated cost when performing an insert / update / delete operation, as described above.

2. We can employ fewer indexes. We face lower cost when inserting / updating / deleting data. However it's more likely that a user will run a query with a predicate not easily satisfied by one of the existing indexes, and the DBMS will be forced to resort to a table scan (in physical terms, a clustered index scan).

---

So given the above, I'll contrast it with Derek's typical implementation and the characteristics as I see them.

Firstly, Derek's implementations are highly normalised. The tables in his models often comprise a primary key (natural key) and several attributes. Tables with fewer columns require far fewer indexes to satisfy the queries that you would perform on them.

In fact, he often mentions that his databases contain no null values. This is due to him modelling optional attributes in their own tables. That is, a table that he migrates the key of the parent table into plus that one additional optional attribute. This table, comprising a key and one attribute, is in 6NF.

Suppose you have a 'Person' table with a nullable 'MiddleName' column. And suppose in practice that this table contains a million rows, but only 10% of Person records contain a non-null value for MiddleName.

When modelled as a single table with nullable column, there are 900,000 records with a null value.

Derek would likely model the relationship like this:
Person -> { PersonKey, Attribute1, ..., AttributeN }
PersonMiddleName -> { PersonKey, MiddleName }

Person would contain 1 million rows that do not comprise a 30 to 50 character MiddleName column. In SQL Server, each nullable column has 2 bytes of bookkeeping information when the column is null. These must be processed for every select query executed on the table.

PersonMiddleName would contain 100,000 records. You'd only ever query these records when you specifically require a Person's MiddleName, so they don't impose a cost when querying the Person table.
The PersonMiddleName table would also only require a single index to support those queries, that index being the clustered index.

If we scale this idea up to hundreds or even thousands of nullable columns across a database then they impose a significant cost. In practice, a million records is not a lot. I used 'million' to avoid any ambiguity with 'billion' (which depends on the locale that a discussion participant resides in)..

---

Secondly, and perhaps most importantly, MVCC produces a great deal of litter. Old versions of a row are retained in pages. In my example above there are 14 rows per page. This is already very low density because each row requires so much space.

However those 14 rows might additionally include former versions of a logically equivalent row (that is, former versions of some row for a given key).

So if there are 10 versions of a particular row on that data page, and that data page can hold 14 rows, then the page actually contains at most 5 distinct rows (1 row versioned 10 times, and at most 4 other distinct rows).

This means that each page of data processed might provide only minimal utility for that processing cost.

So an MVCC-based system is very likely making less progress with each page that it reads from disk.

An MVCC-based system is also very likely deriving less utility per GiB of RAM.

---

Thirdly is the meticulous transaction control. All transactions live within the database, implemented as procedures.

I can think of at least two benefits:

1. A database administrator has the complete picture on-hand. They can readily optimise the database as required because they have a definitive view of all queries and can determine the indexes required to best satisfy them.

2. Misbehaving client code can't unwittingly start a transaction, acquire a number of exclusive locks, and then retain those locks for a prolonged period of time (thus blocking other users of the system).
And well-behaved client code written with the best intentions can't diligently start a transaction, acquire a number of exclusive locks, and then unwittingly fall prey to CPU scheduler preemption for an arbitrarily long period of time.

On this matter, I posted a new discussion last night (my time) titled 'Stored procedure structure in RDBMS using Lock Manager for transaction isolation'.
I view these discussions through Google Groups. If you do too then the discussion can be viewed here: https://groups.google.com/g/comp.databases.theory/c/BNL-TwgMfPY

---

Fourthly, the databases are modelled and their performance measured with a whole-of-system mindset.

Many software developers jump immediately to 'de-normalisation' (I've used the phrase 'un-normalised' above because most of them don't actually start from a normalised model) because on a micro-level basis they can write a query that is demonstrably faster for one query.

Of course, that's just one query. What developers often overlook is the hundreds or thousands of other queries that must co-exist with this one query.

In my view, this is why MVCC is seen to be working reasonably well by many in the industry.

Many workplaces performing software development activities lack the rigour that is required to implement a database that would be even modestly consistent with Derek's ideal.

In fact, Derek's ideal necessarily involves a degree of Change Control (think older fashioned Change Advisory Boards), and this aspect alone is anathema in modern software development workplaces (who often strive to apply agile methodologies and mindsets, for better or worse).

In this context, which you might ascribe the phrase 'lowest common denominator', MVCC works pretty well. And is seen to work pretty well.

---

Anyway, these things taken together might go some way towards explaining the performance discrepancies mentioned in this discussion and elsewhere.

Cheers,
Daniel

Re: OLTP Transaction

<5a1ed84c-3c31-4ebb-be58-fa6db9900addn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=154&group=comp.databases.theory#154

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:a0c:eb0c:: with SMTP id j12mr15216631qvp.3.1624695390565; Sat, 26 Jun 2021 01:16:30 -0700 (PDT)
X-Received: by 2002:a05:6808:6d1:: with SMTP id m17mr13943377oih.34.1624695390332; Sat, 26 Jun 2021 01:16:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Sat, 26 Jun 2021 01:16:30 -0700 (PDT)
In-Reply-To: <0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=193.119.51.108; posting-account=S056qgoAAAASvuIiqcdQPeaQ70w-ZkVa
NNTP-Posting-Host: 193.119.51.108
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com> <sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com> <sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com> <sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com> <saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com> <saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com> <sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com> <sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com> <sb1jg9$19nq$1@gioia.aioe.org> <f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com> <sb5e3l$16gu$1@gioia.aioe.org> <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com> <0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5a1ed84c-3c31-4ebb-be58-fa6db9900addn@googlegroups.com>
Subject: Re: OLTP Transaction
From: daniel.j...@gmail.com (Daniel Loth)
Injection-Date: Sat, 26 Jun 2021 08:16:30 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 7
 by: Daniel Loth - Sat, 26 Jun 2021 08:16 UTC

Sorry, just a follow-up concerning this remark:

> If an update causes a page split in the clustered index, resulting in rows being moved to new pages, then this will necessitate updating all non-clustered indexes (because non-clustered indexes contain the clustered index key).

In fact, this is likely not true even in products using MVCC such as Postgres.

Cheers,
Daniel

Re: OLTP Transaction

<sb7153$mhi$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=155&group=comp.databases.theory#155

  copy link   Newsgroups: comp.databases.theory
Path: i2pn2.org!i2pn.org!aioe.org!Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org.POSTED!not-for-mail
From: nic...@nohost.org (Nicola)
Newsgroups: comp.databases.theory
Subject: Re: OLTP Transaction
Date: Sat, 26 Jun 2021 10:56:35 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 59
Message-ID: <sb7153$mhi$1@gioia.aioe.org>
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org>
<792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org>
<6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org>
<88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org>
<17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org>
<247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org>
<bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org>
<2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org>
<f65ca597-d1dc-462f-9671-4d3e7dadf6e8n@googlegroups.com>
<sb5dkr$102h$1@gioia.aioe.org>
<2b645c82-edd2-4cd8-9277-5b72c7cfe255n@googlegroups.com>
NNTP-Posting-Host: Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: slrn/1.0.3 (Darwin)
X-Notice: Filtered by postfilter v. 0.9.2
 by: Nicola - Sat, 26 Jun 2021 10:56 UTC

On 2021-06-26, Derek Ignatius Asirvadem <derek.asirvadem@gmail.com> wrote:
>> Trivially, as MVCC does not use locks.
>
> Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!

I am not sure that by ACID compliance we mean the same thing. You seem
to imply more than what the acronym stands for (or I am taking a narrow
interpretation of it). But I am not excluding that you may be right.
Hopefully, we will be able to clarify this.

>> I have just repeated my experiment. The above query takes ~2.4s with
>> cold data and ~800ms with hot data on my laptop. Ok, that's flash
>> storage and a laptop one decade newer than the one in your document. But
>> it's stock PostgreSQL with default parameters (which are extremely
>> conservative), with no physical tuning applied.
>
> Excellent. The ‘hot’ figure is the one we want (test is logical not physical I/O).

Ok.

> Now if we bring it back to the context of locking vs MV-non-CC, run:
> __ 10 x OLTP Transactions doing UPDATE CustomerTransaction randomly, and
> __ 5 of those SELECTs
> __ concurrently.

Time permitting, I'll try.

>> My questions are for
>> better understanding on my part, not to tell you that you should run
>> things differently.
>
> But that is not what I meant, what >>I<< did is not relevant. Given
> your [then] concern re Read Locks, and such being held for some
> duration, and such contending with OLTP Transactions, you may be
> interested in what that actually looks like, in terms of metrics in
> a production server.

With the additional context you have provided, I'm fine.

> To proceed with this thread then:
> __ generally an understanding of OLTP, that is the LARGE SHARED DATA
> BANK context, rather than mere speed in a single-user context,
> __ hopefully resolve my declaration that PissGriss does not support ACID
> __ (ie. any MV-non-CC system cannot support ACID)
> ____ and therefore cannot provide OLTP
> __ the two items that remain from my side:
> ___1 there are two major occurrences in OLTP that have not been
> mentioned yet, that must be covered before we can consider this
> subject complete (MV-non-CC types are totally ignorant about this)
> ___2 Optimistic Locking. We are dancing around it, without confirming
> its absolute need in OLTP.
>
> Although you have no further questions re the Batch Transaction, that
> understanding is not complete, it can be completed only when the above
> outstanding elements are completed.

Good. Hopefully, we'll get to paint the full picture.

Nicola

Re: OLTP Transaction

<sb76t3$134d$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=156&group=comp.databases.theory#156

  copy link   Newsgroups: comp.databases.theory
Path: i2pn2.org!i2pn.org!aioe.org!Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org.POSTED!not-for-mail
From: nic...@nohost.org (Nicola)
Newsgroups: comp.databases.theory
Subject: Re: OLTP Transaction
Date: Sat, 26 Jun 2021 12:34:43 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 219
Message-ID: <sb76t3$134d$1@gioia.aioe.org>
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org>
<792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org>
<6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org>
<88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org>
<17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org>
<247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org>
<bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org>
<2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org>
<f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com>
<sb5e3l$16gu$1@gioia.aioe.org>
<88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
<0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com>
NNTP-Posting-Host: Ca3fXwVFPO74v52zQgh0qQ.user.gioia.aioe.org
X-Complaints-To: abuse@aioe.org
User-Agent: slrn/1.0.3 (Darwin)
X-Notice: Filtered by postfilter v. 0.9.2
 by: Nicola - Sat, 26 Jun 2021 12:34 UTC

On 2021-06-26, Daniel Loth <daniel.j.loth@gmail.com> wrote:
> Hi Derek, Nicola,
>
>> But if Derek can keep all his queries under 12 seconds with an old (as
>> in "mature") 2PC system
>
> So having gone through much of the content that Derek has proffered
> over time (much of it on StackOverflow), I believe this is mostly
> owing to the meticulous normalisation that his databases implement.

Sure, the whole discussion we are having assumes a proper database
design.

> Your ordinary database out in industry is often very sloppy.
> For example, one of our databases at work has at least one table with
> at least 70 columns, and those columns are sparsely populated (that
> is, they are nullable and quite often null).

That must be fun to work with!

> Suppose each of those 70 columns are, on average, 8 bytes each. Each
> row is therefore 560 bytes wide.
>
> If we keep it simple and assume that the entire 8KiB of a data page is
> available for row data then we can fit 14 rows of 560 bytes on each
> data page. So that's a low data density to begin with.
>
> Next, consider how many ways in which you might wish to query those 70
> columns. There are many permutations, and supporting even a modest
> number of those permutations would involve a menagerie of indexes.
>
> On each insert, all of these indexes must be updated.

A nightmare. But that cannot be saved by any carefully crafted
transactions.

> In the specific case of Postgres, I understand they had a write
> amplification issue a number of years ago. This meant that all updates
> involved updating all indexes, even where those indexes didn't
> strictly require an update.
> Uber famously blogged about it as their reason for abandoning
> Postgres. You can read more about it here:
> https://eng.uber.com/postgres-to-mysql-migration/
>
> For what it's worth, I accept that this is an implementation detail of
> Postgres and not something inherent in MVCC. Had they designed their
> MVCC in another way, they might have avoided the problem. I couldn't
> say with certainty though just how much work they'd have to do to fix
> it. Who knows, maybe they have fixed it - Uber's blog post was
> published in 2016.

I doubt that things have changed. But, as you say, that is not the only
possible implementation of MVCC (and likely not the best).

> Now let's consider what that index count might look like. In practice,
> it's not that unusual to see an un-normalised SQL Server database
> table with 10 or so indexes.
> If you defer to consultants such as Brent Ozar (of 'Microsoft
> Certified Master of SQL Server' fame), they'll tell you that you
> should aim for 5 indexes.
>
> Of course, in our table with 70 columns, it's extraordinarily unlikely
> that we'll be able to satisfy all of the desirable data access paths
> with indexes. So a trade-off is made.
> We can do one of two things:

Or, rather than patching the unpatchable, redesign the database.

> So given the above, I'll contrast it with Derek's typical
> implementation and the characteristics as I see them.
>
> Firstly, Derek's implementations are highly normalised. The tables in
> his models often comprise a primary key (natural key) and several
> attributes. Tables with fewer columns require far fewer indexes to
> satisfy the queries that you would perform on them.

Correct.

> In fact, he often mentions that his databases contain no null values.

I agree with his choice. There are lots of reasons to avoid NULLs, from
the most theoretical to the most practical.

> This is due to him modelling optional attributes in their own tables.
> That is, a table that he migrates the key of the parent table into
> plus that one additional optional attribute. This table, comprising
> a key and one attribute, is in 6NF.

> Suppose you have a 'Person' table with a nullable 'MiddleName' column.
> And suppose in practice that this table contains a million rows, but
> only 10% of Person records contain a non-null value for MiddleName.
>
> When modelled as a single table with nullable column, there are
> 900,000 records with a null value.
>
> Derek would likely model the relationship like this:
> Person -> { PersonKey, Attribute1, ..., AttributeN }
> PersonMiddleName -> { PersonKey, MiddleName }

I can't speak for Derek, but in this specific example I'd leave
MiddleName as an attribute of Person, just making not NULLable
(actually, part of a key). When a person has no middle name, you may
record an empty string. One compelling reason to keep MiddleName with
Person is that it is used to identify a person.

As a general approach, however, what you say is correct: optional
attributes (and multi-valued attributes, such as the phone numbers of
a person) are separate, dependent, facts of Person.

> Person would contain 1 million rows that do not comprise a 30 to 50
> character MiddleName column. In SQL Server, each nullable column has
> 2 bytes of bookkeeping information when the column is null. These must
> be processed for every select query executed on the table.
>
> PersonMiddleName would contain 100,000 records. You'd only ever query
> these records when you specifically require a Person's MiddleName, so
> they don't impose a cost when querying the Person table.
> The PersonMiddleName table would also only require a single index to
> support those queries, that index being the clustered index.
>
> If we scale this idea up to hundreds or even thousands of nullable
> columns across a database then they impose a significant cost.

I am not sure how you would incur into such costs, unless you need to
produce tables with thousands of columns.

> Secondly, and perhaps most importantly, MVCC produces a great deal of
> litter. Old versions of a row are retained in pages. In my example
> above there are 14 rows per page. This is already very low density
> because each row requires so much space.
>
> However those 14 rows might additionally include former versions of
> a logically equivalent row (that is, former versions of some row for
> a given key).
>
> So if there are 10 versions of a particular row on that data page, and
> that data page can hold 14 rows, then the page actually contains at
> most 5 distinct rows (1 row versioned 10 times, and at most 4 other
> distinct rows).
>
> This means that each page of data processed might provide only minimal
> utility for that processing cost.
>
> So an MVCC-based system is very likely making less progress with each
> page that it reads from disk.

That's sensible.

> An MVCC-based system is also very likely deriving less utility per GiB
> of RAM.

In the sense that buffers can potentially be filled with old records?
Yes, that's reasonable.

> Thirdly is the meticulous transaction control. All transactions live
> within the database, implemented as procedures.

Again, I agree with Derek on this. Of course, we are talking about
*database* transactions, not *business* transactions (e.g., buying an
item on an online shop is one business transaction, but it is carried
out through several database transactions).

> I can think of at least two benefits:
>
> 1. A database administrator has the complete picture on-hand. They can
> readily optimise the database as required because they have
> a definitive view of all queries and can determine the indexes
> required to best satisfy them.
>
> 2. Misbehaving client code can't unwittingly start a transaction,
> acquire a number of exclusive locks, and then retain those locks for
> a prolonged period of time (thus blocking other users of the system).
> And well-behaved client code written with the best intentions can't
> diligently start a transaction, acquire a number of exclusive locks,
> and then unwittingly fall prey to CPU scheduler preemption for an
> arbitrarily long period of time.

There are also security benefits: better access control, application of
the least privilege principle, etc.

I say to my students: look at the installation instructions of any
open-source web app. For sure they'll tell you that you should
absolutely give tight permissions to the web directory, that it should
not be owned by root, etc. Then, they go on by making you create
a database and a database user (or, in the best cases, two) to own it.
Then, the web app reads and rides across the database as that user. That
is the moral equivalent of chmod 777 on the file system.

Then, there's the matter of correctness: only the operations on the
database that preserve consistency should be allowed.

> On this matter, I posted a new discussion last night (my time) titled
> 'Stored procedure structure in RDBMS using Lock Manager for
> transaction isolation'.


Click here to read the complete article
Re: OLTP Transaction

<56540e68-7a9f-48dc-84e1-7d1cc552db2bn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=157&group=comp.databases.theory#157

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:a37:a8f:: with SMTP id 137mr13044104qkk.81.1624719009120;
Sat, 26 Jun 2021 07:50:09 -0700 (PDT)
X-Received: by 2002:a05:6808:6d1:: with SMTP id m17mr14790961oih.34.1624719008928;
Sat, 26 Jun 2021 07:50:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Sat, 26 Jun 2021 07:50:08 -0700 (PDT)
In-Reply-To: <sb76t3$134d$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=193.119.51.108; posting-account=S056qgoAAAASvuIiqcdQPeaQ70w-ZkVa
NNTP-Posting-Host: 193.119.51.108
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org> <f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com>
<sb5e3l$16gu$1@gioia.aioe.org> <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
<0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com> <sb76t3$134d$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <56540e68-7a9f-48dc-84e1-7d1cc552db2bn@googlegroups.com>
Subject: Re: OLTP Transaction
From: daniel.j...@gmail.com (Daniel Loth)
Injection-Date: Sat, 26 Jun 2021 14:50:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Daniel Loth - Sat, 26 Jun 2021 14:50 UTC

Hi Nicola,

> Can you elaborate on that? Do you mean that MVCC works well only when
> your goal is to micro-optimize, but it is not suitable to obtain
> system-wide good performance?

Yes, happy to elaborate by way of anecdote.

Joe Developer discovers that he can examine a query plan in a tool like SQL Server Management Studio. He compares two queries:

select FirstName, MiddleName, FamilyName
from Person
where PersonId = 1;

and

select p.FirstName, pmn.MiddleName, p.FamilyName
from Person p
left join PersonMiddleName pmn
on p.PersonId = pmn.PersonId
where p.PersonId = 1;

And rightfully concludes that the former is faster. The former query produces a plan comprising a single clustered index seek. The latter comprises two clustered index seeks, so twice the cost.
Though note that twice the cost, in this example, really means 1 logical read for the former and 2 logical reads for the latter. So twice as expensive, but really that extra expense is marginal.

What this developer doesn't consider is that we might only actually need to select the middle name as part of two or three queries, and that these queries that do require the middle name actually comprise a very small proportion of the workload that the DBMS must handle.
The developer, an individual working on a task, is focused on their own work and very easily takes a myopic view with little thought for next month, let alone next year or the next five years.

This behaviour is what leads to the aforementioned 70 column table, and the perception that de-normalization leads to higher performance. One person does it, someone else follows their lead, and then it becomes the norm.

Developers make other poor choices too. All of the time, really. Often due to it being expedient to do so due to perceived or real pressure to deliver..

For example, they might write code that updates a substantial number of records within a single statement (which has been touched on in recent discussions here).
They test it locally and it's fine, because they're working with a small set of records locally.
But in production environments with a substantial volume of data this creates an untenable situation in the context of a database that uses a Lock Manager, because the writers-block-readers situation arises.

A series of often short-term expedient decisions are taken and death spiralling ensues. Developers inevitably reach a point where they struggle with slow transactions, excessive blocking, deadlocks.
MVCC is seen as a get-out-of-jail-free card when improper transaction management leads to excessive blocking in a database with a Lock Manager.

This creates a perverse situation where MVCC is seen to be better because it avoids the untenable: A database that might effectively be single-user-only until the aforementioned writers-block-readers situation subsides and normal service resumes.
You could say that MVCC provides a safety net. A poorly behaved query won't necessarily bring the system to its knees. At least not for the same reason as a system relying on a Lock Manager, anyway.

In the Microsoft world, the recommendation nowadays from a number of highly regarded thought-leaders (such as Brent Ozar) is to enable 'snapshot isolation' and 'read committed snapshot isolation' by default (i.e., SQL Server's MVCC).
Pragmatic DBAs accept that developers are going to make a mess of it, and they recognise that in many companies developers are going to be making database design decisions because those companies reject ideas such as Change Advisory Boards having to approve things. The idea being that CAB is a handbrake on development velocity. And I recognise and accept that a CAB does slow things down in practice (though for the purpose of due diligence).

In this context I would agree that MVCC can work very well in spite of some of the issues previously discussed. But not because of technological superiority when compared with a RDBMS using a Lock Manager.

But to circle back to my remarks about utility, I don't think MVCC can physically outperform a system that uses a Lock Manager.

Reduced utility from resources due to more data pages being required to store the same content (i.e., remnant versions of rows take space).
This means more pages to cache in RAM and thus more contention for this finite RAM.
In turn, more cache churn. Page Life Expectancy is lower, so it's more likely that a page won't be cached when it needs to be accessed. So then there's additional IO pressure.

With those physical realities in mind, I'd expect a database using locking to perform better when implemented properly.

However, I imagine that if you have a database hosted on a system and that database fits entirely in RAM (and has plenty of headroom) then perhaps they perform comparably.

Cheers,
Daniel

OLTP Transaction, ACID Definition

<75b6ace0-87e0-4c56-bdaf-551ac97fa7d9n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=162&group=comp.databases.theory#162

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:ae9:ed4c:: with SMTP id c73mr844520qkg.37.1624843615885;
Sun, 27 Jun 2021 18:26:55 -0700 (PDT)
X-Received: by 2002:a05:6808:6d1:: with SMTP id m17mr18362847oih.34.1624843615624;
Sun, 27 Jun 2021 18:26:55 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.swapon.de!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Sun, 27 Jun 2021 18:26:55 -0700 (PDT)
In-Reply-To: <sb76t3$134d$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=199.229.250.66; posting-account=bFMNewoAAAAHC6b_JPlV7XvI31zIuG5T
NNTP-Posting-Host: 199.229.250.66
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org> <f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com>
<sb5e3l$16gu$1@gioia.aioe.org> <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
<0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com> <sb76t3$134d$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <75b6ace0-87e0-4c56-bdaf-551ac97fa7d9n@googlegroups.com>
Subject: OLTP Transaction, ACID Definition
From: derek.as...@gmail.com (Derek Ignatius Asirvadem)
Injection-Date: Mon, 28 Jun 2021 01:26:55 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 20928
 by: Derek Ignatius Asirv - Mon, 28 Jun 2021 01:26 UTC

Nicola

> On Saturday, 26 June 2021 at 20:56:40 UTC+10, Nicola wrote:
> > On 2021-06-26, Derek Ignatius Asirvadem wrote:
> >
> >> Trivially, as MVCC does not use locks.
> >
> > Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!
>
> I am not sure that by ACID compliance we mean the same thing. You seem
> to imply more than what the acronym stands for (or I am taking a narrow
> interpretation of it). But I am not excluding that you may be right.
> Hopefully, we will be able to clarify this.

Being attached to objective truth, let’s start with definitions. Obviously I do not have private definitions. Since IBM invented it, and set the standard for Transaction Processing, we use theirs. But before I go to yet another item, that will be viewed as a fragment, in isolation from the reality of its existence, let’s name that problem.

For understanding, for the full context, think about the academics,
(a) who remain clueless about ACID and Ordinary Locking,
(b) who erect Straw Men for “ACID”; “2PL” and its variants, in order to maintain [a],
(c) who are addicted to the hysterical fiction of MV-non-CC,
(d) who seek to impose that insanity (count ‘em, three levels) on the world,
(e) who sing the Stonebraker mantra while getting 10,000 developers across the planet to write /one/ codeline, and
(f) who maintain denial of the mountain of evidence that there are problems consequent to the insanity. (Just reading that makes me ROTFLMAO.)

Now those freaks write hundreds of papers to maintain their fantasy world, otherwise they would have nothing to read. So to them, yes, in their fantasy world, and their Straw Man concept of the real world, there are all sorts of “problems” in Ordinary Locking, and ACID is reduced to the meaningless filth that they promote as “ACID”. Even the notion of Transaction is fragmented, very shallow. And of course, the overarching problem that there is no integrity in the system because they are deal ing with fragments; not Atoms.

In my response to Daniel, I stated:
> On Sunday, 27 June 2021 at 19:11:07 UTC+10, Derek Ignatius Asirvadem wrote:
>
> The thing is this. And both Nicola and you have this problem. Not because you are stupid, you are not, but due to the programming that passes for “education” that has been imposed on you. You have been trained to (a) think in terms of fragments, not integration, not atoms. And that means (b) bottom-up, instead of top-down. So you have noticed and copied things correctly, but the overall need, and the integration within the Atom, is not complete.

Now you have left that, at least tentatively, and you are crossing the chasm between academia and reality in this science. Excellent. But the consistent problem, the obstacle, is that you are viewing the real world (here Transactions and ACID), from your academic perspective, as fragments, and bottom-up, which is dead wrong, as detailed above.

In science, we simply cannot have different definitions. If we did, we would be in two tiled cells, across the corridor from each other, screaming our “truth” to the other. No, science is objective. We have the same objective definitions, or you are not a scientist.

We cannot deal with fragments, while denying the atoms that the fragments are located in.

So you need to drop the academic, now-known-to-be-hysterical notions of Transactions; of ACID, and pick up the unchanged scientific ones.

Even Transactions, and ACID (which has to do with Transactions), are fragments, out of context. So first let’s establish that outer context, the entirety of which the Stonebraker cult deny ... and then they try to resurrect fragments thereof at COMMIT time.

--------------------------------------------
-- Transaction Processing --
--------------------------------------------

You are not in Disneyland, where you can forget about reality, sing a hymn to Stonebraker, and erect a system that magically produces “transactions” that “have ACID properties” that “never conflict”. You are not in an asylum, where you can deny the reality of a single version of the truth in a shared database, and maintain a your private version that you do God-knows-what with.

No, this is the real world, conflicts are normal, and we have ways to resolve them efficiently. The shared database is an ONLINE AND EVER-CHANGING single version of the truth. The moment you take your snapshot or “version”, it is, by definition, out-of-date; stale; obsolete. Stick that in your photo album as a keepsake from your trip to Disneyland, grab a coffee, and get ready to deal with the real world.

__ https://www.ibm.com/docs/en/cics-ts/5.4?topic=overview-transaction-processing

Now in that context, we have:

-----------------------------------------------------
-- Transaction • ACID Properties --
-----------------------------------------------------

__ https://www.ibm.com/docs/en/cics-ts/5.4?topic=processing-acid-properties-transactions

Since the academics have perverted the meanings, in order to provide tiny fragments of it, and have that accepted as “ACID”, we need to take that down, to maintain the original and full meaning:

----------------------------------------------------------
-- Transaction • ACID • Consistency --
----------------------------------------------------------

//Data is in a consistent state when a transaction starts and when it ends.//
What is described above is obviously Transaction Consistency. Separately, the database has Database Consistency, which is the sum total of all constraint declarations. These cannot be violated, meaning that:
__ Data is in a consistent state when a transaction starts; at each step; and thus when it ends.
Eg. Customer_Has_[1-to-n]Addresses is not enforceable ate the database level, it is enforced at the Transaction level.
Eg. That the Addresses belong to a valid Customer, is enforced at the database level.

The pig sucker brigade (Date; Darwen; Fagin; Stonebraker; Ellison; all their followers; etc) do not have a database, they have only pre-Relational Record Filing Systems, the database level constraints are not understood, and what constraints are implemented in their RFS is meagre. They pervert that definition, and demand that database constraints can be violated during the transaction, and only enforced when the transaction ends. Eg. The /Relational Model/ prohibits circular references, but they need it in their RFS, thus they need the “deferred non-constraint checking” that are foreign to Relational databases.

In a Relational database, data Consistency is maintained throughout (when a Transaction begins; during its execution; when it ends)

-----------------------------------------------------
-- Transaction • ACID • Isolation --
-----------------------------------------------------

//The intermediate state of a transaction is invisible to [or isolated from] other transactions.
As a result, transactions that run concurrently appear to be serialized.//

Stonebraker and his slaves pervert that at two levels.
1. Having denied:
__ a. that the database is a shared resource, with other concurrent users, and
__ b. that the database maintains a single version of any truth (Fact),
__ c. that can be updated by anyone,
it takes Isolation as the entirety of its existential reality, isolating the user in a tiled cell, where the madding crowd of reality does not intrude..

But note, this fosters a single-user, single-thread, mindset. A pathological anti-social mindset in a TP environment.

2. Of course the bubble of denial of reality, is punctured when the thus isolated user has to COMMIT his changes, to the [oh no, it exists] shared database, that is [oh no, they exist] shared by concurrent users. So it hysterically mixes up the cause and the effect, and try to obtain the effect without the cause:

__2.1 It takes [ Transaction/ACID/Isolation ] and perverts it into [ TransactionIsolation ].
Instead of understanding that THE UNCOMMITTED CHANGES INSIDE a Transaction are isolated from other users, they define a new notion: TransactionIsolation, as a thing, and they give the guy in the tiled cell who is very very attached to his stale “version” of the database, even more power to reinforce his insanity, there you now have TransactionIsolation, you can hold it and suck it for as long as you like, just like the “versions”. Just do not mention the resolution problem (to be clear even that is not Concurrency Control) at COMMIT time.

Yet more erection of fantasy, of users pretending to be isolated from each other, totally anti-social, while operating on a shared database, that has social requirements. Reinforcement of the schizophrenic position, at war with reality. This entire page defines the perversion, it makes teh fantasy “real”.

No, that is taking a fragment of meaning, about one component of ACID, and creating a new fantasy of TI, inside the old fantasy of MV, for the explicit purpose of maintaining the fantasy of MV.

No, in the real world, we have no fantasies, we are attached to the shared database and the social behaviour requirements that go with such, we have [ Transaction/ACID/Isolation ], therefore we do not need the hysterical notion of [ TransactionIsolation ], please erase it from your mind.

__2.2 It takes [ appear to be serialized ] and even [ SERIALIZABLE ], and perverts it into [ SERIALIZED ]. Hell, it is not the actor in a costume, that appears as the character, that makes the character see real, it is the costume that is real. You can’t make this stuff up, you have to get a PhD in Computer Science at Warwick University or Berkeley, to be able to come up with such filth. And then all the academics genuflect. And then 10,000 young minds that have been perverted run along and write code. Ever-changing code. To make fantasies appear real.


Click here to read the complete article
Redefined terms for Academics

<08f7379e-99ef-4504-9448-5f4c2a4c6fecn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=253&group=comp.databases.theory#253

  copy link   Newsgroups: comp.databases.theory
X-Received: by 2002:a05:622a:1a11:: with SMTP id f17mr24417103qtb.308.1629604719068;
Sat, 21 Aug 2021 20:58:39 -0700 (PDT)
X-Received: by 2002:a4a:5241:: with SMTP id d62mr21326368oob.9.1629604718871;
Sat, 21 Aug 2021 20:58:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.databases.theory
Date: Sat, 21 Aug 2021 20:58:38 -0700 (PDT)
In-Reply-To: <75b6ace0-87e0-4c56-bdaf-551ac97fa7d9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=49.195.211.78; posting-account=bFMNewoAAAAHC6b_JPlV7XvI31zIuG5T
NNTP-Posting-Host: 49.195.211.78
References: <0e34f425-23a5-4392-8152-ee854184be53n@googlegroups.com>
<sa847p$1ne6$1@gioia.aioe.org> <792b63d0-1a5a-4ce9-ad5e-71bb9b861647n@googlegroups.com>
<sacede$1onk$1@gioia.aioe.org> <6025af42-64e1-4b06-b91a-3138ce27e49en@googlegroups.com>
<sadkc2$eof$1@gioia.aioe.org> <88277854-15c8-4c9d-a948-be65e1ca979cn@googlegroups.com>
<saf4hi$dr8$1@gioia.aioe.org> <17d4f756-5085-45f7-8fde-5fe363b76aecn@googlegroups.com>
<saia6u$1h0q$1@gioia.aioe.org> <247b509c-df38-461e-8d88-37a5ca123231n@googlegroups.com>
<sasecn$1f00$1@gioia.aioe.org> <bc9e655a-1747-42e4-898b-81e71d223a18n@googlegroups.com>
<sb0asf$10e8$1@gioia.aioe.org> <2e0c9078-dabf-4684-8fbf-c8d6aa79a708n@googlegroups.com>
<sb1jg9$19nq$1@gioia.aioe.org> <f10e57fd-0a16-4371-b5ef-d3367314c482n@googlegroups.com>
<sb5e3l$16gu$1@gioia.aioe.org> <88a873ec-a718-4aff-90fc-a299e397d5ffn@googlegroups.com>
<0dcfbab4-f7ef-422e-b783-bcf20358e5edn@googlegroups.com> <sb76t3$134d$1@gioia.aioe.org>
<75b6ace0-87e0-4c56-bdaf-551ac97fa7d9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <08f7379e-99ef-4504-9448-5f4c2a4c6fecn@googlegroups.com>
Subject: Redefined terms for Academics
From: derek.as...@gmail.com (Derek Ignatius Asirvadem)
Injection-Date: Sun, 22 Aug 2021 03:58:39 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Derek Ignatius Asirv - Sun, 22 Aug 2021 03:58 UTC

Nicola

In the /Stored Proc for OLTP Transactions/ thread, one major issue we dealt with was (a) the exposure and use of false and redefined terms by academics, (b) the assertion of standard terms, noting the chronology. It occurs to me this thread suffers the same, the extent of which I did not appreciate at the time. I suggest you read this thread again, keeping in mind that when I use standard terms, I mean the standard meanings (not any academic re-definitions).

Cheers
Derek

Pages:12
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor