novaBBS - comp.lang.python - Threading question .. am I doing this right?

I have a multi-threaded application (a web service) where several threads need
data from an external database. That data is quite a lot, but it is almost
always the same. Between incoming requests, timestamped records get added to
the DB.

So I decided to keep an in-memory cache of the DB records that gets only
"topped up" with the most recent records on each request:

from threading import Lock, Thread

class MyCache():
    def __init__(self):
        self.cache = None
        self.cache_lock = Lock()

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        with self.cache_lock:
            self._update()

        return self.cache

my_cache = MyCache() # module level

This works, but even those "small" queries can sometimes hang for a long time,
causing incoming requests to pile up at the "with self.cache_lock" block.

Since it is better to quickly serve the client with slightly outdated data than
not at all, I came up with the "impatient" solution below. The idea is that an
incoming request triggers an update query in another thread, waits for a short
timeout for that thread to finish and then returns either updated or old data.

class MyCache():
    def __init__(self):
        self.cache = None
        self.thread_lock = Lock()
        self.update_thread = None

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        if self.cache is None:
            timeout = 10 # allow more time to get initial batch of data
        else:
            timeout = 0.5
        with self.thread_lock:
            if self.update_thread is None or not self.update_thread.is_alive():
                self.update_thread = Thread(target=self._update)
                self.update_thread.start()
                self.update_thread.join(timeout)

        return self.cache

my_cache = MyCache()

My question is: Is this a solid approach? Am I forgetting something? For
instance, I believe that I don't need another lock to guard self.cache.extend()
because _update() can only ever run in one thread at a time. But maybe I'm
overlooking something.
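
One way to sidestep the question of guarding the list is never to mutate an
object that readers may already hold: build a new object and rebind the
attribute in a single assignment. The following is only a minimal sketch under
assumptions. SnapshotCache and fetch_new_records are hypothetical names, the
latter standing in for query_external_database(), and the records are assumed
to come back as a list.

```python
from threading import Lock


class SnapshotCache:
    """Readers always get a complete snapshot; the updater swaps in a new one."""

    def __init__(self, fetch_new_records):
        # fetch_new_records is a hypothetical stand-in for
        # query_external_database(); it must return a list of records.
        self._fetch = fetch_new_records
        self._records = ()           # immutable tuple, safe to hand out
        self._update_lock = Lock()   # serializes updaters only

    def update(self):
        with self._update_lock:
            new_records = self._fetch()
            # Build a new tuple, then rebind in one assignment. Readers
            # that still hold the old tuple are unaffected.
            self._records = self._records + tuple(new_records)

    def get_data(self):
        # No lock needed here: the attribute always refers to a complete tuple.
        return self._records
```

The price is a full copy of the cache on every update, which may matter if the
cache is large.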

Re: Threading question .. am I doing this right?

by: Chris Angelico - Thu, 24 Feb 2022 20:03 UTC

On Fri, 25 Feb 2022 at 06:54, Robert Latest via Python-list
<python-list@python.org> wrote:
>
> I have a multi-threaded application (a web service) where several threads need
> data from an external database. That data is quite a lot, but it is almost
> always the same. Between incoming requests, timestamped records get added to
> the DB.
>
> So I decided to keep an in-memory cache of the DB records that gets only
> "topped up" with the most recent records on each request:

Depending on your database, this might be counter-productive. A
PostgreSQL database running on localhost, for instance, has its own
caching, and data transfers between two apps running on the same
computer can be pretty fast. The complexity you add in order to do
your own caching might be giving you negligible benefit, or even a
penalty. I would strongly recommend benchmarking the naive "keep going
back to the database" approach first, as a baseline, and only testing
these alternatives when you've confirmed that the database really is a
bottleneck.
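
A baseline measurement along these lines could be as simple as the sketch
below. It is only an illustration: time_calls, summarize and fake_query are
hypothetical names, and fake_query stands in for whatever function actually
runs the query against the database.

```python
import statistics
import time


def time_calls(func, repeats=20):
    """Call func() `repeats` times and return per-call durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, worst) of the measured durations."""
    return statistics.median(durations), max(durations)


def fake_query():
    # Hypothetical stand-in for the real database query.
    time.sleep(0.001)


if __name__ == "__main__":
    median, worst = summarize(time_calls(fake_query))
    print(f"median {median * 1000:.2f} ms, worst {worst * 1000:.2f} ms")
```

Looking at the worst case as well as the median matters here, since occasional
slow queries are easy to miss in an average.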

> Since it is better to quickly serve the client with slightly outdated data than
> not at all, I came up with the "impatient" solution below. The idea is that an
> incoming request triggers an update query in another thread, waits for a short
> timeout for that thread to finish and then returns either updated or old data.
>
> class MyCache():
>     def __init__(self):
>         self.cache = None
>         self.thread_lock = Lock()
>         self.update_thread = None
>
>     def _update(self):
>         new_records = query_external_database()
>         if self.cache is None:
>             self.cache = new_records
>         else:
>             self.cache.extend(new_records)
>
>     def get_data(self):
>         if self.cache is None:
>             timeout = 10 # allow more time to get initial batch of data
>         else:
>             timeout = 0.5
>         with self.thread_lock:
>             if self.update_thread is None or not self.update_thread.is_alive():
>                 self.update_thread = Thread(target=self._update)
>                 self.update_thread.start()
>                 self.update_thread.join(timeout)
>
>         return self.cache
>
> my_cache = MyCache()
>
> My question is: Is this a solid approach? Am I forgetting something? For
> instance, I believe that I don't need another lock to guard self.cache.extend()
> because _update() can only ever run in one thread at a time. But maybe I'm
> overlooking something.

Hmm, it's complicated. There is another approach, and that's to
completely invert your thinking: instead of "request wants data, so
let's get data", have a thread that periodically updates your cache
from the database, and then all requests return from the cache,
without pinging the requester. Downside: It'll be requesting fairly
frequently. Upside: Very simple, very easy, no difficulties debugging.
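
A minimal sketch of that inverted design, under assumptions: FeederCache and
fetch_new_records are hypothetical names (the latter standing in for the real
query), and the 30-second default interval is arbitrary. The thread is marked
as a daemon and sleeps on an Event so that it can be stopped promptly.

```python
import threading


class FeederCache:
    """A background thread tops up the cache; requests only read it."""

    def __init__(self, fetch_new_records, interval=30.0):
        # fetch_new_records is a hypothetical stand-in for the real query;
        # interval (seconds) is an arbitrary illustrative value.
        self._fetch = fetch_new_records
        self._interval = interval
        self._records = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # daemon=True: the thread does not keep the process alive at exit.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self, timeout=None):
        self._stop.set()
        self._thread.join(timeout)

    def _run(self):
        while not self._stop.is_set():
            try:
                new_records = self._fetch()
            except Exception:
                new_records = []   # keep serving old data if the query fails
            with self._lock:
                self._records.extend(new_records)
            # wait() returns early when stop() is called, unlike time.sleep().
            self._stop.wait(self._interval)

    def get_data(self):
        with self._lock:
            return list(self._records)   # copy, so callers never see a mutation
```

Requests call get_data() and never touch the database, so a slow query delays
only the next top-up, not the response.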

How many requests per second does your service process? (By
"requests", I mean things that require this particular database
lookup.) What's average throughput, what's peak throughput? And
importantly, what sorts of idle times do you have? For instance, if
you might have to handle 100 requests/second, but there could be
hours-long periods with no requests at all (eg if your clients are all
in the same timezone and don't operate at night), that's a very
different workload from 10 r/s constantly throughout the day.

ChrisA

Re: Threading question .. am I doing this right?

by: Greg Ewing - Fri, 25 Feb 2022 04:59 UTC

Re: Threading question .. am I doing this right?

by: Robert Latest - Fri, 25 Feb 2022 17:17 UTC

Chris Angelico wrote:
> Depending on your database, this might be counter-productive. A
> PostgreSQL database running on localhost, for instance, has its own
> caching, and data transfers between two apps running on the same
> computer can be pretty fast. The complexity you add in order to do
> your own caching might be giving you negligible benefit, or even a
> penalty. I would strongly recommend benchmarking the naive "keep going
> back to the database" approach first, as a baseline, and only testing
> these alternatives when you've confirmed that the database really is a
> bottleneck.

"Depending on your database" is the key phrase. This is not "my" database that
is running on localhost. It is an external MSSQL server that I have no control
over and whose requests frequently time out.

> Hmm, it's complicated. There is another approach, and that's to
> completely invert your thinking: instead of "request wants data, so
> let's get data", have a thread that periodically updates your cache
> from the database, and then all requests return from the cache,
> without pinging the requester. Downside: It'll be requesting fairly
> frequently. Upside: Very simple, very easy, no difficulties debugging.

I'm using a similar approach in other places, but there I actually have a
separate process that feeds my local, fast DB with unwieldy data. But that is
not merely replicating, it actually preprocesses and "adds value" to the data,
and the data is worth retaining on my server. I didn't want to take that
approach in this instance because it is a bit too much overhead for essentially
"throwaway" stuff. I like the idea of starting a separate "timed" thread in
the same application. Need to think about that.

Background: The clients are SBCs that display data on screens distributed
throughout a manufacturing facility. They refresh every few minutes.
Occasionally the requests would pile up waiting for the database, so
that some screens displayed error messages for a minute or two. Nobody cares,
but my pride was piqued and the error logs filled up.

I've had my proposed solution running for a few days now without errors. For me
that's enough but I wanted to ask you guys if I made some logical mistakes.

Re: Threading question .. am I doing this right?

by: Robert Latest - Fri, 25 Feb 2022 17:19 UTC

Re: Threading question .. am I doing this right?

by: Chris Angelico - Fri, 25 Feb 2022 19:31 UTC

On Sat, 26 Feb 2022 at 05:16, Robert Latest via Python-list
<python-list@python.org> wrote:
>
> Chris Angelico wrote:
> > Depending on your database, this might be counter-productive. A
> > PostgreSQL database running on localhost, for instance, has its own
> > caching, and data transfers between two apps running on the same
> > computer can be pretty fast. The complexity you add in order to do
> > your own caching might be giving you negligible benefit, or even a
> > penalty. I would strongly recommend benchmarking the naive "keep going
> > back to the database" approach first, as a baseline, and only testing
> > these alternatives when you've confirmed that the database really is a
> > bottleneck.
>
> "Depending on your database" is the key phrase. This is not "my" database that
> is running on localhost. It is an external MSSQL server that I have no control
> over and whose requests frequently time out.
>

Okay, cool. That's crucial to know.

I'm still curious as to the workload (requests per second), as it
might still be worth going for the feeder model. But if your current
system works, then it may be simplest to debug that rather than
change.

ChrisA

Re: Threading question .. am I doing this right?

by: Julio Di Egidio - Fri, 25 Feb 2022 20:03 UTC

On Thursday, 24 February 2022 at 13:09:50 UTC+1, Robert Latest wrote:
> I have a multi-threaded application (a web service) where several threads need
> data from an external database. That data is quite a lot, but it is almost
> always the same. Between incoming requests, timestamped records get added to
> the DB.
>
> So I decided to keep an in-memory cache of the DB records that gets only
> "topped up" with the most recent records on each request:

If records just get inserted and nothing else, as in some log collection, couldn't
the viewer app just query the database at intervals for all records past the date
of the latest previous (successful) update?

If that's what you meant, and the problem is that queries still time out, I'd
rather not query for all new records at every iteration, just a small number,
possibly the oldest ones (similar to a queue), assuming any backlog is typically
worked off in a reasonable amount of time...
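
As a rough sketch of that idea, assuming a table with a monotonically
increasing timestamp column: the function names and the table and column names
(records, ts, payload) are hypothetical, and an in-memory SQLite database is
used only so that the example runs on its own.

```python
import sqlite3


def fetch_batch(conn, last_seen, batch_size=100):
    """Return up to batch_size rows newer than last_seen, oldest first."""
    # Table and column names (records, ts, payload) are hypothetical.
    cur = conn.execute(
        "SELECT ts, payload FROM records WHERE ts > ? ORDER BY ts LIMIT ?",
        (last_seen, batch_size),
    )
    return cur.fetchall()


def top_up(conn, cache, last_seen, batch_size=100):
    """Append one batch to cache and return the new high-water mark."""
    rows = fetch_batch(conn, last_seen, batch_size)
    cache.extend(rows)
    # Advance the mark only after a successful fetch.
    return rows[-1][0] if rows else last_seen


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (ts INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1, 6)],
    )
    cache, mark = [], 0
    mark = top_up(conn, cache, mark, batch_size=2)
    mark = top_up(conn, cache, mark, batch_size=2)
    return cache, mark
```

Two caveats: the strict "ts > ?" comparison can skip rows when several records
share a timestamp across a batch boundary, and LIMIT is SQLite syntax that
would need translating for other databases (SQL Server uses TOP, for
instance).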

And then (assuming I am not just going off on a tangent) I don't think anything
better can really be done, except for ensuring that inserts on one side and
reads on the other happen in transactions at their respective, minimal
isolation levels...

That said, one should not lock at the code level but rather use transactions,
unless transactions really are not available; but then the best you can do is
to keep the "scope" of the locked regions as small as possible...

(I won't comment on the Python code, others here are way more expert than I
am on that side.)

HTH,

Julio

Re: Threading question .. am I doing this right?

by: Robert Latest - Mon, 28 Feb 2022 15:05 UTC

Chris Angelico wrote:
> I'm still curious as to the workload (requests per second), as it might still
> be worth going for the feeder model. But if your current system works, then
> it may be simplest to debug that rather than change.

It is by all accounts a low-traffic situation, maybe one request/second. But
the view in question opens four plots on one page, generating four separate
requests. So with only two clients and a blocking DB connection, the whole
application with eight uwsgi worker threads comes down. Now with the "extra
load thread" modification, the app worked fine for several days with only two
threads.

Out of curiosity I tried the "feeder thread" approach with a dummy thread that
just sleeps and logs something every few seconds, ten times total. For some
reason it sometimes hangs after eight or nine loops, and then uwsgi cannot
restart gracefully, probably because it is still waiting for that thread to
finish. Also, my web app is built around setting up the DB connections in the
request context, so using an extra thread outside that context would require
doubling some DB infrastructure. Probably not worth it at this point.
