Re: How to replace characters in a string?

From: dav...@looktowindward.com (Dave)
Newsgroups: comp.lang.python
Subject: Re: How to replace characters in a string?
Date: Wed, 8 Jun 2022 19:01:28 +0200
Lines: 157
Message-ID: <mailman.583.1654707865.20749.python-list@python.org>
References: <B1F63761-AA3A-4B9D-B135-ED80C56E015A@looktowindward.com>
<CAPTjJmqynw1UqQk2wXMOhdRocyRursBhUnAouZLc0hMVDNGA+w@mail.gmail.com>
<0CBD9609-D7E7-4DF3-82E1-524367FB844C@looktowindward.com>
<1707808815.586773.1654705396591@mail.yahoo.com>
<4E7A5722-AB38-49AD-83AE-A838DFD673CA@looktowindward.com>
 by: Dave - Wed, 8 Jun 2022 17:01 UTC

Hi,

This is a tool I’m using on my own files to save me time. Basically all or most of the tracks were imported with different versions of iTunes over the years. There are three problems:

1. File system characters are replaced (you can’t have '/' or ':' in a file name).
2. Smart quotes were added at some point; these need to be replaced.
3. Other characters that come from the name being of non-English origin.

If I find others I’ll add them.
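
A minimal sketch of how the list above could be kept in one place, assuming a translation table built with str.maketrans(). The mapping REPLACEMENTS and the function normalize_title() are hypothetical names, and turning '/' and ':' into '_' is only an assumption about what ended up in the file names:

```python
# Hypothetical mapping; extend it as new problem characters turn up.
REPLACEMENTS = {
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
    "/": "_",        # assumed stand-in used in file names
    ":": "_",        # assumed stand-in used in file names
}
TABLE = str.maketrans(REPLACEMENTS)


def normalize_title(title: str) -> str:
    """Return a new string with every mapped character replaced."""
    return title.translate(TABLE)


print(normalize_title("Don\u2019t Stop: Live"))  # Don't Stop_ Live
```

Applying the same function to both the file name and the looked-up title before comparing keeps the two sides consistent.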

I’m using MusicBrainz to do a fuzzy match and get the correct name.

It’s not perfect, but it works for 99% of files, which is good enough for me!

Cheers
Dave

> On 8 Jun 2022, at 18:23, Avi Gross via Python-list <python-list@python.org> wrote:
>
> Dave,
>
> Your goal is to compare titles and there can be endless replacements needed if you allow the text to contain anything but ASCII.
>
> Have you considered stripping out things instead? I mean remove lots of stuff that is not ASCII in the first place and perhaps also remove lots of extra punctuation like single quotes or question marks or redundant white space and compare the sort of skeletons of the two?
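
One way the stripping idea could look, as a rough sketch only. The function name skeleton() is hypothetical, and deleting punctuation outright (rather than replacing it with a space) is a choice made here so that "Don't" and "Don’t" reduce to the same thing:

```python
import re


def skeleton(title: str) -> str:
    """Reduce a title to ASCII letters and digits separated by single spaces."""
    # Drop every character that cannot be encoded as ASCII.
    ascii_only = title.encode("ascii", errors="ignore").decode("ascii")
    # Delete punctuation: anything that is not a letter, digit or whitespace.
    no_punctuation = re.sub(r"[^A-Za-z0-9\s]", "", ascii_only)
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punctuation.split())


print(skeleton("Don\u2019t  Stop?") == skeleton("Don't Stop"))  # True
```

Note that this throws away accented letters entirely, so it suits titles that are mostly ASCII to begin with.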
>
> And even if that fails, could you have a measure of how different they are and tolerate titles that are, say, off by one letter? Admittedly, "My desert" matching "My Dessert" might not be a valid match, with one being a song about an arid environment and the other about food you don't need!
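
A small sketch of such a measure, using difflib.SequenceMatcher from the standard library. The helper similarity() and the THRESHOLD value are hypothetical; the cut-off would have to be tuned against real titles:

```python
from difflib import SequenceMatcher

THRESHOLD = 0.9  # hypothetical cut-off, not a recommended value


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


score = similarity("My desert", "My dessert")
# The one-letter difference still scores above the threshold,
# which is exactly the false match described above.
print(score, score >= THRESHOLD)
```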
>
> Your seemingly simple need can expand into a fairly complex project. There may be many ideas on how to deal with it but not anything perfect enough to catch all cases as even a trained human may have to make decisions at times and not match what other humans do. We have examples like the TV show "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is often written when I look it up as NUMBERS. You have obvious cases where titles of songs may contain composite symbols like "œ" which will not compare to one where it is written out as "oe" so the idea of comparing is quite complex and the best you might do is heuristic.
>
> Unicode has many symbols that are almost the same, or even look the same, perhaps only in one font versus another. There are libraries of functions that allow some kinds of comparisons or conversions that you could look into but the gain for you may not be worth it. Nothing stops a person from naming a song any way they want and I speak many languages and often see a song re-titled in the local language and using the local alphabet mixed often with another.
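
As an illustration of the kind of library support mentioned here, the standard unicodedata module can decompose accented letters and compatibility ligatures. This is only a sketch: fold() and the EXTRA table are hypothetical names, and the table is needed because characters such as "œ" have no Unicode decomposition, so they must be mapped by hand:

```python
import unicodedata

# Hand-made mapping (an assumption) for letters that normalization leaves alone.
EXTRA = str.maketrans({
    "\u0153": "oe",  # œ
    "\u0152": "OE",  # Œ
    "\u00e6": "ae",  # æ
    "\u00c6": "AE",  # Æ
    "\u00df": "ss",  # ß
})


def fold(title: str) -> str:
    """Expand ligatures and strip accents, returning a new string."""
    expanded = title.translate(EXTRA)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", expanded)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("C\u0153ur"), fold("Beyonc\u00e9"))  # Coeur Beyonce
```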
>
> Your original question is perhaps now many questions, depending on what you choose. You started by wanting to know how to compare and it is moving on to how to delete parts or make substitutions or use regular expressions and it can get worse. You can, for example, take a string and identify the words within it and create a regular expression that inserts sequences between the words that match any zero or one or more non-word characters such as spaces, tabs, punctuation or non-ASCII, so that song titles with the same words in a sequence match no matter what is between them. The possibilities are endless but consider some of the techniques that are used by some programs that parse text and suggest alternate spellings or even programs like Google Translate that can take a sentence and then suggest you may mean a slightly altered sentence with one word changed to fit better.
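
The regular-expression idea in this paragraph could be sketched roughly as follows. The name loose_pattern() is hypothetical. One caveat: in Python 3, \w matches non-ASCII letters as well, so accented letters count as part of a word rather than as filler between words:

```python
import re


def loose_pattern(title: str) -> re.Pattern:
    """Build a pattern matching the same words with any non-word filler between them."""
    words = re.findall(r"\w+", title)
    # \W* allows zero or more non-word characters (spaces, tabs, punctuation).
    body = r"\W*".join(re.escape(word) for word in words)
    return re.compile(r"\W*" + body + r"\W*")


pattern = loose_pattern("Help! I need somebody")
print(bool(pattern.fullmatch("Help - I need,   somebody?")))  # True
print(bool(pattern.fullmatch("Help me")))                     # False
```

Because the filler may be empty, "Don't" and "Dont" also match each other under this scheme, which may or may not be what is wanted.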
>
> You need to decide what you want to deal with and what will be mis-classified by your program. Some of us have suggested folding the case of the words, but that means a song about a dark skinned person in Poland called "Black Polish" would match a song about keeping your shoes dark with "black polish", so I keep repeating it is very hard, or frankly impossible, to catch every case I can imagine and the many I can't!
>
> But the emphasis here is not your overall problem. It is about whether and how the computer language called Python, and perhaps some add-on modules, can be used to solve each smaller need such as recognizing a pattern or replacing text. It can do quite a bit but only when the specification of the problem is exact.
>
>
>
>
> -----Original Message-----
> From: Dave <dave@looktowindward.com>
> To: python-list@python.org
> Sent: Wed, Jun 8, 2022 5:09 am
> Subject: Re: How to replace characters in a string?
>
> Hi,
>
> Thanks for this!
>
> So, is there a copy function/method that returns a MutableString like in Objective-C? I’ve solved this problem before in a number of languages like Objective-C and AppleScript.
>
> Basically there is a set of common characters that need “normalizing” and I have a method that replaces them in a string, so:
>
> myString = [myString normalizeCharacters];
>
> Would return a new string with all the “common” replacements applied.
>
> Since the following gives an error:
>
> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a')
>
> TypeError: 'str' object cannot be interpreted as an integer
>
> I can’t see a way to do this in Python?
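
For reference, the TypeError comes from the signature str.replace(old, new, count): the method is already called on the string, so passing the string again shifts 'a' into the count position. A short sketch of the working call, plus a hypothetical normalize_characters() helper in the spirit of the Objective-C method (the replacement pairs are assumptions):

```python
my_string = "Hello"
my_new_string = my_string.replace("e", "a")  # old, new; the string itself is not passed
print(my_new_string)  # Hallo
print(my_string)      # Hello (the original is unchanged)


def normalize_characters(text: str) -> str:
    """Return a new string with a few common replacements applied."""
    for old, new in (("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(old, new)  # rebind the name to each new string
    return text


print(normalize_characters("Don\u2019t"))  # Don't
```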
>
> All the Best
> Dave
>
>
>> On 8 Jun 2022, at 10:14, Chris Angelico <rosuav@gmail.com> wrote:
>>
>> On Wed, 8 Jun 2022 at 18:12, Dave <dave@looktowindward.com> wrote:
>>
>>> I tried this but it doesn’t seem to work?
>>> myCompareFile1 = ascii(myTitleName)
>>> myCompareFile1.replace("\u2019", "'")
>>
>> Strings in Python are immutable. When you call ascii(), you get back a
>> new string, but it's one that has actual backslashes and such in it.
>> (You probably don't need this step, other than for debugging; check
>> the string by printing out the ASCII version of it, but stick to the
>> original for actual processing.) The same is true of the replace()
>> method; it doesn't change the string, it returns a new string.
>>
>> >>> word = "spam"
>> >>> print(word.replace("sp", "h"))
>> ham
>> >>> print(word)
>> spam
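
The ascii() point can be seen in a few lines. This is an illustrative sketch with a made-up title, not the original poster's data:

```python
title = "Don\u2019t Stop"

escaped = ascii(title)
# ascii() returns a repr-style string: quotes included, and the smart quote
# turned into a literal backslash followed by "u2019".
print(escaped)  # 'Don\u2019t Stop'

# The real U+2019 character is no longer in `escaped`, so this changes nothing.
print(escaped.replace("\u2019", "'") == escaped)  # True

# Working on the original string, and keeping the returned value, does the job.
fixed = title.replace("\u2019", "'")
print(fixed)  # Don't Stop
```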
>>
>> ChrisA
>> --
>> https://mail.python.org/mailman/listinfo/python-list
