Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

"If value corrupts then absolute value corrupts absolutely."


devel / comp.lang.c / Re: How to optimize this search?

SubjectAuthor
* How to optimize this search?Thiago Adams
+* Re: How to optimize this search?Thiago Adams
|`* Re: How to optimize this search?Scott Lurndal
| +* Re: How to optimize this search?Thiago Adams
| |+- Re: How to optimize this search?Thiago Adams
| |+- Re: How to optimize this search?Scott Lurndal
| |`* Re: How to optimize this search?Anton Shepelev
| | `* Re: How to optimize this search?Thiago Adams
| |  `* Re: How to optimize this search?Anton Shepelev
| |   `* Re: How to optimize this search?Ben Bacarisse
| |    `* Re: How to optimize this search?Anton Shepelev
| |     +* Re: How to optimize this search?Thiago Adams
| |     |`- Re: How to optimize this search?Anton Shepelev
| |     `* Re: How to optimize this search?Ben Bacarisse
| |      `* Re: How to optimize this search?Anton Shepelev
| |       `* Re: How to optimize this search?Anton Shepelev
| |        `* Re: How to optimize this search?Ben Bacarisse
| |         `* Re: How to optimize this search?Anton Shepelev
| |          `* Re: How to optimize this search?Ben Bacarisse
| |           `* Re: How to optimize this search?Anton Shepelev
| |            `- Re: How to optimize this search?bart c
| +* Re: How to optimize this search?Thiago Adams
| |`* Re: How to optimize this search?Scott Lurndal
| | `* Re: How to optimize this search?bart c
| |  +* Re: How to optimize this search?Keith Thompson
| |  |`* Re: How to optimize this search?bart c
| |  | `* Re: How to optimize this search?Ben Bacarisse
| |  |  `- Re: How to optimize this search?bart c
| |  +* Re: How to optimize this search?antispam
| |  |`* Re: How to optimize this search?bart c
| |  | `- Re: How to optimize this search?antispam
| |  `* Re: How to optimize this search?David Brown
| |   `* Re: How to optimize this search?bart c
| |    `- Re: How to optimize this search?David Brown
| `- Re: How to optimize this search?antispam
+* Re: How to optimize this search?Ben Bacarisse
|`* Re: How to optimize this search?Thiago Adams
| `- Re: How to optimize this search?Ben Bacarisse
+* Re: How to optimize this search?Siri Cruise
|+* Re: How to optimize this search?Ben Bacarisse
||`* Re: How to optimize this search?Siri Cruise
|| `- Re: How to optimize this search?Ben Bacarisse
|+- Re: How to optimize this search?Ben Bacarisse
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?antispam
|  `* Re: How to optimize this search?Thiago Adams
|   +- Re: How to optimize this search?Öö Tiib
|   `* Re: How to optimize this search?antispam
|    `* Re: How to optimize this search?Thiago Adams
|     +- Re: How to optimize this search?Stefan Ram
|     `* Re: How to optimize this search?Thiago Adams
|      +* Re: How to optimize this search?bart c
|      |`* Re: How to optimize this search?Thiago Adams
|      | +- Re: How to optimize this search?Keith Thompson
|      | `* Re: How to optimize this search?bart c
|      |  `- Re: How to optimize this search?Thiago Adams
|      `* Re: How to optimize this search?David Brown
|       `- Re: How to optimize this search?William Ahern
+* Re: How to optimize this search?Bonita Montero
|+- Re: How to optimize this search?Ben Bacarisse
|+- Re: How to optimize this search?Bonita Montero
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?Bonita Montero
|  `* Re: How to optimize this search?Thiago Adams
|   `* Re: How to optimize this search?Bonita Montero
|    `* Re: How to optimize this search?Thiago Adams
|     +- Re: How to optimize this search?Bonita Montero
|     `- Re: How to optimize this search?Anton Shepelev
+* Re: How to optimize this search?bart c
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?antispam
|  `- Re: How to optimize this search?bart c
+* Re: How to optimize this search?Tim Rentsch
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?bart c
|  `* Re: How to optimize this search?Keith Thompson
|   +- Re: How to optimize this search?Thiago Adams
|   `* Re: How to optimize this search?bart c
|    +* Re: How to optimize this search?Thiago Adams
|    |`- Re: How to optimize this search?Stefan Ram
|    `* Re: How to optimize this search?Keith Thompson
|     +* Re: How to optimize this search?bart c
|     |`* Re: How to optimize this search?Keith Thompson
|     | `- Re: How to optimize this search?bart c
|     `* Re: How to optimize this search?antispam
|      +- Re: How to optimize this search?Keith Thompson
|      `* Re: How to optimize this search?David Brown
|       +- Re: How to optimize this search?bart c
|       `- Re: How to optimize this search?antispam
`* Re: How to optimize this search?Ben Bacarisse
 +* Re: How to optimize this search?Thiago Adams
 |+- Re: How to optimize this search?Thiago Adams
 |+* Re: How to optimize this search?Ben Bacarisse
 ||+* Re: How to optimize this search?Thiago Adams
 |||`- Re: How to optimize this search?Ben Bacarisse
 ||`* Re: How to optimize this search?Thiago Adams
 || `- Re: How to optimize this search?Ben Bacarisse
 |+- Re: How to optimize this search?bart c
 |`* Re: How to optimize this search?Anton Shepelev
 | `* Re: How to optimize this search?Thiago Adams
 |  `* Re: How to optimize this search?Thiago Adams
 +* Re: How to optimize this search?bart c
 `* Re: How to optimize this search?Anton Shepelev

Pages:12345
Re: How to optimize this search?

<e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22685&group=comp.lang.c#22685

  copy link   Newsgroups: comp.lang.c
X-Received: by 2002:a05:620a:1a0e:b0:6bc:3aa1:90a6 with SMTP id bk14-20020a05620a1a0e00b006bc3aa190a6mr1291854qkb.756.1661385289793;
Wed, 24 Aug 2022 16:54:49 -0700 (PDT)
X-Received: by 2002:a05:6830:44a3:b0:637:c53:5f55 with SMTP id
r35-20020a05683044a300b006370c535f55mr441010otv.256.1661385289407; Wed, 24
Aug 2022 16:54:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 24 Aug 2022 16:54:49 -0700 (PDT)
In-Reply-To: <08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.252.226; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.252.226
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk> <7caa0f30-bc19-4c93-80b9-93aa8611fb67n@googlegroups.com>
<20220825010423.97f368943e1fd1e556e0321f@gmail.com> <08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>
Subject: Re: How to optimize this search?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Wed, 24 Aug 2022 23:54:49 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 24505
 by: Thiago Adams - Wed, 24 Aug 2022 23:54 UTC

On Wednesday, August 24, 2022 at 8:43:35 PM UTC-3, Thiago Adams wrote:
> On Wednesday, August 24, 2022 at 7:04:36 PM UTC-3, Anton Shepelev wrote:

> This generator outputs the code
>
output is:

This code looks a bad linear search..but -o2 gcc transform this
in a jump give the initial letter. The same code can be written using switch..

int is_keyword(const char* s)
{ if(*s == 'N') {
s++;
/*NULL*/
if( *s++ == 'U' && *s++ == 'L' && *s++ == 'L' && *s =='\0') return 0; else return -1;;
}
if(*s == '_') {
s++;
if(*s == 'A') {
s++;
if(*s == 'l') {
s++;
if(*s == 'i') {
s++;
if(*s == 'g') {
s++;
if(*s == 'n') {
s++;
if(*s == 'a') {
s++;
/*_Alignas*/
if( *s++ == 's' && *s =='\0') return 1; else return -1;;
}
if(*s == 'o') {
s++;
/*_Alignof*/
if( *s++ == 'f' && *s =='\0') return 2; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(*s == 't') {
s++;
/*_Atomic*/
if( *s++ == 'o' && *s++ == 'm' && *s++ == 'i' && *s++ == 'c' && *s =='\0') return 3; else return -1;;
}
return -1;
};
if(*s == 'B') {
s++;
if(*s == 'i') {
s++;
/*_BitInt*/
if( *s++ == 't' && *s++ == 'I' && *s++ == 'n' && *s++ == 't' && *s =='\0') return 4; else return -1;;
}
if(*s == 'o') {
s++;
/*_Bool*/
if( *s++ == 'o' && *s++ == 'l' && *s =='\0') return 5; else return -1;;
}
return -1;
};
if(*s == 'C') {
s++;
/*_Complex*/
if( *s++ == 'o' && *s++ == 'm' && *s++ == 'p' && *s++ == 'l' && *s++ == 'e' && *s++ == 'x' && *s =='\0') return 6; else return -1;;
}
if(*s == 'D') {
s++;
if(*s == 'e') {
s++;
if(*s == 'c') {
s++;
if(*s == 'i') {
s++;
if(*s == 'm') {
s++;
if(*s == 'a') {
s++;
if(*s == 'l') {
s++;
if(*s == '1') {
s++;
/*_Decimal128*/
if( *s++ == '2' && *s++ == '8' && *s =='\0') return 7; else return -1;;
}
if(*s == '3') {
s++;
/*_Decimal32*/
if( *s++ == '2' && *s =='\0') return 8; else return -1;;
}
if(*s == '6') {
s++;
/*_Decimal64*/
if( *s++ == '4' && *s =='\0') return 9; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(*s == 'G') {
s++;
/*_Generic*/
if( *s++ == 'e' && *s++ == 'n' && *s++ == 'e' && *s++ == 'r' && *s++ == 'i' && *s++ == 'c' && *s =='\0') return 10; else return -1;;
}
if(*s == 'H') {
s++;
/*_Hashof*/
if( *s++ == 'a' && *s++ == 's' && *s++ == 'h' && *s++ == 'o' && *s++ == 'f' && *s =='\0') return 11; else return -1;;
}
if(*s == 'I') {
s++;
/*_Imaginary*/
if( *s++ == 'm' && *s++ == 'a' && *s++ == 'g' && *s++ == 'i' && *s++ == 'n' && *s++ == 'a' && *s++ == 'r' && *s++ == 'y' && *s =='\0') return 12; else return -1;;
}
if(*s == 'N') {
s++;
/*_Noreturn*/
if( *s++ == 'o' && *s++ == 'r' && *s++ == 'e' && *s++ == 't' && *s++ == 'u' && *s++ == 'r' && *s++ == 'n' && *s =='\0') return 13; else return -1;;
}
if(*s == 'S') {
s++;
/*_Static_assert*/
if( *s++ == 't' && *s++ == 'a' && *s++ == 't' && *s++ == 'i' && *s++ == 'c' && *s++ == '_' && *s++ == 'a' && *s++ == 's' && *s++ == 's' && *s++ == 'e' && *s++ == 'r' && *s++ == 't' && *s =='\0') return 14; else return -1;;
}
if(*s == 'T') {
s++;
/*_Thread_local*/
if( *s++ == 'h' && *s++ == 'r' && *s++ == 'e' && *s++ == 'a' && *s++ == 'd' && *s++ == '_' && *s++ == 'l' && *s++ == 'o' && *s++ == 'c' && *s++ == 'a' && *s++ == 'l' && *s =='\0') return 15; else return -1;;
}
if(*s == '_') {
s++;
if(*s == 'a') {
s++;
if(*s == 'l') {
s++;
/*__alignof*/
if( *s++ == 'i' && *s++ == 'g' && *s++ == 'n' && *s++ == 'o' && *s++ == 'f' && *s =='\0') return 16; else return -1;;
}
if(*s == 's') {
s++;
/*__asm*/
if( *s++ == 'm' && *s =='\0') return 17; else return -1;;
}
return -1;
};
if(*s == 'f') {
s++;
/*__forceinline*/
if( *s++ == 'o' && *s++ == 'r' && *s++ == 'c' && *s++ == 'e' && *s++ == 'i' && *s++ == 'n' && *s++ == 'l' && *s++ == 'i' && *s++ == 'n' && *s++ == 'e' && *s =='\0') return 18; else return -1;;
}
if(*s == 'i') {
s++;
if(*s == 'n') {
s++;
if(*s == 'l') {
s++;
/*__inline*/
if( *s++ == 'i' && *s++ == 'n' && *s++ == 'e' && *s =='\0') return 19; else return -1;;
}
if(*s == 't') {
s++;
if(*s == '1') {
s++;
/*__int16*/
if( *s++ == '6' && *s =='\0') return 20; else return -1;;
}
if(*s == '3') {
s++;
/*__int32*/
if( *s++ == '2' && *s =='\0') return 21; else return -1;;
}
if(*s == '6') {
s++;
/*__int64*/
if( *s++ == '4' && *s =='\0') return 22; else return -1;;
}
if(*s == '8') {
s++;
/*__int8*/
if( *s =='\0') return 23; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(*s == 'a') {
s++;
/*_asm*/
if( *s++ == 's' && *s++ == 'm' && *s =='\0') return 24; else return -1;;
}
return -1;
};
if(*s == 'a') {
s++;
if(*s == 'l') {
s++;
if(*s == 'i') {
s++;
if(*s == 'g') {
s++;
if(*s == 'n') {
s++;
if(*s == 'a') {
s++;
/*alignas*/
if( *s++ == 's' && *s =='\0') return 25; else return -1;;
}
if(*s == 'o') {
s++;
/*alignof*/
if( *s++ == 'f' && *s =='\0') return 26; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(*s == 'u') {
s++;
/*auto*/
if( *s++ == 't' && *s++ == 'o' && *s =='\0') return 27; else return -1;;
}
return -1;
};
if(*s == 'b') {
s++;
if(*s == 'o') {
s++;
/*bool*/
if( *s++ == 'o' && *s++ == 'l' && *s =='\0') return 28; else return -1;;
}
if(*s == 'r') {
s++;
/*break*/
if( *s++ == 'e' && *s++ == 'a' && *s++ == 'k' && *s =='\0') return 29; else return -1;;
}
return -1;
};
if(*s == 'c') {
s++;
if(*s == 'a') {
s++;
if(*s == 's') {
s++;
/*case*/
if( *s++ == 'e' && *s =='\0') return 30; else return -1;;
}
if(*s == 't') {
s++;
/*catch*/
if( *s++ == 'c' && *s++ == 'h' && *s =='\0') return 31; else return -1;;
}
return -1;
};
if(*s == 'h') {
s++;
/*char*/
if( *s++ == 'a' && *s++ == 'r' && *s =='\0') return 32; else return -1;;
}
if(*s == 'o') {
s++;
if(*s == 'n') {
s++;
if(*s == 's') {
s++;
if(*s == 't') {
s++;
if(*s == '\0') {
s++;
/*const*/
return 33;
};
if(*s == 'e') {
s++;
/*constexpr*/
if( *s++ == 'x' && *s++ == 'p' && *s++ == 'r' && *s =='\0') return 34; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 't') {
s++;
/*continue*/
if( *s++ == 'i' && *s++ == 'n' && *s++ == 'u' && *s++ == 'e' && *s =='\0') return 35; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
if(*s == 'd') {
s++;
if(*s == 'e') {
s++;
if(*s == 'f') {
s++;
if(*s == 'a') {
s++;
/*default*/
if( *s++ == 'u' && *s++ == 'l' && *s++ == 't' && *s =='\0') return 36; else return -1;;
}
if(*s == 'e') {
s++;
/*defer*/
if( *s++ == 'r' && *s =='\0') return 37; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 'o') {
s++;
if(*s == '\0') {
s++;
/*do*/
return 38;
};
if(*s == 'u') {
s++;
/*double*/
if( *s++ == 'b' && *s++ == 'l' && *s++ == 'e' && *s =='\0') return 39; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 'e') {
s++;
if(*s == 'l') {
s++;
/*else*/
if( *s++ == 's' && *s++ == 'e' && *s =='\0') return 40; else return -1;;
}
if(*s == 'n') {
s++;
/*enum*/
if( *s++ == 'u' && *s++ == 'm' && *s =='\0') return 41; else return -1;;
}
if(*s == 'x') {
s++;
/*extern*/
if( *s++ == 't' && *s++ == 'e' && *s++ == 'r' && *s++ == 'n' && *s =='\0') return 42; else return -1;;
}
return -1;
};
if(*s == 'f') {
s++;
if(*s == 'a') {
s++;
/*false*/
if( *s++ == 'l' && *s++ == 's' && *s++ == 'e' && *s =='\0') return 43; else return -1;;
}
if(*s == 'l') {
s++;
/*float*/
if( *s++ == 'o' && *s++ == 'a' && *s++ == 't' && *s =='\0') return 44; else return -1;;
}
if(*s == 'o') {
s++;
/*for*/
if( *s++ == 'r' && *s =='\0') return 45; else return -1;;
}
return -1;
};
if(*s == 'g') {
s++;
/*goto*/
if( *s++ == 'o' && *s++ == 't' && *s++ == 'o' && *s =='\0') return 46; else return -1;;
}
if(*s == 'i') {
s++;
if(*s == 'f') {
s++;
/*if*/
if( *s =='\0') return 47; else return -1;;
}
if(*s == 'n') {
s++;
if(*s == 'l') {
s++;
/*inline*/
if( *s++ == 'i' && *s++ == 'n' && *s++ == 'e' && *s =='\0') return 48; else return -1;;
}
if(*s == 't') {
s++;
/*int*/
if( *s =='\0') return 49; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 'l') {
s++;
/*long*/
if( *s++ == 'o' && *s++ == 'n' && *s++ == 'g' && *s =='\0') return 50; else return -1;;
}
if(*s == 'n') {
s++;
/*nullptr*/
if( *s++ == 'u' && *s++ == 'l' && *s++ == 'l' && *s++ == 'p' && *s++ == 't' && *s++ == 'r' && *s =='\0') return 51; else return -1;;
}
if(*s == 'r') {
s++;
if(*s == 'e') {
s++;
if(*s == 'g') {
s++;
/*register*/
if( *s++ == 'i' && *s++ == 's' && *s++ == 't' && *s++ == 'e' && *s++ == 'r' && *s =='\0') return 52; else return -1;;
}
if(*s == 'p') {
s++;
/*repeat*/
if( *s++ == 'e' && *s++ == 'a' && *s++ == 't' && *s =='\0') return 53; else return -1;;
}
if(*s == 's') {
s++;
/*restrict*/
if( *s++ == 't' && *s++ == 'r' && *s++ == 'i' && *s++ == 'c' && *s++ == 't' && *s =='\0') return 54; else return -1;;
}
if(*s == 't') {
s++;
/*return*/
if( *s++ == 'u' && *s++ == 'r' && *s++ == 'n' && *s =='\0') return 55; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 's') {
s++;
if(*s == 'h') {
s++;
/*short*/
if( *s++ == 'o' && *s++ == 'r' && *s++ == 't' && *s =='\0') return 56; else return -1;;
}
if(*s == 'i') {
s++;
if(*s == 'g') {
s++;
/*signed*/
if( *s++ == 'n' && *s++ == 'e' && *s++ == 'd' && *s =='\0') return 57; else return -1;;
}
if(*s == 'z') {
s++;
/*sizeof*/
if( *s++ == 'e' && *s++ == 'o' && *s++ == 'f' && *s =='\0') return 58; else return -1;;
}
return -1;
};
if(*s == 't') {
s++;
if(*s == 'a') {
s++;
if(*s == 't') {
s++;
if(*s == 'i') {
s++;
if(*s == 'c') {
s++;
if(*s == '\0') {
s++;
/*static*/
return 59;
};
if(*s == '_') {
s++;
/*static_assert*/
if( *s++ == 'a' && *s++ == 's' && *s++ == 's' && *s++ == 'e' && *s++ == 'r' && *s++ == 't' && *s =='\0') return 60; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(*s == 'r') {
s++;
/*struct*/
if( *s++ == 'u' && *s++ == 'c' && *s++ == 't' && *s =='\0') return 61; else return -1;;
}
return -1;
};
if(*s == 'w') {
s++;
/*switch*/
if( *s++ == 'i' && *s++ == 't' && *s++ == 'c' && *s++ == 'h' && *s =='\0') return 62; else return -1;;
}
return -1;
};
if(*s == 't') {
s++;
if(*s == 'h') {
s++;
if(*s == 'r') {
s++;
if(*s == 'e') {
s++;
/*thread_local*/
if( *s++ == 'a' && *s++ == 'd' && *s++ == '_' && *s++ == 'l' && *s++ == 'o' && *s++ == 'c' && *s++ == 'a' && *s++ == 'l' && *s =='\0') return 63; else return -1;;
}
if(*s == 'o') {
s++;
/*throw*/
if( *s++ == 'w' && *s =='\0') return 64; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 'r') {
s++;
if(*s == 'u') {
s++;
/*true*/
if( *s++ == 'e' && *s =='\0') return 65; else return -1;;
}
if(*s == 'y') {
s++;
/*try*/
if( *s =='\0') return 66; else return -1;;
}
return -1;
};
if(*s == 'y') {
s++;
if(*s == 'p') {
s++;
if(*s == 'e') {
s++;
if(*s == 'd') {
s++;
/*typedef*/
if( *s++ == 'e' && *s++ == 'f' && *s =='\0') return 67; else return -1;;
}
if(*s == 'i') {
s++;
/*typeid*/
if( *s++ == 'd' && *s =='\0') return 68; else return -1;;
}
if(*s == 'o') {
s++;
if(*s == 'f') {
s++;
if(*s == '\0') {
s++;
/*typeof*/
return 69;
};
if(*s == '_') {
s++;
/*typeof_unqual*/
if( *s++ == 'u' && *s++ == 'n' && *s++ == 'q' && *s++ == 'u' && *s++ == 'a' && *s++ == 'l' && *s =='\0') return 70; else return -1;;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(*s == 'u') {
s++;
if(*s == 'n') {
s++;
if(*s == 'i') {
s++;
/*union*/
if( *s++ == 'o' && *s++ == 'n' && *s =='\0') return 71; else return -1;;
}
if(*s == 's') {
s++;
/*unsigned*/
if( *s++ == 'i' && *s++ == 'g' && *s++ == 'n' && *s++ == 'e' && *s++ == 'd' && *s =='\0') return 72; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 'v') {
s++;
if(*s == 'o') {
s++;
if(*s == 'i') {
s++;
/*void*/
if( *s++ == 'd' && *s =='\0') return 73; else return -1;;
}
if(*s == 'l') {
s++;
/*volatile*/
if( *s++ == 'a' && *s++ == 't' && *s++ == 'i' && *s++ == 'l' && *s++ == 'e' && *s =='\0') return 74; else return -1;;
}
return -1;
};
return -1;
};
if(*s == 'w') {
s++;
/*while*/
if( *s++ == 'h' && *s++ == 'i' && *s++ == 'l' && *s++ == 'e' && *s =='\0') return 75; else return -1;;
}
return -1;
}


Click here to read the complete article
Re: How to optimize this search?

<a6176a75-ca68-48c5-85fc-0c1417027f1cn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22686&group=comp.lang.c#22686

  copy link   Newsgroups: comp.lang.c
X-Received: by 2002:a05:6214:5010:b0:496:f7af:bce0 with SMTP id jo16-20020a056214501000b00496f7afbce0mr1496930qvb.15.1661386138088;
Wed, 24 Aug 2022 17:08:58 -0700 (PDT)
X-Received: by 2002:a05:6830:22c6:b0:637:21a9:98fb with SMTP id
q6-20020a05683022c600b0063721a998fbmr460166otc.218.1661386137785; Wed, 24 Aug
2022 17:08:57 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 24 Aug 2022 17:08:57 -0700 (PDT)
In-Reply-To: <e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.252.226; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.252.226
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk> <7caa0f30-bc19-4c93-80b9-93aa8611fb67n@googlegroups.com>
<20220825010423.97f368943e1fd1e556e0321f@gmail.com> <08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com>
<e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a6176a75-ca68-48c5-85fc-0c1417027f1cn@googlegroups.com>
Subject: Re: How to optimize this search?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Thu, 25 Aug 2022 00:08:58 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2031
 by: Thiago Adams - Thu, 25 Aug 2022 00:08 UTC

On Wednesday, August 24, 2022 at 8:54:56 PM UTC-3, Thiago Adams wrote:
> On Wednesday, August 24, 2022 at 8:43:35 PM UTC-3, Thiago Adams wrote:
> > On Wednesday, August 24, 2022 at 7:04:36 PM UTC-3, Anton Shepelev wrote:
>
> > This generator outputs the code
> >
> output is:
>
> This code looks a bad linear search..but -o2 gcc transform this
> in a jump give the initial letter. The same code can be written using switch..

The bad news it that without a "label pointer" we cannot write similar code by hand.
We need a table jump.
We have function pointers but I thing it will pay the price of function call.(haven't checked)

Re: How to optimize this search?

<20220825112934.440b0eebe1adafa5388cf716@g{oogle}mail.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22689&group=comp.lang.c#22689

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: anton....@g{oogle}mail.com (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: How to optimize this search?
Date: Thu, 25 Aug 2022 11:29:34 +0300
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <20220825112934.440b0eebe1adafa5388cf716@g{oogle}mail.com>
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk>
<20220825011715.c10469adfe9f90f757b0fe8d@gmail.com>
<87k06xqncg.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader01.eternal-september.org; posting-host="1c0106dfb5f280bb0e9a5c50a6f5131d";
logging-data="3798550"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19he3XpkQvlw/nbyBL1bRzu898ekkYsyPE="
Cancel-Lock: sha1:hDknK2mICNUuBOa6OKa7du6hXS0=
X-Newsreader: Sylpheed 3.5.0 (GTK+ 2.24.23; i686-pc-mingw32)
 by: Anton Shepelev - Thu, 25 Aug 2022 08:29 UTC

Ben Bacarisse to Anton Shepelev:

> > My program for performance testing accepts a set of
> > measured implementations, including a reference null
> > implementaion, as function pointers, runs them
> > repeatedly with each dataset until total running time
> > exceeds a certain limit (e.g. 10 seconds), divides the
> > result by the number of iterations, and subracts the
> > null time. It has the advantage of testing all the
> > algorithms against all the datasets in a single run and
> > of outputting their performace in a table. It uses
> > standard C fucntions to measure time. How do you measure
> > time?
>
> For the numbers I posted, the time built-in bash command.
> You code looks fast, but it seems to have a bug.

Probably due to the confusion between ' ' as string
terminator and zero the empty value. I hoped I could get
away with it by way of a hack but failed. I was in a hurry,
will fix it.

> What times do you get for the other search strategies?

That is the program I used for testing sort and hash
algorithms but I did not apply it to the task at hand.
Thanks for poining out the bug and corresponding input in
the other post.

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Re: How to optimize this search?

<20220825113322.8fbbf5bdcc9de4f143c5ccc7@g{oogle}mail.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22690&group=comp.lang.c#22690

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: anton....@g{oogle}mail.com (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: How to optimize this search?
Date: Thu, 25 Aug 2022 11:33:22 +0300
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <20220825113322.8fbbf5bdcc9de4f143c5ccc7@g{oogle}mail.com>
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk>
<20220825011715.c10469adfe9f90f757b0fe8d@gmail.com>
<87k06xqncg.fsf@bsb.me.uk>
<20220825112934.440b0eebe1adafa5388cf716@g{oogle}mail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader01.eternal-september.org; posting-host="1c0106dfb5f280bb0e9a5c50a6f5131d";
logging-data="3798550"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18VB+YEL0ybrKiUVEh8uTfhyq8n01m7IB0="
Cancel-Lock: sha1:p4A0b5+h+9ie75hLRz7YSansLbM=
X-Newsreader: Sylpheed 3.5.0 (GTK+ 2.24.23; i686-pc-mingw32)
 by: Anton Shepelev - Thu, 25 Aug 2022 08:33 UTC

I wrote to

> > You code looks fast, but it seems to have a bug.
>
> Probably due to the confusion between ' ' as string
> terminator and zero the empty value.

I meant `\0`. Which will make more sense for this
conversation: to fix the bug or to rewrite the function so
that it accepts a raw non-terminated string and a
length -- an interface better suitable for a tokeniser?

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Re: How to optimize this search?

<20220826013841.422b5a9f2e46dd8fa88e0b42@gmail.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22704&group=comp.lang.c#22704

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: anton....@gmail.com (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: How to optimize this search?
Date: Fri, 26 Aug 2022 01:38:41 +0300
Organization: A noiseless patient Spider
Lines: 160
Message-ID: <20220826013841.422b5a9f2e46dd8fa88e0b42@gmail.com>
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<b8624e58-8af6-4c48-8b27-694174e4243cn@googlegroups.com>
<5_NMK.799766$ntj.539650@fx15.iad>
<ac7f8956-aadb-4179-9327-ab8a25547de9n@googlegroups.com>
<20220823145856.464a208687d544911b246825@g{oogle}mail.com>
<607d6a50-9932-4b63-9108-0894646ad04bn@googlegroups.com>
<20220823175014.5b34c47ce60ab0dd18768269@g{oogle}mail.com>
<87a67vuf2n.fsf@bsb.me.uk>
<20220823192116.ccb5f45f05abfc38e14ad9af@g{oogle}mail.com>
<874jy2vb3m.fsf@bsb.me.uk>
<20220824112016.061ec6fc03ae86688a8534b6@g{oogle}mail.com>
<20220825005213.77a5a9e36a0e8fe01a1c42f3@gmail.com>
<87pmgpqo1d.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader01.eternal-september.org; posting-host="729b4fd20241f5b1d5e3a8830c5f8689";
logging-data="3949160"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/z9i+Kx9mJK0N6jm7m18kTq2pZBBCBak="
Cancel-Lock: sha1:T3XeAE5vyNxqHoL9pjCfEPOM0mo=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)
 by: Anton Shepelev - Thu, 25 Aug 2022 22:38 UTC

Ben Bacarisse:

> This program reports zero (the token value assigned to
> NULL) for a number of strings that are not keywords.

Yes, it misfires for any words that is a prefix of a
keyword. Below is a fixed version:

#include <stdio.h>
#include <stdlib.h>

/* TODO: allocate tree[] elements from kw_reg() as needed */
#define MAX_RECS 4096
#define EMPTY -1
int tree[MAX_RECS][256]; /* tree of character-tables for NFA-like recognition */
int tree_h; /* index of highest used element in tree[] */

/* register a keyword: */
static void kw_reg( char* word, int id )
{ int* ctab;
int next;
char c;

ctab = tree[0];

while( 1 )
{ c = *word;
/* token scanned => store its id and exit: */
if( c == '\0' )
{ ctab[c] = id; break; }
next = ctab[c];
/* known prefix passed => store current character: */
if( next == EMPTY )
{ ctab[c] = ++tree_h;
ctab = tree[tree_h];
}
/* still within known prefix => continue to next level: */
else
{ ctab = tree[next]; }
word += 1;
}
}

static int kw_check( char* word )
{ char c;
int* ctab;
int next;
int res;

res = -1;
ctab = tree[0];
while( 1 )
{ c = *word;
next = ctab[c];
/* end of token => recognised a keyword: */
if( c == '\0' )
{ res = next; break; }
if( next == EMPTY ) break; /* unknown prefix => not a keyword */
ctab = tree[next]; /* continue to next level */
word += 1;
}
return res;
}

static void kw_init()
{ int r, c;
tree_h = 0;
for( r = 0; r < MAX_RECS; r += 1 )
for( c = 0; c < 256 ; c += 1 )
tree[r][c] = EMPTY;

kw_reg("NULL" , 0);
kw_reg("_Alignas" , 1);
kw_reg("_Atomic" , 2);
kw_reg("_BitInt" , 3);
kw_reg("_Bool" , 4);
kw_reg("_Complex" , 5);
kw_reg("_Decimal128" , 6);
kw_reg("_Decimal32" , 7);
kw_reg("_Decimal64" , 8);
kw_reg("_Generic" , 9);
kw_reg("_Hashof" , 10);
kw_reg("_Imaginary" , 11);
kw_reg("_Noreturn" , 12);
kw_reg("_Static_assert", 13);
kw_reg("_Thread_local" , 14);
kw_reg("__alignof" , 15);
kw_reg("__asm" , 16);
kw_reg("__forceinline" , 17);
kw_reg("__inline" , 18);
kw_reg("__int16" , 19);
kw_reg("__int32" , 20);
kw_reg("__int64" , 21);
kw_reg("__int8" , 22);
kw_reg("_asm" , 23);
kw_reg("alignas" , 24);
kw_reg("alignof" , 25);
kw_reg("auto" , 26);
kw_reg("bool" , 27);
kw_reg("break" , 28);
kw_reg("case" , 29);
kw_reg("catch" , 30);
kw_reg("char" , 31);
kw_reg("const" , 32);
kw_reg("constexpr" , 33);
kw_reg("continue" , 34);
kw_reg("default" , 35);
kw_reg("defer" , 36);
kw_reg("do" , 37);
kw_reg("double" , 38);
kw_reg("else" , 39);
kw_reg("enum" , 40);
kw_reg("extern" , 41);
kw_reg("false" , 42);
kw_reg("float" , 43);
kw_reg("for" , 44);
kw_reg("goto" , 45);
kw_reg("if" , 46);
kw_reg("inline" , 47);
kw_reg("int" , 48);
kw_reg("long" , 49);
kw_reg("nullptr" , 50);
kw_reg("register" , 51);
kw_reg("repeat" , 52);
kw_reg("restrict" , 53);
kw_reg("return" , 54);
kw_reg("short" , 55);
kw_reg("signed" , 56);
kw_reg("sizeof" , 57);
kw_reg("static" , 58);
kw_reg("static_assert" , 59);
kw_reg("struct" , 60);
kw_reg("switch" , 61);
kw_reg("thread_local" , 62);
kw_reg("throw" , 63);
kw_reg("true" , 64);
kw_reg("try" , 65);
kw_reg("typedef" , 66);
kw_reg("typeid" , 67);
kw_reg("typeof" , 68);
kw_reg("typeof_unqual" , 69);
kw_reg("union" , 70);
kw_reg("unsigned" , 71);
kw_reg("void" , 72);
kw_reg("volatile" , 73);
kw_reg("while" , 74);
}

static void test( char* token )
{ printf("%14s: %2i\n", token, kw_check( token ) ); }

int main( int argc, char** argv )
{ kw_init();
test( "NULL" );
test( "constexpr" );
test( "while" );
test( "_Thread_local" );
test( "rest" );
test( "random" );
}

Re: How to optimize this search?

<20220826014719.d44a7aa91a2368c66573374c@gmail.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22705&group=comp.lang.c#22705

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: anton....@gmail.com (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: How to optimize this search?
Date: Fri, 26 Aug 2022 01:47:19 +0300
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <20220826014719.d44a7aa91a2368c66573374c@gmail.com>
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk>
<7caa0f30-bc19-4c93-80b9-93aa8611fb67n@googlegroups.com>
<20220825010423.97f368943e1fd1e556e0321f@gmail.com>
<08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com>
<e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>
<a6176a75-ca68-48c5-85fc-0c1417027f1cn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader01.eternal-september.org; posting-host="729b4fd20241f5b1d5e3a8830c5f8689";
logging-data="3949160"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/XWl2PVaDxu4buPMG6+3V2up3LLh//M5U="
Cancel-Lock: sha1:mhhDECZNMDBkj2dOUELf1SSksms=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)
 by: Anton Shepelev - Thu, 25 Aug 2022 22:47 UTC

Thiago Adams:

> This generator outputs the code
> [...]
> This code looks a bad linear search..but -o2 gcc transform
> this in a jump give the initial letter.

That is what I dislike about too clever and too high-level
optimisers: they change the essence of the algorithm so much
that the code is not executed as it is written. The compiler
does the work of the programmer. Now, depending on the
compiler, the same C code, may perform in O(n) or O(1),
which is crazy!

> The same code can be written using switch..
>
> int is_keyword(const char* s)
> {
> if(*s == 'N') {
> s++;
> /*NULL*/
> if( *s++ == 'U' && *s++ == 'L' && *s++ == 'L' && *s ==' ') return 0; else return -1;;
> }
> if(*s == '_') {
> s++;
> if(*s == 'A') {
> s++;
> if(*s == 'l') {

Thank you for the explanation, Thiago.

> The bad news it that without a "label pointer" we cannot
> write similar code by hand. We need a table jump. We
> have function pointers but I thing it will pay the price
> of function call.(haven't checked)

I have tried.

--
() ascii ribbon campaign -- against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Re: How to optimize this search?

<87a67ranej.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22706&group=comp.lang.c#22706

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: How to optimize this search?
Date: Fri, 26 Aug 2022 00:55:48 +0100
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <87a67ranej.fsf@bsb.me.uk>
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<b8624e58-8af6-4c48-8b27-694174e4243cn@googlegroups.com>
<5_NMK.799766$ntj.539650@fx15.iad>
<ac7f8956-aadb-4179-9327-ab8a25547de9n@googlegroups.com>
<20220823145856.464a208687d544911b246825@g{oogle}mail.com>
<607d6a50-9932-4b63-9108-0894646ad04bn@googlegroups.com>
<20220823175014.5b34c47ce60ab0dd18768269@g{oogle}mail.com>
<87a67vuf2n.fsf@bsb.me.uk>
<20220823192116.ccb5f45f05abfc38e14ad9af@g{oogle}mail.com>
<874jy2vb3m.fsf@bsb.me.uk>
<20220824112016.061ec6fc03ae86688a8534b6@g{oogle}mail.com>
<20220825005213.77a5a9e36a0e8fe01a1c42f3@gmail.com>
<87pmgpqo1d.fsf@bsb.me.uk>
<20220826013841.422b5a9f2e46dd8fa88e0b42@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="82ff49d57af2f8e7ac36802e47334ae2";
logging-data="3958230"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18NK+TWUpiGR07XRX7WAfRQwuq+fq3RUnE="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:JyMIpWA+ZGIwqIy/P8IpVTA22B0=
sha1:WXzhFXL8WaIeHndZqhi9LRbpFxI=
X-BSB-Auth: 1.7402ddf09be2b82c2871.20220826005548BST.87a67ranej.fsf@bsb.me.uk
 by: Ben Bacarisse - Thu, 25 Aug 2022 23:55 UTC

Anton Shepelev <anton.txt@gmail.com> writes:

> Ben Bacarisse:
>
>> This program reports zero (the token value assigned to
>> NULL) for a number of strings that are not keywords.
>
> Yes, it misfires for any words that is a prefix of a
> keyword. Below is a fixed version:

<snip>

That seems to be a little faster than the perfect hash version (on the
sqlite source).

--
Ben.

Re: How to optimize this search?

<20220826120252.b8a994a2f425c1924488d6cc@g{oogle}mail.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22708&group=comp.lang.c#22708

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: anton....@g{oogle}mail.com (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: How to optimize this search?
Date: Fri, 26 Aug 2022 12:02:52 +0300
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <20220826120252.b8a994a2f425c1924488d6cc@g{oogle}mail.com>
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<b8624e58-8af6-4c48-8b27-694174e4243cn@googlegroups.com>
<5_NMK.799766$ntj.539650@fx15.iad>
<ac7f8956-aadb-4179-9327-ab8a25547de9n@googlegroups.com>
<20220823145856.464a208687d544911b246825@g{oogle}mail.com>
<607d6a50-9932-4b63-9108-0894646ad04bn@googlegroups.com>
<20220823175014.5b34c47ce60ab0dd18768269@g{oogle}mail.com>
<87a67vuf2n.fsf@bsb.me.uk>
<20220823192116.ccb5f45f05abfc38e14ad9af@g{oogle}mail.com>
<874jy2vb3m.fsf@bsb.me.uk>
<20220824112016.061ec6fc03ae86688a8534b6@g{oogle}mail.com>
<20220825005213.77a5a9e36a0e8fe01a1c42f3@gmail.com>
<87pmgpqo1d.fsf@bsb.me.uk>
<20220826013841.422b5a9f2e46dd8fa88e0b42@gmail.com>
<87a67ranej.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader01.eternal-september.org; posting-host="26484669133c9693263d4589f82643b7";
logging-data="4135135"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19JLJg9Wgt3ocZgNx2NB2LKBMjqDZg+ibY="
Cancel-Lock: sha1:wLVF8CQ9ol4kw72fYQzXwGjLmrU=
X-Newsreader: Sylpheed 3.5.0 (GTK+ 2.24.23; i686-pc-mingw32)
 by: Anton Shepelev - Fri, 26 Aug 2022 09:02 UTC

Ben Bacarisse to Anton Shepelev:

> > Below is a fixed version:
> > [...]
>
> That seems to be a little faster than the perfect hash
> version (on the sqlite source).

I think my solution is algorithmically equivalent to
Thiago's generated optimiser-based one, about which you
write:

> It runs a little slower than the perfect hash: about
> 0.08s. Both are close to the IO-only times so the lookup
> itself it may be significantly slower, but much more
> detailed timings would be needed to find out.

I believe we should rewrite the function to accept a
character pointer and a length, and then run some decicated,
synthetic, isolated tests, say with just two inputs:

1. a keyword,
2. a non-keyword.

I can contribute a test program (even I fail to dig out my
old one), if I am not the only interested party.

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Re: How to optimize this search?

<c39893ca-432f-4ce7-9db3-14064d9afe5an@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22709&group=comp.lang.c#22709

  copy link   Newsgroups: comp.lang.c
X-Received: by 2002:a05:620a:2548:b0:6b6:113d:34fd with SMTP id s8-20020a05620a254800b006b6113d34fdmr5924719qko.132.1661512252667;
Fri, 26 Aug 2022 04:10:52 -0700 (PDT)
X-Received: by 2002:a05:6808:1b24:b0:344:8928:69fb with SMTP id
bx36-20020a0568081b2400b00344892869fbmr1413225oib.182.1661512252373; Fri, 26
Aug 2022 04:10:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 26 Aug 2022 04:10:52 -0700 (PDT)
In-Reply-To: <20220826120252.b8a994a2f425c1924488d6cc@g{oogle}mail.com>
Injection-Info: google-groups.googlegroups.com; posting-host=94.175.38.125; posting-account=rqC7UgoAAACeVvrGykivrxfPIl3bA_1y
NNTP-Posting-Host: 94.175.38.125
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<b8624e58-8af6-4c48-8b27-694174e4243cn@googlegroups.com> <5_NMK.799766$ntj.539650@fx15.iad>
<ac7f8956-aadb-4179-9327-ab8a25547de9n@googlegroups.com> <20220823145856.464a208687d544911b246825@g{oogle}mail.com>
<607d6a50-9932-4b63-9108-0894646ad04bn@googlegroups.com> <20220823175014.5b34c47ce60ab0dd18768269@g{oogle}mail.com>
<87a67vuf2n.fsf@bsb.me.uk> <20220823192116.ccb5f45f05abfc38e14ad9af@g{oogle}mail.com>
<874jy2vb3m.fsf@bsb.me.uk> <20220824112016.061ec6fc03ae86688a8534b6@g{oogle}mail.com>
<20220825005213.77a5a9e36a0e8fe01a1c42f3@gmail.com> <87pmgpqo1d.fsf@bsb.me.uk>
<20220826013841.422b5a9f2e46dd8fa88e0b42@gmail.com> <87a67ranej.fsf@bsb.me.uk>
<20220826120252.b8a994a2f425c1924488d6cc@g{oogle}mail.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c39893ca-432f-4ce7-9db3-14064d9afe5an@googlegroups.com>
Subject: Re: How to optimize this search?
From: bart4...@gmail.com (bart c)
Injection-Date: Fri, 26 Aug 2022 11:10:52 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 50
 by: bart c - Fri, 26 Aug 2022 11:10 UTC

On Friday, 26 August 2022 at 10:03:09 UTC+1, Anton Shepelev wrote:
> Ben Bacarisse to Anton Shepelev:
> > > Below is a fixed version:
> > > [...]
> >
> > That seems to be a little faster than the perfect hash
> > version (on the sqlite source).
> I think my solution is algorithmically equivalent to
> Thiago's generated optimiser-based one, about which you
> write:
>
> > It runs a little slower than the perfect hash: about
> > 0.08s. Both are close to the IO-only times so the lookup
> > itself it may be significantly slower, but much more
> > detailed timings would be needed to find out.
>
> I believe we should rewrite the function to accept a
> character pointer and a length, and then run some decicated,
> synthetic, isolated tests, say with just two inputs:
>
> 1. a keyword,
> 2. a non-keyword.
>
> I can contribute a test program (even I fail to dig out my
> old one), if I am not the only interested party.

I have a few problems with such a function.

My tests showed that the runtime of the function is usually dwarfed by the set-up time needed to obtain the input. So the emphasis is in the wrong place. (I also feel that those two parts are not distinct, and could either be combined, or made to work together.)

But also, by requiring a length, you've changed the game. With a length, you've now already narrowed down the possibilities for the keywords. For example, there is only one matching keyword when the length is 11, 12 or 14 characters,

Otherwise there is at most 13 keywords matching a particular length.

Without a length, the options are either to have it zero-terminated (which is how it works now), or to leave it open, which is more interesting: the pointer points into the middle of some text, and the end of the potential keyword is either:

- Non-alphabetic (including end-of-text marker)
- An alphabetic that rules out any keyword

This is interesting because then the function can be called as soon as the first character has been detected; no set-up is needed. It can improve overall performance.

Re: How to optimize this search?

<513ab3bb-cd81-48b0-83c9-1335c1566076n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22710&group=comp.lang.c#22710

  copy link   Newsgroups: comp.lang.c
X-Received: by 2002:a05:6214:d66:b0:496:fc5d:e1f6 with SMTP id 6-20020a0562140d6600b00496fc5de1f6mr7765256qvs.63.1661517603991;
Fri, 26 Aug 2022 05:40:03 -0700 (PDT)
X-Received: by 2002:a05:6870:5493:b0:11d:80d9:5f90 with SMTP id
f19-20020a056870549300b0011d80d95f90mr1646273oan.145.1661517603455; Fri, 26
Aug 2022 05:40:03 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 26 Aug 2022 05:40:03 -0700 (PDT)
In-Reply-To: <e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.252.226; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.252.226
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk> <7caa0f30-bc19-4c93-80b9-93aa8611fb67n@googlegroups.com>
<20220825010423.97f368943e1fd1e556e0321f@gmail.com> <08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com>
<e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <513ab3bb-cd81-48b0-83c9-1335c1566076n@googlegroups.com>
Subject: Re: How to optimize this search?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Fri, 26 Aug 2022 12:40:03 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Thiago Adams - Fri, 26 Aug 2022 12:40 UTC

On Wednesday, August 24, 2022 at 8:54:56 PM UTC-3, Thiago Adams wrote:
> On Wednesday, August 24, 2022 at 8:43:35 PM UTC-3, Thiago Adams wrote:
> > On Wednesday, August 24, 2022 at 7:04:36 PM UTC-3, Anton Shepelev wrote:
>
> > This generator outputs the code
> >
> output is:
>
> This code looks a bad linear search..but -o2 gcc transform this
> in a jump give the initial letter. The same code can be written using switch..
>
> int is_keyword(const char* s)
> {
> if(*s == 'N') {
> s++;
> /*NULL*/
> if( *s++ == 'U' && *s++ == 'L' && *s++ == 'L' && *s =='\0') return 0; else return -1;;
> }

I changed the generator to instead of using *s++ I put the [constant_index].

Like this:
if(s[0] == 'N') {
/*NULL*/
if( s[1] == 'U' && s[2] == 'L' && s[3] == 'L' && s[4] =='\0') return 0; else return -1;;
}

The compiler ouput is different.

--## GENERATOR ## --

#include <stdio.h>
#include <string.h>

void GenerateCore(const char* keywords[],
int first,
int last,
int level,
int* count) {
int ident = (level + 1) * 2;
for (int i = first; i <= last; i++) {
int begin = i;
int end = begin;
for (int k = i + 1; k <= last; k++) {
if (keywords[k][level] == keywords[begin][level]) {
i++;
end++;
}
else
break;
}
int index = level;
// we have the range
if (begin == end) {
// just one
if (keywords[i][level] != '\0') {
printf("%*cif(s[%d] == '%c') {\n", ident * 2, ' ', level, keywords[i][level]);

//printf("%*cs[%d];\n", ident * 3, ' ', index);
index++;

printf("%*c/*%s*/\n", ident * 3, ' ', keywords[i]);
printf("%*cif(", ident * 3, ' ');
int len = (int)strlen(keywords[i]);

int j = level + 1;
for (; j < len; j++) {
if (j != level + 1)
printf(" &&");

printf(" s[%d] == '%c'", index, keywords[i][j]);
index++;
}
if (j != level + 1) {
printf(" &&");
}
printf(" s[%d] =='\\0') return %i; else return -1;", index, i);

printf("\n");
printf("%*c}\n", ident * 2, ' ');
}
else {
printf("%*cif(*s == '\\0') {\n", ident * 2, ' ');
printf("%*cs[%d];\n", ident * 3, ' ', index); index++;
printf("%*c/*%s*/\n", ident * 3, ' ', keywords[i]);
printf("%*creturn % i;\n", ident * 3, ' ', i);
printf("%*c};\n", ident * 2, ' ');
}

(*count)++;
}
else {
printf("%*cif(s[%d] == '%c') {\n", ident * 2, ' ', level, keywords[i][level]);
//printf("%*cs[%d]; \n", ident * 3, ' ', index); index++;
index++;
GenerateCore(keywords, begin, end, level + 1, count);
printf("%*creturn -1;\n", ident * 3, ' ');
printf("%*c};\n", ident * 2, ' ');
}
}
}

void Generate(const char* keywords[], int size) {
/*sort keywords*/
int i, j;
for (i = 0; i < size - 1; i++) {
for (j = 0; j < size - i - 1; j++) {
if (strcmp(keywords[j], keywords[j + 1]) > 0) {
char* temp = keywords[j + 1];
keywords[j + 1] = keywords[j];
keywords[j] = temp;
}
}
}

printf("int is_keyword(const char* s)\n");
printf("{\n");
int count = 0;
GenerateCore(keywords, 0, size - 1, 0, &count);
printf("%*creturn -1;\n", 2, ' ');
printf("}\n");
}

int main() {
const char* keywords[] = { "NULL",
"_Alignas",
"_Alignof",
"__alignof",
"_Atomic",
"_BitInt",
"_Bool",
"_Complex",
"_Decimal128",
"_Decimal32",
"_Decimal64",
"_Generic",
"_Hashof",
"_Imaginary",
"_Noreturn",
"_Static_assert",
"_Thread_local",
"__asm",
"__forceinline",
"__inline",
"__int16",
"__int32",
"__int64",
"__int8",
"_asm",
"alignas",
"alignof",
"auto",
"bool",
"break",
"case",
"catch",
"char",
"const",
"constexpr",
"continue",
"default",
"defer",
"do",
"double",
"else",
"enum",
"extern",
"false",
"float",
"for",
"goto",
"if",
"inline",
"int",
"long",
"nullptr",
"register",
"repeat",
"restrict",
"return",
"short",
"signed",
"sizeof",
"static",
"static_assert",
"struct",
"switch",
"thread_local",
"throw",
"true",
"try",
"typedef",
"typeid",
"typeof",
"typeof_unqual",
"union",
"unsigned",
"void",
"volatile",
"while" };

Generate(keywords, sizeof(keywords) / sizeof(keywords[0]));
}

## GENERATOR OUTPUT ##

int is_keyword(const char* s)
{ if(s[0] == 'N') {
/*NULL*/
if( s[1] == 'U' && s[2] == 'L' && s[3] == 'L' && s[4] =='\0') return 0; else return -1;
}
if(s[0] == '_') {
if(s[1] == 'A') {
if(s[2] == 'l') {
if(s[3] == 'i') {
if(s[4] == 'g') {
if(s[5] == 'n') {
if(s[6] == 'a') {
/*_Alignas*/
if( s[7] == 's' && s[8] =='\0') return 1; else return -1;
}
if(s[6] == 'o') {
/*_Alignof*/
if( s[7] == 'f' && s[8] =='\0') return 2; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(s[2] == 't') {
/*_Atomic*/
if( s[3] == 'o' && s[4] == 'm' && s[5] == 'i' && s[6] == 'c' && s[7] =='\0') return 3; else return -1;
}
return -1;
};
if(s[1] == 'B') {
if(s[2] == 'i') {
/*_BitInt*/
if( s[3] == 't' && s[4] == 'I' && s[5] == 'n' && s[6] == 't' && s[7] =='\0') return 4; else return -1;
}
if(s[2] == 'o') {
/*_Bool*/
if( s[3] == 'o' && s[4] == 'l' && s[5] =='\0') return 5; else return -1;
}
return -1;
};
if(s[1] == 'C') {
/*_Complex*/
if( s[2] == 'o' && s[3] == 'm' && s[4] == 'p' && s[5] == 'l' && s[6] == 'e' && s[7] == 'x' && s[8] =='\0') return 6; else return -1;
}
if(s[1] == 'D') {
if(s[2] == 'e') {
if(s[3] == 'c') {
if(s[4] == 'i') {
if(s[5] == 'm') {
if(s[6] == 'a') {
if(s[7] == 'l') {
if(s[8] == '1') {
/*_Decimal128*/
if( s[9] == '2' && s[10] == '8' && s[11] =='\0') return 7; else return -1;
}
if(s[8] == '3') {
/*_Decimal32*/
if( s[9] == '2' && s[10] =='\0') return 8; else return -1;
}
if(s[8] == '6') {
/*_Decimal64*/
if( s[9] == '4' && s[10] =='\0') return 9; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(s[1] == 'G') {
/*_Generic*/
if( s[2] == 'e' && s[3] == 'n' && s[4] == 'e' && s[5] == 'r' && s[6] == 'i' && s[7] == 'c' && s[8] =='\0') return 10; else return -1;
}
if(s[1] == 'H') {
/*_Hashof*/
if( s[2] == 'a' && s[3] == 's' && s[4] == 'h' && s[5] == 'o' && s[6] == 'f' && s[7] =='\0') return 11; else return -1;
}
if(s[1] == 'I') {
/*_Imaginary*/
if( s[2] == 'm' && s[3] == 'a' && s[4] == 'g' && s[5] == 'i' && s[6] == 'n' && s[7] == 'a' && s[8] == 'r' && s[9] == 'y' && s[10] =='\0') return 12; else return -1;
}
if(s[1] == 'N') {
/*_Noreturn*/
if( s[2] == 'o' && s[3] == 'r' && s[4] == 'e' && s[5] == 't' && s[6] == 'u' && s[7] == 'r' && s[8] == 'n' && s[9] =='\0') return 13; else return -1;
}
if(s[1] == 'S') {
/*_Static_assert*/
if( s[2] == 't' && s[3] == 'a' && s[4] == 't' && s[5] == 'i' && s[6] == 'c' && s[7] == '_' && s[8] == 'a' && s[9] == 's' && s[10] == 's' && s[11] == 'e' && s[12] == 'r' && s[13] == 't' && s[14] =='\0') return 14; else return -1;
}
if(s[1] == 'T') {
/*_Thread_local*/
if( s[2] == 'h' && s[3] == 'r' && s[4] == 'e' && s[5] == 'a' && s[6] == 'd' && s[7] == '_' && s[8] == 'l' && s[9] == 'o' && s[10] == 'c' && s[11] == 'a' && s[12] == 'l' && s[13] =='\0') return 15; else return -1;
}
if(s[1] == '_') {
if(s[2] == 'a') {
if(s[3] == 'l') {
/*__alignof*/
if( s[4] == 'i' && s[5] == 'g' && s[6] == 'n' && s[7] == 'o' && s[8] == 'f' && s[9] =='\0') return 16; else return -1;
}
if(s[3] == 's') {
/*__asm*/
if( s[4] == 'm' && s[5] =='\0') return 17; else return -1;
}
return -1;
};
if(s[2] == 'f') {
/*__forceinline*/
if( s[3] == 'o' && s[4] == 'r' && s[5] == 'c' && s[6] == 'e' && s[7] == 'i' && s[8] == 'n' && s[9] == 'l' && s[10] == 'i' && s[11] == 'n' && s[12] == 'e' && s[13] =='\0') return 18; else return -1;
}
if(s[2] == 'i') {
if(s[3] == 'n') {
if(s[4] == 'l') {
/*__inline*/
if( s[5] == 'i' && s[6] == 'n' && s[7] == 'e' && s[8] =='\0') return 19; else return -1;
}
if(s[4] == 't') {
if(s[5] == '1') {
/*__int16*/
if( s[6] == '6' && s[7] =='\0') return 20; else return -1;
}
if(s[5] == '3') {
/*__int32*/
if( s[6] == '2' && s[7] =='\0') return 21; else return -1;
}
if(s[5] == '6') {
/*__int64*/
if( s[6] == '4' && s[7] =='\0') return 22; else return -1;
}
if(s[5] == '8') {
/*__int8*/
if( s[6] =='\0') return 23; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(s[1] == 'a') {
/*_asm*/
if( s[2] == 's' && s[3] == 'm' && s[4] =='\0') return 24; else return -1;
}
return -1;
};
if(s[0] == 'a') {
if(s[1] == 'l') {
if(s[2] == 'i') {
if(s[3] == 'g') {
if(s[4] == 'n') {
if(s[5] == 'a') {
/*alignas*/
if( s[6] == 's' && s[7] =='\0') return 25; else return -1;
}
if(s[5] == 'o') {
/*alignof*/
if( s[6] == 'f' && s[7] =='\0') return 26; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(s[1] == 'u') {
/*auto*/
if( s[2] == 't' && s[3] == 'o' && s[4] =='\0') return 27; else return -1;
}
return -1;
};
if(s[0] == 'b') {
if(s[1] == 'o') {
/*bool*/
if( s[2] == 'o' && s[3] == 'l' && s[4] =='\0') return 28; else return -1;
}
if(s[1] == 'r') {
/*break*/
if( s[2] == 'e' && s[3] == 'a' && s[4] == 'k' && s[5] =='\0') return 29; else return -1;
}
return -1;
};
if(s[0] == 'c') {
if(s[1] == 'a') {
if(s[2] == 's') {
/*case*/
if( s[3] == 'e' && s[4] =='\0') return 30; else return -1;
}
if(s[2] == 't') {
/*catch*/
if( s[3] == 'c' && s[4] == 'h' && s[5] =='\0') return 31; else return -1;
}
return -1;
};
if(s[1] == 'h') {
/*char*/
if( s[2] == 'a' && s[3] == 'r' && s[4] =='\0') return 32; else return -1;
}
if(s[1] == 'o') {
if(s[2] == 'n') {
if(s[3] == 's') {
if(s[4] == 't') {
if(*s == '\0') {
s[5];
/*const*/
return 33;
};
if(s[5] == 'e') {
/*constexpr*/
if( s[6] == 'x' && s[7] == 'p' && s[8] == 'r' && s[9] =='\0') return 34; else return -1;
}
return -1;
};
return -1;
};
if(s[3] == 't') {
/*continue*/
if( s[4] == 'i' && s[5] == 'n' && s[6] == 'u' && s[7] == 'e' && s[8] =='\0') return 35; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
if(s[0] == 'd') {
if(s[1] == 'e') {
if(s[2] == 'f') {
if(s[3] == 'a') {
/*default*/
if( s[4] == 'u' && s[5] == 'l' && s[6] == 't' && s[7] =='\0') return 36; else return -1;
}
if(s[3] == 'e') {
/*defer*/
if( s[4] == 'r' && s[5] =='\0') return 37; else return -1;
}
return -1;
};
return -1;
};
if(s[1] == 'o') {
if(*s == '\0') {
s[2];
/*do*/
return 38;
};
if(s[2] == 'u') {
/*double*/
if( s[3] == 'b' && s[4] == 'l' && s[5] == 'e' && s[6] =='\0') return 39; else return -1;
}
return -1;
};
return -1;
};
if(s[0] == 'e') {
if(s[1] == 'l') {
/*else*/
if( s[2] == 's' && s[3] == 'e' && s[4] =='\0') return 40; else return -1;
}
if(s[1] == 'n') {
/*enum*/
if( s[2] == 'u' && s[3] == 'm' && s[4] =='\0') return 41; else return -1;
}
if(s[1] == 'x') {
/*extern*/
if( s[2] == 't' && s[3] == 'e' && s[4] == 'r' && s[5] == 'n' && s[6] =='\0') return 42; else return -1;
}
return -1;
};
if(s[0] == 'f') {
if(s[1] == 'a') {
/*false*/
if( s[2] == 'l' && s[3] == 's' && s[4] == 'e' && s[5] =='\0') return 43; else return -1;
}
if(s[1] == 'l') {
/*float*/
if( s[2] == 'o' && s[3] == 'a' && s[4] == 't' && s[5] =='\0') return 44; else return -1;
}
if(s[1] == 'o') {
/*for*/
if( s[2] == 'r' && s[3] =='\0') return 45; else return -1;
}
return -1;
};
if(s[0] == 'g') {
/*goto*/
if( s[1] == 'o' && s[2] == 't' && s[3] == 'o' && s[4] =='\0') return 46; else return -1;
}
if(s[0] == 'i') {
if(s[1] == 'f') {
/*if*/
if( s[2] =='\0') return 47; else return -1;
}
if(s[1] == 'n') {
if(s[2] == 'l') {
/*inline*/
if( s[3] == 'i' && s[4] == 'n' && s[5] == 'e' && s[6] =='\0') return 48; else return -1;
}
if(s[2] == 't') {
/*int*/
if( s[3] =='\0') return 49; else return -1;
}
return -1;
};
return -1;
};
if(s[0] == 'l') {
/*long*/
if( s[1] == 'o' && s[2] == 'n' && s[3] == 'g' && s[4] =='\0') return 50; else return -1;
}
if(s[0] == 'n') {
/*nullptr*/
if( s[1] == 'u' && s[2] == 'l' && s[3] == 'l' && s[4] == 'p' && s[5] == 't' && s[6] == 'r' && s[7] =='\0') return 51; else return -1;
}
if(s[0] == 'r') {
if(s[1] == 'e') {
if(s[2] == 'g') {
/*register*/
if( s[3] == 'i' && s[4] == 's' && s[5] == 't' && s[6] == 'e' && s[7] == 'r' && s[8] =='\0') return 52; else return -1;
}
if(s[2] == 'p') {
/*repeat*/
if( s[3] == 'e' && s[4] == 'a' && s[5] == 't' && s[6] =='\0') return 53; else return -1;
}
if(s[2] == 's') {
/*restrict*/
if( s[3] == 't' && s[4] == 'r' && s[5] == 'i' && s[6] == 'c' && s[7] == 't' && s[8] =='\0') return 54; else return -1;
}
if(s[2] == 't') {
/*return*/
if( s[3] == 'u' && s[4] == 'r' && s[5] == 'n' && s[6] =='\0') return 55; else return -1;
}
return -1;
};
return -1;
};
if(s[0] == 's') {
if(s[1] == 'h') {
/*short*/
if( s[2] == 'o' && s[3] == 'r' && s[4] == 't' && s[5] =='\0') return 56; else return -1;
}
if(s[1] == 'i') {
if(s[2] == 'g') {
/*signed*/
if( s[3] == 'n' && s[4] == 'e' && s[5] == 'd' && s[6] =='\0') return 57; else return -1;
}
if(s[2] == 'z') {
/*sizeof*/
if( s[3] == 'e' && s[4] == 'o' && s[5] == 'f' && s[6] =='\0') return 58; else return -1;
}
return -1;
};
if(s[1] == 't') {
if(s[2] == 'a') {
if(s[3] == 't') {
if(s[4] == 'i') {
if(s[5] == 'c') {
if(*s == '\0') {
s[6];
/*static*/
return 59;
};
if(s[6] == '_') {
/*static_assert*/
if( s[7] == 'a' && s[8] == 's' && s[9] == 's' && s[10] == 'e' && s[11] == 'r' && s[12] == 't' && s[13] =='\0') return 60; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(s[2] == 'r') {
/*struct*/
if( s[3] == 'u' && s[4] == 'c' && s[5] == 't' && s[6] =='\0') return 61; else return -1;
}
return -1;
};
if(s[1] == 'w') {
/*switch*/
if( s[2] == 'i' && s[3] == 't' && s[4] == 'c' && s[5] == 'h' && s[6] =='\0') return 62; else return -1;
}
return -1;
};
if(s[0] == 't') {
if(s[1] == 'h') {
if(s[2] == 'r') {
if(s[3] == 'e') {
/*thread_local*/
if( s[4] == 'a' && s[5] == 'd' && s[6] == '_' && s[7] == 'l' && s[8] == 'o' && s[9] == 'c' && s[10] == 'a' && s[11] == 'l' && s[12] =='\0') return 63; else return -1;
}
if(s[3] == 'o') {
/*throw*/
if( s[4] == 'w' && s[5] =='\0') return 64; else return -1;
}
return -1;
};
return -1;
};
if(s[1] == 'r') {
if(s[2] == 'u') {
/*true*/
if( s[3] == 'e' && s[4] =='\0') return 65; else return -1;
}
if(s[2] == 'y') {
/*try*/
if( s[3] =='\0') return 66; else return -1;
}
return -1;
};
if(s[1] == 'y') {
if(s[2] == 'p') {
if(s[3] == 'e') {
if(s[4] == 'd') {
/*typedef*/
if( s[5] == 'e' && s[6] == 'f' && s[7] =='\0') return 67; else return -1;
}
if(s[4] == 'i') {
/*typeid*/
if( s[5] == 'd' && s[6] =='\0') return 68; else return -1;
}
if(s[4] == 'o') {
if(s[5] == 'f') {
if(*s == '\0') {
s[6];
/*typeof*/
return 69;
};
if(s[6] == '_') {
/*typeof_unqual*/
if( s[7] == 'u' && s[8] == 'n' && s[9] == 'q' && s[10] == 'u' && s[11] == 'a' && s[12] == 'l' && s[13] =='\0') return 70; else return -1;
}
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
return -1;
};
if(s[0] == 'u') {
if(s[1] == 'n') {
if(s[2] == 'i') {
/*union*/
if( s[3] == 'o' && s[4] == 'n' && s[5] =='\0') return 71; else return -1;
}
if(s[2] == 's') {
/*unsigned*/
if( s[3] == 'i' && s[4] == 'g' && s[5] == 'n' && s[6] == 'e' && s[7] == 'd' && s[8] =='\0') return 72; else return -1;
}
return -1;
};
return -1;
};
if(s[0] == 'v') {
if(s[1] == 'o') {
if(s[2] == 'i') {
/*void*/
if( s[3] == 'd' && s[4] =='\0') return 73; else return -1;
}
if(s[2] == 'l') {
/*volatile*/
if( s[3] == 'a' && s[4] == 't' && s[5] == 'i' && s[6] == 'l' && s[7] == 'e' && s[8] =='\0') return 74; else return -1;
}
return -1;
};
return -1;
};
if(s[0] == 'w') {
/*while*/
if( s[1] == 'h' && s[2] == 'i' && s[3] == 'l' && s[4] == 'e' && s[5] =='\0') return 75; else return -1;
}
return -1;
}


Click here to read the complete article
Re: How to optimize this search?

<4694466f-7d27-451d-bdd1-9a8fb9f140a7n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22711&group=comp.lang.c#22711

  copy link   Newsgroups: comp.lang.c
X-Received: by 2002:ac8:6b18:0:b0:343:6b3:60ff with SMTP id w24-20020ac86b18000000b0034306b360ffmr8005781qts.176.1661520575270;
Fri, 26 Aug 2022 06:29:35 -0700 (PDT)
X-Received: by 2002:a05:6808:1491:b0:343:7543:1a37 with SMTP id
e17-20020a056808149100b0034375431a37mr1662356oiw.106.1661520574830; Fri, 26
Aug 2022 06:29:34 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 26 Aug 2022 06:29:34 -0700 (PDT)
In-Reply-To: <513ab3bb-cd81-48b0-83c9-1335c1566076n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.252.226; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.252.226
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>
<87edx5u8p9.fsf@bsb.me.uk> <7caa0f30-bc19-4c93-80b9-93aa8611fb67n@googlegroups.com>
<20220825010423.97f368943e1fd1e556e0321f@gmail.com> <08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com>
<e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com> <513ab3bb-cd81-48b0-83c9-1335c1566076n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4694466f-7d27-451d-bdd1-9a8fb9f140a7n@googlegroups.com>
Subject: Re: How to optimize this search?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Fri, 26 Aug 2022 13:29:35 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 10413
 by: Thiago Adams - Fri, 26 Aug 2022 13:29 UTC

On Friday, August 26, 2022 at 9:40:12 AM UTC-3, Thiago Adams wrote:
> On Wednesday, August 24, 2022 at 8:54:56 PM UTC-3, Thiago Adams wrote:
> > On Wednesday, August 24, 2022 at 8:43:35 PM UTC-3, Thiago Adams wrote:
> > > On Wednesday, August 24, 2022 at 7:04:36 PM UTC-3, Anton Shepelev wrote:
> >
> > > This generator outputs the code
> > >
> > output is:
> >
> > This code looks a bad linear search..but -o2 gcc transform this
> > in a jump give the initial letter. The same code can be written using switch..
> >
> > int is_keyword(const char* s)
> > {
> > if(*s == 'N') {
> > s++;
> > /*NULL*/
> > if( *s++ == 'U' && *s++ == 'L' && *s++ == 'L' && *s =='\0') return 0; else return -1;;
> > }
>
> I changed the generator to instead of using *s++ I put the [constant_index].
>
> Like this:
> if(s[0] == 'N') {
> /*NULL*/
> if( s[1] == 'U' && s[2] == 'L' && s[3] == 'L' && s[4] =='\0') return 0; else return -1;;
> }
>
> The compiler ouput is different.
>
> --## GENERATOR ## --
>
> #include <stdio.h>
> #include <string.h>
>
> void GenerateCore(const char* keywords[],
> int first,
> int last,
> int level,
> int* count) {
> int ident = (level + 1) * 2;
> for (int i = first; i <= last; i++) {
> int begin = i;
> int end = begin;
> for (int k = i + 1; k <= last; k++) {
> if (keywords[k][level] == keywords[begin][level]) {
> i++;
> end++;
> }
> else
> break;
> }
> int index = level;
> // we have the range
> if (begin == end) {
> // just one
> if (keywords[i][level] != '\0') {
> printf("%*cif(s[%d] == '%c') {\n", ident * 2, ' ', level, keywords[i][level]);
>
> //printf("%*cs[%d];\n", ident * 3, ' ', index);
> index++;
>
> printf("%*c/*%s*/\n", ident * 3, ' ', keywords[i]);
> printf("%*cif(", ident * 3, ' ');
> int len = (int)strlen(keywords[i]);
>
> int j = level + 1;
> for (; j < len; j++) {
> if (j != level + 1)
> printf(" &&");
>
> printf(" s[%d] == '%c'", index, keywords[i][j]);
> index++;
> }
> if (j != level + 1) {
> printf(" &&");
> }
> printf(" s[%d] =='\\0') return %i; else return -1;", index, i);
>
> printf("\n");
> printf("%*c}\n", ident * 2, ' ');
> }
> else {
> printf("%*cif(*s == '\\0') {\n", ident * 2, ' ');
> printf("%*cs[%d];\n", ident * 3, ' ', index); index++;
> printf("%*c/*%s*/\n", ident * 3, ' ', keywords[i]);
> printf("%*creturn % i;\n", ident * 3, ' ', i);
> printf("%*c};\n", ident * 2, ' ');
> }
>
> (*count)++;
> }
> else {
> printf("%*cif(s[%d] == '%c') {\n", ident * 2, ' ', level, keywords[i][level]);
> //printf("%*cs[%d]; \n", ident * 3, ' ', index); index++;
> index++;
> GenerateCore(keywords, begin, end, level + 1, count);
> printf("%*creturn -1;\n", ident * 3, ' ');
> printf("%*c};\n", ident * 2, ' ');
> }
> }
> }
>
> void Generate(const char* keywords[], int size) {
> /*sort keywords*/
> int i, j;
> for (i = 0; i < size - 1; i++) {
> for (j = 0; j < size - i - 1; j++) {
> if (strcmp(keywords[j], keywords[j + 1]) > 0) {
> char* temp = keywords[j + 1];
> keywords[j + 1] = keywords[j];
> keywords[j] = temp;
> }
> }
> }
>
> printf("int is_keyword(const char* s)\n");
> printf("{\n");
> int count = 0;
> GenerateCore(keywords, 0, size - 1, 0, &count);
> printf("%*creturn -1;\n", 2, ' ');
> printf("}\n");
> }
>
> int main() {
> const char* keywords[] = { "NULL",
> "_Alignas",
> "_Alignof",
> "__alignof",
> "_Atomic",
> "_BitInt",
> "_Bool",
> "_Complex",
> "_Decimal128",
> "_Decimal32",
> "_Decimal64",
> "_Generic",
> "_Hashof",
> "_Imaginary",
> "_Noreturn",
> "_Static_assert",
> "_Thread_local",
> "__asm",
> "__forceinline",
> "__inline",
> "__int16",
> "__int32",
> "__int64",
> "__int8",
> "_asm",
> "alignas",
> "alignof",
> "auto",
> "bool",
> "break",
> "case",
> "catch",
> "char",
> "const",
> "constexpr",
> "continue",
> "default",
> "defer",
> "do",
> "double",
> "else",
> "enum",
> "extern",
> "false",
> "float",
> "for",
> "goto",
> "if",
> "inline",
> "int",
> "long",
> "nullptr",
> "register",
> "repeat",
> "restrict",
> "return",
> "short",
> "signed",
> "sizeof",
> "static",
> "static_assert",
> "struct",
> "switch",
> "thread_local",
> "throw",
> "true",
> "try",
> "typedef",
> "typeid",
> "typeof",
> "typeof_unqual",
> "union",
> "unsigned",
> "void",
> "volatile",
> "while" };
>
>
>
> Generate(keywords, sizeof(keywords) / sizeof(keywords[0]));
> }
>
> ## GENERATOR OUTPUT ##
Sorry..the previous generator has a bug..

NEW CODE

#include <stdio.h>
#include <string.h>

void GenerateCore(const char* keywords[],
int first,
int last,
int level,
int* count) {
int ident = (level + 1) * 2;
for (int i = first; i <= last; i++) {
int begin = i;
int end = begin;
for (int k = i + 1; k <= last; k++) {
if (keywords[k][level] == keywords[begin][level]) {
i++;
end++;
}
else
break;
}
int index = level;
// we have the range
if (begin == end) {
// just one
if (keywords[i][level] != '\0') {
printf("%*cif(s[%d] == '%c') {\n", ident * 2, ' ', level, keywords[i][level]);

//printf("%*cs[%d];\n", ident * 3, ' ', index);
index++;

printf("%*c/*%s*/\n", ident * 3, ' ', keywords[i]);
printf("%*cif(", ident * 3, ' ');
int len = (int)strlen(keywords[i]);

int j = level + 1;
for (; j < len; j++) {
if (j != level + 1)
printf(" &&");

printf(" s[%d] == '%c'", index, keywords[i][j]);
index++;
}
if (j != level + 1) {
printf(" &&");
}
printf(" s[%d] =='\\0') return %i; else return -1;", index, i);

printf("\n");
printf("%*c}\n", ident * 2, ' ');
}
else {
printf("%*cif(s[%d] == '\\0') {\n", ident * 2, ' ', index);

printf("%*c/*%s*/\n", ident * 3, ' ', keywords[i]);
printf("%*creturn % i;\n", ident * 3, ' ', i);
printf("%*c};\n", ident * 2, ' ');
}


Click here to read the complete article
Re: How to optimize this search?

<sq4OK.862731$ntj.624798@fx15.iad>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22713&group=comp.lang.c#22713

  copy link   Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx15.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: How to optimize this search?
Newsgroups: comp.lang.c
References: <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com> <87edx5u8p9.fsf@bsb.me.uk> <7caa0f30-bc19-4c93-80b9-93aa8611fb67n@googlegroups.com> <20220825010423.97f368943e1fd1e556e0321f@gmail.com> <08a98867-c006-480c-9b15-0b0019c064fen@googlegroups.com> <e72e2f97-da10-417e-8252-9015b6f70f5fn@googlegroups.com> <513ab3bb-cd81-48b0-83c9-1335c1566076n@googlegroups.com>
Lines: 51
Message-ID: <sq4OK.862731$ntj.624798@fx15.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Fri, 26 Aug 2022 14:05:12 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 26 Aug 2022 14:05:12 GMT
X-Received-Bytes: 2791
 by: Scott Lurndal - Fri, 26 Aug 2022 14:05 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>On Wednesday, August 24, 2022 at 8:54:56 PM UTC-3, Thiago Adams wrote:
>> On Wednesday, August 24, 2022 at 8:43:35 PM UTC-3, Thiago Adams wrote:
>> > On Wednesday, August 24, 2022 at 7:04:36 PM UTC-3, Anton Shepelev wrote:
>>
>> > This generator outputs the code
>> >
>> output is:
>>
>> This code looks a bad linear search..but -o2 gcc transform this
>> in a jump give the initial letter. The same code can be written using switch..
>>

<snip>

A perfect example of trading space for speed. Your is_keyword function
is over 8Kb of x86_64 code (unoptimized), over 5Kb with O2/O3. That
might have some negative effect on the L1 icache, actually reducing
overall performance due to icache misses in other parts of the application.

A symbol table lookup function using a simple double-linked list hash
bucket scheme is less than 512 bytes of object code. This algorithm
however, has potential locality issues, impacting both dcache
and tlb miss rate depending on how the symbol table linked
list entries are allocated (using a pool allocator will reduce
locality issues when compared with willy-nilly use of new,
either directly or under the covers with soi disant smart pointers).

c_symbol *
c_symbol_table::lookup(const char *name, uint64 line)
{ c_dlist_iterator di(&st_buckets[hash(name)]);
while (di.next()) {
c_symbol *sp = (c_symbol *)di.curr();

if (strcmp(name, sp->get_name()) == 0) {
if (line > 0) {
sp->add_ref(line);
}
return sp;
}
}

return NULL;
}

Before microoptimizing a lookup function, I would profile the application which
calls the function to see if it is a hotspot.

Before complaining about the home-made iterator, note that this
code dates to circa 1990 with cfront 2.1.

Pages:12345
server_pubkey.txt

rocksolid light 0.9.81
clearnet tor