Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

Linux is addictive, I'm hooked! -- MaDsen Wikholm's .sig


devel / comp.lang.c / How to optimize this search?

SubjectAuthor
* How to optimize this search?Thiago Adams
+* Re: How to optimize this search?Thiago Adams
|`* Re: How to optimize this search?Scott Lurndal
| +* Re: How to optimize this search?Thiago Adams
| |+- Re: How to optimize this search?Thiago Adams
| |+- Re: How to optimize this search?Scott Lurndal
| |`* Re: How to optimize this search?Anton Shepelev
| | `* Re: How to optimize this search?Thiago Adams
| |  `* Re: How to optimize this search?Anton Shepelev
| |   `* Re: How to optimize this search?Ben Bacarisse
| |    `* Re: How to optimize this search?Anton Shepelev
| |     +* Re: How to optimize this search?Thiago Adams
| |     |`- Re: How to optimize this search?Anton Shepelev
| |     `* Re: How to optimize this search?Ben Bacarisse
| |      `* Re: How to optimize this search?Anton Shepelev
| |       `* Re: How to optimize this search?Anton Shepelev
| |        `* Re: How to optimize this search?Ben Bacarisse
| |         `* Re: How to optimize this search?Anton Shepelev
| |          `* Re: How to optimize this search?Ben Bacarisse
| |           `* Re: How to optimize this search?Anton Shepelev
| |            `- Re: How to optimize this search?bart c
| +* Re: How to optimize this search?Thiago Adams
| |`* Re: How to optimize this search?Scott Lurndal
| | `* Re: How to optimize this search?bart c
| |  +* Re: How to optimize this search?Keith Thompson
| |  |`* Re: How to optimize this search?bart c
| |  | `* Re: How to optimize this search?Ben Bacarisse
| |  |  `- Re: How to optimize this search?bart c
| |  +* Re: How to optimize this search?antispam
| |  |`* Re: How to optimize this search?bart c
| |  | `- Re: How to optimize this search?antispam
| |  `* Re: How to optimize this search?David Brown
| |   `* Re: How to optimize this search?bart c
| |    `- Re: How to optimize this search?David Brown
| `- Re: How to optimize this search?antispam
+* Re: How to optimize this search?Ben Bacarisse
|`* Re: How to optimize this search?Thiago Adams
| `- Re: How to optimize this search?Ben Bacarisse
+* Re: How to optimize this search?Siri Cruise
|+* Re: How to optimize this search?Ben Bacarisse
||`* Re: How to optimize this search?Siri Cruise
|| `- Re: How to optimize this search?Ben Bacarisse
|+- Re: How to optimize this search?Ben Bacarisse
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?antispam
|  `* Re: How to optimize this search?Thiago Adams
|   +- Re: How to optimize this search?Öö Tiib
|   `* Re: How to optimize this search?antispam
|    `* Re: How to optimize this search?Thiago Adams
|     +- Re: How to optimize this search?Stefan Ram
|     `* Re: How to optimize this search?Thiago Adams
|      +* Re: How to optimize this search?bart c
|      |`* Re: How to optimize this search?Thiago Adams
|      | +- Re: How to optimize this search?Keith Thompson
|      | `* Re: How to optimize this search?bart c
|      |  `- Re: How to optimize this search?Thiago Adams
|      `* Re: How to optimize this search?David Brown
|       `- Re: How to optimize this search?William Ahern
+* Re: How to optimize this search?Bonita Montero
|+- Re: How to optimize this search?Ben Bacarisse
|+- Re: How to optimize this search?Bonita Montero
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?Bonita Montero
|  `* Re: How to optimize this search?Thiago Adams
|   `* Re: How to optimize this search?Bonita Montero
|    `* Re: How to optimize this search?Thiago Adams
|     +- Re: How to optimize this search?Bonita Montero
|     `- Re: How to optimize this search?Anton Shepelev
+* Re: How to optimize this search?bart c
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?antispam
|  `- Re: How to optimize this search?bart c
+* Re: How to optimize this search?Tim Rentsch
|`* Re: How to optimize this search?Thiago Adams
| `* Re: How to optimize this search?bart c
|  `* Re: How to optimize this search?Keith Thompson
|   +- Re: How to optimize this search?Thiago Adams
|   `* Re: How to optimize this search?bart c
|    +* Re: How to optimize this search?Thiago Adams
|    |`- Re: How to optimize this search?Stefan Ram
|    `* Re: How to optimize this search?Keith Thompson
|     +* Re: How to optimize this search?bart c
|     |`* Re: How to optimize this search?Keith Thompson
|     | `- Re: How to optimize this search?bart c
|     `* Re: How to optimize this search?antispam
|      +- Re: How to optimize this search?Keith Thompson
|      `* Re: How to optimize this search?David Brown
|       +- Re: How to optimize this search?bart c
|       `- Re: How to optimize this search?antispam
`* Re: How to optimize this search?Ben Bacarisse
 +* Re: How to optimize this search?Thiago Adams
 |+- Re: How to optimize this search?Thiago Adams
 |+* Re: How to optimize this search?Ben Bacarisse
 ||+* Re: How to optimize this search?Thiago Adams
 |||`- Re: How to optimize this search?Ben Bacarisse
 ||`* Re: How to optimize this search?Thiago Adams
 || `- Re: How to optimize this search?Ben Bacarisse
 |+- Re: How to optimize this search?bart c
 |`* Re: How to optimize this search?Anton Shepelev
 | `* Re: How to optimize this search?Thiago Adams
 |  `* Re: How to optimize this search?Thiago Adams
 +* Re: How to optimize this search?bart c
 `* Re: How to optimize this search?Anton Shepelev

Pages:12345
How to optimize this search?

<d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22575&group=comp.lang.c++#22575

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 14:17 UTC

Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.

struct keyword_pair{
const char* lexeme;
int token;
};

struct keyword_pair[] = {
{ "NULL", 0},
{ "_Alignas", 1},
{ "_Atomic", 2},
{ "_BitInt", 3},
{ "_Bool", 4},
{ "_Complex", 5},
{ "_Decimal128", 6},
{ "_Decimal32", 7},
{ "_Decimal64", 8},
{ "_Generic", 9},
{ "_Hashof", 10},
{ "_Imaginary", 11},
{ "_Noreturn", 12},
{ "_Static_assert", 13},
{ "_Thread_local", 14},
{ "__alignof", 15},
{ "__asm", 16},
{ "__forceinline", 17},
{ "__inline", 18},
{ "__int16", 19},
{ "__int32", 20},
{ "__int64", 21},
{ "__int8", 22},
{ "_asm", 23},
{ "alignas", 24},
{ "alignof", 25},
{ "auto", 26},
{ "bool", 27},
{ "break", 28},
{ "case", 29},
{ "catch", 30},
{ "char", 31},
{ "const", 32},
{ "constexpr", 33},
{ "continue", 34},
{ "default", 35},
{ "defer", 36},
{ "do", 37},
{ "double", 38},
{ "else", 39},
{ "enum", 40},
{ "extern", 41},
{ "false", 42},
{ "float", 43},
{ "for", 44},
{ "goto", 45},
{ "if", 46},
{ "inline", 47},
{ "int", 48},
{ "long", 49},
{ "nullptr", 50},
{ "register", 51},
{ "repeat", 52},
{ "restrict", 53},
{ "return", 54},
{ "short", 55},
{ "signed", 56},
{ "sizeof", 57},
{ "static", 58},
{ "static_assert", 59},
{ "struct", 60},
{ "switch", 61},
{ "thread_local", 62},
{ "throw", 63},
{ "true", 64},
{ "try", 65},
{ "typedef", 66},
{ "typeid", 67},
{ "typeof", 68},
{ "typeof_unqual", 69},
{ "union", 70},
{ "unsigned", 71},
{ "void", 72},
{ "volatile", 73},
{ "while", 74},
};

Re: How to optimize this search?

<b8624e58-8af6-4c48-8b27-694174e4243cn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22576&group=comp.lang.c++#22576

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 14:34 UTC

On Monday, August 22, 2022 at 11:17:29 AM UTC-3, Thiago Adams wrote:
> Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.

How compiler generates code for switch?
But does assembler has a "dynamic" jump?
I want to "jump" to code that search for words starting with 'a' for instance
and jump to the code that search for words starting with 'b'.

switch(str[0])
{ case 'a':
switch(str[1])
{
case 'l': /*alignof*/
break;
case 'u': /*auto*/
break;
case 'l': /*alignas, alignof*/
break;
}
break;
...
}

Re: How to optimize this search?

<87pmgswavq.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22577&group=comp.lang.c++#22577

  copy link   Newsgroups: comp.lang.c
 by: Ben Bacarisse - Mon, 22 Aug 2022 15:34 UTC

Thiago Adams <thiago.adams@gmail.com> writes:

> Given a c string I need to return as fast as possible the pair (if
> exist) of the corresponding keyword_pair.

Since the set is know at compile time, a "perfect hash" would be a good
candidate. (Searching for "perfect hash" should work.)

But why as fast as possible? Tools like the one you appear to be
building often have significant IO costs, so a keyword look-up is
unlikely to be very significant.

> struct keyword_pair{
> const char* lexeme;
> int token;
> };
>
> struct keyword_pair[] = {
> { "NULL", 0},
> { "_Alignas", 1},
> { "_Atomic", 2},
> { "_BitInt", 3},

etc...

--
Ben.

Re: How to optimize this search?

<chine.bleu-825230.08435922082022@news.eternal-september.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22578&group=comp.lang.c++#22578

  copy link   Newsgroups: comp.lang.c
 by: Siri Cruise - Mon, 22 Aug 2022 15:44 UTC

In article
<d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>,
Thiago Adams <thiago.adams@gmail.com> wrote:

> Given a c string I need to return as fast as possible the pair (if exist) of
> the corresponding keyword_pair.

You can build a Finite State Automaton.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Chen sockpuppet. insults Islam. Mohammed

Re: How to optimize this search?

<te08ma$2mrgb$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22579&group=comp.lang.c++#22579

  copy link   Newsgroups: comp.lang.c
 by: Bonita Montero - Mon, 22 Aug 2022 15:51 UTC

Use a proper language:

#include <iostream>
#include <unordered_map>

using namespace std;

int main( int argc, char **argv )
{ if( argc < 2 )
return EXIT_FAILURE;
static unordered_map<char const *, int> keywordMap =
{
{ "NULL", 0 },
{ "_Alignas", 1 },
{ "_Atomic", 2 },
{ "_BitInt", 3 },
{ "_Bool", 4 },
{ "_Complex", 5 },
{ "_Decimal128", 6 },
{ "_Decimal32", 7 },
{ "_Decimal64", 8 },
{ "_Generic", 9 },
{ "_Hashof", 10 },
{ "_Imaginary", 11 },
{ "_Noreturn", 12 },
{ "_Static_assert", 13 },
{ "_Thread_local", 14 },
{ "__alignof", 15 },
{ "__asm", 16 },
{ "__forceinline", 17 },
{ "__inline", 18 },
{ "__int16", 19 },
{ "__int32", 20 },
{ "__int64", 21 },
{ "__int8", 22 },
{ "_asm", 23 },
{ "alignas", 24 },
{ "alignof", 25 },
{ "auto", 26 },
{ "bool", 27 },
{ "break", 28 },
{ "case", 29 },
{ "catch", 30 },
{ "char", 31 },
{ "const", 32 },
{ "constexpr", 33 },
{ "continue", 34 },
{ "default", 35 },
{ "defer", 36 },
{ "do", 37 },
{ "double", 38 },
{ "else", 39 },
{ "enum", 40 },
{ "extern", 41 },
{ "false", 42 },
{ "float", 43 },
{ "for", 44 },
{ "goto", 45 },
{ "if", 46 },
{ "inline", 47 },
{ "int", 48 },
{ "long", 49 },
{ "nullptr", 50 },
{ "register", 51 },
{ "repeat", 52 },
{ "restrict", 53 },
{ "return", 54 },
{ "short", 55 },
{ "signed", 56 },
{ "sizeof", 57 },
{ "static", 58 },
{ "static_assert", 59 },
{ "struct", 60 },
{ "switch", 61 },
{ "thread_local", 62 },
{ "throw", 63 },
{ "true", 64 },
{ "try", 65 },
{ "typedef", 66 },
{ "typeid", 67 },
{ "typeof", 68 },
{ "typeof_unqual", 69 },
{ "union", 70 },
{ "unsigned", 71 },
{ "void", 72 },
{ "volatile", 73 },
{ "while", 74 },
};
auto mapped = keywordMap.find( argv[1] );
if( mapped == keywordMap.end() )
return EXIT_FAILURE;
cout << mapped->second << endl;
}

Re: How to optimize this search?

<87edx8wa03.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22580&group=comp.lang.c++#22580

  copy link   Newsgroups: comp.lang.c
 by: Ben Bacarisse - Mon, 22 Aug 2022 15:53 UTC

Siri Cruise <chine.bleu@yahoo.com> writes:

> In article
> <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>,
> Thiago Adams <thiago.adams@gmail.com> wrote:
>
>> Given a c string I need to return as fast as possible the pair (if exist) of
>> the corresponding keyword_pair.
>
> You can build a Finite State Automaton.

If you mean a deterministic FSA, that's tedious and error-prone if done
by hand. But if TA is happy with a slightly slower nondeterministic FSA
it can be done reasonably simply by maintaining an set of candidate
matches. Often, a one-word bit-map will do.

But if you want a deterministic FSA, there are tools like flex that will
do it right.

--
Ben.

Re: How to optimize this search?

<878rngw9zg.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22581&group=comp.lang.c++#22581

  copy link   Newsgroups: comp.lang.c
 by: Ben Bacarisse - Mon, 22 Aug 2022 15:53 UTC

Siri Cruise <chine.bleu@yahoo.com> writes:

> In article
> <d8294420-7b45-4344-b9eb-0ad867850ec6n@googlegroups.com>,
> Thiago Adams <thiago.adams@gmail.com> wrote:
>
>> Given a c string I need to return as fast as possible the pair (if exist) of
>> the corresponding keyword_pair.
>
> You can build a Finite State Automaton.

If you mean a deterministic FSA, that's tedious and error-prone if done
by hand. But if TA is happy with a slightly slower nondeterministic FSA
it can be done reasonably simply by maintaining an set of candidate
matches. Often, a one-word bit-map will do.

But if you want a deterministic FSA, there are tools like flex that will
do it right.

--
Ben.

Re: How to optimize this search?

<8735dow9ve.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22582&group=comp.lang.c++#22582

  copy link   Newsgroups: comp.lang.c
Followup: comp.lang.c++
 by: Ben Bacarisse - Mon, 22 Aug 2022 15:55 UTC

Bonita Montero <Bonita.Montero@gmail.com> writes:

> Use a proper language:
> #include <iostream>

Please use the correct group.

--
Ben.

Re: How to optimize this search?

<9ddfe85a-a07f-4d99-aab1-946f678a1f07n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22583&group=comp.lang.c++#22583

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:11 UTC

On Monday, August 22, 2022 at 12:44:22 PM UTC-3, Siri Cruise wrote:
> In article
> <d8294420-7b45-4344...@googlegroups.com>,
> Thiago Adams <thiago...@gmail.com> wrote:
>
> > Given a c string I need to return as fast as possible the pair (if exist) of
> > the corresponding keyword_pair.
> You can build a Finite State Automaton.

Yes.. I am thinking exactly like that. But at some point
we need to map like char -> state then it is the same
switch problem. How to given char how to find the state in a fast way?

Re: How to optimize this search?

<c27b619d-35a7-4445-ab40-ac7342806c54n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22584&group=comp.lang.c++#22584

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:14 UTC

On Monday, August 22, 2022 at 12:34:15 PM UTC-3, Ben Bacarisse wrote:
> Thiago Adams <thiago...@gmail.com> writes:
>
> > Given a c string I need to return as fast as possible the pair (if
> > exist) of the corresponding keyword_pair.
> Since the set is know at compile time, a "perfect hash" would be a good
> candidate. (Searching for "perfect hash" should work.)

Yes but let's say we have a perfect hash. Then I find
the index of keyword.. and then I guess a strcmp is also
required to check against conflicts. Because the input
can be any string.
So there is at least two operations hash and strcmp.

Re: How to optimize this search?

<te0a1f$2mvv9$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22585&group=comp.lang.c++#22585

  copy link   Newsgroups: comp.lang.c
 by: Bonita Montero - Mon, 22 Aug 2022 16:14 UTC

I've benched a linear search against a map-access:

#include <iostream>
#include <unordered_map>
#include <random>
#include <concepts>
#include <chrono>

using namespace std;
using namespace chrono;

atomic_int sum( 0 );

int main( int argc, char **argv )
{ using init_pair_t = pair<char const *, int>;
static init_pair_t init[] =
{
{ "NULL", 0 },
{ "_Alignas", 1 },
{ "_Atomic", 2 },
{ "_BitInt", 3 },
{ "_Bool", 4 },
{ "_Complex", 5 },
{ "_Decimal128", 6 },
{ "_Decimal32", 7 },
{ "_Decimal64", 8 },
{ "_Generic", 9 },
{ "_Hashof", 10 },
{ "_Imaginary", 11 },
{ "_Noreturn", 12 },
{ "_Static_assert", 13 },
{ "_Thread_local", 14 },
{ "__alignof", 15 },
{ "__asm", 16 },
{ "__forceinline", 17 },
{ "__inline", 18 },
{ "__int16", 19 },
{ "__int32", 20 },
{ "__int64", 21 },
{ "__int8", 22 },
{ "_asm", 23 },
{ "alignas", 24 },
{ "alignof", 25 },
{ "auto", 26 },
{ "bool", 27 },
{ "break", 28 },
{ "case", 29 },
{ "catch", 30 },
{ "char", 31 },
{ "const", 32 },
{ "constexpr", 33 },
{ "continue", 34 },
{ "default", 35 },
{ "defer", 36 },
{ "do", 37 },
{ "double", 38 },
{ "else", 39 },
{ "enum", 40 },
{ "extern", 41 },
{ "false", 42 },
{ "float", 43 },
{ "for", 44 },
{ "goto", 45 },
{ "if", 46 },
{ "inline", 47 },
{ "int", 48 },
{ "long", 49 },
{ "nullptr", 50 },
{ "register", 51 },
{ "repeat", 52 },
{ "restrict", 53 },
{ "return", 54 },
{ "short", 55 },
{ "signed", 56 },
{ "sizeof", 57 },
{ "static", 58 },
{ "static_assert", 59 },
{ "struct", 60 },
{ "switch", 61 },
{ "thread_local", 62 },
{ "throw", 63 },
{ "true", 64 },
{ "try", 65 },
{ "typedef", 66 },
{ "typeid", 67 },
{ "typeof", 68 },
{ "typeof_unqual", 69 },
{ "union", 70 },
{ "unsigned", 71 },
{ "void", 72 },
{ "volatile", 73 },
{ "while", 74 }
};
constexpr size_t
N = size( init ),
#if defined(NDEBUG)
ROUNDS = 1'000'000;
#else
ROUNDS = 100;
#endif
static unordered_map<char const *, int> keywordMap( init, init + N );
auto bench = [&]<typename FindFn>( FindFn findFn ) -> double
requires requires( FindFn findFn, char const *what ) { { findFn( what
) } -> same_as<int>; }
{
auto start = high_resolution_clock::now();
mt19937_64 mt;
uniform_int_distribution<size_t> uid( 0, N - 1 );
int sum = 0;
for( size_t r = ROUNDS; r--; )
sum += findFn( init[uid( mt )].first );
::sum.store( sum, memory_order_relaxed );
return (double)(int64_t)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / ROUNDS;
};
double
nsIterate = bench( [&]( char const *what ) -> int
{
for( size_t i = 0; i != N; ++i )
if( strcmp( init[i].first, what ) == 0 )
return init[i].second;
return -1;
} ),
nsMap = bench( [&]( char const *what ) -> int { return
keywordMap.find( what )->second; } );
cout << nsIterate / nsMap << endl;
}

For me the difference is about factor 5.6.
So there's a small number of attempts necessary until you're beyond
a linear search.

Re: How to optimize this search?

<febd215c-8541-4e3e-a101-be98bde7308an@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22586&group=comp.lang.c++#22586

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:15 UTC

On Monday, August 22, 2022 at 12:51:52 PM UTC-3, Bonita Montero wrote:
> Use a proper language:
>
> #include <iostream>
> #include <unordered_map>

With unordered_map you are not taking advantage that
strings are know at compile times.
So I am sure a better performance/memory can be achieved.

Re: How to optimize this search?

<5_NMK.799766$ntj.539650@fx15.iad>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22587&group=comp.lang.c++#22587

  copy link   Newsgroups: comp.lang.c
 by: Scott Lurndal - Mon, 22 Aug 2022 16:17 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>On Monday, August 22, 2022 at 11:17:29 AM UTC-3, Thiago Adams wrote:
>> Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.
>

Use a binary search. Max comparisons for your 70+ entries would
be 6.

Re: How to optimize this search?

<ac7f8956-aadb-4179-9327-ab8a25547de9n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22588&group=comp.lang.c++#22588

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:24 UTC

On Monday, August 22, 2022 at 1:17:18 PM UTC-3, Scott Lurndal wrote:
> Thiago Adams <thiago...@gmail.com> writes:
> >On Monday, August 22, 2022 at 11:17:29 AM UTC-3, Thiago Adams wrote:
> >> Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.
> >
> Use a binary search. Max comparisons for your 70+ entries would
> be 6.
I tried!

Using a switch for the first character and then using if with strcmp was
faster than binary search.

switch(str[0])
{ case 'a':
if (strcmp(str, "auto") == 0) return 1;
break;
....
etc...
}

The compiler is creating the code for me and I want to understand
better how it is doing then maybe I can improve further..

Re: How to optimize this search?

<te0aqp$2n27h$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22589&group=comp.lang.c++#22589

  copy link   Newsgroups: comp.lang.c
 by: Bonita Montero - Mon, 22 Aug 2022 16:28 UTC

Am 22.08.2022 um 18:15 schrieb Thiago Adams:
> On Monday, August 22, 2022 at 12:51:52 PM UTC-3, Bonita Montero wrote:
>> Use a proper language:
>>
>> #include <iostream>
>> #include <unordered_map>
>
> With unordered_map you are not taking advantage that
> strings are know at compile times.

I don't see where's the problem here. If I initialize the map static
as I did the map is initialized thread-safe with double-checked locking
when the function is called first. DCL is that fast that further checks
if the map is already initialized carry nearly no weight.

> So I am sure a better performance/memory can be achieved.

Re: How to optimize this search?

<chine.bleu-976816.09293522082022@news.eternal-september.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22590&group=comp.lang.c++#22590

  copy link   Newsgroups: comp.lang.c
 by: Siri Cruise - Mon, 22 Aug 2022 16:29 UTC

In article <87edx8wa03.fsf@bsb.me.uk>,
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

> > You can build a Finite State Automaton.
>
> If you mean a deterministic FSA, that's tedious and error-prone if done
> by hand. But if TA is happy with a slightly slower nondeterministic FSA
> it can be done reasonably simply by maintaining an set of candidate
> matches. Often, a one-word bit-map will do.

In this particular example it's a deterministic FSA that could be
implemented as (this can be done easily with real macro
processors)

> Given a c string I need to return as fast as possible the pair (if exist) of
> the corresponding keyword_pair.
>
> struct keyword_pair{
> const char* lexeme;
> int token;
> };
>
> struct keyword_pair[] = {
> { "NULL", 0},
> { "_Alignas", 1},
> { "_Atomic", 2},
> { "_BitInt", 3},
> { "_Bool", 4},

switch (input) {
case 'N': goto stateN;
case '_': goto state_;
default: goto fail;
}
state_: switch (input) {
case 'A': goto state_A;
case 'B': goto state_B;
...
}
state_A: switch (input) ...

If the language has computed gotos, like GNU label arrays, you
can implement with something like (though cache misses would lose
any supposed efficiency).

static void *start[256] = {'N': &&stateN, '_', &&state_,
...};
static void *state_[256] = {'A': &&state_A, 'B', &&stateB_,
...};

goto start[input];
state_: goto state_[input];
state_A: ...
state_B: ...
....

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Chen sockpuppet. insults Islam. Mohammed

Re: How to optimize this search?

<87wnb0utkd.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22591&group=comp.lang.c++#22591

  copy link   Newsgroups: comp.lang.c
 by: Ben Bacarisse - Mon, 22 Aug 2022 16:33 UTC

Thiago Adams <thiago.adams@gmail.com> writes:

> On Monday, August 22, 2022 at 12:34:15 PM UTC-3, Ben Bacarisse wrote:
>> Thiago Adams <thiago...@gmail.com> writes:
>>
>> > Given a c string I need to return as fast as possible the pair (if
>> > exist) of the corresponding keyword_pair.
>> Since the set is know at compile time, a "perfect hash" would be a good
>> candidate. (Searching for "perfect hash" should work.)
>
> Yes but let's say we have a perfect hash. Then I find
> the index of keyword.. and then I guess a strcmp is also
> required to check against conflicts. Because the input
> can be any string.
> So there is at least two operations hash and strcmp.

Well, the function written by gperf has a slight optimisation, but, yes,
there are two operations. Are you sure you are not prematurely
optimising this?

--
Ben.

Re: How to optimize this search?

<59e36b02-2c8a-4948-bfd8-34346e03b286n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22592&group=comp.lang.c++#22592

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:34 UTC

On Monday, August 22, 2022 at 1:28:23 PM UTC-3, Bonita Montero wrote:
> Am 22.08.2022 um 18:15 schrieb Thiago Adams:
> > On Monday, August 22, 2022 at 12:51:52 PM UTC-3, Bonita Montero wrote:
> >> Use a proper language:
> >>
> >> #include <iostream>
> >> #include <unordered_map>
> >
> > With unordered_map you are not taking advantage that
> > strings are know at compile times.
> I don't see where's the problem here. If I initialize the map static
> as I did the map is initialized thread-safe with double-checked locking
> when the function is called first. DCL is that fast that further checks
> if the map is already initialized carry nearly no weight.
> > So I am sure a better performance/memory can be achieved.

You need a compile time state machine analizer..
I am sure unordered_map will not do that.
but compiler will do on switches.

Re: How to optimize this search?

<te0b8p$2n27h$2@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22593&group=comp.lang.c++#22593

  copy link   Newsgroups: comp.lang.c
 by: Bonita Montero - Mon, 22 Aug 2022 16:35 UTC

Am 22.08.2022 um 18:34 schrieb Thiago Adams:
> On Monday, August 22, 2022 at 1:28:23 PM UTC-3, Bonita Montero wrote:
>> Am 22.08.2022 um 18:15 schrieb Thiago Adams:
>>> On Monday, August 22, 2022 at 12:51:52 PM UTC-3, Bonita Montero wrote:
>>>> Use a proper language:
>>>>
>>>> #include <iostream>
>>>> #include <unordered_map>
>>>
>>> With unordered_map you are not taking advantage that
>>> strings are know at compile times.
>> I don't see where's the problem here. If I initialize the map static
>> as I did the map is initialized thread-safe with double-checked locking
>> when the function is called first. DCL is that fast that further checks
>> if the map is already initialized carry nearly no weight.
>>> So I am sure a better performance/memory can be achieved.
>
> You need a compile time state machine analizer..
> I am sure unordered_map will not do that.
> but compiler will do on switches.

Whatever you want to try to do. With that number
of keywords you'll be slower than with a hash map.

Re: How to optimize this search?

<87r118utej.fsf@bsb.me.uk>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22594&group=comp.lang.c++#22594

  copy link   Newsgroups: comp.lang.c
 by: Ben Bacarisse - Mon, 22 Aug 2022 16:36 UTC

Siri Cruise <chine.bleu@yahoo.com> writes:

> In article <87edx8wa03.fsf@bsb.me.uk>,
> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>
>> > You can build a Finite State Automaton.
>>
>> If you mean a deterministic FSA, that's tedious and error-prone if done
>> by hand. But if TA is happy with a slightly slower nondeterministic FSA
>> it can be done reasonably simply by maintaining an set of candidate
>> matches. Often, a one-word bit-map will do.
>
> In this particular example it's a deterministic FSA that could be
> implemented as (this can be done easily with real macro
> processors)

You mean like TRAC or M4? I'd like to see how this is done as I've
always found hand-written FSAs to be error-prone.

--
Ben.

Re: How to optimize this search?

<60da3e78-a685-446e-b75a-a35bf249d2cen@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22595&group=comp.lang.c++#22595

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:38 UTC

On Monday, August 22, 2022 at 1:24:12 PM UTC-3, Thiago Adams wrote:
> On Monday, August 22, 2022 at 1:17:18 PM UTC-3, Scott Lurndal wrote:
> > Thiago Adams <thiago...@gmail.com> writes:
> > >On Monday, August 22, 2022 at 11:17:29 AM UTC-3, Thiago Adams wrote:
> > >> Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.
> > >
> > Use a binary search. Max comparisons for your 70+ entries would
> > be 6.
> I tried!
>
> Using a switch for the first character and then using if with strcmp was
> faster than binary search.
> switch(str[0])
> {
> case 'a':
> if (strcmp(str, "auto") == 0) return 1;
> break;
> ...
> etc...
> }
>
> The compiler is creating the code for me and I want to understand
> better how it is doing then maybe I can improve further..

int find(const char* s)
{ switch(s[0])
{
case 'a':
return 1;
case 'b':
return 2;
case 'c':
return 3;
}
return -1;
} int main(int argv, char* args[]){
return find(args[1]);
}

without optimisations if looks like if else if else...
https://godbolt.org/z/Gn7K7vThM

but with -o3..it is completely different.
https://godbolt.org/z/6b65zYoEq

Re: How to optimize this search?

<85faf079-8adf-4528-9010-54ac3d483664n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22596&group=comp.lang.c++#22596

  copy link   Newsgroups: comp.lang.c
 by: Thiago Adams - Mon, 22 Aug 2022 16:45 UTC

On Monday, August 22, 2022 at 1:35:50 PM UTC-3, Bonita Montero wrote:
> Am 22.08.2022 um 18:34 schrieb Thiago Adams:
> > On Monday, August 22, 2022 at 1:28:23 PM UTC-3, Bonita Montero wrote:
> >> Am 22.08.2022 um 18:15 schrieb Thiago Adams:
> >>> On Monday, August 22, 2022 at 12:51:52 PM UTC-3, Bonita Montero wrote:
> >>>> Use a proper language:
> >>>>
> >>>> #include <iostream>
> >>>> #include <unordered_map>
> >>>
> >>> With unordered_map you are not taking advantage that
> >>> strings are know at compile times.
> >> I don't see where's the problem here. If I initialize the map static
> >> as I did the map is initialized thread-safe with double-checked locking
> >> when the function is called first. DCL is that fast that further checks
> >> if the map is already initialized carry nearly no weight.
> >>> So I am sure a better performance/memory can be achieved.
> >
> > You need a compile time state machine analizer..
> > I am sure unordered_map will not do that.
> > but compiler will do on switches.
> Whatever you want to try to do. With that number
> of keywords you'll be slower than with a hash map.

If you want to try a benchmark with C and C++
here is the C code you need to be faster. :)

https://godbolt.org/z/3v1xMPscG

Re: How to optimize this search?

<te0c68$2n70a$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22597&group=comp.lang.c++#22597

  copy link   Newsgroups: comp.lang.c
 by: Bonita Montero - Mon, 22 Aug 2022 16:51 UTC

Am 22.08.2022 um 18:45 schrieb Thiago Adams:
> On Monday, August 22, 2022 at 1:35:50 PM UTC-3, Bonita Montero wrote:
>> Am 22.08.2022 um 18:34 schrieb Thiago Adams:
>>> On Monday, August 22, 2022 at 1:28:23 PM UTC-3, Bonita Montero wrote:
>>>> Am 22.08.2022 um 18:15 schrieb Thiago Adams:
>>>>> On Monday, August 22, 2022 at 12:51:52 PM UTC-3, Bonita Montero wrote:
>>>>>> Use a proper language:
>>>>>>
>>>>>> #include <iostream>
>>>>>> #include <unordered_map>
>>>>>
>>>>> With unordered_map you are not taking advantage that
>>>>> strings are know at compile times.
>>>> I don't see where's the problem here. If I initialize the map static
>>>> as I did the map is initialized thread-safe with double-checked locking
>>>> when the function is called first. DCL is that fast that further checks
>>>> if the map is already initialized carry nearly no weight.
>>>>> So I am sure a better performance/memory can be achieved.
>>>
>>> You need a compile time state machine analizer..
>>> I am sure unordered_map will not do that.
>>> but compiler will do on switches.
>> Whatever you want to try to do. With that number
>> of keywords you'll be slower than with a hash map.
>
> If you want to try a benchmark with C and C++
> here is the C code you need to be faster. :)
>
> https://godbolt.org/z/3v1xMPscG

I don't need to benchmark that ugly code. A hash map is faster for sure.
I extended by benchmark and I compared a linear search with N elements
with a hash map, and the hash map is faster from 2 to N.
Your code is simply ugly and error-prone.

Re: How to optimize this search?

<VFOMK.157471$Me2.32072@fx47.iad>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22598&group=comp.lang.c++#22598

  copy link   Newsgroups: comp.lang.c
 by: Scott Lurndal - Mon, 22 Aug 2022 17:03 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>On Monday, August 22, 2022 at 1:17:18 PM UTC-3, Scott Lurndal wrote:
>> Thiago Adams <thiago...@gmail.com> writes:
>> >On Monday, August 22, 2022 at 11:17:29 AM UTC-3, Thiago Adams wrote:
>> >> Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.
>> >
>> Use a binary search. Max comparisons for your 70+ entries would
>> be 6.
>I tried!
>
>Using a switch for the first character and then using if with strcmp was
>faster than binary search.
>
>switch(str[0])
>{
>case 'a':
> if (strcmp(str, "auto") == 0) return 1;
> break;
>...
>etc...
>}
>
>The compiler is creating the code for me and I want to understand
>better how it is doing then maybe I can improve further..

code space vs. performance.

Re: How to optimize this search?

<db9c05cf-f055-48aa-8074-d9f58d8a0693n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22599&group=comp.lang.c++#22599

  copy link   Newsgroups: comp.lang.c
 by: bart c - Mon, 22 Aug 2022 17:20 UTC

On Monday, 22 August 2022 at 15:17:29 UTC+1, Thiago Adams wrote:
> Given a c string I need to return as fast as possible the pair (if exist) of the corresponding keyword_pair.
>
>
> struct keyword_pair{
> const char* lexeme;
> int token;
> };
>
> struct keyword_pair[] = {
> { "NULL", 0},
> { "_Alignas", 1},
> { "_Atomic", 2},
....
> { "volatile", 73},
> { "while", 74},
> };

I suspect other things are going on. Presumably you are looking up an identifier encountered in source code to see if it's a keyword, but if not, another lookup is needed to see if it matches a user-identifier in a symbol table.

My approach usually involves a hash table (the symbol table), containing both keywords and user identifiers. Each entry contains information on whether it is a keyword name (and its token value), or user identifier.

To start, your list is used to populate the empty hash table.

You don't need a perfect hash for this. Typically, most of the time you only end up doing one strcmp to confirm the right selection. If significantly more than one on average, then you have too many clashes: the table is getting full, or you have a poor hash function.

Pages:12345
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor