1-intro.md - website - The source code for https://bpaul.xyz

1-intro.md (5812B)
      1 7 august 2021
      2 
      3 # Intro
      4 
      5 I decided that I want to write a chess engine so here you go.
      6 
      7 First of all, this isn't my first time writing a chess engine. I stopped developing my previous engine because I didn't like how much code I had copied.
      8 
      9 So what is a chess engine? At its core (I think) a chess engine which takes a chess position as input and outputs the best move (or at least what the engine thinks is the best move). In order to implement this, we need some sort of protocol to communicate to the engine which position to look at.
     10 
     11 ## UCI Protocol
     12 
     13 UCI (Universal Chess Interface) is the most commonly used chess protocol. The engine sends stuff to stdout, then a gui will read the stuff and process it. The stuff is specified in UCI.
     14 
     15 There are a few basic commands in UCI that can be implemented without programming anything chess related. I'll leave out the exact specifics of how you do UCI until it is implemented. These commands are
     16 
     17 `uci`
     18 : Meant to be used to tell the engine to use uci instead of something else.
     19 : Give the various options that may be set to change how the engine behaves
     20 : The engine is meant to identify itself in response
     21 
     22 `isready`
     23 : This command is meant to "synchronise the engine with the gui"
     24 : All you have to do is print "isready"
     25 
     26 `quit`
     27 : Just quit the engine
     28 
     29 I'm only going to cover these commands for the rest of this article, so no chess just yet.
     30 
     31 ```
     32 inline bool
     33 cmd(const char *b, const char c) {
     34     return strncmp(b, c, strlen(c)) == 0;
     35 }
     36 
     37 int
     38 main() {
     39     char buf[2048];
     40     
     41     while (1) {
     42         fgets(buf, 2048, stdin);
     43         
     44         if (cmd(buf, "uci")) {
     45             puts("id name bpaul-chess v0.0.1");
     46             puts("id author Benjamin Paul");
     47             puts("uciok");
     48         } else if (cmd(buf, "isready")) {
     49             puts("readyok");
     50         } else if (cmd(buf, "quit")) {
     51             break;
     52         }
     53     }
     54     
     55     return 0;
     56 }
     57 ```
     58 
     59 A uci command has the format "command arg1 arg2 ... argn\n". The \n is important as it represents the end of a command.
     60 
     61 The buffer size of 2048 is to handle the size of other commands in the protocol, although the number 2048 is pretty arbitrary. I only chose it because it is a power of 2.
     62 
     63 strncmp was used because maybe (as a human and not as a gui) I may enter a command such as "uci ". strcmp would say that this is not the same as "uci", but strncmp with n=3 would say that they are the same.
     64 
     65 There may be room to optimize out the strlen in the strncmp.
     66 
     67 WARNING this is completely useless.
     68 
     69 The command c is always constant, and running strlen on a constant will always give the same results, so feeding in the constant would improve speed over using strlen. This can be implemented using the C preprocessor instead of just writing the constant value of strlen for each command.
     70 
     71 ```
     72 #define CMD(s) strncmp(buf, s, sizeof(s)-1) == 0
     73 .
     74 .
     75 .
     76 if (CMD("uci")) {
     77     ...
     78 } else if (CMD("isready")) {
     79     ...
     80 } else if (CMD("quit")) {
     81     ...
     82 }
     83 ```
     84 
     85 sizeof gives the length of the string including the null byte at the end, but strlen ignores the null byte, so 1 is subtracted to get rid of the null byte from the count.
     86 
     87 These two versions of the program can now be compared. I am using the `perf(1)` command which is part of linux I think. These two versions are compiled with just -O3.
     88 
     89 I ran the command `perf stat -e instructions:u ./prog < commands` on both versions and read the result. The commands file contains:
     90 ```
     91 uci
     92 isready
     93 quit
     94 ```
     95 essentially testing all of the commands so far implemented.
     96 
     97 The first version had about 138505 instructions and the second version had about 138500. The difference was only 5 instructions??? As it turns out there is a gcc flag `-foptimize-strlen` which is enabled by -O2 or -O3, but it is interesting that using the macro still saved a bit of time.
     98 
     99 After some more testing, it turns out that there is actually no improvement. For some reason, changing the binary file's name caused the program to use 5 more instructions, and it had nothing to do with the code.
    100 
    101 By the way these optimizations are completely useless.
    102 
    103 The next article will probably start going into the making of a board representation. Thanks for reading! Have a good day/night.
    104 
    105 Update:
    106 OMG I saved like 4000 instructions somehow by rewriting strncmp!!
    107 
    108 Old strncmp: 138505
    109 New strncmp: 134630
    110 
    111 I couldn't figure out why the second version was better but I found it trying to optimize for SIMD. Lets hope the improvement did actually come from SIMD and not something else.
    112 
    113 New strncmp code
    114 ```
    115 bool
    116 new_strncmp(const char *s1, const char *s2, unsigned long n) {
    117 	int i;
    118 	char c[4];
    119 
    120 	if (n >= 4) {
    121 		unsigned long n4 = n >> 2;
    122 		do {
    123             for (i = 0; i < 4; i++) {
    124                 c[i] = s1[i] - s2[i];
    125 				if (s1[i] == '\0' || c[i] != 0) {
    126 					return c[i];
    127 				}
    128 			}
    129 
    130 			s1+=4;
    131 			s2+=4;
    132 			n4--;
    133 		} while (n4 > 0);
    134 		n &= 3;
    135 	}
    136 
    137 	while (n > 0) {
    138 		c[0] = *s1 - *s2;
    139 		if (*s1 == '\0' || c[0] != 0) {
    140 			return c[0];
    141 		}
    142 		s1++;
    143 		s2++;
    144 	}
    145 
    146 	return c[0];
    147 }
    148 ```
    149 
    150 Update 2:
    151 Ok so I have found out that using instructions is bad because some instructions are faster than others. This means that something else has to be used like cycles.
    152 
    153 Using cycles shows that the new algorithm saves about 10000 cpu cycles which is pretty good (according to roughly looking at manual tests, so it isn't very formal)! Yay!
    154 
    155 Update 3:
    156 It turns out that i forgot to decrement n in the last while loop of the new algorithm, so the program wouldn't work if n was not a multiple of 4. Fixing this shows that the algorithm is the same as the one implemented in glibc, so this was all a waste of time. Yay!
    157 
    158 ## Links
    159 
    160 UCI protocol specification
    161 http://wbec-ridderkerk.nl/html/UCIProtocol.html
    162 strncmp in glibc source
    163 https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strncmp.c;hb=HEAD
	website The source code for https://bpaul.xyz
	Log \| Files \| Refs