# README: Synchronisation: Locks

Welcome to this simulator. The idea is to gain familiarity with threads by
seeing how they interleave; the simulator, x86.py, will help you gain this
understanding.

The simulator mimics the execution of short assembly sequences by multiple
threads. Note that the OS code that would run (for example, to perform a context
switch) is *not* shown; thus, all you see is the interleaving of the user code.

The assembly code that is run is based on x86, but somewhat simplified. In this
instruction set, there are four general-purpose registers (%ax, %bx, %cx, %dx),
a program counter (PC), and a small set of instructions which will be enough for
our purposes. We've also added a few extra GP registers (%ex, %fx) which don't
quite match anything in x86 land (but that is OK).

Here is an example code snippet that we will be able to run:

```assembly
.main
mov 2000, %ax   # get the value at the address
add $1, %ax     # increment it
mov %ax, 2000   # store it back
halt
```

The code is easy to understand. The first instruction, an x86 "mov", simply
loads a value from the address specified by 2000 into the register %ax.
Addresses, in this subset of x86, can take some of the following forms:

- 2000 -> the number (2000) is the address
- (%cx) -> the contents of the register (in parentheses) form the address
- 1000(%dx) -> the number + the contents of the register form the address
- 10(%ax,%bx) -> the number + reg1 + reg2 form the address
- 10(%ax,%bx,4) -> the number + reg1 + (reg2*scaling) form the address

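All of these forms follow one rule: address = number + reg1 + reg2 * scale. Any missing part counts as zero, and the scale defaults to 1. The sketch below evaluates a memory operand under that reading. `effective_address` is a hypothetical helper, not part of x86.py, and register values are assumed to be passed in a plain dict keyed by names such as "%ax".

```python
import re

# Hypothetical helper (not x86.py's code). Accepts "number", "(reg)",
# "number(reg)", "number(reg,reg)" and "number(reg,reg,scale)".
_MEM_OPERAND = re.compile(
    r"^(?P<disp>-?\d+)?"
    r"(?:\((?P<r1>%\w+)(?:,(?P<r2>%\w+)(?:,(?P<scale>\d+))?)?\))?$"
)


def effective_address(operand: str, regs: dict[str, int]) -> int:
    """Return the address named by a memory operand."""
    m = _MEM_OPERAND.match(operand.replace(" ", ""))
    if m is None or (m.group("disp") is None and m.group("r1") is None):
        raise ValueError(f"not a memory operand: {operand!r}")
    addr = int(m.group("disp") or 0)          # missing number counts as 0
    if m.group("r1"):
        addr += regs[m.group("r1")]           # first register is added as-is
    if m.group("r2"):
        scale = int(m.group("scale") or 1)    # scale defaults to 1
        addr += regs[m.group("r2")] * scale   # second register is scaled
    return addr
```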
To store a value, the same "mov" instruction is used, but this time with the
arguments reversed, e.g.:

    mov %ax, 2000

The "add" instruction, from the sequence above, should be clear: it adds an
immediate value (specified by $1) to the register specified in the second
argument (i.e., %ax = %ax + 1).

Thus, we now can understand the code sequence above: it loads the value at
address 2000, adds 1 to it, and then stores the value back into address 2000.

The fake-ish "halt" instruction just stops running this thread.

Let's run the simulator and see how this all works! Assume the above code
sequence is in the file "simple-race.s".

```text
prompt> ./x86.py -p simple-race.s -t 1

       Thread 0
1000 mov 2000, %ax
1001 add $1, %ax
1002 mov %ax, 2000
1003 halt

prompt>
```

The arguments used here specify the program (-p) and the number of threads
(-t 1). A third option, -i, sets the interrupt interval, which is how often the
scheduler will be woken and run to switch to a different thread. Because there
is only one thread in this example, the interval does not matter, so it is not
given here.

The output is easy to read: the simulator prints the program counter (here shown
from 1000 to 1003) and the instruction that gets executed. Note that we assume
(unrealistically) that every instruction takes up a single byte in memory; in
x86, instructions are variable-sized and can take up from one to a small
number of bytes.

We can use more detailed tracing to get a better sense of how machine state
changes during the execution:

```text
prompt> ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx

 2000      ax      bx          Thread 0
    ?       ?       ?
    ?       ?       ?   1000 mov 2000, %ax
    ?       ?       ?   1001 add $1, %ax
    ?       ?       ?   1002 mov %ax, 2000
    ?       ?       ?   1003 halt
```

Oops! Forgot the -c flag (which actually computes the answers for you).

```text
prompt> ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx -c

 2000      ax      bx          Thread 0
    0       0       0
    0       0       0   1000 mov 2000, %ax
    0       1       0   1001 add $1, %ax
    1       1       0   1002 mov %ax, 2000
    1       1       0   1003 halt
```

By using the -M flag, we can trace memory locations (a comma-separated list lets
you trace more than one, e.g., 2000,3000); by using the -R flag we can track the
values inside specific registers.

The values on the left show the memory/register contents AFTER the instruction
on the right has executed. For example, after the "add" instruction, you can see
that %ax has been incremented to the value 1; after the second "mov" instruction
(at PC=1002), you can see that the memory contents at 2000 are now also
incremented.

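The "state AFTER each instruction" convention can be replayed in a few lines of Python. `run_simple_race` is a hypothetical helper, not how x86.py is implemented. It records one (memory[2000], %ax, %bx) row per executed instruction. With everything starting at 0, the rows should match the four instruction rows of the -c trace above.

```python
# Hypothetical replay of simple-race.s (not x86.py's implementation).
def run_simple_race(mem: dict[int, int], regs: dict[str, int]) -> list[tuple[int, int, int]]:
    """Return (mem[2000], ax, bx) as it stands after each instruction."""
    rows = []

    def record() -> None:
        rows.append((mem[2000], regs["ax"], regs["bx"]))

    regs["ax"] = mem[2000]   # 1000 mov 2000, %ax  (load)
    record()
    regs["ax"] += 1          # 1001 add $1, %ax    (increment the register)
    record()
    mem[2000] = regs["ax"]   # 1002 mov %ax, 2000  (store back)
    record()
    record()                 # 1003 halt           (state unchanged)
    return rows
```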
There are a few more instructions you'll need to know, so let's get to them now.
Here is a code snippet of a loop:

```assembly
.main
.top
sub $1,%dx
test $0,%dx
jgte .top
halt
```

A few things have been introduced here. First is the "test" instruction. This
instruction takes two arguments and compares them; it then sets implicit
"condition codes" (kind of like 1-bit registers) which subsequent instructions
can act upon.

The other new instruction is a conditional jump (in this case, "jgte", which
stands for "jump if greater than or equal to"). This instruction jumps if, in
the preceding test, the second value is greater than or equal to the first;
here, that means it jumps back to .top as long as %dx >= 0.

One last point: for this loop to iterate more than once, %dx must be initialized
to 1 or greater.

Thus, we run the program like this:

```text
prompt> ./x86.py -p loop.s -t 1 -a dx=3 -R dx -C -c

   dx   >= >  <= <  != ==        Thread 0
    3   0  0  0  0  0  0
    2   0  0  0  0  0  0  1000 sub $1,%dx
    2   1  1  0  0  1  0  1001 test $0,%dx
    2   1  1  0  0  1  0  1002 jgte .top
    1   1  1  0  0  1  0  1000 sub $1,%dx
    1   1  1  0  0  1  0  1001 test $0,%dx
    1   1  1  0  0  1  0  1002 jgte .top
    0   1  1  0  0  1  0  1000 sub $1,%dx
    0   1  0  1  0  0  1  1001 test $0,%dx
    0   1  0  1  0  0  1  1002 jgte .top
   -1   1  0  1  0  0  1  1000 sub $1,%dx
   -1   0  0  1  1  1  0  1001 test $0,%dx
   -1   0  0  1  1  1  0  1002 jgte .top
   -1   0  0  1  1  1  0  1003 halt
```

The "-R dx" flag traces the value of %dx; the "-C" flag traces the values of the
condition codes that get set by a test instruction. Finally, the "-a dx=3" flag
sets the %dx register to the value 3 to start with.

As you can see from the trace, the "sub" instruction slowly lowers the value of
%dx. While %dx is positive, "test" sets only the ">=", ">", and "!="
conditions. When %dx reaches 0, "test" finds %dx and 0 to be equal; ">=" still
holds, so the loop runs once more. After the next "sub", the last "test" in the
trace finds %dx (now -1) to be less than 0. With ">=" clear, the subsequent jump
does NOT take place, and the program finally halts.

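The loop's control flow can be checked with a short Python replay. `run_loop` is hypothetical and models only the three instructions involved: `sub`, the ">=" code set by `test $0,%dx`, and `jgte`. It returns %dx after each `sub`, so the length of the list is the number of iterations. For example, a starting value of 3 should give the 2, 1, 0, -1 sequence seen in the trace.

```python
# Hypothetical replay of loop.s (not x86.py's code).
def run_loop(dx: int) -> list[int]:
    """Return the value of %dx after each `sub` until the loop falls through."""
    history = []
    while True:
        dx -= 1                  # .top: sub $1,%dx
        history.append(dx)
        greater_equal = dx >= 0  # test $0,%dx sets ">=" when %dx >= 0
        if not greater_equal:    # jgte .top is not taken once ">=" is clear
            return history       # halt
```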
Now, finally, we get to a more interesting case, i.e., a race condition with
multiple threads. Let's look at the code first:

```assembly
.main
.top
# critical section
mov 2000, %ax   # get the value at the address
add $1, %ax     # increment it
mov %ax, 2000   # store it back

# see if we're still looping
sub $1, %bx
test $0, %bx
jgt .top

halt
```

The code has a critical section which loads the value of a variable (at address
2000), then adds 1 to the value, then stores it back.

The code after it just decrements a loop counter (in %bx), tests if it is
greater than zero, and if so, jumps back to the top to run the critical section
again.

```text
prompt> ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -R bx -c

 2000      bx          Thread 0                Thread 1
    0       1
    0       1   1000 mov 2000, %ax
    0       1   1001 add $1, %ax
    1       1   1002 mov %ax, 2000
    1       0   1003 sub $1, %bx
    1       0   1004 test $0, %bx
    1       0   1005 jgt .top
    1       0   1006 halt
    1       1   ----- Halt;Switch -----  ----- Halt;Switch -----
    1       1                            1000 mov 2000, %ax
    1       1                            1001 add $1, %ax
    2       1                            1002 mov %ax, 2000
    2       0                            1003 sub $1, %bx
    2       0                            1004 test $0, %bx
    2       0                            1005 jgt .top
    2       0                            1006 halt
```

Here you can see each thread ran once, and each updated the shared variable at
address 2000 once, thus resulting in a count of two there.

The "Halt;Switch" line is inserted whenever a thread halts and another thread
must be run. Because each thread has its own registers, the bx column goes back
to 1 at the switch: from then on it shows Thread 1's copy of %bx.

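To experiment beyond these traces, here is a toy Python model of the interleaving for looping-race-nolock.s. It is a sketch, not the simulator's actual code, and it rests on these assumptions:

- `Thread`, `step` and `run` are hypothetical names.
- Only the instructions this program uses are modeled.
- Interrupts arrive after a fixed number of instructions, and threads take turns in strict round-robin order.

Each thread keeps private registers and a private PC, while memory is shared. With an interval long enough for each thread to finish, as in the run above, `run(2, 1, 100)` returns 2. Try shorter intervals after working the corresponding trace out by hand.

```python
from dataclasses import dataclass


# Hypothetical toy model (not x86.py's code).
@dataclass
class Thread:
    # Registers and PC are private to each thread; memory is shared.
    ax: int = 0
    bx: int = 0
    pc: int = 1000
    gt: bool = False    # ">" code: the only one this program reads
    done: bool = False


def step(t: Thread, mem: dict) -> None:
    """Execute one instruction of looping-race-nolock.s for thread t."""
    pc, t.pc = t.pc, t.pc + 1
    if pc == 1000:      # mov 2000, %ax
        t.ax = mem[2000]
    elif pc == 1001:    # add $1, %ax
        t.ax += 1
    elif pc == 1002:    # mov %ax, 2000
        mem[2000] = t.ax
    elif pc == 1003:    # sub $1, %bx
        t.bx -= 1
    elif pc == 1004:    # test $0, %bx
        t.gt = t.bx > 0
    elif pc == 1005:    # jgt .top
        if t.gt:
            t.pc = 1000
    else:               # 1006 halt
        t.done = True


def run(num_threads: int, bx: int, interval: int) -> int:
    """Round-robin the threads, switching after `interval` instructions or
    when a thread halts; return the final value of memory[2000]."""
    mem = {2000: 0}
    threads = [Thread(bx=bx) for _ in range(num_threads)]
    cur = 0
    while not all(t.done for t in threads):
        for _ in range(interval):
            if threads[cur].done:
                break
            step(threads[cur], mem)
        cur = (cur + 1) % num_threads
    return mem[2000]
```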
One last example: run the same thing as above, but with a shorter interrupt
interval (that is, more frequent interrupts). Here is what that will look like:

```text
prompt> ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -i 2

 2000          Thread 0                Thread 1
    ?
    ?   1000 mov 2000, %ax
    ?   1001 add $1, %ax
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1000 mov 2000, %ax
    ?                            1001 add $1, %ax
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1002 mov %ax, 2000
    ?   1003 sub $1, %bx
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1002 mov %ax, 2000
    ?                            1003 sub $1, %bx
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1004 test $0, %bx
    ?   1005 jgt .top
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1004 test $0, %bx
    ?                            1005 jgt .top
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1006 halt
    ?   ----- Halt;Switch -----  ----- Halt;Switch -----
    ?                            1006 halt
```

As you can see, each thread is interrupted every 2 instructions, as we specify
via the "-i 2" flag. What is the value of memory[2000] throughout this run? What
should it have been?

Now let's give a little more information on what can be simulated with this
program. The full set of registers: %ax, %bx, %cx, %dx, and the PC. In this
version, there is no support for a "stack", nor are there call and return
instructions.

The full set of simulated instructions is:

```assembly
mov immediate, register     # moves immediate value to register
mov memory, register        # loads from memory into register
mov register, register      # moves value from one register to other
mov register, memory        # stores register contents in memory
mov immediate, memory       # stores immediate value in memory

add immediate, register     # register  = register  + immediate
add register1, register2    # register2 = register2 + register1
sub immediate, register     # register  = register  - immediate
sub register1, register2    # register2 = register2 - register1

neg register                # negates contents of register

test immediate, register    # compare immediate and register (set condition codes)
test register, immediate    # same but register and immediate
test register, register     # same but register and register

jne label                   # jump to label if test'd values are not equal
je label                    # ... if they are equal
jlt label                   # ... if second is less than first
jlte label                  # ... if second is less than or equal to first
jgt label                   # ... if second is greater than first
jgte label                  # ... if second is greater than or equal to first

push memory or register     # push value in memory or from reg onto stack
                            # stack is defined by sp register
pop [register]              # pop value off stack (into optional register)
call label                  # call function at label

xchg register, memory       # atomic exchange:
                            #   put value of register into memory
                            #   return old contents of memory into reg
                            #   do both things atomically

yield                       # switch to the next thread in the runqueue

nop                         # no op
```

Notes:

- 'immediate' is something of the form $number
- 'memory' is of the form 'number' or '(reg)' or 'number(reg)' or
  'number(reg,reg)' or 'number(reg,reg,scale)' (as described above)
- 'register' is one of %ax, %bx, %cx, %dx

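The xchg instruction is the building block for a lock. A thread writes 1 into a lock word and gets the old value back in the same step. It may enter the critical section only if that old value was 0. The Python model below is a sketch, not one of the simulator's bundled programs. It guards the counter update from looping-race-nolock.s with such a spin lock. Its assumptions:

- The lock address 2500, the program layout at PCs 1000-1011, and the names `Thread`, `step` and `run` are all hypothetical.
- Interrupts are modeled as a fixed-interval round-robin that only falls between instructions, which is what makes the modeled xchg atomic.

Under these assumptions, `run(2, 3, interval)` should return 6 for any interval.

```python
from dataclasses import dataclass

COUNTER, LOCK = 2000, 2500  # LOCK's address is an assumption for this sketch

# Hypothetical program modeled below (layout and PCs are assumptions):
# 1000 .top  mov $1, %ax
# 1001       xchg %ax, 2500    # old lock value -> %ax, 1 -> lock
# 1002       test $0, %ax
# 1003       jne .top          # lock was already held: try again
# 1004       mov 2000, %ax     # critical section
# 1005       add $1, %ax
# 1006       mov %ax, 2000
# 1007       mov $0, 2500      # release the lock
# 1008       sub $1, %bx
# 1009       test $0, %bx
# 1010       jgt .top
# 1011       halt


@dataclass
class Thread:
    ax: int = 0
    bx: int = 0
    pc: int = 1000
    ne: bool = False    # "!=" code, read by jne
    gt: bool = False    # ">" code, read by jgt
    done: bool = False


def step(t: Thread, mem: dict) -> None:
    """Execute one instruction; the scheduler never splits a step, so xchg is atomic."""
    pc, t.pc = t.pc, t.pc + 1
    if pc == 1000:                          # mov $1, %ax
        t.ax = 1
    elif pc == 1001:                        # xchg %ax, LOCK
        t.ax, mem[LOCK] = mem[LOCK], t.ax
    elif pc == 1002:                        # test $0, %ax
        t.ne = t.ax != 0
    elif pc == 1003:                        # jne .top (spin)
        if t.ne:
            t.pc = 1000
    elif pc == 1004:                        # mov COUNTER, %ax
        t.ax = mem[COUNTER]
    elif pc == 1005:                        # add $1, %ax
        t.ax += 1
    elif pc == 1006:                        # mov %ax, COUNTER
        mem[COUNTER] = t.ax
    elif pc == 1007:                        # mov $0, LOCK (release)
        mem[LOCK] = 0
    elif pc == 1008:                        # sub $1, %bx
        t.bx -= 1
    elif pc == 1009:                        # test $0, %bx
        t.gt = t.bx > 0
    elif pc == 1010:                        # jgt .top
        if t.gt:
            t.pc = 1000
    else:                                   # halt
        t.done = True


def run(num_threads: int, bx: int, interval: int) -> int:
    """Fixed-interval round-robin scheduling; returns memory[2000] at the end."""
    mem = {COUNTER: 0, LOCK: 0}
    threads = [Thread(bx=bx) for _ in range(num_threads)]
    cur = 0
    while not all(t.done for t in threads):
        for _ in range(interval):
            if threads[cur].done:
                break
            step(threads[cur], mem)
        cur = (cur + 1) % num_threads
    return mem[COUNTER]
```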
Finally, the full set of options to the simulator is available with the -h
flag:

```text
Usage: x86.py [options]

Options:
  -s SEED, --seed=SEED  the random seed
  -t NUMTHREADS, --threads=NUMTHREADS
                        number of threads
  -p PROGFILE, --program=PROGFILE
                        source program (in .s)
  -i INTFREQ, --interrupt=INTFREQ
                        interrupt frequency
  -P PROCSCHED, --procsched=PROCSCHED
                        control exactly which thread runs when
  -r, --randints        if interrupts are random
  -a ARGV, --argv=ARGV  comma-separated per-thread args (e.g., ax=1,ax=2 sets
                        thread 0 ax reg to 1 and thread 1 ax reg to 2);
                        specify multiple regs per thread via colon-separated
                        list (e.g., ax=1:bx=2,cx=3 sets thread 0 ax and bx and
                        just cx for thread 1)
  -L LOADADDR, --loadaddr=LOADADDR
                        address where to load code
  -m MEMSIZE, --memsize=MEMSIZE
                        size of address space (KB)
  -M MEMTRACE, --memtrace=MEMTRACE
                        comma-separated list of addrs to trace (e.g.,
                        20000,20001)
  -R REGTRACE, --regtrace=REGTRACE
                        comma-separated list of regs to trace (e.g.,
                        ax,bx,cx,dx)
  -C, --cctrace         should we trace condition codes
  -S, --printstats      print some extra stats
  -v, --verbose         print some extra info
  -H HEADERCOUNT, --headercount=HEADERCOUNT
                        how often to print a row header
  -c, --compute         compute answers for me
```

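The -a syntax uses commas between threads and colons between the registers of one thread. The sketch below parses it according to the help text above. `parse_argv` is hypothetical, not x86.py's code. It does not model how the simulator treats fewer entries than threads.

```python
# Hypothetical parser for the -a format described in the help text.
def parse_argv(argv: str) -> list[dict[str, int]]:
    """Split an -a string into one {register: value} dict per thread."""
    per_thread = []
    for entry in argv.split(","):            # commas separate threads
        regs = {}
        for assignment in entry.split(":"):  # colons separate one thread's registers
            name, value = assignment.split("=")
            regs[name.strip()] = int(value)
        per_thread.append(regs)
    return per_thread
```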
Most are obvious. The -r flag turns on a random interrupter (each interval is
chosen at random from 1 to the intfreq specified by -i), which can make for more
fun during homework problems.

-P lets you specify exactly which threads run when; e.g., 11000 would run thread
1 for 2 instructions, then thread 0 for 3, then repeat.

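Read that way, each digit of the -P string picks the thread for one instruction, and the pattern repeats. The sketch below spells out that reading. `expand_schedule` and `runs` are hypothetical helpers. The sketch assumes single-digit thread ids and ignores what the simulator does once a scheduled thread has halted.

```python
from itertools import cycle, groupby, islice


# Hypothetical helpers illustrating the -P pattern (not x86.py's code).
def expand_schedule(procsched: str, steps: int) -> list[int]:
    """Thread id chosen for each of the first `steps` instructions."""
    return [int(ch) for ch in islice(cycle(procsched), steps)]


def runs(procsched: str) -> list[tuple[int, int]]:
    """Collapse one pass of the pattern into (thread, instruction count) pairs."""
    return [(int(tid), len(list(group))) for tid, group in groupby(procsched)]
```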
-L specifies where in the address space to load the code.

-m specifies the size of the address space (in KB).

-S prints some extra stats.

-c lets you see the values of the traced registers or memory values (otherwise
they show up as question marks).

-H lets you specify how often to print a row header (useful for long traces).

Now you have the basics in place; read the questions at the end of the chapter
to study this race condition and related issues in more depth.