This lab will familiarize you with multithreading. You will solve three problems.
Before writing code, you should make sure you have done the assigned reading for this lab.
In this lab, there are several questions for you to answer. Questions are in boxes with a light orange background. Write each question and its answer in your notebook. Take photo(s) of your questions/answers and submit the photo(s) on Canvas.
The Linux grep command can be helpful on some questions. For example, suppose a question asks you about sem_post. You can discover the uses of sem_post by issuing the following Linux grep command in the notxv6 directory. The same technique works for other identifiers, such as pthread_cond_wait.
% grep sem_post *.c
barrier_sem.c:/* 8 */    sem_post(&bstate.barrier_turnstile1); // wakeup one thread
barrier_sem.c:/* 10 */   sem_post(&bstate.barrier_mutex);
barrier_sem.c:/* 13 */   sem_post(&bstate.barrier_turnstile1); // Wakeup next thread
barrier_sem.c:/* 21 */   sem_post(&bstate.barrier_turnstile2);
barrier_sem.c:/* 24 */   sem_post(&bstate.barrier_mutex);
barrier_sem.c:/* 27 */   sem_post(&bstate.barrier_turnstile2);
% grep pthread_cond_wait *.c
barrier.c:      pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
barrier_cv.c:/* 7 */     pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
zemaphore.c:    pthread_cond_wait(&z->cond, &z->lock);
In the directory of your xv6-labs, create two files, answers-thread.txt and time.txt, which I may use if I run your code from your zip file submission. The answers-thread.txt file is a blank file. The time.txt file contains the number of hours you spent on the lab.
$ echo > answers-thread.txt
$ echo 8 > time.txt
To start the lab, switch to the thread branch:
$ git fetch
$ git checkout thread
$ make clean
1. Compare processes and threads.
2. When I have a process with two threads, the OS creates three page tables - one for the process and one for each of the threads. Is this a true statement? Justify your answer.
3. When I have a program with two processes, both of them can share access to a static variable int X;. Is this a true statement? Justify your answer.
4. What is a lock?
5. Xv6 often uses locks such as the following. Why does it need to use locks?
acquire(&kmem.lock);
r = kmem.freelist;
if(r)
  kmem.freelist = r->next;
release(&kmem.lock);
6. The acquire() in the previous question is a spinning lock. Write the C code for acquire(). Your C code may include a race condition. What hardware support fixes the race condition?
7. Explain why two threads cannot concurrently increment a static variable successfully without using locks. Use assembly code in your explanation.
8. What is a sleeping lock?
9. What is a scenario where a spinning lock is better than a sleeping lock?
For this lab, you build your code with the Linux (or host) C compiler toolchain. You can also build and run the code on the MacOS operating system. The starting code is given to you in the folder notxv6 to indicate that this code is not part of Xv6. You do not use the RISC-V gcc compiler system for these labs. See CPSC 405 Tools for a discussion of the host and target compiler systems.
On Linux, you can use the lscpu command to see information about your computer's CPUs, including the number of cores:

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0,1
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) CPU @ 2.80GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   1
    Socket(s):            1
    Stepping:             7
    BogoMIPS:             5600.44
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    32 KiB (1 instance)
  L1i:                    32 KiB (1 instance)
  L2:                     1 MiB (1 instance)
  L3:                     33 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0,1
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:               Not affected
  Mmio stale data:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:               Mitigation; Enhanced IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Syscall hardening, KVM SW loop
  Srbds:                  Not affected
  Tsx async abort:        Mitigation; Clear CPU buffers; SMT Host state unknown

You can also find the number of cores using the "Task Manager" on Windows or "Activity Monitor" on macOS.
10. What is a core? How many cores does your computer have?
In this assignment you will explore parallel programming with threads and locks using a hash table. You should do this assignment on a real Linux or MacOS computer (not xv6, not qemu) that has multiple cores. Most recent laptops have multicore processors.
This assignment uses the UNIX pthread threading library. We have explored pthread in class. You can find information about it from the manual page, with man pthreads, and you can look on the web, for example Pthread Create, Mutex Init, and Mutex Lock.
The notxv6 folder means this code is not part of our Xv6 build. Instead, you use your native Linux environment (or MacOS environment) and the host gcc compiler toolchain.
The files notxv6/hash_bad.c and notxv6/hash_good.c contain a simple hash table. hash_bad.c is correct if used from a single thread, but incorrect when used from multiple threads. hash_good.c is correct for both single and multiple threads. The two hash programs are the same except that hash_good.c synchronizes the threads. Both programs run two benchmarks. First, the program adds lots of keys to the hash table by calling put(), and prints the achieved rate in puts per second. Then it fetches keys from the hash table with get(). It prints the number of keys that should have been in the hash table as a result of the puts but are missing (zero is the correct value), and it prints the number of gets per second it achieved.
You should cd into the directory notxv6 for running the hash problem programs. You will use the gcc compiler toolchain for building these programs. There are many ways to generate the executable. The following shows several options.
$ gcc hash_bad.c
$ ./a.out 1
100000 puts, 4.982 seconds, 20074 puts/second
0: 0 keys missing
100000 gets, 4.947 seconds, 20216 gets/second

This uses gcc to create the executable a.out, and then executes it with 1 thread.
$ gcc -o hash_bad hash_bad.c
$ ./hash_bad 2
100000 puts, 2.597 seconds, 38513 puts/second
1: 155 keys missing
0: 155 keys missing
200000 gets, 5.196 seconds, 38490 gets/second

This uses gcc to create the executable hash_bad, and then executes it with 2 threads.
$ gcc -o hash_bad hash_bad.c -pthread

This uses gcc to create the executable hash_bad. When building hash_bad, the -pthread option informs the gcc linker to use the pthreads library. Some systems require you to explicitly state when you are using the pthreads library.
In the notxv6 directory (perhaps ~/xv6-labs/notxv6), type this:
$ gcc -o hash_bad hash_bad.c
$ gcc -o hash_good hash_good.c
$ ./hash_bad 1
100000 puts, 9.582 seconds, 10436 puts/second
0: 0 keys missing
100000 gets, 9.588 seconds, 10430 gets/second
$ ./hash_bad 2
100000 puts, 3.218 seconds, 31079 puts/second
1: 15582 keys missing
0: 15582 keys missing
200000 gets, 8.774 seconds, 22795 gets/second
$ ./hash_good 2
100000 puts, 3.591 seconds, 27849 puts/second
0: 0 keys missing
1: 0 keys missing
200000 gets, 7.026 seconds, 28466 gets/second
$ ./hash_good 8
100000 puts, 3.733 seconds, 26785 puts/second
3: 0 keys missing
2: 0 keys missing
0: 0 keys missing
7: 0 keys missing
6: 0 keys missing
1: 0 keys missing
5: 0 keys missing
4: 0 keys missing
800000 gets, 28.734 seconds, 27842 gets/second

Note that to build the hash programs, you are using your OS's gcc, not the RISC-V tools. Also note that the numbers you see may differ from this sample output depending on how fast your computer is, whether it has multiple cores, and whether it's busy doing other things.
The argument to hash_bad specifies the number of threads that execute put and get operations on the hash table. After running for a little while, hash_bad 1 produces output that shows one thread (which is 0:) and that it has 0 keys missing. hash_bad 2 produces output that shows two threads (0: and 1:) and that each thread is missing keys.
The first and last lines of the hash programs' output indicate the number of puts per second and the number of gets per second. Notice how two threads achieve a higher rate of puts and gets per second. This is because each thread runs on its own core, resulting in a parallel speedup. If you examine the output of ./hash_good 8 (which uses 8 threads), you will see the rate of puts/gets per second is about the same as with 2 threads. This is because the CPSC Server has two cores. Thus the 8 threads had to be multiplexed between the two cores.
For hash_bad 2, there are two lines saying 15582 keys missing. That means the puts were supposed to add those keys to the hash table, but something went wrong. Have a look at notxv6/hash_bad.c, particularly at put() and insert(). With two threads, both can read the same bucket's head pointer at the same time; each links its new entry to the old head, and whichever thread writes the bucket pointer last overwrites the other thread's entry, losing a key.
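For reference, the racy code looks roughly like the following sketch; the helper names (NBUCKET, struct entry, table) are assumptions, and the actual put() and insert() in hash_bad.c may differ in details.

#include <stdlib.h>

#define NBUCKET 5

struct entry {
  int key;
  int value;
  struct entry *next;
};

static struct entry *table[NBUCKET];   // one singly-linked list per bucket

static void insert(int key, int value, struct entry **p, struct entry *n)
{
  struct entry *e = malloc(sizeof(struct entry));
  e->key = key;
  e->value = value;
  e->next = n;   // link the new entry in front of the old head
  *p = e;        // write the new head pointer
}

static void put(int key, int value)
{
  int b = key % NBUCKET;
  // Race: two threads can both read the same table[b], so both new
  // entries point at the same old head, and the second write of *p
  // in insert() overwrites the first, losing that entry.
  insert(key, value, &table[b], table[b]);
}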
To avoid this sequence of events, we must insert lock and unlock statements in put and get in notxv6/hash_bad.c so that the number of keys missing is always 0 with two threads. The relevant pthread calls are:
pthread_mutex_t lock;            // declare a lock
pthread_mutex_init(&lock, NULL); // initialize the lock
pthread_mutex_lock(&lock);       // acquire lock
pthread_mutex_unlock(&lock);     // release lock

This has been done for you in hash_good.c.
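For illustration, the simplest correct fix is a single global lock around the critical section in put(); the following is a minimal sketch under the same assumed names as the earlier sketch, not the actual hash_good.c code.

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; // or use pthread_mutex_init(&lock, NULL)

static void put(int key, int value)
{
  int b = key % NBUCKET;
  pthread_mutex_lock(&lock);    // at most one thread may be inserting at a time
  insert(key, value, &table[b], table[b]);
  pthread_mutex_unlock(&lock);
}

A single global lock makes the table correct, but it serializes every put(); hash_good.c does better, as described next.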
The solution in hash_good.c executes the put operations in parallel while maintaining correctness.
The solution in hash_good.c uses a mutex lock. When using pthreads mutex locks, don't forget to first define your mutex and then call pthread_mutex_init() to initialize it before using it. Notice how the functions init, lock, and unlock take the address of the mutex as their argument.
There are situations where concurrent put()s have no overlap in the memory they read or write in the hash table, and thus don't need a lock to protect against each other. The solution in hash_good.c uses a mutex lock per hash bucket, as sketched below.
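Here is a minimal sketch of the per-bucket idea, again under the assumed names from the earlier sketches; the actual hash_good.c code may differ.

static pthread_mutex_t locks[NBUCKET];   // one lock per bucket

static void table_init(void)
{
  for (int i = 0; i < NBUCKET; i++)
    pthread_mutex_init(&locks[i], NULL); // initialize before any put()
}

static void put(int key, int value)
{
  int b = key % NBUCKET;
  pthread_mutex_lock(&locks[b]);   // serialize access to this bucket only
  insert(key, value, &table[b], table[b]);
  pthread_mutex_unlock(&locks[b]);
}

Puts to different buckets can now proceed in parallel, while two puts to the same bucket still exclude each other.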
In this assignment you'll examine two solutions to a barrier problem. A barrier is a point in an application at which all participating threads must wait until all other participating threads reach that point too. The first solution uses pthread condition variables, which are a sequence coordination technique similar to xv6's sleep and wakeup.
The solutions are provided in the files notxv6/barrier_cv.c and notxv6/barrier_sem.c. Notice these programs execute on a real computer - not xv6, not qemu.
You should cd into the directory notxv6 for running the barrier programs. You will use the gcc compiler toolchain for building these programs. See the Parallel Hash Problem section for ways to build executable programs.
$ gcc -o barrier_cv barrier_cv.c -pthread
$ gcc -o barrier_sem barrier_sem.c -pthread
$ ./barrier_cv 2
OK; passed
$ ./barrier_sem 2
OK; passed

The argument 2 for the barrier programs specifies the number of threads that synchronize on the barrier (the variable nthread in barrier_cv.c and barrier_sem.c). Each thread executes a loop. In each loop iteration a thread calls barrier() and then sleeps for a random number of microseconds. The assert statement in the loop will trigger if one thread leaves the barrier before the other thread has reached the barrier. The desired behavior is that each thread blocks in barrier() until all nthread of them have called barrier().
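The per-thread loop looks roughly like the following sketch; the constants are illustrative, and the real thread() in barrier_cv.c and barrier_sem.c may differ.

#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

static void *thread(void *xa)
{
  long n = (long) xa;                // thread number (unused here)
  for (int i = 0; i < 20000; i++) {
    int t = bstate.round;
    assert(i == t);                  // fires if a thread left the barrier early
    barrier();                       // block until all nthread threads arrive
    usleep(random() % 100);          // sleep for 0-99 microseconds
  }
  return 0;
}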
There are two issues that complicate the solution: (1) barrier() is called in a succession of rounds, recorded in bstate.round, and the round must advance each time all threads have reached the barrier; (2) one thread may race around its loop and reach the barrier again before the other threads have exited the previous round, so the count of waiting threads cannot be carelessly reused from one round to the next.
To achieve the desired barrier behavior using condition variables, the solution uses the lock primitives that you have seen in the hash assignment. The solution also uses the following new condition variable primitives.
pthread_cond_wait(&cond, &mutex);  // go to sleep on cond, releasing lock mutex, acquiring upon wake up
pthread_cond_broadcast(&cond);     // wake up every thread sleeping on cond
Condition variables have interesting semantics involving the relationship between the mutex and the condition variable. For example, pthread_cond_wait releases the mutex when called, and re-acquires the mutex before returning. Study OSTEP Chapter 30 - Condition Variables for more details.
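For reference, here is a minimal sketch of a condition-variable barrier in the style of barrier_cv.c; the struct and field names are assumptions, and the real file may differ.

#include <pthread.h>

static int nthread;                      // number of participating threads

static struct barrier_state {
  pthread_mutex_t barrier_mutex;
  pthread_cond_t barrier_cond;
  int nthread;                           // threads that have reached this round
  int round;                             // current barrier round
} bstate;

static void barrier(void)
{
  pthread_mutex_lock(&bstate.barrier_mutex);
  bstate.nthread++;
  if (bstate.nthread == nthread) {       // last thread to arrive
    bstate.round++;                      // start the next round
    bstate.nthread = 0;
    pthread_cond_broadcast(&bstate.barrier_cond); // wake up every waiter
  } else {
    int round = bstate.round;
    while (round == bstate.round)        // loop guards against spurious wakeups
      pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
  }
  pthread_mutex_unlock(&bstate.barrier_mutex);
}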
To achieve the desired barrier behavior using semaphores, the solution uses two turnstiles, which are described in the Barrier section (3.6) of The Little Book of Semaphores. In addition to the lock primitives that you have seen in the hash assignment, you will need the following new semaphore primitives.
sem_init(&sem, 0, value); // initialize the semaphore's counter to value
sem_wait(&sem);           // decrement the counter; go to sleep if the counter is 0
sem_post(&sem);           // increment the counter and wake up a sleeping thread, if any
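For reference, the following is a sketch of the two-turnstile pattern, with barrier_mutex initialized to 1, barrier_turnstile1 to 0, and barrier_turnstile2 to 1. barrier_sem.c follows this structure, but the sketch is not a line-for-line copy, so use the file itself when following the line numbers discussed below.

#include <semaphore.h>

static int nthread;            // number of participating threads

static struct {
  sem_t barrier_mutex;         // initialized to 1
  sem_t barrier_turnstile1;    // initialized to 0
  sem_t barrier_turnstile2;    // initialized to 1
  int count;                   // threads that have arrived
} bstate;

static void barrier(void)
{
  sem_wait(&bstate.barrier_mutex);
  if (++bstate.count == nthread) {
    sem_wait(&bstate.barrier_turnstile2);  // lock the second turnstile
    sem_post(&bstate.barrier_turnstile1);  // unlock the first; wake up one thread
  }
  sem_post(&bstate.barrier_mutex);
  sem_wait(&bstate.barrier_turnstile1);    // first turnstile
  sem_post(&bstate.barrier_turnstile1);    // wake up the next thread

  // Critical point: all nthread threads have arrived.

  sem_wait(&bstate.barrier_mutex);
  if (--bstate.count == 0) {
    sem_wait(&bstate.barrier_turnstile1);  // lock the first turnstile
    sem_post(&bstate.barrier_turnstile2);  // unlock the second
  }
  sem_post(&bstate.barrier_mutex);
  sem_wait(&bstate.barrier_turnstile2);    // second turnstile
  sem_post(&bstate.barrier_turnstile2);    // wake up the next thread
}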
The following explains how $ ./barrier_sem 2 can terminate with an error, where the for loop in the function thread() stops with an assertion failure because i is not equal to bstate.round.
Suppose we have three threads - TA, TB, and TC. After line 15 (labeled Critical point) of the function barrier(), TA and TB go through the critical region (lines 17 to 24), and sleep on turnstile2 at the sem_wait on line 26. TC wakes up either TA or TB with the sem_post on line 21. We'll say TA wakes up. At this point we have TA and TC running (or runnable). TC will sleep on turnstile2 at the sem_wait on line 26. The goal is for TA to wake up TB with the sem_post on line 27 and then for TB to wake up TC with the sem_post on line 27. A call to sem_post increments the semaphore and wakes up a thread, if one is sleeping. The problem is that TA can wake up TB, which can call sem_post before TC has called sem_wait on line 26. In this scenario, we get the error.
$ ./barrier_cv 5
/* 11 */ pthread_cond_broadcast(&bstate.barrier_cond);
I used semaphores in my solution because I thought it would be simpler; however, as discussed above, condition variables turned out to be the simpler solution for the barrier problem.
For this problem, you will design and implement a thread-safe message queue data structure and an API that provides the ability for threads to produce/consume messages in a first-in-first-out order. The following is the API you must design and implement.
The returned pointer is used in the other API functions.
You must design and implement struct msgq, which must involve dynamically allocated memory. My solution includes a linked list of messages, where send_msg mallocs memory for the data structure and places it at the end of the linked list. My struct msgq has a linked list of struct msgs, which has a char *msg pointer to the message and a struct msgs *next to implement the linked list. The heap memory for the message is allocated using strdup. recv_msg removes the head of the linked list and returns the char* pointer to the message. My struct msgq also has members for the semaphores and the number of messages.
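The following is a minimal sketch of one possible layout and of send_msg/recv_msg built on it. The names and details here are illustrative assumptions, not a specification; a capacity semaphore (slots) is included because test 3 expects send_msg to block on a full queue.

#include <semaphore.h>
#include <stdlib.h>
#include <string.h>

struct msgs {
  char *msg;              // heap copy of the message (from strdup)
  struct msgs *next;      // next message in FIFO order
};

struct msgq {
  struct msgs *head;      // oldest message; removed by recv_msg
  struct msgs *tail;      // newest message; appended by send_msg
  int count;              // number of queued messages
  sem_t mutex;            // protects the list; initialized to 1
  sem_t slots;            // free slots; initialized to the capacity
  sem_t msgs;             // queued messages; initialized to 0
};

void send_msg(struct msgq *mq, char *msg)
{
  struct msgs *m = malloc(sizeof(struct msgs));
  m->msg = strdup(msg);         // copy the message onto the heap
  m->next = NULL;
  sem_wait(&mq->slots);         // block while the queue is full
  sem_wait(&mq->mutex);
  if (mq->tail)
    mq->tail->next = m;         // append at the tail
  else
    mq->head = m;               // queue was empty
  mq->tail = m;
  mq->count++;
  sem_post(&mq->mutex);
  sem_post(&mq->msgs);          // one more message available
}

char *recv_msg(struct msgq *mq)
{
  sem_wait(&mq->msgs);          // block while the queue is empty
  sem_wait(&mq->mutex);
  struct msgs *m = mq->head;    // remove the head of the list
  mq->head = m->next;
  if (mq->head == NULL)
    mq->tail = NULL;
  mq->count--;
  sem_post(&mq->mutex);
  sem_post(&mq->slots);         // one more free slot
  char *s = m->msg;
  free(m);
  return s;                     // the caller owns the returned string
}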
MacOS does not support the unnamed semaphores that are provided by semaphore.h and the -pthread compiler option. If you are developing this program on MacOS, I have provided two files, zemaphore.h and zemaphore.c, that you can use. This zemaphore library implements semaphores using condition variables. You should examine the code in zemaphore.c to see how condition variables can be used to implement semaphores.
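For reference, a condition-variable semaphore along the lines of OSTEP's zemaphore looks roughly like this; the type and function names here are illustrative, so check zemaphore.h for the real ones.

#include <pthread.h>

typedef struct {
  int value;                 // the semaphore's counter
  pthread_cond_t cond;
  pthread_mutex_t lock;
} Zem_t;

void Zem_init(Zem_t *z, int value)
{
  z->value = value;
  pthread_cond_init(&z->cond, NULL);
  pthread_mutex_init(&z->lock, NULL);
}

void Zem_wait(Zem_t *z)
{
  pthread_mutex_lock(&z->lock);
  while (z->value <= 0)                 // sleep while the counter is 0
    pthread_cond_wait(&z->cond, &z->lock);
  z->value--;
  pthread_mutex_unlock(&z->lock);
}

void Zem_post(Zem_t *z)
{
  pthread_mutex_lock(&z->lock);
  z->value++;
  pthread_cond_signal(&z->cond);        // wake up one sleeping thread, if any
  pthread_mutex_unlock(&z->lock);
}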
OSTEP Chapter 31 provides the same (or similar) code. The following is how I include the zemaphore header in my msgq.c and main.c files on MacOS.
#include "zemaphore.h"; #include "msgq.h";
You are welcome to make this fancier by using conditional compilation with #ifdef. In this style of programming, you can create code that works on both MacOS and Linux. The following demonstrates this concept.
#ifdef __linux__
#include <semaphore.h>
#elif __APPLE__
#include "zemaphore.h"
#endif
Testing your thread-safe message queue is not done with the Makefile. Instead, you build and run your program using mainmsgq.c. The code provided in mainmsgq.c performs five tests. Once you have finished your implementation, you can build the program and run the tests as shown below.
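One plausible way to build the test program, assuming your queue implementation is in msgq.c, is the following (on MacOS, also compile zemaphore.c):

$ gcc -o msgq msgq.c mainmsgq.c -pthread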
$ ./msgq 1
test fill and empty msgq
Send? y
sending: msg1
Send? y
sending: msg2
Send? y
sending: hellomsg
Send? y
sending: gustymsg
msgq_show() after filling for test 1:
msg1
msg2
hellomsg
gustymsg
mq msg_count: 4
recvMsgs: msg1
recvMsgs: msg2
recvMsgs: hellomsg
recvMsgs: gustymsg
msgq_show() after all consumed by test 1:
$ ./msgq 2
test fill msgs and pass it on
Send? y
sending: msg1
Send? y
sending: msg2
Send? y
sending: hellomsg
Send? y
sending: gustymsg
msgq_show() after filling for test 2:
msg1
msg2
hellomsg
gustymsg
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c148000fa0 msg1
passiton1 after recv msgq_len: 3
passiton2 initial msgq_len: 4
passiton1 after send msgq_len: 4
passiton2: 0x77c14effde10 0x77c148000fe0 msg2
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c148001020 hellomsg
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c148001060 gustymsg
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c148000f80 msg1
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c148000fc0 msg2
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c148001000 hellomsg
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c148001040 gustymsg
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c148001080 msg1
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c140000b70 msg2
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c1480010a0 hellomsg
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c140000b90 gustymsg
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c1480010c0 msg1
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c140000bb0 msg2
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
passiton1 initial msgq_len: 4
passiton1: 0x77c14f7fee10 0x77c1480010e0 hellomsg
passiton1 after recv msgq_len: 3
passiton1 after send msgq_len: 4
passiton2 initial msgq_len: 4
passiton2: 0x77c14effde10 0x77c140000bd0 gustymsg
passiton2 after recv msgq_len: 3
passiton2 after send msgq_len: 4
^C
$ ./msgq 3
test send blocks on full msgq
final message will block and be sent last
sending Message 1
sending Message 2
sending Message 3
sending Message 4
sending Message 5 with queue of 4 - blocks
received Message 1
received Message 2
received Message 3
received Message 3
received Message 5 with queue of 4
sent Message 5 with queue of 4
$ ./msgq 4
test recv blocks on empty msgq
Starting recv...
waiting 10 seconds for msgs
sending Message 1
sending Message 2
sending Message 3
sending Message 4
sending Message 5 with queue of 4 - blocks
received Message 1
received Message 2
received Message 3
received Message 3
received Message 5 with queue of 4
sent Message 5 with queue of 4
$ ./msgq 5
test 2 producers 3 consumers
5 finished sending
4 finished sending
wait for consumers to finish
consumer 1
msg tid:4, mnum:0
msg tid:5, mnum:3
msg tid:5, mnum:10
msg tid:5, mnum:11
msg tid:5, mnum:14
msg tid:4, mnum:5
msg tid:5, mnum:15
msg tid:4, mnum:7
msg tid:5, mnum:19
msg tid:4, mnum:9
msg tid:4, mnum:10
msg tid:4, mnum:11
msg tid:5, mnum:22
msg tid:5, mnum:25
msg tid:4, mnum:14
msg tid:4, mnum:15
msg tid:4, mnum:16
msg tid:5, mnum:30
msg tid:4, mnum:18
msg tid:5, mnum:31
msg tid:4, mnum:20
msg tid:4, mnum:21
msg tid:5, mnum:35
msg tid:4, mnum:23
msg tid:4, mnum:24
msg tid:4, mnum:25
msg tid:5, mnum:40
msg tid:4, mnum:27
msg tid:5, mnum:41
msg tid:4, mnum:31
msg tid:4, mnum:32
msg tid:5, mnum:44
msg tid:5, mnum:45
msg tid:4, mnum:36
msg tid:5, mnum:47
msg tid:5, mnum:49
msg tid:4, mnum:40
msg tid:4, mnum:41
msg tid:4, mnum:42
msg tid:4, mnum:44
msg tid:4, mnum:45
msg tid:4, mnum:46
msg tid:4, mnum:47
msg tid:4, mnum:48
msg tid:4, mnum:49
consumer 2
msg tid:5, mnum:0
msg tid:5, mnum:1
msg tid:5, mnum:2
msg tid:5, mnum:4
msg tid:5, mnum:6
msg tid:5, mnum:7
msg tid:4, mnum:3
msg tid:4, mnum:6
msg tid:4, mnum:8
msg tid:5, mnum:17
msg tid:5, mnum:18
msg tid:5, mnum:21
msg tid:4, mnum:13
msg tid:5, mnum:23
msg tid:5, mnum:24
msg tid:5, mnum:27
msg tid:4, mnum:19
msg tid:5, mnum:32
msg tid:5, mnum:36
msg tid:4, mnum:26
msg tid:5, mnum:38
msg tid:5, mnum:39
msg tid:4, mnum:30
msg tid:5, mnum:46
msg tid:4, mnum:34
msg tid:4, mnum:35
msg tid:4, mnum:43
consumer 3
msg tid:5, mnum:5
msg tid:4, mnum:1
msg tid:5, mnum:8
msg tid:5, mnum:9
msg tid:4, mnum:2
msg tid:5, mnum:12
msg tid:4, mnum:4
msg tid:5, mnum:13
msg tid:5, mnum:16
msg tid:5, mnum:20
msg tid:4, mnum:12
msg tid:5, mnum:26
msg tid:4, mnum:17
msg tid:5, mnum:28
msg tid:5, mnum:29
msg tid:4, mnum:22
msg tid:5, mnum:33
msg tid:5, mnum:34
msg tid:5, mnum:37
msg tid:4, mnum:28
msg tid:5, mnum:42
msg tid:4, mnum:29
msg tid:5, mnum:43
msg tid:4, mnum:33
msg tid:5, mnum:48
msg tid:4, mnum:37
msg tid:4, mnum:38
msg tid:4, mnum:39