This lab will familiarize you with xv6 and its system calls.
You will learn how to create Unix/Linux utility programs using the
Unix/Linux API. The Xv6 API is a simplfied Linux API.
For example, the Xv6 API has a single exec
function call to execute a new program within a process, whereas, Linux has a family of exec calls (ie., execp
, execl
, execvp
, execlp
).
You will create several Unix/Linux utility programs. All of the programs have a starting .c program that you may have to edit. For example, there is a kittycat.c program. You must design the code, edit the program, and test that it satisfies the requirements. However, I have provided the solution code to many of the problems. For those problems, you should study the provided solution code so that you understand it. Then your editing will consist of placing the solution code into the appropriate .c file.
Before you begin lab system calls, read Chapter 1 of the xv6 book.
In this lab, there are several questions for you to answer. Questions are in boxes with a light orange background. Write each question and its answer in your notebook. Take photo(s) of your questions/answers and submit the photo(s) on Canvas.
The Linux grep command can be helpful on C programming using Linux with a terminal and on some questions. For example, suppose a question asks you about the struct proc. You can discover the definition and uses of the struct proc by issuing the following Linux grep command in the kernel directory.
$ grep "struct proc" *.h proc.h:struct proc { ... lots of matches proc.h:struct proc *find_proc(int pid); proc.h:bool enqueue_proc(struct proc *p); % grep "struct proc" *.c main.c: struct proc *p = NULL; ... lots of matches proc.c: struct proc *p = kernel_proc; proc.c: struct proc *p = new_proc(name, priority, ppid);
In the directory of your xv6-labs, create two files: answers-util.txt and time.txt that I may use if I run your code using your zip file submission. The answers-util.txt is a blank file. The time.txt file contains the number of hours you spent on the lab.
$ echo > answers-util.txt $ echo 8 > time.txt
When you make Xv6, you are building a system that includes the Xv6 kernel and its utility programs. The utility programs are placed on a disk image such that when Xv6 bootstraps, the files are available to the Xv6 kernel, the Xv6 shell, and the Xv6 utility programs. The Xv6 system bootstraps into the qemu emulator, emulating a RISC-V CPU, a timeer interrupt, a terminal device, and a disk/SSD device.
Utility programs are added to the Xv6 system by creating a .c program in the user directory and adding the name of the program to the UPROGS variable in the Makefile. For example, when adding the kittycat utility program, you place the program's file (kittycat.c) in the user directory and add $U/_kittycat\ to the UPROGS variable in the Makefile. When you have done this, make qemu will compile your program, include it in the resulting system, and you will be able to run your program from the Xv6 shell.
For this lab, I have provided a Makefile where the variable UPROGS has all of the lab's utility programs. In future labs, if you add a new utility program, you will have to remember to update the variable UPROGS
You will notice that the hints have words such as "Ensure your kittycat program has been added UPROGS in Makefile." This has already been done, but you should examine the Makefile to see how it has been done.
You can do these labs on our CPSC server or on your own computer. If you use your own computer, have a look at the lab tools page for setup tips.
1. What does it mean to boot (which is short for bootstrap) a computer?
Fetch the git repository for the xv6 source for the lab:
$ git clone https://github.com/gustycooper/xv6-labs.git Cloning into 'xv6-labs'... ... $ cd xv6-labs
The repo is setup so that git checkouts the riscv branch when cloning the repo. You must switch to the util branch to conduct this lab.
$ git status On branch riscv Your branch is up to date with 'origin/riscv'. nothing to commit, working tree clean $ git checkout util
The xv6-labs repository differs slightly from the book's xv6-riscv; it mostly adds some files. If you are curious look at the git log:
$ git log
The files you will need for this and subsequent lab assignments are distributed using the Git version control system. For each of the labs you will checkout (git checkout util) a version of xv6 tailored for that lab. To learn more about Git, take a look at the Git user's manual, or, you may find this CS-oriented overview of Git useful. Git allows you to keep track of the changes you make to the code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:
$ git commit -am 'my solution for util lab exercise 1' Created commit 60d2135: my solution for util lab exercise 1 1 files changed, 1 insertions(+), 0 deletions(-) $
You can keep track of your changes by using the git diff command. Running git diff will display the changes to your code since your last commit, and git diff origin/util will display the changes relative to the initial util code. Here, origin/util is the name of the git branch with the initial code you downloaded for the class.
Build and run xv6:
2. The Makefile has many targets. Locate the target qemu for the command make qemu and explain what it does.
$ make qemu riscv64-unknown-elf-gcc -c -o kernel/entry.o kernel/entry.S riscv64-unknown-elf-gcc -Wall -Werror -O -fno-omit-frame-pointer -ggdb -DSOL_UTIL -MD -mcmodel=medany -ffreestanding -fno-common -nostdlib -mno-relax -I. -fno-stack-protector -fno-pie -no-pie -c -o kernel/start.o kernel/start.c ... riscv64-unknown-elf-ld -z max-page-size=4096 -N -e main -Ttext 0 -o user/_zombie user/zombie.o user/ulib.o user/usys.o user/printf.o user/umalloc.o riscv64-unknown-elf-objdump -S user/_zombie > user/zombie.asm riscv64-unknown-elf-objdump -t user/_zombie | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$/d' > user/zombie.sym mkfs/mkfs fs.img README user/xargstest.sh user/_cat user/_echo user/_forktest user/_grep user/_init user/_kill user/_ln user/_ls user/_mkdir user/_rm user/_sh user/_stressfs user/_usertests user/_grind user/_wc user/_zombie nmeta 46 (boot, super, log blocks 30 inode blocks 13, bitmap blocks 1) blocks 954 total 1000 balloc: first 591 blocks have been allocated balloc: write bitmap block at sector 45 qemu-system-riscv64 -machine virt -bios none -kernel kernel/kernel -m 128M -smp 3 -nographic -drive file=fs.img,if=none,format=raw,id=x0 -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0 xv6 kernel is booting hart 2 starting hart 1 starting init: starting sh $
If you type ls at the prompt, you should see output similar to the following:
$ ls . 1 1 1024 .. 1 1 1024 README 2 2 508 kittycat.txt 2 3 597 xargstest.sh 2 4 93 cat 2 5 32848 echo 2 6 31696 forktest 2 7 15832 grep 2 8 36224 init 2 9 32200 kill 2 10 31664 ln 2 11 31480 ls 2 12 34792 mkdir 2 13 31712 rm 2 14 31704 sh 2 15 54144 stressfs 2 16 32592 usertests 2 17 180488 grind 2 18 47536 wc 2 19 33800 zombie 2 20 31064 kittycat 2 21 32864 sleep 2 22 31208 pingpong 2 23 31224 primes 2 24 31208 find 2 25 31208 xargs 2 26 31208 console 3 27 0These are the files that mkfs includes in the initial file system; most are programs you can run. You just ran one of them: ls. Several of the files are templates that you must design, edit, and test as part of this lab. Here are a couple more commands for you to enter.
$ echo kittycat kittycat $ cat kittycat.txt This is the text in the file kittycat. Soft kitty, warm kitty, little ball of fur! Happy kitty, sleepy kitty, purr purr purr! Xv6 kitty, Xv6 kitty, grrr, grrr, grrr! And the cat's in the cradle and the silver spoon Little boy blue and the man in the moon "When you coming home, dad?" "I don't know when" But we'll get together then You know we'll have a good time then A long tailed cat sitting by the old rocking chair Now he don't realize that there's a danger there But he don't care no rock 'n' roll chair is gonna boogie on his day 'Cause when his tail took a low down, syncopate, yeah
xv6 has no ps command, but, if you type Ctrl-p, the kernel will print information about each process. If you try it now, you'll see two lines: one for init, and one for sh.
To quit qemu type: Ctrl-a x (press Ctrl and a at the same time, followed by x).
Implement the UNIX program kittycat for xv6; your kittycat mimics the UNIX cat program, but it only concatenates the file kittycat.txt to standard output. Invoking kittycat with an argument results in a error. See shell commands below for sample kittycat sessions. Your solution should be in the file user/kittycat.c.
Run the program from the xv6 shell:
$ make qemu ... init: starting sh $ kittycat This is the text in the file kittycat. Soft kitty, warm kitty, little ball of fur! Happy kitty, sleepy kitty, purr purr purr! Xv6 kitty, Xv6 kitty, grrr, grrr, grrr! And the cat's in the cradle and the silver spoon Little boy blue and the man in the moon "When you coming home, dad?" "I don't know when" But we'll get together then You know we'll have a good time then A long tailed cat sitting by the old rocking chair Now he don't realize that there's a danger there But he don't care no rock 'n' roll chair is gonna boogie on his day 'Cause when his tail took a low down, syncopate, yeah $ kittycat gusty kittycat: No args allowed
3. What are three songs and artists in the kittycat printout? For the "Soft kitty" song, instead of an artist, which in which TV show was it used and by which character?
After you have implemented your kittycat.c, you must test it, which in this case is rather easy. Here is what I would do to test kittycat.c
Implement the UNIX program sleep for xv6; your sleep should pause for a user-specified number of ticks. A tick is a notion of time defined by the xv6 kernel, namely the time between two interrupts from the timer chip. Your solution should be in the file user/sleep.c. When running sleep, you must pass an integer argument that is the number of ticks to sleep. If an argument is not pass, your sleep must print an error message.
$ sleep 10 $ sleep Usage: sleep <ticks>
#include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" int main(int argc, char *argv[]) { if(argc < 2){ fprintf(2, "Usage: sleep\n"); exit(1); } sleep(atoi(argv[1])); exit(0); }
4. What is the type of argv[1] and why is it an appropriate argument for atoi()?
Run the program from the xv6 shell:
$ make qemu ... init: starting sh $ sleep 10 (nothing happens for a little while) $
5. Use the process state diagram to explain how you think the OS implements a sleeping process.
#include "kernel/types.h" #include "user/user.h" int main() { int status; int pid = fork(); if(pid == 0){ char *argv[] = { "echo", "THIS", "IS", "ECHO", 0 }; exec("echo", argv); printf("exec failed!\n"); exit(1); } else { printf("parent waiting\n"); wait(&status); printf("the child exited with status %d\n", status); } exit(0); }
int main(int argv, char **argv)If the argv parameter is 2, then you will pass argv[1] as the second element in the second argument passed to exec()
10 20 30
#include "kernel/types.h" #include "kernel/fcntl.h" #include "user/user.h" int main(int argc, char **argv) { char buf[3]; int fd; if (argc == 2) fd = open(argv[1], O_RDONLY); else fd = open("nums.txt", O_RDONLY); if (fd < 0) { printf("file not found\n"); exit(1); } printf("Compute sum of 2 digit numbers in a file.\n"); int n, sum = 0, count = 1; while ((n = read(fd, buf, 3)) == 3) { printf("----------\nNumber%d: ", count); write(1, buf, n); printf("buf[0]: 0x%x, buf[1]: 0x%x, buf[2]: 0x%x\n", buf[0], buf[1], buf[2]); printf("buf[0]: %c, buf[1]: %c, buf[2]: %c", buf[0], buf[1], buf[2]); buf[2] = 0; int num = atoi(buf); printf("num: %d\n", num); sum += num; count++; } printf("----------\nsum: %d\n", sum); exit(0); }
0x31 0x30 0xA 0x32 0x30 0xA 0x33 0x30 0xA0x30, 0x31, 0x32, and 0x33 are the hex values for ASCII 0, 1, 2, and 3. And 0xA is the hex value for an ASCII Carriage Return (CR). The first three bytes of the file are ASCII 1, ASCII 0, and ASCII CR, which is 10\n.
You can test your numbers program by running it as follows.
$ numbers Compute sum of 2 digit numbers in a file. ---------- Number1: 10 buf[0]: 0x31, buf[1]: 0x30, buf[2]: 0xA buf[0]: 1, buf[1]: 0, buf[2]: num: 10 ---------- Number2: 20 buf[0]: 0x32, buf[1]: 0x30, buf[2]: 0xA buf[0]: 2, buf[1]: 0, buf[2]: num: 20 ---------- Number3: 30 buf[0]: 0x33, buf[1]: 0x30, buf[2]: 0xA buf[0]: 3, buf[1]: 0, buf[2]: num: 30 ---------- sum: 60 $ numbers not file not found $ numbers kittycat.txt Compute sum of 2 digit numbers in a file. ---------- Number1: Thibuf[0]: 0x54, buf[1]: 0x68, buf[2]: 0x69 buf[0]: T, buf[1]: h, buf[2]: inum: 0 ... lots printed to display ... ---------- Number199: ah buf[0]: 0x61, buf[1]: 0x68, buf[2]: 0xA buf[0]: a, buf[1]: h, buf[2]: num: 0 ---------- sum: 6
Write a program, pingpoing that uses UNIX system calls to ''ping-pong'' the arguments passed to main() between a parent and child process over a pair of pipes, one for each direction.
// Sample program: send/receive a byte #include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" int main(int argc, char *argv[]) { int p1[2], p2[2]; pipe(p1); // create pipe p1 pipe(p2); // create pipe p2 int pid = fork(); if(pid == 0){ close(p1[1]); char buf[1]; read(p1[0], buf, sizeof(buf)); printf("%d: received ping\n", getpid()); close(p1[0]); close(p2[0]); write(p2[1], "o", 1); close(p2[1]); } else if(pid > 0){ close(p1[0]); write(p1[1], "i", 1); close(p1[1]); wait((int*) 0); char buff[1]; read(p2[0], buff, sizeof(buff)); printf("%d: received pong\n", getpid()); close(p2[0]); } else{ fprintf(2, "fork error!\n"); exit(1); } exit(0); }
If you first run the sample program, it should produce the following output:
$ make qemu ... init: starting sh $ pingpong 4: received ping 3: received pong $
Your solution is correct if you can get it to produce output like the following.
$ make qemu ... init: starting sh $ pingpong gusty cooper 4: received gusty 3: received cooper $
6. Explain in your own words what a Xv6/Linux pipe is - not a pipe used for smoking.
7. Notice how the I/O functions read() and write() are used for reading/writing both files and pipes. The open() function and the pipe() function both return a file descriptor. Explain in your own words what is the differenct between reading/writing pipes and files.
Write a concurrent version of prime sieve using pipes. This idea is due to Doug McIlroy, inventor of Unix pipes. The picture halfway down this page and the surrounding text explain how to do it. Your solution should be in the file user/primes.c.
Your goal is to use pipe and fork to set up the pipeline. The first process feeds the numbers 2 through 35 into the pipeline. For each prime number, you will arrange to create one process that reads from its left neighbor over a pipe and writes to its right neighbor over another pipe. Since xv6 has limited number of file descriptors and processes, the first process can stop at 35.
#include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" void primes(int lpipe[2]) { close(lpipe[1]); int first, num; int n = read(lpipe[0], &first, sizeof(int)); if(n != sizeof(int)) exit(0); printf("prime %d\n", first); int rpipe[2]; pipe(rpipe); while(read(lpipe[0], &num, sizeof(int))){ if(num % first != 0) write(rpipe[1], &num, sizeof(int)); } close(lpipe[0]); close(rpipe[1]); if(fork() == 0){ primes(rpipe); } else { close(rpipe[0]); wait(0); } exit(0); } int main(int argc, char *argv[]) { int p[2]; pipe(p); for (int i=2; i<=35; i++) { write(p[1], &i, sizeof(int)); } if(fork()==0){ primes(p); } else { close(p[0]); close(p[1]); wait(0); } exit(0); }
8. There is a lot going on in this solution code. One way to better understand it is to compare the picture halfway down this page to the solution code. Each box in the picture corresponds to process (created with fork()) in the solution code. The process is reading numbers from the left pipe and writing numbers to the right pipe. For example, the first primes() process reads the numbers 2 through 35. It prints the number two. Removes numbers even divisible by two and writes the remaining numbers to the right pipe. How many processes are created for this solution?
Your solution is correct if it implements a pipe-based sieve and produces the following output:
$ make qemu ... init: starting sh $ primes prime 2 prime 3 prime 5 prime 7 prime 11 prime 13 prime 17 prime 19 prime 23 prime 29 prime 31 $
9. Do you think there is a largest prime number such that it is the last prime number? What are the number of digits in the largest prime number discovers so far? When was it discovered?
Write a simple version of the UNIX find program: find all the files
in a directory tree with a specific name. Your solution
should be in the file user/find.c.
The find program has two arguments - the path in which to begin searching
and the filename for which to search. For example,
$ find . b
searches all subdirectories of the current directory for the filenamed b.
#include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" #include "kernel/fs.h" #include "kernel/fcntl.h" void find(char *path, char *name) { char buf[512], *p; int fd; struct dirent de; struct stat st; if((fd = open(path, O_RDONLY)) < 0){ fprintf(2, "find: cannot open %s\n", path); return; } if(fstat(fd, &st) < 0){ fprintf(2, "find: cannot stat %s\n", path); close(fd); return; } if(st.type != T_DIR){ fprintf(2, "find: first param must be directory\n"); close(fd); return; } strcpy(buf, path); p = buf + strlen(buf); *p++ = '/'; while(read(fd, &de, sizeof(de)) == sizeof(de)){ if(de.inum == 0) continue; memmove(p, de.name, DIRSIZ); p[DIRSIZ] = 0; if(stat(buf, &st) < 0){ printf("find: cannot stat %s\n", buf); continue; } // core code if(st.type==T_DIR && strcmp(p, ".")!=0 && strcmp(p, "..")!=0){ find(buf, name); }else if(st.type==T_FILE && strcmp(p, name)==0){ printf("%s\n", buf); } } close(fd); } int main(int argc, char *argv[]) { if(argc != 3){ fprintf(2, "Usage: find <dir> <file>\n"); exit(1); } find(argv[1], argv[2]); exit(0); }
10. What is the condition that causes find() to make its recursive call?
$ ls . # list the files in the current working directory $ ls .. # list the files in the parent directoryNotice how . and .. are files in the current working directory that have type T_DIR (type directory). Notice how the algorithm does not recurse into the . and .. files.
11. What would happen if the algorithm recursed into the . file?
Your solution is correct if produces the following output (when the file system contains the files b, a/b and a/aa/b).
$ make qemu ... init: starting sh $ echo > b $ mkdir a $ echo > a/b $ mkdir a/aa $ echo > a/aa/b $ find . b ./b ./a/b ./a/aa/b $
12. What does the Linux command $ echo > b do?
Write a simple version of the UNIX xargs program: its arguments describe a command to run, it reads lines from the standard input, and it runs the command for each line, appending the line to the command's arguments. Your solution should be in the file user/xargs.c.
$ echo hello too | xargs echo bye bye hello too $Note that the command here is "echo bye" and the additional arguments are "hello too", making the command "echo bye hello too", which outputs "bye hello too".
Please note that xargs on UNIX makes an optimization where it will feed more than argument to the command at a time. We don't expect you to make this optimization. To make xargs on UNIX behave the way we want it to for this lab, please run it with the -n option set to 1. For instance
$ (echo 1 ; echo 2) | xargs -n 1 echo 1 2 $
$ echo hello > b [1] $ cat b [2] hello $ echo 12README > a [3] $ echo b >> a [4] $ cat a [5] b README $ ls a b README [6] a 2 28 9 b 2 27 6 README 2 2 2227 $ lss a b README [7] a b README $ lss b README [8] b README $ wc b [9] 1 1 6 b $ wc README [10] 47 316 2227 README $ cat a | xargs wc [11] 1 1 6 b 47 316 2227 README $ lss b README | xargs wc [12] 1 1 6 b 47 316 2227 README
$ wc b $ wc README
$ mkdir a [1] $ echo hello > a/b [2] $ mkdir c [3] $ echo hello > c/b [4] $ echo hello > b [5] $ find . b [6] ./b ./a/b ./c/b $ grep hello b [7] hello $ grep hello ./b [8] hello $ grep hello a/b [9] hello $ grep hello ./a/b [10] hello $ find . b | xargs grep hello [11] hello hello hello
$ grep hello ./b $ grep hello ./a/b $ grep hello ./c/b
xargs is the parent process and it will use fork and exec to invoke the command on each line of input. The exec-ed command is the child process. xargs will use wait to wait for the child to complete the command.
Process args passed to xargs Make sure it has at least one argument. For example, $ xargs wc Create a char* array (e.g. xargv) for arguments to the exec call Put the first arg passed to xargs in xargv[0] The read loop will put values in xargv[1], xargv[2], ... READLOOP: Read 1 character from standard input If the read returns 0, you exit READLOOP If the character read is \n if (fork() == 0) call exec with xargv else wait else if the character read is a space move to next element of xargv else concatenate character onto string to put into xargv GOTO READLOOP
#include "kernel/types.h" #include "kernel/param.h" #include "user/user.h" void clear(char *xargv[MAXARG], int start){ for(int i=start; i<MAXARG; i++){ xargv[i] = 0; } } int main(int argc, char *argv[]) { if(argc < 2){ fprintf(2, "Usage: xarg <cmd>\n"); exit(1); } if(argc > MAXARG){ fprintf(2, "xargs: too many arguments\n"); exit(1); } char *xargv[MAXARG] = {0}; int xargc = 0; for(; xargc+1 < argc; xargc++){ xargv[xargc] = argv[xargc+1]; } char buf[1024], c; int p = 0, start = 0; while(read(0, &c, 1)){ switch (c){ case ' ': buf[p] = 0; xargv[xargc++] = &buf[start]; start = p; break; case '\n': buf[p] = 0; xargv[xargc++] = &buf[start]; if(fork() == 0){ exec(argv[1], xargv); } else { wait(0); } xargc = argc - 1; clear(xargv, xargc); p = 0; start = 0; break; default: buf[p++] = c; break; } } exit(0); }
To test your solution for xargs, run the seqence of commands provided in the hints section. The shell script xargstest.sh has been provided. Its contents are the following.
mkdir a echo hello > a/b mkdir c echo hello > c/b echo hello > b find . b | xargs grep hello
The program sh is the shell. You can use input redirection to redirect the contents xargstest.sh to the shell as shown next. You output should match this.
$ make qemu ... init: starting sh $ sh < xargstest.sh $ $ $ $ $ $ hello hello hello $ $You may have to go back and fix bugs in your find program. The output has many $ because the xv6 shell doesn't realize it is processing commands from a file instead of from the console, and prints a $ for each command in the file.
13. The testing has a command $ sh < xargstest.sh. What does this command do? What is the sh program?
14. Perform the following.
15. What I find amazing is that the basic Unix/Linux API has not changed in 60 years. Now that you have written programs with pipes, forks, execs, and I/O redirection, what do you think about the Linux API.
16. Without looking up functions, list as many Linux API functions that you can think of.
>
This completes the lab.
Read Lab Submissions for instructions on how
to submit your lab.
Submit the lab