Lab: Xv6 and Unix utilities

This lab will familiarize you with xv6 and its system calls. You will learn how to create Unix/Linux utility programs using the Unix/Linux API. The Xv6 API is a simplfied Linux API. For example, the Xv6 API has a single exec function call to execute a new program within a process, whereas, Linux has a family of exec calls (ie., execp, execl, execvp, execlp).

You will create several Unix/Linux utility programs. All of the programs have a starting .c program that you may have to edit. For example, there is a kittycat.c program. You must design the code, edit the program, and test that it satisfies the requirements. However, I have provided the solution code to many of the problems. For those problems, you should study the provided solution code so that you understand it. Then your editing will consist of placing the solution code into the appropriate .c file.

Before you begin lab system calls, read Chapter 1 of the xv6 book.

In this lab, there are several questions for you to answer. Questions are in boxes with a light orange background. Write each question and its answer in your notebook. Take photo(s) of your questions/answers and submit the photo(s) on Canvas.

The Linux grep command can be helpful on C programming using Linux with a terminal and on some questions. For example, suppose a question asks you about the struct proc. You can discover the definition and uses of the struct proc by issuing the following Linux grep command in the kernel directory.

$ grep "struct proc" *.h
proc.h:struct proc {
... lots of matches
proc.h:struct proc *find_proc(int pid);
proc.h:bool enqueue_proc(struct proc *p);
% grep "struct proc" *.c
main.c:        struct proc *p = NULL;
... lots of matches
proc.c:    struct proc *p = kernel_proc;
proc.c:    struct proc *p = new_proc(name, priority, ppid);

In the directory of your xv6-labs, create two files: answers-util.txt and time.txt that I may use if I run your code using your zip file submission. The answers-util.txt is a blank file. The time.txt file contains the number of hours you spent on the lab.

$ echo > answers-util.txt
$ echo 8 > time.txt

Xv6 Utility Programs

When you make Xv6, you are building a system that includes the Xv6 kernel and its utility programs. The utility programs are placed on a disk image such that when Xv6 bootstraps, the files are available to the Xv6 kernel, the Xv6 shell, and the Xv6 utility programs. The Xv6 system bootstraps into the qemu emulator, emulating a RISC-V CPU, a timeer interrupt, a terminal device, and a disk/SSD device.

Utility programs are added to the Xv6 system by creating a .c program in the user directory and adding the name of the program to the UPROGS variable in the Makefile. For example, when adding the kittycat utility program, you place the program's file (kittycat.c) in the user directory and add $U/_kittycat\ to the UPROGS variable in the Makefile. When you have done this, make qemu will compile your program, include it in the resulting system, and you will be able to run your program from the Xv6 shell.

For this lab, I have provided a Makefile where the variable UPROGS has all of the lab's utility programs. In future labs, if you add a new utility program, you will have to remember to update the variable UPROGS

You will notice that the hints have words such as "Ensure your kittycat program has been added UPROGS in Makefile." This has already been done, but you should examine the Makefile to see how it has been done.

Problem 1: Boot xv6

You can do these labs on our CPSC server or on your own computer. If you use your own computer, have a look at the lab tools page for setup tips.

1. What does it mean to boot (which is short for bootstrap) a computer?

Fetch the git repository for the xv6 source for the lab:

$ git clone https://github.com/gustycooper/xv6-labs.git
Cloning into 'xv6-labs'...
...
$ cd xv6-labs

The repo is setup so that git checkouts the riscv branch when cloning the repo. You must switch to the util branch to conduct this lab.

$ git status
On branch riscv
Your branch is up to date with 'origin/riscv'.

nothing to commit, working tree clean
$ git checkout util

The xv6-labs repository differs slightly from the book's xv6-riscv; it mostly adds some files. If you are curious look at the git log:

$ git log

The files you will need for this and subsequent lab assignments are distributed using the Git version control system. For each of the labs you will checkout (git checkout util) a version of xv6 tailored for that lab. To learn more about Git, take a look at the Git user's manual, or, you may find this CS-oriented overview of Git useful. Git allows you to keep track of the changes you make to the code. For example, if you are finished with one of the exercises, and want to checkpoint your progress, you can commit your changes by running:

$ git commit -am 'my solution for util lab exercise 1'
Created commit 60d2135: my solution for util lab exercise 1
 1 files changed, 1 insertions(+), 0 deletions(-)
$

You can keep track of your changes by using the git diff command. Running git diff will display the changes to your code since your last commit, and git diff origin/util will display the changes relative to the initial util code. Here, origin/util is the name of the git branch with the initial code you downloaded for the class.

Build and run xv6:

2. The Makefile has many targets. Locate the target qemu for the command make qemu and explain what it does.

$ make qemu
riscv64-unknown-elf-gcc    -c -o kernel/entry.o kernel/entry.S
riscv64-unknown-elf-gcc -Wall -Werror -O -fno-omit-frame-pointer -ggdb -DSOL_UTIL -MD -mcmodel=medany -ffreestanding -fno-common -nostdlib -mno-relax -I. -fno-stack-protector -fno-pie -no-pie   -c -o kernel/start.o kernel/start.c
...
riscv64-unknown-elf-ld -z max-page-size=4096 -N -e main -Ttext 0 -o user/_zombie user/zombie.o user/ulib.o user/usys.o user/printf.o user/umalloc.o
riscv64-unknown-elf-objdump -S user/_zombie > user/zombie.asm
riscv64-unknown-elf-objdump -t user/_zombie | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$/d' > user/zombie.sym
mkfs/mkfs fs.img README  user/xargstest.sh user/_cat user/_echo user/_forktest user/_grep user/_init user/_kill user/_ln user/_ls user/_mkdir user/_rm user/_sh user/_stressfs user/_usertests user/_grind user/_wc user/_zombie
nmeta 46 (boot, super, log blocks 30 inode blocks 13, bitmap blocks 1) blocks 954 total 1000
balloc: first 591 blocks have been allocated
balloc: write bitmap block at sector 45
qemu-system-riscv64 -machine virt -bios none -kernel kernel/kernel -m 128M -smp 3 -nographic -drive file=fs.img,if=none,format=raw,id=x0 -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0

xv6 kernel is booting

hart 2 starting
hart 1 starting
init: starting sh
$

If you type ls at the prompt, you should see output similar to the following:

$ ls
.              1 1 1024
..             1 1 1024
README         2 2 508
kittycat.txt   2 3 597
xargstest.sh   2 4 93
cat            2 5 32848
echo           2 6 31696
forktest       2 7 15832
grep           2 8 36224
init           2 9 32200
kill           2 10 31664
ln             2 11 31480
ls             2 12 34792
mkdir          2 13 31712
rm             2 14 31704
sh             2 15 54144
stressfs       2 16 32592
usertests      2 17 180488
grind          2 18 47536
wc             2 19 33800
zombie         2 20 31064
kittycat       2 21 32864
sleep          2 22 31208
pingpong       2 23 31224
primes         2 24 31208
find           2 25 31208
xargs          2 26 31208
console        3 27 0

These are the files that mkfs includes in the initial file system; most are programs you can run. You just ran one of them: ls. Several of the files are templates that you must design, edit, and test as part of this lab. Here are a couple more commands for you to enter.

$ echo kittycat
kittycat
$ cat kittycat.txt
This is the text in the file kittycat.

Soft kitty, warm kitty, 
little ball of fur! 
Happy kitty, sleepy kitty, 
purr purr purr!
Xv6 kitty, Xv6 kitty,
grrr, grrr, grrr!

And the cat's in the cradle and the silver spoon
Little boy blue and the man in the moon
"When you coming home, dad?" "I don't know when"
But we'll get together then
You know we'll have a good time then

A long tailed cat sitting by the old rocking chair
Now he don't realize that there's a danger there
But he don't care no rock 'n' roll chair is gonna boogie on his day
'Cause when his tail took a low down, syncopate, yeah

xv6 has no ps command, but, if you type Ctrl-p, the kernel will print information about each process. If you try it now, you'll see two lines: one for init, and one for sh.

To quit qemu type: Ctrl-a x (press Ctrl and a at the same time, followed by x).

Problem 2: kittycat

Implement the UNIX program kittycat for xv6; your kittycat mimics the UNIX cat program, but it only concatenates the file kittycat.txt to standard output. Invoking kittycat with an argument results in a error. See shell commands below for sample kittycat sessions. Your solution should be in the file user/kittycat.c.

Some Hints

Before you start coding, read Chapter 1 of the xv6 book.
The code in user/kittycat.c is a duplicate of user/cat.c. You should study the code in user/kittycat.c to understand how the UNIX cat program works. cat places the contents of a file (or multiple files) to standard output. You can redirect cat's output or pipe it to another UNIX program.
Look at some of the other programs in user/ (e.g., user/echo.c, user/grep.c, and user/rm.c) to see how you can obtain the command-line arguments passed to a program.
If the user passes an argument, kittycat should print the error message: kittycat: No args allowed.
Ensure your kittycat program has been added UPROGS in Makefile. The Makefile variable UPROGS defines all of the utility (or user) programs that are included in the Xv6 system that the Makefile builds. Once you have updated UPROGS to include kittycat, make qemu will compile your kittycat program and you'll be able to run it from the xv6 shell.

Run the program from the xv6 shell:

      $ make qemu
      ...
      init: starting sh
      $ kittycat
      This is the text in the file kittycat.

      Soft kitty, warm kitty, 
      little ball of fur! 
      Happy kitty, sleepy kitty, 
      purr purr purr!
      Xv6 kitty, Xv6 kitty,
      grrr, grrr, grrr!

      And the cat's in the cradle and the silver spoon
      Little boy blue and the man in the moon
      "When you coming home, dad?" "I don't know when"
      But we'll get together then
      You know we'll have a good time then

      A long tailed cat sitting by the old rocking chair
      Now he don't realize that there's a danger there
      But he don't care no rock 'n' roll chair is gonna boogie on his day
      'Cause when his tail took a low down, syncopate, yeah

      $ kittycat gusty
      kittycat: No args allowed

3. What are three songs and artists in the kittycat printout? For the "Soft kitty" song, instead of an artist, which in which TV show was it used and by which character?

Testing

After you have implemented your kittycat.c, you must test it, which in this case is rather easy. Here is what I would do to test kittycat.c

$ cat kittycat.txt - use cat to show what is in the kittycat.txt file.
$ kittycat - run kittycat and verify that it displays the same information as the previous cat command.
$ kittycat argument - run kittycat with an argument and verify that it displays the no args allowed message.

Problem 3: sleep

Implement the UNIX program sleep for xv6; your sleep should pause for a user-specified number of ticks. A tick is a notion of time defined by the xv6 kernel, namely the time between two interrupts from the timer chip. Your solution should be in the file user/sleep.c. When running sleep, you must pass an integer argument that is the number of ticks to sleep. If an argument is not pass, your sleep must print an error message.

$ sleep 10
$ sleep
Usage: sleep <ticks>

Some Hints

Before you start coding, read Chapter 1 of the xv6 book.
Look at some of the other programs in user/ (e.g., user/echo.c, user/grep.c, and user/rm.c) to see how you can obtain the command-line arguments passed to a program.
If the user forgets to pass an argument, sleep should print an error message.
The command-line argument is passed as a string; you can convert it to an integer using atoi (see user/ulib.c).
Use the system call sleep.
See kernel/sysproc.c for the xv6 kernel code that implements the sleep system call (look for sys_sleep), user/user.h for the C definition of sleep callable from a user program, and user/usys.S for the assembler code that jumps from user code into the kernel for sleep.
main should call exit(0) when it is done.
Ensure your sleep program has been added UPROGS in Makefile; once you've done that, make qemu will compile your program and you'll be able to run it from the xv6 shell.
Look at Kernighan and Ritchie's book The C programming language (second edition) (K&R) to learn about C.

sleep.c solution code.


#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
  if(argc < 2){
    fprintf(2, "Usage: sleep \n");
    exit(1);
  }
  sleep(atoi(argv[1]));
  exit(0);
}

4. What is the type of argv[1] and why is it an appropriate argument for atoi()?

Testing

Run the program from the xv6 shell:

      $ make qemu
      ...
      init: starting sh
      $ sleep 10
      (nothing happens for a little while)
      $

5. Use the process state diagram to explain how you think the OS implements a sleeping process.

Problem 4: forkexec (fork and exec)

Write a program that calls fork and exec to execute the ls program as a child process.

$ forkexec - when your program is run without an argument, it calls ls to display all files.
$ forkexec kittycat - when your program is run with an argument, it calls ls, passing the argument to ls. In this example, ls will display the file kittycat.

When a process calls fork(), it results in a duplicate of the calling process. You now have two processes. The return value from fork is used to determine if you are the parent or child process. A return value of 0 indicates you are the child process. A return value of nonzero indicates you are the parent process.

Some Hints

The following code demonstrates a child process calling fork and exec to run the echo program.

#include "kernel/types.h"
#include "user/user.h"
int main() {
  int status;
  int pid = fork();
  if(pid == 0){
    char *argv[] = { "echo", "THIS", "IS", "ECHO", 0 };
    exec("echo", argv);
    printf("exec failed!\n");
    exit(1);
  } else {
    printf("parent waiting\n");
    wait(&status);
    printf("the child exited with status %d\n", status);
  }
  exit(0);
}

Notice that the first argument passed to exec() is the name of the program to execute.
Notice that the second argument passed to exec() is an array of pointers to strings. By convention the first element of the array points to the name of the program to execute. The remaining elements are pointers to the arguments passed to the program.
Notice how when calling exec(), it never returns if it is successful. The program that is executed overwrites the child process.
Notice how the parent wait()'s for the child to terminate.
Your main must have parameters as shown below.
```
int main(int argv, char **argv)
```
If the argv parameter is 2, then you will pass argv[1] as the second element in the second argument passed to exec()

Problem 5: File Reading

Write a program, numbers, that uses the system call read to read a file with numbers and computes the sum of those numbers.

To simplify the problem, the input file nums.txt has a sequence of lines. Each line contains a two-digit number.
```
10
20
30
```

Some Hints

numbers.c solution code.


#include "kernel/types.h"
#include "kernel/fcntl.h"
#include "user/user.h"

int
main(int argc, char **argv)
{
  char buf[3];
  int fd;
  if (argc == 2)
    fd = open(argv[1], O_RDONLY);
  else
    fd = open("nums.txt", O_RDONLY);
  if (fd < 0) {
    printf("file not found\n");
    exit(1);
  }
  printf("Compute sum of 2 digit numbers in a file.\n");
  int n, sum = 0, count = 1;
  while ((n = read(fd, buf, 3)) == 3) {
    printf("----------\nNumber%d: ", count);
    write(1, buf, n);
    printf("buf[0]: 0x%x, buf[1]: 0x%x, buf[2]: 0x%x\n", buf[0], buf[1], buf[2]);
    printf("buf[0]: %c, buf[1]: %c, buf[2]: %c", buf[0], buf[1], buf[2]);
    buf[2] = 0;
    int num = atoi(buf);
    printf("num: %d\n", num);
    sum += num;
    count++;
  }
  printf("----------\nsum: %d\n", sum);
  exit(0);
}

The Unix I/O system calls include open, close, read, and write. These are different from the C runtime I/O calls such as scanf, printf, fopen, etc. The C runtime I/O functions eventually call the Unix I/O system calls because those are the only functions that access the devices to perform I/O. You can see this by reading the file kernel/printf.c, which contains C runtime I/O functions.
A file is a sequence of bytes stored on a disk/SSD. In a text file, each byte is a 7-bit ASCII character, which allows it to be a readable file.
If you examined the details of the num.txt file shown above in the blue box, you would discover that it is a sequence of 9 bytes that are the following.
```
0x31 0x30 0xA 0x32 0x30 0xA 0x33 0x30 0xA
```
0x30, 0x31, 0x32, and 0x33 are the hex values for ASCII 0, 1, 2, and 3. And 0xA is the hex value for an ASCII Carriage Return (CR). The first three bytes of the file are ASCII 1, ASCII 0, and ASCII CR, which is 10\n.
The while loop reads 3 bytes at a time. Each read of 3 bytes consumes a "line", which is a 2-digit number and a CR.
The 3 bytes are read into the character buffer buf.
Once in buf, the data is printed in various ways.
- write(1, buf, n); - this writes 3 bytes to standard out, which is file descriptor 1.
- The elements of buf are printed as hex values using %x.
- The elemenst of buf are printed as char values using %c.
- The ASCII in buf is converted to an integer using the atoi function.
- The converted integer is printed as an integer using %d.
Notice how open returns an integer file descriptor, which is used by read as its first parameter. The other parameters to read are the buffer in which to read data, and the number of bytes to read. The write function is similar to read.

Testing

You can test your numbers program by running it as follows.

$ numbers
Compute sum of 2 digit numbers in a file.
----------
Number1: 10
buf[0]: 0x31, buf[1]: 0x30, buf[2]: 0xA
buf[0]: 1, buf[1]: 0, buf[2]: 
num: 10
----------
Number2: 20
buf[0]: 0x32, buf[1]: 0x30, buf[2]: 0xA
buf[0]: 2, buf[1]: 0, buf[2]: 
num: 20
----------
Number3: 30
buf[0]: 0x33, buf[1]: 0x30, buf[2]: 0xA
buf[0]: 3, buf[1]: 0, buf[2]: 
num: 30
----------
sum: 60
$ numbers not
file not found
$ numbers kittycat.txt
Compute sum of 2 digit numbers in a file.
----------
Number1: Thibuf[0]: 0x54, buf[1]: 0x68, buf[2]: 0x69
buf[0]: T, buf[1]: h, buf[2]: inum: 0
... lots printed to display ...
----------
Number199: ah
buf[0]: 0x61, buf[1]: 0x68, buf[2]: 0xA
buf[0]: a, buf[1]: h, buf[2]: 
num: 0
----------
sum: 6

Problem 6: pingpong

Write a program, pingpoing that uses UNIX system calls to ''ping-pong'' the arguments passed to main() between a parent and child process over a pair of pipes, one for each direction.

The parent shall send the first argument to the child;
The child shall print "<pid>: received "<arg1>", where <pid> is its process ID and <arg> is the argument received.
After printing, the child shall write the second argument on the pipe to the parent, and exit;
The parent shall read the argument from the child, print "<pid>: received "<arg>", and exit.
Your solution shall be in the file user/pingpong.c.

Some Hints

Use pipe to create a pipe.
Use fork to create a child.
Use read to read from a pipe, and write to write to a pipe.
Use getpid to find the process ID of the calling process.
Ensure your pingpong program has been added UPROGS in Makefile; once you've done that, make qemu will compile your program and you'll
User programs on xv6 have a limited set of library functions available to them. You can see the list in user/user.h; the source (other than for system calls) is in user/ulib.c, user/printf.c, and user/umalloc.c.

The following is a version of pingpong.c that sends and received a single byte. You can modify this to send and receive arguments.


// Sample program: send/receive a byte
#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"
int main(int argc, char *argv[]) {
  int p1[2], p2[2];
  pipe(p1); // create pipe p1
  pipe(p2); // create pipe p2
  int pid = fork();
  if(pid == 0){
    close(p1[1]);
    char buf[1];
    read(p1[0], buf, sizeof(buf));
    printf("%d: received ping\n", getpid());
    close(p1[0]);
    close(p2[0]);
    write(p2[1], "o", 1);
    close(p2[1]);
  } else if(pid > 0){
    close(p1[0]);
    write(p1[1], "i", 1);
    close(p1[1]);
    wait((int*) 0);
    char buff[1];
    read(p2[0], buff, sizeof(buff));
    printf("%d: received pong\n", getpid());
    close(p2[0]);
  } else{
    fprintf(2, "fork error!\n");
    exit(1);
  }
  exit(0);
}

I think you will find it helpful to create a drawing of the two pipes connecting the parent and child.
Notice how the parent and child close the input and output ends of the pipes.

Testing

If you first run the sample program, it should produce the following output:

    $ make qemu
    ...
    init: starting sh
    $ pingpong
    4: received ping
    3: received pong
    $

Your solution is correct if you can get it to produce output like the following.

    $ make qemu
    ...
    init: starting sh
    $ pingpong gusty cooper
    4: received gusty
    3: received cooper
    $

6. Explain in your own words what a Xv6/Linux pipe is - not a pipe used for smoking.

7. Notice how the I/O functions read() and write() are used for reading/writing both files and pipes. The open() function and the pipe() function both return a file descriptor. Explain in your own words what is the differenct between reading/writing pipes and files.

Problem 7: primes

Write a concurrent version of prime sieve using pipes. This idea is due to Doug McIlroy, inventor of Unix pipes. The picture halfway down this page and the surrounding text explain how to do it. Your solution should be in the file user/primes.c.

Your goal is to use pipe and fork to set up the pipeline. The first process feeds the numbers 2 through 35 into the pipeline. For each prime number, you will arrange to create one process that reads from its left neighbor over a pipe and writes to its right neighbor over another pipe. Since xv6 has limited number of file descriptors and processes, the first process can stop at 35.

Some Hints

primes.c solution code.


#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

void
primes(int lpipe[2])
{
  close(lpipe[1]);
  int first, num;
  int n = read(lpipe[0], &first, sizeof(int));
  if(n != sizeof(int)) exit(0);
  printf("prime %d\n", first);
  int rpipe[2];
  pipe(rpipe);
  while(read(lpipe[0], &num, sizeof(int))){
    if(num % first != 0)
      write(rpipe[1], &num, sizeof(int));
  }
  close(lpipe[0]);
  close(rpipe[1]);
  if(fork() == 0){
    primes(rpipe);
  } else {
    close(rpipe[0]);
    wait(0);
  }
  exit(0);
}

int
main(int argc, char *argv[])
{
  int p[2];
  pipe(p);
  for (int i=2; i<=35; i++) {
    write(p[1], &i, sizeof(int));
  }
  if(fork()==0){
    primes(p);
  } else {
    close(p[0]);
    close(p[1]);
    wait(0);
  }
  exit(0);
}

Be careful to close file descriptors that a process doesn't need, because otherwise your program will run xv6 out of resources before the first process reaches 35.
Once the first process reaches 35, it should wait until the entire pipeline terminates, including all children, grandchildren, &c. Thus the main primes process should only exit after all the output has been printed, and after all the other primes processes have exited.
Hint: read returns zero when the write-side of a pipe is closed.
It's simplest to directly write 32-bit (4-byte) ints to the pipes, rather than using formatted ASCII I/O.
You should create the processes in the pipeline only as they are needed.
Ensure your primes program has been added UPROGS in Makefile; once you've done that, make qemu will compile your program and you'll

8. There is a lot going on in this solution code. One way to better understand it is to compare the picture halfway down this page to the solution code. Each box in the picture corresponds to process (created with fork()) in the solution code. The process is reading numbers from the left pipe and writing numbers to the right pipe. For example, the first primes() process reads the numbers 2 through 35. It prints the number two. Removes numbers even divisible by two and writes the remaining numbers to the right pipe. How many processes are created for this solution?

Testing

Your solution is correct if it implements a pipe-based sieve and produces the following output:

    $ make qemu
    ...
    init: starting sh
    $ primes
    prime 2
    prime 3
    prime 5
    prime 7
    prime 11
    prime 13
    prime 17
    prime 19
    prime 23
    prime 29
    prime 31
    $

9. Do you think there is a largest prime number such that it is the last prime number? What are the number of digits in the largest prime number discovers so far? When was it discovered?

Problem 8: find

Write a simple version of the UNIX find program: find all the files in a directory tree with a specific name. Your solution should be in the file user/find.c. The find program has two arguments - the path in which to begin searching and the filename for which to search. For example,
$ find . b
searches all subdirectories of the current directory for the filenamed b.

Some Hints

find.c solution code.


#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"
#include "kernel/fs.h"
#include "kernel/fcntl.h"

void
find(char *path, char *name)
{
  char buf[512], *p;
  int fd;
  struct dirent de;
  struct stat st;

  if((fd = open(path, O_RDONLY)) < 0){
    fprintf(2, "find: cannot open %s\n", path);
    return;
  }
  if(fstat(fd, &st) < 0){
    fprintf(2, "find: cannot stat %s\n", path);
    close(fd);
    return;
  }
  if(st.type != T_DIR){
    fprintf(2, "find: first param must be directory\n");
    close(fd);
    return;
  }

  strcpy(buf, path);
  p = buf + strlen(buf);
  *p++ = '/';
  while(read(fd, &de, sizeof(de)) == sizeof(de)){
    if(de.inum == 0) continue;
    memmove(p, de.name, DIRSIZ);
    p[DIRSIZ] = 0;
    if(stat(buf, &st) < 0){
      printf("find: cannot stat %s\n", buf);
      continue;
    }
    // core code
    if(st.type==T_DIR && strcmp(p, ".")!=0 && strcmp(p, "..")!=0){
      find(buf, name);
    }else if(st.type==T_FILE && strcmp(p, name)==0){
      printf("%s\n", buf);
    }
  }
  close(fd);
}

int
main(int argc, char *argv[])
{
  if(argc != 3){
    fprintf(2, "Usage: find <dir> <file>\n");
    exit(1);
  }
  find(argv[1], argv[2]);
  exit(0);
}

Notice how the function find() calls itself.

10. What is the condition that causes find() to make its recursive call?
In a Linux terminal enter the following commands.
```
$ ls .   # list the files in the current working directory
$ ls ..  # list the files in the parent directory
```
Notice how . and .. are files in the current working directory that have type T_DIR (type directory). Notice how the algorithm does not recurse into the . and .. files.

11. What would happen if the algorithm recursed into the . file?
Changes to the file system persist across runs of qemu; to get a clean file system run make clean and then make qemu.
You'll need to use C strings. Have a look at K&R (the C book), for example Section 5.5.
Note that == does not compare strings like in Python. Use strcmp() instead.
Ensure your find program has been added UPROGS in Makefile; once you've done that, make qemu will compile your program and you'll

Testing

Your solution is correct if produces the following output (when the file system contains the files b, a/b and a/aa/b).

    $ make qemu
    ...
    init: starting sh
    $ echo > b
    $ mkdir a
    $ echo > a/b
    $ mkdir a/aa
    $ echo > a/aa/b
    $ find . b
    ./b
    ./a/b
    ./a/aa/b
    $

12. What does the Linux command $ echo > b do?

Problem 9: xargs

Write a simple version of the UNIX xargs program: its arguments describe a command to run, it reads lines from the standard input, and it runs the command for each line, appending the line to the command's arguments. Your solution should be in the file user/xargs.c.

The following example illustrates xarg's behavior:

    $ echo hello too | xargs echo bye
    bye hello too
    $

Note that the command here is "echo bye" and the additional arguments are "hello too", making the command "echo bye hello too", which outputs "bye hello too".

Please note that xargs on UNIX makes an optimization where it will feed more than argument to the command at a time. We don't expect you to make this optimization. To make xargs on UNIX behave the way we want it to for this lab, please run it with the -n option set to 1. For instance

    $ (echo 1 ; echo 2) | xargs -n 1 echo
    1
    2
    $

Some Hints

The first argument to xargs is the program to execute. The command xargs grep hello executes the program grep, which is the first argument passed to xargs
xargs is typically used on the right side of a pipe. This means the left side of the pipe generates output that is seen as xargs standard input. xargs sees this standard input as a sequence of lines, each terminated with a \n.
xargs executes the program on each line seen from standard input.
Consider the example sequence of commands that works with three files a, b, and README. README exists and we use echo to create a and b. I have numbered the commands in order to reference them. You should enter the commands in the Xv6 shell as you study to better understand xargs.
```
$ echo hello > b       [1] 
$ cat b                [2] 
hello
$ echo 12README > a    [3] 
$ echo b >> a          [4] 
$ cat a                [5] 
b
README
$ ls a b README        [6] 
a              2 28 9
b              2 27 6
README         2 2 2227
$ lss a b README       [7] 
a
b
README
$ lss b README         [8] 
b
README
$ wc b                 [9] 
1 1 6 b
$ wc README            [10] 
47 316 2227 README
$ cat a | xargs wc     [11] 
1 1 6 b
47 316 2227 README
$ lss b README | xargs wc [12] 
1 1 6 b
47 316 2227 README
      
```
- Command [1] creates the file b to have one line (the string hello). echo echos the string hello to standard output and the > character redirects standard output to the file b. Command [2] cats b to standard output to show it contains hello.
- Commands [3] and [4] create the file a to have two lines - line 1 is b and line 2 is README. This awkward use of the Xv6 cat creates a two line file. I fiddled around until I found a way to create a file with the two lines I wanted. Realize that the Xv6 cat is a simplified version of the Linux cat.
- Command [5] shows that the file a has the desired two lines. Each line contains a name of a file in the current directory.
- Commands [6] and [7] show the three files a, b, and README in the current directory. Notice the Xv6 ls command shows the file type (1 is file, 2 is directory, etc.), the inode number, and the size (in bytes). The lss command just shows the file names. We will use the lss command to pipe filenames to xargs wc so there are no spaces in the input stream to xargs.
- Command [8] show the two files b and README in the current directory. If you compare the output of the cat a of command [5] to the output of lss b README of command [8], you will see they are identical. They both write the following to standard output.
  b\nREADME\n
- Commands [9] and [10] count the lines, words, and characters in the files b and README.
- Commands [11] and [12] demonstrate how xargs works. Both cat a and lss b README generate the same stream of characters to standard output, (i.e., b\nREADME\n) which is piped to the standard input of xargs. xargs reads the standard input and applies the command wc to each line, where a line is marked by the character '\n'. This means the commands [11] and [12] are equivalent to the following.
```
$ wc b
$ wc README
      
```
xargs, find, and grep combine well. Consider the example sequence of commands.
```
$ mkdir a                      [1] 
$ echo hello > a/b             [2] 
$ mkdir c                      [3] 
$ echo hello > c/b             [4] 
$ echo hello > b               [5] 
$ find . b                     [6] 
./b
./a/b
./c/b
$ grep hello b                 [7] 
hello
$ grep hello ./b               [8] 
hello
$ grep hello a/b               [9] 
hello
$ grep hello ./a/b             [10] 
hello
$ find . b | xargs grep hello  [11] 
hello
hello
hello
      
```
- Commands [1] through [5] use mkdir and echo to create subdirectories a and b, each of which has a file named b with the string hello. If you are typing this commands in the same session as the previous sample commands, you will have to remove the file a that we created earlier.
- Command [6] shows the output of find . b, which shows a file b in the current directory and the subdirectories a and b. These three lines of output will be streamed to the standard input of xargs.
- Commands [7] through [10] show various grep commands. grep searches a file for a string. Notice commands [7] and [8] both search the file b in the current directoy. Command [8] uses a path.
- Command [11] shows the combinations of xargs, find, and grep. find generates three lines in standard output. Each line is a path of a file name. xargs applies the command grep hello to each of these lines. Command [11] is equivalent to the following.
```
$ grep hello ./b
$ grep hello ./a/b
$ grep hello ./c/b
      
```
xargs is the parent process and it will use fork and exec to invoke the command on each line of input. The exec-ed command is the child process. xargs will use wait to wait for the child to complete the command.
To read individual lines of standard input, read a character at a time until a newline ('\n') appears.
kernel/param.h declares MAXARG, which may be useful if you need to declare an argv array.
Ensure your xargs program has been added UPROGS in Makefile; once you've done that, make qemu will compile your program and you'll
Changes to the file system persist across runs of qemu; to get a clean file system run make clean and then make qemu.

High-level Design

Process args passed to xargs
Make sure it has at least one argument. 
For example, $ xargs wc
Create a char* array (e.g. xargv) for arguments to the exec call
Put the first arg passed to xargs in xargv[0]
The read loop will put values in xargv[1], xargv[2], ...
READLOOP:
Read 1 character from standard input
If the read returns 0, you exit READLOOP
If the character read is \n 
  if (fork() == 0)
    call exec with xargv
  else
    wait
else if the character read is a space
  move to next element of xargv
else
  concatenate character onto string to put into xargv
GOTO READLOOP

xargs Solution Code


#include "kernel/types.h"
#include "kernel/param.h"
#include "user/user.h"

void
clear(char *xargv[MAXARG], int start){
  for(int i=start; i<MAXARG; i++){
    xargv[i] = 0;
  }
}

int
main(int argc, char *argv[])
{
  if(argc < 2){
    fprintf(2, "Usage: xarg <cmd>\n");
    exit(1);
  }
  if(argc > MAXARG){
    fprintf(2, "xargs: too many arguments\n");
    exit(1);
  }
  char *xargv[MAXARG] = {0};
  int xargc = 0;
  for(; xargc+1 < argc; xargc++){
    xargv[xargc] = argv[xargc+1];
  }
  char buf[1024], c;
  int p = 0, start = 0;
  while(read(0, &c, 1)){
    switch (c){
    case ' ':
      buf[p] = 0;
      xargv[xargc++] = &buf[start];
      start = p;
      break;
    case '\n':
      buf[p] = 0;
      xargv[xargc++] = &buf[start];
      if(fork() == 0){
        exec(argv[1], xargv);
      } else {
        wait(0);
      }
      xargc = argc - 1;
      clear(xargv, xargc);
      p = 0;
      start = 0;
      break;
    default:
      buf[p++] = c;
      break;
    }
  }
  exit(0);
}

Testing

To test your solution for xargs, run the seqence of commands provided in the hints section. The shell script xargstest.sh has been provided. Its contents are the following.

mkdir a
echo hello > a/b
mkdir c
echo hello > c/b
echo hello > b
find . b | xargs grep hello

The program sh is the shell. You can use input redirection to redirect the contents xargstest.sh to the shell as shown next. You output should match this.

  $ make qemu
  ...
  init: starting sh
  $ sh < xargstest.sh
  $ $ $ $ $ $ hello
  hello
  hello
  $ $

You may have to go back and fix bugs in your find program. The output has many $ because the xv6 shell doesn't realize it is processing commands from a file instead of from the console, and prints a $ for each command in the file.

13. The testing has a command $ sh < xargstest.sh. What does this command do? What is the sh program?

14. Perform the following.

In a Linux shell enter the command $ echo $SHELL. This displays the program that is your currently running shell. On my Mac, it displays /bin/zsh, which is the Z shell. On the CPSC server, it displays /bin/bash, whish is the Bash (or Bourne Again) shell.
Enter the command $ ls /bin to see all of the Linux commands in the /bin directroy. These commands are simply C programs that have been compiled and placed in the /bin directory.
Enter the command $ echo $PATH. This shows a list of directories that the shell program searches for a matching program when you type in a command. One of the directories in your PATH variable is /bin. The shell will find various Linux commands in the /bin directory.
Enter the command $ which ls. This shows the location of the ls program, which is what the shell runs when you enter the ls command.

15. What I find amazing is that the basic Unix/Linux API has not changed in 60 years. Now that you have written programs with pipes, forks, execs, and I/O redirection, what do you think about the Linux API.

16. Without looking up functions, list as many Linux API functions that you can think of.

Submit the lab

This completes the lab. Read Lab Submissions for instructions on how to submit your lab.

Lab: Xv6 and Unix utilities

Xv6 Utility Programs

Problem 1: Boot xv6 g("easy")

Problem 2: kittycat g("easy - starting code provided")

Some Hints

Testing

Problem 3: sleep g("easy - solution code provided")

Some Hints

Testing

Problem 4: forkexec (fork and exec) g("easy - sample code provided")

Some Hints

Problem 5: File Reading g("easy - solution code provided")

Some Hints

Testing

Problem 6: pingpong g("easy - starting code provided")

Some Hints

Testing

Problem 7: primes g("moderate - solution code provided")

Some Hints

Testing

Problem 8: find g("moderate - solution code provided")

Some Hints

Testing

Problem 9: xargs g("moderate - solution code provided")

Some Hints

High-level Design

xargs Solution Code

Testing

Submit the lab

Problem 1: Boot xv6

Problem 2: kittycat

Problem 3: sleep

Problem 4: forkexec (fork and exec)

Problem 5: File Reading

Problem 6: pingpong

Problem 7: primes

Problem 8: find

Problem 9: xargs