CPSC 405 Lecture: Device drivers Topic: device drivers a CPU needs external devices: storage, communication, &c OS is in charge of programming the devices new issues/complications: devices often have rigid and complex interfaces devices and CPU run in parallel -- concurrency interrupts hardware wants attention now! e.g., pkt arrived software must set aside current work and respond on RISC-V use same trap mechanism as for syscalls and exceptions interrupts can arrive at awkward times most code in production OSes is device drivers you will write one for a network card Where are devices? [CPU, bus, RAM, uart/disk/net/&c, PLIC, interrupts] Programming devices: memory-mapped I/O device controller has address range ld/st to these addresses read/write control registers platform designer decides where devices live in physical memory space example device: UART Universal Asynchronous/Synchronous Transmitter serial interface, input and output "RS232 port" qemu emulates the very common 16550 chip many vendors, many "data sheets" on the web 16550.pdf data sheet details physical, electrical, and programming [rx wire, receive shift register, FIFO] [transmit FIFO, transmit shift register, tx wire] 16-byte FIFOs memory-mapped 8-bit registers at physical address UART0=0x10000000: (page 9 of 16550.pdf) 0: RHR / THR -- receive/transmit holding register 1: IER -- interrupt enable register, RX_ENABLE=0x1 and TX_ENABLE=0x2 ... 5: LSR -- line status register, RX_READY=0x1 and TX_IDLE=0x20 how does a kernel device driver use these registers? simple example: uartgetc() in kernel/uart.c ReadReg(RHR) turns into *(char*)(0x10000000 + 0) why does the UART have FIFOs? device driver must cope with times when device is not ready read() but rx FIFO is empty write() but tx FIFO is full how should device drivers wait? perhaps a "busy loop": while((LSR & 1) == 0) ; return RHR OK if waiting is unlikely -- if input nearly always available thus not OK for console! often no input (keystrokes) are waiting in FIFO most devices are like this -- may need to wait a long time for I/O the solution: interrupts when device needs driver attention, device raises an interrupt UART interrupts if: rx FIFO goes from empty to not-empty, or tx FIFO goes from full to not-full how does kernel see interrupts? device -> PLIC -> trap -> usertrap()/kerneltrap() -> devintr() trap.c devintr() scause high bit indicates the trap is from a device interrupt a PLIC register indicates which device interrupted the "IRQ" -- UART's IRQ is 10 IRQs are defined by the "board" -- qemu in this case an interrupt is usually just a hint that device state might have changed the real truth is in the device's status registers device driver must check them to decide action, if any for UART, check LSR to see if rx FIFO non-empty, tx FIFO non-full uart.c uartintr() Typical device driver structure [diagram: top-half/bottom-half] top half: executing a process's system call, e.g. write() or read() starts I/O, perhaps waits bottom half: the interrupt handler reads input, or sends more output, from/to device hardware needs to interact with "top half" process put input where top half can find it tell it more input has arrived or that more output can be sent does *not* run in context of top-half process maybe on different core maybe interrupting some other process so interactions must be arms-length -- buffers, sleep/wakeup Let's look at how xv6 sets up the interrupt machinery registers that control interrupts sie --- supervisor interrupt enabled register one bit per software interrupt, external interrupt, timer interrupt UART IER PLIC claim --- get next IRQ; acknowledge IRQ sstatus --- supervisor status register, SIE bit enables interrupts xv6's interrupt setup code: start(): w_sie(r_sie() | SIE_SEIE | SIE_STIE | SIE_SSIE); main(): consoleinit(); uartinit() plicinit(); scheduler(); intr_on(); w_sstatus(r_sstatus() | SSTATUS_SIE); Let's look at the shell reading input from the console/UART % make qemu-gdb % gdb (gdb) c (gdb) break sys_read (gdb) c (gdb) tui enable sys_read() fileread() consoleread() this is the top half look at cons.buf, cons.r, cons.w -- "producer/consumer buffer" [diagram: buf, r, w] sleep() now for the bottom half (gdb) break kerneltrap (gdb) c how did we get here? (gdb) where in kernel; no process wanted to run; scheduler() UART -> PLIC -> stvec -> kernelvec (gdb) p/x $stvec (gdb) p kernelvec kernelvec.S: kernelvec like trampoline, but for traps from kernel simpler: stack is valid, page table already kernel so can just save registers on stack, jump to kerneltrap save registers on current stack; which stack? in this case, special scheduler stack if executing system call in kernel, some proc's stack kerneltrap() devintr() (gdb) p/x $scause scause high bit means it's an interrupt p. 85 / Table 4.2 in riscv manual plic_claim() to find IRQ (which device) (gdb) p irq uartintr() uartgetc() x/1bx 0x10000005 check LSR for rx, copy from RHR to buffer, wake up consoleintr() print cons print cons.buf[cons.r] wakeup() x/1bx 0x10000005 -- note low bit no longer set plic_complete(irq) return through devintr, kerneltrap, kernelvec (gdb) b *0x80005b92 -- the end of kernelvec ... scheduler will now run the top half -- sh's read() since woken up let's break in top half -- consoleread() (gdb) b console.c:99 (gdb) c (gdb) where consoleread()'s sleep() returns consoleread() sees our character in cons.buf[cons.r] sh's read returns, with my typed character What if multiple devices want to interrupt at the same time? The PLIC distributes interrupts among cores Interrupts can be handled in parallel on different cores If no CPU claims the interrupt, the interrupt stays pending Eventually each interrupt is delivered to some CPU What if kernel has disabled interrupts when a device asks for one? by clearing SIE in sstatus, with intr_off() PLIC/CPU remember pending interrupts deliver when kernel re-enables interrupts Interrupts involve several forms of concurrency 1. Between device and CPU Producer/consumer parallelism 2. If enabled, interrupts can occur between any two instructions! Disable interrupts when code must be atomic 3. Interrupt may run on different CPU in parallel with top half Locks: next lecture Producer/consumer parallelism For input: Can arrive at time when reader not waiting Can arrive faster, or slower, than reader can read Want to accumulate input, and read(), in batches for efficiency For output: If device is slow, want to buffer output so process can continue If device is fast, want to send in batches for efficiency A common solution pattern producer/consumer buffer notification We've seen this at two levels: UART internal FIFOs, for device and driver -- plus interrupts cons.buf, for top-half and bottom-half -- plus sleep/wakeup We'll see this again when we look at pipes If enabled, an interrupt can occur between any two instructions Example: suppose the kernel is counting something in a global variable n top half: n = n + 1 interrupt bottom half: n = n + 1 the machine code for n=n+1 looks like this: a4 = &n lw a5,0(a4) addw a5,a5,1 sw a5,0(a4) what if an interrupt occurs between lw and addw? and interrupt handler also says n = n + 1? One solution: briefly disable interrupts in top half intr_off() n = n + 1 intr_on() intr_off(): w_sstatus(r_sstatus() & ~SSTATUS_SIE); RISC-V automatically turns off interrupts on a trap (interrupt/exception/syscall) trampoline cannot tolerate a second interrupt to trampoline! e.g. would overwrite registers saved in trapframe More on this when we look at locking Interrupt evolution Interrupts used to be cheap in CPU cycles; now they take many cycles due to pipelines, large register sets, cache misses interrupt overhead is around a microsecond -- excluding actual device driver code! So: old approach: simple h/w, smart s/w, lots of interrupts new approach: smart h/w, does lots of work for each interrupt What if interrupt rate is very high? Example: modern ethernet can deliver millions of packets /second At that rate, big fraction of CPU time in interrupt *overhead* Polling: an event notification strategy for high rates Top-half loops until device says it is ready e.g. uartputc_sync() Or perhaps check in some frequently executed kernel code e.g. scheduler() Then process everything accumulated since last poll More efficient than interrupts if device is usually ready quickly Perhaps switch strategies based on measured rate DMA (direct memory access) can move data efficiently the xv6 uart driver read bytes one at a time in software CPUs are not very efficient for this: off chip, not cacheable, 8 bits at a time but OK for low-speed devices most devices automatically copy input to RAM -- DMA then interrupt input is already in ordinary memory CPU memory operations usually much more efficient Interrupts and device handling a continuing area of concern Special fast paths Clever spreading of work over CPUs Forwarding of interrupts to user space for page faults and user-handled devices h/w delivers directly to user, w/o kernel intervention? faster forwarding path through kernel? We will be seeing these topics later in the course