It’s hard to reliably test whether code is free from locking errors and races. Ultimately lock granularity decisions need to be driven by performance measurements as well as complexity considerations.
Deadlock is avoided by acquiring locks in the same order everywhere in the code; the finer-grained the locking, the more lock-order chains there are and the easier it is to run into deadlock.
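As a minimal illustration (hypothetical code, not from xv6): if every code path that needs both locks takes them in the same global order, two threads can never each hold one lock while waiting for the other.

```c
struct spinlock lock_a, lock_b;   // hypothetical locks with a fixed global order: a before b

void
transfer(void)
{
  acquire(&lock_a);   // always acquired first
  acquire(&lock_b);   // always acquired second
  // ... operate on data protected by both locks ...
  release(&lock_b);
  release(&lock_a);
}
```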
Re-entrant locks do not truly solve the deadlock problem either, and they can introduce logic errors of their own, as in the example below.
```c
struct spinlock lock;
int data = 0; // protected by lock

f() {
  acquire(&lock);
  if(data == 0){
    call_once();
    h();
    data = 1;
  }
  release(&lock);
}

g() {
  acquire(&lock);
  if(data == 0){
    call_once();
    data = 1;
  }
  release(&lock);
}
```
If h() calls g(), then with a re-entrant lock g() would re-acquire the lock successfully, see data still equal to 0, and run call_once() a second time.
if a spinlock is used by an interrupt handler, a CPU must never hold that lock with interrupts enabled. Xv6 is more conservative: when a CPU acquires any lock, xv6 always disables interrupts on that CPU. Interrupts may still occur on other CPUs, so an interrupt’s acquire can wait for a thread to release a spinlock; just not on the same CPU.
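This rule is implemented inside acquire itself. The sketch below follows xv6's kernel/spinlock.c, where push_off() disables interrupts on the current CPU before the lock is taken (details may differ slightly between xv6 versions):

```c
void
acquire(struct spinlock *lk)
{
  push_off(); // disable interrupts on this CPU to avoid deadlock with handlers
  if(holding(lk))
    panic("acquire");

  // Spin until the atomic test-and-set wins the lock.
  while(__sync_lock_test_and_set(&lk->locked, 1) != 0)
    ;

  // Keep the critical section's memory accesses after the acquire.
  __sync_synchronize();

  lk->cpu = mycpu();
}
```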
Holding a spinlock that long would lead to waste if another process wanted to acquire it, since the acquiring process would waste CPU for a long time while spinning. Another drawback of spinlocks is that a process cannot yield the CPU while retaining a spinlock.
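Sleep-locks address both drawbacks: a waiter gives up the CPU instead of spinning, and the holder may yield while keeping the lock. A sketch in the spirit of xv6's kernel/sleeplock.c (exact details vary by version):

```c
void
acquiresleep(struct sleeplock *lk)
{
  acquire(&lk->lk);        // short spinlock protecting the sleep-lock's fields
  while (lk->locked) {
    sleep(lk, &lk->lk);    // give up the CPU until the holder wakes us
  }
  lk->locked = 1;
  lk->pid = myproc()->pid;
  release(&lk->lk);
}

void
releasesleep(struct sleeplock *lk)
{
  acquire(&lk->lk);
  lk->locked = 0;
  lk->pid = 0;
  wakeup(lk);              // wake any thread sleeping in acquiresleep
  release(&lk->lk);
}
```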
Lock-free programming is more complicated, however, than programming locks; for example, one must worry about instruction and memory reordering.
Lock-free data structures and algorithms are genuinely complex; both instruction reordering and memory-operation reordering have to be taken into account.
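As a hedged, non-xv6 illustration, a lock-free stack push can be written as a compare-and-swap loop; the explicit barrier is what guards against the reordering mentioned above (the GCC __sync builtins are assumed to be available):

```c
struct node {
  struct node *next;
  int value;
};

struct node *top;  // shared stack head

// Push n without a lock: retry the CAS until no other thread has
// changed `top` between our read and our swap.
void
push(struct node *n)
{
  struct node *old;
  do {
    old = top;
    n->next = old;
    __sync_synchronize();   // make n->next visible before publishing n
  } while(!__sync_bool_compare_and_swap(&top, old, n));
}
```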
Concurrency revisited
This is a common pattern: one lock for the set of items, plus one lock per item.
That is, one coarse lock covers the collection as a whole, while a finer lock protects each individual item; a generic sketch of the pattern follows.
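A minimal generic sketch of the pattern with hypothetical names (not xv6 code): the set lock is held only while searching the table and adjusting reference counts, and the per-item sleep-lock is held while using the item's contents.

```c
struct item {
  struct sleeplock lock;   // per-item lock: protects the item's contents
  int ref;                 // reference count, protected by table.lock
  int key;
};

struct {
  struct spinlock lock;    // set lock: protects the table and all ref counts
  struct item items[32];
} table;

struct item*
lookup(int key)
{
  struct item *it;

  acquire(&table.lock);
  for(it = table.items; it < table.items + 32; it++){
    if(it->ref > 0 && it->key == key){
      it->ref++;                  // take a reference under the set lock...
      release(&table.lock);
      acquiresleep(&it->lock);    // ...then lock the item itself for use
      return it;
    }
  }
  release(&table.lock);
  return 0;
}
```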
Ordinarily the same function that acquires a lock will release it. But a more precise way to view things is that a lock is acquired at the start of a sequence that must appear atomic, and released when that sequence ends.
The acquire is placed just before the operations that must appear atomic, and the release just after them.
freeing the object implicitly frees the embedded lock, which will cause the waiting thread to malfunction.
The file system uses struct inode reference counts as a kind of shared lock that can be held by multiple processes, in order to avoid deadlocks that would occur if the code used ordinary locks.
```c
// Find the inode with number inum on device dev
// and return the in-memory copy. Does not lock
// the inode and does not read it from disk.
static struct inode*
iget(uint dev, uint inum)
{
  struct inode *ip, *empty;

  acquire(&itable.lock);

  // Is the inode already in the table?
  empty = 0;
  for(ip = &itable.inode[0]; ip < &itable.inode[NINODE]; ip++){
    if(ip->ref > 0 && ip->dev == dev && ip->inum == inum){
      ip->ref++;
      release(&itable.lock);
      return ip;
    }
    if(empty == 0 && ip->ref == 0)    // Remember empty slot.
      empty = ip;
  }

  // Recycle an inode entry.
  if(empty == 0)
    panic("iget: no inodes");

  ip = empty;
  ip->dev = dev;
  ip->inum = inum;
  ip->ref = 1;
  ip->valid = 0;
  release(&itable.lock);

  return ip;
}
```
Some data items are protected by different mechanisms at different times. For example, when a physical page is free, it is protected by kmem.lock (kernel/kalloc.c:24). If the page is then allocated as a pipe (kernel/pipe.c:23), it is protected by a different lock (the embedded pi->lock).
So the same data structure may be protected by different locks at different times; which lock applies depends on how the data is being used at that moment.
Physical memory allocator
The allocator sometimes treats addresses as integers in order to perform arithmetic on them (e.g., traversing all pages in freerange), and sometimes uses addresses as pointers to read and write memory (e.g., manipulating the run structure stored in each page); this dual use of addresses is the main reason that the allocator code is full of C type casts. The other reason is that freeing and allocation inherently change the type of the memory.
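The dual use shows up directly in the stock allocator. The sketch below follows the unmodified freerange in xv6's kernel/kalloc.c (exact lines vary by version): the address is rounded as an integer, then walked as a pointer.

```c
void
freerange(void *pa_start, void *pa_end)
{
  char *p;
  // Treat the address as an integer to round up to a page boundary...
  p = (char*)PGROUNDUP((uint64)pa_start);
  // ...then walk it as a pointer, freeing one page at a time.
  for(; p + PGSIZE <= (char*)pa_end; p += PGSIZE)
    kfree(p);
}
```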
File system buffer cache layer

The buffer cache layer caches disk blocks and synchronizes access to them, making sure that only one kernel process at a time can modify the data stored in any particular block.
The sleep-lock protects reads and writes of the block’s buffered content, while the bcache.lock protects information about which blocks are cached.
```c
struct buf {
  int valid;   // has data been read from disk?
  int disk;    // does disk "own" buf?
  uint dev;
  uint blockno;
  struct sleeplock lock;
  uint refcnt;
  struct buf *prev; // LRU cache list
  struct buf *next;
  uchar data[BSIZE];
};

struct {
  struct spinlock lock;
  struct buf buf[NBUF];

  // Linked list of all buffers, through prev/next.
  // Sorted by how recently the buffer was used.
  // head.next is most recent, head.prev is least.
  struct buf head;
} bcache;
```
Your job is to implement per-CPU freelists, and stealing when a CPU’s free list is empty. You must give all of your locks names that start with “kmem”. That is, you should call initlock for each of your locks, and pass a name that starts with “kmem”. Run kalloctest to see if your implementation has reduced lock contention. To check that it can still allocate all of memory, run usertests sbrkmuch. Your output will look similar to that shown below, with much-reduced contention in total on kmem locks, although the specific numbers will differ. Make sure all tests in usertests pass. make grade should say that the kalloctests pass.
```c
void
kinit()
{
  // initlock() only stores the name pointer, so the names must outlive
  // kinit; keep them in a static array rather than a stack buffer.
  static char names[NCPU][10];

  for (int i = 0; i < NCPU; i++) {
    snprintf(names[i], 10, "kmem-%d", i);
    initlock(&kmem[i].lock, names[i]);
    kmem[i].freelist = 0;
  }
  freerange(end, (void *)PHYSTOP);
}
```
```c
void
freerange(void *pa_start, void *pa_end)
{
  uint64 page_start = PGROUNDUP((uint64)pa_start);
  uint64 page_end = PGROUNDDOWN((uint64)pa_end);
  uint64 cpu_size = PGROUNDDOWN((page_end - page_start) / NCPU);
  printf("all pages: %d, pages per cpu: %d\n",
         (int)((page_end - page_start) / PGSIZE), (int)(cpu_size / PGSIZE));

  // Divide the physical pages evenly among the CPUs.
  for (int i = 0; i < NCPU; i++) {
    for (uint64 pa = page_start + i * cpu_size; pa < page_start + (i + 1) * cpu_size; pa += PGSIZE) {
      if ((pa % PGSIZE) != 0 || (char *)pa < end || (uint64)pa >= PHYSTOP)
        panic("kfree");
      memset((void *)pa, 1, PGSIZE);
      struct run *r = (struct run *)pa;
      r->next = kmem[i].freelist;
      kmem[i].freelist = r;
    }
  }

  // Put any leftover pages on CPU 0's free list.
  for (uint64 pa = page_start + NCPU * cpu_size; pa < page_end; pa += PGSIZE) {
    if ((pa % PGSIZE) != 0 || (char *)pa < end || (uint64)pa >= PHYSTOP)
      panic("kfree");
    memset((void *)pa, 1, PGSIZE);
    struct run *r = (struct run *)pa;
    r->next = kmem[0].freelist;
    kmem[0].freelist = r;
  }
}
```
```c
// Free the page of physical memory pointed at by pa,
// which normally should have been returned by a
// call to kalloc(). (The exception is when
// initializing the allocator; see kinit above.)
void
kfree(void *pa)
{
  struct run *r;

  if (((uint64)pa % PGSIZE) != 0 || (char *)pa < end || (uint64)pa >= PHYSTOP)
    panic("kfree");

  // Fill with junk to catch dangling refs.
  memset(pa, 1, PGSIZE);

  r = (struct run *)pa;

  // Return the page to the free list of the CPU we are running on.
  int id = get_cpuid();
  acquire(&kmem[id].lock);
  r->next = kmem[id].freelist;
  kmem[id].freelist = r;
  release(&kmem[id].lock);
}
```
```c
// Allocate one 4096-byte page of physical memory.
// Returns a pointer that the kernel can use.
// Returns 0 if the memory cannot be allocated.
void *
kalloc(void)
{
  struct run *r;

  // First try the free list of the current CPU.
  int id = get_cpuid();
  acquire(&kmem[id].lock);
  r = kmem[id].freelist;
  if (r)
    kmem[id].freelist = r->next;
  release(&kmem[id].lock);

  // If this CPU's free list is empty, steal pages from another CPU.
  if (r == 0)
    r = steal_free_page(id);

  if (r)
    memset((char *)r, 5, PGSIZE); // fill with junk
  return (void *)r;
}
```
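get_cpuid and steal_free_page are not shown in the excerpt; a minimal sketch of what they might look like, with names taken from the calls above (assumptions, not the exact original code), is below. cpuid() is only stable while interrupts are off, and for simplicity this sketch steals one page at a time.

```c
// Read the current CPU id with interrupts disabled.
static int
get_cpuid(void)
{
  push_off();
  int id = cpuid();
  pop_off();
  return id;
}

// Steal one free page from some other CPU's free list.
// Returns the page, or 0 if every list is empty.
static struct run *
steal_free_page(int id)
{
  struct run *r = 0;

  for (int i = 0; i < NCPU; i++) {
    if (i == id)
      continue;
    acquire(&kmem[i].lock);
    r = kmem[i].freelist;
    if (r)
      kmem[i].freelist = r->next;
    release(&kmem[i].lock);
    if (r)
      break;
  }
  return r;
}
```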
Buffer cache
Modify the block cache so that the number of acquire loop iterations for all locks in the bcache is close to zero when running bcachetest. Ideally the sum of the counts for all locks involved in the block cache should be zero, but it’s OK if the sum is less than 500. Modify bget and brelse so that concurrent lookups and releases for different blocks that are in the bcache are unlikely to conflict on locks (e.g., don’t all have to wait for bcache.lock). You must maintain the invariant that at most one copy of each block is cached. When you are done, your output should be similar to that shown below (though not identical). Make sure usertests still passes. make grade should pass all tests when you are done.
This lab is harder than the first one. The rough idea is:
1: Hash the block number (modulo the number of buckets) so that requests for different blocks are scattered across different buckets.
2: Give each bucket its own lock, reducing lock contention.
3: Record the most recent access time instead of moving a buffer to the head of a list.
4: A bucket may need to take a buffer from another bucket, which can deadlock (a->b in one thread, b->a in another). Taking one big lock c first avoids this: c -> bucket a -> bucket b and c -> bucket b -> bucket a can no longer wait on each other.
5: To avoid allocating two buffers for the same block, check again whether the block is already cached just before allocating.
6: Handle buffers that already belong to this bucket and buffers that belong to another bucket as separate cases.
A sketch of the bucket hash used below is given right after this list.
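get_bucket_index and the BUCKET constant are referenced by the code below but not shown; a minimal sketch under the scheme described above (the value 13 is a common prime choice and only an assumption):

```c
#define BUCKET 13   // number of hash buckets (assumed value; a prime)

// Map a block number to its bucket.
static int
get_bucket_index(uint blockno)
{
  return blockno % BUCKET;
}
```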
```c
struct buf {
  int valid;   // has data been read from disk?
  int disk;    // does disk "own" buf?
  uint dev;
  uint blockno;
  struct sleeplock lock;
  uint refcnt;
  struct buf *prev; // LRU cache list
  struct buf *next;
  uchar data[BSIZE];

  uint access_time; // ticks when the buffer was last released
  int owner;        // index of the bucket this buffer belongs to
};
```
```c
// Buffer cache.
//
// The buffer cache is a linked list of buf structures holding
// cached copies of disk block contents. Caching disk blocks
// in memory reduces the number of disk reads and also provides
// a synchronization point for disk blocks used by multiple processes.
//
// Interface:
// * To get a buffer for a particular disk block, call bread.
// * After changing buffer data, call bwrite to write it to disk.
// * When done with the buffer, call brelse.
// * Do not use the buffer after calling brelse.
// * Only one process at a time can use a buffer,
//   so do not keep them longer than necessary.
```
```c
struct {
  struct spinlock biglock;   // taken before any bucket lock when a buffer must be stolen
  struct buf buf[NBUF];

  // Per-bucket list head and the lock protecting that bucket's chain.
  // Buffers are still linked through prev/next, but recency is now
  // tracked with access_time rather than by list order.
  struct spinlock bucket_lock[BUCKET];
  struct buf head[BUCKET];
} bcache;
```
```c
// Look through buffer cache for block on device dev.
// If not found, allocate a buffer.
// In either case, return locked buffer.
static struct buf *
bget(uint dev, uint blockno)
{
  struct buf *b;

  int index = get_bucket_index(blockno);
  acquire(&bcache.bucket_lock[index]);

  // Walk this bucket's chain: is the block already cached?
  for (b = bcache.head[index].next; b != &bcache.head[index]; b = b->next) {
    if (b->dev == dev && b->blockno == blockno) {
      b->refcnt++;
      release(&bcache.bucket_lock[index]);
      acquiresleep(&b->lock);
      return b;
    }
  }
  release(&bcache.bucket_lock[index]);

  // Not cached: a buffer must be allocated for this block.
  // Take the big lock first, then the bucket lock, to avoid deadlock.
  acquire(&bcache.biglock);
  acquire(&bcache.bucket_lock[index]);

  // Check again, so that two racing callers never allocate
  // two buffers for the same block.
  for (b = bcache.head[index].next; b != &bcache.head[index]; b = b->next) {
    if (b->dev == dev && b->blockno == blockno) {
      b->refcnt++;
      release(&bcache.bucket_lock[index]);
      release(&bcache.biglock);
      acquiresleep(&b->lock);
      return b;
    }
  }

  // Pick a victim buffer to hold the block's data.
  while (1) {
    b = least_recent_used_buffer();
    if (b == 0) {
      printf("no buffer\n");
      continue;
    }
    int old_owner = b->owner;

    // If the buffer is unused or already belongs to this bucket,
    // no other bucket lock is needed.
    if (old_owner == NONE || old_owner == index) {
      b->dev = dev;
      b->blockno = blockno;
      b->valid = 0;
      b->refcnt = 1;
      b->owner = index;

      // A previously unused buffer must be linked into this bucket's chain.
      if (old_owner == NONE)
        add_entry(&bcache.head[index], b);
```
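The excerpt stops in the middle of bget. Under the scheme described in the list above, the remaining part would roughly look like the sketch below (a hedged guess, not the exact original code; least_recent_used_buffer, add_entry and remove_entry are the author's helpers, with remove_entry assumed as the counterpart of add_entry):

```c
      release(&bcache.bucket_lock[index]);
      release(&bcache.biglock);
      acquiresleep(&b->lock);
      return b;
    }

    // The victim belongs to another bucket: lock that bucket too
    // (safe because the big lock is already held, so only one thread
    // can be in this stealing path), move the buffer over, and claim it.
    acquire(&bcache.bucket_lock[old_owner]);
    if (b->refcnt == 0) {          // still unused after taking the lock
      remove_entry(b);             // unlink from the old bucket's chain
      b->dev = dev;
      b->blockno = blockno;
      b->valid = 0;
      b->refcnt = 1;
      b->owner = index;
      add_entry(&bcache.head[index], b);
      release(&bcache.bucket_lock[old_owner]);
      release(&bcache.bucket_lock[index]);
      release(&bcache.biglock);
      acquiresleep(&b->lock);
      return b;
    }
    release(&bcache.bucket_lock[old_owner]);  // raced; pick another victim
  }
}
```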
```c
// Return a locked buf with the contents of the indicated block.
struct buf *
bread(uint dev, uint blockno)
{
  struct buf *b;

  b = bget(dev, blockno);
  if (!b->valid) {
    virtio_disk_rw(b, 0);
    b->valid = 1;
  }
  return b;
}

// Write b's contents to disk. Must be locked.
void
bwrite(struct buf *b)
{
  if (!holdingsleep(&b->lock))
    panic("bwrite");
  virtio_disk_rw(b, 1);
}
```
```c
// Release a locked buffer.
// Instead of moving it to the head of an MRU list, record when it was last used.
void
brelse(struct buf *b)
{
  if (!holdingsleep(&b->lock))
    panic("brelse");

  releasesleep(&b->lock);

  int index = get_bucket_index(b->blockno);
  acquire(&bcache.bucket_lock[index]);
  b->refcnt--;
  // Once the reference count drops to 0, remember the access time.
  if (b->refcnt == 0) {
    b->access_time = ticks;
  }
  release(&bcache.bucket_lock[index]);
}
```
```c
void
bpin(struct buf *b)
{
  int index = get_bucket_index(b->blockno);
  acquire(&bcache.bucket_lock[index]);
  b->refcnt++;
  release(&bcache.bucket_lock[index]);
}

void
bunpin(struct buf *b)
{
  int index = get_bucket_index(b->blockno);
  acquire(&bcache.bucket_lock[index]);
  b->refcnt--;
  release(&bcache.bucket_lock[index]);
}
```