Wednesday 18 July 2018

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

The links to the commits are probably dead because I rebased the branch. Check the branch for all the commits

August 14 - 17:

Commits:

Summary:

  • Started converting the transmit side of netfront.
  • Almost complete. I can boot, and use ping without a problem.
  • But when I tried to scp a large file from one system to the Xen DomU, SSH fails to connect.
  • Also, if I try to send a file, it panics. Sending works fine without the transmit patch, just the receive patch, so it is a bug I introduced with the transmit update.
  • I will try to fix it in the upcoming weekend.
  • This is the last part. Once I have fixed the bugs, I’ll ask Roger to review. After this, we can start the process of getting it merged upstream.

August 10-13:

Commits:

Summary:

  • Minor fixes.
  • Almost finished updating netfront’s receive side of things to use the new busdma implementation.
  • It works, but only partly. The kernel boots and connects to xn0 without any problems. Also pinging also works.
  • But when I tried to scp a large file over, the kernel panics because of a failed assertion. This is probably because I have not yet figured out how I can update xn_rebuild_rx_bufs().
  • Fixed the failed assertion. It was because of incorrectly indexing the map we needed to look up.
  • Now I can download large files without a problem. Tested using scp and pkg install.
  • More small fixes and readability improvements.

August 6 - 9:

Commits:

Summary:

  • Partially implemented multi-load support, when I realized a flaw in my plan.
  • I was planning that we create one map for each rx or tx queue, and call load for every grant reference we need. Whenever we need a grant ref, we load exactly one page, so one reference is used. The call to xen_dmamap_get_grefs() would return the most recently allocated grant reference. This way we can get the reference.
  • The problem comes with unloads. We can’t unload parts of the map individually. An unload just unloads everything. So, we can’t individually free grant references. This essentially makes it impossible to maintain a pool of grant references, like netfront does.
  • Saved the partially completed code in a different branch, in case I need it later. Check it here.
  • So I have now decided it would be better to just create a map for each grant reference, like OpenBSD does. I will need to confirm with Roger that he is on board with this idea. I have already started updating netfront in the meantime.
  • Added the ability to reserve grant references on map creation.

August 4 - 5:

Commits:

Summary:

  • Spent some time cleaning up the code. Reading the style(9) guidelines, I realize I have been doing some things wrong. It says tab width is 8, and I have been developing with a tab width of 4. Because of this, some lines cross 80 columns. Fixed them.
  • Fixed some more minor style problems.
  • Documented the reasons behind using temp_segs in a code comment. Roger asked me about why I was doing this, because even he was confused, so I guess for anyone unfamiliar it would make absolutely no sense. Hopefully it would be easier to understand now.
  • Fixed a potential deadlock between xbd_io_lock and gnttab_list_lock. I am still in discussion with Roger about this, but his suggestion to use a taskqueue(9) wouldn’t work because the deadlock would happen before we ever get the chance to get to the callback. So the fix has to be in lockfunc supplied to the tag creation.
  • Started adding support for loading an already-loaded map.
  • Down with fever on Sunday. Wanted to complete support for loading already-loaded maps, but couldn’t.

August 2 - 3:

Commits:

Summary:

  • Two rather busy days of college. Couldn’t do as much as work as I would have liked. I’ll try to compensate it this weekend.
  • Discussed the issue of locking order with Roger. The conversation is still going, but I have a good idea on how to fix it. Let’s see if Roger agrees.
  • Wrote comments that explain why we need to copy the segs array into temp_segs. Roger asked me about it in a review, and it prompted a long discussion about it. It is rather confusing to anyone not intimate with the implementation, so a detailed comment would hopefully shed some light on why we do it.

July 31 - August 1:

Commits:

Summary:

  • Finally able to completely test the busdma implementation in the shortage of grant references. After some fixes, it works.
  • But there is a problem: if one map is waiting for grant references, and another map calls unload, making ample grant references available, the callback will be called with xbd_io_lock held. But xen_gnttab_free_callback() tries to acquire the lock before calling the client callback. This would result in a recursion on the non-recursive lock causing a panic.
  • But the lock should be held when xbd_queue_cb() (the callback specified to load by blkfront) is called.
  • Make xbd_io_lock recursive. It works, but I get a lock order reversal warning. From what I understand about lock order reversal warnings, they mean that if I was unlucky, I could have caused a deadlock. Looking in the locking order, sure enough, it is possible to deadlock in the following situation: One thread holds xbd_io_lock and it tries to allocate grant refs, so it tries to acquire gnttab_list_lock. But gnttab_list_lock is held by another thread that is currently freeing some grant references. If in this situation, the free callback is received, the thread holding gnttab_list_lock will try to acquire xbd_io_lock, causing a deadlock.
  • Need to ask Roger how I can fix it.

July 29-30:

Summary:

  • More testing.
  • Finally able to produce a shortage of grant refs.
  • The kernel panics when the grant table callback is received because of a recursion on the non-recursive lock gnttab_list_lock. More details here.
  • Fixed it by adding a check in get_free_entries() about whether the thread is holding the lock or not. If it is, don’t try to acquire the lock again.
  • Roger thinks it is better to make it a recursive lock. Submitted a patch. Accepted and committed by Roger.
  • More testing reveals that I probably got lucky when I was able to receive the grant table callback, because right now, the kernel deadlocks (I think) whenever I try to produce a shortage of grant references. It is either a deadlock or the busdma implementation is not releasing the grant references properly. Will need to look into the blkfront code to see where things are going wrong.
  • Testing for low grant refs has low return on effort I believe. Before the patch I submitted, it was IMPOSSIBLE to use the callback without causing a panic, and it never got fixed until today. So apparently shortage of grant references does not happen very often.

July 26 - 28:

Commits:

Summary:

  • It is really tough to balance GSoC and college. But I just have to suck it up. Working 75+ hours a week is really tiring.
  • Trying to produce a shortage of grant references has led me on a long chase. Turns out, there are more bugs in the Xen code (the code that is already there, not the one I’m writing) than I anticipated.
  • If USB is allowed to have root mount hold, and there is a situation with scarcity of grant references, there is a kernel panic originating from an incorrect list removal from usb_bus_explore()'s call to root_mount_rel().
  • I have no idea how to fix it. It is not directly caused by the grant table subsystem, so I’m lost.
  • Circumvented it by setting USB_HAVE_ROOT_MOUNT_HOLD to 0 in dev/usb/usb_freebsd.h.
  • It is a lot of trial and error involved here. It turns out, if I have a maximum of 8 grant table pages, I can boot a DomU and produce a shortage of grant refs.
  • Luckily, I just discovered a double free which happens when a disk is removed on a running system. Check the commit here for more details.
  • Running multiple block devices at the same time is proving to be a little complicated. I want to try with one disk first. Then I’ll go to multiple disks.
  • Ran make -j4 kernel to see if the build completes. It goes on for a while but then freezes because of a shortage of grant refs.
  • If I do ^C, it gets me back to the shell, but any disk operation permanently freezes the system. Maybe the grant references are not being freed properly?
  • I’ll admit, I am getting a little bored of this shortage of grant refs thing, but I have to test it. But, as a change, I started looking at the netfront code.
  • It is rather straightforward, but the current implementation can’t be used with netfront. That is because right now, both allocation and mapping of grant refs happens at the same time. This won’t work with netfront. This is because netfront maintains a pool of grant references upon connection, and the places where the grant refs are mapped do not expect failure.
  • There are two ways to approach this:
    • Add the ability to set the refs array of the map by the client. This way, netfront can allocate its references and then set it up in the map.
    • Add the ability to pre-allocate grant references, and map them one-by-one. But for this, a mechanism to specify which reference is being mapped needs to be introduced. I need to talk with Roger about this.
  • One more essential thing I need to implement is the ability to load an already-loaded map. This is because netfront maps one grant ref at a time and creating a new map for each grant reference is overkill.

July 24-25:

Summary:

  • A lot of trial-and-error later, I figure out that the lowest value of gnttab_max_frames at which I can get the kernel to boot is 4.
  • But at 4 frames, the Xen DomU does not boot. The netfront driver prints “Failed to allocate tx refs” and then a page fault happens.
  • Tried to figure out a fix for the page fault, so I can submit the fix.
  • It originates from gnttab_free_grant_references() which is called from xn_setup_txqs(). The reason for the page fault is that earlier when the allocation of grant refs failed, the gref_head was set to GNTTAB_LIST_END. gnttab_free_grant_references() does not check for an invalid gref_head and thus causes a page fault.
  • Fixed that, but now there is a page fault in gnttab_end_foreign_access_ref(). This is caused because on fail, xn_setup_txqs() calls xn_disconnect_txq() which calls gnttab_end_foreign_access_ref(). But since there was a shortage of grant refs, the ring reference is GNTTAB_REF_INVALID and this causes a page fault.
  • Fixed that too, but now the kernel panics somewhere else.
  • I can’t spend too much time on this bug. I already have limited time with my college going on. I’ll let Roger know about this, and maybe he’ll figure something out.
  • I’ll try to test the busdma implementation in a shortage of grant refs, and then start working on netfront ASAP.

July 23:

Summary:

  • An email from Google arrived talking about how I should submit my work for the final evaluation. Will need to talk with my mentors about it.
  • Like Roger suggested, I started looking into creating a shortage of grant refs. When I set gnttab_max_frames=1 in the Xen command line, it does not even boot. Need to figure out the minimum number of frames I need to boot.
  • In the meanwhile, I’ll start organizing my code for the final evaluation.

July 22:

Summary:

  • College starts tomorrow. I won’t have as much time on my hands from now on. Still, I’ll try my best to finish as much work as I can before the official GSoC deadline.

July 19-20:

Commits:

Summary:

  • More testing. Tried multiple transfers to multiple disks at the same time. Works.
  • Review with Roger. He started reviewing the busdma implementation from the start once more. Check it here
  • Fixed multiple code style issues Roger suggested.
  • Wrote multiple lengthy explanations about some parts of the code Roger was asking about.

July 18

Commits:

Summary:

  • Before trying to run on a FreeBSD Dom0, I decided to attempt to narrow down which part is failing. So I started systematically.
  • First, I tried to determine whether or not we were even loading the memory properly. So, I reverted the blkfront driver to the original, and removed the code in the busdma implementation that handled the grant refs, and left only the code that handled the loading.
  • Compiled and ran. No errors. So the loads are happening just fine.
  • Then either the allocation/mapping of grant refs is going wrong, or the blkfront code is going wrong.
  • Tracing of every loaded map would not work. Most of the maps succeed, and there would be too many prints making it difficult to read the outputs, and the system would be slowed down considerably. To trace the code paths of only the maps that fail would require a per-map logging system. I could implement a simple per-map logging system, but I want to leave it as a last resort.
  • So, I next checked if the update to blkfront was wrong somewhere.
  • We have touched two areas: xbd_mksegarray()/xbd_queue_cb() and xbd_connect().
  • I reset xbd_mksegarray() to how it was before (this would mean double the number of grant refs would get allocated because the busdma implementation allocates its own, and then xbd_queue_cb() allocates its own. But let’s leave that for a while). The errors are still showing.
  • So, either the mapping of indirection refs is going wrong somewhere, or the busdma implementation is not properly allocating grant refs.
  • I reset xbd_connect() to the original and kept xbd_mksegarray()/xbd_queue_cb() to how it was post-update.
  • The errors disappear.
  • So, the culprit is somewhere in xbd_connect().
  • I finally have my eureka moment. I tried resetting xbd_connect() to the original before (I was not going for this systematic approach then), and the errors were showing up then. But this time they didn’t. It caught my attention that I forgot to reset block.h last time. So, I actually forgot to change the type of cm_indirectionrefs, which was an array in the original, and a pointer after the update.
  • Ah! There is the bug. When xbd_queue_cb() uses the indirection refs, it does a memcpy() to copy the grant refs of the indirection pages to the ring’s indirect_grefs. There it calls:
    memcpy(ring_req->indirect_grefs, &cm->cm_indirectionrefs,
    	    sizeof(grant_ref_t) * sc->xbd_max_request_indirectpages);
    
  • Notice the ampersand with cm->cm_indirectionrefs. It was needed when cm_indirectionrefs was an array. But now it is a pointer. Adding the ampersand would result in the wrong memory being copied.
  • Changing it to this fixes the errors.
    memcpy(ring_req->indirect_grefs, cm->cm_indirectionrefs,
    	sizeof(grant_ref_t) * sc->xbd_max_request_indirectpages);
    
  • A simple ampersand took me 4 days of debugging.
  • Tested the newfs command, like Roger suggested. Works.
  • Tested with low memory (512M) and copied large files, works fine.
  • I will start looking at the netfront driver tomorrow.

July 16 - 17

Commits:

Summary:

  • Lots of debugging.
  • The kernel boots, but it prints an error message every second or so.
  • Around 35% reads/writes have failed by the time the kernel boots and we log in.
  • The error is originating from an EIO returned by the ring request that we send to the backend.
  • Still haven’t figured out where or what is going wrong.
  • Asked Roger, he has no clue either.
  • Guess its trial and error from here on.
  • Looked at the backend code. There are 10 places where BLKIF_RSP_ERROR is returned. Defined the macro XBB_DEBUG, so DPRINTF() would print the error messages.
  • Weirdly enough, no error messages are printed by the backend.
  • Ah, realized my mistake. I am running on a Linux dom0, so making changes to the FreeBSD blkback won’t do any good.
  • I really don’t want to build the Linux kernel and hack it. I’ll try to run the domU on a FreeBSD dom0 and see if I can figure something out.

No comments:

Post a Comment

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD The links to the commits are prob...