Wednesday, 18 July 2018

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

The links to the commits are probably dead because I rebased the branch. Check the branch for all the commits

August 14 - 17:

Commits:

Summary:

  • Started converting the transmit side of netfront.
  • Almost complete. I can boot, and use ping without a problem.
  • But when I tried to scp a large file from one system to the Xen DomU, SSH fails to connect.
  • Also, if I try to send a file, it panics. Sending works fine without the transmit patch, just the receive patch, so it is a bug I introduced with the transmit update.
  • I will try to fix it in the upcoming weekend.
  • This is the last part. Once I have fixed the bugs, I’ll ask Roger to review. After this, we can start the process of getting it merged upstream.

August 10-13:

Commits:

Summary:

  • Minor fixes.
  • Almost finished updating netfront’s receive side of things to use the new busdma implementation.
  • It works, but only partly. The kernel boots and connects to xn0 without any problems. Also pinging also works.
  • But when I tried to scp a large file over, the kernel panics because of a failed assertion. This is probably because I have not yet figured out how I can update xn_rebuild_rx_bufs().
  • Fixed the failed assertion. It was because of incorrectly indexing the map we needed to look up.
  • Now I can download large files without a problem. Tested using scp and pkg install.
  • More small fixes and readability improvements.

August 6 - 9:

Commits:

Summary:

  • Partially implemented multi-load support, when I realized a flaw in my plan.
  • I was planning that we create one map for each rx or tx queue, and call load for every grant reference we need. Whenever we need a grant ref, we load exactly one page, so one reference is used. The call to xen_dmamap_get_grefs() would return the most recently allocated grant reference. This way we can get the reference.
  • The problem comes with unloads. We can’t unload parts of the map individually. An unload just unloads everything. So, we can’t individually free grant references. This essentially makes it impossible to maintain a pool of grant references, like netfront does.
  • Saved the partially completed code in a different branch, in case I need it later. Check it here.
  • So I have now decided it would be better to just create a map for each grant reference, like OpenBSD does. I will need to confirm with Roger that he is on board with this idea. I have already started updating netfront in the meantime.
  • Added the ability to reserve grant references on map creation.

August 4 - 5:

Commits:

Summary:

  • Spent some time cleaning up the code. Reading the style(9) guidelines, I realize I have been doing some things wrong. It says tab width is 8, and I have been developing with a tab width of 4. Because of this, some lines cross 80 columns. Fixed them.
  • Fixed some more minor style problems.
  • Documented the reasons behind using temp_segs in a code comment. Roger asked me about why I was doing this, because even he was confused, so I guess for anyone unfamiliar it would make absolutely no sense. Hopefully it would be easier to understand now.
  • Fixed a potential deadlock between xbd_io_lock and gnttab_list_lock. I am still in discussion with Roger about this, but his suggestion to use a taskqueue(9) wouldn’t work because the deadlock would happen before we ever get the chance to get to the callback. So the fix has to be in lockfunc supplied to the tag creation.
  • Started adding support for loading an already-loaded map.
  • Down with fever on Sunday. Wanted to complete support for loading already-loaded maps, but couldn’t.

August 2 - 3:

Commits:

Summary:

  • Two rather busy days of college. Couldn’t do as much as work as I would have liked. I’ll try to compensate it this weekend.
  • Discussed the issue of locking order with Roger. The conversation is still going, but I have a good idea on how to fix it. Let’s see if Roger agrees.
  • Wrote comments that explain why we need to copy the segs array into temp_segs. Roger asked me about it in a review, and it prompted a long discussion about it. It is rather confusing to anyone not intimate with the implementation, so a detailed comment would hopefully shed some light on why we do it.

July 31 - August 1:

Commits:

Summary:

  • Finally able to completely test the busdma implementation in the shortage of grant references. After some fixes, it works.
  • But there is a problem: if one map is waiting for grant references, and another map calls unload, making ample grant references available, the callback will be called with xbd_io_lock held. But xen_gnttab_free_callback() tries to acquire the lock before calling the client callback. This would result in a recursion on the non-recursive lock causing a panic.
  • But the lock should be held when xbd_queue_cb() (the callback specified to load by blkfront) is called.
  • Make xbd_io_lock recursive. It works, but I get a lock order reversal warning. From what I understand about lock order reversal warnings, they mean that if I was unlucky, I could have caused a deadlock. Looking in the locking order, sure enough, it is possible to deadlock in the following situation: One thread holds xbd_io_lock and it tries to allocate grant refs, so it tries to acquire gnttab_list_lock. But gnttab_list_lock is held by another thread that is currently freeing some grant references. If in this situation, the free callback is received, the thread holding gnttab_list_lock will try to acquire xbd_io_lock, causing a deadlock.
  • Need to ask Roger how I can fix it.

July 29-30:

Summary:

  • More testing.
  • Finally able to produce a shortage of grant refs.
  • The kernel panics when the grant table callback is received because of a recursion on the non-recursive lock gnttab_list_lock. More details here.
  • Fixed it by adding a check in get_free_entries() about whether the thread is holding the lock or not. If it is, don’t try to acquire the lock again.
  • Roger thinks it is better to make it a recursive lock. Submitted a patch. Accepted and committed by Roger.
  • More testing reveals that I probably got lucky when I was able to receive the grant table callback, because right now, the kernel deadlocks (I think) whenever I try to produce a shortage of grant references. It is either a deadlock or the busdma implementation is not releasing the grant references properly. Will need to look into the blkfront code to see where things are going wrong.
  • Testing for low grant refs has low return on effort I believe. Before the patch I submitted, it was IMPOSSIBLE to use the callback without causing a panic, and it never got fixed until today. So apparently shortage of grant references does not happen very often.

July 26 - 28:

Commits:

Summary:

  • It is really tough to balance GSoC and college. But I just have to suck it up. Working 75+ hours a week is really tiring.
  • Trying to produce a shortage of grant references has led me on a long chase. Turns out, there are more bugs in the Xen code (the code that is already there, not the one I’m writing) than I anticipated.
  • If USB is allowed to have root mount hold, and there is a situation with scarcity of grant references, there is a kernel panic originating from an incorrect list removal from usb_bus_explore()'s call to root_mount_rel().
  • I have no idea how to fix it. It is not directly caused by the grant table subsystem, so I’m lost.
  • Circumvented it by setting USB_HAVE_ROOT_MOUNT_HOLD to 0 in dev/usb/usb_freebsd.h.
  • It is a lot of trial and error involved here. It turns out, if I have a maximum of 8 grant table pages, I can boot a DomU and produce a shortage of grant refs.
  • Luckily, I just discovered a double free which happens when a disk is removed on a running system. Check the commit here for more details.
  • Running multiple block devices at the same time is proving to be a little complicated. I want to try with one disk first. Then I’ll go to multiple disks.
  • Ran make -j4 kernel to see if the build completes. It goes on for a while but then freezes because of a shortage of grant refs.
  • If I do ^C, it gets me back to the shell, but any disk operation permanently freezes the system. Maybe the grant references are not being freed properly?
  • I’ll admit, I am getting a little bored of this shortage of grant refs thing, but I have to test it. But, as a change, I started looking at the netfront code.
  • It is rather straightforward, but the current implementation can’t be used with netfront. That is because right now, both allocation and mapping of grant refs happens at the same time. This won’t work with netfront. This is because netfront maintains a pool of grant references upon connection, and the places where the grant refs are mapped do not expect failure.
  • There are two ways to approach this:
    • Add the ability to set the refs array of the map by the client. This way, netfront can allocate its references and then set it up in the map.
    • Add the ability to pre-allocate grant references, and map them one-by-one. But for this, a mechanism to specify which reference is being mapped needs to be introduced. I need to talk with Roger about this.
  • One more essential thing I need to implement is the ability to load an already-loaded map. This is because netfront maps one grant ref at a time and creating a new map for each grant reference is overkill.

July 24-25:

Summary:

  • A lot of trial-and-error later, I figure out that the lowest value of gnttab_max_frames at which I can get the kernel to boot is 4.
  • But at 4 frames, the Xen DomU does not boot. The netfront driver prints “Failed to allocate tx refs” and then a page fault happens.
  • Tried to figure out a fix for the page fault, so I can submit the fix.
  • It originates from gnttab_free_grant_references() which is called from xn_setup_txqs(). The reason for the page fault is that earlier when the allocation of grant refs failed, the gref_head was set to GNTTAB_LIST_END. gnttab_free_grant_references() does not check for an invalid gref_head and thus causes a page fault.
  • Fixed that, but now there is a page fault in gnttab_end_foreign_access_ref(). This is caused because on fail, xn_setup_txqs() calls xn_disconnect_txq() which calls gnttab_end_foreign_access_ref(). But since there was a shortage of grant refs, the ring reference is GNTTAB_REF_INVALID and this causes a page fault.
  • Fixed that too, but now the kernel panics somewhere else.
  • I can’t spend too much time on this bug. I already have limited time with my college going on. I’ll let Roger know about this, and maybe he’ll figure something out.
  • I’ll try to test the busdma implementation in a shortage of grant refs, and then start working on netfront ASAP.

July 23:

Summary:

  • An email from Google arrived talking about how I should submit my work for the final evaluation. Will need to talk with my mentors about it.
  • Like Roger suggested, I started looking into creating a shortage of grant refs. When I set gnttab_max_frames=1 in the Xen command line, it does not even boot. Need to figure out the minimum number of frames I need to boot.
  • In the meanwhile, I’ll start organizing my code for the final evaluation.

July 22:

Summary:

  • College starts tomorrow. I won’t have as much time on my hands from now on. Still, I’ll try my best to finish as much work as I can before the official GSoC deadline.

July 19-20:

Commits:

Summary:

  • More testing. Tried multiple transfers to multiple disks at the same time. Works.
  • Review with Roger. He started reviewing the busdma implementation from the start once more. Check it here
  • Fixed multiple code style issues Roger suggested.
  • Wrote multiple lengthy explanations about some parts of the code Roger was asking about.

July 18

Commits:

Summary:

  • Before trying to run on a FreeBSD Dom0, I decided to attempt to narrow down which part is failing. So I started systematically.
  • First, I tried to determine whether or not we were even loading the memory properly. So, I reverted the blkfront driver to the original, and removed the code in the busdma implementation that handled the grant refs, and left only the code that handled the loading.
  • Compiled and ran. No errors. So the loads are happening just fine.
  • Then either the allocation/mapping of grant refs is going wrong, or the blkfront code is going wrong.
  • Tracing of every loaded map would not work. Most of the maps succeed, and there would be too many prints making it difficult to read the outputs, and the system would be slowed down considerably. To trace the code paths of only the maps that fail would require a per-map logging system. I could implement a simple per-map logging system, but I want to leave it as a last resort.
  • So, I next checked if the update to blkfront was wrong somewhere.
  • We have touched two areas: xbd_mksegarray()/xbd_queue_cb() and xbd_connect().
  • I reset xbd_mksegarray() to how it was before (this would mean double the number of grant refs would get allocated because the busdma implementation allocates its own, and then xbd_queue_cb() allocates its own. But let’s leave that for a while). The errors are still showing.
  • So, either the mapping of indirection refs is going wrong somewhere, or the busdma implementation is not properly allocating grant refs.
  • I reset xbd_connect() to the original and kept xbd_mksegarray()/xbd_queue_cb() to how it was post-update.
  • The errors disappear.
  • So, the culprit is somewhere in xbd_connect().
  • I finally have my eureka moment. I tried resetting xbd_connect() to the original before (I was not going for this systematic approach then), and the errors were showing up then. But this time they didn’t. It caught my attention that I forgot to reset block.h last time. So, I actually forgot to change the type of cm_indirectionrefs, which was an array in the original, and a pointer after the update.
  • Ah! There is the bug. When xbd_queue_cb() uses the indirection refs, it does a memcpy() to copy the grant refs of the indirection pages to the ring’s indirect_grefs. There it calls:
    memcpy(ring_req->indirect_grefs, &cm->cm_indirectionrefs,
    	    sizeof(grant_ref_t) * sc->xbd_max_request_indirectpages);
    
  • Notice the ampersand with cm->cm_indirectionrefs. It was needed when cm_indirectionrefs was an array. But now it is a pointer. Adding the ampersand would result in the wrong memory being copied.
  • Changing it to this fixes the errors.
    memcpy(ring_req->indirect_grefs, cm->cm_indirectionrefs,
    	sizeof(grant_ref_t) * sc->xbd_max_request_indirectpages);
    
  • A simple ampersand took me 4 days of debugging.
  • Tested the newfs command, like Roger suggested. Works.
  • Tested with low memory (512M) and copied large files, works fine.
  • I will start looking at the netfront driver tomorrow.

July 16 - 17

Commits:

Summary:

  • Lots of debugging.
  • The kernel boots, but it prints an error message every second or so.
  • Around 35% reads/writes have failed by the time the kernel boots and we log in.
  • The error is originating from an EIO returned by the ring request that we send to the backend.
  • Still haven’t figured out where or what is going wrong.
  • Asked Roger, he has no clue either.
  • Guess its trial and error from here on.
  • Looked at the backend code. There are 10 places where BLKIF_RSP_ERROR is returned. Defined the macro XBB_DEBUG, so DPRINTF() would print the error messages.
  • Weirdly enough, no error messages are printed by the backend.
  • Ah, realized my mistake. I am running on a Linux dom0, so making changes to the FreeBSD blkback won’t do any good.
  • I really don’t want to build the Linux kernel and hack it. I’ll try to run the domU on a FreeBSD dom0 and see if I can figure something out.

Monday, 16 July 2018

[After First Evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

[After First Evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

For summary after the second evaluation, check the new post here

June 18 - 19:

Commits:

Summary:

  • Review with Roger.
  • He suggested multiple changes. Had discussions about some other issues.
  • Failing to allocate grant references should not be fatal. Find out a way to use the grant table free callbacks.
  • Removed the extra malloc tag I introduced.
  • The three _load functions have the same exact piece of code. Replaced it with a call to a helper.
  • Removed gref_head from the Xen DMA map. No need for it. Use gnttab_end_foreign_access_references() instead.
  • Realized a potential bug in the implementation. If the load is called with BUS_DMA_WAITOK and the load is deferred, the grant references were not getting allocated and loaded. This is because in case of error, we returned from without calling xen_load_helper(). So, when the allocation completed from a deferred context, the client callback was called with no grant references allocated. Fixed it (maybe) by passing a custom xen callback to the parent’s waitok().
  • Fixed it.
  • Failing to allocate grant references is no longer fatal. Using the free callbacks
  • First stipend arrived. Yay!

June 20 - 21:

Commits:

Summary:

  • Fixed multiple bugs and errors.
  • The client callback was not being called if the allocation of grant refs was deferred. Fixed it.
  • Fixed some memory leaks.
  • The code has gotten rather complicated after the introduction of xen_gnttab_free_callback(). Spent a lot of time reasoning about its correction.

June 22 - 23:

Commits:

Summary:

  • The code was getting rather complicated. So time to overhaul it a bit.
  • Roger suggested that I try to get the code in the same state before it reaches xen_gnttab_free_callback() so there is no need for that if.
  • I fixed it by calling the parent’s map_complete() a little early. This means that we can save a copy of the segs array in xen_load_helper(), and the free callback has to simply use the copy in both cases: initial load deferred, and not deferred.
  • We need to call the parent’s map_complete() anyway, so it’s not like we are doing anything wrong. This is because once the parent’s load is complete, its map_complete() should be called. But if the allocation of grant refs fails, we will sleep. It is better to complete the parent’s load cycle before waiting.

June 25:

Commits:

Summary:

  • Roger had a great idea about the problem with moving more of the duplicated code to a single place. Since the loads use different types and number of arguments, he suggested that I use a union.
  • Modified the loads to use the union and moved the duplicated code to xen_load_helper().
  • Roger also pointed out that the code would be unreachable because I was doing:
    if (error == EINPROGRESS) {
    	return (error);
    }
    else {
    	return (error);
    }
    
  • But I should instead have else if (error != 0) otherwise the code below would never run.

June 26 - 27:

Summary:

  • The Xen-specific bus_dma(9) implementation is almost complete, other than some minor additions/fixes:
    • Memory leak and segmentation fault if unload is called before the load finishes.
    • The ability to reserve a pool of grant references, like netfront and blkfront do.
    • The ability to extend an already loaded map.
  • Wrote the freebsd-xen@ and freebsd-arch@ mailing lists to ask if I should expect a map to be unloaded before the load completes. No responses so far.
  • Looked at the netfront driver to understand how it works and convert it to use the bus_dma(9) implementation we wrote.
  • Need to write tests for our implementation. Spent some time reading sys/dev/xen/netback/netback_unit_tests.c.
  • Wrote to my Edward, Roger and Akshay about the tests. Roger says it is too much work, and testing of a busdma implementation has never been done before. It is probably a GSoC on its own.
  • So scrap the plans about testing.
  • Roger suggests I start with blkfront.
  • Started reading through the blkfront code.

June 28-30:

Commits:

Summary:

  • Lots of reading. Still going through the blkfront code.
  • Did some grepping around to find how other drivers use bus_get_dma_tag().
  • Not much success, there are a lot of results, and most don’t do anything interesting, they simple call the parent’s bus_get_dma_tag().
  • Looked at how busdma_dmar does things.
  • They have a function dmar_get_bus_dma_tag() that returns the ctx tag after some initialization.
  • Looked into sys/x86/iommu/intel_ctx.c. They simply set the parameters to max.
  • Also looked at sys/dev/acpica/acpi_pci.c. They use the dmar tag.
  • Wrote xen_get_dma_tag() that can be used to get the xen-specific dma tag.
  • Added xenpv_get_dma_tag(). It can be used by devices hanging off the xenpv bus to get the xen dma tag.

July 2-3:

Summary:

  • Read an article about the Xen device drivers.
  • Lots of code reading.
  • Made some minor changes/fixes to xenpv.c

July 4-5:

Commits:

Summary:

  • Started converting the blkfront driver to use the Xen-specific bus_dma(9) implementation.
  • There are two places grant refs are allocated or foreign access granted in the blkfront driver:
    • xbd_connect(): Here, foreign access for the xbd command indirection pages in granted.
    • xbd_mksegarray(): This function is called from the callback specified to the dma load called from xbd_queue_request(). The callbacks calls xbd_mksegarray(). This function more or less does what our busdma implementation does: get ds_addr and grant foreign access to that address. Converting it would be tougher than converting xbd_connect() because it chases a lot of functions and is more complex.
  • Converted the xbd command indirection pages to use the busdma implementation.
  • Will ask Roger for review, and then start converting the dma loads in xbd_queue_request().

July 6:

Commits:

Summary:

  • Read a blog post about Xen block device indirect pages.
  • Review with Roger.
  • He suggested some small changes. Did them.
  • Discovered a bug in xen_bus_dma_tag_create() which would cause an infinite recursion if a driver creates a tag. This would happen because we call bus_dma_tag_create(parent, ...) from xen_bus_dma_tag_create(). The call to tag create would expand to parent->impl->tag_create(parent,...). Parent’s tag_create is xen_bus_dma_tag_create(), so this would result in an infinite recursion. Need to fix it.

July 9:

Commits:

Summary:

  • Fixed the bug in xen_bus_dma_tag_create() which would cause an infinite recursion if a driver creates a tag. Achieved this by passing parent to common_bus_dma_tag_create() instead, and pass the parent tag’s machine dependent tag to bus_dma_tag_create().
  • This raises another problem: xen_get_dma_tag() calls xen_bus_dma_tag_create() with the machine dependent dma tag in the parent argument. So there is need for a way to signal to xen_bus_dma_tag_create() that this is a special case where we are “bootstrapping” the dma tag. Used BUS_DMA_BUS1 flag here, which is reserved for the busdma functions to use as they wish.
  • Added a refcount check in xen_bus_dma_tag_destroy(), so that we don’t free an in-use tag.

July 10-11:

Commits:

Summary:

  • Filled the second GSoC evaluation.
  • Multiple minor fixes like disabling indirection pages if the load for them fails, shifting ds_addr by PAGE_SHIFT.
  • Started updating xbd_mksegarray() to use the busdma implementation.
  • There is a problem: xbd_mksegarray() uses ds_addr, but we replace it with the grant reference, so xbd_mksegarray() will not work properly without it.
  • Introduced the function xen_dmamap_get_grefs() that client drivers can call to access the grant references. This makes it possible to access ds_addr and the grant refs simultaneously.
  • Updated xbd_mksegarray(), xbd_queue_request(), and xbd_int() to use the busdma implementation.
  • The blkfront driver is completely updated to use the Xen-specific busdma implementation now. What remains is to, of course, test it.
  • I need to build the world and kernel, and it takes a really long time. I’ll probably leave it running overnight.
  • Also need to figure out how to convert the source build into an ISO so I can run it as a guest.

July 12-13:

Summary:

  • Time to test the code I wrote.
  • Roger suggests I grab one of the pre-built VM images from the FreeBSD FTP server and run the build in them.
  • Downloaded the image and tried to boot. The boot gets stuck somewhere in the early stages.
  • After a lot of looking around, I decide to try the disk image in RAW format and not in VHD format.
  • Voila! It boots.
  • Now I need to set up networking so I can fetch the FreeBSD sources and build them.
  • Turns out it is much tougher than I expected.
  • The official Xen guide does not work. The VM detects xenbr0 but it does not correctly set up the network. I can’t SSH into the VM from dom0 or connect to the internet.
  • Googled around more. No success.
  • I caught a lucky break. As a last resort, I try the lxcbr0 bridge I set up for a VirtualBox VM some time ago, and I had it lying around in my network interfaces. It works!
  • After some tinkering around with the DNS settings, networking finally works in the VM.
  • Checked out the code from my repo and compiled.
  • The kernel panics on boot.
  • The panic comes from a failed assertion inxbd_indirectpage_cb(). After some looking around, I realise the assertion condition is wrong. I was using sc->xbd_max_request_segments and I should instead have used sc->xbd_max_indirectpages.
  • Fixed it.
  • The kernel now boots. But it keeps throwing some error messages every few seconds. Looked around, but couldn’t fix it. Need to ask Roger about it.

For summary after the second evaluation, check the new post here

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD

[After second evaluation] Import the Xen grant-table bus_dma(9) handlers from OpenBSD The links to the commits are prob...