mkfs: avoid blockdev failing to re-read partition table mika/273
author Michael Prokop <mika@grml.org>
Thu, 25 Apr 2024 15:02:20 +0000 (17:02 +0200)
committer Michael Prokop <mika@grml.org>
Thu, 25 Apr 2024 15:13:06 +0000 (17:13 +0200)
Invoking `blockdev --rereadpt` straight after creating the file system
might fail with:

| blockdev: ioctl error on BLKRRPART: Device or resource busy
| Unexpected non-zero exit code 1 in /sbin/grml-debootstrap /sbin/grml-debootstrap /sbin/grml-debootstrap at line 1376 2132 0 detected!
| last bash command: blockdev --rereadpt "$main_device"

This is caused by a race condition: udev kicks in once the file system
has been created and races with the re-read. Let's invoke `udevadm
settle` (which watches the udev event queue and exits once all current
events are handled), and then retry `blockdev --rereadpt ...` up to 30
times, sleeping one second after each failed attempt.

Thanks: Darshaka Pathirana for bug report and initial investigation, and Chris Hofstaedtler for feedback
Closes: https://github.com/grml/grml-debootstrap/issues/273
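
The settle-then-retry approach described above can also be sketched as a
standalone helper. This is only an illustration, not the code this commit
adds: the function names `retry_with_delay` and `reread_partition_table`
are hypothetical, and ignoring a failing `udevadm settle` is an assumption
made for this sketch.

```shell
#!/bin/bash
# Hypothetical helper, not part of grml-debootstrap: run a command until it
# succeeds or the given number of attempts is used up.
# usage: retry_with_delay <attempts> <delay-seconds> <command> [args...]
retry_with_delay() {
  local attempts="$1"
  local delay="$2"
  shift 2

  while [ "$attempts" -gt 0 ] ; do
    attempts=$((attempts - 1))
    if "$@" ; then
      return 0
    fi
    echo "Command failed: $* [${attempts} retries left]" >&2
    # no point in sleeping once the last attempt has failed
    if [ "$attempts" -gt 0 ] ; then
      sleep "$delay"
    fi
  done

  return 1
}

# Hypothetical wrapper: let udev finish its pending events, then give
# blockdev up to 30 attempts, one second apart.
reread_partition_table() {
  local device="$1"
  # assumption for this sketch: a failing settle is ignored, since the
  # retry loop below covers the case where the device is still busy
  udevadm settle || true
  retry_with_delay 30 1 blockdev --rereadpt "$device"
}
```

A caller would run something like `reread_partition_table /dev/sdX || bailout 1`.
Keeping the retry logic separate from the `blockdev` call makes it possible
to exercise the loop with a stub command instead of a real block device.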

grml-debootstrap

index 97dc26b..19d7a52 100755 (executable)
@@ -1372,10 +1372,28 @@ mkfs() {
       # if we deploy to /dev/sdX# then let's see if /dev/sdX exists
       local main_device="${TARGET%%[0-9]*}"
       # sanity check to not try to e.g. access /dev/loop if we get /dev/loop0
-      if [ -f "/sys/block/$(basename "${main_device}")/$(basename "${TARGET}")/dev" ] ; then
-        blockdev --rereadpt "$main_device"
-      else
+      if ! [ -f "/sys/block/$(basename "${main_device}")/$(basename "${TARGET}")/dev" ] ; then
         einfo "No underlying block device for $TARGET identified, skipping blockdev --rereadpt."
+      else
+        udevadm settle
+        # ensure we give blockdev up to 30 seconds/retries
+        local timeout=30
+        local success=0
+        while [ "$timeout" -gt 0 ] ; do
+          ((timeout--))
+          if blockdev --rereadpt "${main_device}" ; then
+            success=1
+            break
+          else
+            ewarn "Failed to reread partition table of ${main_device} [${timeout} retries left]"
+            sleep 1
+          fi
+        done
+
+        if [ "${success}" = "0" ] ; then
+          eerror "Error: failed to reread partition table, giving up."
+          bailout 1
+        fi
       fi
     fi
     # give the system 2 seconds, otherwise we might run into