aboutsummaryrefslogtreecommitdiffstats
path: root/disk-io.c
Commit message (Collapse)AuthorAgeFilesLines
* btrfs-progs: use proper unaligned helper in btrfs_csum_finalDavid Sterba2016-07-291-1/+1
| | | | | | | | This will not cause unaligned access as the checksum is at the beginning of btrfs_header and thus aligned to a page, but for clarity use the helper. Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs-progs: Initialize stripesize to the value of sectorsizeChandan Rajendra2016-06-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stripesize should ideally be set to the value of sectorsize. However previous versions of btrfs-progs/mkfs.btrfs had set stripesize to a value of 4096. On machines with PAGE_SIZE other than 4096, This could lead to the following scenario, - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single filesystem. The filesystem was created by an older version of mkfs.btrfs which set stripesize to 4k. - losetup -a /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img) /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img) /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img) - /etc/mtab lists only /dev/loop0 - losetup /dev/loop4 /root/disk-imgs/file-1.img The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even though /dev/loop1 has already been mounted and has /root/disk-imgs/file-1.img as its backing file. The above behaviour occurs because check_super() function returns an error code (due to stripesize not being set to 4096) and hence check_mounted_where() function treats /dev/loop1 as a disk containing a filesystem other than Btrfs. Hence as a workaround this commit allows 4096 as a valid stripesize. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: drop O_CREATE from open_ctree_fs_infoDavid Sterba2016-06-011-2/+2
| | | | | | | We stat the filesystem path before trying to open it so there's no point to pass O_CREAT ("btrfs-progs: add stat check in open_ctree_fs_info"). Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Enhance tree block check by checking empty leaf or nodeQu Wenruo2016-06-011-0/+5
| | | | | | | | | | | For btrfs, it's possible to have empty leaf, but empty node is not possible. Add check for empty node for tree blocks. Suggested-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: replace printf with message helpers in check_superLiu Bo2016-05-111-28/+23
| | | | | Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: add three more validation checks for superblockLiu Bo2016-05-111-14/+30
| | | | | | | | | | | | | | | This adds validation checks for super_total_bytes, super_bytes_used and super_stripesize. Since these checks are made after superblock finishes checksum checking, this also adds a notice of "superblock checksum matches but..". Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> [ adjusted message wording ] Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: fix incorrect flag check while recovering superLiu Bo2016-05-111-1/+1
| | | | | | | | The flag OPEN_CTREE_RECOVER_SUPER is set when it's going to recover any bad superblock copy, the current code doesn't match that. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: deprecate and stop using btrfs_level_sizeDavid Sterba2016-05-021-7/+5
| | | | | | Size of a b-tree node is always nodesize, regardless of the level. Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: replace leafsize with nodesizeDavid Sterba2016-05-021-4/+4
| | | | | | | | Nodesize is used in kernel, the values are always equal. We have to keep leafsize in headers, similarly the tree setting functions still take and set leafsize, but it's effectively a no-op. Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: switch to common message helpers in open_ctree_fs_infoDavid Sterba2016-03-181-2/+2
| | | | Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: handle stat errors in open_ctree_fs_infoDavid Sterba2016-03-181-1/+6
| | | | Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: add stat check in open_ctree_fs_infoAustin S. Hemmelgarn2016-03-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, open_ctree_fs_info will open whatever path you pass it and try to interpret it as a BTRFS filesystem. While this is not nessecarily dangerous (except possibly if done on a character device), it does result in some rather cryptic and non-sensical error messages when trying to run certain commands in ways they weren't intended to be run. Add a check using stat(2) to verify that the path we've been passed is in fact a regular file or a block device, or a symlink pointing to a regular file or block device. This causes the following commands to provide a helpful error message when run on a FIFO, directory, character device, or socket: * btrfs check * btrfs restore * btrfs-image * btrfs-find-root * btrfs inspect-internal dump-tree stat(2) is used instead of lstat(2), as stat(2) follows symlinks just like open(2) does, which means we check the same inode that open(2) opens, and thus don't need special handling for symlinks. Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Add new option for specify chunk root bytenrLu Fengqi2016-03-141-6/+21
| | | | | | | | | | Add new btrfsck option, '--chunk-root', to specify chunk root bytenr. And allow open_ctree_fs_info() function accept chunk_root_bytenr to override the bytenr in superblock. This will be mainly used when chunk tree corruption. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Add support for tree block operations on fs_info without rootsQu Wenruo2016-02-261-30/+39
| | | | | | | | | | | | | Since open_ctree_fs_info() now may return a fs_info even without any roots, modify functions like read_tree_block() to operate with such fs_info. This provides the basis for btrfs-find-root to operate on chunk tree with corrupted fs. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ coding style adjustments, unified declarations ] Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Allow open_ctree to return fs_info even chunk tree is corruptedQu Wenruo2016-02-261-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | | Current open_ctree_fs_info() won't return anything if chunk tree root is corrupted. This makes some function, like btrfs-find-root, unable to find any older chunk tree root, even it is possible to use system_chunk_array in super block. And at least two users in mail list has reported such heavily chunk corruption. Although we have 'btrfs rescue chunk-recovery' but it's too time consuming and sometimes not able to cope with a specific filesystem corruption. This patch adds a new open ctree flag, OPEN_CTREE_IGNORE_CHUNK_TREE_ERROR, allowing fs_info to be returned from open_ctree_fs_info() even there is no valid tree root in it. Also adds a new close_ctree() variant, close_ctree_fs_info() to handle possible fs_info without any root. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ adjusted error messages ] Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: add basic awareness of the free space treeOmar Sandoval2016-01-121-1/+15
| | | | | | | | | | | To start, let's tell btrfs-progs to read the free space root and how to print the on-disk format of the free space tree. However, we're not adding the FREE_SPACE_TREE read-only compat bit to the set of supported bits because progs doesn't know how to keep the free space tree consistent. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Enhance chunk validation checkQu Wenruo2016-01-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enhance chunk validation: 1) Num_stripes We already have such check but it's only in super block sys chunk array. Now check all on-disk chunks. 2) Chunk logical It should be aligned to sector size. This behavior should be *DOUBLE CHECKED* for 64K sector size like PPC64 or AArch64. Maybe we can found some hidden bugs. 3) Chunk length Same as chunk logical, should be aligned to sector size. 4) Stripe length It should be power of 2. 5) Chunk type Any bit out of TYPE_MAS | PROFILE_MASK is invalid. With all these much restrict rules, several fuzzed image reported in mail list should no longer cause btrfsck error. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: use on-stack buffer in __csum_tree_block_sizeDavid Sterba2015-11-131-7/+1
| | | | | | | We know the maximum size of a checksum, calling malloc for 4 bytes is weird. Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: use calloc instead of malloc+memsetSilvio Fricke2015-10-211-5/+2
| | | | | | | | | | | | | | | | | | | | | | This patch is generated from a coccinelle semantic patch: identifier t; expression e; statement s; @@ -t = malloc(e); +t = calloc(1, e); ( if (!t) s | if (t == NULL) s | ) -memset(t, 0, e); Signed-off-by: Silvio Fricke <silvio.fricke@gmail.com> [squashed patches into one] Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: add more superblock validation checksQu Wenruo2015-10-211-4/+142
| | | | | | | | | | | | | | Now btrfs-progs will have much more strict superblock checks based on kernel superblock checks. This should prevent crashes or invalid memory access on crafted or fuzzed images. Based on kernel commit c926093ec516f5d316ecdf8c1be11f577ac71b85 . Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [added reference to kernel and comments] Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Read the whole superblock instead of struct btrfs_super_blockQu Wenruo2015-10-211-16/+17
| | | | | | | | | | | Before the patch, btrfs-progs will only read sizeof(struct btrfs_super_block) and restore it into super_copy. This makes checksum check for superblock impossible. Change it to read the whole superblock. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Show detail error message when write sb failed in ↵Zhao Lei2015-10-021-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write_dev_supers() fsck-tests.sh failed and show following message in my node: # ./fsck-tests.sh [TEST] 001-bad-file-extent-bytenr disk-io.c:1444: write_dev_supers: Assertion `ret != BTRFS_SUPER_INFO_SIZE` failed. /root/btrfsprogs/btrfs-image(write_all_supers+0x2d2)[0x41031c] /root/btrfsprogs/btrfs-image(write_ctree_super+0xc5)[0x41042e] /root/btrfsprogs/btrfs-image(btrfs_commit_transaction+0x208)[0x410976] /root/btrfsprogs/btrfs-image[0x438780] /root/btrfsprogs/btrfs-image(main+0x3d5)[0x438c5c] /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd] /root/btrfsprogs/btrfs-image[0x4074e9] failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img # # cat fsck-tests-results.txt === Entering /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr restoring image default_case.img failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img # Reason: I run above test in a NFS mountpoint, it don't have enouth space to write all superblock to image file, and don't support sparse file. So write_dev_supers() failed in writing sb and output above message. It takes me quite of time to know what happened, we can save these time by output exact information in write-sb-fail case. After patch: # ./fsck-tests.sh [TEST] 001-bad-file-extent-bytenr WARNING: Write sb failed: File too large disk-io.c:1492: write_all_supers: Assertion `ret` failed. ... # Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: use calloc instead of malloc+memset for tree rootsOmar Sandoval2015-09-141-15/+7
| | | | | Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: Avoid reading bad fd in case of missing device.Qu Wenruo2015-08-311-2/+2
| | | | | | | | | | | | | | | | | Offline btrfs tools, like btrfs-image, will infinitely loop when there is missing device. The reason is, for missing device, it's fd will be set to -1, but before we reading, we only check the fd validation by checking if it's 0. So in that case, -1 will pass the validation check, and cause pread to return 0, and loop to read. Just change the validation check from "== 0" to "<= 0" to avoid such problem. Reported-by: Timothy Normand Miller <theosib@gmail.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: add missing free operation of raid_map for raid56Zhao Lei2015-08-311-0/+1
| | | | | | | | We forgot free raid_map for raid56's map_bio. This patch add it. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: disk-io: Support commit transaction on chunk treeQu Wenruo2015-07-101-0/+2
| | | | | | | | | As chunk tree is only stored in super block, chunk tree commit doesn't need to go through tree root update. Or a BUG_ON will be triggered. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs-progs: export read_extent_data functionQu Wenruo2015-06-171-0/+34
| | | | | | | Export it for later btrfs-map-logical cleanup. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Enhance read_tree_block to avoid memory corruptionQu Wenruo2015-05-251-0/+26
| | | | | | | | | | | | | | | | | | | | | Add the following tree block check to avoid memory corruption on hostile image: 1) Check level. Level >= BTRFS_MAX_LEVEL won't be read out. 2) Nritems. For nr_items > max_nritems, the tree_block won't be read out. Max nritems is calculated in a easy method. For node, it's straightforward, just (nodesize - header size) / (btrfs_key_ptr) For leaf, (nodesize - header size) / (btrfs_item), as btrfs support zero item size This fixes 3 kernel bugs: BZ#97171, BZ#97191, BZ#97271. Reported-by: Lukas Lueg <lukas.lueg@gmail.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Export write_tree_blockQu Wenruo2015-05-141-2/+2
| | | | | | | | | | Export write_tree_block() function and allow it write extent without transaction. This provides the basis for later uuid change function. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Add open_ctree check for uuid changingQu Wenruo2015-05-141-1/+10
| | | | | | | | | | | Now open_ctree will exit if it found the superblock is marked CHANGING_FSID, except given IGNORE_FSID open ctree flags. Kernel will do the same thing later. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [removed the chunk tree flag, reworded the error message] Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: fix typo in OPEN_CTREE flagDavid Sterba2015-02-121-3/+3
| | | | | | | Introduced in "btrfs-progs: Add new btrfs_open_ctree_flags CHUNK_ONLY" by my local fixups. Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Add new btrfs_open_ctree_flags CHUNK_ONLYQu Wenruo2015-02-111-1/+5
| | | | | | | | | | | | Add new flag CHUNK_ONLY and internal used only flag __RETURN_CHUNK. CHUNK_ONLY will imply __RETURN_CHUNK, SUPPRESS_ERROR and PARTIAL, which will allow the fs to be opened with only chunk tree OK. This will improve the usability for btrfs-find-root. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Add support to suppress tree block csum error outputQu Wenruo2015-02-111-8/+15
| | | | | | | | | | | | Add new open ctree flag OPEN_CTREE_SUPPRESS_CHECK_BLOCK_ERRORS to suppress tree block csum error output. Provides the basis for new btrfs-find-root and other enhancement on btrfs offline tools output. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [renamed vars and funcs, added comments] Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Cleanup check_tree_block() functionQu Wenruo2015-02-111-8/+37
| | | | | | | | | | | | | Before this patch, check_tree_block() will print error on bytenr mismatch but don't output error on fsid mismatch. This patch will modify check_tree_block(), so it will only return errno but not print error messages. The error message will be output by print_tree_block_err() function. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [renamed and cleaned return codes] Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs-progs: skip opening all devices with restoreJosef Bacik2015-02-091-3/+5
| | | | | | | | | | | When we go to fixup the dev items after a restore we scan all existing devices. If you happen to be a btrfs developer you could possibly open up some random device that you didn't just restore onto, which gives you weird errors and makes you super cranky and waste a day trying to figure out what is failing. This will make it so that we use the fd we've already opened for opening our ctree. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com>
* Btrfs-progs: remove global transaction from fsckJosef Bacik2015-02-091-0/+2
| | | | | | | | | | | We hold a transaction open for the entirety of fixing extent refs. This works out ok most of the time but we can be tight on space and run out of space when fixing things. To get around this just push down the transaction starting dance into the functions that actually fix things. This keeps us from ending up with ENOSPC because we pinned everything and allows the code to be a bit simpler. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com>
* btrfs-progs: read_tree_block() and read_node_slot() cleanup.Qu Wenruo2015-02-021-5/+5
| | | | | | | | | | | | Allow read_tree_block() and read_node_slot() to return error pointer. This should help caller to get more specified error number. For existing callers, change (!eb) judgmentt to (!extent_buffer_uptodate(eb)) to keep the compatibility, and for caller missing the check, use PTR_ERR(eb) if possible. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Record orphan data extent ref to corresponding root.Qu Wenruo2015-02-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | Before this patch, when a extent's data ref points to a invalid key in fs tree, this happens if a leaf/node of fs tree is corrupted, btrfsck can't do any repair and just exit. In fact, such problem can be handled in fs tree repair routines, rebuild the inode item(if missing) and add back the extent data (with some assumption). So this patch records such data extent refs for later fs tree recovery routine. TODO: Restore orphan data extent refs into btrfs_root is not the best method. It's best to directly restore it into inode_record, however current extent tree and fs tree can't cooperate together, so use btrfs_root as a temporary storage until inode_cache is built. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: drop feature defines from C files, in favour of CFLAGS definesDimitri John Ledkov2015-01-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | glibc 2.10+ (5+ years old) enables all the desired features: _XOPEN_SOURCE 700, __XOPEN2K8, POSIX_C_SOURCE, DEFAULT_SOURCE; with a single _GNU_SOURCE define in the makefile alone. For portability to other libc implementations (e.g. dietlibc) _XOPEN_SOURCE=700 is also defined. This also resolves Debian bug report filed by Michael Tautschnig - "Inconsistent use of _XOPEN_SOURCE results in conflicting declarations". Whilst I was not able to reproduce the results, the reported fact is that _XOPEN_SOURCE set to 500 in one set of files (e.g. cmds-filesystem.c) generates/defines different struct stat from other files (cmds-replace.c). This patch thus cleans up all feature defines, and sets them at a consistent level. Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=747969 Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Fix a copy-n-paste bug in btrfs_read_fs_root().Qu Wenruo2015-01-091-1/+1
| | | | | | | | | Introduced in commit 96ec888aad41969 ("btrfs-progs: add quota group verify code"). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Fix a clang dead-judgement warning in disk-io.c.Qu Wenruo2014-12-191-3/+6
| | | | | | | | | | | | | | | | | | | | When compiled with clang, the following warning is outputted. disk-io.c:1017:15: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (dev_size < 0) ~~~~~~~~ ^ ~ 1 warning generated. This is because dev_size is defined as unsigned type, but lseek() will return singed valued. So the judgement will always to false. Use temporary off_t return value to solve it. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: Check sb_bytenr with device size before scanning one device.Qu Wenruo2014-11-141-0/+10
| | | | | | | | | | | | When using btrfs check with -s option, if using '-s 2' on a small device which doesn't have the third superblock, "No valid Btrfs found" will be output, but it is not appropriate. So check sb_bytenr against device size before scanning a device and output proper error message. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs-progs: don't fail on log tree opening with PARTIALJosef Bacik2014-11-141-1/+2
| | | | | | | | | We were failing to fsck a volume because we couldn't open the log tree, which is not helpful. Make us skip erroring out if we are using OPEN_CTREE_PARTIAL since it isn't a mandatory tree. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: optimize btrfs_scan_lblkid() for multiple callsAnand Jain2014-11-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_scan_lblikd() is called by most the device related command functions. And btrfs_scan_lblkid() is most expensive function and it becomes more expensive as number of devices in the system increase. Further some threads call this function more than once for absolutely no extra benefit and the real waste of resources. Below list of threads and number of times btrfs_scan_lblkid() is called in that thread. btrfs-find-root 1 btrfs rescue super-recover 2 btrfs-debug-tree 1 btrfs-image -r 2 btrfs check 2 btrfs restore 2 calc-size NC btrfs-corrupt-block NC btrfs-image NC btrfs-map-logical 1 btrfs-select-super NC btrfstune 2 btrfs-zero-log NC tester NC quick-test.c NC btrfs-convert 0 mkfs #number of devices to be mkfs btrfs label set unmounted 2 btrfs get label unmounted 2 This patch will: move out calling register_one_device with in btrfs_scan_lblkid() and so function setting the BTRFS_UPDATE_KERNEL to yes will call btrfs_register_all_devices() separately. introduce a global variable scan_done, which is set when scan is done succssfully per thread. So that following calls to this function will just return success. Further if any function needs to force scan after scan_done is set, then it can be done when there is such a requirement, but as of now there isn't any such requirement. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: fix csum root copy-n-paste errorZach Brown2014-11-031-30/+37
| | | | | | | | | | | btrfs_setup_all_roots() had some copy and pasted code for trying to setup a root and then creating a blank node if that failed. The copy for the csum_root created the blank node in the extent_root. So we create a function to use a consistent root. Signed-off-by: Zach Brown <zab@zabbo.net> Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs-progs: check, ability to detect and fix outdated snapshot root itemsFilipe Manana2014-10-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds code to detect and fix the issue introduced in the kernel release 3.17, where creation of read-only snapshots lead to a corrupted filesystem if they were created at a moment when the source subvolume/snapshot had orphan items. The issue was that the on-disk root items became incorrect, referring to the pre orphan cleanup root node instead of the post orphan cleanup root node. A test filesystem can be generated with the test case recently submitted for xfstests/fstests, which is essencially the following (bash script): workout() { ops=$1 procs=$2 num_snapshots=$3 _scratch_mkfs >> $seqres.full 2>&1 _scratch_mount snapshot_cmd="$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT" snapshot_cmd="$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\`" run_check $FSSTRESS_PROG -p $procs \ -x "$snapshot_cmd" -X $num_snapshots -d $SCRATCH_MNT -n $ops } ops=10000 procs=4 snapshots=500 workout $ops $procs $snapshots Example of btrfsck's (btrfs check) behaviour against such filesystem: $ btrfsck /dev/loop0 root item for root 311, current bytenr 44630016, current gen 60, current level 1, new bytenr 44957696, new gen 61, new level 1 root item for root 1480, current bytenr 1003569152, current gen 1271, current level 1, new bytenr 1004175360, new gen 1272, new level 1 root item for root 1509, current bytenr 1037434880, current gen 1300, current level 1, new bytenr 1038467072, new gen 1301, new level 1 root item for root 1562, current bytenr 33636352, current gen 1354, current level 1, new bytenr 34455552, new gen 1355, new level 1 root item for root 3094, current bytenr 1011712000, current gen 2935, current level 1, new bytenr 1008484352, new gen 2936, new level 1 root item for root 3716, current bytenr 80805888, current gen 3578, current level 1, new bytenr 73515008, new gen 3579, new level 1 root item for root 4085, current bytenr 714031104, current gen 3958, current level 1, new bytenr 716816384, new gen 3959, new level 1 Found 7 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. $ echo $? 1 $ btrfsck --repair /dev/loop0 enabling repair mode fixing root item for root 311, current bytenr 44630016, current gen 60, current level 1, new bytenr 44957696, new gen 61, new level 1 fixing root item for root 1480, current bytenr 1003569152, current gen 1271, current level 1, new bytenr 1004175360, new gen 1272, new level 1 fixing root item for root 1509, current bytenr 1037434880, current gen 1300, current level 1, new bytenr 1038467072, new gen 1301, new level 1 fixing root item for root 1562, current bytenr 33636352, current gen 1354, current level 1, new bytenr 34455552, new gen 1355, new level 1 fixing root item for root 3094, current bytenr 1011712000, current gen 2935, current level 1, new bytenr 1008484352, new gen 2936, new level 1 fixing root item for root 3716, current bytenr 80805888, current gen 3578, current level 1, new bytenr 73515008, new gen 3579, new level 1 fixing root item for root 4085, current bytenr 714031104, current gen 3958, current level 1, new bytenr 716816384, new gen 3959, new level 1 Fixed 7 roots. Checking filesystem on /dev/loop0 UUID: 2186e9b9-c977-4a35-9c7b-69c6609d4620 checking extents checking free space cache cache and super generation don't match, space cache will be invalidated checking fs roots checking csums checking root refs found 618537000 bytes used err is 0 total csum bytes: 130824 total tree bytes: 601620480 total fs tree bytes: 580288512 total extent tree bytes: 18464768 btree space waste bytes: 136939144 file data blocks allocated: 34150318080 referenced 27815415808 Btrfs v3.17-rc3-2-gbbe1dd8 $ echo $? 0 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs-progs: introduce a proper structure on which cli will call ↵Anand Jain2014-10-161-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | register-device ioctl As of now commands mentioned below (with in [..]) are calling call register-device ioctl BTRFS_IOC_SCAN_DEV for all the devices in the system. Some issues with it: BTRFS_IOC_SCAN_DEV: ioctl is a write operation, we don't want command like btrfs-debug-tree threads to do that.. eg: ---- $ cat /proc/fs/btrfs/devlist | egrep fsid | wc -l 0 $ btrfs-debug-tree /dev/sde (num_device > 1) $ cat /proc/fs/btrfs/devlist | egrep fsid | wc -l 5 ---- btrfs_scan_fs_devices() ends up calling this ioctl only when num_device > 1. That's inconsistency with in feature/bug. We don't have to register _all_ the btrfs devices (again) in the system without user consent. Why its inconsistent: function btrfs_scan_fs_devices() calls btrfs_scan_lblkid only when num_devices is > 1, which in turn calls BTRFS_IOC_SCAN_DEV ioctl, if conditions are met. But main issue is we have too many consumers of btrfs_scan_fs_devices() the names below with in [] is the cli leading to this function. open_ctree_broken() [btrfs-find-root] recover_prepare() [btrfs rescue super-recover] __open_ctree_fd (updates always except when flag OPEN_CTREE_RECOVER_SUPER is set and flag OPEN_CTREE_RECOVER_SUPER is set only by 'btrfs rescue super- recover' but still this thread sneaks through the open_ctree function to call register-device-ioctl as show below). open_ctree_fs_info [btrfs-debug-tree] [btrfs-image -r] [btrfs check] open_fs [btrfs restore] open_ctree [calc-size] [btrfs-corrupt-block] [btrfs-image] (create) [btrfs-map-logical] [btrfs-select-super] [btrfstune] [btrfs-zero-log] [tester] [mkfs] [quick-test.c] [btrfs label set unmounted] [btrfs get label unmounted] [btrfs rescue super-recover] open_ctree_fd [btrfs-convert] Fix: In an effort to make register-device consistent, all calls to btrfs_scan_fs_devices() will have 5th parameter set to 0. that means we don't need 5th parameter at all. And with this function not calling the register ioctl at all, finally we will have following two cli to call the ioctl BTRFS_IOC_SCAN_DEV. btrfs dev scan and mkfs.btrfs Threads needing to update kernel about a device would have to use btrfs_register_one_device() separately. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs-progs: break out rbtree util functionsJosef Bacik2014-10-141-0/+1
| | | | | | | | | These were added to deal with duplicated functionality within btrfs-progs, but we specifically copied rbtree.c from the kernel, so move these functions out into their own file. This will make it easier to keep rbtree.c in sync. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs-progs: fsck: deal with corrupted csum rootWang Shilong2014-10-101-0/+7
| | | | | | | | | | | | | If checksum root is corrupted, fsck will get segmentation. This is because if we fail to load checksum root, root's node is NULL which cause NULL pointer deferences later. To fix this problem, we just did something like extent tree rebuilding. Allocate a new one and clear uptodate flag. We will do sanity check before fsck going on. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs-progs: fsck: disallow partial opening if critical roots corruptedWang Shilong2014-10-101-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If btrfs tree root is corrupted, fsck will hit the following segmentation. enabling repair mode Check tree block failed, want=29376512, have=0 Check tree block failed, want=29376512, have=0 Check tree block failed, want=29376512, have=0 Check tree block failed, want=29376512, have=0 Check tree block failed, want=29376512, have=0 read block failed check_tree_block Couldn't read tree root Checking filesystem on /dev/sda9 UUID: 0e1a754d-04a5-4256-ae79-0f769751803e Critical roots corrupted, unable to fsck the FS Segmentation fault (core dumped) In btrfs_setup_all_roots(), we could tolerate some trees(extent tree, csum tree) corrupted, and we have did careful check inside that function, it will return NULL if critial roots corrupt(for example tree root). The problem is that we check @OPEN_CTREE_PARTIAL flag again after calling btrfs_setup_all_roots() which will successfully return @fs_info though critial roots corrupted. Fix this problem by removing @OPEN_CTREE_PARTIAL flag check outsize btrfs_setup_all_roots(). Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.cz>