
OK, I found part of the problem. I resurrected my old setup and switched to using fio (I was using iozone and some others). When you specify --filename= in fio but have multiple jobs, they all share the same file (https://fio.readthedocs.io/en/latest/fio_doc.html). So instead of 28 jobs reading 28 distinct files, you have 28 jobs all reading the same file. Even after dropping the buffer cache, rebooting clean, etc. before the benchmark, that's effectively 27 tests of the read caches and 1 test of the actual disk, which will obviously seriously inflate the results. I'm thinking --directory= would be the best option.
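For reference, a minimal fio job-file sketch using directory= instead of filename= (the mount path and sizes here are illustrative, not the original setup); when filename= is not set, fio generates a unique file per job, so each of the 28 jobs reads its own data:

```ini
; read.fio - sketch only; path and sizes are assumptions
[global]
directory=/mnt/disk     ; fio creates per-job files here (jobname.N.0)
rw=read
bs=512k
size=20G
direct=1
ioengine=libaio
group_reporting=1

[read_test]
numjobs=28
```

Run with `fio read.fio`. Since each job gets a distinct 20G file, cached rereads of one shared file can no longer inflate the aggregate number.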

I also did the following to drop (most) cached data before the test:

  sync; echo 3 > /proc/sys/vm/drop_caches; zpool export $pool; zpool import $pool


I still have some time to test - let me try this out! One thing I already proved is that with 2 GB of RAM, the system performs as predicted at ~20 GB/s. I also resonate with your definition of benchmarking: test ultimate performance, but also real-world usage.

Update:

Ok, so when I used the ':' syntax in the --filename option, I got something that makes sense. fio created individual files for each thread. I tested numjobs = 4, 8, and 16, and throughput plateaus around 8 jobs. Adding additional files+jobs actually makes things worse.

  neil@ubuntu:~/mnt/disk$ ls
  dummy0  dummy10  dummy12  dummy14  dummy2  dummy4  dummy6  dummy8
  dummy1  dummy11  dummy13  dummy15  dummy3  dummy5  dummy7  dummy9
  neil@ubuntu:~/mnt/disk$ sudo fio --name=read_test \
  >         --filename=/home/neil/mnt/disk/dummy0:/home/neil/mnt/disk/dummy1:/home/neil/mnt/disk/dummy2:/home/neil/mnt/disk/dummy3:/home/neil/mnt/disk/dummy4:/home/neil/mnt/disk/dummy5:/home/neil/mnt/disk/dummy6:/home/neil/mnt/disk/dummy7:/home/neil/mnt/disk/dummy8:/home/neil/mnt/disk/dummy9:/home/neil/mnt/disk/dummy10:/home/neil/mnt/disk/dummy11:/home/neil/mnt/disk/dummy12:/home/neil/mnt/disk/dummy13:/home/neil/mnt/disk/dummy14:/home/neil/mnt/disk/dummy15 \
  >         --filesize=20G \
  >         --ioengine=libaio \
  >         --direct=1 \
  >         --sync=1 \
  >         --bs=512k \
  >         --iodepth=1 \
  >         --rw=read \
  >         --numjobs=16 \
  >         --group_reporting

I get around ~22 GB/s with 8 jobs, ~15 GB/s with 16 jobs. Thanks for the correction; I'll post an update to the misleading 157 Gb/s figure... uff.

Updated the article: https://neil.computer/notes/zfs-raidz2/#note-5


--directory makes those files automatically for you

Actually, I think those numbers are pretty good. You can zpool export to unmount, then run a non-destructive read test with --filename=/dev/nvme0n1:/dev/nvme0n2... and check that the raw devices reach, say, 80%-99% of the advertised speed.
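A sketch of that raw-device job as a fio job file (device names are placeholders; this reads the disks directly, so make sure the pool is exported first):

```ini
; rawread.fio - sketch only; device names are placeholders
[global]
rw=read
bs=1M            ; large sequential reads to approach advertised throughput
direct=1
ioengine=libaio
iodepth=16
runtime=60
time_based=1
group_reporting=1

[nvme0]
filename=/dev/nvme0n1

[nvme1]
filename=/dev/nvme0n2
```

Reads are non-destructive, but double-check the device names before running anything against raw block devices.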

Then you can create a "raw" zvol (no filesystem) with:

    zfs create -b 128K -V 100G tank/testme

Then a benchmark with --filename=/dev/zvol/tank/testme should only be testing the striping and raidz layers. I could get around 65% of the ideal.

Running through the ZFS filesystem seemed to slow things down by a good 50% from the ideal. None of this has been tuned.

So to see 20GB/s+ seems pretty good to me.


> real world usage

Since my machine is a build server, my benchmark for that is:

    git clean -dxf; make defconfig
    time make -j vmlinux

37s (after dropping caches). So I'm happy enough.



