Table of Contents

Optimal Architecture Design

BIOS Configuration

Kernel Build Optimization

There have been some pretty major IO performance boosts in the past few kernel releases; in some cases you might see 25% more random read or write IOPS if you are using an SSD.

If you're running Ubuntu with a kernel older than 3.19 and you have an SSD, you are probably only getting 50% - 75% of its potential speed. I'm not joking, the performance gains between kernel 3.13 and 3.19 are huge, even for Ubuntu. CentOS 7 / RHEL 7 just got to kernel 3.10, which sucks slightly less than CentOS 6's 2.6.x, and that is just depressing when it comes to SSD IO performance.

AIO, Bcache, XFS, ext4, EPT, VPID, THP

Development Environment Configuration

./configure --prefix=/opt --target-list=x86_64-softmmu --enable-linux-aio \
            --enable-numa --enable-spice --enable-kvm --enable-lzo --enable-snappy \
            --enable-libusb --enable-usb-redir --enable-libiscsi --enable-mc --enable-rdma \
            --disable-libnfs --disable-seccomp --disable-smartcard-nss --disable-fdt --disable-curl \
            --disable-curses --disable-sdl --disable-gtk  --disable-tpm --disable-vte --disable-xen  \
            --disable-cap-ng 

Package Dependencies

yum install -y numactl lzo snappy pixman celt051

Kernel Parameter Tuning

net.core.netdev_max_backlog = 262144
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_rmem = 8192 87380 6291456
net.ipv4.tcp_wmem = 8192 87380 6291456
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_app_win = 40
net.ipv4.tcp_early_retrans = 1

Reduce Swap Fluctuation

vm.swappiness=0
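
To persist and apply the network and swappiness settings above, a minimal sketch (the drop-in file name 99-kvm-tuning.conf is arbitrary; on CentOS 6 use sysctl -p /etc/sysctl.conf instead of sysctl --system):

# Append the parameters above to a sysctl drop-in and reload them
cat > /etc/sysctl.d/99-kvm-tuning.conf <<'EOF'
net.core.netdev_max_backlog = 262144
vm.swappiness = 0
EOF
sysctl --system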

EPT/VPID

grep -E "ept|vpid" /proc/cpuinfo
cat /sys/module/kvm_intel/parameters/ept
cat /sys/module/kvm_intel/parameters/vpid

modprobe kvm_intel ept=1 vpid=1
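
To keep EPT/VPID enabled across reboots, one approach (the file name kvm-intel.conf is an assumption):

# Persist the module parameters and reload kvm_intel (no VMs may be running)
echo "options kvm_intel ept=1 vpid=1" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel && modprobe kvm_intel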

THP

grep Hugepagesize /proc/meminfo
mount -t hugetlbfs hugetlbfs /dev/hugepages
sysctl vm.nr_hugepages=1024

qemu-kvm -mem-path /dev/hugepages
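
A minimal launch sketch that backs guest RAM with the 1024 x 2 MB pages reserved above (disk image path and sizes are placeholders):

# 2 GB of guest RAM allocated from /dev/hugepages at startup
qemu-kvm -m 2048 -mem-prealloc -mem-path /dev/hugepages \
         -smp 2 -drive file=/storage/guest.img,if=virtio
echo "vm.nr_hugepages = 1024" >> /etc/sysctl.conf   # make the page reservation persistent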

NTP

Enable a scheduled time-synchronization service on the host servers.

Virtual machines can also be given a scheduled time-sync task.
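
One possible setup, assuming ntpd on the CentOS hosts and a simple cron job inside the guests (the pool server is an example):

# Host: keep the clock disciplined continuously
yum install -y ntp
chkconfig ntpd on && service ntpd start
# Guest: step the clock every 30 minutes via cron
echo "*/30 * * * * root /usr/sbin/ntpdate pool.ntp.org" >> /etc/crontab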

Storage Optimization

The usual recommendations for the two IO modes are:

  • io=native for block device based VMs.
  • io=threads for file-based VMs.

Important note from Red Hat: Direct Asynchronous IO (AIO) that is not issued on filesystem block boundaries, and falls into a hole in a sparse file on ext4 or xfs filesystems, may corrupt file data if multiple I/O operations modify the same filesystem block. Specifically, if qemu-kvm is used with the aio=native IO mode over a sparse device image hosted on an ext4 or xfs filesystem, guest filesystem corruption will occur if partitions are not aligned with the host filesystem block size. In general, do not use aio=native together with cache=none for QEMU. The issue can be avoided with one of the following techniques (a command-line sketch follows the list):

* cache=writeback,aio=threads
* scheduler = deadline/cfq
* acpi=off
* preallocation = metadata
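
A rough sketch of how these settings appear on the command line (image path, size, and the deadline scheduler choice are illustrative):

# Preallocate qcow2 metadata when creating the image
qemu-img create -f qcow2 -o preallocation=metadata /storage/vm01.qcow2 40G
# Attach it with writeback caching and thread-based AIO
qemu-kvm -drive file=/storage/vm01.qcow2,if=virtio,format=qcow2,cache=writeback,aio=threads
# Switch the host disk holding the images to the deadline elevator
echo deadline > /sys/block/sda/queue/scheduler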

CPU Allocation Principles

Emulate a multi-socket, multi-core, multi-thread topology where possible so guest software can detect it automatically. maxcpus-based CPU hotplug still carries a risk of bugs and is not recommended for now.

Try to reserve one socket / one core on the host for the host OS itself, using isolcpus.

-smp 4,sockets=2,cores=2,threads=1 (with threads=2 the total vCPU count doubles to -smp 8)
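
A sketch tying isolcpus to manual pinning (CPU numbers are illustrative, and a single qemu-kvm process is assumed):

# Host kernel command line (grub): keep CPUs 1-3 off the general scheduler
#   isolcpus=1-3
# Pin a running guest's qemu-kvm process onto the isolated CPUs
taskset -cp 1-3 $(pidof qemu-kvm)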

SSD Acceleration

  1. For large local storage, ZFS RAIDZ/Z2/Z3 is recommended; where performance matters most, use ZFS RAID10 (striped mirrors).
  2. If performance requirements are very high and the budget allows, go straight to hardware storage such as a Fibre Channel array.
  3. Hardware RAID 10 is recommended.
  4. Add SSDs and deploy Bcache/EnhanceIO/ARC caching (see the sketch after this list).
ZFS pools + SSD L2ARC
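
For example, an existing pool can pick up SSD cache and log devices on the fly (pool name tank and the partitions are placeholders):

zpool add tank cache /dev/sdf1   # SSD read cache (L2ARC)
zpool add tank log /dev/sdf2     # SSD intent log (ZIL/SLOG)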

Filesystems

ZFS

ZFS + L2ARC and ZIL on SSD.

  1. Performance was good. The L2ARC algorithms are far more sophisticated than the ones in bcache or EnhanceIO, so long-term usage should benefit.
  2. A delight to work with, flexible and clear. Great command-line tool set, very easy to see what is going on. The ability to add multiple SSDs for cache is amazing.
  3. ZFS itself: an amazingly powerful soft-RAID setup with snapshots, backups, send/recv, compression, and deduplication.

dm-cache:

  1. Complicated to setup
  2. Fiddly and error prone to manage
  3. Needs custom init scripts
  4. No auto ignoring of bulk reads/writes (disk cloning, backups)
  5. I managed to destroy the underlying file system while attempting to flush and dismount the write cache.
  6. Did give good results for reads/writes

bcache

  1. User tools have to be compiled and installed
  2. Can't be used with existing file systems
  3. No auto ignoring of bulk reads/writes (disk cloning, backups)
  4. Needs custom init scripts
  5. The 3.10 kernel version is hopelessly buggy. Trashed the file system
  6. No tools for uninstalling. Required a hard reset, and then I had to use dd to overwrite the partition table to remove the cache store. I then blacklisted the module

EnhanceIO

  1. Has to be compiled and installed
  2. Can be used with existing file systems
  3. Can be created/edited/destroyed on the fly
  4. No auto ignoring of bulk reads/writes (disk cloning, backups)
  5. Persistent between reboots (udev rules)
  6. Good results on reads/writes
  7. Unfortunately, when I combined it with an external ext4 journal I got data corruption

ZFS (zfsonlinux.org)

  1. Used the kernel module
  2. Has a repo, but requires kernel headers, build-essential and dkms.
  3. Built-in support for journal and read caching using multiple SSDs
  4. ZFS! With all the ZFS goodies: raid, striping, snapshots, backups, pool management.
  5. Auto ignoring of bulk reads/writes (disk cloning, backups)
  6. Good tools for managing and reporting disk/cache stats and errors
  7. I restricted it to 1GB RAM on the hosts
  8. Good results on reads/writes
  9. No support for O_DIRECT, so I had to disable Gluster's io-cache, which is recommended anyway for virtual stores.
ZFS was the clear winner: ease of management, reliability, flexibility. It's going to make expanding the stores so much easier in the future.

ZFS + Gluster + SSD caches seems to be a winner for shared HA storage to me.

options zfs zfs_arc_max=40000000000
options zfs zfs_vdev_max_pending=24
Where zfs_arc_max is roughly 40% of your RAM in bytes (Edit: try zfs_arc_max=1200000000). The compiled-in default for zfs_vdev_max_pending is 8 or 10, depending on the version. The value should be high (48) for SSDs or low-latency drives, maybe 12-24 for SAS; otherwise, leave it at the default.
 
You'll also want to set some floor values in /etc/sysctl.conf:
 
vm.swappiness = 10
vm.min_free_kbytes = 512000
Finally, with CentOS, you may want to install tuned and tuned-utils and set your profile to virtual-guest with tuned-adm profile virtual-guest.
 
Try these and see if the problem persists.
 
Edit:
 
Run zfs set xattr=sa storage (this stores extended attributes in the inode rather than in hidden directories, saving extra IO). You may have to wipe the volumes and start again (I'd recommend it).
  • mdadm + XFS (noatime,nodiratime,nobarrier,logbufs=8)
  • ZFS + L2ARC
  • ashift=12
  • options zfs zfs_arc_min=4294967296 (4 GB)
  • options zfs zfs_arc_max=xxx (30% of memory, set in /etc/modprobe.d/zfs.conf)
  • options zfs zfs_prefetch_disable=1 (or: modprobe zfs zfs_prefetch_disable=1)
  • options zfs l2arc_noprefetch=0
  • echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
  • zfs set atime=off
  • zfs set relatime=on
  • zfs set compression=lz4
  • zfs set primarycache=all
  • zfs set secondarycache=all
  • zfs set logbias=throughput (bypasses the ZIL/SLOG)
  • zfs set dedup=off
  • zfs create -o casesensitivity=mixed
  • crontab -e → 30 19 * * 5 zpool scrub <pool>
#!/bin/sh
echo "options zfs zfs_prefetch_disable=1" > /etc/modprobe.d/zfs.conf
echo "options zfs l2arc_noprefetch=0" >> /etc/modprobe.d/zfs.conf
awk '/MemTotal/{printf "options zfs zfs_arc_min=%.f\n",$2*1024*1/10}' /proc/meminfo >> /etc/modprobe.d/zfs.conf
awk '/MemTotal/{printf "options zfs zfs_arc_max=%.f\n",$2*1024*3/10}' /proc/meminfo >> /etc/modprobe.d/zfs.conf
 
[ -z "$1" ] && echo "$0 poolname" && exit 0
# options must precede the pool name and vdev specification
zpool create -f -o ashift=12 \
        -O atime=off \
        -O relatime=on \
        -O compression=lz4 \
        -O primarycache=all \
        -O secondarycache=all \
        -O logbias=throughput \
        -O dedup=off \
        -O casesensitivity=mixed \
        "$1" raidz /dev/sd[bcde]
zpool add -f "$1" cache sda3

Data compression, deduplication, encryption, COW, QoS, client-side caching, parallel NFS (pNFS)

XFS

mkfs.xfs -L /ssd1 -l internal,lazy-count=1,size=128m -i attr=2 -d agcount=8 -i size=512 -f /dev/sda4
mount -t xfs -o rw,noexec,nodev,noatime,nodiratime,barrier=0,logbufs=8,logbsize=256k /dev/sda4 /storage
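
A matching /etc/fstab line so the options survive a reboot (same device and mount point as above):

/dev/sda4  /storage  xfs  rw,noexec,nodev,noatime,nodiratime,barrier=0,logbufs=8,logbsize=256k  0 0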
 
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 31.1406 s, 673 MB/s
 
real	0m31.143s
user	0m0.010s
sys	0m16.413s
[root@vcn40 storage]# echo 3 > /proc/sys/vm/drop_caches 
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 30.6331 s, 685 MB/s
 
real	0m31.501s
user	0m0.013s
sys	0m16.881s

EXT4

[root@vcn40 ~]# cd /storage
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 31.835 s, 659 MB/s
 
real	0m31.837s
user	0m0.010s
sys	0m25.371s
[root@vcn40 storage]# echo 3 > /proc/sys/vm/drop_caches 
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 58.9783 s, 356 MB/s
 
real	0m59.003s
user	0m0.013s
sys	0m27.882s

Network Subsystem Acceleration

options bonding max_bonds=2 mode=4 miimon=100 downdelay=100 updelay=100 \
        lacp_rate=1 use_carrier=1 xmit_hash_policy=layer3+4
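
On RHEL/CentOS the same parameters can also be carried in the bond interface config rather than modprobe options; a sketch with the IP addressing omitted:

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=4 miimon=100 downdelay=100 updelay=100 lacp_rate=1 xmit_hash_policy=layer3+4"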
* -device virtio-net-pci,netdev=net0 -netdev tap,id=net0 →
  -netdev tap,id=net0,vnet_hdr=on,vhost=on -device virtio-net-pci,netdev=net0 (enable vhost-net on the tap backend)
* Legacy -net syntax: -net nic,model=virtio -net tap,vnet_hdr=on,vhost=on
* TSO/GSO = off, LRO = off
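
The offload settings above can be toggled with ethtool on the host NIC (eth0 is a placeholder):

ethtool -K eth0 tso off gso off lro off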

SPICE Performance

Building the QXL Display Driver

yum install -y python-setuptools libjpeg-devel cyrus-sasl-devel openssl-devel celt051-devel alsa-lib-devel glib2-devel libXrandr-devel libXinerama-devel xorg-x11-server-devel gcc gcc-c++ autoconf automake
  1. spice-protocol-0.12.8
  2. xf86-video-qxl-0.1.4
  3. spice-vdagent-0.16.0
  4. spice-0.12.5
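
Each of these tarballs builds with the usual autotools flow; a sketch for the first one (the archive suffix may differ):

tar xf spice-protocol-0.12.8.tar.bz2
cd spice-protocol-0.12.8
./configure --prefix=/usr && make && make install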

Building the Client Image

Fedora 21 32-bit is used as the base system; it ships with friendly native SPICE support and is stable. Customizing the boot/startup screen:

vi /usr/bin/startyc

#!/bin/sh
cd /usr/local/bin/MiGateway/
./ycc

vi /etc/rc.d/rc.local

/usr/bin/xinit /usr/bin/startyc

If systemd is used to manage the startup process:

vi /etc/systemd/system/rc.local.service

[Unit]
Description=/etc/rc.local Compatibility
ConditionPathExists=/etc/rc.local
 
[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99
 
[Install]
WantedBy=multi-user.target
systemctl enable rc.local.service
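
The unit runs /etc/rc.local directly, so the script must be executable:

chmod +x /etc/rc.local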

mkusb.sh

#!/bin/sh
[ -z "$1" ] && echo "$0 /dev/sdX" && exit 0
DEV=$1
IMG="/root/ycos_client_xinit.tgz"
 
FORMAT_FORCE="y"
MOUNT_POINT="/mnt/"
 
if [ "$FORMAT_FORCE" = "y" ];then
        fdisk $DEV<<EOF
d
3
d
2
d
n
p
1
1
+4096M
n
p
2
 
+1024M
n
p
3
 
 
t
2
82
a
1
 
w
EOF
partx -a $DEV
sleep 3
 
mkfs.ext4 -L /Amy ${DEV}1
mkswap -L /Swap ${DEV}2
fi
 
mount ${DEV}1 $MOUNT_POINT
tar zxvf $IMG -C $MOUNT_POINT
grub-install --root-directory=$MOUNT_POINT --no-floppy --recheck $DEV
umount $MOUNT_POINT

Background Survey of the Customer's Computing Environment and Usage Habits

Category / Option (check as applicable):

  • Power safety: UPS backup
  • Switch ports: dual gigabit bonding
  • Network: link speed 100M~1000M
  • Users: regular user count / total count
  • Usage type: office, entertainment, security

Windows client OS

Improvement Plan for Older Versions

Template Synchronization Scheme

yum install -y inotify-tools rsync

/usr/local/bin/inotify_rsync.sh

#!/bin/bash
SRC="/xx/"
DST="/root/xx/"
HOSTS="10.0.2.47 10.0.2.48"
SSH_OPTS="-i/root/.ssh/id_rsa -p65422 -x -T -c arcfour -o Compression=no -oStrictHostKeyChecking=no"
 
# Don't change below
NUM=($HOSTS)
NUM=${#NUM[*]}
SPEED=$((100000/$NUM))
/usr/bin/inotifywait -mrq -e close_write,delete --format '%f' $SRC | while read files;do
        for ip in $HOSTS;do
        echo $files
        rsync -aP --bwlimit=$SPEED --delete -e "ssh $SSH_OPTS" $SRC root@$ip:$DST
        done
done
nohup /usr/local/bin/inotify_rsync.sh >/var/log/rsync.log 2>&1 &

Daemon Supervision

yum install supervisor

[program:vcnagent]
command=/opt/vcn/vdi/vcnagent "-info"
;environment=PATH=/opt/bin:/opt/sbin:%(ENV_PATH)s
priority=999
autostart=true
autorestart=true
startsecs=10
startretries=3
exitcodes=0,2
stopsignal=QUIT
stopwaitsecs=10
user=root
log_stdout=true
log_stderr=true
logfile=/var/log/vcnagent.log
logfile_maxbytes=1MB
logfile_backups=10
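
Assuming the block above is saved as an .ini file in supervisord's include directory (location varies by version, e.g. /etc/supervisord.d/), it can be activated with supervisorctl; note that supervisor 3.x renames some logging options (stdout_logfile instead of logfile):

supervisorctl reread           # scan config files for the new [program:vcnagent] section
supervisorctl update           # start the newly added program under supervision
supervisorctl status vcnagent  # verify it is RUNNING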

virtio-serial-bus unexpected port id

bcdedit -set loadoptions DISABLE_INTEGRITY_CHECKS
bcdedit -set TESTSIGNING ON

Thin Client Reference Material

References