Optimal Architecture Design
BIOS Configuration
- Enable Intel VT-x/VT-d/VMDq/SR-IOV (verified from the host OS in the sketch after this list)
- Enable CR3 (Control Register 3) support
- Enable Hyper-Threading
- Disable power-saving modes
- Boot from a dedicated onboard SSD or from a USB flash drive
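Once the host is booted, a quick check from Linux confirms the BIOS switches actually took effect. This is a rough sketch for an Intel host; the PCI address is a placeholder for your NIC.

```
# VT-x exposed to the OS (a non-zero count is expected on Intel hosts)
grep -c vmx /proc/cpuinfo

# KVM module loaded and VT-d/IOMMU active
lsmod | grep kvm_intel
dmesg | grep -i -e DMAR -e IOMMU

# SR-IOV capability on a NIC (replace 01:00.0 with the card's PCI address)
lspci -vvv -s 01:00.0 | grep -i 'Single Root I/O Virtualization'
```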
Kernel Build Optimization
There have been some pretty major IO performance boosts in the past few kernel releases; in some cases you might see 25% more random read or write IOPS if you are using an SSD.
Kernel 3.17 SCSI multi-queue: http://www.phoronix.com/scan.php?page=news_item&px=MTcyMjk
Kernel 3.19 multi-queue block layer (blk-mq): http://www.phoronix.com/scan.php?page=news_item&px=MTg2Mjg

If you're running Ubuntu with a kernel older than 3.19 and you have an SSD, you are probably only getting 50%-75% of its potential speed. I'm not joking: the performance gains between kernel 3.13 and 3.19 are huge, even for Ubuntu. CentOS 7 / RHEL only just got to kernel 3.10, which sucks slightly less than the 2.6.x kernel in CentOS 6, and that is just depressing when it comes to SSD IO performance.
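A rough way to see what the running kernel actually gives you (sysfs layout varies a little between kernel versions, and /dev/sda below is just an example device):

```
# Running kernel version
uname -r

# Current I/O scheduler for the device (shows "none" when blk-mq is in use)
cat /sys/block/sda/queue/scheduler

# 0 here means the kernel sees the device as non-rotational (SSD)
cat /sys/block/sda/queue/rotational
```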
Development Environment Setup
- Install the required package groups: Virtualization Host & Development Tools
- Install the latest spice-protocol and spice-server (force-remove the bundled spice-server-0.12.4-11.el6.x86_64 with --nodeps)
- Rebuild the latest stable QEMU, 2.1.3 (the most stable and sufficient; follow the qboot build flags and replace the distribution's own build):
./configure --prefix=/opt --target-list=x86_64-softmmu --enable-linux-aio \
    --enable-numa --enable-spice --enable-kvm --enable-lzo --enable-snappy \
    --enable-libusb --enable-usb-redir --enable-libiscsi --enable-mc --enable-rdma \
    --disable-libnfs --disable-seccomp --disable-smartcard-nss --disable-fdt --disable-curl \
    --disable-curses --disable-sdl --disable-gtk --disable-tpm --disable-vte --disable-xen \
    --disable-cap-ng
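After configure, the build and install is the usual make sequence (a sketch; adjust parallelism to the build host):

```
make -j"$(nproc)"
make install                              # lands under --prefix=/opt
/opt/bin/qemu-system-x86_64 --version     # confirm the new binary is picked up
```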
Package Dependencies
yum install -y numactl lzo snappy pixman celt051
Kernel Parameter Tuning
net.core.netdev_max_backlog = 262144
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_rmem = 8192 87380 6291456
net.ipv4.tcp_wmem = 8192 87380 6291456
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_app_win = 40
net.ipv4.tcp_early_retrans = 1
Reduce swap churn
vm.swappiness=0
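To make the sysctl values above (including vm.swappiness) survive a reboot, append them to /etc/sysctl.conf and reload; a minimal sketch:

```
echo 'vm.swappiness = 0' >> /etc/sysctl.conf
sysctl -p                 # reload /etc/sysctl.conf
sysctl vm.swappiness      # verify the running value
```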
EPT/VPID
grep -E "ept|vpid" /proc/cpuinfo
cat /sys/module/kvm_intel/parameters/ept
cat /sys/module/kvm_intel/parameters/vpid
modprobe kvm_intel ept=1 vpid=1
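To keep EPT/VPID enabled across reboots, the same options can go into a modprobe.d drop-in. A sketch; the file name is arbitrary, and all guests must be stopped before the module is reloaded.

```
cat > /etc/modprobe.d/kvm-intel.conf <<'EOF'
options kvm_intel ept=1 vpid=1
EOF

# Reload the module and confirm both features report Y
modprobe -r kvm_intel && modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/ept /sys/module/kvm_intel/parameters/vpid
```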
Huge Pages
grep Hugepagesize /proc/meminfo
mount -t hugetlbfs hugetlbfs /dev/hugepages
sysctl vm.nr_hugepages=1024
qemu-kvm -mem-path /dev/hugepages
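A sketch of making the hugepage pool persistent and backing a guest with it, assuming 2 MB pages and the 1024-page pool set above; the qemu flags follow the legacy -mem-path form used here.

```
# Persist the pool size and the hugetlbfs mount
echo 'vm.nr_hugepages = 1024' >> /etc/sysctl.conf
echo 'hugetlbfs /dev/hugepages hugetlbfs defaults 0 0' >> /etc/fstab

# Verify, then back guest RAM with hugepages (plus the usual -drive/-net options)
grep -i huge /proc/meminfo
qemu-system-x86_64 -enable-kvm -m 2048 -mem-path /dev/hugepages -mem-prealloc
```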
NTP
Enable a scheduled time-synchronization service on the host servers.
Scheduled time-sync jobs can also be set up inside the guests.
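One minimal option on both host and guests is a cron job calling ntpdate (the server name is a placeholder; chrony or ntpd is more robust for production):

```
# /etc/cron.d/timesync: resync every 30 minutes against a placeholder NTP server
echo '*/30 * * * * root /usr/sbin/ntpdate -u ntp.example.com >/dev/null 2>&1' > /etc/cron.d/timesync
```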
Storage Optimization
The usual recommendations for the two io modes are:
- io=native for block device based VMs.
- io=threads for file-based VMs.
Important note from Red Hat: Direct Asynchronous IO (AIO) that is not issued on filesystem block boundaries, and falls into a hole in a sparse file on ext4 or xfs filesystems, may corrupt file data if multiple I/O operations modify the same filesystem block. Specifically, if qemu-kvm is used with the aio=native IO mode over a sparse device image hosted on an ext4 or xfs filesystem, guest filesystem corruption will occur if partitions are not aligned with the host filesystem block size. Generally, do not use the aio=native option along with cache=none for QEMU. This issue can be avoided by using one of the following techniques:
- Align AIOs on filesystem block boundaries, or do not write to sparse files using AIO on xfs or ext4 filesystems.
- KVM: Use a non-sparse system image file or allocate the space by zeroing out the entire file.
- KVM: Create the image using an ext3 host filesystem instead of ext4.
- KVM: Invoke qemu-kvm with aio=threads (this is the default).
- KVM: Align all partitions within the guest image to the host’s filesystem block boundary (default 4k).
* cache=writeback,aio=threads (see the drive example after this list)
* scheduler = deadline/cfq
* acpi=off
* preallocation = metadata
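A sketch pulling the options above together; the image path, size, and the sda device are placeholders.

```
# Create the guest image with metadata preallocation
qemu-img create -f qcow2 -o preallocation=metadata /storage/vm01.qcow2 40G

# Attach it with the recommended cache/aio combination
qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=/storage/vm01.qcow2,if=virtio,format=qcow2,cache=writeback,aio=threads

# Use the deadline elevator on the host disk that holds the images
echo deadline > /sys/block/sda/queue/scheduler
```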
CPU Allocation Guidelines
Emulate a multi-socket, multi-core, multi-thread topology whenever possible so guest software can detect it automatically; maxcpus / CPU hotplug still carries a risk of bugs and is not recommended for now.
Reserve one socket / one core on the host for the host's own system, using isolcpus.
-smp 4,sockets=2,cores=2,threads=1 (or threads=2, adjusting the total accordingly; pinning sketch below)
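A rough pinning sketch under the assumptions above (a 16-CPU host with CPUs 0-1 reserved for the host OS); under libvirt, virsh vcpupin gives finer-grained control.

```
# Kernel command line (e.g. in the grub config): keep the host off CPUs 2-15
#   isolcpus=2-15

# Start the guest with the topology above (plus the usual -drive/-net options),
# then pin all of its threads to the isolated cores
qemu-system-x86_64 -enable-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 &
QEMU_PID=$!
for tid in $(ps -L -o lwp= -p "$QEMU_PID"); do
    taskset -cp 2-15 "$tid"      # coarse pinning of vCPU and I/O threads alike
done
```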
SSD Acceleration
- For large local storage, ZFS RAIDZ/Z2/Z3 is recommended; use ZFS RAID10 when performance requirements are high
- With very high performance requirements and sufficient budget, go straight to dedicated hardware storage such as a Fibre Channel array
- Hardware RAID 10 is recommended
- Add SSDs and deploy bcache / EnhanceIO / ARC caching
Filesystems
ZFS
ZFS with L2ARC and ZIL on SSD.
- Performance was good. The L2ARC algorithms are far more sophisticated than those used by bcache or EnhanceIO, so long-term usage should benefit.
- A delight to work with, flexible and clear. Great command-line tool set; very easy to see what is going on. The ability to add multiple SSDs for cache is amazing.
- ZFS: an amazingly powerful soft-RAID setup, with snapshots, backups, send/recv, compression, and deduplication.
dm-cache:
- Complicated to setup
- Fiddly and error prone to manage
- Needs custom init scripts
- No auto ignoring of bulk reads/writes (disk cloning, backups)
- I managed to destroy the underlying file system while attempting to flush and dismount the write cache.
- Did give good results for reads/writes
bcache
- User tools have to be compiled and installed
- can't be used with existing file systems
- No auto ignoring of bulk reads/writes (disk cloning, backups)
- Needs custom init scripts
- The 3.10 kernel version is hopelessly buggy; it trashed the file system
- No tools for uninstalling. Required a hard reset, and then I had to use dd to overwrite the partition table to remove the cache store. I then blacklisted the module
EnhanceIO
- Has to be compiled and installed
- can be used with existing file systems
- Can be created/edited/destroyed on the fly
- No auto ignoring of bulk reads/writes (disk cloning, backups)
- persistent between reboots (udev rules)
- Good results on reads/writes
- Unfortunately when I combined it with an external ext4 journal I got data corruption
ZFS (zfsonlinux.org)
- Used the kernel module
- Has a repo, but requires kernel headers, build-essential, and DKMS.
- Built-in support for journal and read caching using multiple SSDs
- ZFS! With all the ZFS goodies: RAID, striping, snapshots, backups, pool management.
- Auto ignoring of bulk reads/writes (disk cloning, backups)
- good tools for management and reporting disk/cache stats and errors
- I restricted it to 1GB RAM on Hosts
- Good results on reads/writes
- No support for O_DIRECT, so I had to disable Gluster's io-cache, which is recommended anyway for virtual stores.
ZFS + Gluster + SSD caches seems to be the winner for shared HA storage to me.
options zfs zfs_arc_max=40000000000
options zfs zfs_vdev_max_pending=24

Here zfs_arc_max is roughly 40% of your RAM in bytes (edit: try zfs_arc_max=1200000000). The compiled-in default for zfs_vdev_max_pending is 8 or 10, depending on the version; the value should be high (48) for SSDs or low-latency drives, perhaps 12-24 for SAS, and otherwise left at the default. You'll also want some floor values in /etc/sysctl.conf:

vm.swappiness = 10
vm.min_free_kbytes = 512000

Finally, on CentOS you may want to install tuned and tuned-utils and set your profile to virtual-guest with tuned-adm profile virtual-guest. Try these and see if the problem persists. Edit: run zfs set xattr=sa storage. You may have to wipe the volumes and start again (recommended).
- mdadm + XFS(noatime,nodiratime,nobarrier,logbufs=8 0 0)
- ZFS + L2ARC
- ashift=12
- options zfs zfs_arc_min=4294967296 (4 GB)
- options zfs zfs_arc_max=xxx (30% of memory, set in /etc/modprobe.d/zfs.conf)
- options zfs zfs_prefetch_disable=1 (modprobe zfs zfs_prefetch_disable=1)
- options zfs l2arc_noprefetch=0
- echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
- zfs set atime=off
- zfs set relatime=on
- zfs set compression=lz4
- zfs set primarycache=all
- zfs set secondarycache=all
- zfs set logbias=throughput (bypasses the ZIL)
- zfs set dedup=off
- zfs create -o casesensitivity=mixed
- crontab -e → 30 19 * * 5 zpool scrub <pool>
#!/bin/sh
# Write the ZFS module options, then create a raidz pool with an SSD cache device.
echo "options zfs zfs_prefetch_disable=1" > /etc/modprobe.d/zfs.conf
echo "options zfs l2arc_noprefetch=0" >> /etc/modprobe.d/zfs.conf
awk '/MemTotal/{printf "options zfs zfs_arc_min=%.f\n",$2*1024*1/10}' /proc/meminfo >> /etc/modprobe.d/zfs.conf
awk '/MemTotal/{printf "options zfs zfs_arc_max=%.f\n",$2*1024*3/10}' /proc/meminfo >> /etc/modprobe.d/zfs.conf

[ -z "$1" ] && echo "$0 poolname" && exit 0

zpool create -f -o ashift=12 \
    -O atime=off \
    -O relatime=on \
    -O compression=lz4 \
    -O primarycache=all \
    -O secondarycache=all \
    -O logbias=throughput \
    -O dedup=off \
    -O casesensitivity=mixed \
    $1 raidz /dev/sd[bcde]
zpool add -f $1 cache sda3
XFS
mkfs.xfs -L /ssd1 -l internal,lazy-count=1,size=128m -i attr=2 -d agcount=8 -i size=512 -f /dev/sda4
mount -t xfs -o rw,noexec,nodev,noatime,nodiratime,barrier=0,logbufs=8,logbsize=256k /dev/sda4 /storage

[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 31.1406 s, 673 MB/s

real    0m31.143s
user    0m0.010s
sys     0m16.413s

[root@vcn40 storage]# echo 3 > /proc/sys/vm/drop_caches
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 30.6331 s, 685 MB/s

real    0m31.501s
user    0m0.013s
sys     0m16.881s
EXT4
[root@vcn40 ~]# cd /storage
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 31.835 s, 659 MB/s

real    0m31.837s
user    0m0.010s
sys     0m25.371s

[root@vcn40 storage]# echo 3 > /proc/sys/vm/drop_caches
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 58.9783 s, 356 MB/s

real    0m59.003s
user    0m0.013s
sys     0m27.882s
Network Subsystem Acceleration
- Evaluate the feasibility of SCTP vs TCP vs UDP
options bonding max_bonds=2 mode=4 miimon=100 downdelay=100 updelay=100 lacp_rate=1 use_carrier=1 xmit_hash_policy=layer3+4
* -net nic,model=virtio -net tap,vnet_hdr=on,vhost=on -device virtio-net-pci
* TSO, GSO = off; LRO = off (ethtool sketch below)
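A sketch of turning the offloads off with ethtool; eth0 is a placeholder for the bridged/bonded slave interface.

```
ethtool -K eth0 tso off gso off lro off
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload|large-receive-offload'
```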
SPICE Performance
- Graphics display: (mozjpeg > zstd > lz4 > quicklz)
- Video display: (h264 > xvid > mjpeg2000 > mjpeg), see the reference link (a QEMU-side sketch follows this list)
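The rankings above come from the author's own comparisons; on the QEMU side, the closest knobs are the SPICE image-compression and video-streaming options. A sketch, assuming a QXL display and SPICE on port 5900:

```
qemu-system-x86_64 -enable-kvm -vga qxl \
    -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=filter \
    -device virtio-serial-pci \
    -device virtserialport,chardev=spicechannel0,name=com.redhat.spice.0 \
    -chardev spicevmc,id=spicechannel0,name=vdagent
```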
Building the QXL Graphics Driver
yum install -y python-setuptools libjpeg-devel cyrus-sasl-devel openssl-devel celt051-devel alsa-lib-devel glib2-devel libXrandr-devel libXinerama-devel xorg-x11-server-devel gcc gcc-c++ autoconf automake
- spice-protocol-0.12.8
- xf86-video-qxl-0.1.4
- spice-vdagent-0.16.0
- spice-0.12.5 (see the build-order sketch below)
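These are standard autotools tarballs; a hedged sketch of the build, assuming the tarballs are already unpacked side by side and that spice-protocol is built first:

```
for pkg in spice-protocol-0.12.8 spice-0.12.5 spice-vdagent-0.16.0 xf86-video-qxl-0.1.4; do
    (cd "$pkg" && ./configure && make && make install) || break
done
```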
Building the Client Image
vi /usr/bin/startyc
#!/bin/sh
cd /usr/local/bin/MiGateway/
./ycc
vi /etc/rc.d/rc.local
/usr/bin/xinit /usr/bin/startyc
If systemd is used to manage startup processes:
vi /etc/systemd/system/rc.local.service
[Unit]
Description=/etc/rc.local Compatibility
ConditionPathExists=/etc/rc.local

[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99

[Install]
WantedBy=multi-user.target
systemctl enable rc.local.service
mkusb.sh
#!/bin/sh
# Partition a USB stick, extract the client image onto it, and install GRUB.
[ -z "$1" ] && echo "$0 /dev/sdX" && exit 0
DEV=$1
IMG="/root/ycos_client_xinit.tgz"
FORMAT_FORCE="y"
MOUNT_POINT="/mnt/"

if [ "$FORMAT_FORCE" = "y" ];then
# Delete any old partitions, then create: 1) 4 GB ext4 root, 2) 1 GB swap, 3) the rest.
# Blank lines in the heredoc accept fdisk defaults.
fdisk $DEV<<EOF
d
3
d
2
d
n
p
1
1
+4096M
n
p
2

+1024M
n
p
3


t
2
82
a
1
w
EOF
partx -a $DEV
sleep 3
mkfs.ext4 -L /Amy ${DEV}1
mkswap -L /Swap ${DEV}2
fi

mount ${DEV}1 $MOUNT_POINT
tar zxvf $IMG -C $MOUNT_POINT
grub-install --root-directory=$MOUNT_POINT --no-floppy --recheck $DEV
umount $MOUNT_POINT
Background Survey of the Customer's Computing Environment and Usage Habits
| Category | Item | Selection |
|---|---|---|
| Security | Power protection | UPS |
| Switch | Port bonding | Dual-gigabit bond |
| Network | Network speed | 100M~1000M |
| Staff | Regular users / total users | |
| Usage type | Office, entertainment, security | |
Windows Client OS
Upgrade Plan for Older Versions
- IDE → VirtIO (see the driver sketch after this list)
- Hardware RAID / software mdadm → ZFS RAIDZ + 240/320 GB SSD
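Windows guests blue-screen if the system disk is simply switched to VirtIO before the driver is installed. A sketch of the usual workaround (the disk paths and the virtio-win ISO location are placeholders): boot once with the old IDE system disk plus a dummy VirtIO disk and the driver ISO, let Windows install the driver, then flip the system disk to if=virtio.

```
qemu-system-x86_64 -enable-kvm -m 4096 \
    -drive file=/storage/win7.qcow2,if=ide \
    -drive file=/storage/dummy.qcow2,if=virtio \
    -cdrom /storage/virtio-win.iso
```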
Template Synchronization
yum install -y inotify-tools rsync
/usr/local/bin/inotify_rsync.sh
#!/bin/bash
# Watch SRC for changes and push them to each host in HOSTS over rsync/ssh.
SRC="/xx/"
DST="/root/xx/"
HOSTS="10.0.2.47 10.0.2.48"
SSH_OPTS="-i/root/.ssh/id_rsa -p65422 -x -T -c arcfour -o Compression=no -oStrictHostKeyChecking=no"

# Don't change below
NUM=($HOSTS)
NUM=${#NUM[*]}
SPEED=$((100000/$NUM))    # split the bandwidth budget across the target hosts

/usr/bin/inotifywait -mrq -e close_write,delete --format '%f' $SRC | while read files;do
    for ip in $HOSTS;do
        echo $files
        rsync -aP --bwlimit=$SPEED --delete -e "ssh $SSH_OPTS" $SRC root@$ip:$DST
    done
done
nohup /usr/local/bin/inotify_rsync.sh >/var/log/rsync.log 2>&1 &
Daemon Process
yum install supervisor
[program:vcnagent]
command=/opt/vcn/vdi/vcnagent "-info"
;environment=PATH=/opt/bin:/opt/sbin:%(ENV_PATH)s
priority=999
autostart=true
autorestart=true
startsecs=10
startretries=3
exitcodes=0,2
stopsignal=QUIT
stopwaitsecs=10
user=root
log_stdout=true
log_stderr=true
logfile=/var/log/vcnagent.log
logfile_maxbytes=1MB
logfile_backups=10
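After dropping the program section into supervisord's configuration, a quick way to load and check it (sketch):

```
supervisorctl reread            # pick up the new [program:vcnagent] section
supervisorctl update            # start it under supervision
supervisorctl status vcnagent
```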
virtio-serial-bus unexpected port id
bcdedit -set loadoptions DISABLE_INTEGRITY_CHECKS
bcdedit -set TESTSIGNING ON
