Table of Contents

Optimal Architecture Design

BIOS Configuration

Kernel Build Optimization

There have been some pretty major IO performance boosts in the past few kernel releases; in some cases you might see 25% more random read or write IOPS if you are using an SSD.

If you're running Ubuntu with a kernel older than 3.19 and you have an SSD, you are probably only getting 50% - 75% of its potential speed. I'm not joking, the performance gains between kernel 3.13 and 3.19 are huge, even for Ubuntu. CentOS 7 / RHEL 7 just got to kernel 3.10, which sucks slightly less than CentOS 6's 2.6.x, and that is just depressing when it comes to SSD IO performance.

AIO, Bcache, XFS, ext4, EPT, VPID, THP

Development Environment Configuration

./configure --prefix=/opt --target-list=x86_64-softmmu --enable-linux-aio \
            --enable-numa --enable-spice --enable-kvm --enable-lzo --enable-snappy \
            --enable-libusb --enable-usb-redir --enable-libiscsi --enable-mc --enable-rdma \
            --disable-libnfs --disable-seccomp --disable-smartcard-nss --disable-fdt --disable-curl \
            --disable-curses --disable-sdl --disable-gtk  --disable-tpm --disable-vte --disable-xen  \
            --disable-cap-ng 

Package Dependencies

yum install -y numactl lzo snappy pixman celt051

Kernel Parameter Tuning

net.core.netdev_max_backlog = 262144
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_rmem = 8192 87380 6291456
net.ipv4.tcp_wmem = 8192 87380 6291456
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_app_win = 40
net.ipv4.tcp_early_retrans = 1

Reduce Swap Fluctuation

vm.swappiness=0
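
To persist and apply the network and swappiness settings above, a minimal sketch (the drop-in file name 99-kvm-tuning.conf is arbitrary; on CentOS 6 use sysctl -p /etc/sysctl.conf instead of sysctl --system):

# Append the parameters above to a sysctl drop-in and reload them
cat > /etc/sysctl.d/99-kvm-tuning.conf <<'EOF'
net.core.netdev_max_backlog = 262144
vm.swappiness = 0
EOF
sysctl --system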

EPT/VPID

grep -E "ept|vpid" /proc/cpuinfo
cat /sys/module/kvm_intel/parameters/ept
cat /sys/module/kvm_intel/parameters/vpid

modprobe kvm_intel ept=1 vpid=1
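
To keep EPT/VPID enabled across reboots, one approach (the file name kvm-intel.conf is an assumption):

# Persist the module parameters and reload kvm_intel (no VMs may be running)
echo "options kvm_intel ept=1 vpid=1" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel && modprobe kvm_intel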

THP

grep Hugepagesize /proc/meminfo
mount -t hugetlbfs hugetlbfs /dev/hugepages
sysctl vm.nr_hugepages=1024

qemu-kvm -mem-path /dev/hugepages
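
A minimal launch sketch that backs guest RAM with the 1024 x 2 MB pages reserved above (disk image path and sizes are placeholders):

# 2 GB of guest RAM allocated from /dev/hugepages at startup
qemu-kvm -m 2048 -mem-prealloc -mem-path /dev/hugepages \
         -smp 2 -drive file=/storage/guest.img,if=virtio
echo "vm.nr_hugepages = 1024" >> /etc/sysctl.conf   # make the page reservation persistent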

NTP

Enable a scheduled time-synchronization service on the host servers.

Virtual machines can also be given a scheduled time-sync task.
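
One possible setup, assuming ntpd on the CentOS hosts and a simple cron job inside the guests (the pool server is an example):

# Host: keep the clock disciplined continuously
yum install -y ntp
chkconfig ntpd on && service ntpd start
# Guest: step the clock every 30 minutes via cron
echo "*/30 * * * * root /usr/sbin/ntpdate pool.ntp.org" >> /etc/crontab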

Storage Optimization

The usual recommendations for the two IO modes are:

  • io=native for block device based VMs.
  • io=threads for file-based VMs.

Important note from Red Hat: Direct Asynchronous IO (AIO) that is not issued on filesystem block boundaries, and falls into a hole in a sparse file on ext4 or xfs filesystems, may corrupt file data if multiple I/O operations modify the same filesystem block. Specifically, if qemu-kvm is used with the aio=native IO mode over a sparse device image hosted on an ext4 or xfs filesystem, guest filesystem corruption will occur if partitions are not aligned with the host filesystem block size. In general, do not use aio=native together with cache=none for QEMU. The issue can be avoided with one of the following techniques (a command-line sketch follows the list):

* cache=writeback,aio=threads
* scheduler = deadline/cfq
* acpi=off
* preallocation = metadata
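
A rough sketch of how these settings appear on the command line (image path, size, and the deadline scheduler choice are illustrative):

# Preallocate qcow2 metadata when creating the image
qemu-img create -f qcow2 -o preallocation=metadata /storage/vm01.qcow2 40G
# Attach it with writeback caching and thread-based AIO
qemu-kvm -drive file=/storage/vm01.qcow2,if=virtio,format=qcow2,cache=writeback,aio=threads
# Switch the host disk holding the images to the deadline elevator
echo deadline > /sys/block/sda/queue/scheduler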

CPU Allocation Principles

Emulate a multi-socket, multi-core, multi-thread topology where possible so guest software can detect it automatically. maxcpus-based CPU hotplug still carries a risk of bugs and is not recommended for now.

Try to reserve one socket / one core on the host for the host OS itself, using isolcpus.

-smp 4,sockets=2,cores=2,threads=1 (with threads=2 the total vCPU count doubles to -smp 8)
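
A sketch tying isolcpus to manual pinning (CPU numbers are illustrative, and a single qemu-kvm process is assumed):

# Host kernel command line (grub): keep CPUs 1-3 off the general scheduler
#   isolcpus=1-3
# Pin a running guest's qemu-kvm process onto the isolated CPUs
taskset -cp 1-3 $(pidof qemu-kvm)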

SSD Acceleration

  1. For large local storage, ZFS RAIDZ/Z2/Z3 is recommended; where performance matters most, use ZFS RAID10 (striped mirrors).
  2. If performance requirements are very high and the budget allows, go straight to hardware storage such as a Fibre Channel array.
  3. Hardware RAID 10 is recommended.
  4. Add SSDs and deploy Bcache/EnhanceIO/ARC caching (see the sketch after this list).
ZFS pools + SSD L2ARC
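
For example, an existing pool can pick up SSD cache and log devices on the fly (pool name tank and the partitions are placeholders):

zpool add tank cache /dev/sdf1   # SSD read cache (L2ARC)
zpool add tank log /dev/sdf2     # SSD intent log (ZIL/SLOG)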

Filesystems

ZFS

ZFS + L2ARC and ZIL on SSD.

  1. Performance was good. The L2ARC algorithms are far more sophisticated than the ones in bcache or EnhanceIO, so long-term usage should benefit.
  2. A delight to work with, flexible and clear. Great command-line tool set, very easy to see what is going on. The ability to add multiple SSDs for cache is amazing.
  3. ZFS itself: an amazingly powerful soft-RAID setup with snapshots, backups, send/recv, compression, and deduplication.

dm-cache:

  1. Complicated to setup
  2. Fiddly and error prone to manage
  3. Needs custom init scripts
  4. No auto ignoring of bulk reads/writes (disk cloning, backups)
  5. I managed to destroy the underlying file system while attempting to flush and dismount the write cache.
  6. Did give good results for reads/writes

bcache

  1. User tools have to be compiled and installed
  2. Can't be used with existing file systems
  3. No auto ignoring of bulk reads/writes (disk cloning, backups)
  4. Needs custom init scripts
  5. The 3.10 kernel version is hopelessly buggy. Trashed the file system
  6. No tools for uninstalling. Required a hard reset, and then I had to use dd to overwrite the partition table to remove the cache store. I then blacklisted the module

EnhanceIO

  1. Has to be compiled and installed
  2. Can be used with existing file systems
  3. Can be created/edited/destroyed on the fly
  4. No auto ignoring of bulk reads/writes (disk cloning, backups)
  5. Persistent between reboots (udev rules)
  6. Good results on reads/writes
  7. Unfortunately, when I combined it with an external ext4 journal I got data corruption

ZFS (zfsonlinux.org)

  1. Used the kernel module
  2. Has a repo, but requires kernel headers, build-essential and dkms.
  3. Built-in support for journal and read caching using multiple SSDs
  4. ZFS! With all the ZFS goodies: raid, striping, snapshots, backups, pool management.
  5. Auto ignoring of bulk reads/writes (disk cloning, backups)
  6. Good tools for managing and reporting disk/cache stats and errors
  7. I restricted it to 1GB RAM on the hosts
  8. Good results on reads/writes
  9. No support for O_DIRECT, so I had to disable Gluster's io-cache, which is recommended anyway for virtual stores.
ZFS was the clear winner: ease of management, reliability, flexibility. It's going to make expanding the stores so much easier in the future.

ZFS + Gluster + SSD caches seems to be a winner for shared HA storage to me.

options zfs zfs_arc_max=40000000000
options zfs zfs_vdev_max_pending=24
Where zfs_arc_max is roughly 40% of your RAM in bytes (Edit: try zfs_arc_max=1200000000). The compiled-in default for zfs_vdev_max_pending is 8 or 10, depending on the version. The value should be high (48) for SSDs or low-latency drives, maybe 12-24 for SAS; otherwise, leave it at the default.
 
You'll also want to set some floor values in /etc/sysctl.conf:
 
vm.swappiness = 10
vm.min_free_kbytes = 512000
Finally, with CentOS, you may want to install tuned and tuned-utils and set your profile to virtual-guest with tuned-adm profile virtual-guest.
 
Try these and see if the problem persists.
 
Edit:
 
Run zfs set xattr=sa storage (this stores extended attributes in the inode rather than in hidden directories, saving extra IO). You may have to wipe the volumes and start again (I'd recommend it).
  • mdadm + XFS (noatime,nodiratime,nobarrier,logbufs=8)
  • ZFS + L2ARC
  • ashift=12
  • options zfs zfs_arc_min=4294967296 (4 GB)
  • options zfs zfs_arc_max=xxx (30% of memory, set in /etc/modprobe.d/zfs.conf)
  • options zfs zfs_prefetch_disable=1 (or: modprobe zfs zfs_prefetch_disable=1)
  • options zfs l2arc_noprefetch=0
  • echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
  • zfs set atime=off
  • zfs set relatime=on
  • zfs set compression=lz4
  • zfs set primarycache=all
  • zfs set secondarycache=all
  • zfs set logbias=throughput (bypasses the ZIL/SLOG)
  • zfs set dedup=off
  • zfs create -o casesensitivity=mixed
  • crontab -e → 30 19 * * 5 zpool scrub <pool>
#!/bin/sh
echo "options zfs zfs_prefetch_disable=1" > /etc/modprobe.d/zfs.conf
echo "options zfs l2arc_noprefetch=0" >> /etc/modprobe.d/zfs.conf
awk '/MemTotal/{printf "options zfs zfs_arc_min=%.f\n",$2*1024*1/10}' /proc/meminfo >> /etc/modprobe.d/zfs.conf
awk '/MemTotal/{printf "options zfs zfs_arc_max=%.f\n",$2*1024*3/10}' /proc/meminfo >> /etc/modprobe.d/zfs.conf
 
[ -z "$1" ] && echo "$0 poolname" && exit 0
# options must precede the pool name and vdev specification
zpool create -f -o ashift=12 \
        -O atime=off \
        -O relatime=on \
        -O compression=lz4 \
        -O primarycache=all \
        -O secondarycache=all \
        -O logbias=throughput \
        -O dedup=off \
        -O casesensitivity=mixed \
        "$1" raidz /dev/sd[bcde]
zpool add -f "$1" cache sda3

Data compression, deduplication, encryption, COW, QoS, client-side caching, parallel NFS (pNFS)

XFS

mkfs.xfs -L /ssd1 -l internal,lazy-count=1,size=128m -i attr=2 -d agcount=8 -i size=512 -f /dev/sda4
mount -t xfs -o rw,noexec,nodev,noatime,nodiratime,barrier=0,logbufs=8,logbsize=256k /dev/sda4 /storage
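
A matching /etc/fstab line so the options survive a reboot (same device and mount point as above):

/dev/sda4  /storage  xfs  rw,noexec,nodev,noatime,nodiratime,barrier=0,logbufs=8,logbsize=256k  0 0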
 
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 31.1406 s, 673 MB/s
 
real	0m31.143s
user	0m0.010s
sys	0m16.413s
[root@vcn40 storage]# echo 3 > /proc/sys/vm/drop_caches 
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 30.6331 s, 685 MB/s
 
real	0m31.501s
user	0m0.013s
sys	0m16.881s

EXT4

[root@vcn40 ~]# cd /storage
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 31.835 s, 659 MB/s
 
real	0m31.837s
user	0m0.010s
sys	0m25.371s
[root@vcn40 storage]# echo 3 > /proc/sys/vm/drop_caches 
[root@vcn40 storage]# time dd if=/dev/zero of=2g bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 58.9783 s, 356 MB/s
 
real	0m59.003s
user	0m0.013s
sys	0m27.882s

Network Subsystem Acceleration

options bonding max_bonds=2 mode=4 miimon=100 downdelay=100 updelay=100 \
        lacp_rate=1 use_carrier=1 xmit_hash_policy=layer3+4
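
On RHEL/CentOS the same parameters can also be carried in the bond interface config rather than modprobe options; a sketch with the IP addressing omitted:

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=4 miimon=100 downdelay=100 updelay=100 lacp_rate=1 xmit_hash_policy=layer3+4"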
* -device virtio-net-pci,netdev=net0 -netdev tap,id=net0 →
  -netdev tap,id=net0,vnet_hdr=on,vhost=on -device virtio-net-pci,netdev=net0 (enable vhost-net on the tap backend)
* Legacy -net syntax: -net nic,model=virtio -net tap,vnet_hdr=on,vhost=on
* TSO/GSO = off, LRO = off
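
The offload settings above can be toggled with ethtool on the host NIC (eth0 is a placeholder):

ethtool -K eth0 tso off gso off lro off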

SPICE Performance

Building the QXL Display Driver

yum install -y python-setuptools libjpeg-devel cyrus-sasl-devel openssl-devel celt051-devel alsa-lib-devel glib2-devel libXrandr-devel libXinerama-devel xorg-x11-server-devel gcc gcc-c++ autoconf automake
  1. spice-protocol-0.12.8
  2. xf86-video-qxl-0.1.4
  3. spice-vdagent-0.16.0
  4. spice-0.12.5
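
Each of these tarballs builds with the usual autotools flow; a sketch for the first one (the archive suffix may differ):

tar xf spice-protocol-0.12.8.tar.bz2
cd spice-protocol-0.12.8
./configure --prefix=/usr && make && make install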

Building the Client Image

Fedora 21 32-bit is used as the base system; it ships with friendly native SPICE support and is stable. Customizing the boot/startup screen:

vi /usr/bin/startyc

#!/bin/sh
cd /usr/local/bin/MiGateway/
./ycc

vi /etc/rc.d/rc.local

/usr/bin/xinit /usr/bin/startyc

If systemd is used to manage the startup process:

vi /etc/systemd/system/rc.local.service

[Unit]
Description=/etc/rc.local Compatibility
ConditionPathExists=/etc/rc.local
 
[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99
 
[Install]
WantedBy=multi-user.target
systemctl enable rc.local.service
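
The unit runs /etc/rc.local directly, so the script must be executable:

chmod +x /etc/rc.local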

mkusb.sh

#!/bin/sh
[ -z "$1" ] && echo "$0 /dev/sdX" && exit 0
DEV=$1
IMG="/root/ycos_client_xinit.tgz"
 
FORMAT_FORCE="y"
MOUNT_POINT="/mnt/"
 
if [ "$FORMAT_FORCE" = "y" ];then
        fdisk $DEV<<EOF
d
3
d
2
d
n
p
1
1
+4096M
n
p
2
 
+1024M
n
p
3
 
 
t
2
82
a
1
 
w
EOF
partx -a $DEV
sleep 3
 
mkfs.ext4 -L /Amy ${DEV}1
mkswap -L /Swap ${DEV}2
fi
 
mount ${DEV}1 $MOUNT_POINT
tar zxvf $IMG -C $MOUNT_POINT
grub-install --root-directory=$MOUNT_POINT --no-floppy --recheck $DEV
umount $MOUNT_POINT

Background Survey of the Customer's Computing Environment and Usage Habits

Category / Option (check as applicable):

  • Power safety: UPS backup
  • Switch ports: dual gigabit bonding
  • Network: link speed 100M~1000M
  • Users: regular user count / total count
  • Usage type: office, entertainment, security

Windows client OS

Improvement Plan for Older Versions

Template Synchronization Scheme

yum install -y inotify-tools rsync

/usr/local/bin/inotify_rsync.sh

#!/bin/bash
SRC="/xx/"
DST="/root/xx/"
HOSTS="10.0.2.47 10.0.2.48"
SSH_OPTS="-i/root/.ssh/id_rsa -p65422 -x -T -c arcfour -o Compression=no -oStrictHostKeyChecking=no"
 
# Don't change below
NUM=($HOSTS)
NUM=${#NUM[*]}
SPEED=$((100000/$NUM))
/usr/bin/inotifywait -mrq -e close_write,delete --format '%f' $SRC | while read files;do
        for ip in $HOSTS;do
        echo $files
        rsync -aP --bwlimit=$SPEED --delete -e "ssh $SSH_OPTS" $SRC root@$ip:$DST
        done
done
nohup /usr/local/bin/inotify_rsync.sh >/var/log/rsync.log 2>&1 &

Daemon Supervision

yum install supervisor

[program:vcnagent]
command=/opt/vcn/vdi/vcnagent "-info"
;environment=PATH=/opt/bin:/opt/sbin:%(ENV_PATH)s
priority=999
autostart=true
autorestart=true
startsecs=10
startretries=3
exitcodes=0,2
stopsignal=QUIT
stopwaitsecs=10
user=root
log_stdout=true
log_stderr=true
logfile=/var/log/vcnagent.log
logfile_maxbytes=1MB
logfile_backups=10
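
Assuming the block above is saved as an .ini file in supervisord's include directory (location varies by version, e.g. /etc/supervisord.d/), it can be activated with supervisorctl; note that supervisor 3.x renames some logging options (stdout_logfile instead of logfile):

supervisorctl reread           # scan config files for the new [program:vcnagent] section
supervisorctl update           # start the newly added program under supervision
supervisorctl status vcnagent  # verify it is RUNNING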

virtio-serial-bus unexpected port id

bcdedit -set loadoptions DISABLE_INTEGRITY_CHECKS
bcdedit -set TESTSIGNING ON

Thin Client Reference Material

References