VMware ESX and slow syscalls


http://www.linuxquestions.org – We have two physical machines in our company. Both have the same hardware configuration and the same CPU:

Intel(R) Xeon(R) CPU E5420 @ 2.50GHz

One runs Linux directly on the hardware. The other runs ESX version 4 with a Linux guest that should be almost identical to the Linux on the first machine. The kernel version (a bit old these days, but required by an old project) is:

Linux x 2.4.21-53.ELhugemem #1 SMP Wed Nov 14 03:46:17 EST 2007 i686 i686 i386 GNU/Linux

The problem is that the virtualized Linux runs slower. I have read that the overhead should be around 8%, which I could live with, but the performance hit is visible to the naked eye.

I wrote two test programs. The first just does heavy work in user space (e.g. a giant loop that counts numbers). Here the performance hit is around 8%-10%, which is fine. The second makes syscalls, calling "close(0);" in a loop, and this is where things aren't pretty anymore.

Linux running on real HW:

Code:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.65    0.963257          10    100002     99999 close
  0.15    0.001403          33        43        41 open
  0.14    0.001368          34        40        36 stat64
  0.06    0.000566         566         1           execve
  0.00    0.000027           5         5           old_mmap
  0.00    0.000007           4         2           fstat64
  0.00    0.000006           6         1           read
  0.00    0.000006           6         1           munmap
  0.00    0.000004           4         1           uname
  0.00    0.000003           3         1           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.966647                100097    100076 total

real    0m4.613s
user    0m0.760s
sys     0m3.730s

Linux running on ESX:

Code:
Process 14702 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.76   18.206772         182    100002     99999 close
  3.01    0.703602      703602         1           execve
  2.99    0.700382      700382         1           set_thread_area
  2.99    0.700337      700337         1           munmap
  2.99    0.700328      700328         1           uname
  2.99    0.700123      700123         1           read
  2.99    0.700108      700108         1           brk
  2.14    0.500571      100114         5           old_mmap
  1.71    0.400229      200115         2           fstat64
  0.43    0.100360       33453         3         1 open
------ ----------- ----------- --------- --------- ----------------
100.00   23.412812                100018    100000 total

real    0m48.434s
user    0m5.410s
sys     0m40.610s

The guest on ESX took about ten times as long to do the same thing (48.4 s vs. 4.6 s real, 40.6 s vs. 3.7 s sys), and strace reports 182 usecs per close call instead of 10. Any ideas why this is happening? It seems that the context switch is very expensive for some reason.
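The post does not include the source of the two test programs. As a rough illustration only, here is a minimal sketch of both loops in Python. It is not the author's code: the function names `time_close_calls` and `time_userspace_loop` are hypothetical, and the iteration count is an assumption chosen to match the roughly 100000 close calls in the strace summaries. Interpreter overhead is included in every measurement, so the per-call figure will be higher than a bare C loop would show. The useful number is the ratio between the bare-metal host and the guest.

```python
import errno
import os
import time


def time_close_calls(fd, iterations):
    """Call close(fd) `iterations` times; return (elapsed_seconds, failed_calls)."""
    failed = 0
    start = time.perf_counter()
    for _ in range(iterations):
        try:
            os.close(fd)
        except OSError as exc:
            # Once the descriptor is closed, every later call fails with EBADF.
            # That matches the 99999 errors in the strace summaries: the loop
            # mostly measures the cost of entering and leaving the kernel.
            if exc.errno != errno.EBADF:
                raise
            failed += 1
    elapsed = time.perf_counter() - start
    return elapsed, failed


def time_userspace_loop(iterations):
    """Pure user-space work for comparison: sum integers without syscalls."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i
    return time.perf_counter() - start, total


if __name__ == "__main__":
    n = 100_000  # assumption: same order of magnitude as in the post
    loop_s, _ = time_userspace_loop(n)
    close_s, failed = time_close_calls(0, n)  # fd 0, as in the original test
    # stdout (fd 1) is still open, so printing works after stdin is closed.
    print(f"user-space loop: {loop_s:.3f} s")
    print(f"close() loop:    {close_s:.3f} s, "
          f"{close_s / n * 1e6:.2f} us/call, {failed} failed")
```

Running the same script on the bare-metal host and inside the guest, optionally under `strace -c` as in the post, gives two per-call figures to compare. If the user-space loop differs by a few percent while the close loop differs by an order of magnitude, the slowdown is tied to the syscall path rather than to general CPU overhead.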