Monday, October 29, 2012

Micro Benchmark MongoDB 2.2 performance on Solaris


I agree 100% with this statement from MongoDB:
“MongoDB does not publish any official benchmarks. We recommend running application performance tests on your application's work-load to find bottleneck and for performance tuning.”
However, I don't have a real-world workload, so I just ran some micro benchmarks to observe the behavior of MongoDB and the OS. The resulting numbers mean nothing by themselves, but I would like to share some findings here.

1) Using JS Benchmark Harness
MongoDB provides the JS Benchmarking Harness as a QA baseline performance measurement tool; it is not designed to be a "benchmark". It is a good starting point for a first look at MongoDB performance, and the harness is very easy to set up. However, there are a few things to consider.

The sample code on that web page really is a micro benchmark. I ran it against MongoDB 2.2 for Solaris x64 and got suboptimal results compared with the Linux version. After analyzing the workload characteristics, I found that it behaves more like a multi-threaded malloc and small TCP/IP packet ping-pong test.

By passing LD_PRELOAD_64=libmtmalloc.so when starting mongod, I got performance on Solaris on par with Linux. If the test client and server are on separate systems, I may also need to disable the Nagle algorithm: $ sudo ndd -set /dev/tcp tcp_naglim_def 1

The harness also has an interesting feature: RAND_INT [ min , max , multiplier ]. It looks as if it lets us touch only a fixed fraction of the data during testing. Two things need to be considered here:
  1. I looked at the current harness implementation: RAND_INT is translated to rand(), which is not really random for big (millions of records) data sets. The fix is to use lrand48() instead.
  2. MongoDB uses mmap to cache data. As in many other databases, this is still a page-level cache rather than a row-level cache. So if your records are small, RAND_INT [ 1, 10000000, 10 ] doesn't make you touch only 1/10 of the data; it makes you touch all of it.

2) Using YCSB.
YCSB is an extensive load testing tool, but its test code for MongoDB is a little outdated. I needed to modify it a little to add more writeConcern types.
YCSB's test driver has some limitations:
  • You can set the read/write proportion, but reads and writes run in the same thread context, which means writes can block reads. So I prefer to run them as separate simultaneous jobs when testing.

  • “recordcount” also implicitly sets the maximum ID of the data to be tested. When testing MongoDB, a small number means that only a few data files are mapped into memory during the transaction phase. So setting “recordcount” in the transaction phase is not the right way to test against only a small portion of the data.

3) Solaris related stuff.

The Solaris build of MongoDB 2.2 has a much larger binary than the Linux one. This hardly affects performance, but I don't like it. A quick check of its build info showed “GCC 4.4 on snv_89 January 2008”, which is too old. This should be fixed by adding the GCC options "-fno-function-sections" and "-fno-data-sections".

When starting mongod on Solaris, a warning message is shown: “your operating system version does not support the method that MongoDB uses to detect impending page faults. This may result in slower performance for certain use cases”. After browsing the source code, I found that processinfo support for Solaris was missing, so I added it. Currently the functions that matter are ProcessInfo::blockInMemory() and ProcessInfo::blockCheckSupported().

The MongoDB source code says “madvise not supported on solaris yet”, which is funny: Solaris certainly supports madvise(). But madvise() is only useful when you understand your workload, so I don't think this piece of code calling madvise() is important.

ZFS and UFS.
==========
Since MongoDB uses mmap(), it leaves a lot to the OS file system. UFS is a traditional file system and uses the traditional page cache (cachelist) for caching file data. ZFS offers many features beyond those of a plain file system and has its own ARC cache. Physical memory usage can be inspected with the mdb ::memstat command:

# echo "::memstat"|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     293720              1147    7%
ZFS File Data               85347               333    2%
Anon                       138902               542    3%
Exec and libs                1638                 6    0%
Page cache                  27118               105    1%
Free (cachelist)          3036514             11861   73%
Free (freelist)            576129              2250   14%

Total                     4159368             16247
Physical                  4159367             16247


In my tests, ZFS performed very well in data loading. However, because ZFS has its own cache, data that is not yet mmapped is searched for first in the cachelist and then in the ZFS cache; if it is in neither, it is read from disk into the ARC cache and then mapped into the mongod process address space as page cache. ZFS therefore needs more memory, and when the data cannot all fit in physical memory, the cachelist and the ZFS cache fight for memory. Tweaking ZFS parameters (manually setting the ARC cache size, adjusting the "primarycache" property, etc.) did not help in my tests. For read-intensive workloads, using an SSD as a second-level ARC cache will help. In addition, depending on the workload and data characteristics, adjusting the ZFS recordsize or disabling ZFS prefetching may be worth a try.

An interesting madvise option for UFS is MADV_WILLNEED. When this option is set, the system tries to pull all the data into memory (a quick warm-up), and during this period mongod cannot respond to clients. So if your whole dataset fits in physical memory and you can tolerate a short period of unresponsiveness during startup, consider using it, because it warms up fast and reaches peak performance quickly.

Wednesday, August 22, 2012

Build CouchDB 1.2 on Solaris 11 (SPARC)

I wanted to play a little with document-oriented databases. My first try was MongoDB; however, building MongoDB on SPARC was a horrible experience. MongoDB is written for x86/x64. In addition, it uses scons as its build tool, and scons doesn't work with the IPS-based Solaris Studio 12.3 (I had to hack scons to make it work); in fact, MongoDB's source code never considers compilers other than VC and GCC. Therefore, I decided to try CouchDB. Although CouchDB has a lot of dependencies, they are friendly to porting.

1)  ICU (International Components for Unicode)
This is easy on Solaris 11:
# pkg install developer/icu (this will also cause library/icu to be installed)
Note: the ICU 4.6 library is built with Sun CC, so C++ code that links to it should also be built with Sun CC.


2) Build Erlang
The Solaris 11 11/11 IPS repository provides Erlang 5.6.5, which is too old: CouchDB 1.1.x or later requires Erlang >= 5.7.3. The Erlang website says they do daily builds for Solaris SPARC (maybe because Ericsson was a Sun shop), so why don't they publish a binary package for Solaris? Fortunately, the build procedure is not that difficult:

- Grab the source code from the Erlang website. I chose 14B04, whose version is 5.8.5.

- ./configure --prefix=/opt/local  (I used the default gcc-45 from the Solaris 11 IPS repository, since Erlang is said to use some GCC features (label vars?).)
Ignore configure warnings unless configure fails.

- Fix erts/emulator/drivers/common/inet_drv.c:
ifreq.ifr_hwaddr.sa_data causes a compile error because ifreq does not have that member. Since Solaris 11 defines both SIOCGIFHWADDR and SIOCGENADDR, we only need the SIOCGENADDR branch, so I commented out the SIOCGIFHWADDR one:
//#ifdef SIOCGIFHWADDR
//          if (ioctl(desc->s, SIOCGIFHWADDR, (char *)&ifreq) < 0)
//              break;
//          buf_check(sptr, s_end, 1+2+IFHWADDRLEN);
//          *sptr++ = INET_IFOPT_HWADDR;
//          put_int16(IFHWADDRLEN, sptr); sptr += 2;
//          /* raw memcpy (fix include autoconf later) */
//          sys_memcpy(sptr, (char*)(&ifreq.ifr_hwaddr.sa_data), IFHWADDRLEN);
//          sptr += IFHWADDRLEN;
//#elif defined(SIOCGENADDR)
#ifdef SIOCGENADDR


- After gmake install, prepend /opt/local/bin to PATH.


3) Build SpiderMonkey 1.8.5
- Download source code from here.
- Interestingly, Mozilla's project needs autoconf-2.13, so I had to build this old version.
Get the source from the GNU website, then:
./configure --prefix=/usr/local --program-suffix=-2.13

- Run autoconf in SpiderMonkey's source tree to generate the configure script:
 cd  js-1.8.5/js/src
 /usr/local/bin/autoconf-2.13


- This time I used Sun CC from Solaris Studio 12.3 which is freely available.
   CC=cc CXX=CC ./configure

- gmake; sudo gmake install  (in /usr/local)


4) Build CouchDB 1.2
Before proceeding, I need to fix something:

 - apache-couchdb-1.2.0/src/couchdb/priv/Makefile.in:
    replace "-Wall -Werror" with "-v"  because I use Sun CC.

 - cd /usr/local/lib; ln -s libmozjs185.so.1.0 libmozjs185-1.0.so
(because configure.ac always chooses libmozjs185-1.0 instead of libmozjs185 due to the existence of libmozjs185-1.0.a)

- apache-couchdb-1.2.0/src/snappy/google-snappy/snappy-stubs-internal.h:
Solaris does not have byteswap.h, so I replaced it with byteorder.h and defined a few macros.
...
#else
//#include <byteswap.h>
#include <sys/byteorder.h>
#define bswap_16(x) BSWAP_16(x)
#define bswap_32(x) BSWAP_32(x)
#define bswap_64(x) BSWAP_64(x)
....


- configure
 ./configure CC=cc CXX=CC LDFLAGS="-R /usr/local/lib" --with-erlang=/opt/local/lib/erlang/usr/include --with-js-lib=/usr/local/lib/ --with-js-include=/usr/local/include/js/ --prefix=/opt/couchdb1.2

- gmake

- Run test:
  $ export PATH=/usr/perl5/5.12/bin:$PATH
  $ gmake check   
Wait for a while, all tests should be successful.

- sudo gmake install

- Create a script couchdb.sh in /opt/couchdb1.2/bin (since I don't want the couchdb output to end up in an arbitrary place).
#!/usr/bin/bash
BIN_DIR=$(dirname "$0")
cd "$BIN_DIR" || exit 1
./couchdb -o ../var/log/couchdb.stdout -e ../var/log/couchdb.stderr ${1+"$@"}

- Modify /opt/couchdb1.2/etc/couchdb/default.ini and change bind_address to make couchdb accessible from anywhere:
  bind_address  = 0.0.0.0

- start couchdb: /opt/couchdb1.2/bin/couchdb.sh -b

Now, enjoy CouchDB!

UPDATE (Sept 2012):

- I also tested Erlang 15B01 and the latest 15B02; erts/emulator/drivers/common/inet_drv.c has already been fixed there. But to build couchdb, we need to add the following to /opt/local/lib/erlang/usr/include/erl_driver.h:
#if defined(__sun)
#include <unistd.h>    /* for ssize_t */
#endif

- Running the couchdb test suite in a browser (Erlang 14B04 and 15B01) caused the Erlang process to crash. I posted the core dump analysis and a temporary workaround to the Erlang mailing list. 15B02 doesn't have this problem, but you should still be careful when an Erlang application on Solaris reloads crypto.so; see the discussions here. One solution might be adding "-z nodelete" to LDFLAGS in lib/crypto/c_src/sparc-sun-solaris2.11/Makefile.

Wednesday, June 27, 2012

root my phone

Compared to PC users, smartphone users have far less freedom. I recently bought an Android 4.0.3 phone. It came with many pre-installed apps that I don't like and could not delete. In addition, I could not install apps from Google Play because this phone is sold for the Chinese market. To solve these problems, I had to get root permission on the phone.

After a weekend of study, I realized that the usual way is to flash the phone with a third-party ROM. I don't like this approach. With further study, I came across a Linux security bug by chance. This bug also affected Android, and a hacker has exploited it on Android. This was really good news for me. However, it was not easy to figure out the offsets for my phone: I know nothing about ARM assembly; run-as is statically linked and stripped, and its symbols are obfuscated, making the binary difficult to understand; and I installed binutils-arm-linux-gnueabi on my Linux desktop, but arm-linux-gnueabi-objdump did not give me useful information.

In the end, I found two resources that helped me a lot:
- android run-as source code
- IDA Disassembler 6.2 demo for Linux (I should thank this great tool)
These two resources helped me understand the assembly code of run-as. In addition, analyzing the run-as binary of the Transformer Prime 4.0.3 showed me how to find the offsets for my phone, because the Transformer Prime's offsets are already known.

The rest is simple:
- Download and install the Android SDK.
- Run "android" from the command line and add platform-tools in order to use adb.
- Set up udev (for Linux) by following this guide. After making changes, run /etc/init.d/udev restart.
Enable debugging mode and disable fastboot on the phone. Then connect to the phone using "adb shell" from Linux:
- Push "mempodroid" to /data/local/tmp on the phone.
- Run the magic "mempodroid" using the offsets that I figured out, and become root!
- remount /system of the phone in read-write mode:
  mount -o remount,rw /system
- Delete unwanted apps in /system/app and /system/delapp. Be careful when you are root! (Before doing this, I had removed as many apps as possible with the phone's app manager.) For safety, I backed them up to the sdcard using "cat" in adb shell.
- To make Google Play work, I copied the following apks to /system/app:
  GoogleLoginService.apk
  GoogleServicesFramework.apk
  OneTimeInitializer.apk
  Vending.apk
  These apks can be acquired from the cyanogenmod website.
- Type "reboot" from adb shell.

It's much better now. I prefer "temporary root", it's safer.

Thursday, December 1, 2011

Optimize fonts and memory on Solaris 11 desktop

Solaris 11 was released on Nov. 9th, 2011, the result of 7 years of huge engineering effort since Solaris 10. Although Solaris is mainly designed for servers nowadays, it is also good for desktop use: ZFS, timeslider, DTrace, etc. Solaris 11 comes with GNOME 2.30, which is a very stable version (I'm disappointed with the GNOME 3 shell and Ubuntu Unity), and the Nimbus theme (created by Sun), which is my favorite theme.
I made some changes to the default settings of Solaris 11 desktop to suit my needs. 

Fonts
The most important part of the desktop user experience might be fonts. Different people have different font preferences, and the same font looks different at various font sizes and screen resolutions, so I needed to take some time to test different fonts and find the appropriate ones for my desktop. In the end, I chose the following fonts for my desktop screen (20.1", 1600x1200 LCD): "Nimbus Sans" as the English font and "WenQuanYi Micro Hei" as the Chinese font. Below are the setup steps:
Change  Desktop Appearance
Right click the desktop -> Desktop Appearance... -> Fonts, set the Application, Document, Desktop and Window title fonts to "Nimbus Sans", size "10", and set the Fixed width font to "Liberation Mono". (Note: do not use Chinese fonts in Desktop Appearance, because some Chinese fonts at some sizes make "i" look like "l".)
Install Chinese fonts.
Solaris 11's default Chinese font is "AR PL ShanHeiSun Uni" (文鼎PL細上海宋Uni), a free, high-quality font with a long history on Unix/Linux. But I wanted to use a more modern font. The Solaris 11 software repository includes the "WenQuanYi Zen Hei" font (to view which fonts are available, open Package Manager -> System -> Fonts), but I personally feel it does not fit my screen. "WenQuanYi Micro Hei" is a better option. To install it:
  1. Download from here 
  2. Unpack the downloaded file
  3. mkdir -p ~/.fonts
  4. Copy the "wqy-microhei.ttc" file to ~/.fonts
  5. (optional) Run "fc-cache" in terminal.
  6. To verify: run "fc-list | grep WenQuanYi"; the new font should be listed.
Change the system fonts configuration
The "Sans-serif", "Serif" and "Monospace" fonts are mapped to the default system fonts in /etc/fonts. To customize this, I did the following:
  1. mkdir -p ~/.fonts.conf.d/
  2. cp /etc/fonts/conf.d/60-latin.conf ~/.fonts.conf.d/
  3. cp /etc/fonts/conf.d/65-nonlatin.conf ~/.fonts.conf.d/
  4. edit ~/.fonts.conf.d/60-latin.conf
    move Nimbus fonts to the front in each section so that "serif" maps to "Nimbus Roman No9 L", "sans-serif" maps to "Nimbus Sans L", and "monospace" maps to "Liberation Mono".
  5. edit ~/.fonts.conf.d/65-nonlatin.conf
    add "WenQuanYi Micro Hei" before "WenQuanYi Zen Hei" for both "serif" and "sans-serif",  and add "WenQuanYi Micro Hei Mono" before "AR PL ShanHeiSun Uni" for "monospace".
  6. To verify: run "fc-match sans", "fc-match monospace" and "fc-match serif" in terminal.

Memory
My x86 workstation has very limited memory, so I needed to cut unnecessary memory use. Here are some things you can do to save memory:
  • Desktop Appearance -> Visual Effects, choose "None"
  • System -> Preferences -> File Management, disable "Preview"
  • System -> Preferences -> File Indexing, disable "indexing"
  • System -> Preferences -> Startup Applications, disable anything you don't want
  • Only keep the necessary Applets on your task bar
  • Check your own and root's cron jobs (sudo crontab -l) and disable unnecessary jobs with "crontab -e"
  • Check the SMF services using "svcs" command, disable unnecessary services with "svcadm disable" command



Wednesday, November 16, 2011

Cloud Foundry

A little background
The Cloud Foundry project is an open-source platform-as-a-service (PaaS) project initiated by VMware. CloudFoundry.com is a PaaS environment hosted, managed and supported by VMware. Micro Cloud Foundry is a complete version of Cloud Foundry that runs in a virtual machine on a developer’s Mac or PC.

Try Cloud Foundry
There are two ways: install it from GitHub or use the Micro Cloud Foundry virtual machine image provided by cloudfoundry.com. The second method is relatively simple and requires a CloudFoundry.com account. Because I already had an account, I used the second method.
Following the Micro Cloud Foundry Quick Start guide did not go very smoothly. The major problem was a network issue: the VM image uses bridged networking; however, in my environment it was bound to the wrong physical network adapter (be careful about this, especially when you have multiple network adapters or have VirtualBox installed). The Cloud Foundry setup didn't point out this problem. This forced me to use vcap.me as the domain.
To fix the above issue, I needed to use the virtual network editor to change the binding from automatic to a dedicated network adapter. VMware Player doesn't install the virtual network editor by default, so I had to follow this guide: manually extract the installation files (VMware-player-xxxx.exe /e folder-name), find network.cab, extract vmnetcfg.exe from it, and copy it to the VMware Player folder (C:\Program Files (x86)\VMware\VMware Player).
Another issue was "VCAP ROUTER: 404". After setting up the micro cloud, the vmc command failed with the message below:
Error (JSON 404): VCAP ROUTER: 404 - DESTINATION NOT FOUND
After I reconfigured the domain in the micro cloud setup menu, the problem was gone.

Thoughts
As for cloud computing, provisioning VM instances or application instances is relatively easy; resource management is the challenge.