Wednesday, January 30, 2013

Build Mesos on Solaris

Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. I am building it because I want to experiment with Spark, a popular framework for cluster computing (e.g. big data analysis).
Mesos bundles a lot of third-party software, so the build process on Solaris is not very smooth. At first I tried Solaris Studio (SunCC), but ran into trouble. To save time, I decided to use gcc.
The build environment is: Solaris 11 11/11 SPARC; gcc 4.5.2 from the Solaris 11 IPS repository; mesos 0.9.0.

Preparation Steps:

- Build automake-1.11.6 (the automake-1.11.2 shipped with Solaris 11 is not sufficient for mesos 0.9.0):
CC=cc CXX=CC ./configure --program-suffix=-1.11
gmake; gmake install  (installs into /usr/local/bin)

- Modify the autoconf input file. In the "solaris" section:
    LIBS="$LIBS -lsocket -lnsl -lproject -lproc -lresolv -lsendfile -lxnet"

Then, in the "JAVA_LDFLAGS" section, add a Solaris branch:
    elif test "$OS_NAME" = "solaris"; then
      for arch in sparc; do
        if test -e "$dir"; then
          # Note that these are libtool specific flags.
          JAVA_LDFLAGS="-L$dir -R$dir -ljvm"

Then run autoconf.

run configure:
 ./configure CC=gcc CXX=g++ CFLAGS="-m32 -pthreads" CXXFLAGS="-m32 -pthreads" JAVA_HOME=/usr/java --prefix=/opt/mesos

Porting issues and solutions:

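1) process.cpp in libprocess
Compiling process.cpp and future.hpp failed with a syntax error in the generated assembly.
Generating the assembly with CC -S and then assembling it with cc -c process.s showed that the code uses the x86 "pause" instruction, which does not exist on SPARC.
Solution: use smt_pause() instead.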
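
A minimal sketch of how such a spin-wait hint can be kept portable. The helper names cpu_relax and spin_until_set are hypothetical and are not taken from the libprocess sources; smt_pause() is, as far as I know, declared in <synch.h> on Solaris.

```c
#if defined(__sun)
#include <synch.h>   /* smt_pause() on Solaris (assumed header) */
#endif

/* Hypothetical helper: tell the CPU that we are busy-waiting. */
void cpu_relax(void)
{
#if defined(__sun)
    smt_pause();
#elif defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
#endif
}

/* Spin until *flag is non-zero or max_spins iterations have passed.
 * Returns the number of iterations actually spent spinning. */
int spin_until_set(volatile int *flag, int max_spins)
{
    int spins = 0;
    while (!*flag && spins < max_spins) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

Keeping the instruction behind one small function means only that function needs a per-platform branch.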

2) process.cpp in libprocess
The sendfile call has to pass the offset as a pointer. The original line:
ssize_t length = sendfile(s, fd, offset, size);
was changed to:
ssize_t length = sendfile(s, fd, &offset, size);

3) pid.cpp in libprocess
gethostbyname2_r is not available on Solaris, so the code has to be modified to use gethostbyname_r.

4) port_posix.h, atomic_pointer.h
These needed changes to the platform macros and the memory barriers.

5) getpwuid_r in zookeeper.c
getpwuid_r(uid, &pw, buf, sizeof(buf), &pwp)
Similar to gethostbyname2_r: the call has to be adapted for Solaris.
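
A hedged sketch of one way to cope with the two getpwuid_r variants. The helper name lookup_user_name is hypothetical, and the preprocessor test is an assumption: by default Solaris offers a four-argument form that returns the entry, while defining _POSIX_PTHREAD_SEMANTICS selects the five-argument POSIX form shown above.

```c
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copy the login name for uid into out.
 * Returns 0 on success, -1 if there is no entry or on error. */
int lookup_user_name(uid_t uid, char *out, size_t outlen)
{
    struct passwd pw;
    struct passwd *pwp = NULL;
    char buf[1024];

#if defined(__sun) && !defined(_POSIX_PTHREAD_SEMANTICS)
    /* Solaris default variant: four arguments, returns entry or NULL. */
    pwp = getpwuid_r(uid, &pw, buf, (int)sizeof(buf));
#else
    /* POSIX variant: five arguments, returns 0 or an error number. */
    if (getpwuid_r(uid, &pw, buf, sizeof(buf), &pwp) != 0)
        pwp = NULL;
#endif
    if (pwp == NULL || outlen == 0)
        return -1;
    strncpy(out, pwp->pw_name, outlen - 1);
    out[outlen - 1] = '\0';
    return 0;
}
```

Building with -D_POSIX_PTHREAD_SEMANTICS instead would avoid touching the call site at all.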

6) recordio.c in zookeeper
htonll is redefined:
recordio.h: int64_t htonll(int64_t v);
On Solaris, the prototype is: uint64_t htonll(uint64_t hostlonglong);
Linux does not have htonll, so there is no conflict there.
Solution: comment out the zookeeper htonll.
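
An alternative to commenting the function out is to give the project-private version its own name, so it cannot clash with a system htonll. This is only a sketch; the name zk_htonll is hypothetical and does not appear in the zookeeper sources.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement name, chosen so that it cannot collide
 * with the htonll() that Solaris already declares. */
int64_t zk_htonll(int64_t v)
{
    uint64_t in = (uint64_t)v;
    unsigned char bytes[8];
    uint64_t out = 0;
    int i;

    /* Lay the value out most-significant byte first (network order). */
    for (i = 0; i < 8; i++)
        bytes[i] = (unsigned char)(in >> (56 - 8 * i));
    memcpy(&out, bytes, sizeof(out));
    return (int64_t)out;
}
```

Because the bytes are written explicitly, the same code works on both big-endian SPARC and little-endian x64.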

7) mt_adapter.c in zookeeper
The atomic operation fetch_and_add needs a Solaris implementation.
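
A minimal sketch of a fetch-and-add that builds with gcc and falls back to the Solaris <atomic.h> interface otherwise. This is an illustration under assumptions, not the change actually made in mt_adapter.c.

```c
#include <stdint.h>
#if defined(__sun) && !defined(__GNUC__)
#include <atomic.h>   /* atomic_add_32_nv() on Solaris */
#endif

/* Atomically add delta to *target and return the value it held before. */
int32_t fetch_and_add(volatile int32_t *target, int32_t delta)
{
#if defined(__GNUC__)
    return __sync_fetch_and_add(target, delta);
#elif defined(__sun)
    /* atomic_add_32_nv returns the NEW value, so subtract delta again. */
    return (int32_t)atomic_add_32_nv((volatile uint32_t *)target, delta)
           - delta;
#else
#error "no atomic fetch-and-add available for this compiler"
#endif
}
```

Note the difference in return value: the gcc builtin returns the old value, while atomic_add_32_nv returns the new one.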

8) cli.c in zookeeper
The code calls the two-argument form:
ctime_r(&tctime, tctimes)
Solaris requires a third argument with the buffer length: char *ctime_r(const time_t *clock, char *buf, int buflen);

9) ./common/utils.hpp:359:17: error: ‘NAME_MAX’ was not declared
Solution: define it: #define NAME_MAX 255
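
A slightly more defensive form of the same fix is to define NAME_MAX only when the system headers do not provide it. This is a sketch; the helper name_fits is hypothetical and only shows the macro in use.

```c
#include <limits.h>
#include <string.h>

/* Fall back to 255 only when the platform leaves NAME_MAX undefined. */
#ifndef NAME_MAX
#define NAME_MAX 255
#endif

/* Hypothetical helper: does this file name fit in one path component? */
int name_fits(const char *name)
{
    return strlen(name) <= (size_t)NAME_MAX;
}
```

With the #ifndef guard the same source still compiles unchanged on platforms where <limits.h> already defines the macro.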

10) The Solaris project implementation is not complete, which causes compile errors.
Workaround: skip it by modifying the macro definition.

11) protobuf is compiled in 64-bit by default.
Solution: reconfigure in thirdparty/protobuf according to config.log and add -m64 to CFLAGS and CXXFLAGS (it is better to add -m32 at the top level of the mesos configure).

12) /usr/lib/python2.6/pycc complained:
cc: No valid input files specified, no output generated
This happens because the source file is C++: native/proxy_executor.cpp
Workaround (found by reading /usr/lib/python2.6/pycc; pycc and pyCC are the same script):
in ~/Downloads/mesos-0.9.0/src/python
$ PYCC_CC=g++ PYCC_CXX=g++ LDFLAGS="-lnsl -lresolv -lsendfile -lsocket" python build
(Solaris 11.1 does not seem to have this issue.)

13) gmake test
Linking the tests needs -lxnet.

14) Got a runtime error at:
       if (errno != EINPROGRESS) {...
   In fact, errno is 0 here.
Reason: libprocess was not built with "-pthreads".
Solution: also add "-pthreads" to the libprocess build flags, so that each thread gets its own private copy of errno.

Other notes:

- Building mesos 0.9.0 fails with gcc 4.7 on Linux. Most current Linux distributions ship this gcc version.
- I also built mesos 0.9.0 on Solaris 11.1 x64; fewer code modifications are needed there because the platform is x64.
- The modified files are now on GitHub.