Thursday, March 3, 2011

crazy CSS, thanks DTrace

I hate many enterprise applications because of very unpleasant user experiences. Here's a recent example.
When I opened an application in firefox browser, my computer nearly hanged. Using basic Solaris commands prstat, mpstat, etc, I found displaying this page caused huge number of (40K) context switches per second.
At first, I thought it might be because of some ugly javascripts. So I first checked the source of this web page, well it contains several big java script files (sum up over 1MB), then I disabled the javascript in browser, however, the problem remained. I also tried offline browsing, it didn't solve the problem, so this is not a networking issue.

What's wrong?

~# truss -c -p 1224  (#1244 is firefox process ID)
^C
syscall               seconds   calls  errors
read                     .016    3097     777
write                    .022    2126       1
open                     .000       2
close                    .000       4
fstat                    .000       2
access                   .000       1       1
ioctl                    .005    1145
fdsync                   .003       1
fcntl                    .000       6
lwp_park                 .236   28575     330
lwp_unpark               .177   28340
sigaction                .000      52
...

It looks like lwp_park/lwp_unpark are evils. Use a Dtrace script to find who caused this.

~# cat lwp_park.d
#!/usr/sbin/dtrace -qs
syscall::lwp_park:entry
/pid==$target/
{
 @[ustack()]=count();
}
END
{
trunc(@, 10);
}

./lwp_park.d -p 1244   

Then switch to the problemed web page in firefox, go back to that terminal running DTrace, type Ctrl-C.
The DTrace output the user stacks for causing both lwp_park() and lwp_unpark() which is interesting. After demangling C++ symbols, I got:

              libc.so.1`__lwp_unpark+0x19
              libnspr4.so`PR_Unlock+0x102
              libxul.so`unsigned TimerThread::RemoveTimer(nsTimerImpl*)+0x7e
              libxul.so`unsigned nsTimerImpl::Cancel()+0x32
              libxul.so`unsigned imgContainer::FrameUpdated(unsigned,nsIntRect&)+0xbd
              libxul.so`void row_callback(png_struct_def*,unsigned char*,unsigned long,int)+0x187
              libxul.so`MOZ_PNG_proc_IDAT_data+0x190
              libxul.so`MOZ_PNG_process_data+0x4ce
              libxul.so`unsigned ReadDataOut(nsIInputStream*,void*,const char*,unsigned,unsigned,unsigned*)+0x8c
              libxul.so`unsigned nsStringInputStream::ReadSegments(unsigned(*)(nsIInputStream*,void*,const char*,unsigned,unsigned,unsigned*),void*,unsigned,unsigned*)+0x47
              libxul.so`unsigned nsPNGDecoder::WriteFrom(nsIInputStream*,unsigned,unsigned*)+0x3a
              libxul.so`unsigned imgContainer::ReloadImages()+0x339
              libxul.so`unsigned imgContainer::GetCurrentFrameRect(nsIntRect&)+0x1a1
              libxul.so`unsigned imgRequest::NotifyProxyListener(imgRequestProxy*)+0x10f
              libxul.so`unsigned imgRequestProxy::Clone(imgIDecoderObserver*,imgIRequest**)+0xe7
              libxul.so`unsigned nsImageLoader::Load(imgIRequest*)+0x52

libxul.so`already_AddRefednsImageLoader::Create(nsIFrame*,imgIRequest*,int,nsImageLoader*)+0x57
              libxul.so`void nsPresContext::SetupBackgroundImageLoaders(nsIFrame*,const nsStyleBackground*)+0x98
              libxul.so`void nsCSSRendering::PaintBackgroundWithSC(nsPresContext*,nsIRenderingContext&,nsIFrame*,const nsRect&,const nsRect&,const nsStyleBackground&,const nsStyleBorder&,unsigned,nsRect*)+0x93f
              libxul.so`void nsCSSRendering::PaintBackground(nsPresContext*,nsIRenderingContext&,nsIFrame*,const nsRect&,const nsRect&,unsigned,nsRect*)+0xea
            24658

              libc.so.1`__lwp_park+0x19
              libc.so.1`cond_wait_queue+0x60
              libc.so.1`cond_wait_common+0x1eb
              libc.so.1`__cond_timedwait+0x66
              libc.so.1`cond_timedwait+0x27
              libc.so.1`pthread_cond_timedwait+0x24
              libnspr4.so`PR_WaitCondVar+0x24a
              libxul.so`unsigned TimerThread::Run()+0x11c
              libxul.so`unsigned nsThread::ProcessNextEvent(int,int*)+0x121
              libxul.so`int NS_ProcessNextEvent_P(nsIThread*,int)+0x2d
              libxul.so`void nsThread::ThreadFunc(void*)+0xa7
              libnspr4.so`_pt_root+0xe7
              libc.so.1`_thrp_setup+0x9b
              libc.so.1`_lwp_start
            26335 
The stack of lwp_unpark() is often more interesting, because lwp_park() is usually caused by waiting for a condition variable which does not directly tell you what the application is waiting for. While if there are many lwp_park(), then there are many lwp_unpark() accordingly, so I looked at lwp_unpark() stack. I'm not a browser expert, from the function name, the firefox seemed to do rendering background based on CSS.  Then I checked the CSS file which size is 150KB, Damn, there're 571 occurences like below:

{background: url(../xxx/yyy/zzz.png)} 

Then I wanted to check how many times  nsCSSRendering::PaintBackground() was called by using another DTrace script.

 From:
libxul.so`void nsCSSRendering::PaintBackground(nsPresContext*,nsIRenderingContext&,nsIFrame*,const nsRect&,const nsRect&,unsigned,nsRect*)+0xea  

#!/usr/sbin/dtrace -s
pid$target:libxul:__1cOnsCSSRenderingPPaintBackground6FpnNnsPresContext_rnTnsIRenderingContext_pnInsIFrame_rknGnsRect_9AIp7_v_:ea
{
 @ =count();
 Ran this DTrace script, and switched to the problemed web page in firefox. this time DTrace output:
~# ./pid.d -p 1244
dtrace: script './pid.d' matched 1 probe
^C
              422
It indicated rendering this web page caused the firefox painted background 422 times (via X window), crazy!!! On the other hand, it looks like firefox's libxul.so should be improved when it renders these images. It caused about 25,000 context switches. Is it because that rendering the images is in asynchronous way and the rendered data are in local memory or on fast network so that context switches occurred so fast?

Workaround: In firefox settings, Preference -> Contents -> Load images automatically -> Exceptions -> add this website to block list.


Tuesday, October 26, 2010

JiuZhai_HuangLong

JiuZhai and HuangLong are my wife's dream vacation places for years. This October we finally embarked on the journey. The cost of 3-days trip was very high (even more expensive than some overseas travel), but the view is very beutiful and unique. More photos taken by me can be found here, here.


Thursday, September 2, 2010

Sun set in China

Sun, my favorite brand.
Sun, my favorite logo.
Sun, the best engineering culture I have met with.
Sun, the most important career experience for me.
Sun, set in China on August 31st, 2010.

Sunday, July 18, 2010

MeeGo!

I don't want to use my laptop provided by the company any more: 1) it's 4 years old. 2) overheating problem. I decided to buy a device with my own money. I don't want to invest much because any computer becomes out of dated very soon. The requirement are: light, fast to start, energy efficient. iPad is not my choice because I need do some coding occasionally.

Finally I chose a netbook ViewSonic VNB 102p, it is based on Intel N450 Atom platform, the price is reasonable (less than RMB2000). I evaluated several OSes and installed MeeGo and Ubuntu Remix 10.04, these are designed for netbooks. Amazingly, all devices work out of box on these 2 OSes.

When I started to use this new netbook, I found I spent more time on MeeGo than Ubuntu. It might be because everything is fine on Ubuntu but I need some time to make some applications work on Meego. And Meego looks more simple, faster to start. I installed and customized thunderbird (for mail), openoffice 3.2.1 and skype (for voice chat), compiled Shrew (for VPN) and Shotwell 0.6.1 (for managing photos and online albums). Now I have used it for one month and this netbook can meet my demands for daily use.

Thursday, June 17, 2010

Why flying east takes shorter time

I'm not a frequent traveler. Some years ago I found flying from west of U.S to east U.S. took 1 hour less time than return trip, I didn't find answer at that time. Recently I decided to investigate this by the help of google search.

The answer is simple: WIND . But wait, why do most people in the world find flying east takes shorter time, does the wind over Earth surface always flies east? Then I studied some flights schedule around the equator such as Singapore to Tokyo and some flights in South America, there's no significant difference in flying time between goes and come-backs. After I read some articles, below factors should be considered:

- Wind and Earth Rotation or Prevailing Winds .  You know the directions of the winds in different areas of the world.

- The locations of most cities with flights. Open the world map, you can find the much more cities are located between lattitude 30 and 60, in this area, the direction of wind is from the west to the east.

- You should consider long distance flights (>3 hours), otherwise the airplane may not spend most of time at cruise speed, and many other factors (such as reserving time for air traffic control) can affect the flying time.

- Coriolis effect. Yes, it affects the direction of the wind and it also affects the trails of airplanes, i.e. when the airplane flies from low lattitude to high lattitude, the plane will be deflected to the east. However, Coriolis itself nearly does not affect the differences of flying time between goes and come-backs. For example, if you fly northwest from Singapore to North Eurpoe, the real flying distance may be longer than you imagine from the map, the return trip is also longer.

Am I right? I'm not 100% sure. I'm just trying to explain the reason :)