I hate many enterprise applications because of very unpleasant user experiences. Here's a recent example.
When I opened an application in firefox browser, my computer nearly hanged. Using basic Solaris commands prstat, mpstat, etc, I found displaying this page caused huge number of (40K) context switches per second.
At first, I thought it might be because of some ugly javascripts. So I first checked the source of this web page, well it contains several big java script files (sum up over 1MB), then I disabled the javascript in browser, however, the problem remained. I also tried offline browsing, it didn't solve the problem, so this is not a networking issue.
What's wrong?
It looks like lwp_park/lwp_unpark are evils. Use a Dtrace script to find who caused this.
Then switch to the problemed web page in firefox, go back to that terminal running DTrace, type Ctrl-C.
The DTrace output the user stacks for causing both lwp_park() and lwp_unpark() which is interesting. After demangling C++ symbols, I got:
Then I wanted to check how many times nsCSSRendering::PaintBackground() was called by using another DTrace script.
From:
Workaround: In firefox settings, Preference -> Contents -> Load images automatically -> Exceptions -> add this website to block list.
When I opened an application in firefox browser, my computer nearly hanged. Using basic Solaris commands prstat, mpstat, etc, I found displaying this page caused huge number of (40K) context switches per second.
At first, I thought it might be because of some ugly javascripts. So I first checked the source of this web page, well it contains several big java script files (sum up over 1MB), then I disabled the javascript in browser, however, the problem remained. I also tried offline browsing, it didn't solve the problem, so this is not a networking issue.
What's wrong?
~# truss -c -p 1224 (#1244 is firefox process ID)...
^C
syscall seconds calls errors
read .016 3097 777
write .022 2126 1
open .000 2
close .000 4
fstat .000 2
access .000 1 1
ioctl .005 1145
fdsync .003 1
fcntl .000 6
lwp_park .236 28575 330
lwp_unpark .177 28340
sigaction .000 52
It looks like lwp_park/lwp_unpark are evils. Use a Dtrace script to find who caused this.
~# cat lwp_park.d
#!/usr/sbin/dtrace -qs
syscall::lwp_park:entry
/pid==$target/
{
@[ustack()]=count();
}
END
{
trunc(@, 10);
}
./lwp_park.d -p 1244
The DTrace output the user stacks for causing both lwp_park() and lwp_unpark() which is interesting. After demangling C++ symbols, I got:
The stack of lwp_unpark() is often more interesting, because lwp_park() is usually caused by waiting for a condition variable which does not directly tell you what the application is waiting for. While if there are many lwp_park(), then there are many lwp_unpark() accordingly, so I looked at lwp_unpark() stack. I'm not a browser expert, from the function name, the firefox seemed to do rendering background based on CSS. Then I checked the CSS file which size is 150KB, Damn, there're 571 occurences like below:
libc.so.1`__lwp_unpark+0x19libnspr4.so`PR_Unlock+0x102libxul.so`unsigned TimerThread::RemoveTimer(nsTimerImpl*)+0x7elibxul.so`unsigned nsTimerImpl::Cancel()+0x32libxul.so`unsigned imgContainer::FrameUpdated(unsigned,nsIntRect&)+0xbdlibxul.so`void row_callback(png_struct_def*,unsigned char*,unsigned long,int)+0x187libxul.so`MOZ_PNG_proc_IDAT_data+0x190libxul.so`MOZ_PNG_process_data+0x4celibxul.so`unsigned ReadDataOut(nsIInputStream*,void*,const char*,unsigned,unsigned,unsigned*)+0x8clibxul.so`unsigned nsStringInputStream::ReadSegments(unsigned(*)(nsIInputStream*,void*,const char*,unsigned,unsigned,unsigned*),void*,unsigned,unsigned*)+0x47libxul.so`unsigned nsPNGDecoder::WriteFrom(nsIInputStream*,unsigned,unsigned*)+0x3alibxul.so`unsigned imgContainer::ReloadImages()+0x339libxul.so`unsigned imgContainer::GetCurrentFrameRect(nsIntRect&)+0x1a1libxul.so`unsigned imgRequest::NotifyProxyListener(imgRequestProxy*)+0x10flibxul.so`unsigned imgRequestProxy::Clone(imgIDecoderObserver*,imgIRequest**)+0xe7libxul.so`unsigned nsImageLoader::Load(imgIRequest*)+0x52
libxul.so`already_AddRefednsImageLoader::Create(nsIFrame*,imgIRequest*,int,nsImageLoader*)+0x57 libxul.so`void nsPresContext::SetupBackgroundImageLoaders(nsIFrame*,const nsStyleBackground*)+0x98libxul.so`void nsCSSRendering::PaintBackgroundWithSC(nsPresContext*,nsIRenderingContext&,nsIFrame*,const nsRect&,const nsRect&,const nsStyleBackground&,const nsStyleBorder&,unsigned,nsRect*)+0x93flibxul.so`void nsCSSRendering::PaintBackground(nsPresContext*,nsIRenderingContext&,nsIFrame*,const nsRect&,const nsRect&,unsigned,nsRect*)+0xea24658
libc.so.1`__lwp_park+0x19libc.so.1`cond_wait_queue+0x60libc.so.1`cond_wait_common+0x1eblibc.so.1`__cond_timedwait+0x66libc.so.1`cond_timedwait+0x27libc.so.1`pthread_cond_timedwait+0x24libnspr4.so`PR_WaitCondVar+0x24alibxul.so`unsigned TimerThread::Run()+0x11clibxul.so`unsigned nsThread::ProcessNextEvent(int,int*)+0x121libxul.so`int NS_ProcessNextEvent_P(nsIThread*,int)+0x2dlibxul.so`void nsThread::ThreadFunc(void*)+0xa7libnspr4.so`_pt_root+0xe7libc.so.1`_thrp_setup+0x9blibc.so.1`_lwp_start26335
{background: url(../xxx/yyy/zzz.png)}
Then I wanted to check how many times nsCSSRendering::PaintBackground() was called by using another DTrace script.
From:
libxul.so`void nsCSSRendering::PaintBackground(nsPresContext*,nsIRenderingContext&,nsIFrame*,const nsRect&,const nsRect&,unsigned,nsRect*)+0xea
It indicated rendering this web page caused the firefox painted background 422 times (via X window), crazy!!! On the other hand, it looks like firefox's libxul.so should be improved when it renders these images. It caused about 25,000 context switches. Is it because that rendering the images is in asynchronous way and the rendered data are in local memory or on fast network so that context switches occurred so fast?#!/usr/sbin/dtrace -s
pid$target:libxul:__1cOnsCSSRenderingPPaintBackground6FpnNnsPresContext_rnTnsIRenderingContext_pnInsIFrame_rknGnsRect_9AIp7_v_:ea
{
@ =count();
}Ran this DTrace script, and switched to the problemed web page in firefox. this time DTrace output:~# ./pid.d -p 1244
dtrace: script './pid.d' matched 1 probe
^C
422
Workaround: In firefox settings, Preference -> Contents -> Load images automatically -> Exceptions -> add this website to block list.
No comments:
Post a Comment