#13 CPU-bound

open
nobody
None
5
2007-11-03
2007-11-03
No

gl-117 is much more CPU-bound than most other games. This is a problem on my Athlon64 3200+ (Newcastle core, overclocked from 2.2 to ~2.4GHz, but with single-channel DDR333) w/ NVidia 7600GT. At max quality setting (5/150), I get at best 35 fps at 1680x1050. Depending on terrain complexity, sometimes down to 20 fps at worst. This is after compiling with -O3 -ffast-math -mfpmath=sse. I'm running i386 Debian.
Backing off the quality to 4 raises the frame rate to ~40. By comparison, my Core 2 Duo E6600 (2.4GHz, 4MB, dual-channel DDR2-800) with g965 graphics hardware gets similar or worse frame rates at quality 4/150 at 1680x1050. I run 64-bit Ubuntu on it. In most games, I get far better fps at higher quality settings on my slower CPU with the NVidia 7600GT. A 7600GT has maybe 10 times the fill rate of a g965, and probably more shader power, too. Before I started writing this, I thought gl-117 was actually faster on the Core 2 with g965 graphics, but I guess it isn't at equal resolution and quality. Maybe the last time I checked was before I'd upgraded my NVidia drivers and overclocked my Athlon64, and with the stock Debian i386 binary on the Athlon64 rather than my -march=k8 -O3 -ffast-math -mfpmath=sse binary built from the Debian sources. Just recompiling actually made a difference of several fps.

The fact remains that gl-117 seems a lot more CPU-bound than most games. My Athlon64 can run Nexuiz, ut2004, TA Spring, etc. etc. at high frame rates, and they're more graphically complex than gl-117. BTW, I really enjoy gl-117.

I used oprofile to find the hot spots.

opreport:
~2.4GHz Athlon64, 512kB L2. NVidia drivers.
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1100000
Counted L2_CACHE_MISS events (L2 Cache Misses) with a unit mask of 0x07 (multiple flags) count 500000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L2_CACHE_MISS:...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  3184691 61.4894      1123 57.2084       657 24.8206 gl-117
  1077926 20.8124       592 30.1579      1529 57.7635 zero
   557517 10.7644       113  5.7565        96  3.6267 libGLcore.so.100.14.19
   193498  3.7360        57  2.9037       113  4.2690 no-vmlinux
    59218  1.1434        41  2.0886       204  7.7068 libc-2.6.1.so

from opannotate --assembly src/gl-117
GLLandscape::drawTexturedQuad(int, int) total: 1003180 31.5001 287 25.5565 317 48.249

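(31.5% of the CPU cycles sampled in gl-117, 26% of its L1 data cache misses, and 48% of its L2 cache misses happened in this function. These percentages are relative to the gl-117 binary's own samples: 1003180 / 3184691 is 31.5%. Since gl-117 accounts for 61.5% of all cycles, that is roughly 19% of the total CPU cycles on the machine.)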
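The percentages in the opannotate line look like shares of the gl-117 image's samples rather than of the whole machine, which can be checked against the opreport counts above. A minimal sketch of that arithmetic follows; the function and constant names are made up for illustration, and only the numbers come from the report.

```cpp
// Sample counts copied from the opreport / opannotate output above.
const double kGl117Cycles     = 3184691.0;  // CPU_CLK_UNHALTED samples in gl-117
const double kGl117PctOfAll   = 61.4894;    // gl-117's share of all cycle samples
const double kDrawQuadCycles  = 1003180.0;  // samples in drawTexturedQuad
const double kDrawQuadPct     = 31.5001;    // percentage printed by opannotate

// Percentage of `part` relative to `whole`.
inline double percentOf(double part, double whole) {
    return 100.0 * part / whole;
}

// Scale a within-image percentage to a machine-wide percentage,
// given the image's own share of all samples.
inline double machineWidePct(double withinImagePct, double imagePctOfAll) {
    return withinImagePct * imagePctOfAll / 100.0;
}
```

percentOf(kDrawQuadCycles, kGl117Cycles) reproduces the 31.5% that opannotate printed, and machineWidePct(kDrawQuadPct, kGl117PctOfAll) comes out a little over 19%. The same scaling applies to the other per-function figures in this report.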

and using 24.5% of gl-117's cycles (about 15% of the machine total) is:

08087950 <_ZN11GLLandscape4drawEii>: /* GLLandscape::draw(int, int) total: 779997 24.4921 73 6.5004 89 13.5464 */

Is that what OpenGL display lists are good for, so that gl-117 wouldn't have to walk the terrain geometry itself every frame? I don't know OpenGL programming...
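To make the question concrete, here is a GL-free sketch of the idea behind a display list: walk the terrain once, keep the result, and replay it each frame. Everything in it (TerrainCache, Vertex, the grid layout) is hypothetical and not gl-117's code. In real OpenGL, the build step would correspond to issuing the draw calls between glNewList(id, GL_COMPILE) and glEndList(), and the per-frame step to glCallList(id).

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Stand-in for a compiled display list: quad corners generated once and kept.
class TerrainCache {
public:
    TerrainCache() : buildCount_(0) {}

    // Expensive step: walk the height grid and emit four corners per quad.
    // Done once, or whenever the terrain itself changes.
    void build(const std::vector<float>& heights, int width, int depth) {
        quads_.clear();
        for (int z = 0; z + 1 < depth; ++z) {
            for (int x = 0; x + 1 < width; ++x) {
                quads_.push_back({float(x),     heights[z * width + x],           float(z)});
                quads_.push_back({float(x + 1), heights[z * width + x + 1],       float(z)});
                quads_.push_back({float(x + 1), heights[(z + 1) * width + x + 1], float(z + 1)});
                quads_.push_back({float(x),     heights[(z + 1) * width + x],     float(z + 1)});
            }
        }
        ++buildCount_;
    }

    // Cheap per-frame step: replay the stored vertices without re-walking
    // the grid. Returns how many vertices would be submitted.
    std::size_t draw() const { return quads_.size(); }

    int buildCount() const { return buildCount_; }

private:
    std::vector<Vertex> quads_;
    int buildCount_;
};
```

Whether this helps gl-117 depends on how much of the terrain drawing changes with the view each frame. If GLLandscape::draw picks quads or detail levels per frame, only the static parts could be cached this way.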

Another one is
08076160 <_ZN9Landscape7isWaterEi>: /* Landscape::isWater(int) total: 64735 2.0327 1 0.0890 2 0.3044 */

This should be in the .h, so it can be inlined. Pushing the frame pointer in this function uses .7% of the whole machine's CPU time...
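A minimal sketch of what moving the function into the header could look like. The class name, the terrain type codes, and the body of isWater are assumptions made for illustration; I haven't copied them from gl-117's Landscape class.

```cpp
#include <cstddef>

// Hypothetical terrain type codes; gl-117's real constants may differ.
enum TerrainType { GRASS = 0, ROCK = 1, SHALLOW_WATER = 2, DEEP_WATER = 3 };

class LandscapeSketch {
public:
    // Defined inside the class body in the header, so it is implicitly
    // inline: the compiler can expand it at each call site instead of
    // emitting a call with its own prologue and epilogue.
    bool isWater(int type) const {
        return type == SHALLOW_WATER || type == DEEP_WATER;
    }
};

// Example caller: counts water cells in a row of terrain type codes.
inline int countWater(const LandscapeSketch& land, const int* types, std::size_t n) {
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (land.isWater(types[i])) {
            ++count;
        }
    }
    return count;
}
```

With the definition visible in every translation unit that includes the header, a loop like countWater can test the type in place rather than calling out to a separate function on every iteration.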
