Optimizing the profiler itself so it runs 4x faster

Up to Share Your Story

Optimizing the profiler itself so it runs 4x faster

Posted by Richard Flamsholt at May 28. 2008

Here's the story of how we at EQATEC used the profiler to optimize ... itself!

We tried out the v1.1 profiler on a quite big app, NASA's 8MB large "World Wind", simply to test it. The profiler happily instrumented all 36,157 methods and the app ran fine afterwards, only slightly slower. Great.

However, the instrumentation itself took almost an entire minute. Maybe we could make the profiler go a little bit faster? We had worked hard to make sure the profiler's runtime module was lightning fast, but not spend much time optimizing the actual profiling step. It was coded sensible of course, but we had never - you know - profiled it.

Okay, since the profiler itself is a .NET app we simply fed it to itself, generating a profiled version of the profiler, used that to profile the World Wind app, and looked at the report.

And boy, were we in for a surprise!

It turned out that some small, final step took 75% of the entire time. What?! Here's the code:


        foreach (string name in m_MethodHandleMap)
        {
          ins = bw.AppendAfter(ins, bw.Create(OpCodes.Ldstr, name));
          ins = bw.AppendAfter(ins, bw.Create(OpCodes.Call, registerMethod));
        }


The reason? The AppendAfter-method apparently used linear search to find the instruction to append after, and doing this 2 x 36,157 times sure takes a long time. So forget about all the time spent doing fancy analysis and injections of extra MSIL-code for each and every method - no, it was this final step of adding two simple instructions per method that accounted for 75% of the entire time.

We quickly changed two other code-lines and the loop could then be coded like this instead:


        foreach (string name in m_MethodHandleMap)
        {
          bw.Append(bw.Create(OpCodes.Ldstr, name));
          bw.Append(bw.Create(OpCodes.Call, registerMethod));
        }


With this change profiling went from 61,7 sec to 14,0 sec. That's four times faster! Not the 20-30% that I'd maybe hoped for - no, a freakin' four times faster! And all from changing just a few lines of code. And all within 15 minutes since we got the idea to profile the profiler. Wow.

We laughed quite a bit at this, and it just goes to show that performance bottlenecks typically are not at all where you expect them to be. So in the next release you all get a faster profiler, simply because we took our own medicine.


Re: Optimizing the profiler itself so it runs 4x faster

Posted by Richard Flamsholt at June 12. 2008
I'll follow up with a similar story for the viewer.

To test the profiler I profiled Paint.NET (*) and got a 900K report. It took a tremendously long time to load - about 20 seconds. So just before releasing version 1.2 I profiled the viewer itself and found the hotspot immediately.

It wasn't loading or parsing the 900K xml-file. It wasn't creating all the call-graph info objects, etc etc. No, it was simply populating the upper listbox, even though it used both suspend/resume and begin/end update.

This is the old code:

      m_listSummary.SuspendLayout();
      m_listSummary.BeginUpdate();
      m_listSummary.Items.Clear();

      foreach (MethodInfo mi in m_Model.Methods)
      {
        if (mi.Calls == 0)
          continue;
        ListViewItem item = CreateMethodInfoItem(mi);
        m_listSummary.Items.Add(item);
      }

      m_listSummary.Sort();
      m_listSummary.EndUpdate();
      m_listSummary.ResumeLayout();


All time was spent inside this method (ie not in CreateMethodInfoItem etc). I changed this to add the items in one fell swoop, using AddRange:

      // Add items using AddRange instead of adding individually using Add;
      // the profiler showed this was 10x faster. And yes, it's that much
      // faster regardless of the BeginUpdate/EndUpdate. Amazing.
      IList<ListViewItem> items = new List<ListViewItem>();
      foreach (MethodInfo mi in m_Model.Methods)
      {
        if (mi.Calls == 0)
          continue;
        ListViewItem item = CreateMethodInfoItem(mi);
        items.Add(item);
      }
      ListViewItem[] itemsArray = new ListViewItem[items.Count];
      items.CopyTo(itemsArray, 0);
      m_listSummary.BeginUpdate();
      m_listSummary.Items.Clear();
      m_listSummary.Items.AddRange(itemsArray);
      m_listSummary.Sort();
      m_listSummary.EndUpdate();

The result? The report loaded now in not 20 seconds, but just 2 seconds: 10 times faster.

Amazing. And once again, a big performance gain in an unexpected place. I though newing objects and formatting strings for the listviewitem was the bottleneck, but no.


(*) Regarding profiling Paint.NET: if you want to do that, then notice that Paint.NET installs its assemblies into the native image cache. Since profiling doesn't change the assembly signature (name, version) Windows will load the original DLLs from that cache instead, no matter how hard you try to double-click on the profiled app to run it. You'll have to uninstall (ngen uninstall) the files or delete them manually using file explorer to stop it from using the original files and run the profiled versions instead. That was sort of an unexpected problem.
Powered by Ploneboard
http://www.eqatec.com / Forum / EQATEC Profiler / Share Your Story / Optimizing the profiler itself so it runs 4x faster