Improve linker_verbose_tracking.py

linker_verbose_tracking.py is a useful script for understanding why particular symbols and object files get pulled in to Chrome binaries on Windows. Previously this script did its tracking purely by object file name, which breaks down for things like mime_util.obj because there are *four* object files with that name. This change updates the script so that it tracks the chain of dependencies using fully-specified object file names - basically foo.lib(bar.obj) instead of just bar.obj. This avoids the dead-ends and misleading information that occasionally happened with the previous version. In order to make this usable (specifying a fully-qualified object file name to search for is unwieldy) the search string is now understood to be any sub-string of the object file name. The set of matches is printed so that the search can easily be refined. It works much better now. This was made necessary by the investigation of the linked bug. R=stanisc@chromium.org BUG=717103 Change-Id: I25682e6d5b534c69d3743456918c60d42f746e10 Reviewed-on: https://chromium-review.googlesource.com/565221 Commit-Queue: Bruce Dawson <brucedawson@chromium.org> Reviewed-by: Stanislav Chiknavaryan <stanisc@chromium.org> Cr-Commit-Position: refs/heads/master@{#485425}

Improve linker_verbose_tracking.py
linker_verbose_tracking.py is a useful script for understanding why particular symbols and object files get pulled in to Chrome binaries on Windows. Previously this script did its tracking purely by object file name, which breaks down for things like mime_util.obj because there are *four* object files with that name. This change updates the script so that it tracks the chain of dependencies using fully-specified object file names - basically foo.lib(bar.obj) instead of just bar.obj. This avoids the dead-ends and misleading information that occasionally happened with the previous version. In order to make this usable (specifying a fully-qualified object file name to search for is unwieldy) the search string is now understood to be any sub-string of the object file name. The set of matches is printed so that the search can easily be refined. It works much better now. This was made necessary by the investigation of the linked bug. R=stanisc@chromium.org BUG=717103 Change-Id: I25682e6d5b534c69d3743456918c60d42f746e10 Reviewed-on: https://chromium-review.googlesource.com/565221 Commit-Queue: Bruce Dawson <brucedawson@chromium.org> Reviewed-by: Stanislav Chiknavaryan <stanisc@chromium.org> Cr-Commit-Position: refs/heads/master@{#485425}
8dd579aa · Bruce Dawson · Commit Bot · edf897f5 · 8dd579aa
Commit 8dd579aa authored Jul 10, 2017 by Bruce Dawson Committed by Commit Bot Jul 10, 2017
Hide whitespace changes
Inline Side-by-side

Showing with 87 additions and 46 deletions

tools/win/linker_verbose_tracking.py tools/win/linker_verbose_tracking.py +87 -46

No files found.
--- a/tools/win/linker_verbose_tracking.py
+++ b/tools/win/linker_verbose_tracking.py
@@ -26,28 +26,44 @@ In these cases it can be necessary to see what symbol is causing one object file
 to reference another. Removing or moving the problematic symbol can fix the
 problem. See crrev.com/2559063002 for an example of such a change.

+In some cases a target needs to be a source_set in component builds (so that all
+of its functions will be exported) but should be a static_library in
+non-component builds. The BUILD.gn pattern for that is:
+
+  if (is_component_build) {
+    link_target_type = "source_set"
+  } else {
+    link_target_type = "static_library"
+  }
+  target(link_target_type, "filters") {
+
 One complication is that there are sometimes multiple source files with the
-same name, such as crc.c, which can make analysis more difficult or
-ambiguous. If this becomes a blocking issue they it may be necessary to
-temporarily rename the source file.
+same name, such as mime_util.cc, all creating mime_util.obj. The script takes
+whatever search criteria you pass and looks for all .obj files that were loaded
+that contain that sub-string. It will print the search list that it will use
+before reporting on why all of these .obj files were loaded. For instance, the
+initial output if mime_util.obj is specified will be something like this:

-Object file name matching is case sensitive.
+>python linker_verbose_tracking.py verbose.txt mime_util.obj
+  Searching for [u'net.lib(mime_util.obj)', u'base.lib(mime_util.obj)']

-Typical output when run on chrome.dll verbose link output is:
+If you want to restrict the search to just one of these .obj files then you can
+give a fully specified name, like this:

->python tools\win\linker_verbose_tracking.py chrome_verbose_02.txt flac_crc
-Database loaded - 11277 xrefs found
-flac_crc.obj pulled in for symbol "_FLAC__crc8" by
-        stream_decoder.obj
-        bitwriter.obj
+>python linker_verbose_tracking.py verbose.txt base.lib(mime_util.obj)
+
+Object file name matching is case sensitive.

-stream_decoder.obj pulled in for symbol "_FLAC__stream_decoder_new" by
-        stream_encoder.obj
-bitwriter.obj pulled in for symbol "_FLAC__bitwriter_new" by
-        stream_encoder.obj
+Typical output when run on chrome_watcher.dll verbose link output is:

-stream_encoder.obj pulled in for symbol "_FLAC__stream_encoder_new" by
-        Command-line obj file: audio_encoder.obj
+>python tools\win\linker_verbose_tracking.py verbose08.txt drop_data
+Database loaded - 3844 xrefs found
+Searching for common_sources.lib(drop_data.obj)
+common_sources.lib(drop_data.obj).obj pulled in for symbol Metadata::Metadata...
+        common.lib(content_message_generator.obj)
+
+common.lib(content_message_generator.obj).obj pulled in for symbol ...
+        Command-line obj file: url_loader.mojom.obj
 """

 import io
@@ -58,16 +74,26 @@ import sys
 def ParseVerbose(input_file):
  # This matches line like this:
  #   Referenced in skia.lib(SkError.obj)
-  # with the groups()[0] referring to the object file name without the file
-  # extension.
-  obj_match = re.compile('.*\((.*)\.obj\)')
-  # Prefix used for symbols that are referenced:
+  #   Referenced in cloud_print_helpers.obj
+  #   Loaded libvpx.lib(vp9_encodemb.obj)
+  # groups()[0] will be 'Referenced in ' or 'Loaded ' and groups()[1] will be
+  # the fully qualified object-file name (including the .lib name if present).
+  obj_match = re.compile('.*(Referenced in |Loaded )(.*)')
+
+  # Prefix used for symbols that are found and therefore loaded:
  found_prefix = '      Found'

+  # This dictionary is indexed by (fully specified) object file names and the
+  # payload is the list of object file names that caused the object file that
+  # is the key name to be pulled in.
  cross_refs = {}
+  # This dictionary has the same index as cross_refs but its payload is the
+  # simple that caused the object file to be pulled in.
  cross_refed_symbols = {}

+  # None or a list of .obj files that referenced a symbol.
  references = None
+
  # When you redirect the linker output to a file from a command prompt the
  # result will be a utf-8 (or ASCII?) output file. However if you do the same
  # thing from PowerShell you get a utf-16 file. So, we need to handle both
@@ -81,6 +107,8 @@ def ParseVerbose(input_file):
  with io.open(input_file, encoding=file_encoding) as file_handle:
    for line in file_handle:
      if line.startswith(found_prefix):
+        # Create a list to hold all of the references to this symbol which
+        # caused the linker to load it.
        references = []
        # Grab the symbol name
        symbol = line[len(found_prefix):].strip()
@@ -88,30 +116,32 @@ def ParseVerbose(input_file):
          # Strip off leading and trailing quotes if present.
          symbol = symbol[1:-1]
        continue
+      # If we are looking for references to a symbol...
      if type(references) == type([]):
        sub_line = line.strip()
        match = obj_match.match(sub_line)
-        # See if the line is part of the list of places where this symbol was
-        # referenced
-        if sub_line.count('Referenced ') > 0:
-          if match:
-            # This indicates a match that is xxx.lib(yyy.obj), so a referencing
-            # .obj file that was itself inside of a library. We discard the
-            # library name.
-            reference = match.groups()[0]
+        if match:
+          match_type, obj_name = match.groups()
+          # See if the line is part of the list of places where this symbol was
+          # referenced:
+          if match_type == 'Referenced in ':
+            if '.lib' in obj_name:
+              # This indicates a match that is xxx.lib(yyy.obj), so a
+              # referencing .obj file that was itself inside of a library.
+              reference = obj_name
+            else:
+              # This indicates a match that is just a pure .obj file name
+              # I think this means that the .obj file was specified on the
+              # linker command line.
+              reference = ('Command-line obj file: ' +
+                           obj_name)
+            references.append(reference)
          else:
-            # This indicates a match that is just a pure .obj file name
-            # I think this means that the .obj file was specified on the linker
-            # command line.
-            reference = ('Command-line obj file: ' +
-                         sub_line[len('Referenced in '): -len('.obj')])
-          references.append(reference)
-        elif sub_line.count('Loaded ') > 0:
-          if match:
-            loaded = match.groups()[0]
-            cross_refs[loaded] = references
-            cross_refed_symbols[loaded] = symbol
-          references = None
+            assert(match_type == 'Loaded ')
+            if '.lib' in obj_name and '.obj' in obj_name:
+              cross_refs[obj_name] = references
+              cross_refed_symbols[obj_name] = symbol
+            references = None
      if line.startswith('Finished pass 1'):
        # Stop now because the remaining 90% of the verbose output is
        # not of interest. Could probably use /VERBOSE:REF to trim out
@@ -121,15 +151,26 @@ def ParseVerbose(input_file):


 def TrackObj(cross_refs, cross_refed_symbols, obj_name):
-  if obj_name.lower().endswith('.obj'):
-    obj_name = obj_name[:-len('.obj')]
-
  # Keep track of which references we've already followed.
  tracked = {}

  # Initial set of object files that we are tracking.
-  targets = [obj_name]
+  targets = []
+  for key in cross_refs.keys():
+    # Look for any object files that were pulled in that contain the name
+    # passed on the command line.
+    if obj_name in key:
+      targets.append(key)
+  if len(targets) == 0:
+    targets.append(obj_name)
+  # Print what we are searching for.
+  if len(targets) == 1:
+    print 'Searching for %s' % targets[0]
+  else:
+    print 'Searching for %s' % targets
  printed = False
+  # Follow the chain of references up to an arbitrary maximum level, which has
+  # so far never been approached.
  for i in range(100):
    new_targets = {}
    for target in targets:
@@ -140,14 +181,14 @@ def TrackObj(cross_refs, cross_refed_symbols, obj_name):
          printed = True
          print '%s.obj pulled in for symbol "%s" by' % (target, symbol)
          for ref in cross_refs[target]:
-            print '\t%s.obj' % ref
+            print '\t%s' % ref
            new_targets[ref] = True
    if len(new_targets) == 0:
      break
    print
    targets = new_targets.keys()
  if not printed:
-    print ('No references to %s.obj found. Directly specified in sources or a '
+    print ('No references to %s found. Directly specified in sources or a '
          'source_set?' % obj_name)