    PDF a11y: Don't break paragraphs on font changes · f39b7b73
    Ian Prest authored
    The current heuristic always creates a new paragraph whenever the
    font size changes, which is undesirable.
    
    Simply removing the size check causes many of the existing tests to fail
    because the size change is often the only reason that we have multiple
    paragraphs at all.  The problem is that our PDF file has too few lines
    on each page to compute reasonable line and paragraph size thresholds.
    
    So removing the check required a new heuristic, which works as
    follows:
    
    1. We keep track of the top & bottom of the current line, as weighted
    averages of the (recent) text boxes on the line.
    2. When we encounter a new text box, if it significantly overlaps the
    top-to-bottom range, it's considered part of the same line.
    3. If we are starting a new line, we also check the paragraph threshold
    to see if we should start a new paragraph.  If the paragraph threshold
    couldn't be computed (because there weren't enough lines on the page),
    we compare against the line size instead.
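
    The three steps above can be sketched roughly as follows.  This is an
    illustrative sketch only, not the code in this change: `TextBox`,
    `Break`, `LineTracker`, the 0.5 overlap fraction, the 0.5 averaging
    weight, and the reading of "compare against the line size" as "gap
    larger than the previous line's height" are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical types and constants for illustration; not the Chromium code.
struct TextBox {
  double top;     // y of the top edge (y grows downward on the page)
  double bottom;  // y of the bottom edge
};

enum class Break { kNone, kLine, kParagraph };

// Assumed meaning of "significantly overlaps": half of the shorter extent.
constexpr double kOverlapFraction = 0.5;
// Assumed weight given to the newest box in the running averages.
constexpr double kRecentWeight = 0.5;

class LineTracker {
 public:
  // |paragraph_threshold| is empty when the page had too few lines to
  // compute one.
  explicit LineTracker(std::optional<double> paragraph_threshold)
      : paragraph_threshold_(paragraph_threshold) {}

  // Reports which kind of break (if any) precedes |box|.
  Break Add(const TextBox& box) {
    assert(box.bottom > box.top);
    if (!has_line_) {
      StartLine(box);
      return Break::kParagraph;  // The first box opens the first paragraph.
    }

    // Step 2: a box that significantly overlaps the line's top-to-bottom
    // range stays on the same line.
    const double overlap =
        std::min(bottom_, box.bottom) - std::max(top_, box.top);
    const double shorter = std::min(bottom_ - top_, box.bottom - box.top);
    if (overlap >= kOverlapFraction * shorter) {
      // Step 1: keep top/bottom as weighted averages of recent boxes.
      top_ = (1.0 - kRecentWeight) * top_ + kRecentWeight * box.top;
      bottom_ = (1.0 - kRecentWeight) * bottom_ + kRecentWeight * box.bottom;
      return Break::kNone;
    }

    // Step 3: new line. Use the paragraph threshold if it exists,
    // otherwise fall back to the height of the line just finished.
    const double gap = box.top - bottom_;
    const double threshold = paragraph_threshold_.value_or(bottom_ - top_);
    StartLine(box);
    return gap > threshold ? Break::kParagraph : Break::kLine;
  }

 private:
  void StartLine(const TextBox& box) {
    top_ = box.top;
    bottom_ = box.bottom;
    has_line_ = true;
  }

  std::optional<double> paragraph_threshold_;
  bool has_line_ = false;
  double top_ = 0.0;
  double bottom_ = 0.0;
};
```

    Note that in this sketch a box with a different font size stays on
    the current line as long as it overlaps the line's vertical range,
    so a size change alone never forces a paragraph break.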
    
    We also introduce the `PDFExtensionAccessibilityTextExtractionTest`
    test suite.  These tests are like the tree-dump tests, but they dump
    raw text content, split into lines and paragraphs.  (Compared to
    tree-dump tests, this approach allows us to test that the kNextOnLine
    and kPreviousOnLine attributes are correct.)
    
    Bug: 985604
    Change-Id: Idfce6edfef42580e7fac4d8a7753c82495c15bd1
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1743032
    Commit-Queue: Ian Prest <iapres@microsoft.com>
    Reviewed-by: Lei Zhang <thestig@chromium.org>
    Reviewed-by: Kevin Babbitt <kbabbitt@microsoft.com>
    Cr-Commit-Position: refs/heads/master@{#690153}