Do not consolidate data in BMPImageDecoder

In BMPImageDecoder, never call SharedBuffer::data(), which potentially copies all the data. Instead, do the following: - Read directly from the SharedBuffer when it is safe to do so. e.g. when only one byte is needed. We already know there is enough data, and getSomeData will return at least a byte if there is enough data. - Use a FastSharedBufferReader when we may be on a segment boundary. As a result, we make fewer copies when decoding BMPs. We never call SharedBuffer::data(), which copies up to the entire encoded data received so far. Occasionally we will make small copies when a two or four byte number is on a segment boundary. Add a setter to FastSharedBufferReader to update its SharedBuffer, so the BMPImageReader can have one as a member, which it will reuse across calls. Remove BIG_ENDIAN, and eliminate another memcpy that was there to support it. Using my modified version of ImageDecodeBench (see crrev.com/1145373003), I see a slight speedup as compared to the old version which relied on SharedBuffer::data(). (My data can be found here: [1].) This matches my expectations, since the new version is copying less data. Using a version of ImageDecodeBench with a premerged buffer (see crrev.com/1318713005), I still see a speed improvement, although it is very small. (See sheet 2 of the above spreadsheet). [1] https://docs.google.com/a/google.com/spreadsheets/d/1GN2FDytKvkdZzV-WF4IEQaYQF65wa4bRxG67bNgpjAE/edit?usp=sharing BUG=467772 Review URL: https://codereview.chromium.org/1259083003 git-svn-id: svn://svn.chromium.org/blink/trunk@201811 bbb929c8-8fbe-4397-9dbb-9b2b20218538

Do not consolidate data in BMPImageDecoder
In BMPImageDecoder, never call SharedBuffer::data(), which potentially copies all the data. Instead, do the following: - Read directly from the SharedBuffer when it is safe to do so. e.g. when only one byte is needed. We already know there is enough data, and getSomeData will return at least a byte if there is enough data. - Use a FastSharedBufferReader when we may be on a segment boundary. As a result, we make fewer copies when decoding BMPs. We never call SharedBuffer::data(), which copies up to the entire encoded data received so far. Occasionally we will make small copies when a two or four byte number is on a segment boundary. Add a setter to FastSharedBufferReader to update its SharedBuffer, so the BMPImageReader can have one as a member, which it will reuse across calls. Remove BIG_ENDIAN, and eliminate another memcpy that was there to support it. Using my modified version of ImageDecodeBench (see crrev.com/1145373003), I see a slight speedup as compared to the old version which relied on SharedBuffer::data(). (My data can be found here: [1].) This matches my expectations, since the new version is copying less data. Using a version of ImageDecodeBench with a premerged buffer (see crrev.com/1318713005), I still see a speed improvement, although it is very small. (See sheet 2 of the above spreadsheet). [1] https://docs.google.com/a/google.com/spreadsheets/d/1GN2FDytKvkdZzV-WF4IEQaYQF65wa4bRxG67bNgpjAE/edit?usp=sharing BUG=467772 Review URL: https://codereview.chromium.org/1259083003 git-svn-id: svn://svn.chromium.org/blink/trunk@201811 bbb929c8-8fbe-4397-9dbb-9b2b20218538
abc75960 · scroggo@chromium.org · 89b071bd · abc75960 · abc75960 · abc75960
Commit abc75960 authored Sep 04, 2015 by scroggo@chromium.org
7 changed files
--- a/third_party/WebKit/Source/platform/image-decoders/FastSharedBufferReader.cpp
+++ b/third_party/WebKit/Source/platform/image-decoders/FastSharedBufferReader.cpp
@@ -41,6 +41,16 @@ FastSharedBufferReader::FastSharedBufferReader(PassRefPtr<SharedBuffer> data)
 {
 }

+void FastSharedBufferReader::setData(PassRefPtr<SharedBuffer> data)
+{
+    if (data == m_data)
+        return;
+    m_data = data;
+    m_segment = 0;
+    m_segmentLength = 0;
+    m_dataPosition = 0;
+}
+
 const char* FastSharedBufferReader::getConsecutiveData(size_t dataPosition, size_t length, char* buffer)
 {
    RELEASE_ASSERT(dataPosition + length <= m_data->size());

--- a/third_party/WebKit/Source/platform/image-decoders/FastSharedBufferReader.h
+++ b/third_party/WebKit/Source/platform/image-decoders/FastSharedBufferReader.h
@@ -46,6 +46,8 @@ class PLATFORM_EXPORT FastSharedBufferReader {
 public:
    FastSharedBufferReader(PassRefPtr<SharedBuffer> data);

+    void setData(PassRefPtr<SharedBuffer>);
+
    // Returns a consecutive buffer that carries the data starting
    // at |dataPosition| with |length| bytes.
    // This method returns a pointer to a memory segment stored in

--- a/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageDecoder.cpp
+++ b/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageDecoder.cpp
@@ -31,6 +31,7 @@
 #include "config.h"
 #include "platform/image-decoders/bmp/BMPImageDecoder.h"

+#include "platform/image-decoders/FastSharedBufferReader.h"
 #include "wtf/PassOwnPtr.h"

 namespace blink {
@@ -96,8 +97,12 @@ bool BMPImageDecoder::processFileHeader(size_t& imgDataOffset)
    ASSERT(!m_decodedOffset);
    if (m_data->size() < sizeOfFileHeader)
        return false;
-    const uint16_t fileType = (m_data->data()[0] << 8) | static_cast<uint8_t>(m_data->data()[1]);
-    imgDataOffset = readUint32(10);
+
+    char buffer[sizeOfFileHeader];
+    FastSharedBufferReader fastReader(m_data);
+    const char* fileHeader = fastReader.getConsecutiveData(0, sizeOfFileHeader, buffer);
+    const uint16_t fileType = (fileHeader[0] << 8) | static_cast<uint8_t>(fileHeader[1]);
+    imgDataOffset = BMPImageReader::readUint32(&fileHeader[10]);
    m_decodedOffset = sizeOfFileHeader;

    // See if this is a bitmap filetype we understand.

--- a/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageDecoder.h
+++ b/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageDecoder.h
@@ -54,11 +54,6 @@ private:
    void decodeSize() override { decode(true); }
    void decode(size_t) override { decode(false); }

-    inline uint32_t readUint32(int offset) const
-    {
-        return BMPImageReader::readUint32(m_data.get(), m_decodedOffset + offset);
-    }
-
    // Decodes the image.  If |onlySize| is true, stops decoding after
    // calculating the image size. If decoding fails but there is no more
    // data coming, sets the "decode failure" flag.

--- a/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageReader.cpp
+++ b/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageReader.cpp
@@ -69,6 +69,7 @@ namespace blink {
 BMPImageReader::BMPImageReader(ImageDecoder* parent, size_t decodedAndHeaderOffset, size_t imgDataOffset, bool isInICO)
    : m_parent(parent)
    , m_buffer(0)
+    , m_fastReader(nullptr)
    , m_decodedOffset(decodedAndHeaderOffset)
    , m_headerOffset(decodedAndHeaderOffset)
    , m_imgDataOffset(imgDataOffset)
@@ -166,6 +167,7 @@ bool BMPImageReader::readInfoHeaderSize()
    ASSERT(m_decodedOffset == m_headerOffset);
    if ((m_decodedOffset > m_data->size()) || ((m_data->size() - m_decodedOffset) < 4))
        return false;
+
    m_infoHeader.biSize = readUint32(0);
    // Don't increment m_decodedOffset here, it just makes the code in
    // processInfoHeader() more confusing.
@@ -541,13 +543,14 @@ bool BMPImageReader::processColorTable()
    if ((m_decodedOffset > m_data->size()) || ((m_data->size() - m_decodedOffset) < tableSizeInBytes))
        return false;
    m_colorTable.resize(m_infoHeader.biClrUsed);
+
+    // On non-OS/2 1.x, an extra padding byte is present, which we need to skip.
+    const size_t bytesPerColor = m_isOS21x ? 3 : 4;
    for (size_t i = 0; i < m_infoHeader.biClrUsed; ++i) {
-        m_colorTable[i].rgbBlue = m_data->data()[m_decodedOffset++];
-        m_colorTable[i].rgbGreen = m_data->data()[m_decodedOffset++];
-        m_colorTable[i].rgbRed = m_data->data()[m_decodedOffset++];
-        // Skip padding byte (not present on OS/2 1.x).
-        if (!m_isOS21x)
-            ++m_decodedOffset;
+        m_colorTable[i].rgbBlue = readUint8(0);
+        m_colorTable[i].rgbGreen = readUint8(1);
+        m_colorTable[i].rgbRed = readUint8(2);
+        m_decodedOffset += bytesPerColor;
    }

    // We've now decoded all the non-image data we care about.  Skip anything
@@ -594,8 +597,8 @@ BMPImageReader::ProcessingResult BMPImageReader::processRLEData()

        // For every entry except EOF, we'd better not have reached the end of
        // the image.
-        const uint8_t count = m_data->data()[m_decodedOffset];
-        const uint8_t code = m_data->data()[m_decodedOffset + 1];
+        const uint8_t count = readUint8(0);
+        const uint8_t code = readUint8(1);
        if ((count || (code != 1)) && pastEndOfImage(0))
            return Failure;

@@ -629,8 +632,8 @@ BMPImageReader::ProcessingResult BMPImageReader::processRLEData()

                // Fail if this takes us past the end of the desired row or
                // past the end of the image.
-                const uint8_t dx = m_data->data()[m_decodedOffset + 2];
-                const uint8_t dy = m_data->data()[m_decodedOffset + 3];
+                const uint8_t dx = readUint8(2);
+                const uint8_t dy = readUint8(3);
                if (dx || dy)
                    m_buffer->setHasAlpha(true);
                if (((m_coord.x() + dx) > m_parent->size().width()) || pastEndOfImage(dy))
@@ -670,7 +673,7 @@ BMPImageReader::ProcessingResult BMPImageReader::processRLEData()
                    return InsufficientData;

                // One BGR triple that we copy |count| times.
-                fillRGBA(endX, m_data->data()[m_decodedOffset + 3], m_data->data()[m_decodedOffset + 2], code, 0xff);
+                fillRGBA(endX, readUint8(3), readUint8(2), code, 0xff);
                m_decodedOffset += 4;
            } else {
                // RLE8 has one color index that gets repeated; RLE4 has two
@@ -733,7 +736,7 @@ BMPImageReader::ProcessingResult BMPImageReader::processNonRLEData(bool inRLE, i
            // the most significant bits in the byte).
            const uint8_t mask = (1 << m_infoHeader.biBitCount) - 1;
            for (size_t byte = 0; byte < unpaddedNumBytes; ++byte) {
-                uint8_t pixelData = m_data->data()[m_decodedOffset + byte];
+                uint8_t pixelData = readUint8(byte);
                for (size_t pixel = 0; (pixel < pixelsPerByte) && (m_coord.x() < endX); ++pixel) {
                    const size_t colorIndex = (pixelData >> (8 - m_infoHeader.biBitCount)) & mask;
                    if (m_decodingAndMask) {

--- a/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageReader.h
+++ b/third_party/WebKit/Source/platform/image-decoders/bmp/BMPImageReader.h
@@ -31,9 +31,10 @@
 #ifndef BMPImageReader_h
 #define BMPImageReader_h

-#include <stdint.h>
+#include "platform/image-decoders/FastSharedBufferReader.h"
 #include "platform/image-decoders/ImageDecoder.h"
 #include "wtf/CPU.h"
+#include <stdint.h>

 namespace blink {

@@ -42,26 +43,16 @@ namespace blink {
 class PLATFORM_EXPORT BMPImageReader {
    WTF_MAKE_FAST_ALLOCATED(BMPImageReader);
 public:
-    // Read a value from |data[offset]|, converting from little to native
-    // endianness.
-    static inline uint16_t readUint16(SharedBuffer* data, int offset)
+    // Read a value from |buffer|, converting to an int assuming little
+    // endianness
+    static inline uint16_t readUint16(const char* buffer)
    {
-        uint16_t result;
-        memcpy(&result, &data->data()[offset], 2);
-    #if CPU(BIG_ENDIAN)
-        result = ((result & 0xff) << 8) | ((result & 0xff00) >> 8);
-    #endif
-        return result;
+        return *reinterpret_cast<const uint16_t*>(buffer);
    }

-    static inline uint32_t readUint32(SharedBuffer* data, int offset)
+    static inline uint32_t readUint32(const char* buffer)
    {
-        uint32_t result;
-        memcpy(&result, &data->data()[offset], 4);
-    #if CPU(BIG_ENDIAN)
-        result = ((result & 0xff) << 24) | ((result & 0xff00) << 8) | ((result & 0xff0000) >> 8) | ((result & 0xff000000) >> 24);
-    #endif
-        return result;
+        return *reinterpret_cast<const uint32_t*>(buffer);
    }

    // |parent| is the decoder that owns us.
@@ -71,7 +62,11 @@ public:
    BMPImageReader(ImageDecoder* parent, size_t decodedAndHeaderOffset, size_t imgDataOffset, bool isInICO);

    void setBuffer(ImageFrame* buffer) { m_buffer = buffer; }
-    void setData(SharedBuffer* data) { m_data = data; }
+    void setData(SharedBuffer* data)
+    {
+        m_data = data;
+        m_fastReader.setData(data);
+    }

    // Does the actual decoding.  If |onlySize| is true, decoding only
    // progresses as far as necessary to get the image size.  Returns
@@ -124,14 +119,23 @@ private:
        uint8_t rgbRed;
    };

-    inline uint16_t readUint16(int offset) const
+    inline uint8_t readUint8(size_t offset)
+    {
+        return m_fastReader.getOneByte(m_decodedOffset + offset);
+    }
+
+    inline uint16_t readUint16(int offset)
    {
-        return readUint16(m_data.get(), m_decodedOffset + offset);
+        char buffer[2];
+        const char* data = m_fastReader.getConsecutiveData(m_decodedOffset + offset, 2, buffer);
+        return readUint16(data);
    }

-    inline uint32_t readUint32(int offset) const
+    inline uint32_t readUint32(int offset)
    {
-        return readUint32(m_data.get(), m_decodedOffset + offset);
+        char buffer[4];
+        const char* data = m_fastReader.getConsecutiveData(m_decodedOffset + offset, 4, buffer);
+        return readUint32(data);
    }

    // Determines the size of the BMP info header.  Returns true if the size
@@ -196,27 +200,26 @@ private:
    // row.
    // NOTE: Only as many bytes of the return value as are needed to hold
    // the pixel data will actually be set.
-    inline uint32_t readCurrentPixel(int bytesPerPixel) const
+    inline uint32_t readCurrentPixel(int bytesPerPixel)
    {
+        // We need at most 4 bytes, starting at m_decodedOffset + offset.
+        char buffer[4];
        const int offset = m_coord.x() * bytesPerPixel;
+        const char* encodedPixel = m_fastReader.getConsecutiveData(m_decodedOffset + offset, bytesPerPixel, buffer);
        switch (bytesPerPixel) {
        case 2:
-            return readUint16(offset);
+            return readUint16(encodedPixel);

        case 3: {
            // It doesn't matter that we never set the most significant byte
-            // of the return value here in little-endian mode, the caller
-            // won't read it.
+            // of the return value, the caller won't read it.
            uint32_t pixel;
-            memcpy(&pixel, &m_data->data()[m_decodedOffset + offset], 3);
-    #if CPU(BIG_ENDIAN)
-            pixel = ((pixel & 0xff00) << 8) | ((pixel & 0xff0000) >> 8) | ((pixel & 0xff000000) >> 24);
-    #endif
+            memcpy(&pixel, encodedPixel, 3);
            return pixel;
        }

        case 4:
-            return readUint32(offset);
+            return readUint32(encodedPixel);

        default:
            ASSERT_NOT_REACHED();
@@ -283,6 +286,7 @@ private:

    // The file to decode.
    RefPtr<SharedBuffer> m_data;
+    FastSharedBufferReader m_fastReader;

    // An index into |m_data| representing how much we've already decoded.
    size_t m_decodedOffset;

--- a/third_party/WebKit/Source/platform/image-decoders/ico/ICOImageDecoder.h
+++ b/third_party/WebKit/Source/platform/image-decoders/ico/ICOImageDecoder.h
@@ -85,12 +85,14 @@ private:

    inline uint16_t readUint16(int offset) const
    {
-        return BMPImageReader::readUint16(m_data.get(), m_decodedOffset + offset);
+        // TODO (scroggo): This consolidates the data, meaning unnecessary copies.
+        return BMPImageReader::readUint16(&m_data->data()[m_decodedOffset + offset]);
    }

    inline uint32_t readUint32(int offset) const
    {
-        return BMPImageReader::readUint32(m_data.get(), m_decodedOffset + offset);
+        // TODO (scroggo): This consolidates the data, meaning unnecessary copies.
+        return BMPImageReader::readUint32(&m_data->data()[m_decodedOffset + offset]);
    }

    // If the desired PNGImageDecoder exists, gives it the appropriate data.