OSDN Git Service

DO NOT MERGE: BlastBufferQueue: Fake release if not received by complete
authorRobert Carr <racarr@google.com>
Sat, 1 Jan 2022 00:59:34 +0000 (16:59 -0800)
committerRobert Carr <racarr@google.com>
Tue, 4 Jan 2022 21:12:02 +0000 (13:12 -0800)
Perfetto telemetry shows a cluster where the transactionComplete
callback is arriving but the release buffer callback is not arriving.
This leads to blocking and ANRs. We try to work around this by
generating a fake release buffer callback for previous buffers when we
receive a TransactionCompleteCallback. How can we know this is safe? The
criteria to safely release buffers are as follows:
Condition 1. SurfaceFlinger does not plan to use the buffer
Condition 2. Have the acquire fence (or a later signalling fence)if
dropped, have the HWC release fence if presented.
Now lets break down cases where the transaction complete callback
arrives before the release buffer callback. All release buffer callbacks
fall in to two categories:
Category 1. In band with the complete callback. In which case the client
library in SurfaceComposerClient always sends release callbacks
first.
Category 2. Out of band, e.g. from setBuffer or merge when dropping buffers

In category 2 there are two scenarios:
Scenario 1: Release callback was never going to arrive (bug)
Scenario 2. Release callback descheduled, e.g.
a. Transaction is filled with buffer and held in VRI/WM/SysUI
b. A second transaction with buffer is merged in, the release
buffer callback is emitted but not scheduled (async binder)
c. The merged transaction is applied
d. SurfaceFlinger processes a frame and emits the transaction
complete callback
e. The transaction complete callback is scheduled before the
release callback is ever scheduled (since the 1 way binders are
from different processes).
In both scenarios, we can satisfy Conditions 1 and 2, because the HWC
present fence is a later signalling fence than the acquire fence which
the out of band callback will pass. While we may block extra on this
fence. We will be safe by condition 2. Receiving the transaction
complete callback should indicate condition 1 is satisfied.

In category 1 there should only be a single scenario, the release buffer
callback for frame N-1 is emitted immediately before the transaction
callback for frame N, and so if it hasn't come at that time (and isn't
out of band/scheduled e.g. category 2) then it will never come. We can
further break this down in to two scenarios:
1. stat.previousReleaseFence is set correctly. This is the
expected case. In this case, this is the same fence we would
receive from the release buffer callback, and so we can satisfy
condition 1 and 2 and safely release.
2. stat.previousReleaseFence is not set correctly. There is no
known reason for this to happen, but there is also no known
reason why we would receive a complete callback without the
corresponding release, and so we have to explore the option as
a potential risk/behavior change from the change.
Hypothetically in category 1 case 2 we could experience some tearing.
In category 1 we are currently experiencing ANR, and so the tradeoff
is a risk of tearing vs a certainty of ANR. To tear the following would
have to happen:
1. Enter the case where we previously ANRed
2. Enter the subcase where the release fence is not set
correctly
3. Be scheduled very fast on the client side, in practicality
HWC present has already returned and the fence should be firing
very soon
4. The previous frame not be GL comp, in which case we were
already done with the buffer and don't need the fence anyway.
Conditions 2 and 3 should already make the occurence of tearing much
lower than the occurence of ANRs. Furthermore 100% of observed ANRs
were during GL comp (rotation animation) and so the telemetry indicates
the risk of introducing tearing is very low.

To review, we have shown that we can break down the side effects from the change in
to two buckets (category 1, and category 2). In category 2, the possible negative side
effect is to use too late of a fence. However this is still safe, and should
be exceedingly rare. In category 1 we have shown that there are two scenarios,
and in one of these scenarios we can experience tearing. We have furthermore argued
that the risk of tearing should be exceedingly low even in this scenario,
while the risk of ANR in this scenario was nearly 100%.

This issue is not appearing in git_master branches and so the DO_NOT_MERGE tag.

Bug: 197269223
Bug: 212846697
Test: Existing tests pass
Change-Id: I757ed132ac076aa811df816e04a68f57b38e47e7

libs/gui/BLASTBufferQueue.cpp
libs/gui/SurfaceComposerClient.cpp
libs/gui/include/gui/BLASTBufferQueue.h
libs/gui/include/gui/SurfaceComposerClient.h
libs/gui/include/gui/test/CallbackUtils.h

index 76636c2..2104c77 100644 (file)
@@ -350,6 +350,16 @@ void BLASTBufferQueue::transactionCallback(nsecs_t /*latchTime*/, const sp<Fence
                     transactionCompleteCallback = std::move(mTransactionCompleteCallback);
                     mTransactionCompleteFrameNumber = 0;
                 }
+                std::vector<ReleaseCallbackId> staleReleases;
+                for (const auto& [key, value]: mSubmitted) {
+                    if (currFrameNumber > key.framenumber) {
+                        staleReleases.push_back(key);
+                    }
+                }
+                for (const auto& staleRelease : staleReleases) {
+                    releaseBufferCallbackLocked(staleRelease, stat.previousReleaseFence ? stat.previousReleaseFence : Fence::NO_FENCE,
+                                                stat.transformHint, stat.currentMaxAcquiredBufferCount);
+                }
             } else {
                 BQA_LOGE("Failed to find matching SurfaceControl in transactionCallback");
             }
@@ -358,6 +368,7 @@ void BLASTBufferQueue::transactionCallback(nsecs_t /*latchTime*/, const sp<Fence
                      "empty.");
         }
 
+
         decStrong((void*)transactionCallbackThunk);
     }
 
@@ -399,8 +410,14 @@ void BLASTBufferQueue::flushShadowQueue() {
 void BLASTBufferQueue::releaseBufferCallback(const ReleaseCallbackId& id,
                                              const sp<Fence>& releaseFence, uint32_t transformHint,
                                              uint32_t currentMaxAcquiredBufferCount) {
-    ATRACE_CALL();
     std::unique_lock _lock{mMutex};
+    releaseBufferCallbackLocked(id, releaseFence, transformHint, currentMaxAcquiredBufferCount);
+}
+
+void BLASTBufferQueue::releaseBufferCallbackLocked(const ReleaseCallbackId& id,
+                                                   const sp<Fence>& releaseFence, uint32_t transformHint,
+                                                   uint32_t currentMaxAcquiredBufferCount) {
+    ATRACE_CALL();
     BQA_LOGV("releaseBufferCallback %s", id.to_string().c_str());
 
     if (mSurfaceControl != nullptr) {
@@ -421,7 +438,10 @@ void BLASTBufferQueue::releaseBufferCallback(const ReleaseCallbackId& id,
 
     const auto numPendingBuffersToHold =
             isEGL ? std::max(0u, mMaxAcquiredBuffers - currentMaxAcquiredBufferCount) : 0;
-    mPendingRelease.emplace_back(ReleasedBuffer{id, releaseFence});
+    auto rb = ReleasedBuffer{id, releaseFence};
+    if (std::find(mPendingRelease.begin(), mPendingRelease.end(), rb) == mPendingRelease.end()) {
+        mPendingRelease.emplace_back(rb);
+    }
 
     // Release all buffers that are beyond the ones that we need to hold
     while (mPendingRelease.size() > numPendingBuffersToHold) {
index 922bceb..725ea65 100644 (file)
@@ -296,7 +296,7 @@ void TransactionCompletedListener::onTransactionCompleted(ListenerStats listener
                                       transactionStats.latchTime, surfaceStats.acquireTime,
                                       transactionStats.presentFence,
                                       surfaceStats.previousReleaseFence, surfaceStats.transformHint,
-                                      surfaceStats.eventStats);
+                                      surfaceStats.eventStats, surfaceStats.currentMaxAcquiredBufferCount);
             }
 
             callbackFunction(transactionStats.latchTime, transactionStats.presentFence,
@@ -321,7 +321,7 @@ void TransactionCompletedListener::onTransactionCompleted(ListenerStats listener
                                       transactionStats.latchTime, surfaceStats.acquireTime,
                                       transactionStats.presentFence,
                                       surfaceStats.previousReleaseFence, surfaceStats.transformHint,
-                                      surfaceStats.eventStats);
+                                      surfaceStats.eventStats, surfaceStats.currentMaxAcquiredBufferCount);
                 if (callbacksMap[callbackId].surfaceControls[surfaceStats.surfaceControl]) {
                     callbacksMap[callbackId]
                             .surfaceControls[surfaceStats.surfaceControl]
index c1d3891..24d978a 100644 (file)
@@ -95,6 +95,8 @@ public:
             const std::vector<SurfaceControlStats>& stats);
     void releaseBufferCallback(const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
                                uint32_t transformHint, uint32_t currentMaxAcquiredBufferCount);
+    void releaseBufferCallbackLocked(const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
+                                     uint32_t transformHint, uint32_t currentMaxAcquiredBufferCount);
     void setNextTransaction(SurfaceComposerClient::Transaction *t);
     void mergeWithNextTransaction(SurfaceComposerClient::Transaction* t, uint64_t frameNumber);
     void setTransactionCompleteCallback(uint64_t frameNumber,
@@ -164,6 +166,12 @@ private:
     struct ReleasedBuffer {
         ReleaseCallbackId callbackId;
         sp<Fence> releaseFence;
+        bool operator==(const ReleasedBuffer& rhs) const {
+            // Only compare Id so if we somehow got two callbacks
+            // with different fences we don't decrement mNumAcquired
+            // too far.
+            return rhs.callbackId == callbackId;
+        }
     };
     std::deque<ReleasedBuffer> mPendingRelease GUARDED_BY(mMutex);
 
index 9af9e64..0d1d1a3 100644 (file)
@@ -57,14 +57,15 @@ class Region;
 struct SurfaceControlStats {
     SurfaceControlStats(const sp<SurfaceControl>& sc, nsecs_t latchTime, nsecs_t acquireTime,
                         const sp<Fence>& presentFence, const sp<Fence>& prevReleaseFence,
-                        uint32_t hint, FrameEventHistoryStats eventStats)
+                        uint32_t hint, FrameEventHistoryStats eventStats, uint32_t currentMaxAcquiredBufferCount)
           : surfaceControl(sc),
             latchTime(latchTime),
             acquireTime(acquireTime),
             presentFence(presentFence),
             previousReleaseFence(prevReleaseFence),
             transformHint(hint),
-            frameEventStats(eventStats) {}
+            frameEventStats(eventStats),
+            currentMaxAcquiredBufferCount(currentMaxAcquiredBufferCount) {}
 
     sp<SurfaceControl> surfaceControl;
     nsecs_t latchTime = -1;
@@ -73,6 +74,7 @@ struct SurfaceControlStats {
     sp<Fence> previousReleaseFence;
     uint32_t transformHint = 0;
     FrameEventHistoryStats frameEventStats;
+    uint32_t currentMaxAcquiredBufferCount = 0;
 };
 
 using TransactionCompletedCallbackTakesContext =
index 6403208..d2e0426 100644 (file)
@@ -135,7 +135,7 @@ private:
         void verifySurfaceControlStats(const SurfaceControlStats& surfaceControlStats,
                                        nsecs_t latchTime) const {
             const auto& [surfaceControl, latch, acquireTime, presentFence, previousReleaseFence,
-                         transformHint, frameEvents] = surfaceControlStats;
+                         transformHint, frameEvents, ignore] = surfaceControlStats;
 
             ASSERT_EQ(acquireTime > 0, mBufferResult == ExpectedResult::Buffer::ACQUIRED)
                     << "bad acquire time";