From 1c6190de5d4f475cc6f9f4de29c05e5e8ffc35fa Mon Sep 17 00:00:00 2001
From: Florian Hahn
Date: Wed, 6 Dec 2017 20:27:33 +0000
Subject: [PATCH] [MachineCombiner] Add up latencies of all instructions in new
 pattern.

Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRootNode, the
more complex getLatency function is used.

Note that we may be slightly more precise than just summing up all
latencies. For example, consider a pattern like

  r1 = INS1 ..
  r2 = INS2 ..
  r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.

Reviewers: Gerolf, sebpop, spop, fhahn

Reviewed By: fhahn

Subscribers: evandro, llvm-commits

Differential Revision: https://reviews.llvm.org/D40307

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319951 91177308-0d34-0410-b5e6-96231b3b80d8
---
 lib/CodeGen/MachineCombiner.cpp | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/CodeGen/MachineCombiner.cpp b/lib/CodeGen/MachineCombiner.cpp
index f61db309ed7..26bee98c9aa 100644
--- a/lib/CodeGen/MachineCombiner.cpp
+++ b/lib/CodeGen/MachineCombiner.cpp
@@ -282,9 +282,16 @@ bool MachineCombiner::improvesCriticalPathLen(
   // of the original code sequence. This may allow the transform to proceed
   // even if the instruction depths (data dependency cycles) become worse.
 
-  unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
-  unsigned RootLatency = 0;
+  // Account for the latency of the inserted and deleted instructions by
+  // adding up their latencies. This assumes that the inserted and deleted
+  // instructions are dependent instruction chains, which might not hold
+  // in all cases.
+  unsigned NewRootLatency = 0;
+  for (unsigned i = 0; i < InsInstrs.size() - 1; i++)
+    NewRootLatency += TSchedModel.computeInstrLatency(InsInstrs[i]);
+  NewRootLatency += getLatency(Root, NewRoot, BlockTrace);
 
+  unsigned RootLatency = 0;
   for (auto I : DelInstrs)
     RootLatency += TSchedModel.computeInstrLatency(I);
 
-- 
2.11.0
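
The commit message contrasts two ways of estimating the latency of an inserted pattern: summing every latency (what the patch does) and taking the longest dependency path, lat(INS3) + max(lat(INS1), lat(INS2)). The sketch below illustrates that difference on a toy model. It is not LLVM code: `ToyInstr`, `sumLatency` and `criticalPathLatency` are hypothetical names introduced only for this example, and the latencies are made-up numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of one inserted instruction; not an LLVM type.
struct ToyInstr {
  unsigned Latency;                   // cycles until the result is available
  std::vector<std::size_t> Operands;  // indices of earlier instructions it reads
};

// Summed estimate: treats the whole sequence as one dependent chain,
// which is the assumption stated in the patch's code comment.
unsigned sumLatency(const std::vector<ToyInstr> &Seq) {
  unsigned Total = 0;
  for (const ToyInstr &I : Seq)
    Total += I.Latency;
  return Total;
}

// Alternative estimate: the longest dependency path through the sequence.
unsigned criticalPathLatency(const std::vector<ToyInstr> &Seq) {
  std::vector<unsigned> Finish(Seq.size(), 0);
  unsigned Longest = 0;
  for (std::size_t Idx = 0; Idx < Seq.size(); ++Idx) {
    // An instruction can start once its slowest operand has finished.
    unsigned Start = 0;
    for (std::size_t Op : Seq[Idx].Operands) {
      assert(Op < Idx && "operands must be defined earlier in the sequence");
      Start = std::max(Start, Finish[Op]);
    }
    Finish[Idx] = Start + Seq[Idx].Latency;
    Longest = std::max(Longest, Finish[Idx]);
  }
  return Longest;
}
```

For the r1/r2/r3 pattern with assumed latencies 3, 5 and 2, `sumLatency` yields 10 while `criticalPathLatency` yields 7, that is 2 + max(3, 5). The summed estimate is therefore an upper bound whenever the inserted instructions are not a single dependent chain.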