Fix freed CLBufferMemoryRegion's buffer still in use by queued commands

When a CLBufferMemoryRegion is freed, it also frees its cl::Buffer
object. At this point we need to flush the queue so that all prior
commands that may use this buffer are submitted and can complete before
the buffer is deallocated.

Previously, a CommandQueue object was owned as a member of
CLBufferMemoryRegion. Whenever a CLBufferMemoryRegion was freed, the
queue was released as well, which implicitly flushed the queue.

Now we need to flush the queue explicitly, without the unnecessary
release of the queue.
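
The intended behaviour can be sketched roughly as follows. This is an
illustration only, not the library code: MockQueue and BufferRegion are
hypothetical stand-ins for the scheduler's command queue and for
CLBufferMemoryRegion, and the mock merely records which calls were made.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared command queue; it only records calls.
struct MockQueue
{
    std::vector<std::string> calls;
    void flush()  { calls.push_back("flush"); }  // models the non-blocking submit
    void finish() { calls.push_back("finish"); } // models the blocking wait
};

// Hypothetical region that borrows the shared queue instead of owning one.
class BufferRegion
{
public:
    explicit BufferRegion(MockQueue &queue) : _queue(queue) {}
    ~BufferRegion()
    {
        // Flush explicitly on destruction; the queue is not released and
        // remains usable by other regions afterwards.
        _queue.flush();
    }

private:
    MockQueue &_queue;
};

// Destroys one region and returns the calls the queue has seen.
inline std::vector<std::string> calls_after_region_destroyed()
{
    MockQueue queue;
    {
        BufferRegion region(queue);
    } // region goes out of scope here and flushes the queue
    return queue.calls;
}
```

In this sketch the destructor issues exactly one flush and never the
blocking finish, which mirrors the trade-off described above.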

Resolves COMPMID-6492

Signed-off-by: SiCong Li <sicong.li@arm.com>
Change-Id: I799507bcff8526d1381cde53d7c6298684c6d3ee
Reviewed-on: https://review.mlplatform.org/c/ml/ComputeLibrary/+/10126
Benchmark: Arm Jenkins <bsgcomp@arm.com>
Tested-by: Arm Jenkins <bsgcomp@arm.com>
Reviewed-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Comments-Addressed: Arm Jenkins <bsgcomp@arm.com>
diff --git a/arm_compute/runtime/CL/CLMemoryRegion.h b/arm_compute/runtime/CL/CLMemoryRegion.h
index 690a924..66a30fa 100644
--- a/arm_compute/runtime/CL/CLMemoryRegion.h
+++ b/arm_compute/runtime/CL/CLMemoryRegion.h
@@ -105,6 +105,7 @@
      * @param[in] buffer Buffer to be used as a memory region
      */
     CLBufferMemoryRegion(const cl::Buffer &buffer);
+    virtual ~CLBufferMemoryRegion() override;
 
     // Inherited methods overridden :
     void *ptr() final;
diff --git a/src/runtime/CL/CLMemoryRegion.cpp b/src/runtime/CL/CLMemoryRegion.cpp
index 380e406..00f91a0 100644
--- a/src/runtime/CL/CLMemoryRegion.cpp
+++ b/src/runtime/CL/CLMemoryRegion.cpp
@@ -72,6 +72,14 @@
     _mem = buffer;
 }
 
+CLBufferMemoryRegion::~CLBufferMemoryRegion()
+{
+    // Flush the command queue to ensure all commands that may use this memory buffer are scheduled to be finished before
+    // this buffer is freed
+    // Do not call finish as it is a blocking call which affects the performance
+    CLScheduler::get().queue().flush();
+}
+
 void *CLBufferMemoryRegion::ptr()
 {
     return nullptr;
@@ -110,6 +118,9 @@
     {
         try
         {
+            // Can only use the blocking finish instead of the non-blocking flush here, because clSVMFree requires all
+            // commands that may use the svm pointer to finish beforehand
+            // https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/clSVMFree.html
             clFinish(CLScheduler::get().queue().get());
             _mem = cl::Buffer();
             clSVMFree(_ctx.get(), _ptr);