I am working on a project to accelerate tensor operations on embedded FPGAs using approximate computing techniques. As always, the entire work is available as an open-source project to facilitate research in this field.
For this project, TensorFlow Lite Micro is deployed on a Zybo Z7 (Zynq-7020). A hardware tensor processor is implemented on the programmable logic, and the Conv2D and DepthwiseConv2D operations are delegated to it. This design accelerates computation and reduces hardware resource usage and energy consumption.
You can find a research poster on this project here: Accelerating Tiny ML on embedded FPGA - Google Drive
This approach uses custom floating-point and logarithmic number representations for the filter and bias tensors.
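To make the idea concrete, here is a minimal sketch of how a single weight could be rounded to a logarithmic (power-of-two) value or to a small custom floating-point value. The function names, the 3-bit mantissa and the exponent range [-8, 0] are assumptions chosen for illustration only; they are not the actual formats used in the hardware design.

```python
import math


def quantize_log2(x, min_exp=-8, max_exp=0):
    """Round x to a signed power of two (nearest in the log2 domain).

    The exponent range is an illustrative assumption.
    """
    if x == 0.0:
        return 0.0
    exp = round(math.log2(abs(x)))
    exp = max(min_exp, min(max_exp, exp))  # clamp the exponent
    return math.copysign(2.0 ** exp, x)


def quantize_minifloat(x, mantissa_bits=3, min_exp=-8, max_exp=0):
    """Round x to a small float with `mantissa_bits` fractional mantissa bits.

    Values above the range saturate, values below it flush to zero.
    Bit widths and exponent range are illustrative assumptions.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))      # abs(x) = m * 2**e with 0.5 <= m < 1
    frac, exp = 2.0 * m, e - 1     # normalized form 1.f * 2**exp
    scale = 1 << mantissa_bits
    frac = round(frac * scale) / scale
    if frac >= 2.0:                # mantissa rounding overflowed
        frac, exp = 1.0, exp + 1
    if exp > max_exp:              # saturate to the largest magnitude
        frac, exp = 2.0 - 1.0 / scale, max_exp
    elif exp < min_exp:            # too small to represent
        return 0.0
    return math.copysign(frac * 2.0 ** exp, x)


if __name__ == "__main__":
    for w in (0.3, -0.7, 3.0, 1e-5):
        print(w, quantize_log2(w), quantize_minifloat(w))
```

With a logarithmic representation, each multiplication by a weight reduces to a shift, which is one reason such formats are attractive on programmable logic. The rounding error these functions introduce is the accuracy loss that training would need to compensate for.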
Hence, a central point that can still be improved in this work is model accuracy, using quantization-aware training methods in TensorFlow. (Any help?)
If you are interested in contributing to this project in any way, please let me know :D. The progress will be published in design conferences and journals.