This research covers several topics: multi-bit flip-flops (MBFFs), data-driven clock gating (DDCG), latch gating, and robust clock-tree design. In the MBFF area the group cooperates with the Mellanox corporation, where the theory and methods developed at BIU are examined on Mellanox's designs to assess the potential power savings of combining MBFFs with DDCG.
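To illustrate why the combination is attractive, the sketch below is a first-order toy model (our own illustration, not the BIU/Mellanox methodology or its measured numbers): grouping flip-flops into k-bit MBFFs amortizes the shared clock driver, and DDCG suppresses clock pulses in cycles where a group's data does not change. The capacitance values and activity factor are assumed for the example.

```python
# Toy first-order model (illustrative assumptions, not measured data) of
# clock power when k-bit MBFFs are combined with data-driven clock gating.

def clock_power(n_ff, k, data_activity, c_ff=1.0, c_shared=0.6):
    """Relative clock power for n_ff flip-flops grouped into k-bit MBFFs.

    c_ff          -- per-bit internal clock capacitance (normalized, assumed)
    c_shared      -- capacitance of the shared clock driver per MBFF group
    data_activity -- fraction of cycles in which at least one bit of a
                     group toggles; DDCG gates the clock in the rest.
    """
    n_groups = n_ff / k
    # Without gating, every group is clocked every cycle; with DDCG only
    # groups whose data changes receive a clock pulse.
    ungated = n_ff * c_ff + n_groups * c_shared
    return data_activity * ungated

base = clock_power(n_ff=1024, k=1, data_activity=1.0)  # 1-bit FFs, no gating
opt = clock_power(n_ff=1024, k=4, data_activity=0.3)   # 4-bit MBFFs + DDCG
print(f"relative clock power: {opt / base:.2f}")       # prints 0.22
```

Even in this crude model, the two techniques compound: the MBFF grouping reduces the capacitance switched per cycle, while the gating reduces how often it is switched at all.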
For robust clock distribution, a method of mixing various types of clock drivers in a tree has been shown to considerably reduce power-supply noise without degrading the clock skew. Methods that exploit the shields embedded in the clock distribution network, both to reduce its power consumption and to provide more degrees of freedom for clock tuning, are also being studied.
This research takes several directions. In the first we examine the tradeoff between allowing the computation results to be erroneous and reducing the energy consumed by the computation. We showed that for an error probability of less than 1% it is possible to reduce the computation power of an entire processor by nearly 40%. Furthermore, with the aid of simple control logic it is possible to predict in advance when an error will occur, and then allot more clock cycles to complete the computation properly. Adding a few cycles with low probability degrades the clocks-per-instruction (CPI) only negligibly. The second direction examines how to break multipliers into pipeline stages so that the throughput-energy product is minimized. Several proposals are presently being studied. Another direction revisits the particular operation of matrix multiplication and how it relates to its hardware implementation. New summation techniques for the terms involved in the multiplication are being explored in the hope of considerably accelerating matrix multiplication in hardware. This research is conducted jointly with the CEVA corporation.
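The claim that low-probability recovery cycles barely hurt CPI follows from simple expected-value arithmetic. The sketch below is our back-of-the-envelope illustration (the base CPI and the number of recovery cycles are assumed values, not figures from the research): an error predicted with probability p that costs k extra cycles adds only p·k cycles per instruction on average.

```python
# Back-of-the-envelope CPI check (illustrative numbers, not the paper's
# model): a predicted error triggers a few extra recovery cycles, so the
# expected CPI rises only by error_prob * extra_cycles.

def effective_cpi(base_cpi, error_prob, extra_cycles):
    """Expected CPI when a predicted error costs extra_cycles of recovery."""
    return base_cpi + error_prob * extra_cycles

cpi = effective_cpi(base_cpi=1.0, error_prob=0.01, extra_cycles=2)
print(f"CPI: {cpi:.3f} ({(cpi - 1.0) * 100:.1f}% degradation)")
# prints: CPI: 1.020 (2.0% degradation)
```

With a 1% error probability and two recovery cycles, CPI degrades by only 2%, while the energy savings on the remaining 99% of cycles can approach the 40% figure cited above.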
This research aims at improving the utilization of the silicon area allocated to cache memories. Assuming that the cache area is given, we would like to answer two questions. The first considers whether it is possible to reconfigure the cache on-line, e.g., its block size or associativity, and optimize these according to the time-varying workload behavior. This research has two aspects: the physical structure and layout that should allow such configuration, and how to monitor the workload and deduce from it the optimal block size and associativity. The second question we address is how to optimally divide a given amount of memory resources among the levels of the memory hierarchy. To this end we are developing a performance model that captures the main parameters affecting performance, such as miss rates, miss penalties, the number of cache levels, and area resources.
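The kind of relationship such a performance model must capture can be seen in the standard textbook average-memory-access-time (AMAT) recurrence, sketched below with illustrative latencies and miss rates (the parameter values are assumptions for the example, not results of this research): each level contributes its hit time, and misses pay the average access time of the next level.

```python
# Recursive AMAT model (standard textbook formula; the latencies and miss
# rates below are illustrative assumptions) showing how miss rates and
# miss penalties at each level combine into overall access time.

def amat(levels, mem_latency):
    """levels: list of (hit_time, miss_rate) tuples ordered from L1 down."""
    if not levels:
        return mem_latency
    hit_time, miss_rate = levels[0]
    return hit_time + miss_rate * amat(levels[1:], mem_latency)

# Two-level hierarchy: small fast L1, larger slower L2, then DRAM.
two_level = amat([(1, 0.05), (10, 0.20)], mem_latency=100)
print(f"AMAT: {two_level:.2f} cycles")  # prints: AMAT: 2.50 cycles
```

Dividing a fixed area budget among levels shifts the (hit_time, miss_rate) pairs in opposite directions, which is precisely the tradeoff the performance model is meant to optimize.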