
Can You Do Deep Learning, Graph Search, And Conditional Optimization With Explosive Speed And Low Power Consumption? Quantitative Capabilities Of Brain Computers


Survey

3 key points
✔️ First large-scale quantitative evaluation in the field of neuromorphic computing using Intel Loihi, a computing system dedicated to Spiking Neural Networks (SNNs), which are closer to biological neural systems than ANNs
✔️ Found to be very compatible with networks that have brain-like features such as recurrent structure, temporal information, stochasticity, and sparsity
✔️ Demonstrated that solving graph search, constrained optimization, sparse modeling, etc. with SNNs brings "orders of magnitude" benefits in both computation time and energy

Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook
written by Mike Davies, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya,
Gabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, Sumedh R. Risbud
(Submitted on May 2021)
Comments: Published in: Proceedings of the IEEE ( Volume: 109, Issue: 5 )

The images used in this article are from the paper, the introductory slides, or created based on them.

author's preface

The paper presented in this article is a long survey in the field of brain-inspired, neuromorphic computing, which will be unfamiliar to most readers, so we first list some of the noteworthy results before going into the main text.

As you read the following bullet points, please keep in mind at least one important metric: the Energy Delay Product (EDP), the product of the time it takes to obtain a result (latency) and the energy consumed by the computation.

Compared to running the equivalent algorithm on a conventional computer (CPU, GPU, etc.), the results on "Intel Loihi" are

  • In Sequential MNIST inference with an LSTM-equivalent algorithm, a 37x EDP improvement!
  • In LASSO regression, the best case improves latency by 5 orders of magnitude and energy consumption by 6 orders of magnitude, i.e., an 11-order-of-magnitude EDP improvement (!?!?)
  • In graph traversal, EDP is improved by 3 to 7 orders of magnitude!
  • In SLAM, a 100x improvement in power efficiency at almost the same speed!
  • In constraint satisfaction problems, at least a 3-order-of-magnitude EDP improvement over the range of 4 to 400 variables!

Even though this technology differs from mainstream ANNs, numbers like these alone make it worth knowing about, don't they?

introduction

Deep artificial neural networks are based on the information processing principles of the brain and have brought breakthroughs in machine learning across many problem domains.

Neuromorphic computing takes this a step further by building computers specialized for, and adapted to, computational models directly inspired by the form and function of the biological nervous system, in order to perform intelligent information processing at low power and in real time. Properties of the nervous system include the integration of memory and computation, low-precision and stochastic computation, a huge number of inputs and outputs per neuron, asynchronous operation, distributed communication using the timing of binary signals called spikes, and continuous learning. These features make computer architectures dedicated to Spiking Neural Networks (SNNs) a direct challenge to the von Neumann model used in almost all computers today.

Until now, few results have been published that can demonstrate quantitative computational value compared to modern CPUs and GPUs, but the situation has changed with the introduction of Loihi, a processor for neuromorphic research developed by Intel.

In this paper, the authors extensively review results on Loihi, covering not only deep learning but also novel approaches (e.g., graph search, stochastic constrained optimization, and sparse modeling) that directly exploit features of SNNs. While the benefits for feed-forward networks were marginal at best, brain-like networks with features such as recurrent structure, use of precise temporal information, stochasticity, and sparsity were found to be computable with orders of magnitude lower latency and energy than conventional state-of-the-art approaches.

What is SNN?

This paper requires knowledge of SNN models as a prerequisite, but since this knowledge is not yet widespread, the author gives a brief introduction here. Although there are many variations of SNN models, ranging from complex neuroscience-oriented models to hardware-implementation-friendly models, the author will focus on the Leaky Integrate-and-Fire (LIF) model, a simple neuron model commonly used in neuromorphic computing. First, the most basic correspondence between SNNs and ANNs is shown below (for the feed-forward case).

  • Neuron model: SNN uses spiking neurons (with an internal state), e.g., LIF; ANN uses activation functions (no internal state), e.g., ReLU
  • Input/output communication: SNN uses 0 or 1 (spikes) plus timing or frequency information along the time axis; ANN uses high-precision values with no time axis

The biggest features of SNNs are that they have a time dimension and that neuron outputs are limited to 0 or 1 (spikes); however, the statistical model that emerges when an SNN is run for a long time can also be viewed as an ordinary artificial neural network (ANN). Next, we show an example of the discrete-time LIF model equation.
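(The equation appears as an image in the original article. A reconstruction consistent with the description of the underlined parts 1.-4. below, shown here as numbered braces and with $\Theta$ denoting the Heaviside step function, reads:)

$u_i^{l,t} = \underbrace{\sum_j W_{ij}^{l} O_j^{l-1,t}}_{1.} + \underbrace{\beta\, u_i^{l,t-1}}_{2.} \cdot \underbrace{\left(1 - O_i^{l,t-1}\right)}_{3.}, \qquad O_i^{l,t} = \underbrace{\Theta\left(u_i^{l,t} - v_{th}\right)}_{4.}$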

where $l$: layer number, $i$: neuron number in layer, $t$: time step.

The underlined part 1. represents the input. In this equation the input comes from a fully connected layer, i.e., the same matrix-vector product as in a fully connected ANN layer, but $O_j^{l-1,t}$ is a vector of only 0s and 1s output by the LIF neurons in the previous layer, and in most cases it is sparse, so multiplications are unnecessary and the computation for the zero elements can be skipped to speed up processing. Not only a fully connected layer but also a convolutional layer, or the output of a more biological synaptic model, can be substituted here. The sparsity of this computation influences Loihi's architecture.

The underlined part 2. represents the decay (leak) of the membrane potential $u$ (the internal state carried over from the previous time step). The membrane potential of the previous time step is multiplied by the decay coefficient $\beta \in [0, 1]$.

The underlined 3. is the reset term. If there is a spike output in the previous time step (see below), the membrane potential is set to zero.

The underlined part 4. is the output decision of the spike. When the membrane potential $u$ exceeds the threshold $v_{th}$, the neuron outputs 1 (spike) (called "fire"). Otherwise, it outputs 0.

This is the basic behavior of the LIF model. In recent SNN research, $\beta$ and $v_{th}$ have become trainable parameters, and the reset method has shifted from the old "hard reset", in which the membrane potential is unconditionally reset to zero, to a "soft reset", in which $v_{th}$ is subtracted from the membrane potential, but the general operation is the same.
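As a concrete illustration, here is a minimal NumPy sketch of the discrete-time LIF update described above (hard reset, with $\beta$ and $v_{th}$ fixed rather than trained; the variable names follow the equation and are chosen for illustration only):

```python
import numpy as np

def lif_layer_step(u_prev, o_prev, spikes_in, W, beta=0.9, v_th=1.0):
    """One discrete time step of a LIF layer (hard reset).

    u_prev    : membrane potentials at t-1, shape (n_neurons,)
    o_prev    : 0/1 spike outputs at t-1,   shape (n_neurons,)
    spikes_in : 0/1 spikes from the previous layer, shape (n_inputs,)
    W         : synaptic weights, shape (n_neurons, n_inputs)
    """
    # 1. input: weighted sum of incoming spikes (a sparse 0/1 vector, so on
    #    neuromorphic hardware the zero entries cost nothing)
    i_in = W @ spikes_in
    # 2. leak and 3. hard reset: decay the previous potential, but zero it
    #    for neurons that fired in the previous step
    u = beta * u_prev * (1.0 - o_prev) + i_in
    # 4. fire when the membrane potential crosses the threshold
    o = (u >= v_th).astype(float)
    return u, o

# toy usage: 4 inputs -> 3 LIF neurons, run for 10 time steps
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(3, 4))
u, o = np.zeros(3), np.zeros(3)
for t in range(10):
    x = (rng.random(4) < 0.3).astype(float)   # random input spike pattern
    u, o = lif_layer_step(u, o, x, W)
    print(t, o)
```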

Next, we describe spike coding, i.e., how a sequence of spikes extending along the time axis represents information. Two representative categories are:

  • Rate Coding ... The higher the spike frequency, the larger the value.
  • Temporal Coding ... The value is expressed by spike firing time (e.g., an earlier spike means a larger value).

Digital data (e.g., pixel values) and spike sequences are converted back and forth using such encoding algorithms.
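As a toy illustration (not the paper's encoder), a single intensity value in [0, 1] can be turned into spikes in either style roughly as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def rate_encode(value, n_steps=20):
    """Rate coding: a larger value -> more spikes (Bernoulli sampling per step)."""
    return (rng.random(n_steps) < value).astype(int)

def latency_encode(value, n_steps=20):
    """Temporal coding: a larger value -> a single, earlier spike."""
    t_fire = int(round((1.0 - value) * (n_steps - 1)))
    spikes = np.zeros(n_steps, dtype=int)
    spikes[t_fire] = 1
    return spikes

print(rate_encode(0.8))     # dense spike train
print(latency_encode(0.8))  # one early spike
```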

Loihi System and Software

The Loihi chip

Loihi consists of clockless asynchronous circuits and 131,072 discrete-time LIF neurons (i.e., circuits specialized for computing the LIF model), divided into 128 cores. Each core has 128 kB of memory to store synaptic weights and 20 kB of memory to store neuron connectivity. Spikes are exchanged as 32-bit address information, i.e., information about which target neuron and synapse they are sent to. In addition, microcode can select among several plasticity rules for changing synaptic weights (i.e., the chip can learn in multiple ways). The chip also has embedded x86 processors, which are used to convert between ordinary data and spikes.

What sets Loihi's architecture apart from ordinary von Neumann processors (CPUs and GPUs) and machine learning accelerators (e.g., Google's TPU) is that it is optimized for sparse, non-batched (batch size 1) computation: memory accesses on Loihi's cores always occur close to the core and are fragmented and discrete. (Author's note: GPUs and machine learning accelerators speed up processing by arranging data neatly and making memory accesses regular or vectorized.)

Loihi System

To connect Loihi to a conventional computer, it is necessary to bridge the asynchronous communication protocol in the neuromorphic chip with a standard synchronous protocol such as a host CPU. Since it was a demanding task for a small team to build this interface into the Loihi chip, this bridging was done in an FPGA (a chip that can be programmed with bit-by-bit circuitry).

Loihi Software

Since Loihi is a heterogeneous system spanning the neuromorphic cores, embedded x86 processors, FPGA I/O interface, and host CPU, it requires its own software framework. A software tool called NxSDK has been developed to provide an API, compiler, and debugging tools for programming Loihi. It also allows runtime monitoring and integration with third-party frameworks such as TensorFlow and PyTorch.

Training method and quantitative evaluation of deep SNN

Deep learning is a natural starting point for SNN research. Given the great success of applying error backpropagation to differentiable ANNs, it is not unreasonable to expect the same approach to succeed for SNNs. However, SNN chips such as Loihi are not designed to accelerate standard deep learning models such as MobileNet or ResNet. ANNs achieve their speed by performing regular sum-of-products operations on highly vectorized data, whereas Loihi carries overhead to support spatially and temporally sparse computation, so each operation consumes more power than a sum-of-products operation on an ANN machine learning accelerator. Approximating sum-of-products operations with spikes also increases the number of operations, making things slower still and more power-hungry. In other words, when searching for effective SNN models, it is not a good idea simply to imitate ANNs (nor to ignore them).

There are two major categories of deep learning-inspired deep SNN training methods. One is the online approach, where the model is trained using synaptic plasticity on neuromorphic hardware, and the other is the offline approach, where the trained model is created on CPU or GPU and deployed on neuromorphic hardware. The offline approach consists of a transformation approach that transforms the learned ANN into a nearly equivalent SNN and a direct learning approach that performs error backpropagation. The error backpropagation method requires some ingenuity to be applied.

Training methods for deep SNNs

Quantitative Comparison of Loihi and Conventional Computers by Transformation Approach

Details of the conversion approach

ANN-to-SNN conversion can be done by mapping the weights of a trained ANN onto an SNN with the same structure, and almost lossless conversion has been achieved on traditional image recognition tasks such as CIFAR and ImageNet. (Author's note: most methods are based on dividing all weights in a layer by the largest activation output of that layer, and the activation function is limited to ReLU.) In ANN-to-SNN conversion it is common to express the continuous-valued activations of the ANN as spike frequencies in the SNN: the ANN processes a single static input vector, such as an image frame, in a series of dense operations, whereas the SNN processes it as a series of sparse operations over multiple time steps or iterations. This temporal "unrolling" of the computation can be a useful feature of SNNs, since it allows the trade-off between classification accuracy and inference latency to be adjusted dynamically. However, with Rate Coding, improving the precision of a signal by one bit requires twice as many time steps, so the required encoding time grows exponentially with precision.
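As a rough sketch of the per-layer weight normalization mentioned in the author's note (schemes differ between papers; the data-based scaling below is illustrative only):

```python
import numpy as np

def normalize_for_snn(weights, activations):
    """Scale each layer's weights by the maximum ReLU activation observed on a
    calibration set, so that activations map onto firing rates bounded by 1.

    weights     : list of weight matrices, one per layer
    activations : list of activation arrays recorded from the trained ANN
    """
    norm_weights = []
    prev_scale = 1.0
    for W, act in zip(weights, activations):
        scale = act.max()                       # largest activation in this layer
        # rescale so inputs arrive in the previous layer's units and outputs
        # are bounded by this layer's maximum
        norm_weights.append(W * prev_scale / scale)
        prev_scale = scale
    return norm_weights
```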

ANN on Conventional Computers Before Conversion VS. SNN on Loihi after conversion

Each dot in the figure shows the ratio of time (vertical axis) and energy consumption (horizontal axis) for running the ANN on a conventional computer versus running the SNN on Loihi. The dotted line is the break-even line for the Energy Delay Product (EDP). (Author's note: no one wants a slow computer no matter how little energy it consumes, so an approach cannot be said to have an advantage unless it beats the conventional computer on EDP, which accounts for both latency and energy.) The red markers are the comparisons for the conversion approach. Tasks include keyword spotting, CIFAR image recognition with MobileNet, embedding generation for similarity search, and segmentation of ISBI cell images with a modified U-Net. For almost all of these tasks, energy efficiency improves significantly (up to 100 times). As for latency, Loihi is comparable to the reference hardware for small workloads but significantly slower for large DNN workloads.

There are two main reasons why latency worsens on Loihi as the workload grows:

  • As the number of layers increases, the time step required to achieve maximum accuracy increases
  • Large networks need to be distributed across multiple chips and inter-chip communication is congested

In many cases, the tasks that were effective on Loihi had a batch size of 1, indicating that it is suited for real-time tasks that respond to new data with low latency. However, low latency does not necessarily mean high throughput. Vectorized and pipelined architectures can achieve high throughput by processing many samples at once even if the latency is long (even if the response time is long, once the results start to appear, a large number of results are produced every cycle). The arrows in the figure show the performance improvement on CPU and GPU due to batching.

Quantitative comparison of Loihi and conventional computers using a direct learning approach.

Details of the direct learning approach

The direct learning approach optimizes the parameters of the SNN directly with error backpropagation: the SNN is formulated as an equivalent ANN with binary inputs, the ANN nonlinearity replaced by a discontinuous spike generation function, and the sub-threshold temporal dynamics of the membrane potential expressed as self-recurrent connections, and error backpropagation is then applied. Because the direct learning approach encourages the emergence of spike-timing codes (Temporal coding), which optimize both latency and energy efficiency, it is of particular interest for inputs whose information is encoded in the relative timing between input spikes, such as data generated by event-based sensors. Even for more general tasks, Temporal coding can propagate information more efficiently than Rate coding, resulting in lower spike counts, latency, and energy consumption.

It is very effective for small networks, but training large networks becomes difficult. Since learning becomes a temporal credit assignment problem, the SNN is treated as an RNN and Backpropagation Through Time (BPTT) is applied: because one time step of the SNN corresponds to one unrolled step of the RNN, training time and memory footprint increase significantly compared to training a feed-forward ANN of the same size. In addition, the non-differentiability of the threshold function is avoided by approximating it with a surrogate (proxy) gradient, but the resulting error accumulates as the network grows.
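One common way to sidestep the non-differentiable threshold is a surrogate gradient; a minimal PyTorch sketch (the exact surrogate differs between SLAYER, STDB, and other methods; the fast-sigmoid-style derivative below is just one illustrative choice) looks roughly like this:

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, u, v_th=1.0):
        ctx.save_for_backward(u)
        ctx.v_th = v_th
        return (u >= v_th).float()

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        # surrogate derivative: peaked around the threshold, vanishing far away
        surrogate = 1.0 / (1.0 + 10.0 * (u - ctx.v_th).abs()) ** 2
        return grad_out * surrogate, None

spike = SpikeFn.apply
# usage inside a BPTT loop: o_t = spike(u_t), then loss.backward() as usual
```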

ANN on Conventional Computers VS. direct learning SNN on Loihi

Loihi evaluates SNN workloads trained with three different error backpropagation methods, SLAYER, STDB, and BPTT. (Figure again)

SLAYER was used to train networks on data from a dynamic vision sensor (DVS; author's note: also called an event camera, a sensor in which each pixel independently detects changes in luminance; Sony has also released a sensor in this category). On a DVS gesture classification task, the results show a 50x EDP improvement over execution on TrueNorth, the neuromorphic chip announced by IBM in 2014. The spike domain also facilitated multimodal processing, including tactile character recognition and gesture classification combining surface EMG and visual data. STDB was used for a robot navigation task, where Loihi was as fast as an edge GPU but with lower power consumption. BPTT was used to train a recurrent long short-term memory SNN (LSNN), which matches the performance of a conventional LSTM. The LSNN was first applied to Sequential MNIST; comparing an LSTM on a GPU at batch size 1 / at larger batch sizes against the LSNN on Loihi, EDP improved by $6 \times 10^4$x / 37x, respectively. A large collection of interconnected LSNNs trained with BPTT solved relational reasoning problems from the bAbI question-answering dataset. This consumed 2320 Loihi cores and is the largest deep network to show an advantage over conventional architectures.

On workloads of these various sizes, direct learning approaches consistently outperform conventional computation by orders of magnitude.

Quantitative Comparison of Loihi and Conventional Computers Using an Online Approach

So far we have looked at offline approaches where learning is done on CPUs and GPUs, but online learning from streaming data is also desirable. However, error backpropagation, especially BPTT, is expensive to implement in time, computation, and memory. To exploit error backpropagation on neuromorphic hardware, simplifications of these algorithms oriented toward neuromorphic implementation have been proposed, several of which are under development for Loihi. As a first step, a "delta rule" for single-layer online learning has been demonstrated on Loihi. Beyond this are the surrogate online error learning (SOEL) rule and the prescribed error sensitivity (PES) method; by training the final layer of a DVS gesture recognition network with SOEL, new gestures could be learned online, with an EDP roughly 100 times better than on a CPU.
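The single-layer delta rule mentioned above is, in its textbook form, just a local error-driven update; a minimal NumPy sketch (not the Loihi microcode, which works on spikes and on-chip traces rather than dense floats) might look like:

```python
import numpy as np

def delta_rule_step(w, x, target, lr=0.01):
    """Single-layer delta rule: adjust weights in proportion to the output error.
    w: (n_out, n_in) weights, x: (n_in,) input, target: (n_out,) desired output."""
    y = w @ x                        # current output (e.g., class scores)
    error = target - y               # supervised error signal
    w = w + lr * np.outer(error, x)  # local, outer-product weight update
    return w, error
```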

LASSO can be solved by an attractor network with SNN

Unlike standard artificial neurons, spiking neurons have temporal behavior. This makes SNNs high-dimensional and highly nonlinear dynamical systems. The computations performed by the brain are the result of collective interactions between neurons and are characterized as emergent phenomena, like eddies in a stream. This is fundamentally different from the precise, exhaustive, sequentially formulated mode of operation of ANNs. Through feedback, adaptation, and interaction with the environment, brain neurons evolve as a group to behave in some desired way despite the uncertainty and nondeterminism of individual behaviors. In other words, for SNNs the computational scope under study is much broader than for ANNs and is grounded in collective dynamics (author's note: a branch of complex systems science).

Attractor dynamics is the simplest form of collective dynamics and yields useful and nontrivial computations. One important strategy for developing attractor-based SNN algorithms is to prove that the network satisfies a convergence guarantee to a particular well-defined equilibrium state, known as a Lyapunov condition. The equilibrium state of the network may be characterized in a mathematically closed form, as in the Locally Competitive Algorithm (LCA) described next, but even when this is not the case, as with the Dynamic Neural Fields (DNFs) described later, it may still be possible to design behaviors that make the network intuitive to understand.

Locally Competitive Algorithm for LASSO

Here is an example of the simplest attractor network. A Hopfield network (author's note: usable as associative memory) consists of many neurons with all-to-all connections and a symmetric weight matrix. The dynamics of such a network satisfy the Lyapunov condition and converge to a fixed point corresponding to a minimum of an energy function (author's note: in associative memory, for example, it converges to the learned pattern closest to the input).

LCA is also one of the simplest networks that performs useful and nontrivial computation: in LCA, input signals are injected into neurons with mutually inhibitory recurrent connections. The balance between feed-forward input and recurrent inhibition induces competition within the network, and over time the system converges to the active set of features that best explains the input. If the network parameters are set according to LCA, the equilibrium state of the network corresponds exactly to the solution of the LASSO regression problem, which is widely used in statistics as a technique to reduce overfitting and identify sparse feature sets (sparse modeling).
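A minimal (non-spiking) NumPy sketch of LCA dynamics whose fixed point solves the LASSO objective $\frac{1}{2}\|x - \Phi a\|^2 + \lambda \|a\|_1$ (the Loihi version replaces the continuous activations with spiking neurons):

```python
import numpy as np

def lca_lasso(Phi, x, lam=0.1, tau=10.0, n_steps=500):
    """Locally Competitive Algorithm: recurrent inhibition drives the network
    state u toward a sparse code a that solves LASSO at equilibrium."""
    n_features = Phi.shape[1]
    b = Phi.T @ x                          # feed-forward drive
    G = Phi.T @ Phi - np.eye(n_features)   # lateral (inhibitory) connections
    u = np.zeros(n_features)
    for _ in range(n_steps):
        a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # soft threshold
        u += (b - u - G @ a) / tau         # leaky integration with competition
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# toy usage: 20-dimensional signal, 50-atom dictionary
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 50))
Phi /= np.linalg.norm(Phi, axis=0)         # unit-norm dictionary atoms
x = 2.0 * Phi[:, 3] - 1.5 * Phi[:, 17]
a = lca_lasso(Phi, x)
print(np.nonzero(np.abs(a) > 1e-3)[0])     # should recover a sparse support
```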

Loihi's software framework NxSDK makes efficient use of on-chip memory by leveraging weight sharing and provides a compiler for convolutional LCA networks that supports networks with millions of feature neurons. In the figure, the execution of LCA on Loihi is compared with FISTA (a typical conventional algorithm) on a CPU. Both sparse-code the input image by solving the same LASSO problem, and the quality of the solution as measured by the LASSO objective is the same: the objective value of Loihi's LCA typically saturates within about 1% of the optimal value, and a convergence threshold is set for the evaluation. This is an approximate solution, but it is sufficient for many applications.

The above figure shows that for the largest problem size of about $10^5$ unknowns (Region III), the largest LCA problem that fits on a single Loihi chip, there is an advantage of up to 5 orders of magnitude in latency and up to 6 orders of magnitude in energy consumption. In Region IV, with multi-chip configurations, the problem sizes approach those typically seen in real-world applications. Scaling is worse in this region due to inter-chip communication congestion, but it remains superior to the CPU trend from Regions I-III.

LCA is one of the best examples of how a fine-grained parallel algorithm with sparse activation achieves order-of-magnitude gains by exploiting the matching properties of neuromorphic architectures. Highly vectorized datapaths cannot take advantage of the properties of this algorithm in conventional architectures due to the large penalty imposed on bit-level data-dependent branching.

Dynamic Neural Fields for Object Tracking

DNFs provide modular algorithms for implementing states, relations, behaviors, and memories using attractor networks. For example, a particular network has all-to-all connectivity such that the most strongly stimulated state is persistently activated and suppresses the expression of other less stimulated states, as in a winner-take-all network. This is a similar structure to working memory. Neuroscience research has shown that DNFs model many cognitive processes that require working memory.

DNFs have been used as a programming framework for autonomous systems and cognitive robots, but their high computational cost has hindered their application to useful real-world tasks. The reasons for the cost are high recurrent connectivity and sparse activity, similar to LCA.

In Loihi, a two-layer 2D DNF network is implemented to track moving objects seen by a DVS event camera, which can reliably track objects with an accuracy of 3.5 pixels when processing 240x180 inputs on a 64x64 neural grid in real time. This could be used for visual odometry, SLAM pre-processing, focusing attention on objects in complex scenes, or visually tracking navigation targets.

Time-domain computation of SNN solves nearest neighbor and graph search

The results of the direct learning approach to deep learning suggest that optimizing SNNs for spatio-temporal spike patterns improves efficiency and speed in a way that ANN-like gradient-based approaches cannot. There is a huge space of computations that precisely exploit spike-timing relationships, in particular dynamic states, delayed connections, plasticity, and stochasticity, and exploring this space requires a broader perspective.

Recently, many handcrafted SNN algorithms have been proposed to solve well-defined computational problems using spike-based temporal information processing. These include computational primitives such as sorting, maximum, minimum, and median operations, various graph algorithms, NP-complete/hard problems (constraint satisfaction, Boolean satisfiability, dynamic programming, quadratic unconstrained binary optimization), and a new Turing-complete framework. When implemented on neuromorphic hardware, these are expected to improve both speed and efficiency by exploiting fine-grained parallelism and event-based computation.

However, due to the immaturity of the hardware, evaluations of these proposed algorithms on real machines have been few and limited to rudimentary demonstrations with no reported latency or energy measurements. With the advent of Loihi, these spike-based algorithms can now be evaluated at sufficient scale and performance. Rigorous characterization and case-study comparisons with traditional solutions have so far confirmed the promise of order-of-magnitude advantages.

nearest neighbor search

An SNN time-domain implementation of the approximate nearest neighbor search problem was prototyped as an efficient and scalable application running on "Pohoiki Springs", a system with 768 Loihi chips. In this implementation, the search query pattern is directly encoded in the relative timing of a single spike wavefront distributed over all Loihi chips. This implementation can quickly identify the closest matches by computing the cosine similarity of the query to all data points distributed across the cores of the system.

For normalized data points and query vectors, cosine similarity corresponds to an inner product. This inner product can be computed by integrate-and-fire neurons as the temporal integration of multiple arriving query spikes. Each data point is mapped to the input weights of a single neuron, with one 8-bit synaptic weight per data dimension. When a broadcast input query is presented as parallel spikes with different timings, the neurons corresponding to sufficiently close data points generate output "match" spikes reflecting the inner product between the query and their data points. Earlier output spikes represent stronger matches. Thus the subsequent sorting task reduces to simply observing the order of spike generation: if only the k nearest matches are required, the network can be stopped as soon as k output spikes have been observed.

Highly vectorized architectures such as GPUs and machine learning accelerators can efficiently compute inner products of batched vectors using a rich set of sum-of-products operators, but they are extremely inefficient at performing the subsequent top-k sorting sequentially. The SNN implementation, on the other hand, only waits for the arrival of the earliest spikes and consumes no additional energy or time, so the top-k sorting operation comes almost for free. Furthermore, unlike conventional approximate nearest neighbor implementations, data points can be added with $O(1)$ complexity simply by physically adding neurons.
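A toy Python sketch of this idea (not the actual Pohoiki Springs implementation): larger query components arrive as earlier spikes, each stored data point is one integrate-and-fire neuron, and the top-k search reduces to reading off the first k output spikes.

```python
import numpy as np

def spiking_topk(data, query, k=3, v_th=0.9, n_steps=32):
    """Neurons whose data points align with the query accumulate their
    (partial) inner product sooner and cross threshold earlier; stopping
    after k output spikes yields the k best matches with no explicit sort."""
    data = data / np.linalg.norm(data, axis=1, keepdims=True)
    query = np.clip(query, 0, None)
    query = query / np.linalg.norm(query)
    # larger query component -> earlier arrival time step
    arrival = np.round((1.0 - query / (query.max() + 1e-12)) * (n_steps - 1)).astype(int)

    u = np.zeros(len(data))
    fired = np.zeros(len(data), dtype=bool)
    order = []
    for t in range(n_steps):
        idx = np.flatnonzero(arrival == t)               # query spikes arriving now
        if idx.size:
            u += data[:, idx] @ query[idx]               # accumulate partial inner product
        new = np.flatnonzero((u >= v_th) & ~fired)       # output spikes this step
        order.extend(new[np.argsort(-u[new])].tolist())  # break ties by potential
        fired[new] = True
        if len(order) >= k:                              # sorting comes for free:
            break                                        # stop after k output spikes
    return order[:k]

# toy usage
rng = np.random.default_rng(0)
data = rng.random((1000, 16))
query = data[42] + 0.05 * rng.random(16)
print(spiking_topk(data, query))   # index 42 should appear among the first hits
```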

In Loihi's implementation, dimensionality reduction by Principal Component Analysis and Independent Component Analysis (PCA/ICA) is used to deal with arbitrary input data types while keeping the data dimensionality constant. PCA/ICA is also used to simultaneously project the query into a sparse representation suitable for efficiently encoding spikes. Since both the routing and synaptic memory resources of the query scale linearly with increasing dimensionality, a trade-off must be made between the accuracy of computing the inner product by Spatio-temporal spiking and the number of data points that can be stored in the system.

The implementation of Loihi k-NN was evaluated against other state-of-the-art approximate nearest neighbor algorithms on many standard datasets of 960 dimensions and 1M data points each. The evaluation metrics are latency, throughput, power, build time and insertion time of new data points. While we found algorithms that outperformed Loihi k-NN on individual metrics, Loihi k-NN recorded high performance on all metrics, performing 685 times better on EDP when compared to an equivalent brute-force inner product implementation on a CPU.

graph search

Time-domain spike computation can also be applied to pathfinding, inspired by the spiking wavefronts observed in the human hippocampus during navigation. Although other wavefront-based algorithms have been proposed, such as the classical Dijkstra method, the SNN formulation promises superior performance thanks to its parallelism, time-domain computation, sparse spike activity, and local synaptic plasticity. A simplified version of Ponulak and Hopfield's algorithm, in which synaptic plasticity is binarized, is implemented on Loihi; instead, small (6-bit) positive edge weights are represented with synaptic delays.

Graph search on Loihi first requires partitioning the graph and mapping it onto the physical cores of a multi-chip Loihi system. This compilation process on the host CPU takes several hours for a graph of one million nodes, so it is suited to repeated searches over a single static graph. The source node is selected by the host CPU, which sends spikes to the corresponding neurons. The search is then initiated by stimulating the destination node (neuron) to fire. During spike propagation, each time a wavefront spike first reaches an intermediate neuron, the weights of the connections over which the spike arrived are set to zero, leaving only the connections pointing in the opposite direction. When the search is complete, the host CPU reads the state of the network and recovers the shortest path by following the connections with non-zero weights.
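A rough host-side Python sketch of the wavefront idea (an event-driven simulation, not the Loihi microcode; here first-arrival parent pointers play the role of the connections left with non-zero weight):

```python
def spiking_wavefront_path(edges, n_nodes, src, dst, max_steps=10_000):
    """Edge costs become integer spike delays; the destination is stimulated,
    each neuron fires once when the wavefront first reaches it, and the path
    is read out by following first-arrival parents from source back to dest.

    edges: dict {node: [(neighbor, delay), ...]} for an undirected graph
    """
    fired = [False] * n_nodes
    parent = [None] * n_nodes
    pending = {0: [(dst, None)]}              # (neuron, predecessor) spikes per step
    for t in range(max_steps):
        for u, pred in pending.pop(t, []):
            if fired[u]:
                continue                      # only the first wavefront spike counts
            fired[u] = True
            parent[u] = pred                  # connection kept pointing back toward dst
            if u == src:                      # wavefront reached the source: read out
                path, v = [], src
                while v is not None:
                    path.append(v)
                    v = parent[v]
                return path                   # path runs src -> ... -> dst
            for v, delay in edges.get(u, []):
                if not fired[v]:
                    pending.setdefault(t + delay, []).append((v, u))
    return None

# toy usage: 5-node weighted graph
edges = {
    0: [(1, 2), (2, 5)],
    1: [(0, 2), (2, 1), (3, 4)],
    2: [(0, 5), (1, 1), (3, 1)],
    3: [(1, 4), (2, 1), (4, 3)],
    4: [(3, 3)],
}
print(spiking_wavefront_path(edges, 5, src=0, dst=4))  # e.g. [0, 1, 2, 3, 4]
```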

Theoretical analysis of the search phase suggests that for large graphs the asymptotic search time scales as $O(\sqrt{E})$, where $E$ is the number of edges in the graph. Even the optimized state-of-the-art Dijkstra method scales at best linearly in $E$.

To ascertain the actual performance of the graph traversal implementation on Loihi, 1651 traversals between randomly selected nodes in Watts-Strogatz small-world graphs were evaluated. The number of nodes ranged from 100 to 1,000,000 and the number of edges per node from 10 to 290. Such graphs were chosen because they are abundant in the real world as social, electrical, semantic, and logistics networks, are easy to synthesize, and stress the communication and sorting capabilities of both neuromorphic and conventional implementations. Wavefront search times on Loihi were compared against a CPU implementation of Dial's algorithm, a variant of Dijkstra's method optimized for bounded integer edge costs. The search time results are shown below as a function of the total number of edges $E$.

The CPU implementation showed a nearly linear dependence on $E$ as theoretically expected, while the implementation on Loihi showed sub-linear characteristics as theoretically expected on small graphs, but near linear dependence on large graphs. This may be due to the inter-chip communication congestion dominating the search time. However, for all but the smallest graphs, Loihi was more than 100 times faster than the CPU.

stochastic constrained optimization

Spike time-domain computation can also be used to solve constraint satisfaction problems (CSPs), an NP-complete class of problems that involve finding admissible values for a set of variables $X$ satisfying constraints $C$. CSPs are combinatorially NP-complete because the number of possible solution configurations explodes exponentially as the number of variables increases.

State-of-the-art CSP algorithms are either systematic methods or stochastic greedy methods. Systematic strategies that search for complete solutions have exponential complexity in the worst case. In contrast, stochastic search strategies are not guaranteed to find a solution but scale well. The SNN implementation takes the latter, stochastic strategy, using a stochastic SNN governed by an energy (cost) function to solve the CSP.

$E = S^{T}(t) \cdot W \cdot S(t) = \sum_i \left( S_i \cdot \sum_j W_{ij} \cdot S_j \right) \dots (1)$

In (1), $S(t)$ is the spike vector at an instant and $W$ is the synaptic weight matrix. The SNN is set up so that $W$ encodes $C$, and the different values of each CSP variable in $X$ are represented by a one-hot-coded winner-take-all network. The fine timing dynamics of the stochastic SNN facilitate escape from local minima and can find the global minimum more effectively than a Boltzmann machine, even though both sample from the same distribution.
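A toy illustration of the idea (a host-side stochastic search over one-hot winner-take-all groups, not the Loihi solver; graph coloring is used here as a stand-in CSP):

```python
import numpy as np

def build_w(adj, n_colors):
    """Penalty weights: +1 between same-colour units of adjacent nodes, so
    E = s^T W s counts (twice) the number of violated constraints, as in eq. (1)."""
    n = len(adj) * n_colors
    W = np.zeros((n, n))
    for u, neighbors in enumerate(adj):
        for v in neighbors:
            for c in range(n_colors):
                W[u * n_colors + c, v * n_colors + c] = 1.0
    return W

def solve_coloring(adj, n_colors, temp=0.5, max_iters=10_000, seed=0):
    """Each variable is a one-hot group of 'neurons'; noisy updates resample a
    node's colour, preferring low-conflict colours, and the injected noise lets
    the state escape local minima of the energy."""
    rng = np.random.default_rng(seed)
    n_nodes = len(adj)
    W = build_w(adj, n_colors)
    colors = rng.integers(0, n_colors, n_nodes)
    energy = np.inf
    for _ in range(max_iters):
        s = np.zeros(n_nodes * n_colors)
        s[np.arange(n_nodes) * n_colors + colors] = 1.0    # one-hot spike vector
        energy = s @ W @ s                                  # energy from eq. (1)
        if energy == 0:
            return colors, 0.0
        u = rng.integers(n_nodes)                           # pick a variable
        conflicts = np.array([sum(colors[v] == c for v in adj[u])
                              for c in range(n_colors)])
        p = np.exp(-conflicts / temp)
        colors[u] = rng.choice(n_colors, p=p / p.sum())     # noisy WTA update
    return colors, energy

# toy usage: colour a 4-cycle with 2 colours (bipartite, so a solution exists)
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(solve_coloring(adj, n_colors=2))
```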

For such an SNN CSP solver to be practical, the SNN must not only visit the minimum-energy state but also be able to detect that it has visited it. The solver implemented on Loihi computes the cost function in a distributed, event-based manner within the network and communicates with the host CPU only when a solution below a set threshold is found.

The performance of the solver was demonstrated and evaluated using the NP-complete Latin square problem. Figure (a) above illustrates the principle of operation of the solver: as a result of the stochastic dynamics of the winner-take-all network, some neurons fire earlier than others and suppress conflicting neurons whose activity is inconsistent with their state. This process iteratively prunes the search space so that only mutually consistent neurons, which ultimately correspond to a solution of the problem, remain active.

As a conventional solver for comparison, the COIN-OR Branch and Cut (CBC) solver provided by the COIN-OR project was used. It uses an incomplete energy minimizer similar to the SNN solver and has the best performance among open-source CSP solvers. Solution times and energy consumption were compared.

As shown in Figures (b)-(d), the Loihi solver is significantly faster and more energy efficient than the CPU reference, improving the EDP by at least three orders of magnitude over a wide range of problem sizes from 4 to 400 variables.

Although the Loihi solver, like other heuristic solvers, cannot guarantee the existence of a solution or always find all solutions, it found the optimal solution for the largest CSP implemented in a neuromorphic system to date, without exhausting the resources of the Loihi chip. Furthermore, the ability to find the solution step by step makes it particularly attractive for time-constrained applications. Relaxing the evaluation threshold of the cost function provides a trade-off mechanism between delay and accuracy.

These results show how the temporal dynamics of stochastic spikes can extend the computational space supported by SNNs, yielding surprisingly fast and efficient results. The field is still in its infancy, and a deeper understanding of the role of noise is being considered for further performance gains and application to other challenging computational problems.

Applications

Beyond algorithm benchmarking, many promising applications have been demonstrated on Loihi.

event-based sensing

Event-based sensing is rapidly developing as a sister technology to neuromorphic computing. In the case of vision sensors, changes in luminance are detected pixel by pixel, and asynchronous events are generated when the magnitude of the change exceeds a threshold value. Event-based sensing has great characteristics such as self-adaptability, low power consumption, low latency, and high dynamic range, but it is completely different from traditional frame-based computer vision and requires new processing algorithms and architectures to be applied to real-world applications. Architectures like Loihi can maintain their excellent properties by processing event-based spike data. Numeric recognition, visual and haptic fusion, visual and electromyographic information fusion, and online learning of gestures have been demonstrated. Future challenges are expected to be power cost and time resolution degradation due to increased bandwidth between the sensor and processor as the sensor resolution increases, and solutions are being investigated.

Odor Recognition and Learning

Neuromorphic technology has proven to be a good match for the unique technical challenges of odor sensing. Current odor sensors are unreliable and require frequent recalibration. The high level of noise inherent in this modality and the problem of occlusion, in which odors mask one another, make it difficult to process at the edge. In addition, the diversity and natural variability of real-world smells demand online learning and fine-tuning in the field.

The neuroscience of biological smell perception has become relatively mature and can be useful for algorithm discovery. Indeed, a recent neuroscience model has been abstracted to a level where it can be implemented on Loihi. The algorithm was based on features unique to SNNs, absent from ANNs, as described in the previous sections. Given a single training sample for each of 10 chemicals, the algorithm successfully classified test samples drawn from the same dataset, as well as noise-added samples, more than 40% better than a previous algorithm based on a seven-layer autoencoder. Furthermore, when additional odors were trained, the SNN algorithm did not degrade the classification accuracy of previously trained odor classes, whereas the autoencoder suffered catastrophic forgetting.

(Author's note: This means that learning and resistance to catastrophic forgetting were observed in a very small sample size. This is a property that is difficult to achieve with conventional ANNs.)

Closed-loop control for robotics

Closed-loop control is another exciting area for neuromorphic computing. Event-driven processing meshes with the temporal properties and low latency requirements of closed-loop control. Several approaches to motion control have been demonstrated on Loihi. The faster the processing, the faster the convergence to the target value.

PID control has been implemented in an SNN, with the integral (I) term adapted online to reduce state-dependent perturbations. A 6-DOF robot arm controller running on Loihi achieved 4.6x and 43.2x better power efficiency than CPU and GPU implementations, respectively, with latency almost unchanged relative to the CPU and about 42% slower than the GPU. Other implementations include drone control and insect-like robot control.

Simultaneous localization and mapping (SLAM)

SLAM is an important task in robotics, requiring

  1. Fusing onboard sensor information to maintain the agent's (e.g., a robot's) absolute position through path integration and state estimation
  2. Creating maps that store the location of objects of interest in the environment

Since there is an error in state estimation, detecting and reducing the error in map creation is an important problem. This problem is formulated as an optimization problem and becomes prohibitively computationally expensive in large-scale environments.

In one SNN implementation, DNFs are used for path integration and spike-based recursive Bayesian estimation is used to estimate the robot's heading direction; the implementation on Loihi achieves similar accuracy while consuming 100 times less power than the CPU.

others

Implementations include an associative memory that learns new patterns, a radio-frequency waveform adaptation algorithm for noisy environments, and an event-driven random walk solver for the thermal diffusion equation. Including work without published results, numerous applications have been investigated, such as speech recognition, anomaly detection for security and fault monitoring, particle-collider trajectory classification, brain-machine interfaces using myoelectric signals and direct neural probes, and low-power keyword spotting.

Future Outlook

deep network

From the results of the conversion approach, ANNs converted into deep SNNs can gain in energy efficiency but suffer from long latency, especially for large problems spanning multiple chips. The drastic increase in the number of spikes makes rate-coded converted SNNs unattractive for sparsity-oriented neuromorphic architectures, so approaches other than conversion are needed. The direct learning approach shows significant latency improvements but is not well suited to training large networks; a hybrid approach, which converts an ANN to an SNN and then re-trains it as an SNN, is expected to solve this problem and has been shown to reduce inference time by a factor of 10 compared to the conversion approach. Temporal coding on logarithmic time scales, which is highly compatible with conversion, is also being considered. Network compression is another important direction, as it can save valuable energy, time, and memory resources in architectures with tightly integrated memory and computation. This includes opportunities to exploit the sparsity introduced by dimensionality reduction techniques and pruning in offline approaches. Approaches such as deep rewiring, which constantly recycles less important synapses to where they are most needed for the task while keeping sparsity stable, are well suited to memory-constrained online learning.

online learning

Online single-layer approximations of error backpropagation such as SOEL and PES provide valuable examples that work within Loihi's constraints. In the future, more general algorithms could approximate BPTT without the acausal requirement of propagating information backward in time and scale online learning up successfully. However, these online approaches face hardware efficiency and convergence challenges because they tend to process training data sample by sample rather than in batches and to make non-sparse weight updates. The need for large numbers of samples is also a challenge for such backpropagation approximation algorithms.

Realizing supervised real-time online learning is a key challenge for artificial intelligence in general and is still far from being achieved even in deep learning. Taking cues from the natural world as well as from the Loihi examples, the focus is expected to shift toward modular, shallow learning algorithms for networks that form associations between different neuronal populations and distributed semantic representations.

Sensor Integration

Simply increasing the communication bandwidth between the sensor and the computing element to accommodate the higher resolution of the sensor leads to higher power consumption and worse temporal resolution. Neuromorphic chips are fundamentally incompatible with the legacy of traditional computing, which transmits high-density data synchronously. Therefore, the sensors themselves need to be reconfigured to an event-based paradigm. In addition, sufficient sparse coding must be performed instead of transmitting raw data from the sensor.

For example, LASSO has been considered too computationally heavy for sensor integration, but in the form of SNN LCA it becomes almost free (showing up to 5 orders of magnitude improvement in latency and 6 orders of magnitude in energy consumption over conventional techniques). Other nonlinear transformations that mimic nature, such as those in the cochlea and retina, are expected to yield even greater gains.

In addition, three-dimensional vertical integration of sensors and neuromorphic chips could dramatically improve the power and latency of the interface between the two.

robotics

For decades, engineers and science fiction writers have foreseen robots that could navigate and interact in the real world, operating with autonomy and agility alongside humans. Despite significant developments in a variety of fields, such robots remain out of reach today. Intelligent control of such future robots will require a difficult integration between classical control theory, which relies on accurate models of the environment, and artificial intelligence, which bases such models of the environment on perception. Interacting with a dynamic and often unpredictable real-world environment remains difficult even for state-of-the-art robots, but humans do it effortlessly. Biological brains have evolved to solve just such tasks, and this is perhaps the most promising application of neuromorphic technology.

DNNs are the first choice for building robot vision systems because they can achieve the highest performance in computer vision. Even as it is now, latency and power consumption could be lowered by implementation on neuromorphic chips. However, new adaptive algorithms will be essential to meet the needs of real-world variability and unpredictability. Such algorithms will fit the characteristics of neuromorphic hardware and will eventually be realized by the co-development of algorithms and hardware.

Robust robotic systems also need to integrate multiple sensory modalities such as vision, hearing, and touch. Spikes can provide a unified language that can encode both temporal and spatial information that is important to the task across modalities. The dynamic nature of DNFs can also provide another unifying basic element that can create memory states in attractor networks that bridge different timescales of multiple senses. DNFs with recursive and feedback connections can provide top-down attentional control of sensory processing, thereby allowing us to focus computational resources on aspects most relevant to the task and ignore noise and occlusion.

SNN DNFs for three different senses have been successfully integrated into Loihi to control the interaction of a humanoid robot with its environment. While this result is far from a brain realization, it shows that relatively complex applications can already be built by composing heterogeneous modules using a common toolbox of spikes and attractor networks.

Planning, Optimization, Reasoning

Planning and reasoning (here, reasoning with evidence) are arguably the most sophisticated and elusive capabilities of natural intelligent systems. Recent developments in deep learning have yielded significant gains in sub-symbolic inference and learning, but not the same gains for higher-order symbolic and analogical reasoning tasks. Vector Symbolic Architectures (VSA; author's note: also referred to as hyperdimensional computing) are rich knowledge representations in high-dimensional spaces that can be used to plan actions and optimize for high-level objectives. By interfacing deep networks with the optimization and search algorithms described in this survey, VSAs can provide a pathway to fast, efficient, and scalable next-generation AI capabilities.

As a first step toward a scalable VSA framework, a spike-based topological associative memory network, TPAM, was implemented on Loihi. The efficiency and scalability of VSAs are limited by associative memories with all-to-all connectivity; in contrast, TPAM improves scalability by introducing sparsity in both network connectivity and activation.

programming model

With the promising results and the possibilities opened up by Loihi, one of the most pressing problems facing the neuromorphic field is that its programming models and frameworks are a fragmented, non-composable mishmash. Published SNN development frameworks fall into three categories: supervised learning frameworks for deep learning, SNN simulators that provide low-level programming APIs on conventional architectures, and low-level frameworks for neuromorphic hardware. While the increasing level of exploration in this area is encouraging, there is still no framework that provides a unified programming abstraction covering the wide variety of algorithms being studied. Such a framework will be needed to further promote the composition of heterogeneous modules.

economic viability

The tight integration of memory and computation in neuromorphic architectures is both a boon and a bane. The economics of the current semiconductor industry are highly optimized for the von Neumann architecture. The cost per bit of (regular, non-embedded) DRAM is on the order of 100x cheaper than the cost per bit of the densest memory available in a logic process. For workloads that require large amounts of memory, traditional architectures achieve optimal cost by partitioning the physical implementation between cheap memory and expensive logic. Because neuromorphic architectures integrate memory and computation, they must implement everything in one of these processes, and since Loihi is more than half logic circuitry, the expensive logic process is the only choice.

This inevitably places neuromorphic technology in a high-end niche for large workloads. To expand this niche, we first need to add value to small-scale problems. This means that the first commercial applications to emerge will be edge and sensor nodes and robotic systems.

In the long term, manufacturing innovations will be needed to reduce the cost of neuromorphic architectures. These could come from new memories that integrate densely and cheaply, such as crossbars of resistive, magnetic, or phase-change memory elements. However, truly analog memories (author's note: memories that can store a very large number of values by today's standards, promising a significant increase in storage density per element and massively parallel operation in the analog domain in which the memory elements work) create the unique problem of having to preserve their density and cost advantages without pushing many of the surrounding architectural elements into the analog domain.

Without breakthroughs in low-energy, high-speed, CMOS-integrated memory density, even if architectural, algorithmic, and software challenges are solved, the cost will limit widespread adoption in mainstream devices outside of small implementation targets.

summary

Many of the latency and energy consumption benchmark results discussed in this survey are plotted in the figure. When viewed in this unified manner, a clear trend emerges.

  • Rate-coded feedforward networks are a poor match, and Loihi's performance degrades significantly on large networks of this kind.
  • When spike timing carries computational meaning, networks trained with error backpropagation or assembled from handcrafted algorithms can provide significant gains.
  • The best-performing workloads on Loihi are all highly recurrent networks.

On other evaluation axes, such as throughput with batch processing or highest achievable accuracy, Loihi is less favorable. However, it is clear that for a considerable variety of workloads it can offer advantages measured in factors or orders of magnitude rather than percentages.

Although this was a lengthy article, I hope you could feel the great potential of neuromorphic computing. Among computers based on new principles, only quantum computing has attracted much attention so far, but low-power, low-latency computers like this should be indispensable for running the next generation of AI in everyday life. In addition, the development of this kind of computing technology is expected to deepen our knowledge of the intelligence, emotions, and consciousness of the human brain, which is an exciting prospect for the future.

 
