Wiki
We have compiled hundreds of related entries to help you understand "artificial intelligence".
Mel-frequency cepstrum is a widely used technique in the field of sound processing, especially in speech recognition and speaker identification.
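At the core of MFCCs is the mel scale, which spaces frequencies the way human pitch perception does. A minimal sketch of the standard Hz-to-mel conversion (the common 2595/700 formula; variable names are illustrative):

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595/700 variant)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Equal spacing in mels is non-uniform in Hz, which is why the MFCC
# filter bank uses narrow filters at low frequencies and wide ones at high.
print(hz_to_mel(1000.0))              # ~1000 mels
print(mel_to_hz(hz_to_mel(4000.0)))   # ~4000.0 Hz (round trip)
```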
Dijkstra's algorithm is a classic algorithm for finding the shortest path from a single source in a graph.
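A minimal sketch of Dijkstra's algorithm using a binary heap (the graph shape and names here are illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; graph maps node -> [(neighbor, weight), ...].
    Edge weights must be non-negative for Dijkstra to be correct."""
    dist = {source: 0}
    heap = [(0, source)]                      # (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Example: shortest distances from node "A"
g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 1, 'C': 3}
```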
WISE technology aims to combat hallucination phenomena in large language models and improve the model's knowledge memory editing capabilities.
DuoAttention optimizes memory and compute by applying a full KV cache to retrieval heads and a lightweight, fixed-length KV cache to streaming heads.
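A rough sketch of the cache policy this implies; the head classification, sink count, and window size below are illustrative assumptions, not DuoAttention's actual implementation. Retrieval heads keep every past key/value, while streaming heads keep only a few initial "sink" tokens plus a recent window:

```python
import numpy as np

def trim_kv_cache(keys, values, is_retrieval_head, n_sink=4, n_recent=64):
    """keys/values: [n_heads, seq_len, head_dim]. Streaming heads keep only
    the first n_sink tokens plus the last n_recent tokens (illustrative policy);
    retrieval heads keep the full cache."""
    trimmed = []
    for h in range(keys.shape[0]):
        if is_retrieval_head[h] or keys.shape[1] <= n_sink + n_recent:
            k, v = keys[h], values[h]         # full cache
        else:
            k = np.concatenate([keys[h, :n_sink], keys[h, -n_recent:]])
            v = np.concatenate([values[h, :n_sink], values[h, -n_recent:]])
        trimmed.append((k, v))
    return trimmed

rng = np.random.default_rng(0)
K = rng.normal(size=(2, 1000, 64)); V = rng.normal(size=(2, 1000, 64))
cache = trim_kv_cache(K, V, is_retrieval_head=[True, False])
print([kv[0].shape[0] for kv in cache])  # [1000, 68]: full vs. sink + recent
```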
Instead of pursuing a one-to-one correspondence with real objects, digital cousins focus on similar geometric and semantic qualities, thereby generating practical training data at a lower cost.
DAPE stands for Data-Adaptive Positional Encoding, a new positional encoding method proposed by Zheng Chuanyang and others from the Chinese University of Hong Kong. The research team also includes researchers from the National University of Singapore, Huawei Noah's Ark Lab, the University of Hong Kong, and Hong Kong Baptist University. […]
SparseLLM is a new global pruning framework proposed by researchers from Emory University and Argonne National Laboratory in 2024. The related paper is “SparseLLM: Towards Global Pruning of Pre-trai […]
Diff Transformer computes two independent softmax attention maps and takes their difference as the final attention scores. This effectively cancels attention noise and pushes the model to focus on the most relevant parts of the input.
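A minimal single-head sketch of this differential attention (shapes simplified; λ is a learnable parameter in the paper but a fixed scalar here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    """Differential attention: the difference of two softmax maps cancels
    common-mode 'attention noise'. q*, k*: [seq, d]; v: [seq, d_v]."""
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))   # first attention map
    a2 = softmax(q2 @ k2.T / np.sqrt(d))   # second attention map
    return (a1 - lam * a2) @ v             # difference weights the values
```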
UNA stands for Unified Alignment Framework, a new alignment framework proposed by a research team from Salesforce and Xiamen University. The related paper is “UNA: Unifying Alignments of […]
Swarm is an experimental multi-agent framework developed by OpenAI in 2024 that aims to simplify the construction, orchestration, and deployment of multi-agent systems. Swarm focuses on making agent collaboration and execution lightweight, highly controllable, and easy to test. The core of Swarm […]
Michelangelo is a method proposed by DeepMind researchers in 2024 to evaluate the reasoning ability of large language models in long text contexts. It uses a framework called Latent Structure Queries (LSQ) […]
The Halting Problem is a central problem in computability theory, posed by British mathematician Alan Turing in 1936. The relevant paper is Turing's famous "On Computable Numbers, with an Application to the Entscheidungsproblem".
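Turing's argument can be sketched as a self-referential program: if a total decider existed, the program below would halt exactly when the decider says it does not. The `halts` function here is hypothetical; the proof shows it cannot exist:

```python
def halts(program, argument):
    """Hypothetical total decider: returns True iff program(argument) halts.
    The diagonal argument shows no such function can exist."""
    raise NotImplementedError("no such decider can exist")

def paradox(program):
    # If a correct halts() existed, paradox(paradox) would halt
    # exactly when halts(paradox, paradox) is False: a contradiction.
    if halts(program, program):
        while True:   # diagonal move: do the opposite of the prediction
            pass
    return
```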
When a model starts generating data during training that is far from the true data distribution, its performance drops drastically, eventually rendering the model's output meaningless; this failure mode is commonly known as model collapse.
The Hopfield network is a recurrent neural network that is mainly used for problems such as associative memory and pattern recognition.
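A minimal binary Hopfield network sketch: Hebbian storage of ±1 patterns, then asynchronous updates that settle back to a stored pattern (associative recall):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian rule: W is the sum of outer products of +/-1 patterns,
    with a zero diagonal (no self-connections)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, state, steps=100, seed=0):
    """Asynchronous updates: flip one unit at a time toward lower energy."""
    rng = np.random.default_rng(seed)
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Store one pattern and recover it from a corrupted copy.
p = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = train_hopfield(p[None, :])
noisy = p.copy(); noisy[0] *= -1
print(recall(W, noisy))  # converges back to p (for small amounts of noise)
```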
Reward error reduction refers to the problem in reinforcement learning (RL) caused by the reward function not fully matching the agent’s true goal.
A sequential recommendation system is an important type of recommender system; its main task is to predict the user's next action from the user's historical behavior sequence.
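As a toy illustration of the task (not any specific published model), a first-order Markov baseline predicts the next item from the most recent one:

```python
from collections import Counter, defaultdict

def fit_markov(histories):
    """Count item-to-item transitions across user histories."""
    trans = defaultdict(Counter)
    for seq in histories:
        for prev, nxt in zip(seq, seq[1:]):
            trans[prev][nxt] += 1
    return trans

def predict_next(trans, history, k=3):
    """Recommend the k items that most often followed the last item."""
    last = history[-1]
    return [item for item, _ in trans[last].most_common(k)]

logs = [["shoes", "socks", "shoes", "laces"], ["socks", "shoes", "laces"]]
model = fit_markov(logs)
print(predict_next(model, ["socks"]))  # ['shoes'], from observed transitions
```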
R-MFDN enhances the model's sensitivity to forged content through a cross-modal contrastive loss and an identity-driven contrastive loss.
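The paper's exact losses are not reproduced here; as a generic illustration of this kind of objective, an InfoNCE-style contrastive loss pulls matched cross-modal pairs together and pushes mismatched pairs apart:

```python
import numpy as np

def info_nce(a, b, tau=0.07):
    """Generic InfoNCE: a[i] and b[i] are embeddings of a matched pair
    (e.g., audio and video of the same clip); other rows act as negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalize
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                             # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # matched pairs on diagonal
```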
The Karel puzzle is a set of problems that involve controlling a robot's actions in a simulated environment through instructions.
Fully Forward Mode (FFM) is a method for training optical neural networks. It was proposed by the research team of Academician Dai Qionghai and Professor Fang Lu of Tsinghua University in 2024. The relevant paper is “Fully forward mode training […]
The Busy Beaver game is a theoretical computer science problem proposed in 1962 by mathematician Tibor Radó.
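Stated from the standard definition (Radó's original Σ counts printed ones; the step-count variant S is shown here), the busy beaver value is:

```latex
% Step-count ("maximum shifts") busy beaver function:
S(n) = \max \{\, s(M) \;:\; M \text{ is an } n\text{-state, 2-symbol Turing machine that halts on a blank tape} \,\}
% S(n) grows faster than any computable function, so it is uncomputable.
```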
An RNN works by storing information from previous time steps in the hidden layer's state, so that the network's output depends on both the current input and the previous state.
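A minimal vanilla RNN cell makes this concrete: the hidden state carries information forward, and each step depends on both the current input and that state (weights here are random, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4
Wxh = rng.normal(size=(d_hid, d_in)) * 0.1   # input -> hidden
Whh = rng.normal(size=(d_hid, d_hid)) * 0.1  # hidden -> hidden (the recurrence)
b = np.zeros(d_hid)

def rnn_forward(xs):
    """h_t = tanh(Wxh x_t + Whh h_{t-1} + b); the hidden state is the memory."""
    h = np.zeros(d_hid)
    states = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + b)
        states.append(h)
    return states

seq = rng.normal(size=(5, d_in))   # a length-5 input sequence
print(rnn_forward(seq)[-1])        # final hidden state summarizes the sequence
```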
By adding residual connections, ResNet effectively mitigates the vanishing and exploding gradient problems that arise as network depth increases.
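A residual block in schematic form: the output is F(x) + x, so gradients can flow through the identity path even when F's gradients are small (a minimal NumPy sketch, not the full convolutional block):

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = F(x) + x with F a small two-layer transform (ReLU in between).
    The identity shortcut keeps a direct gradient path through the block."""
    h = np.maximum(0, x @ W1)   # first layer + ReLU
    return x + h @ W2           # add the shortcut connection

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W1 = rng.normal(size=(8, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1
print(residual_block(x, W1, W2).shape)  # (2, 8): same shape, easy to stack
```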
Adam is a first-order gradient-based optimization algorithm, particularly well suited to optimization problems with large-scale data and parameters.
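The Adam update from the original paper, in a few lines (hyperparameter defaults follow Kingma and Ba's recommendations):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages m, v of the gradient and
    its square, bias correction by 1 - beta^t, then a per-parameter scaled step."""
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad**2       # second moment (uncentered variance)
    m_hat = m / (1 - b1**t)               # bias correction
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 5 (gradient is 2x).
theta, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to 0
```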
The core technology of the GPT model is the Transformer architecture, which effectively captures contextual information through the self-attention mechanism.
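Scaled dot-product self-attention, the mechanism referred to above, in a minimal single-head form with the causal mask used in GPT-style decoders:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a causal mask, so each position
    attends only to itself and earlier positions (as in GPT decoders)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)   # hide future positions
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                   # 6 tokens, 16-dim embeddings
W = [rng.normal(size=(16, 16)) * 0.1 for _ in range(3)]
print(causal_self_attention(X, *W).shape)      # (6, 16)
```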