Model Compression
Ultra-Sparse Memory Network: A New Method To Improve Transformer Memory Efficiency
Hymba, A New Architecture That Pushes The Limits Of Small LLMs
Stable Flow: Visualization Of The "Really Important Layers" Behind Image Generation
Cross-Layer Attention Significantly Reduces Transformer Memory
Transformer
[SA-FedLoRA] Communication Cost Reduction Methods For Federated Learning
Medical
The Compressed Sensing Revolution: Automatic Validation Algorithms Prove Accuracy Of Neural Networks
Neural Network
[BitNet B1.58] Achieved Accuracy Better Than Llama By Expressing Model Parameters In Three Values!
Large Language Models
Apple's Efficient Inference Of Large Language Models On Devices With Limited Memory Capacity
Large Language Models
I-ViT: Compute ViT In Integer Arithmetic!? Shiftmax And ShiftGELU, Which Evolved From I-BERT Technology, Are Also Available!
Transformer
How Does Pruning An ImageNet Pre-trained Model Affect Downstream Tasks?
Pruning
Architectural Exploration Method For Neural Nets Running On IoT Devices
NAS
Model Compression For Unconditional-GAN
GAN (Generative Adversarial Network)
Drop Out Layers, Not Weights Or Nodes! The "LayerDrop" Proposal
Dropout
Run GANs On Your Phone! 'GAN Slimming', A Combination Of Compression Techniques To Reduce Model Size
GAN (Generative Adversarial Network)
BERT For The Poor: Compressing Complex Models With Simple Techniques To Maximize Performance With Limited ...
Pruning