Pruning Large Models

Recent findings suggest that many layers in large language models are redundant and contribute little to the model's function. Researchers introduced a metric called block influence to score each layer's significance, and proposed pruning a model by removing the least significant layers outright. This approach could reduce inference cost, and it complements existing compression methods such as quantization.
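To make the idea concrete, here is a minimal sketch of scoring layers and picking candidates for removal. It assumes block influence is one minus the average cosine similarity between the hidden states entering and leaving a layer, so a layer that barely changes its input scores near zero; the text above does not state the formula, so treat this definition as an assumption. The function names and the toy hidden states are hypothetical and stand in for activations collected from a real model on a calibration set.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_influence(hidden_in, hidden_out):
    """Assumed definition: 1 - mean per-token cosine similarity
    between a layer's input and output hidden states."""
    sims = [cosine_similarity(x, y) for x, y in zip(hidden_in, hidden_out)]
    return 1.0 - sum(sims) / len(sims)


def layers_to_prune(hidden_states, num_remove):
    """hidden_states[i] holds the token vectors entering layer i;
    the last entry is the output of the final layer.
    Returns the indices of the num_remove lowest-scoring layers
    together with every layer's score."""
    scores = [
        block_influence(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_remove]), scores


# Toy example: 3 layers, 2 tokens, hidden size 2 (made-up values).
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 0 rotates each vector by 90 degrees
    [[0.0, 2.0], [2.0, 0.0]],  # layer 1 only rescales: direction unchanged
    [[1.0, 1.0], [1.0, 1.0]],  # layer 2 rotates each vector by 45 degrees
]

remove, scores = layers_to_prune(toy_states, num_remove=1)
print("scores:", scores)
print("candidate layers to remove:", remove)
```

In this toy case layer 1 only rescales its input, so it gets the lowest score and is the first candidate for removal. A real pipeline would also need to check model quality after removing the selected layers.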