Andrew Fairless, Ph.D.
Entries tagged :: transformer
2025-08-14 :: What I Read: Scale
2025-07-23 :: What I Read: Generative, latent
2025-07-22 :: What I Read: BM25F
2025-06-26 :: What I Read: Recommendation, LLMs
2025-06-18 :: What I Read: Domain specific architectures
2025-05-19 :: What I Read: chatbot limitations
2025-05-08 :: What I Read: short case, Nvidia
2025-04-30 :: What I Read: adaptive LLM
2025-04-29 :: What I Read: tensor dimensions, transformers
2025-04-17 :: What I Read: model merging
2025-03-06 :: What I Read: multimodal LLMs
2025-03-05 :: What I Read: LLMs, school math
2025-02-11 :: What I Read: Mamba, State Space Models
2025-01-27 :: What I Read: Transformers Inference Optimization
2025-01-21 :: What I Read: GenAI, Classify Text
2025-01-06 :: What I Read: embedding models
2024-12-19 :: What I Read: Toy Models, Superposition
2024-12-16 :: What I Watch: How LLMs store facts
2024-10-28 :: What I Read: History, Transformer
2024-10-15 :: What I Read: Improving Language Models, Practical Size
2024-10-09 :: What I Read: Illustrated AlphaFold
2024-09-30 :: What I Read: Sliding Window Attention
2024-08-14 :: What I Read: Transformers by Hand
2024-07-08 :: What I Read: Ring Attention
2024-06-18 :: What I Read: Attention, transformers
2024-06-10 :: What I Read: Mamba Explained
2024-06-03 :: What I Read: Chain-of-Thought Reasoning
2024-05-14 :: What I Read: 1-bit LLMs, 1.58 Bits
2024-05-09 :: What I Read: Mamba, Easy Way
2024-05-01 :: What I Read: Mamba
2024-04-30 :: What I Read: Structured State Space Sequence Models
2024-04-03 :: What I Read: LLM Evaluation Metrics
2024-03-04 :: What I Read: Self-Attention in GPT
2024-02-21 :: What I Read: Research Directions
2024-02-13 :: What I Read: Limits of Transformers on Compositionality
2023-12-20 :: What I Read: Distributed Training, Finetuning
2023-11-30 :: What I Read: Visualizing Matrix Multiplication
2023-11-06 :: What I Read: Optimizing LLM in production
2023-10-30 :: What I Read: LLM Training, RLHF
2023-10-16 :: What I Read: To Understand Transformers, Focus on Attention
2023-10-10 :: What I Read: LLM research
2023-10-05 :: What I Read: Multimodal, Embeddings
2023-09-13 :: What I Read: Attention Off By One
2023-09-07 :: What I Read: LLMs
2023-09-06 :: What I Read: Accelerating PyTorch
2023-07-12 :: What I Read: What, Why ChatGPT
2023-07-11 :: What I Read: In-Context Learning
2023-06-29 :: What I Read: Against LLM
2023-05-30 :: What I Read: Few Shot, Recommenders, LLMs
2023-05-08 :: What I Read: Abilities Emerging From AI
2023-04-10 :: What I Read: Geometric Deep Learning
2023-03-16 :: What I Read: Transformer Inference Optimization
2023-02-27 :: What I Read: Realtime User Actions in Recommendation
2023-01-19 :: What I Read: Transformers Training
2022-11-29 :: What I Read: Illustrated Stable Diffusion
2022-11-16 :: What I Read: Productizing Large Language Models
2022-11-09 :: What I Read: Transformers, Brain
2022-10-06 :: What I Read: Emergent Features
2022-09-12 :: What I Read: BLOOM Training
2022-09-06 :: What I Read: Transformers in computer vision
2022-06-27 :: What I Read: Applying BERT to Speech
2022-05-24 :: What I Read: Understanding, Simple AI
2022-04-26 :: What I Read: Will Transformers Take Over Artificial Intelligence?
2022-04-13 :: What I Read: Textless NLP
2022-02-16 :: What I Read: To Understand Language is to Understand Generalization
2022-02-14 :: What I Read: Interpretable Time Series
2022-01-18 :: What I Learn: Meta-Learning, Keyphrase Extraction
2021-12-14 :: What I Read: Dense Vectors
2021-11-08 :: What I Read: How to Train Large Deep Learning Models
2021-10-26 :: What I Read: How to Train Really Large Models
2021-10-05 :: What I Read: Permutation-Invariant Neural Networks for Reinforcement Learning
2021-09-14 :: What I Read: Systems for Machine Learning
2021-08-31 :: What I Read: Advances in TF-Ranking
2021-08-16 :: What I Read: Geometric, Deep Learning
2021-08-09 :: What I Read: Prompting, Language Models, NLP
2021-07-27 :: What I Read: Better computer vision models, Transformers, CNNs
2021-07-20 :: What I Read: Semantic Search
2021-07-01 :: What I Read: Contrastive Representation Learning
2021-03-29 :: What I Read: Reducing High Cost of Training NLP Models
2021-03-26 :: What I Read: Language Model Fine-tuning
2021-03-13 :: What I Read: Transformer Networks to Answer Questions About Images
2021-03-11 :: What I Read: Neural Text Generation
2021-03-10 :: What I Read: Why I’m lukewarm on graph neural networks
2021-03-09 :: What I Read: How Transformers work
2021-03-05 :: What I Read: Data-efficient image Transformers
2021-02-23 :: What I Read: Introduction to Graph Neural Networks
2021-02-18 :: What I Read: HuggingFace Transformers
2021-02-15 :: What I Read: Revisiting Sutton’s Bitter Lesson for AI
2021-02-07 :: What I Read: Attention with Performers
2021-02-06 :: What I Read: Deep Double Descent: Where Bigger Models and More Data Hurt
2021-01-31 :: What I Read: Transformers for Image Recognition
2021-01-21 :: What I Read: Transformer Architecture
2021-01-05 :: What I Read: Progress of Natural Language Processing
2020-12-28 :: What I Read: GPT-3, The New Mighty Language Model
2020-12-24 :: What I Read: Neural Networks to Find Answers in Tables
2020-12-23 :: What I Read: Common Sense Computers
2020-12-17 :: What I Read: Transformers Graph Neural Networks
2020-12-10 :: What I Read: Reformer efficient Transformer