Mac mini (M4), GMKtec EVO-X2 & Tiiny AI Pocket Lab: local LLMs (30B / 70B / 120B)

ANIM · April 21, 2026 · 9 min read

This article compares three separate products people discuss for local LLMs:

  1. Apple Mac mini (2024): M4 or M4 Pro; desktop Mac, macOS, unified memory.
  2. GMKtec EVO-X2 AI Mini PC: AMD Ryzen AI Max+ 395 (Strix Halo class), factory Windows 11, up to 128 GB soldered RAM.
  3. Tiiny AI Pocket Lab — US startup Tiiny AI (spelled Tiiny, double i); very compact hardware, CES 2026, Kickstarter (estimated August 2026 delivery). It is not Tiny Corp (tinybox / tinygrad) and not a Mac mini substitute.

Alongside hardware and community-reported speeds, we checked whether exo can realistically pool capacity today — based on what upstream actually documents.

Three different trade-offs

Mac mini is sold as a small desktop, but for AI it is really the macOS + unified memory story — convenient for Ollama or MLX, usually quieter than many PCs, with a hard 64 GB ceiling on M4 Pro in this generation.

EVO-X2 is the opposite bet: maximum soldered RAM in a mini PC plus a strong iGPU, but you live in the Windows / Linux / driver matrix; many benchmarks you read are Ubuntu + Ollama, not the same experience “out of the box” on Windows.

Tiiny is the third wave — portability plus “big models in a small shell.” That can be compelling for travel or demos, but crowdfunding plus aggressive marketing ranges mean: exciting for early adopters, riskier as your only production machine.

Model size is not the same as “GB on the spec sheet”

Headline parameter counts (30B, 70B…) depend on architecture (dense vs. MoE), quantization, context (KV cache), and software. The tables below mix vendor specs with numbers other people published — always validate against your model and prompts.
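
To make that concrete, here is a hedged back-of-envelope. The ~4.5 bits/weight figure for Q4_K_M-class quants and the 10% runtime overhead are rough conventions, and the 70B model shape below (80 layers, 8 KV heads of dim 128, GQA) is an assumed Llama-3-70B-like geometry, not vendor data:

```python
def model_ram_gb(params_b, bits_per_weight=4.5, overhead=1.1):
    """Weight memory for a quantized model. Q4_K_M averages roughly
    4.5 bits/weight; overhead covers runtime buffers (both rules of thumb)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    """KV cache: 2 tensors (K and V) per layer, fp16 elements by default."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9

# A 70B-class dense model with an assumed Llama-3-70B-like shape at 8k context:
print(f"weights  ~ {model_ram_gb(70):.1f} GB")                  # ~43 GB
print(f"KV cache ~ {kv_cache_gb(80, 8, 128, 8192):.1f} GB")     # ~2.7 GB
```

That ~46 GB total is why 70B Q4 is out of reach on a 32 GB Mac mini but plausible on a 64 GB M4 Pro, and why longer contexts quietly eat the remaining headroom.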

Official or publicly stated hardware

| Item | Mac mini (M4) | Mac mini (M4 Pro) | GMKtec EVO-X2 (Ryzen AI Max+ 395) | Tiiny AI Pocket Lab |
|---|---|---|---|---|
| Form factor | Desktop mini | Desktop mini | Mini PC (GMKtec: ~193 × 185.8 × 77 mm) | Pocket / ultra-compact (press often cites ~14.2 × 8 × 2.5 cm, ~300 g; e.g. Geeky Gadgets) |
| CPU | 10-core (4P + 6E) | 12-core (8P + 4E), up to 14-core CPU / 20-core GPU | 16 cores / 32 threads, up to 5.1 GHz (GMKtec) | 12-core ARMv9.2 (campaign / spec summaries) |
| GPU / acceleration | 10-core GPU | 16–20-core GPU | Radeon 8060S, 40 RDNA 3.5 CUs (GMKtec) | Emphasis on NPU / integrated AI; materials cite on the order of ~190 TOPS (verify INT8 vs. FP definitions) |
| Neural / NPU | 16-core Neural Engine | 16-core Neural Engine | NPU up to 50 TOPS (XDNA 2), up to 126 TOPS SoC claim (GMKtec) | Similar ~190 TOPS order of magnitude in vendor materials |
| Memory (max) | 16 → 32 GB unified | 24 → 64 GB unified | 64 / 96 / 128 GB LPDDR5X-8000, soldered | 80 GB LPDDR5X + 1 TB NVMe (Kickstarter, Geeky Gadgets) |
| Memory bandwidth | 120 GB/s (Apple) | Up to 273 GB/s (Apple) | LPDDR5X-8000 (GMKtec; Strix Halo's 256-bit bus implies ~256 GB/s theoretical) | No detailed theoretical bandwidth published; confirm in official docs when available |
| OS | macOS | macOS | Windows 11 Pro (GMKtec) | Often macOS / Windows as host alongside the device; confirm workflow before buying |

Sources: Apple Support — Mac mini (2024), GMKtec — EVO-X2, Micro Center — EVO X2, Kickstarter — Tiiny, tiiny.ai.

Pricing we can cite

| Device | Price / source |
|---|---|
| Mac mini | Apple’s Oct 29, 2024 press release: from $599 (M4, 16 GB) and from $1,399 (M4 Pro). Local EUR: your country’s Apple Store. |
| GMKtec EVO-X2 | GMKtec’s site has shown ~$1,999.99 for 64 GB + 1 TB; verify live 96/128 GB SKUs. TechPowerUp (April 2026) discusses pricing shifts for top configs. |
| Tiiny | Kickstarter: $1,999 MSRP, with tiers such as $1,399 / $1,599 / $1,799; estimated Aug 2026 delivery. See the tiiny.ai FAQ for deposit and refund terms. |

A Kickstarter reward is not the same as retail shelf stock; schedules and details can move.

Tokens per second (tok/s) — what the community actually measures

The same machine can report different tok/s if you only change Ollama version, MLX vs Metal path, quant, or context length. The matrix below compiles other people’s measurements, not an ANIM lab run — use it as directional signal, not a guarantee.

| Workload (example) | Mac mini M4 (16–24 GB in sources) | Mac mini M4 Pro (64 GB) | EVO-X2 (64–128 GB; often Ubuntu + Ollama) | Tiiny (pre-release / demo) |
|---|---|---|---|---|
| ~7–8B Q4 | ~18–30 tok/s (vminstall, CraftRigs; YouTube Tech-Practice ~20 tok/s for Qwen2.5-Coder 7B) | Higher bandwidth than base M4, so usually faster on the same model | ~28–45 tok/s (CraftRigs, 128 GB review) | YouTube ~26.8 tok/s in one CLI run; wider ~18–40 band in press (AOL) |
| ~14B Q4 | ~18–22 tok/s (CraftRigs) | More unified RAM headroom | ~18–22 tok/s (CraftRigs) | n/a |
| ~30–32B Q4 | 16 GB often too tight; 32 GB marginal | ~10–16 tok/s (vminstall, Like2Byte) | ~7–12 tok/s (CraftRigs) | YouTube ~19.6 tok/s on one “Qwen” bench row (transcript says “330B”, likely a verbal slip) |
| ~70B Q4_K_M | Not practical at the 32 GB max | ~3–5 tok/s (r/LocalLLaMA, M4 Pro 64 GB) | ~4–8 tok/s (CraftRigs, 128 GB, Ubuntu) | n/a |
| gpt-oss ~20B (MoE) | Rarely quoted in the same sources | Rarely quoted in the same sources | ~33–65 tok/s depending on run (Nish Tahir) | YouTube ~22.4 tok/s |
| gpt-oss ~120B (MoE) | Not the natural Mac mini tier | Not the natural Mac mini tier | Nish Tahir: wide spread by tool and context | YouTube ~12.4 tok/s average in bench UI; long context in the table drags speed |
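
Given this spread, the cheapest sanity check is measuring on your own prompts. Ollama’s `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), so decode tok/s needs no extra tooling; the model name and host below are placeholders for whatever you run locally:

```python
import json
import urllib.request

def generate_stats(prompt, model="qwen2.5:7b", host="http://localhost:11434"):
    """One non-streaming Ollama generation; returns the full stats payload."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def decode_tok_s(stats):
    """Decode speed only (prompt processing is reported separately)."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

# Usage (needs a running Ollama with the model pulled):
# print(decode_tok_s(generate_stats("Summarize KV caching in one sentence.")))
```

Run it a few times with your real context lengths; the first call also pays model-load time, which is why single-run YouTube numbers scatter so much.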

Go deeper: Tech-Practice — M4 + Ollama, M1 / M3 Pro / M4, Ollamometer + Strix Halo, ETA Prime — EVO-X2, Tiiny hands-on; forums: r/LocalLLaMA, Hacker News — Ollama vs LM Studio; blogs: vminstall, Like2Byte, CraftRigs, Tom’s Hardware. For Tiiny architecture questions: remio.ai.

Editorial take: vendor claims vs. what you feel

Apple does not publish official LLM tok/s for Mac mini — which is honest in its own way: fewer fake-precision catalog numbers, more dependence on RAM tier and backend choice. GMKtec’s product page mixes TOPS, LM Studio comparisons, and per-SKU model lists — useful as what the company wants you to read, but purchase decisions should still lean on third-party tests and your own prompts.

GMKtec’s “LLM support” table pairs e.g. 32B with 64 GB, GPT-OSS 120B with 96 GB, 70B with 128 GB (plus other names as on their page). Those are marketing pairings, not guarantees of speed or answer quality.

Tiiny’s 120B messaging almost always implies MoE / a specialized stack (TurboSparse, PowerInfer in their materials). The question is not whether the startup is “right” — it is that “120B on the box” ≠ the same engineering problem as a dense 120B in FP16.
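
A rough bound makes the dense-vs-MoE gap concrete. Assume decode is purely memory-bandwidth-bound and Q4-class quants average ~4.5 bits/weight (both simplifications that ignore compute, KV reads, and expert caching); gpt-oss-120b reportedly activates only ~5B of its parameters per token:

```python
def bandwidth_bound_tok_s(active_params_b, mem_bw_gbs, bits_per_weight=4.5):
    """Upper bound on decode tok/s if streaming the active weights from RAM
    were the only bottleneck."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return mem_bw_gbs / (bytes_per_token / 1e9)

# On a 273 GB/s machine (the M4 Pro figure above), at Q4-class quantization:
print(f"dense 120B:      <= {bandwidth_bound_tok_s(120, 273):.1f} tok/s")  # ~4
print(f"MoE, ~5B active: <= {bandwidth_bound_tok_s(5, 273):.1f} tok/s")    # ~97
```

Real numbers land well below both ceilings, but the ratio is the point: a sparse “120B on the box” can decode an order of magnitude faster than a dense 120B on the same RAM.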

Which machine for 30B, 70B, and 120B?

~30B: M4 Pro 64 GB is a reasonable Apple pick; M4 32 GB is marginal. EVO-X2’s 64 GB SKU is formally aligned with 32B-class claims; 96/128 GB adds KV headroom. Tiiny (80 GB) makes sense if pocket + experimentation matters more than predictable desktop ergonomics.

~70B: M4 (32 GB max) is not comfortable for Q4 70B. M4 Pro (64 GB) enters “maybe, model-dependent, patience required.” EVO-X2 128 GB has the most soldered RAM in this trio; GMKtec ties 70B to that SKU. Tiiny — still too little stable public evidence to rank beside the other two for production 70B.

~120B: Mac mini is not the natural home for dense 120B. EVO-X2 96 GB+ lists GPT-OSS 120B — a specific model family, not every “120B” card on the internet. Tiiny: same story — read exact model + quant, not the slogan.

Exo clustering

exo joins multiple machines for inference (MLX on Apple Silicon, peer discovery; the README also mentions RDMA over Thunderbolt 5 between Mac nodes). It is not a drop-in replacement for single-machine Ollama if the model already fits — Exo shines when you shard what would not fit in one box.

| Device | Exo today (per README) |
|---|---|
| Mac mini, macOS | Yes: the primary documented path with MLX / Metal. For RDMA clusters, the README expects nodes to be fully connected; plan Thunderbolt / wired networking. |
| EVO-X2, Windows | Do not assume official Exo support; Windows is not listed as a first-class platform in the README. |
| EVO-X2, Linux | Limited: the README states Exo on Linux is CPU-only today, with GPU support under development, so the Strix Halo iGPU does not accelerate Exo on Linux until that lands. |
| Tiiny | Unknown: not in Exo docs. Without vendor/upstream confirmation, do not plan production Exo nodes on it. |

Heterogeneous Mac + PC: backends must align; this DEV article illustrates how painful MLX CUDA ring setups can get. If Exo is not a fit, llama.cpp RPC is a different architecture — not the same thing.

Common follow-ups about Exo

Windows and Exo? The README documents macOS and Linux; for Windows you typically look at another stack or dual-boot.

Is Exo always faster than one Mac mini? No — network latency and orchestration overhead cost you throughput. If the model fits locally, one machine is often nicer for chat.

Do two Mac minis “replace” one 128 GB unified-memory Mac? Exo shards layers across nodes; it does not create one continuous unified RAM pool like a single M4 Max / Studio-class machine.

Networking? README + practice: wired (Gigabit / 10GbE) or Thunderbolt; Wi‑Fi is a poor default for decode latency.
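
On the sharding point above: the exo README describes a memory-weighted ring partitioning strategy. A toy reimplementation (illustrative only, not exo’s actual code) shows why unequal nodes end up owning unequal layer ranges rather than one pooled RAM space:

```python
def partition_layers(n_layers, node_mem_gb):
    """Split n_layers across nodes proportionally to each node's memory,
    in the spirit of exo's memory-weighted ring partitioning."""
    total = sum(node_mem_gb)
    cuts, start, acc = [], 0, 0.0
    for i, mem in enumerate(node_mem_gb):
        acc += mem
        # Last node takes whatever remains so every layer is assigned.
        end = n_layers if i == len(node_mem_gb) - 1 else round(n_layers * acc / total)
        cuts.append((start, end))  # node i owns layers [start, end)
        start = end
    return cuts

# Two Mac minis (64 GB + 16 GB) sharding an 80-layer model:
print(partition_layers(80, [64, 16]))  # → [(0, 64), (64, 80)]
```

Each decoded token still traverses every node in the ring, which is why wired links matter and why two small boxes do not behave like one big unified-memory machine.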

Bottom line

  • Want quiet macOS and a predictable stack? Mac mini; for serious LLM work inside Apple’s world, bias toward M4 Pro with as much unified memory as you can afford.
  • Want maximum soldered RAM in a Windows mini PC? EVO-X2, with eyes open that drivers and OS are part of the entry price.
  • Want portability and early hardware access? Tiiny, with Kickstarter risk and the need to re-verify claims once units reach buyers.

Methodology: ANIM did not independently lab-bench these systems. This article combines Apple and GMKtec documentation, Kickstarter / tiiny.ai materials, public benchmarks (YouTube, Reddit, blogs), the exo README, and the analyses linked above. Always verify live prices, tax/duty, and warranty with the seller.

Tags: local LLM, Mac mini, GMKtec, Tiiny AI, Exo, AI cluster, tokens per second, benchmark, Apple Silicon, Strix Halo, Kickstarter, hardware

Need help with this topic?

ANIM offers free assessments for small and medium businesses. Get in touch and let's discuss your needs.

Free assessment