<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        <title><![CDATA[Alex Dremov]]></title>
        <description><![CDATA[I write about AI research, code, and algorithms]]></description>
        <link>https://alexdremov.me</link>
        <image>
            <url>https://alexdremov.me/favicon.png</url>
            <title>Alex Dremov</title>
            <link>https://alexdremov.me</link>
        </image>
        <lastBuildDate>Thu, 21 May 2026 12:52:52 +0200</lastBuildDate>
        <atom:link href="https://alexdremov.me" rel="self" type="application/rss+xml"/>
        <ttl>60</ttl>

                <item turbo="true">
                    <title><![CDATA[ Rethinking Quantization-Aware Training: Why Your QAT Length is Probably Wrong ]]></title>
                    <description><![CDATA[ Training quantized neural networks involves a fundamental trade-off: how should you divide your compute budget between full-precision pretraining and quantization-aware training? ]]></description>
                    <link>https://alexdremov.me/rethinking-quantization-aware-training-why-your-qat-length-is-probably-wrong/</link>
                    <guid isPermaLink="false">690348816c1b7b00215d6dd0</guid>
                    <category><![CDATA[ Machine Learning ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Thu, 30 Oct 2025 21:37:42 +0100</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2025/10/qat-optimality.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>Training quantized neural networks typically involves two phases: full-precision (FP) pretraining followed by quantization-aware training (QAT). The conventional approach allocates about 10% of the training budget to QAT. But recent research at Apple shows this ratio is far from optimal, especially at scale.</p><p>In extreme cases, using the wrong QAT fraction can waste up to 50% of your compute budget. And which QAT bit-width should you pick given a fixed memory budget? Here's what we found after running ~800 experiments across different model sizes and training lengths.</p><h2 id="the-resource-allocation-problem">The Resource Allocation Problem</h2><p>When training QAT models, you face a fundamental trade-off: given a fixed compute budget, how should you divide training time between full-precision pretraining and quantization-aware training?</p><p>More FP training gives you a better starting checkpoint. More QAT training gives the model more time to adapt to quantization. Previous work <a href="https://arxiv.org/abs/2502.02631?ref=alexdremov.me" rel="noreferrer">(Liu et al., 2025)</a> suggested 10% QAT was optimal but didn't explore how this changes with scale.</p><h2 id="key-observations">Key Observations</h2><p>We trained models from 86M to 2.2B parameters across token counts ranging from billions to trillions, testing 1-bit through 6-bit QAT to see how performance changes.</p><h3 id="the-optimal-qat-fraction-increases-with-scale">The Optimal QAT Fraction Increases With Scale</h3><p>We found that the optimal QAT fraction isn't fixed at 10%. It grows with your total compute budget, ranging from 10-15% for small-scale training to 55% or even more for large-scale training.</p><p><strong>The intuition.</strong> Longer full-precision training packs more and more information into the high-precision bits, making subsequent quantization harder. 
Therefore, the model needs more QAT steps to adapt to the precision loss. In fact, it needs not just proportionally more steps: the QAT fraction itself starts to grow.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Another intuition comes from the <b><strong style="white-space: pre-wrap;">optimization perspective</strong></b>: QAT uses gradient approximations, which negatively impact convergence. Therefore, we want to have as few QAT steps as possible to not waste compute on a sub-optimal optimization process.</div></div><h2 id="predicting-optimal-fractions-from-tokens-per-parameter-byte">Predicting Optimal Fractions From Tokens-Per-Parameter-Byte</h2><p>The optimal QAT fraction can be predicted using the tokens-per-parameter-byte statistic:</p><p>$$S_{\text{total}} = \frac{D_{\text{total}}}{N \cdot \frac{B}{8}},$$</p><p>where \(D_{\text{total}}\) is the total number of tokens, \(N\) is the parameter count, and \(B\) is the QAT bit-width. This metric captures several key insights:</p><ul><li>Larger models are easier to quantize (higher \(N\) → lower \(S_{\text{total}}\))</li><li>Models trained longer are harder to quantize (higher \(D_{\text{total}}\) → higher \(S_{\text{total}}\))</li><li>Lower bit-widths are harder to quantize (lower \(B\) → higher \(S_{\text{total}}\))</li></ul><p>A simple predictor achieves a low mean absolute error in predicting optimal QAT fractions across all experiments:</p>
<!--kg-card-begin: html-->
$$\widehat{f}(D_\text{total}, N, B) = \frac{\exp\left(\log{S_\text{total}} - \frac{6.7297}{\log{S_\text{total}}}\right)}{S_\text{total}}.$$
<!--kg-card-end: html-->
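<p>As a rough illustration, the predictor above can be evaluated in a few lines of JavaScript. This is a minimal sketch, not the code used in the experiments: the function names are hypothetical, and it assumes that \(\log\) denotes the natural logarithm and that \(S_{\text{total}} &gt; 1\). Note that the expression simplifies algebraically to \(\exp(-6.7297 / \log S_{\text{total}})\).</p>

```javascript
// Sketch only: helper names are hypothetical; natural logarithms are an assumption.
const PREDICTOR_CONSTANT = 6.7297; // fitted constant taken from the formula above

// Tokens-per-parameter-byte: S = D / (N * B / 8).
function tokensPerParameterByte(tokens, paramCount, bitWidth) {
  return tokens / (paramCount * bitWidth / 8);
}

// Predicted optimal QAT fraction.
// exp(log S - c / log S) / S simplifies to exp(-c / log S).
function predictedQatFraction(totalTokens, paramCount, bitWidth) {
  const s = tokensPerParameterByte(totalTokens, paramCount, bitWidth);
  if (!(s > 1)) {
    // log S must be positive for the result to lie in (0, 1).
    throw new RangeError('S_total must be greater than 1');
  }
  return Math.exp(-PREDICTOR_CONSTANT / Math.log(s));
}

// Example: 1B parameters, 4-bit QAT, 100B total tokens.
const fraction = predictedQatFraction(100e9, 1e9, 4);
console.log(`Predicted QAT fraction: ${(fraction * 100).toFixed(1)}%`);
```

<p>Because the result depends only on \(S_{\text{total}}\), doubling the token budget has the same effect on the predicted fraction as halving the parameter count or the bit-width.</p>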
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-17.49.53.png" class="kg-image" alt="" loading="lazy" width="1708" height="1176" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-17.49.53.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-17.49.53.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/10/Screenshot-2025-10-30-at-17.49.53.png 1600w, https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-17.49.53.png 1708w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">QAT optima for the 396M model plotted in tokens-per-parameter-byte coordinates for different bit-widths</span></figcaption></figure><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">While this formula performs well, it is fitted only on <b><strong style="white-space: pre-wrap;">optimal QAT data points.</strong></b> This ignores many non-optimal data points, which also contain useful information about loss behavior.<br><br>To capture the full information, we can try predicting the loss directly.</div></div><h2 id="loss-scaling-law">Loss Scaling Law</h2><p>We therefore derived a comprehensive loss scaling law that models the final loss as a function of parameter count (\(N\)), full-precision tokens (\(D_{\text{fp}}\)), QAT tokens (\(D_{\text{qat}}\)), and bit-width (\(B\)). It not only predicts the final model's performance but also captures the observed behavior of the optimal QAT fraction:</p>
<!--kg-card-begin: html-->
$$L(N, D_\text{qat}, D_\text{fp}, B) = \underbrace{
  \alpha + \frac{\beta}{D_{\text{total}}^{\gamma}} + \frac{\zeta}{N^{\eta}}
}_{
  \text{Chinchilla-like loss}
}
+
\underbrace{
    \delta(N, D_\text{qat}, D_\text{fp}, B)
}_{
  \text{QAT fraction-aware penalty}
},$$
<!--kg-card-end: html-->

<!--kg-card-begin: html-->
$$\delta(N, D_\text{qat}, D_\text{fp}, B) =
\underbrace{
  \theta \cdot 2^{- \kappa \cdot B}}_{
    \text{Irreducible QAT error}
} +
\underbrace{
  \frac{\phi \cdot 2^{- \chi \cdot B}}{N^{\psi} \cdot S_{\text{qat}}^{\omega}}}_{
    \text{Pure QAT penalty}
}
+ \underbrace{
  \frac{\lambda \cdot 2^{- \mu \cdot B}}{N^{\nu} \cdot S_{\text{fp}}^{\xi} \cdot S_{\text{qat}}^{\rho}}
}_{
  \text{FP / QAT interaction}
}.$$
<!--kg-card-end: html-->
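<p>The two equations above can be evaluated directly, and the best FP / QAT split for a fixed token budget can then be found by scanning candidate fractions. The sketch below is illustrative rather than the fitting code: the coefficient values are copied from the interactive plot's script on this page, \(S_{\text{qat}}\) and \(S_{\text{fp}}\) are taken to be \(D_{\text{qat}}\) and \(D_{\text{fp}}\) divided by \(N \cdot B / 8\) as in that script, and the function names and the simple grid scan are assumptions made for the example.</p>

```javascript
// Sketch only: coefficients are copied from the interactive plot's script on this page;
// the names and the grid-scan strategy are illustrative assumptions.
const COEFFS = {
  alpha: 1.598, beta: 2477.0, gamma: 0.4089, zeta: 57.64, eta: 0.2148,
  theta: 0.4297, kappa: 1.41,
  phi: 1091.0, chi: 1.212, psi: 0.4004, omega: 0.076,
  lambda: 138.8, mu: 0.0833, nu: 0.2135, xi: 0.4819, rho: 0.1903,
};

// L(N, D_qat, D_fp, B) = Chinchilla-like loss + QAT fraction-aware penalty.
function scalingLawLoss(dQat, dFp, n, b) {
  const c = COEFFS;
  const paramBytes = n * b / 8;
  const sQat = dQat / paramBytes; // QAT tokens per parameter byte
  const sFp = dFp / paramBytes;   // FP tokens per parameter byte
  const chinchilla = c.alpha + c.beta / Math.pow(dQat + dFp, c.gamma) +
    c.zeta / Math.pow(n, c.eta);
  const irreducible = c.theta * Math.pow(2, -c.kappa * b);
  const pureQat = c.phi * Math.pow(2, -c.chi * b) /
    (Math.pow(n, c.psi) * Math.pow(sQat, c.omega));
  const interaction = c.lambda * Math.pow(2, -c.mu * b) /
    (Math.pow(n, c.nu) * Math.pow(sFp, c.xi) * Math.pow(sQat, c.rho));
  return chinchilla + irreducible + pureQat + interaction;
}

// Scan QAT fractions at a fixed D_total and keep the one with the lowest loss.
function optimalQatFraction(dTotal, n, b, steps = 999) {
  let best = { fraction: NaN, loss: Infinity };
  for (let i = 1; i <= steps; i++) {
    const fraction = i / (steps + 1);
    const loss = scalingLawLoss(fraction * dTotal, (1 - fraction) * dTotal, n, b);
    if (loss < best.loss) best = { fraction, loss };
  }
  return best;
}

// Example: 1B parameters, 4-bit QAT, 1T total tokens.
const result = optimalQatFraction(1e12, 1e9, 4);
console.log(result.fraction.toFixed(3), result.loss.toFixed(4));
```

<p>A plain scan is used here instead of a faster one-dimensional optimizer because it makes no assumption about the shape of the loss along the constant-budget line. The default grid resolves the fraction to 0.1%.</p>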
<p>Here \(D_{\text{total}} = D_{\text{qat}} + D_{\text{fp}}\), and \(S_{\text{qat}}\) and \(S_{\text{fp}}\) are the tokens-per-parameter-byte statistics computed from \(D_{\text{qat}}\) and \(D_{\text{fp}}\), respectively. The QAT penalty term includes:</p><ul><li><strong>Irreducible QAT error</strong>: Baseline penalty dependent on bit-width</li><li><strong>Pure QAT penalty</strong>: Loss that decreases with more QAT training</li><li><strong>FP/QAT interaction</strong>: Captures how FP training length affects QAT difficulty</li></ul><p>The scaling law achieves \(R^2\) between 0.982 and 0.991 across different bit-widths. Moreover, we can infer the optimal QAT fraction for a given compute budget by minimizing the loss subject to \(D_\text{qat} + D_\text{fp} = \text{const}\). Here is how the loss surface looks:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-18.49.34.png" class="kg-image" alt="" loading="lazy" width="1418" height="1210" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-18.49.34.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-18.49.34.png 1000w, https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-18.49.34.png 1418w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Visualization of the fitted loss scaling law for the 759M model, 1-bit QAT, and different \(D_\text{qat}\), \(D_\text{fp}\). Orange lines represent constant \(D_\text{total} = D_\text{qat} + D_\text{fp}\) levels, and stars represent the loss minimum on each such level. The loss structure clearly yields an optimal QAT fraction for each \(D_\text{total}\).</span></figcaption></figure><p>You can try exploring the scaling law through the following interactive plot:</p>
<!--kg-card-begin: html-->
<section class="loss-plot toc-ignore">
<div class="container">
  <div id="canvas-container"></div>
  
  <div class="controls">
        
      <div class="control-group">
          <h3 style="margin-bottom: 15px; color: var(--color-accent); font-size: 16px;">Dfp Range</h3>
          <div class="control-label">
              <span class="control-name">Min - Max</span>
              <span class="control-value" id="dfp-range-value">100B - 10T</span>
          </div>
          <div style="margin-bottom: 8px;">
              <label style="font-size: 12px; color: var(--color-text-secondary);">Min:</label>
              <input type="range" id="dfp-min-slider" min="0" max="100" value="20" step="1">
          </div>
          <div>
              <label style="font-size: 12px; color: var(--color-text-secondary);">Max:</label>
              <input type="range" id="dfp-max-slider" min="0" max="100" value="90" step="1">
          </div>
      </div>
      
      <div class="control-group">
          <h3 style="margin-bottom: 15px; color: var(--color-accent); font-size: 16px;">Dqat Range</h3>
          <div class="control-label">
              <span class="control-name">Min - Max</span>
              <span class="control-value" id="dqat-range-value">100B - 10T</span>
          </div>
          <div style="margin-bottom: 8px;">
              <label style="font-size: 12px; color: var(--color-text-secondary);">Min:</label>
              <input type="range" id="dqat-min-slider" min="0" max="100" value="20" step="1">
          </div>
          <div>
              <label style="font-size: 12px; color: var(--color-text-secondary);">Max:</label>
              <input type="range" id="dqat-max-slider" min="0" max="100" value="90" step="1">
          </div>
      </div>
      <div class="control-group" style="grid-column: 1 / -1;">
          <h3 style="margin-bottom: 15px; color: var(--color-accent); font-size: 16px;">Model Parameters</h3>
          <div class="control-label">
              <span class="control-name">N (Parameters)</span>
              <span class="control-value" id="n-value">1.00B</span>
          </div>
          <input type="range" id="n-slider" min="0" max="100" value="25" step="1">
          
          <div class="control-label" style="margin-top: 15px;">
              <span class="control-name">B (Bit-width)</span>
              <span class="control-value" id="b-value">2</span>
          </div>
          <input type="range" id="b-slider" min="1" max="8" value="2" step="1">
      </div>
  </div>
  
  <p class="info">Drag to rotate • Scroll to zoom • Adjust sliders to explore the scaling law</p>
</div>
</section>

<script src="https://cdn.jsdelivr.net/npm/three@0.128.0/build/three.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/three@0.128.0/examples/js/controls/OrbitControls.js"></script>
<script>
  // Scaling law formula coefficients
  const CONSTANTS = {
      constant: 1.598,
      dtotal_coeff: 2477.0,
      dtotal_power: 0.4089,
      n_coeff: 57.64,
      n_power: 0.2148,
      b_const_coeff: 0.4297,
      b_const_power: -1.41,
      b_sqat_coeff: 1091.0,
      b_sqat_power: -1.212,
      n_sqat_power: 0.4004,
      sqat_power1: 0.076,
      b_final_coeff: 138.8,
      b_final_power: -0.0833,
      n_final_power: 0.2135,
      sfp_power: 0.4819,
      sqat_power2: 0.1903
  };

  // Parameter ranges
  const N_MIN = 100e6;
  const N_MAX = 1e12;
  const B_MIN = 1;
  const B_MAX = 8;
  
  // Compute token ranges
  const COMPUTE_MIN = 1e9;  // 1B
  const COMPUTE_MAX = 20e12;  // 20T
  const DFP_DEFAULT_MIN = 100e9;  // 100B
  const DFP_DEFAULT_MAX = 10e12;  // 10T
  const DQAT_DEFAULT_MIN = 100e9;  // 100B
  const DQAT_DEFAULT_MAX = 10e12;  // 10T
  
  // Surface resolution
  const GRID_POINTS = 50;

  // Three.js setup
  let scene, camera, renderer, controls;
  let surfaceMesh, gridHelper;
  let minLoss = Infinity;
  let maxLoss = -Infinity;
  let axisTicksGroup = new THREE.Group();
  let isoTokenLinesGroup = new THREE.Group();
  let isoTokenOptimalPointsGroup = new THREE.Group();
  let optimalPointLabelsGroup = new THREE.Group();

  function init() {
      const container = document.getElementById('canvas-container');
      
      // Scene
      scene = new THREE.Scene();
      // scene.background = new THREE.Color(0x0a0e14);
      
      // Camera
      camera = new THREE.PerspectiveCamera(
          45,
          container.clientWidth / container.clientHeight,
          0.1,
          1000
      );
      camera.position.set(40, 35, 40);
      
      // Renderer
      renderer = new THREE.WebGLRenderer(
        { antialias: true, alpha: true}
      );
      renderer.setSize(container.clientWidth, container.clientHeight);
      renderer.setPixelRatio(window.devicePixelRatio);
      renderer.setClearColor(0x000000, 0);
      container.appendChild(renderer.domElement);
      
      // Controls
      controls = new THREE.OrbitControls(camera, renderer.domElement);
      controls.enableDamping = true;
      controls.dampingFactor = 0.05;
      
      // Lights
      const ambientLight = new THREE.AmbientLight(0xffffff, 0.6);
      scene.add(ambientLight);
      
      const directionalLight = new THREE.DirectionalLight(0xffffff, 0.8);
      directionalLight.position.set(10, 10, 10);
      scene.add(directionalLight);
      
      // Grid helper
      gridHelper = new THREE.GridHelper(40, 20, 0x32b8c6, 0x1a1a1a);
      gridHelper.position.y = 0;
      scene.add(gridHelper);
      
      // Add axes
      addAxes();
      
      // Add ticks group to scene
      scene.add(axisTicksGroup);
      
      // Add iso-token groups to scene
      scene.add(isoTokenLinesGroup);
      scene.add(isoTokenOptimalPointsGroup);
      scene.add(optimalPointLabelsGroup);
      
      // Create initial surface
      updateSurface();
      
      // Handle window resize
      window.addEventListener('resize', onWindowResize);
      
      // Animation loop
      animate();
  }

  function addAxes() {
      const axisLength = 25;
      const origin = new THREE.Vector3(-20, 0, -20);
      
      // X axis (Dfp): white arrow; its tick labels are drawn in red
      const xAxis = new THREE.ArrowHelper(
          new THREE.Vector3(1, 0, 0),
          origin,
          axisLength,
          0xffffff,
          2,
          1
      );
      scene.add(xAxis);
      
      // Y axis (perplexity = exp(loss)): white arrow; its tick labels are drawn in blue
      const yAxis = new THREE.ArrowHelper(
          new THREE.Vector3(0, 1, 0),
          origin,
          axisLength,
          0xffffff,
          2,
          1
      );
      scene.add(yAxis);
      
      // Z axis (Dqat): white arrow; its tick labels are drawn in green
      const zAxis = new THREE.ArrowHelper(
          new THREE.Vector3(0, 0, 1),
          origin,
          axisLength,
          0xffffff,
          2,
          1
      );
      scene.add(zAxis);
      
      // Add axis labels
      addAxisLabel('Dfp', new THREE.Vector3(-20 + axisLength + 3, 0, -20), 0xffffff);
      addAxisLabel('Perplexity', new THREE.Vector3(-20, axisLength + 3, -20), 0xffffff);
      addAxisLabel('Dqat', new THREE.Vector3(-20, 0, -20 + axisLength + 3), 0xffffff);
  }
  
  function addAxisLabel(text, position, color) {
      // Create canvas for text
      const canvas = document.createElement('canvas');
      const context = canvas.getContext('2d');
      canvas.width = 256;
      canvas.height = 128;
      
      // Draw text
      context.fillStyle = '#' + color.toString(16).padStart(6, '0');
      context.font = 'Bold 48px Arial';
      context.textAlign = 'center';
      context.textBaseline = 'middle';
      context.fillText(text, 128, 64);
      
      // Create texture
      const texture = new THREE.CanvasTexture(canvas);
      
      // Create sprite material
      const material = new THREE.SpriteMaterial({ 
          map: texture,
          transparent: true
      });
      
      // Create sprite
      const sprite = new THREE.Sprite(material);
      sprite.position.copy(position);
      sprite.scale.set(6, 3, 1);
      
      scene.add(sprite);
  }

  function removeAxisTicks() {
      // Remove all existing ticks and labels
      while(axisTicksGroup.children.length > 0) {
          const child = axisTicksGroup.children[0];
          axisTicksGroup.remove(child);
          if (child.geometry) child.geometry.dispose();
          if (child.material) {
              if (child.material.map) child.material.map.dispose();
              child.material.dispose();
          }
      }
  }

  function createAxisTicks(axisDirection, values, labels, axisColor, origin, perpDir1, perpDir2) {
      const tickLength = 0.8;
      
      for (let i = 0; i < values.length; i++) {
          const t = values[i];
          const label = labels[i];
          
          // Calculate position along axis
          let tickPos = origin.clone();
          if (axisDirection === 'x') {
              tickPos.x += t;
          } else if (axisDirection === 'y') {
              tickPos.y += t;
          } else if (axisDirection === 'z') {
              tickPos.z += t;
          }
          
          // Create tick mark (small line perpendicular to axis)
          const tickGeometry = new THREE.BufferGeometry();
          const tickStart = tickPos.clone().add(perpDir1.clone().multiplyScalar(-tickLength/2));
          const tickEnd = tickPos.clone().add(perpDir1.clone().multiplyScalar(tickLength/2));
          tickGeometry.setAttribute('position', new THREE.Float32BufferAttribute([
              tickStart.x, tickStart.y, tickStart.z,
              tickEnd.x, tickEnd.y, tickEnd.z
          ], 3));
          
          const tickMaterial = new THREE.LineBasicMaterial({ color: axisColor });
          const tickLine = new THREE.Line(tickGeometry, tickMaterial);
          axisTicksGroup.add(tickLine);
          
          // Create label
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.width = 256;
          canvas.height = 128;
          
          context.fillStyle = '#' + axisColor.toString(16).padStart(6, '0');
          context.font = 'Bold 40px Arial';
          context.textAlign = 'center';
          context.textBaseline = 'middle';
          context.fillText(label, 128, 64);
          
          const texture = new THREE.CanvasTexture(canvas);
          const spriteMaterial = new THREE.SpriteMaterial({ 
              map: texture,
              transparent: true
          });
          
          const sprite = new THREE.Sprite(spriteMaterial);
          const labelOffset = perpDir1.clone().multiplyScalar(2.5);
          sprite.position.copy(tickPos).add(labelOffset);
          sprite.scale.set(4, 2, 1);
          
          axisTicksGroup.add(sprite);
      }
  }

  function addAxisTicksForCurrentSurface(N, minPerplexity, maxPerplexity, dfpMin, dfpMax, dqatMin, dqatMax) {
      // Remove old ticks
      removeAxisTicks();
      
      const origin = new THREE.Vector3(-20, 0, -20);
      const numTicks = 5;
      
      // X-axis (Dfp) ticks - log spaced from dfpMin to dfpMax
      const dfpValues = logSpace(dfpMin, dfpMax, numTicks);
      const dfpPositions = [];
      const dfpLabels = [];
      
      for (let i = 0; i < numTicks; i++) {
          // Map from data space to visual space (0 to 40)
          const logPos = (Math.log10(dfpValues[i]) - Math.log10(dfpMin)) / 
                         (Math.log10(dfpMax) - Math.log10(dfpMin));
          dfpPositions.push(logPos * 40);
          dfpLabels.push(formatNumber(dfpValues[i]));
      }
      
      createAxisTicks('x', dfpPositions, dfpLabels, 0xff0000, origin, 
                     new THREE.Vector3(0, -1, 0), new THREE.Vector3(0, 0, 1));
      
      // Z-axis (Dqat) ticks - log spaced from dqatMin to dqatMax
      const dqatValues = logSpace(dqatMin, dqatMax, numTicks);
      const dqatPositions = [];
      const dqatLabels = [];
      
      for (let i = 0; i < numTicks; i++) {
          const logPos = (Math.log10(dqatValues[i]) - Math.log10(dqatMin)) / 
                         (Math.log10(dqatMax) - Math.log10(dqatMin));
          dqatPositions.push(logPos * 40);
          dqatLabels.push(formatNumber(dqatValues[i]));
      }
      
      createAxisTicks('z', dqatPositions, dqatLabels, 0x00ff00, origin,
                     new THREE.Vector3(-1, 0, 0), new THREE.Vector3(0, 1, 0));
      
      // Y-axis (Perplexity) ticks - linear spaced from minPerplexity to maxPerplexity
      const perplexityRange = maxPerplexity - minPerplexity;
      const perplexityPositions = [];
      const perplexityLabels = [];
      
      if (perplexityRange > 0) {
          for (let i = 0; i < numTicks; i++) {
              const t = i / (numTicks - 1);
              const perplexityValue = minPerplexity + t * perplexityRange;
              perplexityPositions.push(t * 20);
              perplexityLabels.push(perplexityValue.toFixed(2));
          }
      } else {
          // Edge case: all perplexities are the same
          perplexityPositions.push(0);
          perplexityLabels.push(minPerplexity.toFixed(2));
      }
      
      createAxisTicks('y', perplexityPositions, perplexityLabels, 0x0000ff, origin,
                     new THREE.Vector3(-1, 0, 0), new THREE.Vector3(0, 0, 1));
  }

  function computeLoss(Dqat, Dfp, N, B) {
      // Avoid division by zero
      if (N <= 0 || B <= 0 || Dqat <= 0 || Dfp <= 0) {
          return Infinity;
      }
      
      const Dtotal = Dfp + Dqat;
      const Sqat = Dqat / (N * B / 8);
      const Sfp = Dfp / (N * B / 8);
      
      // Handle edge cases
      if (Sqat <= 0 || Sfp <= 0) {
          return Infinity;
      }
      
      // L(Dqat, Dfp, N, B) formula
      const term1 = CONSTANTS.constant;
      const term2 = CONSTANTS.dtotal_coeff / Math.pow(Dtotal, CONSTANTS.dtotal_power);
      const term3 = CONSTANTS.n_coeff / Math.pow(N, CONSTANTS.n_power);
      const term4 = CONSTANTS.b_const_coeff * Math.pow(2, CONSTANTS.b_const_power * B);
      const term5 = CONSTANTS.b_sqat_coeff * Math.pow(2, CONSTANTS.b_sqat_power * B) / 
                   (Math.pow(N, CONSTANTS.n_sqat_power) * Math.pow(Sqat, CONSTANTS.sqat_power1));
      const term6 = CONSTANTS.b_final_coeff * Math.pow(2, CONSTANTS.b_final_power * B) / 
                   (Math.pow(N, CONSTANTS.n_final_power) * Math.pow(Sfp, CONSTANTS.sfp_power) * 
                    Math.pow(Sqat, CONSTANTS.sqat_power2));
      
      return term1 + term2 + term3 + term4 + term5 + term6;
  }

  function logSpace(min, max, count) {
      const logMin = Math.log10(min);
      const logMax = Math.log10(max);
      const step = (logMax - logMin) / (count - 1);
      const result = [];
      
      for (let i = 0; i < count; i++) {
          result.push(Math.pow(10, logMin + step * i));
      }
      
      return result;
  }

  function linSpace(min, max, count) {
      const step = (max - min) / (count - 1);
      const result = [];
      
      for (let i = 0; i < count; i++) {
          result.push(min + step * i);
      }
      
      return result;
  }

  function getColorForLoss(loss, minLoss, maxLoss) {
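      // Guard against a flat surface (maxLoss === minLoss), which would yield NaN colors
      const range = maxLoss - minLoss;
      const normalized = range > 0 ? (loss - minLoss) / range : 0;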
      const color = new THREE.Color();
      
      // Blue -> Cyan -> Green -> Yellow -> Red
      if (normalized < 0.25) {
          const t = normalized / 0.25;
          color.setRGB(0, t, 1);
      } else if (normalized < 0.5) {
          const t = (normalized - 0.25) / 0.25;
          color.setRGB(0, 1, 1 - t);
      } else if (normalized < 0.75) {
          const t = (normalized - 0.5) / 0.25;
          color.setRGB(t, 1, 0);
      } else {
          const t = (normalized - 0.75) / 0.25;
          color.setRGB(1, 1 - t, 0);
      }
      
      return color;
  }

  function updateSurface() {
      // Get current N and B values
      const nSlider = document.getElementById('n-slider');
      const bSlider = document.getElementById('b-slider');
      
      const N = logValue(parseFloat(nSlider.value) / 100, N_MIN, N_MAX);
      const B = parseInt(bSlider.value);
      
      // Get compute ranges from sliders (logarithmic scale)
      const dfpMinSliderVal = parseFloat(document.getElementById('dfp-min-slider').value);
      const dfpMaxSliderVal = parseFloat(document.getElementById('dfp-max-slider').value);
      const dqatMinSliderVal = parseFloat(document.getElementById('dqat-min-slider').value);
      const dqatMaxSliderVal = parseFloat(document.getElementById('dqat-max-slider').value);
      
      // Convert from slider position (0-100) to actual compute values (logarithmic)
      const dfpMin = logValue(dfpMinSliderVal / 100, COMPUTE_MIN, COMPUTE_MAX);
      const dfpMax = logValue(dfpMaxSliderVal / 100, COMPUTE_MIN, COMPUTE_MAX);
      const dqatMin = logValue(dqatMinSliderVal / 100, COMPUTE_MIN, COMPUTE_MAX);
      const dqatMax = logValue(dqatMaxSliderVal / 100, COMPUTE_MIN, COMPUTE_MAX);
      
      // Generate grid (GRID_POINTS x GRID_POINTS points)
      const dfpValues = logSpace(dfpMin, dfpMax, GRID_POINTS);
      const dqatValues = logSpace(dqatMin, dqatMax, GRID_POINTS);
      
      // Compute losses and find min/max for color mapping
      const losses = [];
      minLoss = Infinity;
      maxLoss = -Infinity;
      let minPerplexity = Infinity;
      let maxPerplexity = -Infinity;
      
      for (let i = 0; i < GRID_POINTS; i++) {
          losses[i] = [];
          for (let j = 0; j < GRID_POINTS; j++) {
              const loss = computeLoss(dqatValues[j], dfpValues[i], N, B);
              if (isFinite(loss)) {
                  losses[i][j] = loss;
                  minLoss = Math.min(minLoss, loss);
                  maxLoss = Math.max(maxLoss, loss);
                  const perplexity = Math.exp(loss);
                  minPerplexity = Math.min(minPerplexity, perplexity);
                  maxPerplexity = Math.max(maxPerplexity, perplexity);
              } else {
                  losses[i][j] = maxLoss;
              }
          }
      }
      
      // Create geometry
      const geometry = new THREE.BufferGeometry();
      const vertices = [];
      const colors = [];
      const indices = [];
      
      const scaleX = 40 / (GRID_POINTS - 1);
      const scaleZ = 40 / (GRID_POINTS - 1);
      const scaleY = maxPerplexity > minPerplexity ? 20 / (maxPerplexity - minPerplexity) : 1;
      
      // Create vertices with colors
      for (let i = 0; i < GRID_POINTS; i++) {
          for (let j = 0; j < GRID_POINTS; j++) {
              const x = i * scaleX - 20;
              const z = j * scaleZ - 20;
              const loss = losses[i][j];
              const perplexity = Math.exp(loss);
              const y = (perplexity - minPerplexity) * scaleY;
              
              vertices.push(x, y, z);
              
              const color = getColorForLoss(loss, minLoss, maxLoss);
              colors.push(color.r, color.g, color.b);
          }
      }
      
      // Create faces
      for (let i = 0; i < GRID_POINTS - 1; i++) {
          for (let j = 0; j < GRID_POINTS - 1; j++) {
              const a = i * GRID_POINTS + j;
              const b = i * GRID_POINTS + (j + 1);
              const c = (i + 1) * GRID_POINTS + (j + 1);
              const d = (i + 1) * GRID_POINTS + j;
              
              indices.push(a, b, d);
              indices.push(b, c, d);
          }
      }
      
      geometry.setAttribute('position', new THREE.Float32BufferAttribute(vertices, 3));
      geometry.setAttribute('color', new THREE.Float32BufferAttribute(colors, 3));
      geometry.setIndex(indices);
      geometry.computeVertexNormals();
      
      const material = new THREE.MeshPhongMaterial({
          vertexColors: true,
          side: THREE.DoubleSide,
          shininess: 30,
          flatShading: false
      });
      
      // Remove old mesh
      if (surfaceMesh) {
          scene.remove(surfaceMesh);
          surfaceMesh.geometry.dispose();
          surfaceMesh.material.dispose();
      }
      
      // Add new mesh
      surfaceMesh = new THREE.Mesh(geometry, material);
      scene.add(surfaceMesh);
      
      // Add wireframe
      const wireframe = new THREE.WireframeGeometry(geometry);
      const line = new THREE.LineSegments(wireframe);
      line.material.color.setHex(0x222222);
      line.material.opacity = 0.3;
      line.material.transparent = true;
      surfaceMesh.add(line);
      
      // Add axis ticks with labels
      addAxisTicksForCurrentSurface(N, minPerplexity, maxPerplexity, dfpMin, dfpMax, dqatMin, dqatMax);
      
      // Add iso-token lines and optimal points
      addIsoTokenLines(N, B, dfpMin, dfpMax, dqatMin, dqatMax, minLoss, maxLoss, minPerplexity, maxPerplexity, scaleX, scaleZ, scaleY);
  }

  function addIsoTokenLines(N, B, dfpMin, dfpMax, dqatMin, dqatMax, minLoss, maxLoss, minPerplexity, maxPerplexity, scaleX, scaleZ, scaleY) {
      // Clear previous iso-token lines and points
      while(isoTokenLinesGroup.children.length > 0) {
          const child = isoTokenLinesGroup.children[0];
          isoTokenLinesGroup.remove(child);
          if (child.geometry) child.geometry.dispose();
          if (child.material) child.material.dispose();
      }
      
      while(isoTokenOptimalPointsGroup.children.length > 0) {
          const child = isoTokenOptimalPointsGroup.children[0];
          isoTokenOptimalPointsGroup.remove(child);
          if (child.geometry) child.geometry.dispose();
          if (child.material) child.material.dispose();
      }
      
      // Clear previous optimal point labels
      while(optimalPointLabelsGroup.children.length > 0) {
          const child = optimalPointLabelsGroup.children[0];
          optimalPointLabelsGroup.remove(child);
          if (child.material && child.material.map) child.material.map.dispose();
          if (child.material) child.material.dispose();
      }
      
      // Define iso-total-token levels (10 levels)
      const numLevels = 10;
      const minTotalTokens = dfpMin + dqatMin;
      const maxTotalTokens = dfpMax + dqatMax;
      
      // Logarithmic spacing for iso-token levels
      const logMinTotal = Math.log(minTotalTokens);
      const logMaxTotal = Math.log(maxTotalTokens);
      const logStep = (logMaxTotal - logMinTotal) / (numLevels - 1);
      
      const isoTokenLevels = [];
      for (let i = 0; i < numLevels; i++) {
          isoTokenLevels.push(Math.exp(logMinTotal + i * logStep));
      }
      
      // For each iso-token level, find optimal point and draw line
      for (let levelIdx = 0; levelIdx < isoTokenLevels.length; levelIdx++) {
          const totalTokens = isoTokenLevels[levelIdx];
          
          // Sample along the line Dfp + Dqat = totalTokens
          const numSamples = 1000;
          const dfpSamples = [];
          const dqatSamples = [];
          const lossSamples = [];
          
          // Sample Dfp from dfpMin to totalTokens, compute Dqat = totalTokens - Dfp
          for (let i = 0; i < numSamples; i++) {
              const t = i / (numSamples - 1);
              const dfp = dfpMin + t * (totalTokens - dfpMin);
              const dqat = totalTokens - dfp;
              
              // Check if point is valid (within bounds)
              if (dqat >= dqatMin && dqat <= dqatMax && dfp >= dfpMin && dfp <= dfpMax) {
                  const loss = computeLoss(dqat, dfp, N, B);
                  
                  if (isFinite(loss)) {
                      dfpSamples.push(dfp);
                      dqatSamples.push(dqat);
                      lossSamples.push(loss);
                  }
              }
          }
          
          if (dfpSamples.length === 0) continue;
          
          // Find optimal point (minimum loss)
          let minLossIdx = 0;
          let minLossValue = lossSamples[0];
          for (let i = 1; i < lossSamples.length; i++) {
              if (lossSamples[i] < minLossValue) {
                  minLossValue = lossSamples[i];
                  minLossIdx = i;
              }
          }
          
          const optimalDfp = dfpSamples[minLossIdx];
          const optimalDqat = dqatSamples[minLossIdx];
          const optimalLoss = lossSamples[minLossIdx];
          
          // Convert to 3D coordinates for visualization
          const lineVertices = [];
          for (let i = 0; i < dfpSamples.length; i++) {
              // Map from data space to visual space
              const logDfp = Math.log10(dfpSamples[i]);
              const logDqat = Math.log10(dqatSamples[i]);
              const logDfpMin = Math.log10(dfpMin);
              const logDfpMax = Math.log10(dfpMax);
              const logDqatMin = Math.log10(dqatMin);
              const logDqatMax = Math.log10(dqatMax);
              
              const xNorm = (logDfp - logDfpMin) / (logDfpMax - logDfpMin);
              const zNorm = (logDqat - logDqatMin) / (logDqatMax - logDqatMin);
              
              const x = xNorm * 40 - 20;
              const z = zNorm * 40 - 20;
              const perplexity = Math.exp(lossSamples[i]);
              const y = (perplexity - minPerplexity) * scaleY;
              
              lineVertices.push(x, y, z);
          }
          
          // Create line geometry
          const lineGeometry = new THREE.BufferGeometry();
          lineGeometry.setAttribute('position', new THREE.Float32BufferAttribute(lineVertices, 3));
          
          const lineMaterial = new THREE.LineBasicMaterial({ 
              color: 0xffa500, // Orange
              linewidth: 5, // Note: most WebGL platforms draw lines 1px wide regardless of this value
              transparent: true,
              opacity: 0.9,
              depthTest: true
          });
          
          const line = new THREE.Line(lineGeometry, lineMaterial);
          isoTokenLinesGroup.add(line);
          
          // Mark the optimal point with an icosahedron (stand-in for a star)
          const logOptDfp = Math.log10(optimalDfp);
          const logOptDqat = Math.log10(optimalDqat);
          const logDfpMin = Math.log10(dfpMin);
          const logDfpMax = Math.log10(dfpMax);
          const logDqatMin = Math.log10(dqatMin);
          const logDqatMax = Math.log10(dqatMax);
          
          const xNorm = (logOptDfp - logDfpMin) / (logDfpMax - logDfpMin);
          const zNorm = (logOptDqat - logDqatMin) / (logDqatMax - logDqatMin);
          
          const optX = xNorm * 40 - 20;
          const optZ = zNorm * 40 - 20;
          const optimalPerplexity = Math.exp(optimalLoss);
          const optY = (optimalPerplexity - minPerplexity) * scaleY;
          
          // Create star shape using icosahedron
          const starGeometry = new THREE.IcosahedronGeometry(0.5, 0);
          const starMaterial = new THREE.MeshPhongMaterial({ 
              color: 0x800080, // Purple
              emissive: 0x400040,
              shininess: 100,
              transparent: true,
              opacity: 0.95
          });
          
          const starMesh = new THREE.Mesh(starGeometry, starMaterial);
          starMesh.position.set(optX, optY, optZ);
          
          // Add a small glow around the star
          const glowGeometry = new THREE.IcosahedronGeometry(0.7, 0);
          const glowMaterial = new THREE.MeshBasicMaterial({
              color: 0xff00ff,
              transparent: true,
              opacity: 0.3
          });
          const glowMesh = new THREE.Mesh(glowGeometry, glowMaterial);
          glowMesh.position.set(optX, optY, optZ);
          
          isoTokenOptimalPointsGroup.add(glowMesh);
          isoTokenOptimalPointsGroup.add(starMesh);
          
          // Add percentage label for this optimal point
          const qatPercentage = (optimalDqat / (optimalDqat + optimalDfp)) * 100;
          addOptimalPointLabel(qatPercentage.toFixed(1) + '%', optX, optY + 1.5, optZ);
      }
  }
  
  function addOptimalPointLabel(text, x, y, z) {
      // Create canvas for text
      const canvas = document.createElement('canvas');
      const context = canvas.getContext('2d');
      canvas.width = 256;
      canvas.height = 128;
      
      // Draw rounded background (alpha is 0.0, so it is currently invisible)
      context.fillStyle = 'rgba(128, 0, 128, 0.0)';
      context.roundRect = function(x, y, w, h, r) {
          if (w < 2 * r) r = w / 2;
          if (h < 2 * r) r = h / 2;
          this.beginPath();
          this.moveTo(x+r, y);
          this.arcTo(x+w, y, x+w, y+h, r);
          this.arcTo(x+w, y+h, x, y+h, r);
          this.arcTo(x, y+h, x, y, r);
          this.arcTo(x, y, x+w, y, r);
          this.closePath();
          return this;
      };
      context.roundRect(40, 30, 176, 68, 10).fill();
      
      // Draw text
      context.fillStyle = '#ffffff';
      context.font = 'Bold 80px Arial';
      context.textAlign = 'center';
      context.textBaseline = 'middle';
      context.fillText(text, 128, 64);
      
      // Create texture
      const texture = new THREE.CanvasTexture(canvas);
      
      // Create sprite material
      const material = new THREE.SpriteMaterial({ 
          map: texture,
          transparent: true,
          depthTest: false
      });
      
      // Create sprite
      const sprite = new THREE.Sprite(material);
      sprite.position.set(x, y, z);
      sprite.scale.set(3, 1.5, 1);
      
      optimalPointLabelsGroup.add(sprite);
  }

  function animate() {
      requestAnimationFrame(animate);
      controls.update();
      renderer.render(scene, camera);
  }

  function onWindowResize() {
      const container = document.getElementById('canvas-container');
      camera.aspect = container.clientWidth / container.clientHeight;
      camera.updateProjectionMatrix();
      renderer.setSize(container.clientWidth, container.clientHeight);
  }

  // Slider utilities
  function logValue(normalized, min, max) {
      const logMin = Math.log10(min);
      const logMax = Math.log10(max);
      return Math.pow(10, logMin + normalized * (logMax - logMin));
  }

  function formatNumber(value) {
      if (value >= 1e12) {
          return (value / 1e12).toFixed(2) + 'T';
      } else if (value >= 1e9) {
          return (value / 1e9).toFixed(2) + 'B';
      } else if (value >= 1e6) {
          return (value / 1e6).toFixed(2) + 'M';
      } else {
          return value.toFixed(2);
      }
  }

  function formatScientific(value) {
      return value.toExponential(2);
  }

  // Slider event listeners
  const nSlider = document.getElementById('n-slider');
  const bSlider = document.getElementById('b-slider');
  const dfpMinSlider = document.getElementById('dfp-min-slider');
  const dfpMaxSlider = document.getElementById('dfp-max-slider');
  const dqatMinSlider = document.getElementById('dqat-min-slider');
  const dqatMaxSlider = document.getElementById('dqat-max-slider');
  
  const nValue = document.getElementById('n-value');
  const bValue = document.getElementById('b-value');
  const dfpRangeValue = document.getElementById('dfp-range-value');
  const dqatRangeValue = document.getElementById('dqat-range-value');

  function updateNValue() {
      const n = logValue(parseFloat(nSlider.value) / 100, N_MIN, N_MAX);
      nValue.textContent = formatNumber(n);
      updateSurface();
  }

  function updateBValue() {
      bValue.textContent = bSlider.value;
      updateSurface();
  }
  
  function updateDfpRange() {
      const minSliderVal = parseFloat(dfpMinSlider.value);
      const maxSliderVal = parseFloat(dfpMaxSlider.value);
      
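      // Ensure min < max; if max is already at the slider's upper limit, pull min down instead
      if (minSliderVal >= maxSliderVal) {
          dfpMaxSlider.value = minSliderVal + 1;
          if (parseFloat(dfpMaxSlider.value) <= minSliderVal) {
              dfpMinSlider.value = parseFloat(dfpMaxSlider.value) - 1;
          }
      }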
      
      // Convert to actual values
      const minVal = logValue(parseFloat(dfpMinSlider.value) / 100, COMPUTE_MIN, COMPUTE_MAX);
      const maxVal = logValue(parseFloat(dfpMaxSlider.value) / 100, COMPUTE_MIN, COMPUTE_MAX);
      
      dfpRangeValue.textContent = formatNumber(minVal) + ' - ' + formatNumber(maxVal);
      updateSurface();
  }
  
  function updateDqatRange() {
      const minSliderVal = parseFloat(dqatMinSlider.value);
      const maxSliderVal = parseFloat(dqatMaxSlider.value);
      
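      // Ensure min < max; if max is already at the slider's upper limit, pull min down instead
      if (minSliderVal >= maxSliderVal) {
          dqatMaxSlider.value = minSliderVal + 1;
          if (parseFloat(dqatMaxSlider.value) <= minSliderVal) {
              dqatMinSlider.value = parseFloat(dqatMaxSlider.value) - 1;
          }
      }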
      
      // Convert to actual values
      const minVal = logValue(parseFloat(dqatMinSlider.value) / 100, COMPUTE_MIN, COMPUTE_MAX);
      const maxVal = logValue(parseFloat(dqatMaxSlider.value) / 100, COMPUTE_MIN, COMPUTE_MAX);
      
      dqatRangeValue.textContent = formatNumber(minVal) + ' - ' + formatNumber(maxVal);
      updateSurface();
  }

  nSlider.addEventListener('input', updateNValue);
  bSlider.addEventListener('input', updateBValue);
  dfpMinSlider.addEventListener('input', updateDfpRange);
  dfpMaxSlider.addEventListener('input', updateDfpRange);
  dqatMinSlider.addEventListener('input', updateDqatRange);
  dqatMaxSlider.addEventListener('input', updateDqatRange);

  // Initialize after window loads to ensure Three.js is ready
  window.addEventListener('load', function() {
      // Double check that THREE is defined
      if (typeof THREE !== 'undefined') {
          init();
          updateNValue();
          updateBValue();
          updateDfpRange();
          updateDqatRange();
      } else {
          console.error('THREE.js failed to load');
          document.getElementById('canvas-container').innerHTML = '<p style="color: red; padding: 20px;">Error: Three.js library failed to load. Please refresh the page.</p>';
      }
  });
</script>
<!--kg-card-end: html-->
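<p>The search that the widget above runs along each orange iso-token line can also be sketched outside the browser. Below is a minimal Python sketch of that grid search: for a fixed total token budget, it tries many FP/QAT splits and keeps the one with the lowest loss. The names <code>optimal_qat_fraction</code> and <code>toy_loss</code> are illustrative, and <code>toy_loss</code> is a hypothetical stand-in with made-up constants, not the fitted scaling law from the paper.</p>

```python
import math
from typing import Callable, Tuple


def optimal_qat_fraction(
    loss_fn: Callable[[float, float], float],
    total_tokens: float,
    num_samples: int = 1000,
) -> Tuple[float, float]:
    """Grid-search the split D_fp + D_qat = total_tokens that minimizes loss_fn.

    Returns (qat_fraction, loss) at the best sampled split.
    """
    best_fraction, best_loss = float("nan"), math.inf
    # Skip the endpoints so that both phases receive a non-zero token count.
    for i in range(1, num_samples):
        qat_fraction = i / num_samples
        d_qat = qat_fraction * total_tokens
        d_fp = total_tokens - d_qat
        loss = loss_fn(d_fp, d_qat)
        if math.isfinite(loss) and loss < best_loss:
            best_fraction, best_loss = qat_fraction, loss
    return best_fraction, best_loss


def toy_loss(d_fp: float, d_qat: float) -> float:
    # Hypothetical stand-in, NOT the fitted scaling law from the paper:
    # each phase contributes a power-law term that shrinks with more tokens.
    return 2.0 + 50.0 / d_fp**0.3 + 50.0 / d_qat**0.3


fraction, loss = optimal_qat_fraction(toy_loss, total_tokens=1e11)
print(f"best QAT share: {fraction:.1%}, loss: {loss:.4f}")
```

<p>Because this toy loss is symmetric in its two arguments, the search lands on an even split. With the real scaling law plugged in as <code>loss_fn</code>, the best split would be expected to shift with model size, bit-width, and token budget, as described above.</p>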
<h2 id="practical-predictions">Practical Predictions</h2><p>Ok, we know that there's an optimal QAT fraction, but how bad is a sub-optimal fraction? We can compare optimal and sub-optimal setups from the perspective of "wasted tokens" — how many more tokens you need to spend with a sub-optimal setup to match an optimal one.</p><h3 id="quantifying-wasted-compute">Quantifying wasted compute</h3><p>Using the fitted scaling law, we can quantify how bad a sub-optimal setup is. Comparing 10% QAT to optimal fractions reveals significant inefficiencies:</p><ul><li><strong>1-bit QAT</strong>: Up to 50% wasted tokens</li><li><strong>2-4-bit QAT</strong>: 5-30% wasted tokens</li><li><strong>6-bit QAT</strong>: 5-10% wasted tokens</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.01.25.png" class="kg-image" alt="" loading="lazy" width="1398" height="976" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-19.01.25.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-19.01.25.png 1000w, https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.01.25.png 1398w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Comparison of sub-optimal QAT setup with fixed 10% QAT fraction and optimal QAT setup for 1B parameter model. Wasted token count is the number of tokens effectively wasted by not utilizing an optimal QAT fraction setup. That is, if the wasted token count is n%, then the same loss can be achieved with (100− n)% tokens and optimal QAT fraction. While results vary for different bit widths, the general relationship is similar, revealing high potential savings.</span></figcaption></figure><h3 id="optimal-bit-width-under-memory-constraints">Optimal bit-width under memory constraints</h3><p>Another useful use-case is inferring optimal QAT bit-width. 
Given a fixed memory budget, the scaling law determines whether you should use a larger model with lower bit-width or a smaller model with higher precision. The "fixed memory budget" setting is practically important because LLM decoding is commonly bottlenecked by memory transfers. We found that optimal bit-width decreases as training compute increases.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.31.32.png" class="kg-image" alt="" loading="lazy" width="2000" height="1175" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-19.31.32.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-19.31.32.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/10/Screenshot-2025-10-30-at-19.31.32.png 1600w, https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.31.32.png 2166w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Optimal QAT bit width for different memory budgets and total training budgets. We use the loss corresponding to the optimal QAT fraction. For training FLOPs, we use the estimation \(C \sim 6ND\). The white area corresponds to \(D &lt; N\), which is not practically important</span></figcaption></figure><h3 id="qat-accuracy-vs-full-precision"><strong>QAT accuracy vs full-precision</strong></h3><p>One way to plan QAT is to ask: "when can we match full-precision performance?" The loss scaling law can help with that! For each QAT bit-width and token count, we can compare the predicted QAT loss to full-precision performance. 
As expected, larger models tolerate lower bit-widths better, which has implications for choosing which bit-width to train.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.49.21.png" class="kg-image" alt="" loading="lazy" width="2000" height="707" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-19.49.21.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-19.49.21.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/10/Screenshot-2025-10-30-at-19.49.21.png 1600w, https://alexdremov.me/content/images/size/w2400/2025/10/Screenshot-2025-10-30-at-19.49.21.png 2400w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">Difference in perplexity between FP loss scaling law and QAT loss scaling law for two model sizes. For QAT, the loss corresponding to the optimal QAT fraction is used. Values below 0 correspond to QAT performing better than the FP model. It is clearly observed that the ability of QAT to match FP loss is greatly influenced by model size and token count. In particular, larger models are able to tolerate lower QAT precision for higher total token count budgets.</span></figcaption></figure><h2 id="cooldown-qat-fusion">Cooldown &amp; QAT Fusion</h2><p>Standard training performs learning rate cooldown on the full-precision model, then re-warms the learning rate for QAT. 
We speculate that the fine weight adjustments made during FP cooldown are largely discarded when quantization is initialized.</p><p>We propose <strong>cooldown &amp; QAT fusion</strong>: skip the FP cooldown phase and perform learning rate decay jointly with QAT instead.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.37.40.png" class="kg-image" alt="" loading="lazy" width="2000" height="643" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-19.37.40.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-19.37.40.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/10/Screenshot-2025-10-30-at-19.37.40.png 1600w, https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.37.40.png 2134w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">Comparison between two different QAT schemes. In both setups, the QAT fraction is 40%. Red-shaded areas indicate zones with lowered learning rate, which we expect to correspond to minor weight updates that get effectively ignored by QAT initialization. </span><b><strong style="white-space: pre-wrap;">On the left,</strong></b><span style="white-space: pre-wrap;"> classic QAT scheme visualization: QAT follows fully completed FP training that ends with 20% (of FP training length) learning rate decay. For QAT, the learning rate follows a cosine shape with a 5% re-warmup phase. </span><b><strong style="white-space: pre-wrap;">On the right, </strong></b><span style="white-space: pre-wrap;">the cooldown &amp; QAT fusion scheme is displayed. QAT starts directly from the constant learning rate stage with a small re-warmup, effectively resuming the FP learning rate scheduler as if QAT was not present at all. QAT ends with 20% cooldown (of total training length). 
As QAT follows the classic FP learning rate recipe with usual cooldown, we call this approach cooldown &amp; QAT fusion</span></figcaption></figure><h3 id="results">Results</h3><p>QAT fusion shows good results on 4-bit and 6-bit QAT across different model sizes. We also experimented with lower bits, but gains there were not as evident. We believe this is because for lower bits, the optimal QAT fraction is quite high, which makes the effect from QAT fusion less noticeable.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/10/Screenshot-2025-10-30-at-19.15.47.png" class="kg-image" alt="" loading="lazy" width="2000" height="602" srcset="https://alexdremov.me/content/images/size/w600/2025/10/Screenshot-2025-10-30-at-19.15.47.png 600w, https://alexdremov.me/content/images/size/w1000/2025/10/Screenshot-2025-10-30-at-19.15.47.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/10/Screenshot-2025-10-30-at-19.15.47.png 1600w, https://alexdremov.me/content/images/size/w2400/2025/10/Screenshot-2025-10-30-at-19.15.47.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Accuracy comparison between the classic QAT scheme and the cooldown &amp; QAT fusion training scheme. The loss difference is reported in “wasted tokens”—the difference in total token count between optimal QAT fraction loss points in the loss scaling law. 
Substantial improvements are noticeable across different model sizes and token counts.</span></figcaption></figure><p>The perplexity improvements translate to billions of tokens' worth of compute saved.</p><h2 id="implementation-guidelines">Implementation Guidelines</h2><p>If you're planning QAT, consider the following steps:</p><ul><li><strong>Calculate tokens-per-parameter-byte</strong> and use it to predict optimal QAT fraction instead of assuming 10%.</li><li><strong>Budget compute appropriately</strong> — optimal fractions can exceed 50% for large-scale training.</li><li><strong>Implement cooldown &amp; QAT fusion</strong> — it's a simple scheduler change with noticeable compute savings.</li><li><strong>Choose bit-width based on constraints</strong> — use the scaling law to optimize for your memory and compute budget.</li><li><strong>Pay extra attention to low-bit QAT</strong> — suboptimal fractions are much more costly for 1-2 bit quantization than 6-bit.</li></ul><h2 id="conclusions">Conclusions</h2><p>Efficient quantized model training requires careful compute allocation between full-precision and quantization-aware phases. The optimal QAT fraction isn't fixed—it increases with scale, from 10% to 50% or higher depending on tokens per parameter byte.</p><p>The loss scaling law enables us to:</p><ul><li>Predict optimal QAT fractions in advance</li><li>Avoid significant compute waste (up to 50% for extreme cases)</li><li>Select optimal bit-widths under memory constraints</li><li>Achieve higher-quality quantized models for the same cost</li></ul><p>Combined with cooldown &amp; QAT fusion, these techniques provide substantial efficiency gains for training quantized models at scale. 
Full details and additional experiments are available in the original paper:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://arxiv.org/abs/2509.22935v1?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Compute-Optimal Quantization-Aware Training</div><div class="kg-bookmark-description">Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior accuracy compared to QAT alone. However, the optimal allocation of compute between the FP and QAT phases remains unclear. We conduct extensive experiments with various compute budgets, QAT bit widths, and model sizes from 86.0M to 2.2B to investigate how different QAT durations impact final performance. We demonstrate that, contrary to previous findings, the loss-optimal ratio of QAT to FP training increases with the total amount of compute. Moreover, the optimal fraction can be accurately predicted for a wide range of model sizes and quantization widths using the tokens-per-parameter-byte statistic. From experimental data, we derive a loss scaling law that predicts both optimal QAT ratios and final model performance across different QAT/FP compute allocation strategies and QAT bit widths. We use the scaling law to make further predictions, which we verify experimentally, including which QAT bit width is optimal under a given memory constraint and how QAT accuracy with different bit widths compares to full-precision model accuracy. Additionally, we propose a novel cooldown and QAT fusion approach that performs learning rate decay jointly with quantization-aware training, eliminating redundant full-precision model updates and achieving significant compute savings. 
These findings provide practical insights into efficient QAT planning and enable the training of higher-quality quantized models with the same compute budget.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/apple-touch-icon-5.png" alt=""><span class="kg-bookmark-author">arXiv.org</span><span class="kg-bookmark-publisher">Aleksandr Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/arxiv-logo-fb-1.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><blockquote><em>Work conducted at Apple with David Grangier, Angelos Katharopoulos, and Awni Hannun. All information is from the public paper preprint.</em><br><br>Apple and the Apple logo are trademarks of Apple Inc., registered in the U.S. and other countries and regions.</blockquote> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Understanding Flash Attention: Writing the Algorithm from Scratch in Triton ]]></title>
                    <description><![CDATA[ Why is Flash Attention so fast? Find out how Flash Attention works. Afterward, we&#39;ll polish our understanding by writing a GPU kernel of the algorithm in Triton. ]]></description>
                    <link>https://alexdremov.me/understanding-flash-attention-writing-the-algorithm-from-scratch-in-triton/</link>
                    <guid isPermaLink="false">678302f9eb4e160022e3b911</guid>
                    <category><![CDATA[ Machine Learning ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sun, 12 Jan 2025 19:36:55 +0100</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-18.35.58-1.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>Flash Attention is a revolutionary technique that dramatically accelerates the attention mechanism in transformer-based models, delivering processing speeds many times faster than naive methods. By cleverly tiling data and minimizing memory transfers, it tackles the notorious GPU memory bottleneck that large language models often struggle with.</p><p>In this post, we’ll dive into how Flash Attention leverages efficient <em>I/O-awareness</em> to reduce overhead, then take it a step further by crafting a <strong>block-sparse attention kernel</strong> in Triton.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">I will provide a simple explanation of how Flash Attention works. We will then implement the explained algorithm in Triton!</div></div><h2 id="what-is-attention">What is Attention?</h2><p>The attention mechanism (or scaled dot-product attention) is a core element of transformer models, which is a leading architecture for solving the problem of language modeling. All popular models, like GPT, LLaMA, and BERT, rely on attention.</p><p>The formula is pretty simple:</p><p>$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V,\\Q, K, V\;—\; \text{query, key, value tensors}$$</p><p>The rest is history.</p><p>Even though the formula looks simple, its computation involves multiplications of large tensors and a lot of data movement. Considering that this is a core part of the transformer architecture, optimizing the algorithm greatly improves the performance of the model in general.</p><p>In the naive implementation, attention requires \(O(n^2)\) additional memory and \(O(n^2)\) compute time complexity, where \(n\) is the sequence length. 
<strong>That's a lot!</strong></p><h2 id="flash-attention"><strong>Flash Attention</strong></h2><h3 id="core-idea"><strong>Core Idea</strong></h3><p>The main idea of Flash Attention can be summarized in a simple quote from <a href="https://arxiv.org/pdf/2205.14135?ref=alexdremov.me">the original paper</a>:</p><blockquote>We argue that a missing principle is making attention algorithms IO-aware — accounting for reads and writes between levels of GPU memory.</blockquote><p>That is, modern GPUs have several types of memory:</p><ul><li><strong>SRAM</strong>: fast, on-chip, small</li><li><strong>HBM</strong>: slower than SRAM, but large. This is what we usually refer to as GPU memory.</li></ul><p>Check out the memory hierarchy in the image below to see the differences in bandwidth and sizes of different memory types.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-16.15.50.png" class="kg-image" alt="" loading="lazy" width="2000" height="783" srcset="https://alexdremov.me/content/images/size/w600/2025/01/Screenshot-2025-01-11-at-16.15.50.png 600w, https://alexdremov.me/content/images/size/w1000/2025/01/Screenshot-2025-01-11-at-16.15.50.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/01/Screenshot-2025-01-11-at-16.15.50.png 1600w, https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-16.15.50.png 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Image from FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness by Tri Dao et al.</span></figcaption></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">To perform a computation, data must first be transferred from HBM to SRAM, and this transfer is not free!</div></div><p>The Flash Attention algorithm proposes a method of <strong>computing attention in 
tiles</strong>, without explicitly materializing the attention scores tensor:</p><p>$$\text{AttentionScores}(Q, K) = \text{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)$$</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Not materializing a matrix</strong></b> means that at any given time, the matrix does not exist in its full shape in memory.</div></div><p>It's easy to see that this matrix requires \(O(n^2)\) memory to store. For large sequence lengths, <strong>that's a lot of data!</strong> So, if we manage to avoid explicitly materializing this matrix, we can save lots of memory.</p><p>However, this matrix is necessary for transformer training as it is a part of backpropagation and gradient calculation. The authors propose that it's better to recalculate this matrix during the backward pass (again without explicit materialization). Not only does this save lots of memory, but it also provides huge speedups as we don't need to transfer this enormous matrix between different GPU memory types.</p><p>Overall, such an approach not only speeds up calculations by taking GPU I/O specifics into account, but also allows processing huge sequence lengths as memory complexity drops to \(O(n)\).</p><h3 id="tiled-attention-calculation">Tiled Attention Calculation</h3><p>The last thing to understand is how to compute attention <strong>in tiles</strong>. Basically, this means that we will calculate attention over the full sequence by processing incoming tokens in small portions.</p><p>Well, it's easy to calculate \(QK^T\) in tiles. 
Since the attention head dimension is small, we can load full matrix rows and columns and perform the multiplication in tiles.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Yes, if we want to have an enormous attention dimension, Flash Attention will not work without algorithm modifications. <br><br>As dimensions are usually quite small even for enormous models, this limitation is fair.</div></div><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-17.18.40.png" class="kg-image" alt="Tiled QK^T | Image by the author" loading="lazy" width="2000" height="982" srcset="https://alexdremov.me/content/images/size/w600/2025/01/Screenshot-2025-01-11-at-17.18.40.png 600w, https://alexdremov.me/content/images/size/w1000/2025/01/Screenshot-2025-01-11-at-17.18.40.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/01/Screenshot-2025-01-11-at-17.18.40.png 1600w, https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-17.18.40.png 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tiled QK^T | Image by the author</span></figcaption></figure><p>So, we have \(QK^T\) calculated in SRAM. All that's left is to apply softmax, multiply by \(V\), and that's it!</p><p>$$\text{Softmax}(z_i) = \frac{e^{z_{i}}}{\sum_{j=1}^T e^{z_{j}}} \; \; \text{for}\; i = 1, 2,\ldots, T$$</p><p>That's where the tricky part is.</p><p>The problem is that the softmax denominator requires aggregation over the whole sequence length to normalize scores, and we never have the whole sequence at once as we load data in tiles.</p><p>To address this, we can implement a concatenated softmax algorithm. 
Using it, we can calculate softmax "in batch" mode, adjusting the already computed values as new data comes in.</p><p>Taking the algorithm from the original article, we can define rules to compute the softmax over data concatenation. Having two vectors \(x^{(1)}\) and \(x^{(2)}\), we need to calculate the softmax denominator \(l(x)\) over those vectors' concatenation: \(x = \left[x^{(1)}, x^{(2)}\right]\). If the vector's maximum is \(m(x)\) and the denominator is computed on max-shifted values, \(l(x)=\sum_{j} e^{x_{j} - m(x)}\), we can easily derive the softmax denominator of the concatenation:</p><p>$$m(x) = m\left(\left[x^{(1)}, x^{(2)}\right]\right) = \max\left(m(x^{(1)}), m(x^{(2)})\right),$$</p><p>$$l(x) = l\left(\left[x^{(1)}, x^{(2)}\right]\right) = e^{m(x^{(1)}) - m(x)}l(x^{(1)}) + e^{m(x^{(2)}) - m(x)}l(x^{(2)}).$$</p><p>The last equality is easy to verify: multiplying \(l(x^{(1)})\) by \(e^{m(x^{(1)}) - m(x)}\) turns every term \(e^{x_{j} - m(x^{(1)})}\) into \(e^{x_{j} - m(x)}\), and the same holds for \(x^{(2)}\).</p><p>So, now we have what we want: we can calculate softmax per-tile and then, by doing re-normalization from the formula above, compute the global softmax. The last thing to do is to incorporate the tile of the \(V\) tensor and keep doing the same re-normalization (as matrix multiplication is a linear operation).</p><p>And all of this without loading the full sequence into memory or materializing \(QK^T\)!</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Notice that we calculate \(\text{Softmax}\left(QK^T\right)\) in tiles only, without needing to have the whole matrix at any moment.</div></div><p>Also, in the actual algorithm for numerical stability, we will compute not \(\text{Softmax}(x)\) but \(\text{Softmax}(x - \max(x))\). 
We can do that as softmax is invariant to constant shifts.</p><h2 id="triton-implementation">Triton Implementation</h2><p>Now, we can easily implement the outlined algorithm in Triton, which is a tool that allows us to write efficient GPU kernels with the ease of Python.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">To learn more about Triton, check out their official guides.</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://triton-lang.org/main/getting-started/tutorials/index.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Tutorials — Triton documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://static.ghost.org/v5.0.0/images/link-icon.svg" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/triton-logo.png" alt="" onerror="this.style.display = 'none'"></div></a></figure>
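<p>Before moving to the kernel, the concatenation rule for \(m(x)\) and \(l(x)\) can be checked numerically. Below is a minimal pure-Python sketch, not the kernel code. The helper names <code>softmax_stats</code> and <code>merge_stats</code> are made up for illustration, and the sketch assumes that \(l(x)\) is the denominator computed on max-shifted values.</p>

```python
import math


def softmax_stats(xs):
    """Return (m, l): the maximum and the shifted denominator sum(exp(x - m))."""
    m = max(xs)
    l = sum(math.exp(x - m) for x in xs)
    return m, l


def merge_stats(stats_a, stats_b):
    """Combine the statistics of two chunks into those of their concatenation."""
    m_a, l_a = stats_a
    m_b, l_b = stats_b
    m = max(m_a, m_b)
    # Each partial denominator is rescaled from its own maximum to the shared one.
    l = math.exp(m_a - m) * l_a + math.exp(m_b - m) * l_b
    return m, l


x1 = [0.5, 2.0, -1.0]
x2 = [3.0, 0.0]

# Statistics computed chunk by chunk and then merged ...
merged = merge_stats(softmax_stats(x1), softmax_stats(x2))
# ... should agree with statistics computed over the full vector at once.
direct = softmax_stats(x1 + x2)

print(merged, direct)
```

<p>The merged pair should agree with the directly computed one up to floating-point rounding, which is exactly the property the kernel relies on when it visits \(K\) tile by tile.</p>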
<!--kg-card-begin: html-->
<section class="custom-replace" data-replace=".subscription-article">
    <a class="gh-portal-triggerbtn-container">Subscribe and don't miss posts!</a>
</section>
<!--kg-card-end: html-->
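<p>The whole tiled attention calculation, including the \(V\) accumulation, can also be sketched in plain Python before any GPU code is involved. This is only an illustrative reference under simplifying assumptions (a single head, no masking, Python lists instead of tensors). The names <code>naive_attention</code> and <code>tiled_attention</code> are hypothetical and do not come from the repository.</p>

```python
import math


def naive_attention(q, k, v, scale):
    """Reference: build every full score row, then softmax and weight V."""
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        denom = sum(weights)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / denom
            for j in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, scale, tile_k=2):
    """Same result, but K and V are visited tile by tile with running statistics."""
    head_dim = len(v[0])
    out = []
    for q_row in q:
        m_i = -math.inf          # running maximum of the scores seen so far
        l_i = 0.0                # running softmax denominator
        acc = [0.0] * head_dim   # running (softmax numerator) @ V
        for start in range(0, len(k), tile_k):
            k_tile = k[start:start + tile_k]
            v_tile = v[start:start + tile_k]
            scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k_tile]
            m_new = max(m_i, max(scores))
            p = [math.exp(s - m_new) for s in scores]
            alpha = math.exp(m_i - m_new)  # rescales everything accumulated so far
            l_i = l_i * alpha + sum(p)
            acc = [
                a * alpha + sum(w * v_row[j] for w, v_row in zip(p, v_tile))
                for j, a in enumerate(acc)
            ]
            m_i = m_new
        # The denominator is applied once, at the very end.
        out.append([a / l_i for a in acc])
    return out


n, d = 5, 3
q = [[math.sin(i + j) for j in range(d)] for i in range(n)]
k = [[math.cos(i * j + 1) for j in range(d)] for i in range(n)]
v = [[float(i - j) for j in range(d)] for i in range(n)]
scale = 1 / math.sqrt(d)

reference = naive_attention(q, k, v, scale)
tiled = tiled_attention(q, k, v, scale, tile_k=2)
max_diff = max(abs(a - b) for r, t in zip(reference, tiled) for a, b in zip(r, t))
print(max_diff)
```

<p>With five keys and a tile size of two, the last tile is shorter than the others, so the sketch also exercises the ragged-tile case. The inner loop keeps only one tile of scores alive at a time, which is the memory saving discussed earlier.</p>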
<h3 id="outlining-the-algorithm">Outlining the Algorithm</h3><p>The first step is to decide how we will assign jobs and what data each job will load. By the tiled softmax algorithm, each job must have access to \(K, V\) over the whole sequence length. So, each job will iterate over \(K, V\) in tiles. We don't have any algorithmic restriction on the number of \(Q\) tiles processed. Therefore, each job will load just one \(Q\) tile and work with it only. This way, we maximize job parallelism.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-18.35.58.png" class="kg-image" alt="Jobs data management | Image by the author" loading="lazy" width="2000" height="1034" srcset="https://alexdremov.me/content/images/size/w600/2025/01/Screenshot-2025-01-11-at-18.35.58.png 600w, https://alexdremov.me/content/images/size/w1000/2025/01/Screenshot-2025-01-11-at-18.35.58.png 1000w, https://alexdremov.me/content/images/size/w1600/2025/01/Screenshot-2025-01-11-at-18.35.58.png 1600w, https://alexdremov.me/content/images/2025/01/Screenshot-2025-01-11-at-18.35.58.png 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Kernel jobs data management | Image by the author</span></figcaption></figure><p>In summary, each job will load a single \(Q\) tile, iterate over all tiles in \(K\) and \(V\), and store one tile of the result corresponding to the \(Q\) tile.</p><h3 id="the-kernel">The Kernel</h3><p>What's left is to write the actual code. Let's focus on the core part first, and only then add the Triton-specific boilerplate.</p><p>Below is Triton pseudocode with every line explained.</p><pre><code class="language-python">def self_attn_fwd(...):
    # loading sample len
    seq_len = ...

    # running qk^T max (initialized with -inf)
    m_i = tl.zeros([TILE_Q_SIZE], dtype=tl.float32) - float("inf")

    # current softmax denominator
    l_i = tl.zeros([TILE_Q_SIZE], dtype=tl.float32)

    # result tile 
    # we will accumulate here (softmax numerator) @ V
    # then, we will divide it by softmax denominator in the very end
    acc = tl.zeros([TILE_Q_SIZE, HEAD_DIM], dtype=tl.float32)

    # notice: we accumulate all values above
    # in fp32 for higher precision

    # account for variable length of samples in batch
    q_tile_indices = q_token_idx + tl.arange(0, TILE_Q_SIZE)
    q_lens_mask = (
        q_tile_indices[:, None] &lt; seq_len
    )

    # loading q tile into SRAM, shape (TILE_Q_SIZE, HEAD_DIM)
    q_tile = ... 

    # softmax scale, multiplied by log_2(e) = 1.44269504...
    # to use the faster exp2(...) instead of exp(...)
    softmax_scale: tl.constexpr = tl.cast(SM_SCALE * 1.44269504, q_tile.dtype)

    # indices of tokens inside kv tile 
    tile_k_arange = tl.arange(0, TILE_K_SIZE)

    # iterate over all tiles in k, v
    for kv_tile_idx in tl.range(
        0, tl.cdiv(seq_len, TILE_K_SIZE), num_stages=PIPELINING
    ):
        # index of the first token in the kv tile
        kv_token_idx = kv_tile_idx * TILE_K_SIZE

        kt_tile = ... # load into SRAM K^T tile no. kv_tile_idx
        v_tile = ... # load into SRAM V tile no. kv_tile_idx

        # compute tile of QK^T
        qk = tl.dot(
            q_tile * softmax_scale,
            kt_tile,
            input_precision=INPUT_PRECISION,
            out_dtype=tl.float32
        )

        # masking out kv tokens after the sequence length
        kv_indices = kv_token_idx + tile_k_arange
        mask = q_lens_mask &amp; (
            kv_indices[None, :] &lt; seq_len
        )

        # set masked out values to -inf
        # for softmax to ignore them
        qk = tl.where(mask, qk, tl.cast(-float("inf"), qk.dtype))

        # calculating new maximum over seq len
        # m(x) = m(m(x1), m(x2))
        m_ij = tl.maximum(m_i, tl.max(qk, 1))

        # e^(x2 - m(x)); exp2 gives this because log_2(e) is folded into the scale
        p = tl.math.exp2(qk - m_ij[:, None])

        # current tile softmax denominator
        l_ij = tl.sum(p, 1)

        # from softmax formula: e^(m(x1) - m(x))
        alpha = tl.math.exp2(m_i - m_ij)

        # updating denominator using the formula
        # l(x) = e^(m(x1) - m(x)) * l(x1) + e^(0)l(x2)
        # notice: e^(0) as we subtract m(x) from x2 above
        l_i = l_i * alpha + l_ij
        
        # update previous acc to address maximum change
        # as e^(xi - m(x1)) * alpha = e^(xi - m(x))
        acc = acc * alpha[:, None]

        # multiply p by v and add to acc
        acc += tl.dot(
            p.to(v_tile.dtype),
            v_tile,
            input_precision=INPUT_PRECISION,
            out_dtype=tl.float32,
        )

        # storing new maximum
        m_i = m_ij

    # finally incorporate softmax denominator
    acc = acc / l_i[:, None]

    # set fully masked token values to 0 to avoid garbage values
    # in the result
    acc = tl.where(q_lens_mask, acc, 0.0)

    # save the result
    tl.store(..., acc) </code></pre><p>See? Easy!</p><p>The important part is that the kernel becomes simple to write once the idea of tiled softmax is clear. Apart from that, there's nothing complicated from the algorithmic perspective.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">This kernel can be made even faster by applying Triton-specific optimizations. However, this is out of the scope of this article.</div></div><p>This pseudocode is pretty close to the actual code. You can find it on my GitHub by following the link below. All that I added is just data management and PyTorch wrappers.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/alexdremov/kernels/blob/main/src/self_attention/kernel.py?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">kernels/src/self_attention/kernel.py at main · alexdremov/kernels</div><div class="kg-bookmark-description">Collection of useful kernels. Contribute to alexdremov/kernels development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/pinned-octocat-093da3e6fa40-2.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">alexdremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/fbb2c3e5de3a0a6dbb858f209284255e632e255c911b7421f730fc1a653a3b9d/alexdremov/kernels" alt="" onerror="this.style.display = 'none'"></div></a></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">❗</div><div class="kg-callout-text">Don't hesitate to ask if something isn't clear. 
I'm here in the comments 😁.</div></div><p>The code above <a href="https://github.com/alexdremov/kernels/blob/main/tests/test_self_attention.py?ref=alexdremov.me">was extensively tested</a> to match PyTorch's <code>scaled_dot_product_attention</code>. You can also check out the tests to see how to use the written kernel.</p><h3 id="benchmarking">Benchmarking</h3><p>While we wrote the kernel in Triton to improve the algorithm understanding, it's interesting to compare the performance with a naive implementation and PyTorch's <code>scaled_dot_product_attention</code>.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2025/01/plot.png" class="kg-image" alt="" loading="lazy" width="1100" height="400" srcset="https://alexdremov.me/content/images/size/w600/2025/01/plot.png 600w, https://alexdremov.me/content/images/size/w1000/2025/01/plot.png 1000w, https://alexdremov.me/content/images/2025/01/plot.png 1100w"><figcaption><span style="white-space: pre-wrap;">Benchmarking implementations for different sequence lengths | Image by the author</span></figcaption></figure><p>As expected, the Flash Attention algorithm completely outperforms the naive implementation performance-wise. Also, I've marked with a dashed line the range of lengths for which the naive implementation causes a CUDA out-of-memory error.</p><p>We see that our Triton implementation is slightly worse than PyTorch SDPA. 
But the difference is not too large. Considering that PyTorch SDPA is a well-optimized CUDA kernel, that's a nice result.</p><p>Benchmarking code is also available in the repository.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/alexdremov/kernels/blob/main/benchmark/benchmark_self_attention.py?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">kernels/benchmark/benchmark_self_attention.py at main · alexdremov/kernels</div><div class="kg-bookmark-description">Collection of useful kernels. Contribute to alexdremov/kernels development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/pinned-octocat-093da3e6fa40-3.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">alexdremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/kernels" alt="" onerror="this.style.display = 'none'"></div></a></figure><h2 id="conclusions">Conclusions</h2><p>In this post, I covered the motivation behind the Flash Attention algorithm as well as its details. Finally, we were able to implement it from scratch in Triton, reproducing the speedups from the paper.</p><p>I hope this post improved your understanding of Flash Attention. Feel free to leave a comment below if you have any questions.</p><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://arxiv.org/abs/2205.14135?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness</div><div class="kg-bookmark-description">Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. 
Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3$\times$ speedup on GPT-2 (seq. length 1K), and 2.4$\times$ speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. 
length 64K, 63.1% accuracy).</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">arXiv.org</span><span class="kg-bookmark-publisher">Tri Dao</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/arxiv-logo-fb.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://triton-lang.org/main/getting-started/tutorials/index.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Tutorials — Triton documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://static.ghost.org/v5.0.0/images/link-icon.svg" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/triton-logo-1.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/alexdremov/kernels/tree/main?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - alexdremov/kernels: Collection of useful kernels</div><div class="kg-bookmark-description">Collection of useful kernels. Contribute to alexdremov/kernels development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/pinned-octocat-093da3e6fa40-4.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">alexdremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/kernels-1" alt="" onerror="this.style.display = 'none'"></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Speed Up PyTorch With Custom Kernels. But It Gets Progressively Darker ]]></title>
                    <description><![CDATA[ It&#39;s all about making your models run faster, from flicking a magic “compile” switch to writing your own custom GPU code. In each step, we’ll implement an innocent softmax function, but things are about to get dark by the end. ]]></description>
                    <link>https://alexdremov.me/speed-up-pytorch-with-custom-kernels-but-it-gets-progressively-darker/</link>
                    <guid isPermaLink="false">67784a35d6f0130001527c43</guid>
                    <category><![CDATA[ Machine Learning ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sat, 04 Jan 2025 01:34:24 +0100</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1532882871449-7fbb1ec36d48?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;M3wxMTc3M3wwfDF8c2VhcmNofDZ8fHNpbGljb258ZW58MHx8fHwxNzM1OTQ1NDY1fDA&amp;ixlib&#x3D;rb-4.0.3&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>PyTorch offers remarkable flexibility, allowing you to code complex GPU-accelerated operations in a matter of seconds. However, this convenience comes at a cost. In its default eager mode, PyTorch executes your code operation by operation, which often results in suboptimal performance. This translates into slower model training, which impacts the iteration cycle of your experiments, the productivity of your team, your compute costs, and so on.</p><p>In this post, I’ll explore three strategies for accelerating your PyTorch operations. Each method uses <strong><code>softmax</code></strong> as our “Hello World” demonstration, but you can swap it with any function you like, and the discussed methods would still apply.</p><p>We’ll begin with <strong><code>torch.compile</code></strong>, move on to writing a custom Triton kernel, and finally dive into designing a CUDA kernel.</p><p>So, this post may get complicated, but bear with me.</p><h2 id="torchcompile-%E2%80%94-a-quick-way-to-boost-performance"><code>torch.compile</code>&nbsp;— A Quick Way to Boost Performance</h2><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2025/01/Phase_1-2.jpeg" class="kg-image" alt="" loading="lazy" width="254" height="254"></figure><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">“Wait, you just turn on a single function call and it speeds up your code? That’s it? Sounds too good to be true.”</em></i><br><br>— Yes.</div></div><p><code>torch.compile</code>&nbsp;is a relatively new API in PyTorch that uses runtime graph capture and kernel fusion under the hood. With one decorator, you can often see speed improvements without significant changes to your code.</p><p>Put simply, we can speed up calculations, for example, by merging operations into one GPU function, which removes the overhead of separate GPU calls. 
Or even better, optimize a chain of operations by replacing it with a single equivalent one!</p><p>Such optimizations are not possible in the regular PyTorch execution mode (eager), which runs each operation immediately, exactly as it is called in the code.</p><h3 id="softmax-implementation-with-torchcompile">Softmax Implementation with&nbsp;<code>torch.compile</code></h3><p>Below is a&nbsp;simple example showing how to implement and compile a softmax function using&nbsp;<code>torch.compile</code>. Use it in your model’s forward pass, and your code (hopefully) runs faster.</p><pre><code class="language-python">import torch

# Our softmax function in PyTorch land
def softmax_pytorch(x):
    # Avoid numerical instability by subtracting max
    x_max = torch.max(x, dim=-1, keepdim=True).values
    x_exp = torch.exp(x - x_max)
    return x_exp / torch.sum(x_exp, dim=-1, keepdim=True)

# Let's compile it with torch.compile
@torch.compile
def compiled_softmax(x):
    return softmax_pytorch(x)

if __name__ == "__main__":
    # Example usage:
    input_tensor = torch.randn((2, 4), device="cuda")
    output = compiled_softmax(input_tensor)
    print("Input:", input_tensor)
    print("Compiled Softmax Output:", output)
</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">❗</div><div class="kg-callout-text">Note that you'll get bigger speedups if you compile the whole model forward pass and not just one operation.</div></div><p><strong>Pros</strong>:</p><ul><li>One line to enable the compiler.</li><li>No black magic rituals needed (except maybe for dynamic shapes).</li></ul><p><strong>Cons</strong>:</p><ul><li>The first pass can be slower while it compiles; afterwards, it picks up speed.</li><li>Doesn’t always produce dramatic speed-ups for&nbsp;<em>all</em>&nbsp;models and can occasionally break if your code is too creative.</li><li>Still has problems with handling dynamic shapes.</li></ul><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Dynamic shapes compilation mode is needed when input shapes change and we don't want to recompile the code for each specific size.<br><br>Debugging it deserves a whole separate article.</div></div><h2 id="triton-code-%E2%80%94-write-gpu-kernels-with-python-breeze">Triton Code — Write GPU Kernels With Python Breeze</h2><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2025/01/images.jpeg" class="kg-image" alt="" loading="lazy" width="225" height="224"></figure><h3 id="why-use-triton">Why Use Triton?</h3><p><strong>Triton</strong>&nbsp;is a language that compiles to efficient GPU kernels while letting you write Pythonic code. It’s used under the hood of PyTorch’s dynamo/inductor stack, but you can also write your own custom ops! For many matrix/tensor operations — like softmax — you can get huge speed-ups. Because&nbsp;<strong>why</strong>&nbsp;wait for official PyTorch kernels when you can write your own?</p><h3 id="softmax-in-triton">Softmax in Triton</h3><p>Here’s a minimal snippet that shows how we might do a naive softmax forward in Triton. I'll keep it short and sweet for demonstration. 
In a real project, you’d likely do more advanced tiling and block management.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">This may look complicated, but you just need to get familiar with Triton, and it will start making sense.<br><br>Check out <a href="https://triton-lang.org/main/index.html?ref=alexdremov.me">their guides</a>!</div></div><pre><code class="language-python">import torch
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config(
            kwargs=dict(
                BLOCK_SIZE_ROWS=BLOCK_SIZE_ROWS,
                num_stages=num_stages,
            ),
            num_warps=num_warps,
            num_stages=num_stages,
        )
        for BLOCK_SIZE_ROWS in (16, 32, 64, 128)
        for num_stages in (2, 3, 4)
        for num_warps in (2, 4, 8)
    ],
    key=['N_COLS'],
)
@triton.heuristics(
    values=dict(
        BLOCK_SIZE_COLS=lambda args: triton.next_power_of_2(args['N_COLS'])
    )
)
@triton.jit
def softmax_kernel(
    input_ptr: tl.tensor,
    output_ptr: tl.tensor,
    input_row_stride: int,
    output_row_stride: int,
    n_rows: int,
    N_COLS: tl.constexpr,
    BLOCK_SIZE_ROWS: tl.constexpr,
    BLOCK_SIZE_COLS: tl.constexpr,
    num_stages: tl.constexpr
):
    input_ptr = tl.make_block_ptr(
        base=input_ptr,
        shape=(n_rows, N_COLS),
        strides=(input_row_stride, 1),
        offsets=(0, 0),
        block_shape=(BLOCK_SIZE_ROWS, BLOCK_SIZE_COLS),
        order=(1, 0),
    )

    output_ptr = tl.make_block_ptr(
        base=output_ptr,
        shape=(n_rows, N_COLS),
        strides=(output_row_stride, 1),
        offsets=(0, 0),
        block_shape=(BLOCK_SIZE_ROWS, BLOCK_SIZE_COLS),
        order=(1, 0),
    )

    cols_mask = tl.arange(0, BLOCK_SIZE_COLS) &lt; N_COLS

    row_idx = tl.program_id(0) * BLOCK_SIZE_ROWS
    in_tile_ptr = tl.advance(input_ptr, (row_idx, 0))
    row = tl.load(pointer=in_tile_ptr, boundary_check=(0, 1))

    # Subtract maximum for numerical stability
    row_minus_max = row - tl.max(row, axis=1, keep_dims=True)
    row_minus_max = tl.where(cols_mask, row_minus_max, -float('inf'))

    numerator = tl.exp(row_minus_max)
    denominator = tl.sum(numerator, axis=1, keep_dims=True)
    softmax_output = numerator / denominator

    out_tile_ptr = tl.advance(output_ptr, (row_idx, 0))
    tl.store(out_tile_ptr, softmax_output, boundary_check=(0, 1))


def softmax(x: torch.Tensor):
    x_orig_shape = x.shape
    x = x.view(-1, x_orig_shape[-1])
    n_rows, n_cols = x.shape

    y = torch.empty_like(x, memory_format=torch.contiguous_format)

    grid = lambda args: (
        triton.cdiv(n_rows, args['BLOCK_SIZE_ROWS']),
        1,
        1
    )

    softmax_kernel[grid](
        input_ptr=x,
        output_ptr=y,
        input_row_stride=x.stride(0),
        output_row_stride=y.stride(0),
        n_rows=n_rows,
        N_COLS=n_cols,
    )
    return y.view(*x_orig_shape)</code></pre><p>Indeed, it looks complicated. But the core of the algorithm is summarized in a few lines.</p><pre><code class="language-python">    row_minus_max = row - tl.max(row, axis=1, keep_dims=True)
    row_minus_max = tl.where(cols_mask, row_minus_max, -float('inf'))

    numerator = tl.exp(row_minus_max)
    denominator = tl.sum(numerator, axis=1, keep_dims=True)
    
    softmax_output = numerator / denominator</code></pre><p>Everything else is just data management and housekeeping.</p><p>If we benchmark different input lengths, we'll see that we match <code>torch.nn.functional.softmax</code> performance <strong>(which is a highly optimized kernel!)</strong> and dramatically outperform the naive PyTorch implementation. </p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2025/01/softmax-performance.png" class="kg-image" alt="" loading="lazy" width="640" height="480" srcset="https://alexdremov.me/content/images/size/w600/2025/01/softmax-performance.png 600w, https://alexdremov.me/content/images/2025/01/softmax-performance.png 640w"></figure><p>You can find the full code for the kernel and the benchmark in the following GitHub file.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/alexdremov/kernels/blob/main/src/softmax/kernel.py?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">kernels/src/softmax/kernel.py at main · alexdremov/kernels</div><div class="kg-bookmark-description">Collection of useful kernels. 
Contribute to alexdremov/kernels development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/pinned-octocat-093da3e6fa40.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">alexdremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/c1d5fd6dabdbfb9b86b5e7053edb027477fb87351bd31f21a099a829b761e09e/alexdremov/kernels" alt="" onerror="this.style.display = 'none'"></div></a></figure><p><strong>Pros</strong>:</p><ul><li>Potentially huge speed-ups by fusing ops and optimizing memory access patterns.</li><li>More control than&nbsp;<code>torch.compile</code>.</li><li>Easy to write efficient code (we matched the PyTorch implementation!).</li><li>Easy to write inefficient code (if you don't know what you're doing).</li></ul><p><strong>Cons</strong>:</p><ul><li>You’re now the&nbsp;<em>kernel developer</em>, which means debugging if something goes sideways. Which is tough. Really.</li><li>If you go further with custom backward passes, you might need a second coffee… or more. That's because PyTorch's autograd cannot differentiate through a Triton kernel, so you will need to define the backward pass yourself.</li><li>Subscribe so you don't miss a post about using Triton kernels, autograd, and torch.compile in tandem.</li></ul>
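<p>To give a feeling for what defining the backward pass yourself involves, here is a hedged sketch of the softmax backward formula in plain Python, checked against finite differences. It is not the repository's code: the function names are hypothetical, and in a real project this math would live in a second Triton kernel wired up through <code>torch.autograd.Function</code>.</p>

```python
import math


def softmax_forward(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    denom = sum(exps)
    return [e / denom for e in exps]


def softmax_backward(ys, grad_out):
    """Input gradient from the softmax output ys and the upstream gradient.

    dL/dx_i = y_i * (g_i - sum_j g_j * y_j)
    """
    dot = sum(g * y for g, y in zip(grad_out, ys))
    return [y * (g - dot) for y, g in zip(ys, grad_out)]


def finite_difference_grad(xs, grad_out, eps=1e-6):
    """Central differences of L(x) = sum_i grad_out[i] * softmax(x)[i]."""
    def loss(values):
        return sum(g * y for g, y in zip(grad_out, softmax_forward(values)))

    grads = []
    for i in range(len(xs)):
        plus = xs[:i] + [xs[i] + eps] + xs[i + 1:]
        minus = xs[:i] + [xs[i] - eps] + xs[i + 1:]
        grads.append((loss(plus) - loss(minus)) / (2 * eps))
    return grads


x = [0.2, -1.0, 3.0, 0.5]
g = [1.0, 0.0, -2.0, 0.5]

analytic = softmax_backward(softmax_forward(x), g)
numeric = finite_difference_grad(x, g)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_err)
```

<p>Note that the backward pass only needs the forward output and the upstream gradient, which is why a custom op usually saves the output rather than the input.</p>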
<!--kg-card-begin: html-->
<section class="custom-replace" data-replace=".subscription-article">
    <a class="gh-portal-triggerbtn-container">Subscribe and don't miss posts!</a>
</section>
<!--kg-card-end: html-->
<h2 id="pure-cuda-aka-going-hardcore">Pure CUDA (a.k.a. Going Hardcore)</h2><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2025/01/Uncanny_Phase_3-2.jpeg" class="kg-image" alt="" loading="lazy" width="254" height="259"></figure><p>Sometimes even Triton won’t cut it, or you just enjoy living on the edge. In that case, you can write a custom CUDA kernel in C++, compile it, and tie it into PyTorch via a custom extension. Projects like <a href="https://github.com/fattorib/CudaSoftmax?ref=alexdremov.me">this fused CUDA softmax reference</a>&nbsp;show how people build specialized kernels for maximum speed.</p><h3 id="softmax-in-custom-cuda">Softmax in Custom CUDA</h3><p>You’ll typically have a&nbsp;<code>setup.py</code>&nbsp;that compiles a&nbsp;<code>.cu</code>&nbsp;or&nbsp;<code>.cpp</code>&nbsp;file and exposes a Python function as an extension. </p><p>Check out <a href="https://github.com/fattorib/CudaSoftmax?ref=alexdremov.me">CudaSoftmax</a> for a self-explanatory example.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/fattorib/CudaSoftmax?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - fattorib/CudaSoftmax: Softmax CUDA kernel :)</div><div class="kg-bookmark-description">Softmax CUDA kernel :). Contribute to fattorib/CudaSoftmax development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/content/images/icon/pinned-octocat-093da3e6fa40-1.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">fattorib</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/thumbnail/CudaSoftmax" alt="" onerror="this.style.display = 'none'"></div></a></figure><p>I will not provide the code for this method in this post, and that fact speaks for itself. 
This approach is quite complicated, requires good justification, and is usually the last thing you should try.</p><p>It's very easy to write inefficient, buggy, unsafe code.</p><p><strong>Pros</strong>:</p><ul><li>Maximum control. “If you want something done right, do it yourself.”</li><li>Potential for the fastest possible kernel if well-optimized.</li></ul><p><strong>Cons</strong>:</p><ul><li>Requires deep CUDA understanding.</li><li>Memory management, block sizes, shared memory: those are hard!</li><li>Maintenance overhead can be <strong>extremely</strong> high.</li></ul><h2 id="conclusion">Conclusion</h2><p>When it comes to speeding up PyTorch operations, you can choose from progressively more intricate methods:</p><ol><li><strong><code>torch.compile</code></strong>: Minimal code changes needed.</li><li><strong>Triton Kernel</strong>: More control over kernel behaviour, still fairly easy to write.</li><li><strong>Pure CUDA</strong>: Maximum optimisation potential, but <strong>much higher</strong> complexity.</li></ol><p>If you’re looking for the simplest improvement, start with&nbsp;<code>torch.compile</code>. If that’s insufficient, explore Triton. For advanced users, writing a custom CUDA kernel can yield further gains, though it demands deep GPU programming skills.</p><h2 id="references">References</h2><ol><li><a href="https://pytorch.org/tutorials/recipes/compiling_optimizer.html?ref=alexdremov.me" rel="nofollow noopener">Compiling the optimizer with torch.compile (PyTorch Docs)</a></li><li><a href="https://discuss.pytorch.org/t/how-should-i-use-torch-compile-properly/144598?ref=alexdremov.me" rel="nofollow noopener">How should I use torch.compile properly? 
(PyTorch discussion)</a></li><li><a href="https://pytorch.org/tutorials/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?ref=alexdremov.me" rel="nofollow noopener">Using User-Defined Triton Kernels with torch.compile (PyTorch Docs)</a></li><li><a href="https://discuss.pytorch.org/t/torch-compile-with-custom-triton-kernel/192876?ref=alexdremov.me" rel="nofollow noopener">Torch.compile with custom Triton kernel (PyTorch discussion)</a></li><li><a href="https://github.com/fattorib/CudaSoftmax?ref=alexdremov.me" rel="nofollow noopener">GitHub: fattorib/CudaSoftmax</a></li></ol><p>Choose the path that fits your project’s needs and your comfort level. Good luck optimizing!</p> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Simple Ways to Speed Up Your PyTorch Model Training ]]></title>
                    <description><![CDATA[ If all machine learning engineers want one thing, it&#39;s faster model training — maybe after good test metrics. ]]></description>
                    <link>https://alexdremov.me/simple-ways-to-speedup-your-pytorch-model-training/</link>
                    <guid isPermaLink="false">66532c659f146317c6d64f96</guid>
                    <category><![CDATA[ Machine Learning ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Tue, 28 May 2024 22:16:11 +0200</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1578991132108-16c5296b63dc?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;M3wxMTc3M3wwfDF8c2VhcmNofDR8fHNwZWVkfGVufDB8fHx8MTcxNjcyNjc5M3ww&amp;ixlib&#x3D;rb-4.0.3&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>Does this topic even need an introduction?</p><p>Speeding&nbsp;up machine learning model training&nbsp;is one thing that all machine learning engineers want.&nbsp;Faster training equals faster experiments equals faster iterations for your product. Also, it means that one model training will require fewer resources. So, straight to the point.</p><h2 id="containerization">Containerization</h2><p>Yes, this will not speed up your training on its own. But this targets another&nbsp;important&nbsp;aspect: reproducibility. Sometimes a virtualenv with fixed library versions is enough, but I encourage you to take one step further and build an all-in-one Docker container for your model training.&nbsp;</p><p>This&nbsp;ensures&nbsp;that the&nbsp;environment is&nbsp;fully&nbsp;consistent during debugging, profiling, and final training. The last thing you want is to optimize a part of the code that is no longer a bottleneck because of a Python 3.12 speed-up, for example. Or to chase a bug that does not reproduce on a different CUDA version.</p><p>As a starting point, you can use pre-built images from NVIDIA. They already have CUDA, PyTorch, and other popular libs installed:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">PyTorch | NVIDIA NGC</div><div class="kg-bookmark-description">PyTorch is a GPU accelerated tensor computational framework. Functionality can be extended with common Python libraries such as NumPy and SciPy. 
Automatic differentiation is done with a tape-based system at the functional and neural network layer levels.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://catalog.ngc.nvidia.com/favicon.ico" alt=""><span class="kg-bookmark-author">NVIDIA NGC Catalog</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://assets.nvidiagrid.net/ngc/logos/OSS-Nvidia-Partnership-Pytorch.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">A Docker container is the ultimate solution for problems like<br>"Hey, it works on my machine. I have no idea why it doesn't on yours."</div></div><h2 id="get-comfortable-with-pytorch-profiler">Get comfortable with PyTorch profiler</h2><p>Before optimizing anything, you have to understand how long some parts of your code run. Pytorch profiler is <em>almost</em> an all-in-one tool for profiling training. It's able to record:</p><ul><li>CPU operations timings</li><li>CUDA kernels timings</li><li>Memory consumption history</li></ul><p>That's all you need. And it's easy to enable!</p><p>To record events, all you need is to embed training into a profiler context like this:</p><pre><code class="language-python">import torch.autograd.profiler as profiler

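import torch
from torch.profiler import ProfilerActivity

# torch.profiler.profile (not the legacy autograd profiler) accepts
# `activities` and `on_trace_ready`
with torch.profiler.profile(
  activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
  on_trace_ready=torch.profiler.tensorboard_trace_handler('./logs'),
) as prof: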
  train(args)</code></pre><p>After that, you can launch the tensorboard and view profiling traces. Do not forget to install <a href="https://pypi.org/project/torch-tb-profiler/?ref=alexdremov.me">torch-tb-profiler</a>.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">PyTorch Profiler With TensorBoard — PyTorch Tutorials 2.3.0+cu121 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/tutorials/_static/img/profiler_overview1.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><p>Profiler has a lot of different options, but the most important are <code>activities</code> and <code>profile_memory</code>. You can experiment with other options, but keep in mind a simple rule: <strong>the fewer options you've enabled, the less overhead you have</strong>.</p><p>So, if you want to profile CUDA kernel execution timings, it is a good idea to turn off CPU profiling and all other features. In this mode, profiling will be as close to the real execution as possible.</p><p>To make traces easier to understand, consider adding profiling contexts that describe core parts of your code. If profiling is not enabled, those are no-op.</p><pre><code class="language-python">with profiler.record_function("forward_pass"):
  result = model(**batch)

with profiler.record_function("train_step"):
  step(**result)
</code></pre><p>This way, the labels you use will be visible in the traces, so it will be easier to identify code blocks. You can go even more granular inside the model's forward:</p><pre><code class="language-python">with profiler.record_function("transformer_layer:self_attention"):
  data = self.self_attention(**data)

...

with profiler.record_function("transformer_layer:encoder_attention"):
  data = self.encoder_attention(**data, **encoder_data)</code></pre><h2 id="understanding-pytorch-traces">Understanding PyTorch traces</h2><p>After you gather traces, open them in the tensorboard. That's what the CPU + CUDA profile looks like:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2024/05/profiler_trace_view1.png" class="kg-image" alt="" loading="lazy" width="2000" height="575" srcset="https://alexdremov.me/content/images/size/w600/2024/05/profiler_trace_view1.png 600w, https://alexdremov.me/content/images/size/w1000/2024/05/profiler_trace_view1.png 1000w, https://alexdremov.me/content/images/size/w1600/2024/05/profiler_trace_view1.png 1600w, https://alexdremov.me/content/images/2024/05/profiler_trace_view1.png 2086w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">source: </span><a href="https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html?ref=alexdremov.me"><span style="white-space: pre-wrap;">https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html</span></a></figcaption></figure><p>Straight away, find the core parts of any training:</p><ul><li>data loading</li><li>forward pass</li><li>backward pass</li></ul><p>Backward pass is handled by PyTorch in a separate thread (thread 16893 on the image above), so it is easy to identify.</p><h2 id="data-loading">Data loading</h2><p>For data loading, we want near-zero timings.</p><p>No compromises.</p><p>That's because during data loading GPU does nothing, which under-utilizes available resources. However, data processing can be overlapped with GPU computing as those are independent parts.</p><p>You can easily identify areas where GPU is idle&nbsp;— just look at <em>GPU Est. SM Efficiency</em> and <em>GPU Utilization</em> figures in the profiler's trace. Areas with zero activity are our patients. 
That's where the GPU does nothing.</p><p>A simple solution is to:</p><ul><li>process data in a background process, so it does not compete with the training loop for the GIL</li><li>run data augmentations and transforms in parallel processes</li></ul><p>If you use the PyTorch <code>DataLoader</code>, this is easily achieved by specifying <code>num_workers</code>. It's more complicated if you use <code>IterableDataset</code>, because each worker then iterates over its own copy of the dataset and the data is duplicated. However, this can still be solved by using <a href="https://pytorch.org/docs/stable/data.html?ref=alexdremov.me#torch.utils.data.IterableDataset">get_worker_info()</a>: adjust the iteration so that each worker receives different, non-intersecting rows.</p><p>For more configurable processing, you may consider implementing multi-process transforms yourself with <code>multiprocessing</code>.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">If you have never checked your code's data processing speed, then this slight modification can yield <b><strong style="white-space: pre-wrap;">dramatic speedups</strong></b></div></div>
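<p>As a rough illustration of the <code>get_worker_info()</code> approach, here is a minimal sketch that splits an integer range between workers. <code>RangeDataset</code>, <code>worker_bounds</code>, and <code>collect</code> are hypothetical names made up for this example, not part of PyTorch; only <code>DataLoader</code>, <code>IterableDataset</code>, and <code>get_worker_info</code> come from <code>torch.utils.data</code>.</p><pre><code class="language-python">import math

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


def worker_bounds(start, end, worker_id, num_workers):
    """Hypothetical helper: contiguous slice of [start, end) for one worker."""
    per_worker = math.ceil((end - start) / num_workers)
    lo = min(start + worker_id * per_worker, end)
    hi = min(lo + per_worker, end)
    return lo, hi


class RangeDataset(IterableDataset):
    """Hypothetical dataset that yields the integers in [start, end)."""

    def __init__(self, start, end):
        super().__init__()
        self.start = start
        self.end = end

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: this process iterates over everything.
            lo, hi = self.start, self.end
        else:
            # Multi-process loading: each worker reads only its own slice,
            # so no row is yielded twice.
            lo, hi = worker_bounds(self.start, self.end, info.id, info.num_workers)
        return iter(range(lo, hi))


def collect(num_workers):
    """Gather every item the loader yields, sorted for easy comparison."""
    loader = DataLoader(RangeDataset(0, 10), batch_size=None, num_workers=num_workers)
    return sorted(int(x) for x in loader)
</code></pre><p>With <code>num_workers=0</code> the dataset is read in the main process, so <code>collect(0)</code> returns all ten items. With several workers, each one iterates only over its own slice. Contiguous slices are just one possible choice; striding by worker id would work as well.</p>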
<!--kg-card-begin: html-->
<section class="custom-replace" data-replace=".subscription-article">
    <a class="gh-portal-triggerbtn-container">Subscribe and don't miss posts!</a>
</section>
<!--kg-card-end: html-->
<h2 id="making-friends-with-memory-allocator">Making friends with the memory allocator</h2><p>You want to be friends with PyTorch's CUDA caching allocator.</p><p>When you allocate tensors with PyTorch on a CUDA device, PyTorch will use a caching allocator. That's because <code>cudaMalloc</code>/<code>cudaFree</code> are expensive operations that we want to avoid, so PyTorch has its own allocator that tries to reuse blocks previously allocated through <code>cudaMalloc</code>. That is, if PyTorch's allocator has an appropriate block available, it will hand it out straight away without calling <code>cudaMalloc</code>. That way, <code>cudaMalloc</code> is ideally called only at the beginning.</p><p>However, if you're dealing with data of variable length, different forward passes will require intermediate tensors of different sizes. So, PyTorch's allocator may not have an appropriate block available. In this case, the allocator panics and releases previously allocated blocks by calling <code>cudaFree</code> to free up space for new allocations.</p><p>After that, the allocator starts building its cache again, doing tons of <code>cudaMalloc</code> calls, which are expensive. You can spot this problem by looking at the memory profiler section of the tensorboard profiler viewer.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">You can also spot this problem in the traces. 
It will be visible as calls to <code spellcheck="false" style="white-space: pre-wrap;">cudaMalloc</code> and <code spellcheck="false" style="white-space: pre-wrap;">cudaFree</code></div></div><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2024/05/Screenshot-2024-05-26-at-18.17.44.png" class="kg-image" alt="" loading="lazy" width="2000" height="460" srcset="https://alexdremov.me/content/images/size/w600/2024/05/Screenshot-2024-05-26-at-18.17.44.png 600w, https://alexdremov.me/content/images/size/w1000/2024/05/Screenshot-2024-05-26-at-18.17.44.png 1000w, https://alexdremov.me/content/images/size/w1600/2024/05/Screenshot-2024-05-26-at-18.17.44.png 1600w, https://alexdremov.me/content/images/2024/05/Screenshot-2024-05-26-at-18.17.44.png 2000w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">PyTorch allocator freaks out </span></figcaption></figure><p>As you see, a red line that corresponds to the allocator's reserved memory constantly changes. 
That means that the PyTorch allocator is not able to efficiently handle allocation requests.</p><p>When allocations are handled without the allocator panicking, the red line is completely straight:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2024/05/Screenshot-2024-05-26-at-18.36.36.png" class="kg-image" alt="" loading="lazy" width="2000" height="714" srcset="https://alexdremov.me/content/images/size/w600/2024/05/Screenshot-2024-05-26-at-18.36.36.png 600w, https://alexdremov.me/content/images/size/w1000/2024/05/Screenshot-2024-05-26-at-18.36.36.png 1000w, https://alexdremov.me/content/images/size/w1600/2024/05/Screenshot-2024-05-26-at-18.36.36.png 1600w, https://alexdremov.me/content/images/2024/05/Screenshot-2024-05-26-at-18.36.36.png 2000w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">PyTorch allocator works as expected</span></figcaption></figure><p>As I said, that is usually due to variable shapes of tensors. How do we fix that?</p><h3 id="1-expandable-segments"><strong>1.&nbsp;Expandable Segments</strong></h3><p>The first thing that is worth trying is to set PyTorch's relatively new allocator mode:</p><pre><code class="language-bash">PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"</code></pre><blockquote>If set to&nbsp;<code>True</code>, this setting instructs the allocator to create CUDA allocations that can later be expanded to better handle cases where a job changes allocation sizes frequently, such as having a changing batch size.</blockquote><p>So, this tells the PyTorch allocator to allocate blocks that can be expanded in the future, which is exactly our case. However, if size variations are too big, it may still fail to solve the issue. In this case, move to the next option.</p><h3 id="2-make-allocations-variate-less"><strong>2.&nbsp;Make allocations vary less</strong></h3><p>Another possible solution is to make data shapes consistent. 
That way it will be easier for the allocator to find an appropriate data block to reuse.</p><p>To accomplish that, you may pad data to the same sizes. Or you can preheat the allocator by running a model with maximum input sizes.</p><p>You can learn more about PyTorch allocator modification in the following article</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/docs/stable/notes/cuda.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">CUDA semantics — PyTorch 2.3 documentation</div><div class="kg-bookmark-description">A guide to torch.cuda, a PyTorch module to run CUDA operations</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/docs/stable/_static/images/view-page-source-icon.svg" alt="" onerror="this.style.display = 'none'"></div></a></figure><h2 id="tidy-up-allocations-history">Tidy up allocations history</h2><p>We want to use all available GPU memory — that allows us to run big batches and process data faster. However, at some point, you will encounter a <em>CUDA out-of-memory</em> error when increasing batch size. What causes this error?</p><p>To debug this, we can view the allocator's memory history. 
It can be recorded through PyTorch and then visualized at <a href="https://pytorch.org/memory_viz?ref=alexdremov.me">https://pytorch.org/memory_viz</a>:</p><ul><li><strong>Start:</strong>&nbsp;<code>torch.cuda.memory._record_memory_history(max_entries=100000)</code></li><li><strong>Save:</strong>&nbsp;<code>torch.cuda.memory._dump_snapshot(file_name)</code></li><li><strong>Stop:</strong>&nbsp;<code>torch.cuda.memory._record_memory_history(enabled=None)</code></li></ul><p>Visualization will draw something like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2024/05/fig1.png" class="kg-image" alt="" loading="lazy" width="1185" height="656" srcset="https://alexdremov.me/content/images/size/w600/2024/05/fig1.png 600w, https://alexdremov.me/content/images/size/w1000/2024/05/fig1.png 1000w, https://alexdremov.me/content/images/2024/05/fig1.png 1185w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">source: </span><a href="https://pytorch.org/blog/understanding-gpu-memory-1/?ref=alexdremov.me"><span style="white-space: pre-wrap;">https://pytorch.org/blog/understanding-gpu-memory-1/</span></a></figcaption></figure><p>The x-axis represents time, the y-axis represents total used memory, and colourful blocks represent tensors. So, it shows when tensors were allocated and when they were released.</p><p>You may notice narrow spikes: those are short-lived tensors that take up a lot of space. By clicking on a tensor, you can get information on where this tensor was allocated. We want to minimize those spikes as they limit efficient memory usage. 
Check out what caused this spike and consider other ways of computing what you intended.</p><p>Apart from spikes, it's easy to detect memory leaks:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2024/05/fig3.png" class="kg-image" alt="" loading="lazy" width="1185" height="729" srcset="https://alexdremov.me/content/images/size/w600/2024/05/fig3.png 600w, https://alexdremov.me/content/images/size/w1000/2024/05/fig3.png 1000w, https://alexdremov.me/content/images/2024/05/fig3.png 1185w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">source: </span><a href="https://pytorch.org/blog/understanding-gpu-memory-1/?ref=alexdremov.me"><span style="white-space: pre-wrap;">https://pytorch.org/blog/understanding-gpu-memory-1/</span></a></figcaption></figure><p>As you see, some data after the first forward is not cleared. By clicking on blocks you can get the idea where these tensors come from. In the image is the case when gradients are not cleared after the training step, so they lay dead during the forward pass, limiting the ability to increase the batch size to fit more data.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/blog/understanding-gpu-memory-1/?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Understanding GPU Memory 1: Visualizing All Allocations over Time</div><div class="kg-bookmark-description">During your time with PyTorch on GPUs, you may be familiar with this common error message:</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""><span class="kg-bookmark-author">PyTorch</span><span class="kg-bookmark-publisher">Aaron Shi, Zachary DeVito</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/assets/images/social-share.jpg" alt="" onerror="this.style.display = 
'none'"></div></a></figure><h2 id="speed-up-the-model-and-use-less-memory">Speed up the model and use less memory</h2><p>What can be better than this? We can achieve both by using the <strong>FlashAttention</strong> kernel for calculating dot-product attention. </p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/Dao-AILab/flash-attention?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - Dao-AILab/flash-attention: Fast and memory-efficient exact attention</div><div class="kg-bookmark-description">Fast and memory-efficient exact attention. Contribute to Dao-AILab/flash-attention development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">Dao-AILab</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/e1e6d6e0ffe03e775ffc9262d8022c4f844acd1c07d84105567bdd4412666a79/Dao-AILab/flash-attention" alt="" onerror="this.style.display = 'none'"></div></a></figure><p>If you haven't heard about it, it is a way of calculating exact dot-product attention without constructing the attention matrix explicitly. That reduces the GPU's memory I/O, which improves speed and also <strong>dramatically</strong> reduces memory consumption. There's simply no reason not to use it.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Unfortunately, there's one reason not to use it: hardware.<br><br>Flash attention only works with <code spellcheck="false" style="white-space: pre-wrap;">fp16</code> and <code spellcheck="false" style="white-space: pre-wrap;">bf16</code> precision on compatible hardware. 
That is NVIDIA Ampere, Hopper, etc.</div></div><p>Other libraries use flash attention under the hood, so you may consider using other variants that better fit your codebase.</p><ol><li><strong>XFormers</strong></li></ol><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/facebookresearch/xformers?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - facebookresearch/xformers: Hackable and optimized Transformers building blocks, supporting a composable construction.</div><div class="kg-bookmark-description">Hackable and optimized Transformers building blocks, supporting a composable construction. - facebookresearch/xformers</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">facebookresearch</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://repository-images.githubusercontent.com/416849738/22b08af9-fe74-4946-acda-52e73c72d99e" alt="" onerror="this.style.display = 'none'"></div></a></figure><ol start="2"><li><strong>Transformer Engine</strong></li></ol><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/NVIDIA/TransformerEngine?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - NVIDIA/TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.</div><div class="kg-bookmark-description">A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…</div><div 
class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">NVIDIA</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/89a18f7d463b8ff56291d77cbd2f22f9cffd623496d6a0f1fb32e60bd4549ea1/NVIDIA/TransformerEngine" alt="" onerror="this.style.display = 'none'"></div></a></figure><ol start="3"><li><strong>PyTorch itself!</strong></li></ol><p>That is true, new versions of PyTorch may use flash attention when applicable. To activate this mode, you need to execute attention blocks in the context manager that specify which attention strategy to use: </p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html?ref=alexdremov.me#torch-nn-functional-scaled-dot-product-attention"><div class="kg-bookmark-content"><div class="kg-bookmark-title">torch.nn.functional.scaled_dot_product_attention — PyTorch 2.3 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/docs/stable/_static/images/view-page-source-icon.svg" alt="" onerror="this.style.display = 'none'"></div></a></figure><h2 id="optimize-multi-gpu-data-redundancy-%E2%80%94-fsdp">Optimize multi-GPU data redundancy — FSDP</h2><p>If you use multiple GPUs to run your training, the basic solution is to use the <code>DistributedDataParallel</code> class. 
This way, several identical processes are spawned, and gradients are aggregated during the backward step.</p><p>However, that is sub-optimal!</p><p>The problem is that, because we spawned identical processes, every GPU holds an identical copy of the model and optimiser states, which is redundant. The solution is to shard this state across GPUs. We can do so using the Fully Sharded Data Parallel PyTorch wrapper. </p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2024/05/fsdp_workflow.png" class="kg-image" alt="" loading="lazy" width="2000" height="903" srcset="https://alexdremov.me/content/images/size/w600/2024/05/fsdp_workflow.png 600w, https://alexdremov.me/content/images/size/w1000/2024/05/fsdp_workflow.png 1000w, https://alexdremov.me/content/images/size/w1600/2024/05/fsdp_workflow.png 1600w, https://alexdremov.me/content/images/2024/05/fsdp_workflow.png 2000w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">source: </span><a href="https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?ref=alexdremov.me"><span style="white-space: pre-wrap;">https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html</span></a></figcaption></figure><p>How does it work?</p><p>As I said, when training on several GPUs with DDP, each process holds exact copies of the same parameters, gradients, and optimizer states. We can optimize this by implementing several enhancements:</p><h3 id="shard-optimizer-state-zero-1"><strong>Shard optimizer state (ZeRO 1)</strong></h3><p>When training with DDP, each process holds a complete copy of the optimizer states. With ZeRO1, we shard these optimizer states across all ranks such that each rank holds only a portion of the optimizer states. At the optimization step, each rank only needs the optimizer states for its own portion of the parameters: it updates that portion, and the updated parameters are then shared with the other ranks. 
This reduction in redundancy helps conserve memory.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">&nbsp;In the case of Adam, whose state is roughly twice the model size, sharding the optimizer state among 8 ranks means each rank <b><strong style="white-space: pre-wrap;">stores only one eighth of the total state, which is about one quarter (2/8) of the model size.</strong></b></div></div><h3 id="shard-gradients-zero-2"><strong>Shard gradients (ZeRO 2)</strong></h3><p>We have sharded the optimizer states. Now, we will modify the optimizer step to shard gradients too. If one rank has optimizer states for a portion of the parameters, then we will:</p><ul><ul><li>aggregate all gradients relevant to the states the rank holds</li><li>calculate the optimization step</li><li>send the optimization step for that portion of parameters to all other ranks</li></ul></ul><p>As you may have noticed, each rank no longer needs to hold a full replica of the gradients. We can send gradients to the relevant rank as soon as they are available. So, we can reduce peak memory consumption even further.</p><h3 id="shard-model-parameters-zero-3"><strong>Shard model parameters (ZeRO 3)</strong></h3><p>This is about to be epic.</p><p>Why do we need to store a full copy of the model on each rank? Let's shard model parameters between all ranks. Then, we're going to fetch the required parameters just in time during forward and backward.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">In the case of large models, these optimisations can dramatically decrease memory consumption</div></div><h2 id="how-to-use-fsdp">How to use FSDP?</h2><p>Quite simple actually. All we need is to wrap the model with FSDP:</p><pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


model = FSDP(model)

# it's critical to get parameters from the wrapped model,
# as only a portion of them is returned (the sharded part)
optimizer = optim.Adam(model.parameters())

# conduct training as usual
train(model, optimizer)</code></pre><p>You can also specify the sharding strategy of FSDP. For example, we can select the <code>SHARD_GRAD_OP</code> strategy to achieve behaviour similar to that of ZeRO2. You can learn about other strategies here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/docs/stable/fsdp.html?ref=alexdremov.me#torch.distributed.fsdp.ShardingStrategy"><div class="kg-bookmark-content"><div class="kg-bookmark-title">FullyShardedDataParallel — PyTorch 2.3 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/docs/stable/_static/images/view-page-source-icon.svg" alt="" onerror="this.style.display = 'none'"></div></a></figure><p>You can also wrap submodules with FSDP. In the example above, only one FSDP module is used, which reduces both computation efficiency and memory efficiency. To see why, suppose your model contains 100 Linear layers. If you call <code>FSDP(model)</code>, there will only be one FSDP unit which wraps the entire model. In that case, the allgather would collect the full parameters for all 100 linear layers, and hence won’t save CUDA memory for parameter sharding.</p><p>You can wrap submodules explicitly or define an auto-wrap policy. 
To learn more about FSDP, read the PyTorch guide:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Getting Started with Fully Sharded Data Parallel(FSDP) — PyTorch Tutorials 2.3.0+cu121 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/tutorials/_images/pencil-16.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><h2 id="magic-speedup-with-torchcompile">Magic speedup with <code>torch.compile</code> </h2><p>Simply enabling <code>torch.compile</code> can speed up your code by several percent.</p><p>PyTorch traces your execution graph and tries to compile it into an efficient form, so that the model runs with almost no Python overhead. </p><p>Basic usage is to wrap the model with compile:</p><pre><code class="language-python">import torch

model = torch.compile(model)</code></pre><p>This will execute almost instantly. The actual tracing will happen only during the first forward pass.</p><p>It also has a lot of options that are worth trying:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/docs/stable/generated/torch.compile.html?ref=alexdremov.me#torch.compile"><div class="kg-bookmark-content"><div class="kg-bookmark-title">torch.compile — PyTorch 2.3 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/docs/stable/_static/images/view-page-source-icon.svg" alt="" onerror="this.style.display = 'none'"></div></a></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Torch compiler is a big feature that will be covered in the next posts! <br>Stay tuned</div></div><p>Learn more about torch compile here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Introduction to torch.compile — PyTorch Tutorials 2.3.0+cu121 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://pytorch.org/favicon.ico" alt=""></div></div><div class="kg-bookmark-thumbnail"><img src="https://pytorch.org/tutorials/_static/images/view-page-source-icon.svg" alt="" onerror="this.style.display = 'none'"></div></a></figure><h2 id="conclusion">Conclusion</h2><p>This post is in no way complete with explanations. Rather, it is a list of speed-ups that are worth trying straight away. I hope it was helpful. 
Feel free to leave a comment!</p><p>Consider subscribing </p> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Swift Actors — Common Problems and Tips ]]></title>
                    <description><![CDATA[ Swift actors are a powerful tool to address data races and make your code thread-safe. However, it is also quite a sophisticated concept that requires deep understanding to write efficient and bug-free code. ]]></description>
                    <link>https://alexdremov.me/swift-actors-common-problems-and-tips/</link>
                    <guid isPermaLink="false">646be95d93528957ee8526df</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Tue, 13 Jun 2023 14:32:57 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2023/06/photo-1686153490072-cc31c6bf3686-copy.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>Swift actors are a powerful tool to address data races and make your code thread-safe. However, it is also quite a sophisticated concept that requires deep understanding.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Check out my introduction to Swift Actors or quick guide to Swift async/await</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/conquer-data-races-with-swift-actors/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Conquer Data Races with Swift Actors | Alex Dremov</div><div class="kg-bookmark-description">Unleash the power of Swift concurrency with Actors! Get all the information you need in this comprehensive article</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png?v=012b35a5f7" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1532800783378-1bed60adaf58?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGFjdG9yfGVufDB8fHx8MTY3NTUxNTM3OQ&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=2000" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/quick-guide-to-async-await-in-swift/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Quick Guide to Async Await in Swift | Alex Dremov</div><div class="kg-bookmark-description">Everything you need to know about new Swift asynchronous features. 
Async await, main actor, task, async get, and possible use cases — all covered.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png?v=012b35a5f7" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/2022/04/slide_17.jpg" alt=""></div></a></figure><h2 id="reentrancy-invalid-state-expectations">Reentrancy: Invalid State Expectations</h2><p>One of the core features of actors is reentrancy. By allowing calls to the actor's isolated methods while another method is suspended at an <code>await</code>, actors reduce the time your code spends waiting for the actor to become available.</p><p>However, it requires extra care with the actor's state. A classic example:</p><pre><code class="language-swift">actor Door {
    private var isOpen = false
    
    func open() async {
        isOpen = true
        
        await notifyDoorOpened() // Suspension point
        
        // Mistake! Door could have been closed
        // while notifyDoorOpened was executing
        print("Door is open: \(isOpen)")
    }
    
    func close() {
        isOpen = false
    }
    
    func notifyDoorOpened() async {
        try! await Task.sleep(for: .seconds(1))
    }
}
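
// A possible fix, sketched under assumptions: `CheckedDoor` is a hypothetical
// variant of `Door` (not part of the original example) that re-reads its
// state after the suspension point instead of assuming it is unchanged.
actor CheckedDoor {
    private var isOpen = false

    // Returns whether the door is still open after the suspension
    func open() async -&gt; Bool {
        isOpen = true

        // Suspension point: other calls may run on the actor meanwhile
        try? await Task.sleep(for: .seconds(1))

        // Explicitly check the condition instead of trusting the old value
        return isOpen
    }

    func close() {
        isOpen = false
    }
}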

let door = Door()
Task {
    await door.open()
}
Task {
    await door.close()
}</code></pre><pre><code>Door is open: false</code></pre><p>So, the first tip is to drop any expectations about the actor's state after an asynchronous call inside it. Explicitly check for conditions you believe to be true.</p><h2 id="reentrancy-double-computations">Reentrancy: Double Computations</h2><p>An even more common case is when execution enters <strong>the same method</strong> with the same arguments several times.</p><p>For example, let's suppose that an actor performs heavy data loading inside one of its methods. We don't want the heavy data to be loaded on each call, so we implement simple caching:</p><pre><code class="language-swift">import Foundation

actor ActivitiesStorage {
    var cache = [UUID: Data?]()
    
    func retrieveHeavyData(for id: UUID) async -&gt; Data? {
        if let data = cache[id] {
            return data
        }
        
        // ...
        
        let data = await requestDataFromDatabase(for: id) // suspension
        cache[id] = data
        
        return data
    }
    
    private func requestDataFromDatabase(for id: UUID) async -&gt; Data? {
        print("Performing heavy data loading!")
        try! await Task.sleep(for: .seconds(1))
        // ...
        return nil
    }
    
}

let id = UUID()
let storage = ActivitiesStorage()

Task {
    let data = await storage.retrieveHeavyData(for: id)
}

Task {
    let data = await storage.retrieveHeavyData(for: id)
}
</code></pre><p>But our caching is useless, as the data is loaded twice anyway. <strong>We are dealing with a race condition</strong>:</p><pre><code>Performing heavy data loading!
Performing heavy data loading!</code></pre><p>At this point, you already see that this is due to the actor's reentrancy. The cache is not set until the data is loaded, so every call that arrives in the meantime starts its own heavy load.</p><p>Let's use mutexes! (no, please don't)</p><p>To fix this problem, we can explicitly "subscribe" to a <strong>single</strong> heavy data load and return its result when it is available:</p><pre><code class="language-swift">import Foundation

actor ActivitiesStorage {
    var cache = [UUID: Task&lt;Data?, Never&gt;]()
    
    func retrieveHeavyData(for id: UUID) async -&gt; Data? {
        if let task = cache[id] {
            return await task.value
        }
        
        // ...
        
        let task = Task {
            await requestDataFromDatabase(for: id)
        }
        
        // Notice that it is set before `await`
        // So, the following calls will have this task available
        cache[id] = task
        return await task.value // suspension
    }
    
    private func requestDataFromDatabase(for id: UUID) async -&gt; Data? {
        print("Performing heavy data loading!")
        try! await Task.sleep(for: .seconds(1))
        // ...
        return nil
    }
    
}

let id = UUID()
let storage = ActivitiesStorage()

Task {
    let data = await storage.retrieveHeavyData(for: id)
}

Task {
    let data = await storage.retrieveHeavyData(for: id)
}</code></pre><p>As you see, we use a task to delay await inside an actor, allowing us to set the cache before the suspension. Now, only one heavy data load is performed.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Using tasks inside actors to delay await is a powerful feature!</div></div><h2 id="mainactor-overuse">@MainActor Overuse</h2><p>Marking your methods or classes with <code>@MainActor</code> results in the code inside them running on the main thread. It is useful for UI-related code as UI updates must happen on the main thread.</p><p>However, overusing <code>@MainActor</code> slows down your concurrent code a lot, as it will run on a single thread and freeze your UI frequently. </p><p>To avoid this trap, do not use <code>@MainActor</code> for the whole class:</p><pre><code class="language-swift">@MainActor
class OnboardingViewModel: ViewModel {
	// ...
}</code></pre><p>Such use restricts all methods to the main thread, which is easy to overlook when adding new methods or functionality. </p><p>Use it for specific methods only.</p><p>And decompose your methods so that <code>@MainActor</code> methods contain as little code as possible, so the main thread is unlikely to be blocked.</p><pre><code class="language-swift">class OnboardingViewModel {
    func performLogIn() async {
        // loading, processing and stuff
        // can be executed on any thread
        
        await updateLogInInformation()
    }
    
    @MainActor func updateLogInInformation() {
        // fast ui updates only
    }
}
</code></pre>
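<p>When only the last step of a longer routine touches the UI, another option is to hop to the main actor just for that step. A minimal sketch, assuming a hypothetical <code>statusText</code> variable and hypothetical <code>fetchName</code> and <code>loadProfile</code> functions that are not part of the original example; <code>MainActor.run</code> itself is the standard library call:</p><pre><code class="language-swift">// UI-facing state: may only be touched from the main actor
@MainActor var statusText = ""

// Stand-in for a network or database request
func fetchName() async -&gt; String {
    try? await Task.sleep(for: .seconds(1))
    return "Alex"
}

func loadProfile() async {
    // Loading and processing can run off the main actor
    let name = await fetchName()

    // Hop to the main actor only for the short UI update
    await MainActor.run {
        statusText = "Hello, \(name)"
    }
}</code></pre><p>The closure passed to <code>MainActor.run</code> should stay as small as the <code>@MainActor</code> methods above: only the fast UI update belongs there.</p>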
<h2 id="use-sendable-do-not-keep-this-information-in-mind">Use Sendable. Do Not Keep This Information In Mind</h2><p>The Sendable protocol is a feature added in Swift 5.5 that marks types whose values are safe to pass across concurrency domains. This means that Sendable values can be safely used from concurrently executing code.</p><p>Before that, <strong>you had to keep in mind which classes and closures are thread-safe and which are not</strong>. Now, you can explicitly state this by conforming to the Sendable protocol:</p><pre><code class="language-swift">final class FoodData: Sendable {
    // ...
    
    func addFood(foodFactory: @Sendable () -&gt; Food) {
        // ...
    }
}</code></pre><p>In the code above, we say that <code>FoodData</code> methods are safe to be called without synchronization. Also, the <code>foodFactory</code> closure is marked with <code>@Sendable</code>, which means that it can be safely called from different concurrent contexts.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Moreover, if you use <code spellcheck="false" style="white-space: pre-wrap;">Sendable</code>, Swift automatically checks that your code is actually thread-safe. That's cool: you cannot introduce unsafe code by accident, because it will not compile.</div></div><p>You can take one step further and set the <code>SWIFT_STRICT_CONCURRENCY</code> build setting to <code>complete</code>. In this mode, the Swift compiler will not tolerate any thread-unsafe code it detects.</p><h2 id="do-not-ignore-nonisolated-keyword">Do Not Ignore Nonisolated Keyword</h2><p>Nonisolated methods do not mutate or access the actor's isolated state, therefore they do not require the actor's isolated execution. Use them to decompose actors' isolated methods into smaller methods. Actors' code must be readable too.</p><h2 id="continue-reading-about-swift-ios">Continue Reading About Swift &amp; iOS</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/tag/ios/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Alex Dremov | iOS</div><div class="kg-bookmark-description">One of my favourites. Here I write about Swift and iOS development. 
It is noticeable that I mainly focus on iOS development right now.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png?v=012b35a5f7" alt=""><span class="kg-bookmark-author">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1558126372-76b529458592?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDExfHxpb3N8ZW58MHx8fHwxNjQ5NTA0MTQ5&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt=""></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ I Contributed to PyTorch. Here&#x27;s What I Learned ]]></title>
                    <description><![CDATA[ When you see something that does not work in an omnipresent framework, you believe it can&#39;t be completely broken, right? ]]></description>
                    <link>https://alexdremov.me/i-contributed-to-pytorch-heres-what-i-learned/</link>
                    <guid isPermaLink="false">63f734ae0ad7a70f37f6e6f4</guid>
                    <category><![CDATA[ Code ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Mon, 20 Mar 2023 17:23:35 +0100</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2023/03/maxresdefault.jpg" medium="image"/>
                    <content:encoded><![CDATA[ <h2 id="the-issue-must-not-be-that-bad">The Issue Must Not Be That Bad</h2><p>That's what I thought when I encountered a PyTorch problem during one of my college assignments. The Jupyter kernel was dying because of some bug in the LSTM implementation for MPS.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">MPS (Metal Performance Shaders) is an acceleration backend for macOS that utilizes the GPU for computations</div></div><p>After a quick investigation, I discovered that this happens because of the <code>batch_first</code> flag. The MPS backend did not work correctly with it and crashed the entire kernel.</p><blockquote>"Easy fix"<br>P.S. After that phrase, Alex spent the next two days fixing what looked like an "easy fix"</blockquote><p>The PR was merged pretty quickly. Thanks, PyTorch team, for that! And the story could've ended here, but I discovered a funny detail in MPS tests.</p><pre><code class="language-python">@unittest.skipIf(True, "Backward of lstm returns wrong result")
def test_lstm_2(self, device="mps", dtype=torch.float32):</code></pre><p>And LSTM was really bad. It scored a whole lot worse than when trained on CUDA or CPU.</p><h2 id="it-was-bad-really-bad">It Was Bad. Really Bad</h2><p>It turned out that LSTM on MPS was <strong>completely</strong> broken. The forward pass had a bug with the <code>batch_first</code> flag and hidden cell initialization.</p><p>The backward pass used the first layers' weights for the last layers, mixing up all gradients. It did not calculate gradients for hidden states. And my favourite: the backward function returned tensors initialized with garbage, screwing up all subsequent training. It was a mess that I kept investigating for several days.</p><p>Eventually, I fixed LSTM and its tests in a massive PR, ensuring that it is consistent with the CPU.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2023/03/Screenshot-2023-03-20-at-18.59.25.png" class="kg-image" alt loading="lazy" width="1626" height="692" srcset="https://alexdremov.me/content/images/size/w600/2023/03/Screenshot-2023-03-20-at-18.59.25.png 600w, https://alexdremov.me/content/images/size/w1000/2023/03/Screenshot-2023-03-20-at-18.59.25.png 1000w, https://alexdremov.me/content/images/size/w1600/2023/03/Screenshot-2023-03-20-at-18.59.25.png 1600w, https://alexdremov.me/content/images/2023/03/Screenshot-2023-03-20-at-18.59.25.png 1626w" sizes="(min-width: 720px) 720px"></figure><h2 id="what-i-learned">What I Learned</h2><ul><li><em>Big projects also have garbage code.</em> The broken implementation lived in stable releases for <strong>almost a year, </strong>generating several related GitHub issues.</li><li><em>Contributing to a big project is fun and challenging.</em> And it eventually helps a lot of developers, which keeps me warm during cold winter nights. Specifically, contributing to PyTorch is extremely simple. 
Thanks, PyTorch team, for arranging that!</li><li><em>Deploying untested code that looks right is extremely dangerous.</em> I listed pretty severe mistakes that I found scrutinizing LSTM sources for several days. There's no way they could have been discovered without extensive testing. Even though the issues were severe, they were also subtle. The code looked right.</li></ul><h2 id="finally">Finally</h2><p>I was able to complete the college PyTorch assignment even though it required rewriting PyTorch's LSTM MPS implementation. Consider also solving open issues of your favourite framework or project. At the end of the day, it is a lot more fun than Leetcode problems.</p><!--kg-card-begin: html--><section class="custom-replace" data-replace=".subscription-article">
    <a class="gh-portal-triggerbtn-container">Subscribe and don't miss posts!</a>
</section><!--kg-card-end: html--><h2 id="see-my-work">See My Work</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/pytorch/pytorch/pull/95137?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">[MPS] Fix LSTM backward and forward pass by AlexRoar · Pull Request #95137 · pytorch/pytorch</div><div class="kg-bookmark-description">Fixes #91694Fixes #92615Several transpositions were missing for backward graph in case of batch_first&#x3D;True. The #91694 is not reproduced with batch_first&#x3D;False.After fixing transpose issue, I fi...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">pytorch</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/eb01c9acbbb7a453257ef8f56fef240c67c593b66d377dab4b5a74a57868f4c6/pytorch/pytorch/pull/95137" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/pytorch/pytorch/pull/95563?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">[MPS] Fix bidirectional LSTM &amp; small one-direction LSTM fix by AlexRoar · Pull Request #95563 · pytorch/pytorch</div><div class="kg-bookmark-description">Fixes #94754With this PR I hope to finish my breathtaking journey of fixing MPS LSTM.Here, I enable bidirectional on MPS. 
Also, I’ve noticed that cache key did not account for all parameters, so ...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">pytorch</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/36976c41bcb749892d6c73f144887a50c1fc93d5eace798a4202392f5862eeee/pytorch/pytorch/pull/95563" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/pytorch/pytorch/pull/96601?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">[MPS] LSTM grad_y missing fix by AlexRoar · Pull Request #96601 · pytorch/pytorch</div><div class="kg-bookmark-description">Fixes #96416Added tests that do not use LSTM output simalarly to the issueSeems like this fix once again introduces backward incompatibility.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">pytorch</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/fdca786439d2b99be79b4955fe26f15414c11266c4ba3a9f0f30d96dc7630dfa/pytorch/pytorch/pull/96601" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/pytorch/pytorch/pull/95091?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">[MPS] LogSoftmax numerical stability by AlexRoar · Pull Request #95091 · pytorch/pytorch</div><div class="kg-bookmark-description">Fixes #94043Calculations are now consistent with numericaly stable formula and CPU:$LogSoftmax(X, \dim) &#x3D; X - \max(X, \dim) - \log(sum(X - \max(X, \dim), \dim))$@malfet</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" 
src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">pytorch</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/ef21590314dfd0a1ec52bfdf6f74cf293d08fcd4206bd2433d5faf9a4e42278a/pytorch/pytorch/pull/95091" alt=""></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Conquer Data Races with Swift Actors ]]></title>
                    <description><![CDATA[ Unleash the power of Swift concurrency with Actors! Get all the information you need in this comprehensive article ]]></description>
                    <link>https://alexdremov.me/conquer-data-races-with-swift-actors/</link>
                    <guid isPermaLink="false">6382224c6bdea0516ceea940</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Tue, 07 Feb 2023 20:08:18 +0100</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1532800783378-1bed60adaf58?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGFjdG9yfGVufDB8fHx8MTY3NTUxNTM3OQ&amp;ixlib&#x3D;rb-4.0.3&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>Mobile development is close to impossible without concurrent code. While executing tasks concurrently generally speeds up your app, it also introduces a lot of challenges to overcome. One of them is the data race.</p><h2 id="data-races-and-when-they-happen">Data Races And When They Happen</h2><p>Try to find the problem in the code below.</p><pre><code class="language-swift">import Foundation

var counter = 0
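// A custom concurrent queue: barriers are ignored on global queues
let queue = DispatchQueue(label: "ConcurrentQueue", attributes: .concurrent)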

for _ in 1...100500 {
    queue.async {
        counter += 1
    }
}

queue.sync(flags: .barrier) {
    // Synchronous barrier to wait until all
    // async tasks are finished
    print("Final value: \(counter)")
}
</code></pre><p>This does not output <code>100500</code> as desired:</p><p><code>Final value: 100490</code></p><p>Let me run the same code one more time.</p><p> <code>Final value: 100486</code></p><p>Voilà</p><p>As you see, the same code produces different results. In this case, we are dealing with a <strong>data race.</strong></p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Data races occur when multiple threads access a shared resource without protection and at least one of them writes to it, leading to undefined behaviour</div></div><p>In the code above, asynchronous tasks capture <code>counter</code> and modify it simultaneously. This leads to undefined behaviour.</p><div class="kg-card kg-toggle-card" data-kg-toggle-state="close"><div class="kg-toggle-heading"><h4 class="kg-toggle-heading-text">What's under the hood?</h4><button class="kg-toggle-card-icon"><svg id="Regular" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path class="cls-1" d="M23.25,7.311,12.53,18.03a.749.749,0,0,1-1.06,0L.75,7.311"/></svg></button></div><div class="kg-toggle-content"><p>The reason for such behaviour lies in the underlying machine instructions. Before the value is incremented, it is loaded from RAM into a processor register. At the same time, other threads can increment the value and save it back to RAM. But the thread that loaded the value from memory into a register will not know about it and will continue to work with the old value, eventually overwriting the updated value in RAM.</p></div></div><h2 id="non-actor-solutions">Non-Actor Solutions</h2><p>Before the introduction of actors, several solutions to the problem were used.</p><h3 id="serial-queue">Serial Queue</h3><p>We can create a dedicated queue that will be used for all accesses to the counter. Tasks on this queue execute serially, so no data races occur.</p><pre><code class="language-swift">import Foundation

var counter = 0
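// A custom concurrent queue: barriers are ignored on global queues
let queue = DispatchQueue(label: "ConcurrentQueue", attributes: .concurrent)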

// Serial queue
let counterAccessQueue = DispatchQueue(label: "CounterAccessQueue")

for _ in 1...100500 {
	queue.async {
		counterAccessQueue.sync { counter += 1 }
	}
}

queue.sync(flags: .barrier) {
	counterAccessQueue.sync { print("Final value: \(counter)") }
}</code></pre><h3 id="concurrent-queue-with-barrier">Concurrent Queue With Barrier</h3><p>It's possible to use <code>sync</code> with the barrier flag to modify the value even on a concurrent queue. Basically, the barrier waits until all previously submitted tasks are completed, then executes its code exclusively, and after that the queue continues to operate as usual.</p><p>In the current example, it basically turns the concurrent queue into a serial one, but still, it's a different approach.</p><pre><code class="language-swift">import Foundation

var counter = 0
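// A custom concurrent queue: barriers are ignored on global queues
let queue = DispatchQueue(label: "ConcurrentQueue", attributes: .concurrent)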

for _ in 1...100500 {
	queue.sync(flags: .barrier) {
		counter += 1
	}
}

queue.sync {
	print("Final value: \(counter)")
}</code></pre><!--kg-card-begin: html--><section class="custom-replace" data-replace=".subscription-article">
    <a class="gh-portal-triggerbtn-container">Subscribe and don't miss posts!</a>
</section><!--kg-card-end: html--><h2 id="actors-model">Actors Model</h2><p>The actor model is an architecturally different approach. Consider actors as classes with additional restrictions. Conceptually, code inside actors <strong>cannot be executed concurrently</strong>, therefore actors can safely modify their state.</p><blockquote>In the world of chaos (concurrent) consider actors as a safe space</blockquote><p>Also, other instances cannot modify the actor's state from the outside, which keeps every access safe.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">All in all, actors let you safely share information between concurrent contexts</div></div><h2 id="using-actors-in-swift">Using Actors in Swift</h2><p>Luckily, we do not need to implement the actor model ourselves. Starting from <strong>Swift 5.5</strong>, actors are available as part of Swift concurrency.</p><p>Actors are defined with the <code>actor</code> keyword.</p><pre><code class="language-swift">actor Counter {
	private(set) var counter = 0
    
	func increment() {
		counter += 1
	}
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Like classes, actors are<strong> reference types</strong></div></div><p>Generally, any access to an actor may suspend and therefore requires the <code>await</code> keyword.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">If you're unfamiliar with Swift concurrency, check out my quick guide!</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/quick-guide-to-async-await-in-swift/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Quick Guide to Async Await in Swift | Alex Dremov</div><div class="kg-bookmark-description">Everything you need to know about new Swift asynchronous features. Async await, main actor, task, async get, and possible use cases — all covered.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png?v&#x3D;eef9b14b42" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/2022/04/slide_17.jpg" alt=""></div></a></figure><p>Now, according to the defined model, an actor represents an isolated state. Therefore, we cannot directly execute code inside the actor or change its state from the outside, because some other task may already be changing the actor's state.</p><p>We want to avoid data races!</p><pre><code class="language-swift">let counter = Counter()
let queue = DispatchQueue.global()

// Used only to wait for all tasks to complete
let group = DispatchGroup()

for _ in 1...100500 {
    group.enter()
    
    queue.async {
    	// async calls can be executed only in
        // appropriate concurrent environment, so
        // we spawn a new task
        Task.detached {
            await counter.increment()
            group.leave()
        }
    }
}

group.wait()
Task {
    print("Final value: \(await counter.counter)")
}
</code></pre><p>As you see, all calls to methods of <code>Counter</code> and even to its properties are asynchronous and marked with the <code>await</code> keyword.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Notice that <code>await</code> is not needed inside the actor's method. That's because the actor's methods already run in the actor's isolated context</div></div><h3 id="nonisolated-members">Nonisolated Members</h3><p>All members of actors are isolated by default. Actors can also have nonisolated members. Access to them is the same as if the actor were a regular class. Notice, though, that nonisolated methods cannot directly access isolated members.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Stored non-constant properties cannot be <code>nonisolated</code></div></div><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Constant properties ( <code>let</code> ) are <code>nonisolated</code> by default, as they cannot provoke a data race</div></div><pre><code class="language-swift">actor Counter {
    let id = UUID()
    private(set) var counter: Int = 0
    
    private nonisolated var description: String {
        "Counter"
    }
    
    func increment() {
        counter += 1
    }
    
    nonisolated func getDescription() -&gt; String {
        return description
    }
}

...

print(counter.getDescription()) // no await
print(counter.id) // no await</code></pre><h2 id="difference-to-locks">Difference from Locks</h2><p>One may ask:</p><blockquote>How is it different from taking a lock before executing code inside an actor and releasing the lock on exit?</blockquote><p>The difference becomes noticeable when the actor itself runs asynchronous operations, for example, when it messages another actor.</p><p>Take a look:</p><pre><code class="language-swift">actor Ping {
    let pong = Pong()
    
    func run() async {
        print("ping!")
        await pong.run() // Suspension point
        
        // While pong.run() is waited, other tasks
        // can enter this actor
    }
}

actor Pong {
    func run() async {
        try! await Task.sleep(for: .seconds(1)) // sleeping a bit
        print("pong!")
    }
}

let ping = Ping()
Task {
    await ping.run()
}

Task {
    await ping.run()
}
</code></pre><p>This code outputs</p><pre><code>ping!
ping!
pong!
pong!</code></pre><p>Notice that the other actor is also called using the <code>await</code> keyword. I marked this place as a suspension point. The current task is suspended while it waits for the asynchronous call, <strong>so the actor is free to be entered again.</strong></p><p>That's the core difference from a simple mutex or lock, and it is called <strong>Actor Reentrancy</strong>. Some consider this a problem. However, it is an awesome optimization at the expense of complicating the code a bit.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Mind actor reentrancy! It is incorrect to assume that an actor's state is unchanged after an <code>await</code> call inside the actor</div></div><pre><code class="language-swift">actor Door {
    private var _open = false
    
    func open() async {
        _open = true
        
        await someTask() // Suspension point
        
        // Mistake! Door could have been closed
        // while someTask was executing
        print("Door is open")
    }
    
    func close() {
        _open = false
    }
}</code></pre><p>Luckily, suspension points are all marked with <code>await</code> keyword, so it is easy to keep track of them</p><h2 id="final-notes">Final Notes</h2><p>Actors are a great solution to data races. They nicely integrate into Swift concurrency. Keep in mind, though, that actor reentrancy must be taken into account to avoid incorrect state assumptions.</p><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://developer.apple.com/documentation/swift/actor?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Apple Developer Documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://developer.apple.com/apple-logo.svg" alt=""></div></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://docs.swift.org/swift-book/LanguageGuide/Concurrency.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Concurrency — The Swift Programming Language (Swift 5.7)</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://docs.swift.org/apple-touch-icon-180x180.png" alt=""><span class="kg-bookmark-author">Swift.org</span><span class="kg-bookmark-publisher">Apple Inc.</span></div></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Dive into Swift&#x27;s Memory Management ]]></title>
                    <description><![CDATA[ Swift uses ARC to track and deallocate unused objects. Learn about the three types of reference counts and how ARC works — in this detailed post. ]]></description>
                    <link>https://alexdremov.me/dive-into-swifts-memory-management/</link>
                    <guid isPermaLink="false">63b99132582cff68d55eb9d7</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sun, 08 Jan 2023 20:33:03 +0100</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2023/01/imageunwstr.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>In this post, I'll explore how Swift's memory management works under the hood, and how the memory modifiers <code>unowned</code> and <code>weak</code> affect an object's lifetime. You'll get a deeper understanding of how Swift manages objects' lifetimes internally.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Swift memory management is one of the basic interview questions. It was asked <b><strong style="white-space: pre-wrap;">in every</strong></b> iOS developer interview I've ever been to</div></div><h2 id="memory-management">Memory Management</h2><p>In C, for example, the developer alone is in charge of deallocating unused objects. This can lead to memory leaks, double deallocations, or the use of invalid memory areas.</p><p>We don't want this.</p><p>Swift uses automatic reference counting (ARC) under the hood to deduce objects' lifetimes and automatically deallocate unused objects. Swift has <strong>three different types of reference counts. </strong>They count how many other instances use an object, and when the object is no longer needed, it is deallocated.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">This guide will progress from a general overview to the internals of ARC. Even if you're familiar with Swift's memory management, there's a high chance that you will learn something new</div></div><h2 id="strong-reference">Strong Reference</h2><p>The counter that is responsible for deallocation is a <strong>strong reference counter (RC). </strong>The strong RC counts strong references to the object. When the strong RC reaches zero, the object is deinitialized.</p><p>A strong reference is just a regular use of an object. 
Creating a variable or a constant, or saving a reference to an object in another object's property: all of these create a strong reference.</p><p>Why should a developer even care about reference counting? It seems like a low-level implementation detail that is not important. <strong>But actually, it's crucial.</strong></p><p>Take a look at this example:</p><pre><code class="language-swift">class Person {
    let name: String
    init(name: String) { self.name = name }
    var apartment: Apartment?
    deinit { print("\(name) is being deinitialized") }
}

class Apartment {
    let unit: String
    init(unit: String) { self.unit = unit }
    var tenant: Person?
    deinit { print("Apartment \(unit) is being deinitialized") }
}

var john: Person? = Person(name: "John Appleseed")
var unit4A: Apartment? = Apartment(unit: "4A")

john!.apartment = unit4A // Person -&gt; Apartment: strong reference
unit4A!.tenant = john // Apartment -&gt; Person: strong reference

john = nil // Person is no longer needed
unit4A = nil // Apartment is no longer needed</code></pre><p>In the above example, the <code>Person</code> and <code>Apartment</code> objects have a strong reference to each other, creating a <strong>retain cycle</strong>. As a result, when you set both <code>john</code> and <code>unit4A</code> to <code>nil</code>, the <strong>deinitializers are not called and the objects are not deallocated.</strong></p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2023/01/strongRef.png" class="kg-image" alt="Retain cycle image" loading="lazy" width="1280" height="800" srcset="https://alexdremov.me/content/images/size/w600/2023/01/strongRef.png 600w, https://alexdremov.me/content/images/size/w1000/2023/01/strongRef.png 1000w, https://alexdremov.me/content/images/2023/01/strongRef.png 1280w" sizes="(min-width: 720px) 720px"></figure><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">This situation is called a <b><strong style="white-space: pre-wrap;">memory leak</strong></b>. In Swift, it occurs only in the case of a <b><strong style="white-space: pre-wrap;">retain cycle.</strong></b> Two objects depend on each other and they will never be deallocated.</div></div><p>That's where memory management modifiers come in handy.</p><h2 id="weak-reference">Weak Reference</h2><p>One of the solutions to the problem of a retain cycle is a <strong>weak reference.</strong> It is created using the <code>weak</code> modifier like that:</p><pre><code class="language-swift">let person = Person(name: "John Appleseed") // person is a strong reference
weak var weakPerson = person // weak reference to the same object</code></pre><p>A weak variable <strong>always has an optional type</strong> and cannot be a constant (<code>let</code>). That's because the object can be deallocated while it is still referenced by a weak variable. In this case, the variable is automatically set to <code>nil</code>.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Think of a <code spellcheck="false" style="white-space: pre-wrap;">weak</code> reference as one that uses an object but can carry on correctly without it (seeing <code spellcheck="false" style="white-space: pre-wrap;">nil</code> instead), so the object can be deallocated when nobody else needs it</div></div><p>Let's take a look at the solution to the problem above using the <code>weak</code> modifier:</p><pre><code class="language-swift">class Person {
    let name: String
    init(name: String) { self.name = name }
    var apartment: Apartment?
    deinit { print("\(name) is being deinitialized") }
}

class Apartment {
    let unit: String
    init(unit: String) { self.unit = unit }
    
    weak var tenant: Person?
    
    deinit { print("Apartment \(unit) is being deinitialized") }
}

var john: Person? = Person(name: "John Appleseed")
var unit4A: Apartment? = Apartment(unit: "4A")

john!.apartment = unit4A // Person -&gt; Apartment: strong reference
unit4A!.tenant = john // Apartment -&gt; Person: weak reference

john = nil
unit4A = nil</code></pre><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2023/01/weakRef.png" class="kg-image" alt="Strong and weak reference image" loading="lazy" width="1280" height="800" srcset="https://alexdremov.me/content/images/size/w600/2023/01/weakRef.png 600w, https://alexdremov.me/content/images/size/w1000/2023/01/weakRef.png 1000w, https://alexdremov.me/content/images/2023/01/weakRef.png 1280w" sizes="(min-width: 720px) 720px"></figure><p>Now the retain cycle is gone. First, the <code>Person</code> object is deallocated because there are no strong references to it. Then, the <code>Apartment</code> object is deallocated.</p><p>No memory leak!</p><p>That's it. That is how you break retain cycles in Swift. There is one more modifier that can help you with that.</p><h2 id="unowned-reference">Unowned Reference</h2><p>An <code>unowned</code> reference is very similar to a <code>weak</code> reference because it also does not increase the strong reference count. The difference is that it's up to the developer not to use an invalid object.</p><p>Unowned variables <strong>can be constant and non-optional. </strong>When an object is deallocated, <strong>ARC does not set the unowned reference’s value to <code>nil</code></strong>. However, if you try to access a deallocated object, you will get a runtime error.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Use an unowned reference only when you are sure that the reference always refers to an instance that has not been deallocated</div></div><p>Here's a similar example:</p><pre><code class="language-swift">class Customer {
    let name: String
    var card: CreditCard?
    init(name: String) {
        self.name = name
    }
    deinit { print("\(name) is being deinitialized") }
}

class CreditCard {
    let number: UInt64
    unowned let customer: Customer
    init(number: UInt64, customer: Customer) {
        self.number = number
        self.customer = customer
    }
    deinit { print("Card #\(number) is being deinitialized") }
}

var john: Customer? = Customer(name: "John Appleseed")
john!.card = CreditCard(number: 1234_5678_9012_3456, customer: john!)

john = nil // No retain cycle, both objects are deallocated</code></pre>
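<p>To see how the two modifiers differ at the moment of deallocation, here is a minimal sketch. The <code>Document</code> class and the variable names are hypothetical and exist only for illustration, and the snippet assumes it runs as top-level code (for example, in a <code>main.swift</code> file or a playground):</p><pre><code class="language-swift">final class Document {
    let title: String
    init(title: String) { self.title = title }
    deinit { print("\(title) is being deinitialized") }
}

var document: Document? = Document(title: "Report")

// Neither of these references increases the strong reference count
weak var weakDocument = document
unowned let unownedDocument = document!

print(weakDocument?.title ?? "nil") // the object is alive, so its title is printed
print(unownedDocument.title)        // safe: the object is still alive

document = nil // the last strong reference is gone, so deinit runs

// The weak reference has been set to nil automatically
print(weakDocument == nil)

// The unowned reference has NOT been set to nil. Reading it now would
// stop the program with a runtime error, so the line stays commented out:
// print(unownedDocument.title)</code></pre><p>The weak reference can be checked safely at any time, while the unowned one relies on the guarantee that the object outlives every read of the reference.</p>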
<h2 id="three-reference-counters">Three Reference Counters</h2><p>So, how does all this magic work inside? <a href="https://github.com/apple/swift/blob/main/stdlib/public/SwiftShims/swift/shims/RefCount.h?ref=alexdremov.me">Swift sources</a> have an amazingly detailed description of all the processes under the hood. </p><p>The <strong>strong RC</strong> counts strong references to the object. When the strong RC reaches zero the object is deinited, unowned reference reads become errors, and weak reference reads become nil. The strong RC is stored as an extra count: when the physical field is 0 the logical value is 1.</p><p>The <strong>unowned RC</strong> counts unowned references to the object. The unowned RC also has an extra <code>+1</code> on behalf of the strong references; this <code>+1</code> is decremented after deinit completes. When the unowned RC reaches zero the object's allocation is freed.</p><p>The <strong>weak RC</strong> counts weak references to the object. The weak RC also has an extra <code>+1</code> on behalf of the unowned references; this <code>+1</code> is decremented after the object's allocation is freed. When the weak RC reaches zero the object's side table entry is freed.</p><p>But what is a side table and why is it needed?</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">What a side table is makes for another popular interview question, usually a more advanced one</div></div><h2 id="side-table">Side Table</h2><p>An object conceptually has three refcounts. These refcounts are stored either "inline" or in a "side table entry" pointed to by the internal field. You cannot access these fields from Swift directly.</p><pre><code class="language-swift">class User {
    var id: Int
    var name: String
    
    init(id: Int, name: String) {
        self.id = id
        self.name = name
    }
}

let user = User(id: 0, name: "John")</code></pre><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2023/01/Screenshot-2023-01-08-at-15.12.40.png" class="kg-image" alt="" loading="lazy" width="1746" height="914" srcset="https://alexdremov.me/content/images/size/w600/2023/01/Screenshot-2023-01-08-at-15.12.40.png 600w, https://alexdremov.me/content/images/size/w1000/2023/01/Screenshot-2023-01-08-at-15.12.40.png 1000w, https://alexdremov.me/content/images/size/w1600/2023/01/Screenshot-2023-01-08-at-15.12.40.png 1600w, https://alexdremov.me/content/images/2023/01/Screenshot-2023-01-08-at-15.12.40.png 1746w" sizes="(min-width: 720px) 720px"></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Remember that unowned has <code spellcheck="false" style="white-space: pre-wrap;">+1</code> on behalf of strong reference and weak has <code spellcheck="false" style="white-space: pre-wrap;">+1</code> on behalf of unowned references</div></div><p>Objects initially start with no side table. They can gain a side table when a weak reference is formed.</p><p>Gaining a side table entry is a one-way operation; an object with a side table entry never loses it. 
This prevents some thread races.</p><pre><code class="language-swift">weak var weakUser = user // Side table implicitly created</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2023/01/Screenshot-2023-01-08-at-15.20.34.png" class="kg-image" alt="A side table is created" loading="lazy" width="1248" height="1160" srcset="https://alexdremov.me/content/images/size/w600/2023/01/Screenshot-2023-01-08-at-15.20.34.png 600w, https://alexdremov.me/content/images/size/w1000/2023/01/Screenshot-2023-01-08-at-15.20.34.png 1000w, https://alexdremov.me/content/images/2023/01/Screenshot-2023-01-08-at-15.20.34.png 1248w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A side table is created</span></figcaption></figure><p>Strong and unowned variables point at the object. Weak variables point at the object's side table.</p><p>This idea is fundamental to understanding how <code>weak</code> references work. By pointing not to the object but to the side table, the object itself can be deinitialized and fully deallocated.</p><h2 id="weak-and-unowned-deep-differences">Weak and Unowned. Deep Differences</h2><p>Now, by looking at the implementation we can notice important differences between <code>weak</code> and <code>unowned</code>. </p><h3 id="performance">Performance</h3><p>Using <code>unowned</code> introduces less overhead than using <code>weak</code>. That's because <code>weak</code> variables reference the object through a side table. This means that there's one more pointer hop to reach the object.</p><p>Unowned references point directly to the object, so they do not have such overhead.</p><h3 id="deallocation-vs-deinitialization">Deallocation vs deinitialization</h3><p>According to the sources, when the strong RC reaches zero the object is <strong>deinited. </strong>And when the unowned RC reaches zero the <strong>object's allocation is freed</strong>. 
</p><p>That means the object's memory is not available for reallocation until all unowned references disappear.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">If an object holds a large amount of memory, its memory will not be available until the last unowned reference disappears. If lack of memory is a problem, consider using a <code spellcheck="false" style="white-space: pre-wrap;">weak</code> reference because it allows objects to be fully deallocated even when there are live <code spellcheck="false" style="white-space: pre-wrap;">weak</code> references.</div></div><h2 id="common-problems">Common Problems</h2><p>The <code>Person</code> and <code>Apartment</code> retain cycle is a fairly trivial example. It's important to know the common cases where retain cycles appear.</p><h3 id="closures-strong-capture-and-self">Closures, strong capture, and self</h3><p>By default, a closure expression captures constants and variables from its surrounding scope with strong references to those values.</p><p>As we've already noted, uncontrolled strong references may create a retain cycle. An escaping closure that refers to <code>self</code> needs special consideration if <code>self</code> refers to an instance of a class. Capturing <code>self</code> in an escaping closure makes it easy to accidentally create a strong reference cycle.</p><p>For example:</p><pre><code class="language-swift">class Person {
  var name: String
  var voice: Voice? = nil

  init(name: String) {
    self.name = name
    self.voice = Voice {
      print("My name is \(self.name)")
    }
  }
  func say() { voice?.say() }
  deinit {
    print("Person deallocated")
  }
}

class Voice {
  var say: () -&gt; ()
  init(say: @escaping () -&gt; ()) { self.say = say }
  deinit {
    print("Voice deallocated")
  }
}

var person: Person? = Person(name: "Alex")
person!.say()

person = nil</code></pre><p>This outputs only one line, without the <code>deinit</code> prints:</p><pre><code>My name is Alex</code></pre><p>What's going on here? Let's draw the graph of strong references:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2023/01/Screenshot-2023-01-08-at-16.22.35.png" class="kg-image" alt="Retain cycle with closure" loading="lazy" width="1116" height="744" srcset="https://alexdremov.me/content/images/size/w600/2023/01/Screenshot-2023-01-08-at-16.22.35.png 600w, https://alexdremov.me/content/images/size/w1000/2023/01/Screenshot-2023-01-08-at-16.22.35.png 1000w, https://alexdremov.me/content/images/2023/01/Screenshot-2023-01-08-at-16.22.35.png 1116w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Retain cycle with closure</span></figcaption></figure><p>And, as expected, there is a clearly visible strong reference cycle. The problem is in the creation of the <code>Voice</code> instance:</p><pre><code class="language-swift">self.voice = Voice {
    print("My name is \(self.name)")
}</code></pre><p>Here, <code>self</code> is captured by the escaping closure with a strong reference. To solve that, we can capture <code>self</code> with the <code>weak</code> modifier:</p><pre><code class="language-swift">self.voice = Voice { [weak self] in
    guard let self = self else { return }
    print("My name is \(self.name)")
}</code></pre><p>With this modification, we get the expected output:</p><pre><code>My name is Alex
Person deallocated
Voice deallocated</code></pre><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Do not use <code spellcheck="false" style="white-space: pre-wrap;">weak self</code> when it is not needed. Remember that strong reference is required so that object is not deallocated before it is needed.</div></div><h2 id="final-notes">Final notes</h2><p>If you want to achieve an even deeper understanding of ARC internals, definitely check the ARC source code. You can start with this amazing description of an object's lifetime state machine.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/apple/swift/blob/3bac57d9ac20eb9a6e41fd3c32e8d6fb23e37a47/stdlib/public/SwiftShims/swift/shims/RefCount.h?ref=alexdremov.me#L112"><div class="kg-bookmark-content"><div class="kg-bookmark-title">swift/RefCount.h at 3bac57d9ac20eb9a6e41fd3c32e8d6fb23e37a47 · apple/swift</div><div class="kg-bookmark-description">The Swift Programming Language. Contribute to apple/swift development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">apple</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/1b258a6f968eb7c1af3235e6e4954c458d8edcd16fdb3a7e0e477002d51f4095/apple/swift" alt="" onerror="this.style.display = 'none'"></div></a></figure><p>Hope that this post was helpful to you. 
Feel free to leave a comment or to reach me through my social nets!</p><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/apple/swift/blob/main/stdlib/public/SwiftShims/swift/shims/RefCount.h?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">swift/RefCount.h at main · apple/swift</div><div class="kg-bookmark-description">The Swift Programming Language. Contribute to apple/swift development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">apple</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/1b258a6f968eb7c1af3235e6e4954c458d8edcd16fdb3a7e0e477002d51f4095/apple/swift" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://medium.com/appcoda-tutorials/memory-management-in-swift-understanding-strong-weak-and-unowned-references-b80a06c82460?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Memory Management in Swift: Understanding Strong, Weak and Unowned References</div><div class="kg-bookmark-description">Behind all the coding that we are doing, you probably have noticed some of your variables with the reference of strong, weak or unowned…</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://cdn-static-1.medium.com/_/fp/icons/Medium-Avatar-500x500.svg" alt=""><span class="kg-bookmark-author">AppCoda Tutorials</span><span class="kg-bookmark-publisher">AppCoda</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://miro.medium.com/max/1200/1*ky03wTVr4G93J_b1pi4VFQ.jpeg" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card 
kg-bookmark-card"><a class="kg-bookmark-container" href="https://docs.swift.org/swift-book/LanguageGuide/AutomaticReferenceCounting.html?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Automatic Reference Counting — The Swift Programming Language (Swift 5.7)</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://docs.swift.org/apple-touch-icon-180x180.png" alt=""><span class="kg-bookmark-author">Swift.org</span><span class="kg-bookmark-publisher">Apple Inc.</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://docs.swift.org/swift-book/_images/referenceCycle01_2x.png" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://docs.swift.org/swift-book/ReferenceManual/Expressions.html?ref=alexdremov.me#ID544"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Expressions — The Swift Programming Language (Swift 5.7)</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://docs.swift.org/apple-touch-icon-180x180.png" alt=""><span class="kg-bookmark-author">Swift.org</span><span class="kg-bookmark-publisher">Apple Inc.</span></div></div></a></figure><p></p> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Data Binding in SwiftUI: Tips, Tricks, and Best Practices ]]></title>
                    <description><![CDATA[ Want to create dynamic and responsive user interfaces in SwiftUI? Data binding is the key! In this tutorial, I&#39;ll show you how to use @State, @ObservedObject, @EnvironmentObject, and @Binding to keep your user interface in sync with your data ]]></description>
                    <link>https://alexdremov.me/data-binding-in-swiftui-tips-tricks-and-best-practices/</link>
                    <guid isPermaLink="false">63aecdb56bdea0516ceea95e</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Fri, 30 Dec 2022 14:16:07 +0100</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1586953208448-b95a79798f07?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDR8fFVJfGVufDB8fHx8MTY3MjQwMDUwNg&amp;ixlib&#x3D;rb-4.0.3&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>Are you building an app with SwiftUI and wondering how to manage your app's state? Data binding is a powerful tool that can help you build dynamic and responsive interfaces.</p><p>In this tutorial, we'll explore how to use <code>@State</code>, <code>@Binding</code>, <code>@ObservedObject</code>, and <code>@EnvironmentObject</code>.</p><h2 id="what-is-data-binding-in-swiftui">What is data binding in SwiftUI?</h2><p>Data binding connects a UI element to a piece of data in your app. When the data changes, the UI element automatically updates to reflect the new value, and when the user interacts with the element, the data updates to reflect the new input.</p><p>SwiftUI provides several tools for data binding: <code>@State</code>, <code>@Binding</code>, <code>@ObservedObject</code>, and <code>@EnvironmentObject</code>. These tools allow you to bind values, objects, and even global objects to your user interface.</p><h2 id="how-to-use-state-to-bind-a-simple-value-to-your-user-interface">How to use @State to bind a simple value to your user interface</h2><p><code>@State</code> is a property wrapper that allows you to bind a simple value, like a string or an integer, to your user interface. </p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Strictly speaking, <code>@State</code> is meant for value types, so any <code>struct</code> can also be bound using <code>@State</code>.</div></div><p>To use <code>@State</code>, you first define a property with the <code>@State</code> wrapper, and then use the property in your user interface as usual. For example, here's how you might use <code>@State</code> to bind a string to a text field:</p><pre><code class="language-swift">struct ContentView: View {
    @State private var name: String = ""
    
    var body: some View {
        VStack {
            TextField("Enter your name", text: $name)
            Text("Hello, \(name)!")
        }
    }
}</code></pre><p>You may notice that <code>$name</code> is used. It gives access to the wrapper's <code>projectedValue</code>. In the case of <code>@State</code>, it is a <code>Binding&lt;Type&gt;</code>. </p><p>Now, whenever <code>name</code> changes, the UI updates automatically. And when the user edits the text field, the variable gets updated too.</p><h2 id="using-binding">Using @Binding</h2><p><code>@Binding</code> is used when you want to bind a value or object <strong>that is owned by a different view</strong>.</p><p>To use <code>@Binding</code>, you first define a property with the <code>@Binding</code> wrapper in the child view, and then pass a binding to it from the owning view as an argument. The child view can then use the binding to read and write the data owned by the original view.</p><pre><code class="language-swift">struct CustomTextField: View {
    @Binding var text: String
    
    var body: some View {
        HStack {
            Image(systemName: "person.circle")
            TextField("Enter your name", text: $text)
        }
        .padding()
    }
}

struct ContentView: View {
    @State private var name: String = ""
    
    var body: some View {
        VStack {
            CustomTextField(text: $name)
            Text("Hello, \(name)!")
        }
    }
}
</code></pre><p>You can also accept a binding in a custom <code>init</code> by assigning it to the property wrapper itself, which is accessed through the underscore-prefixed name.</p><pre><code class="language-swift">struct CustomTextField: View {
    @Binding var text: String
    
    init(text: Binding&lt;String&gt;) {
        self._text = text
    }
    
    var body: some View {
        HStack {
            Image(systemName: "person.circle")
            TextField("Enter your name", text: $text)
        }
        .padding()
    }
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">You can view <code>@Binding</code> as a channel that reads a value from the source and writes a value back to it. It does not own the data.</div></div><p>Therefore, <code>@Binding</code> is great for view decomposition, as it lets you inject dependencies into subviews.</p><p>Read more about modular app architecture with SwiftUI in my previous post:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-using-swiftui-in-modular-app/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Using SwiftUI in Modular App</div><div class="kg-bookmark-description">The modular architecture is excellent. But how to implement it effectively with SwiftUI? From its core, SwiftUI is state-driven, and it can be tricky to modularize an app and define exact responsibility borders.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png?v&#x3D;812a8f874f" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1581291518633-83b4ebd1d83e?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGludGVyZmFjZXxlbnwwfHx8fDE2NjYxMjA1NzM&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" alt=""></div></a></figure><h2 id="how-to-use-observedobject-to-bind-a-class-to-your-user-interface">How to use @ObservedObject to bind a class to your user interface</h2><p><code>@ObservedObject</code> allows you to bind a <strong>class</strong> to your user interface. 
The class must conform to the <code>ObservableObject</code> protocol and use the <code>@Published</code> property wrapper for any properties that you want to bind to your user interface. When the object's <code>@Published</code> properties change, the user interface updates.</p><p>Here's an example of how you might use <code>@ObservedObject</code> to bind a <code>User</code> object to a form:</p><pre><code class="language-swift">class User: ObservableObject {
    @Published var name: String = ""
    @Published var email: String = ""
    
    var someUntrackedValue = ""
}

struct ContentView: View {
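    // Note: when the view creates the object itself, @StateObject (iOS 14+) is the safer choice,
    // because an @ObservedObject initialized inline may be recreated whenever the view is rebuilt.
    @ObservedObject private var user = User()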
    
    var body: some View {
        VStack {
            TextField("Enter your name", text: $user.name)
            TextField("Enter your email", text: $user.email)
            Text("Hello, \(user.name)!")
        }
    }
}
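
// A hedged aside (hypothetical types, not part of the original example):
// @Published only fires when the property itself is reassigned. Mutating a
// class instance stored in it does not notify SwiftUI.
final class Address {
    var city = ""
}

final class Profile: ObservableObject {
    @Published var address = Address()
}

let profile = Profile()
profile.address.city = "Paris"  // no change notification: same reference
profile.address = Address()     // change notification: property reassigned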
</code></pre><p>In this example, the <code>user</code> property is bound to the text fields using the <code>$user.name</code> and <code>$user.email</code> syntax. When the user types in the text fields, the <code>name</code> and <code>email</code> properties of the <code>User</code> object update to reflect the new input, and the <code>Text</code> view updates to show the new value.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Keep in mind that if a <code>@Published</code> property of an <code>ObservableObject</code> holds a reference type, changes made inside that object are not propagated. Only reassigning the property itself triggers an update.</div></div><h2 id="how-to-use-environmentobject-to-bind-a-global-object-to-your-user-interface">How to use @EnvironmentObject to bind a global object to your user interface</h2><p><code>@EnvironmentObject</code> allows you to bind a global object that is injected into the view hierarchy and can be read by any descendant view. The object must conform to the <code>ObservableObject</code> protocol the same way as with <code>@ObservedObject</code>.</p>
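<p>As a minimal sketch of how this typically looks (the <code>Session</code>, <code>GreetingView</code>, and <code>RootView</code> names are hypothetical and chosen only for illustration), the object is injected once with <code>.environmentObject(_:)</code> and read in a descendant view with <code>@EnvironmentObject</code>:</p><pre><code class="language-swift">import SwiftUI

// Hypothetical shared model, used only for this illustration.
final class Session: ObservableObject {
    @Published var userName = "Guest"
}

struct GreetingView: View {
    // Resolved from the environment at runtime; no initializer parameter is needed.
    @EnvironmentObject private var session: Session

    var body: some View {
        Text("Hello, \(session.userName)!")
    }
}

struct RootView: View {
    // @StateObject (iOS 14+) makes RootView the owner of the object.
    @StateObject private var session = Session()

    var body: some View {
        // Every view below this point can read the object via @EnvironmentObject.
        GreetingView()
            .environmentObject(session)
    }
}</code></pre><p>Note that the dependency is implicit: if no ancestor calls <code>.environmentObject(_:)</code> with a <code>Session</code>, the app crashes at runtime as soon as <code>GreetingView</code> reads it.</p>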
 ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ iOS App As a Microservice. Using SwiftUI in Modular App ]]></title>
                    <description><![CDATA[ The modular architecture is excellent. But how to implement it effectively with SwiftUI? From its core, SwiftUI is state-driven, and it can be tricky to modularize an app and define exact responsibility borders. ]]></description>
                    <link>https://alexdremov.me/ios-app-as-a-microservice-using-swiftui-in-modular-app/</link>
                    <guid isPermaLink="false">634efadaba9260892e6dcce7</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Wed, 19 Oct 2022 15:00:23 +0200</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1581291518633-83b4ebd1d83e?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGludGVyZmFjZXxlbnwwfHx8fDE2NjYxMjA1NzM&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>In this post, I will describe features of SwiftUI that work well in modular design and those that are better to avoid.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">This is the third and last post in the series on modular architecture. Check out the previous posts to boost your understanding of critical concepts!</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-build-robust-app-architecture/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Build Robust App Architecture</div><div class="kg-bookmark-description">What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized?</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1532622785990-d2c36a76f5a6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDV8fHN0cnVjdHVyZXxlbnwwfHx8fDE2NjMyMzA3ODU&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-modularize-your-app-with-tuist/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Modularize Your App With Tuist</div><div class="kg-bookmark-description">This is the second article in a series on modular app architecture. 
In this post, I will cover implementation details using Tuist</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1613645695025-20e3f38de4a6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fG1vZHVsYXJ8ZW58MHx8fHwxNjY0OTk5NDQ5&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt=""></div></a></figure><h2 id="whats-the-problem">What's The Problem</h2><p>Why is using SwiftUI in a modular design different, and why does it need a whole post of its own? As I already mentioned, SwiftUI is state-driven, and trying to work around that leads to ineffective and messy solutions.</p><p>For example:</p><p>Let's suppose that you have settings and homepage modules. Users can log out on the settings screen, and your app needs to <em>handle</em> this case correctly. The first instinct is to pass a closure to the settings module that will be called when the logout button is pressed. Sounds reasonable, right?</p><p>OK, but how does it connect with SwiftUI? Notice that <em>handling</em> an action does not necessarily mean that there will be a change in state. There is a logical change, though. But how can SwiftUI know about that?</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">State-driven means that views are a function of the state. So, the only way to update the view is to change its state.</div></div><h2 id="data-flow">Data Flow</h2><p>Apple gave a nice presentation at WWDC19 about the role of data in SwiftUI. The presentation covers cases where <code>@Binding</code>, <code>@EnvironmentObject</code>, etc. 
are the most applicable.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/10/Screenshot-2022-10-18-at-23.19.45.png" class="kg-image" alt="Apple WWDC19 — Swift Data Flow" loading="lazy" width="2000" height="1185" srcset="https://alexdremov.me/content/images/size/w600/2022/10/Screenshot-2022-10-18-at-23.19.45.png 600w, https://alexdremov.me/content/images/size/w1000/2022/10/Screenshot-2022-10-18-at-23.19.45.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/10/Screenshot-2022-10-18-at-23.19.45.png 1600w, https://alexdremov.me/content/images/2022/10/Screenshot-2022-10-18-at-23.19.45.png 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Apple WWDC19 — Swift Data Flow</span></figcaption></figure><p>But also the crucial point is made — the view is not the result of a sequence of events, but rather a <strong>representation of data or state</strong>. It's also essential where this data comes from. There should be a single source of truth.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://developer.apple.com/videos/play/wwdc2019/226/?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Data Flow Through SwiftUI - WWDC19 - Videos - Apple Developer</div><div class="kg-bookmark-description">SwiftUI was built from the ground up to let you write beautiful and correct user interfaces free of inconsistencies. 
Learn how to connect...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://developer.apple.com/apple-logo.svg" alt=""><span class="kg-bookmark-author">Apple Developer</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://devimages-cdn.apple.com/wwdc-services/images/48/2828/2828_wide_250x141_2x.jpg" alt=""></div></a></figure><p>Keeping this in mind, let's move on to the first tip that will solve the issue raised in <a href="#whats-the-problem">the "problem" section</a> of this article.</p><h2 id="use-data-flows-and-not-callbacks">Use Data Flows and Not Callbacks</h2><p>The problem with <em>handling</em> the logout action is in the word <em><code>handle</code></em> itself. There is no explicit change in state, and it's unclear who is responsible for changing the state, if the state is even defined.</p><p>So, if SwiftUI is state-driven, let's define the source of truth for this state. It must be a variable that stores the current <code>logged-in</code> / <code>logged-out</code> state. Depending on the state's complexity, it can be a bool, enum, or struct.</p><p>Singleton or global state? No.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">As described in previous posts, <b><strong style="white-space: pre-wrap;">dependencies should be explicit</strong></b>. In this case, the logged-in / logged-out variable should be passed as a dependency to the settings module and to the homepage module.</div></div><p>But we need to listen for changes in this variable and update views accordingly. Also, it's bad if every module can change this variable. There should be restrictions on which modules can modify the state and which can only read it.</p><h3 id="swiftui-combine-its-a-match">SwiftUI + Combine. 
It's a Match</h3><p>You may already know that SwiftUI automatically listens for <code>ObservableObject</code> changes and updates views when something is changed. So, we can create such a class:</p><pre><code class="language-swift">class LogInState: ObservableObject {
    @Published var isLoggedIn: Bool
    
    init(isLoggedIn: Bool) {
        self.isLoggedIn = isLoggedIn
    }
    
    func loggedOut() {
        isLoggedIn = false
    }
    
    func loggedIn() {
        isLoggedIn = true
    }
}</code></pre><p>It can later be injected into a SwiftUI view like this:</p><pre><code class="language-swift">struct MyView: View {
    @ObservedObject var logInState: LogInState

    var body: some View {
        Text(logInState.isLoggedIn ? "Yes" : "No")
    }
}

...
let logInState = LogInState(isLoggedIn: true)
HomePageModule(logInState: logInState)
...
SettingsModule(logInState: logInState)</code></pre><p>Don't you think that creating a distinct class like this for every piece of state is excessive? It may be fine for complex data types, but definitely not for a single boolean value.</p><p>Also, notice that both <code>HomePageModule</code> and <code>SettingsModule</code> can change the state. What if you have many more modules that depend on <code>logInState</code>? They could all change it!</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">If every part of your app can change the shared state, then as soon as a bug arises, you start playing an amazing game: "Who the hell changed this value?"</div></div>
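<p>To make the risk concrete, here is a small sketch that runs outside of any view. The two functions are hypothetical stand-ins for modules and are not part of the original code; the class is a trimmed copy of <code>LogInState</code>. It only illustrates that any holder of the reference can write to the shared state:</p><pre><code class="language-swift">import Combine

final class LogInState: ObservableObject {
    @Published var isLoggedIn: Bool

    init(isLoggedIn: Bool) {
        self.isLoggedIn = isLoggedIn
    }
}

// Hypothetical stand-ins for modules; both receive the same reference.
func settingsModuleLogsOut(_ state: LogInState) {
    state.isLoggedIn = false   // the intended writer
}

func homePageModuleWithBug(_ state: LogInState) {
    state.isLoggedIn = true    // an unintended write that nothing prevents
}

let sharedState = LogInState(isLoggedIn: true)
var observedChanges: [Bool] = []

// dropFirst() skips the current value that @Published replays on subscription.
let cancellable = sharedState.$isLoggedIn
    .dropFirst()
    .sink { observedChanges.append($0) }

settingsModuleLogsOut(sharedState)
homePageModuleWithBug(sharedState)

print(observedChanges) // [false, true]</code></pre><p>The subscriber sees both writes but has no way to tell which module made them.</p>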
<h2 id="better-combine-use">Better Combine Use</h2><p>Ok, we've solved the problem with callbacks. Though we still have a problem with the boilerplate code needed to define a new <code>ObservableObject</code>, and a problem with state modification privileges.</p><p>We can solve those by creating a custom ObservableObject!</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">You also can use third-party reactive frameworks, but I will cover implementation using Combine as it seamlessly integrates with SwiftUI</div></div><p>To use SwiftUI's automatic listening to updates, we need to conform to <code>ObservableObject</code>. Here's a generic class to make any type observable. It also utilizes <code>@propertyWrapper</code> and <code>@dynamicMemberLookup</code> features.</p><pre><code class="language-swift">import Foundation
import Combine

@dynamicMemberLookup
@propertyWrapper
public class ObservableProperty&lt;Output&gt;: ObservableObject {
    @Published private var storedValue: Output
    
    public var wrappedValue: Output {
        get {
            storedValue
        }
        set {
            storedValue = newValue
        }
    }
    
    public init(wrappedValue initialValue: Output) {
        self.storedValue = initialValue
    }
    
    public subscript&lt;Result&gt;(dynamicMember keyPath: WritableKeyPath&lt;Output, Result&gt;) -&gt; Result {
        get {
            storedValue[keyPath: keyPath]
        }
        set {
            storedValue[keyPath: keyPath] = newValue
        }
    }
    
    public subscript&lt;Result&gt;(dynamicMember keyPath: KeyPath&lt;Output, Result&gt;) -&gt; Result {
        storedValue[keyPath: keyPath]
    }
}</code></pre><p>It can be used like this:</p><pre><code class="language-swift">struct MyView: View {
    @ObservedObject
    @ObservableProperty
    var logInState: Bool
    
    init(logInState: ObservableProperty&lt;Bool&gt;) {
        self._logInState = .init(wrappedValue: logInState)
    }
    
    var body: some View {
        VStack {
            Text(logInState ? "Yes" : "No")
            Button("toggle") {
                logInState = !logInState
            }
        }
    }
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">However, <code>ObservableProperty</code> works with value types only. Changes made inside a wrapped reference type will not trigger updates.</div></div><h2 id="restrict-modules-to-read-only-variables">Restrict Modules To Read-Only Variables</h2><p>In the example above, <code>MyView</code> can modify the value. But how can we restrict it to read-only access? We can create a similar class that prohibits modification:</p><pre><code class="language-swift">@dynamicMemberLookup
@propertyWrapper
public class ObservableValue&lt;Output&gt;: ObservableObject {
    @Published private var storedValue: Output
    public var wrappedValue: Output {
        storedValue
    }

    public var value: Output {
        storedValue
    }

    public init(wrappedValue initialValue: Output) {
        fatalError("ObservableValue cannot be initialized with value. Use constant()")
    }

    init&lt;Pub: Publisher&lt;Output, Never&gt;&gt;(initialValue: Output, publisher: Pub) {
        storedValue = initialValue
        publisher.assign(to: &amp;$storedValue)
    }

    // Only a read-only subscript is exposed: a WritableKeyPath subscript with a setter
    // would let consumers mutate members of the value and defeat the read-only guarantee.
    public subscript&lt;Result&gt;(dynamicMember keyPath: KeyPath&lt;Output, Result&gt;) -&gt; Result {
        storedValue[keyPath: keyPath]
    }

    public static func constant(initialValue: Output) -&gt; ObservableValue&lt;Output&gt; {
        .init(
            initialValue: initialValue,
            publisher: Empty()
        )
    }

    public var publisher: Published&lt;Output&gt;.Publisher {
        $storedValue
    }
}</code></pre><p>Then, we can add <code>projectedValue</code> to <code>ObservableProperty</code> to create <code>ObservableValue</code> from it.</p><pre><code class="language-swift">public class ObservableProperty&lt;Output&gt;: ObservableObject {
    ...
    public var publisher: AnyPublisher&lt;Output, Never&gt; {
        $storedValue.eraseToAnyPublisher()
    }
    
    public var projectedValue: ObservableValue&lt;Output&gt; {
        ObservableValue&lt;Output&gt;(
            initialValue: storedValue,
            publisher: publisher
        )
    }
    ...
}</code></pre><p>Great!</p><p>Now we can create an observable source of truth, and pass it to modules, restricting some of them to read-only mode. Check out the example:</p><pre><code class="language-swift">struct ReadOnlyModule: View {
    @ObservedObject
    @ObservableValue
    var logInState: Bool
    
    init(logInState: ObservableValue&lt;Bool&gt;) {
        self._logInState = .init(wrappedValue: logInState)
    }
    
    var body: some View {
        Text(logInState ? "Yes" : "No")
    }
}

struct ModifyModule: View {
    @ObservableProperty
    var logInState: Bool
    
    init(logInState: ObservableProperty&lt;Bool&gt;) {
        self._logInState = logInState
    }
    
    var body: some View {
        Button("toggle") {
            logInState = !logInState
        }
    }
}

struct MyView: View {
    @ObservableProperty
    var logInState: Bool
    
    init(logInState: ObservableProperty&lt;Bool&gt;) {
        self._logInState = logInState
    }
    
    var body: some View {
        VStack {
            // projected read-only value (ObservableValue)
            ReadOnlyModule(logInState: $logInState)
            
            // ObservableProperty reference
            ModifyModule(logInState: _logInState)
        }
    }
}</code></pre><p>So, the callbacks problem is solved and we can move on to the next idea.</p><h2 id="do-not-use-environmentobjects">Do Not Use EnvironmentObjects</h2><p>Yes, I'm that definite about it. At their core, environment objects are global variables that create implicit dependencies. They are also easily overlooked and can cause unexpected crashes when not set.</p><p>Apart from that, you can't set two environment objects of the same type, which leads to messy workarounds and code modifications.</p><p>The third reason is that they simply don't work with dependency inversion. You cannot hide an environment object behind a protocol, as only a concrete <code>ObservableObject</code> type can be passed as an environment object.</p><h2 id="go-for-programmatic-navigation">Go For Programmatic Navigation </h2><p>SwiftUI is introducing ways to implement programmatic navigation, but they are not ready yet. Still, programmatic navigation is essential for modular architecture because it keeps modules loosely coupled.</p><p>There are frameworks that can be used to achieve that. I have a post on this topic. Check it out!</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/swiftui-navigation-is-a-mess-heres-what-you-can-do/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">SwiftUI Navigation Is a Mess. Here’s What You Can Do</div><div class="kg-bookmark-description">Managing navigation in pure SwiftUI is hard and leads to messy solutions. 
In this post, I will show you how you can manage views effectively</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1597945161640-9366e6d4253b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDF8fE5hdmlnYXRpb258ZW58MHx8fHwxNjU5MjAzNjQy&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt=""></div></a></figure><p>Alternatively, you can use other open-source solutions. For example, I recently found a similar framework:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/johnpatrickmorgan/FlowStacks?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - johnpatrickmorgan/FlowStacks: FlowStacks allows you to hoist SwiftUI navigation and presentation state into a Coordinator</div><div class="kg-bookmark-description">FlowStacks allows you to hoist SwiftUI navigation and presentation state into a Coordinator - GitHub - johnpatrickmorgan/FlowStacks: FlowStacks allows you to hoist SwiftUI navigation and presentati...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">johnpatrickmorgan</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/c4f40a4363f317ae1c8fb69c0fd9a888dcfd7718b0be9c0afaa7f3ccfe2f669d/johnpatrickmorgan/FlowStacks" alt=""></div></a></figure><p>As always, let me know what you think in the comments!</p><h2 id="references">References </h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" 
href="https://developer.apple.com/videos/play/wwdc2019/226/?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Data Flow Through SwiftUI - WWDC19 - Videos - Apple Developer</div><div class="kg-bookmark-description">SwiftUI was built from the ground up to let you write beautiful and correct user interfaces free of inconsistencies. Learn how to connect...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://developer.apple.com/apple-logo.svg" alt=""><span class="kg-bookmark-author">Apple Developer</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://devimages-cdn.apple.com/wwdc-services/images/48/2828/2828_wide_250x141_2x.jpg" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://developer.apple.com/documentation/combine?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Apple Developer Documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://developer.apple.com/apple-logo.svg" alt=""></div></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ iOS App As a Microservice. Modularize Your App With Tuist ]]></title>
                    <description><![CDATA[ This is the second article in a series on modular app architecture. In this post, I will cover implementation details using Tuist ]]></description>
                    <link>https://alexdremov.me/ios-app-as-a-microservice-modularize-your-app-with-tuist/</link>
                    <guid isPermaLink="false">633ddc39d9ed9a6f50bd0718</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Fri, 07 Oct 2022 12:14:04 +0200</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1613645695025-20e3f38de4a6?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDJ8fG1vZHVsYXJ8ZW58MHx8fHwxNjY0OTk5NDQ5&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p><strong>Tuist </strong>is an excellent command line tool that helps you generate, maintain and interact with Xcode projects.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">I covered the core ideas of modular architecture in the previous post. Check it out if you haven't yet!</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-build-robust-app-architecture/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Build Robust App Architecture</div><div class="kg-bookmark-description">What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized?</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1532622785990-d2c36a76f5a6?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDV8fHN0cnVjdHVyZXxlbnwwfHx8fDE2NjMyMzA3ODU&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" alt=""></div></a></figure><h2 id="what%E2%80%99s-next">What’s next?</h2><p>In the next and last post in this series, I will cover implementation tips with SwiftUI. 
Subscribe so you don’t miss it<br><strong>UPD: </strong>now available</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-using-swiftui-in-modular-app/#whats-the-problem"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Using SwiftUI in Modular App</div><div class="kg-bookmark-description">The modular architecture is excellent. But how to implement it effectively with SwiftUI? From its core, SwiftUI is state-driven, and it can be tricky to modularize an app and define exact responsibility borders.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1581291518633-83b4ebd1d83e?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGludGVyZmFjZXxlbnwwfHx8fDE2NjYxMjA1NzM&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" alt=""></div></a></figure><h2 id="why-tuist">Why Tuist?</h2><p>It encourages you to further code modularization as it provides an elegant way to create separate Xcode projects for different modules, making tight coupling or implicit dependencies less viable</p><p>Also, it's<strong> great for teamwork.</strong> Have you tried to commit an Xcode project to a VCS like GitHub?</p><p>It's a mess</p><p>Diff of the modified Xcode project is not human-readable. It's simply impossible to trace changes or review a PR. What if you could define the Xcode project in a simple config file? Tuist does that. 
Moreover, <strong>tuist</strong> <strong>config files are written in Swift</strong>.</p><h2 id="our-goal">Our goal</h2><p>We want to divide our project into separate Xcode projects according to the architecture I proposed in the previous article.</p><p>To reiterate, our app will consist of a combination of modules and for every module or feature, we will create a new Tuist project.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Remember that each feature should not depend on other features' implementation. Only interfaces should be public</div></div><p>So, for each feature, we will create several targets corresponding to the feature interface, implementation, and testing or mocking targets if required.</p><h2 id="defining-project">Defining project</h2><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Sources for this post are published on GitHub. So, before reading this article you can see how elegant describing a project could be when using Tuist</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/AlexRoar/TuistExample?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - AlexRoar/TuistExample: Using Tuist for modular app architecture</div><div class="kg-bookmark-description">Using Tuist for modular app architecture. 
Contribute to AlexRoar/TuistExample development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">AlexRoar</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/504ecc1c580a5e07e18a7a1546831b4328d13bfcbbd90a7c6cc0b0d35be53e18/AlexRoar/TuistExample" alt=""></div></a></figure><h3 id="structure">Structure</h3><p>A Tuist project is a simple folder with config files describing your workspace structure:</p><pre><code>Your project root
├── Workspace.swift
├── Tuist
│   ├── Config.swift
│   ├── Dependencies.swift
│   └── ProjectDescriptionHelpers
│       └── &lt;tuist helpers&gt;
└── modules
    ├── Foo
    │   ├── Project.swift
    │   └── &lt;module code, folders&gt;
    ├── Biz
    │   ├── Project.swift
    │   └── &lt;module code, folders&gt;
    └── ...</code></pre><p>But as I said earlier, <em>each</em> module should have at least an implementation target and an interface target.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">There could be modules that contain common tools and do not depend on any other module. Such a module might have an implementation only.</div></div><p>So, let's modify the structure accordingly:</p><pre><code>Your project root
├── Workspace.swift
├── Tuist
│   ├── Config.swift
│   ├── Dependencies.swift
│   └── ProjectDescriptionHelpers
│       └── &lt;tuist helpers&gt;
└── modules
    ├── Foo
    │   ├── Project.swift
    │   ├── interface
    │   │   └── &lt;interface files&gt;
    │   └── src
    │       └── &lt;implementation files&gt;
    ├── Biz
    │   ├── Project.swift
    │   ├── interface
    │   │   └── &lt;interface files&gt;
    │   └── src
    │       └── &lt;implementation files&gt;
    └── ...</code></pre><p>Before defining modules, we need to tell Tuist where to search for them. This can be done in the <code>Workspace.swift</code> file:</p><pre><code class="language-swift">import ProjectDescription

let workspace = Workspace(
    name: "ExampleWorkspace",
    projects: [
        "modules/*"
    ]
)
</code></pre><h3 id="project-file">Project file</h3><p>Tuist defines the Xcode project with a simple Swift file.</p><pre><code class="language-swift">// Project.swift
import ProjectDescription
import ProjectDescriptionHelpers

let project = Project(
  name: "ProjectName",
  targets: [
    ...
  ]
)</code></pre><p>But this post is not just a review of Tuist.</p><p>Let's define a project, knowing that we need both interface and implementation targets. Also, let's create an enum for feature names so that we don't have to use strings and remember all the names.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">As the config is defined in Swift, you can use Xcode's suggestions and auto-completion while defining your project structure. <br><br>For example, Xcode will suggest other modules' names when using enums.</div></div><p>With several simple helpers, we can define the project structure with all of Swift's beauty:</p><pre><code class="language-swift">import ProjectDescription
import ProjectDescriptionHelpers

let project = Project(
    name: Feature.Foo.rawValue,
    targets: [
        .feature(
            implementation: .Foo,
            dependencies: [
                .feature(interface: .Biz),
                .external(.AsyncAlgorithms)
            ]
        ),
        .feature(
            interface: .Foo,
            dependencies: [
                .feature(interface: .Biz)
            ]
        )
    ]
)
</code></pre><p>Features are going to be separate frameworks, so we start with a helper that creates a framework target.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">All Swift files that help describe Tuist configs should be placed in the <code>ProjectDescriptionHelpers</code> folder.</div></div><pre><code class="language-swift">public extension Target {
    static func makeFramework(
        name: String,
        sources: ProjectDescription.SourceFilesList,
        dependencies: [ProjectDescription.TargetDependency] = [],
        resources: ProjectDescription.ResourceFileElements? = []
    ) -&gt; Target {
        Target(
            name: name,
            platform: .iOS,
            product: defaultPackageType,
            bundleId: makeBundleID(with: name + ".framework"),
            sources: sources,
            resources: resources,
            dependencies: dependencies
        )
    }
}</code></pre><p>Then, we can define what a feature is:</p><pre><code class="language-swift">public extension Target {
    static func feature(
        interface featureName: Feature,
        dependencies: [ProjectDescription.TargetDependency] = [],
        resources: ProjectDescription.ResourceFileElements? = []
    ) -&gt; Target {
        .makeFramework(
            name: featureName.rawValue + "Interface",
            sources: [ "interface/**" ],
            dependencies: dependencies,
            resources: resources
        )
    }
    
    static func feature(
        implementation featureName: Feature,
        dependencies: [ProjectDescription.TargetDependency] = [],
        resources: ProjectDescription.ResourceFileElements? = []
    ) -&gt; Target {
        .makeFramework(
            name: featureName.rawValue,
            sources: [ "src/**" ],
            dependencies: dependencies,
            resources: resources
        )
    }
}</code></pre><p>Finally, we combine modules in an app target. It's defined in the same way</p><pre><code class="language-swift">public extension Target {
    static func makeApp(
        name: String,
        sources: ProjectDescription.SourceFilesList,
        dependencies: [ProjectDescription.TargetDependency]
    ) -&gt; Target {
        Target(
            name: name,
            platform: .iOS,
            product: .app,
            bundleId: makeBundleID(with: "app"),
            deploymentTarget: .iOS(targetVersion: "16.0", devices: .iphone),
            sources: sources,
            dependencies: dependencies
        )
    }
}
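
// Not shown in this post: the helpers used in the dependency lists above.
// A hypothetical sketch (names, cases, and paths are assumptions based on
// the folder layout; the real versions live in the GitHub example). Like
// the other helpers, these would sit in ProjectDescriptionHelpers.
public enum Feature: String {
    case Foo
    case Biz
}

public extension TargetDependency {
    // Dependency on a feature's interface target, e.g. "FooInterface".
    static func feature(interface feature: Feature) -&gt; TargetDependency {
        .project(
            target: feature.rawValue + "Interface",
            path: .relativeToRoot("modules/" + feature.rawValue)
        )
    }

    // Dependency on a feature's implementation target, e.g. "Foo".
    static func feature(implementation feature: Feature) -&gt; TargetDependency {
        .project(
            target: feature.rawValue,
            path: .relativeToRoot("modules/" + feature.rawValue)
        )
    }
}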

let project = Project(
    name: "ExampleApp",
    targets: [
        .makeApp(
            name: "ExampleApp",
            sources: [
                "src/**"
            ],
            dependencies: [
                .common,
                .feature(implementation: .Foo),
                .feature(interface: .Foo),

                .feature(implementation: .Biz),
                .feature(interface: .Biz),

                .external(.FoggyColors)
            ]
        )
    ]
)</code></pre><p>That's it.</p><p>Now we can create different features and state dependencies between them. After that, we simply use <code>tuist generate</code> command and it generates Xcode workspace and Xcode projects for us. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/10/Screenshot-2022-10-07-at-00.43.44-min.png" class="kg-image" alt="Tuist-generated workspace" loading="lazy" width="2000" height="1339" srcset="https://alexdremov.me/content/images/size/w600/2022/10/Screenshot-2022-10-07-at-00.43.44-min.png 600w, https://alexdremov.me/content/images/size/w1000/2022/10/Screenshot-2022-10-07-at-00.43.44-min.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/10/Screenshot-2022-10-07-at-00.43.44-min.png 1600w, https://alexdremov.me/content/images/size/w2400/2022/10/Screenshot-2022-10-07-at-00.43.44-min.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Tuist-generated workspace</figcaption></figure><p>Great!</p><p>Now we have our project bootstrapped, and it is fully defined in nice Swift files with a clean structure and explicit dependencies. You can add all <code>.xcodeproj</code> and <code>.xcworkspace</code> to gitignore and forget about a mess in GitHub repositories.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Some details are not covered for the brevity of this post. The full example is published on GitHub and do not hesitate to ask me about anything in the comments!</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/AlexRoar/TuistExample?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - AlexRoar/TuistExample: Using Tuist for modular app architecture</div><div class="kg-bookmark-description">Using Tuist for modular app architecture. 
Contribute to AlexRoar/TuistExample development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">AlexRoar</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/504ecc1c580a5e07e18a7a1546831b4328d13bfcbbd90a7c6cc0b0d35be53e18/AlexRoar/TuistExample" alt=""></div></a></figure><h2 id="creating-an-app-with-tuist">Creating an app with Tuist</h2><p>I already showed how to define project structure in the examples above. Let's get even more specific and write a simple app that will show a random value in a range.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/10/graph.svg" class="kg-image" alt loading="lazy" width="560" height="346"><figcaption>App Architecture</figcaption></figure><p><strong>RandomProvider </strong>defines a protocol for generating a random number and several implementations for it</p><pre><code class="language-swift">// Interface
public protocol NumberProvider {
    var number: Int { get }
}

// Implementation
public struct NumberProviderZero: NumberProvider {
    public let number = 0
    
    public init() {
        
    }
}

public struct NumberProviderRandom: NumberProvider {
    private let range: ClosedRange&lt;Int&gt;
    
    public var number: Int {
        Int.random(in: range)
    }
    
    public init(range: ClosedRange&lt;Int&gt;) {
        self.range = range
    }
}</code></pre><p><strong>RandomScreen </strong>defines several UI screens that display a random number and re-generate it. Notice that it depends only on <strong>RandomProviderInterface</strong> and not on <strong>RandomProvider</strong>, which is the implementation.</p><pre><code class="language-swift">import SwiftUI

public struct RandomScreenSimple: RandomScreen {
    let randomProvider: NumberProvider
    
    @State var number: Int = 0
    
    public init(randomProvider: NumberProvider) {
        self.randomProvider = randomProvider
    }
    
    public var body: some View {
        VStack {
            Text("\(number)")
            Button("generate") {
                number = randomProvider.number
            }
        }.onAppear {
            number = randomProvider.number
        }
        .animation(.default, value: number)
    }
}
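
// --- ExampleApp module (sketch) ---
// An assumption about how the app could compose these pieces, not the code
// from the example repository. The app is the only place that names
// concrete providers, so it can swap them at runtime while the screen
// still sees only `NumberProvider`. `RootView` is a hypothetical name.
struct RootView: View {
    @State private var useRandom = true

    // Picks the concrete implementation behind the protocol.
    private var provider: NumberProvider {
        if useRandom {
            return NumberProviderRandom(range: 0...100)
        }
        return NumberProviderZero()
    }

    var body: some View {
        VStack {
            Toggle("Random", isOn: $useRandom)
            RandomScreenSimple(randomProvider: provider)
        }
    }
}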
</code></pre><p><strong>Common </strong>is a module that provides common tools. Actually, it is used only by the App module, but I wanted to show that many modules can depend on it</p><p><strong>ExampleApp </strong>is an app module that combines other modules and builds the final app</p><p>This is the only module that can depend on other modules' implementation. Moreover, it chooses which implementation to use depending on the scenario. In the example app, <code>NumberProvider</code> implementation is changed in runtime </p><figure class="kg-card kg-video-card kg-width-wide kg-card-hascaption"><div class="kg-video-container"><video src="https://alexdremov.me/content/media/2022/10/video.mp4" poster="https://img.spacergif.org/v1/2778x1284/0a/spacer.png" width="2778" height="1284" loop autoplay muted playsinline preload="metadata" style="background: transparent url('https://alexdremov.me/content/images/2022/10/media-thumbnail-ember911.jpg') 50% 50% / cover no-repeat;" /></video><div class="kg-video-overlay"><button class="kg-video-large-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button></div><div class="kg-video-player-container kg-video-hide"><div class="kg-video-player"><button class="kg-video-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button><button class="kg-video-pause-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/><rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/></svg></button><span class="kg-video-current-time">0:00</span><div class="kg-video-time">/<span class="kg-video-duration"></span></div><input type="range" 
class="kg-video-seek-slider" max="100" value="0"><button class="kg-video-playback-rate">1&#215;</button><button class="kg-video-unmute-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/></svg></button><button class="kg-video-mute-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/></svg></button><input type="range" class="kg-video-volume-slider" max="100" value="100"></div></div></div><figcaption>Example app</figcaption></figure><h2 id="final-notes">Final notes</h2><p>So, in this post, we constructed a modular app using Tuist. In the example project, I added useful tools like</p><ul><li>Additions to default Info.plist</li><li>Template for creating a new feature that can be invoked by<br><code>tuist scaffold framework --name ModuleName</code>. This will create a new module folder, Project.swift file</li><li>Building for release mode. You can invoke generation with an environment variable and this will make all modules static. 
Static frameworks are linked into the app binary, which typically shortens launch time, so they are a good fit for production builds.<br><code>TUIST_BUILD_TYPE_RELEASE=TRUE tuist generate --no-cache</code></li></ul><p>Also, if you have not read my general overview of modular architecture, check it out!</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-build-robust-app-architecture/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Build Robust App Architecture</div><div class="kg-bookmark-description">What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized?</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1532622785990-d2c36a76f5a6?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDV8fHN0cnVjdHVyZXxlbnwwfHx8fDE2NjMyMzA3ODU&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" alt=""></div></a></figure><p>Do not hesitate to ask anything in the comments.</p><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://tuist.io/?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Xcode on steroids | Tuist</div><div class="kg-bookmark-description">Tuist is a tool that helps developers manage large Xcode projects by leveraging project generation. 
Moreover, it provides some tools to automate most common tasks, allowing developers to focus on building apps.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://tuist.io/icons/icon-512x512.png?v&#x3D;afd926b5da3ecabd886495871849f751" alt=""><span class="kg-bookmark-author">Tuist - Xcode on steroids</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://tuist.io/squared-logo.png" alt=""></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ iOS App As a Microservice. Build Robust App Architecture ]]></title>
                    <description><![CDATA[ What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized? ]]></description>
                    <link>https://alexdremov.me/ios-app-as-a-microservice-build-robust-app-architecture/</link>
                    <guid isPermaLink="false">63052db24cd6bfc84885935a</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Fri, 16 Sep 2022 09:43:25 +0200</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1532622785990-d2c36a76f5a6?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDV8fHN0cnVjdHVyZXxlbnwwfHx8fDE2NjMyMzA3ODU&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>In this post, I will discuss microfeature architecture that is, simply said, amazing when implemented correctly in an iOS app.</p><h2 id="next-episodes">Next Episodes</h2><ul><li>Ideas on implementation with <strong>SwiftUI</strong></li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-using-swiftui-in-modular-app/#whats-the-problem"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Using SwiftUI in Modular App</div><div class="kg-bookmark-description">The modular architecture is excellent. But how to implement it effectively with SwiftUI? From its core, SwiftUI is state-driven, and it can be tricky to modularize an app and define exact responsibility borders.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1581291518633-83b4ebd1d83e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGludGVyZmFjZXxlbnwwfHx8fDE2NjYxMjA1NzM&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt="" onerror="this.style.display = 'none'"></div></a></figure><ul><li>Using <strong>tuist</strong> to structure microfeature application</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-modularize-your-app-with-tuist/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Modularize Your App With Tuist</div><div class="kg-bookmark-description">This is the second article in a series on modular app architecture. 
In this post, I will cover implementation details using Tuist</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1613645695025-20e3f38de4a6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fG1vZHVsYXJ8ZW58MHx8fHwxNjY0OTk5NDQ5&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt="" onerror="this.style.display = 'none'"></div></a></figure><h2 id="core-idea">Core Idea</h2><p>The idea comes from microservice server-side application infrastructure. The whole app is divided into logical components corresponding to different functional areas of the application. </p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Considering how complex mobile apps can be, why not apply the same architecture to iOS apps?</div></div><p>Briefly, microfeature architecture implies splitting your app into different components that accept other components' interfaces or data as <strong>explicit dependencies</strong>.</p><p>Therefore, your app can be represented as a graph of modules that explicitly interact with each other.</p><h2 id="main-benefits">Main Benefits</h2><ul><li><strong>Improved maintainability</strong> — each component is small and so is easier to understand and change.</li><li><strong>Better testability</strong> — components explicitly define their public interface. So, they are easier to mock and test.</li><li><strong>Team organization </strong>— different teams can work on different components independently.</li><li><strong>Scalability, code reuse</strong> —when an app is a combination of modules, you can robustly change the app's behaviour by recombining modules. 
If you decide to create an app extension, watchOS app, or App Clip, just pick the required components and you're all set up.</li><li><strong>Explicit dependencies</strong> — implicit dependencies are one of the worst things that can happen to an app's architecture. This architecture requires defining explicit dependencies for each module.</li></ul><h2 id="details">Details</h2><p>So, how do you structure an iOS app once you have decided to use microfeature architecture? The core concept is separation. You can still use a single Xcode project and separate features purely by architecture.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">You can put each feature into <b><strong style="white-space: pre-wrap;">a separate Xcode project</strong></b>. This will push you to a strict separation of components. I will cover how to do this effectively with <b><strong style="white-space: pre-wrap;">tuist</strong></b> in the next episode!</div></div><p> Your codebase will be divided into several blocks:</p><h3 id="features">Features</h3><p>That's where elements of your app live. Later in this post, I will show by example what this part includes.</p><p>Components are logical blocks of your app. Each component explicitly defines an interface to interact with it.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Swift does not have namespaces, but you can use enums to hide internal module logic.</div></div><h3 id="apps">Apps</h3><p>You can have a watchOS app, widgets, and the main iOS app. 
Each app depends on features and builds the final app using features, combining them like bricks.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/09/graphviz-10.svg" class="kg-image" alt="" loading="lazy" width="485" height="346"><figcaption><span style="white-space: pre-wrap;">General apps structure</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://images.unsplash.com/photo-1591040092219-081fb773589c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fHB1enpsZXxlbnwwfHx8fDE2NjMyNjcwNTM&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" class="kg-image" alt="" loading="lazy" width="5568" height="3712" srcset="https://images.unsplash.com/photo-1591040092219-081fb773589c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fHB1enpsZXxlbnwwfHx8fDE2NjMyNjcwNTM&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=600 600w, https://images.unsplash.com/photo-1591040092219-081fb773589c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fHB1enpsZXxlbnwwfHx8fDE2NjMyNjcwNTM&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=1000 1000w, https://images.unsplash.com/photo-1591040092219-081fb773589c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fHB1enpsZXxlbnwwfHx8fDE2NjMyNjcwNTM&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=1600 1600w, https://images.unsplash.com/photo-1591040092219-081fb773589c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDR8fHB1enpsZXxlbnwwfHx8fDE2NjMyNjcwNTM&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2400 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Photo by </span><a href="https://unsplash.com/@ashkfor121?utm_source=ghost&utm_medium=referral&utm_campaign=api-credit"><span style="white-space: pre-wrap;">Ashkan Forouzani</span></a><span style="white-space: pre-wrap;"> / 
</span><a href="https://unsplash.com/?utm_source=ghost&utm_medium=referral&utm_campaign=api-credit"><span style="white-space: pre-wrap;">Unsplash</span></a></figcaption></figure><h3 id="tests-testing-data-and-mock">Tests + Testing Data And Mock</h3><p>This logic also lives apart from the feature's main parts. It's separate because:</p><ul><li>We don't want to use mock data accidentally in the app</li><li>We don't want to include irrelevant data in the final app binary</li></ul><h2 id="feature-design">Feature design</h2><p>A feature consists of up to four blocks. Tests and mocks may be absent, but a feature always has an interface and an implementation.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/09/graphviz-6.svg" class="kg-image" alt="" loading="lazy" width="354" height="250"><figcaption><span style="white-space: pre-wrap;">One feature structure</span></figcaption></figure><h3 id="interface">Interface</h3><p>This part defines what is visible to other features. Public interfaces and the feature's models or entities live here.</p><p>Interfaces define how other code interacts with the feature.</p><p>Models or entities are simple structures with almost no logic; they only define the data used to communicate with the feature.</p><p>You can include other components in the interface, but remember that <strong>the interface must not expose implementation details</strong>.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">If a feature depends on another feature, it depends on that feature's interface. Features <b><strong style="white-space: pre-wrap;">must not</strong></b> depend on another feature's implementation.</div></div><h3 id="implementation">Implementation</h3><p>Implementation depends on the interface and provides classes and structures conforming to the protocols defined in it. 
Resources, images, and other implementation details also stay here.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Separating interface from implementation forces you to write code conforming to the letter <b><strong style="white-space: pre-wrap;">D </strong></b>from<b><strong style="white-space: pre-wrap;"> SOLID</strong></b>. Dependency inversion happens naturally when other modules know about interfaces and not about implementations.</div></div><p>Knowing this information, we can add details to our app's graph image:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/09/graphviz-11.svg" class="kg-image" alt="" loading="lazy" width="568" height="405"><figcaption><span style="white-space: pre-wrap;">Detailed apps structure</span></figcaption></figure><p>Notice that none of the features depends on another feature's implementation. Each feature depends strictly on other features' interfaces.</p><p>Now you see that apps take building blocks and combine them to make an app.</p>
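<p>To make the split concrete, here is a minimal sketch in plain Swift. Everything in it is hypothetical and only for illustration: the <code>UserStore</code> protocol, the <code>InMemoryUserStore</code> implementation, and the <code>ProfileTitle</code> consumer are not from a real project, and the comments mark which target each piece would live in. The consumer knows only the protocol, and the app is the one place that picks a concrete type.</p><pre><code class="language-swift">// --- UserStoreInterface target (hypothetical names) ---
public struct User: Equatable {
    public let id: Int
    public let name: String

    public init(id: Int, name: String) {
        self.id = id
        self.name = name
    }
}

public protocol UserStore {
    func user(id: Int) -&gt; User?
}

// --- UserStore target: depends on UserStoreInterface ---
public struct InMemoryUserStore: UserStore {
    private let users: [Int: User]

    // Assumes user ids are unique.
    public init(users: [User]) {
        self.users = Dictionary(uniqueKeysWithValues: users.map { ($0.id, $0) })
    }

    public func user(id: Int) -&gt; User? {
        users[id]
    }
}

// --- Profile target: depends only on UserStoreInterface ---
public struct ProfileTitle {
    private let store: UserStore

    public init(store: UserStore) {
        self.store = store
    }

    public func title(for id: Int) -&gt; String {
        store.user(id: id)?.name ?? "Unknown user"
    }
}

// --- App target: the only place that picks a concrete implementation ---
let store = InMemoryUserStore(users: [User(id: 1, name: "Alex")])
let profile = ProfileTitle(store: store)
print(profile.title(for: 1)) // Alex
print(profile.title(for: 2)) // Unknown user
</code></pre><p>Swapping <code>InMemoryUserStore</code> for, say, a network-backed store would then touch only the app target, which is the point of depending on interfaces.</p>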
<h2 id="case-example">Case Example</h2><p>Let's architect a scheduling app. It will have:</p><ul><li>Schedule view</li><li>Add event/edit view</li><li>Schedule watchOS view</li></ul><p>Pretty simple.</p><p>Let's split this app into several features:</p><ul><li><strong>UICommon</strong></li></ul><p>Contains common UI elements that can be used to create more complex views.</p><ul><li><strong>Schedule</strong></li></ul><p>Contains main schedule views and logic associated with them. The interface defines ways to interact with views or present them.</p><ul><li><strong>WatchSchedule</strong></li></ul><p>Contains watch-specific schedule views and logic associated with them.</p><ul><li><strong>EventModification</strong></li></ul><p>Contains event modification logic and views.</p><ul><li><strong>ScheduleData</strong></li></ul><p>Data provider. Defines data entities and ways to obtain them.</p><p>The interface will contain simple data entities and model protocols defining ways of obtaining these entities.</p><p>Implementation defines models conforming to protocols defined in the interface. For example, you may want to define a local storage model or a network model. It's up to the final app to decide which option to use.</p><h3 id="app-graph">App Graph</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/09/graphviz-13.svg" class="kg-image" alt="" loading="lazy" width="613" height="346"><figcaption><span style="white-space: pre-wrap;">Case app graph</span></figcaption></figure><p>As you see, the watchOS app and the main iOS app reuse common components. Also, each app decides which implementation of each module's interface to use. 
For example, the watchOS app can choose a different data source in the ScheduleData feature than the main iOS app.</p><p>In a monolithic app, you would probably need to write almost a second app and copy a lot of code.</p><h2 id="next-episodes-1">Next Episodes</h2><p>In the next posts, I will share my ideas on using microfeature architecture with <strong>SwiftUI </strong>and<strong> tuist</strong> to structure code efficiently.</p><h2 id="faq">FAQ</h2><h3 id="when-should-i-create-a-new-feature-and-when-its-better-not-to">When should I create a new feature and when is it better not to?</h3><p>It depends purely on the case and on what you think the best option is. If you can come up with a use case where your feature will be reused in another context, then it's a separate feature.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Do not overcomplicate things! Making a new feature for each class will do more harm than good.</div></div><p>If some block probably will not be reused, but you <strong>just feel</strong> that it's logically separate functionality, then also go with a new feature, as it will help keep your architecture clean.</p><h3 id="what-to-do-with-circular-references">What to do with circular references? </h3><p>Circular references can be a pain, and they happen when two features depend on each other's interfaces. If that happens, critically reconsider whether your feature separation is correct. There are two possible options.</p><ul><li>Two features are actually one feature. Then, you can merge these two features and get rid of circular references.</li><li>Two features are actually three features. If features depend on each other, then there is some part that's needed by both features. What if this part is an independent feature? 
If this is the case, extract the third feature and fix dependencies.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/09/graphviz-12.svg" class="kg-image" alt="" loading="lazy" width="738" height="193"><figcaption><span style="white-space: pre-wrap;">Possible circular reference solution</span></figcaption></figure><h3 id="theres-a-lot-said-about-making-dependencies-explicit-whats-the-point">There's a lot said about making dependencies explicit. What's the point?</h3><p>It's nearly impossible to scale or modify big apps when components are implicitly dependent. Just imagine the mess that is going to happen if you modify some class that is a dependency of all other modules through a singleton.</p><p>Your app may start to have unexpected behaviour here and there and you can't even know how your modification will affect the whole app.</p><p>It's like sitting on a box of TNT.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://images.unsplash.com/photo-1613834927301-1c96a302e074?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGR5bmFtaXRlfGVufDB8fHx8MTY2MzI3MzE5MA&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" class="kg-image" alt="🃏" loading="lazy" width="6000" height="4000" srcset="https://images.unsplash.com/photo-1613834927301-1c96a302e074?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGR5bmFtaXRlfGVufDB8fHx8MTY2MzI3MzE5MA&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=600 600w, https://images.unsplash.com/photo-1613834927301-1c96a302e074?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGR5bmFtaXRlfGVufDB8fHx8MTY2MzI3MzE5MA&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=1000 1000w, 
https://images.unsplash.com/photo-1613834927301-1c96a302e074?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGR5bmFtaXRlfGVufDB8fHx8MTY2MzI3MzE5MA&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=1600 1600w, https://images.unsplash.com/photo-1613834927301-1c96a302e074?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fGR5bmFtaXRlfGVufDB8fHx8MTY2MzI3MzE5MA&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2400 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Photo by </span><a href="https://unsplash.com/@messrro?utm_source=ghost&utm_medium=referral&utm_campaign=api-credit"><span style="white-space: pre-wrap;">Mehdi MeSSrro</span></a><span style="white-space: pre-wrap;"> / </span><a href="https://unsplash.com/?utm_source=ghost&utm_medium=referral&utm_campaign=api-credit"><span style="white-space: pre-wrap;">Unsplash</span></a></figcaption></figure><p>I encourage you to avoid implicit dependencies whenever possible. Microfeatures architecture will help you with doing that.</p><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/ios-app-as-a-microservice-modularize-your-app-with-tuist/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">iOS App As a Microservice. Modularize Your App With Tuist</div><div class="kg-bookmark-description">This is the second article in a series on modular app architecture. 
In this post, I will cover implementation details using Tuist</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1613645695025-20e3f38de4a6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDJ8fG1vZHVsYXJ8ZW58MHx8fHwxNjY0OTk5NDQ5&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://docs.tuist.io/building-at-scale/microfeatures?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">µFeatures Architecture | Tuist Documentation</div><div class="kg-bookmark-description">This document describes an approach for architecting a modular Apple OS application to enable scalability, optimize build and test cycles, and ensure good practices.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://docs.tuist.io/img/favicon.ico" alt=""><span class="kg-bookmark-author">Tuist</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://docs.tuist.io/img/logo.svg" alt="" onerror="this.style.display = 'none'"></div></a></figure><figure class="kg-card kg-embed-card"><iframe id="talk_frame_430480" class="speakerdeck-iframe" src="//speakerdeck.com/player/f1759993c7d54294bbcfab419acae8f0" width="710" height="399" style="aspect-ratio:710/399; border:0; padding:0; margin:0; background:transparent;" frameborder="0" allowtransparency="true" allowfullscreen="allowfullscreen" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
</figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Exploring SwiftUI Layout Protocol | Creating Custom Layout ]]></title>
                    <description><![CDATA[ Apple introduces a new SwiftUI Layout protocol with the release of iOS 16. It is a powerful tool for constructing custom views with SwiftUI elegance. ]]></description>
                    <link>https://alexdremov.me/exploring-swiftui-layout-protocol-creating-custom-layout/</link>
                    <guid isPermaLink="false">62f54f3377cf58ecc52574eb</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Thu, 11 Aug 2022 23:00:18 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-21.57.08.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>Apple introduces a new SwiftUI <code>Layout</code> protocol with the release of iOS 16. It is a powerful tool for constructing custom views with SwiftUI elegance. In this post, I will cover what <code>Layout</code> is and how it can be used.</p><p>By the end, we will have built a custom table view that arranges its subviews automatically. Complete code is provided!</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-12-at-00.21.52.png" class="kg-image" alt loading="lazy" width="2000" height="1030" srcset="https://alexdremov.me/content/images/size/w600/2022/08/Screenshot-2022-08-12-at-00.21.52.png 600w, https://alexdremov.me/content/images/size/w1000/2022/08/Screenshot-2022-08-12-at-00.21.52.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/08/Screenshot-2022-08-12-at-00.21.52.png 1600w, https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-12-at-00.21.52.png 2288w" sizes="(min-width: 720px) 720px"></figure><h2 id="conforming-to-layout">Conforming to Layout</h2><p><code>Layout</code> is a new protocol that lets you define how a container arranges its subviews.</p><p>With it, you specify the exact coordinates at which each subview is placed. For example, <code>HStack</code>, <code>VStack</code>, and <code>ZStack</code> could all easily be implemented on top of it in iOS 16.</p><pre><code class="language-swift">protocol Layout : Animatable</code></pre><p>To conform to the protocol, you need to define two methods:</p><pre><code class="language-swift">func sizeThatFits(
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
) -&gt; CGSize


func placeSubviews(
    in bounds: CGRect,
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
)</code></pre><p>You can also define <code>makeCache(subviews:)</code> if your layout performs calculations that depend only on the subviews and not on the proposal. Do those calculations once in <code>makeCache(subviews:)</code> and reuse the results through the <code>cache</code> parameter.</p><h3 id="method-sizethatfits">Method <code>sizeThatFits</code></h3><pre><code class="language-swift">func sizeThatFits(
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
) -&gt; CGSize</code></pre><p>Returns the size that the container needs to arrange its subviews. SwiftUI may call this method several times with different proposals, probing your view before settling on the best option.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Only finite sizes can be returned. Returning a size with an infinite dimension <strong>results in a crash without a reasonable call stack</strong>, so pay attention to the sizes you return</div></div><p>To calculate it, you can use the arguments passed to the method:</p><h4 id="proposal">proposal</h4><p>Basically, it's SwiftUI's proposal for your view's size. I like to think about it as a negotiation.</p><blockquote>I can give you this much space. What is your size going to be? Will you even fit?<br><br>— SwiftUI negotiator</blockquote><p><code>ProposedViewSize</code> is like a <code>CGSize</code> with optional dimensions and a few special values. </p><ul><li>The <code>zero</code> proposal; the view responds with its minimum size.</li><li>The <code>infinity</code> proposal; the view responds with its maximum size.</li><li>The <code>unspecified</code> proposal; the view responds with its ideal size.</li></ul><p>For any other proposal, you can read its <code>width</code> and <code>height</code> directly. </p><p>A proposal can fix one dimension and leave the other as <code>nil</code>. For example, an <code>HStack</code> might measure the flexibility of its subviews’ widths, while using a fixed value for the height.</p><h4 id="subviews">subviews</h4><p>A collection of <code>LayoutSubview</code> proxies, one per subview. Through them, you can ask subviews for their sizes and pass them your own proposals.</p><blockquote>Dear subview, I give you this much space. 
What is your size going to be?<br><br>— Custom Layout negotiator</blockquote><p>You can ask for a subview's size through </p><p><code>func sizeThatFits(ProposedViewSize) -&gt; CGSize</code> </p><p>and</p><p> <code>func dimensions(in: ProposedViewSize) -&gt; ViewDimensions</code></p><h4 id="cache">cache</h4><p>The cache produced by your <code>makeCache(subviews:)</code> method. It can also be <code>Void</code> (no cache).</p><h3 id="method-placesubviews">Method <code>placeSubviews</code></h3><pre><code class="language-swift">func placeSubviews(
    in bounds: CGRect,
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
)</code></pre><p>This is where the magic happens. In this method (and only this one), you are given the bounds for your view and the subviews at your disposal. </p><p>To place subviews, call the <code>place</code> method on each element of <code>subviews</code>.</p><pre><code class="language-swift">func place(
    at position: CGPoint,
    anchor: UnitPoint = .topLeading,
    proposal: ProposedViewSize
)</code></pre><p>The definition is pretty self-explanatory. For every subview, you specify the point at which to place it, the anchor of the subview that is aligned to this point, and <strong>your</strong> proposal for that subview.</p><h4 id="bounds">bounds</h4><p>The region that your view may use. Its size is one of the sizes you returned from <code>sizeThatFits</code>. </p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">While it is named <code>bounds</code>, it actually behaves like a <code>frame</code>: its origin is not necessarily zero, <strong>and you need to arrange subviews relative to that origin</strong></div></div><h4 id="proposal-1">proposal</h4><p>The size proposal from which the container generated the size that the parent used to create the <code>bounds</code> parameter.</p><h3 id="about-caching">About caching</h3><p>You don't have to use the cache, but calculations that depend only on the subviews can usually be cached, which is good practice and great for performance. </p><p>When the subviews change, <code>func updateCache(inout Self.Cache, subviews: Self.Subviews)</code> is called. Its default implementation simply calls <code>makeCache(subviews:)</code>.</p><h2 id="creating-auto-filled-table">Creating auto-filled table</h2><p>SwiftUI has a <code>Grid</code> for constructing table-like structures, but what if the number of subviews is not known in advance? Then you have to distribute the subviews across <code>GridRow</code> containers yourself.</p><p>Let's use the new <code>Layout</code> protocol instead!</p><h3 id="calculating-sizes">Calculating sizes</h3><p>Deciding what size the resulting view will have is relatively simple. </p><pre><code class="language-swift">public func sizeThatFits(
        proposal: ProposedViewSize,
        subviews: Subviews,
        cache: inout ()
    ) -&gt; CGSize {
    
        let subviewProposal = getSubviewProposal(
            subviewsCount: subviews.count,
            from: proposal
        )
        
        let rowHeights = getRowHeights(
            subviews: subviews,
            subviewProposal: subviewProposal
        )
        
        let resultWidth = proposal.width ??
            ((subviewProposal.width ?? 0) * CGFloat(columnsNumber))
        return CGSize(
            width: resultWidth,
            height: rowHeights.reduce(0, +)
        )
    }</code></pre><p>It relies on the following helper functions:</p><pre><code class="language-swift">/**
 Returns the height of every row.
 A row is as tall as its tallest subview.
 */
private func getRowHeights(subviews: Subviews, subviewProposal: ProposedViewSize) -&gt; [CGFloat] {
    var subviewProposalNoHLimit = subviewProposal
    subviewProposalNoHLimit.height = .infinity
    
    var rowHeights = [CGFloat]()
    var index = 0
    while index &lt; subviews.count {
        var rowMax: CGFloat = 0
        for _ in 0..&lt;columnsNumber where index &lt; subviews.count {
            let size = subviews[index].sizeThatFits(subviewProposalNoHLimit)
            rowMax = max(rowMax, size.height)
            index += 1
        }
        rowHeights.append(rowMax)
    }
    return rowHeights
}
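
/**
 Illustration only: a hypothetical helper that is not part of ColumnsLayout.
 It applies the same row-by-row grouping as getRowHeights to plain numbers.
 For example, heights [40, 60, 30] in 2 columns are grouped as
 [40, 60] and [30], so the result is [60, 30].
 */
private func rowMaxima(_ heights: [CGFloat], columns: Int) -&gt; [CGFloat] {
    // Guard against a zero or negative column count
    let step = max(columns, 1)
    return stride(from: 0, to: heights.count, by: step).map { start in
        // Tallest value among the (at most `step`) items of this row
        heights[start..&lt;min(start + step, heights.count)].max() ?? 0
    }
}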

/**
 Calculates the proposal for a single subview, i.e. one table cell
 */
func getSubviewProposal(subviewsCount: Int, from globalProposal: ProposedViewSize) -&gt; ProposedViewSize {
    // Convert before dividing: integer division would truncate and undercount rows
    let rowsCount = max(ceil(Double(subviewsCount) / Double(columnsNumber)), 1)
    return ProposedViewSize(
        width: (globalProposal.width ?? 0)
                        / CGFloat(columnsNumber),
        height: (globalProposal.height ?? 0) / rowsCount
    )
}</code></pre><h2 id="placing-subviews">Placing subviews</h2><p>Finally, we need to put every view in its place: iterate over the subviews and calculate the <code>x</code> and <code>y</code> position of each one.</p><pre><code class="language-swift">public func placeSubviews(
    in bounds: CGRect,
    proposal: ProposedViewSize,
    subviews: Subviews,
    cache: inout ()
) {
    var subviewProposal = getSubviewProposal(
        subviewsCount: subviews.count,
        from: proposal
    )
    let colRealWidth = subviewProposal.width ?? 0
    let rowHeights = getRowHeights(subviews: subviews, subviewProposal: subviewProposal)
    
    var curPos: CGFloat = bounds.minX
    var curHeight: CGFloat = bounds.minY
    
    var rowIndex = 0
    for (index, subview) in subviews.enumerated() {
        // Every cell in the row gets the height of the row's tallest subview
        subviewProposal.height = rowHeights[rowIndex]
        
        subview.place(
            at: CGPoint(x: curPos, y: curHeight),
            anchor: .topLeading,
            proposal: subviewProposal
        )
        
        if index % columnsNumber == columnsNumber - 1 {
            curPos = bounds.minX
            curHeight += rowHeights[rowIndex]
            rowIndex += 1
        } else {
            curPos += colRealWidth
        }
    }
}</code></pre><h2 id="example">Example</h2><p>Now we can build a table with any number of columns as easily as a regular view.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-23.35.29.png" class="kg-image" alt loading="lazy" width="1712" height="1000" srcset="https://alexdremov.me/content/images/size/w600/2022/08/Screenshot-2022-08-11-at-23.35.29.png 600w, https://alexdremov.me/content/images/size/w1000/2022/08/Screenshot-2022-08-11-at-23.35.29.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/08/Screenshot-2022-08-11-at-23.35.29.png 1600w, https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-23.35.29.png 1712w" sizes="(min-width: 720px) 720px"></figure><pre><code class="language-swift">ColumnsLayout(columnsNumber: 2) {
    VStack {
        Text("That's one view")
        Image(systemName: "tortoise.fill")
    }
    .padding()
    .border(.red)
    Text("That's the second view ")
        .padding()
        .border(.red)
    Text("That's the third view with long lines that are wrapped automatically")
        .fixedSize(horizontal: false, vertical: true)
        .padding()
        .border(.red)
}
.border(.blue)
.padding()</code></pre><p>And it magically rearranges itself when the number of columns changes to three.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-23.36.48.png" class="kg-image" alt loading="lazy" width="1708" height="894" srcset="https://alexdremov.me/content/images/size/w600/2022/08/Screenshot-2022-08-11-at-23.36.48.png 600w, https://alexdremov.me/content/images/size/w1000/2022/08/Screenshot-2022-08-11-at-23.36.48.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/08/Screenshot-2022-08-11-at-23.36.48.png 1600w, https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-23.36.48.png 1708w" sizes="(min-width: 720px) 720px"></figure><h2 id="final-notes">Final notes</h2><p>I believe you can see how powerful this tool is. For example, <a href="https://developer.apple.com/documentation/swiftui/composing_custom_layouts_with_swiftui?ref=alexdremov.me">Apple creates a radial view in their example</a> with the <code>Layout</code> protocol.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-23.43.38.png" class="kg-image" alt loading="lazy" width="1914" height="1400" srcset="https://alexdremov.me/content/images/size/w600/2022/08/Screenshot-2022-08-11-at-23.43.38.png 600w, https://alexdremov.me/content/images/size/w1000/2022/08/Screenshot-2022-08-11-at-23.43.38.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/08/Screenshot-2022-08-11-at-23.43.38.png 1600w, https://alexdremov.me/content/images/2022/08/Screenshot-2022-08-11-at-23.43.38.png 1914w" sizes="(min-width: 720px) 720px"></figure><p>So, how views are placed inside your container is entirely up to you. This is the flexibility that SwiftUI has long needed, and it finally arrives in iOS 16.</p>
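<p>One thing <code>ColumnsLayout</code> does not exercise is the cache: its methods take <code>cache: inout ()</code>. As a rough sketch of how <code>makeCache(subviews:)</code> could fit into a layout, here is a separate, simplified grid in which every cell has the size of the largest subview. The names <code>UniformGridSketch</code> and <code>Cache</code> are made up for this illustration, and the sketch assumes that every subview reports a finite ideal size; only the <code>Layout</code> requirements themselves come from SwiftUI.</p><pre><code class="language-swift">import SwiftUI

// Hypothetical example, not the ColumnsLayout from this post:
// a grid in which every cell has the size of the largest subview.
struct UniformGridSketch: Layout {
    var columnsNumber = 2

    // Depends only on the subviews, so it can be computed once and reused.
    struct Cache {
        var cellSize: CGSize
    }

    func makeCache(subviews: Subviews) -&gt; Cache {
        var cellSize = CGSize.zero
        for subview in subviews {
            // The unspecified proposal asks the subview for its ideal size
            let ideal = subview.sizeThatFits(.unspecified)
            cellSize.width = max(cellSize.width, ideal.width)
            cellSize.height = max(cellSize.height, ideal.height)
        }
        return Cache(cellSize: cellSize)
    }

    func sizeThatFits(
        proposal: ProposedViewSize,
        subviews: Subviews,
        cache: inout Cache
    ) -&gt; CGSize {
        let columns = max(columnsNumber, 1)
        // Integer ceiling division: rows needed to hold all subviews
        let rows = (subviews.count + columns - 1) / columns
        return CGSize(
            width: cache.cellSize.width * CGFloat(columns),
            height: cache.cellSize.height * CGFloat(rows)
        )
    }

    func placeSubviews(
        in bounds: CGRect,
        proposal: ProposedViewSize,
        subviews: Subviews,
        cache: inout Cache
    ) {
        let columns = max(columnsNumber, 1)
        for (index, subview) in subviews.enumerated() {
            let column = index % columns
            let row = index / columns
            subview.place(
                // Offsets are relative to the origin of bounds, not to zero
                at: CGPoint(
                    x: bounds.minX + CGFloat(column) * cache.cellSize.width,
                    y: bounds.minY + CGFloat(row) * cache.cellSize.height
                ),
                anchor: .topLeading,
                proposal: ProposedViewSize(cache.cellSize)
            )
        }
    }
}</code></pre><p>It would be used the same way as the table above, for example <code>UniformGridSketch(columnsNumber: 3) { ... }</code>. The trade-off is that the cell size ignores the proposal entirely, which is what makes it cacheable but also means this grid never shrinks to fit a narrow container.</p>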
        <div class="kg-card kg-file-card ">
            <a class="kg-file-card-container" href="https://alexdremov.me/content/files/2022/08/ColumnsLayout.swift" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">ColumnsLayout</div>
                    <div class="kg-file-card-caption">Complete example</div>
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">ColumnsLayout.swift</div>
                        <div class="kg-file-card-filesize">4 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <p>Let me know what you think about it in the comments!</p> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ SwiftUI Navigation Is a Mess. Here’s What You Can Do ]]></title>
                    <description><![CDATA[ Managing navigation in pure SwiftUI is hard and leads to messy solutions. In this post, I will show you how you can manage views effectively ]]></description>
                    <link>https://alexdremov.me/swiftui-navigation-is-a-mess-heres-what-you-can-do/</link>
                    <guid isPermaLink="false">62e53b3577cf58ecc5257271</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sat, 30 Jul 2022 19:55:14 +0200</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1597945161640-9366e6d4253b?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDF8fE5hdmlnYXRpb258ZW58MHx8fHwxNjU5MjAzNjQy&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <h2 id="why-messy">Why messy?</h2><p>It's because of the core idea of SwiftUI — a view is a function of the state, or a view is state-driven. Don't get me wrong, this concept is great, but SwiftUI's navigation is not this advanced yet.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">The view is a function of the state and navigation is not an exception</div></div><p>However, SwiftUI does not have the means to construct robust navigation inside your app.</p><h2 id="messy-example">Messy example</h2><p>Consider the common case of the onboarding screen when you need to present some sequence of views with nice transitions. What can you do with SwiftUI? Probably, create an <code>enum</code> that tells which screen is active and then use <code>switch</code> to present the sequence of views.</p><figure class="kg-card kg-video-card kg-width-wide"><div class="kg-video-container"><video src="https://alexdremov.me/content/media/2022/07/swiftUIOnboarding.mp4" poster="https://img.spacergif.org/v1/1706x802/0a/spacer.png" width="1706" height="802" playsinline preload="metadata" style="background: transparent url('https://alexdremov.me/content/images/2022/07/swiftUIOnboarding-0003.png') 50% 50% / cover no-repeat;" /></video><div class="kg-video-overlay"><button class="kg-video-large-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button></div><div class="kg-video-player-container"><div class="kg-video-player"><button class="kg-video-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button><button class="kg-video-pause-icon 
kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/><rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/></svg></button><span class="kg-video-current-time">0:00</span><div class="kg-video-time">/<span class="kg-video-duration"></span></div><input type="range" class="kg-video-seek-slider" max="100" value="0"><button class="kg-video-playback-rate">1&#215;</button><button class="kg-video-unmute-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/></svg></button><button class="kg-video-mute-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/></svg></button><input type="range" class="kg-video-volume-slider" max="100" value="100"></div></div></div></figure><p>What if you need to modify the order or change the number of views? You'll need to modify the corresponding <code>enum</code>, modify the logic of switching inside the views, and other stuff.</p><p>Not so flexible, right?</p><p>Oh, and then you decide to present one view right in the middle through <code>.sheet</code>. That's when <em>the mess</em> starts to show up. You create an additional <code>@State</code> to check if the sheet is open, make sure that it's updated correctly, and restructure the <code>switch</code> block that you used before. 
</p><p>Now the view is chaotic and prone to unexpected bugs.</p><h2 id="existing-navigation-views">Existing navigation views</h2><p>The most obvious one is <a href="https://developer.apple.com/documentation/swiftui/navigationview?ref=alexdremov.me">NavigationView</a>, which is deprecated as of iOS 16.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/07/NavigationView-1@2x.png" class="kg-image" alt loading="lazy" width="1458" height="676" srcset="https://alexdremov.me/content/images/size/w600/2022/07/NavigationView-1@2x.png 600w, https://alexdremov.me/content/images/size/w1000/2022/07/NavigationView-1@2x.png 1000w, https://alexdremov.me/content/images/2022/07/NavigationView-1@2x.png 1458w" sizes="(min-width: 720px) 720px"><figcaption>Image by https://developer.apple.com/documentation/swiftui/navigationview</figcaption></figure><p>Together with <code>NavigationLink</code>, it can present new views and adds a "back" button to return to the previous view.</p><p>However, it offers no convenient way to navigate programmatically.</p><p>Apple presented a new <a href="https://developer.apple.com/documentation/swiftui/navigationstack?ref=alexdremov.me">NavigationStack</a> that addresses this issue <strong>but it is still not flexible enough. </strong>For example, I like being able to modify the view however I want, but NavigationStack inserts back buttons. Also, it does not support different transitions. While it is nice to see SwiftUI develop in this direction, we are not there yet.</p><p>So, even in iOS 16, SwiftUI is not powerful enough to manage any kind of navigation you can come up with.</p><p>And <code>.sheet()</code>. 
<code>NavigationStack</code> does not make it easier to handle <code>.sheet()</code> either.</p><h2 id="designing-a-flexible-navigation-library">Designing a flexible navigation library</h2><p>I decided to create a library with several requirements:</p><ul><li>Programmatic views navigation</li><li>Ability to present a sequence of views</li><li>Support for any SwiftUI transition and Animation</li><li>Completely state-driven: no singletons or environment objects</li><li>Handle <code>.sheet()</code></li></ul><p>Sounds cool, right?</p><p><strong>Straight to the point, I was able to create such a library.</strong></p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/AlexRoar/PathPresenter?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - AlexRoar/PathPresenter: Pure SwiftUI state-driven library to present view sequences and hierarchies.</div><div class="kg-bookmark-description">Pure SwiftUI state-driven library to present view sequences and hierarchies. - GitHub - AlexRoar/PathPresenter: Pure SwiftUI state-driven library to present view sequences and hierarchies.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">AlexRoar</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/60817d12634f147ff4d20950055d1a547a93bf5b2fc1d224fe7241d9720670de/AlexRoar/PathPresenter" alt=""></div></a></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">I am always open to objective criticism and requests for a new feature. Do not hesitate to open an issue on GitHub!</div></div><p>So, if you just want a nice tool for the things I listed above, you can stop here. 
Now, let's see how I did it.</p><h2 id="ways-to-present">Ways to present</h2><p>At the core of the library is a structure that stores views and information about how to present them. Possible options for presentation are</p><pre><code class="language-swift">enum PathType {
    /**
    * Just show a view. No animation, no transition.
    * Show view above all other views
    */
    case plain

    /**
    * Show view with in and out transitions.
    * Transition animation also can be specified.
    */
    case animated(transition: AnyTransition, animation: Animation)

    /**
    * Show view in .sheet()
    */
    case sheet(onDismiss: Action)
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">❗</div><div class="kg-callout-text">Note that presenting through <code>.sheet()</code> is as easy as just presenting any other view.</div></div><p>So, you can present the view without any animation, present it with needed transitions, and present it in a sheet.</p><h2 id="path">Path</h2><p>This structure stores information about views. It just stores an array of type-erased views with presentation type information. You can append views on top and remove them from the top.</p>
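<p>To make the idea concrete, here is a minimal sketch of what such a structure could look like. It is not PathPresenter's actual source: <code>PathSketch</code>, <code>PresentationSketch</code>, and <code>Element</code> are hypothetical names, and the dismissal callback is assumed to be a plain <code>() -&gt; Void</code> closure instead of the library's <code>Action</code> type.</p><pre><code class="language-swift">import SwiftUI

// Hypothetical stand-in for the PathType enum shown above
enum PresentationSketch {
    case plain
    case animated(transition: AnyTransition, animation: Animation)
    case sheet(onDismiss: () -&gt; Void)
}

// Hypothetical sketch: a stack of type-erased views,
// each tagged with the way it should be presented.
struct PathSketch {
    struct Element {
        let view: AnyView
        let presentation: PresentationSketch
    }

    private(set) var elements: [Element] = []

    init() {}

    var count: Int { elements.count }
    var isEmpty: Bool { elements.isEmpty }

    // Push a view on top; its concrete type is erased to AnyView
    // so that different view types can share one array.
    mutating func append&lt;V: View&gt;(
        _ view: V,
        presentation: PresentationSketch = .plain
    ) {
        elements.append(Element(view: AnyView(view), presentation: presentation))
    }

    // Pop the topmost view, if there is one
    @discardableResult
    mutating func removeLast() -&gt; Element? {
        elements.popLast()
    }
}

// Usage: build a path, then go one step back
var path = PathSketch()
path.append(Text("Welcome"))
path.append(
    Text("Details"),
    presentation: .animated(transition: .opacity, animation: .easeInOut)
)
path.removeLast()</code></pre><p>Because such a value is plain state, keeping it in a <code>@State</code> property would be enough to drive navigation, which fits the state-driven requirement listed above.</p>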
 ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Suffix Automaton and Rickroll Lyrics Graph ]]></title>
                    <description><![CDATA[ Easy to understand explanation of suffix automaton with implementation. Finally, generating correct Rickroll lyrics suffix automaton ]]></description>
                    <link>https://alexdremov.me/suffix-automaton-and-rickroll/</link>
                    <guid isPermaLink="false">62889f3b60ca2b8d265107de</guid>
                    <category><![CDATA[ Algorithms ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sun, 17 Jul 2022 15:57:17 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/05/rr.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>Suffix automaton is a robust data structure that allows you to solve complex string-related problems such as: checking the presence of a substring in a string, counting the number of total distinct substrings, finding substring, and many others. In this article, I cover the suffix automaton algorithm, provide implementation, and finally <strong>create the correct rickroll lyrics automaton.</strong></p><h2 id="why-rickroll">Why Rickroll?</h2><p>First of all</p><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/iik25wqIuFo?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>Now we can continue.</p><p>There is a meme that I've seen a couple of times with all possible Never Gonna Give You Up central lines. It's nice, but it's not fully correct. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/q75ok4vlrpj61.png-1.webp" class="kg-image" alt loading="lazy" width="960" height="674" srcset="https://alexdremov.me/content/images/size/w600/2022/05/q75ok4vlrpj61.png-1.webp 600w, https://alexdremov.me/content/images/2022/05/q75ok4vlrpj61.png-1.webp 960w" sizes="(min-width: 720px) 720px"><figcaption>Rickroll lyrics graph | https://www.reddit.com/r/memes/comments/lskvsq/never_gonna_make_a_flow_chart/</figcaption></figure><p>The problem is that it conforms to incorrect lines too:</p><ul><li>Never gonna give you cry</li><li>Never gonna tell a lie and desert you down</li><li>Never gonna make you up</li><li>Never gonna give you down</li><li>Never gonna make you never</li></ul><p>And many others. So, we can conclude that this graph is incorrect as incorrect lyrics must be unreachable. 
Then, we need to correct this immense mistake against humanity and generate the correct automaton for Rickroll lyrics.</p><h2 id="what-is-the-suffix-automaton">What is the suffix automaton?</h2><p>Intuitively, it's a data structure that contains information about all substrings of a string and stores it in compressed form. More specifically, it's a directed acyclic word graph in which each node is a state and each edge is a transition between states by some letter.</p><p>Each state corresponds to one or more substrings of the initial string. There is also one start state, and some states are marked as terminal. We also require that the suffix automaton contains the minimal possible number of states.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">So, if each node is some substring and each edge is a transition by some letter, by navigating through this graph we can collect information about substrings.</div></div><p>If a substring is not present in the text, there is simply no state for it: while following its letters from the start state, at some point we will need a transition that does not exist. </p><p>Here is an example of a suffix automaton for the string <code>abcbac</code>.</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/graphviz-5.svg" class="kg-image" alt="Suffix automaton for abcbac" loading="lazy" width="1102" height="277"><figcaption>Suffix automaton for abcbac</figcaption></figure><p>The leftmost state corresponds to the empty string (start state) and the rightmost corresponds to the whole string (terminal). Notice that if you start from the start state and end up in a terminal state, then the path you followed corresponds to some suffix of the string. 
Also, every substring corresponds to exactly one path from the start state.</p><h2 id="rickroll-suffix-automate">Rickroll suffix automaton</h2><p>For the lyrics, I generated a suffix automaton for every line and then merged these automata. </p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/graphviz-7.svg" class="kg-image" alt loading="lazy" width="7926" height="8112"><figcaption>Full Never Gonna Give You Up lyrics</figcaption></figure><h2 id="final-thoughts">Final thoughts</h2><p>Even though this graph is not as nice as the one in the meme, it's <strong>correct. </strong>You can explore the graph above yourself; it's actually fun.</p><p> In the next post, I will discuss how I built this graph using the suffix automaton. Subscribe so you do not miss it!</p> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Using Threads in Swift ]]></title>
                    <description><![CDATA[ Swift provides DispatchQueue as an excellent layer above raw threads. But sometimes you want to use a low-level thread API ]]></description>
                    <link>https://alexdremov.me/using-threads-in-swift/</link>
                    <guid isPermaLink="false">627d127860ca2b8d26510675</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Fri, 13 May 2022 10:44:49 +0200</pubDate>
                    <media:content url="https://images.unsplash.com/photo-1620203853151-496c7228306c?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDV8fHBpcGV8ZW58MHx8fHwxNjUyMzY0MTg4&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
                    <content:encoded><![CDATA[ <p>Swift provides DispatchQueue as an excellent layer above raw threads. But sometimes you want to create a new thread dedicated to some specific task, or maybe implement your own concurrent executor. Swift gives you access to raw threads, and in this article I'll show how to use them.</p><h2 id="thread">Thread</h2><p>Creating a thread in Swift is pretty simple with the <code>Thread</code> class. You can specify an <code>@objc</code> method through a selector as a starting point, pass a closure, or, most conveniently, subclass <code>Thread</code>.</p><figure class="kg-card kg-code-card"><pre><code class="language-swift">import Foundation

class MyThread: Thread {
    override func main() { // Thread's starting point
        print("Hi from thread")
    }
}

let thread = MyThread()
thread.start()</code></pre><figcaption>Simple thread</figcaption></figure><p>The thread is not started when the initializer is called. You need to call the <code>start()</code> method explicitly to start the thread.</p><p>The thread runs independently of the handle returned by the <code>Thread</code> initializer: the variable may go out of scope and the thread will still run. That's fine, but you lose the ability to control the thread: check whether it has completed, wait for its completion, cancel it, etc. </p><h2 id="wait-for-completion-join-a-thread">Wait for completion, join a thread</h2><p><code>Thread</code> does not provide a built-in way to wait for the thread's completion.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">The main thread can finish before the new thread. In this case, the latter is also terminated.</div></div><p>To wait for thread completion, we can implement a join using <code>DispatchGroup</code>:</p><pre><code class="language-swift">class MyThread: Thread {
    let waiter = DispatchGroup()

    override func start() {
        waiter.enter()
        super.start()
    }

    override func main() {
        task()
        waiter.leave()
    }

    func task() {
        print("Hi from thread")
    }

    func join() {
        waiter.wait()
    }
}

let thread = MyThread()
thread.start()

thread.join() // Waits for thread completion</code></pre><h2 id="terminate-the-thread">Terminate the thread</h2><p>The thread terminates automatically when <code>main</code> returns. To exit the thread early, you can call the <code>Thread.exit()</code> class method from within the thread. Because <code>Thread.exit()</code> never returns, the <code>waiter.leave()</code> call at the end of <code>main</code> would be skipped, so it's better to create a custom exit method that leaves the <code>DispatchGroup</code> first:</p><pre><code class="language-swift">class MyThread: Thread {
    ...
    func exit() {
        waiter.leave()
        Thread.exit()
    }
    ...
}</code></pre><h2 id="cancel-the-thread">Cancel the thread</h2><p>Apart from terminating the thread, you can cancel it by calling the <code>cancel()</code> method on the thread's handle or inside the thread itself. This only sets the <code>isCancelled</code> property to <code>true</code>; the thread keeps running until its own code checks the property and returns. </p>
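<p>Here is a minimal sketch of cooperative cancellation. The <code>CountingThread</code> class, its <code>done</code> semaphore, and the sleep intervals are hypothetical names and values chosen for illustration. The loop in <code>main</code> checks <code>isCancelled</code> on every iteration and returns once the flag is set:</p><pre><code class="language-swift">import Foundation

// Hypothetical worker used only for illustration
final class CountingThread: Thread {
    private let lock = NSLock()
    private var iterations = 0
    let done = DispatchSemaphore(value: 0) // signalled when main() returns

    // Thread-safe read of the iteration counter
    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return iterations
    }

    override func main() {
        // cancel() only sets the flag, so the loop has to check it itself
        while !isCancelled {
            lock.lock()
            iterations += 1
            lock.unlock()
            Thread.sleep(forTimeInterval: 0.01)
        }
        done.signal()
    }
}

let worker = CountingThread()
worker.start()
Thread.sleep(forTimeInterval: 0.1) // let the worker run for a while
worker.cancel()                    // request cancellation
worker.done.wait()                 // wait until main() has actually returned
print("Stopped after \(worker.count) iterations")</code></pre><p>The semaphore plays the same role as the <code>DispatchGroup</code> in the join example: it lets the caller wait until the thread has really finished after the cancellation request.</p>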
 ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ SwiftUI Advanced Animation: Morphing Shapes ]]></title>
                    <description><![CDATA[ I&#39;m going to show how complex SwiftUI views can be animated efficiently using VectorArithmetic protocol with Accelerate library for fast computations. ]]></description>
                    <link>https://alexdremov.me/swiftui-advanced-animation/</link>
                    <guid isPermaLink="false">626fd1b428ccc9088e2accab</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Thu, 05 May 2022 06:00:00 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/05/Screenshot-2022-05-05-at-11.02.10.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>The regular <code>.animation()</code> modifier already provides a powerful way of animating views. Yet, its usage is limited to simple transformations. In this guide, I'm going to show how complex SwiftUI views can be animated efficiently using the <code>VectorArithmetic</code> protocol with the <code>Accelerate</code> framework for fast computations.</p><h2 id="inspiration">Inspiration</h2><p>In the course of this guide, we will make a <em>morphing sphere </em>animation inspired by lava lamp bubbles: a kind of wobbling lava bubble.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">The proposed technique can be used in other, even more complex animations</div></div><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/ezgif.com-gif-maker-2.gif" class="kg-image" alt="Wobbling bubble" loading="lazy" width="800" height="471"><figcaption>Wobbling bubble</figcaption></figure><h2 id="creating-custom-animations">Creating custom animations</h2><p>You may think about animation as a transition between two states. And this transition must be smooth! 
To display this smooth transition, SwiftUI needs to know how to draw in-between stages.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/statesTransition.png" class="kg-image" alt="Smooth change between two shapes (states)" loading="lazy" width="1440" height="545" srcset="https://alexdremov.me/content/images/size/w600/2022/05/statesTransition.png 600w, https://alexdremov.me/content/images/size/w1000/2022/05/statesTransition.png 1000w, https://alexdremov.me/content/images/2022/05/statesTransition.png 1440w" sizes="(min-width: 720px) 720px"><figcaption>Smooth change between two shapes (states)</figcaption></figure><h3 id="animatablevector">AnimatableVector</h3><p>The key idea of the animation is to represent objects' states with properties that can change continuously.</p><p>For example, if we try to animate an object's position and it has integer coordinates, then creating in-between frames of the object smoothly moving from one coordinate to the other is impossible. Conversely, if the object's position is represented by a floating-point variable, then we can gradually change the coordinate until the new value is reached.</p><p>The same goes for more complicated animations. But usually, states cannot be represented by a single float variable. In this case, we are going to use <code>AnimatableVector</code>. 
It represents a mathematical vector and conforms to the <code>VectorArithmetic</code> protocol.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">If two animation stages are represented by objects conforming to the <code>VectorArithmetic</code> protocol, then SwiftUI can compute in-between vectors and draw the transition.</div></div><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/05/vectoranimateex.png" class="kg-image" alt loading="lazy" width="754" height="482" srcset="https://alexdremov.me/content/images/size/w600/2022/05/vectoranimateex.png 600w, https://alexdremov.me/content/images/2022/05/vectoranimateex.png 754w" sizes="(min-width: 720px) 720px"></figure><p>The <code>AnimatableVector</code> is pretty simple. We store an array of coordinates and define basic math operations for them. In the code below, Accelerate is used for fast computations. </p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Accelerate can introduce too much overhead when the vector contains only a few values. So, if your animation can be represented with a few values, then consider rewriting the operators without Accelerate</div></div><figure class="kg-card kg-code-card"><pre><code class="language-swift">import SwiftUI
import enum Accelerate.vDSP

struct AnimatableVector: VectorArithmetic {
    var values: [Float]
    
    static var zero = AnimatableVector(values: [0.0])

    static func + (lhs: AnimatableVector, rhs: AnimatableVector) -&gt; AnimatableVector {
        let count = min(lhs.values.count, rhs.values.count)
        return AnimatableVector(
            values: vDSP.add(
                lhs.values[0..&lt;count],
                rhs.values[0..&lt;count]
            )
        )
    }

    static func += (lhs: inout AnimatableVector, rhs: AnimatableVector) {
        let count = min(lhs.values.count, rhs.values.count)
        vDSP.add(
            lhs.values[0..&lt;count],
            rhs.values[0..&lt;count],
            result: &amp;lhs.values[0..&lt;count]
        )
    }

    static func - (lhs: AnimatableVector, rhs: AnimatableVector) -&gt; AnimatableVector {
        let count = min(lhs.values.count, rhs.values.count)
        return AnimatableVector(
            values: vDSP.subtract(
                lhs.values[0..&lt;count],
                rhs.values[0..&lt;count]
            )
        )
    }

    static func -= (lhs: inout AnimatableVector, rhs: AnimatableVector) {
        let count = min(lhs.values.count, rhs.values.count)
        vDSP.subtract(
            lhs.values[0..&lt;count],
            rhs.values[0..&lt;count],
            result: &amp;lhs.values[0..&lt;count]
        )
    }

    mutating func scale(by rhs: Double) {
        vDSP.multiply(
            Float(rhs),
            values,
            result: &amp;values
        )
    }

    var magnitudeSquared: Double {
        Double(
            vDSP.sum(
                vDSP.multiply(values, values)
            )
        )
    }
    
    var count: Int {
        values.count
    }
    
    subscript(_ i: Int) -&gt; Float {
        get {
            values[i]
        } set {
            values[i] = newValue
        }
    }
}
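
// Illustrative helper (an assumption added for clarity, not part of the
// original listing): a linear interpolation built only from the operators
// defined above, similar in spirit to how in-between states can be derived.
extension AnimatableVector {
    func interpolated(to target: AnimatableVector, progress: Double) -&gt; AnimatableVector {
        var delta = target - self  // per-coordinate difference between the states
        delta.scale(by: progress)  // progress is expected to be in 0...1
        return self + delta
    }
}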
</code></pre><figcaption>Animatable vector</figcaption></figure><h2 id="wobbling-bubble">Wobbling bubble</h2><p>So, as I already said, we need to define the stages of the animation with <code>AnimatableVector</code> so that SwiftUI is able to magically draw all in-between frames. </p><p>To do this with a circle, we first need to somehow make it able to <em>wobble. </em>This is done by approximating the circle with curves. To make the morphing effect, we will use <code>AnimatableVector</code> to modify the radius at every approximation point.</p><p>That's the whole idea.</p><p>The first coordinate of the vector says how much must be added to the distance from the center to the first approximation point. The second coordinate does the same for the second point, and so on.</p><p>You can see in the gif below how the radius at every point changes and how SwiftUI changes it smoothly. The curves' control points are also displayed.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/wobbleWireframew.gif" class="kg-image" alt="Under the hood of wobbling" loading="lazy" width="1418" height="1370"><figcaption>Under the hood of wobbling</figcaption></figure><h2 id="implementation">Implementation</h2><p>The concept of the animation is settled. It's time to code!</p><p>As I said, the main idea is to approximate a circle with curves. For a circle split into <code>n</code> segments, the control points are placed at a distance of <code>(4/3)*tan(pi/(2n))</code> radii from each point, along the tangent.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/270te.png" class="kg-image" alt loading="lazy" width="635" height="526" srcset="https://alexdremov.me/content/images/size/w600/2022/05/270te.png 600w, https://alexdremov.me/content/images/2022/05/270te.png 635w"><figcaption>https://stackoverflow.com/questions/1734745/how-to-create-circle-with-bézier-curves</figcaption></figure><p>We're going to represent the circle as an object conforming to the <code>Shape</code> protocol. For SwiftUI to know what to animate, you need to define the <code>animatableData</code> property. That's what SwiftUI is going to use to animate in-between frames.</p><pre><code class="language-swift">var animatableData: AnimatableVector {
    get { animatedValue }
    set { animatedValue = newValue }
}</code></pre><p>With a little bit of linear algebra, all point coordinates can be calculated. This requires a few extra operations on <code>CGVector</code> and <code>CGPoint</code>:</p><pre><code class="language-swift">import Foundation
import SwiftUI

extension CGPoint {
    public static func +(lhs: CGPoint, rhs: CGPoint) -&gt; CGPoint {
        CGPoint(x: lhs.x + rhs.x, y: lhs.y + rhs.y)
    }
    
    static func +(lhs: CGPoint, rhs: CGVector) -&gt; CGPoint {
        CGPoint(x: lhs.x + rhs.dx, y: lhs.y + rhs.dy)
    }
    
    static func -(lhs: CGPoint, rhs: CGVector) -&gt; CGPoint {
        CGPoint(x: lhs.x - rhs.dx, y: lhs.y - rhs.dy)
    }
    
    public static func -(lhs: CGPoint, rhs: CGPoint) -&gt; CGPoint {
        CGPoint(x: lhs.x - rhs.x, y: lhs.y - rhs.y)
    }
    
    init(_ vec: CGVector) {
        self = CGPoint(x: vec.dx, y: vec.dy)
    }
}

extension CGPoint: VectorArithmetic {
    public mutating func scale(by rhs: Double) {
        x = CGFloat(rhs) * x
        y = CGFloat(rhs) * y
    }
    
    public var magnitudeSquared: Double {
        Double(x * x + y * y)
    }
    

}

extension CGVector {
    init(_ point: CGPoint) {
        self = CGVector(dx: point.x, dy: point.y)
    }
    
    func scalar(_ vec: CGVector) -&gt; CGFloat {
        dx * vec.dx + dy * vec.dy
    }
    
    func len() -&gt; CGFloat {
        sqrt(dx * dx + dy * dy)
    }
    
    func perpendicular() -&gt; CGVector {
        CGVector(dx: -dy, dy: dx) / len()
    }
    
    static func *(lhs: CGVector, rhs: CGFloat) -&gt; CGVector {
        CGVector(dx: lhs.dx * rhs, dy: lhs.dy * rhs)
    }
    
    static func *(lhs: CGFloat, rhs: CGVector) -&gt; CGVector {
        CGVector(dx: rhs.dx * lhs, dy: rhs.dy * lhs)
    }
    
    static func /(lhs: CGVector, rhs: CGFloat) -&gt; CGVector {
        CGVector(dx: lhs.dx / rhs, dy: lhs.dy / rhs)
    }
    
    static func -(lhs: CGVector, rhs: CGVector) -&gt; CGVector {
        CGVector(dx: lhs.dx - rhs.dx, dy: lhs.dy - rhs.dy)
    }
    
    static func +(lhs: CGVector, rhs: CGVector) -&gt; CGVector {
        CGVector(dx: lhs.dx + rhs.dx, dy: lhs.dy + rhs.dy)
    }
    
    func angle(_ rhs: CGVector) -&gt; CGFloat {
        return acos(scalar(rhs) / (rhs.len() * len()))
    }
}
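
// Illustrative check (an assumption added for clarity, not part of the
// original listing): perpendicular() returns a unit-length vector rotated by
// 90 degrees, so its dot product with the source vector is zero.
// The sample vector (3, 4) is arbitrary.
func perpendicularSanityCheck() -&gt; Bool {
    let v = CGVector(dx: 3, dy: 4)
    let p = v.perpendicular() // (-4, 3) divided by the length 5
    return abs(v.scalar(p)) &lt; 1e-9 &amp;&amp; abs(p.len() - 1) &lt; 1e-9
}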
</code></pre><p>Finally, implementing <code>Shape</code>:</p><pre><code class="language-swift">import SwiftUI
import Foundation

struct MorphingCircleShape: Shape {
    let pointsNum: Int
    var morphing: AnimatableVector
    let tangentCoeficient: CGFloat
    
    var animatableData: AnimatableVector {
        get { morphing }
        set { morphing = newValue }
    }
    
    // Calculates control points
    func getTwoTangent(center: CGPoint, point: CGPoint) -&gt; (first: CGPoint, second: CGPoint) {
        let a = CGVector(center - point)
        let dir = a.perpendicular() * a.len() * tangentCoeficient
        return (point - dir, point + dir)
    }
    
    // Draw circle
    func path(in rect: CGRect) -&gt; Path {
        var path = Path()
        let radius = min(rect.width / 2, rect.height / 2)
        let center = CGPoint(x: rect.width / 2, y: rect.height / 2)
        var nextPoint = CGPoint.zero
        
        let ithPoint: (Int) -&gt; CGPoint = { i in
            let point = center + CGPoint(x: radius * sin(CGFloat(i) * CGFloat.pi * CGFloat(2) / CGFloat(pointsNum)),
                                         y: radius * cos(CGFloat(i) * CGFloat.pi * CGFloat(2) / CGFloat(pointsNum)))
            var direction = CGVector(point - center)
            direction = direction / direction.len()
            return point + direction * CGFloat(morphing[i &gt;= pointsNum ? 0 : i])
        }
        var tangentLast = getTwoTangent(center: center,
                                        point: ithPoint(pointsNum - 1))
        for i in 0...pointsNum {
            nextPoint = ithPoint(i)
            let tangentNow = getTwoTangent(center: center, point: nextPoint)
            if i != 0 {
                path.addCurve(to: nextPoint, control1: tangentLast.1, control2: tangentNow.0)
            } else {
                path.move(to: nextPoint)
            }
            tangentLast = tangentNow
        }
        
        path.closeSubpath()
        return path
    }
    
    
    init(_ morph: AnimatableVector) {
        pointsNum = morph.count
        morphing = morph
        tangentCoeficient = (4 / 3) * tan(CGFloat.pi / CGFloat(2 * pointsNum))
    }
}</code></pre><p>Now we can use this shape in a View. To make a wobbling effect, we need to change the vector responsible for radius modification.</p><p>This can be done with a timer. </p><h3 id="using-timer">Using Timer</h3><p>We're going to randomly change the morphing vector in the timer's callback. Also, it looks weird to change all points at once, so we're going to animate only a random subset of them.</p><pre><code class="language-swift">import SwiftUI

struct MorphingCircle: View &amp; Identifiable &amp; Hashable {
    static func == (lhs: MorphingCircle, rhs: MorphingCircle) -&gt; Bool {
        lhs.id == rhs.id
    }
    
    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }
    
    let id = UUID()
    @State var morph: AnimatableVector = AnimatableVector.zero
    @State var timer: Timer?
    
    func morphCreator() -&gt; AnimatableVector {
        let range = Float(-morphingRange)...Float(morphingRange)
        var morphing = Array.init(repeating: Float.zero, count: self.points)
        for i in 0..&lt;morphing.count where Int.random(in: 0...1) == 0 {
            morphing[i] = Float.random(in: range)
        }
        return AnimatableVector(values: morphing)
    }
    
    func update() {
        morph = morphCreator()
    }
    
    let duration: Double
    let points: Int
    let secting: Double
    let size: CGFloat
    let outerSize: CGFloat
    var color: Color
    let morphingRange: CGFloat
    
    var radius: CGFloat {
        outerSize / 2
    }
    
    var body: some View {
        MorphingCircleShape(morph)
            .fill(color)
            .frame(width: size, height: size, alignment: .center)
            .animation(Animation.easeInOut(duration: Double(duration + 1.0)), value: morph)
            .onAppear {
                update()
                timer = Timer.scheduledTimer(withTimeInterval: duration / secting, repeats: true) { timer in
                    update()
                }
            }.onDisappear {
                timer?.invalidate()
            }
            .frame(width: outerSize, height: outerSize, alignment: .center)
            .animation(nil, value: morph)
        
    }
    
    init(_ size: CGFloat = 300, morphingRange: CGFloat = 30, color: Color = .red, points: Int = 4, duration: Double = 5.0, secting: Double = 2) {
        self.points = points
        self.color = color
        self.morphingRange = morphingRange
        self.duration = duration
        self.secting = secting
        self.size = morphingRange * 2 &lt; size ? size - morphingRange * 2 : 5
        self.outerSize = size
        morph = AnimatableVector(values: [])
        update()
    }
    
    func color(_ newColor: Color) -&gt; MorphingCircle {
        var morphNew = self
        morphNew.color = newColor
        return morphNew
    }
}</code></pre><h2 id="results">Results</h2><p>The created bubbles can be combined and animated to drift around the screen, for example. Also, in the course of this guide, we created the <code>AnimatableVector</code> structure that you can use in your projects. </p><p>Feel free to share your results!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/05/ezgif.com-gif-maker.gif" class="kg-image" alt="More wobbling bubbles" loading="lazy" width="800" height="471"><figcaption>More wobbling bubbles</figcaption></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Check the iOS section of my blog for more useful tips</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/tag/ios/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Alex Dremov | iOS</div><div class="kg-bookmark-description">One of my favorites. 
Here I write about Swift and iOS development</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1558126372-76b529458592?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDExfHxpb3N8ZW58MHx8fHwxNjQ5NTA0MTQ5&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" alt=""></div></a></figure><h2 id="references">References</h2><ul><li><a href="https://stackoverflow.com/questions/1734745/how-to-create-circle-with-b%C3%A9zier-curves?ref=alexdremov.me">https://stackoverflow.com/questions/1734745/how-to-create-circle-with-bézier-curves</a></li><li><a href="https://developer.apple.com/documentation/swiftui/animatable/animatabledata-swift.property-6nydg?ref=alexdremov.me">https://developer.apple.com/documentation/swiftui/animatable/animatabledata-swift.property-6nydg</a></li></ul> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ New Package: Look at Swift Async Algorithms ]]></title>
                    <description><![CDATA[ Apple released the first version of the Swift Async Algorithms package. It provides tools and algorithms to use with the recently introduced AsyncSequence ]]></description>
                    <link>https://alexdremov.me/swift-async-algorithms-module/</link>
                    <guid isPermaLink="false">6269696d28ccc9088e2ac9a3</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Wed, 27 Apr 2022 22:21:00 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/04/asyncseq-2.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>About a month ago, Apple released the first version of the <a href="https://github.com/apple/swift-async-algorithms?ref=alexdremov.me">async swift algorithms</a> package. It provides tools and algorithms to use with the recently introduced asynchronous sequences. The package focuses on implementing already well-known tools like <code>zip</code> as well as new features that transact in time (wow). It also offers more sophisticated ways of creating and managing asynchronous sequences.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">The module's latest version is <code>0.0.1</code>, which means that it's still in development. So, some methods are not available yet, and some may change or appear.<br><br>Mostly, this article is here to help you get to know the new features and, possibly, plan your code, keeping in mind that such features will appear in the future</div></div><h2 id="installation">Installation</h2><p>The new package is distributed through Swift PM. To add it to your project, add it as a dependency in the Xcode project via <code>File &gt; Add Packages</code>.</p><p>Or add it to your <code>Package.swift</code> file:</p><pre><code class="language-swift">.package(url: "https://github.com/apple/swift-async-algorithms", from: "0.0.1"),</code></pre><p>Don't forget to also add the dependency to your target:</p><pre><code class="language-swift">.target(name: "&lt;target&gt;", dependencies: [
    .product(name: "AsyncAlgorithms", package: "swift-async-algorithms"),
]),</code></pre><p>The module will be available in your project after adding <code>import AsyncAlgorithms</code>.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">As I mentioned, the module is still in development. So, you need to install the <a href="https://www.swift.org/download/?ref=alexdremov.me#trunk-development-main">Swift Trunk Development toolchain</a> to have access to all features. <br>Some of them are available right away, though!</div></div><h2 id="creating-asynchronous-sequences">Creating asynchronous sequences</h2><p>To test all the beautiful functions the new module provides, we first need to create an async sequence. And the package introduces new ways of doing so.</p><h3 id="property-async">Property <code>async</code></h3><p>The module adds the following extension to the <code>Sequence</code> protocol.</p><pre><code class="language-swift">extension Sequence {
  public var async: AsyncLazySequence&lt;Self&gt; { get }
}</code></pre><p>Where <code>AsyncLazySequence</code> conforms to <code>AsyncSequence</code>.</p><pre><code class="language-swift">public struct AsyncLazySequence&lt;Base: Sequence&gt;: AsyncSequence {
}

extension AsyncLazySequence: Sendable where Base: Sendable {
    ...
}
extension AsyncLazySequence.Iterator: Sendable where Base.Iterator: Sendable {
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Using the <code>async</code> property, we can turn any existing <code>Sequence</code> into an <code>AsyncSequence</code> to pass it to an async API, for example.</div></div><pre><code class="language-swift">let numbers = [1, 2, 3, 4].async
let characters = "Hello, world".async
let items = [1: "one", 2: "two", 3: "three"].async</code></pre><p>However, creating an <code>AsyncSequence</code> this way does not really bring benefits, as all elements are already there and available right away. There are more useful ways of creating an <code>AsyncSequence</code>. </p><h3 id="asyncchannel-and-asyncthrowingchannel">AsyncChannel and AsyncThrowingChannel</h3><p>If you know what a <code>Future</code> or <code>Promise</code> is in other languages, then <code>AsyncChannel</code> will be familiar to you, except that it provides a way of transferring a <strong>sequence</strong> of values. </p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">❗</div><div class="kg-callout-text">The channel's element type must conform to the <code>Sendable</code> protocol, which basically means that its values are safe to pass across concurrency domains.<br><br>All basic types automatically conform to it. For custom types, you need to add the conformance before use.</div></div><p>Here's a pretty straightforward example of <code>AsyncChannel</code> usage.</p><pre><code class="language-swift">let channel = AsyncChannel&lt;String&gt;()
Task {
    for word in ["Hello", "from", "async", "channel"] {
      await channel.send(word)
    }
    await channel.finish()
}

for await message in channel {
    print(message)
}</code></pre><pre><code>Hello
from
async
channel</code></pre><p>Notice that the <code>await</code> keyword is used with <code>send</code> and <code>finish</code>. This is because the channel is <strong>actually synchronized in both directions</strong>. That means that <code>send</code> awaits consumption and vice versa.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">The <code>await channel.send()</code> call waits until the sent value is consumed. This way, the producer of the channel's values will not generate more values than the receiver can consume.</div></div><p><code>AsyncThrowingChannel</code> is almost the same, except that it provides the <code>fail(_ error: Error)</code> method that can be used to throw an error to the channel's consumer.</p><pre><code class="language-swift">let channel = AsyncThrowingChannel&lt;String, Error&gt;()

...

for try await message in channel {
    print(message)
}</code></pre><h3 id="and-converting-back">And converting back</h3><p>The module adds initializers for three primary types, <code>Array</code>, <code>Dictionary</code>, and <code>Set</code>, that let you transform an async sequence into a regular collection by fetching all elements during initialization.</p><pre><code class="language-swift">let table = await Dictionary(uniqueKeysWithValues: zip(keys, values))
let allItems = await Set(items.prefix(10))
let allMessages = await Array(channel)</code></pre><h2 id="manipulating-asynchronous-sequences">Manipulating asynchronous sequences</h2><p>The module also provides new ways of combining asynchronous sequences. These functions are pretty straightforward.</p><ul><li><code>chain(_ s1: AsyncSequence, _ s2: AsyncSequence)</code></li></ul><p>Chains two or three asynchronous sequences together sequentially: the result yields the elements of the first asynchronous sequence in order, then those of the second (and so on), and stops early if an error occurs. The sequences must have the same <code>Element</code> type.</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Sequence 1</th>
<th style="text-align:center">Sequence 2</th>
<th style="text-align:center">Result</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center"></td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">4</td>
<td style="text-align:center"></td>
<td style="text-align:center">4</td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">2</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">3</td>
<td style="text-align:center">3</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">Apple notes that it can be used for two <strong>or more</strong> sequences. However, only two or three arguments are available now.</div></div><ul><li><code>joined()</code> or <code>joined(separator: AsyncSequence)</code></li></ul><p>Concatenates an asynchronous sequence of asynchronous sequences together where the result is comprised in order from the elements of the first asynchronous sequence and then the second (and so on) or until an error occurs. Similar to <code>chain()</code>, except that the number of asynchronous sequences to concatenate is not known upfront. A separator can also be specified.</p><ul><li><code>combineLatest(_ base1: AsyncSequence, _ base2: AsyncSequence)</code></li></ul><p>Combines two <em>or more</em> sequences, producing tuples of the latest values available from the sequences.</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Sequence 1</th>
<th style="text-align:center">Sequence 2</th>
<th style="text-align:center">Result</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center"></td>
<td style="text-align:center"><em>awaits</em></td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">2</td>
<td style="text-align:center">(1, 2)</td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">3</td>
<td style="text-align:center">(1, 3)</td>
</tr>
<tr>
<td style="text-align:center">4</td>
<td style="text-align:center"></td>
<td style="text-align:center">(4, 3)</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><ul><li><code>merge(_ base1: AsyncSequence, _ base2: AsyncSequence)</code></li></ul><p>Merges sequences into a new one. The result is a combination of results from two sequences. Sequences must have the same <code>Element</code> type.</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Sequence 1</th>
<th style="text-align:center">Sequence 2</th>
<th style="text-align:center">Result</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center"></td>
<td style="text-align:center"><em>awaits</em></td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center"></td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">2</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">3</td>
<td style="text-align:center">3</td>
</tr>
<tr>
<td style="text-align:center">4</td>
<td style="text-align:center"></td>
<td style="text-align:center">4</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Since it is not defined which sequence will produce its next element first, the order of elements in the result is arbitrary.</div></div><ul><li><code>zip(_ base1: AsyncSequence, _ base2: AsyncSequence)</code></li></ul><p>The same as a regular <code>zip</code> but for <code>AsyncSequence</code>. Unlike <code>combineLatest</code>, it waits for a new value from each sequence and never reuses a previous value.</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Sequence 1</th>
<th style="text-align:center">Sequence 2</th>
<th style="text-align:center">Result</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center"></td>
<td style="text-align:center"><em>awaits</em></td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">2</td>
<td style="text-align:center">(1, 2)</td>
</tr>
<tr>
<td style="text-align:center"></td>
<td style="text-align:center">3</td>
<td style="text-align:center"><em>awaits</em></td>
</tr>
<tr>
<td style="text-align:center">4</td>
<td style="text-align:center"></td>
<td style="text-align:center">(4, 3)</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><h2 id="time-related-functions">Time-related functions</h2><p>Sounds awesome, but Swift is not powerful enough to put <code>await</code> before the time itself. When events can potentially happen faster than the desired consumption rate, there are ways to handle the situation. These functions allow linking <code>AsyncSequences</code> with time. They can be applied to any <code>AsyncSequence</code>.</p><p>For both methods listed below, a custom clock can be specified. By default, it's <code>ContinuousClock</code>.</p><h3 id="debounce">Debounce</h3><pre><code class="language-swift"> public func debounce&lt;C: Clock&gt;(
    for interval: C.Instant.Duration, 
    tolerance: C.Instant.Duration? = nil, 
    clock: C
  ) -&gt; AsyncDebounceSequence&lt;Self, C&gt;</code></pre><p>The debounce algorithm produces elements after a particular duration has passed between events. If a lot of events are happening, debounce waits until at least <code>interval</code> has elapsed since the last event before emitting a value.</p><pre><code class="language-swift">seq.debounce(for: .seconds(1))</code></pre><p>In this case, it transforms a potentially fast asynchronous sequence of events into one that waits for a window of 1 second <strong>with no events</strong> to elapse before emitting a value.</p><h3 id="throttle">Throttle</h3><pre><code class="language-swift">extension AsyncSequence {
  public func throttle&lt;C: Clock, Reduced&gt;(
    for interval: C.Instant.Duration, 
    clock: C, 
    reducing: @Sendable @escaping (Reduced?, Element) async -&gt; Reduced
  ) -&gt; AsyncThrottleSequence&lt;Self, C, Reduced&gt;
  
  public func throttle&lt;Reduced&gt;(
    for interval: Duration, 
    reducing: @Sendable @escaping (Reduced?, Element) async -&gt; Reduced
  ) -&gt; AsyncThrottleSequence&lt;Self, ContinuousClock, Reduced&gt;
  
  public func throttle&lt;C: Clock&gt;(
    for interval: C.Instant.Duration, 
    clock: C, 
    latest: Bool = true
  ) -&gt; AsyncThrottleSequence&lt;Self, C, Element&gt;
  
  public func throttle(
    for interval: Duration, 
    latest: Bool = true
  ) -&gt; AsyncThrottleSequence&lt;Self, ContinuousClock, Element&gt;
}</code></pre><p>The throttle algorithm produces elements such that at least a specific interval has elapsed between them. If values are produced by the base <code>AsyncSequence</code>, the throttle does not resume its next iterator until the period has elapsed or a terminal event is encountered. Similarly to <code>debounce</code>, a custom clock can be specified.</p><pre><code class="language-swift">seq.throttle(for: .seconds(1))</code></pre><p>In this case, the throttle transforms a potentially fast asynchronous sequence of events into one that waits for a window of 1 second to elapse before emitting a value.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Notice that debounce waits for a window <strong>with no events</strong>, while throttle simply waits for a window.</div></div><h2 id="final-notes">Final notes</h2><p>It's genuinely entertaining to watch Swift roll out new features and to see how they are developed. Definitely check out the project's GitHub repository, linked in the references, to read the module's source code.</p><p>If you don't feel confident with the relatively new Swift concurrency features, check out my quick guide to async/await in Swift.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/quick-guide-to-async-await-in-swift/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Quick Guide to Async Await in Swift | Alex Dremov</div><div class="kg-bookmark-description">Everything you need to know about new Swift asynchronous features. 
Async await, main actor, task, async get, and possible use cases — all covered.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/2022/04/slide_17.jpg" alt=""></div></a></figure><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/apple/swift-async-algorithms?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - apple/swift-async-algorithms: Async Algorithms for Swift</div><div class="kg-bookmark-description">Async Algorithms for Swift. Contribute to apple/swift-async-algorithms development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">apple</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/25b178985a8c49655550b061d0d0ef4bda784e300a455499592788297fd99f67/apple/swift-async-algorithms" alt=""></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Treap: The Easiest Search Tree (Explained) ]]></title>
                    <description><![CDATA[ Binary search trees are mostly hard. Writing red-black tree is a nightmare. Here, I&#39;m going to explain one of the easiest, yet efficient and powerful balanced binary tree — treap or cartesian tree ]]></description>
                    <link>https://alexdremov.me/treap-algorithm-explained/</link>
                    <guid isPermaLink="false">625daf371f7145b0e9441705</guid>
                    <category><![CDATA[ Algorithms ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Mon, 25 Apr 2022 06:00:00 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/04/featured.png" medium="image"/>
                    <content:encoded><![CDATA[ <p><strong>Cartesian tree or treap</strong> (binary search tree + binary heap) is a fast yet simple data structure. It satisfies the binary search tree property and the binary heap property at the same time. Despite its simplicity, treap self-balances, resulting in <code>O(logn)</code> complexity on average for all common operations.</p><p>Amazing, right?</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">The algorithm uses random values. Therefore, <code>O(logn)</code> complexity is <strong>on average</strong>. However, with a lot of items <code>O(logn)</code> is almost always true. So, later in this article, I will use just <code>O(logn)</code> without the "on average" addition.</div></div><p>Moreover, there is a modification (implicit treap, treap with implicit key) that lets you use treap as a usual array with <code>O(logn)</code> <strong>random insertions and</strong> <strong>random deletions</strong>. Isn't it cool? In this article I'll explain how to create one and provide the implementation in Swift. Also, I will compare treap to <code>std::set</code> from the C++ standard library. Let's start!</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">In a binary search tree, for each node, all items' values in the left subtree are less than the node's value, and all items in the right subtree are greater.</div></div><h2 id="core-algorithm">Core algorithm</h2><p>As I said earlier, treap combines heaps and binary search trees. Therefore, we are going to store at least two properties: <code>key</code> (or value) and <code>priority</code>. 
The tree is a binary search tree with respect to keys and a binary heap with respect to priorities.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">A binary heap is a binary tree where each child's value is less than its parent's value</div></div><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/treap.png" class="kg-image" alt loading="lazy" width="1440" height="900" srcset="https://alexdremov.me/content/images/size/w600/2022/04/treap.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/treap.png 1000w, https://alexdremov.me/content/images/2022/04/treap.png 1440w" sizes="(min-width: 720px) 720px"><figcaption>Treap example</figcaption></figure><p>In the image above, you may notice that for every node, all of its children's priorities are lower. At the same time, all children on the left have a key less than that in the node, and all children on the right have a larger key.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">It's also called a <strong>cartesian tree </strong>as it can be displayed on a regular 2D grid with (key, priority) coordinate for each node. Just like in the image above.</div></div><p>To create a fully-functioning search tree, we need to implement:</p><ul><li>find</li><li>insert</li><li>remove</li></ul><p>More exotic operations like <code>lower bound</code> and <code>upper bound</code> are also pretty simple and do not differ from those in other search trees. And all these operations can be implemented using <strong>just two helper operations</strong>!</p><p>How to do that?</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text"><strong>Split</strong><br><br>Splits the tree into two trees by given <code>value</code>. 
All values in the left tree are <strong>less</strong> than the <code>value</code> while in the right tree are <strong>greater</strong>. And both resulting trees are correct treaps. <br><br>We will use a special flag that decides whether to send values that are equal to the left tree or to the right tree.</div></div><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/split-image-1.png" class="kg-image" alt="Example of split function result. The equal value sent to the right" loading="lazy" width="1440" height="734" srcset="https://alexdremov.me/content/images/size/w600/2022/04/split-image-1.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/split-image-1.png 1000w, https://alexdremov.me/content/images/2022/04/split-image-1.png 1440w" sizes="(min-width: 1200px) 1200px"><figcaption>Example of split function result. The equal value sent to the right</figcaption></figure><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text"><strong>Merge</strong><br><br>Merges two treaps into one big treap.<br><strong>Prerequisite:</strong> all items in the first tree are less than items in the right tree.</div></div><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/merge-example.png" class="kg-image" alt="Treap merge example" loading="lazy" width="1440" height="900" srcset="https://alexdremov.me/content/images/size/w600/2022/04/merge-example.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/merge-example.png 1000w, https://alexdremov.me/content/images/2022/04/merge-example.png 1440w" sizes="(min-width: 1200px) 1200px"><figcaption>Merge example</figcaption></figure><p>So, if we implement these two methods, implementing all other three operations would be trivial.</p><h2 id="split">Split</h2><p>Let's start thinking about code at this 
stage. I'm going to explain this in <code>C++</code>. Rewriting the following code in <code>Swift</code> is actually really easy. Leave a comment below if you need help.</p><figure class="kg-card kg-code-card"><pre><code class="language-cpp">template&lt;typename T&gt;
struct Node {
	T key;
	size_t prior;
	Node* left = nullptr, *right = nullptr;

	Node(T key, size_t prior) :
		key(std::move(key)),
		prior(prior) {
	}
};</code></pre><figcaption>Structure of treap's node</figcaption></figure><p>For split, we have a <code>head</code> node and a <code>key</code> for which split needs to be done. This method is extremely simple using recursion.</p><h3 id="algorithm">Algorithm</h3><p>Let the current head be <code>p</code>.</p><ul><li>If <code>p-&gt;key</code> is <strong>less than</strong> the <code>key</code>, then we need to go <strong>right </strong>and split <code>p-&gt;right</code> further. <br><br>Also, splitting <code>right</code> will bring two trees as well, and the first one will have nodes with keys <strong>less </strong>than the <code>key</code>. Yet, they are greater than the <code>p-&gt;key</code> (as they are in the second tree of the first split). <br>So, we set <code>p-&gt;right</code> to the <strong>first</strong> tree of splitting <code>right</code> result.<br><br><strong>Result:</strong>  <code>p</code>, split right's second tree</li><li>If the <code>p-&gt;key</code> is <strong>greater or equal </strong>to the <code>key</code>, then we need to go <strong>left </strong>and split <code>p-&gt;left</code> further.<br><br>Similarly to the case above,  we set <code>p-&gt;left</code> to the <strong>second</strong> tree of split left.<br><br><strong>Result: </strong>split left's first tree, <code>p</code></li></ul><p>The algorithm above leaves a node that is equal to the split value in the second tree. Symmetrically, we will use the <code>equalOnTheLeft</code> flag to leave the node in the left tree.</p><p>So, the final code:</p><pre><code class="language-cpp">pair&lt;Node *, Node *&gt; split (Node *p, const T&amp; key,
				bool equalOnTheLeft=false) {
    if (!p) // reached leaf
    	return {nullptr, nullptr};
    if (p-&gt;key &lt; key ||
    	(equalOnTheLeft &amp;&amp; p-&gt;key == key)) { // splitting right
        auto q = split(p-&gt;right, key, equalOnTheLeft);
        
        // q.first has nodes of the right
        // subtree that are less than key
        p-&gt;right = q.first; 
        
        return {p, q.second};
    } else { // splitting left
        auto q = split(p-&gt;left, key, equalOnTheLeft);
        
        // q.second has nodes of the left 
        // subtree that are greater or equal
        // to the key
        p-&gt;left = q.second;
        
        return {q.first, p};
	}
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Priorities are not used and not changed during the split procedure. The resulting trees have the right order of priorities as the initial tree had it right</div></div><!--kg-card-begin: html--><section class="custom-replace" data-replace=".subscription-article">
    <a class="gh-portal-triggerbtn-container">Subscribe and don't miss posts!</a>
</section><!--kg-card-end: html--><h2 id="merge">Merge</h2><p>Merge is similar to split, but it uses <strong>priorities</strong> to do the work. As I mentioned before, there is a <strong>prerequisite</strong>: all items in the first merged tree must be less than items in the second tree. If this is not true, another algorithm must be used.</p><h3 id="algorithm-1">Algorithm</h3><p>Similarly to <code>split</code>, <code>merge</code> is also recursive. Let us have two trees to merge: <code>l</code> and <code>r</code>.</p><ul><li>We need to choose which tree will represent the new head. That's simple: the head must have the greatest priority, so we choose <code>l</code> or <code>r</code> based on that.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Notice that the head node in <code>l</code> has the highest priority in the whole <code>l</code> tree, as it's a property of a correct treap. The same applies to <code>r</code>.</div></div><ul><li>If <code>l</code> has greater priority, then the <code>l-&gt;left</code> subtree remains intact: all of its keys are less than every key in <code>r</code>, so <code>r</code> cannot affect it. <br><br>Then, the <code>l-&gt;right</code> subtree must be merged with <code>r</code>, and the result becomes the new <code>l-&gt;right</code> subtree.</li><li>If <code>r</code> has greater priority, then, similarly to the case above, <code>r-&gt;right</code> remains intact and <code>r-&gt;left</code> must be merged with <code>l</code>.</li></ul><pre><code class="language-cpp">Node* merge (Node *l, Node *r) {
    if (!l) // left is empty
    	return r;
    if (!r) // right is empty
    	return l;
        
    if (l-&gt;prior &gt; r-&gt;prior) { // l has the new head.
        l-&gt;right = merge(l-&gt;right, r);
        return l;
    } else { // r has the new head.
        r-&gt;left = merge(l, r-&gt;left);
        return r;
    }
}</code></pre><p>Why is it correct?</p><p>It seems like nothing stops us from breaking the search tree structure where all items' values in the left subtree are less than the node's value, and all items in the right subtree are greater.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">The <strong>prerequisite</strong> preserves the binary search tree property: items are never reordered, so every key from <code>l</code> stays to the left of every key from <code>r</code>.</div></div><h2 id="implementing-search-tree-methods">Implementing search tree methods</h2><p>I claimed that all methods are easy to implement through <code>split</code> and <code>merge</code>. Time to prove that.</p><h3 id="find-function find() { [native code] }1">Find</h3><p>Find is implemented just like in any binary search tree. We use the fact that keys in the left subtree are less than the key in the node, and keys in the right subtree are greater.</p><pre><code class="language-cpp">Node* find(Node* node, const T&amp; key) {
	if (node == nullptr)
		return nullptr;
    if (node-&gt;key == key)
		return node;
    return find(key &gt;= node-&gt;key ? node-&gt;right : node-&gt;left, key);
}</code></pre><h3 id="insert">Insert</h3><p>Let's think about insert in terms of split and merge. We have one big tree and we need to insert a new <code>key</code>. </p><ul><li>Split the tree by <code>key</code> into two trees: <code>first</code> (which has values lower than the <code>key</code>) and <code>second</code> (which has values greater than or equal to the <code>key</code>).<br><br>We can check whether the node already exists: try to find it in the <code>second</code> tree.</li></ul><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">The implementation requires that each item occurs only <strong>once</strong>.<br><br>If you need to insert multiple copies of the same item, you can store an item together with its count to achieve that.</div></div><ul><li>Create a new node, <code>newNode</code>, that will store the new <code>key</code>. Ta-da: this node is a correct treap that has only one node.<br><br>For the new node, you need to set<strong> a random priority</strong></li></ul><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text"><strong>Random priorities </strong>are key to the complexity. They make the cartesian tree balance itself, giving <code>O(logn)</code> complexity for all operations</div></div><ul><li>New head will be  <code>merge(first, merge(newNode, second))</code></li></ul><p>See? 
It's that simple.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/merge-example-specific.png" class="kg-image" alt="Insert example" loading="lazy" width="1440" height="1048" srcset="https://alexdremov.me/content/images/size/w600/2022/04/merge-example-specific.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/merge-example-specific.png 1000w, https://alexdremov.me/content/images/2022/04/merge-example-specific.png 1440w" sizes="(min-width: 1200px) 1200px"><figcaption>Insert example</figcaption></figure><pre><code class="language-cpp">Node* insert(Node* head, T key) {
    // A local variable named split would shadow the split function
    auto parts = split(head, key);
    if (find(parts.second, key) != nullptr) {
    	// Key exists already
        // Merge back
        return merge(parts.first, parts.second);
    }
    
    auto newNode = new Node(std::move(key), rand());
    return merge(parts.first, merge(newNode, parts.second));
}</code></pre><h3 id="remove">Remove</h3><p>It's very similar to <code>insert</code>. However, that's where the <code>equalOnTheLeft</code> flag is used.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Remember that the <code>second</code> tree produced by <code>split</code> contains items greater than or <strong>equal </strong>to the selected key</div></div><p>Therefore, the <code>second</code> tree will contain the value that needs to be removed. But how to remove it from the tree? </p><p>Split again.</p><p>We can split the <code>second</code> tree by key, setting the <code>equalOnTheLeft</code> flag to <code>true</code>. Thus, the node will be separated from the <code>second</code> tree into a new tree.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">After two splits, the node to delete is isolated in its own tree. It is easily removed, and everything else is merged back.</div></div><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/remove-example.png" class="kg-image" alt="Remove example" loading="lazy" width="1440" height="1048" srcset="https://alexdremov.me/content/images/size/w600/2022/04/remove-example.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/remove-example.png 1000w, https://alexdremov.me/content/images/2022/04/remove-example.png 1440w" sizes="(min-width: 1200px) 1200px"><figcaption>Remove example</figcaption></figure><pre><code class="language-cpp">Node *remove(Node *head, const T &amp;key) {
    // A local variable named split would shadow the split function
    auto parts = split(head, key);
    if (parts.second) {
        auto secondSplit = split(parts.second, key,
                                     /*equalOnTheLeft=*/true);
        auto everythingElse = secondSplit.second;
        if (secondSplit.first == nullptr) {
            // There's no element equal to key. Merge back.
            return merge(parts.first, everythingElse);
        }

        // Key exists: the node holding it is
        // secondSplit.first, so delete it
        delete secondSplit.first;

        // size: element counter, assumed to be kept
        // by the surrounding treap class
        size--;
        return merge(parts.first, everythingElse);
    }
    // Key is not present. Merge back.
    return merge(parts.first, parts.second);
}</code></pre><h2 id="full-code">Full code</h2><p>You can download the C++ code of a slightly optimized treap here:</p>
        <div class="kg-card kg-file-card ">
            <a class="kg-file-card-container" href="https://alexdremov.me/content/files/2022/04/treap.h" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">Treap</div>
                    <div class="kg-file-card-caption">C++ code allocations-optimised</div>
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">treap.h</div>
                        <div class="kg-file-card-filesize">5 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
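<p>To see the pieces working together, here is a minimal self-contained sketch that wires <code>split</code> and <code>merge</code> into insert and remove for <code>int</code> keys. It is not the contents of <code>treap.h</code>: the names <code>TreapNode</code>, <code>contains</code>, <code>erase</code>, <code>collect</code>, <code>destroy</code> and <code>demo</code> are made up for this illustration, and <code>std::mt19937</code> is swapped in for <code>rand()</code> as an assumption.</p>

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Illustrative names only; none of them come from treap.h.
struct TreapNode {
    int key;
    unsigned prior;
    TreapNode* left = nullptr;
    TreapNode* right = nullptr;
    TreapNode(int k, unsigned p) : key(k), prior(p) {}
};

// First tree: keys < key (or <= key when equalOnTheLeft). Second tree: the rest.
std::pair<TreapNode*, TreapNode*> split(TreapNode* p, int key,
                                        bool equalOnTheLeft = false) {
    if (!p) return {nullptr, nullptr};
    if (p->key < key || (equalOnTheLeft && p->key == key)) {
        auto q = split(p->right, key, equalOnTheLeft);
        p->right = q.first;
        return {p, q.second};
    }
    auto q = split(p->left, key, equalOnTheLeft);
    p->left = q.second;
    return {q.first, p};
}

// Prerequisite: every key in l is less than every key in r.
TreapNode* merge(TreapNode* l, TreapNode* r) {
    if (!l) return r;
    if (!r) return l;
    if (l->prior > r->prior) {
        l->right = merge(l->right, r);
        return l;
    }
    r->left = merge(l, r->left);
    return r;
}

bool contains(const TreapNode* n, int key) {
    while (n && n->key != key) n = key < n->key ? n->left : n->right;
    return n != nullptr;
}

TreapNode* insert(TreapNode* head, int key, std::mt19937& rng) {
    auto parts = split(head, key);
    if (contains(parts.second, key))  // duplicate: restore the tree
        return merge(parts.first, parts.second);
    auto* node = new TreapNode(key, static_cast<unsigned>(rng()));
    return merge(parts.first, merge(node, parts.second));
}

TreapNode* erase(TreapNode* head, int key) {
    auto parts = split(head, key);               // parts.second: keys >= key
    auto rest = split(parts.second, key, true);  // rest.first: keys == key
    delete rest.first;  // a single node, or nullptr if the key is absent
    return merge(parts.first, rest.second);
}

// In-order traversal yields the keys in sorted order.
void collect(const TreapNode* n, std::vector<int>& out) {
    if (!n) return;
    collect(n->left, out);
    out.push_back(n->key);
    collect(n->right, out);
}

void destroy(TreapNode* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Inserts a few keys (one duplicate), removes one present and one absent key.
std::vector<int> demo() {
    std::mt19937 rng(42);
    TreapNode* root = nullptr;
    for (int k : {5, 1, 4, 1, 3}) root = insert(root, k, rng);
    root = erase(root, 4);
    root = erase(root, 7);
    std::vector<int> keys;
    collect(root, keys);
    destroy(root);
    return keys;
}
```

<p>Whatever priorities the generator produces, the in-order traversal in <code>demo()</code> stays sorted: priorities only influence the shape of the tree, never the order of keys.</p>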
        <h2 id="comparing-to-stdset">Comparing to <code>std::set</code></h2><p>First of all, the implemented version of treap utilizes <code>split</code> and <code>merge</code> methods. Note that there is a more efficient implementation that uses rotations. However, the true power of treap is in <code>split</code> and <code>merge</code> methods, as other search trees can't do it easily.</p><h3 id="find-tests">Find tests</h3><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/find-3.svg" class="kg-image" alt="Find operation test" loading="lazy" width="1152" height="768"><figcaption>Find operation test</figcaption></figure><p>The asymptotics are visibly similar, though treap always has more overhead. Still, it's a good result! We're competing with a heavily optimized standard library data structure.</p><h3 id="inserts">Inserts</h3><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/insert.svg" class="kg-image" alt loading="lazy" width="1152" height="768"><figcaption>Insert operation test</figcaption></figure><p>Insertion has an even bigger overhead. And it was expected: recursive calls of merge and split do not improve performance ;)</p>
        <div class="kg-card kg-file-card ">
            <a class="kg-file-card-container" href="https://alexdremov.me/content/files/2022/04/TreapProject-1.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">TreapProject</div>
                    <div class="kg-file-card-caption">Comparison tests. Outputs CSV of time measurements</div>
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">TreapProject.zip</div>
                        <div class="kg-file-card-filesize">7 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
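<p>Insertion cost tracks tree height, so it is worth checking that random priorities really keep the treap shallow. The sketch below is a rough illustration and not part of the benchmark project: it appends keys in sorted order, the worst case for a naive binary search tree, and measures the resulting height. Because each new key is larger than all existing ones, a plain <code>merge</code> is enough and keys do not even need to be stored. The names <code>SpineNode</code>, <code>mergeNodes</code>, <code>height</code>, <code>destroyAll</code> and <code>heightAfterSortedInserts</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical key-less node: nodes are appended in increasing key order,
// so only the priority is needed to determine the shape of the treap.
struct SpineNode {
    unsigned prior;
    SpineNode* left = nullptr;
    SpineNode* right = nullptr;
    explicit SpineNode(unsigned p) : prior(p) {}
};

// Same rule as merge in the article: the higher priority becomes the head.
SpineNode* mergeNodes(SpineNode* l, SpineNode* r) {
    if (!l) return r;
    if (!r) return l;
    if (l->prior > r->prior) {
        l->right = mergeNodes(l->right, r);
        return l;
    }
    r->left = mergeNodes(l, r->left);
    return r;
}

// Height counted in nodes: an empty tree has height 0.
int height(const SpineNode* n) {
    return n ? 1 + std::max(height(n->left), height(n->right)) : 0;
}

void destroyAll(SpineNode* n) {
    if (!n) return;
    destroyAll(n->left);
    destroyAll(n->right);
    delete n;
}

// Appends n keys in sorted order and reports the height of the result.
int heightAfterSortedInserts(int n, unsigned seed) {
    std::mt19937 rng(seed);
    SpineNode* root = nullptr;
    for (int i = 0; i < n; ++i) {
        // The new key exceeds all existing keys, so the merge prerequisite
        // holds and no split is required.
        root = mergeNodes(root, new SpineNode(static_cast<unsigned>(rng())));
    }
    int h = height(root);
    destroyAll(root);
    return h;
}
```

<p>A naive unbalanced search tree fed the same sorted input degenerates into a chain of height <code>n</code>, while the treap's height should stay within a small constant factor of <code>log2(n)</code>.</p>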
        <h3 id="comparison-conclusion">Comparison conclusion</h3><figure class="kg-card kg-image-card kg-width-wide"><img src="https://alexdremov.me/content/images/2022/04/height50m-1.svg" class="kg-image" alt="Height of treap evaluation" loading="lazy" width="1152" height="768"></figure><p>As you can see, the average node height in a treap is greater than in a very well-balanced AVL tree. </p><p>Yes, treap has worse performance than <code>std::set</code>. Yet, the results are comparable, and with a large data size, treap gets closer and closer to <code>std::set</code>, which is in fact a red-black tree.</p><p>Believe me, <strong>you don't want to write your own RB tree</strong>. It's a nightmare.</p><h2 id="use-cases-and-modifications">Use cases and modifications</h2><p>We developed this data structure not just to lose to <code>std::set</code>. There are several useful applications.</p><h3 id="sum-of-numbers-in-the-interval">Sum of numbers in the interval </h3><p>We need to modify the <code>Node</code> structure by adding a <code>sum</code> field. It will store the sum of the keys in the node's whole subtree, including the node itself.</p><pre><code class="language-cpp">template&lt;typename T&gt;
struct Node {
	T key;
	size_t prior;
	long long sum;
	Node* left = nullptr, *right = nullptr;

	Node(T key, size_t prior) :
		key(std::move(key)),
		prior(prior),
		sum(this-&gt;key) { // a single node's subtree sum is its own key
	}
};</code></pre><p>It's extremely easy to keep <code>sum</code> up to date. Every time a node's children change, recompute <code>sum = key + left-&gt;sum + right-&gt;sum</code>, counting a missing child as 0. So, you can implement some kind of <code>update</code> function and call it in split and merge right before returning the value. That's it.</p><p>How do we answer a query?</p><p>We receive an interval <code>[l, r]</code>. To calculate the sum of the numbers in this interval, we can split the tree by <code>l</code>, then split the second tree of the result by <code>r+1</code> (or by <code>r</code>, leaving equal elements on the left). In the end, we have a tree containing exactly the stored numbers from the interval <code>[l, r]</code>, and the <code>sum</code> in its root is the answer. Afterwards, merge the three parts back together.</p><p><strong>Complexity:</strong> <code>O(logn)</code> versus <code>O(n)</code> naive.</p><h3 id="using-a-hash-of-value-in-place-of-priority">Using a hash of value in place of priority</h3><p>You can use a hash of the value as its priority, since a good hash function is pretty random. What benefits does it bring?</p><p>If keys and priorities are fixed, then no matter how you construct the treap or add elements, it's always going to have the same structure.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">You may think about it this way: keys fix the x axis and priorities fix the y axis of the treap</div></div><p>Therefore, you can compare two sets in <code>O(n)</code>, as treaps containing the same values will have <strong>absolutely the same structure.</strong></p><h2 id="implicit-treap">Implicit treap</h2><p>What if we use <strong>the size of the left subtree</strong> as a key? Then, we can use this key as an index. Wow. 
That means that we can represent a regular ordered array as a treap!</p><p>By doing this, we can:</p><ul><li>insert at an arbitrary index<br> <code>O(logn)</code> versus <code>O(n)</code> naive</li><li>delete at an arbitrary index<br> <code>O(logn)</code> versus <code>O(n)</code> naive</li></ul><p>With great power comes great responsibility.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Access by index degrades to <code>O(logn)</code>, versus <code>O(1)</code> in a standard array.</div></div><p>If your algorithm requires a lot of array modifications and very few accesses/outputs, then it's the right choice. Moreover, you can convert a treap into an array and back with <code>O(n)</code> complexity.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">I have an implicit treap implemented in <strong>Swift</strong>. It behaves just like a regular array and implements a lot of optimisations. Check it out!</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/AlexRoar/swift-collections/tree/main/Sources/OrderedCollections/TreeArray?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">swift-collections/Sources/OrderedCollections/TreeArray at main · AlexRoar/swift-collections</div><div class="kg-bookmark-description">Commonly used data structures for Swift. 
Contribute to AlexRoar/swift-collections development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">AlexRoar</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/49005a356ceb6b3993976514f19972ef42448ba48f43e8a30af9ce7834af1d0a/AlexRoar/swift-collections" alt=""></div></a></figure><h3 id="cut-paste-problem">Cut-paste problem</h3><p>Imagine that you have a big string and you receive requests to cut out some part and insert it somewhere else.</p><p>This problem can be solved using treaps with an implicit key. You can use splits to cut out the needed part and merges to insert it.</p><h2 id="faq">FAQ</h2><blockquote>What are Cartesian trees most suitable for?</blockquote><p>A treap is useful when you need to collect some kind of characteristic on an interval (for example, a sum) or apply some modification to the interval. A treap with an implicit key is also useful when you need to perform a lot of insertions/deletions at arbitrary positions with few accesses.</p><blockquote>Why don't we use array indices as keys for an implicit treap?</blockquote><p>Because on every insertion we would need to recalculate all indices that are higher than the inserted index. That degrades the complexity to <code>O(n)</code>.</p><blockquote>Is treap a randomized tree?</blockquote><p>Yes, it is. But it can also use a hash value in place of a random value.</p><blockquote>I know about an implementation without split and merge. It uses left and right rotations. Is it better?</blockquote><p>GeeksforGeeks, for example, uses such an implementation, I know. But I believe that the true value of a treap is in seamless splits and merges. You've already seen from the examples how useful they are. 
Why implement a treap with rotations when you can build an AVL tree that's probably going to be faster?</p><h2 id="love-data-structures">Love data structures?</h2><p>Check out my article on the amazing Skip List! While a lot of people have never heard of it, Skip List is <strong>beautiful</strong> and can efficiently solve, for example, the problem of finding the n-th maximum or the rolling median problem.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/skip-list-indexation-and-kth-maximum/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Skip List Indexation and kth Maximum | Alex Dremov</div><div class="kg-bookmark-description">Skip List is a nice structure that lets you perform insertions, searches, and finding the n-th maximum. In this post I focus on skip list indexation</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/2022/04/--------------2020-11-06---01.51.30.png" alt=""></div></a></figure><p>Also, you can check out the whole algorithms section of my blog:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/tag/algorithms/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Alex Dremov | Algorithms</div><div class="kg-bookmark-description">Those are hard! 
In this section I discuss algorithms that I encountered during work or my college assignments</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://images.unsplash.com/photo-1580777361964-27e9cdd2f838?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDZ8fGFsZ29yaXRobXxlbnwwfHx8fDE2NDk1MDYwMDM&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" alt=""></div></a></figure><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://books.google.de/books?id=NLngYyWFl_YC&pg=PA298&lpg=PA298&dq=treap+algorithm&source=bl&ots=BASmGA8mBd&sig=ACfU3U17YFycVO2ztnR-zjL5yLbhEfv3VQ&hl=en&sa=X&ved=2ahUKEwjxqr_r0qf3AhXD0qQKHcWWDjcQ6AF6BAgyEAM#v=onepage&q=treap%20algorithm&f=false"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Introduction To Algorithms</div><div class="kg-bookmark-description">The first edition won the award for Best 1990 Professional and Scholarly Book in Computer Science and Data Processing by the Association of American Publishers.There are books on algorithms that are rigorous but incomplete and others that cover masses of material but lack rigor. 
Introduction to Algo…</div><div class="kg-bookmark-metadata"><span class="kg-bookmark-author">Google Books</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://books.google.de/books/content?id&#x3D;NLngYyWFl_YC&amp;printsec&#x3D;frontcover&amp;img&#x3D;1&amp;zoom&#x3D;1&amp;edge&#x3D;curl&amp;imgtk&#x3D;AFLRE73PtOLetmXuVivAcv-TRLkC8fjpuL48GXZzQ576K23NJLUElL93yxbTDC9ES8rz_-HbjeVd9GkteTzhsSnzKtt9jLIQ-vYsdZyiETYa-lk1uLKXz-jXTl2sR8t2kpHJy_777gys" alt=""></div></a></figure><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://algorithmica.org/ru/treap?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Декартово дерево - Алгоритмика</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="data:image/x-icon;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAQAAADZc7J/AAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAAmJLR0QA/4ePzL8AAAAJcEhZcwAACxMAAAsTAQCanBgAAAAHdElNRQfjBggQFxAS4ilBAAACgElEQVRIx52VS0iUYRSGn/8fzbxMIYoRKSZKi2wGwo2FxRQZIlGbaGUF0b52LWbTLlx0W7VyEZiBWkE5mraRRrJFRUwolQw6zQTeuoiU423eFuk0M//nhJ7dd+Z7n/8932HOscgIgU0lR/Cxn3LKgGlijDJIkAgJiywhZMujVo0qrsyIa1St8shWFnmp/IooW0TkV6mMYuRVQCv6X6woIK+QQ16vkPP2M/WZICHVpyGEvCb5vE7olH6bEV6l1R4w3RpQkYoVNJcSWHsLIUt+rTpvLOuSELqihAmwKr8sgZBHE2aX5UKoRmNmDxPyCBtoodLU1m5yaADC9Jr7XkkLoCqNmPAx1eqqnqhAqEHfzB5GVGXTQLUJ/5xZznGcOuAtQbOHahpsfOQ5f5mng8McZAdnsVigkyUTIA8fGjZ561OpuiRJY6oR2qU35iKGbSqc4GU6qObYmstmYIrH5iIq0JIT+17lupU8BVUsVKsvJgdLtgnbxXZOJ091HAU+0W+0YCnGnvRUlCa+42OdbRHiA3CSbtyZ+q85RDMBfcxxjZKUf5uPO4wwzGsaMwFR1JZe1Jx8uuB4mBtC6LJzXLTZDLKYihziM+fJzfjQGfYC/XxMTy8yaDNEOLWBDznAIcdT7aMRiPI0PR1myGacnn+Zd7ygiUIHwEUjuUAnsdR0D+M20E7k7/knt5ki39guN9uAEPeIr6citIPrOsxQiC9hveIuLylkkgXyKUsR/2CAB8yyEzdhJiliN3aCmzwCCwSl3FfzDL9wYZFAuClJAcSZJoELALFKAWVYvVxkdm3NbDRUs0TqUM021jeU15s2w9YXSxKy9dWWRGx6uVpOyObW+x/B+LEV0hF3cAAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAxOS0wNi0wOFQxNjoyMzoxNiswMjowMLEfSBUAAAAldEV
YdGRhdGU6bW9kaWZ5ADIwMTktMDYtMDhUMTY6MjM6MTYrMDI6MDDAQvCpAAAAV3pUWHRSYXcgcHJvZmlsZSB0eXBlIGlwdGMAAHic4/IMCHFWKCjKT8vMSeVSAAMjCy5jCxMjE0uTFAMTIESANMNkAyOzVCDL2NTIxMzEHMQHy4BIoEouAOoXEXTyQjWVAAAAAElFTkSuQmCC" alt=""><span class="kg-bookmark-author">Алгоритмика</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://hsto.org/storage/habraeffect/a1/0a/a10a744def8f325a1019502ecc175ef6.png" alt=""></div></a></figure><p><a href="https://www.cs.cmu.edu/~scandal/papers/treaps-spaa98.pdf?ref=alexdremov.me">https://www.cs.cmu.edu/~scandal/papers/treaps-spaa98.pdf</a></p> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Type Placeholders: New Swift 5.6 Feature ]]></title>
                    <description><![CDATA[ Type placeholders were recently introduced in Swift 5.6. Get to know this useful new Swift feature. ]]></description>
                    <link>https://alexdremov.me/swift-type-placeholders/</link>
                    <guid isPermaLink="false">625b016c56f4d53d050c5172</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Thu, 21 Apr 2022 06:00:00 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/04/placeholderRemade-min.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>Type placeholders were recently introduced in Swift 5.6. And yes, they are a nice addition to Swift's powerful type inference system. If you are familiar with C++, you probably know the <code>auto</code> keyword. Type placeholders are <em>almost</em> the same.</p><h2 id="generics-and-type-placeholder">Generics and type placeholder</h2><pre><code class="language-swift">let number: _ = 42 // Type placeholder
let anotherNumber = 42</code></pre><p>Yes, Swift can already infer a variable's type, but type placeholders are meant for types that have <strong>multiple types in them</strong>. Generics. That's where they really shine.</p><p>Consider the regular <code>Result</code> enum:</p><pre><code class="language-swift">enum Result&lt;Success, Failure&gt; where Failure : Error {
    case success(Success)
    case failure(Failure)
}</code></pre><p>And what if we have some kind of complex object?</p><pre><code class="language-swift">var ohMy = [1: [3: (1, 2, 3, "That's a long tuple")]]</code></pre><p>If you try to create a <code>Result</code> from <code>ohMy</code>, you'll see a compilation error.</p><pre><code class="language-swift">let result = Result.success(ohMy)</code></pre><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Generic parameter <code>Failure</code> could not be inferred</div></div><p>Bruh. So I need to write...</p><pre><code class="language-swift">let result = Result&lt;[Int : [Int : (Int, Int, Int, String)]], Error&gt;.success(ohMy)</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Use type placeholders to omit types that Swift can infer</div></div><p>Thanks to type placeholders, no. Swift can infer the object's type by itself, so we only need to provide the <code>Failure</code> type.</p><pre><code class="language-swift">let result = Result&lt;_, Error&gt;.success(ohMy) // Nice</code></pre><h2 id="collections-and-type-placeholder">Collections and type placeholder</h2><p>This feature is also useful with collections. What if we need a dictionary with enum keys?</p><pre><code class="language-swift">enum Foo {
	case bizz
	case bonk
}

let results = [
	.bizz: ohMy,
	.bonk: ohMy
]</code></pre><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Reference to member <code>bizz</code> cannot be resolved without a contextual type</div></div><p>So, let's provide this <em>contextual type</em>. But remember how ugly <code>ohMy</code>'s type is? Let's use a type placeholder.</p><pre><code class="language-swift">// 🚫
let results:[Foo: [Int : [Int : (Int, Int, Int, String)]]] = [
	.bizz: ohMy,
	.bonk: ohMy
]

// ✅
let results:[Foo: _] = [
	.bizz: ohMy,
	.bonk: ohMy
]</code></pre><h2 id="more-examples">More examples</h2><p>Examples of types containing placeholders are:</p><pre><code class="language-swift">Array&lt;_&gt; // array with placeholder element type
[Int: _] // dictionary with placeholder value type
(_) -&gt; Int // function type accepting a single type placeholder argument and returning 'Int'
(_, Double) // tuple type of placeholder and 'Double'
_? // optional wrapping a type placeholder</code></pre><h2 id="final-notes">Final notes</h2><p>It's a great feature that broadens Swift’s type inference capabilities. For now it's not widely known, but I think it will see more use in the future.</p><p>You can check out other lesser-known Swift features in my previous post:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://alexdremov.me/top-7-subtle-swift-features/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Top 7 Subtle Swift Features | Alex Dremov</div><div class="kg-bookmark-description">Here, I collected Swift features that are less known and can be useful when you prepare for interviews or want to deepen your Swift knowledge.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://alexdremov.me/assets/icons/apple-touch-icon.png" alt=""><span class="kg-bookmark-author">Alex Dremov</span><span class="kg-bookmark-publisher">Alex Dremov</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://alexdremov.me/content/images/2022/04/Artboard-1-1.png" alt=""></div></a></figure><h2 id="references">References</h2><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/apple/swift-evolution/blob/main/proposals/0315-placeholder-types.md?ref=alexdremov.me"><div class="kg-bookmark-content"><div class="kg-bookmark-title">swift-evolution/0315-placeholder-types.md at main · apple/swift-evolution</div><div class="kg-bookmark-description">This maintains proposals for changes and user-visible enhancements to the Swift Programming Language. 
- swift-evolution/0315-placeholder-types.md at main · apple/swift-evolution</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt=""><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">apple</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/c17082dded0015da2efcc475a488d2710ae6bf2f831faac7614e539d5739e9a2/apple/swift-evolution" alt=""></div></a></figure> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Quick Guide to Async Await in Swift ]]></title>
                    <description><![CDATA[ Everything you need to know about new Swift asynchronous features. Async await, main actor, task, async get, and possible use cases — all covered. ]]></description>
                    <link>https://alexdremov.me/quick-guide-to-async-await-in-swift/</link>
                    <guid isPermaLink="false">6255818587754c43f7172533</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sat, 16 Apr 2022 10:50:00 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/04/slide_17.jpg" medium="image"/>
                    <content:encoded><![CDATA[ <p>How do you create asynchronous functions and run code in parallel? What is MainActor? What is the closure pyramid, and how do you get rid of it? Let's start.</p>
<!--kg-card-begin: html-->
<h2 id="straight-to-the-point">Straight to the point</h2>
<!--kg-card-end: html-->
<p>Swift 5.5 introduced built-in support for writing asynchronous and parallel code in a structured way. <em>Asynchronous code</em> can be suspended and resumed later, although only one piece of the program executes at a time.</p><p>The <code>async</code> keyword marks a function as asynchronous. That's it.</p><pre><code class="language-swift">func downloadNames(fromServer name: String) async -&gt; [String] {
    ... // some other tasks
    return data
}</code></pre><p>But what does it really mean?</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text">The async function can be suspended in the middle of the execution when it’s waiting for something.</div></div><p>Here's how <code>async</code> functions can be called</p><pre><code class="language-swift">let namesMain = await downloadNames(fromServer: "main")
let secondary = await downloadNames(fromServer: "secondary")</code></pre><p>When execution reaches an <code>await</code>, the current function may be suspended until the asynchronous call finishes.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Suspension is never implicit or preemptive: <b><strong style="white-space: pre-wrap;">each </strong></b>such place is marked with the <code spellcheck="false" style="white-space: pre-wrap;">await</code> keyword.</div></div>
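<p>To see where suspension can happen, here is a minimal, self-contained sketch. It assumes a stand-in <code>downloadNames</code> that sleeps for a moment instead of contacting a server; the returned names and the <code>loadAllNames</code> helper are hypothetical.</p><pre><code class="language-swift">// Stand-in for a real network call (assumption): sleeps, then returns made-up names.
func downloadNames(fromServer name: String) async -&gt; [String] {
    // Task.sleep is itself async, so this line is a possible suspension point.
    try? await Task.sleep(nanoseconds: 100_000_000)
    return ["\(name)-alice", "\(name)-bob"]
}

// Hypothetical helper: the two downloads run one after another.
func loadAllNames() async -&gt; [String] {
    let main = await downloadNames(fromServer: "main")           // may suspend here
    let secondary = await downloadNames(fromServer: "secondary") // and here
    return main + secondary
}</code></pre><p>The second download starts only after the first one has returned. While the function is suspended, its thread is free to do other work.</p>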
<!--kg-card-begin: html-->
<h2 id="where-to-call-async-functions">Where to call async functions</h2>
<!--kg-card-end: html-->
<p>As I said before, <code>await</code> suspends current execution. But there must be a structure underneath that can be suspended. You can't suspend a raw thread or the main thread, for example.</p><p>You opened Playgrounds, right?</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Do not use Swift Playgrounds to test new concurrency features as they are not fully supported yet</div></div><p>If you try to call an async function in an inappropriate place, you will see this error</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text"><code spellcheck="false" style="white-space: pre-wrap;">async</code> call in a function that does not support concurrency</div></div><p>That's because an asynchronous function can be called only in:</p><ul><li>Code in the body of an asynchronous function, method, or property.</li><li>Code in the static <code>main()</code> method of a structure, class, or enumeration that’s marked with <code>@main</code>.</li><li>Code in an unstructured child task</li></ul><p><strong>That's a lot of words.</strong></p><p>For most developers, only the first and the last points make sense. Most of the places in your code do not support <code>await</code>. How to deal with that?</p>
<!--kg-card-begin: html-->
<h2 id="tasks-and-tasks">Tasks and TaskGroup</h2>
<!--kg-card-end: html-->
<h3 id="task">Task</h3><p>To call an asynchronous function in a place that does not support concurrency, you need to create a concurrent task. You can use <code>Task</code> and <code>TaskGroup</code> to achieve that.</p><pre><code class="language-swift">Task {
    let names = await downloadNames(fromServer: "main")
    ... // further work
    ... // take over the world (asynchronously)

}</code></pre><p>When you create an instance of <code>Task</code>, you provide a closure that contains the work for that task to perform. A task can start running right after creation, but it is not guaranteed to do so; you never start or schedule it explicitly. You can create a task inside another <code>Task</code> or in any other concurrent context.</p><pre><code class="language-swift">let handle = Task { // Creates asynchronous task
	let names = await downloadNames(fromServer: "main")
    
	Task { // Creates asynchronous task
		await save(names: names)
	}
    
	for name in names {
		print(name)
	}
}</code></pre><p>After creating a task, you use the instance to interact with it, for example, to wait for it to complete or to cancel it. Tasks run independently from their handles.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Cancellation is cooperative: calling <code spellcheck="false" style="white-space: pre-wrap;">cancel()</code> on the handle only marks the task as cancelled. The task's own code should then <b><strong style="white-space: pre-wrap;">throw an error, return nil, or return partially completed work.</strong></b> Use <code spellcheck="false" style="white-space: pre-wrap;">Task.isCancelled</code> to check if the current task was cancelled.</div></div><h3 id="taskgroup-to-group-the-tasks">TaskGroup to group the tasks</h3><p><code>TaskGroup</code> lets you launch several tasks and wait for the completion of all of them. The order in which these tasks are completed is not defined.</p><p>How to create it?</p><p><code>TaskGroup</code> is created through <code>withTaskGroup(of:)</code>. You provide a closure in which you spawn new tasks and perform operations on the returned data.</p><pre><code class="language-swift">let calculations = await withTaskGroup(of: Int.self) { group -&gt; [Int] in
	group.addTask { 1 * 2 } // () -&gt; Int
	group.addTask { 2 * 3 }
	group.addTask { 3 * 4 }
	group.addTask { 4 * 5 }
	group.addTask { 5 * 6 }

	var collected = [Int]()

	for await value in group {
		collected.append(value)
	}

	return collected
}</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/gyAFz.jpg" class="kg-image" alt="" loading="lazy" width="400" height="400"><figcaption><span style="white-space: pre-wrap;">http://i.imgur.com/gyAFz.jpg</span></figcaption></figure><p>The <code>group</code> object inside closure conforms to <code>AsyncSequence</code>. It's just like a general sequence, but elements are generated asynchronously. To iterate over it you can use <code>.next()</code> method or <code>for await ... in sequence</code>.</p><p>It can be used to parallelize <code>for</code> loops, for example.</p><pre><code class="language-swift">let calculations = await withTaskGroup(of: Int.self) {[works] group -&gt; [Int] in
	for work in works {
		group.addTask { work() }
	}

	var collected = [Int]()

	for await value in group {
		collected.append(value)
	}

	return collected
}</code></pre><p>That's great, but how to perform unrelated tasks concurrently without TaskGroup?</p>
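<p>For reference, the parallel loop above can be made self-contained. This sketch assumes that <code>works</code> holds a few hypothetical <code>@Sendable</code> closures returning <code>Int</code>; the <code>runAll</code> wrapper is illustrative as well.</p><pre><code class="language-swift">// Hypothetical work items (assumption): any @Sendable closures returning Int will do.
let works: [@Sendable () -&gt; Int] = [{ 1 * 2 }, { 2 * 3 }, { 3 * 4 }]

// Illustrative wrapper: runs every closure as a child task and gathers the results.
func runAll(_ works: [@Sendable () -&gt; Int]) async -&gt; [Int] {
    await withTaskGroup(of: Int.self) { group -&gt; [Int] in
        for work in works {
            group.addTask { work() }
        }

        var collected = [Int]()
        for await value in group {
            collected.append(value)
        }
        // Completion order is not defined, so sort to get a stable result.
        return collected.sorted()
    }
}</code></pre><p>Sorting is only needed when the caller cares about order; the group itself yields values as the child tasks finish.</p>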
<!--kg-card-begin: html-->
<h2 id="async-let">Async let, async get, concurrent execution</h2>
<!--kg-card-end: html-->
<p>These features feel really powerful to me.</p><p>Imagine you need to load an article whose data is stored on different services or URLs:</p><ul><li>Article thumbnail</li><li>Article text</li><li>Related articles</li><li>Comments</li></ul><p>The most obvious way to load all the data is to write code like this:</p><pre><code class="language-swift">let thumbnail = await loadThumbnail(forPost: post)
let text = await loadArticleText(forPost: post)
let related = await loadRelatedArticles(forPost: post)
let comments = await loadComments(forPost: post)</code></pre><p>And this is mighty concurrent code that will load needed information the fastest way. Right? Not really.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/04/suspension-await-async.png" class="kg-image" alt="Execution of async await code visualisation" loading="lazy" width="1280" height="800" srcset="https://alexdremov.me/content/images/size/w600/2022/04/suspension-await-async.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/suspension-await-async.png 1000w, https://alexdremov.me/content/images/2022/04/suspension-await-async.png 1280w" sizes="(min-width: 720px) 720px"></figure><p>Code is still executed serially and assets are not loaded in parallel. Each step waits until data is loaded. You can spawn a task for every step, sure. But is it really a nice solution?</p>
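<p>One alternative to spawning tasks by hand is <code>async let</code>, which this section's title mentions. The following is only a sketch: the four loaders are hypothetical stand-ins that sleep and return placeholder strings, <code>fakeLoad</code> and <code>loadArticle</code> are made-up helpers, and <code>post</code> is assumed to be a plain <code>Int</code> identifier.</p><pre><code class="language-swift">// Shared stand-in (assumption): pretends to fetch one part of the article.
func fakeLoad(_ part: String, post: Int) async -&gt; String {
    try? await Task.sleep(nanoseconds: 100_000_000)
    return "\(part) of post \(post)"
}

// Hypothetical loaders with the same names as in the snippet above.
func loadThumbnail(forPost post: Int) async -&gt; String { await fakeLoad("thumbnail", post: post) }
func loadArticleText(forPost post: Int) async -&gt; String { await fakeLoad("text", post: post) }
func loadRelatedArticles(forPost post: Int) async -&gt; String { await fakeLoad("related", post: post) }
func loadComments(forPost post: Int) async -&gt; String { await fakeLoad("comments", post: post) }

func loadArticle(post: Int) async -&gt; [String] {
    // Each async let starts a child task right away, without waiting for the previous one.
    async let thumbnail = loadThumbnail(forPost: post)
    async let text = loadArticleText(forPost: post)
    async let related = loadRelatedArticles(forPost: post)
    async let comments = loadComments(forPost: post)

    // A single await collects all four results once they are ready.
    return await [thumbnail, text, related, comments]
}</code></pre><p>With these stand-ins the four loads overlap, so the whole call should take roughly as long as the slowest loader rather than the sum of all four.</p>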
 ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Top 7 Subtle Swift Features ]]></title>
                    <description><![CDATA[ Here, I collected Swift features that are less known and can be useful when you prepare for interviews or want to deepen your Swift knowledge. ]]></description>
                    <link>https://alexdremov.me/top-7-subtle-swift-features/</link>
                    <guid isPermaLink="false">6251609e1405fa05476e2bef</guid>
                    <category><![CDATA[ iOS &amp; Swift ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Sat, 09 Apr 2022 12:41:05 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/04/Artboard-1-1.png" medium="image"/>
                    <content:encoded><![CDATA[ <h2 id="1-keyword-indirect">1. Keyword <code>indirect</code></h2><p>It’s used with enums only. As you know, enums are <strong>value types</strong> and are stored on the stack. Therefore, the compiler needs to know how much memory each enum takes. As only one option is possible at any moment, the enum occupies the memory of the largest case plus some operational information.</p><pre><code class="language-swift">// Just a general enum, nothing fancy
enum Foo {
    case bizz(String)
    case fizz(Int)
}</code></pre><p>But what if we make the enum dependent on itself?</p><pre><code class="language-swift">// Infinite size??
enum Foo {
    case bizz(Foo)
    case fizz
}</code></pre><p>This definition generates a compiler error.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text">Recursive enum <code>Foo</code> is not marked <code>indirect</code></div></div><p>The error makes sense: the compiler can’t calculate <code>Foo</code> size as it tends to infinity. Here comes the <code>indirect</code> keyword.</p><pre><code class="language-swift">// Oh, fine
enum Foo {
    indirect case bizz(Foo)
    case fizz
}</code></pre><p><strong>Simple:</strong> it modifies the enum memory structure to solve the recursion problem. <strong>Detailed:</strong> <code>.bizz(Foo)</code> is no longer stored inline in memory. Actually, with the <code>indirect</code> modifier data is now stored behind a pointer (indirectly).</p><p>Problem solved! Also, we can modify the whole enum as indirect</p><pre><code class="language-swift">// Every case is indirect now
indirect enum Foo {
    case bizz(Foo?)
    case fizz(Foo?)
}</code></pre><hr><h2 id="2-attribute-autoclosure">2. Attribute <code>@autoclosure</code></h2><p>Swift’s <code>@autoclosure</code> attribute enables you to define an argument that automatically gets wrapped in a closure. It’s mostly used to defer the execution of an expression to when it’s actually needed.</p><pre><code class="language-swift">func calculate(_ expression: @autoclosure () -&gt; Int,
               zero: Bool) -&gt; Int {
    guard !zero else {
        return 0
    }

    return expression()
}</code></pre><p>Then, calculate can be called like this:</p><pre><code class="language-swift">calculate(1 + 2, zero: false) // 3

calculate([Int](repeating: 5, count: 10000000).reduce(0, +),
                zero: false) // 50000000

calculate([Int](repeating: 5, count: 1000).reduce(0, +),
                zero: true) // 0</code></pre><p>So, in this case, when <code>zero: true</code>, the call to <code>calculate</code> never evaluates the expression, which saves the work of computing it.</p><hr><h2 id="3-lazy">3. Lazy</h2><p>A <code>lazy</code> stored property is a property whose initial value isn’t calculated until the first time it’s used. Lazy properties must always be declared with <code>var</code>. Note that if you use <code>lazy</code> in a <code>struct</code>, then any method that accesses the property must be marked as <code>mutating</code>.</p><pre><code class="language-swift">class Foo {
    lazy var bonk = DBConnection()
    
    func send() {
        bonk.sendMessage()
    }
}</code></pre><p>We already covered <code>@autoclosure</code> which also can help to defer expression evaluation. That can be used with <code>lazy</code>! Consider this common case of dependency injection.</p><pre><code class="language-swift">class Foo {
    let bonkProvider: () -&gt; DBConnection
    lazy var bonk: DBConnection = bonkProvider()
    
    init(_ expression: @escaping @autoclosure () -&gt; DBConnection) {
        self.bonkProvider = expression
    }
    
    func send() {
    	// Here bonkProvider() is called
        // only for the first call of send()
        bonk.sendMessage()
    }
}</code></pre><hr><h2 id="4-enums-as-namespaces">4. Enums as namespaces</h2><p>Swift does not have namespaces, which may be a problem in big projects. This is easily solved with enums.</p><pre><code class="language-swift">enum API {}

extension API {
    static let token = "…"

    struct CatsCounter {
        …
    }
}

let a = API.CatsCounter()
print(API.token)</code></pre><hr><h2 id="5-dynamic-member-lookup">5. Dynamic member lookup</h2><p>This section describes the <code>@dynamicMemberLookup</code> attribute. It can be applied to classes, structs, enums, and protocols.</p><p>Just adding <code>@dynamicMemberLookup</code> to the definition generates an error:</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text"><code>@dynamicMemberLookup</code> attribute requires <code>Foo</code> to have a <code>subscript(dynamicMember:)</code> method that accepts either <code>ExpressibleByStringLiteral</code> or a <code>key path</code></div></div><p>Therefore, such a subscript needs to be defined:</p><pre><code class="language-swift">@dynamicMemberLookup
class Foo {
    subscript(dynamicMember string: String) -&gt; String {
        return string
    }
}

let a = Foo()
print(a.helloWorld) // helloWorld</code></pre><p>Inside the <code>subscript</code> you can implement much more complex logic to retrieve data. But this implementation is limited to string lookups and is not really safe: any member name compiles, including misspelled ones. A <code>key path</code> subscript fixes that.</p><pre><code class="language-swift">class Bob {
    let age = 22
    let name = "Bob"
}

@dynamicMemberLookup
class Foo {
    let himself = Bob()
    
    subscript&lt;T&gt;(dynamicMember keyPath: KeyPath&lt;Bob, T&gt;) -&gt; T {
        return himself[keyPath: keyPath]
    }
}

let a = Foo()
print(a.age) // 22</code></pre><p>Knowing about this feature does not mean that it should be used everywhere. It’s up to you what is more readable and expressive: <code>a.himself.age</code> or <code>a.age</code>.</p><hr><h2 id="6-dynamically-callable">6. Dynamically callable</h2><p><code>@dynamicCallable</code> is another compiler feature; it lets you call an instance as if it were a function. It can be applied to <code>struct</code>, <code>enum</code>, and <code>class</code>.</p><p>After adding only the attribute, the compiler generates an error:</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">😡</div><div class="kg-callout-text"><code>@dynamicCallable</code> attribute requires <code>RangeGenerator</code> to have either a valid <code>dynamicallyCall(withArguments:)</code> method or <code>dynamicallyCall(withKeywordArguments:)</code> method</div></div><p>As with <code>@dynamicMemberLookup</code>, the fix is to implement a method with the required name.</p><pre><code class="language-swift">@dynamicCallable
struct RangeGenerator {
    var range: Range&lt;Int&gt;
    
    func dynamicallyCall(withKeywordArguments args: KeyValuePairs&lt;String, Int&gt;) -&gt; [Int] {
        if args.count &gt; 1 || args.first?.key != "count" {
            fatalError("Unknown arguments \(args)")
        }
        let count = args.first!.value
        return (0..&lt;count).map{ _ in Int.random(in: range) }
    }
}

let gen = RangeGenerator(range: 0..&lt;100)
print(gen(count: 13))
// Example output: [2, 89, 4, 17, 65, 26, 73, 86, 93, 13, 25, 96, 96]</code></pre><hr><h2 id="7-inlining">7. Inlining</h2><p>Sometimes you want to give the compiler additional information about the optimizations it can use. Inlining is one of the most important of them. So, how do you use <code>@inlinable</code>, <code>@inline(__always)</code>, and <code>@usableFromInline</code>?</p><p>The <code>@inlinable</code> attribute exports the body of a function as part of a module's interface, making it available to the optimizer when referenced from other modules.</p><p>As a result, <code>@inlinable</code> makes the implementation of the method part of the module's public interface, so it can be inlined into callers in other modules. It also requires every non-public declaration that the body references to be marked <code>@usableFromInline</code>, and the function itself must be <code>public</code> (or <code>@usableFromInline</code>).</p><p><code>@inline(__always)</code> tells the compiler to ignore inlining heuristics and (almost) always inline the function.</p><p>A function that is <code>@inline(__always)</code>, but not <code>@inlinable</code>, will not be available for inlining outside its module, because the function's code is not available.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💥</div><div class="kg-callout-text"><code>@inline(__always)</code> can be beneficial for performance, but it can also have catastrophic effects on overall performance due to code size increase.</div></div><pre><code class="language-swift">public struct Foo {
    @inlinable
    @inline(__always)
    public func simpleComputation(_ a: Int, _ b: Int) -&gt; Int {
        duplicate(a) + duplicate(b)
    }
    
    @usableFromInline
    func duplicate(_ c: Int) -&gt; Int {
        c * 2
    }
    
    func general() {
        print("Hello world")
    }
}</code></pre><p>These attributes affect the implementation in more ways than covered here. Check the discussion in this <a href="https://forums.swift.org/t/when-should-both-inlinable-and-inline-always-be-used/37375?ref=alexdremov.me">forum</a> thread if you want to understand them in depth.</p><h2 id="references">References</h2><ol><li><a href="https://www.swiftbysundell.com/articles/using-autoclosure-when-designing-swift-apis/?ref=alexdremov.me">https://www.swiftbysundell.com/articles/using-autoclosure-when-designing-swift-apis/</a></li><li><a href="https://www.swiftbysundell.com/articles/powerful-ways-to-use-swift-enums/?ref=alexdremov.me">https://www.swiftbysundell.com/articles/powerful-ways-to-use-swift-enums/</a></li><li><a href="https://www.hackingwithswift.com/articles/134/how-to-use-dynamiccallable-in-swift?ref=alexdremov.me">https://www.hackingwithswift.com/articles/134/how-to-use-dynamiccallable-in-swift</a></li><li><a href="https://www.hackingwithswift.com/example-code/language/what-are-lazy-variables?ref=alexdremov.me">https://www.hackingwithswift.com/example-code/language/what-are-lazy-variables</a></li><li><a href="https://forums.swift.org/t/who-benefits-from-the-indirect-keyword/20167?ref=alexdremov.me">https://forums.swift.org/t/who-benefits-from-the-indirect-keyword/20167</a></li><li><a href="https://www.tothenew.com/blog/recursive-enumerations-in-swift/?ref=alexdremov.me">https://www.tothenew.com/blog/recursive-enumerations-in-swift/</a></li><li><a href="https://www.avanderlee.com/swift/dynamic-member-lookup/?ref=alexdremov.me">https://www.avanderlee.com/swift/dynamic-member-lookup/</a></li></ol> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Note-taking apps ]]></title>
                    <description><![CDATA[ Here I cover note-taking apps for productivity and creating your own knowledge database ]]></description>
                    <link>https://alexdremov.me/note-taking-apps/</link>
                    <guid isPermaLink="false">624eadbd2ced893c89076935</guid>
                    <category><![CDATA[ Tools ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Wed, 18 Aug 2021 18:51:00 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/wordpress/2021/08/E4c0Js-VkAEPgt_-e1639220463622.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>You know that feeling at the start of a college or school year? You say to yourself “I’ll be as productive as possible” and you feel like you can climb a mountain.</p><p>At least, this was my case. The first thing that I wanted to settle on was a note-taking app. I wanted to have a written outline of every lecture organized in the best possible manner. So, I started to search for apps for my college workflow that are beautiful, powerful, and suited to my developer-oriented mind.</p><h2 id="evernote-apple-notes-onenote-joplin-notable">Evernote, Apple Notes, OneNote, Joplin, Notable</h2><p>… and many other conventional note-taking apps. The biggest no-no for me was the inability to organize content efficiently. The best option was to work with folders and tags, but it gets messy really quickly. Who uses tags? The reason is that these apps were designed for quick note-taking rather than for building some kind of knowledge database. Also, they don’t meet my developer needs for code embedding and markdown. Speaking about OneNote, it’s just ugly and over-complicated.</p><h3 id="links">Links</h3><ol><li><a href="https://evernote.com/?ref=alexdremov.me" rel="noreferrer noopener">Evernote</a></li><li><a href="https://www.microsoft.com/en-us/microsoft-365/onenote/digital-note-taking-app?ref=alexdremov.me" rel="noreferrer noopener">OneNote</a></li><li><a href="https://github.com/laurent22/joplin?ref=alexdremov.me" rel="noreferrer noopener">Joplin</a></li><li><a href="https://github.com/notable/notable?ref=alexdremov.me" rel="noreferrer noopener">Notable</a></li></ol><h2 id="notion-boost-note">Notion, Boost Note</h2><p>These are good! Even though they have hierarchical structuring, it’s supplemented with emoji icons and title pages. These additions help to navigate through data quicker. Notion’s workspaces and page linking help to structure data efficiently. So, what’s wrong? Online service only. 
These are web-based apps, and having a lagging app during a fast-paced lecture is not what I am looking for.</p><h3 id="links-1">Links</h3><ol><li><a href="https://www.notion.so/?ref=alexdremov.me" rel="noreferrer noopener">Notion</a></li><li><a href="https://boostnote.io/?ref=alexdremov.me" rel="noreferrer noopener">Boost Note</a></li></ol><h2 id="ia-writer">IA Writer</h2><p>My all-time favorite app for writing. A minimalistic tool with markdown support. Simply put, it is the best for writing. However, it is not really suitable for structuring data and is poor at linking, image embedding, and code highlighting.</p><h3 id="links-2">Links</h3><ol><li><a href="https://ia.net/writer?ref=alexdremov.me" rel="noreferrer noopener">IA Writer</a></li></ol><h2 id="logseq">Logseq</h2><p>Weird at first glance, genius if you dive deeper. The graph-based organization system is bewildering at first. “What do you mean there are no folders?” But then you realize that folders or hierarchical structuring is logical but not natural. When you write some content, new concepts don’t arrive in hierarchical order but rather as connections or links. In Logseq, pages are created as they are needed. The whole workspace is graph-organized. Moreover, it supports lots of block types, which more than satisfies my developer needs. 
Thus, I started considering this app as my primary one.</p><p>UPD: after almost a year of using Logseq, my knowledge base looks like this:</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Screenshot-2022-10-22-at-15.12.16.png" class="kg-image" alt loading="lazy" width="2000" height="1540" srcset="https://alexdremov.me/content/images/size/w600/2022/10/Screenshot-2022-10-22-at-15.12.16.png 600w, https://alexdremov.me/content/images/size/w1000/2022/10/Screenshot-2022-10-22-at-15.12.16.png 1000w, https://alexdremov.me/content/images/size/w1600/2022/10/Screenshot-2022-10-22-at-15.12.16.png 1600w, https://alexdremov.me/content/images/2022/10/Screenshot-2022-10-22-at-15.12.16.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>And it’s cool, but I have to say that this graph view is of little use, unfortunately. Or I just have not used the app extensively enough.</p><h3 id="links-3">Links</h3><ol><li><a href="https://github.com/logseq/logseq?ref=alexdremov.me" rel="noreferrer noopener">Logseq</a></li></ol><h2 id="athens">Athens</h2><p>Looks like Logseq and has very similar functionality. However, I found Athens more pleasant-looking and less complicated. Here it is: a minimalistic app with a beautiful design and a striking structuring system. This is my number one among all the note-taking apps that I reviewed over a couple of days.</p><p>However, the project is brand new and has some bugs, so maybe I will be using Logseq for reliability.</p><h3 id="links-4">Links</h3><ol><li><a href="https://github.com/athensresearch/athens?ref=alexdremov.me" rel="noreferrer noopener">Athens</a></li></ol> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ The Mystery of Mach-O Object Structure ]]></title>
                    <description><![CDATA[ I’m going to tell you about the internals of the Mach-O file and give an introduction to the simple relocatable object file structure ]]></description>
                    <link>https://alexdremov.me/mystery-of-mach-o-object-file-builders/</link>
                    <guid isPermaLink="false">624eadbd2ced893c89076932</guid>
                    <category><![CDATA[ Code ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Thu, 29 Apr 2021 21:59:09 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/wordpress/2021/04/macho.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>For the final project of the “assembly language and low-level architecture” MIPT freshman course, we were developing a compilable programming language. I wanted to compile it to a standard object file but ran into a mystery: there was almost no information about the file’s structure. More importantly, there were few to no examples on this topic. In this article, I’m going to tell you about the internals of the Mach-O file and give an introduction to the simple relocatable object file structure.</p><h2 id="general-structure">General Structure</h2><p>A Mach-O file can be divided into three main parts:</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/6XLCD.gif" class="kg-image" alt="image of the structure" loading="lazy" width="261" height="285"></figure><ul><li>Header</li><li>Load commands</li><li>Data</li></ul><p>The <strong>header</strong> contains general information and identifies the file as a Mach-O file. The header also contains other basic file type information, indicates the target architecture, and contains flags specifying options that affect the interpretation of the rest of the file.</p><p>Directly after the header is a series of variable-size <strong>load commands</strong> that specify the layout and linkage characteristics of the file. This is the core that defines the file characteristics.</p><p>Following the load commands, all Mach-O files contain <strong>segment data</strong>. Each segment has zero or more sections. Each segment defines a region of virtual memory that the dynamic linker maps into the address space of the process. Apart from segment data, other data can also be placed here, for example, the symbol table and relocations.</p><h2 id="object-specific-structure">Object-specific structure</h2><p>As this article focuses on object files, I will not go into details about general executable files. 
Even though their format is the same, load commands and data differ.</p><p>To make a workable object file, we need to define these elements. I list them in the order they will be placed in the file.</p><ul><li>Header</li><li>Load commands<ul><li>Segment (__TEXT)<ul><li>Text section (__text)</li><li>Data section (__data)</li></ul></li><li>Symbols table (SYMTAB)</li><li>Dynamic symbols table (DYSYMTAB)</li></ul></li><li>Data<ul><li>Text section data</li><li>Data section data</li><li>Relocations</li><li>Symbol table data</li><li>String table</li></ul></li></ul><h2 id="header">Header</h2><p>The header is defined by this structure:</p><pre><code class="language-cpp">struct mach_header_64 {
    uint32_t       magic;      /* mach magic number identifier */
    cpu_type_t     cputype;    /* cpu specifier */
    cpu_subtype_t  cpusubtype; /* machine specifier */
    uint32_t       filetype;   /* type of file */
    uint32_t       ncmds;      /* number of load commands */
    uint32_t       sizeofcmds; /* the size of all the load commands */
    uint32_t       flags;      /* flags */
    uint32_t       reserved;   /* reserved */
};</code></pre><ol><li><strong>magic</strong> – it’s exactly what the name says. It simply contains the magic number that helps to identify the file as Mach-O. It holds the <code>MH_MAGIC_64</code> (0xfeedfacf) constant.</li><li><strong>cputype,</strong> <strong>cpusubtype</strong> – define CPU information. For an x86-64 target, <code>CPU_TYPE_X86_64</code> and <code>CPU_SUBTYPE_X86_64_ALL</code> can be used.</li><li><strong>filetype</strong> – as a Mach-O file can be used for multiple purposes, the file type has to be stated. As we build an object file, <code>MH_OBJECT</code> must be used.</li><li><strong>ncmds</strong> – number of load commands following the header.</li><li><strong>sizeofcmds</strong> – the total size of the load commands (in bytes).</li><li><strong>flags</strong> – special flags, can be found <a href="https://github.com/aidansteele/osx-abi-macho-file-format-reference?ref=alexdremov.me#mach_header" rel="noreferrer noopener">here</a>. For the object file, we will be using <code>MH_SUBSECTIONS_VIA_SYMBOLS</code>, which means that the sections of the object file can be divided into individual blocks. These blocks are dead-stripped if they are not used by other code. The available flags are:<ul><li><code>MH_NOUNDEFS</code> – The object file contained no undefined references when it was built.</li><li><code>MH_INCRLINK</code> – The object file is the output of an incremental link against a base file and cannot be linked again.</li><li><code>MH_DYLDLINK</code> – The file is input for the dynamic linker and cannot be statically linked again.</li><li><code>MH_TWOLEVEL</code> – The image is using two-level namespace bindings.</li><li><code>MH_BINDATLOAD</code> – The dynamic linker should bind the undefined references when the file is loaded.</li><li><code>MH_PREBOUND</code> – The file’s undefined references are prebound.</li><li><code>MH_PREBINDABLE</code> – This file is not prebound but can have its prebinding redone. 
Used only when <code>MH_PREBOUND</code> is not set.</li><li><code>MH_NOFIXPREBINDING</code> – The dynamic linker doesn’t notify the prebinding agent about this executable.</li><li><code>MH_ALLMODSBOUND</code> – Indicates that this binary binds to all two-level namespace modules of its dependent libraries. Used only when <code>MH_PREBINDABLE</code> and <code>MH_TWOLEVEL</code> are set.</li><li><code>MH_CANONICAL</code> – This file has been canonicalized by unprebinding, that is, by clearing prebinding information from the file. See the <code>redo_prebinding</code> man page for details.</li><li><code>MH_SPLIT_SEGS</code> – The file has its read-only and read-write segments split.</li><li><code>MH_FORCE_FLAT</code> – The executable is forcing all images to use flat namespace bindings.</li><li><code>MH_SUBSECTIONS_VIA_SYMBOLS</code> – The sections of the object file can be divided into individual blocks. These blocks are dead-stripped if they are not used by other code. See “Linking” for details.</li><li><code>MH_NOMULTIDEFS</code> – This umbrella guarantees there are no multiple definitions of symbols in its subimages. As a result, the two-level namespace hints can always be used.</li></ul></li><li><strong>reserved</strong> – reserved bytes, not used.</li></ol><p>Summing up, here is the code for initializing the header of an object file.</p><p><strong>“To be modified”</strong> means that the value cannot be determined before the rest of the file is constructed. Therefore, it will be changed afterwards.</p><pre><code class="language-c">mach_header_64 header = {};
header.magic          = MH_MAGIC_64;
header.cputype        = CPU_TYPE_X86_64;
header.cpusubtype     = CPU_SUBTYPE_X86_64_ALL;
header.filetype       = MH_OBJECT;
header.ncmds          = 0; /* to be modified */
header.sizeofcmds     = 0; /* to be modified */
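/*
 * Sketch of the later fix-up, assuming the file ends up with exactly the
 * three load commands this article uses (one segment carrying two
 * sections, LC_SYMTAB, and LC_DYSYMTAB):
 *   header.ncmds      = 3;
 *   header.sizeofcmds = sizeof(segment_command_64) + 2 * sizeof(section_64)
 *                     + sizeof(symtab_command) + sizeof(dysymtab_command);
 */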
header.flags          = MH_SUBSECTIONS_VIA_SYMBOLS;</code></pre><h2 id="load-commands">Load commands</h2><p>The load command structures are located directly after the header of the object file, and they specify both the logical structure of the file and the layout of the file in virtual memory.</p><p>For an object file, several load commands are needed: segment, symtab, and dysymtab. Every load command starts with the same two fields, <code>uint32_t cmd</code> and <code>uint32_t cmdsize</code>, but the content that follows differs.</p><h3 id="segment_command_64">segment_command_64</h3><p>Specifies the range of bytes in a 64-bit Mach-O file that make up a segment. Those bytes are mapped by the loader into the address space of a program. The segment structure is:</p><pre><code class="language-c">struct segment_command_64 {  /* for 64-bit architectures */
   uint32_t   cmd;           /* LC_SEGMENT_64 */
   uint32_t   cmdsize;       /* includes sizeof section_64 structs */
   char       segname[16];   /* segment name */
   uint64_t   vmaddr;        /* memory address of this segment */
   uint64_t   vmsize;        /* memory size of this segment */
   uint64_t   fileoff;       /* file offset of this segment */
   uint64_t   filesize;      /* amount to map from the file */
   vm_prot_t  maxprot;       /* maximum VM protection */
   vm_prot_t  initprot;      /* initial VM protection */
   uint32_t   nsects;        /* number of sections in segment */
   uint32_t   flags;         /* flags */
};</code></pre><ol><li><strong>segname</strong> – the name of the segment. There are no requirements, but it is common to start the name with a double underscore (__) and use uppercase. For example, <code>SEG_TEXT</code> (“__TEXT”), <code>SEG_DATA</code> (“__DATA”).</li><li><strong>vmaddr</strong> – the start of this segment in virtual memory.</li><li><strong>vmsize</strong> – the size of this segment in memory. For executables, this value must be a multiple of the page size. In object files, this is not needed, as this requirement is fulfilled at the linking stage.</li><li><strong>fileoff</strong> – offset of this segment in the file. This offset points to an area after the load commands. The image below illustrates this.</li><li><strong>filesize</strong> – the number of bytes, starting at <strong>fileoff</strong>, to be mapped from the file.</li><li><strong>maxprot</strong> – maximum virtual memory protection. For the TEXT segment, usually <code>VM_PROT_READ | VM_PROT_EXECUTE | VM_PROT_WRITE</code>.</li><li><strong>initprot</strong> – initial virtual memory protection.</li><li><strong>nsects</strong> – number of section structures that directly follow this segment command.</li><li><strong>flags</strong> – can be found <a href="https://github.com/aidansteele/osx-abi-macho-file-format-reference?ref=alexdremov.me#mach_header" rel="noreferrer noopener">here</a>. For the object file, no flags are needed.</li></ol><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/vmmap.png" class="kg-image" alt loading="lazy" width="1200" height="343" srcset="https://alexdremov.me/content/images/size/w600/2022/10/vmmap.png 600w, https://alexdremov.me/content/images/size/w1000/2022/10/vmmap.png 1000w, https://alexdremov.me/content/images/2022/10/vmmap.png 1200w" sizes="(min-width: 720px) 720px"></figure><h3 id="section_64">section_64</h3><p>The segment load command is directly followed by the sections defined in it.</p><pre><code class="language-c">struct section_64 {          /* for 64-bit architectures */
   char       sectname[16];  /* name of this section */
   char       segname[16];   /* segment this section goes in */
   uint64_t   addr;          /* memory address of this section */
   uint64_t   size;          /* size in bytes of this section */
   uint32_t   offset;        /* file offset of this section */
   uint32_t   align;         /* section alignment (power of 2) */
   uint32_t   reloff;        /* file offset of relocation entries */
   uint32_t   nreloc;        /* number of relocation entries */
   uint32_t   flags;         /* flags (section type and attributes)*/
   uint32_t   reserved1;     /* reserved (for offset or index) */
   uint32_t   reserved2;     /* reserved (for count or sizeof) */
   uint32_t   reserved3;     /* reserved */
};</code></pre><ol><li><strong>sectname</strong> – the name of the section. There are no requirements, but it is common to start the name with a double underscore (__) and use lowercase. For example, <code>SECT_TEXT</code> (“__text”), <code>SECT_DATA</code> (“__data”).</li><li><strong>segname</strong> – the name of the segment this section goes in.</li><li><strong>addr</strong> – memory address of this section. For example, if the segment’s <code>vmaddr</code> is 0x10000, then the first section’s address is also 0x10000.</li><li><strong>size</strong> – the size in bytes of this section in the file.</li><li><strong>offset</strong> – the offset of the section’s data from the start of the file.</li><li><strong>align</strong> – alignment of the section in memory, as a power of 2. For example, 1 means 2-byte alignment, 2 means 4-byte alignment.</li><li><strong>reloff</strong> – the offset of the relocations array from the beginning of the file.</li><li><strong>nreloc</strong> – number of relocations.</li><li><strong>flags</strong> – specify information about the data contained in the section. For example, for code: <code>S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS</code>. For the data section: <code>S_REGULAR</code>.</li><li><strong>reserved1, reserved2, reserved3</strong> – unused in our case.</li></ol><p>The segment load command and its sections are the most important part of the file. An object file has only one segment and one or several sections.</p><p>Now, we can define a segment and the sections associated with it.</p><p>__TEXT segment – the only segment in the object file</p><pre><code class="language-c">segment_command_64 segment = {};
/*
 * As there is only one segment in the object file,
 * its name is usually left empty, so this call is omitted:
 * strcpy(segment.segname, SEG_TEXT);
 */
segment.cmd                = LC_SEGMENT_64;
segment.cmdsize            = sizeof(segment) + 2 * sizeof(section_64);
segment.vmaddr             = 0;
segment.vmsize             = 0; /* to be modified */
segment.fileoff            = 0; /* to be modified */
segment.filesize           = 0; /* to be modified */
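/*
 * Sketch of the later fix-up, assuming the section data is written right
 * after the load commands (text first, data second) with no padding:
 *   segment.fileoff  = sizeof(mach_header_64) + header.sizeofcmds;
 *   segment.filesize = sectionText.size + sectionData.size;
 *   segment.vmsize   = segment.filesize;
 */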
segment.maxprot            = VM_PROT_READ | VM_PROT_EXECUTE;
segment.initprot           = VM_PROT_READ | VM_PROT_EXECUTE;
segment.nsects             = 2; /* code and data sections */</code></pre><p>__text section</p><pre><code class="language-c">section_64 sectionText     = {};
strcpy(sectionText.segname,  SEG_TEXT ); /* segname  &lt;- __TEXT */
strcpy(sectionText.sectname, SECT_TEXT); /* sectname &lt;- __text */
sectionText.addr           = 0;
sectionText.size           = 0;          /* to be modified */
sectionText.offset         = 0;          /* to be modified */
sectionText.align          = 4;          /* 2^4 code alignment */
sectionText.reloff         = 0;          /* to be modified */
sectionText.nreloc         = 0;          /* to be modified */
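/*
 * Sketch of the later fix-up, assuming the file layout listed earlier
 * (text data, data data, relocations, symbol table, string table):
 *   sectionText.offset = segment.fileoff;
 *   sectionText.reloff = sectionData.offset + sectionData.size;
 */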
sectionText.flags          = S_REGULAR |
                             S_ATTR_PURE_INSTRUCTIONS |
                             S_ATTR_SOME_INSTRUCTIONS;</code></pre><p>__data section</p><pre><code class="language-c">section_64 sectionData     = {};
strcpy(sectionData.segname,  SEG_DATA ); /* segname  &lt;- __DATA */
strcpy(sectionData.sectname, SECT_DATA); /* sectname &lt;- __data */
sectionData.addr           = 0;          /* = sectionText.size */
sectionData.size           = 0;          /* to be modified */
sectionData.offset         = 0;          /* = sectionText.offset */
                                         /*   + sectionText.size */
sectionData.align          = 1;          /* 2^1 data alignment */
sectionData.reloff         = 0;          /* no relocations in data section */
sectionData.nreloc         = 0;          
sectionData.flags          = S_REGULAR;</code></pre><p>At this point, the simple object file structure is almost ready, but the SYMTAB and DYSYMTAB load commands still need to be defined even if there are no relocations at all.</p><h2 id="symtab">Symtab</h2><p>Describes the size and location of the symbol table data structures. Its structure is:</p><pre><code class="language-c">struct symtab_command {
   uint32_t   cmd;       /* LC_SYMTAB */
   uint32_t   cmdsize;   /* sizeof(struct symtab_command) */
   uint32_t   symoff;    /* symbol table offset */
   uint32_t   nsyms;     /* number of symbol table entries */
   uint32_t   stroff;    /* string table offset */
   uint32_t   strsize;   /* string table size in bytes */
};</code></pre><ol><li><strong>symoff</strong> – offset of the symbol table, which is located further in the file, after the load commands.</li><li><strong>nsyms</strong> – number of symbols in the symbol table.</li><li><strong>stroff</strong> – string table offset.</li><li><strong>strsize</strong> – the size of the string table in bytes.</li></ol><p>The most straightforward description so far. It is convenient to describe the symbol table and the string table here.</p><h3 id="string-table">String table</h3><p>The string table is the most straightforward structure of all listed here. It is simply strings separated by zero bytes.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Screenshot-2021-04-30-at-22.38.27.png" class="kg-image" alt loading="lazy" width="992" height="288" srcset="https://alexdremov.me/content/images/size/w600/2022/10/Screenshot-2021-04-30-at-22.38.27.png 600w, https://alexdremov.me/content/images/2022/10/Screenshot-2021-04-30-at-22.38.27.png 992w" sizes="(min-width: 720px) 720px"></figure><h3 id="symbol-table">Symbol table</h3><p>The symbol table consists of equally sized entries. They must be grouped by their type – local symbols (further grouped by the module they are from), defined external symbols (further grouped by the module they are from), and undefined symbols. The order of groups is not important.</p><pre><code class="language-c">struct nlist_64 {
    union {
        uint32_t  n_strx;  /* index into the string table */
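        /*
         * Worked example under an assumed layout: if the string table is
         * "\0_print\0_giveYouUp0\0" (one leading zero byte), then "_print"
         * has n_strx = 1 and "_giveYouUp0" has n_strx = 8, because
         * 1 + strlen("_print") + 1 == 8.
         */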
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see &lt;mach-o/stab.h&gt; */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};</code></pre><ol><li><strong>n_strx</strong> – index of the symbol name in the string table. For example, the index of “_print” in the string table above is 1. The index of _giveYouUp0 is 8: it is the position of the name's first character, counted from the start of the string table.</li><li><strong>n_type</strong> – the type of the symbol, which defines its meaning. It is a bit field; the essential masks and values are:<ul><li><code>N_TYPE</code> (0x0e) – mask for the bits that define the type of the symbol. The masked value is one of <code>N_UNDF</code>, <code>N_ABS</code>, <code>N_SECT</code>, <code>N_PBUD</code>, or <code>N_INDR</code>.</li><li><code>N_UNDF</code> (0x0) – The symbol is undefined. Undefined symbols are symbols referenced in this module but defined in a different module. The <code>n_sect</code> field is set to <code>NO_SECT</code>.</li><li><code>N_ABS</code> (0x2) – The symbol is absolute. The linker does not change the value of an absolute symbol. The <code>n_sect</code> field is set to <code>NO_SECT</code>.</li><li><code>N_SECT</code> (0xe) – The symbol is defined in the section number given in <code>n_sect</code>.</li><li><code>N_PBUD</code> (0xc) – The symbol is undefined and the image is using a prebound value for the symbol. The <code>n_sect</code> field is set to <code>NO_SECT</code>.</li><li><code>N_INDR</code> (0xa) – The symbol is defined to be the same as another symbol. The <code>n_value</code> field is an index into the string table specifying the name of the other symbol. When that symbol is linked, both this and the other symbol have the same defined type and value.</li><li><code>N_EXT</code> (0x01) – If this bit is on, this symbol is external: a symbol that is either <strong>defined outside this file</strong> or defined in this file but can be referenced by other files.</li><li><code>N_STAB</code> (0xe0) – If any of these 3 bits are set, the symbol is a symbolic debugging table (<code>stab</code>) entry. 
In that case, the entire <code>n_type</code> field is interpreted as a <code>stab</code> value.</li></ul></li><li><strong>n_sect</strong> – an integer specifying the number of the section that this symbol can be found in, or <code>NO_SECT</code> if the symbol is not to be found in any section.</li><li><strong>n_desc</strong> – provides additional information about the nature of this symbol for non-stab symbols (not <code>N_STAB</code>). The reference flags can be accessed using the <code>REFERENCE_TYPE</code> mask (0xF). Usually, <code>REFERENCE_FLAG_UNDEFINED_NON_LAZY</code> is used for external symbols. If the symbol is defined in a section (<code>N_SECT</code>), use <code>REFERENCE_FLAG_DEFINED</code> together with <code>N_EXT</code> to make it available to other files, or <code>REFERENCE_FLAG_PRIVATE_DEFINED</code> without <code>N_EXT</code> otherwise. The most used values are:<ul><li><code>REFERENCE_FLAG_UNDEFINED_NON_LAZY</code> (0x0) – This symbol is a reference to an external non-lazy (data) symbol.</li><li><code>REFERENCE_FLAG_UNDEFINED_LAZY</code> (0x1) – This symbol is a reference to an external lazy symbol, that is, to a function call.</li><li><code>REFERENCE_FLAG_DEFINED</code> (0x2) – This symbol is defined in this module.</li><li><code>REFERENCE_FLAG_PRIVATE_DEFINED</code> (0x3) – This symbol is defined in this module and is visible only to modules within this shared library.</li><li><code>REFERENCE_FLAG_PRIVATE_UNDEFINED_NON_LAZY</code> (0x4) – This symbol is defined in another module in this file, is a non-lazy (data) symbol, and is visible only to modules within this shared library.</li><li><code>REFERENCE_FLAG_PRIVATE_UNDEFINED_LAZY</code> (0x5) – This symbol is defined in another module in this file, is a lazy (function) symbol, and is visible only to modules within this shared library.</li></ul></li><li><strong>n_value</strong> – information about this symbol. 
The format of this value is different for each type of symbol table entry (as specified by the <code>n_type</code> field). For the <code>N_SECT</code> symbol type, <code>n_value</code> is the address of the symbol, given as an offset from the start of the <strong>segment</strong>. For <code>N_UNDF | N_EXT</code> it is not used.</li></ol><p>This structure is one of the hardest to understand and use, so here are some examples. Notice that the symbols are grouped by type; this grouping is used later by DYSYMTAB.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Screenshot-2021-05-01-at-01.58.43.png" class="kg-image" alt loading="lazy" width="1426" height="1372" srcset="https://alexdremov.me/content/images/size/w600/2022/10/Screenshot-2021-05-01-at-01.58.43.png 600w, https://alexdremov.me/content/images/size/w1000/2022/10/Screenshot-2021-05-01-at-01.58.43.png 1000w, https://alexdremov.me/content/images/2022/10/Screenshot-2021-05-01-at-01.58.43.png 1426w" sizes="(min-width: 720px) 720px"></figure><p>In the image above, there are four symbols in total. Two of them are defined locally, and two are undefined in the current file. Here are descriptions of two of these symbols:</p><ul><li>#0th symbol</li></ul><ol><li><strong>n_strx</strong> = 34 – index of the first character of the name in the string table.</li><li><strong>n_type</strong> = <code>N_SECT | N_EXT</code> – symbol defined in some section of the current file and available externally.</li><li><strong>n_sect</strong> = 1 – symbol defined in the first (counting from 1) section.</li><li><strong>n_desc</strong> = <code>REFERENCE_FLAG_DEFINED</code> – symbol defined in the file. 
This information is redundant as it is already known from <code>N_SECT</code>.</li><li><strong>n_value</strong> = 0 – the symbol is defined at the very beginning of the segment (zero offset).</li></ol><ul><li>#2nd symbol</li></ul><ol><li><strong>n_strx</strong> = 1 – index of the first character of the name in the string table.</li><li><strong>n_type</strong> = <code>N_UNDF | N_EXT</code> – symbol is not defined in the current file, must be defined externally.</li><li><strong>n_sect</strong> = <code>NO_SECT</code> – no associated section.</li><li><strong>n_desc</strong> = <code>REFERENCE_FLAG_UNDEFINED_NON_LAZY</code> – this symbol is a reference to an external non-lazy (data) symbol.</li><li><strong>n_value</strong> = 0 – unused.</li></ol><p>These two symbols can be constructed like this:</p><pre><code class="language-c">nlist_64 symbols[2] = {
    {34, N_SECT  | N_EXT, 1      , REFERENCE_FLAG_DEFINED           , 0},
    {1 , N_UNDF | N_EXT, NO_SECT, REFERENCE_FLAG_UNDEFINED_NON_LAZY, 0}
};</code></pre><h2 id="dysymtab">Dysymtab</h2><p>It describes the sizes and locations of the parts of the symbol table used for dynamic linking. As I already noted, symtab entries must be grouped by their type. This is where that requirement is put to use.</p><pre><code class="language-c">struct dysymtab_command {
    uint32_t cmd;            /* LC_DYSYMTAB */
    uint32_t cmdsize;        /* sizeof(struct dysymtab_command) */
    uint32_t ilocalsym;      /* index to local symbols */
    uint32_t nlocalsym;      /* number of local symbols */

    uint32_t iextdefsym;     /* index to externally defined symbols */
    uint32_t nextdefsym;     /* number of externally defined symbols */

    uint32_t iundefsym;      /* index to undefined symbols */
    uint32_t nundefsym;      /* number of undefined symbols */

    uint32_t tocoff;         /* file offset to table of contents */
    uint32_t ntoc;           /* number of entries in table of contents */

    uint32_t modtaboff;      /* file offset to module table */
    uint32_t nmodtab;        /* number of module table entries */


    uint32_t extrefsymoff;   /* offset to referenced symbol table */
    uint32_t nextrefsyms;    /* number of referenced symbol table entries */


    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms;  /* number of indirect symbol table entries */


    uint32_t extreloff;      /* offset to external relocation entries */
    uint32_t nextrel;        /* number of external relocation entries */

    uint32_t locreloff;      /* offset to local relocation entries */
    uint32_t nlocrel;        /* number of local relocation entries */

}; </code></pre><p>There are a lot of fields, but only a few of them are needed for object files.</p><ol><li><strong>ilocalsym + nlocalsym</strong> – local symbols, used only for debugging.</li><li><strong>iextdefsym + nextdefsym</strong> – externally defined symbols.</li><li><strong>iundefsym + nundefsym</strong> – undefined symbols.</li></ol><p>Fields with the i* prefix hold the index of the group's first entry in the symbol table, while n* fields hold the number of symbols in that group.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/board.png" class="kg-image" alt loading="lazy" width="1280" height="800" srcset="https://alexdremov.me/content/images/size/w600/2022/10/board.png 600w, https://alexdremov.me/content/images/size/w1000/2022/10/board.png 1000w, https://alexdremov.me/content/images/2022/10/board.png 1280w" sizes="(min-width: 720px) 720px"></figure><h2 id="relocations">Relocations</h2><p>All of these structures were needed just to be able to do relocations. But why do we even need them? Consider this assembly code:</p><pre><code class="language-c">call     ...   ; call function – external or internal
mov      rax, [rip + ...] ; load global variable</code></pre><p>In both cases, the address or offset is not known until the linking stage, because segments will be rearranged, combined, and placed back in some order. The linker substitutes the address or offset with the correct one. Relocation information specifies where an address must be changed, how it must be changed, and for which symbol.</p><p>A relocation entry is defined as:</p><pre><code class="language-c">struct relocation_info {
   int32_t  r_address;        /* offset in the section to */
                              /* what is being relocated */
   uint32_t r_symbolnum:24,   /* symbol index if r_extern == 1 or */
                              /* section ordinal if r_extern == 0 */
            r_pcrel:1,        /* was relocated pc relative already */
            r_length:2,       /* 0=byte, 1=word, 2=long, 3=quad */
            r_extern:1,       /* does not include value of sym referenced */
            r_type:4;         /* if not 0, machine specific relocation type */
};</code></pre><p>Remember that each section may have relocations, and that they are specified in the corresponding fields of the section structure? Here are the relocations themselves.</p><ol><li><strong>r_address</strong> – offset, from the start of the section, of the value that needs to be relocated.</li><li><strong>r_symbolnum</strong> – a symbol index in the symbol table if r_extern == 1, or a section ordinal (number) if r_extern == 0.</li><li><strong>r_pcrel</strong> – (1/0) Indicates whether the item containing the address to be relocated is part of a CPU instruction that uses PC-relative addressing. For addresses contained in PC-relative instructions, the CPU adds the address of the instruction to the address contained in the instruction.</li><li><strong>r_length</strong> – Indicates the length of the item containing the address to be relocated. A value of 0 indicates a single byte, 1 indicates a 2-byte address, 2 indicates a 4-byte address, and 3 indicates an 8-byte address.</li><li><strong>r_extern</strong> – (1/0) Indicates whether the r_symbolnum field is an index into the symbol table (1) or a section number (zero).</li><li><strong>r_type</strong> – Indicates the type of relocation to be performed. Possible values for this field are shared between this structure and the <code><a href="http://mirror.informatimago.com/next/developer.apple.com/documentation/DeveloperTools/Conceptual/MachORuntime/8rt_file_format/chapter_10_section_31.html?ref=alexdremov.me#//apple_ref/doc/uid/20001298/scattered_relocation_entry">scattered_relocation_info</a></code> data structure; see the description of the r_type field in the <code><a href="http://mirror.informatimago.com/next/developer.apple.com/documentation/DeveloperTools/Conceptual/MachORuntime/8rt_file_format/chapter_10_section_31.html?ref=alexdremov.me#//apple_ref/doc/uid/20001298/scattered_relocation_entry">scattered_relocation_info</a></code> data structure for more details. 
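The two most used values are:</li><li><code>GENERIC_RELOC_SECTDIFF</code> (2) – used here for relative call addresses. On x86-64 this value coincides with <code>X86_64_RELOC_BRANCH</code> from <code>&lt;mach-o/x86_64/reloc.h&gt;</code>, which is the architecture-specific name for it.</li><li><code>GENERIC_RELOC_PAIR</code> (1) – used here for RIP-relative offsets of global variables. On x86-64 this value coincides with <code>X86_64_RELOC_SIGNED</code>.</li></ol><p>Here’s an example of a common relocation:</p><pre><code class="language-c">relocation_info relocation = {};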
relocation.r_address = ...           /* some offset to the beginning */
                                     /* of relocatable address */
relocation.r_symbolnum = 0;          /* first symbol in symtab */
relocation.r_pcrel = 1;              /* let it be call instruction that */
                                     /* is PC-relative */
relocation.r_length = 2;             /* 4-bytes address */
relocation.r_extern = 1;             /* external symbol */
relocation.r_type   = GENERIC_RELOC_SECTDIFF;</code></pre><h2 id="cumulative-example">Cumulative example</h2><p>Here is code that constructs a complete Mach-O object file containing a call to an external function and a call to an internal function.</p><pre><code class="language-c">mach_header_64 header = {};
header.magic          = MH_MAGIC_64;
header.cputype        = CPU_TYPE_X86_64;
header.cpusubtype     = CPU_SUBTYPE_X86_64_ALL;
header.filetype       = MH_OBJECT;
header.ncmds          = 0; /* to be modified */
header.sizeofcmds     = 0; /* to be modified */
header.flags          = MH_SUBSECTIONS_VIA_SYMBOLS;

segment_command_64 segment = {};
segment.cmd                = LC_SEGMENT_64;
segment.cmdsize            = sizeof(segment) + sizeof(section_64);
segment.vmaddr             = 0;
segment.vmsize             = 0; /* to be modified */
segment.fileoff            = 0; /* to be modified */
segment.filesize           = 0; /* to be modified */
segment.maxprot            = VM_PROT_READ | VM_PROT_EXECUTE;
segment.initprot           = VM_PROT_READ | VM_PROT_EXECUTE;
segment.nsects             = 0; /* to be modified */

section_64 sectionText     = {};
strcpy(sectionText.segname,  SEG_TEXT ); /* segname  &lt;- __TEXT */
strcpy(sectionText.sectname, SECT_TEXT); /* sectname &lt;- __text */
sectionText.addr           = 0;
sectionText.size           = 0;          /* to be modified */
sectionText.offset         = 0;          /* to be modified */
sectionText.align          = 4;          /* 2^4 code alignment */
sectionText.reloff         = 0;          /* to be modified */
sectionText.nreloc         = 0;          /* to be modified */
sectionText.flags          = S_REGULAR |
                             S_ATTR_PURE_INSTRUCTIONS |
                             S_ATTR_SOME_INSTRUCTIONS;

const unsigned char code[] = {
        0xE8, 0x00, 0x00, 0x00, 0x00,      // call &lt;address&gt; - someFuncExternal
        0xE8, 0x00, 0x00, 0x00, 0x00,      // call &lt;address&gt; - someFunc
        0xB8, 0x01, 0x00, 0x00, 0x02,      // mov     eax, 0x2000001 ; exit (zero-extends into rax)
        0xBF, 0x00, 0x00, 0x00, 0x00,      // mov     edi, 0         ; zero-extends into rdi
        0x0F, 0x05,                        // syscall
        // someFunc:
        0x48, 0x31, 0xC0,                  // xor rax, rax
        0xC3                               // ret
};

symtab_command symtabCommand    = {};
symtabCommand.cmd               = LC_SYMTAB;
symtabCommand.cmdsize           = sizeof(symtab_command);
symtabCommand.symoff            = 0;       /* to be modified */
symtabCommand.nsyms             = 0;       /* to be modified */
symtabCommand.stroff            = 0;       /* to be modified */
symtabCommand.strsize           = 0;       /* to be modified */

const char stringTable[]        = "\0_someFunc0\0_someFuncExternal0\0";

nlist_64 symbols[2] = {
        {
            1,                      // first index in string table
            N_SECT | N_EXT,         // defined in the file, available externally
            1,                      // first section
            REFERENCE_FLAG_DEFINED, // defined in the file
            4 * 5 + 2               // offset of this symbol in the section
        },
        {
            12,                      // second string in string table
            N_UNDF  | N_EXT,         // undefined in the file,
                                     // must be defined externally
            NO_SECT,                 // no section specified
            REFERENCE_FLAG_UNDEFINED_NON_LAZY, // external non-lazy symbol
            0                        // unused
        }
};

dysymtab_command dysymtabCommand      = {};
dysymtabCommand.cmd                   = LC_DYSYMTAB;
dysymtabCommand.cmdsize               = sizeof(dysymtabCommand);
dysymtabCommand.ilocalsym             = 0; // no local symbols
dysymtabCommand.nlocalsym             = 0;
dysymtabCommand.iextdefsym            = 0; // first symbol in symbol table (N_SECT | N_EXT)
dysymtabCommand.nextdefsym            = 1; // only one externally defined symbol
dysymtabCommand.iundefsym             = 1; // second symbol in symbol table (N_UNDF | N_EXT)
dysymtabCommand.nundefsym             = 1; // only one undefined symbol

relocation_info relocations[] = {
        {
            1,      // after first byte address to someFuncExternal
            1,      // second symbol
            1,      // relative call, PC counted
            2,      // 4 bytes
            1,      // external
            GENERIC_RELOC_SECTDIFF
        },
        {
            6,      // second call address
            0,      // first symbol
            1,      // relative call, PC counted
            2,      // 4 bytes
            1,      // external
            GENERIC_RELOC_SECTDIFF
        },
};

size_t offsetCounter = 0;
FILE* binary = fopen("object.o", "wb");

// Write header;
header.ncmds = 3; // segment + symtab + dysymtab
header.sizeofcmds = sizeof(segment) + sizeof(sectionText) + sizeof(symtabCommand) + sizeof(dysymtabCommand);
fwrite(&amp;header, 1, sizeof(header), binary);
offsetCounter += sizeof(header);

// Write segment
segment.vmsize  = segment.filesize = sizeof(code);
segment.fileoff = header.sizeofcmds + sizeof(header); // we'll place code just after all load commands.
segment.nsects  = 1;
fwrite(&amp;segment, 1, sizeof(segment), binary);
offsetCounter += sizeof(segment);

// Write section
sectionText.size   = segment.filesize;
sectionText.offset = segment.fileoff;
sectionText.reloff = segment.fileoff + segment.filesize; // just after the code
sectionText.nreloc = sizeof(relocations) / sizeof(relocations[0]); // two calls
fwrite(&amp;sectionText, 1, sizeof(sectionText), binary);
offsetCounter += sizeof(sectionText);

// Write symtab
symtabCommand.symoff = sectionText.reloff +
                        sectionText.nreloc * sizeof(relocation_info); // just after relocations
symtabCommand.nsyms = 2; // two functions
symtabCommand.stroff = symtabCommand.symoff +
                        symtabCommand.nsyms * sizeof(nlist_64); // just after symbol table
symtabCommand.strsize = sizeof(stringTable);
fwrite(&amp;symtabCommand, 1, sizeof(symtabCommand), binary);
offsetCounter += sizeof(symtabCommand);

// Write dysymtab
fwrite(&amp;dysymtabCommand, 1, sizeof(dysymtabCommand), binary);
offsetCounter += sizeof(dysymtabCommand);

// Write code
fwrite(&amp;code, 1, sizeof(code), binary);

// Write relocations
fwrite(&amp;relocations, 1, sizeof(relocations), binary);

// Write symbol table
fwrite(&amp;symbols, 1, sizeof(symbols), binary);

// Write string table
fwrite(&amp;stringTable, 1, sizeof(stringTable), binary);

fclose(binary);</code></pre><h2 id="references">References</h2><ol><li><a href="http://mirror.informatimago.com/next/developer.apple.com/documentation/DeveloperTools/Conceptual/MachORuntime/8rt_file_format/chapter_10_section_30.html?ref=alexdremov.me" rel="noreferrer noopener">Developer collection – relocation_info</a></li><li><a href="https://github.com/aidansteele/osx-abi-macho-file-format-reference?ref=alexdremov.me" rel="noreferrer noopener">Mach-O format reference OSX-ABI</a></li><li><a href="https://github.com/zhongjianfeipqy/MachOView?ref=alexdremov.me">MachOViewer – check out your file structure</a></li></ol> ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ Skip List Indexation and kth Maximum ]]></title>
                    <description><![CDATA[ A skip list is a nice structure that lets you perform insertions and searches and find the n-th maximum. In this post I focus on skip list indexation ]]></description>
                    <link>https://alexdremov.me/skip-list-indexation-and-kth-maximum/</link>
                    <guid isPermaLink="false">624eadbd2ced893c89076931</guid>
                    <category><![CDATA[ Algorithms ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Thu, 05 Nov 2020 23:49:32 +0100</pubDate>
                    <media:content url="https://alexdremov.me/content/images/2022/04/--------------2020-11-06---01.51.30.png" medium="image"/>
                    <content:encoded><![CDATA[ <p>A skip list is a nice structure that lets you perform <code>O(logn)</code> insertions into a sorted list, <code>O(logn)</code> searches, and <code>O(logn)</code> lookups of the n-th (second, third, fourth, ...) maximum, or even calculate the rolling median. In this article, I focus on the indexation of a skip list (an indexable skip list).</p><p>The best guide I found was <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.524&rep=rep1&type=pdf&ref=alexdremov.me" rel="noreferrer noopener">“a skip list cookbook”</a>, last revised in 1990. It briefly touches on the problem of finding <strong>the kth element</strong>, but the provided algorithm is extremely vague and refers to quantities (e.g. <code>fDistance[i]</code>) without explaining how to compute or update them. Wikipedia also talks about indexing, but an algorithm for calculating skip distances is not provided.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexdremov.me/content/images/2022/04/--------------2020-11-05---22.44.47.png" class="kg-image" alt="Skip list algo from old book" loading="lazy" width="1052" height="802" srcset="https://alexdremov.me/content/images/size/w600/2022/04/--------------2020-11-05---22.44.47.png 600w, https://alexdremov.me/content/images/size/w1000/2022/04/--------------2020-11-05---22.44.47.png 1000w, https://alexdremov.me/content/images/2022/04/--------------2020-11-05---22.44.47.png 1052w" sizes="(min-width: 720px) 720px"><figcaption>Cookbook searchByPosition algorithm</figcaption></figure><p>Therefore, I decided to write this post and provide an algorithm for indexing skip lists. I’m going to give the code as well.</p><h2 id="about-skip-list">About skip list</h2><p>A skip list is a one-way linked list that has <strong>“express lanes”</strong> for reaching distant members. 
It is a probabilistic data structure: selecting the "height" of each node relies on random numbers. As a result, it provides <code>O(logn)</code> insert and search complexity on average.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/04/800px-Skip_list.svg_-768x180.png" class="kg-image" alt="Skip list" loading="lazy" width="768" height="180" srcset="https://alexdremov.me/content/images/size/w600/2022/04/800px-Skip_list.svg_-768x180.png 600w, https://alexdremov.me/content/images/2022/04/800px-Skip_list.svg_-768x180.png 768w" sizes="(min-width: 720px) 720px"></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Fast lanes change the complexity of search, insert, and indexation from <code>O(n)</code> to <code>O(logn)</code></div></div><p>Each node has a link to the right node on the same level and a link to the bottom node that has the same value, but one level lower. Nodes on the first layer don’t have a bottom link. Some nodes don’t have a right node. We consider the null right node as \(+\infty\) and the head as \(-\infty\).</p><p>To search for an element, we start at the top left corner. We move right while the right element is less than or equal to the required element, and move down when it is greater.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">If the required element is not present in the list, we end up at the position where it would be inserted.</div></div><p>Indexing a skip list allows us to calculate the rolling median of a set in <code>O(logn)</code> and also to find the n-th minimum or maximum in <code>O(logn)</code>!</p><h2 id="defining-the-node">Defining the node</h2><p>Each node is going to be:</p><pre><code class="language-cpp">template&lt;typename T&gt;
struct TreeNode {
    T             key;
    unsigned      level     = 1;
    bool          headNode  = false;
    bool          deleted   = false;
    size_t        skipDist  = 0;
    TreeNode&lt;T&gt;*  right     = nullptr;
    TreeNode&lt;T&gt;*  down      = nullptr;
};</code></pre><ul><li><code>key</code> – stored value</li><li><code>level</code> – the level of the node</li><li><code>headNode</code> – whether this node is a head node</li><li><code>deleted</code> – whether the node is marked as deleted</li><li><code>skipDist</code> – number of nodes skipped on the way to this node along its level</li><li><code>right</code> – right node</li><li><code>down</code> – down node</li></ul><p>We need the <code>deleted</code> mark because we can’t delete a node immediately: the list is one-way linked, so we have no way to reach the node to the left of the deleted one in order to relink it. On the other hand, such a feature is useful in multi-threaded projects.</p><p>In this figure you can see what <code>skipDist</code> means:</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/04/800px-Skip_list.svg_-------768x180.png" class="kg-image" alt="Skip list with skip distances" loading="lazy" width="768" height="180" srcset="https://alexdremov.me/content/images/size/w600/2022/04/800px-Skip_list.svg_-------768x180.png 600w, https://alexdremov.me/content/images/2022/04/800px-Skip_list.svg_-------768x180.png 768w" sizes="(min-width: 720px) 720px"></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">Basically, <code>skipDist</code> counts how many nodes will be skipped if you travel by the corresponding <em>fast lane</em></div></div><h2 id="defining-skip-list">Defining skip list</h2><pre><code class="language-cpp">template&lt;typename T&gt;
class SkipList {
    unsigned     maxLevels;
    TreeNode&lt;T&gt;* head;
};</code></pre><p>The structure is pretty much self-explanatory.</p><p>On initialisation, we create <code>maxLevels</code> head nodes, one per level:</p><pre><code class="language-cpp">this-&gt;head = new TreeNode&lt;T&gt;(0, this-&gt;maxLevels);
this-&gt;head-&gt;headNode = true;

TreeNode&lt;T&gt;* pos = this-&gt;head;
for(unsigned i = 1; i &lt; maxLevels; ++i) {
    TreeNode&lt;T&gt;* newNode = new TreeNode&lt;T&gt;(0, this-&gt;maxLevels - i);
    newNode-&gt;headNode = true;
    
    pos-&gt;down = newNode;
    pos = newNode;
}</code></pre><h2 id="insert">Insert</h2><p>The hardest part of the insert algorithm is updating skip distances and creating upper-level nodes when the random coin says so.</p><p>As I discussed previously, we do not delete elements but rather mark them as deleted. Therefore, before processing a node, we need to perform pending deletions. Let this be some function <code>processDeletions(node)</code>: it finally deletes the element to the right of the node if it was marked as deleted.</p><p>Also, to handle cases when the right node is null, I created a function that compares the key with a node's value.</p><pre><code class="language-cpp">// compares key with the node's key; a null node counts as +inf
int compareWithNode(T key, TreeNode&lt;T&gt;* node){
    if (node == nullptr)
        return -1;
    if (key == node-&gt;key)
        return 0;
    return key &lt; node-&gt;key ? -1 : 1;
}</code></pre><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text"><code>compareWithNode</code> returns <strong>0</strong> if values are equal, <strong>-1</strong> if the key is lower than the node value, and <strong>1</strong> if the key value is higher than the node value</div></div><p>As discussed before, if we encounter a null node, we consider it as \(+\infty\), so the key always compares as lower.</p><p>To perform all desired operations, the insert function accepts the current node, the desired key, a pointer to the inserted node (if any), and the current position. We need a pointer to the inserted node for two reasons: to know on the higher levels whether the node was inserted at all, and to have the link to the bottom node if we generate a “fast lane” node.</p><p>Also, let the function return a bool value: whether the node was inserted on the previous level. If it was inserted, then we can <strong>flip the coin again</strong> and insert a “fast lane” node on the current level.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">As I said, the algorithm relies on randomness. The decision whether a new fast lane node is created is based on a random coin flip.</div></div><pre><code class="language-cpp">bool insertRecursive(TreeNode&lt;T&gt;* node,
                     T key,
                     TreeNode&lt;T&gt;** insertedOne,
                     unsigned* pos) {
    this-&gt;processDeletions(node);
    
    int compareRight = compareWithNode(key, node-&gt;right);
    // save position at the current recursion level
    unsigned posHere = *pos; 
    
    if (compareRight == 0 ||
       (node-&gt;key == key &amp;&amp; node-&gt;headNode != true))
        return false;</code></pre><p>If the right node’s value is equal to the desired one, or the current node’s value is equal to the desired one and this node is not a head node, the function returns false, as no insertion is needed.</p><p>Next, if the right node’s value is lower than the desired one, the function just increases the pos counter by the right node’s skip distance + 1 and moves right, recursing into the right node.</p><pre><code class="language-cpp">if (compareRight == 1) { // right elem is lower
    *pos += node-&gt;right-&gt;skipDist + 1;
    return insertRecursive(node-&gt;right,
                           key,
                           insertedOne,
                           pos);
}</code></pre><p>Interesting things happen when we go down. First of all, if we need to go lower but we are already on the very first level, then we simply insert the node and return true, as the node was inserted.</p><pre><code class="language-cpp">else { // (compareRight == -1) // want to go down
    if (node-&gt;level == 1) {
        *(insertedOne) = new TreeNode&lt;T&gt;(key, 1);
        (*(insertedOne))-&gt;right = node-&gt;right;
        node-&gt;right = *(insertedOne);
        return true;
}</code></pre><p>If we can go down, then several cases need to be considered.</p><figure class="kg-card kg-image-card"><img src="https://media.tenor.com/images/0370865dc28ad806626731f7f7dbdf09/tenor.gif" class="kg-image" alt loading="lazy" width="498" height="280"></figure><p>We need to go deeper. That means that on the current level the right node’s value is higher than the desired one, so the insertion is going to occur before the right node. Therefore, the right node’s <strong>skip distance is going to be increased by 1.</strong></p><p>If we instead insert a fast lane node on the current level, then the right node’s skip distance is shortened by the skip distance of the inserted node.</p><pre><code class="language-cpp">else {
    // whether there was an insertion on the deeper level
    bool possibleLevelInsert = insertRecursive(node-&gt;down,
                               key, insertedOne, pos);
                               
    if (!possibleLevelInsert) {
        // if insertion of fast lane is impossible
        if (node-&gt;right != nullptr &amp;&amp; *insertedOne != nullptr) {
            // if the right node on the current level is present
            // and we inserted the node (*insertedOne != nullptr)
            node-&gt;right-&gt;skipDist++;
        }
        return false; // insert of further fast lanes is impossible
    }
     </code></pre><p>At this point, we know that we can insert a fast lane node on the current level. Let’s spin a coin and decide.</p><pre><code class="language-cpp">if(node-&gt;level == 1) // trivial case
    return true;
    
bool insertNow = this-&gt;spinACoin();
if (!insertNow){
    // no fast lane insertion -&gt; increase the next fast lane's
    // skip distance, as the node was inserted somewhere in between.
    if (node-&gt;right != nullptr){
         node-&gt;right-&gt;skipDist++;
    }
    return false;
}
// Can insert the fast lane node

TreeNode&lt;T&gt;* newNode = new TreeNode&lt;T&gt;(key, node-&gt;level);
newNode-&gt;down = *(insertedOne);
newNode-&gt;right = node-&gt;right;
newNode-&gt;skipDist = (*pos - posHere);

// *pos stopped updating at the insertion position.
// At the beginning, we saved temporary pos at the current recursion level.
if (node-&gt;right != nullptr) {
   // shrink right node skip distance as we inserted new fast lane node
   node-&gt;right-&gt;skipDist -= newNode-&gt;skipDist;
}
node-&gt;right = newNode;
*(insertedOne) = newNode;
return true;</code></pre><p>That was a lot of code. Here is the final insert function:</p><pre><code class="language-cpp">bool insertRecursive(TreeNode&lt;T&gt;* node, T key, TreeNode&lt;T&gt;** insertedOne, unsigned* pos){
    this-&gt;processDeletions(node);
    int compareRight = compareWithNode(key, node-&gt;right);
    unsigned posHere = *pos;
    if (compareRight == 0 || (node-&gt;key == key &amp;&amp; node-&gt;headNode != true))
        return false;
    if (compareRight == 1){ // right elem is lower
        *pos += node-&gt;right-&gt;skipDist + 1;
        return insertRecursive(node-&gt;right, key, insertedOne, pos);
    } else {// (compareRight == -1) // want go down
        if (node-&gt;level == 1){
            *(insertedOne) = new TreeNode&lt;T&gt;(key, 1);
            (*(insertedOne))-&gt;right = node-&gt;right;
            node-&gt;right = *(insertedOne);
            return true;
        } else {
            bool possibleLevelInsert = insertRecursive(node-&gt;down, key, insertedOne, pos);
            if (!possibleLevelInsert ){
                if (node-&gt;right != nullptr &amp;&amp; *insertedOne != nullptr){
                    node-&gt;right-&gt;skipDist++;
                }
                return false;
            }
            if(node-&gt;level == 1)
                return true;
            bool insertNow = this-&gt;spinACoin();
            if (!insertNow){
                if (node-&gt;right != nullptr){
                    node-&gt;right-&gt;skipDist++;
                }
                return false;
            }
            TreeNode&lt;T&gt;* newNode = new TreeNode&lt;T&gt;(key, node-&gt;level);
            newNode-&gt;down = *(insertedOne);
            newNode-&gt;right = node-&gt;right;
            newNode-&gt;skipDist = (*pos - posHere);
            if (node-&gt;right != nullptr) {
                node-&gt;right-&gt;skipDist -= newNode-&gt;skipDist;
            }
            node-&gt;right = newNode;
            *(insertedOne) = newNode;
            return true;
        }
    }
}</code></pre><h2 id="process-deletions">Process deletions</h2><p>The only thing left is to define the <code>processDeletions(node)</code> function. If the right node is marked as deleted, then we need to update the skip distance of the node to the <strong>right of the right node</strong>. Also, we first need to go to the deepest level of recursion and apply the changes from the end back to the start.</p>
 ]]></content:encoded>
                </item>
                <item turbo="true">
                    <title><![CDATA[ How Deep Neural Networks Work ]]></title>
                    <description><![CDATA[ Here, I combine the explanation of Neural Nets with coding. By the end, we will develop a basic neural network and try to solve usual problems ]]></description>
                    <link>https://alexdremov.me/how-deep-neural-networks-train/</link>
                    <guid isPermaLink="false">624eadbd2ced893c8907692e</guid>
                    <category><![CDATA[ Machine Learning ]]></category>
                    <dc:creator><![CDATA[ Alex Dremov ]]></dc:creator>
                    <pubDate>Fri, 08 May 2020 04:25:40 +0200</pubDate>
                    <media:content url="https://alexdremov.me/content/images/wordpress/2020/05/90.jpeg" medium="image"/>
                    <content:encoded><![CDATA[ <h2 id="introduction">Introduction</h2><p>Today, when such beautiful frameworks as <strong>Keras</strong>, <strong>Tensorflow</strong>, <strong>SkLearn</strong> exist, many people are not worried about how Neural Network models work and train. However, when interested people start to dig and search for explanations, they usually get an unreasonable amount of linear algebra thrown right in their face, without any practical information.</p><p>At least, that was my case. Having decided to understand Neural Networks, I enrolled in an online course at a local university. I watched lecture after lecture, noted everything necessary, and from the bottom of my heart waited for practical information and possible implementations of the algorithms.</p><p>The course ended, and I was left with a thick notebook of linear algebra, calculus, and no understanding of what I can do with all this information.</p><p>However, I don’t want somebody else to walk the same road as me, so I decided to write this article: a “Hello, world” of Neural Nets.</p><p><strong>Side note:</strong> in this guide, I will not go deep into program architecture and good Python practices, as that is not the primary purpose of the article.</p><h2 id="single-neurone-what-is-it">Single neuron: what is it?</h2><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown.png" class="kg-image" alt="Linear regression" loading="lazy" width="598" height="579"></figure><p>In linear regression, we find the line that best approximates our points. The line can be described by the following equation:</p><p>\[ y(x) = wx + b \]</p><p>By adjusting \(w\) and \(b\) we can make our line the best fit for the current distribution of points.</p><p>And that’s actually what every single neuron in a basic Neural Net does. The big difference is that the line of best fit shown in the image is in 2D. 
In the real world, algorithms solve problems in multidimensional space.</p><p>For example, if you would like to predict who survived the Titanic tragedy, you could take into account such parameters as age, fare, sex, number of siblings, etc. Notice that we already have 4 dimensions to work with. However, you should not be scared of that. A lot of concepts that work in 3D or 2D can also be applied to multidimensional space.</p><h2 id="classification-problem">Classification problem</h2><p>Let’s continue to work on the Titanic survival chance problem and imagine that our neuron already knows the line of best fit. Classifying a binary dependent variable (survived/did not survive) in this way is called Logistic Regression.</p><p>The problem is that a line is unbounded, but a probability can’t be lower than zero or higher than one. This is where the sigmoid function comes in.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown-1.png" class="kg-image" alt="Sigmoid function" loading="lazy" width="885" height="575" srcset="https://alexdremov.me/content/images/size/w600/2022/10/Unknown-1.png 600w, https://alexdremov.me/content/images/2022/10/Unknown-1.png 885w" sizes="(min-width: 720px) 720px"></figure><p>\[ y(z) = \frac{1}{1 + e^{-z}} \]</p><p>As you see, the function is bounded between 0 and 1. So, we will use it to adjust the neuron output. The function that sets the neuron output based on the linear part is called an “<strong>activation</strong> function”.</p><h2 id="how-this-works-with-multidimensions">How this works with multidimensions </h2><p>The same problem is a little bit different when \(x\) has multiple dimensions, that is, when it is a vector. Then, every component has a different effect on the final result, so \(w\) should also be a vector.</p><p>Formula in vector form:</p><p>\[ z = w^{T}x + b\]</p><p>We treat both \(w\) and \(x\) as column vectors. Therefore, to get a scalar, we transpose the \(w\) vector. 
Here is how it works:</p><p>\[ x = \begin{bmatrix} x_1\\ x_2\\ \ldots \\ x_n\\ \end{bmatrix} w = \begin{bmatrix} w_1\\ w_2\\ \ldots\\ w_n\\ \end{bmatrix} \] \[ w^{T}*x = \begin{bmatrix} w_1, w_2, \ldots, w_n \end{bmatrix} * \begin{bmatrix} x_1\\ x_2\\ \ldots\\ x_n\\ \end{bmatrix} =\] \[ w_1x_1 + w_2x_2 + \ldots + w_nx_n\]</p><p>If you feel a little bit uncomfortable with the expression above, review basic matrix multiplication.</p><p>That's it. This is how a single basic neuron works. It takes an input, multiplies it by \(w\), adds \(b\) (just a number), applies the activation function, and passes the computed value on. This process is called forward propagation. Now, let's implement this in code.</p><h2 id="forward-propagation-in-code">Forward propagation in code</h2><p>I will use NumPy for basic operations. Of course, you can implement matrix multiplication, addition, etc. by yourself, but NumPy does it more efficiently, since its core routines are already compiled.</p><pre><code class="language-python">import numpy as np</code></pre><p>Sigmoid function:</p><pre><code class="language-python">def sigmoid(z):
    return 1 / (1 + np.exp(-z))</code></pre><p>Forward propagation function:</p><pre><code class="language-python">def forward_propagation(w, b, x):
    z = np.dot(w.T, x) + b # np.dot(..., ...) — matrix multiplication
    return sigmoid(z)</code></pre><p>Great. Now we can calculate forward propagation of a single neuron. But how do we figure out the \(w\) and \(b\) values?</p><h2 id="loss-function">Loss function</h2><p>To understand how well our algorithm performs, we need to define a loss function. For binary classification, <strong>logarithmic loss</strong> performs well. So, we will use it.</p><p>\[ L(\widehat{y}, y) = -(y \ln(\widehat{y}) + (1-y)\ln(1-\widehat{y})) \]</p><p>That's how it looks:</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/XUYY3761.gif" class="kg-image" alt loading="lazy" width="750" height="580" srcset="https://alexdremov.me/content/images/size/w600/2022/10/XUYY3761.gif 600w, https://alexdremov.me/content/images/2022/10/XUYY3761.gif 750w" sizes="(min-width: 720px) 720px"></figure><p>\( \hat{y} \) is the computed value and \(y\) is the actual one.</p><pre><code class="language-python">def loss(A, Y):
    return -(Y * np.log(A) + (1 - Y) * np.log(1 - A))</code></pre><p>As you see, the loss grows towards infinity as \(\hat{y}\) moves away from \(y\), and it is 0 when they are exactly the same.</p><p>To train the model, we use labeled data: pairs of \(x, y\). \(x\) is the input vector and \(y\) is the desired output for this vector. If we stack all available \(x\) as columns of a single matrix, we get \(X\), a matrix that contains all training data. We can do the same thing for \(Y\), which becomes a row of labels.</p><p>\[ X = \begin{bmatrix} | &amp;&amp; | &amp;&amp;  &amp;&amp; |\\  x_1 &amp;&amp; x_2 &amp;&amp; \ldots &amp;&amp; x_m\\ | &amp;&amp; | &amp;&amp;  &amp;&amp; | \end{bmatrix}  \]</p><p>\[ Y = \begin{bmatrix} y_1 &amp;&amp; y_2 &amp;&amp; \ldots &amp;&amp; y_m \end{bmatrix}  \]</p><p>If we have \(m\) samples and every \(x\) vector is \(n\)-dimensional, then \(X\) has shape \(n \times m\), \(Y\) has shape \(1 \times m\), and we can calculate the cost of the algorithm with the selected \(w\) and \(b\).</p><p>\[ J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) \]</p><pre><code class="language-python">def cost(A, Y, m):
    return 1 / m * np.sum(loss(A, Y))</code></pre><h2 id="vectorization">Vectorization</h2><p>Currently, we can calculate forward propagation for a single \((x, y)\) pair, but we need to calculate values for all \(m\) available training pairs. The most obvious answer is to write a for loop and calculate them one by one. But this is not the optimal approach. We can use the \(X\) and \(Y\) matrices to calculate forward propagation for the entire training set. That's how it works:</p><p>\[ X^{T}w + b = \begin{bmatrix} x_1^{T} \\ x_2^{T} \\ \vdots \\ x_m^{T} \end{bmatrix} * \begin{bmatrix} w_1\\ w_2\\ \vdots\\ w_n\\ \end{bmatrix} + b = \]<br>\[ = \begin{bmatrix} x_1^{T}w + b \\ x_2^{T}w + b \\ \vdots \\ x_m^{T}w+ b\\ \end{bmatrix} \]</p><p>As you see, each row is the forward propagation result for one training sample. In code, we compute the equivalent \(w^{T}X + b\), which holds the same values as a row vector.</p><p>This approach simplifies the code and speeds up calculations. Whenever possible, vectorize code. The same technique can be applied during backpropagation.</p><h2 id="in-the-core-of-learning-backpropagation">At the core of learning: Backpropagation</h2><p>At this moment we can calculate the neuron output and estimate how close it is to the actual value. But how can we figure out the \(w\) and \(b\) values? This is where the Gradient Descent concept comes in.</p><p>The best explanation of Gradient Descent I have ever heard:</p><blockquote class="kg-blockquote-alt">It’s like you are trying to find a door in a completely dark room and you can only “feel” in what direction to move</blockquote><p>Imagine that you have some function, but you do not know its expression. Or its shape. And you are in multidimensional space. Then, you are randomly placed at some point of this function and asked to find its minimum. Not the most pleasant situation, right? However, you know how your position was calculated, so you can find a derivative. But what does the derivative give us? 
Let’s take a look at this function.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown-3.png" class="kg-image" alt="Gradient descent visual" loading="lazy" width="597" height="575"></figure><p>Using the derivative, we can find the direction towards the function’s minimum. That’s how it looks animated:</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/1-1.gif" class="kg-image" alt loading="lazy" width="600" height="450" srcset="https://alexdremov.me/content/images/2022/10/1-1.gif 600w"></figure><p>We can take steps in the indicated direction and finally reach a point of minimum loss. That’s how we can implement this to adjust \(w\) and \(b\):</p><p>\[w^{new} = w^{old} - \alpha \cdot \frac{\partial{J(w, b, x)}}{\partial{w}}\]</p><p>\[b^{new} = b^{old} - \alpha \cdot \frac{\partial{J(w, b, x)}}{\partial{b}}\]</p><p>Where \(\alpha\) is the learning rate. The derivation follows further below; here are the final expressions:</p><p>\[ \frac{\partial{J}}{\partial{z}} = \hat{Y} - Y \] \[ \frac{\partial{J(w, b, x)}}{\partial{w}} = \frac{1}{m} X*(\frac{dJ}{dz})^{T} = \frac{1}{m} X*(\hat{Y}-Y)^{T} \] \[ \frac{\partial{J(w, b, x)}}{\partial{b}} = \frac{1}{m} \sum_{j=1}^{m}{(\frac{dJ}{dz})_{j}} =\] \[ \frac{1}{m} \sum_{j=1}^{m}{(\hat{Y}-Y)_{j}} \]</p><p>Do not worry. It all looks a lot better in code. From here on, I will use a different notation: \(\frac{\partial{J(w, b, x)}}{\partial{w}}\) as \(dw\), \(\frac{\partial{J(w, b, x)}}{\partial{b}}\) as \(db\), etc. 
Also, it’s common to name \(\hat{Y}\) as \(A\) because it represents the activation function value.</p><p>\[\frac{\partial{L}}{\partial{A}} = \frac{-Y}{A} + \frac{1-Y}{1-A}\]</p><p>\[\frac{dA}{dZ} = \frac{e^{-Z}}{(1+e^{-Z})^{2}} = \sigma(Z)(1-\sigma(Z))\]</p><p>\[A =  \sigma(Z)\]</p><p>\[\frac{\partial{L}}{\partial{Z}} = \frac{\partial{L}}{\partial{A}} * \frac{\partial{A}}{\partial{Z}} =\]</p><p>\[=  (1 - A)(-Y) + (1 - Y)A = A - Y\]</p><pre><code class="language-python">dZ = A - Y
db = 1 / m * np.sum(dZ)
dw = 1 / m * np.dot(X, dZ.T)</code></pre><p>A single backpropagation step can then be written as this function:</p><pre><code class="language-python">def backpropagation(w, b, X, A, Y, learning_rate, m):
    dZ = A - Y
    db = 1 / m * np.sum(dZ)
    dw = 1 / m * np.dot(X, dZ.T)
    assert(dw.shape == w.shape)
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b</code></pre><h2 id="initialization">Initialization</h2><p>To adjust \(w\) and \(b\), we need a starting point. We are going to initialize \(w\) and \(b\) with zeros.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">We can initialize parameters with zeros when we have just one neuron. This approach does not work if there are several neurons and layers. If we initialize them with 0, then all neurons will develop in the same way and the whole network becomes almost useless.</div></div><p>Initialization:</p><pre><code class="language-python">w = np.zeros((n, 1))
b = 0  # the bias of a single neuron is just a number</code></pre><p>Finally, we can write the full neuron learning code.</p><pre><code class="language-python">def model(X, Y, learning_rate=0.1, n_iter=2000, costIter=None):
    if costIter is None:
        costIter = [[], []]  # fresh history per call instead of a shared mutable default
    m = X.shape[1]  # number of training samples
    n = X.shape[0]  # number of input features
    w = np.zeros((n, 1))
    b = 0
    for i in range(n_iter):
        A = forward_propagation(w, b, X)
        c = cost(A, Y, m)
        if i % 5 == 0:
            print("Iteration %s: %s" % (i, c))
        costIter[0].append(i)
        costIter[1].append(c)
        w, b = backpropagation(w, b, X, A, Y, learning_rate, m)
    return w, b</code></pre><p>In this code, we combine all previous steps:</p><!--kg-card-begin: html--><ul><li>Initialize all parameters</li><li>Start a loop</li><li>Perform forward propagation</li><li>Calculate cost</li><li>Print some data / save into array</li><li>Perform backpropagation step and update parameters</li></ul>
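<p>The steps above can also be sketched as one self-contained script. This is a minimal sketch rather than the exact code of this article: the name <code>train_neuron</code>, the toy dataset (200 random 2D points labelled by whether \(x_2\) is greater than \(x_1\)), the learning rate of 0.5, and the clipping of \(A\) before the logarithm are all assumptions made for illustration.</p>

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_neuron(X, Y, learning_rate=0.5, n_iter=1000):
    """Train one sigmoid neuron with batch gradient descent.

    X has shape (n, m): one column per sample. Y has shape (1, m).
    Returns the weights, the bias, and the cost recorded at every iteration.
    """
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    costs = []
    for _ in range(n_iter):
        A = sigmoid(np.dot(w.T, X) + b)        # forward propagation, shape (1, m)
        A_safe = np.clip(A, 1e-12, 1 - 1e-12)  # assumption: clip so that log() stays finite
        costs.append(float(-np.mean(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe))))
        dZ = A - Y                             # gradient of the cost with respect to z
        w = w - learning_rate * np.dot(X, dZ.T) / m
        b = b - learning_rate * np.sum(dZ) / m
    return w, b, costs

# Hypothetical toy data: 200 random 2D points, labelled 1 when x2 is greater than x1.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2, 200))
Y = np.greater(X[1], X[0]).astype(float).reshape(1, -1)

w, b, costs = train_neuron(X, Y)
predictions = np.round(sigmoid(np.dot(w.T, X) + b))  # threshold the output at 0.5
accuracy = float(np.mean(predictions == Y))
print("first cost: %.3f, last cost: %.3f, accuracy: %.3f" % (costs[0], costs[-1], accuracy))
```

<p>Because the parameters start at zero, the first recorded cost is \(\ln 2 \approx 0.693\), the cost of predicting 0.5 for every point. Since the toy points are linearly separable, the cost should fall from there and most points should end up classified correctly.</p>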
<!--kg-card-end: html--><p>Finally, the model returns the trained \(w\) and \(b\) values so that we can use them to predict answers for new values.</p><h2 id="testing">Testing</h2><p>For testing, I selected a line</p><p>\[y = 1.23x + 3.23\]</p><p>Let points above the line be blue and the ones below it be red. Here is the set that I gave to the model for training.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown-4.png" class="kg-image" alt loading="lazy" width="370" height="248"></figure><p>Training:</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown-5.png" class="kg-image" alt loading="lazy" width="372" height="248"></figure><p>As we see, the cost decreases over time. That means that backpropagation works correctly and our \(w\) and \(b\) are adjusted in the right direction.</p><p>To check how well the algorithm performs, I randomly generated 2000 points and asked the neuron to classify them.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown-6.png" class="kg-image" alt loading="lazy" width="377" height="248"></figure><p>That’s how the algorithm performed. The green line represents the actual line.</p><figure class="kg-card kg-image-card"><img src="https://alexdremov.me/content/images/2022/10/Unknown-7.png" class="kg-image" alt loading="lazy" width="377" height="248"></figure><p>The accuracy is around 99%. I suppose it misclassified ~1% due to points that lie on or very near the line.</p><h2 id="what-s-special-about-this-classifier">What’s special about this classifier?</h2><p>So one neuron approximates some linear function. 
How can it distinguish cats from dogs, or survivors from non-survivors?</p><p>By combining neurons in stacks and layers, we form complicated compositions of such simple functions, and then we can approximate sophisticated multidimensional functions that capture subtle dependencies and relations during training.</p><p>But the single neuron and the backpropagation concept lie at the heart of the whole process.</p> ]]></content:encoded>
                </item>

    </channel>
</rss>