Alex Dremov

Simple Ways to Speed Up Your PyTorch Model Training

Alex Dremov — Tue, 28 May 2024 23:16:11 +0300

Does this topic even need an introduction?

Speeding up machine learning model training is something that all machine learning engineers want. Faster training equals faster experiments equals faster iterations for your product. It also means that training a single model requires fewer resources. So, straight to the point.

Containerization

Yes, this will not speed up your training on its own. But it targets another important aspect: reproducibility. Sometimes a virtualenv with fixed library versions is enough, but I encourage you to take one step further and build an all-in-one Docker container for your model training.

This ensures that the environment is fully consistent during debugging, profiling, and final training. The last thing you want is to optimize a part of the code that is no longer a bottleneck because of, for example, a Python 3.12 speed-up. Or to chase a bug that is not reproducible on a different CUDA version.

As a starting point, you can use pre-built images from NVIDIA. They already have CUDA, PyTorch, and other popular libs installed:

PyTorch | NVIDIA NGC

PyTorch is a GPU accelerated tensor computational framework. Functionality can be extended with common Python libraries such as NumPy and SciPy. Automatic differentiation is done with a tape-based system at the functional and neural network layer levels.

NVIDIA NGC Catalog

💡

A Docker container is the ultimate solution for problems like
"Hey, it works on my machine. I have no idea why it doesn't on yours."

Get comfortable with PyTorch profiler

Before optimizing anything, you have to understand how long each part of your code runs. PyTorch profiler is almost an all-in-one tool for profiling training. It's able to record:

CPU operations timings
CUDA kernels timings
Memory consumption history

That's all you need. And it's easy to enable!

To record events, all you need is to embed training into a profiler context like this:

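import torch
import torch.profiler as profiler
from torch.profiler import ProfilerActivity

# torch.profiler (not torch.autograd.profiler) accepts `activities`
with profiler.profile(
  activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
  on_trace_ready=profiler.tensorboard_trace_handler('./logs'),
) as prof:
  train(args)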

After that, you can launch the tensorboard and view profiling traces. Do not forget to install torch-tb-profiler.

PyTorch Profiler With TensorBoard — PyTorch Tutorials 2.3.0+cu121 documentation

Profiler has a lot of different options, but the most important are activities and profile_memory. You can experiment with other options, but keep in mind a simple rule: the fewer options you've enabled, the less overhead you have.

So, if you want to profile CUDA kernel execution timings, it is a good idea to turn off CPU profiling and all other features. In this mode, profiling will be as close to the real execution as possible.

To make traces easier to understand, consider adding profiling contexts that describe core parts of your code. If profiling is not enabled, those are no-op.

with profiler.record_function("forward_pass"):
  result = model(**batch)

with profiler.record_function("train_step"):
  step(**result)

This way, the labels that you use will be visible in traces, so it will be easier to identify code blocks. You can go even more granular inside the model's forward:

with profiler.record_function("transformer_layer:self_attention"):
  data = self.self_attention(**data)

...

with profiler.record_function("transformer_layer:encoder_attention"):
  data = self.encoder_attention(**data, **encoder_data)

Understanding PyTorch traces

After you gather traces, open them in the tensorboard. That's what the CPU + CUDA profile looks like:

source: https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html

Straight away, find the core parts of any training:

data loading
forward pass
backward pass

Backward pass is handled by PyTorch in a separate thread (thread 16893 on the image above), so it is easy to identify.

Data loading

For data loading, we want near-zero timings.

No compromises.

That's because during data loading the GPU does nothing, which under-utilizes available resources. However, data processing can be overlapped with GPU computing as those are independent parts.

You can easily identify areas where the GPU is idle: just look at the GPU Est. SM Efficiency and GPU Utilization figures in the profiler's trace. Areas with zero activity are our patients. That's where the GPU does nothing.

A simple solution for that is:

process data in the background process (no GIL)
process data augmentations and transforms in parallel processes

If you use PyTorch DataLoader, then it can be easily achieved by specifying num_workers. It's more complicated if you use IterableDataset, as then every worker iterates over its own copy of the dataset and the data will be duplicated. However, this issue can still be solved by using get_worker_info(): you need to adjust iteration so that each worker receives different, non-intersecting rows.
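
The non-intersecting split can be sketched without any PyTorch dependency. This is only an illustration: the helper name shard_for_worker is my own, and I assume a strided split is fine for your data. In a real IterableDataset.__iter__, the two integers would come from torch.utils.data.get_worker_info() (its id and num_workers fields), which returns None in the main process.

```python
from itertools import islice


def shard_for_worker(rows, worker_id, num_workers):
    """Yield every `num_workers`-th row, starting at offset `worker_id`.

    Hypothetical helper: inside IterableDataset.__iter__ the two integers
    would be read from torch.utils.data.get_worker_info().
    """
    return islice(rows, worker_id, None, num_workers)


# Simulate 3 workers that all read the same 10-row stream.
rows = list(range(10))
shards = [list(shard_for_worker(rows, w, 3)) for w in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Every row lands in exactly one shard, so nothing is duplicated across workers.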

For more configurable processing, you may consider implementing multi-process transforms yourself with the multiprocessing module.

💡

If you never checked your code's data processing speed, then this slight modification can yield dramatic speedups

Making friends with memory allocator

You want to be friends with PyTorch's CUDA caching allocator.

When you allocate tensors with PyTorch on a CUDA device, PyTorch will use a caching allocator. That's because cudaMalloc/cudaFree are expensive operations that we want to avoid, so PyTorch has its own allocator that tries to reuse blocks previously allocated through cudaMalloc. That is, if PyTorch's allocator has an appropriate block available, it will hand it out straight away without calling cudaMalloc. That way, cudaMalloc is called only at the beginning.

However, if you're dealing with data of variable length, different forward passes will require intermediate tensors of different sizes. So, PyTorch's allocator may not have an appropriate block available. In this case, the allocator panics and releases previously allocated blocks by calling cudaFree to free up space for new allocations.

After that, the allocator starts building its cache again, doing tons of cudaMalloc, which is an expensive operation. You can spot this problem by looking at the memory profiler section of the tensorboard profiler viewer.

💡

You also can spot this problem in the traces. It will be visible as calls to cudaMalloc and cudaFree
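
To make the caching and flushing behaviour more tangible, here is a toy model of it. It is deliberately simplified and is not PyTorch's actual algorithm (no size rounding, no block splitting); the class ToyCachingAllocator and the capacity of 10 units are made up for the illustration. It only counts how often the "driver" would be called.

```python
class ToyCachingAllocator:
    """Deliberately simplified model; NOT PyTorch's actual algorithm."""

    def __init__(self, capacity):
        self.capacity = capacity   # total "device memory"
        self.reserved = 0          # memory currently obtained via "cudaMalloc"
        self.cached = []           # sizes of free blocks kept for reuse
        self.malloc_calls = 0
        self.free_calls = 0

    def allocate(self, size):
        fitting = [b for b in self.cached if b >= size]
        if fitting:                               # cache hit: no driver call
            block = min(fitting)
            self.cached.remove(block)
            return block
        if self.reserved + size > self.capacity:  # no room: flush the cache
            self.free_calls += len(self.cached)
            self.reserved -= sum(self.cached)
            self.cached.clear()
        if self.reserved + size > self.capacity:
            raise MemoryError("out of memory")
        self.malloc_calls += 1                    # cache miss: "cudaMalloc"
        self.reserved += size
        return size

    def release(self, block):
        self.cached.append(block)                 # keep the block for later


def simulate(sizes, capacity=10):
    alloc = ToyCachingAllocator(capacity)
    for size in sizes:
        alloc.release(alloc.allocate(size))       # one "forward pass"
    return alloc.malloc_calls, alloc.free_calls


fixed = simulate([4, 4, 4, 4, 4, 4])
variable = simulate([4, 5, 6, 7, 8, 9])
print(fixed, variable)  # (1, 0) (6, 5)
```

With a constant size, the single cached block is reused forever. With growing sizes, almost every request misses the cache and triggers a flush followed by a fresh allocation.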

PyTorch allocator freaks out

As you see, a red line that corresponds to the allocator's reserved memory constantly changes. That means that PyTorch allocator is not able to efficiently handle allocation requests.

When allocations are handled without the allocator panicking, the red line is completely straight

PyTorch allocator works as expected

As I said, that is usually due to variable shapes of tensors. How to fix that?

Expandable Segments

The first thing that is worth trying is to set PyTorch's relatively new allocator mode:

PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

If set to True, this setting instructs the allocator to create CUDA allocations that can later be expanded to better handle cases where a job changes allocation sizes frequently, such as having a changing batch size.

So, this tells PyTorch allocator to allocate blocks that could be expanded in the future, which is exactly our case. Though, if size variations are too big, it still may fail to solve the issue. In this case, move to the next option.

Make allocations vary less

Another possible solution is to make data shapes consistent. That way it will be easier for the allocator to find an appropriate data block to reuse.

To accomplish that, you may pad data to the same sizes. Or you can preheat the allocator by running a model with maximum input sizes.
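
One common way to pad without wasting too much compute is bucketing: instead of padding everything to the global maximum, round each length up to a multiple of some bucket size. The sketch below is my own illustration on plain Python lists; the helper pad_to_bucket and the bucket size of 32 are assumptions, not something PyTorch provides.

```python
def pad_to_bucket(tokens, bucket_size, pad_value=0):
    """Pad `tokens` so that its length is the next multiple of `bucket_size`."""
    padded_len = -(-len(tokens) // bucket_size) * bucket_size  # ceil to multiple
    return tokens + [pad_value] * (padded_len - len(tokens))


batch = [[1] * n for n in (3, 17, 30, 31, 45, 64)]
raw_lengths = sorted({len(seq) for seq in batch})
padded_lengths = sorted({len(pad_to_bucket(seq, 32)) for seq in batch})
print(raw_lengths)     # [3, 17, 30, 31, 45, 64]
print(padded_lengths)  # [32, 64]
```

Six distinct shapes collapse into two, so the allocator sees far fewer distinct block sizes.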

You can learn more about PyTorch allocator modification in the following article

CUDA semantics — PyTorch 2.3 documentation

A guide to torch.cuda, a PyTorch module to run CUDA operations

Tidy up allocations history

We want to use all available GPU memory — that allows us to run big batches and process data faster. However, at some point, you will encounter a CUDA out-of-memory error when increasing batch size. What causes this error?

To debug this, we can view the allocator's memory history. It can be recorded through PyTorch and then visualized at https://pytorch.org/memory_viz

Start: torch.cuda.memory._record_memory_history(max_entries=100000)
Save: torch.cuda.memory._dump_snapshot(file_name)
Stop: torch.cuda.memory._record_memory_history(enabled=None)

Visualization will draw something like this:

source: https://pytorch.org/blog/understanding-gpu-memory-1/

The x-axis represents time, the y-axis represents total used memory, and colourful blocks represent tensors. So, it shows when tensors were allocated and when they were released.

You may notice narrow spikes — those are short-lasting tensors that take up a lot of space. By clicking on a tensor, you can get information on where this tensor was allocated. We want to minimize those spikes as they limit efficient memory usage. Check out what caused this spike and consider other ways of computing what you intended.

Apart from spikes, it's easy to detect memory leaks:

source: https://pytorch.org/blog/understanding-gpu-memory-1/

As you see, some data is not cleared after the first forward pass. By clicking on blocks, you can get an idea of where these tensors come from. The image shows a case where gradients are not cleared after the training step, so they sit unused during the forward pass, limiting the ability to increase the batch size to fit more data.

Understanding GPU Memory 1: Visualizing All Allocations over Time

During your time with PyTorch on GPUs, you may be familiar with this common error message:

PyTorch | Aaron Shi, Zachary DeVito

Speed up the model and use less memory

What can be better than this? We can achieve it by using the FlashAttention kernel for calculating dot-product attention.

GitHub - Dao-AILab/flash-attention: Fast and memory-efficient exact attention

If you haven't heard about it, it is a way of calculating exact dot-product attention without constructing the attention matrix explicitly. That optimizes the GPU's I/O operations, which improves speed and also dramatically reduces memory consumption. There's simply no reason not to use it.
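
The core trick, computing exact softmax attention while only ever holding a small block of scores, can be shown in plain Python for a single query. This is a sketch of the streaming-softmax idea only, not the actual FlashAttention kernel: the function names are mine, the usual scaling factor is omitted for brevity, and everything runs on lists instead of GPU tiles.

```python
import math


def naive_attention(q, keys, values):
    """Single-query attention that materializes every score at once."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def streaming_attention(q, keys, values, block=2):
    """Same result, but only `block` scores exist at any moment."""
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        ks = keys[start:start + block]
        vs = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) for k in ks]
        new_max = max(running_max, max(scores))
        scale = math.exp(running_max - new_max)  # rescale old partial sums
        denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, vs):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [-2.0, 0.5], [1.5, 1.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
full = naive_attention(q, keys, values)
streamed = streaming_attention(q, keys, values)
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(full, streamed)))
```

Both functions agree up to floating-point rounding, yet the streaming version never stores the full row of attention weights.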

😡

Unfortunately, there's one reason not to use it — hardware.

Flash attention only works with fp16 and bf16 precision on compatible hardware, that is, NVIDIA Ampere, Hopper, etc.

Other libraries use flash attention under the hood, so you may consider using other variants that better fit your codebase.

XFormers

GitHub - facebookresearch/xformers: Hackable and optimized Transformers building blocks, supporting a composable construction.

Transformer Engine

GitHub - NVIDIA/TransformerEngine: A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

PyTorch itself!

That is true, new versions of PyTorch may use flash attention when applicable. To activate this mode, you need to execute attention blocks inside a context manager that specifies which attention strategy to use:

torch.nn.functional.scaled_dot_product_attention — PyTorch 2.3 documentation

Optimize multi-GPU data redundancy — FSDP

If you use multiple GPUs to run your training, the basic solution is to use the DistributedDataParallel class. This way, several identical processes are spawned, and gradients are aggregated during the backward step.

However, that is sub-optimal!

The problem is that, since we spawned identical processes, we have identical models and optimiser states on each GPU, which is redundant. The solution is to shard this state across GPUs. We can do so using the Fully Sharded Data Parallel PyTorch wrapper.

source: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html

How does it work?

As I said, when training on several GPUs, each process has exact copies of the same data when training with DDP. We can optimize it, by implementing several enhancements:

Shard optimizer state (ZeRO 1)

When training with DDP, each process holds a complete copy of the optimizer states. With ZeRO1, we shard these optimizer states across all ranks such that each rank holds only a portion of the optimizer states. During the backward pass, each rank only needs to gather the optimizer states relevant to its parameters to make an optimization step. This reduction in redundancy helps conserve memory.

💡

In the case of Adam, whose state is roughly twice the model size, sharding the optimizer state among 8 ranks means each rank stores state worth only one quarter (2/8) of the model size, that is, one eighth of the total state.

Shard gradients (ZeRO 2)

We have sharded optimizer states. Now, we will modify the optimizer step to shard gradients too. If one rank has optimizer states for a portion of parameters, then we will:

aggregate all gradients relevant to the states the rank holds
calculate optimization step
send optimization step for a portion of parameters to all other ranks

As you noticed, now each rank does not need to hold a full replica of gradients. We can send gradients to a relevant rank as soon as they are available. So, we can reduce peak memory consumption even further.
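
The three steps above can be simulated in a single process. This is only a sketch under simplifying assumptions: plain SGD stands in for Adam (so there is no optimizer state to shard), the "ranks" are just list entries, and sharded_sgd_step is a name I made up.

```python
def sharded_sgd_step(params, per_rank_grads, lr):
    """Simulate one ZeRO-2-style step for plain SGD in a single process."""
    world_size = len(per_rank_grads)
    n = len(params)
    shard = -(-n // world_size)  # ceil division: parameters owned per rank
    new_params = list(params)
    for rank in range(world_size):
        lo, hi = rank * shard, min((rank + 1) * shard, n)
        for i in range(lo, hi):
            # 1. aggregate (average) gradients only for the owned parameters
            g = sum(grads[i] for grads in per_rank_grads) / world_size
            # 2. compute the optimization step for this shard
            # 3. "send" the result to everyone (here: write to a shared list)
            new_params[i] = params[i] - lr * g
    return new_params


params = [1.0, 2.0, 3.0, 4.0]
grads = [[1.0] * 4, [3.0] * 4]  # local gradients of rank 0 and rank 1
print(sharded_sgd_step(params, grads, lr=0.5))  # [0.0, 1.0, 2.0, 3.0]
```

The result equals a regular step with averaged gradients, but each rank only ever needs the aggregated gradients for its own slice.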

Shard model parameters (ZeRO 3)

This is about to be epic.

Why do we need to store a full copy of the model on each rank? Let's shard model parameters between all ranks. Then, we're going to fetch the required parameters just in time during forward and backward.

💡

In the case of large models, these optimisations can dramatically decrease memory consumption.
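
To get a feeling for the numbers, here is a back-of-the-envelope estimator. It is a rough sketch under loud assumptions: everything is stored in fp32 (4 bytes), Adam keeps two values per parameter, and activations, buffers and communication overhead are ignored. The function per_rank_bytes is hypothetical.

```python
def per_rank_bytes(num_params, world_size, stage, bytes_per_value=4):
    """Rough per-rank memory for parameters, gradients and Adam state.

    stage 0 = plain DDP, 1/2/3 = ZeRO stages. Activations and buffers ignored.
    """
    params = num_params * bytes_per_value
    grads = num_params * bytes_per_value
    adam_state = 2 * num_params * bytes_per_value  # two moment estimates
    if stage >= 1:
        adam_state /= world_size
    if stage >= 2:
        grads /= world_size
    if stage >= 3:
        params /= world_size
    return params + grads + adam_state


GIB = 1024 ** 3
for stage in range(4):
    print(stage, per_rank_bytes(GIB, world_size=8, stage=stage) / GIB)
# 0 16.0
# 1 9.0
# 2 5.5
# 3 2.0
```

For a model with 2^30 parameters on 8 ranks, the estimate drops from 16 GiB per rank with DDP to 2 GiB with full sharding.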

How to use FSDP?

Quite simple actually. All we need is to wrap the model with FSDP:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


model = FSDP(model)

# it's critical to get parameters from the wrapped model
# as only a portion of them is returned (the sharded part)
optimizer = optim.Adam(model.parameters())

# conduct training as usual
train(model, optimizer)

You can also specify the sharding strategy of FSDP. For example, we can select the SHARD_GRAD_OP strategy to achieve behaviour similar to that of ZeRO2. You can learn about other strategies here:

FullyShardedDataParallel — PyTorch 2.3 documentation

Also, you can wrap with FSDP submodules. In the example above, only one FSDP module is used, which will reduce computation efficiency and memory efficiency. The way it works is that, suppose your model contains 100 Linear layers. If you do FSDP(model), there will only be one FSDP unit which wraps the entire model. In that case, the allgather would collect the full parameters for all 100 linear layers, and hence won’t save CUDA memory for parameter sharding.

You can wrap submodules explicitly or define an auto-wrap policy. To learn more about FSDP, read the PyTorch guide:

Getting Started with Fully Sharded Data Parallel(FSDP) — PyTorch Tutorials 2.3.0+cu121 documentation

Magic speedup with `torch.compile`

That is, torch.compile can speed up your code by several percent just by enabling it.

Torch traces your execution graph and tries to compile it into an efficient format so that the model can be executed almost without Python invocation.

Basic usage is to wrap the model with compile:

import torch

model = torch.compile(model)

This will execute almost instantly. The actual tracing will happen only during the first forward.

It also has a lot of options that are worth trying:

torch.compile — PyTorch 2.3 documentation

💡

Torch compiler is a big feature that will be covered in the next posts!
Stay tuned

Learn more about torch compile here:

Introduction to torch.compile — PyTorch Tutorials 2.3.0+cu121 documentation

Conclusion

This post is in no way complete with explanations. Rather, it is a list of speed-ups that are worth trying straight away. Hope that it was helpful. Feel free to leave a comment!

Swift Actors — Common Problems and Tips

Alex Dremov — Tue, 13 Jun 2023 15:32:57 +0300

Swift actors are a powerful tool to address data races and make your code thread-safe. However, it is also quite a sophisticated concept that requires deep understanding.

💡

Check out my introduction to Swift Actors or quick guide to Swift async/await

Conquer Data Races with Swift Actors | Alex Dremov

Unleash the power of Swift concurrency with Actors! Get all the information you need in this comprehensive article

Quick Guide to Async Await in Swift | Alex Dremov

Everything you need to know about new Swift asynchronous features. Async await, main actor, task, async get, and possible use cases — all covered.

Reentrancy: Invalid State Expectations

One of the core features of actors is reentrancy. By allowing calls to the actor's isolated methods while another method awaits something, actors reduce the time your code spends waiting for actor availability.

Though, it requires additional considerations about the actor's state. Classic example:

actor Door {
    private var isOpen = false
    
    func open() async {
        isOpen = true
        
        await notifyDoorOpened() // Suspension point
        
        // Mistake! Door could have been closed
        // while notifyDoorOpened was executing
        print("Door is open: \(isOpen)")
    }
    
    func close() {
        isOpen = false
    }
    
    func notifyDoorOpened() async {
        try! await Task.sleep(for: .seconds(1))
    }
}

let door = Door()
Task {
    await door.open()
}
Task {
    await door.close()
}

Door is open: false

So, the first tip is to drop any expectations about the actor's state after an asynchronous call inside it. Explicitly check for conditions you believe to be true.

Reentrancy: Double Computations

An even more common case is when execution enters the same method with the same arguments several times.

For example, let's suppose that an actor performs heavy data loading inside one of its methods. But we don't want heavy data to be loaded on each call, so we implement simple caching:

import Foundation

actor ActivitiesStorage {
    var cache = [UUID: Data?]()
    
    func retrieveHeavyData(for id: UUID) async -> Data? {
        if let data = cache[id] {
            return data
        }
        
        // ...
        
        let data = await requestDataFromDatabase(for: id) // suspension
        cache[id] = data
        
        return data
    }
    
    private func requestDataFromDatabase(for id: UUID) async -> Data? {
        print("Performing heavy data loading!")
        try! await Task.sleep(for: .seconds(1))
        // ...
        return nil
    }
    
}

let id = UUID()
let storage = ActivitiesStorage()

Task {
    let data = await storage.retrieveHeavyData(for: id)
}

Task {
    let data = await storage.retrieveHeavyData(for: id)
}

But our caching is useless, as the data is loaded twice anyway. We are dealing with a race condition:

Performing heavy data loading!
Performing heavy data loading!

At this point, you already see that this is due to the actor's reentrancy. The cache is not set until data is loaded, allowing the following heavy loadings.

Let's use mutexes! (no, please don't)

To fix this problem we can explicitly "subscribe" to single heavy data loading and return it when it is available:

import Foundation

actor ActivitiesStorage {
    var cache = [UUID: Task<Data?, Never>]()
    
    func retrieveHeavyData(for id: UUID) async -> Data? {
        if let task = cache[id] {
            return await task.value
        }
        
        // ...
        
        let task = Task {
            await requestDataFromDatabase(for: id)
        }
        
        // Notice that it is set before `await`
        // So, the following calls will have this task available
        cache[id] = task
        return await task.value // suspension
    }
    
    private func requestDataFromDatabase(for id: UUID) async -> Data? {
        print("Performing heavy data loading!")
        try! await Task.sleep(for: .seconds(1))
        // ...
        return nil
    }
    
}

let id = UUID()
let storage = ActivitiesStorage()

Task {
    let data = await storage.retrieveHeavyData(for: id)
}

Task {
    let data = await storage.retrieveHeavyData(for: id)
}

As you see, we use a task to delay await inside an actor, allowing us to set the cache before the suspension. Now, only one call to heavy data is performed.

💥

Using tasks inside actors to delay await is a powerful feature!

@MainActor Overuse

Marking your methods or classes with @MainActor results in the code inside them running on the main thread. It is useful for UI-related code as UI updates must happen on the main thread.

However, overusing @MainActor slows down your concurrent code a lot as it will be running only in one thread, freezing your UI frequently.

To not fall into this trap, do not use @MainActor for the whole class:

@MainActor
class OnboardingViewModel: ViewModel {
	// ...
}

Such use restricts all methods to the main thread, which may be overlooked when adding new methods or functionality.

Use it for specific methods only.

And decompose your methods so that @MainActor methods have as little code as possible, resulting in a low chance of main thread block.

class OnboardingViewModel {
    func performLogIn() async {
        // loading, processing and stuff
        // can be executed on any thread
        
        await updateLogInInformation()
    }
    
    @MainActor func updateLogInInformation() {
        // fast ui updates only
    }
}

Use Sendable. Do Not Keep This Information In Mind

The Sendable protocol is a feature added in Swift 5.5 that is used to mark code as safe to be passed across concurrency domains by copying. This means that it is safe to execute Sendable code concurrently.

Before that, you had to keep in mind which classes and closures are thread-safe and which are not. Now, you can explicitly state this by conforming to the Sendable protocol

final class FoodData: Sendable {
    // ...
    
    func addFood(foodFactory: @Sendable () -> Food) {
        // ...
    }
}

In the code above, we say that FoodData methods are safe to be called without synchronization. Also, foodFactory closure is marked with @Sendable which also means that it can be safely called from different concurrent contexts.

💥

Moreover, if you use Sendable, Swift automatically checks that your code is actually thread-safe. That's cool as you cannot introduce unsafe code by accident as your code will not compile.

You can take one step further and set the SWIFT_STRICT_CONCURRENCY build setting to complete. In this mode, the Swift compiler will not tolerate any thread-unsafe code it detects.

Do Not Ignore Nonisolated Keyword

Nonisolated methods do not mutate or access the actor's isolated state, therefore they do not require the actor's isolated execution. Use them to decompose actors' isolated methods into smaller methods. Actors' code must be readable too.

Continue Reading About Swift & iOS

Alex Dremov | iOS

One of my favourites. Here I write about Swift and iOS development. It is noticeable that I mainly focus on iOS development right now.

Alex Dremov

I Contributed to PyTorch. Here's What I Learned

Alex Dremov — Mon, 20 Mar 2023 19:23:35 +0300

The Issue Must Not Be That Bad

That's what I thought when I encountered a PyTorch problem during one of my college assignments. Jupyter kernel was dying because of some bug in the LSTM implementation for MPS.

💡

MPS (Metal Performance Shaders) is an acceleration backend for macOS that utilizes the GPU for computations

After a quick investigation, I discovered that this happens because of the batch_first flag. The MPS backend did not work correctly with it and crashed the entire kernel.

"Easy fix"
P.S. After that phrase, Alex spent the next two days fixing what looked like an "easy fix"

PR was merged pretty quickly. Thanks, PyTorch team, for that! And the story could've ended here, but I discovered a funny detail in MPS tests.

@unittest.skipIf(True, "Backward of lstm returns wrong result")
def test_lstm_2(self, device="mps", dtype=torch.float32)

And LSTM was really bad. It got a much worse score than when trained on CUDA or CPU.

It Was Bad. Really Bad

It turned out that LSTM on MPS was completely broken. The forward pass had a bug with the batch_first flag and hidden cell initialization.

The backward pass used the first layer's weights for the last layers, mixing up all gradients. It did not calculate gradients for hidden states. And my favourite: the backward function returned tensors initialized with garbage, screwing up all subsequent training. It was a mess that I kept investigating for several days.

Eventually, I fixed LSTM and its tests in a massive PR, ensuring that it is consistent with the CPU.

What I Learned

Big projects also have garbage code. Broken implementation lived in stable releases for almost a year, generating several related GitHub issues.
Contributing to a big project is fun and challenging. And it eventually helps a lot of developers, which keeps me warm during cold winter nights. Specifically, contributing to PyTorch is extremely simple. Thanks, PyTorch team, for arranging that!
Deploying untested code that looks right is extremely dangerous. I listed pretty severe mistakes that I found scrutinizing LSTM sources for several days. There's no way they could have been discovered without extensive testing. Even though the issues were severe, they were also subtle. The code looked right.

Finally

I was able to complete the college PyTorch assignment even though it required rewriting PyTorch's LSTM MPS implementation. Consider also solving open issues of your favourite framework or project. At the end of the day, it is a lot more fun than Leetcode problems.

See My Work

[MPS] Fix LSTM backward and forward pass by AlexRoar · Pull Request #95137 · pytorch/pytorch

Fixes #91694. Fixes #92615. Several transpositions were missing for backward graph in case of batch_first=True. The #91694 is not reproduced with batch_first=False. After fixing transpose issue, I fi...

[MPS] Fix bidirectional LSTM & small one-direction LSTM fix by AlexRoar · Pull Request #95563 · pytorch/pytorch

Fixes #94754. With this PR I hope to finish my breathtaking journey of fixing MPS LSTM. Here, I enable bidirectional on MPS. Also, I’ve noticed that cache key did not account for all parameters, so ...

[MPS] LSTM grad_y missing fix by AlexRoar · Pull Request #96601 · pytorch/pytorch

Fixes #96416. Added tests that do not use LSTM output similarly to the issue. Seems like this fix once again introduces backward incompatibility.

[MPS] LogSoftmax numerical stability by AlexRoar · Pull Request #95091 · pytorch/pytorch

Fixes #94043. Calculations are now consistent with numerically stable formula and CPU: $LogSoftmax(X, \dim) = X - \max(X, \dim) - \log(sum(X - \max(X, \dim), \dim))$ @malfet

Conquer Data Races with Swift Actors

Alex Dremov — Tue, 07 Feb 2023 22:08:18 +0300

Mobile development is close to impossible without concurrent code. While executing tasks concurrently generally speeds up your app, it also introduces a lot of challenges to overcome. And one of them is a data race.

Data Races And When They Happen

Try to find a problem in the code below

import Foundation

var counter = 0
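// Barriers only work on custom concurrent queues, not on global ones
let queue = DispatchQueue(label: "ConcurrentQueue", attributes: .concurrent)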

for _ in 1...100500 {
    queue.async {
        counter += 1
    }
}

queue.sync(flags: .barrier) {
    // Synchronous barrier to wait until all
    // async tasks are finished
    print("Final value: \(counter)")
}

This does not output 100500 as desired

Final value: 100490

Let me run the same code one more time.

Final value: 100486

Voilà

As you see, the same code produces different results. In this case, we deal with a data race.

💡

Data races occur when multiple threads access a shared resource without protections, leading to undefined behaviour

In the code above, asynchronous tasks capture counter and modify it simultaneously. This leads to undefined behaviour.

What's under the hood?

The reason behind such behaviour lies in the assembly operations. Before the value is incremented, it is loaded from RAM into a processor register. At the same time, other threads can increment the value and save it back to RAM. But the thread that loaded the value from memory into a register will not know about it and will continue to work with the old value, eventually overwriting the updated value in RAM.

Non-Actor Solutions

Before the introduction of actors, several solutions to the problem were used.

Serial Queue

We can create a dedicated queue that will be used during all accesses to the counter. Internally, tasks execute serially, so no data races occur.

import Foundation

var counter = 0
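// Barriers only work on custom concurrent queues, not on global ones
let queue = DispatchQueue(label: "ConcurrentQueue", attributes: .concurrent)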

// Serial queue
let counterAccessQueue = DispatchQueue(label: "CounterAccessQueue")

for _ in 1...100500 {
	queue.async {
		counterAccessQueue.sync { counter += 1 }
	}
}

queue.sync(flags: .barrier) {
	counterAccessQueue.sync { print("Final value: \(counter)") }
}

Concurrent Queue With Barrier

It's possible to use sync with the barrier flag to modify the value even in a concurrent queue. Basically, the barrier waits until all previous tasks are completed, then it executes the code synchronously, and after that the queue continues to operate as usual.

In the current example, it basically transforms concurrent queue to serial, but still, it's a different approach.

import Foundation

var counter = 0
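// Barriers only work on custom concurrent queues, not on global ones
let queue = DispatchQueue(label: "ConcurrentQueue", attributes: .concurrent)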

for _ in 1...100500 {
	queue.sync(flags: .barrier) {
		counter += 1
	}
}

queue.sync {
	print("Final value: \(counter)")
}

Actors Model

The actor model is an architecturally different approach. Consider actors as classes with additional restrictions. Ideologically, code inside actors cannot be executed concurrently, therefore actors can safely modify their state.

In the world of chaos (concurrent) consider actors as a safe space

Also, other instances cannot modify the actor's state from the outside, which ensures that all accesses are safe.

💥

All in all, actors let you safely share information between concurrent contexts

Using Actors in Swift

Luckily, we do not need to implement the actor model ourselves. Starting from Swift 5.5, actors are available as part of Swift concurrency.

Actors are defined with actor keyword.

actor Counter {
	private(set) var counter = 0
    
	func increment() {
		counter += 1
	}
}

💡

Like classes, actors are reference types

Generally, all access to actors may be suspended and require await keyword.

💥

If you're unfamiliar with Swift concurrency, check out my quick guide!

Quick Guide to Async Await in Swift | Alex Dremov

Everything you need to know about new Swift asynchronous features. Async await, main actor, task, async get, and possible use cases — all covered.

Now, according to the defined model, an actor represents an isolated state. Therefore, we cannot directly execute code inside the actor or change its state because some other task can already be changing the actor's state.

We want to mitigate data races!

let counter = Counter()
let queue = DispatchQueue.global()

// Used only to wait for all tasks to complete
let group = DispatchGroup()

for _ in 1...100500 {
    group.enter()
    
    queue.async {
        // async calls can be executed only in an
        // appropriate concurrent environment, so
        // we spawn a new task
        Task.detached {
            await counter.increment()
            group.leave()
        }
    }
}

group.wait()
Task {
    print("Final value: \(await counter.counter)")
}

As you can see, all calls to the methods of Counter, and even to its properties, are asynchronous and marked with the await keyword.

💡

Notice that await is not needed inside the actor's method. That's because the actor's methods are already inside an isolated state
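The same counting can also be written without DispatchQueue and DispatchGroup by using a task group from Swift concurrency. This is a minimal sketch and not the approach used above: runIncrements is a hypothetical helper name, and Counter is redefined so the snippet stands alone.

```swift
actor Counter {
    private(set) var counter = 0

    func increment() {
        counter += 1
    }
}

// Hypothetical helper: runs `times` concurrent increments and returns the total
func runIncrements(times: Int) async -> Int {
    let counter = Counter()

    // The group waits for all child tasks before the closure returns
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<times {
            group.addTask {
                await counter.increment()
            }
        }
    }

    return await counter.counter
}
```

Calling `await runIncrements(times: 1_000)` from an asynchronous context returns 1000, because the actor serializes every increment.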

Nonisolated Members

All members of actors are isolated by default. Actors can also have nonisolated members. Access to them is the same as if the actor were a regular class. Notice, though, that nonisolated methods cannot directly access isolated members.

😡

Stored non-constant properties cannot be nonisolated

💡

Constant properties ( let ) are nonisolated by default, as they cannot provoke a data race

import Foundation // UUID is declared in Foundation

actor Counter {
    let id = UUID()
    private(set) var counter: Int = 0
    
    private nonisolated var description: String {
        "Counter"
    }
    
    func increment() {
        counter += 1
    }
    
    nonisolated func getDescription() -> String {
        return description
    }
}

...

print(counter.getDescription()) // no await
print(counter.id) // no await

Difference to Locks

One may ask

How's it different from taking a lock before executing code inside an actor and releasing a lock on an exit?

The difference is noticeable if the actor itself runs asynchronous operations inside it. For example, if it messages another actor.

Take a look

actor Ping {
    let pong = Pong()
    
    func run() async {
        print("ping!")
        await pong.run() // Suspension point
        
        // While pong.run() is waited, other tasks
        // can enter this actor
    }
}

actor Pong {
    func run() async {
        try! await Task.sleep(for: .seconds(1)) // sleeping a bit
        print("pong!")
    }
}

let ping = Ping()
Task {
    await ping.run()
}

Task {
    await ping.run()
}

This code outputs

ping!
ping!
pong!
pong!

Notice that another actor is also called using await keyword. I marked this place as a suspension point. The current task is suspended while waiting for an asynchronous task, so the actor is free for entrance again.

That's the core difference from a simple mutex or lock, and it is called actor reentrancy. Some consider this a problem. However, it is an awesome optimization at the expense of complicating the code a bit.

💥

Mind actor reentrancy! It is incorrect to make assumptions about an actor's state after an await call inside an actor

actor Door {
    private var _open = false
    
    func open() async {
        _open = true
        
        await someTask() // Suspension point
        
        // Mistake! Door could have been closed
        // while someTask was executing
        print("Door is open")
    }
    
    func close() {
        _open = false
    }
}

Luckily, suspension points are all marked with await keyword, so it is easy to keep track of them
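One way to stay safe is to re-read the state after every suspension point instead of assuming it is unchanged. The sketch below is a hypothetical variant of Door: the open(during:) signature and the doorStaysOpen helper are assumptions made for illustration.

```swift
actor Door {
    private var isOpen = false

    // Returns whether the door is still open once the work has finished
    func open(during work: @Sendable () async -> Void) async -> Bool {
        isOpen = true

        await work() // Suspension point: other calls can enter the actor here

        // Re-read the state instead of assuming it is still true
        return isOpen
    }

    func close() {
        isOpen = false
    }
}

// Hypothetical helper: closes the door while open(during:) is suspended
func doorStaysOpen() async -> Bool {
    let door = Door()
    return await door.open {
        await door.close()
    }
}
```

Here doorStaysOpen() returns false: close() enters the actor while open(during:) is suspended, which is exactly the situation the warning above describes.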

Final Notes

Actors are a great solution to data races. They nicely integrate into Swift concurrency. Keep in mind, though, that actor reentrancy must be taken into account to avoid incorrect state assumptions.

References

Apple Developer Documentation

Concurrency — The Swift Programming Language (Swift 5.7)

Swift.org, Apple Inc.

Dive into Swift's Memory Management

Alex Dremov — Sun, 08 Jan 2023 22:33:03 +0300

In this post, I'll explore how Swift's memory management works under the hood and how the memory modifiers unowned and weak affect an object's lifetime. You'll get a deeper understanding of how Swift manages objects' lifetimes internally.

💥

Swift memory management is one of the basic interview questions. It was asked in every iOS developer interview I've ever been to

Memory Management

In C, for example, the developer alone is in charge of deallocating unused objects. This can lead to memory leaks, double deallocations, or the use of invalid memory areas.

We don't want this.

Swift uses automatic reference counting (ARC) under the hood to track objects' lifetimes and automatically deallocate unused objects. Swift keeps three different reference counts for each object. They track how many references to the object exist, and when the object is no longer needed, it is deallocated.

💡

This guide will progress from a general overview to the internals of ARC. Even if you're familiar with Swift's memory management, there's a high chance that you will learn something new

Strong Reference

The counter that is responsible for deallocation is a strong reference counter (RC). The strong RC counts strong references to the object. When the strong RC reaches zero the object is deinited.

A strong reference is just a regular object usage. Creating a variable, or a constant, or saving a reference to an object in another object's property — they all create a strong reference.

Why should a developer even care about reference counting? It seems like a low-level implementation detail that is not important. But actually, it's crucial.

Take a look at this example

class Person {
    let name: String
    init(name: String) { self.name = name }
    var apartment: Apartment?
    deinit { print("\(name) is being deinitialized") }
}

class Apartment {
    let unit: String
    init(unit: String) { self.unit = unit }
    var tenant: Person?
    deinit { print("Apartment \(unit) is being deinitialized") }
}

var john: Person? = Person(name: "John Appleseed")
var unit4A: Apartment? = Apartment(unit: "4A")

john!.apartment = unit4A // Person -> Apartment: strong reference
unit4A!.tenant = john // Apartment -> Person: strong reference

john = nil // Person is no longer needed
unit4A = nil // Apartment is no longer needed

In the above example, the Person and Apartment objects have a strong reference to each other, creating a retain cycle. As a result, when you set both john and unit4A to nil, the deinitializers are not called and the objects are not deallocated.

💥

This situation is called a memory leak. In regular Swift code, it typically occurs because of a retain cycle: two objects hold strong references to each other, so neither is ever deallocated.

That's where memory management modifiers come in handy.

Weak Reference

One of the solutions to the problem of a retain cycle is a weak reference. It is created using the weak modifier like this:

let person = Person(name: "John Appleseed") // person is a strong reference
weak var weakPerson = person // weak reference to the same object

A weak variable always has an optional type and cannot be a constant (let). That's because the object can be deallocated while it is still referenced by a weak variable. In this case, the variable is automatically set to nil.

💡

Think of a weak reference as one that uses an object but can carry on correctly without it (using nil), allowing the object to be deallocated when nobody else needs it
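The automatic reset to nil can be observed directly. A minimal sketch, with a simplified Person class and a hypothetical helper function weakReferenceIsCleared:

```swift
final class Person {
    let name: String
    init(name: String) { self.name = name }
}

// Hypothetical helper: reports whether the weak reference was reset to nil
func weakReferenceIsCleared() -> Bool {
    var strong: Person? = Person(name: "John Appleseed")
    weak var weakPerson = strong // Does not keep the object alive

    strong = nil // The last strong reference is gone, the object is deinitialized

    // ARC has already set the weak variable to nil
    return weakPerson == nil
}

print(weakReferenceIsCleared())
```

The function returns true: once the only strong reference is released, reading the weak variable yields nil rather than a dangling pointer.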

Let's take a look at the solution to the problem above using the weak modifier:

class Person {
    let name: String
    init(name: String) { self.name = name }
    var apartment: Apartment?
    deinit { print("\(name) is being deinitialized") }
}

class Apartment {
    let unit: String
    init(unit: String) { self.unit = unit }
    
    weak var tenant: Person?
    
    deinit { print("Apartment \(unit) is being deinitialized") }
}

var john: Person? = Person(name: "John Appleseed")
var unit4A: Apartment? = Apartment(unit: "4A")

john!.apartment = unit4A // Person -> Apartment: strong reference
unit4A!.tenant = john // Apartment -> Person: weak reference

john = nil
unit4A = nil

Now the retain cycle is gone. First, the Person object is deallocated because there are no strong references to it. Then, the Apartment object is deallocated.

No memory leak!

That's it. That is how you break retain cycles in Swift. There is one more modifier that can help you with that.

Unowned Reference

An unowned reference is very similar to a weak reference because it also does not increase the strong reference count. The difference is that it's up to the developer not to use an invalid object.

Unowned variables can be constants and can be non-optional. When an object is deallocated, ARC does not set the unowned reference's value to nil. However, if you try to access a deallocated object, you will trigger a runtime error.

😡

Use an unowned reference only when you are sure that the reference always refers to an instance that has not been deallocated

Here's a similar example:

class Customer {
    let name: String
    var card: CreditCard?
    init(name: String) {
        self.name = name
    }
    deinit { print("\(name) is being deinitialized") }
}

class CreditCard {
    let number: UInt64
    unowned let customer: Customer
    init(number: UInt64, customer: Customer) {
        self.number = number
        self.customer = customer
    }
    deinit { print("Card #\(number) is being deinitialized") }
}

var john: Customer? = Customer(name: "John Appleseed")
john!.card = CreditCard(number: 1234_5678_9012_3456, customer: john!)

john = nil // No retain cycle, both objects are deallocated


Three Reference Counters

So, how does all this magic work inside? The Swift sources have an amazingly detailed description of all the processes under the hood.

The strong RC counts strong references to the object. When the strong RC reaches zero the object is deinited, unowned reference reads become errors, and weak reference reads become nil. The strong RC is stored as an extra count: when the physical field is 0 the logical value is 1.

The unowned RC counts unowned references to the object. The unowned RC also has an extra +1 on behalf of the strong references; this +1 is decremented after deinit completes. When the unowned RC reaches zero the object's allocation is freed.

The weak RC counts weak references to the object. The weak RC also has an extra +1 on behalf of the unowned references; this +1 is decremented after the object's allocation is freed. When the weak RC reaches zero the object's side table entry is freed.

But what is a side table and why is it needed?

💥

What a side table is makes for another popular interview question, usually a more advanced one

Side Table

An object conceptually has three refcounts. These refcounts are stored either "inline" or in a "side table entry" pointed to by the internal field. You cannot access these fields from Swift directly.

class User {
    var id: Int
    var name: String
    
    init(id: Int, name: String) {
        self.id = id
        self.name = name
    }
}

let user = User(id: 0, name: "John")

💡

Remember that unowned has +1 on behalf of strong reference and weak has +1 on behalf of unowned references

Objects initially start with no side table. They can gain a side table when a weak reference is formed.

Gaining a side table entry is a one-way operation; an object with a side table entry never loses it. This prevents some thread races.

weak var weakUser = user // Side table implicitly created

A side table is created

Strong and unowned variables point at the object. Weak variables point at the object's side table.

This idea is fundamental to understanding how weak references work. Because weak references point to the side table rather than to the object, the object itself can be deinitialized and fully deallocated while weak references to it still exist.

Weak and Unowned. Deep Differences

Now, by looking at the implementation we can notice important differences between weak and unowned.

Performance

Using unowned introduces less overhead than using weak. That's because weak variables reference the object through a side table. This means that there's one more pointer hop to reach the object.

Unowned references point directly to the object, so they do not have such overhead.

Deallocation vs deinitialization

According to the sources, when the strong RC reaches zero the object is deinited. And when the unowned RC reaches zero the object's allocation is freed.

That means that the object's memory is not available for reallocation until all unowned references disappear.

💥

If an object holds a large amount of memory, that memory will not be available until the last unowned reference disappears.

If lack of memory is a problem, consider using a weak reference because it allows objects to be fully deallocated even when there are live weak references.

Common Problems

The Person and Apartment retain cycle may seem trivial. It's important to know the common cases in which a retain cycle appears.

Closures, strong capture, and self

By default, a closure expression captures constants and variables from its surrounding scope with strong references to those values.

As we've already noted, uncontrollable strong references may create a retain cycle. An escaping closure that refers to self needs special consideration if self refers to an instance of a class. Capturing self in an escaping closure makes it easy to accidentally create a strong reference cycle.

For example:

class Person {
  var name: String
  var voice: Voice? = nil

  init(name: String) {
    self.name = name
    self.voice = Voice {
      print("My name is \(self.name)")
    }
  }
  func say() { voice?.say() }
  deinit {
    print("Person deallocated")
  }
}

class Voice {
  var say: () -> ()
  init(say: @escaping () -> ()) { self.say = say }
  deinit {
    print("Voice deallocated")
  }
}

var person: Person? = Person(name: "Alex")
person!.say()

person = nil

This prints only one line, with no deinit messages:

My name is Alex

What's going on here? Let's draw a strong references graph:

Retain cycle with closure

And, as expected, there is a pretty notable strong reference cycle. The problem is in the creation of the Voice instance:

self.voice = Voice {
	print("My name is \(self.name)")
}

Here, self is captured by the escaping closure with a strong reference. To solve that, we can capture self with the weak modifier:

self.voice = Voice { [weak self] in
    guard let self = self else { return }
    print("My name is \(self.name)")
}

With such modification, we receive an expected output:

My name is Alex
Person deallocated
Voice deallocated

💥

Do not use weak self when it is not needed. Remember that a strong reference is what keeps the object from being deallocated before it is needed.
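A typical case where weak self is unnecessary is a non-escaping closure, such as the one passed to map: it runs and finishes before the enclosing method returns, so it cannot outlive self or form a cycle. A minimal sketch with a hypothetical Greeter class:

```swift
final class Greeter {
    let greeting = "Hello"

    func greetAll(_ names: [String]) -> [String] {
        // map takes a non-escaping closure, so self is used only
        // for the duration of this call and no capture list is needed
        names.map { "\(greeting), \($0)!" }
    }
}

let greeter = Greeter()
print(greeter.greetAll(["Alex", "John"]))
```

Adding [weak self] here would only introduce optional handling without preventing any leak.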

Final notes

If you want to achieve an even deeper understanding of ARC internals, definitely check the ARC source code. You can start with this amazing description of an object's lifetime state machine.

swift/RefCount.h at 3bac57d9ac20eb9a6e41fd3c32e8d6fb23e37a47 · apple/swift

The Swift Programming Language. Contribute to apple/swift development by creating an account on GitHub.

GitHub, apple

Hope that this post was helpful to you. Feel free to leave a comment or to reach me through my social nets!

References

swift/RefCount.h at main · apple/swift

The Swift Programming Language. Contribute to apple/swift development by creating an account on GitHub.

GitHub, apple

Memory Management in Swift: Understanding Strong, Weak and Unowned References

Behind all the coding that we are doing, you probably have noticed some of your variables with the reference of strong, weak or unowned…

AppCoda Tutorials, AppCoda

Automatic Reference Counting — The Swift Programming Language (Swift 5.7)

Swift.org, Apple Inc.

Expressions — The Swift Programming Language (Swift 5.7)

Swift.org, Apple Inc.

Data Binding in SwiftUI: Tips, Tricks, and Best Practices

Alex Dremov — Fri, 30 Dec 2022 16:16:07 +0300

Are you building an app with SwiftUI and wondering how to manage your app's state? Data binding is a powerful tool that can help you build dynamic and responsive interfaces.

In this tutorial, we'll explore how to use @State, @ObservedObject, and @EnvironmentObject.

What is data binding in SwiftUI?

Data binding connects a UI element to a piece of data in your app. When the data changes, the UI element automatically updates to reflect the new value, and when the user interacts with the element, the data updates to reflect the new input.

SwiftUI provides several tools for data binding: @State, @ObservedObject, and @EnvironmentObject. These tools allow you to bind values, objects, and even global objects to your user interface.

How to use @State to bind a simple value to your user interface

@State is a property wrapper that allows you to bind a simple value, like a string or an integer, to your user interface.

💥

Strictly speaking, @State tracks changes of value types only. So, any struct can also be bound using @State.

To use @State, you first define a property with the @State wrapper, and then use the property in your user interface as usual. For example, here's how you might use @State to bind a string to a text field:

struct ContentView: View {
    @State private var name: String = ""
    
    var body: some View {
        VStack {
            TextField("Enter your name", text: $name)
            Text("Hello, \(name)!")
        }
    }
}

You may notice that $name is used. The $ prefix accesses the projectedValue of the wrapper. In the case of @State, it is a Binding<String>.

Now, whenever name changes, the UI updates automatically. And when the user modifies the text field, the name variable is updated too.

Using @Binding

@Binding is used when you want to bind a value or object that is owned by a different view.

To use @Binding, you first define a property with the @Binding wrapper, and then pass the binding to another view as an argument. The other view can then use the binding to read and write the data from the original view.

struct CustomTextField: View {
    @Binding var text: String
    
    var body: some View {
        HStack {
            Image(systemName: "person.circle")
            TextField("Enter your name", text: $text)
        }
        .padding()
    }
}

struct ContentView: View {
    @State private var name: String = ""
    
    var body: some View {
        VStack {
            CustomTextField(text: $name)
            Text("Hello, \(name)!")
        }
    }
}

You can also pass the binding in init by accessing the property wrapper directly through the underscore-prefixed name.

struct CustomTextField: View {
    @Binding var text: String
    
    init(text: Binding<String>) {
        self._text = text
    }
    
    var body: some View {
        HStack {
            Image(systemName: "person.circle")
            TextField("Enter your name", text: $text)
        }
        .padding()
    }
}

💥

You can view @Binding as a channel that gets a value from the source and sets a value on the source. It does not own the object.

Therefore, @Binding is great for view decomposition, as it allows you to inject dependencies into subviews.

Read more about modular app architecture with SwiftUI in my previous post:

iOS App As a Microservice. Using SwiftUI in Modular App

The modular architecture is excellent. But how to implement it effectively with SwiftUI? From its core, SwiftUI is state-driven, and it can be tricky to modularize an app and define exact responsibility borders.

Alex Dremov

How to use @ObservedObject to bind a class to your user interface

@ObservedObject allows you to bind a class to your user interface. The class must conform to the ObservableObject protocol and use the @Published property wrapper for any properties that you want to bind to your user interface. When the object's @Published properties change, the user interface updates.

Here's an example of how you might use @ObservedObject to bind a User object to a form:

class User: ObservableObject {
    @Published var name: String = ""
    @Published var email: String = ""
    
    var someUntrackedValue = ""
}

struct ContentView: View {
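    // Created inline for brevity. @ObservedObject does not own its object,
    // so for view-owned state prefer @StateObject (iOS 14+)
    @ObservedObject private var user = User()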
    
    var body: some View {
        VStack {
            TextField("Enter your name", text: $user.name)
            TextField("Enter your email", text: $user.email)
            Text("Hello, \(user.name)!")
        }
    }
}

In this example, the user property is bound to the text fields using the $user.name and $user.email syntax. When the user types in the text fields, the name and email properties of the User object update to reflect the new input, and the Text view updates to show the new value.

💥

Mind that if you publish a reference type in an ObservableObject, changes inside that object will not be propagated.
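One possible workaround is to forward the nested object's change notifications manually with Combine. This is only a sketch under assumptions: Account and Profile are hypothetical classes, and the nested object is stored as a constant so the subscription never has to be re-created.

```swift
import Combine

// Hypothetical nested reference type
final class Profile: ObservableObject {
    @Published var nickname = ""
}

// Hypothetical outer object that re-publishes nested changes
final class Account: ObservableObject {
    @Published var name = ""
    let profile = Profile()

    private var cancellables = Set<AnyCancellable>()

    init() {
        // Whenever the nested object is about to change,
        // notify this object's subscribers as well
        profile.objectWillChange
            .sink { [weak self] _ in
                self?.objectWillChange.send()
            }
            .store(in: &cancellables)
    }
}
```

With this forwarding in place, a view observing Account is also refreshed when account.profile.nickname changes.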

How to use @EnvironmentObject to bind a global object to your user interface

@EnvironmentObject allows you to bind a global object. The object must conform to the ObservableObject protocol, the same way as with @ObservedObject.
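A minimal sketch of how this might look. AppSettings, GreetingView, and RootView are hypothetical names, and @StateObject (iOS 14+) is used here only to own the shared object.

```swift
import SwiftUI

// Hypothetical shared object
final class AppSettings: ObservableObject {
    @Published var username = "Guest"
}

struct GreetingView: View {
    // Looked up in the environment at runtime;
    // the app crashes if no AppSettings was injected
    @EnvironmentObject private var settings: AppSettings

    var body: some View {
        Text("Hello, \(settings.username)!")
    }
}

struct RootView: View {
    @StateObject private var settings = AppSettings()

    var body: some View {
        // Every view below this point can read AppSettings
        GreetingView()
            .environmentObject(settings)
    }
}
```

The object is injected once with environmentObject(_:) and is then available to the whole subtree without being passed through each initializer.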

iOS App As a Microservice. Using SwiftUI in Modular App

Alex Dremov — Wed, 19 Oct 2022 16:00:23 +0300

In this post, I will describe features of SwiftUI that work well in modular design and those that are better to avoid.

💥

This is the third and the last post in the series on a modular architecture. Check out the previous issues to boost your understanding of critical concepts!

iOS App As a Microservice. Build Robust App Architecture

What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized?

Alex Dremov

iOS App As a Microservice. Modularize Your App With Tuist

This is the second article in a series on modular app architecture. In this post, I will cover implementation details using Tuist

Alex Dremov

What's The Problem

Why is using SwiftUI in a modular design different, and why do I need a whole new post for it? As I already mentioned, SwiftUI is state-driven, and trying to avoid that leads to ineffective and messy solutions.

For example

Let's suppose that you have settings and homepage modules. Users can log out on the settings screen and your app needs to handle this case correctly. The first instinct is to pass a closure to the settings module that will be called on the logout button press. Sounds reasonable, right?

Ok, but how does it connect with SwiftUI? Notice that handling action does not necessarily mean that there will be a change in state. There is a logical change, though. But how can SwiftUI know about that?

💡

State-driven means that views are a function of the state. So, the only way to update the view is to change its state.

Data Flow

Apple gave a nice presentation at WWDC19 about the role of data in SwiftUI. The presentation covers cases where @Binding, @EnvironmentObject, etc. are the most applicable.

Apple WWDC19 — Swift Data Flow

But also the crucial point is made — the view is not the result of a sequence of events, but rather a representation of data or state. It's also essential where this data comes from. There should be a single source of truth.

Data Flow Through SwiftUI - WWDC19 - Videos - Apple Developer

SwiftUI was built from the ground up to let you write beautiful and correct user interfaces free of inconsistencies. Learn how to connect...

Apple Developer

Keeping this in mind, let's move on to the first tip that will solve the issue proposed in the "problem" section of this article.

Use Data Flows and Not Callbacks

The problem with handling the logout action is in the word handle itself. There is no explicit change in state and it's unknown who's responsible for changing the state if it is even defined.

So, if SwiftUI is state-driven, let's define the source of truth for this state. It must be a variable that stores the current logged-in / logged-out state. Depending on the state's complexity, it can be a bool, enum, or struct.
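As an illustration of the enum option, here is a minimal sketch. SessionState and greeting(for:) are hypothetical names that are not used elsewhere in this post.

```swift
// Hypothetical source-of-truth type for the logged-in / logged-out state
enum SessionState: Equatable {
    case loggedOut
    case loggedIn(userName: String)
}

// What the UI shows is derived purely from the current state
func greeting(for state: SessionState) -> String {
    switch state {
    case .loggedOut:
        return "Please log in"
    case .loggedIn(let userName):
        return "Welcome back, \(userName)"
    }
}

var sessionState = SessionState.loggedIn(userName: "Alex")
print(greeting(for: sessionState))

// Logging out is just a state change; the presentation follows from it
sessionState = .loggedOut
print(greeting(for: sessionState))
```

An enum keeps related data, such as the user name, attached to the case where it is valid, which a plain bool cannot do.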

Singleton or global state? No.

💥

As described in previous posts, dependencies should be explicit. In this case, the logged-in / logged-out variable should be passed as a dependency to the settings module and to the homepage module.

But we need to listen for changes in this variable and update views accordingly. Also, it's bad if every module can change this variable. There should be restrictions on which module can modify state and which can only read.

SwiftUI + Combine. It's a Match

You may already know that SwiftUI automatically listens for ObservableObject changes and updates views when something is changed. So, we can create such a class:

class LogInState: ObservableObject {
    @Published var isLoggedIn: Bool
    
    init(isLoggedIn: Bool) {
        self.isLoggedIn = isLoggedIn
    }
    
    func loggedOut() {
        isLoggedIn = false
    }
    
    func loggedIn() {
        isLoggedIn = true
    }
}

It can later be injected into a SwiftUI view as simply as this:

struct MyView: View {
    @ObservedObject var logInState: LogInState

    var body: some View {
        Text(logInState.isLoggedIn ? "Yes" : "No")
    }
}

...
let logInState = LogInState(isLoggedIn: true)
HomePageModule(logInState: logInState)
...
SettingsModule(logInState: logInState)

Don't you think that creating such a distinct class for every state is bad? It may be fine for complex data types, but definitely not for a single boolean value.

Also, notice that both HomePageModule and SettingsModule can change the state. What if you have many more modules that depend on logInState? They all could change it!

💥

If every part of your app can hypothetically change the shared state, then if a bug arises, you start playing an amazing game: "Who the hell changed this value?"

Better Combine Use

Ok, we've solved the problem with callbacks. Though we still have a problem with the boilerplate code needed to define a new ObservableObject, and a problem with state modification privileges.

We can solve those by creating a custom ObservableObject!

💡

You also can use third-party reactive frameworks, but I will cover implementation using Combine as it seamlessly integrates with SwiftUI

To use SwiftUI's automatic listening to updates, we need to conform to ObservableObject. Here's a generic class to make any type observable. It also utilizes @propertyWrapper and @dynamicMemberLookup features.

import Foundation
import Combine

@dynamicMemberLookup
@propertyWrapper
public class ObservableProperty<Output>: ObservableObject {
    @Published private var storedValue: Output
    
    public var wrappedValue: Output {
        get {
            storedValue
        }
        set {
            storedValue = newValue
        }
    }
    
    public init(wrappedValue initialValue: Output) {
        self.storedValue = initialValue
    }
    
    public subscript<Result>(dynamicMember keyPath: WritableKeyPath<Output, Result>) -> Result {
        get {
            storedValue[keyPath: keyPath]
        }
        set {
            storedValue[keyPath: keyPath] = newValue
        }
    }
    
    public subscript<Result>(dynamicMember keyPath: KeyPath<Output, Result>) -> Result {
        storedValue[keyPath: keyPath]
    }
}

It can be used as simply as that

struct MyView: View {
    @ObservedObject
    @ObservableProperty
    var logInState: Bool
    
    init(logInState: ObservableProperty<Bool>) {
        self._logInState = .init(initialValue: logInState)
    }
    
    var body: some View {
        VStack {
            Text(logInState ? "Yes" : "No")
            Button("toggle") {
                logInState = !logInState
            }
        }
    }
}

💥

However, ObservableProperty works with value types only. Passing reference types will not trigger updates.

Restrict Modules To Read-Only Variables

In the example above, MyView can modify the value. But how can we restrict it to read-only mode? We can create a similar class that prohibits modification:

@dynamicMemberLookup
@propertyWrapper
public class ObservableValue<Output>: ObservableObject {
    @Published private var storedValue: Output
    public var wrappedValue: Output {
        storedValue
    }

    public var value: Output {
        storedValue
    }

    public init(wrappedValue initialValue: Output) {
        fatalError("ObservableValue cannot be initialized with value. Use constant()")
    }

    // Mirrors every value emitted by the publisher into storedValue
    init<Pub: Publisher>(initialValue: Output, publisher: Pub)
    where Pub.Output == Output, Pub.Failure == Never {
        storedValue = initialValue
        publisher.assign(to: &$storedValue)
    }

    // Only the read-only subscript is kept: a WritableKeyPath setter
    // would let callers modify the value this class is meant to protect
    public subscript<Result>(dynamicMember keyPath: KeyPath<Output, Result>) -> Result {
        storedValue[keyPath: keyPath]
    }

    public static func constant(initialValue: Output) -> ObservableValue<Output> {
        .init(
            initialValue: initialValue,
            publisher: Empty<Output, Never>()
        )
    }

    public var publisher: Published<Output>.Publisher {
        $storedValue
    }
}

Then, we can add projectedValue to ObservableProperty to create ObservableValue from it.

public class ObservableProperty<Output>: ObservableObject {
    ...
    public var publisher: AnyPublisher<Output, Never> {
        $storedValue.eraseToAnyPublisher()
    }
    
    public var projectedValue: ObservableValue<Output> {
        ObservableValue(
            initialValue: storedValue,
            publisher: publisher
        )
    }
    ...
}

Great!

Now we can create an observable source of truth, and pass it to modules, restricting some of them to read-only mode. Check out the example:

struct ReadOnlyModule: View {
    @ObservedObject
    @ObservableValue
    var logInState: Bool
    
    init(logInState: ObservableValue<Bool>) {
        self._logInState = .init(wrappedValue: logInState)
    }
    
    var body: some View {
        Text(logInState ? "Yes" : "No")
    }
}

struct ModifyModule: View {
    @ObservableProperty
    var logInState: Bool
    
    init(logInState: ObservableProperty<Bool>) {
        self._logInState = logInState
    }
    
    var body: some View {
        Button("toggle") {
            logInState = !logInState
        }
    }
}

struct MyView: View {
    @ObservableProperty
    var logInState: Bool
    
    init(logInState: ObservableProperty<Bool>) {
        self._logInState = logInState
    }
    
    var body: some View {
        VStack {
            // projected read-only value (ObservableValue)
            ReadOnlyModule(logInState: $logInState)
            
            // ObservableProperty reference
            ModifyModule(logInState: _logInState)
        }
    }
}

So, the callbacks problem is solved and we can move on to the next idea.

Do Not Use EnvironmentObjects

Yes, I'm that definite about it. Environment objects are, at their core, global variables that create implicit dependencies. Also, they are easily overlooked and can produce unexpected crashes when not set.

Apart from that, you can't set two environment objects of the same type and it results in messy decisions and code modifications.

And the third reason is that they simply don't work with dependency inversion. You cannot hide the environment object behind the protocol as only ObservableObject can be passed as an environment object.
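For contrast, an explicitly injected dependency can sit behind a protocol. The sketch below is one possible shape, not the approach developed earlier in this post; SessionProviding, LiveSession, and StatusView are hypothetical names.

```swift
import SwiftUI

// Hypothetical abstraction: a module depends on this protocol only
protocol SessionProviding: ObservableObject {
    var isLoggedIn: Bool { get }
}

// Hypothetical concrete implementation, owned by the composition root
final class LiveSession: SessionProviding {
    @Published private(set) var isLoggedIn = true

    func logOut() {
        isLoggedIn = false
    }
}

// The view is generic over the protocol, so the dependency is
// visible in its initializer and can be replaced in tests or previews
struct StatusView<Session: SessionProviding>: View {
    @ObservedObject var session: Session

    var body: some View {
        Text(session.isLoggedIn ? "Yes" : "No")
    }
}
```

StatusView(session: LiveSession()) compiles against the protocol alone, which is what an @EnvironmentObject property cannot express because it requires a concrete ObservableObject type.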

SwiftUI is introducing ways to implement programmatic navigation, but they are not ready yet. Programmatic navigation is essential for modular architecture, though, because it enables loose coupling.

There are frameworks that can be used to achieve that. I have a post on this topic. Check it out!

SwiftUI Navigation Is a Mess. Here’s What You Can Do

Managing navigation in pure SwiftUI is hard and leads to messy solutions. In this post, I will show you how you can manage views effectively

Alex Dremov

Alternatively, you can use other open-source solutions. For example, I recently found a similar framework:

GitHub - johnpatrickmorgan/FlowStacks: FlowStacks allows you to hoist SwiftUI navigation and presentation state into a Coordinator

FlowStacks allows you to hoist SwiftUI navigation and presentation state into a Coordinator - GitHub - johnpatrickmorgan/FlowStacks: FlowStacks allows you to hoist SwiftUI navigation and presentati...

GitHub, johnpatrickmorgan

As always, let me know what you think in the comments!

References

Data Flow Through SwiftUI - WWDC19 - Videos - Apple Developer

SwiftUI was built from the ground up to let you write beautiful and correct user interfaces free of inconsistencies. Learn how to connect...

Apple Developer

Apple Developer Documentation

iOS App As a Microservice. Modularize Your App With Tuist

Alex Dremov — Fri, 07 Oct 2022 13:14:04 +0300

Tuist is an excellent command line tool that helps you generate, maintain and interact with Xcode projects.

💥

I covered the core ideas of modular architecture in the previous post. Check it out if you haven't yet!

iOS App As a Microservice. Build Robust App Architecture

What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized?

Alex Dremov

What’s next?

In the next and last post in this series, I will cover implementation tips with SwiftUI. Subscribe so you don’t miss it
UPD: now available

iOS App As a Microservice. Using SwiftUI in Modular App

Alex Dremov

Why Tuist?

It encourages you to further code modularization as it provides an elegant way to create separate Xcode projects for different modules, making tight coupling or implicit dependencies less viable

Also, it's great for teamwork. Have you tried to commit an Xcode project to a VCS like GitHub?

It's a mess

The diff of a modified Xcode project is not human-readable. It's simply impossible to trace changes or review a PR. What if you could define the Xcode project in a simple config file? Tuist does that. Moreover, Tuist config files are written in Swift.

Our goal

We want to divide our project into separate Xcode projects according to the architecture I proposed in the previous article.

To reiterate, our app will consist of a combination of modules and for every module or feature, we will create a new Tuist project.

💡

Remember that each feature should not depend on other features' implementation. Only interfaces should be public

So, for each feature, we will create several targets corresponding to the feature interface, implementation, and testing or mocking targets if required.

Defining project

💥

Sources for this post are published on GitHub. So, before reading this article you can see how elegant describing a project could be when using Tuist

GitHub - AlexRoar/TuistExample: Using Tuist for modular app architecture

Using Tuist for modular app architecture. Contribute to AlexRoar/TuistExample development by creating an account on GitHub.

GitHubAlexRoar

Structure

A Tuist project is a simple folder with config files describing your workspace structure

Your project root
├── Workspace.swift
├── Tuist
│   ├── Config.swift
│   ├── Dependencies.swift
│   └── ProjectDescriptionHelpers
│       └── 
└── modules
    ├── Foo
    │   ├── Project.swift
    │   └── 
    ├── Biz
    │   ├── Project.swift
    │   └── 
    └── ...

But as I said earlier, each module should have at least an implementation and an interface target

💡

There could be modules that contain common tools and that are not dependent on any other module. Then, it might have implementation only

So, let's modify the structure according to that

Your project root
├── Workspace.swift
├── Tuist
│   ├── Config.swift
│   ├── Dependencies.swift
│   └── ProjectDescriptionHelpers
│       └── 
└── modules
    ├── Foo
    │   ├── Project.swift
    │   ├── interface
    │   │   └── 
    │   └── src
    │       └── 
    ├── Biz
    │   ├── Project.swift
    │   ├── interface
    │   │   └── 
    │   └── src
    │       └── 
    └── ...

Before defining modules, we need to tell Tuist where to search for them. This can be done in the Workspace.swift file

import ProjectDescription

let workspace = Workspace(
    name: "ExampleWorkspace",
    projects: [
        "modules/*"
    ]
)

Subscribe and don't miss posts!

Project file

Tuist defines the Xcode project with a simple Swift file.

// Project.swift
import ProjectDescription
import ProjectDescriptionHelpers

let project = Project(
  name: "ProjectName",
  targets: [
  	...
  ]
)

But this post is not just a review of Tuist

Let's define a project, knowing that we need interface and implementation targets. Also, let's create an enum for feature names so that we don't have to use strings and remember all the names

💡

As config is defined in Swift, you can use the power of suggestions and auto-completion in Xcode while defining your project structure.

For example, Xcode will suggest other modules' names when using enums

With several simple helpers, we could define project structure with Swift's beauty:

import ProjectDescription
import ProjectDescriptionHelpers

let project = Project(
    name: Feature.Foo.rawValue,
    targets: [
        .feature(
            implementation: .Foo,
            dependencies: [
                .feature(interface: .Biz),
                .external(.AsyncAlgorithms)
            ]
        ),
        .feature(
            interface: .Foo,
            dependencies: [
                .feature(interface: .Biz)
            ]
        )
    ]
)

Features are going to be separate frameworks.

💡

All Swift files that help to describe Tuist configs should be placed in the ProjectDescriptionHelpers folder

public extension Target {
    static func makeFramework(
        name: String,
        sources: ProjectDescription.SourceFilesList,
        dependencies: [ProjectDescription.TargetDependency] = [],
        resources: ProjectDescription.ResourceFileElements? = []
    ) -> Target {
        Target(
            name: name,
            platform: .iOS,
            product: defaultPackageType,
            bundleId: makeBundleID(with: name + ".framework"),
            sources: sources,
            resources: resources,
            dependencies: dependencies
        )
    }
}
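The helper above relies on a few pieces that are not shown in the post: the Feature enum and makeBundleID(with:). A minimal sketch of how they could look is below. The enum cases match the modules used in the examples, while the bundle prefix and the exact shape of the function are my assumptions for illustration, not the code from the example repository.

```swift
/// Every module of the workspace, so that targets are referenced
/// by enum case instead of raw strings.
public enum Feature: String, CaseIterable {
    case Foo
    case Biz
}

/// Hypothetical organization prefix; replace it with your own.
let bundlePrefix = "com.example.app"

/// Builds a bundle identifier such as "com.example.app.Foo.framework".
public func makeBundleID(with suffix: String) -> String {
    "\(bundlePrefix).\(suffix)"
}

print(makeBundleID(with: Feature.Foo.rawValue + ".framework"))
```

Keeping the prefix in one place means every generated target gets a consistent identifier, and a typo in a module name becomes a compiler error.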

Then, we can define what feature is

public extension Target {
    static func feature(
        interface featureName: Feature,
        dependencies: [ProjectDescription.TargetDependency] = [],
        resources: ProjectDescription.ResourceFileElements? = []
    ) -> Target {
        .makeFramework(
            name: featureName.rawValue + "Interface",
            sources: [ "interface/**" ],
            dependencies: dependencies,
            resources: resources
        )
    }
    
    static func feature(
        implementation featureName: Feature,
        dependencies: [ProjectDescription.TargetDependency] = [],
        resources: ProjectDescription.ResourceFileElements? = []
    ) -> Target {
        .makeFramework(
            name: featureName.rawValue,
            sources: [ "src/**" ],
            dependencies: dependencies,
            resources: resources
        )
    }
}

Finally, we combine modules in an app target. It's defined in the same way

public extension Target {
    static func makeApp(
        name: String,
        sources: ProjectDescription.SourceFilesList,
        dependencies: [ProjectDescription.TargetDependency]
    ) -> Target {
        Target(
            name: name,
            platform: .iOS,
            product: .app,
            bundleId: makeBundleID(with: "app"),
            deploymentTarget: .iOS(targetVersion: "16.0", devices: .iphone),
            sources: sources,
            dependencies: dependencies
        )
    }
}

let project = Project(
    name: "ExampleApp",
    targets: [
        .makeApp(
            name: "ExampleApp",
            sources: [
                "src/**"
            ],
            dependencies: [
                .common,
                .feature(implementation: .Foo),
                .feature(interface: .Foo),

                .feature(implementation: .Biz),
                .feature(interface: .Biz),

                .external(.FoggyColors)
            ]
        )
    ]
)

That's it.

Now we can create different features and state dependencies between them. After that, we simply run the tuist generate command, and it generates the Xcode workspace and Xcode projects for us.

Tuist-generated workspace

Great!

Now we have our project bootstrapped, and it is fully defined in nice Swift files with a clean structure and explicit dependencies. You can add all .xcodeproj and .xcworkspace to gitignore and forget about a mess in GitHub repositories.

💥

Some details are not covered for the brevity of this post. The full example is published on GitHub and do not hesitate to ask me about anything in the comments!

GitHub - AlexRoar/TuistExample: Using Tuist for modular app architecture

Using Tuist for modular app architecture. Contribute to AlexRoar/TuistExample development by creating an account on GitHub.

GitHubAlexRoar

Creating an app with Tuist

I already showed how to define project structure in the examples above. Let's get even more specific and write a simple app that will show a random value in a range.

App Architecture

RandomProvider defines a protocol for generating a random number and several implementations for it

// Interface
public protocol NumberProvider {
    var number: Int { get }
}

// Implementation
public struct NumberProviderZero: NumberProvider {
    public let number = 0
    
    public init() {
        
    }
}

public struct NumberProviderRandom: NumberProvider {
    private let range: ClosedRange<Int>
    
    public var number: Int {
        Int.random(in: range)
    }
    
    public init(range: ClosedRange<Int>) {
        self.range = range
    }
}

RandomScreen defines several UI screens to display a random number and re-generate it. Notice that it depends only on RandomProviderInterface and not on RandomProvider, which is the implementation

public struct RandomScreenSimple: RandomScreen {
    let randomProvider: NumberProvider
    
    @State var number: Int = 0
    
    public init(randomProvider: NumberProvider) {
        self.randomProvider = randomProvider
    }
    
    public var body: some View {
        VStack {
            Text("\(number)")
            Button("generate") {
                number = randomProvider.number
            }
        }.onAppear {
            number = randomProvider.number
        }
        .animation(.default, value: number)
    }
}

Common is a module that provides common tools. Actually, it is used only by the App module, but I wanted to show that many modules can depend on it

ExampleApp is an app module that combines other modules and builds the final app

This is the only module that can depend on other modules' implementation. Moreover, it chooses which implementation to use depending on the scenario. In the example app, NumberProvider implementation is changed in runtime
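Here is a rough plain-Swift sketch of how the app module could pick an implementation at runtime. The NumberProvider types mirror the ones above; ProviderKind and makeProvider are hypothetical names that I introduce only for this illustration.

```swift
public protocol NumberProvider {
    var number: Int { get }
}

public struct NumberProviderZero: NumberProvider {
    public let number = 0
    public init() {}
}

public struct NumberProviderRandom: NumberProvider {
    private let range: ClosedRange<Int>
    public var number: Int { Int.random(in: range) }
    public init(range: ClosedRange<Int>) { self.range = range }
}

// Hypothetical switch living in the app module: the only place
// that knows about concrete implementations.
enum ProviderKind {
    case zero
    case random(ClosedRange<Int>)
}

func makeProvider(_ kind: ProviderKind) -> any NumberProvider {
    switch kind {
    case .zero:
        return NumberProviderZero()
    case .random(let range):
        return NumberProviderRandom(range: range)
    }
}

// Screens only ever see `any NumberProvider`, so swapping is trivial.
var provider = makeProvider(.zero)
print(provider.number)
provider = makeProvider(.random(1...6))
print(provider.number)
```

The feature modules never import the implementation target, so changing the provider does not require touching or recompiling them.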

Example app

Final notes

So, in this post, we constructed a modular app using Tuist. In the example project, I added useful tools like

Additions to default Info.plist
Template for creating a new feature that can be invoked by tuist scaffold framework --name ModuleName. This will create a new module folder and a Project.swift file
Building for release mode. You can invoke generation with an environment variable and this will make all modules static. Static frameworks do not have to be loaded dynamically at launch, which is good for production start-up time:
TUIST_BUILD_TYPE_RELEASE=TRUE tuist generate --no-cache
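The idea behind the release switch can be sketched in plain Swift as follows. This is an illustration under assumptions: inside real manifests you would normally read the variable through Tuist's own ProjectDescription API (check the Tuist documentation for the exact call), and the PackageType enum here is only a stand-in for the product type that a helper like defaultPackageType returns.

```swift
import Foundation

/// Returns true when TUIST_BUILD_TYPE_RELEASE is set to a truthy value.
/// The accepted spellings are an assumption of this sketch.
func isReleaseBuild(
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    guard let raw = environment["TUIST_BUILD_TYPE_RELEASE"] else { return false }
    return ["1", "true", "yes"].contains(raw.lowercased())
}

/// Stand-in for the product type of a module (hypothetical).
enum PackageType {
    case framework        // dynamic, convenient for development
    case staticFramework  // static, used for release builds
}

let packageType: PackageType = isReleaseBuild() ? .staticFramework : .framework
print(packageType)
```

Passing the environment as a parameter keeps the decision easy to test without touching the real process environment.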

Also, if you have not read my article on a general overview of modular architecture, check it out!

iOS App As a Microservice. Build Robust App Architecture

What will you choose: MVVM, MVC, VIPER? Those all are local and problem-specific architectures. But how to structure your app on a larger scale to make it scalable and well-organized?

Alex Dremov

Do not hesitate to ask anything in the comments

References

Xcode on steroids | Tuist

Tuist is a tool that helps developers manage large Xcode projects by leveraging project generation. Moreover, it provides some tools to automate most common tasks, allowing developers to focus on building apps.

Tuist - Xcode on steroids

iOS App As a Microservice. Build Robust App Architecture

Alex Dremov — Fri, 16 Sep 2022 10:43:25 +0300

In this post, I will discuss microfeature architecture that is, simply said, amazing when implemented correctly in an iOS app.

Next Episodes

Ideas on implementation with SwiftUI

iOS App As a Microservice. Using SwiftUI in Modular App

Alex Dremov

Using tuist to structure microfeature application

iOS App As a Microservice. Modularize Your App With Tuist

This is the second article in a series on modular app architecture. In this post, I will cover implementation details using Tuist

Alex Dremov

Core Idea

The idea comes from microservice server-side application infrastructure. The whole app is divided into logical components corresponding to different functional areas of the application.

💥

Considering how complex mobile apps can be, why not apply the same architecture to iOS apps?

Briefly, microfeature architecture implies splitting your app into different components that accept other components' interfaces or data as explicit dependencies.

Therefore, your app can be represented as a graph of modules that explicitly interact with each other.

Main Benefits

Improved maintainability — each component is small and so is easier to understand and change.
Better testability — components explicitly define their public interface. So, they are easier to mock and test.
Team organization — different teams can work on different components independently.
Scalability, code reuse —when an app is a combination of modules, you can robustly change the app's behaviour by recombining modules. If you decide to create an app extension, watchOS app, or App Clip, just pick the required components and you're all set up.
Explicit dependencies — implicit dependencies are one of the worst things that can happen to an app's architecture. This architecture requires defining explicit dependencies for each module.

Details

So, how do you structure an iOS app once you have decided to use microfeature architecture? The core concept is separation. You can still use one Xcode project for that and separate features purely by architecture.

💥

You can put each feature into a separate Xcode project. This will push you to a strict separation of components.

I will cover how to do this effectively with tuist in the next episode!

Your codebase will be divided into several blocks:

Features

That's where elements of your app live. Later in this post, I will show by example what this part includes.

Components are logical blocks of your app. Each component explicitly defines an interface to interact with it.

💡

Swift does not have namespaces, but you can use enums to hide internal module logic.
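As a small illustration of that tip, a caseless enum cannot be instantiated, so it behaves like a namespace. The Schedule name and everything inside it are hypothetical and serve only as an example.

```swift
// A caseless enum works as a pure namespace: there is nothing to instantiate.
public enum Schedule {
    // Public surface of the module.
    public struct Event {
        public let title: String
        public let durationMinutes: Int

        public init(title: String, durationMinutes: Int) {
            self.title = title
            self.durationMinutes = durationMinutes
        }
    }

    public static func totalMinutes(of events: [Event]) -> Int {
        events.reduce(0) { $0 + Internal.clamp($1.durationMinutes) }
    }

    // Implementation detail, not reachable from outside `Schedule`.
    private enum Internal {
        static func clamp(_ minutes: Int) -> Int { max(0, minutes) }
    }
}

let events = [
    Schedule.Event(title: "Standup", durationMinutes: 15),
    Schedule.Event(title: "Review", durationMinutes: 45)
]
print(Schedule.totalMinutes(of: events))
```

Callers always write Schedule.Event or Schedule.totalMinutes, so names from different features do not clash, and the private helpers stay hidden.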

Apps

You can have a watchOS app, widgets, and the main iOS app. Each app depends on features and builds the final app using features, combining them like bricks.

General apps structure

Photo by Ashkan Forouzani / Unsplash

Tests + Testing Data And Mock

This logic also lies apart from the feature's main parts. It's separate because:

We don't want to use mock data accidentally in the app
We don't want to include irrelevant data in the final app binary

Feature design

The feature consists of four blocks. Tests and mocks may not be present, but the feature always has an interface and implementation.

One feature structure

Interface

This part defines parts visible for other features. Public interfaces and models or entities of the feature stay here.

Interfaces define ways that are used to interact with the feature.

Models or entities are simple structures with almost no logic that simply define data used to communicate with the feature.

You can include other components in the interface but remember that interface must not expose implementation details

💥

If the feature depends on another feature, then it depends on the other feature's interface.

Features must not depend on other features' implementations

Implementation

Implementation depends on an interface and provides classes and structures conforming to defined protocols in the interface. Resources, images, and other implementation details also stay here.

💡

Separation Interface/Implementation forces you to write code conforming to the letter D from SOLID.

Dependency inversion happens naturally when other modules know about interfaces and not about implementations.

Knowing this information, we can add details to our app's graph image:

Detailed apps structure

Notice that none of the features depends on another feature's implementation. Each feature strictly depends only on other features' interfaces.

Now you see that apps take building blocks and combine them to make an app.

Subscribe and don't miss posts!

Case Example

Let's architect a scheduling app. It will have:

Schedule view
Add event/edit view
Schedule WatchOS View

Pretty simple.

Let's split this app into several features:

UICommon

Contains common UI elements that can be used to create more complex views

Schedule

Contains main schedule views and logic associated with them. The interface defines ways to interact with views or present them.

WatchSchedule

Contains watch-specific schedule views and logic associated with them

EventModification

Contains event modification logic and views

ScheduleData

Data provider. Defines data structures and entities to obtain them.

The interface will contain simple data entities and model protocols defining ways of obtaining these entities.

Implementation defines models conforming to protocols defined in the interface. For example, you may want to define a local storage model or network model. It's up to the final app to decide which option to use.
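A minimal sketch of how the ScheduleData split could look in code is below. All type names are hypothetical, and a real feature would place the first part in the interface target and the second part in the implementation target.

```swift
// Interface target: plain entities plus a protocol describing how to obtain them.
public struct ScheduleEvent: Equatable {
    public let id: Int
    public let title: String

    public init(id: Int, title: String) {
        self.id = id
        self.title = title
    }
}

public protocol ScheduleStorage {
    func events() -> [ScheduleEvent]
    mutating func add(_ event: ScheduleEvent)
}

// Implementation target: one of possibly many conforming models
// (a network-backed model would conform to the same protocol).
public struct InMemoryScheduleStorage: ScheduleStorage {
    private var stored: [ScheduleEvent] = []

    public init() {}

    public func events() -> [ScheduleEvent] { stored }

    public mutating func add(_ event: ScheduleEvent) {
        stored.append(event)
    }
}

// A consumer written against the interface only.
func titles<S: ScheduleStorage>(in storage: S) -> [String] {
    storage.events().map(\.title)
}

var storage = InMemoryScheduleStorage()
storage.add(ScheduleEvent(id: 1, title: "Dentist"))
print(titles(in: storage))
```

The titles function compiles without knowing which storage it gets, which is what lets each app choose its own data source.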

App Graph

Case app graph

As you see, the watchOS app and the main iOS app reuse common components. Also, each app decides which implementation of the modules' interfaces it picks. For example, the watchOS app can choose a different data source in the ScheduleData feature than the main iOS app.

In a monolithic app, you would probably need to write almost a second app and copy a lot of code

Next Episodes

In the next posts, I will share my ideas on using microfeature architecture with SwiftUI and tuist to structure code efficiently.

FAQ

When should I create a new feature and when is it better not to?

It purely depends on the case and on what you think the best option is. If you can come up with some use case when your feature will be reused in some other context, then it's a separate feature.

💥

Do not overcomplicate things!
Making a new feature for each class will do more harm than good.

If some block probably will not be reused, but you just feel that it's logically separate functionality, then also go with a new feature as it will help to keep your architecture clean.

What to do with circular references?

Circular references can be a pain and they happen if two features depend on each other's interfaces. If such a situation happens, critically consider if your feature separation is correct. There are two possible options.

Two features are actually one feature. Then, you can merge these two features and get rid of circular references.
Two features are actually three features. If features depend on each other, then there is some part that's needed by both features. What if this part is an independent feature? If this is the case, extract the third feature and fix dependencies.

Possible circular reference solution

There's a lot said about making dependencies explicit. What's the point?

It's nearly impossible to scale or modify big apps when components are implicitly dependent. Just imagine the mess that is going to happen if you modify some class that is a dependency of all other modules through a singleton.

Your app may start to have unexpected behaviour here and there and you can't even know how your modification will affect the whole app.

It's like sitting on a box of TNT.

Photo by Mehdi MeSSrro / Unsplash

I encourage you to avoid implicit dependencies whenever possible. Microfeatures architecture will help you with doing that.

References

iOS App As a Microservice. Modularize Your App With Tuist

This is the second article in a series on modular app architecture. In this post, I will cover implementation details using Tuist

Alex Dremov

µFeatures Architecture | Tuist Documentation

This document describes an approach for architecting a modular Apple OS application to enable scalability, optimize build and test cycles, and ensure good practices.

Tuist

Exploring SwiftUI Layout Protocol | Creating Custom Layout

Alex Dremov — Fri, 12 Aug 2022 00:00:18 +0300

Apple introduces a new SwiftUI Layout protocol with the release of iOS 16. It is a powerful tool for constructing custom views with SwiftUI elegance. In this post, I will cover what Layout is and how it can be used.

In the end, we will construct a custom table view that auto-arranges its subviews. Complete code is provided!

Conforming to Layout

The discussed Layout is a new protocol that allows you to select a way of arranging your views.

Through it, you literally can say at what coordinates you want to place subviews. For example, now HStack, VStack, and ZStack can easily be implemented through it in iOS 16.

protocol Layout : Animatable

To conform to the protocol, you need to define two methods

func sizeThatFits(
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
) -> CGSize


func placeSubviews(
    in bounds: CGRect,
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
)

You also can define makeCache(subviews:) if your layout has some calculations that do not depend on a proposal and depend only on subviews. Then, you can make your calculations in makeCache(subviews:) and then use these values.

Method `sizeThatFits`

func sizeThatFits(
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
) -> CGSize

Returns a size that indicates how much space the container needs to arrange its subviews. SwiftUI can call this method several times, probing your view and finally deciding the best option

💥

Only finite sizes can be returned. Returning a size with an infinite dimension results in a crash without a reasonable call stack, so pay attention to the sizes that you return

To calculate it, you can use passed arguments:

proposal

Basically, it's SwiftUI's proposal for your view's size. I like to think about it as a negotiation.

I can give you this much space. What is your size going to be? Will you even fit?

— SwiftUI negotiator

ProposedViewSize is like a CGSize that also can have some specific values.

The zero proposal; the view responds with its minimum size.
The infinity proposal; the view responds with its maximum size.
The unspecified proposal; the view responds with its ideal size.

You can also access the width and height of the proposal if it is not one of the above values.

The proposal can have one dimension fixed and the second one as nil. For example, an HStack might measure the flexibility of its subviews’ widths, while using a fixed value for the height.

subviews

It is just a container of subviews' proxies LayoutSubview. Through it, you can ask subviews about their size, and also give them your proposal

Dear subview, I give you this much space. What is your size going to be?

— Custom Layout negotiator

You can ask for subview size through

func sizeThatFits(ProposedViewSize) -> CGSize

and

func dimensions(in: ProposedViewSize) -> ViewDimensions

cache

It is a cache provided by your makeCache(subviews:) function. It also can be Void (no cache).

Method `placeSubviews`

func placeSubviews(
    in bounds: CGRect,
    proposal: ProposedViewSize,
    subviews: Self.Subviews,
    cache: inout Self.Cache
)

It's where the magic happens. In this method (and only this one) you are given bounds for your view and the subviews at your disposal.

To place subviews, you need to call the place method on each element of subviews.

func place(
    at position: CGPoint,
    anchor: UnitPoint = .topLeading,
    proposal: ProposedViewSize
)

The definition is pretty self-explanatory. For every subview, you need to specify a point to place it, an anchor for this point, and your proposal for the selected subview.

bounds

It's bounds for your view to use. It is one of your sizeThatFits outputs.

💡

While it is named bounds, it is actually frame. So, the origin point is also specified and you need to arrange subviews with respect to that

proposal

The size proposal from which the container generated the size that the parent used to create the bounds parameter.

About caching

You may not use it, but usually, some subviews-concerned calculations can be cached which is a good practice and great for performance.

When subviews are changed, func updateCache(inout Self.Cache, subviews: Self.Subviews) is called. Its default implementation is just to call makeCache(subviews:).

Creating auto-filled table

SwiftUI has a Grid to construct table-like structures, but what if you have an unknown number of subviews? Then, you need to construct GridRow somehow correctly.

Let's better use the new Layout protocol feature!

Subscribe and don't miss posts!

Calculating sizes

Deciding what size the result view will have is relatively simple.

public func sizeThatFits(
    proposal: ProposedViewSize,
    subviews: Subviews,
    cache: inout ()
) -> CGSize {
    let subviewProposal = getSubviewProposal(
        subviewsCount: subviews.count,
        from: proposal
    )
    
    // Tallest subview in each row, measured with the per-cell proposal
    let rowHeights = getRowHeights(
        subviews: subviews,
        subviewProposal: subviewProposal
    )
    
    // Use the proposed width when there is one; otherwise sum the column widths
    let resultWidth = proposal.width ??
        ((subviewProposal.width ?? 0) * CGFloat(columnsNumber))
    return CGSize(
        width: resultWidth,
        height: rowHeights.reduce(0, +)
    )
}

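It uses several helper functions

/**
 Get array of heights for every row.
 Just get max height on every row
 */
private func getRowHeights(subviews: Subviews, subviewProposal: ProposedViewSize) -> [CGFloat] {
    var subviewProposalNoHLimit = subviewProposal
    subviewProposalNoHLimit.height = .infinity
    
    var rowHeights = [CGFloat]()
    var index = 0
    while index < subviews.count {
        var rowMax: CGFloat = 0
        // Assumed loop body (garbled in the source), written to match the
        // doc comment above: take the tallest subview of the current row.
        for _ in 0..<columnsNumber {
            guard index < subviews.count else { break }
            let size = subviews[index].sizeThatFits(subviewProposalNoHLimit)
            rowMax = max(rowMax, size.height)
            index += 1
        }
        rowHeights.append(rowMax)
    }
    return rowHeights
}

// Proposal for a single cell: split the global proposal evenly
// between columns and rows.
private func getSubviewProposal(subviewsCount: Int, from globalProposal: ProposedViewSize) -> ProposedViewSize {
    // Convert before dividing; integer division would make ceil a no-op
    let rowsCount = max(ceil(Double(subviewsCount) / Double(columnsNumber)), 1)
    return ProposedViewSize(
        width: (globalProposal.width ?? 0)
                        / CGFloat(columnsNumber),
        height: (globalProposal.height ?? 0) / CGFloat(rowsCount)
    )
}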

Placing subviews

Finally, we just need to carefully put views in their places. We iterate over subviews and calculate their x and y positions.

public func placeSubviews(
    in bounds: CGRect,
    proposal: ProposedViewSize,
    subviews: Subviews,
    cache: inout ()
) {
    var subviewProposal = getSubviewProposal(
        subviewsCount: subviews.count,
        from: proposal
    )
    let colRealWidth = subviewProposal.width ?? 0
    let rowHeights = getRowHeights(subviews: subviews, subviewProposal: subviewProposal)
    
    var curPos: CGFloat = bounds.minX
    var curHeight: CGFloat = bounds.minY
    
    var rowIndex = 0
    for (index, subview) in subviews.enumerated() {
        subviewProposal.height = rowHeights[rowIndex]
        let size = subview.dimensions(in: subviewProposal)
        
        subview.place(
            at: CGPoint(x: curPos, y: curHeight),
            anchor: .topLeading,
            proposal: subviewProposal
        )
        
        if index % columnsNumber == columnsNumber - 1 {
            curPos = bounds.minX
            curHeight += rowHeights[rowIndex]
            rowIndex += 1
        } else {
            curPos += colRealWidth
        }
    }
}
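To check the placement arithmetic without SwiftUI, here is a plain-Swift model of the same walk over the grid. Point stands in for CGPoint, and the function names are hypothetical; the loop follows the same left-to-right, top-to-bottom logic as placeSubviews above.

```swift
// Stand-in for CGPoint so the sketch runs without SwiftUI.
struct Point: Equatable {
    var x: Double
    var y: Double
}

/// Number of rows needed for `count` items in `columns` columns (ceiling division).
func rowCount(items count: Int, columns: Int) -> Int {
    max((count + columns - 1) / columns, 1)
}

/// Top-leading origin of every cell, walking left to right, top to bottom.
/// `rowHeights` must contain one entry per row.
func cellOrigins(
    count: Int,
    columns: Int,
    columnWidth: Double,
    rowHeights: [Double],
    origin: Point = Point(x: 0, y: 0)
) -> [Point] {
    var result: [Point] = []
    var x = origin.x
    var y = origin.y
    var row = 0
    for index in 0..<count {
        result.append(Point(x: x, y: y))
        if index % columns == columns - 1 {
            // Last column: return to the left edge and move down one row.
            x = origin.x
            y += rowHeights[row]
            row += 1
        } else {
            x += columnWidth
        }
    }
    return result
}

let origins = cellOrigins(count: 3, columns: 2, columnWidth: 100, rowHeights: [40, 60])
print(rowCount(items: 3, columns: 2))
print(origins)
```

With three items in two columns, the third item wraps to the second row and starts at the left edge, offset down by the height of the first row.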

Example

Now, we can construct a table with the needed number of columns as easily as a regular view.

ColumnsLayout(columnsNumber: 2) {
    VStack {
        Text("That's one view")
        Image(systemName: "tortoise.fill")
    }
    .padding()
    .border(.red)
    Text("That's the second view ")
        .padding()
        .border(.red)
    Text("That's the third view with long lines that are warped automatically")
        .fixedSize(horizontal: false, vertical: true)
        .padding()
        .border(.red)
}
.border(.blue)
.padding()

And it magically re-assembles after changing the number of columns to three.

Final notes

I believe that you see how powerful this tool is. For example, Apple creates a radial view in their example with Layout protocol.

So, it's entirely up to you how to place views inside your container. With iOS 16, SwiftUI finally gets the layout flexibility it has needed for so long.

Let me know what you think about it in the comments!

SwiftUI Navigation Is a Mess. Here’s What You Can Do

Alex Dremov — Sat, 30 Jul 2022 20:55:14 +0300

Why messy?

It's because of the core idea of SwiftUI — a view is a function of the state, or a view is state-driven. Don't get me wrong, this concept is great, but SwiftUI's navigation is not this advanced yet.

💡

The view is a function of the state and navigation is not an exception

However, SwiftUI does not have the means to construct robust navigation inside your app.

Messy example

Consider the common case of the onboarding screen when you need to present some sequence of views with nice transitions. What can you do with SwiftUI? Probably, create an enum that tells which screen is active and then use switch to present the sequence of views.

What if you need to modify the order or change the number of views? You'll need to modify the corresponding enum, modify the logic of switching inside the views, and other stuff.

Not so flexible, right?

Oh, and then you decide to present one view right in the middle through .sheet. That's when the mess starts to show up. You create an additional @State to check if the sheet is open, make sure that it's updated correctly, and restructure the switch block that you used before.

Now, it's a chaotic view that is prone to unexpected bugs.

The most obvious one is NavigationView which is deprecated in the new iOS 16.

Image by https://developer.apple.com/documentation/swiftui/navigationview

Using NavigationLink, it can present new views and also adds a "back" button to return to the previous view.

And it does not support programmatic navigation.

Apple presented a new NavigationStack that addresses this issue, but it is still not flexible enough. For example, I like to be able to modify the view however I want, but NavigationStack inserts back buttons. Also, it does not support different transitions. While it is nice to see SwiftUI develop in this direction, we are not there yet.

So, even in iOS 16, SwiftUI is not powerful enough to manage any kind of navigation you can come up with.

And .sheet(). NavigationStack does not make it easier to handle .sheet() either.

I decided to create a library with several requirements:

Programmatic views navigation
Ability to present a sequence of views
Support for any SwiftUI transition and Animation
Completely state-driven: no singletons or environment objects
Handle .sheet()

Sounds cool, right?

Straight to the point, I was able to create such a library.

GitHub - AlexRoar/PathPresenter: Pure SwiftUI state-driven library to present view sequences and hierarchies.

Pure SwiftUI state-driven library to present view sequences and hierarchies. - GitHub - AlexRoar/PathPresenter: Pure SwiftUI state-driven library to present view sequences and hierarchies.

GitHubAlexRoar

💡

I am always open to objective criticism and requests for a new feature. Do not hesitate to open an issue on GitHub!

So, if you just want a nice tool for the things I listed above, you can stop here. Now, let's see how I did it.

Ways to present

At the core of the library is a structure that stores views and information about how to present them. Possible options for presentation are

enum PathType {
    /**
    * Just show a view. No animation, no transition.
    * Show view above all other views
    */
    case plain

    /**
    * Show view with in and out transitions.
    * Transition animation also can be specified.
    */
    case animated(transition: AnyTransition, animation: Animation)

    /**
    * Show view in .sheet()
    */
    case sheet(onDismiss: Action)
}

❗

Note that presenting through .sheet() is as easy as just presenting any other view.

So, you can present the view without any animation, present it with needed transitions, and present it in a sheet.

Path

This structure stores information about views. It just stores an array of type-erased views with presentation type information. You can append views on top and remove them from the top.

Suffix Automaton and Rickroll Lyrics Graph

Alex Dremov — Sun, 17 Jul 2022 16:57:17 +0300

Suffix automaton is a robust data structure that allows you to solve complex string-related problems such as checking the presence of a substring in a string, counting the total number of distinct substrings, finding a substring, and many others. In this article, I cover the suffix automaton algorithm, provide an implementation, and finally create the correct rickroll lyrics automaton.

Why Rickroll?

First of all

Now we can continue.

There is a meme that I've seen a couple of times with all possible Never Gonna Give You Up central lines. It's nice, but it's not fully correct.

Rickroll lyrics graph | https://www.reddit.com/r/memes/comments/lskvsq/never_gonna_make_a_flow_chart/

The problem is that it conforms to incorrect lines too:

Never gonna give you cry
Never gonna tell a lie and desert you down
Never gonna make you up
Never gonna give you down
Never gonna make you never

And many others. So, we can conclude that this graph is incorrect as incorrect lyrics must be unreachable. Then, we need to correct this immense mistake against humanity and generate the correct automaton for Rickroll lyrics.

What is the suffix automaton?

Intuitively, it's a data structure that contains information about all substrings of a string and stores it in compressed form. More specifically, it's a directed acyclic word graph in which each node is a state and all edges are transitions between these states by some letter.

Each state corresponds to some substring in the initial string. There is also one start state and some states are marked as terminal. We also require that suffix automaton contains the minimal possible number of states.

💥

So, if each node is some substring and each edge is a transition by some letter, by navigating through this graph we can collect information about substrings.

If a substring is not present in the text, then there is simply no state for it. So, at some point while reading it, we will need a transition that does not exist.

Here is the example of suffix automaton for string abcbac.

Suffix automaton for abcbac

The leftmost state corresponds to empty string (start state) and the rightmost corresponds to the whole string (terminal). Notice that if you start from the start and somehow end up in the terminal state, then the path you followed corresponds to some suffix of the string. Also, every substring corresponds to one path from the start.
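To make the definition more concrete, here is a minimal C++ sketch of the standard online construction, where each new letter adds one state and occasionally a clone. It is not the code behind the graphs in this post; the names SuffixAutomaton, extend, contains and countDistinct are chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Illustrative sketch of a suffix automaton; all names are hypothetical.
struct SuffixAutomaton {
    struct State {
        int len = 0;               // length of the longest substring in this state
        int link = -1;             // suffix link (-1 only for the start state)
        std::map<char, int> next;  // transitions by letter
    };

    std::vector<State> states;
    int last = 0;  // state that corresponds to the whole text read so far

    explicit SuffixAutomaton(const std::string& text) {
        states.emplace_back();  // start state: the empty string
        for (char c : text) extend(c);
    }

    void extend(char c) {
        int cur = static_cast<int>(states.size());
        states.emplace_back();
        states[cur].len = states[last].len + 1;

        // Add the missing transition by c along the suffix links.
        int p = last;
        while (p != -1 && !states[p].next.count(c)) {
            states[p].next[c] = cur;
            p = states[p].link;
        }
        if (p == -1) {
            states[cur].link = 0;
        } else {
            int q = states[p].next[c];
            if (states[p].len + 1 == states[q].len) {
                states[cur].link = q;
            } else {
                // q mixes substrings of different kinds, so split it with a clone.
                int clone = static_cast<int>(states.size());
                State copy = states[q];
                copy.len = states[p].len + 1;
                states.push_back(copy);
                while (p != -1) {
                    auto it = states[p].next.find(c);
                    if (it == states[p].next.end() || it->second != q) break;
                    it->second = clone;
                    p = states[p].link;
                }
                states[q].link = clone;
                states[cur].link = clone;
            }
        }
        last = cur;
    }

    // A pattern is a substring exactly when its path exists from the start state.
    bool contains(const std::string& pattern) const {
        int v = 0;
        for (char c : pattern) {
            auto it = states[v].next.find(c);
            if (it == states[v].next.end()) return false;
            v = it->second;
        }
        return true;
    }

    // Each state contributes len - len(link) distinct substrings.
    long long countDistinct() const {
        long long total = 0;
        for (std::size_t v = 1; v < states.size(); ++v)
            total += states[v].len - states[states[v].link].len;
        return total;
    }
};
```

For the string abcbac, contains("cba") should be true, contains("ca") should be false, and countDistinct() should return 18, which matches counting the distinct substrings by hand.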

Rickroll suffix automaton

For this, I generated a suffix automaton for every line and then merged these automata.

Full Never Gonna Give You Up lyrics

Final thoughts

Even though this graph is not as nice as presented in the meme, it's correct. You can explore the graph above by yourself; it's actually fun.

In the next post, I will discuss how I have built this graph using the suffix automaton. Subscribe so you do not miss it!

When Nothing Left To Do — Teach

Alex Dremov — Fri, 17 Jun 2022 16:15:14 +0300

Not a usual type of post on my blog. Well, not the usual times.

I'm announcing that I will teach two courses in the "Not just math" volunteer project. The project is primarily for middle and high school students.

It's a project that conducts free online lessons for children who find themselves in a difficult situation due to the war. The purpose of these lessons is to divert children to useful and engaging activities and to give parents a break.

— Not just math website

Starting this July I will instruct two courses:

Python for beginners
I will explain Python's basic features, syntax and simple algorithms.
Competitive programming & C++
I was contacted with the idea of starting a competitive algorithms class. I am currently working on squeezing the second lesson into my schedule, but I believe that the course will start successfully.

What can I do?

Not just math is purely a volunteer project that already has a big community of teachers and students. Still, if you can start useful courses, I encourage you to contact coordinators and propose your course.

Also, you can help spread the word about the Not just math project so that more affected families know about it.

Using Threads in Swift

Alex Dremov — Fri, 13 May 2022 11:44:49 +0300

Swift provides DispatchQueue as an excellent layer above raw threads. But sometimes you want to create a new thread dedicated to some specific task. Or maybe implement your own concurrent executor. Swift gives you access to raw threads and in this article, I'll show how to use it.

Thread

Creating a thread in Swift is pretty simple using the Thread class. You can specify an Objective-C method through a selector as the starting point, pass a closure, or, most conveniently, subclass Thread.

class MyThread: Thread {
    override func main() { // Thread's starting point
        print("Hi from thread")
    }
}

let thread = MyThread()
thread.start()

Simple thread

The thread is not started when the initializer is called. You need to call the start() method explicitly to start the thread.

The thread runs independently of the handle returned by the Thread initializer: the variable may go out of scope and the thread will still run. That's fine, but you will lose the ability to control the thread: check if it's completed, wait for its completion, cancel it, etc.

Wait for completion, join a thread

Thread does not provide a built-in way to wait for its completion.

💡

The main thread can finish before the new thread. In this case, the latter is also terminated.

To wait for thread completion, we can join threads using DispatchGroup:

class MyThread: Thread {
    let waiter = DispatchGroup()

    override func start() {
        waiter.enter()
        super.start()
    }

    override func main() {
        task()
        waiter.leave()
    }

    func task() {
        print("Hi from thread")
    }

    func join() {
        waiter.wait()
    }
}

let thread = MyThread()
thread.start()

thread.join() // Waits for thread completion

Terminate the thread

The thread terminates automatically after reaching the end of main. To exit the thread in advance, you can call the Thread.exit() function from the thread. To use it correctly with the DispatchGroup created above, it's better to create a custom exit method:

class MyThread: Thread {
    ...
    func exit() {
        waiter.leave()
        Thread.exit()
    }
    ...
}

Cancel the thread

Apart from terminating the thread, you can cancel it by calling the cancel() method on the thread's handle or inside the thread itself. This only sets the isCancelled property to true; the thread keeps running until its own code checks the flag and returns.

SwiftUI Advanced Animation: Morphing Shapes

Alex Dremov — Thu, 05 May 2022 07:00:00 +0300

The regular .animation() modifier already provides a powerful way of animating views. Yet, its usage is limited to simple transformations. In this guide, I'm going to show how complex SwiftUI views can be animated efficiently using the VectorArithmetic protocol with the Accelerate framework for fast computations.

Inspiration

In the course of this guide, we will make a morphing sphere animation inspired by lava lamp bubbles. Some kind of wobbling lava bubbles.

💡

The proposed technique can be used in other even more complex animations

Wobbling bubble

Creating custom animations

You may think about animation as a transition between two states. And this transition must be smooth! To display this smooth transition, SwiftUI needs to know how to draw in-between stages.

Smooth change between two shapes (states)

AnimatableVector

The key idea of the animation is to represent objects' states with properties that can change continuously.

For example, if we try to animate an object's position and it has integer coordinates, then creating in-between frames of the object smoothly moving from one coordinate to the other is impossible. Conversely, if the object's position is represented by a floating-point variable, then we can gradually change the coordinate until the new one is reached.

The same goes for more complicated animations. But usually, states cannot be represented by a single float variable. In this case, we are going to use AnimatableVector. It represents a mathematical vector, conforming to VectorArithmetic protocol.

💡

If two animation stages are represented by objects conforming to the VectorArithmetic protocol, then SwiftUI can compute in-between vectors and draw the transition.

The AnimatableVector is pretty simple. We store an array of coordinates and define basic math operations for them. In the code below Accelerate is used for fast computations.

💥

Accelerate can introduce too much overhead when the vector contains only several values. So, if your animation can be represented with a few values, then consider rewriting operators without Accelerate

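import SwiftUI
import enum Accelerate.vDSP

struct AnimatableVector: VectorArithmetic {
    var values: [Float]

    static var zero = AnimatableVector(values: [0.0])

    // Element-wise sum over the common prefix of both vectors
    static func + (lhs: AnimatableVector, rhs: AnimatableVector) -> AnimatableVector {
        let count = min(lhs.values.count, rhs.values.count)
        return AnimatableVector(
            values: vDSP.add(lhs.values[0..<count], rhs.values[0..<count])
        )
    }

    // Element-wise difference over the common prefix of both vectors
    static func - (lhs: AnimatableVector, rhs: AnimatableVector) -> AnimatableVector {
        let count = min(lhs.values.count, rhs.values.count)
        return AnimatableVector(
            values: vDSP.subtract(lhs.values[0..<count], rhs.values[0..<count])
        )
    }

    // Required by VectorArithmetic: multiply every value by a scalar
    mutating func scale(by rhs: Double) {
        values = vDSP.multiply(Float(rhs), values)
    }

    // Required by VectorArithmetic: sum of squared values
    var magnitudeSquared: Double {
        Double(vDSP.sum(vDSP.multiply(values, values)))
    }

    var count: Int { values.count }

    subscript(_ i: Int) -> Float {
        get {
            values[i]
        } set {
            values[i] = newValue
        }
    }
}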

Animatable vector

Wobbling bubble

So, as I already said, we need to define stages of animation with AnimatableVector so that SwiftUI will be able to magically draw all in-between frames.

To do this with a circle, we first need to somehow make it able to wobble. This is done through approximation with curves. To make the morphing effect, we will use AnimatableVector to modify the radius at every specific point.

That's it

The first coordinate of the vector will say how much must be added to the distance of the first approximation point. The second is for the second point and so on.

You can see in a gif below how the radius at every specific point changes and how SwiftUI changes it smoothly. Curves' control points are also displayed.

Under the hood of wobbling

Implementation

The concept of animation is determined. It's time to code!

As I said, the main idea is to approximate a circle with curves. For a circle split into n segments, each control point lies at a distance of (4/3)*tan(pi/(2n)) times the radius from its point on the circle, along the tangent.

https://stackoverflow.com/questions/1734745/how-to-create-circle-with-bézier-curves

We're going to represent the circle as an object conforming to Shape protocol. For SwiftUI to know what to animate, you need to define animatableData property. That's what SwiftUI is going to use to animate in-between frames.

var animatableData: AnimatableVector {
    get { animatedValue }
    set { animatedValue = newValue }
}

A little bit of linear algebra and all point coordinates are calculated. Some more advanced operations on CGVector and CGPoint are needed:

import Foundation
import SwiftUI

extension CGPoint {
    public static func +(lhs: CGPoint, rhs: CGPoint) -> CGPoint {
        CGPoint(x: lhs.x + rhs.x, y: lhs.y + rhs.y)
    }
    
    static func +(lhs: CGPoint, rhs: CGVector) -> CGPoint {
        CGPoint(x: lhs.x + rhs.dx, y: lhs.y + rhs.dy)
    }
    
    static func -(lhs: CGPoint, rhs: CGVector) -> CGPoint {
        CGPoint(x: lhs.x - rhs.dx, y: lhs.y - rhs.dy)
    }
    
    public static func -(lhs: CGPoint, rhs: CGPoint) -> CGPoint {
        CGPoint(x: lhs.x - rhs.x, y: lhs.y - rhs.y)
    }
    
    init(_ vec: CGVector) {
        self = CGPoint(x: vec.dx, y: vec.dy)
    }
}

extension CGPoint: VectorArithmetic {
    public mutating func scale(by rhs: Double) {
        x = CGFloat(rhs) * x
        y = CGFloat(rhs) * y
    }
    
    public var magnitudeSquared: Double {
        Double(x * x + y * y)
    }
    

}

extension CGVector {
    init(_ point: CGPoint) {
        self = CGVector(dx: point.x, dy: point.y)
    }
    
    func scalar(_ vec: CGVector) -> CGFloat {
        dx * vec.dx + dy * vec.dy
    }
    
    func len() -> CGFloat {
        sqrt(dx * dx + dy * dy)
    }
    
    func perpendicular() -> CGVector {
        CGVector(dx: -dy, dy: dx) / len()
    }
    
    static func *(lhs: CGVector, rhs: CGFloat) -> CGVector {
        CGVector(dx: lhs.dx * rhs, dy: lhs.dy * rhs)
    }
    
    static func *(lhs: CGFloat, rhs: CGVector) -> CGVector {
        CGVector(dx: rhs.dx * lhs, dy: rhs.dy * lhs)
    }
    
    static func /(lhs: CGVector, rhs: CGFloat) -> CGVector {
        CGVector(dx: lhs.dx / rhs, dy: lhs.dy / rhs)
    }
    
    static func -(lhs: CGVector, rhs: CGVector) -> CGVector {
        CGVector(dx: lhs.dx - rhs.dx, dy: lhs.dy - rhs.dy)
    }
    
    static func +(lhs: CGVector, rhs: CGVector) -> CGVector {
        CGVector(dx: lhs.dx + rhs.dx, dy: lhs.dy + rhs.dy)
    }
    
    func angle(_ rhs: CGVector) -> CGFloat {
        return acos(scalar(rhs) / (rhs.len() * len()))
    }
}

Finally, implementing Shape:

import SwiftUI
import Foundation

struct MorphingCircleShape: Shape {
    let pointsNum: Int
    var morphing: AnimatableVector
    let tangentCoeficient: CGFloat
    
    var animatableData: AnimatableVector {
        get { morphing }
        set { morphing = newValue }
    }
    
    // Calculates control points
    func getTwoTangent(center: CGPoint, point: CGPoint) -> (first: CGPoint, second: CGPoint) {
        let a = CGVector(center - point)
        let dir = a.perpendicular() * a.len() * tangentCoeficient
        return (point - dir, point + dir)
    }
    
    // Draw circle
    func path(in rect: CGRect) -> Path {
        var path = Path()
        let radius = min(rect.width / 2, rect.height / 2)
        let center =  CGPoint(x: rect.width / 2, y: rect.height / 2)
        var nextPoint = CGPoint.zero
        
        let ithPoint: (Int) -> CGPoint = { i in
            let point = center + CGPoint(x: radius * sin(CGFloat(i) * CGFloat.pi * CGFloat(2) / CGFloat(pointsNum)),
                                         y: radius * cos(CGFloat(i) * CGFloat.pi * CGFloat(2) / CGFloat(pointsNum)))
            var direction = CGVector(point - center)
            direction = direction / direction.len()
            return point + direction * CGFloat(morphing[i >= pointsNum ? 0 : i])
        }
        var tangentLast = getTwoTangent(center: center,
                                        point: ithPoint(pointsNum - 1))
        for i in (0...pointsNum){
            nextPoint = ithPoint(i)
            let tangentNow = getTwoTangent(center: center, point: nextPoint)
            if i != 0 {
                path.addCurve(to: nextPoint, control1: tangentLast.1, control2: tangentNow.0)
            } else {
                path.move(to: nextPoint)
            }
            tangentLast = tangentNow
        }
        
        path.closeSubpath()
        return path
    }
    
    
    init(_ morph: AnimatableVector) {
        pointsNum = morph.count
        morphing = morph
        tangentCoeficient = (4 / 3) * tan(CGFloat.pi / CGFloat(2 * pointsNum))
    }
}

Finally, we can use this shape in a View. To make a wobbling effect, we need to change the vector responsible for radius modification.

This can be done by timer.

Using Timer

We're going to randomly change the morphing vector in the timer's callback. Also, it looks weird to change all points at once, so we're going to animate only a subset of them.

struct MorphingCircle: View & Identifiable & Hashable {
    static func == (lhs: MorphingCircle, rhs: MorphingCircle) -> Bool {
        lhs.id == rhs.id
    }
    
    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }
    
    let id = UUID()
    @State var morph: AnimatableVector = AnimatableVector.zero
    @State var timer: Timer?
    
    func morphCreator() -> AnimatableVector {
        let range = Float(-morphingRange)...Float(morphingRange)
        var morphing = Array.init(repeating: Float.zero, count: self.points)
        for i in 0..<morphing.count {
            ...
        }
        ...
    }

    ...

    func color(_ newColor: Color) -> MorphingCircle {
        var morphNew = self
        morphNew.color = newColor
        return morphNew
    }
}

Results

Created bubbles can be combined and animated to drift around the screen for example. Also, in the course of this guide, we created AnimatableVector structure that you can use in your projects.

Feel free to share your results!

More wobbling bubbles

💡

Check my iOS section of the blog to learn more useful tips

Alex Dremov | iOS

One of my favorites. Here I write about Swift and iOS development


New Package: Look at Swift Async Algorithms

Alex Dremov — Wed, 27 Apr 2022 23:21:00 +0300

About a month ago, Apple released the first version of the Swift Async Algorithms package. It provides tools and algorithms to use with the recently introduced AsyncSequence. The package focuses on implementing already well-known tools like zip as well as new features that transact in time (wow). It also makes available more sophisticated ways of creating and managing asynchronous sequences.

💥

The module's latest version is 0.0.1, which means that it's still in development. So, some methods are not available yet, some may change or appear.

This article is mostly here to introduce the new features so that you can plan your code, keeping in mind that such features will appear in the future.

Installation

The new package is distributed through Swift PM. To add it to your project, you need to add it as a dependency in the Xcode project File > Add Packages.

Or add it to your Package.swift file:

.package(url: "https://github.com/apple/swift-async-algorithms"),

Don't forget to also add the dependency to the executable:

.target(name: "<target>", dependencies: [
    .product(name: "AsyncAlgorithms", package: "swift-async-algorithms"),
]),

The module will be available in your project after adding import AsyncAlgorithms.

💥

As I mentioned, the module is still in development. So, you need to install Swift Trunk Development toolchain to have access to all features.
Some of them are available right away, though!

Creating asynchronous sequences

To test all the beautiful functions the new module provides, we need to create an async sequence at first. And the package introduces new ways of doing so.

Property `async`

The module adds the following extension to Sequence protocol.

extension Sequence {
  public var async: AsyncLazySequence<Self> { get }
}

Where AsyncLazySequence conforms to AsyncSequence.

public struct AsyncLazySequence<Base: Sequence>: AsyncSequence {
}

extension AsyncLazySequence: Sendable where Base: Sendable {
	...
}
extension AsyncLazySequence.Iterator: Sendable where Base.Iterator: Sendable {
}

💡

Using the async property, we can turn any existing Sequence into AsyncSequence to use them in some async API, for example.

let numbers = [1, 2, 3, 4].async
let characters = "Hello, world".async
let items = [1: "one", 2: "two", 3: "three"].async

However, creating AsyncSequence this way does not really bring benefits as all elements are already here and available right away. There are more useful ways of creating AsyncSequence.

AsyncChannel and AsyncThrowingChannel

If you know what Future or Promise in other languages are, then AsyncChannel will be familiar to you. Except that it provides a way of transferring a sequence of values.

❗

Channel's element must conform to the Sendable protocol, which basically means that public API is safe to use across concurrency domains.

All basic types automatically conform to it. For custom types, you need to add the conformance before use.

Here's a pretty straightforward example of AsyncChannel usage.

let channel = AsyncChannel<String>()
Task {
    for word in ["Hello", "from", "async", "channel"] {
      await channel.send(word)
    }
    await channel.finish()
}

for await message in channel {
    print(message)
}

Hello
from
async
channel

Notice that await keyword is used with send and finish. This is because the channel is actually both ways synchronized. That means that send awaits consumption and vice versa.

💡

The await channel.send() call waits until the sent value is consumed. This way, whoever produces values for the channel will not generate more values than the receiver can consume.

AsyncThrowingChannel is almost the same except that it provides a fail(_ error: Error) method that can be used to throw an error to the channel's consumer.

let channel = AsyncThrowingChannel<String, Error>()

...

for try await message in channel {
    print(message)
}

And converting back

The module adds initializers for three primary types: Array, Dictionary, and Set that let you transform the async sequence to the regular one by fetching all elements during init.

let table = await Dictionary(uniqueKeysWithValues: zip(keys, values))
let allItems = await Set(items.prefix(10))
let allMessages = await Array(channel)

Manipulating asynchronous sequences

The module also provides new ways of combining asynchronous sequences. These functions are pretty straightforward.

chain(_ s1: AsyncSequence, _ s2: AsyncSequence)

Chains two or three asynchronous sequences together sequentially where the elements from the result are comprised in order from the elements of the first asynchronous sequence and then the second (and so on) or until an error occurs. Sequences must have the same Element type.

Sequence 1	Sequence 2	Result
1		1
4		4
	2	2
	3	3

💥

Apple notes that it can be used for two or more sequences. Though, only two or three arguments are available now.

joined() or joined(separator: AsyncSequence)

Concatenates an asynchronous sequence of asynchronous sequences together where the result is comprised in order from the elements of the first asynchronous sequence and then the second (and so on) or until an error occurs. Similar to chain() except that the number of asynchronous sequences to concatenate is not known upfront. A separator can also be specified.

combineLatest(_ base1: AsyncSequence, _ base2: AsyncSequence)

Combines two or more sequences, producing tuples of the latest values available from the sequence.

Sequence 1	Sequence 2	Result
1		awaits
	2	(1, 2)
	3	(1, 3)
4		(4, 3)

merge(_ base1: AsyncSequence, _ base2: AsyncSequence)

Merges sequences into a new one. The result is a combination of results from two sequences. Sequences must have the same Element type.

Sequence 1	Sequence 2	Result
		awaits
1		1
	2	2
	3	3
4		4

💡

It is not defined which sequence will produce its next element first, so the order of elements in the result can be arbitrary.

zip(_ base1: AsyncSequence, _ base2: AsyncSequence)

The same as a regular zip but for AsyncSequence. It differs from combineLatest in that it waits for a fresh value from each sequence instead of reusing the latest one.

Sequence 1	Sequence 2	Result
1		awaits
	2	(1, 2)
	3	awaits
4		(4, 3)

Sounds awesome, but Swift is not powerful enough to put await before the time itself. When events can potentially happen faster than the desired consumption rate, there are ways to handle the situation. These functions allow linking AsyncSequences with time. They can be applied to any AsyncSequence.

For both methods listed below, a custom clock can be specified. By default, it's ContinuousClock.

Debounce

  public func debounce<C: Clock>(
    for interval: C.Instant.Duration,
    tolerance: C.Instant.Duration? = nil,
    clock: C
  ) -> AsyncDebounceSequence<Self, C>

The debounce algorithm produces elements after a particular duration has passed between events. If there are a lot of events happening, debounce will wait until at least the given interval has elapsed since the last event before emitting a value.

seq.debounce(for: .seconds(1))

In this case, it transforms a potentially fast asynchronous sequence of events into one that waits for a window of 1 second with no events to elapse before emitting a value.

Throttle

extension AsyncSequence {
  public func throttle<C: Clock, Reduced>(
    for interval: C.Instant.Duration,
    clock: C,
    reducing: @Sendable @escaping (Reduced?, Element) async -> Reduced
  ) -> AsyncThrottleSequence<Self, C, Reduced>

  public func throttle<Reduced>(
    for interval: Duration,
    reducing: @Sendable @escaping (Reduced?, Element) async -> Reduced
  ) -> AsyncThrottleSequence<Self, ContinuousClock, Reduced>

  public func throttle<C: Clock>(
    for interval: C.Instant.Duration,
    clock: C,
    latest: Bool = true
  ) -> AsyncThrottleSequence<Self, C, Element>

  public func throttle(
    for interval: Duration,
    latest: Bool = true
  ) -> AsyncThrottleSequence<Self, ContinuousClock, Element>
}

The throttle algorithm produces elements such that at least a specific interval has elapsed between them. If values are produced by the base AsyncSequence, the throttle does not resume its next iterator until the period has elapsed or unless a terminal event is encountered. Similarly to debounce, a custom clock can be specified.

seq.throttle(for: .seconds(1))

In this case, the throttle transforms a potentially fast asynchronous sequence of events into one that waits for a window of 1 second to elapse before emitting a value.

💡

Notice that debounce waits for a window with no events, while throttle simply waits for a window.

Final notes

It's frankly entertaining to watch how Swift unfolds new features and how they are developed. Definitely check the project's GitHub mentioned in the references to look at the module's source code.

If you do not feel confident with the relatively new Swift concurrency features, check out my quick guide to async/await in Swift.

Quick Guide to Async Await in Swift | Alex Dremov

Everything you need to know about new Swift asynchronous features. Async await, main actor, task, async get, and possible use cases — all covered.

References

GitHub - apple/swift-async-algorithms: Async Algorithms for Swift

Async Algorithms for Swift. Contribute to apple/swift-async-algorithms development by creating an account on GitHub.

Treap: The Easiest Search Tree (Explained)

Alex Dremov — Mon, 25 Apr 2022 07:00:00 +0300

Cartesian tree or treap (binary search tree + binary heap) is a fast yet simple data structure. It satisfies the binary search tree property and the binary heap property at the same time. Despite its simplicity, treap self-balances, resulting in O(logn) complexity on average for all common operations.

Amazing, right?

💥

The algorithm uses random values. Therefore, O(logn) complexity is on average. However, with a lot of items O(logn) is almost always true. So, later in this article, I will use just O(logn) without "on average" addition.

Moreover, there is a modification (implicit treap, treap with implicit key) that lets you use treap as a usual array with O(logn) random insertions and random deletions. Isn't it cool? In this article I'll explain how to create one and provide the implementation in Swift. Also, I will compare treap to the general set from standard library. Let's start!

💡

In a binary search tree, for each node, all items' values in the left subtree are less than the node's value, and all items in the right subtree are greater

Core algorithm

As I said earlier, treap combines heaps and binary search trees. Therefore, we are going to store at least two properties: key (or value) and priority. With respect to keys, the tree is a binary search tree; with respect to priorities, it is a binary heap.

💡

A binary heap is a binary tree where each child's value is less than its parent's value.

Treap example

In the image above, you may notice that for every node, the priorities of all its children are lower. At the same time, all nodes in the left subtree have a key less than that of the node, and all nodes in the right subtree have a larger key.

💡

It's also called a cartesian tree as it can be displayed on a regular 2D grid with (key, priority) coordinate for each node. Just like in the image above.

To create a fully-functioning search tree, we need to implement:

find
insert
remove

More exotic operations like lower bound and upper bound are also pretty simple and do not differ from those in other search trees. And all these operations can be implemented using just two helper operations!

How to do that?

💥

Split

Splits the tree into two trees by a given value. All values in the left tree are less than the given value, while all values in the right tree are greater. And both resulting trees are correct treaps.

We will use a special flag that decides whether to send values that are equal to the left tree or to the right tree.

Example of split function result. The equal value sent to the right

💥

Merge

Merges two treaps into one big treap.
Prerequisite: all items in the first tree are less than items in the second tree.

Merge example

So, if we implement these two methods, implementing all other three operations would be trivial.

Split

Let's start thinking about code at this stage. I'm going to explain this in C++. Rewriting the following code in Swift is actually really easy. Leave a comment below if you need help.

template <typename T>
struct Node {
	T key;
	size_t prior;
	Node* left = nullptr, *right = nullptr;

	Node(T key, size_t prior) :
		key(std::move(key)),
		prior(prior) {
	}
};

Structure of treap's node

For split, we have a head node and a key for which split needs to be done. This method is extremely simple using recursion.

Algorithm

Let the current head be p.

If p->key is less than the key, then we need to go right and split p->right further.

Splitting right produces two trees as well, and the first one has nodes with keys less than the key. Yet, they are greater than p->key (as they come from p's right subtree).
So, we set p->right to the first tree of the split right result.

Result: p, split right's second tree

If p->key is greater than or equal to the key, then we need to go left and split p->left further.

Similarly to the case above, we set p->left to the second tree of split left.

Result: split left's first tree, p

The algorithm above leaves a node that is equal to the split value in the second tree. Symmetrically, we will use the equalOnTheLeft flag to leave the node in the left tree.

So, the final code:

pair<Node*, Node*> split (Node *p, const T& key,
                          bool equalOnTheLeft=false) {
    if (!p) // reached leaf
        return {nullptr, nullptr};
    if (p->key < key ||
        (equalOnTheLeft && p->key == key)) { // splitting right
        auto q = split(p->right, key, equalOnTheLeft);

        // q.first has nodes of the right
        // subtree that are less than key
        p->right = q.first;

        return {p, q.second};
    } else { // splitting left
        auto q = split(p->left, key, equalOnTheLeft);

        // q.second has nodes of the left
        // subtree that are greater or equal
        // to the key
        p->left = q.second;

        return {q.first, p};
    }
}
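To see split in action, here is a small self-contained sketch. It restates Node and split as free function templates, which is an assumption about how the pieces fit together, and adds three hypothetical helpers (collect, sampleTreap, destroy) that exist only for this demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
struct Node {
    T key;
    std::size_t prior;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(T key, std::size_t prior) : key(std::move(key)), prior(prior) {}
};

// split as described above, written as a free function template
template <typename T>
std::pair<Node<T>*, Node<T>*> split(Node<T>* p, const T& key,
                                    bool equalOnTheLeft = false) {
    if (!p) return {nullptr, nullptr};
    if (p->key < key || (equalOnTheLeft && p->key == key)) {
        auto q = split(p->right, key, equalOnTheLeft);
        p->right = q.first;
        return {p, q.second};
    }
    auto q = split(p->left, key, equalOnTheLeft);
    p->left = q.second;
    return {q.first, p};
}

// In-order traversal: appends the keys in ascending order (demo helper)
template <typename T>
void collect(const Node<T>* p, std::vector<T>& out) {
    if (!p) return;
    collect(p->left, out);
    out.push_back(p->key);
    collect(p->right, out);
}

// Frees a whole tree (demo helper)
template <typename T>
void destroy(Node<T>* p) {
    if (!p) return;
    destroy(p->left);
    destroy(p->right);
    delete p;
}

// Hand-built treap with keys 1..5: the root is (3, 90), its children are
// (1, 50) and (5, 70), and (2, 10) and (4, 20) hang below them.
Node<int>* sampleTreap() {
    auto* n1 = new Node<int>(1, 50);
    auto* n2 = new Node<int>(2, 10);
    auto* n3 = new Node<int>(3, 90);
    auto* n4 = new Node<int>(4, 20);
    auto* n5 = new Node<int>(5, 70);
    n3->left = n1;
    n3->right = n5;
    n1->right = n2;
    n5->left = n4;
    return n3;
}
```

Splitting sampleTreap() by 3 should give the keys {1, 2} and {3, 4, 5}. With equalOnTheLeft set to true, the 3 moves to the left part, giving {1, 2, 3} and {4, 5}.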

💡

Priorities are neither used nor changed during the split procedure. The resulting trees keep a valid priority order because the initial tree had one.

Merge

Merge is similar to split, but it uses priorities to do the work. As I mentioned before, there is a prerequisite: all items in the first merged tree must be less than items in the second tree. If this is not true, another algorithm must be used.

Algorithm

Similarly to split, merge is also recursive. Let us have two trees to merge: l and r.

We need to choose which tree will represent the new head. That's simple — the head must have the greatest priority, so we choose l or r based on that.

💡

Notice that the head node in l has the highest priority in the whole l tree, as it's a property of a correct treap. The same applies to r.

If l has the greater priority, then l becomes the head. The l->left subtree remains intact: all of its keys are less than l's key, and therefore less than everything in r, so r has nothing to do with it.

Then, the l->right subtree must be merged with r, and the result becomes the new l->right subtree.

If r has the greater priority, then, similarly to the case above, r->right remains intact and r->left must be merged with l.

Node* merge (Node *l, Node *r) {
    if (!l) // left is empty
    	return r;
    if (!r) // right is empty
    	return l;
        
    if (l->prior > r->prior) { // l has the new head.
        l->right = merge(l->right, r);
        return l;
    } else { // r has the new head.
        r->left = merge(l, r->left);
        return r;
    }
}

Why is it correct?

It seems like nothing stops us from breaking the search tree structure where all items' values in the left subtree are less than the node's value, and all items in the right subtree are greater.

💡

The prerequisite preserves the binary search tree property: merge never reorders items, so every item of l stays to the left of every item of r
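To see split and merge working together, here is a minimal sketch. It assumes int keys and hand-picked priorities instead of the templated Node used in this post, and the helpers keysOf and buildDemoTreap are made up for the demo. Nodes are never freed, to keep it short.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Simplified node for illustration: int keys, hand-picked priorities.
struct Node {
    int key;
    int prior;
    Node *left = nullptr, *right = nullptr;
    Node(int key, int prior) : key(key), prior(prior) {}
};

// Keys < key go to the first tree, keys >= key go to the second.
std::pair<Node*, Node*> split(Node* p, int key) {
    if (!p) return {nullptr, nullptr};
    if (p->key < key) {
        auto q = split(p->right, key);
        p->right = q.first;
        return {p, q.second};
    }
    auto q = split(p->left, key);
    p->left = q.second;
    return {q.first, p};
}

// Requires every key in l to be less than every key in r.
Node* merge(Node* l, Node* r) {
    if (!l) return r;
    if (!r) return l;
    if (l->prior > r->prior) {
        l->right = merge(l->right, r);
        return l;
    }
    r->left = merge(l, r->left);
    return r;
}

// In-order traversal: yields keys in sorted order for a valid search tree.
void collect(const Node* p, std::vector<int>& out) {
    if (!p) return;
    collect(p->left, out);
    out.push_back(p->key);
    collect(p->right, out);
}

std::vector<int> keysOf(const Node* p) {
    std::vector<int> out;
    collect(p, out);
    return out;
}

// Builds a treap from ascending keys by merging single nodes left to right.
Node* buildDemoTreap() {
    const int keys[] = {1, 3, 5, 7, 9};
    const int priors[] = {40, 10, 50, 20, 30};
    Node* head = nullptr;
    for (int i = 0; i < 5; ++i)
        head = merge(head, new Node(keys[i], priors[i]));
    assert(head->prior == 50);  // the highest priority ends up at the root
    return head;
}
```

Calling split(buildDemoTreap(), 5) gives a first tree with keys 1, 3 and a second tree with keys 5, 7, 9. Merging the two back restores the original treap.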

Implementing search tree methods

You believed me that all methods are easy to implement through split and merge. Time to prove that.

Find

Find is implemented just like for the general search tree. We use the fact that keys in the left subtree are less than the key in the node, and keys in the right subtree are greater.

Node* find(Node* node, const T& key) {
	if (node == nullptr)
		return nullptr;
    if (node->key == key)
		return node;
    return find(key >= node->key ? node->right : node->left, key);
}

Insert

Let's think about insert in terms of split and merge. We have one big tree and we need to insert a new key.

Split the tree by key into two trees: the first (which has values lower than the key) and the second (which has values greater than or equal to the key).

We can check whether the node already exists: try to find it in the second tree.

💥

Implementation requires that each item is met only once.

If you need to insert multiple copies of the same item, you can store an item and its count to achieve that

Create a new node that will store the new key: newNode. Ta-da, this node is a correct treap that has only one node.

For the new node, you need to set a random priority

💥

Random priorities are key to the complexity. They make the cartesian tree balance itself, giving expected O(logn) complexity for all operations

New head will be merge(first, merge(newNode, second))

See? It's that simple.

Insert example

Node* insert(Node* head, T key) {
    // The variable must not be named split, or it would shadow the function
    auto parts = split(head, key);
    if (find(parts.second, key) != nullptr) {
        // Key exists already
        // Merge back
        return merge(parts.first, parts.second);
    }

    auto newNode = new Node(std::move(key), rand());
    return merge(parts.first, merge(newNode, parts.second));
}

Remove

It's very similar to insert. However, that's where the equalOnTheLeft flag is used.

💡

Remember that the second tree produced by split contains items greater or equal to the selected key

Therefore, the second tree will contain the value that needs to be removed. But how to remove it from the tree?

Split again.

We can split the second tree by key, setting the equalOnTheLeft flag to true. Thus, the node will be separated from the second tree to the new tree.

💡

After two splits, the node to delete is separated into its own tree. It is easily removed, and everything else is merged back.

Remove example

Node *remove(Node *head, const T &key) {
    auto parts = split(head, key);
    if (parts.second) {
        // Separate the node equal to key (if any) from the greater keys
        auto secondSplit = split(parts.second, key,
                                 /*equalOnTheLeft=*/true);
        auto everythingElse = secondSplit.second;
        if (secondSplit.first == nullptr) {
            // There's no element equal to key. Merge back.
            return merge(parts.first, everythingElse);
        }

        // Key exists: the node with this key is in
        // secondSplit.first, so delete it and merge the rest
        delete secondSplit.first;

        size--; // assumes the enclosing treap keeps an element counter named size
        return merge(parts.first, everythingElse);
    }
    // Key is not present. Merge back.
    return merge(parts.first, parts.second);
}

Full code

You can download C++ code of a little bit optimized Treap here:

Treap

C++ code allocations-optimised

treap.h

5 KB

Comparing to `std::set`

First of all, the implemented version of treap utilizes the split and merge methods. Note that there is a more efficient implementation that uses rotations. However, the true power of treap is in the split and merge methods, as other search trees can't do them easily.

Find tests

Find operation test

It's visible that the asymptotics are similar, though treap always has greater overhead. Still, it's a good result! We're competing with an utterly optimized standard library data structure.

Inserts

Insert operation test

Insertion has even bigger overhead. And it was expected: recursive calls of merge and split do not improve performance ;)

TreapProject

Comparison tests. Outputs CSV of time measurements

TreapProject.zip

7 KB

Comparison conclusion

As you see, the average node height in a treap is greater than in a very well-balanced AVL tree.

Yes, treap has worse performance than std::set. Yet, the results are comparable, and with a large data size, treap gets closer and closer to std::set, which is typically implemented as a red-black tree.

Believe me, you don't want to write your own RB tree. It's a nightmare.

Use cases and modifications

We developed this data structure not just to lose to std::set. There are several useful applications.

Sum of numbers in the interval

We need to modify the Node structure, adding a sum field. It will store the sum of the node's own key and the keys of all its descendants.

template <typename T>
struct Node {
	T key;
	size_t prior;
	long long sum;
	Node* left = nullptr, *right = nullptr;

	Node(T key, size_t prior) :
		key(std::move(key)),
		prior(prior) {
	}
};

It's extremely easy to update the sum. Every time a node's children change, recompute sum = key + left->sum + right->sum, treating a missing child as 0. So, you can implement some kind of update function and call it in split and merge right before returning the value. That's it.
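A minimal sketch of such an update function could look like this. It assumes a stripped-down Node with a numeric key and no priority; the names sumOf, update, and demo are illustrative, not taken from the full code.

```cpp
#include <cassert>

// Stripped-down node: only the fields that matter for the sum.
struct Node {
    long long key;
    long long sum;  // key + sum of all keys below this node
    Node *left = nullptr, *right = nullptr;
    explicit Node(long long key) : key(key), sum(key) {}
};

// Sum of a possibly empty subtree.
long long sumOf(const Node* p) { return p ? p->sum : 0; }

// Recomputes p->sum; call it after p->left or p->right changes.
void update(Node* p) {
    if (p) p->sum = p->key + sumOf(p->left) + sumOf(p->right);
}

// Usage: a node with key 5 and children 3 and 9 stores 5 + 3 + 9 = 17.
long long demo() {
    Node root(5), l(3), r(9);
    root.left = &l;
    root.right = &r;
    update(&root);
    assert(root.sum == 17);
    return root.sum;
}
```

Handling the null child inside sumOf keeps update itself free of branching on which children exist.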

How to answer a query?

We receive interval [l, r]. To calculate the sum of numbers on this interval, we can split the tree by l, then split the second tree of the result by r+1 (or by r, leaving equal elements on the left). In the end, we will have a tree containing all added numbers in the interval [l, r], and the sum stored in its root is the answer.

Complexity: O(logn) versus O(n) naive.
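The query can be sketched as follows. This is a simplified version under a few assumptions: int keys, priorities passed in by hand, and a hypothetical rangeSum helper that splits by hi + 1 (so hi must be below INT_MAX) and merges the tree back afterwards.

```cpp
#include <cassert>
#include <utility>

struct Node {
    int key;
    int prior;
    long long sum;  // sum of keys in this node's subtree
    Node *left = nullptr, *right = nullptr;
    Node(int key, int prior) : key(key), prior(prior), sum(key) {}
};

long long sumOf(const Node* p) { return p ? p->sum : 0; }

void update(Node* p) {
    if (p) p->sum = p->key + sumOf(p->left) + sumOf(p->right);
}

// Keys < key go to the first tree, keys >= key to the second.
std::pair<Node*, Node*> split(Node* p, int key) {
    if (!p) return {nullptr, nullptr};
    if (p->key < key) {
        auto q = split(p->right, key);
        p->right = q.first;
        update(p);  // children changed, refresh the sum
        return {p, q.second};
    }
    auto q = split(p->left, key);
    p->left = q.second;
    update(p);
    return {q.first, p};
}

// Requires every key in l to be less than every key in r.
Node* merge(Node* l, Node* r) {
    if (!l) return r;
    if (!r) return l;
    if (l->prior > r->prior) {
        l->right = merge(l->right, r);
        update(l);
        return l;
    }
    r->left = merge(l, r->left);
    update(r);
    return r;
}

// Sum of all keys in [lo, hi]. The tree is put back together before
// returning, so head may point to a different root afterwards.
long long rangeSum(Node*& head, int lo, int hi) {
    assert(lo <= hi);
    auto a = split(head, lo);          // a.first: keys < lo
    auto b = split(a.second, hi + 1);  // b.first: keys in [lo, hi]
    long long result = sumOf(b.first);
    head = merge(a.first, merge(b.first, b.second));
    return result;
}
```

Each query performs two splits and two merges, so it stays within the O(logn) bound as long as the priorities are random.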

Using a hash of value in place of priority

You can use a hash of value as a priority as a good hash function is pretty random. What benefits does it bring?

If keys and priorities are fixed, then no matter how you construct the treap or add elements, it's always going to have the same structure.

💡

You may think about it this way: keys fix x axis and priorities fix y axis of treap

Therefore, you can compare two sets in O(n) as treaps containing the same values will have absolutely the same structure.
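Here is a small sketch of that idea. It assumes non-negative int keys with no duplicates, and it uses the splitmix64 finalizer as a stand-in for "a good hash function"; the names mix, sameTreap, and selfCheck are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <utility>

struct Node {
    int key;
    std::uint64_t prior;
    Node *left = nullptr, *right = nullptr;
    Node(int key, std::uint64_t prior) : key(key), prior(prior) {}
};

// splitmix64 finalizer: a bijection on 64-bit values, so distinct
// keys always get distinct priorities.
std::uint64_t mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

std::pair<Node*, Node*> split(Node* p, int key) {
    if (!p) return {nullptr, nullptr};
    if (p->key < key) {
        auto q = split(p->right, key);
        p->right = q.first;
        return {p, q.second};
    }
    auto q = split(p->left, key);
    p->left = q.second;
    return {q.first, p};
}

Node* merge(Node* l, Node* r) {
    if (!l) return r;
    if (!r) return l;
    if (l->prior > r->prior) {
        l->right = merge(l->right, r);
        return l;
    }
    r->left = merge(l, r->left);
    return r;
}

// Inserts a key that is assumed not to be in the treap yet.
// The priority depends only on the key, not on insertion order.
Node* insert(Node* head, int key) {
    auto parts = split(head, key);
    auto* node = new Node(key, mix(static_cast<std::uint64_t>(key)));
    return merge(parts.first, merge(node, parts.second));
}

// O(n) comparison: equal sets produce node-for-node identical treaps.
bool sameTreap(const Node* a, const Node* b) {
    if (!a || !b) return a == b;
    return a->key == b->key && sameTreap(a->left, b->left) &&
           sameTreap(a->right, b->right);
}

// Same keys, different insertion order, same structure.
void selfCheck() {
    Node *a = nullptr, *b = nullptr;
    for (int k : {4, 1, 9, 6, 2}) a = insert(a, k);
    for (int k : {2, 6, 9, 1, 4}) b = insert(b, k);
    assert(sameTreap(a, b));
}
```

Because the comparison walks both trees in lockstep, it can stop at the first mismatching node instead of collecting and comparing all keys.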

Implicit treap

What if we use the size of the left subtree as a key? Then, we can use this key as an index. Wow. That means that we can represent a regular ordered array as a treap!

By doing this, we can:

make insertions by random index
O(logn) versus O(n) naive
make deletions by random index
O(logn) versus O(n) naive

With great power, comes great responsibility.

😡

Access by random index downgrades to O(logn) versus O(1) in the standard array.

If your algorithm requires a lot of array modifications and very few accesses/outputs, then it's the right choice. Moreover, you can convert treap into an array and back with O(n) complexity.

💥

I have implicit treap implemented in Swift. It behaves just like the general array and implements a lot of optimisations. Check it out!

swift-collections/Sources/OrderedCollections/TreeArray at main · AlexRoar/swift-collections

Commonly used data structures for Swift. Contribute to AlexRoar/swift-collections development by creating an account on GitHub.

GitHubAlexRoar

Cut-paste problem

Imagine that you have a big string and you receive requests to cut some part and to insert it somewhere.

This problem can be solved using treaps with an implicit key. You can use splits to cut the needed part and merges to insert it.
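A rough sketch of cut-paste on an implicit treap is below. It assumes each node stores one character and its subtree size, and that split takes the number of leading elements to separate. The helpers build, toString, and cutPaste are hypothetical names, and a fixed seed is used only to keep the demo reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>

struct Node {
    char value;
    unsigned prior;
    std::size_t size = 1;  // number of nodes in this subtree
    Node *left = nullptr, *right = nullptr;
    Node(char value, unsigned prior) : value(value), prior(prior) {}
};

std::size_t sizeOf(const Node* p) { return p ? p->size : 0; }
void update(Node* p) { p->size = 1 + sizeOf(p->left) + sizeOf(p->right); }

// The first tree gets the first `count` elements, the second gets the rest.
std::pair<Node*, Node*> split(Node* p, std::size_t count) {
    if (!p) return {nullptr, nullptr};
    if (sizeOf(p->left) < count) {  // p's index is below count
        auto q = split(p->right, count - sizeOf(p->left) - 1);
        p->right = q.first;
        update(p);
        return {p, q.second};
    }
    auto q = split(p->left, count);
    p->left = q.second;
    update(p);
    return {q.first, p};
}

// Concatenation: all elements of l come before all elements of r.
Node* merge(Node* l, Node* r) {
    if (!l) return r;
    if (!r) return l;
    if (l->prior > r->prior) {
        l->right = merge(l->right, r);
        update(l);
        return l;
    }
    r->left = merge(l, r->left);
    update(r);
    return r;
}

Node* build(const std::string& text) {
    std::mt19937 rng(42);  // fixed seed keeps the demo reproducible
    Node* head = nullptr;
    for (char c : text)
        head = merge(head, new Node(c, static_cast<unsigned>(rng())));
    return head;
}

void appendTo(const Node* p, std::string& out) {
    if (!p) return;
    appendTo(p->left, out);
    out.push_back(p->value);
    appendTo(p->right, out);
}

std::string toString(const Node* p) {
    std::string out;
    appendTo(p, out);
    return out;
}

// Cuts `len` characters starting at `from`, then pastes them at
// position `to` of the remaining string.
Node* cutPaste(Node* head, std::size_t from, std::size_t len, std::size_t to) {
    assert(from + len <= sizeOf(head));
    auto a = split(head, from);     // a.first: [0, from)
    auto b = split(a.second, len);  // b.first: the cut piece
    Node* rest = merge(a.first, b.second);
    assert(to <= sizeOf(rest));
    auto c = split(rest, to);
    return merge(merge(c.first, b.first), c.second);
}
```

For the string "abcdefgh", cutPaste(head, 2, 3, 1) cuts "cde" and pastes it after the first character, giving "acdebfgh". Each request costs three splits and three merges regardless of how long the moved piece is.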

FAQ

Cartesian trees are most suitable for what?

Treap is useful when you need to collect some kind of characteristic on an interval (for example, sum) or apply some modification to the interval. Treap with implicit key is also useful when you need to apply a lot of insertions/deletions at random positions with few accesses.

Why don't we use array indices as keys for an implicit treap?

Because in case of insertion we would need to recalculate all indices that are higher than the inserted index. Therefore, it downgrades complexity to O(n).

Is treap a randomized tree?

Yes, it is. But it can also use hash value in place of a random value.

I know about implementation without split and merge. It utilizes left and right turns. Is it better?

For example, GeeksforGeeks uses such an implementation, I know. But I believe that the true value of treap is in seamless splits and merges. You've already seen in the examples how useful they are. Why implement treap with turns when you can build an AVL tree that's probably going to be faster?

Love data structures?

Check out my article on the amazing Skip List! While a lot of people never heard about it, Skip List is beautiful and can solve, for example, the problem of finding the n-th maximum or the rolling median problem in the most efficient way.

Skip List Indexation and kth Maximum | Alex Dremov

Skip List is a nice structure that lets you perform insertions, searches, and finding the n-th maximum. In this post I focus on skip list indexation

Alex DremovAlex Dremov

Also, you can check the whole algorithms section of my blog

Alex Dremov | Algorithms

Those are hard! In this section I discuss algorithms that I encountered during work or my college assignments

Alex Dremov

References

Introduction To Algorithms

The first edition won the award for Best 1990 Professional and Scholarly Book in Computer Science and Data Processing by the Association of American Publishers.There are books on algorithms that are rigorous but incomplete and others that cover masses of material but lack rigor. Introduction to Algo…

Google Books

Декартово дерево - Алгоритмика

Алгоритмика

https://www.cs.cmu.edu/~scandal/papers/treaps-spaa98.pdf

Type Placeholders: New Swift 5.6 Feature

Alex Dremov — Thu, 21 Apr 2022 07:00:00 +0300

Type placeholders were recently introduced in Swift 5.6. And yes, they are a nice add-on to Swift's powerful type inference system. If you are familiar with C++, you must know about the auto keyword. Type placeholders are almost the same.

Generics and type placeholder

let number: _ = 42 // Type placeholder
let anotherNumber = 42

Yes, Swift can infer a variable's type, but type placeholders are meant to be used for types that take multiple type parameters. Generics. That's where they really shine.

Consider regular Result enum

enum Result<Success, Failure> where Failure : Error {
    case success(Success)
    case failure(Failure)
}

And what if we have some kind of complex object

var ohMy = [1: [3: (1, 2, 3, "That's a long tuple")]]

If you try to create a Result from ohMy, you'll see a compilation error.

let result = Result.success(ohMy)

😡

Generic parameter Failure could not be inferred

Bruh. So I need to write...

let result = Result<[Int : [Int : (Int, Int, Int, String)]], Error>.success(ohMy)

💡

Use type placeholders to omit type that Swift can infer

Thanks to type placeholders, no. Swift can infer object's type by itself. So, we need to provide Failure type only.

let result = Result<_, Error>.success(ohMy) // Nice

Collections and type placeholder

This feature is also useful with collections. What if we need a dictionary with enum keys?

enum Foo {
	case bizz
	case bonk
}

let results = [
	.bizz: ohMy,
	.bonk: ohMy
]

😡

Reference to member bizz cannot be resolved without a contextual type

So, let's provide this contextual type. But remember how bad-looking ohMy's type is? Let's use a type placeholder.

// 🚫
let results:[Foo: [Int : [Int : (Int, Int, Int, String)]]] = [
	.bizz: ohMy,
	.bonk: ohMy
]

// ✅
let results:[Foo: _] = [
	.bizz: ohMy,
	.bonk: ohMy
]

More examples

Examples of types containing placeholders are:

Array<_> // array with placeholder element type
[Int: _] // dictionary with placeholder value type
(_) -> Int // function type accepting a single type placeholder argument and returning 'Int'
(_, Double) // tuple type of placeholder and 'Double'
_? // optional wrapping a type placeholder

Final notes

That's a great feature that broadens Swift’s type inference capabilities. For now, it's not widely known, but I think it will be used more in the future.

You can check out other less-known Swift features in my previous post:

Top 7 Subtle Swift Features | Alex Dremov

Here, I collected Swift features that are less known and can be useful when you prepare for interviews or want to deepen your Swift knowledge.

Alex DremovAlex Dremov

References

swift-evolution/0315-placeholder-types.md at main · apple/swift-evolution

This maintains proposals for changes and user-visible enhancements to the Swift Programming Language. - swift-evolution/0315-placeholder-types.md at main · apple/swift-evolution

GitHubapple

Quick Guide to Async Await in Swift

Alex Dremov — Sat, 16 Apr 2022 11:50:00 +0300

How to create asynchronous functions, run code in parallel, what is MainActor, what is the closures pyramid and how to get rid of it? Let's start.

Straight to the point

Swift 5.5 introduced built-in support for writing asynchronous and parallel code in a structured way. Asynchronous code can be suspended and resumed later, although only one piece of the program executes at a time.

The async keyword is used to mark a function as asynchronous. That's it.

func downloadNames(fromServer name: String) async -> [String] {
    ... // some other tasks
    return data
}

But what does it really mean?

💥

The async function can be suspended in the middle of the execution when it’s waiting for something.

Here's how async functions can be called

let namesMain = await downloadNames(fromServer: "main")
let secondary = await downloadNames(fromServer: "secondary")

When you type await, the current execution is suspended, until an asynchronous call is finished.

💡

Suspension is never implicit or preemptive — each such place is marked with the await keyword.

Where to call async functions

As I said before, await suspends current execution. But there must be a structure underneath that can be suspended. You can't suspend a raw thread or the main thread, for example.

You opened Playgrounds, right?

💡

Do not use Swift Playgrounds to test new concurrency features as they are not fully supported yet

If you try to call an async function in an inappropriate place, you will see this error

😡

async call in a function that does not support concurrency

That's because an asynchronous function can be called only in:

Code in the body of an asynchronous function, method, or property.
Code in the static main() method of a structure, class, or enumeration that’s marked with @main.
Code in an unstructured child task

That's a lot of words.

For most developers, only the first and the last points make sense. Most of the places in your code do not support await. How to deal with that?

Tasks and TaskGroup

Task

To call an asynchronous function in a place that does not support concurrency, you need to create a concurrent task. You can use Task and TaskGroup to achieve that.

Task {
    let names = await downloadNames(fromServer: "main")
    ... // further work
    ... // take over the world (asynchronously)

}

When you create an instance of Task, you provide a closure that contains the work for that task to perform. A task may start running immediately after creation, but this is not guaranteed. You can create a task inside another Task or in other concurrent environments.

let handle = Task { // Creates asynchronous task
	let names = await downloadNames(fromServer: "main")
    
	Task { // Creates asynchronous task
		await save(names: names)
	}
    
	for name in names {
		print(name)
	}
}

After creating a task, you use the instance to interact with it — for example, to wait for it to complete or to cancel it. Tasks run independently from their handles.

💡

To cancel a task, call cancel() on its handle. Cancellation is cooperative: the task itself decides how to stop. It can throw an error, return nil, or return partially completed work. Use Task.isCancelled to check if the current task was cancelled.

TaskGroup to group the tasks

TaskGroup lets you launch several tasks and wait for the completion of all of them. The order in which these tasks are completed is not defined.

How to create it?

TaskGroup is created through withTaskGroup(of:). You provide a closure in which you spawn new tasks and perform operations on the returned data.

let calculations = await withTaskGroup(of: Int.self) { group -> [Int] in
	group.addTask { 1 * 2 } // () -> Int
	group.addTask { 2 * 3 }
	group.addTask { 3 * 4 }
	group.addTask { 4 * 5 }
	group.addTask { 5 * 6 }

	var collected = [Int]()

	for await value in group {
		collected.append(value)
	}

	return collected
}

The group object inside closure conforms to AsyncSequence. It's just like a general sequence, but elements are generated asynchronously. To iterate over it you can use .next() method or for await ... in sequence.

It can be used to parallelize for loops, for example.

let calculations = await withTaskGroup(of: Int.self) {[works] group -> [Int] in
	for work in works {
		group.addTask { work() }
	}

	var collected = [Int]()

	for await value in group {
		collected.append(value)
	}

	return collected
}

That's great, but how to perform unrelated tasks concurrently without TaskGroup?

Async let, async get, concurrent execution

These features seem like a real power to me.

Imagine you need to load an article, and its data is stored on different services or URLs:

Article thumbnail
Article text
Related articles
Comments

And the most obvious way to load all data is to write such code

let thumbnail = await loadThumbnail(forPost: post)
let text = await loadArticleText(forPost: post)
let related = await loadRelatedArticles(forPost: post)
let comments = await loadComments(forPost: post)

And this is mighty concurrent code that will load needed information the fastest way. Right? Not really.

Code is still executed serially and assets are not loaded in parallel. Each step waits until data is loaded. You can spawn a task for every step, sure. But is it really a nice solution?

Top 7 Subtle Swift Features

Alex Dremov — Sat, 09 Apr 2022 13:41:05 +0300

1. Keyword `indirect`

It’s used with enums only. As you know, enums are value types and are typically stored on the stack. Therefore, the compiler needs to know how much memory each enum takes. As only one option is possible at any moment, the enum occupies the memory of the largest case plus some operational information.

// Just a general enum, nothing fancy
enum Foo {
    case bizz(String)
    case fizz(Int)
}

But what if we make an enum dependent on itself?

// Infinite size??
enum Foo {
    case bizz(Foo)
    case fizz
}

This definition generates a compiler error.

😡

Recursive enum Foo is not marked indirect

The error makes sense: the compiler can’t calculate Foo size as it tends to infinity. Here comes the indirect keyword.

// Oh, fine
enum Foo {
    indirect case bizz(Foo)
    case fizz
}

Simple: it modifies the enum memory structure to solve the recursion problem. Detailed: .bizz(Foo) is no longer stored inline in memory. Actually, with the indirect modifier data is now stored behind a pointer (indirectly).

Problem solved! Also, we can mark the whole enum as indirect

// Every case is indirect now
indirect enum Foo {
    case bizz(Foo?)
    case fizz(Foo?)
}

2. Attribute `@autoclosure`

Swift’s @autoclosure attribute enables you to define an argument that automatically gets wrapped in a closure. It’s mostly used to defer the execution of an expression to when it’s actually needed.

func calculate(_ expression: @autoclosure () -> Int,
               zero: Bool) -> Int {
    guard !zero else {
        return 0
    }

    return expression()
}

Then, calculate can be called like this:

calculate(1 + 2, zero: false) // 3

calculate([Int](repeating: 5, count: 10000000).reduce(0, +),
                zero: false) // 50000000

calculate([Int](repeating: 5, count: 1000).reduce(0, +),
                zero: true) // 0

So, in this case, when zero: true, the call of calculate does not calculate the expression at all, improving code performance.

3. Lazy

A lazy stored property is a property whose initial value isn’t calculated until the first time it’s used. Lazy properties must always be declared as a variable. Note that if you use lazy in struct, then the function that uses it must be marked as mutating.

class Foo {
    lazy var bonk = DBConnection()
    
    func send() {
        bonk.sendMessage()
    }
}

We already covered @autoclosure which also can help to defer expression evaluation. That can be used with lazy! Consider this common case of dependency injection.

class Foo {
    let bonkProvider: () -> DBConnection
    lazy var bonk: DBConnection = bonkProvider()
    
    init(_ expression: @escaping @autoclosure () -> DBConnection) {
        self.bonkProvider = expression
    }
    
    func send() {
    	// Here bonkProvider() is called
        // only for the first call of send()
        bonk.sendMessage()
    }
}

4. Enums as namespaces

Swift does not have namespaces, which may be a problem in big projects. This is easily solved with enums.

enum API {}

extension API {
    static let token = "…"

    struct CatsCounter {
        …
    }
}

let a = API.CatsCounter()
print(API.token)

5. Dynamic member lookup

This section describes the @dynamicMemberLookup attribute. It can be used with structs and classes.

Just adding @dynamicMemberLookup to the definition generates an error

😡

@dynamicMemberLookup attribute requires Foo to have a subscript(dynamicMember:) method that accepts either ExpressibleByStringLiteral or a key path

Therefore, such subscript needs to be defined

@dynamicMemberLookup
class Foo {
    subscript(dynamicMember string: String) -> String {
        return string
    }
}

let a = Foo()
print(a.helloWorld)

In subscript you can implement much more complex logic to retrieve data. But you can see how this implementation is limited to strings only and not really safe. This can be modified with a key path.

class Bob {
    let age = 22
    let name = "Bob"
}

@dynamicMemberLookup
class Foo {
    let himself = Bob()
    
    subscript<T>(dynamicMember keyPath: KeyPath<Bob, T>) -> T {
        return himself[keyPath: keyPath]
    }
}

let a = Foo()
print(a.age)

Just because you know about this feature does not mean that it should be used everywhere. It’s up to you to decide what is more readable and expressive: a.himself.age or a.age.

6. Dynamically callable

This is also a compiler feature, one that allows you to call objects like functions. It can be applied to struct, enum, and class.

After adding the attribute, the error is generated:

😡

@dynamicCallable attribute requires RangeGenerator to have either a valid dynamicallyCall(withArguments:) method or dynamicallyCall(withKeywordArguments:) method

The method signature is similar to that of @dynamicMemberLookup.

@dynamicCallable
struct RangeGenerator {
    var range: Range<Int>
    
    func dynamicallyCall(withKeywordArguments args: KeyValuePairs<String, Int>) -> [Int] {
        if args.count > 1 || args.first?.key != "count" {
            fatalError("Unknown arguments \(args)")
        }
        let count = args.first!.value
        return (0..

7. Inlining

Sometimes you want to give the compiler additional information about optimizations it can use. Inlining code is one of the most important optimization features. So, how to use @inlinable, @inline(__always), @usableFromInline?

The @inlinable attribute exports the body of a function as part of a module's interface, making it available to the optimizer when referenced from other modules.

As a result, @inlinable makes the implementation of the method public and able to be inlined into the caller. Secondly, it forces you to make everything it calls @usableFromInline.

@inline(__always) tells the compiler to ignore inlining heuristics and always (almost) inline the function.

A function that is @inline(__always), but not @inlinable, will not be available for inlining outside its module, because the function's code is not available.

💥

@inline(__always) can be beneficial for performance, but it can also have catastrophic effects on macro performance due to code size increase.

struct Foo {
    @inlinable
    @inline(__always)
    func simpleComputation(_ a: Int, _ b: Int) -> Int {
        duplicate(a) + duplicate(b)
    }
    
    @usableFromInline
    func duplicate(_ c: Int) -> Int {
        c * 2
    }
    
    func general() {
        print("Hello world")
    }
}

This has more effects on implementation. Check the discussion on this forum if you want to understand this in depth


Yandex iOS Interview Notes

Alex Dremov — Thu, 07 Apr 2022 19:18:56 +0300

Swift
Syntax
Basic structures
Arrays, collections
Closures
Passing parameters to closures
Errors — find, describe outcomes, solve 
Check information about @autoclosure, indirect, lazy, and other keywords. Try to Google tests on Swift knowledge and check yourself.
Alex Dremov | Top 7 Subtle Swift Features
Here, I collected Swift features that are less known and can be useful when you prepare for interviews or want to deepen your Swift knowledge.
Alex DremovAlex Dremov
Knowledge of protocol conformance was useful. Note which protocol conformances are generated automatically for enums and structures, and when that generation breaks.
Understanding of ARC and how Swift memory management works is required. Is a closure a value type or a reference type? How does the weak modifier work under the hood?
Platform
Basic overview. There should be no magic on how the app launches
App life cycle
#import <UIKit/UIKit.h>
#import "AppDelegate.h"

int main(int argc, char * argv[]) {
    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
    }
}


State | Description
Not running | The app is not running.
Inactive | The app is running in the foreground, but not receiving events. An iOS app can be placed into an inactive state, for example, when a call or SMS message is received. Transitional state.
Active | The app is running in the foreground, and receiving events.
Background | The app is running in the background, and executing code.
Suspended | The app is in the background, but no code is being executed. The app is still in memory.

Method | Description
application:willFinishLaunchingWithOptions | This method is your app’s first chance to execute code at launch time.
application:didFinishLaunchingWithOptions | This method allows you to perform any final initialization before your app is displayed to the user.
applicationDidBecomeActive | The app has become the foreground app. Use this method for any last minute preparation.
applicationWillResignActive | The app is transitioning away from being the foreground app.
applicationDidEnterBackground | The app runs in the background and may be suspended at any time.
applicationWillEnterForeground | The app moves out of the background and back into the foreground, but it is not yet active.
applicationWillTerminate | The app is being terminated. This method is not called if your app is suspended.


👀
Every iOS application needs at least one window—an instance of the UIWindow class—and some may include more than one window
UIKit
I almost failed on that one
UIViewController
viewDidLoad() 
Called after init(coder:) when the view is loaded into memory
Note that just because the view has been loaded into memory doesn’t necessarily mean that it’s going to be displayed soon – for that, you’ll want to look at viewWillAppear
viewWillAppear(_:) 
Always called after viewDidLoad (for obvious reasons, if you think about it), and just before the view appears on the screen to the user
viewWillDisappear(_:) 
Similar to viewWillAppear, this method is called just before the view disappears from the screen

loadView()
This event creates the view that the controller manages. It is only called when the view controller is created programmatically. You can override this method in order to create your views manually. If you are working with storyboards or nib files, then you do not have to do anything with this method and you can ignore it.
loadViewIfNeeded()
Loads the view controller’s view if it has not already been set. Available from iOS 9.0.
viewDidLoad()
Called after the view has been loaded. For view controllers created in code, this is after -loadView. For view controllers unarchived from a nib, this is after the view is set. Use this method to perform the initial setup of the interface.
viewWillAppear(_ animated: Bool)
This method will get called every time the view is about to appear, whether or not the view is already in memory.
viewWillLayoutSubviews()
Called just before the view controller’s view’s layoutSubviews method is invoked. This is the first step in the lifecycle where the bounds are finalised. If you are not using constraints or Auto Layout you probably want to update the subviews here.
viewDidLayoutSubviews()
Called just after the view controller’s view’s layoutSubviews method is invoked. This event notifies the view controller that the subviews have been setup.
viewDidAppear(_ animated: Bool)
Called when the view has been fully transitioned onto the screen.
viewWillDisappear(_ animated: Bool)
This method will get called when the view controller’s view is about to be removed from the view hierarchy.
viewDidDisappear(_ animated: Bool)
This method will get called when the view controller’s view was removed from the view hierarchy.

UIView
View controller has one root view, which is a UIView instance
UIViewController handles all the magic behind UIView while UIView just represents the screen and some content to the user. UIViewController tells the root UIView object when to come to the screen. First, the view controller creates its root view and loads it. After loading, it tells the view to appear on the screen and disappear when necessary.
didAddSubview(_:)
willRemoveSubview(_:)
willMove(toSuperview:)
didMoveToSuperview()
willMove(toWindow:)
didMoveToWindow()
UIKit basics
UIResponder
An abstract interface for responding to and handling events.
https://developer.apple.com/documentation/uikit/uiresponder
In UIKit, all view layers have options for shadow opacity, radius, offset, color, and path
Layout Principles
There are actually three phases associated with the layout and drawing of the views.
The first is update of constraints (constraint pass), which happens bottom up.
The second is layout of views and subviews (layout pass), which happens top down and is dependent on constraint settings.
The third phase is the display pass, where the views get redrawn based on the layout pass.
Auto Layout dynamically calculates the size and position of all the views in your view hierarchy, based on constraints placed on those views
Examples
Old
let views = [
	"icon": iconView,
    "titleLabel": titleLabelView,
    "postDate": postDateView
]

NSLayoutConstraint.constraints(
	withVisualFormat: "H:|-[icon(==postDate)]-20-[titleLabel(120@250)]-20@750-[postDate(>=50)]-|",
    metrics: nil,
    views: views
 )
Anchors
extension UIView {
    open var leadingAnchor: NSLayoutXAxisAnchor { get }
    open var trailingAnchor: NSLayoutXAxisAnchor { get }
    open var leftAnchor: NSLayoutXAxisAnchor { get }
    open var rightAnchor: NSLayoutXAxisAnchor { get }
    open var topAnchor: NSLayoutYAxisAnchor { get }
    open var bottomAnchor: NSLayoutYAxisAnchor { get }
    open var widthAnchor: NSLayoutDimension { get }
    open var heightAnchor: NSLayoutDimension { get }
    open var centerXAnchor: NSLayoutXAxisAnchor { get }
    open var centerYAnchor: NSLayoutYAxisAnchor { get }
    open var firstBaselineAnchor: NSLayoutYAxisAnchor { get }
    open var lastBaselineAnchor: NSLayoutYAxisAnchor { get }
}

let constraints = [
	view.centerXAnchor.constraint(equalTo: superview.centerXAnchor),
	view.centerYAnchor.constraint(equalTo: superview.centerYAnchor),
	view.widthAnchor.constraint(equalToConstant: 100),
	view.heightAnchor.constraint(equalTo: view.widthAnchor)
]
NSLayoutConstraint.activate(constraints)
Concurrency
In the core
GCD, OperationQueue
OperationQueue internally uses Grand Central Dispatch on iOS.
OperationQueue gives you a lot more control over how your operations are executed. You can define dependencies between individual operations for example, which isn't possible with plain GCD queues. It is also possible to cancel operations that have been enqueued in an OperationQueue (as far as the operations support it). When you enqueue a block in a GCD dispatch queue, it will definitely be executed at some point.
To sum it up, OperationQueue can be more suitable for long-running operations that may need to be cancelled or have complex dependencies. GCD dispatch queues are better for short tasks that should have minimum performance and memory overhead.
I already had an in-depth understanding of how GCD works. If you don't feel confident in this area, definitely read up on it. It's a really crucial part of iOS development.
Priority
userInteractive
We use this for UI updates, event handling and small workloads that require low latency
userInitiated
The user initiates these asynchronous tasks from the UI. We can use this when the user is waiting for results and for tasks which are required to continue user interaction. This runs in the high priority global queue.
utility
This represents long-running tasks such as a progress indicator which is visible during computations, networking, continuous data fetching, etc. This runs in the low priority global queue.
background
This represents tasks that users are not aware of such as prefetching, maintenance, and other tasks that don’t require user interaction and aren’t time-sensitive. This has the lowest priority.
Networking
let url = URL(string: "http://www.stackoverflow.com")!

let task = URLSession.shared.dataTask(with: url) {(data, response, error) in 
	guard let data = data else { return }
	print(String(data: data, encoding: .utf8)!)
    // Global queue
}
task.resume()
let url = URL(string: "https://bit.ly/3sspdFO")!

let session = URLSession.shared
var request = URLRequest(url: url)
request.setValue("application/json", forHTTPHeaderField: "Content-Type") 
request.httpMethod = "GET"

session.dataTask(with: request) { (data, response, error) in 
	print("Done")
    // Global queue
}.resume()
Miscellaneous
SOLID
Single Responsibility Principle
Open/Closed Principle
Liskov Substitution Principle
Interface Segregation
Dependency Inversion
MVC
Model View Controller
Drawback: not every element can fit into these three groups
MVVM
Model View ViewModel
References
https://bobthedev.gitbooks.io/the-uikit-fundamentals-with-bob/content/course/ios-ecosystem/app-life-cycle.html
https://medium.com/@vipandey54/uiviewcontroller-lifecycle-7ca2d36f4f07
https://candost.blog/view-lifecycle-in-ios/
https://appleeducation.instructure.com/courses/144/pages/5e7725f07b96ab8f78000a5bbdd54991-dot-readme?module_item_id=2252 


 17 Rejection Letters 
Alex Dremov — Sat, 21 Aug 2021 12:27:03 +0300
 I applied to American colleges. That was hard, fun, and freaking expensive. Posts of this kind mostly focus on acceptance letters and WRITING IN CAPS HOW YOU ARE HAPPY!!¡ That’s not my case.
As you can infer from the title, I got 17 rejections. Was it because of COVID, or was my application just not solid enough? I don’t know.
However, these rejections were not completely useless. Only now do I understand that they made an exceptional impact on my life priorities, self-perception, and refusal handling. Therefore, I decided to get them printed. Unfortunately, as one year passed, they were no longer available.
So, I mailed 17 American colleges, requesting a copy of my rejection decision.
And I received quite a lot of rejections to my rejection requests. Well, at least it does not hurt this time. Yet some replies were really sweet and included the copy I needed.
“Why do I see this post?”
For three reasons.
Firstly, if you found yourself in a similar kind of situation, it’s always good to know that you are not the only one.
Secondly, more generally, this story is my proof that whatever the outcome, “It is going to be okay” – my favorite quote.
Finally, this post is for those who know me but hesitated to ask “Alex, what’s up with your college application?”
Hope you picked up something from all these words.
Here’s the pride wall of my rejections.
As you see, I wasn’t able to collect them all. Still, it’s something. Colleges that have not provided rejection letters (but I still remember them):
Yale
Harvard
GeorgiaTech
Swarthmore
UPenn
Harvey Mudd College
Cornell
MIT
Princeton
and several others
Actually, accepted
Still, there were two pleasing letters. But due to COVID and low financial aid, I declined these offers. 
 


 Note-taking apps 
Alex Dremov — Wed, 18 Aug 2021 19:51:00 +0300
 You know that feeling at the start of a college or school year? You say to yourself “I’ll be as productive as possible” and you feel like you can climb a mountain.
At least, this was my case. The first thing I wanted to settle on was a note-taking app. I wanted to have a written outline of every lecture organized in the best possible manner. So, I started to search for apps for my college workflow that are beautiful, powerful, and suited to my developer-oriented mind.
Evernote, Apple Notes, OneNote, Joplin, Notable
… and many other conventional note-taking apps. The biggest no-no for me was the inability to organize content efficiently. The best option was to work with folders and tags, but it gets messy really quickly. Who uses tags? This is because these apps were designed for quick note-taking rather than for building a knowledge base. Also, they don’t meet my developer needs for code embedding and markdown. Speaking about OneNote, it’s just ugly and over-complicated.
Links
Evernote
OneNote
Joplin
Notable
Notion, Boost Note
These are good! Even though they have hierarchical structuring, it’s supplemented with emoji icons and title pages. These additions help to navigate through data quicker. Notion’s workspaces and page linking help to structure data efficiently. So, what’s wrong? Online service only. These are web-based apps, and a lagging app during a fast-paced lecture is not what I am looking for.
Links
Notion
Boost Note
IA Writer
My all-time best app for writing. Minimalistic tool with markdown support. Simply said, best for writing. However, not really suitable for structuring data and poor on linking, image embedding, and code highlighting.
Links
IA Writer
Logseq
Weird at first glance, genius if you dive deep. The graph-based organization system is bemusing at first. “What do you mean there are no folders?” But then you realize that folders, or hierarchical structuring, are logical but not natural. When you write some content, new concepts flow not in hierarchical order but rather as connections or links. In Logseq, pages are created as they are needed. The whole workspace is graph-organized. Moreover, it supports lots of block types, which more than satisfies my developer needs. Thus, I started considering this app as my primary one.
UPD: after almost a year of using Logseq, my knowledge base looks like this
And it’s cool, but I have to say that this graph view is of little use, unfortunately. Or I just have not used the app extensively enough.
Links
Logseq
Athens
Looks like Logseq and has very similar functionality. However, I found Athens more pleasant-looking and less complicated. Here it is. A minimalistic app with beautiful design and a striking structuring system. This is my top pick of all the note-taking apps that I reviewed over a couple of days.
However, the project is brand new and has some bugs, so maybe I will be using Logseq for reliability.
Links
Athens 


 The Mystery of Mach-O Object Structure 
Alex Dremov — Thu, 29 Apr 2021 22:59:09 +0300
 During the development of the final project for “the assembly language and low-level architecture” MIPT freshman course, we were developing a compilable programming language. I wanted to make it compile to a standard object file but found almost no information about its structure. What’s more, there were few to no examples on this topic. In this article, I’m going to tell you about the internals of the Mach-O file and give an introduction to the simple relocatable object file structure.
General Structure
A Mach-O file can be divided into three main parts:
Header
Load commands
Data
The header contains general information and identifies the file as a Mach-O file. The header also contains other basic file type information, indicates the target architecture, and contains flags specifying options that affect the interpretation of the rest of the file.
Directly after the header is a series of variable-size load commands that specify the layout and linkage characteristics of the file. This is the core that defines the file characteristics.
Following the load commands, all Mach-O files contain segment data. Each segment has zero or more sections. Each segment defines a region of virtual memory that the dynamic linker maps into the address space of the process. Apart from segment data, other data also can be placed here. For example, symbol table, relocations, etc.
Object-specific structure
As this article focuses on object files, I will not go into details about general executable files. Even though their format is the same, load commands and data differ.
To make a workable object file, we need to define these elements. They are listed in the order they will be placed in the file.
Header
Load commands
    Segment (__TEXT)
        Text section (__text)
        Data section (__data)
    Symbols table (SYMTAB)
    Dynamic symbols table (DYSYMTAB)
Data
    Text section data
    Data section data
    Relocations
    Symbol table data
    String table
Header
Header is defined by this structure:
struct mach_header_64 {
    uint32_t       magic;      /* mach magic number identifier */
    cpu_type_t     cputype;    /* cpu specifier */
    cpu_subtype_t  cpusubtype; /* machine specifier */
    uint32_t       filetype;   /* type of file */
    uint32_t       ncmds;      /* number of load commands */
    uint32_t       sizeofcmds; /* the size of all the load commands */
    uint32_t       flags;      /* flags */
    uint32_t       reserved;   /* reserved */
};
magic – it’s exactly what the name says. It simply contains the magic number that helps to identify the file as Mach-O. It holds MH_MAGIC_64 (0xfeedfacf) constant.
cputype, cpusubtype – defines CPU information. For most cases, CPU_TYPE_X86_64 and CPU_SUBTYPE_X86_64_ALL can be used.
filetype – as Mach-O file can be used for multiple purposes, it is needed to know the file type. As we build an object file, MH_OBJECT must be used.
ncmds – number of load commands that follow the header.
sizeofcmds – the size of load commands (in bytes).
flags – special flags, can be found here. For the object file, we will be using MH_SUBSECTIONS_VIA_SYMBOLS, which means that the sections of the object file can be divided into individual blocks. These blocks are dead-stripped if they are not used by other code.
MH_NOUNDEFS – The object file contained no undefined references when it was built.
MH_INCRLINK – The object file is the output of an incremental link against a base file and cannot be linked again.
MH_DYLDLINK – The file is input for the dynamic linker and cannot be statically linked again.
MH_TWOLEVEL – The image is using two-level namespace bindings.
MH_BINDATLOAD – The dynamic linker should bind the undefined references when the file is loaded.
MH_PREBOUND – The file’s undefined references are prebound.
MH_PREBINDABLE – This file is not prebound but can have its prebinding redone. Used only when MH_PREBOUND is not set.
MH_NOFIXPREBINDING – The dynamic linker doesn’t notify the prebinding agent about this executable.
MH_ALLMODSBOUND – Indicates that this binary binds to all two-level namespace modules of its dependent libraries. Used only when MH_PREBINDABLE and MH_TWOLEVEL are set.
MH_CANONICAL – This file has been canonicalized by unprebinding, that is, by clearing prebinding information from the file. See the redo_prebinding man page for details.
MH_SPLIT_SEGS – The file has its read-only and read-write segments split.
MH_FORCE_FLAT – The executable is forcing all images to use flat namespace bindings.
MH_SUBSECTIONS_VIA_SYMBOLS – The sections of the object file can be divided into individual blocks. These blocks are dead-stripped if they are not used by other code. See “Linking” for details.
MH_NOMULTIDEFS – This umbrella guarantees there are no multiple definitions of symbols in its subimages. As a result, the two-level namespace hints can always be used.
reserved – reserved bytes, not used.
Summing up, here is the code for initializing header for object file.
“To be modified” means that it is not possible to determine the value before constructing the file. Therefore, it will be changed afterwards.
mach_header_64 header = {};
header.magic          = MH_MAGIC_64;
header.cputype        = CPU_TYPE_X86_64;
header.cpusubtype     = CPU_SUBTYPE_X86_64_ALL;
header.filetype       = MH_OBJECT;
header.ncmds          = 0; /* to be modified */
header.sizeofcmds     = 0; /* to be modified */
header.flags          = MH_SUBSECTIONS_VIA_SYMBOLS;
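To see what this header looks like on disk, here is a minimal sketch. It assumes a little-endian host (as on x86_64) and mirrors the structure locally instead of including <mach-o/loader.h>, so it builds on any platform. MachHeader64, the k* constants, and makeHeaderBytes are names made up for this illustration; the numeric values are the ones the system headers define.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Local mirror of mach_header_64 (hypothetical name, same field layout).
struct MachHeader64 {
    uint32_t magic;
    int32_t  cputype;     // cpu_type_t is a 32-bit integer
    int32_t  cpusubtype;  // cpu_subtype_t is a 32-bit integer
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};
static_assert(sizeof(MachHeader64) == 32, "the 64-bit header occupies 32 bytes");

constexpr uint32_t kMagic64             = 0xfeedfacf; // MH_MAGIC_64
constexpr int32_t  kCpuTypeX86_64       = 0x01000007; // CPU_TYPE_X86_64
constexpr int32_t  kCpuSubtypeX86_64All = 3;          // CPU_SUBTYPE_X86_64_ALL
constexpr uint32_t kFileTypeObject      = 0x1;        // MH_OBJECT
constexpr uint32_t kSubsectionsViaSyms  = 0x2000;     // MH_SUBSECTIONS_VIA_SYMBOLS

// Hypothetical helper: fill the header once the "to be modified" values
// are known and return the raw bytes that go at the start of the file.
std::vector<unsigned char> makeHeaderBytes(uint32_t ncmds, uint32_t sizeofcmds) {
    MachHeader64 header = {};
    header.magic      = kMagic64;
    header.cputype    = kCpuTypeX86_64;
    header.cpusubtype = kCpuSubtypeX86_64All;
    header.filetype   = kFileTypeObject;
    header.ncmds      = ncmds;
    header.sizeofcmds = sizeofcmds;
    header.flags      = kSubsectionsViaSyms;

    std::vector<unsigned char> bytes(sizeof(header));
    std::memcpy(bytes.data(), &header, sizeof(header));
    return bytes;
}
```

On a little-endian machine the file therefore starts with the bytes cf fa ed fe, which is the magic number stored least significant byte first.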
Load commands
The load command structures are located directly after the header of the object file, and they specify both the logical structure of the file and the layout of the file in virtual memory.
For an object file, several load commands are needed: segment, symtab, and dysymtab. Every load command begins with the same two fields, uint32_t cmd and uint32_t cmdsize, but the content that follows differs.
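The shared cmd/cmdsize prefix is what lets a reader walk the load commands without understanding each of them. Below is a rough sketch of that idea. LoadCommandPrefix, appendCommand, and listCommands are hypothetical names for this example, and the buffer stands in for the bytes that follow the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Every load command begins with these two fields.
struct LoadCommandPrefix {
    uint32_t cmd;      // command type, e.g. LC_SEGMENT_64
    uint32_t cmdsize;  // total size of this command in bytes
};

// Hypothetical helper: append a zero-filled command of the given size.
void appendCommand(std::vector<unsigned char>& buffer, uint32_t cmd, uint32_t cmdsize) {
    assert(cmdsize >= sizeof(LoadCommandPrefix));
    const LoadCommandPrefix prefix = {cmd, cmdsize};
    const std::size_t start = buffer.size();
    buffer.resize(start + cmdsize, 0);
    std::memcpy(buffer.data() + start, &prefix, sizeof(prefix));
}

// Hypothetical helper: collect the cmd value of each of the ncmds commands.
// Stops early if the buffer is truncated or a command size is malformed.
std::vector<uint32_t> listCommands(const std::vector<unsigned char>& buffer, uint32_t ncmds) {
    std::vector<uint32_t> kinds;
    std::size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        if (offset + sizeof(LoadCommandPrefix) > buffer.size()) break;
        LoadCommandPrefix prefix;
        std::memcpy(&prefix, buffer.data() + offset, sizeof(prefix));
        if (prefix.cmdsize < sizeof(prefix)) break;
        kinds.push_back(prefix.cmd);
        offset += prefix.cmdsize;  // jump straight to the next command
    }
    return kinds;
}
```

For the object file built in this article, the walk would visit LC_SEGMENT_64 (0x19), LC_SYMTAB (0x2), and LC_DYSYMTAB (0xb), and the sum of all cmdsize values is what goes into header.sizeofcmds.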
segment_command_64
Specifies the range of bytes in a 64-bit Mach-O file that make up a segment. Those bytes are mapped by the loader into the address space of a program. Segment structure is:
struct segment_command_64 {  /* for 64-bit architectures */
   uint32_t   cmd;           /* LC_SEGMENT_64 */
   uint32_t   cmdsize;       /* includes sizeof section_64 structs */
   char       segname[16];   /* segment name */
   uint64_t   vmaddr;        /* memory address of this segment */
   uint64_t   vmsize;        /* memory size of this segment */
   uint64_t   fileoff;       /* file offset of this segment */
   uint64_t   filesize;      /* amount to map from the file */
   vm_prot_t  maxprot;       /* maximum VM protection */
   vm_prot_t  initprot;      /* initial VM protection */
   uint32_t   nsects;        /* number of sections in segment */
   uint32_t   flags;         /* flags */
};
segname – the name of the segment. There are no requirements, but it is common to start the name with a double underscore (__) and use uppercase. For example, SEG_TEXT (“__TEXT”), SEG_DATA (“__DATA”).
vmaddr – the start of this segment in virtual memory.
vmsize – the size of this segment in memory. For executables, this value must be a multiple of the page size. In object files, this is not needed as this requirement is fulfilled at the linking stage.
fileoff – offset of this segment in the file. This offset points to an area after the load commands. The image below helps.
filesize – the amount of file from fileoff to be mapped.
maxprot – maximum virtual memory protection. For the TEXT segment, usually VM_PROT_READ | VM_PROT_EXECUTE | VM_PROT_WRITE.
initprot – memory protection during initialization.
nsects – number of sections that directly follow this segment command.
flags – can be found here. For the object file, no flags are needed.
section_64
Segment load command is directly followed by sections defined in it.
struct section_64 {          /* for 64-bit architectures */
   char       sectname[16];  /* name of this section */
   char       segname[16];   /* segment this section goes in */
   uint64_t   addr;          /* memory address of this section */
   uint64_t   size;          /* size in bytes of this section */
   uint32_t   offset;        /* file offset of this section */
   uint32_t   align;         /* section alignment (power of 2) */
   uint32_t   reloff;        /* file offset of relocation entries */
   uint32_t   nreloc;        /* number of relocation entries */
   uint32_t   flags;         /* flags (section type and attributes)*/
   uint32_t   reserved1;     /* reserved (for offset or index) */
   uint32_t   reserved2;     /* reserved (for count or sizeof) */
   uint32_t   reserved3;     /* reserved */
};
sectname – the name of the section. There are no requirements, but it is common to start the name with a double underscore (__) and use lowercase. For example, SECT_TEXT (“__text”), SECT_DATA (“__data”).
segname – the name of the segment this section goes in.
addr – memory address of this section. For example, if the segment vmaddr is 0x10000, then the first section address is also 0x10000.
size – the size in bytes of this section in the file.
offset – the offset of the file section from the start of the file.
align – alignment of the section as a power of 2. For example, 1 means 2 bytes alignment, 2 means 4 bytes alignment. Specifies the alignment of the section in memory.
reloff – the offset of relocations array from the file beginning.
nreloc – number of relocations.
flags – specify information about data contained in the section. For example, for code S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS. For the data section, S_REGULAR.
reserved1, reserved2, reserved3 – unused in our case.
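Since align stores a power of two rather than a byte count, placing a section usually means rounding an offset up to the next multiple of 2^align. A minimal sketch of that arithmetic (alignUp is a hypothetical helper, not something from the Mach-O headers):

```cpp
#include <cassert>
#include <cstdint>

// Round offset up to the next multiple of 2^alignPower,
// where alignPower is the value stored in section_64.align.
uint64_t alignUp(uint64_t offset, uint32_t alignPower) {
    assert(alignPower < 64);
    const uint64_t alignment = uint64_t{1} << alignPower;
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

With align = 4, an offset of 13 moves to 16, while an offset that is already a multiple of 16 stays where it is.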
The segment load command and its sections are the most important part of the file. An object file has only one segment and one or several sections.
Now, we can define a segment and sections associated with it.
__TEXT segment – the only segment in the object file
segment_command_64 segment = {};
/*
 * Usually, as there is only one segment in the object file,
 * placing name is omitted. 
 * strcpy(segment.segname, SEG_TEXT);
 */
segment.cmd                = LC_SEGMENT_64;
segment.cmdsize            = sizeof(segment) + 2 * sizeof(section_64);
segment.vmaddr             = 0;
segment.vmsize             = 0; /* to be modified */
segment.fileoff            = 0; /* to be modified */
segment.filesize           = 0; /* to be modified */
segment.maxprot            = VM_PROT_READ | VM_PROT_EXECUTE;
segment.initprot           = VM_PROT_READ | VM_PROT_EXECUTE;
segment.nsects             = 2; /* code and data sections */
__text section
section_64 sectionText     = {};
strcpy(sectionText.segname,  SEG_TEXT ); /* segname  <- __TEXT */
strcpy(sectionText.sectname, SECT_TEXT); /* sectname <- __text */
sectionText.addr           = 0;
sectionText.size           = 0;          /* to be modified */
sectionText.offset         = 0;          /* to be modified */
sectionText.align          = 4;          /* 2^4 code alignment */
sectionText.reloff         = 0;          /* to be modified */
sectionText.nreloc         = 0;          /* to be modified */
sectionText.flags          = S_REGULAR |
                             S_ATTR_PURE_INSTRUCTIONS |
                             S_ATTR_SOME_INSTRUCTIONS;
__data section
section_64 sectionData     = {};
strcpy(sectionData.segname,  SEG_DATA ); /* segname  <- __DATA */
strcpy(sectionData.sectname, SECT_DATA); /* sectname <- __data */
sectionData.addr           = 0;          /* = sectionText.size */
sectionData.size           = 0;          /* to be modified */
sectionData.offset         = 0;          /* = sectionText.offset */
                                         /*   + sectionText.size */
sectionData.align          = 1;          /* 2^1 data alignment */
sectionData.reloff         = 0;          /* no relocations in data section */
sectionData.nreloc         = 0;          
sectionData.flags          = S_REGULAR;
At this point, the simple object file structure is almost ready, but the SYMTAB and DYSYMTAB load commands still need to be defined, even if there are no relocations at all.
Symtab
Describes the size and location of the symbol table data structures. Its structure is:
struct symtab_command {
   uint32_t   cmd;       /* LC_SYMTAB */
   uint32_t   cmdsize;   /* sizeof(struct symtab_command) */
   uint32_t   symoff;    /* symbol table offset */
   uint32_t   nsyms;     /* number of symbol table entries */
   uint32_t   stroff;    /* string table offset */
   uint32_t   strsize;   /* string table size in bytes */
};
symoff – offset to the symbol table – located after load commands somewhere further in the file.
nsyms – number of symbols in symbols table.
stroff – string table offset.
strsize – the size of the string table in bytes.
The most straightforward description so far. It is convenient to describe a symbol table and string table here.
String table
The string table is the most straightforward structure of all listed here. It is simply strings separated by zeros.
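A string table like this can be built with a few lines of code. The sketch below assumes the table starts with a single zero byte, so that the first name lands at index 1; StringTable is a hypothetical helper class for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical builder: names are stored back to back, each followed by '\0'.
class StringTable {
public:
    // Assumption: the table begins with one zero byte, so index 0 is an empty name.
    StringTable() : bytes_(1, '\0') {}

    // Appends name and returns the index of its first character.
    // That index is the value to store in nlist_64's n_strx.
    uint32_t add(const std::string& name) {
        const uint32_t index = static_cast<uint32_t>(bytes_.size());
        bytes_.insert(bytes_.end(), name.begin(), name.end());
        bytes_.push_back('\0');
        return index;
    }

    // Raw bytes to write to the file; their count is symtab_command.strsize.
    const std::vector<char>& bytes() const { return bytes_; }

private:
    std::vector<char> bytes_;
};
```

Adding "_print" and then "_giveYouUp0" yields indices 1 and 8: one leading zero byte, six characters of "_print", and its terminating zero.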
Symbol table
The symbol table consists of equally sized entries. They must be grouped by their type – local symbols (further grouped by the module they are from), defined external symbols (further grouped by the module they are from), and undefined symbols. The order of groups is not important.
struct nlist_64 {
    union {
        uint32_t  n_strx;  /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};
n_strx – index of the string in the string table. For example, the index of “_print” in the string table above is 1. The index of _giveYouUp0 is 8; it is the position of the first letter from the start of the string table.
n_type – a type of symbol. Defines the meaning of the symbol. There are essential values:
N_TYPE (0x0e) – These bits define the type of the symbol.
N_UNDF (0x0) – The symbol is undefined. Undefined symbols are symbols referenced in this module but defined in a different module. The n_sect field is set to NO_SECT.
N_ABS (0x2) – The symbol is absolute. The linker does not change the value of an absolute symbol. The n_sect field is set to NO_SECT.
N_SECT (0xe) – The symbol is defined in the section number given in n_sect.
N_PBUD (0xc) – The symbol is undefined and the image is using a prebound value for the symbol. The n_sect field is set to NO_SECT.
N_INDR (0xa) – The symbol is defined to be the same as another symbol. The n_value field is an index into the string table specifying the name of the other symbol. When that symbol is linked, both this and the other symbol have the same defined type and value.
N_EXT (0x01) – If this bit is on, this symbol is external, a symbol that is either defined outside this file or that is defined in this file but can be referenced by other files.
N_STAB (0xe0) – If any of these 3 bits are set, the symbol is a symbolic debugging table (stab) entry. In that case, the entire n_type field is interpreted as a stab value.
n_sect – an integer specifying the number of the section that this symbol can be found in, or NO_SECT if the symbol is not to be found in any section.
n_desc – provides additional information about the nature of this symbol for non-stab symbols (not N_STAB). The reference flags can be accessed using the REFERENCE_TYPE mask (0xF). Usually, REFERENCE_FLAG_UNDEFINED_NON_LAZY is used for external symbols. If the symbol is defined in the section (N_SECT), use REFERENCE_FLAG_DEFINED together with N_EXT if you want to make it available from other files, or REFERENCE_FLAG_PRIVATE_DEFINED without N_EXT if not. The most used values are:
REFERENCE_FLAG_UNDEFINED_NON_LAZY (0x0)—This symbol is a reference to an external non-lazy (data) symbol.
REFERENCE_FLAG_UNDEFINED_LAZY (0x1)—This symbol is a reference to an external lazy symbol—that is, to a function call.
REFERENCE_FLAG_DEFINED (0x2)—This symbol is defined in this module.
REFERENCE_FLAG_PRIVATE_DEFINED (0x3)—This symbol is defined in this module and is visible only to modules within this shared library.
REFERENCE_FLAG_PRIVATE_UNDEFINED_NON_LAZY (0x4)—This symbol is defined in another module in this file, is a non-lazy (data) symbol, and is visible only to modules within this shared library.
REFERENCE_FLAG_PRIVATE_UNDEFINED_LAZY (0x5)—This symbol is defined in another module in this file, is a lazy (function) symbol, and is visible only to modules within this shared library.
n_value – information about this symbol. The format of this value is different for each type of symbol table entry (as specified by the n_type field). For the N_SECT symbol type, n_value is the address of the symbol – offset from the start of the segment. For N_UNDF | N_EXT it is not used.
This structure is one of the hardest to understand and use, so here are some examples. Notice that symbols are grouped. This grouping will be used later in DYSYMTAB.
In the image above, there are four symbols in total. Two of them are locally defined, two of them undefined in the current file. Here are descriptions of two of these symbols:
Symbol #0
n_strx = 34 – index of the name’s first character in the string table.
n_type = N_SECT | N_EXT – symbol defined in some section of the current file and available externally.
n_sect = 1 – symbol defined in the first (counting from 1) section.
n_desc = REFERENCE_FLAG_DEFINED – symbol defined in the file. This information is redundant as it is already known from N_SECT.
n_value = 0 – the symbol definition is located at the very beginning of the segment (zero offset).
Symbol #2
n_strx = 1 – index of the name’s first character in the string table.
n_type = N_UNDF | N_EXT – symbol is not defined in the current file, must be defined externally.
n_sect = NO_SECT – no associated section.
n_desc = REFERENCE_FLAG_UNDEFINED_NON_LAZY – this symbol is a reference to an external non-lazy (data) symbol.
n_value = 0 – unused.
These two symbols can be constructed like this:
nlist_64 symbols[2] = {
    {34, N_SECT  | N_EXT, 1      , REFERENCE_FLAG_DEFINED           , 0},
    {1 , N_UNDF | N_EXT, NO_SECT, REFERENCE_FLAG_UNDEFINED_NON_LAZY, 0}
};
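Going the other way, from an n_type byte back to its meaning, is a matter of applying the masks listed earlier. Here is a small sketch. The constants mirror the values given in this article, while describeType and the k* names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint8_t kNStab = 0xe0; // N_STAB mask
constexpr uint8_t kNType = 0x0e; // N_TYPE mask
constexpr uint8_t kNExt  = 0x01; // N_EXT bit
constexpr uint8_t kNUndf = 0x00; // N_UNDF
constexpr uint8_t kNAbs  = 0x02; // N_ABS
constexpr uint8_t kNIndr = 0x0a; // N_INDR
constexpr uint8_t kNPbud = 0x0c; // N_PBUD
constexpr uint8_t kNSect = 0x0e; // N_SECT

// Hypothetical helper: turn an n_type byte into a readable description.
std::string describeType(uint8_t n_type) {
    // If any N_STAB bit is set, the whole byte is a stab value.
    if (n_type & kNStab) return "stab";

    std::string kind;
    switch (n_type & kNType) {
        case kNUndf: kind = "undefined"; break;
        case kNAbs:  kind = "absolute"; break;
        case kNIndr: kind = "indirect"; break;
        case kNPbud: kind = "prebound undefined"; break;
        case kNSect: kind = "defined in section"; break;
        default:     kind = "unknown"; break;
    }
    if (n_type & kNExt) kind += ", external";
    return kind;
}
```

For the two symbols above, N_SECT | N_EXT (0x0f) decodes as "defined in section, external" and N_UNDF | N_EXT (0x01) as "undefined, external".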
Dysymtab
It describes the sizes and locations of the parts of the symbol table used for dynamic linking. As I already noted, symtab entries must be grouped by their type. This is where that requirement is used.
struct dysymtab_command {
    uint32_t cmd;            /* LC_DYSYMTAB */
    uint32_t cmdsize;        /* sizeof(struct dysymtab_command) */
    uint32_t ilocalsym;      /* index to local symbols */
    uint32_t nlocalsym;      /* number of local symbols */

    uint32_t iextdefsym;     /* index to externally defined symbols */
    uint32_t nextdefsym;     /* number of externally defined symbols */

    uint32_t iundefsym;      /* index to undefined symbols */
    uint32_t nundefsym;      /* number of undefined symbols */

    uint32_t tocoff;         /* file offset to table of contents */
    uint32_t ntoc;           /* number of entries in table of contents */

    uint32_t modtaboff;      /* file offset to module table */
    uint32_t nmodtab;        /* number of module table entries */


    uint32_t extrefsymoff;   /* offset to referenced symbol table */
    uint32_t nextrefsyms;    /* number of referenced symbol table entries */


    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms;  /* number of indirect symbol table entries */


    uint32_t extreloff;      /* offset to external relocation entries */
    uint32_t nextrel;        /* number of external relocation entries */

    uint32_t locreloff;      /* offset to local relocation entries */
    uint32_t nlocrel;        /* number of local relocation entries */

}; 
There are a lot of fields, but only several of them are needed for object files.
ilocalsym + nlocalsym – local symbols are used only for debugging.
iextdefsym + nextdefsym – external symbols.
iundefsym + nundefsym – undefined symbols.
Fields with the i* prefix hold the index of the first such entry in the symbol table, while n* fields hold the number of such symbols.
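Because the symbol table is grouped, the six fields can be derived by counting. The sketch below assumes the groups are stored in the order local, defined external, undefined, and it ignores stab entries. SymbolRanges and computeRanges are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint8_t kNType = 0x0e; // N_TYPE mask
constexpr uint8_t kNExt  = 0x01; // N_EXT bit
constexpr uint8_t kNUndf = 0x00; // N_UNDF

// The six dysymtab_command fields that an object file needs.
struct SymbolRanges {
    uint32_t ilocalsym = 0,  nlocalsym = 0;
    uint32_t iextdefsym = 0, nextdefsym = 0;
    uint32_t iundefsym = 0,  nundefsym = 0;
};

// Hypothetical helper: nTypes holds the n_type byte of every symbol, in
// symbol table order. Assumes the order local, defined external, undefined.
SymbolRanges computeRanges(const std::vector<uint8_t>& nTypes) {
    SymbolRanges ranges;
    for (uint8_t type : nTypes) {
        if (!(type & kNExt)) {
            ++ranges.nlocalsym;            // not external: local symbol
        } else if ((type & kNType) == kNUndf) {
            ++ranges.nundefsym;            // external and undefined
        } else {
            ++ranges.nextdefsym;           // external and defined here
        }
    }
    // Each group starts where the previous one ends.
    ranges.ilocalsym  = 0;
    ranges.iextdefsym = ranges.nlocalsym;
    ranges.iundefsym  = ranges.nlocalsym + ranges.nextdefsym;
    return ranges;
}
```

For a table with two defined external symbols followed by two undefined ones, this gives iextdefsym = 0, nextdefsym = 2, iundefsym = 2, nundefsym = 2, and no local symbols.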
Relocations
Finally, all these structures were needed just to be able to do relocations. But why do we even need them? Consider this assembly code:
call     ...   ; call function – external or internal
mov      rax, [rip + ...] ; load global variable
In both of these cases, the address or offset is not known until the linking stage, as segments will be rearranged, combined, and placed back in some order. The linker will substitute the relevant address or offset. Relocation information specifies where an address must be changed, how it must be changed, and for what symbol.
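To make the idea concrete, here is a simplified sketch of what the linker conceptually does for an x86-64 call with a 32-bit relative operand: once the final addresses are known, it writes target minus the address of the next instruction into the four bytes after the opcode. patchRel32 and its parameters are hypothetical, and a real linker handles many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: patch the 4-byte displacement located at fieldOffset
// inside code. sectionAddress is the final address of the first byte of code,
// targetAddress is the final address of the symbol being called.
void patchRel32(std::vector<unsigned char>& code, std::size_t fieldOffset,
                uint64_t sectionAddress, uint64_t targetAddress) {
    assert(fieldOffset + 4 <= code.size());
    // The CPU adds the displacement to the address of the next instruction,
    // which for call rel32 starts right after the 4-byte operand.
    const int64_t nextInstruction =
        static_cast<int64_t>(sectionAddress + fieldOffset + 4);
    const int64_t delta = static_cast<int64_t>(targetAddress) - nextInstruction;
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    const int32_t displacement = static_cast<int32_t>(delta);
    std::memcpy(code.data() + fieldOffset, &displacement, sizeof(displacement));
}
```

In relocation terms, fieldOffset plays the role of r_address, the 4-byte width corresponds to r_length = 2, and the subtraction of the next instruction address is what r_pcrel = 1 asks for.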
Relocations entry is defined as:
struct relocation_info {
   int32_t  r_address;        /* offset in the section to */
                              /* what is being relocated */
   uint32_t r_symbolnum:24,   /* symbol index if r_extern == 1 or */
                              /* section ordinal if r_extern == 0 */
            r_pcrel:1,        /* was relocated pc relative already */
            r_length:2,       /* 0=byte, 1=word, 2=long, 3=quad */
            r_extern:1,       /* does not include value of sym referenced */
            r_type:4;         /* if not 0, machine specific relocation type */
};
Do you remember that each section may have relocations, and that they are specified in the corresponding fields of the section structure? Here are the relocations themselves.
r_address – offset from the start of the section to the value that needs to be relocated.
r_symbolnum – a symbol index in the symbol table if r_extern == 1, or a section ordinal (number) if r_extern == 0.
r_pcrel – (1/0) Indicates whether the item containing the address to be relocated is part of a CPU instruction that uses PC-relative addressing. For addresses contained in PC-relative instructions, the CPU adds the address of the instruction to the address contained in the instruction.
r_length – Indicates the length of the item containing the address to be relocated. A value of zero indicates a single byte, a value of 1 indicates a 2-byte address, a value of 2 indicates a 4-byte address, and a value of 3 indicates an 8-byte address.
r_extern – (1/0) Indicates whether the r_symbolnum field is an index into the symbol table (1) or a section number (zero).
r_type – Indicates the type of relocation to be performed. Possible values for this field are shared between this structure and the scattered_relocation_info data structure; see the description of the r_type field in the scattered_relocation_info data structure for more details. There are two most used values:
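GENERIC_RELOC_SECTDIFF – used for relative call addresses. On x86_64, the same numeric value (2) is named X86_64_RELOC_BRANCH.
GENERIC_RELOC_PAIR – used for global variable rip relative offset. On x86_64, the same numeric value (1) is named X86_64_RELOC_SIGNED.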
Here’s an example of common relocation:
relocation_info relocation = {};
relocation.r_address = ...           /* some offset to the beginning */
                                     /* of relocatable address */
relocation.r_symbolnum = 0;          /* first symbol in symtab */
relocation.r_pcrel = 1;              /* let it be call instruction that */
                                     /* is PC-relative */
relocation.r_length = 2;             /* 4-bytes address */
relocation.r_extern = 1;             /* external symbol */
relocation.r_type   = GENERIC_RELOC_SECTDIFF;
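To get a feel for how these bit fields end up on disk, here is a minimal Python sketch that packs one entry into its 8 bytes. It assumes a little-endian target where bit fields are allocated starting from the least significant bit. The helper name pack_relocation_info is mine, and the value 2 for GENERIC_RELOC_SECTDIFF is assumed from mach-o/reloc.h.

```python
import struct

GENERIC_RELOC_SECTDIFF = 2  # assumed value from <mach-o/reloc.h>


def pack_relocation_info(r_address, r_symbolnum, r_pcrel, r_length, r_extern, r_type):
    """Pack one relocation_info entry (hypothetical helper, little-endian layout)."""
    word = (
        (r_symbolnum & 0xFFFFFF)       # bits 0..23
        | ((r_pcrel & 0x1) << 24)      # bit 24
        | ((r_length & 0x3) << 25)     # bits 25..26
        | ((r_extern & 0x1) << 27)     # bit 27
        | ((r_type & 0xF) << 28)       # bits 28..31
    )
    # int32 r_address followed by the 32-bit word that holds all bit fields
    return struct.pack("<iI", r_address, word)


# PC-relative 4-byte call to the symbol with index 1
entry = pack_relocation_info(1, 1, 1, 2, 1, GENERIC_RELOC_SECTDIFF)
print(len(entry), entry.hex())
```

The whole entry is always 8 bytes, which is why relocation tables can be indexed as a plain array.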
Cumulative example
Here is the code that constructs a complete Mach-O object file with a call to an external function and a call to an internal function.
mach_header_64 header = {};
header.magic          = MH_MAGIC_64;
header.cputype        = CPU_TYPE_X86_64;
header.cpusubtype     = CPU_SUBTYPE_X86_64_ALL;
header.filetype       = MH_OBJECT;
header.ncmds          = 0; /* to be modified */
header.sizeofcmds     = 0; /* to be modified */
header.flags          = MH_SUBSECTIONS_VIA_SYMBOLS;

segment_command_64 segment = {};
segment.cmd                = LC_SEGMENT_64;
segment.cmdsize            = sizeof(segment) + sizeof(section_64);
segment.vmaddr             = 0;
segment.vmsize             = 0; /* to be modified */
segment.fileoff            = 0; /* to be modified */
segment.filesize           = 0; /* to be modified */
segment.maxprot            = VM_PROT_READ | VM_PROT_EXECUTE;
segment.initprot           = VM_PROT_READ | VM_PROT_EXECUTE;
segment.nsects             = 0; /* to be modified */

section_64 sectionText     = {};
strcpy(sectionText.segname,  SEG_TEXT ); /* segname  <- __TEXT */
strcpy(sectionText.sectname, SECT_TEXT); /* sectname <- __text */
sectionText.addr           = 0;
sectionText.size           = 0;          /* to be modified */
sectionText.offset         = 0;          /* to be modified */
sectionText.align          = 4;          /* 2^4 code alignment */
sectionText.reloff         = 0;          /* to be modified */
sectionText.nreloc         = 0;          /* to be modified */
sectionText.flags          = S_REGULAR |
                             S_ATTR_PURE_INSTRUCTIONS |
                             S_ATTR_SOME_INSTRUCTIONS;

const unsigned char code[] = {
        0xE8, 0x00, 0x00, 0x00, 0x00,      // call  - someFuncExternal
        0xE8, 0x00, 0x00, 0x00, 0x00,      // call  - someFunc
        0xB8, 0x01, 0x00, 0x00, 0x02,      // mov     eax, 0x2000001 ; exit (zero-extends into rax)
        0xBF, 0x00, 0x00, 0x00, 0x00,      // mov     edi, 0         ; zero-extends into rdi
        0x0F, 0x05,                        // syscall
        // someFunc:
        0x48, 0x31, 0xC0,                  // xor rax, rax
        0xC3                               // ret
};

symtab_command symtabCommand    = {};
symtabCommand.cmd               = LC_SYMTAB;
symtabCommand.cmdsize           = sizeof(symtab_command);
symtabCommand.symoff            = 0;       /* to be modified */
symtabCommand.nsyms             = 0;       /* to be modified */
symtabCommand.stroff            = 0;       /* to be modified */
symtabCommand.strsize           = 0;       /* to be modified */

const char stringTable[]        = "\0_someFunc0\0_someFuncExternal0\0";

nlist_64 symbols[2] = {
        {
            1,                      // first index in string table
            N_SECT | N_EXT,         // defined in the file, available externally
            1,                      // first section
            REFERENCE_FLAG_DEFINED, // defined in the file
            4 * 5 + 2               // offset of this symbol in the section
        },
        {
            12,                      // second string in string table
            N_UNDF  | N_EXT,         // undefined in the file,
                                     // must be defined externally
            NO_SECT,                 // no section specified
            REFERENCE_FLAG_UNDEFINED_NON_LAZY, // external non-lazy symbol
            0                        // unused
        }
};

dysymtab_command dysymtabCommand      = {};
dysymtabCommand.cmd                   = LC_DYSYMTAB;
dysymtabCommand.cmdsize               = sizeof(dysymtabCommand);
dysymtabCommand.ilocalsym             = 0; // first symbol in symbol table
dysymtabCommand.nlocalsym             = 1; // only one locally defined symbol
dysymtabCommand.iextdefsym            = 1; // second symbol in symbol table
dysymtabCommand.nextdefsym            = 1; // only one externally defined symbol

relocation_info relocations[] = {
        {
            1,      // after first byte address to someFuncExternal
            1,      // second symbol
            1,      // relative call, PC counted
            2,      // 4 bytes
            1,      // external
            GENERIC_RELOC_SECTDIFF
        },
        {
            6,      // second call address
            0,      // first symbol
            1,      // relative call, PC counted
            2,      // 4 bytes
            1,      // external
            GENERIC_RELOC_SECTDIFF
        },
};

size_t offsetCounter = 0;
FILE* binary = fopen("object.o", "wb");

// Write header;
header.ncmds = 3; // segment + symtab + dysymtab
header.sizeofcmds = sizeof(segment) + sizeof(sectionText) + sizeof(symtabCommand) + sizeof(dysymtabCommand);
fwrite(&header, 1, sizeof(header), binary);
offsetCounter += sizeof(header);

// Write segment
segment.vmsize  = segment.filesize = sizeof(code);
segment.fileoff = header.sizeofcmds + sizeof(header); // we'll place code just after all load commands.
segment.nsects  = 1;
fwrite(&segment, 1, sizeof(segment), binary);
offsetCounter += sizeof(segment);

// Write section
sectionText.size   = segment.filesize;
sectionText.offset = segment.fileoff;
sectionText.reloff = segment.fileoff + segment.filesize; // just after the code
sectionText.nreloc = sizeof(relocations) / sizeof(relocations[0]); // two calls
fwrite(&sectionText, 1, sizeof(sectionText), binary);
offsetCounter += sizeof(sectionText);

// Write symtab
symtabCommand.symoff = sectionText.reloff +
                        sectionText.nreloc * sizeof(relocation_info); // just after relocations
symtabCommand.nsyms = 2; // two functions
symtabCommand.stroff = symtabCommand.symoff +
                        symtabCommand.nsyms * sizeof(nlist_64); // just after symbol table
symtabCommand.strsize = sizeof(stringTable);
fwrite(&symtabCommand, 1, sizeof(symtabCommand), binary);
offsetCounter += sizeof(symtabCommand);

// Write dysymtab
fwrite(&dysymtabCommand, 1, sizeof(dysymtabCommand), binary);
offsetCounter += sizeof(dysymtabCommand);

// Write code
fwrite(&code, 1, sizeof(code), binary);

// Write relocations
fwrite(&relocations, 1, sizeof(relocations), binary);

// Write symbol table
fwrite(&symbols, 1, sizeof(symbols), binary);

// Write string table
fwrite(&stringTable, 1, sizeof(stringTable), binary);

fclose(binary);
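The offsets in the code above are easy to get wrong, so here is a small Python sketch that recomputes the file layout in the same order the C code writes it. The struct sizes are my assumptions, taken from the 64-bit definitions in mach-o/loader.h, mach-o/nlist.h and mach-o/reloc.h; the function name object_layout is hypothetical.

```python
# Struct sizes in bytes (assumed from the 64-bit Mach-O headers).
SIZEOF = {
    "mach_header_64": 32,
    "segment_command_64": 72,
    "section_64": 80,
    "symtab_command": 24,
    "dysymtab_command": 80,
    "relocation_info": 8,
    "nlist_64": 16,
}


def object_layout(code_size, n_relocs, n_syms, strtab_size):
    """Return file offsets of each blob, in the order the C code writes them."""
    sizeofcmds = (SIZEOF["segment_command_64"] + SIZEOF["section_64"]
                  + SIZEOF["symtab_command"] + SIZEOF["dysymtab_command"])
    code_off = SIZEOF["mach_header_64"] + sizeofcmds   # segment.fileoff
    reloff = code_off + code_size                      # sectionText.reloff
    symoff = reloff + n_relocs * SIZEOF["relocation_info"]
    stroff = symoff + n_syms * SIZEOF["nlist_64"]
    return {
        "sizeofcmds": sizeofcmds,
        "code": code_off,
        "relocations": reloff,
        "symbols": symoff,
        "strings": stroff,
        "end_of_file": stroff + strtab_size,
    }


# 26 bytes of code, 2 relocations, 2 symbols, 32-byte string table
# (31 characters in the literal plus the implicit terminating NUL).
layout = object_layout(code_size=26, n_relocs=2, n_syms=2, strtab_size=32)
for name, offset in layout.items():
    print(f"{name:12} {offset}")
```

If the size of the produced object.o differs from end_of_file, one of the tables was probably written in a different order than the offsets claim.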
References
Developer collection – relocation_info
Mach-O format reference OSX-ABI
MachOViewer – check out your file structure 


 Skip List Indexation and kth Maximum 
Alex Dremov — Fri, 06 Nov 2020 01:49:32 +0300
A skip list is a nice structure that lets you perform O(logn) insertions into a sorted list, O(logn) searches, and O(logn) lookups of the n-th (second, third, fourth, ...) maximum, or even calculate the rolling median. In this article I focus on indexing a skip list (an indexable skip list).
The best guide I found was “a skip list cookbook”, last revised in 1990. It slightly touched on the problem of finding the kth element, but the provided algorithm is extremely vague and refers to unknown quantities without explaining how to find or update them (e.g. fDistance[i]). Wikipedia also talks about indexing, but an algorithm for calculating skip distances is not provided.
Cookbook searchByPosition algorithm
Therefore, I decided to create this post and provide an algorithm for indexing skip lists. Here, I’m going to give the code as well.
About skip list
A skip list is a one-way linked list that has “express lanes” for reaching distant members. It is a probabilistic data structure: selecting the "height" of each node relies on random numbers. As a result, it provides O(logn) insert and search complexity.
💡
Fast lanes change complexity of search, insert, and indexation from O(n) to O(logn)
Each node has a link to the right node on the same level and a link to the bottom node that has the same value, but one level lower. The first layer doesn’t have a bottom link. Some nodes don’t have the right node. We consider the null right node as \(+\infty\) and head as \(-\infty\).
To search for an element, we start at the top left corner and move right if the right element is lower than or equal to the needed element, or move down if it is bigger than the required element.
💡
If the required element is not present in the list, we end up at the potential position for its insertion.
Indexing a skip list also allows us to calculate the rolling median of a set in O(logn) and to find the n-th minimum or maximum in O(logn)!
Defining the node
Each node is going to be:
template <typename T>
struct TreeNode {
    T             key;
    unsigned      level     = 1;
    bool          headNode  = false;
    bool          deleted   = false;
    size_t        skipDist  = 0;
    TreeNode<T>*  right     = nullptr;
    TreeNode<T>*  down      = nullptr;
};
key – stored value
level – the level of the node
headNode – whether this node is a head node
deleted – the node is marked as deleted
skipDist – distance skipped
right – right node
down – down node
We need the deleted mark because we can’t delete a node immediately: the list is one-way linked, so from the deleted node we cannot reach its left neighbour to update that neighbour's right pointer. On the other hand, such a feature is useful in multi-threaded projects.
In this figure you can see what skipDist means:
💡
Basically, skipDist counts how many nodes are skipped if you travel along the corresponding fast lane
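To make the idea concrete, here is a minimal Python sketch of a lookup by position that relies only on skip distances. It is my own illustration, not the implementation from this post: the Node class is a simplified stand-in for TreeNode, get_by_position is a hypothetical name, deleted marks are ignored, and positions are 1-based with head nodes sitting at position 0.

```python
class Node:
    """Simplified stand-in for TreeNode: only the fields the lookup needs."""

    def __init__(self, key, skip_dist=0, right=None, down=None):
        self.key = key
        self.skip_dist = skip_dist
        self.right = right
        self.down = down


def get_by_position(head, k):
    """Return the key at 1-based position k, or None if k is out of range."""
    node, pos = head, 0  # head nodes sit at position 0
    while node is not None:
        # Move right while the next node on this lane does not overshoot k.
        while node.right is not None and pos + node.right.skip_dist + 1 <= k:
            pos += node.right.skip_dist + 1
            node = node.right
        if node.down is None:  # bottom lane: we cannot get any closer
            return node.key if pos == k and k > 0 else None
        node = node.down
    return None


# Bottom lane: head -> 10 -> 20 -> 30 -> 40 (skip distance 0 everywhere)
n40 = Node(40)
n30 = Node(30, right=n40)
n20 = Node(20, right=n30)
n10 = Node(10, right=n20)
head1 = Node(None, right=n10)
# Fast lane: head -> 20 -> 40, each hop skips one bottom-lane node
f40 = Node(40, skip_dist=1, down=n40)
f20 = Node(20, skip_dist=1, right=f40, down=n20)
head2 = Node(None, right=f20, down=head1)

print([get_by_position(head2, k) for k in range(1, 6)])
```

Every move to the right adds the target's skip distance plus one to the running position, which is the same bookkeeping the insert routine has to keep consistent.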
Defining skip list
template <typename T>
class SkipList {
    unsigned     maxLevels;
    TreeNode<T>* head;
};
Pretty much self-explanatory structure.
On initialisation, we create maxLevels number of head nodes:
this->head = new TreeNode<T>(0, this->maxLevels);
this->head->headNode = true;

TreeNode<T>* pos = this->head;
for(unsigned i = 1; i < maxLevels; ++i) {
    // one head node per level, linked from the top level down
    TreeNode<T>* newNode = new TreeNode<T>(0, this->maxLevels - i);
    newNode->headNode = true;

    pos->down = newNode;
    pos = newNode;
}
Insert
The hardest part of the insert algorithm is updating skip distances and creating upper-level nodes when the random coin says so.
As I discussed previously, we do not delete elements, but rather mark them as deleted. Therefore, before processing, we need to perform the pending deletions. Let it be some function processDeletions(node). It finally deletes the element to the right of the node if it was marked as deleted.
Also, to handle cases when the right node is null, I created a function that compares the key with the node value.
// compares key with node->key; a null node counts as +infinity
int compareWithNode(T key, TreeNode<T>* node){
    if (node == nullptr)
        return -1;
    if (key == node->key)
    	return 0;
    return key < node->key ? -1 : 1;
}
💡
compareWithNode returns 0 if values are equal, -1 if the key is lower than the node value, and 1 if the key value is higher than the node value
As discussed before, if we encounter a null node, then we consider it as \(+\infty\).
To perform all desired operations, the insert function is going to accept the current node, desired key, a pointer to the inserted node (if any), current position. We need to have a pointer to the inserted node for two reasons: to know on the higher levels whether the node was inserted at all, and we need the link to the bottom if we generate a “fast lane” node.
Also, let the function return bool value: whether the node was inserted on the previous level. If it was inserted, then we can flip the coin again and insert the “fast lane” node again on the current level.
💡
As I said, the algorithm relies on randomness. Decision whether new fast lane will be created is based on random coin.
bool insertRecursive(TreeNode<T>* node,
                     T key,
                     TreeNode<T>** insertedOne,
                     unsigned* pos) {
    this->processDeletions(node);
    
    int compareRight = compareWithNode(key, node->right);
    // save position at the current recursion level
    unsigned posHere = *pos; 
    
    if (compareRight == 0 ||
       (node->key == key && node->headNode != true))
        return false;
If the right node's value is equal to the desired key, or the current node's value is equal to the desired key and this node is not a head node, the function returns false as no insertion is needed.
Next, if the right node’s value is lower than the desired key, the function increases the pos counter by the right node’s skip distance + 1 and recurses into the right node.
if (compareRight == 1) { // right elem is lower
    *pos += node->right->skipDist + 1;
    return insertRecursive(node->right,
                           key,
                           insertedOne,
                           pos);
}
Interesting things happen if we go down. First of all, if we need to go lower and it’s the very first level, then we simply insert the node and return true as the node was inserted.
else { // (compareRight == -1) // want go down
    if (node->level == 1) {
        *(insertedOne) = new TreeNode<T>(key, 1);
        (*(insertedOne))->right = node->right;
        node->right = *(insertedOne);
        return true;
}
If we can go down, there are several cases to consider.
We need to go deeper. That means that on the current level the right node’s value is higher than the desired, so the insertion is going to occur before the right node. That means that the right node’s skip distance is going to be increased by 1.
Also, if on the current level we insert a fast lane node, then the right node’s skip distance is shortened by the skip distance of the inserted node.
else {
    // whether there was an insertion on the deeper level
    bool possibleLevelInsert = insertRecursive(node->down,
                               key, insertedOne, pos);
                               
    if (!possibleLevelInsert ) {
        // if insertion of fast lane is impossible
        if (node->right != nullptr && *insertedOne != nullptr) {
        
        // if right node on the current level is presented
        // and we inserted the node (*insertedOne != nullptr)
            node->right->skipDist++;
            
         }
         
     return false; // insert of further fast lanes is impossible
     }
     
At this point, we know that we can insert a fast lane node on the current level. Let’s flip a coin and decide.
if(node->level == 1) // trivial case
    return true;
    
bool insertNow = this->spinACoin();
if (!insertNow){
	// no fast lane insertion -> increase the next
    // fast lane skip distance as the node was inserted somewhere between.
    if (node->right != nullptr){
         node->right->skipDist++;
    }
    return false;
}
// Can insert the fast lane node

TreeNode<T>* newNode = new TreeNode<T>(key, node->level);
newNode->down = *(insertedOne);
newNode->right = node->right;
newNode->skipDist = (*pos - posHere);

// *pos stopped updating at the insertion position.
// At the beginning, we saved temporary pos at the current recursion level.
if (node->right != nullptr) {
   // shrink right node skip distance as we inserted new fast lane node
   node->right->skipDist -= newNode->skipDist;
}
node->right = newNode;
*(insertedOne) = newNode;
return true;
This was massive code. Final insert function:
bool insertRecursive(TreeNode<T>* node, T key, TreeNode<T>** insertedOne, unsigned* pos){
    this->processDeletions(node);
    int compareRight = compareWithNode(key, node->right);
    unsigned posHere = *pos;
    if (compareRight == 0 || (node->key == key && node->headNode != true))
        return false;
    if (compareRight == 1){ // right elem is lower
        *pos += node->right->skipDist + 1;
        return insertRecursive(node->right, key, insertedOne, pos);
    } else {// (compareRight == -1) // want go down
        if (node->level == 1){
            *(insertedOne) = new TreeNode<T>(key, 1);
            (*(insertedOne))->right = node->right;
            node->right = *(insertedOne);
            return true;
        } else {
            bool possibleLevelInsert = insertRecursive(node->down, key, insertedOne, pos);
            if (!possibleLevelInsert ){
                if (node->right != nullptr && *insertedOne != nullptr){
                    node->right->skipDist++;
                }
                return false;
            }
            if(node->level == 1)
                return true;
            bool insertNow = this->spinACoin();
            if (!insertNow){
                if (node->right != nullptr){
                    node->right->skipDist++;
                }
                return false;
            }
            TreeNode<T>* newNode = new TreeNode<T>(key, node->level);
            newNode->down = *(insertedOne);
            newNode->right = node->right;
            newNode->skipDist = (*pos - posHere);
            if (node->right != nullptr) {
                node->right->skipDist -= newNode->skipDist;
            }
            node->right = newNode;
            *(insertedOne) = newNode;
            return true;
        }
    }
}
Process deletions
The only thing left is to define the processDeletions(node) function. If the right node is marked as deleted, then we need to update the skip distance of the node to the right of it. Also, we first need to go to the deepest level of recursion and perform the alterations from the end to the start.

  
  

 


 My Experience On AI Stock Prediction 
Alex Dremov — Mon, 24 Aug 2020 16:18:00 +0300
 What It’s All About?
Probably, you found this post while searching for methods of stock prediction. Spoiler: in this article, I will not introduce a fascinating model that will make a lot of money for you. On the contrary, I want to focus on the wrong approaches and techniques that some data scientists may fall into.
You can find hundreds of posts on the internet where somebody, using a single LSTM unit, obtains 95%+ precision in market prediction. I scrutinized the most popular articles and checked the authors’ results. Here, I check whether they are really that good.
Simple models – beautiful results
Sounds cool, right? In most cases, simple models can lead to fascinating outcomes, but stock prediction is definitely not one of them.
Searching for “Stock prediction with LSTM,” you’ll encounter numerous posts showing how simple LSTM-based models can achieve magic results. Here are the usual traps:
1. LSTMs are powerful
If you encounter this kind of architecture:
regressor = Sequential()
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units = 1))
Run.
Models with 3+ LSTM layers are already considered big models that can do amazing things. Andrej Karpathy in his blog was able to generate C code that almost compiles using just 3 layers of LSTM. He needed about 474 Mbytes of code to not overfit the model. I know that the capacity of the model highly depends on the number of LSTM hidden units, but usually 4 layers of LSTM are redundant and you need an enormous dataset to not overfit your model.
In the article that I cited above, the author uses 4 layers and trains the model on about 1500 samples. Needless to say, the model will overfit. However, the author demonstrates amazing results:
https://analyticsindiamag.com/hands-on-guide-to-lstm-recurrent-neural-network-for-stock-market-prediction/
At first glance, the performance is really good. But is it really useful? It looks like the model approximates one of the exponentially weighted moving averages. Moreover, if you look at the graph at some specific point, you’ll notice that the predicted data give you little to no information about stock price movements.
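To see why a curve that hugs the price can still be uninformative, here is a small sketch on synthetic data. It is not the cited author's model or data: the "price" is a random walk I generate myself, and the "model" simply repeats the previous value. The pearson helper is mine.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


random.seed(0)
# Synthetic "price": a random walk, so future moves are unpredictable by design.
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] + random.gauss(0, 1))

actual = prices[1:]
predicted = prices[:-1]  # "model": tomorrow's price equals today's price

# How often the predicted curve moves in the same direction as the real one.
same_direction = sum(
    (predicted[i] - predicted[i - 1]) * (actual[i] - actual[i - 1]) > 0
    for i in range(1, len(actual))
)
print("correlation:", round(pearson(actual, predicted), 4))
print("direction hit rate:", round(same_direction / (len(actual) - 1), 3))
```

On a plot, such a lagged copy looks almost identical to the real series, yet its direction hit rate should stay close to a coin flip, which is exactly the kind of result you can't trade on.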
Another thing that is noticeable is the severe left shift of predictions. You’ll say: “Just shift it left and they will match.” Unfortunately, no. In this case, we will not obtain any relevant data that can predict future prices. Moreover, this “shift” behavior can be due to the fact that in the author’s solution, the model makes approximations based on the most recent 60 values. Look at the explanation:
This article will be updated when I find other problematic research.

 


 How Deep Neural Networks Work 
Alex Dremov — Fri, 08 May 2020 05:25:40 +0300
 Introduction
Today, when such beautiful frameworks as Keras, Tensorflow, SkLearn exist, many people are not worried about how Neural Network models work and train. However, when interested people start to dig and search for explanations, they usually face unreasonably significant amounts of linear algebra thrown right into the face without any practical information.
At least, that was my case. Having decided to understand Neural Networks, I enrolled in a local university's online course. I watched lecture after lecture, noted everything necessary, and from the bottom of my heart waited for practical information and possible algorithm implementations.
The course ended, and I was left with a thick notebook of linear algebra and calculus, and no understanding of what I could do with all this information.
However, I don’t want somebody else to walk the same road as me, so I decided to write this article, i.e. a “Hello, world” of Neural Nets.
Side note: in this guide, I will not explore deeply program architecture and good Python practices as it is not the primary purpose of the article.
Single neurone: what is it?
We find a line that approximates our points the best. The line can be described by the following equation:
\[ y(x) = wx + b \]
By adjusting \(w\) and \(b\) we can make our line the best fit for the current points distribution.
And that’s actually what every single neuron in basic Neural Net does. The big difference is that line of best fit presented on the image is in 2D. In the real world, algorithms solve problems in multidimensional space.
For example, if you would like to predict who survives after the Titanic tragedy, you could take into account such parameters as age, fare, sex, number of siblings, etc. See that we already have 4 dimensions to work with. However, you should not be scared of that. A lot of concepts that work in 3D or 2D can also be applied to multidimensional space.
Classification problem
Let’s continue to work on the Titanic survival chance problem and imagine that our neuron already knows the line of best fit. The problem of classifying a binary dependent variable (survived/did not survive) is called Logistic Regression.
The problem is that a line is unbounded, but a probability can’t be lower than zero or higher than one. Here comes the sigmoid function.
\[ y(z) = \frac{1}{1 + e^{-z}} \]
As you see, the function is bounded by 0 and 1. So, we will use it to adjust the neuron output. The function that sets the neuron output based on the linear part is called an “activation function”.
How this works with multidimensions 
The same problem is a little bit different when \(x\) has multiple dimensions – vector. Then, every component has a different effect on the final result, so \(w\) also should be a vector.
Formula in vector form:
\[ z = w^{T}x + b\]
We set \(w\) as column-vector and \(x\) as column-vector. Therefore, to get scalar, we transpose the \(w\) vector. How it works:
\[ x = \begin{bmatrix} x_1\\ x_2\\ \vdots \\ x_n \end{bmatrix}, \quad w = \begin{bmatrix} w_1\\ w_2\\ \vdots\\ w_n \end{bmatrix} \]
\[ w^{T}x = \begin{bmatrix} w_1 & w_2 & \ldots & w_n \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = w_1x_1 + w_2x_2 + \ldots + w_nx_n \]
If you feel a little bit uncomfortable with the expression above, review basic matrix multiplication.
That's it. This is how a single basic neuron works. It takes input, multiplies it by \(w\), adds \(b\) (just a number), applies activation function, and sends computed value further. This process is named forward propagation. Now, let's implement this in code.
Forward propagation in code
I will use NumPy for basic operations. Of course, you can implement matrix multiplication, addition, etc. by yourself, but NumPy does it more effectively and faster as it’s already compiled.
import numpy as np
Sigmoid function:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
Forward propagation function:
def forward_propagation(w, b, x):
    z = np.dot(w.T, x) + b # np.dot(..., ...) — matrix multiplication
    return sigmoid(z)
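As a quick sanity check, here is a tiny worked example with numbers small enough to verify by hand. The weights, bias and input are made up for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def forward_propagation(w, b, x):
    z = np.dot(w.T, x) + b
    return sigmoid(z)


w = np.array([[0.5], [-0.25]])  # made-up weights, shape (2, 1)
b = 0.1                         # made-up bias
x = np.array([[2.0], [4.0]])    # one sample with two features, shape (2, 1)

# z = 0.5 * 2 + (-0.25) * 4 + 0.1 = 0.1, so the output is sigmoid(0.1)
a = forward_propagation(w, b, x)
print(a.shape, float(a[0, 0]))
```

The result is a (1, 1) array holding a value slightly above 0.5, because z is slightly above zero.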
Great. Now we can calculate forward propagation of a single neuron. But how do we figure out the \(w\) and \(b\) values?
Loss function
To understand how well our algorithm performs, we need to define a loss function. For purposes of binary classification logarithmic loss performs well. So, we will use it.
\[ L(\widehat{y}, y) = -\left(y \ln(\widehat{y}) + (1-y)\ln(1-\widehat{y})\right) \]
That's how it looks:
\( \hat{y} \) represents computed value, \(y\) – actual
def loss(A, Y):
    return -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
As you see, the loss tends to infinity as \(\hat{y}\) approaches the opposite of \(y\) (for example, \(\hat{y} \to 0\) when \(y = 1\)), but it's 0 when they are exactly the same.
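Because of that infinity, an output of exactly 0 or 1 makes np.log return -inf and the cost becomes unusable. A common workaround, which is my addition and not part of the code above, is to clip predictions slightly away from 0 and 1. The name safe_loss and the eps value are assumptions for this sketch.

```python
import numpy as np


def safe_loss(A, Y, eps=1e-12):
    """Logarithmic loss with predictions clipped away from exactly 0 and 1."""
    A = np.clip(A, eps, 1 - eps)
    return -(Y * np.log(A) + (1 - Y) * np.log(1 - A))


Y = np.array([[1.0, 1.0, 0.0]])
A = np.array([[0.9, 0.0, 0.0]])  # the second prediction is confidently wrong
print(safe_loss(A, Y))
```

The confidently wrong prediction still gets a large penalty, but it stays finite, so the average cost remains a meaningful number.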
To train model, we use labeled information: pairs of \(x , y\).  \(x\) represents input vector and \(y\) – desired output for this vector. If we stack all available \(x\) into a single matrix, we'll create \(X\) – a matrix that contains all training data. We can do the same thing for \(Y\)
\[ X = \begin{bmatrix} | && | &&  && |\\  x_1 && x_2 && \ldots && x_m\\ | && | &&  && | \end{bmatrix}  \]
\[ Y = \begin{bmatrix} | && | &&  && |\\  y_1 && y_2 && \ldots && y_m\\ | && | &&  && | \end{bmatrix}  \]
If we have \(m\) samples and every x vector is \(n\)-dimensional, then we can calculate the cost of the algorithm with selected \(w\) and \(b\).
\[ J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) \]
def cost(A, Y, m):
    return 1 / m * np.sum(loss(A, Y))
Vectorization
Currently, we can calculate forward propagation for single (x, y) set and we need to calculate values for all \(m\) available training pairs. The most obvious answer is to start the for loop and calculate iteratively. But this is not the optimal case. We can use \(X\) and \(Y\) matrices to calculate forward propagation for the entire training set. That's how it works:
\[ X^{T}w + b = \begin{bmatrix} - && x_1 && - \\ - && x_2 && - \\ && \vdots && \\ - && x_m && - \end{bmatrix} * \begin{bmatrix} w_1\\ w_2\\ \vdots\\ w_n\\ \end{bmatrix} + b = \]
\[ = \begin{bmatrix} x_1^{T}w + b \\ x_2^{T}w + b \\ \vdots \\ x_m^{T}w+ b\\ \end{bmatrix} \]
As you see, each row represents forward propagation for one training sample.
This approach optimizes code and speeds up calculations. Whenever possible, vectorize code. The same technique can be applied during backpropagation.
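Here is a small sketch that checks the vectorized version against a plain loop on random data. The sizes and the random inputs are arbitrary. Note that the code computes np.dot(w.T, X), which is the transpose of the \(X^{T}w\) column shown above, so the results come out as a row of shape (1, m).

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


rng = np.random.default_rng(0)
n, m = 3, 5                      # 3 features, 5 samples (arbitrary sizes)
X = rng.normal(size=(n, m))      # samples stacked as columns, as in the text
w = rng.normal(size=(n, 1))
b = 0.5

# Loop version: one forward pass per sample (one column of X at a time).
looped = np.array([sigmoid(np.dot(w.T, X[:, [i]]) + b)[0, 0] for i in range(m)])

# Vectorized version: a single matrix product for the whole training set.
vectorized = sigmoid(np.dot(w.T, X) + b)   # shape (1, m)

print(np.allclose(looped, vectorized[0]))
```

Both versions give the same activations; the vectorized one just lets NumPy do the iteration in compiled code.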
In the core of learning: Backpropagation
At this moment we can calculate the neuron output and estimate how close it is to the actual value. But how can we figure out the \(w\) and \(b\) values? Here comes the Gradient Descent concept.
The best explanation of GradDescent I ever heard:
It’s like you are trying to find a door in a completely dark room and you can only “feel” in what direction to move
Imagine that you have some function, but you do not know its expression. Or its shape. And you are in multidimensional space. Then, you are randomly placed at some point of this function and asked to find its minimum. Not the most pleasant situation, right? However, you know how your position was calculated, so you can find a derivative. But what does the derivative give us? Let’s take a look at this function.
Using derivative we can find direction to the function’s minimum. That’s how it looks animated:
We can take steps in an outlined direction and finally reach a minimum loss point. That’s how we can implement this to adjust \(w\) and \(b\)
\[w^{new} = w^{old} - \alpha \cdot \frac{\partial{J(w, b, x)}}{\partial{w}}\]
\[b^{new} = b^{old} - \alpha \cdot \frac{\partial{J(w, b, x)}}{\partial{b}}\]
Where \(\alpha\) is the learning rate. You can view the derivative calculation in the spoiler; here are the final expressions:
\[ \frac{\partial{J}}{\partial{z}} = \hat{Y} - Y \]
\[ \frac{\partial{J(w, b, x)}}{\partial{w}} = \frac{1}{m} X*(\frac{dJ}{dz})^{T} = \frac{1}{m} X*(\hat{Y}-Y)^{T} \]
\[ \frac{\partial{J(w, b, x)}}{\partial{b}} = \frac{1}{m} \sum_{i}^{n}{\sum_{j}^{m}{(\frac{dJ}{dz})_{ij}}} = \frac{1}{m} \sum_{i}^{n}{\sum_{j}^{m}{(\hat{Y}-Y)_{ij}}} \]
Do not worry. It all looks a lot better in code. Further, I will use a different notation: \(\frac{\partial{J(w, b, x)}}{\partial{w}}\) as \(dw\), \(\frac{\partial{J(w, b, x)}}{\partial{b}}\) as \(db\), etc. Also, it’s common to name \(\hat{Y}\) as \(A\) because it represents activation function value.
\[\frac{\partial{L}}{\partial{A}} = \frac{-Y}{A} + \frac{1-Y}{1-A}\]
\[\frac{dA}{dZ} = \frac{e^{-Z}}{(1+e^{-Z})^{2}} = \sigma(Z)(1-\sigma(Z))\]
\[A =  \sigma(Z)\]
\[\frac{\partial{L}}{\partial{Z}} = \frac{\partial{L}}{\partial{A}} * \frac{\partial{A}}{\partial{Z}} =\]
\[=  (1 - A)(-Y) + (1 - Y)A = A - Y\]
dZ = A - Y
db = 1 / m * np.sum(dZ)
dw = 1 / m * np.dot(X, dZ.T)
Then, single back propagation step can be represented in this function:
def backpropagation(w, b, X, A, Y, learning_rate, m):
    dZ = A - Y
    db = 1 / m * np.sum(dZ)
    dw = 1 / m * np.dot(X, dZ.T)
    assert(dw.shape == w.shape)
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b
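If you don't trust the derivation, you can compare the analytic gradients with numerical ones. Below is a minimal gradient-check sketch on random data; the data, sizes and step h are arbitrary choices of mine, and cost here is a compact stand-in that recomputes the forward pass itself.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost(w, b, X, Y):
    A = sigmoid(np.dot(w.T, X) + b)
    return np.mean(-(Y * np.log(A) + (1 - Y) * np.log(1 - A)))


rng = np.random.default_rng(1)
n, m = 3, 8
X = rng.normal(size=(n, m))
Y = (rng.random(size=(1, m)) > 0.5).astype(float)
w = rng.normal(size=(n, 1)) * 0.1
b = 0.0

# Analytic gradients, same formulas as in backpropagation()
A = sigmoid(np.dot(w.T, X) + b)
dZ = A - Y
dw = 1 / m * np.dot(X, dZ.T)
db = 1 / m * np.sum(dZ)

# Numerical gradients via central differences
h = 1e-6
dw_num = np.zeros_like(w)
for i in range(n):
    step = np.zeros_like(w)
    step[i, 0] = h
    dw_num[i, 0] = (cost(w + step, b, X, Y) - cost(w - step, b, X, Y)) / (2 * h)
db_num = (cost(w, b + h, X, Y) - cost(w, b - h, X, Y)) / (2 * h)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))
```

When both checks print True, the formulas for dw and db agree with the slope of the cost function measured directly.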
Initialization
To adjust \(w\) and \(b\), we need a starting point. We are going to initialise \(w\) and \(b\) with zeros.
💡
We can initialise parameters with zeros when we have just one neuron. This approach does not work if there are several neurons and layers. If we initialize them with 0, then all neurons will develop in the same way and the whole network becomes almost useless.
Initialisation:
w = np.zeros((n, 1))
b = 0  # the bias is a single number, as in model() below
Finally, we can write full neuron learning code.
def model(X, Y, learning_rate=0.1, n_iter=2000, costIter=None):
    if costIter is None:
        costIter = [[], []]  # avoid sharing one default list between calls
    m = X.shape[1]  # number of training samples
    n = X.shape[0]  # number of features
    w = np.zeros((n, 1))
    b = 0
    for i in range(n_iter):
        A = forward_propagation(w, b, X)
        c = cost(A, Y, m)
        if i % 5 == 0:
            print("Iteration %s: %s" % (i, c))
        costIter[0].append(i)
        costIter[1].append(c)
        w, b = backpropagation(w, b, X, A, Y, learning_rate, m)
    return w, b
In this code, we combine all previous steps:
Initialize all parameters
Start a loop
Perform forward propagation
Calculate cost
Print some data / save into array
Perform backpropagation step and update parameters
Finally, the model returns optimal \(w\) and \(b\) values so that we can use them to predict answers for new values.
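Using the returned values for prediction can be sketched like this. The predict function and the 0.5 threshold are my assumptions, and the parameters below are made up rather than taken from a real training run.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def predict(w, b, X, threshold=0.5):
    """Return 0/1 labels of shape (1, m) for samples stacked as columns of X."""
    A = sigmoid(np.dot(w.T, X) + b)
    return (A >= threshold).astype(int)


# Made-up parameters, not the result of an actual training run.
w = np.array([[1.0], [-1.0]])
b = 0.0
X_new = np.array([[2.0, 0.0, -1.0],
                  [0.0, 3.0, -2.0]])
print(predict(w, b, X_new))
```

Here the linear parts are 2, -3 and 1, so the first and third samples land above the threshold and the second one below it.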
Testing
For testing, I selected a line
\[y = 1.23x + 3.23\]
Let points above the line be blue and those below it be red. Here is the set that I gave to the model for training.
Training:
As we see, the cost decreases over time. That means that backpropagation works correctly and our \(w\) and \(b\) are adjusted in the right direction.
To check how well the algorithm performs, I randomly generated 2000 points and asked the neuron to classify them.
That’s how the algorithm performed. The green line represents the actual line.
The accuracy is around 99%. I suppose it misclassified ~1% due to the points that lie directly on the line.
What’s special about this classifier?
So one neuron approximates some linear function. How can it distinguish cats from dogs, or survivors from non-survivors?
By combining neurons in stacks and layers, we form complicated compositions of such functions, and then we can approximate sophisticated multidimensional functions that capture subtle dependencies and relations found during training.
But the single neuron and the backpropagation concept lie at the heart of the whole process.

State	Description
Not running	The app is not running.
Inactive	The app is running in the foreground but not receiving events. This is usually a brief transitional state; an iOS app can become inactive, for example, when a call or SMS message is received.
Active	The app is running in the foreground and receiving events.
Background	The app is running in the background, and executing code.
Suspended	The app is in the background, but no code is being executed. The app is still in memory.

Method	Description
`application:willFinishLaunchingWithOptions`	This method is your app’s first chance to execute code at launch time.
`application:didFinishLaunchingWithOptions`	This method allows you to perform any final initialization before your app is displayed to the user.
`applicationDidBecomeActive`	The app has become the foreground app. Use this method for any last-minute preparation.
`applicationWillResignActive`	The app is transitioning away from being the foreground app.
`applicationDidEnterBackground`	The app runs in the background and may be suspended at any time.
`applicationWillEnterForeground`	The app is moving out of the background and back into the foreground, but is not yet active.
`applicationWillTerminate`	The app is being terminated. This method is not called if your app is suspended.
