ML Tasks on Swift Without Python, Neural Networks and Libraries

Neural networks are at the forefront of Machine Learning (ML) today, and Python is undoubtedly the go-to programming language for any ML task, regardless of whether one intends to use Neural Networks to solve it or not. There is a vast array of Python libraries available that cover the entire spectrum of ML tasks, such as NumPy, Pandas, Keras, TensorFlow, PyTorch, and so on. These libraries usually rely on C or C++ implementations of ML algorithms and approaches under the hood because Python is too slow for them. However, Python is not the only programming language in existence, and it is not the one I use in my daily work.

This article is not a guide on how to write something in Swift; rather, it is more like a thought piece about the current mindset of many developers who view Python as a bridge to the ultimate solution for ML libraries that will resolve any problem or task they encounter, regardless of the language they are using. I would wager that most developers prefer to invest their time in finding ways to integrate Python libraries into their language/environment rather than considering alternative solutions without them. While this is not inherently bad – reuse has been a significant driver of progress in IT over the past few decades – I have started to sense that many developers no longer even consider alternative solutions. This mindset becomes even more entrenched with the current state and advancements in Large Language Models.

The balance is lacking; we are rushing towards asking LLMs to resolve our issues, obtaining some Python code, copying it, and enjoying our productivity with potentially significant overhead from unnecessary dependencies.

Let's explore an alternative approach to solving the task at hand, using only Swift, mathematics, and no other tools.

When people start learning about Neural Networks, there are two classic Hello World examples found in most tutorials and introductory materials. The first is handwritten digit recognition. The second is data classification. I will focus on the second one in this article, but the solution I go through will work for the first one as well.

A very good visual example of it can be found in TensorFlow Playground, where you can play around with different neural network structures and visually observe how well the resulting model solves the task.

TensorFlow Playground example

You might ask what the practical meaning of these dots of different colors is. They are a visual representation of some data set. You can present many different types of data in exactly the same or a similar way, such as social groups of people who buy specific products, or music preferences. Since I primarily focus on mobile iOS development, I will also give an example of a real task I was solving that can be represented visually in a similar manner: finding electric wires inside walls using the gyroscope and magnetometer of a mobile phone. In this particular example, we have one set of parameters for when a wire is found and another set for when nothing is inside the wall.

Let's take a look at the data we'll be using.

ML-Test

We have two types of data here: red dots and blue dots. As I described above, it may be a visual representation of any kind of classified data. For example, let's take the red area as the one where we have a signal from the magnetometer and gyroscope in cases when we have an electric wire in the wall, and the blue area in case we don't.

We can see that these dots are grouped together somehow and form some sort of red and blue shapes. The way these dots were generated is by taking random points from the following image:

Dots grouped together

We will use this picture as the source for both stages of our process: some random points taken from it will train the model, and other random points will test the trained model.

The original picture is 300 x 300 pixels, containing 90,000 dots (points). For training purposes, we will use only 0.2% of these dots, which is about 180 points spread across the red, blue and none categories. To gain a better understanding of the model's performance, we will randomly select 3,000 points and draw circles around them on the picture. This visual representation will give us a more comprehensive idea of the results. We can also measure the accuracy as a percentage to verify the model's efficiency.

How are we going to build a model? If we look at these two images together and try to simplify our task, we find that the task is, in fact, to recreate the original picture from the data we have (a batch of red and blue dots). The closer the picture produced by our model is to the original, the more accurate the model is. We can also think of our training data as an extremely compressed version of the original image, with the goal of decompressing it back.

What we are going to do is transform our dots into mathematical functions, represented in code as arrays or vectors (I will use the term vector in this text, simply because it sits between a function from the world of math and an array from software development). Then we will compare every test point against these vectors and identify which vector it matches best.

To transform our data, I will try the Discrete Cosine Transform (DCT). I won't go into any mathematical explanations about what it is and how it works, as you can easily find that information if you wish. However, I can explain in simple terms how it can help us and why it's useful. The DCT is used in many areas, including image compression (the JPEG format, for example). It transforms the data into a form where the important parts of the image are concentrated in a small number of values, so the unimportant details can be dropped. If we apply the DCT to our 300x300 image containing only red dots, we will get a 300x300 matrix of values that can be turned into an array (or vector) by placing its rows one after another.

Let's finally write some code for it. First, we need to create a simple object that will represent our point (dot).

enum Category {
    case red
    case blue
    case none
}

struct Point: Hashable {
    let x: Int
    let y: Int
    let category: Category
}

You may notice that we have an additional category called none. We will actually create three vectors in the end: one for red points, a second for blue points, and a third for everything else, represented by none. While we could get by with just two, having a trained vector for points that are neither red nor blue makes things a bit simpler.

`Point` conforms to the Hashable protocol so that we can use a Set and avoid picking the same point twice for our training vector.

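func randomPoints(from points: [Point], percentage: Double) -> [Point] {
    // Cap the target so the loop below always terminates
    // (this assumes `points` contains no duplicates).
    let count = min(points.count, Int(Double(points.count) * percentage))
    var result = Set<Point>()
    // Keep drawing until we have `count` distinct points.
    while result.count < count {
        let index = Int.random(in: 0 ..< points.count)
        result.insert(points[index])
    }
    return Array(result)
}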

Now we can use it to take a random 0.2% of the red, blue, and none points from our original image.

redTrainPoints = randomPoints(from: redPoints, percentage: 0.002)
blueTrainPoints = randomPoints(from: bluePoints, percentage: 0.002)
noneTrainPoints = randomPoints(from: nonePoints, percentage: 0.002)

We are ready to transform these training data using DCT. Here's an implementation of it:

import Foundation

final class CosTransform {

    private var sqrtWidthFactorForZero: Double = 0
    private var sqrtWidthFactorForNotZero: Double = 0
    private var sqrtHeightFactorForZero: Double = 0
    private var sqrtHeightFactorForNotZero: Double = 0

    private let cosLimit: Int

    init(cosLimit: Int) {
        self.cosLimit = cosLimit
    }

    func discreteCosTransform(for points: [Point], width: Int, height: Int) -> [[Double]] {
        if sqrtWidthFactorForZero == 0 {
            prepareSupportData(width: width, height: height)
        }

        var result = Array(repeating: Array(repeating: Double(0), count: width), count: height)

        for y in 0..<height {
            for x in 0..<width {
                let cos = cosSum(
                    points: points,
                    width: width,
                    height: height,
                    x: x,
                    y: y
                )
                result[y][x] = cFactorHeight(index: y) * cFactorWidth(index: x) * cos
            }
        }

        return result
    }

    func shortArray(matrix: [[Double]]) -> [Double] {
        let height = matrix.count
        guard let width = matrix.first?.count else { return [] }

        var array: [Double] = []
        for y in 0..<height {
            for x in 0..<width {
                if y + x <= cosLimit {
                    array.append(matrix[y][x])
                }
            }
        }
        return array
    }

    private func prepareSupportData(width: Int, height: Int) {
        // DCT-II scale factors: sqrt(1/N) for index 0, sqrt(2/N) otherwise.
        sqrtWidthFactorForZero = (1 / Double(width)).squareRoot()
        sqrtWidthFactorForNotZero = (2 / Double(width)).squareRoot()
        sqrtHeightFactorForZero = (1 / Double(height)).squareRoot()
        sqrtHeightFactorForNotZero = (2 / Double(height)).squareRoot()
    }

    private func cFactorWidth(index: Int) -> Double {
        return index == 0 ? sqrtWidthFactorForZero : sqrtWidthFactorForNotZero
    }

    private func cFactorHeight(index: Int) -> Double {
        return index == 0 ? sqrtHeightFactorForZero : sqrtHeightFactorForNotZero
    }

    private func cosSum(
        points: [Point],
        width: Int,
        height: Int,
        x: Int,
        y: Int
    ) -> Double {
        var result: Double = 0
        for point in points {
            // x runs along the width and y along the height.
            result += cosItem(point.x, x, width) * cosItem(point.y, y, height)
        }
        return result
    }

    // One cosine term of the DCT sum for a sample position and a frequency index.
    private func cosItem(
        _ firstParam: Int,
        _ secondParam: Int,
        _ length: Int
    ) -> Double {
        return cos((Double(2 * firstParam + 1) * Double(secondParam) * Double.pi) / Double(2 * length))
    }
}
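For reference, the value that discreteCosTransform stores at result[v][u] can be written as a single sum over the input points. This is only a sketch of what the code does: the symbols P (the set of input points), W and H (width and height), and u, v (column and row of the output matrix) are notation introduced here for illustration. For the 300 x 300 image used in this article, W = H = 300.

```latex
X_{v,u} = c_H(v)\, c_W(u) \sum_{(x,\,y) \in P}
  \cos\!\left(\frac{(2x+1)\,u\,\pi}{2W}\right)
  \cos\!\left(\frac{(2y+1)\,v\,\pi}{2H}\right),
\qquad
c_N(k) =
  \begin{cases}
    \sqrt{1/N}, & k = 0 \\
    \sqrt{2/N}, & k > 0
  \end{cases}
```

In other words, the image is treated as a matrix that holds 1 at every selected point and 0 everywhere else, so only the selected points contribute to the sum.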

Let's create an instance of CosTransform and test it.

let math = CosTransform(cosLimit: Int.max)
...
redCosArray = cosFunction(points: redTrainPoints)
blueCosArray = cosFunction(points: blueTrainPoints)
noneCosArray = cosFunction(points: noneTrainPoints)

We use some simple helper functions here:

func cosFunction(points: [Point]) -> [Double] {
    return math.shortArray(
        matrix: math.discreteCosTransform(
            for: points,
            width: 300,
            height: 300
        )
    )
}

CosTransform has a cosLimit parameter that is used inside the shortArray function. I will explain its purpose later; for now, let's ignore it and check 3000 random points from the original image against the vectors we created: redCosArray, blueCosArray and noneCosArray. To make this work, we need to create another DCT vector from a single point taken from the original image. We do this in exactly the same way, using the same functions we used for the red, blue and none vectors.

But how can we find which of them this new vector belongs to? There is a very simple mathematical tool for this: the dot product. Our task is to compare vectors and find the most similar pair, and the dot product gives us exactly that. The dot product of a vector with itself is a positive value, and it is greater than the dot product of that vector with any other vector of the same magnitude. The dot product of orthogonal vectors (vectors that have nothing in common with each other) is 0. Taking this into consideration, we can come up with a simple algorithm:

Go through all our 3000 random points one by one.
Create a vector from a 300x300 matrix with only one single point using DCT (Discrete Cosine Transform).
Apply a dot product for this vector with redCosArray, then with blueCosArray, and then with noneCosArray.
The greatest result out of the previous step will point us to the right answer: Red, Blue, None.

The only missing piece of functionality is the dot product, so let's write a simple function for it:

func dotProduct(_ first: [Double], _ second: [Double]) -> Double {
    guard first.count == second.count else { return 0 }
    var result: Double = 0
    for i in 0..<first.count {
        result += first[i] * second[i]
    }
    return result
}
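As a quick sanity check of the dot product properties described earlier, here is a minimal sketch with three made-up unit-length vectors. The dot helper and the vector names are hypothetical and exist only for this illustration.

```swift
// A compact dot product for this illustration (hypothetical helper).
func dot(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same length")
    return zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

// Three made-up vectors, each with magnitude 1.
let reference: [Double] = [1, 0]
let partlyAligned: [Double] = [0.6, 0.8]
let orthogonal: [Double] = [0, 1]

let selfScore = dot(reference, reference)          // 1.0
let partialScore = dot(reference, partlyAligned)   // 0.6
let orthogonalScore = dot(reference, orthogonal)   // 0.0
print(selfScore, partialScore, orthogonalScore)
```

The vector compared with itself scores highest, the partly aligned one scores lower, and the orthogonal one scores zero. Note that this ordering is only guaranteed when the vectors have the same magnitude.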

And here is an implementation of the algorithm:

var count = 0
while count < 3000 {
    let index = Int.random(in: 0 ..< allPoints.count)
    let point = allPoints[index]
    count += 1

    let testArray = math.shortArray(
        matrix: math.discreteCosTransform(
            for: [point],
            width: 300,
            height: 300
        )
    )

    let redResult = dotProduct(redCosArray, testArray)
    let blueResult = dotProduct(blueCosArray, testArray)
    let noneResult = dotProduct(noneCosArray, testArray)

    var maxValue = redResult
    var result: Category = .red
    if blueResult > maxValue {
        maxValue = blueResult
        result = .blue
    }
    if noneResult > maxValue {
        maxValue = noneResult
        result = .none
    }
    fillPoints.append(Point(x: point.x, y: point.y, category: result))
}
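Besides drawing the picture, we can measure accuracy as a percentage, as mentioned earlier. Below is a minimal sketch of one way to do it, assuming that the predicted points (fillPoints above) are compared with points that carry the true category for the same coordinates (allPoints above). The accuracy function is a hypothetical helper and is not part of the original code.

```swift
enum Category { case red, blue, none }

struct Point: Hashable {
    let x: Int
    let y: Int
    let category: Category
}

/// Share of predicted points whose category matches the ground truth
/// stored for the same (x, y) coordinate.
func accuracy(predicted: [Point], truth: [Point]) -> Double {
    guard !predicted.isEmpty else { return 0 }
    // Index the ground truth by coordinate for fast lookup.
    var expected: [[Int]: Category] = [:]
    for point in truth {
        expected[[point.x, point.y]] = point.category
    }
    let correct = predicted.filter { expected[[$0.x, $0.y]] == $0.category }.count
    return Double(correct) / Double(predicted.count)
}

// Made-up data: four points, one of them classified incorrectly.
let truth = [
    Point(x: 0, y: 0, category: .red),
    Point(x: 1, y: 0, category: .blue),
    Point(x: 2, y: 0, category: .none),
    Point(x: 3, y: 0, category: .red),
]
let predicted = [
    Point(x: 0, y: 0, category: .red),
    Point(x: 1, y: 0, category: .blue),
    Point(x: 2, y: 0, category: .red), // wrong on purpose
    Point(x: 3, y: 0, category: .red),
]
let score = accuracy(predicted: predicted, truth: truth)
print(score) // 0.75
```

In the article's setting, the call would look like accuracy(predicted: fillPoints, truth: allPoints).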

All we need to do now is to draw an image from fillPoints. Let's take a look at the train points we've used, DCT vectors we've created from our train data, and the end result we've got:

Result

Well, it looks like random noise. But let's take a look at the visual representation of the vectors. You can see some spikes there; that is exactly the information we need to focus on, while removing most of the noise from our DCT result. If we look at a simple visual representation of the DCT matrix, we find that the most useful information (the part that describes the unique features of the image) is concentrated in the top left corner:

Concentration
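The same concentration effect can be seen in one dimension. The sketch below is only an illustration: it applies a naive 1-D DCT to a made-up smooth signal (a ramp of 16 samples) and checks how much of the total energy sits in the first four coefficients. The dct function and the ramp are assumptions for this example and are not used elsewhere in the article.

```swift
import Foundation

/// Naive 1-D DCT-II with orthonormal scaling, O(n^2); for illustration only.
func dct(_ signal: [Double]) -> [Double] {
    let n = signal.count
    return (0..<n).map { k -> Double in
        let scale = ((k == 0 ? 1.0 : 2.0) / Double(n)).squareRoot()
        var sum = 0.0
        for (i, value) in signal.enumerated() {
            sum += value * cos(Double(2 * i + 1) * Double(k) * Double.pi / Double(2 * n))
        }
        return scale * sum
    }
}

// A made-up smooth signal: 0, 1, 2, ..., 15.
let ramp = (0..<16).map { Double($0) }
let coefficients = dct(ramp)

// Share of the total energy carried by the first four coefficients.
let total = coefficients.reduce(0) { $0 + $1 * $1 }
let head = coefficients.prefix(4).reduce(0) { $0 + $1 * $1 }
print(head / total) // close to 1 for a smooth signal like this one
```

Almost all of the signal is described by a handful of leading coefficients, which is why dropping the rest costs so little.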

Now let's take a step back and check the shortArray function once again. We use the cosLimit parameter there precisely to take the top left corner of the DCT matrix and keep only the most active parameters that make our vector unique.

func shortArray(matrix: [[Double]]) -> [Double] {
    let height = matrix.count
    guard let width = matrix.first?.count else { return [] }

    var array: [Double] = []
    for y in 0..<height {
        for x in 0..<width {
            if y + x <= cosLimit {
                array.append(matrix[y][x])
            }
        }
    }
    return array
}
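To get a feel for how many values a given cosLimit keeps, we can count the matrix entries that pass the y + x <= cosLimit check. The keptValues function below is a hypothetical helper written only for this illustration; it uses the limits 30 and 6 as examples.

```swift
/// Number of matrix entries that pass the `y + x <= cosLimit` check
/// used in shortArray, counted by brute force.
func keptValues(cosLimit: Int, width: Int, height: Int) -> Int {
    var count = 0
    for y in 0..<height {
        for x in 0..<width where y + x <= cosLimit {
            count += 1
        }
    }
    return count
}

print(keptValues(cosLimit: 30, width: 300, height: 300)) // 496
print(keptValues(cosLimit: 6, width: 300, height: 300))  // 28
```

Because indices start at 0 and the diagonal is included, the count is (cosLimit + 1) x (cosLimit + 2) / 2 as long as cosLimit is smaller than both dimensions, which is slightly more than the rough estimate cosLimit x cosLimit / 2.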

Let's create our math object with a different cosLimit:

let math = CosTransform(cosLimit: 30)

Now, instead of using all 90,000 values, we will use only the top left corner of the DCT matrix: roughly 30 x 30 / 2 = 450 values (496, to be exact, because the y + x <= cosLimit check starts at 0 and includes the diagonal). Let's take a look at the result we've obtained:

Result

As you can see, it's already better. We can also observe that most of the spikes that make the vectors unique are still located at the front (highlighted in green in the picture). Let's try CosTransform(cosLimit: 6), which means we will use roughly 6 x 6 / 2 = 18 values (28, to be exact) out of 90,000, and check the result:

Success

It's much better now, very close to the original image. However, there is one little problem: this implementation is slow. You don't need to be an expert in algorithm complexity to realise that the DCT is a time-consuming operation, and even the dot product, which has linear time complexity, is not fast enough when working with large vectors stored in Swift arrays. The good news is that we can make it much faster, and simpler to implement, by using vDSP from Apple's Accelerate framework, which is already available on Apple platforms. You can read more about vDSP in Apple's documentation, but in simple words, it's a set of routines that perform digital signal processing tasks very quickly. It has a lot of low-level optimisations under the hood that work perfectly with large data sets. Let's implement our dot product and DCT using vDSP:

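import Accelerate

// Dot product of two equally sized vectors, backed by vDSP.
infix operator •
public func •(left: [Double], right: [Double]) -> Double {
    return vDSP.dot(left, right)
}

// DCT (type II) of the input, backed by vDSP, which works on Float here.
prefix operator ->>
public prefix func ->>(value: [Double]) -> [Double] {
    // The initializer returns nil for unsupported lengths.
    guard let setup = vDSP.DCT(count: value.count, transformType: .II) else { return [] }
    return setup.transform(value.map { Float($0) }).map { Double($0) }
}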

I've wrapped these calls in custom operators to make the code more readable. You can now use them in the following way:

let redCosArray = ->>redValues
let redResult = redCosArray • testArray

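There is a problem with the new DCT implementation regarding our current matrix size. It won't work with our 300 x 300 image as is, because vDSP's DCT only supports specific lengths: according to Apple's documentation, f x 2^n, where f is 1, 3, 5, or 15 and n is at least 4. Powers of 2 such as 256 or 512 qualify, but 300 does not. Therefore, we will need to put in some effort to scale the image to a supported size before giving it to the new method.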
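As a starting point for that scaling step, here is a minimal sketch that finds the smallest supported length that is not smaller than the current one. It assumes the constraint described in Apple's documentation for the vDSP DCT setup (length = f x 2^n with f in 1, 3, 5, 15 and n >= 4); please verify it against the documentation before relying on it. The function name nextSupportedDCTCount is hypothetical.

```swift
/// Smallest length >= `minimum` of the form f * 2^n with f in {1, 3, 5, 15}
/// and n >= 4. That vDSP's DCT accepts exactly these lengths is an
/// assumption taken from Apple's documentation.
func nextSupportedDCTCount(atLeast minimum: Int) -> Int {
    var best = Int.max
    for factor in [1, 3, 5, 15] {
        var candidate = factor * 16 // n = 4 is the smallest allowed exponent
        while candidate < minimum {
            candidate *= 2
        }
        best = min(best, candidate)
    }
    return best
}

print(nextSupportedDCTCount(atLeast: 300)) // 320 (5 * 64)
```

Under this assumption, a 300-pixel side would need to be scaled or padded to 320 rather than all the way up to 512.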

Summary

Thanks to everyone who has read this far, or was lazy enough to scroll through without reading. The purpose of this article was to show that many tasks people don't consider solving with native tools can be solved with minimal effort. Looking for alternative solutions is enjoyable, so don't limit yourself to Python library integration as the only option for such tasks.
