The new way of managing Chat storage in the iOS version of LINE

Harley Pham, 2023-02-22

Harley is an iOS engineer at LINE Vietnam.

Mobile device storage keeps growing, but so does user data. Users can accumulate large numbers of very high-quality photos and videos, which eat away at storage capacity quickly. Digital content is also easy to share across social media and messaging apps such as Instagram or our own LINE app, and the resulting duplication puts even more pressure on device storage. LINE is no exception: it also lets users share photos at their original size in chats, which can occupy a significant amount of storage. Device storage capacity is limited and usually can't be changed after purchase, so users have no choice but to clean up their data, buy more cloud storage, or switch to a device with more capacity.

Many LINE users send messages containing important data that they want to keep locally on their device. Keeping data on the device is a popular feature, since not many messaging services store user data permanently in their data centers or in the cloud. Whenever a large amount of data occupies a device, users need a convenient way to manage it effectively. For example, they might want an overview of which chats use the most storage so that they can consider removing some of them to free up space. The iPhone's storage settings show how much storage an application is using, but they offer little detail and few actions a user can take, apart from the "Delete App" option, which is a last resort in most cases.

In the following sections, we will explain more about our previous solution and the new solution that has replaced it.

The previous implementation

To help users manage their chat data, our engineers introduced a feature called "Delete data" in 2016, and its UX/UI had been in place for a long time. While maintaining the feature, we received many inquiries from users about difficulties they had when using it. With the help of our Customer Service team, we narrowed these difficulties down to two main points:

The progress indicator doesn't explain what it represents, which sometimes confuses users who have to wait for it to finish. In fact, it shows the number of chats that have been calculated or synchronized so far, before the final storage size is displayed.
Calculating and synchronizing data can take a very long time for some users. This leads to another pain point: users can't interact with the app until the whole process finishes, even when it's partially done.

Besides the above, users didn't have many options for deleting their data. For example, the fastest way to clean up chat data was to delete by category (image, audio, file, or all chat data), which removes that data from every chat. Users who wanted to keep some of the data had no choice but to open each chat's settings and delete its data one chat at a time, which is inconvenient.

After careful investigation and discussion, we came up with improvements to the UX/UI together with better calculation performance.

The new Chat Storage Management feature

Our improvements in the first phase included:

A new UX/UI that better describes the LINE app's storage usage and also shows information about device capacity. We chose a simple chart, since it's the easiest way for users to get an overview of their storage capacity at a glance. The progress indicator also became more meaningful, and users can interact with their data even while the calculation is running.
A new way of storing chat data that makes it quick to get the storage size.
A new way for users to manage their chat data: chats are shown in a sorted list, with the details of each chat shown separately.

More details about those improvements will be listed in the following sections.

Chat storage capacity calculation improvements

The iOS version of the LINE app stores chats and messages using Core Data, Apple's complex but robust framework. The message model contains business information as well as some information about data in the user's local storage, such as file types and file paths. These models are the most important ones in LINE chat, and they're stable enough to serve as the source of truth for any calculation relating to chat data size. The feature required us to calculate the total size of a chat's data (across all messages in the chat) based on the existing model. We could simply go through all the message models and calculate the file sizes (for example, using FileManager's APIs). This sounds very straightforward.

In reality, however, things aren't that simple. Many users have chats that contain a huge number of files (we're talking about the number of files, not only their size). The overhead of the file reading operations can dramatically reduce performance and make the calculation take a long time. As documented by Apple:

Relative to other operations, accessing files on disk is one of the slowest operations a computer can perform. Depending on the size and number of files, it can take anywhere from a few milliseconds to several minutes to read files from a disk-based hard drive.

The problem persisted even on some of the newest devices, and it's even worse when the same work has to be repeated every time a user opens the feature. To avoid that, we used a familiar technique: caching. We wanted to do as little calculation as possible and reuse results that had already been calculated.
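As a minimal illustration of the caching idea (not LINE's actual implementation, which uses a persistent store as described next), the hypothetical SizeCache below runs the expensive size computation only on a cache miss and reuses the stored value afterwards:

```swift
// Hypothetical in-memory cache; the names here are illustrative only.
struct SizeCache {
    private var sizes: [String: Int64] = [:]
    // Counts how many times the expensive computation actually ran.
    private(set) var computeCount = 0

    // Returns the cached size for `key`, calling `compute` only on a miss.
    mutating func size(for key: String, compute: () -> Int64) -> Int64 {
        if let cached = sizes[key] {
            return cached
        }
        computeCount += 1
        let value = compute()
        sizes[key] = value
        return value
    }

    // Drops a cached entry, for example after the underlying file is deleted.
    mutating func invalidate(_ key: String) {
        sizes[key] = nil
    }
}

var cache = SizeCache()
let first = cache.size(for: "message-1") { 2_048 }   // miss: closure runs
let second = cache.size(for: "message-1") { 9_999 }  // hit: closure is skipped
```

In a real app the closure would read the file size from disk, which is exactly the slow operation the cache is meant to avoid repeating.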

Following this idea, we used an additional persistent store for saving chat capacity information, which in this case means message data such as images, audio, files, and videos. We didn't want to be tied to any particular kind of persistent storage, such as SQLite, Realm, or even Core Data. We also wanted to avoid depending on business models, so we abstracted the store as StorageManagementAbstractDatabase and used StorageItemModelable as the unit for saving data. The model contains the following information:

protocol StorageItemModelable {
    var id: String { get }
    var serviceType: StorageManagementDatabaseScheme.ServiceType { get }
    var groupId: String { get }
    var contentId: String { get }
    var contentSubId: String? { get }
    var contentType: String { get }
    var contentSizeInBytes: Int64 { get }
    var createdDate: Date { get }
    var fileName: String { get }
    var additionalData: String? { get }
}

This protocol can be used by the persistent store (in our case, SQLite) when saving the information into its backing store (file or database). To get data back from the persistent store, we need another protocol, which is used to re-create the model after fetching from the store:

protocol StorageItemConstructible {
    init(from item: StorageItemModelable)
}

One nice property of this approach is that it can store information from other services, not only Chat. By combining the two protocols above, we get a general type that any service can use:

typealias StorageItemConvertible = StorageItemModelable & StorageItemConstructible

Here's an example of its use in the Chat storage service:

struct ChatStorageItemModel: StorageItemConvertible {
    let id: String
    let chatId: String
    let messageId: String
    ...
    var serviceType: StorageManagementDatabaseScheme.ServiceType {
        .chat
    }
 
    var groupId: String {
        chatId
    }
 
    var contentId: String {
        messageId
    }
}
extension ChatStorageItemModel { // for generating model after fetching from persistent store
    init(from item: StorageItemModelable) {
        id = item.id
        chatId = item.groupId
        messageId = item.contentId
        ...
    }
}

In the example above, we define a new model containing the information we want to save into the persistent store and then simply make it conform to the StorageItemConvertible protocol. ChatStorageItemModel is the model we use for the feature improvements. Once the data is saved in the persistent store, we can quickly get the chat storage size (for example, by applying the SUM aggregate function to the SQLite table, with roughly O(n) time complexity).
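To make the aggregation concrete, here is a small in-memory sketch of what a SUM query grouped by chat computes. The StoredItem type, the function names, and the table and column names in the SQL comment are assumptions for illustration, not the actual schema:

```swift
// Hypothetical, trimmed-down record; the real StorageItemModelable has more fields.
struct StoredItem {
    let groupId: String            // chatId for the Chat service
    let contentSizeInBytes: Int64
}

// In-memory equivalent of a query such as (names are assumptions):
//   SELECT groupId, SUM(contentSizeInBytes) FROM storage_item GROUP BY groupId
func totalSizeByGroup(_ items: [StoredItem]) -> [String: Int64] {
    var totals: [String: Int64] = [:]
    for item in items {
        totals[item.groupId, default: 0] += item.contentSizeInBytes
    }
    return totals
}

// Chats ordered from largest to smallest, as a storage list might show them.
func chatsSortedBySize(_ items: [StoredItem]) -> [(chatId: String, bytes: Int64)] {
    totalSizeByGroup(items)
        .map { (chatId: $0.key, bytes: $0.value) }
        .sorted { $0.bytes > $1.bytes }
}

let items = [
    StoredItem(groupId: "chat-A", contentSizeInBytes: 1_000),
    StoredItem(groupId: "chat-B", contentSizeInBytes: 5_000),
    StoredItem(groupId: "chat-A", contentSizeInBytes: 2_500),
]
let ranked = chatsSortedBySize(items)
```

The point of the persistent store is that this single pass over small records replaces reading the size of every file on disk.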

The first time the feature is used, the heaviest but necessary task is summarization (migrating existing data to the persistent store). It's processed incrementally: chats that have finished processing are displayed first, instead of waiting until all of them are complete. The flow can be seen in the following diagram:

Handling the inconsistency

The approach above makes calculating storage capacity very fast, but it also needs to be reliable in every situation. We update the persistent store every time there's an event relating to chat data (for example, receiving an image message or removing a message that has an attached file). Even so, the persistent store and the files on disk can become inconsistent. For example, the app might crash when it receives an insert or remove message event, before it writes the change to the persistent store. To make the feature more robust, we check and restore the persistent store's correctness as much as possible. We call this the synchronization step. The logic for checking could be described as follows:
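As a rough sketch of what such a check might compute (an assumption for illustration, not the actual LINE logic), the hypothetical reconcile function below compares the sizes recorded in the store with the sizes found on disk and reports what has to be inserted, removed, or updated:

```swift
// Hypothetical result of comparing the persistent store with the files on disk.
struct ReconciliationPlan: Equatable {
    var toInsert: Set<String>   // on disk but missing from the store
    var toRemove: Set<String>   // in the store but no longer on disk
    var toUpdate: Set<String>   // in both, but the recorded size is stale
}

// `stored` and `onDisk` map a content identifier to its size in bytes.
func reconcile(stored: [String: Int64], onDisk: [String: Int64]) -> ReconciliationPlan {
    let storedIds = Set(stored.keys)
    let diskIds = Set(onDisk.keys)
    let stale = storedIds.intersection(diskIds).filter { stored[$0] != onDisk[$0] }
    return ReconciliationPlan(
        toInsert: diskIds.subtracting(storedIds),
        toRemove: storedIds.subtracting(diskIds),
        toUpdate: stale
    )
}

let plan = reconcile(
    stored: ["m1": 100, "m2": 200, "m3": 300],
    onDisk: ["m2": 200, "m3": 350, "m4": 400]
)
```

Here m1 would be removed from the store, m4 inserted, and m3 updated, while m2 is already consistent and is left alone.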

For better UX, the checking task should be as invisible to users as possible and have little impact on system performance. The task can also run independently of the feature flow, so we can schedule it to run when the device is idle, for example while it's charging.

Apple introduced a framework called BackgroundTasks at WWDC19. It helps us schedule heavy tasks to run at appropriate times in the background: at some point the system launches our app and runs the scheduled work in the background, which is very efficient. We therefore decided to schedule the synchronization task as a background task using this framework. The task is scheduled to run only when the device is idle and charging, which reduces its impact on performance. With this approach, we can ensure the correctness of user data without affecting system performance and battery life too much.
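A minimal sketch of scheduling such a task with BackgroundTasks might look like the following. The task identifier, the function names, and the startSync hook are hypothetical; only the BGTaskScheduler and BGProcessingTaskRequest calls come from the framework:

```swift
#if os(iOS)
import BackgroundTasks

// Hypothetical identifier; it must also appear under
// BGTaskSchedulerPermittedIdentifiers in the app's Info.plist.
let storageSyncTaskIdentifier = "com.example.chat-storage.sync"

// Register once, before the app finishes launching. `startSync` is a
// hypothetical hook: it starts the check, reports the result through its
// completion handler (also when cancelled), and returns a cancel closure.
@available(iOS 13.0, *)
func registerStorageSync(
    startSync: @escaping (@escaping (Bool) -> Void) -> (() -> Void)
) -> Bool {
    BGTaskScheduler.shared.register(
        forTaskWithIdentifier: storageSyncTaskIdentifier, using: nil
    ) { task in
        let cancel = startSync { success in
            task.setTaskCompleted(success: success)
        }
        // The system calls this shortly before it stops the task.
        task.expirationHandler = cancel
    }
}

@available(iOS 13.0, *)
func scheduleStorageSync() {
    let request = BGProcessingTaskRequest(identifier: storageSyncTaskIdentifier)
    request.requiresExternalPower = true         // only while charging
    request.requiresNetworkConnectivity = false  // the check is local
    do {
        try BGTaskScheduler.shared.submit(request)
    } catch {
        print("Could not schedule storage sync: \(error)")
    }
}
#endif
```

A processing task request leaves the exact timing to the system, which typically picks a moment when the device is idle; requiresExternalPower adds the charging condition. The app also needs the background processing mode enabled in its capabilities for the request to be accepted.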

Displaying the list of chats taking up storage capacity

One benefit of structuring the database this way is that we can easily group items by whatever key we need, in this case chatId. Grouping items like this helps users manage their data, as the contents of some chats may be more important to them than others. Users can then make an informed choice to remove unimportant chats that occupy a significant amount of storage capacity.

At WWDC19, Apple introduced one of the most interesting UIKit improvements in iOS 13, diffable data sources, and presented some advanced uses of them at WWDC20. The idea behind the diffable data sources (UICollectionViewDiffableDataSource and UITableViewDiffableDataSource) isn't new, as it had already been proposed and used in some Reactive frameworks like RxSwift. The data source is treated as a kind of "snapshot"; a new snapshot is "applied" over the old one to work out the "difference" between them before the collection view (or table view) is updated. This gives better performance and avoids the error-prone traditional approach, especially "batch updates" on the data source.

Here are some of the benefits Apple mentions:

The "snapshot" can be treated as a "Truth of UI State", which includes unique identifiers for sections and items, so we can avoid interacting with IndexPath directly.
We can avoid the crashes, hassles, and complexity of using performBatchUpdates() the traditional way. We can just call apply() with the new snapshot, and everything is done for us automatically.

According to our business needs, we sometimes have to batch-update chat storage items while the calculation is running, so this new approach suits us well. With official support from Apple and proven stability from iOS 13 onward, we decided to use it when implementing the feature.
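The snapshot-and-apply flow can be sketched as follows. ChatStorageRow, StorageSection, the function names, and the "ChatCell" reuse identifier are hypothetical and stand in for whatever the real feature uses; the cell is assumed to be registered with the table view elsewhere:

```swift
#if os(iOS)
import UIKit
#endif

// Hypothetical row model; its Hashable identity drives the diffing.
struct ChatStorageRow: Hashable {
    let chatId: String
    let sizeInBytes: Int64
}

// Largest chats first, as a storage list would show them.
func sortedForDisplay(_ rows: [ChatStorageRow]) -> [ChatStorageRow] {
    rows.sorted { $0.sizeInBytes > $1.sizeInBytes }
}

#if os(iOS)
enum StorageSection { case chats }

@available(iOS 13.0, *)
func makeDataSource(
    for tableView: UITableView
) -> UITableViewDiffableDataSource<StorageSection, ChatStorageRow> {
    UITableViewDiffableDataSource(tableView: tableView) { tableView, indexPath, row in
        // Assumes a cell was registered under the "ChatCell" identifier.
        let cell = tableView.dequeueReusableCell(withIdentifier: "ChatCell", for: indexPath)
        cell.textLabel?.text = "\(row.chatId): \(row.sizeInBytes) bytes"
        return cell
    }
}

// Builds a fresh snapshot and lets UIKit work out the inserts, deletes, and moves.
@available(iOS 13.0, *)
func render(
    _ rows: [ChatStorageRow],
    in dataSource: UITableViewDiffableDataSource<StorageSection, ChatStorageRow>
) {
    var snapshot = NSDiffableDataSourceSnapshot<StorageSection, ChatStorageRow>()
    snapshot.appendSections([.chats])
    snapshot.appendItems(sortedForDisplay(rows), toSection: .chats)
    dataSource.apply(snapshot, animatingDifferences: true)
}
#endif
```

Calling render again each time more chats finish calculating is enough to update the list; no index paths are computed by hand. In this sketch the size is part of the row's identity, so a chat whose size changes is treated as a new item. A real implementation might key the row on chatId alone.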

Calculating the total LINE storage usage

Like every iOS app, the LINE app has its own app sandbox, which contains the app bundle and the app data. The app data directory, whose path we can retrieve with NSHomeDirectory(), is the part that mainly affects the total app size. It contains many directories that may be familiar, such as Library/Caches, Library/Application Support, Documents/, and so on.

Apple also shows size information for every app in the iOS storage settings (Settings > General > iPhone Storage). Unfortunately, there's no API for getting the app size, and Apple doesn't provide any documentation on how it's calculated. At the time of writing this article, we have to calculate the total app size manually.

After some experimentation with summing the sizes of the application sandbox's contents to match the iOS storage settings, we ended up with the following formula:

Application size* = Application bundle size + (Data container size - Cache directory size) + App Groups size

( * ) The closest approximation of the application size

We use URLResourceValues to get the size of every directory and file. Some of its properties are worth noting:

fileSize: The total file size, in bytes
fileAllocatedSize: The total allocated size on-disk for the file, in bytes. This is the number of blocks multiplied by the block size (in APFS)
totalFileSize and totalFileAllocatedSize: May include space that metadata uses

For the formula above, we're using totalFileAllocatedSize and fileAllocatedSize to calculate the directory and file size.
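The measurement and the formula could be sketched like this. Both function names are hypothetical, and the sketch makes one simplifying assumption: it sums regular files only and ignores the space taken by directory entries themselves.

```swift
import Foundation

// Sums the allocated size of every regular file under `directory`.
// Falls back to fileAllocatedSize when totalFileAllocatedSize is unavailable.
func allocatedSize(of directory: URL) -> Int64 {
    let keys: Set<URLResourceKey> = [
        .isRegularFileKey, .totalFileAllocatedSizeKey, .fileAllocatedSizeKey,
    ]
    guard let enumerator = FileManager.default.enumerator(
        at: directory, includingPropertiesForKeys: Array(keys)
    ) else { return 0 }

    var total: Int64 = 0
    for case let url as URL in enumerator {
        guard let values = try? url.resourceValues(forKeys: keys),
              values.isRegularFile == true else { continue }
        total += Int64(values.totalFileAllocatedSize ?? values.fileAllocatedSize ?? 0)
    }
    return total
}

// The approximation from the formula above, with each term measured separately.
func approximateAppSize(
    bundle: Int64, dataContainer: Int64, cache: Int64, appGroups: Int64
) -> Int64 {
    bundle + (dataContainer - cache) + appGroups
}

// Example with made-up byte counts.
let approximate = approximateAppSize(
    bundle: 120, dataContainer: 900, cache: 300, appGroups: 80
)
```

In an app, the terms would come from calls such as allocatedSize(of: Bundle.main.bundleURL) and allocatedSize(of: URL(fileURLWithPath: NSHomeDirectory())), plus the cache directory and each App Group container.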

There are some interesting points to make about Apple's iOS storage settings:

It seems Apple also counts the App Groups size as part of the app size. They don't specifically mention this in their documentation.
The application's Library/Caches directory isn't counted as part of the app size. However, based on our business needs, we have to include the cache directory size to better describe to users how much storage capacity the LINE app is using. Because of that, we accept the size discrepancy with the iOS storage settings and consider it a known issue.
The iOS storage settings sometimes don't update the app's storage usage, even after we change the data (by adding or deleting some of it) or restart the device.

Conclusion

After a long period of investigation, discussion, and experimentation, we arrived at a solution that improves chat storage management and released the feature in version 12.14.0 of the LINE app. It was worth the effort of our engineers, planners, UX, and QA teams. We hope we have contributed to improving the UX, both while chatting and when using the LINE app in general.
