From ‘Always Connected’ to ‘Fully Off-Grid’: Engineering Offline IoT Solutions
Many IoT devices' Achilles' heel is their dependence on constant connectivity. But what if we could transform this vulnerability into a strength? In this article, we'll guide you through our approach to engineering offline-resilient IoT applications, turning the challenge of disconnection into an opportunity for enhanced functionality.
From robust data modeling to sophisticated synchronization strategies, we'll explore the architectural considerations and engineering principles that make this possible. To illustrate these principles, we will discuss how we applied them in our collaboration with SpotOn, a leader in GPS-based virtual fence systems for dogs. Together, we developed an 'off-grid' mode that ensures pet safety in remote areas, enabling owners to take their dogs on backcountry adventures while maintaining the security of a virtual fence.
Architectural Considerations for Designing Offline Experiences
Drawing from our IoT experience, we've developed a framework of essential considerations for offline scenarios. These practical principles are directly applicable to a wide range of IoT products. We'll explore each consideration in detail, then illustrate their real-world impact through our SpotOn project.
1. Data Modeling
The bedrock of any offline-capable system is its data model. We need a clear understanding of:
- Data Flow: How information travels between system components.
- Data Manipulation: How data is read, written, and synchronized when connectivity is restored.
- Conflict Resolution: How to manage discrepancies when changes occur from multiple sources (e.g., device and server).
An application working offline must support at least one locally stored source of data, serving as both a cache for data downloaded from the backend and a fallback when disconnected. A well-structured data layer might resemble the following:
- Local Data: This is the source of truth for your app. Whenever you read data, fetch it from the local data store. It's important to back the local data with a persistent store, since you might need to access it when the app is opened without an internet connection.
- Network Data: This represents the most up-to-date data known by the whole system. Note that because changes can happen offline, this state may be out of sync with the local data (either ahead of it or behind it).
- Repository: This is responsible for tasks such as:
  - Retrieving network data and updating the local data.
  - Saving the local data to a persistent store.
  - Implementing the data synchronization logic.
  - Providing other parts of the app with the proper data for different states.
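As a rough illustration of this layering, the sketch below wires a local store, a network source, and a repository together in TypeScript. All names (`Fence`, `LocalStore`, `NetworkSource`, `Repository`) are hypothetical, and the in-memory map only stands in for a persistent store.

```typescript
// Hypothetical record type, used only for illustration.
interface Fence {
  id: string;
  name: string;
}

// Local data: the source of truth for reads. A real app would back this
// with persistent storage; a Map keeps the sketch self-contained.
class LocalStore {
  private items = new Map<string, Fence>();

  all(): Fence[] {
    return Array.from(this.items.values());
  }

  put(fence: Fence): void {
    this.items.set(fence.id, fence);
  }
}

// Network data: whatever the backend currently knows (assumed interface).
interface NetworkSource {
  fetchFences(): Promise<Fence[]>;
}

// Repository: refreshes local data from the network and serves reads.
class Repository {
  private local: LocalStore;
  private network: NetworkSource;

  constructor(local: LocalStore, network: NetworkSource) {
    this.local = local;
    this.network = network;
  }

  // Readers always go through the local store.
  read(): Fence[] {
    return this.local.all();
  }

  // Pull network data into the local store; resolves to false on failure.
  async refresh(): Promise<boolean> {
    try {
      const remote = await this.network.fetchFences();
      remote.forEach((fence) => this.local.put(fence));
      return true;
    } catch {
      return false; // Offline or request failed: keep serving local data.
    }
  }
}
```

Because readers only ever call `read()`, the rest of the app does not need to know whether the last `refresh()` succeeded.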
2. Reading Data
Reading data is relatively straightforward. The repository chooses the most up-to-date information and provides it to the data reader.
A common strategy here is:
- If the application is offline, just provide the local data.
- If the application is online, try to fetch the data from the Network:
  - If the request fails, provide the local data.
  - If it succeeds, update the local data cache and provide the updated response.
If fetching the Network data fails, you have several options:
- Automatic Retry: Attempt to fetch the data again after a short delay.
- Error Handling and User Notification: Inform the user of the failure and provide options.
- Silent Failure: Simply provide the locally cached data without notifying the user.
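The read strategy and failure options above can be sketched as a single function. This is a minimal sketch, not a definitive implementation: `ReadDeps`, `readData`, and the `maxAttempts` parameter are hypothetical names, and the dependencies are injected so the example does not assume any particular storage or networking library. It combines automatic retry with silent fallback to the local copy.

```typescript
type ReadResult<T> = { data: T; source: "network" | "local" };

// Hypothetical dependencies, injected so the sketch stays self-contained.
interface ReadDeps<T> {
  isOnline: () => boolean;
  fetchRemote: () => Promise<T>;
  readLocal: () => T;
  writeLocal: (value: T) => void;
}

// Network-first read: try the backend when online, retry a bounded number
// of times, then fall back silently to the local copy.
async function readData<T>(
  deps: ReadDeps<T>,
  maxAttempts = 2
): Promise<ReadResult<T>> {
  if (!deps.isOnline()) {
    return { data: deps.readLocal(), source: "local" };
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const fresh = await deps.fetchRemote();
      deps.writeLocal(fresh); // Keep the cache current for the next offline read.
      return { data: fresh, source: "network" };
    } catch {
      // Swallow the error and retry; a real app might wait between attempts.
    }
  }
  return { data: deps.readLocal(), source: "local" };
}
```

Returning the `source` alongside the data lets the caller decide whether to notify the user that the content may be stale.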
A more sophisticated approach, often used in social media apps like Facebook or Instagram, is to provide local data immediately while fetching updated data in the background. This allows users to see content quickly, with the app refreshing once new data is available.
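A minimal sketch of this local-first pattern follows. The function name `readLocalFirst` and its callback parameters are hypothetical; the point is only that the caller is notified twice, once with cached data and once more if the background refresh succeeds.

```typescript
// Serve cached data immediately, then notify the caller again if the
// background refresh returns fresh data.
function readLocalFirst<T>(
  readLocal: () => T,
  fetchRemote: () => Promise<T>,
  writeLocal: (value: T) => void,
  onData: (value: T, source: "local" | "network") => void
): Promise<void> {
  onData(readLocal(), "local"); // Immediate render from the cache.
  return fetchRemote().then(
    (fresh) => {
      writeLocal(fresh);
      onData(fresh, "network"); // Second render once fresh data arrives.
    },
    () => {
      // Refresh failed: the cached data already shown stays on screen.
    }
  );
}
```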
3. Writing Data
While reading data is straightforward, writing is more complex as it requires preparing data for later synchronization and conflict resolution.
There are three main writing strategies your app can use:
- Online-only Updates: Update data only if the application is online, ignoring all attempts to update while offline. This approach is very limited and will not support many features while offline.
- Queued Updates: If you have a simple synchronization mechanism (for example, one based on timestamps), queue all write attempts made while offline. When the application comes back online, the queued writes are sent to the backend, where the data is synchronized.
- Lazy Updates: Update local data first, then enqueue the write to be sent once there's a connection to the backend. This requires a conflict resolution strategy, as we'll see later. The local data can easily fall out of sync with the network data, because the local data is updated as soon as the app writes, while the network data might not be updated until later.
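To make the lazy update strategy concrete, here is a small sketch of a writer that applies changes locally and queues them for later upload. `LazyWriter`, `PendingWrite`, and the `send` callback are hypothetical names, and the string key/value model is a simplification chosen for brevity.

```typescript
interface PendingWrite {
  key: string;
  value: string;
  writtenAt: number; // Epoch milliseconds, kept for later conflict resolution.
}

class LazyWriter {
  private local = new Map<string, string>();
  private queue: PendingWrite[] = [];

  // Apply the write locally first, then remember it for later upload.
  write(key: string, value: string, now: number): void {
    this.local.set(key, value);
    this.queue.push({ key, value, writtenAt: now });
  }

  read(key: string): string | undefined {
    return this.local.get(key);
  }

  pendingCount(): number {
    return this.queue.length;
  }

  // Send queued writes in order. Stop at the first failure so nothing is
  // lost or reordered; the remaining writes wait for the next attempt.
  async flush(send: (write: PendingWrite) => Promise<void>): Promise<number> {
    let sent = 0;
    while (this.queue.length > 0) {
      const next = this.queue[0];
      if (next === undefined) break;
      try {
        await send(next);
      } catch {
        break;
      }
      this.queue.shift();
      sent++;
    }
    return sent;
  }
}
```

Reads see the new value immediately, while `flush` can be called whenever connectivity returns.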
4. Synchronization and Conflict Resolution
The synchronization process happens when the app regains connectivity. Conflict resolution is crucial if the app writes data locally when offline that is misaligned with the network data source.
Synchronization Approaches:
- Pull Synchronization: The app fetches updated data from the server.
- Push Synchronization: The app sends local changes to the server.
- Hybrid Approach: Some data is pull-synchronized, while other data is push-synchronized.
The choice depends on your specific use case and data update patterns.
Conflict Resolution Methods:
Conflict resolution deserves special attention. If data written locally while offline diverges from the network data source, a conflict has occurred, and you must resolve it before synchronization can complete.
To resolve conflicts, the system needs a mechanism for deciding which change to apply. Common methods include:
- Timestamps: Apply the most recent change based on when it occurred.
- Version Tags: Use incrementing version numbers to track changes.
- Last-write-wins: Simply use the most recent write as the authoritative version.
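One way these methods can fit together is sketched below: a last-write-wins rule that compares timestamps and uses the version number as a tiebreaker. The `Versioned` type and the tie-breaking rule (remote wins a full tie) are assumptions made for this example, and the sketch presumes device and server clocks are reasonably aligned.

```typescript
interface Versioned<T> {
  value: T;
  version: number; // Incremented on every write.
  updatedAt: number; // Epoch milliseconds of the write.
}

// Pick the authoritative copy: newest timestamp first, then highest version.
function lastWriteWins<T>(
  local: Versioned<T>,
  remote: Versioned<T>
): Versioned<T> {
  if (local.updatedAt !== remote.updatedAt) {
    return local.updatedAt > remote.updatedAt ? local : remote;
  }
  // Same timestamp: prefer the higher version; remote wins a full tie.
  return local.version > remote.version ? local : remote;
}
```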
The last-write-wins approach works well for simple data that can be overwritten entirely. However, for complex objects with multiple properties that can change independently, you'll need a more sophisticated reconciliation process.
For example:
- Merge non-conflicting changes automatically.
- For conflicting changes, you might:
  - Apply a predefined resolution rule.
  - Present the conflict to the user for manual resolution.
  - Keep both versions and let the user decide later.
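The reconciliation steps above can be sketched as a field-level, three-way merge. This is an illustrative sketch under stated assumptions: `base` is the last state both sides agreed on, all three objects share the same set of fields, and the predefined rule for true conflicts is "server wins". The names `Fields`, `MergeResult`, and `mergeFields` are hypothetical.

```typescript
type Fields = Record<string, string | number | boolean>;

interface MergeResult {
  merged: Fields;
  conflicts: string[]; // Fields changed differently on both sides.
}

// Three-way merge over flat objects that share the same field names.
function mergeFields(base: Fields, local: Fields, remote: Fields): MergeResult {
  const merged: Fields = { ...base };
  const conflicts: string[] = [];
  for (const key of Object.keys(base)) {
    const localChanged = local[key] !== base[key];
    const remoteChanged = remote[key] !== base[key];
    if (localChanged && remoteChanged && local[key] !== remote[key]) {
      conflicts.push(key); // Report it so the UI can ask the user later.
      merged[key] = remote[key]; // Predefined rule (assumed): server wins.
    } else if (localChanged) {
      merged[key] = local[key]; // Only the device changed this field.
    } else {
      merged[key] = remote[key]; // Only the server changed it, or nobody did.
    }
  }
  return { merged, conflicts };
}
```

The returned `conflicts` list keeps the other two options open: the app can show those fields to the user or store both versions for a later decision.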
The specific approach you take will depend on your application domain and the nature of your data. It's crucial to plan your conflict resolution strategy as an integral part of your offline support implementation.
Remember, effective synchronization and conflict resolution are key to maintaining data integrity and providing a smooth user experience in applications with offline capabilities.
From Theory to Practice: Developing SpotOn's 'Off-Grid' Mode
Now that we've explored the key foundations of offline IoT architecture, let's examine how we applied these principles in our collaboration with SpotOn.
SpotOn: Revolutionizing Dog Freedom with Virtual Fencing
SpotOn's innovative product enables dog owners to create wireless GPS fences across vast areas, granting their pets the freedom to roam within safe boundaries. The system comprises three key components:
- The Collar: A hardware device with GPS capabilities, sensors, and correction mechanisms.
- The Mobile App: The user's primary interface for controlling the system.
- The Backend: Managing data synchronization and processing.
The collar has a virtual fence boundary loaded onto it. When a dog approaches the border, the collar plays different sounds as alerts. If the dog crosses the boundary, the collar applies vibration or static correction. A well-trained dog will respond to these cues and return inside the containment area.
When fully online, this ecosystem works in harmony. The collar provides real-time updates on the dog's location, escape alerts, battery level, and connectivity status. This information flows to the app via the server or Bluetooth Low Energy (BLE) when in range. While the collar reports status information to the backend server, all configuration and maintenance occur through the mobile app.
The Offline Challenge: Ensuring Pet Safety Beyond the Grid
SpotOn's product is designed for large geographic areas, so many customers reside in rural locations with poor cellular connectivity. While we had already implemented mechanisms to handle periodic connectivity drops, SpotOn envisioned a more ambitious goal: enabling users to go completely "off-grid" while maintaining the product's core functionality.
Given the potential complexity offline scenarios could introduce, we aligned as a team on two critical questions:
- Which features were essential for the offline experience?
- How could we deliver an excellent user experience while minimizing development complexity?
We identified the following critical feature set:
- Fence activation and deactivation.
- Fence creation by walking (drawing a fence in the app is not available offline).
- Collar status monitoring (battery, GPS signal strength).
Our guiding principles included:
- Users must enable offline mode explicitly. No 'behind the scenes' automatic switching, at least for V1.
  - Because of the nature of the product (hardware + mobile app + backend), we decided to require users to enable offline mode themselves, similar to enabling Airplane mode on a phone. This on/off mode let us simplify the UX and the development effort, instead of handling several scenarios for automatic switching.
- Features that can't be supported offline should not be made available to users offline.
  - This significantly simplifies the application interface, preventing users from starting flows that would end in dead ends or unrecoverable errors.
- Because the hardware device itself cannot be set to offline mode, the app must handle incoming messages that arrive while it is in offline mode.
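The first two principles can be illustrated with a small feature gate. This is only a sketch: the feature names come from the lists in this article, but the identifiers (`Feature`, `OFFLINE_FEATURES`, `visibleFeatures`) are hypothetical and do not reflect the actual app code.

```typescript
// Feature names follow the lists in this article; identifiers are hypothetical.
type Feature =
  | "fenceActivation"
  | "fenceCreationByWalking"
  | "collarStatus"
  | "realTimeTracking"
  | "fenceDrawing";

// Features that keep working without a backend connection.
const OFFLINE_FEATURES: ReadonlyArray<Feature> = [
  "fenceActivation",
  "fenceCreationByWalking",
  "collarStatus",
];

// offlineMode is an explicit user setting, never inferred from connectivity.
function visibleFeatures(
  all: ReadonlyArray<Feature>,
  offlineMode: boolean
): Feature[] {
  return all.filter(
    (feature) => !offlineMode || OFFLINE_FEATURES.indexOf(feature) !== -1
  );
}
```

Filtering at the navigation level means unsupported flows never appear, rather than failing halfway through.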
Applying the Architecture to Our Use Case
For our product, the local data is stored in two places:
- The mobile application stores the user information, the list of collars, fence information, and similar data.
- The collar stores the active fence, which is written by the application.
Since the active fence is stored both in the collar and the app, we can simplify the problem and only care about the synchronization between the app and the backend.
The network data is clearly all the data we have in our backend. One thing to note here is that while the app can be put in offline mode, the hardware could still report data to the backend that we must deal with.
For reading data, we always use the local store. Because the application is reactive, the UI updates every time the local store is updated, so the problem reduces to keeping the local and network stores synchronized and connecting the mobile application to the local store.
For writing the data, we took a lazy update approach: we always update the local storage and push the data to the backend as soon as we have an internet connection.
Finally, for synchronization and conflict resolution, there are two things to consider, since the collar might keep sending information to the backend whenever it regains connectivity:
- All data related to the collar status is updated on the backend, and will be pull-synchronized by the app when it regains connectivity.
- All events reported by the collar will be saved, but the information will be temporarily limited if they relate to data that does not yet exist in the backend. For example, we can store a breach event, but we can't show the boundary details because the fence has not yet been synchronized to the backend.
The Build: From Low-Effort POC to Full Launch
Once we defined the technical approach and offline considerations, the product team needed to fully scope the feature set and user experience. Rather than engaging in theoretical discussions, we implemented an initial proof of concept (POC) with basic functionality and shared key insights on the implications of offline mode for other features.
This combination of a low-effort POC and tech investigations covered the following areas:
- Caching important information, including updating the cache and using the cached data when the internet connection is lost.
- Showing information offline without allowing users to modify it.
- Preparing a list of existing features and the potential implications of having them offline.
Our application has about 10 core features, but not all of them could be used offline. Some examples include:
- Real-time tracking wouldn't work because we can't receive location updates from the collar without an internet connection, so this feature is removed in offline mode.
- We rely on maps for several features, and we had to redesign the user experience because maps wouldn't load while offline.
The POC and the offline feature documentation were then shared with the product and design teams, who defined the final set of features the app should support for our use case, along with a new user experience for each of them.
Finally, for the implementation, the dev team integrated the react-native-cache library to read, write, and update the cache, and built a synchronization mechanism that works as follows:
- Lazy updates: while in offline mode, we always update the local cache, regardless of internet connectivity.
- Once the internet connection is recovered:
  - Events and collar status information (like battery level or position) are pull-synchronized.
  - All changes that we want to keep (like fences created) are push-synchronized.
- For data reconciliation, we agreed to drop some data that was only relevant while offline, simplifying the synchronization and conflict resolution process.
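The reconnect flow described above can be sketched roughly as follows. This is not the production code: the types (`OfflineCache`, `Backend`, `LocalChange`), the `keepAfterSync` flag, and the function `syncOnReconnect` are all hypothetical, and the sketch deliberately avoids the react-native-cache API so that it makes no claims about that library.

```typescript
interface CollarStatus {
  batteryPercent: number;
}

interface LocalChange {
  id: string;
  keepAfterSync: boolean; // False for data only relevant while offline.
}

interface OfflineCache {
  collarStatus: CollarStatus | null;
  pending: LocalChange[];
}

// Assumed backend interface for this sketch.
interface Backend {
  pullCollarStatus(): Promise<CollarStatus>;
  pushChange(change: LocalChange): Promise<void>;
}

interface SyncSummary {
  pushed: number;
  dropped: number;
}

async function syncOnReconnect(
  cache: OfflineCache,
  backend: Backend
): Promise<SyncSummary> {
  // Pull: the collar reports status to the backend directly, so the
  // backend copy replaces the cached one.
  cache.collarStatus = await backend.pullCollarStatus();

  // Reconcile: discard changes that only mattered while offline.
  const before = cache.pending.length;
  cache.pending = cache.pending.filter((change) => change.keepAfterSync);
  const dropped = before - cache.pending.length;

  // Push: send kept changes in order, removing each one once the backend
  // accepts it so that a failed run can resume where it stopped.
  let pushed = 0;
  while (cache.pending.length > 0) {
    await backend.pushChange(cache.pending[0]);
    cache.pending = cache.pending.slice(1);
    pushed++;
  }
  return { pushed, dropped };
}
```

Dropping offline-only data before pushing keeps the set of possible conflicts small, which is the simplification the last bullet describes.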
We developed the initial version based on the POC and product specifications, then shared it with a group of SpotOn beta testers. Their feedback helped us refine the feature, leading to a full release in Spring 2024. This enabled dog owners to confidently take their pets on off-grid adventures that summer, with overwhelmingly positive reviews.
Conclusion
Building robust offline capabilities for IoT devices isn't easy. Many IoT products are already complex when fully online, so adding offline capabilities requires particular care to avoid dramatically increasing development complexity or introducing frustrating user experiences. But under the right circumstances and done in the right way, the payoff can be huge for users and the product. Through our work with SpotOn, we hope we've demonstrated that with careful planning, smart architecture, and incremental implementation, it's possible to create IoT systems that provide significant value even when offline.
The key takeaways from our journey:
- Clearly define critical offline features and guiding principles.
- Design a flexible data architecture that supports both online and offline operations.
- Implement smart read-and-write strategies that maintain data integrity.
- Develop a robust synchronization and conflict resolution system.
- Take an incremental approach, starting with a POC and iterating based on real-world feedback.