The Internet of Things (IoT) is the concept of a ubiquitous network of devices that facilitates communication between the devices themselves, as well as between the devices and human end users. The devices involved are typically constrained devices such as RFID sensors, but more sophisticated ones like smartphones are also considered to be part of the IoT ecosystem.
Processing data from IoT devices lends itself to the Big Data approach, that is, using scale-out techniques on commodity hardware in a schema-on-read fashion along with community-defined interfaces. Why is that so? Well, in order to develop a commercial-grade IoT application you need to be able to capture and store all the incoming sensor data to build up the historical references (the volume aspect of Big Data). Then, there are dozens of data formats in use in the IoT world and none of the sensor data is relational per se (the variety aspect of Big Data). Last but not least, many devices generate data at a high rate and in an IoT context we usually deal with data streams (the velocity aspect of Big Data).
In the following we discuss a polyglot processing architecture that enables us to develop and operate IoT applications at scale: the Internet of Things Architecture (or iot-a, for short). It consists of three main building blocks:
MQ/SP ... the Message Queue/Stream Processing
block takes the input data and performs—depending on the application
requirements—one or more of the following operations on it:
- Buffering: to adapt to the processing speed or throughput characteristics of downstream components, it is often necessary to buffer inbound data points. Equally, by micro-batching data points, the ingestion rate into downstream components can be increased.
- Filtering: ranging from simple cleansing operations to application-specific removal of certain data points.
- Complex online processing over streams: continuous queries, aggregates, counts, real-time machine learning algorithms, etc.
Example: In an application for the automotive sector, the MQ/SP block might be responsible for generating a heads-up text message that is sent to the driver of a connected car in the event of a predicted engine malfunction. The most important aspect here is that the alert reaches the user in time, potentially preventing an accident.
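The three MQ/SP operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of iot-a itself: the field names (car_id, engine_temp_c), the plausibility range, the window size and the alert threshold are all hypothetical, and a real deployment would run the same logic inside a message queue or stream processing engine rather than over an in-memory list.

```python
from collections import defaultdict, deque
from itertools import islice


def is_valid(reading):
    """Filtering: drop readings with a missing or implausible temperature."""
    temp = reading.get("engine_temp_c")
    return temp is not None and -40.0 <= temp <= 200.0


def micro_batches(points, size):
    """Buffering: group data points into batches of at most `size` items."""
    it = iter(points)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def rolling_alerts(points, window=3, threshold_c=110.0):
    """Online processing: per-car rolling mean with a threshold alert."""
    recent = defaultdict(lambda: deque(maxlen=window))
    for point in points:
        temps = recent[point["car_id"]]
        temps.append(point["engine_temp_c"])
        mean = sum(temps) / len(temps)
        # Only alert once the window is full, to avoid noisy start-up alerts.
        if len(temps) == window and mean > threshold_c:
            yield point["car_id"], mean


# Hypothetical sensor readings from one connected car.
readings = [
    {"car_id": "car-1", "engine_temp_c": 95.0},
    {"car_id": "car-1", "engine_temp_c": None},  # faulty sample, filtered out
    {"car_id": "car-1", "engine_temp_c": 112.0},
    {"car_id": "car-1", "engine_temp_c": 118.0},
    {"car_id": "car-1", "engine_temp_c": 121.0},
]

clean = [r for r in readings if is_valid(r)]
batches = list(micro_batches(clean, 2))  # what a downstream store would ingest
alerts = list(rolling_alerts(clean))     # what would trigger a driver message
```

Here the alert fires only on the last reading, once the three most recent valid temperatures average above the threshold. Keeping the window per car means readings from different vehicles never influence each other.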
DB ... the Database block takes the data from the upstream MQ/SP
and provides structured, fine-grained, low-latency access to the data points.
Due to the nature of the data, the database block is typically a NoSQL solution,
able to accommodate sparse data, with auto-sharding and horizontal scale-out properties.
The database block output is of interactive nature, with an interface provided
either through a store-specific API (such as the HBase API) or through the standard
Example: Staying with the connected car from above, an example usage of the DB block is as follows: a mechanic in a garage can, once the car owner approves a service-as-you-go, inspect the car’s vital signs, ask ad-hoc questions, and correlate the answers with other cars of the same build in order to assess potential damage and develop a repair strategy.
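To make the access pattern of the DB block more concrete, here is a minimal in-memory sketch of a sparse, key-ordered store. It is not the HBase API or any other real store's interface: the SparseStore class, the row key layout (build/car/sequence) and the column names are assumptions made purely for illustration.

```python
import bisect


class SparseStore:
    """Toy key-ordered store: each row holds only the columns that were written."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted to allow prefix scans
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup; returns an empty dict for unknown rows."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, prefix):
        """Yield (key, row) pairs for all rows whose key starts with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


store = SparseStore()
store.put("build-a/car-1/0001", "oil_pressure_kpa", 310.0)
store.put("build-a/car-1/0001", "engine_temp_c", 96.0)
store.put("build-a/car-2/0001", "oil_pressure_kpa", 290.0)
store.put("build-a/car-2/0002", "engine_temp_c", 99.0)  # no oil pressure here
store.put("build-b/car-3/0001", "oil_pressure_kpa", 250.0)

# Ad-hoc question: what oil pressures do cars of build "build-a" report?
pressures = [
    row["oil_pressure_kpa"]
    for _, row in store.scan("build-a/")
    if "oil_pressure_kpa" in row
]
mean_pressure = sum(pressures) / len(pressures)
```

Putting the build identifier first in the row key is the design choice that makes "all cars of the same build" a single contiguous scan; a different question (for example, one car over time) would favour a different key layout.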
DFS ... the Distributed File System block takes the data from
either the DB block or directly from the MQ/SP block and
performs batch jobs—usually aggregations and reporting type of jobs—over
the entire dataset. This might include combining the data from IoT devices
with other data sources, such as those delivering customer or product data
or potentially unstructured ones, for example PDF documents.
Example: Again in the context of the smart car, an example usage of the DFS block is as follows: the car owner has access to a Web site where, on a weekly basis, metrics about the car are made available so that the owner can assess the overall health and performance of her vehicle.
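A batch job of this kind can be sketched as a plain aggregation over the full set of readings, joined with a second data source. The sketch below is an assumption-laden illustration: the record fields (day, km_driven), the owners lookup table and the weekly_report function are hypothetical, and in practice the same logic would run as a distributed job over files in the DFS rather than over a Python list.

```python
from collections import defaultdict
from datetime import date


def weekly_report(readings, owners):
    """Aggregate distance per car and ISO week, joined with owner data."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [total km, reading count]
    for r in readings:
        iso = r["day"].isocalendar()  # (ISO year, ISO week, weekday)
        key = (r["car_id"], iso[0], iso[1])
        totals[key][0] += r["km_driven"]
        totals[key][1] += 1
    return [
        {
            "owner": owners.get(car_id, "unknown"),
            "car_id": car_id,
            "iso_year": year,
            "iso_week": week,
            "total_km": total_km,
            "readings": count,
        }
        for (car_id, year, week), (total_km, count) in sorted(totals.items())
    ]


# Hypothetical historical data from the IoT devices ...
readings = [
    {"car_id": "car-1", "day": date(2014, 5, 5), "km_driven": 40.0},
    {"car_id": "car-1", "day": date(2014, 5, 7), "km_driven": 25.5},
    {"car_id": "car-1", "day": date(2014, 5, 12), "km_driven": 10.0},
    {"car_id": "car-2", "day": date(2014, 5, 6), "km_driven": 80.0},
]
# ... combined with a second, non-IoT source (customer data).
owners = {"car-1": "owner-17", "car-2": "owner-42"}

report = weekly_report(readings, owners)
```

Each entry in report corresponds to one row of the weekly view the owner would see on the Web site; the join with owners stands in for combining device data with customer or product data.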
An example configuration of the iot-a is as follows:
To learn more about the above configuration, see the following resources: