Design And Implementation

The ansar suite of libraries - ansar-encode, ansar-create and ansar-connect - has significant networking goals but is also pragmatic. There have been tradeoffs and compromises. It would be nice if ansar-connect were available for every development language and for every platform. Instead there is one development language and a significant number of supported platforms.

This section of the documentation captures any design or implementation detail that might be useful to the user.

Implementation Language

Ansar is implemented in Python after years of work in C/C++. Moving to Python delivers Ansar to every instance of a modern Python environment. The decision to move was not based on a judgement that Python is a better language; it was about reach. A distributed computing technology that is only available to a small section of the development community can barely refer to itself in such grand terms. Reaching the same number of environments with a C/C++ library would take hundreds of build chains and thousands of developer hours.

There is no intention to port ansar to other languages.

Supported Platforms

Ansar has been tested on the following platforms:

Ubuntu 20+
Debian 12+
Raspberry Pi OS 12+ (debian-based)
macOS 14 Sonoma
Windows 11+ (no ansar CLI, i.e. process orchestration)

Fully automated installation of the ansar services, ansar-host and ansar-lan, is currently only available on those Linux variants that support systemd. With some manual configuration these services can also be installed on non-systemd platforms. Use the following commands to automate a partial construction of the ansar-host service:

$ cd <folder-of-operational-services>
$ git clone https://github.com/mr-ansar/ansar-services.git
$ cd ansar-services
$ python3 -m venv .services
$ source .services/bin/activate
$ pip3 install ansar-connect pyinstaller
$ make ansar-host ansar-host-files

This results in the following set of materials:

`.ansar-home`	folder containing a ready-to-run service
`ansar-host-start`	script for the initiation of the service
`ansar-host-stop`	script for the termination of the service
`ansar-host.service`	systemd-specific integration

The .ansar-home folder contains executables and process orchestration details. Initiation and termination procedures for a Linux platform can be found in the respective script files. These can be used as templates for equivalent procedures on non-systemd platforms.

Other Linux Platforms

Any combination of a modern Python interpreter (3.7+) and a Linux-based operating system may support ansar operation. Only those included in ansar testing are listed in the previous section.

Windows Support

Execution of individual Ansar-based scripts on Windows is supported. Process orchestration using the ansar CLI is not. Ansar uses the signal library to implement asynchronous control over running processes. Limited support for this library on Windows, along with the differences between *nix-style filesystems and Windows filesystems (file extensions and path separators), has delayed the implementation of the ansar CLI for Windows.

WAN Networking And The Supporting Service

At the top level of the ansar publish-subscribe services is ansar-wan. This service facilitates the connection of objects located in different LANs. It accepts connections from services at the lower levels and, wherever it matches a subscriber to a publisher, it sends connection details to the relevant parties. Those parties then make runtime connections to the relay service within ansar-wan. The relay service simply passes byte blocks from one connection to another, as appropriate.

For an initial period this service is available free for evaluation. Provisioning of the service has been kept deliberately low and there are limits imposed on the resourcing of accounts. This is to minimize operational costs and to preserve the quality of service in a few simple ways.

Including a third-party online service as a component of a networking product raises issues such as availability, reliability, privacy and performance. In response to these issues, the future of the ansar-wan service is defined by the following points:

There will be an initial period of evaluation. This will be for at least three months but no longer than one year. The earliest end date is 2024-08-30 and is subject to discussions with active account holders.
Privacy risks can be minimized by using non-personal details during signup. The only account value significant to ansar-wan is the email address used as a unique identity.
All communications with ansar-wan are encrypted and authenticated.
Messages are never stored because there is no feature of the ansar suite that requires it to do so; there are no delivery guarantees in the style of MQTT. There are practical reasons to avoid the storing of messages including associated IAAS costs and performance overhead. Most importantly it avoids legal considerations that come into effect when data is at rest. By never storing messages ansar-wan maintains a strong legal position.
There are several options available at the end of the evaluation period. These are listed below. Any changes to the service will be discussed with active account holders.
An organization may be formed to take over the ansar-wan service. This would likely be subscription based to cover the operational costs of what is potentially a large-scale, multi-tenanted service. At present the service resides in AWS, consumes EC2 instances and incurs costs for outbound network activity. Half of all relay activity is classified as outbound traffic.
There will be an option to deploy a private instance of ansar-wan. Materials and procedures will be available such that the service can be constructed under a different IAAS account (e.g. AWS), in a manner similar to that documented for ansar-host and ansar-lan. There will be a one-off fee and an optional support agreement.
A private instance of ansar-wan is the strongest response to the privacy issue. It is a complete guarantee that privacy of messages is not somehow being breached during the relay process. It does move the security risk from the multi-tenanted service to the owner of the private instance.
Deployment of a private instance brings the lowest possible costs to the owner and there is the potential to optimize for network latency, i.e. by selection of the nearest IAAS region.
In the event that the multi-tenanted service is not available the private instance option will be available to all active account holders. For the first 5 account holders the one-off fee will be waived.

Asynchronous Operation

The model of execution was adopted from SDL. This model is based around signal processing where signals are analogous to messages. Objects are created, send and receive messages, create new objects, and terminate. For anyone unfamiliar with or curious about this style of programming, refer to this dining philosophers solution.

There is a single internal map of objects, where the key is an integer and the value is one of the supported object types, i.e. a function with its own thread, or a machine. Every async application starts with the creation of a single object, using create_object() or create_node().
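The object map can be pictured as a dictionary keyed by integer id. The following is a minimal sketch under that reading only; the ObjectMap and Machine classes are hypothetical and are not part of the ansar API.

```python
import itertools
import threading
from typing import Callable, Dict, Union


class Machine:
    """Hypothetical stand-in for a machine-style object."""
    def __init__(self, name: str) -> None:
        self.name = name


# Each entry is either a function (given its own thread) or a machine.
AsyncObject = Union[Callable[[], None], Machine]


class ObjectMap:
    """Minimal sketch of one runtime-wide map from integer id to object."""
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._objects: Dict[int, AsyncObject] = {}
        self._lock = threading.Lock()  # ids may be allocated from several threads

    def add(self, obj: AsyncObject) -> int:
        with self._lock:
            object_id = next(self._ids)
            self._objects[object_id] = obj
            return object_id

    def remove(self, object_id: int) -> None:
        with self._lock:
            self._objects.pop(object_id, None)

    def __len__(self) -> int:
        with self._lock:
            return len(self._objects)


def task() -> None:
    pass


objects = ObjectMap()
first = objects.add(task)
second = objects.add(Machine("server"))
print(first, second, len(objects))  # 1 2 2
```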

An async application may accumulate tens of thousands of objects. Attempting to create large numbers in a burst may encounter message overflow problems (refer to the following paragraphs).

Objects that are implemented as functions cause the allocation of a platform thread. Objects implemented as machines that use the Threaded base class also cause the allocation of a thread. A reasonable maximum number of threads for any process may be around 500; beyond that, an async application with large numbers of thread-allocating objects may be adversely affected by the overhead of thread switching.

Machines that are not assigned to a named thread on the call to bind() are assigned to the default thread, created automatically by the async runtime. A thread that processes messages on behalf of large numbers of objects may become an execution bottleneck and affect the order of message processing.

Sending messages requires message queues. Message queues are assigned a maximum size of 8192 at creation time. Messages sent to a full queue are discarded - a burst of 10k messages to the same destination has a good chance of resulting in lost messages. Lost messages are likely to be a symptom of programming as if there were infinite resources. Redesign the messaging to be more exchange-like rather than a one-way flood.

Discarding overflow messages is deliberate. The alternative to discarding is blocking until items are taken off the queue by a consumer. This is considered to be the risky option (i.e. deadlocks).
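The discard-on-overflow policy can be sketched with the standard library queue module. This illustrates the policy only, assuming a queue with no consumer; the post() helper is hypothetical and is not how the ansar runtime is implemented internally.

```python
import queue

MAX_QUEUE = 8192  # the per-queue limit quoted above


def post(q: queue.Queue, message) -> bool:
    """Enqueue without blocking; discard the message if the queue is full."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        return False  # dropped, rather than blocking the sender


inbox = queue.Queue(maxsize=MAX_QUEUE)
results = [post(inbox, n) for n in range(10_000)]  # a one-way flood
delivered = sum(results)
discarded = len(results) - delivered
print(delivered, discarded)  # 8192 1808
```

Calling a blocking put() instead would stall the sender until a consumer made room, which is the deadlock risk described above.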

Async Timers

Timers are implemented as messages that are processed by the same mechanisms as any other message. An object requests a timer using start() and an instance of the specified timer will arrive after the specified time period. This arrangement means that timers can be applied to anything - there is no need for each individual operation to provide a timing option. A timer can also be applied to an expected sequence of operations, e.g. a T1 message can be used to indicate that the sequence of operations A, B and C took too long.

Timers will arrive after a period at least as long as the specified time, and can be delayed in heavy traffic. Internally, monotonic time values are used. Starting a timer that is still pending is effectively a restart: the countdown begins again using the new period.

Timers are not intended to be realtime. They run at human speed rather than machine speed. Accuracy is around 0.25s. Timer values at a finer resolution have no effect, i.e. with a value of 2.1s the timer message will arrive some time after 2.0s have passed.
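The restart and resolution rules can be modeled with a table of monotonic deadlines. In this sketch the TimerTable class, its injectable clock and the truncation of periods to 0.25s steps are assumptions made for illustration; the real runtime delivers timers as messages rather than returning them from a method.

```python
import time

RESOLUTION = 0.25  # approximate timer accuracy quoted above


class TimerTable:
    """Hypothetical table of pending timers, keyed by timer type."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.deadlines = {}

    def start(self, timer, seconds: float) -> None:
        # Finer resolution has no effect: 2.1s is treated as 2.0s.
        period = int(seconds / RESOLUTION) * RESOLUTION
        # Starting a pending timer replaces its deadline, i.e. a restart.
        self.deadlines[timer] = self.clock() + period

    def expired(self):
        """Remove and return every timer whose deadline has passed."""
        now = self.clock()
        due = [t for t, d in self.deadlines.items() if d <= now]
        for t in due:
            del self.deadlines[t]
        return due


now = [0.0]  # a fake clock so the example is deterministic
timers = TimerTable(clock=lambda: now[0])
timers.start("T1", 2.1)
now[0] = 1.0
timers.start("T1", 2.1)   # restart: deadline moves to 3.0
now[0] = 2.5
print(timers.expired())   # []
now[0] = 3.0
print(timers.expired())   # ['T1']
```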

To cancel an outstanding timer use cancel(). A timer message may already be queued when cancel() is called, so it's possible to receive a timer after it has been cancelled. The standard approach to message processing should ensure these are ignored.
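One way to get that behaviour is SDL-style dispatch on the pair (state, message), where a message with no handler in the current state is dropped. The sketch below is hypothetical: the Requester class, its state names, message names and handler naming scheme are illustrative only.

```python
class Requester:
    """Hypothetical machine that ignores messages unexpected in its current state."""
    def __init__(self) -> None:
        self.state = "READY"
        self.ignored = []

    def received(self, message: str) -> None:
        # Look up a handler named <state>_<message>; no handler means ignore.
        handler = getattr(self, f"{self.state}_{message}", None)
        if handler is None:
            self.ignored.append((self.state, message))
            return
        handler()

    def READY_Request(self) -> None:
        self.state = "WAITING"   # a real machine would start() timer T1 here

    def WAITING_Response(self) -> None:
        self.state = "READY"     # ...and cancel() timer T1 here

    def WAITING_T1(self) -> None:
        self.state = "READY"     # a genuine timeout


machine = Requester()
for message in ["Request", "Response", "T1"]:   # T1 arrives after cancel()
    machine.received(message)
print(machine.ignored)   # [('READY', 'T1')]
```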

Network I/O And Safety Measures

Sending messages across networks uses the same method (i.e. send()) used to send to any async object and uses the same underlying message processing machinery. Bursts of large numbers of network messages may result in overflow of a message queue.

There are no real limits imposed on the sending end of network messaging. Any message type registered using bind() will be transferred across the network. Each messaging socket (i.e. accepted or connected) is assigned its own outbound message queue and streaming buffers. Large messages may result in processing bottlenecks and memory fragmentation. A reasonable maximum message size may be around 100k. This refers to the quantity of memory consumed by the Python application message.

All socket I/O is based around blocks of 4096 bytes.

Several limits are imposed at the receiving end of network messaging. The encoded representation of a message (the JSON byte representation) cannot exceed 1Mb and there are further checks applied to frame dimensions. Any message that fails to meet requirements results in an immediate shutdown of the associated socket and a session control message is sent to the relevant party (i.e. Closed or Cleared). These are measures to defend against messages that somehow arrive corrupted and the possibility of bad actors.
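The receive-side checks can be sketched as a frame decoder that rejects anything oversized or undecodable. The 4-byte length prefix, the FrameReader and FrameError names, and reading the 1Mb limit as 1,048,576 encoded bytes are all assumptions for illustration; this is not the ansar wire format.

```python
import json
import struct

BLOCK_SIZE = 4096            # socket I/O block size quoted above
MAX_ENCODING = 1024 * 1024   # assumed ceiling on the encoded (JSON) message


class FrameError(Exception):
    """Raised when a frame breaks the limits; the caller would close the socket."""


class FrameReader:
    """Hypothetical decoder for length-prefixed JSON frames."""
    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, block: bytes):
        """Accept one block of received bytes and return any complete messages."""
        self.buffer.extend(block)
        messages = []
        while len(self.buffer) >= 4:
            (length,) = struct.unpack(">I", self.buffer[:4])
            if length == 0 or length > MAX_ENCODING:
                raise FrameError(f"frame length {length} out of range")
            if len(self.buffer) < 4 + length:
                break  # wait for more blocks
            body = bytes(self.buffer[4:4 + length])
            del self.buffer[:4 + length]
            try:
                messages.append(json.loads(body))
            except ValueError as error:  # bad JSON or bad UTF-8
                raise FrameError("undecodable message") from error
        return messages


def frame(message) -> bytes:
    body = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(body)) + body


reader = FrameReader()
stream = frame({"name": "Alice"}) * 1000  # many small messages
received = []
for i in range(0, len(stream), BLOCK_SIZE):
    received.extend(reader.feed(stream[i:i + BLOCK_SIZE]))
print(len(received))  # 1000
```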

Long Term Connections And Keep-Alives

Long term connections are at risk of failures in the operational environment. These include events such as dropout of network infrastructure (e.g. someone pulls the plug on a network switch) and discarded NAT mappings. The significance of these events is that they are likely to go unreported. There will be no related activity in the local network stack and therefore no Abandoned message propagated to the application.

Enabling the self_checking flag on the call to connect() activates a keep-alive capability. After a period of inactivity - no messages sent or received - the library will perform a low-level enquiry-ack exchange to verify the operational status of the network transport and the remote application. This may result in either an error in the network stack or a timeout, further resulting in Abandoned or Closed messages, respectively.

Inactivity is defined to be a period of two minutes with no message activity. The enquiry-ack exchange is expected to complete within five seconds.
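The timing rules can be captured in a small state holder driven by an explicit clock. This is a hypothetical sketch: the KeepAlive class and the strings it returns are illustrative and do not correspond to the internals behind the self_checking flag.

```python
INACTIVITY = 120.0  # seconds of silence before an enquiry is sent
ACK_WINDOW = 5.0    # seconds allowed for the matching ack


class KeepAlive:
    """Hypothetical keep-alive timing.

    tick() returns "enquire" when an enquiry should be sent, "timeout" when
    the ack window has expired, and None otherwise.
    """
    def __init__(self, now: float) -> None:
        self.last_activity = now
        self.enquiry_sent = None

    def activity(self, now: float) -> None:
        # Any message sent or received (including an ack) counts as activity.
        self.last_activity = now
        self.enquiry_sent = None

    def tick(self, now: float):
        if self.enquiry_sent is not None:
            return "timeout" if now - self.enquiry_sent >= ACK_WINDOW else None
        if now - self.last_activity >= INACTIVITY:
            self.enquiry_sent = now
            return "enquire"
        return None


ka = KeepAlive(now=0.0)
results = [ka.tick(t) for t in (60.0, 120.0, 124.0, 125.0)]
print(results)  # [None, 'enquire', None, 'timeout']
```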

Long term connections are good in that they improve responsiveness. Messages can be sent in response to a local event without having to wait for a successful connection. On the other hand, regular housekeeping messages are noisy and may create their own problems at scale (e.g. a fanout to thousands of services).

Connections initiated with a defined task and an expected completion, in the style of a file transfer, do not need a keep-alive capability. The presence of the associated machinery may be an unnecessary complication.

By default the self_checking flag is disabled. Note also that all connections established as a result of subscribe() calls have self_checking enabled.

Logging associated with keep-alive activity is deliberately limited to the recording of a few initial enquiry-ack exchanges. This is to provide evidence that the feature is operational and also to preserve the value of the logging facility, i.e. useful log entries may be pushed out by the recording of endless enquiry-ack exchanges.

Data Types And Portability

A type system is imposed on all messages. This facility is inherited from ansar-encode and is therefore relevant to both file and network operations.

Network messaging under ansar-connect is fully-typed, i.e. applications send and receive instances of application types. This is part of the initiative to remove networking details from the application code and requires that the library knows the internal details of each message. The type information forms the basis for marshaling and encoding activities.

The type system is static in nature rather than dynamic. This is a design decision and motivated by goals of portability and robustness. Data based on a static type system has an increased chance of moving between languages, e.g. Python and C++. Robustness is improved in the sense that checks that would probably be needed in the application instead occur automatically in the messaging machinery, e.g. if a list of 3 GPS coordinates is expected.
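As an illustration of checks moving out of the application, here is a plain Python sketch using a dataclass. It does not use the ansar-encode type declarations; the Triangulate message and the decode() helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


@dataclass
class Triangulate:
    """Hypothetical message type that must carry exactly three GPS coordinates."""
    points: List[Coordinate]

    def __post_init__(self) -> None:
        # Checks implied by the declared type, run once when the message is
        # decoded rather than in every handler that receives it.
        if len(self.points) != 3:
            raise TypeError(f"expected 3 coordinates, got {len(self.points)}")
        for point in self.points:
            if len(point) != 2 or not all(isinstance(v, float) for v in point):
                raise TypeError(f"bad coordinate: {point!r}")


def decode(fields: dict) -> Triangulate:
    """Hypothetical decode step: JSON lists become tuples, then the type checks itself."""
    return Triangulate(points=[tuple(p) for p in fields["points"]])


message = decode({"points": [[51.5, -0.13], [48.86, 2.35], [40.42, -3.7]]})
try:
    decode({"points": [[51.5, -0.13]]})
except TypeError as error:
    print(error)  # expected 3 coordinates, got 1
```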

Implementation of data transfer between ansar (i.e. Python) and some other language at the file level would be a realistic initiative. This kind of export/import code would need to resolve any mismatches in the respective type systems, e.g. the Python int vs the C/C++ int. The chances of a quality mapping are improved by the presence of the ansar type system.
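The int mismatch is easy to demonstrate with the standard struct module. In this sketch the declared type names int32, int64 and float64 are hypothetical stand-ins for whatever a real type declaration would provide; the point is that a declared width tells an exporter exactly which C layout to use.

```python
import struct

# Hypothetical declared types mapped to fixed-size little-endian C layouts.
C_LAYOUT = {"int32": "<i", "int64": "<q", "float64": "<d"}


def export(value, declared: str) -> bytes:
    """Pack a Python value using the C layout for its declared type."""
    return struct.pack(C_LAYOUT[declared], value)


print(export(7, "int32"))            # b'\x07\x00\x00\x00'
print(len(export(2**40, "int64")))   # 8
try:
    export(2**40, "int32")           # a Python int has no fixed width; a C int32 does
except struct.error as error:
    print("does not fit:", error)
```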

Publish And Subscribe Networking

There are four techniques used to deliver the pub-sub environment where send() can transfer messages anywhere, i.e. between processes within a host, between hosts in a LAN and between hosts in different LANs. There is an ansar CLI (ansar-group), two installable services and an online service. Previous implementations of ansar adopted a variety of approaches, including the use of UDP broadcasting to discover the presence of other hosts on the LAN to avoid the need for the ansar-lan service.

The current set of materials is considered to be the best compromise for reasons such as the variable level of UDP broadcast support across all network infrastructure and a somewhat hostile attitude from network administrators.

Publishers Still Need To Listen

Every service registered with the publish() function results in a listen(). Special parameters arrange for the local network stack to allocate the port, i.e. an ephemeral port. Connections made on behalf of subscribing clients will refer to that port. Those connections may come from processes on the same host or from across the LAN. WAN connections are handled differently. Host and port information is propagated from the async application up through the ansar services.
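At the sockets level, listening on an ephemeral port amounts to binding to port 0 and then asking the socket which port was assigned. A minimal sketch; the listen_ephemeral() helper is hypothetical, and a LAN-facing publisher would bind to an address other than loopback (e.g. 0.0.0.0).

```python
import socket


def listen_ephemeral(host: str = "127.0.0.1") -> socket.socket:
    """Listen on a port chosen by the operating system."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for an ephemeral port
    server.listen()
    return server


server = listen_ephemeral()
host, port = server.getsockname()  # the address to propagate upward
print(f"publishing at {host}:{port}")
server.close()
```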

Peer Connections

Pub-sub networking results in optimal network connections. If a subscriber and a publisher are on the same host there will be a connection across the loopback interface. If they are on different hosts within the same LAN then a connection to a private IP is likely (e.g. 192.168.1.24). These connections are referred to as peer connections and are managed by the async runtime. If there are multiple subscriber objects within a process requesting the same publisher, the runtime will establish a single peer connection and multiplex all communications over that single transport. Peer connections are cleared when the last related subscriber terminates.
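Sharing one peer connection among several subscribers is essentially reference counting. A hypothetical sketch, with stand-in functions in place of real transports; PeerTable and its methods are not part of the ansar API.

```python
from typing import Callable, Dict, List, Tuple

Address = Tuple[str, int]


class PeerTable:
    """Hypothetical table of shared transports, one per publisher address."""
    def __init__(self, open_transport: Callable[[Address], object],
                 close_transport: Callable[[object], None]) -> None:
        self.open_transport = open_transport
        self.close_transport = close_transport
        self.peers: Dict[Address, list] = {}  # address -> [transport, subscriber count]

    def attach(self, address: Address):
        entry = self.peers.get(address)
        if entry is None:
            entry = [self.open_transport(address), 0]
            self.peers[address] = entry
        entry[1] += 1
        return entry[0]

    def detach(self, address: Address) -> None:
        entry = self.peers[address]
        entry[1] -= 1
        if entry[1] == 0:
            # Last subscriber gone: clear the peer connection.
            self.close_transport(entry[0])
            del self.peers[address]


opened: List[Address] = []
closed: List[object] = []


def open_fake(address: Address) -> str:
    opened.append(address)
    return f"conn to {address[0]}:{address[1]}"


table = PeerTable(open_fake, closed.append)
address = ("192.168.1.24", 40123)
first_sub = table.attach(address)
second_sub = table.attach(address)  # reuses the same transport
table.detach(address)
print(closed)  # []
table.detach(address)
print(closed)  # ['conn to 192.168.1.24:40123']
```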

GROUP, HOST, LAN and WAN

Matches between subscriber and publisher can occur at any level of the ansar services. Each level has its own behaviour with respect to the address information eventually supplied to a connection agent.

GROUP - provides a fixed host value, i.e. the loopback interface,
HOST - provides a fixed host value, i.e. the loopback interface,
LAN - a host value associated with the upward service connection and acquired through the sockets API,
WAN - refer to following paragraphs.

Discovery of LAN Addresses

Pub-sub for LAN connections relies on all hosts connecting to the ansar-lan service and providing their respective publish and subscribe datasets. The ansar-lan service queries the sockets API for the IP information about each remote end and substitutes this into the address information for the relevant services.
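The substitution step can be sketched with getpeername() on the upward connection. The record layout and the substitute_host() helper are hypothetical; the demonstration uses a loopback connection so that it runs on a single machine.

```python
import socket


def substitute_host(upward: socket.socket, published: dict) -> dict:
    """Replace the host in a published address with the IP seen on the upward connection."""
    remote_ip = upward.getpeername()[0]
    return {**published, "host": remote_ip}


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
client = socket.create_connection(listener.getsockname())
accepted, _ = listener.accept()

record = substitute_host(accepted, {"service": "clock", "host": "0.0.0.0", "port": 40123})
print(record)  # {'service': 'clock', 'host': '127.0.0.1', 'port': 40123}

for s in (client, accepted, listener):
    s.close()
```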

WANs, Directories And Relays

Network messaging at the WAN level is based around the ansar-wan service. When communication between a subscriber and a publisher is initiated at WAN level both parties make outbound connections to a relay service that will support the subsequent network messaging.

Further explanation of WAN messaging is outside the scope of this documentation. Suffice to say that

communications with the ansar WAN entity (i.e. directory) are encrypted and authenticated,
communications with the ansar relay service are encrypted and authenticated,
WAN messages pass through the relay service as byte blocks,
for performance reasons WAN messages are never decoded by the relay service.
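The byte-block relay can be sketched with plain sockets and one thread per direction. This is a hypothetical illustration using socketpair() in place of network connections; it omits the encryption, authentication and credential checks described for the real relay.

```python
import socket
import threading

BLOCK_SIZE = 4096  # matches the socket I/O block size quoted earlier


def pump(source: socket.socket, destination: socket.socket) -> None:
    """Copy byte blocks from source to destination without decoding them."""
    while True:
        block = source.recv(BLOCK_SIZE)
        if not block:
            # The sender has finished; pass the end-of-stream along.
            destination.shutdown(socket.SHUT_WR)
            return
        destination.sendall(block)


def relay(a: socket.socket, b: socket.socket) -> None:
    """Join two connections, pumping in both directions until both sides finish."""
    threads = [threading.Thread(target=pump, args=(a, b)),
               threading.Thread(target=pump, args=(b, a))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


# socketpair() stands in for the subscriber's and publisher's relay connections.
subscriber, relay_a = socket.socketpair()
publisher, relay_b = socket.socketpair()
worker = threading.Thread(target=relay, args=(relay_a, relay_b))
worker.start()

subscriber.sendall(b"\x00\x01 opaque payload")
subscriber.shutdown(socket.SHUT_WR)

received = b""
while True:
    chunk = publisher.recv(BLOCK_SIZE)
    if not chunk:
        break
    received += chunk
publisher.shutdown(socket.SHUT_WR)  # lets the reverse pump finish
worker.join()

for s in (subscriber, publisher, relay_a, relay_b):
    s.close()
print(received)
```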

Connections to a directory are authenticated using the details acquired from an ansar directory command. This connection may be configured into an ansar-group, the ansar-host service or the ansar-lan service. Connections to the relay service are authenticated using temporary credentials issued by the directory. These credentials have a short TTL.