Design And Implementation
The ansar suite of libraries - ansar-encode, ansar-create and ansar-connect - has significant networking goals but is also pragmatic. There have been tradeoffs and compromises. It would be nice if ansar-connect were available for every development language and for every platform. Instead there is one development language and a significant number of supported platforms.
This section of the documentation captures any design or implementation detail that might be useful to the user.
Implementation Language
Ansar is implemented in Python after years of work in C/C++. Moving to Python delivers Ansar to every instance of a modern Python environment. The decision to move was not based on a judgement that Python is a better language; it was about reach. A distributed computing technology that is only available to a small section of the development community can barely refer to itself in such grand terms. To reach the same number of environments with a C/C++ library would take hundreds of build chains and thousands of developer hours.
There is no intention to port ansar to other languages.
Supported Platforms
Ansar has been tested on the following platforms:
Ubuntu 20+
Debian 12+
Raspberry Pi OS 12+ (debian-based)
macOS 14 Sonoma
Windows 11+ (no ansar CLI, i.e. process orchestration)
Fully automated installation of the ansar services, ansar-host and ansar-lan, is currently only
available on those Linux variants supporting systemd. With some manual configuration these services
can also be installed on non-systemd platforms. Use the following commands to automate a partial
construction of the ansar-host service:
$ cd <folder-of-operational-services>
$ git clone https://github.com/mr-ansar/ansar-services.git
$ cd ansar-services
$ python3 -m venv .services
$ source .services/bin/activate
$ pip3 install ansar-connect pyinstaller
$ make ansar-host ansar-host-files
This results in the following set of materials:
a folder containing a ready-to-run service,
a script for the initiation of the service,
a script for the termination of the service,
systemd-specific integration.
The .ansar-home folder contains executables and process orchestration details. Initiation and
termination procedures for a Linux platform can be found in the respective script files. These
can be used as templates for equivalent procedures on non-systemd platforms.
Other Linux Platforms
Any combination of a modern Python interpreter (3.7+) and a Linux-based operating system may support ansar operation. Only those included in ansar testing are listed in the previous section.
Windows Support
Execution of individual Ansar-based scripts on Windows is supported. Process orchestration using
the ansar CLI is not. Ansar uses the signal library to implement asynchronous control over
running processes. Limited support for this library on Windows, along with the differences
between *nix-style filesystems and Windows filesystems (file extensions and path separators), has
delayed the implementation of the ansar CLI for Windows.
WAN Networking And The Supporting Service
At the top level of the ansar publish-subscribe services is ansar-wan. This service facilitates the connection of objects located in different LANs. It does this by accepting connections from services at the lower levels. Where it matches subscribers to publishers, connection details are sent to the relevant parties. Those parties then make runtime connections to the relay service within ansar-wan. The relay service simply passes byte blocks from one connection to another, as appropriate.
For an initial period this service is available free for evaluation. Provisioning of the service has been kept deliberately low and there are limits imposed on the resourcing of accounts. This is to minimize operational costs and to preserve the quality of service in a few simple ways.
Including a third-party online service as a component of a networking product brings issues such as availability, reliability, privacy and performance. In response to these issues, the future of the ansar-wan service is defined by the following points:
There will be an initial period of evaluation. This will be for at least three months but no longer than one year. The earliest end date is 2024-08-30 and is subject to discussions with active account holders.
Privacy risks can be minimized by using non-personal details during signup. The only account value significant to ansar-wan is the email address used as a unique identity.
All communications with ansar-wan are encrypted and authenticated.
Messages are never stored because there is no feature of the ansar suite that requires it to do so; there are no delivery guarantees in the style of MQTT. There are practical reasons to avoid the storing of messages including associated IAAS costs and performance overhead. Most importantly it avoids legal considerations that come into effect when data is at rest. By never storing messages ansar-wan maintains a strong legal position.
There are several options available at the end of the evaluation period. These are listed below. Any changes to the service will be discussed with active account holders.
An organization may be formed to take over the ansar-wan service. This would likely be subscription based to cover the operational costs of what is potentially a large-scale, multi-tenanted service. At present the service resides in AWS, consumes EC2 instances and incurs costs for outbound network activity. Half of all relay activity is classified as outbound traffic.
There will be an option to deploy a private instance of ansar-wan. Materials and procedures will be available such that the service can be constructed under a different IAAS account (e.g. AWS), in a manner similar to that documented for ansar-host and ansar-lan. There will be a one-off fee and an optional support agreement.
A private instance of ansar-wan is the strongest response to the privacy issue. It is a complete guarantee that privacy of messages is not somehow being breached during the relay process. It does move the security risk from the multi-tenanted service to the owner of the private instance.
Deployment of a private instance brings the lowest possible costs to the owner and there is the potential to optimize for network latency, i.e. by selection of the nearest IAAS region.
In the event that the multi-tenanted service is not available the private instance option will be available to all active account holders. For the first 5 account holders the one-off fee will be waived.
Asynchronous Operation
The model of execution was adopted from SDL. This model is based around signal processing where signals are analogous to messages. Objects are created, send and receive messages, create new objects, and terminate. For anyone unfamiliar with or curious about this style of programming, refer to this dining philosophers solution.
There is a single internal map of objects, where the key is an integer and the value is one of the
supported object types, i.e. a function with its own thread, or a machine. Every async application
starts with the creation of a single object, using create_object() or create_node().
An async application may accumulate tens of thousands of objects. Attempting to create large numbers in a burst may encounter message overflow problems (refer to the following paragraphs).
Objects that are implemented as functions cause the allocation of a platform thread. Objects
implemented as machines that use the Threaded base class also cause the allocation of a thread.
A reasonable maximum number of threads for any process may be around 500. An async application
with large numbers of objects that are allocated threads may become adversely affected by the
overhead of thread switching.
Machines that are not assigned to a named thread on the call to bind() are assigned to the default thread, created automatically by the async runtime. A thread that processes messages on behalf of large numbers of objects may become an execution bottleneck and affect the order of message processing.
Sending messages requires message queues. Message queues are assigned a maximum size of 8192 at creation time. Messages sent to a full queue are discarded - a burst of 10k messages to the same destination has a good chance of resulting in lost messages. Lost messages are likely to be a symptom of programming as if there were infinite resources. Redesign the messaging to be more exchange-like rather than a one-way flood.
Discarding overflow messages is deliberate. The alternative to discarding is blocking until items are taken off the queue by a consumer. This is considered to be the risky option (i.e. deadlocks).
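The discard-on-overflow behaviour can be illustrated with the standard library alone. The following is a minimal sketch, not the ansar implementation; the names send_or_discard and flood are hypothetical, and only the queue size of 8192 comes from the text above.

```python
import queue

MAX_QUEUE_SIZE = 8192  # maximum queue size quoted in the text


def send_or_discard(q, message):
    """Queue a message without blocking; return False if it was discarded."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        # The queue is full, so the message is dropped rather than blocking.
        return False


def flood(count, max_size=MAX_QUEUE_SIZE):
    """Send `count` messages to a queue that nobody is draining."""
    q = queue.Queue(maxsize=max_size)
    discarded = sum(0 if send_or_discard(q, n) else 1 for n in range(count))
    return q.qsize(), discarded
```

With no consumer running, a burst of 10,000 messages leaves 8192 queued and the remainder discarded. A blocking put() in the same position would wait on the consumer instead, which is where the deadlock risk arises.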
Async Timers
Timers are implemented as messages that are processed by the same mechanisms as any other
message. An object requests a timer using start() and an instance of the
specified timer will arrive after the specified time period. This arrangement means that
timers can be applied to anything - there is no need for each individual operation to provide
a timing option. A timer can also be applied to an expected sequence of operations, e.g. a
T1 message can be used to indicate that the sequence of operations A, B and C
took too long.
Timers will arrive after a period at least as long as the specified time. Timers can be delayed in heavy traffic. Internally, monotonic time values are used. Starting a timer that is still pending is effectively a restart; the countdown begins again with the new period.
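The restart behaviour can be pictured as a table of deadlines based on a monotonic clock. This is a sketch under assumptions, not the ansar internals: the TimerTable class, its method names and the injectable clock are all hypothetical, chosen only to echo start() and cancel().

```python
import time


class TimerTable:
    """Hypothetical table of pending timers, keyed by timer name."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock      # monotonic, so wall-clock changes have no effect
        self.deadline = {}

    def start(self, key, seconds):
        # Overwriting a pending entry is what makes a second start a restart.
        self.deadline[key] = self.clock() + seconds

    def cancel(self, key):
        self.deadline.pop(key, None)

    def expired(self):
        """Remove and return the keys whose deadline has passed."""
        now = self.clock()
        due = [key for key, when in self.deadline.items() if when <= now]
        for key in due:
            del self.deadline[key]
        return due
```

A timer started for 2 seconds and started again after 1.5 seconds becomes due 3.5 seconds after the first call, not 2 seconds after it.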
Timers are not intended to be realtime. They run at human speed rather than machine speed. Accuracy is around 0.25s. Timer values at a finer resolution have no effect, i.e. with a value of 2.1s the timer message will arrive some time after 2.0s have passed.
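One way to read the resolution rule is that a requested period is rounded down to a whole number of 0.25s ticks. The sketch below follows that reading; it is an assumption that matches the 2.1s example, not a description of the actual code, and effective_period is a hypothetical name.

```python
import math

TIMER_RESOLUTION = 0.25  # seconds, the accuracy quoted in the text


def effective_period(seconds, resolution=TIMER_RESOLUTION):
    """Round a requested period down to a whole number of resolution ticks."""
    ticks = math.floor(seconds / resolution)
    return ticks * resolution
```

Under this reading, requests of 2.0s and 2.1s behave identically, while 2.25s is the next distinct value.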
To cancel an outstanding timer use cancel(). There is always the chance
that timer messages can pass each other by in message queues - it's possible to receive
a timer after it has been cancelled. The standard approach to message processing should
ensure these are ignored.
Network I/O And Safety Measures
Sending messages across networks uses the same method (i.e. send()) used to
send to any async object and uses the same underlying message processing machinery. Bursts
of large numbers of network messages may result in overflow of a message queue.
There are no real limits imposed on the sending end of network messaging. Any message type registered using bind() will be transferred across the network. Each messaging socket (i.e. accepted or connected) is assigned its own outbound message queue and streaming buffers. Large messages may result in processing bottlenecks and memory fragmentation. A reasonable maximum message size may be around 100k. This refers to the quantity of memory consumed by the Python application message.
All socket I/O is based around blocks of 4096 bytes.
Several limits are imposed at the receiving end of network messaging. The encoded representation
of a message (the JSON byte representation) cannot exceed 1Mb and there are further checks
applied to frame dimensions. Any message that fails to meet requirements results in an immediate
shutdown of the associated socket and a session control message is sent to the relevant party
(i.e. Closed or Cleared). These are measures to defend
against messages that somehow arrive corrupted and the possibility of bad actors.
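The receive-side checks might look something like the sketch below. It assumes that "1Mb" means 1024 * 1024 bytes and that a rejected frame is signalled with an exception; the FrameRejected and check_frame names are hypothetical. The JSON validity check stands in for the further frame checks, which are not detailed here.

```python
import json

MAX_ENCODED_SIZE = 1024 * 1024  # assumption: "1Mb" read as 1024 * 1024 bytes


class FrameRejected(Exception):
    """Raised when an inbound frame fails a safety check (hypothetical)."""


def check_frame(encoded):
    """Return the decoded JSON value, or raise FrameRejected."""
    if len(encoded) > MAX_ENCODED_SIZE:
        raise FrameRejected("encoded message exceeds the size limit")
    try:
        return json.loads(encoded.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        raise FrameRejected("encoded message is not valid JSON") from e
```

A caller would respond to FrameRejected by shutting down the socket, which corresponds to the immediate shutdown described above.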
Long Term Connections And Keep-Alives
Long term connections are at risk of failures in the operational environment. These include
events such as dropout of network infrastructure (e.g. someone pulls the plug on a network
switch) and discarded NAT mappings. The significance of these events is that they are likely
to go unreported. There will be no related activity in the local network stack and therefore
no Abandoned message propagated to the application.
Enabling the self_checking flag on the call to connect() activates
a keep-alive capability. After a period of inactivity - no messages sent or received - the
library will perform a low-level enquiry-ack exchange to verify the operational status of
the network transport and the remote application. This may result in either an error in
the network stack or a timeout, further resulting in Abandoned
or Closed messages, respectively.
Inactivity is defined to be a period of two minutes with no message activity. The enquiry-ack exchange is expected to complete within five seconds.
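The timing rules can be summarized as a small decision function. This is a sketch of the rules as stated, not the library code; the function name, its arguments and the returned labels are hypothetical.

```python
INACTIVITY_PERIOD = 120.0  # seconds with no message activity
ENQUIRY_TIMEOUT = 5.0      # seconds allowed for the enquiry-ack exchange


def keep_alive_action(now, last_activity, enquiry_sent_at=None):
    """Decide what a keep-alive check should do at time `now`."""
    if enquiry_sent_at is not None:
        # An enquiry is outstanding; either keep waiting or give up.
        if now - enquiry_sent_at > ENQUIRY_TIMEOUT:
            return "timed-out"
        return "wait-for-ack"
    if now - last_activity >= INACTIVITY_PERIOD:
        return "send-enquiry"
    return "idle"
```

Any message sent or received would reset last_activity, so a busy connection never reaches the enquiry stage.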
Long term connections are good in that they improve responsiveness. Messages can be sent in response to a local event without having to wait for a successful connection. On the other hand, regular housekeeping messages are noisy and may create their own problems at scale (e.g. a fanout to thousands of services).
Connections initiated with a defined task and an expected completion, in the style of a file transfer, do not need a keep-alive capability. The presence of the associated machinery may be an unnecessary complication.
By default the self_checking flag is disabled. Note also that all connections
established as a result of subscribe() calls have self_checking enabled.
Logging associated with keep-alive activity is deliberately limited to the recording of a few initial enquiry-ack exchanges. This is to provide evidence that the feature is operational and also to preserve the value of the logging facility, i.e. useful log entries may be pushed out by the recording of endless enquiry-ack exchanges.
Data Types And Portability
A type system is imposed on all messages. This facility is inherited from ansar-encode and is therefore relevant to both file and network operations.
Network messaging under ansar-connect is fully-typed, i.e. applications send and receive instances of application types. This is part of the initiative to remove networking details from the application code and requires that the library knows the internal details of each message. The type information forms the basis for marshaling and encoding activities.
The type system is static in nature rather than dynamic. This is a design decision and motivated by goals of portability and robustness. Data based on a static type system has an increased chance of moving between languages, e.g. Python and C++. Robustness is improved in the sense that checks that would probably be needed in the application instead occur automatically in the messaging machinery, e.g. if a list of 3 GPS coordinates is expected.
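The kind of check that moves out of the application can be sketched in plain Python. The example below does not use the ansar-encode type syntax; check_coordinates is a hypothetical function showing what "a list of 3 GPS coordinates" implies when written by hand.

```python
def check_coordinates(value, length=3):
    """Verify a list of `length` (latitude, longitude) pairs of floats."""
    if not isinstance(value, list) or len(value) != length:
        raise TypeError(f"expected a list of {length} coordinates")
    for item in value:
        is_pair = isinstance(item, tuple) and len(item) == 2
        if not is_pair or not all(isinstance(x, float) for x in item):
            raise TypeError("each coordinate must be a pair of floats")
    return value
```

With a declared type, a check of this kind runs in the messaging machinery for every message, so the receiving application can rely on the shape of the data.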
Implementation of data transfer between ansar (i.e. Python) and some other language, at the file level,
would be a realistic initiative. This kind of export/import code would need to resolve any mismatches
in the respective type systems, e.g. the Python int vs the C/C++ int. The chances of a quality
mapping are improved by the presence of the ansar type system.
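The int mismatch is easy to demonstrate. A Python int has arbitrary precision, while a C/C++ int is commonly 32 bits (an assumption here, since the width is platform-dependent). The sketch below uses the standard struct module to test whether a value would survive the trip; fits_c_int32 is a hypothetical helper.

```python
import struct


def fits_c_int32(value):
    """Return True if `value` can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", value)  # "<i" is a 4-byte signed int in standard size
        return True
    except struct.error:
        return False
```

Export code would need a policy for values that fail the check, such as rejecting the record or selecting a wider target type.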
Publish And Subscribe Networking
There are four techniques used to deliver the pub-sub environment where send() can transfer messages
anywhere, i.e. between processes within a host, between hosts in a LAN and between hosts in different LANs.
There is an ansar CLI (ansar-group), two installable services and an online service. Previous
implementations of ansar adopted a variety of approaches, including the use of UDP broadcasting to
discover the presence of other hosts on the LAN to avoid the need for the ansar-lan service.
The current set of materials is considered to be the best compromise for reasons such as the variable level of UDP broadcast support across all network infrastructure and a somewhat hostile attitude from network administrators.
Subscribers Still Need To Listen
Every service registered with the publish() function results in a listen(). Special
parameters arrange for allocation of a port by the network, i.e. an ephemeral port. Connections made
on behalf of subscribing clients will refer to that port. Those connections may come from processes on
the same host or from across the LAN. WAN connections are handled differently. Host and port information
is propagated from the async application up through the ansar services.
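Allocation of an ephemeral port can be shown with the standard socket module. This sketch shows the general technique rather than the parameters that ansar passes to listen(); the function name is hypothetical.

```python
import socket


def listen_on_ephemeral_port(host="127.0.0.1"):
    """Open a listening socket on a port chosen by the operating system."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS to choose a free port
    server.listen()
    # The allocated port is only known after the bind.
    return server, server.getsockname()[1]
```

The port returned here is the kind of value that would be propagated upward so that subscribers can be told where to connect.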
Peer Connections
Pub-sub networking results in optimal network connections. If a subscriber and a publisher are on the same host there will be a connection across the loopback interface. If they are on different hosts in the same LAN then a connection to a private IP is likely (e.g. 192.168.1.24). These connections are referred to as peer connections and are managed by the async runtime. If there are multiple subscriber objects within a process requesting the same publisher, the runtime will establish a single peer connection and multiplex all communications over that single transport. Peer connections are cleared when the last related subscriber terminates.
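The sharing of a single transport can be thought of as reference counting per publisher address. The sketch below is an illustration only; the PeerConnections class and its methods are hypothetical and say nothing about how the async runtime stores this information.

```python
class PeerConnections:
    """Track which subscribers share the transport to each publisher."""

    def __init__(self):
        self.subscribers = {}  # publisher address -> set of subscriber ids

    def add(self, address, subscriber):
        """Return True if a new transport must be connected."""
        first = address not in self.subscribers
        self.subscribers.setdefault(address, set()).add(subscriber)
        return first

    def remove(self, address, subscriber):
        """Return True if the transport can now be cleared."""
        members = self.subscribers.get(address, set())
        members.discard(subscriber)
        if address in self.subscribers and not members:
            del self.subscribers[address]
            return True
        return False
```

Only the first add() for an address returns True, and only the remove() of the last subscriber returns True, matching the connect-once and clear-on-last-termination behaviour described above.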
GROUP, HOST, LAN and WAN
Matches between subscriber and publisher can occur at any level of the ansar services. Each level has its own behaviour with respect to the address information eventually supplied to a connection agent.
GROUP - provides a fixed host value, i.e. the loopback interface,
HOST - provides a fixed host value, i.e. the loopback interface,
LAN - a host value associated with the upward service connection and acquired through the sockets API,
WAN - refer to following paragraphs.
Discovery of LAN Addresses
Pub-sub for LAN connections relies on all hosts connecting to the ansar-lan service and providing their respective publish and subscribe datasets. The ansar-lan service queries the sockets API for the IP information about each remote end and substitutes this into the address information for the relevant services.
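The sockets API call involved is most likely getpeername(), or the address returned by accept(). The sketch below demonstrates the idea over the loopback interface using only the standard library; the function names and the advertised address are hypothetical, and this is not the ansar-lan code.

```python
import socket


def observe_peer():
    """Accept one loopback connection and report the address the listener sees."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    accepted, _ = server.accept()
    try:
        # getpeername() reports the remote end as seen by the local network stack.
        return accepted.getpeername(), client.getsockname()
    finally:
        accepted.close()
        client.close()
        server.close()


def substitute_host(advertised, observed_host):
    """Keep the advertised port but use the host observed on the connection."""
    _, port = advertised
    return (observed_host, port)
```

The service sees the address that the connecting host actually used on the network, so the host does not need to know or report its own LAN address.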
WANs, Directories And Relays
Network messaging at the WAN level is based around the ansar-wan service. When communication between a subscriber and a publisher is initiated at WAN level both parties make outbound connections to a relay service that will support the subsequent network messaging.
Further explanation of WAN messaging is outside the scope of this documentation. Suffice it to say that
communications with the ansar WAN entity (i.e. directory) are encrypted and authenticated,
communications with the ansar relay service are encrypted and authenticated,
WAN messages pass through the relay service as byte blocks,
for performance reasons WAN messages are never decoded by the relay service.
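Passing byte blocks without decoding can be sketched as a simple copy loop between two sockets. This is an illustration of the idea, not the relay service itself; the relay function is hypothetical and the block size of 4096 is borrowed from the socket I/O figure quoted earlier.

```python
import socket

BLOCK_SIZE = 4096  # socket I/O block size quoted earlier in this document


def relay(source, destination, block_size=BLOCK_SIZE):
    """Copy bytes from source to destination until source closes.

    The bytes are never inspected or decoded. Returns the number of bytes moved.
    """
    moved = 0
    while True:
        block = source.recv(block_size)
        if not block:
            return moved  # the sending side has closed
        destination.sendall(block)
        moved += len(block)
```

A real relay would run one such loop in each direction and would handle many connection pairs concurrently. Because the loop never decodes the payload, its cost per message depends only on the number of bytes.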
Connections to a directory are authenticated using the details acquired from an ansar directory
command. This connection may be configured into an ansar-group, the ansar-host service or the
ansar-lan service. Connections to the relay service are authenticated using temporary credentials
issued by the directory. These credentials have a short TTL.