Strategy for scheduling dial-up service requests
SummaryConsider the situation where a number of computers wish to download information overnight. The server may be engaged or temporarily faulty so the client will need to have a procedure for scheduling another attempt in a little while. I look at what happens if when an attempt fails the next attempt is scheduled for a random time in the first half of the time window remaining. The scheme looks promising for efficient use of resources and reliable connection with the added bonus that if some clients try to 'cheat' by pretending that their window finishes earlier in an attempt to get preferential service then their probability of missing out completely increases.
IntroductionOne of my clients will soon be installing a system which downloads data from a service which is available between midnight and 8am. The duration of the download will be a small number of minutes so if a first attempt to connect fails then another attempt can be made later and so on. The general scheme I thought of takes the premise that early on in the time window we can be laid-back about getting the data but as the deadline of 8am approaches then we need to be more insistent in our attempts to connect.
So we select a random time between midnight (for the first attempt) or now (if we're going to have another go in a while) halfway between midnight/now and 8am. So the first attempt will be in between 00.00 and 04.00. If this happened to be 02.00 and failed then we look at the time remaining time (6 hours) divide by 2 and so reschedule for the next attempt somewhere in the next 3 hours.
For our own one-off use this is a feasible system but what would happen if everyone used the identical method?
InvestigationThis pattern has the characteristic of the less time we have left the more often we try. To my mind this is a sensible approach. What are the implication of a large number of clients using the same system?
Firstly of course the use of random intervals avoids the possibility of say everyone attempting to dial in at 00.00 saturating the server for a short while, then all trying again at exactly the same time in the future with similar results. Anyone will tell you that a steady flow is better than floods and droughts.
But is the flow even throughout the window? In a perfect system we would have 100% utilisation of our server for 100% of the time. It turns out (by using a simple simulation) that there is an even distribution of successful call to the server during the first half of the time window. If the server is lightly loaded then there is a long tail to roughly the 75% of maximum window point and the server becomes very lightly loaded. On the other hand, the more the server is loaded then the longer the even usage is. At very high loadings the method performs very efficiently.
Expressed another way: Heavily loaded systems approach 100% efficiency. We never see a last minute rush. On a less heavily loaded system there is not such a rush to finish on time so load is spread out more especially after the half way point when sometimes (depending on circumstances) there is a very sudden change to more server resource available as the few remaining clients take their time to attempt reconnection.
For a system that is theoretically loaded at L% the final finishing time as a percentage of the window duration is roughly 50% + (L/2) for relatively long connection periods. For a larger number of shorter sessions the load-last connect curve becomes more like 75% + (L/4). In all cases when the system is loaded almost to capacity the 'efficiency' improves so that the utilisation of server resources is maximised throughout the whole time window.
DiscussionIf I was a service provider then I would be happy if all my customers used this scheme. The service I'd offer with confidence is "Get your data by <time>...".
How would the service stand up to interruptions or degradation? Having run a basic simulation, it appears that if the interruption happens in the early part of the time window then it makes an efficient job of continuing where it left off. Interruptions later in the period when there may well be greater availability of server resources as the tail is dealt with can be dealt with (when the service is restored) at full speed. Degradation of service has the inevitable consequence of reducing the rate at which service requests can be dealt with but the remaining resources are now more highly utilised.
What happens if some clients decided to jump to the head of the queue by setting their time window to just the early part of the window used by the others? This 'selfish' action should mean that they get their data by an earlier time. In which case what is to stop everyone doing it and defeating the system? One answer is to have random 'dark' periods when there is no service inserted during the early stages. This would increase the chances of a client having scheduled an attempt for a dark period failing, then finding that the dark period extends beyond their private finish time. In this case they have now failed to get any data and is the penalty for being pushy. On the other hand the patient clients may be frustrated by the dark period but as they have more time in hand can retry successfully later.
Because of the even load during the first half of the window period the server can predict when service requests should finish. This allows for the resources of the server to be tuned. Because of the built-in "works harder when under greater pressure" nature of the scheme it would not be a difficult or risky operation to tweak say the number of modems. And of course, since we log each night when the last client called we have an immediate measure of the safety factor.
The simulation used to obtain the results is very crude but probably reliable. It would be interesting to have a better understanding of the behaviour of the final tail at lower loadings where although the overall objective of completing all service requests by a given time is easily achieved it is an inefficient sweeping-up process.
ConclusionA simple, efficient, reliable, resilient, measurable and easily controllable method for scheduling time-limited service requests. The "works harder under more load" characteristic is an important feature. There don't appear to be any vices and mis-use can be penalised. A better understanding of the process could lead to an optimisation of the basic method.