From: floyd@ptialaska.net (Floyd Davidson)
Newsgroups: comp.dcom.telecom.tech
Subject: Re: ESS overloads?
Date: 10 Dec 1999 09:38:53 GMT

Lisa or Jeff <hancock4@bbs.cpcn.com> wrote:
>Say too many people want to make a call at the same time, more than
>the central office can handle.  So people can't get a dial tone.
>
>For both modern and older ESS units, what happens then?
>
>Do the switches queue the offhook requests in the order made, so that
>dial tone is given in request order?

An interesting point.  And it applies to resources other than
just dial tone too.  Dial tone is an indication that a line has
been connected to a digit receiver, so that is the resource
which is requested.  Computer memory allocated to keep track of
a call is another resource with a finite number available.
Connection routes through the switch are another.  Trunks to
other switches, or to specific services such as busy signals or
recorded announcements also have limited availability.

The computer that runs the switch associates each call with
requested resources, and keeps a queue for each resource when
there are more requests than can be filled.  A queue is a
timed event though, and any call that has been in some given
queue for too long is rerouted to a different queue, and could
eventually just be "dropped on the floor", so to speak, if
absolutely none of the requested resources are made available to
it.
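
A minimal sketch of such a timed queue, in Python, may make the
overflow behaviour easier to follow.  The class name, the
max_wait values and the choice of overflow target are all
assumptions made for illustration; they are not taken from any
real switch.

```python
from collections import deque


class TimedQueue:
    """Waiting calls for one resource; entries expire after max_wait."""

    def __init__(self, name, max_wait, overflow=None):
        self.name = name
        self.max_wait = max_wait    # seconds a call may wait (assumed value)
        self.overflow = overflow    # next queue to try, or None
        self.waiting = deque()      # (call_id, time_enqueued) pairs

    def enqueue(self, call_id, now):
        self.waiting.append((call_id, now))

    def expire(self, now):
        """Reroute calls that waited too long.

        Returns the calls that had no further queue to go to, i.e.
        the ones "dropped on the floor".
        """
        dropped = []
        keep = deque()
        for call_id, enqueued_at in self.waiting:
            if now - enqueued_at < self.max_wait:
                keep.append((call_id, enqueued_at))
            elif self.overflow is not None:
                self.overflow.enqueue(call_id, now)
            else:
                dropped.append(call_id)
        self.waiting = keep
        return dropped
```

Chaining a digit receiver queue to an announcement queue with
overflow=..., and calling expire() on each at regular intervals,
gives the rerouting described above; a call is only lost when the
last queue in the chain times out.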

Typically a call which cannot be connected to a required
resource will be connected instead to a "treatment" (unless of
course those are all busy too!).  The basic treatment of last
resort is a fast busy signal, which indicates to the customer
that hanging up and trying again is advised.  (Other actions
might be alternate route selections, in which case the customer
will never realize that it happened and may only experience a
slight delay.)  A treatment might be one or more recorded
announcements.  Eventually though, if no resources are available
the switch just shuts off that call attempt and ignores the line
until it goes on-hook again.

However, in a telephone switch a queue doesn't necessarily work
the way one might think.  It isn't a wait in line type of queue
where the first call into the queue is the first one to be
handled.  Instead, for most queues, the _last_ call in is the
first out.  The reason for that is an assumption that the
longest held call is the least likely to be completed, while the
shortest held one is the most likely.  The longest held call in
the queue is the one most likely to have been hung up and not be
there when the process tries to work with it!

And usually the above described method works in the best
interest of callers.  The most common reason for resource
exhaustion is not from too many callers, but from an outage
caused by an equipment failure that appears to be a request for
a resource.  For example, for telephone lines provided through
any kind of a remote unit, lines are trunked to the switch
via T Carrier.  If the carrier system between the remote and the
switch goes down, even momentarily, then all lines through it
become off-hook in order to busy them out and avoid directing
calls toward them.  But when they go off-hook they request a
digit receiver!  Hence the digit receiver queue instantly fills
up, and not one of the requests is actually a valid call
attempt.  Customers on other lines who pick up a phone the
instant after that outage begins will not likely be affected at
all.  If a first-in-first-out queue were used, the call attempts
most likely to go unprocessed would be the valid ones, while
those that are processed would be the invalid ones.
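
The carrier failure case can be sketched as a toy comparison in
Python.  The numbers (100 false off-hooks, 5 real callers, 10
digit receivers) are invented for illustration, and the sketch
ignores timeouts and the time a receiver stays seized.

```python
def serve(requests, receivers_free, lifo):
    """Return the requests that get a digit receiver.

    requests is in arrival order; receivers_free is how many
    requests can be served.
    """
    order = list(reversed(requests)) if lifo else list(requests)
    return order[:receivers_free]


def real_count(served):
    """Count how many served requests were genuine call attempts."""
    return sum(1 for kind, _ in served if kind == "real")


# Carrier failure: 100 lines go off-hook at once, none is a real call.
bogus = [("bogus", n) for n in range(100)]
# Customers on other lines pick up just after the outage begins.
real = [("real", n) for n in range(5)]
arrivals = bogus + real

fifo_served = serve(arrivals, 10, lifo=False)
lifo_served = serve(arrivals, 10, lifo=True)

print(real_count(fifo_served), real_count(lifo_served))
```

With these made-up numbers the FIFO discipline spends all ten
receivers on false requests, while the LIFO discipline serves
all five real callers first.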

In a properly configured and provisioned switch it should be a
very rare occasion that customer calls are dropped off the end
of a queue when processing was possible. The above described
method provides the best service from the customer's point of
view.

>Can the switch get overloaded and become disabled in any way?

No.  It just dribbles calls onto the floor instead of into the
cash bin, that's all!

In fact, a telephone switch is a fault-tolerant, real-time
online transaction processing (OLTP) computer.  Usually we think
of that description in terms of the bank's automatic teller
machine, but there is a difference.  With an ATM the records of
what happened are the one single most important part of the
transaction.  No failure mode should ever result in money being
in limbo, having been removed from records in one place and not
yet delivered to another and with no record of where it is.  The
process can die, and the whole thing has to be started over
again, but the record of what was done must be saved at all
cost.  Telephone OLTP is different.

With a telephone call we must, at all costs, preserve the
process itself, and the administrative records describing it are
only of secondary importance.  Hence, if you are talking to
Grandma and the switch crashes, the ideal thing is that you
continue talking to Grandma and never even know the telco has a
problem.  No new calls can be processed until the computer is
rebooted, but already connected calls continue to function.  The
computer may lose track of them completely (the memory allocated
to keep a record of them has been wiped clean).  But if the
computer discovers that a line is off-hook and connected to
another line, and the computer has no idea how that happened, it
does _not_ hang those two lines up and put them in a known
state.  Instead it puts them in a queue!  That one is a list of
lines whose status is unknown and should be monitored for
change.

If the call happens to be a toll call, the records of when it
started and ended may be lost.  Lucky you!  No bill.  But you
won't even know that the switch crashed, much less that you
should keep talking half the night.

>Does the switch take any other pre-emptive actions (like
>reducing the allowable time to dial or eliminating high end
>functions) to increase its basic call handling capacity?

There may or may not be things other than the last-in/first-out
strategy described above.  Some are arranged so that a complete
reconfiguration can be manually invoked during a disaster to
grade resource allocation differently.  An example would be to
restrict the percentage of incoming calls compared to outgoing
calls.  That might be done when a local disaster such as an
earthquake happens.  Otherwise the exchange can become clogged
with so many incoming calls that locals cannot use the system
for disaster relief work.  And of course it might even be
arranged so that some trunks are absolutely reserved for calls
to or from certain blocks of telephone numbers, which would be
used by police, fire, hospitals etc.
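
As a rough illustration only, the two controls just mentioned (a
cap on the share of incoming calls, and trunks held back for
priority numbers) could be sketched in Python like this.  The
function, its parameters and the dict layout are hypothetical
and do not correspond to any real switch's configuration.

```python
def admit(call, in_use, trunks_total, reserved_for_priority,
          max_incoming_percent):
    """Decide whether a new call may seize a trunk.

    call:   dict with "direction" ("incoming" or "outgoing") and
            "priority" (True for police, fire, hospital blocks).
    in_use: dict with trunk counts in use per direction.
    """
    busy = in_use["incoming"] + in_use["outgoing"]
    free = trunks_total - busy
    if free <= 0:
        return False                 # nothing left for anybody
    if call["priority"]:
        return True                  # may take any free trunk
    if free <= reserved_for_priority:
        return False                 # what is left is held in reserve
    if call["direction"] == "incoming":
        limit = trunks_total * max_incoming_percent // 100
        return in_use["incoming"] < limit
    return True
```

With 10 trunks, 2 reserved and a 30 percent cap, a fourth
ordinary incoming call is refused while outgoing calls are still
accepted, which is the point of the disaster setting.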

>In typical overloads caused for whatever reason (eg natural
>disaster), how long is the wait for a dial tone?

It depends.  But given the last-in-first-out queue, one of the
things that should be obvious is that waiting while you have
a dead line might not be productive.  If you actually end up
in the queue, you may be better off to hang up and try again
than to continue waiting.

  Floyd

--
Floyd L. Davidson                          floyd@barrow.com
Ukpeagvik (Barrow, Alaska)



From: floyd@ptialaska.net (Floyd Davidson)
Newsgroups: comp.dcom.telecom.tech
Subject: Re: ESS overloads?
Date: 10 Dec 1999 17:53:43 GMT

Terry Kennedy  <terry@gate.tmk.com> wrote:
>Floyd Davidson <floyd@ptialaska.net> writes:
>> With a telephone call we must, at all costs, preserve the
>> process itself, and the administrative records of what it is
>> are only of secondary importance.  Hence, if you are talking
>> to Grandma and the switch crashes, the ideal thing is that
>> you continue talking to Grandma and never even know the telco
>> has a problem.  No new calls can be processed until the
>> computer is rebooted, but already connected calls continue to
>> function.  The computer may lose track of them completely
>> (the memory allocated to keep a record of it has been wiped
>> clean).
>
>  Do any current full-digital switches (5ESS, DMS-whatever)
>work this way?  The last switch I worked with that I *know* can
>do this is a 1AESS, but that's a computer-controlled relay
>switch.
>
>  I've suffered through unpleasant experiences on an Ericsson
>AXE (I bet you can tell which CLEC I'm talking about just from
>that 8-). As an example, I was talking to the switch tech about
>some stuck lines, when I heard him go "Oh, S#:t!", followed by
>the sound of a couple hundred hookswitch relays on my modems
>all releasing at once as the switch crashed.

I can only speak positively about DMS systems, all of which can
reboot without killing any calls that are already connected.  It
has been several years now since I've worked on one, so I don't
remember some of the specifics.  A warm restart definitely.  I'm
all but positive that calls can survive a cold restart; however,
they will definitely lose all of the call information (AMA for
example).  A reload restart, where it goes to disk and reloads
the main cpu and all peripherals, well...  I think that one
dumps calls as the peripherals are restarted.

The peripherals are the key to the way that works.  The main cpu
keeps track of calls and routing, but the actual connection and
data flow is entirely within the peripheral devices.  The line
or trunk modules even collect digits without bothering the main
cpu (it wasn't always that way, but has been for about ten years
now).  Once the called number is known, the information is
passed to and processed by the cpu and a call route is selected.
Each device (a line module, a network module, and another line
module for example) is instructed with what to do.  At that
point the cpu no longer is concerned at all with that call until
one of the modules (peripheral devices) reports some change in
status.

That change is just plugged into the memory block, which, if I
remember right, is named a Call Control Block (CCB).  Whatever it
is called, it is a data structure kept in memory by the main
cpu.  It functions as a state machine, and the change in status
reported by the peripheral device triggers a cpu interrupt to
update the data.  At regular intervals various audit processes
access the data and check it for unresolved states.  Hence, if a
calling line has changed from off-hook to on-hook, an audit will
notice that it is on-hook and various requirements have not
been met.  So it launches processes to accomplish those
requirements, such as issuing instructions to the other
peripheral devices to release resources to the idle pool, update
call detail or automatic message accounting records and
Operational Measurements records, and probably 102 other things.
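
The split between "record the change" and "let an audit act on
it" might be sketched in Python as follows.  This is only an
illustration of the idea; the field names and the single
billing_closed flag are invented, and a real CCB holds far more
state.

```python
from dataclasses import dataclass, field


@dataclass
class CallControlBlock:
    call_id: int
    calling_hook: str = "off-hook"               # last reported status
    resources: set = field(default_factory=set)  # peripherals still held
    billing_closed: bool = False


def report_status(ccbs, call_id, hook_state):
    """A peripheral reports a change; the cpu only records it."""
    ccbs[call_id].calling_hook = hook_state


def audit(ccbs, idle_pool):
    """Periodic audit: finish calls that are on-hook but not cleaned up.

    Returns the ids of the calls it completed.
    """
    finished = []
    for ccb in list(ccbs.values()):
        if ccb.calling_hook == "on-hook":
            idle_pool.update(ccb.resources)  # release to the idle pool
            ccb.resources = set()
            ccb.billing_closed = True        # stand-in for AMA/OM updates
            del ccbs[ccb.call_id]
            finished.append(ccb.call_id)
    return finished
```

Nothing is released at the moment the on-hook is reported; the
cleanup happens only when the audit next runs and finds the
unresolved state.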

After a cold reboot, the cpu's list of Call Control Blocks is
empty (a cold reboot re-initializes all data and restarts all
processes, but does not reload the entire operating system from
disk), and when the peripheral devices start triggering
interrupts telling the cpu that a particular CCB needs to be
updated with a new status, the cpu discovers that this update is
for a non-existent call.  That results in a log message being
generated to that effect, and in fact the majority of log
messages after a cold reboot, once things have settled down and
are actually working again, will be noise about resetting
devices from an unknown state.
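
A hedged Python sketch of that situation: a status report comes
in for a call the cpu no longer has a record of, so it is logged
and the line is watched rather than torn down.  The function
name, the dict-based CCB and the unknown_lines set (standing in
for the list of lines whose status is unknown, mentioned earlier
in this thread) are all assumptions for illustration.

```python
import logging

log = logging.getLogger("switch")


def handle_status(ccbs, unknown_lines, call_id, line, status):
    """Apply a peripheral's status report, tolerating a lost CCB.

    Returns True if a known call was updated, False otherwise.
    """
    ccb = ccbs.get(call_id)
    if ccb is None:
        # The record was wiped by the restart.  Do not tear the
        # connection down; note it and keep watching the line.
        log.warning("status %r for non-existent call %s on %s",
                    status, call_id, line)
        if status == "on-hook":
            unknown_lines.discard(line)  # call ended on its own
        else:
            unknown_lines.add(line)      # monitor for a change
        return False
    ccb["status"] = status
    return True
```

The connected call survives because nothing in this path ever
instructs a peripheral to release it; the line simply leaves the
unknown list once it reports on-hook.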

A DMS is just one big distributed computing device.  I'm not
sure what they have now, but last I knew it was either a pair of
68020's or 68030's in the main cpu, and each peripheral was
either a pair of 68000's or for maintenance modules was made up
of 8085's.  Of course everything is duplicated and running in
parallel.  So a peripheral device actually has 4 68000's in it.

One DMS-100/200 could be programmed to control every nuclear
power plant in the country, and make it look easy, while
it also provided PBX services to boot.

I assume a 4E is just as much fun.

  Floyd


--
Floyd L. Davidson                          floyd@barrow.com
Ukpeagvik (Barrow, Alaska)



From: floyd@ptialaska.net (Floyd Davidson)
Newsgroups: comp.dcom.telecom.tech
Subject: Re: ESS overloads?
Date: 11 Dec 1999 14:20:59 GMT

Wally Roberts  <wally@terps.com> wrote:
>Al Varney wrote:
>
>> #  Do the switches queue the offhook requests in the order
>> #  made, so that dial tone is given in request order?

>> techniques, designed to respond appropriately to different
>> causes of overload.  For dial tone, long-term off-hooks tend
>> to be ignored while there are shorter-term off-hooks waiting
>> to be served, and the shorter- term off-hooks are served in a
>> sort of round-robin fashion.  Once you get dial tone, you
>> compete equally (in most cases) for further call resources.

If I remember right, there are very few queues on a DMS that are
FIFO (first in first out), and for some reason I'm thinking
there was exactly one!  All the rest are LIFO (last in first
out).

>My recollection of the 1A off-hook routines was that a
>long-term off-hook would be ignored provided it was on the
>receiver-off-hook ("roh") suspense list, because that
>subscriber had received dial tone, ignored it, then been removed
>from the digit receiver, then been given recording treatment ("if
>you would like to make a call please hang up, then....")
>followed by the roh screamer.

Each step of the way, on a DMS, could result in being queued up
for a wait.  No digit receivers available, no announcement
treatment available, no howler treatment available, each has a
separate queue.  The only one that has no queue is the roh list,
which is just a particular state flagged in the Call Control
Block data which will be noted by audit processes.  The audit
process that would be monitoring the roh list would be the
absolute lowest priority of any process on the switch. (With
possibly the single exception of playing Asteroids on a
maintenance terminal, but BNR never let that kinda stuff get out
into the field anyway... :-(

>But, when the switch is suffering an overload because of
>traffic (not because of a fault or physical damage) those who
>go off-hook and wait for dial tone tend to receive dial tone
>in the order they came off-hook; i.e., the person off hook the
>longest will get dial tone first once the call processing has
>time to look at off-hooks who have not been sent to roh
>suspense.

It definitely would not be done that way with a DMS.  Either
there is a digit receiver available, or the call is placed into
a LIFO queue.  A timeout in that queue causes a search for the
recorded announcement treatment, which probably would overflow to
a fast busy if necessary, and otherwise would go to another
LIFO (I'm not sure which it would likely get from the queue
though, a recorded announcement or the fast busy) and from that
timeout (either treatment or queue) it once again can either
immediately go to a howler treatment or be queued for one.
A timeout from either one results in the roh state.
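
Read as a state machine, that chain of timeouts might look
roughly like the Python sketch below.  The state names are
invented, and where the description above is uncertain (which
treatment a queued call gets), the sketch simply tries the
announcement first and the fast busy second; that ordering is an
assumption.

```python
def next_step(state, free):
    """Return the state a line moves to at its next transition.

    free is the set of resource names that are idle right now.
    Every transition except the first stands for a timeout.
    """
    if state == "off-hook":
        if "digit-receiver" in free:
            return "digit-receiver"
        return "queued:digit-receiver"       # LIFO queue
    if state in ("digit-receiver", "queued:digit-receiver"):
        if "announcement" in free:
            return "announcement"
        if "fast-busy" in free:
            return "fast-busy"
        return "queued:announcement"         # another LIFO queue
    if state in ("announcement", "fast-busy", "queued:announcement"):
        return "howler" if "howler" in free else "queued:howler"
    if state in ("howler", "queued:howler"):
        return "roh"
    return state    # "roh" stays put until the line goes on-hook
```

With nothing free, a line walks off-hook, queued:digit-receiver,
queued:announcement, queued:howler and then roh without ever
having been served, which is why hanging on does not help.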

>So, it's best to stay off-hook when there is no dial tone, as
>opposed to repeated reoriginations.

That may very well depend on which type of switching system.
Certainly for a DMS it doesn't pay to hang on very long.

  Floyd



--
Floyd L. Davidson                          floyd@barrow.com
Ukpeagvik (Barrow, Alaska)

