Dialog Timeout configuration is not working #69

Closed
FerUy opened this issue Jul 4, 2017 · 27 comments

@FerUy
Contributor

FerUy commented Jul 4, 2017

After experiencing a real-life situation with a customer, where the USSD Gw Dialog Timeout is set to 15000 (15 secs) but tests and traces showed it having no effect, a simulation was conducted with identical results. USSD Gw Dialog Timeout was set to 15 secs in the GUI, which is reflected in the corresponding UssdManagement_ussdproperties.xml configuration file (attached).

Then, the following test was conducted with both Restcomm Connect and USSD Gw running (the latter in simulator mode):

  • A simple USSD RVD project was built where a menu is sent to the user (USSD Collect).
  • After dialing the corresponding USSD shortcode in jSS7 simulator, the menu's text correctly appeared in jSS7 simulator's "screen"
  • Then, no action was done to let the session expire.
  • The expected result was some action from the USSD Gateway after 15 seconds (i.e. both TCAP and SIP dialogues terminated), but instead the only thing that happens comes after 60 seconds, and from Restcomm-Connect.

This behaviour is identical to that seen on the live system onsite with a customer. Attached are extracts of the server logs of both USSD Gw and Restcomm-Connect for the simulation test described above, as well as the corresponding Wireshark trace.

This fix is critical: apart from the malfunction itself (the TCAP dialog configuration is useless and seems to be hardcoded), it impacts performance with high-traffic USSD sessions that need to be terminated quickly.

USSD-Restcomm_dialogTimeoutTest.pcap.pcapng.zip
USSDgw_server.log.zip
Restcomm_server.log.zip
UssdManagement_ussdproperties.xml.zip

@FerUy FerUy added the bug label Jul 4, 2017
@FerUy FerUy added this to the 7.2.0 milestone Jul 4, 2017
@nhanth87
Contributor

nhanth87 commented Jul 5, 2017

Here it is @FerUy
RestComm/jss7#66

@FerUy
Contributor Author

FerUy commented Jul 5, 2017

Thanks @nhanth87. Good to know it's a known issue; I actually hesitated at first over whether to raise it here rather than in jss7. Having said that, as this issue is affecting a live implementation of USSD Gw, I'd rather keep it here too. Moreover, the product (USSD Gateway) is delivered to customers with this in place, so it must be fixed.

@nhanth87
Contributor

nhanth87 commented Jul 5, 2017

@FerUy your volunteering is appreciated 👍

@vetss
Contributor

vetss commented Jul 5, 2017

@FerUy

  1. dialogTimeout in UssdManagement_ussdproperties.xml covers this case:
  • USSD PULL
  • timeout from the HTTP application
    To simulate it you need to:
  • send a PULL request from the JSS7 Simulator to USSD GW
  • make sure your HTTP application does not answer within those 15 seconds. (Restcomm-Connect is SIP, so you cannot simulate it with it!)
    The test you described does not exercise the dialogTimeout parameter of UssdManagement_ussdproperties.xml.
  2. "i.e. both TCAP and SIP dialogues terminated": this behaviour is covered by another parameter, for both PULL and PUSH cases, in TcapStack_management.xml (it is in milliseconds). This parameter can be updated from the JSS7 management console: TCAP - "Dialog Idle Timeout".

  3. As for what you are trying to achieve, please explain: do you need the user to face a timeout / dialog termination, or do you need the user NOT to face one? Let's discuss what you need to achieve and then we will know how to achieve it.

@FerUy
Contributor Author

FerUy commented Jul 5, 2017

Hi @vetss

As per our conversation, here are the results and conclusions of the agreed tests:

  • The following parameter was set in TcapStack_management.xml: dialogidletimeout value="30000"
    Note: invoketimeout parameter value needs to be identical to the one of dialogidletimeout parameter, i.e. invoketimeout value="30000" in this case (otherwise, it doesn't work).

  • The following test was carried out:

  1. Started Restcomm-Connect and USSD Gateway in simulator mode.
  2. Dialed a USSD shortcode and got the USSD RVD menu text in jSS7 simulator.
  3. Did nothing for the following 30 seconds while Wireshark was tracing.
  4. USSD Gateway ended the SIP transaction with Restcomm-Connect at precisely the time set in parameter dialogidletimeout, i.e. at 30 seconds.
  5. USSD Gateway did nothing with the TCAP established dialog.
  6. Sent an answer from the jSS7 simulator after those 30 seconds. The answer reaches the USSD Gateway which then sends a TCAP Abort back to the network.

Conclusion:

  • USSD Gateway terminates SIP session at precisely the time set in parameter dialogidletimeout of TcapStack_management.xml, which is OK and the expected result.
  • USSD Gateway drops the local TCAP dialog at precisely the time set in parameter dialogidletimeout of TcapStack_management.xml, which is OK and the expected result.
  • USSD Gateway does not terminate the TCAP dialog with the SS7 peer, which is not the expected result and very bad in terms of signaling resources in the SS7 network. It has another side effect in the CDRs, as too many FAILED_MAP_ERROR results happen, which affects statistics in the accounting/BSS/revenue assurance side.

Please find a Wireshark trace of the aforementioned test attached here.
USSD-Restcomm_TCdialogTimeoutTest.pcap.pcapng.zip

@vetss
Contributor

vetss commented Jul 5, 2017

As of now we have two timers in USSD GW:

  1. USSD dialogtimeout timer - it triggers only in the PULL case, when there is no response from the HTTP / SIP server for a long time
  2. TCAP dialogtimeout timer - it triggers when the SS7 TCAP dialog times out. It terminates the SIP / HTTP dialogs (with proper peer announcements) and terminates the SS7 dialog (WITHOUT peer announcements). This timer covers the needed events, but the problem is that it does not notify the SS7 peer, so this option does not fit the need.

We can introduce new USSD GW level timeout timer(s) that behave like option 1) but cover all timeout cases:
a) httpSipDialogTimeout - timeout when waiting for a response from the HTTP / SIP application
b) ss7DialogTimeout - timeout when waiting for a response from the SS7 subscriber
The first one will replace timer 1) and take over its functionality.
Timer a) will be activated just after a message to the SS7 peer has been sent (and cancelled after a message from the SS7 peer has been received)
Timer b) will be activated just after a message from the SS7 peer has been received (and cancelled after a message to the SS7 peer has been sent)
These timers will terminate both the TCAP and HTTP/SIP dialogs and send announcements to the peer. Timer b) must by default be < timer 2), which means that in common use timers a) / b) will normally trigger before timer 2)

PS: for timer a) we can try to reuse the TCAP dialog timeout timer, because the timeout processing can be cancelled from within its handler:

onDialogTimeout() {
  dialog.keepOnline();
  dialog.addUSSDMessage("... explaining a reason of timeout ...");
  dialog.end();
}

@FerUy
Contributor Author

FerUy commented Jul 5, 2017

Hi @vetss , I believe you meant this for timer b)

Timer b) (ss7DialogTimeout) will be activated just after a message to SS7 peer has been sent, and cancelled just after a message from SS7 peer has been received (then httpSipDialogTimeout timer is activated).

In other words, if ss7DialogTimeout is 30000 ms, after sending a TCAP message (MAP unstructuredSSRequest or processUnstructuredSSRequest), the USSD Gateway will wait for 30 seconds. If no answer is received back from SS7 peer, USSD Gateway will terminate the corresponding TCAP dialog.

If this is the case and/or I understood it right, I agree with your entire proposal.
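The timer hand-off described above can be sketched as a small state holder. This is only an illustration under stated assumptions: the class `UssdSessionTimers`, its methods and the explicit `nowMs` clock are hypothetical and are not part of USSD GW or jSS7, and the 15000 ms value for httpSipDialogTimeout is assumed for the example. Arming one timer implicitly cancels the other, so at most one of them is running at any moment.

```java
import java.util.Optional;

/** Hypothetical sketch of the proposed timer hand-off; not USSD GW or jSS7 code. */
class UssdSessionTimers {
    enum Kind { SS7_DIALOG, HTTP_SIP_DIALOG }

    private final long ss7DialogTimeoutMs;
    private final long httpSipDialogTimeoutMs;
    private Kind armed;       // null while no timer is running
    private long deadlineMs;  // only meaningful while a timer is armed

    UssdSessionTimers(long ss7DialogTimeoutMs, long httpSipDialogTimeoutMs) {
        this.ss7DialogTimeoutMs = ss7DialogTimeoutMs;
        this.httpSipDialogTimeoutMs = httpSipDialogTimeoutMs;
    }

    /** A message was sent to the SS7 peer: start waiting for the subscriber. */
    void onSentToSs7Peer(long nowMs) {
        armed = Kind.SS7_DIALOG;
        deadlineMs = nowMs + ss7DialogTimeoutMs;
    }

    /** A message arrived from the SS7 peer: start waiting for the HTTP/SIP application. */
    void onReceivedFromSs7Peer(long nowMs) {
        armed = Kind.HTTP_SIP_DIALOG;
        deadlineMs = nowMs + httpSipDialogTimeoutMs;
    }

    /** Returns the timer that has expired at nowMs (disarming it), or empty if none has. */
    Optional<Kind> expired(long nowMs) {
        if (armed == null || nowMs < deadlineMs) {
            return Optional.empty();
        }
        Kind fired = armed;
        armed = null;
        return Optional.of(fired);
    }

    public static void main(String[] args) {
        UssdSessionTimers timers = new UssdSessionTimers(30_000, 15_000);
        timers.onSentToSs7Peer(0);                  // menu sent to the handset
        System.out.println(timers.expired(29_999)); // still waiting for the subscriber
        System.out.println(timers.expired(30_000)); // ss7DialogTimeout reached
    }
}
```

When `expired` reports a timer, the gateway would terminate both the TCAP and the HTTP/SIP side, as agreed in this thread; that part is left out of the sketch.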

@vetss
Contributor

vetss commented Jul 6, 2017

@FerUy yes it will terminate TCAP dialog (and also SIP / HTTP parts)

@FerUy
Contributor Author

FerUy commented Jul 6, 2017

Great... then we agree 100% @vetss :)

@vetss vetss modified the milestones: 7.1.0, 7.2.0 Jul 6, 2017
@abhayani
Contributor

abhayani commented Jul 7, 2017

Sergey,

Instead of introducing a new timer, shall we have a new flag/parameter in USSD Gw that, if set to true, makes USSD Gw send a TCAP ABORT to the peer when the TCAP Dialog times out?

By introducing more timers we are adding unnecessary complexity.

@vetss
Contributor

vetss commented Jul 7, 2017

Hello @abhayani

The functionality "USSD Gw will send TCAP ABORT to peer when TCAP Dialog times out" requires a modification of the TCAP stack (that is, a change of the current stack behaviour). By the time the event reaches USSD GW, the TCAP dialog is already dead and there is no possibility of sending any message to the SS7 peer.

This requires introducing another TCAP dialog timeout behaviour, where the stack sends TC-ABORT to the peer if a dialog has timed out. Right now we just kill the dialog without notifying the peer. I checked the TCAP spec and have not found clear recommendations on whether we should send such a TC-ABORT to a peer or not. The spec describes only the INVOKE timer...

@abhayani
Contributor

abhayani commented Jul 7, 2017

Hi Sergey,

Thanks for the details. Yes, I think the spec is not clear about this. But the testing done at one of our LATAM customers shows that on the peer side the Dialog still remains open. I agree this requires TCAP stack level changes, and by default we can have this flag set to false.

IMHO it makes sense to clean resources on peer side too if possible.

@vetss
Contributor

vetss commented Jul 7, 2017

Amit,

Also, at TCAP level we can send only TC-ABORT. From USSD GW level we can send TC-END with a USSD message to the subscriber describing why the USSD session was terminated.

@FerUy
Contributor Author

FerUy commented Jul 7, 2017

@vetss @abhayani @nhanth87, as discussed in Slack, a TC END in the USSD situation we are talking about makes no sense to me. Imagine a user taking too long to answer a menu on his handset display (for whatever reason), and then we send a TC END with a USSD message like "Application timeout" or whatever. He will never get that message and, worse, we are violating the USSD session rules: we will receive a TC P-ABORT from the network immediately, with the user oblivious to all of this. Once he does something, the dialog is destroyed and nothing other than an MMI message will appear (which is the same as if we send a TC U-ABORT). In other words, with a TC END we are introducing more signalling and no added value for the user.

As per ITU-T Q.773 (TCAP):

Abort ::= SEQUENCE {
    dtid    DestTransactionID,
    reason  CHOICE {
        p-abortCause  P-AbortCause,
        u-abortCause  DialoguePortion
    } OPTIONAL
}

As per 3GPP TS 29.002 (MAP):

Table 7.3/6: Service-primitives for the MAP-U-ABORT service

Parameters               Request   Indication
User reason              M         M(=)
Diagnostic information   U         C(=)
Specific information     U         C(=)

User reason:
This parameter can take the following values:

  • resource limitation (congestion);
    the requested user resource is unavailable due to congestion;
  • resource unavailable;
    the requested user resource is unavailable for reasons other than congestion;
  • application procedure cancellation;
    the procedure is cancelled for reasons detailed in the diagnostic information parameter;
  • procedure error;
    processing of the procedure is terminated for procedural reasons.

Diagnostic information:
This parameter may be used to give additional information for some of the values of the user-reason parameter:

Table 7.3/7: User reason and diagnostic information

User reason                          Diagnostic information
Resource limitation (congestion)     -
Resource unavailable                 Short term/long term problem
Application procedure cancellation   Handover cancellation/
                                     Radio Channel release/
                                     Network path release/
                                     Call release/
                                     Associated procedure failure/
                                     Tandem dialogue released/
                                     Remote operations failure
Procedure error                      -

In conclusion, I'd rather go for a TCAP-U-ABORT with User reason application procedure cancellation
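A minimal sketch of how the quoted MAP-U-ABORT parameters fit together, assuming only what the two tables above state. The class `MapUserAbort`, its enums and its methods are hypothetical names chosen for illustration and are not the jSS7 API; splitting "Short term/long term problem" into two enum values is also an assumption of the sketch.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the MAP-U-ABORT parameters quoted above; not the jSS7 API. */
class MapUserAbort {
    enum UserReason {
        RESOURCE_LIMITATION,
        RESOURCE_UNAVAILABLE,
        APPLICATION_PROCEDURE_CANCELLATION,
        PROCEDURE_ERROR
    }

    enum Diagnostic {
        SHORT_TERM_PROBLEM, LONG_TERM_PROBLEM,
        HANDOVER_CANCELLATION, RADIO_CHANNEL_RELEASE, NETWORK_PATH_RELEASE, CALL_RELEASE,
        ASSOCIATED_PROCEDURE_FAILURE, TANDEM_DIALOGUE_RELEASED, REMOTE_OPERATIONS_FAILURE
    }

    /** Diagnostic values that Table 7.3/7 pairs with each user reason. */
    static Set<Diagnostic> allowedDiagnostics(UserReason reason) {
        switch (reason) {
            case RESOURCE_UNAVAILABLE:
                return EnumSet.of(Diagnostic.SHORT_TERM_PROBLEM, Diagnostic.LONG_TERM_PROBLEM);
            case APPLICATION_PROCEDURE_CANCELLATION:
                return EnumSet.range(Diagnostic.HANDOVER_CANCELLATION,
                                     Diagnostic.REMOTE_OPERATIONS_FAILURE);
            default:
                return EnumSet.noneOf(Diagnostic.class);
        }
    }

    /** Diagnostic information is a user option in the request, so null is always accepted. */
    static boolean isValid(UserReason reason, Diagnostic diagnostic) {
        return diagnostic == null || allowedDiagnostics(reason).contains(diagnostic);
    }

    public static void main(String[] args) {
        UserReason reason = UserReason.APPLICATION_PROCEDURE_CANCELLATION;
        System.out.println(isValid(reason, null));                          // no diagnostic given
        System.out.println(isValid(reason, Diagnostic.SHORT_TERM_PROBLEM)); // belongs to another reason
    }
}
```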

@abhayani
Contributor

then we send a TC END with a USSD message like "Application timeout" or whatever, well, he will never get that message

@FerUy Why will he never receive this message? I have tested sending a couple of USSD messages back-to-back in a live network and they do appear on the phone. The 1st one hides the 2nd one. As soon as the 1st one is removed (by the user pressing OK or CANCEL), the second appears.

"TCAP-U-ABORT with User reason application procedure cancellation" does make sense technically. But it will not make any sense to the end user.

@FerUy
Contributor Author

FerUy commented Jul 12, 2017

@abhayani because most of the time the user will react after the TCAP dialog has been eliminated from the network due to timeout, especially for services like balance inquiry, where the user easily takes more than 30 seconds to read/understand the information sent. So, I strongly believe that statistically we will end up sending rubbish to the network.

IMO, if we want to send a notification to the user about a transaction (and I also have seen it/experienced it, especially with mobile financial services) I will always send an SMS. Of course, that's not feasible today with our USSD Gw, but it would be soon (e.g. through SMPP), and it's only my opinion on the subject.

@FerUy
Contributor Author

FerUy commented Jul 17, 2017

Hi @vetss et al.

As discussed in other channels, I am attaching here a Wireshark trace of a test with the following parameters set to 30 seconds in the TcapStack_management.xml configuration file:

<dialogidletimeout value="30000"/>
<invoketimeout value="30000"/>

As can be seen in the trace, the TCAP abort is sent to the SS7 network together with (actually just before) the SIP BYE to Restcomm-Connect after exactly 30 seconds, which is what we were looking for, and with the proper type of Abort and user reason as discussed previously in this thread.

This doesn't fix the original problem this issue was raised for, but it does provide a workaround: setting the appropriate values in the TcapStack_management.xml configuration file as mentioned earlier... which obviously is a giant leap ahead 👍

Great job @vetss !

USSD-Restcomm_dialogTimeoutTest_patch541forIssue69_params30sec.pcap.pcapng.zip

@vetss
Contributor

vetss commented Jul 19, 2017

Hello,

it looks like the first fix, for the TCAP dialog only
https://github.com/RestComm/ussdgateway/commit/7aa38c12699d931b89141d23b1fe3aa9c42b64c1
works as expected.

Now we have

  1. PULL/PUSH - TCAP dialog timeout based - timeout when a mobile subscriber does not respond for a long time (default is 30 seconds) - configurable via http://localhost:8080/jss7-management-console/# - TCAP (this was added by the last fix)
  2. PULL - timeout when waiting for a response from the HTTP application - http://localhost:8080/ussd-management/# - Server Settings - Dialog timeout error message

What we still need:
a) PUSH - timeout of waiting of a response from HTTP application
b) PULL - timeout of waiting of a response from SIP application
c) PUSH - timeout of waiting of a response from SIP application
For cases a) - c) we can reuse the timer that is configured for "2)" for simplicity and use code templates from "2)". We need to establish timers at USSD GW level for it.

d) update manual for clear explanation for what timer is responsible for what
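The cases listed above can be summarised as a small lookup. This is a sketch for orientation only: `TimeoutCoverage` and its enums are hypothetical names, and the mapping simply restates items 1), 2) and a) to c) of this comment, assuming cases a) to c) reuse the timer configured for 2).

```java
/** Hypothetical summary of the timeout cases listed above; not USSD GW code. */
class TimeoutCoverage {
    enum Mode { PULL, PUSH }
    enum WaitingFor { SUBSCRIBER, HTTP_APPLICATION, SIP_APPLICATION }
    enum TimerSource { TCAP_DIALOG_TIMEOUT, USSD_DIALOG_TIMEOUT }

    /** Which configured timer is meant to bound each wait (the mode does not change the answer). */
    static TimerSource timerFor(Mode mode, WaitingFor waitingFor) {
        return waitingFor == WaitingFor.SUBSCRIBER
                ? TimerSource.TCAP_DIALOG_TIMEOUT
                : TimerSource.USSD_DIALOG_TIMEOUT;
    }

    /** True for the cases reported as already working at this point: items 1) and 2). */
    static boolean alreadyCovered(Mode mode, WaitingFor waitingFor) {
        return waitingFor == WaitingFor.SUBSCRIBER
                || (mode == Mode.PULL && waitingFor == WaitingFor.HTTP_APPLICATION);
    }

    public static void main(String[] args) {
        // Print the full matrix, flagging the cases a) to c) that were still open.
        for (Mode mode : Mode.values()) {
            for (WaitingFor waitingFor : WaitingFor.values()) {
                System.out.println(mode + " / " + waitingFor + " -> " + timerFor(mode, waitingFor)
                        + (alreadyCovered(mode, waitingFor) ? "" : " (still needed)"));
            }
        }
    }
}
```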

vetss added a commit to vetss/ussdgateway that referenced this issue Jul 28, 2017
@vetss
Contributor

vetss commented Jul 28, 2017

Fixed by:

7aa38c1
ffc0479
d72ce05

@FerUy
Contributor Author

FerUy commented Jul 28, 2017

Hi @vetss

As told via chat, I tested the patch by a service between USSD Gw and Restcomm-Connect (RVD). The RVD project is very simple, you can deduce it from the following diagram:

image

The test consisted of going to module opt2 by sending "2" after the welcome module menu is presented to the user. The opt2 module has a dummy external service which answers after 20 seconds (via sleep). With dialogtimeout set to 15 seconds (while dialogidletimeout was set to 30 seconds), the USSD Gw sends the correct timeout message inside a MAP ProcessUnstructuredSSRequest operation within a returnResultLast component of a TCAP/End message at precisely 15 seconds = dialogtimeout. See attached trace. So far so good then.

The only thing that bothers me at this point is that no further SIP communication is exchanged between USSD Gw and Restcomm after that. This differs from what happens when dialogidletimeout is reached (for example when the user doesn't respond within that period): there, after the TCAP U-Abort is sent to the SS7 network, a SIP BYE is sent to Restcomm-Connect and therefore both TCAP and SIP dialogues are finished.

Shouldn't then a SIP BYE be sent to Restcomm-Connect when dialogtimeout threshold is surpassed like in the test carried out?
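The timing of the test above can be worked through with a small sketch. It assumes, purely for illustration, that both timers start counting when the subscriber's "2" reaches the gateway and that whichever event comes first decides the outcome; `FirstTimeout` and `resolve` are hypothetical names, not USSD GW code.

```java
/** Hypothetical worked example of the test timing; not USSD GW code. */
class FirstTimeout {
    enum Outcome { APPLICATION_ANSWERED, USSD_DIALOG_TIMEOUT, TCAP_DIALOG_IDLE_TIMEOUT }

    /** Decides which event happens first, with all delays measured from the same starting point. */
    static Outcome resolve(long appDelayMs, long dialogTimeoutMs, long dialogIdleTimeoutMs) {
        if (appDelayMs < dialogTimeoutMs && appDelayMs < dialogIdleTimeoutMs) {
            return Outcome.APPLICATION_ANSWERED;
        }
        return dialogTimeoutMs <= dialogIdleTimeoutMs
                ? Outcome.USSD_DIALOG_TIMEOUT
                : Outcome.TCAP_DIALOG_IDLE_TIMEOUT;
    }

    public static void main(String[] args) {
        // Values from the test: service answers after 20 s, dialogtimeout 15 s, dialogidletimeout 30 s.
        System.out.println(resolve(20_000, 15_000, 30_000)); // prints USSD_DIALOG_TIMEOUT
    }
}
```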

restcommUSSDgw7.1.61_dialogtimeout15_vs_dialogidletimeout30_test.pcap.pcapng.zip

vetss added a commit to vetss/ussdgateway that referenced this issue Jul 30, 2017
@vetss
Contributor

vetss commented Jul 30, 2017

Hello @FerUy

thanks for your testing, which allowed me to prepare a further patch. I added sending of a SIP BYE for both PULL and PUSH and fixed some small bugs. I will prepare new binaries. @FerUy please retest with them. Remember that we generally have 4 cases: PULL / PUSH and SS7 side timeout / RC (SIP) side timeout. It is better to test all cases. I tested it for the HTTP case.

I have a doubt about the following case:
PUSH case, when USSD GW has sent an initial TC-BEGIN to the SS7 network (a first PUSH message has been sent to a mobile subscriber) but then we get a TCAP dialog timeout. The TCAP dialog is at that time in the "Initiation Sent" state, and if a TC User (say USSD GW) wants to terminate the TCAP dialog (because of the timeout) then the SS7 stack sends no TC-USER-ABORT to the peer.

This is because the TCAP spec says:
"When the transaction is in the "Initiation Sent" state, i.e. a Begin message has been sent but no
backward message for this transaction has been received, the result of the TR-U-ABORT request
primitive is purely local."

It would not be a big change to the SS7 stack to send TC-USER-ABORT to a peer in the "Initiation Sent" state, but it is not clear which behaviour is correct.
@FerUy do you have any experience with this case?

@FerUy
Contributor Author

FerUy commented Jul 30, 2017

Hi @vetss ... thanks, I will test it and get back to you asap.

Regarding your last question, I have no experience with such a scenario, surely because sending a TC-U-ABORT during the Initiation Sent state makes no sense for PUSH USSD. When the USSD Gw sends a TC-Begin it's because the application that triggered it is only expecting one thing: the answer from the USSD user. Otherwise, the application logic is incorrectly designed. Hence, only dialogidletimeout is important during the Initiation Sent state, while dialogtimeout will never be triggered (or its threshold reached/passed) for PUSH USSD. Agreed?

@vetss
Contributor

vetss commented Jul 30, 2017

@FerUy

I was describing the case PUSH - Initiation Sent - TCAP dialogidletimeout.
In other words, USSD GW has sent a first PUSH message (inside a dialog) and there is no response from the subscriber for a configurable time.
In this case we do not send TC-USER-ABORT to the SS7 peer (because the TCAP stack implementation follows the spec) and just terminate the TCAP and HTTP / SIP dialogs. And it is this case that I have doubts about.
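The rule being discussed can be sketched as a check on the transaction state. This is a simplified illustration and not the jSS7 implementation: `AbortPolicy` and `abortReachesPeer` are hypothetical names, and the state list is the usual TCAP transaction state set. The "Initiation Sent" branch follows the spec text quoted earlier in this thread; treating the abort as transmitted in the "Initiation Received" and "Active" states is the sketch's reading of the same procedures.

```java
/** Hypothetical sketch of when a user abort is transmitted to the peer; not jSS7 code. */
class AbortPolicy {
    enum DialogState { IDLE, INITIATION_SENT, INITIATION_RECEIVED, ACTIVE }

    /**
     * True when an abort requested by the TC user would go out on the wire.
     * In INITIATION_SENT only a Begin has been sent, so the abort stays purely local.
     */
    static boolean abortReachesPeer(DialogState state) {
        return state == DialogState.INITIATION_RECEIVED || state == DialogState.ACTIVE;
    }

    public static void main(String[] args) {
        // PUSH case: TC-BEGIN sent, subscriber never answers, dialog idle timeout fires.
        System.out.println(abortReachesPeer(DialogState.INITIATION_SENT)); // prints false
        // PULL case, or PUSH after the first answer: the dialog is established.
        System.out.println(abortReachesPeer(DialogState.ACTIVE));          // prints true
    }
}
```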

@FerUy
Contributor Author

FerUy commented Jul 30, 2017

@vetss sorry for the confusion and thanks for the clarification.

Let's stick to the spec, it will eventually send its timeout Abort when it's due. From our side, we are good just terminating TCAP and HTTP/SIP dialogues in USSD Gw.

@FerUy
Contributor Author

FerUy commented Jul 30, 2017

Hi @vetss

Just attaching here the trace of the test commented on Slack...

restcommUSSDgw7.1.62_dialogtimeout15_vs_dialogidletimeout30_test.pcap.pcapng.zip

@FerUy
Contributor Author

FerUy commented Aug 18, 2017

Hi @vetss, please see my last two comments in RestComm/Restcomm-Connect#2411
Attaching last test logs and trace here as well, as requested by @deruelle
issuesRC2411-USSSD69.zip

@FerUy
Contributor Author

FerUy commented Sep 2, 2017

Apart from the race condition mentioned, which only happens if some special configuration is provided on the RC side, this issue is solved. I will create another issue for that, just for perfection's sake ;)

@FerUy FerUy closed this as completed Sep 2, 2017