GraphQL subscription is not reliable

While working with the Bluescape GraphQL subscription, I found that sometimes the events come blazing fast and sometimes it takes seconds to get them. Other times, no matter how long I wait for an event, I never get it and it is completely lost.

My internet connection is very fast and reliable and this is not a network issue.

Is there anything that can help me make my app more reliable? I need REALTIME events, and using a cursor would not help me in this case.

Hi @Amerehei,

Sorry you are having some delays in subscription events. I completely understand the need for realtime responses.

I haven’t experienced any lag, but I’ll try to reproduce the problem on my end.

A few questions:

  • Can you provide some examples of what you are looking to do?
  • what kind of events are you subscribing to?
  • are there a lot of elements in the workspace?
  • how often is the lag occurring?

  • Can you provide some examples of what you are looking to do?

I’m at an early stage with subscriptions. My subscription is still simple, and it will become more complex in the next few days. Generally, I would like to track the movement of any element and react based on its traits and element type.

  • what kind of events are you subscribing to?

This is my current query. In the future I will support all other element types. I would like to get any data related to the position of elements, to calculate the overlap of two elements.

  subscription workspace($workspaceId: String!) {
    commands(workspaceId: $workspaceId) {
      ... on UpdateElementCommand {
        workspaceId
        elementId
        data {
          ... on UpdateDocument {
            transform {
              x
              y
              origin {
                point {
                  x
                  y
                }
                anchor
              }
            }
          }
        }
      }
    }
  }

  • are there a lot of elements in the workspace?

Currently, no. I have exactly 20 elements (1 Canvas, 14 Shape, 2 Icon, 2 Text, 1 Document). Maybe the size of the elements reduces Bluescape performance; if that’s the case, let me know. My elements are very large; for example, my canvas size is {width: 21309, height: 13255}.
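For reference, the overlap calculation mentioned above could look roughly like the sketch below. It assumes each element can be reduced to an axis-aligned box whose x/y is the top-left corner; the `Box` type and `overlapArea` function are hypothetical names for illustration, not part of the Bluescape schema, and rotation or non-top-left anchors are ignored.

```typescript
// Axis-aligned bounding box. Field names are assumptions for this sketch,
// not the Bluescape schema.
interface Box {
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Returns the overlapping area of two boxes, or 0 if they do not overlap.
function overlapArea(a: Box, b: Box): number {
  const overlapW = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const overlapH = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  if (overlapW <= 0 || overlapH <= 0) return 0;
  return overlapW * overlapH;
}

// Example using the canvas size from the post and a made-up shape.
const canvas: Box = { x: 0, y: 0, width: 21309, height: 13255 };
const shape: Box = { x: 21000, y: 13000, width: 500, height: 500 };
console.log(overlapArea(canvas, shape)); // 309 * 255 = 78795
```

The arithmetic is the same for small and large coordinates, so the size of the boxes does not change the cost of this check.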

  • how often is the lag occurring?

It’s RANDOM behaviour: 50% blazing fast, 40% 1 to 3 seconds, 10% lost (or too long).
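To turn percentages like these into something reproducible, one option is to log the delay between making a change and receiving its event, then summarize the samples. The sketch below is only an illustration: `LatencySample`, `summarizeLatencies`, and the 1000 ms threshold are assumptions, and how each sample is measured is left to the app.

```typescript
// Latency in milliseconds between triggering a change and receiving its
// event; null means the event never arrived within the observation window.
type LatencySample = number | null;

interface LatencySummary {
  fast: number; // share of events received in under fastMs
  slow: number; // share of events received after fastMs or more
  lost: number; // share of events never received
}

function summarizeLatencies(samples: LatencySample[], fastMs: number = 1000): LatencySummary {
  if (samples.length === 0) return { fast: 0, slow: 0, lost: 0 };
  let fast = 0;
  let slow = 0;
  let lost = 0;
  for (const s of samples) {
    if (s === null) lost++;
    else if (s < fastMs) fast++;
    else slow++;
  }
  const n = samples.length;
  return { fast: fast / n, slow: slow / n, lost: lost / n };
}

// Four made-up samples: two fast, one slow, one lost.
console.log(summarizeLatencies([120, 80, 2500, null]));
// { fast: 0.5, slow: 0.25, lost: 0.25 }
```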

Thanks for the additional information.

Are you using the federated endpoint (both ISAM and Elementary schemas are joined):
https://api.apps.us.bluescape.com/v3/graphql

If you are, I would try using the elementary endpoint that bypasses our gateway (this might help with reliability):
https://elementary.apps.us.bluescape.com/graphql

Another thing to try is subscribing to the rawHistory event. It will send an event for everything, so it can get a bit noisy, but I’m curious whether it makes any difference in responses.

The rawHistory subscription looks like:

subscription rawHistorySubscription($workspaceId: String!) {
  rawHistory(workspaceId: $workspaceId) {
    __typename
    workspaceId
    actorId
    actorType
    data
  }
}

Hopefully this helps! I’ll keep testing on my end, but I haven’t seen the issues you are having.

I use the https://api.apps.us.bluescape.com/v3/graphql endpoint. Let me try the elementary endpoint as well.

What do you think about shapes with very large dimensions? Do they affect performance?

One more question: since the events are sequential, when I hit a lost event (or one that is very, very slow to arrive), I lose the subsequent events as well. I need to get the first event before I get the second, so when the first event is slow to arrive (or lost), the second one is blocked.

Is there any mechanism to detect this problem, and is there any workaround for it?

Let me know if the other endpoint helps.

I don’t think the size of the canvas should matter, but I can try playing with it to confirm. I’ve never seen it happen, but I also don’t usually use such big canvases. As far as I know, it’s just a float, so it shouldn’t matter if it is big or small.

I think the best way to ensure you get the subscription events is the cursor.

If you miss an event, you should be able to get all the events from a given point in time. But the problem is that you need the events fired in a particular order, and they are returned in a different order? Or are the slow/missing events not returned at all when using the cursor?
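As a rough illustration of the cursor idea, events replayed from a cursor can be merged with events that arrived live, dropping duplicates and restoring order. This is a sketch under assumptions: the `id` and `seq` fields and the `mergeEvents` function are hypothetical, and the real event payload may expose different identifiers or no sequence number at all.

```typescript
// Minimal event shape assumed for this sketch (hypothetical fields).
interface WorkspaceEvent {
  id: string; // unique event identifier
  seq: number; // position of the event in the workspace history
}

// Merges events replayed from a cursor with events received live,
// keeping the first copy of each id and sorting by sequence number.
function mergeEvents<T extends WorkspaceEvent>(replayed: T[], live: T[]): T[] {
  const seen: Record<string, boolean> = {};
  const merged: T[] = [];
  for (const e of replayed.concat(live)) {
    if (!seen[e.id]) {
      seen[e.id] = true;
      merged.push(e);
    }
  }
  return merged.sort((a, b) => a.seq - b.seq);
}

// D was missed live but comes back through the replay.
const replayedEvents = [{ id: "C", seq: 3 }, { id: "D", seq: 4 }];
const liveEvents = [{ id: "E", seq: 5 }, { id: "C", seq: 3 }];
console.log(mergeEvents(replayedEvents, liveEvents).map((e) => e.id).join(","));
// C,D,E
```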

I get a CORS error when I try to fetch elements from the https://elementary.apps.us.bluescape.com/graphql endpoint.

Do we have this WS endpoint as well? wss://elementary.apps.us.bluescape.com/v3/graphql

Today the subscription to events from wss://api.apps.us.bluescape.com/v3/graphql is very fast; I don’t see any loss or slowness.

I thought maybe it has some effect on the time complexity of your algorithms.

The problem is that as long as the WebSocket connection is not closed, there is no way for me to detect that I’m not getting any updates anymore.

Let’s say we have the following events:

1. A (fast)
2. B (fairly fast)
3. C (slow)          <---- cursor
4. D (lost)
5. E
6. F
7. G

As I never get D (or maybe I eventually get it after several minutes), events E, F and G would be lost too, because they are waiting for D. I can update my cursor to C, but there is no way for me to know that D is lost, because my WebSocket connection is still live. So I don’t know how I can utilize the cursor in this case!
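One client-side way to approach the "connection is live but nothing arrives" case is a silence watchdog: remember when the last event arrived and treat a long gap as a possible stall. The sketch below is only an illustration under assumptions: `StallWatchdog` and its timeout are hypothetical, and an idle workspace is also silent, so the reaction to a stall should be cheap (for example, reconnect and replay from the last cursor, here C).

```typescript
// Tracks how long it has been since the last subscription event.
// The clock is injectable so the logic can be tested without waiting.
class StallWatchdog {
  private readonly timeoutMs: number;
  private readonly now: () => number;
  private lastEventAt: number;

  constructor(timeoutMs: number, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastEventAt = now();
  }

  // Call this from the subscription's event handler.
  recordEvent(): void {
    this.lastEventAt = this.now();
  }

  // True when no event has arrived for longer than the timeout.
  isStalled(): boolean {
    return this.now() - this.lastEventAt > this.timeoutMs;
  }
}

// Example with a fake clock (milliseconds).
let fakeTime = 0;
const watchdog = new StallWatchdog(5000, () => fakeTime);
fakeTime = 3000;
watchdog.recordEvent(); // event C arrives
fakeTime = 7000;
console.log(watchdog.isStalled()); // false: 4000 ms since C
fakeTime = 9000;
console.log(watchdog.isStalled()); // true: 6000 ms since C
```

In an app, `isStalled()` could be polled on a timer, and a stall would trigger a resubscribe plus a cursor-based catch-up rather than waiting for the socket to close.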

I get a CORS error when I try to fetch elements from the https://elementary.apps.us.bluescape.com/graphql endpoint.

The CORS problem is interesting. Can you try the simple curl command to see if it works?

curl --location --request POST 'https://elementary.apps.us.bluescape.com/graphql' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <yourToken>' \
--data '{"query":"query getAllElements($workspaceId: String!){\n    elements(workspaceId: $workspaceId ) {\n        type: __typename        \n        id\n    }\n}","variables":{"workspaceId":"<yourWorkspaceId>"}}'

Do we have this WS endpoint as well? wss://elementary.apps.us.bluescape.com/v3/graphql

Yes, this should also work for sockets:
'wss://elementary.apps.us.bluescape.com/graphql'

Today the subscription to events from wss://api.apps.us.bluescape.com/v3/graphql is very fast; I don’t see any loss or slowness.

That’s great news! But I would still suggest switching over to the elementary endpoint to reduce the complexity of going through the federated endpoint.

The problem is that as long as the WebSocket connection is not closed, there is no way for me to detect that I’m not getting any updates anymore.

Ahh, yes, I see the problem now. Let me investigate this to see if there is a solution for when an event gets stalled/lost and the connection is never dropped.

When I run this command in a terminal, it successfully returns the elements of the workspace.

I’m not sure rawHistory is a good option. First of all, data is an undocumented and unstructured object, and since I use GraphQL codegen to generate types for TS, it’s not a proper API for my use case.

Since yesterday, HistoryCommand has been working fast, so I cannot compare HistoryCommand and rawHistory.

Yes, I agree. I was only suggesting this to help troubleshoot what could be causing the subscription lag.

When I run this command in a terminal, it successfully returns the elements of the workspace.

Does this mean you are no longer getting the CORS errors? Or have you just decided to stick with the federated endpoint?

I still get the CORS error, and I can successfully run the curl command.
CORS is a security feature for browsers, and curl is not a browser; we use it as an HTTP client.
curl has no idea of the context of the command, so CORS has no meaning on a terminal command line.

When I open https://mydomain.com and try to connect to Bluescape, the browser refuses to proceed and returns a CORS error because of restrictive headers such as Access-Control-Allow-Origin.

There is a misconfiguration in the elementary endpoint if it’s a public API.

Oh yes, sorry - I’m used to thinking about APIs running on a server application, where CORS doesn’t apply. You are right that a web app would be impacted.

I’ll have to check to see if we can change the header to allow the elementary endpoint to be called from a browser without the CORS error.

Is the federated endpoint still performing well for you? Was the slow performance something that happened just the other day?

It’s better than the first day, but not very stable.

Hi @Amerehei,

Thanks for the update. We are investigating on our end to see what can be done to improve performance. Stay tuned!

Hi @Amerehei,

We have made some changes that should improve GraphQL performance, which will be in our November 9th release.

You should hopefully see improved performance after the release, and if you don’t please let us know so we can keep digging!

@kkoechley I think the GraphQL subscription is not reliable yet. Sometimes it takes about 10 seconds to get events.

Sometimes a mutation operation is slow as well. For example, this is a mutation operation to move an object in the workspace; it took 7 seconds.

Hi @Amerehei,

Thanks for the update. The update tomorrow should help, it will be good to verify the results after Friday.