Note: this is a Proof of Concept
I thought it would be a good idea to present this mock pull-request to provide insight and provoke discussion on the capability/design/implementation of GraphQL with Cylc, before the looming architectural decisions in the next workshop.
I provided an informal presentation to Bruno & Hilary a month ago, but was tied up with operations work until last week, so have only now found time to do this. It is a work in progress; happy to have more input.
GraphQL is agnostic to the server implementation. However, as CherryPy does not have support for GraphQL/Graphene (a Python implementation of GraphQL), and Tornado may only recently have added support via a 3rd party, I chose Flask with its Flask-GraphQL extension (which includes the GraphiQL interface).
I paired this with gevent for the purposes of this PoC: it is at least as performant as Tornado, but easier to implement, leaving less of a footprint on Cylc's code. It also features web-socket capability, and has been proven to work with subscription-type GraphQL queries (although only HTTP is implemented in this branch so far).
The old REST endpoints remain in place, and I haven't migrated anything (gui or httpclient.py) to use this. I focused on the most important feature: mapping the tree to n depth, with node data, in one query, which will satisfy our requirement for a data-driven web GUI (as @oliver-sanders outlined). The three main files where all the magic happens are network/schema.py (the GraphQL resource definitions), scheduler.py (the resolver filter functions), and state_summary_mgr.py (the data cache).
I'll run through some examples, and encourage you to have a play! :smiley:
The extra requirements:
Python:Flask-GraphQL (any).......................................................FOUND (?)
Python:Flask (any)...........................................................FOUND (1.0.2)
Python:Flask-HTTPAuth (any)..................................................FOUND (3.2.4)
Python:gevent (any)..........................................................FOUND (1.3.6)
Python:graphene (any)........................................................FOUND (2.1.3)
If you start a suite and visit the endpoint '/graphql' from your browser, e.g.:
'https://niwa-35595lvm.niwa.local:43005/graphql' (using cylc:passphrase credentials)
you'll be presented with an interface to discover and query your suite. Enter the following query, or start typing it in (there's auto-complete and drop-down info available); you can include or exclude as many fields as you like:
{
globalInfo {
suite
owner
host
title
description
url
group
reloading
lastUpdated
status
runMode
newestRunaheadCyclePoint
newestCyclePoint
oldestCyclePoint
stateTotals{
held
queued
ready
waiting
submitted
submitFailed
submitRetrying
succeeded
failed
retrying
expired
running
runahead
}
treeDepth
}
}
Then run it (Ctrl+Enter) and you'll see the result:

Although you can always use curl :wink:
curl -v -s -u cylc:$(cat /home/sutherlanddw/cylc-run/baz/.service/passphrase) --digest --cookie-jar cookietemp --anyauth --insecure --header "Content-Type: application/graphql" --data 'query{allTasks{edges{node{name label state}}}}' 'https://niwa-35595lvm.niwa.local:43005/graphql'
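For scripting, the same request can be made from Python. The sketch below uses only the standard library and mirrors the curl flags (digest auth, `--insecure`). It sends a JSON body rather than `application/graphql`, which assumes the endpoint accepts the usual `{"query": ..., "variables": ...}` JSON form (Flask-GraphQL generally does, but this hasn't been checked against this branch). The function names are illustrative only.

```python
import json
import ssl
import urllib.request


def build_graphql_payload(query, variables=None):
    """Return the JSON body for a GraphQL POST request."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return json.dumps(payload).encode("utf-8")


def post_graphql(url, user, passphrase, query, variables=None):
    """POST a query using HTTP digest auth, skipping TLS checks like curl --insecure."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, passphrase)
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr),
        urllib.request.HTTPSHandler(context=context),
    )
    request = urllib.request.Request(
        url,
        data=build_graphql_payload(query, variables),
        headers={"Content-Type": "application/json"},
    )
    with opener.open(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be along the lines of `post_graphql("https://<host>:<port>/graphql", "cylc", passphrase, "{allTasks{edges{node{name label state}}}}")`, with the passphrase read from the suite's `.service/passphrase` file as in the curl example.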
The documentation is very useful for discovering all the available data and filter fields.
Now, we can also use a flat structure where we query all the tasks. I've added some filters in addition to the usual task.point:state id: you can include a list of these items, or use the converse exid & exitems; there is also a states list and depth (aka node_depth, ranging from zero to the specified value):
{
allTasks(id: "[fqb]*.2017*", states: ["succeeded","waiting"], depth: 2) {
edges {
node {
id
name
label
state
title
description
URL
spawned
submittedTime
startedTime
finishedTime
meanElapsedTime
host
jobHosts{
submitNum
jobHost
}
outputs {
submitted
submitFailed
started
failed
succeeded
expired
}
nodeDepth
}
}
}
}
{
"data": {
"allTasks": {
"edges": [
{
"node": {
"id": "UUxUYXNrOmJhYS4yMDE3MDIwMVQwMDAwKzEz",
"name": "baa",
"label": "20170201T0000+13",
"state": "succeeded",
"title": "",
"description": "some task baa",
"URL": "",
"spawned": true,
"submittedTime": null,
"startedTime": null,
"finishedTime": null,
"meanElapsedTime": 10,
"host": "localhost",
"jobHosts": [
{
"submitNum": 1,
"jobHost": "niwa-35595lvm.niwa.local"
}
],
"outputs": {
"submitted": true,
"submitFailed": false,
"started": true,
"failed": false,
"succeeded": true,
"expired": false
},
"nodeDepth": 1
}
},
{
"node": {
"id": "UUxUYXNrOnF1eC4yMDE3MDEwMVQwMDAwKzEz",
"name": "qux",
"label": "20170101T0000+13",
"state": "succeeded",
"title": "Some Top family",
"description": "some task qux",
"URL": "",
"spawned": true,
"submittedTime": null,
"startedTime": null,
"finishedTime": null,
"meanElapsedTime": 20,
"host": "localhost",
"jobHosts": [
{
"submitNum": 1,
"jobHost": "niwa-35595lvm.niwa.local"
}
],
"outputs": {
"submitted": true,
"submitFailed": false,
"started": true,
"failed": false,
"succeeded": true,
"expired": false
},
"nodeDepth": 2
}
},
{
"node": {
"id": "UUxUYXNrOmJhYS4yMDE3MDEwMVQwMDAwKzEz",
"name": "baa",
"label": "20170101T0000+13",
"state": "succeeded",
"title": "",
"description": "some task baa",
"URL": "",
"spawned": true,
"submittedTime": null,
"startedTime": null,
"finishedTime": null,
"meanElapsedTime": 10,
"host": "localhost",
"jobHosts": [
{
"submitNum": 1,
"jobHost": "niwa-35595lvm.niwa.local"
}
],
"outputs": {
"submitted": true,
"submitFailed": false,
"started": true,
"failed": false,
"succeeded": true,
"expired": false
},
"nodeDepth": 1
}
}
]
}
}
}
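A rough sketch of what the id/exid, states and depth filters amount to, written over plain dicts rather than the branch's actual resolvers in scheduler.py. The function and parameter names are hypothetical; `ids`/`exids` stand in for the id/items and exid/exitems arguments, matched as globs against "name.point".

```python
from fnmatch import fnmatchcase


def filter_tasks(tasks, ids=None, exids=None, states=None, depth=None):
    """Filter task dicts by id globs, excluded id globs, states and depth.

    Each task is assumed to be a dict with 'name', 'label' (cycle point),
    'state' and 'nodeDepth' keys, as in the result above.
    """
    selected = []
    for task in tasks:
        task_id = "%s.%s" % (task["name"], task["label"])
        # Keep only tasks matching at least one include glob.
        if ids and not any(fnmatchcase(task_id, glob) for glob in ids):
            continue
        # Drop tasks matching any exclude glob.
        if exids and any(fnmatchcase(task_id, glob) for glob in exids):
            continue
        if states and task["state"] not in states:
            continue
        # Depth ranges from zero up to the requested value.
        if depth is not None and task["nodeDepth"] > depth:
            continue
        selected.append(task)
    return selected
```

With `ids=["[fqb]*.2017*"]`, `states=["succeeded", "waiting"]` and `depth=2`, a succeeded `foo` at depth 4 would be dropped, as would any task not starting with f, q or b.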
{
allTasks(states: ["succeeded", "waiting"], first: 2, after: "YXJyYXljb25uZWN0aW9uOjA=") {
edges {
node {
name
label
state
title
nodeDepth
}
}
pageInfo{
hasPreviousPage
hasNextPage
startCursor
endCursor
}
}
}
{
"data": {
"allTasks": {
"edges": [
{
"node": {
"name": "baa",
"label": "20170201T0000+13",
"state": "succeeded",
"title": "",
"nodeDepth": 1
}
},
{
"node": {
"name": "foo",
"label": "20170101T0000+13",
"state": "succeeded",
"title": "Some Top family",
"nodeDepth": 4
}
}
],
"pageInfo": {
"hasPreviousPage": false,
"hasNextPage": true,
"startCursor": "YXJyYXljb25uZWN0aW9uOjE=",
"endCursor": "YXJyYXljb25uZWN0aW9uOjI="
}
}
}
}
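The cursors here are base64-encoded offsets: `YXJyYXljb25uZWN0aW9uOjA=` decodes to `arrayconnection:0`, the format graphql-relay uses for list connections. A minimal, forward-only sketch of slicing a list with `first`/`after` in that style (function names hypothetical):

```python
import base64

PREFIX = "arrayconnection:"


def offset_to_cursor(offset):
    """Encode a list offset as an opaque cursor string."""
    return base64.b64encode((PREFIX + str(offset)).encode()).decode()


def cursor_to_offset(cursor):
    """Decode a cursor back into a list offset."""
    return int(base64.b64decode(cursor).decode()[len(PREFIX):])


def paginate(items, first, after=None):
    """Return (edges, page_info) for up to `first` items after `after`."""
    start = cursor_to_offset(after) + 1 if after else 0
    end = min(start + first, len(items))
    edges = [
        {"cursor": offset_to_cursor(i), "node": items[i]}
        for i in range(start, end)
    ]
    page_info = {
        # Forward pagination does not look backwards, matching the
        # hasPreviousPage: false in the result above.
        "hasPreviousPage": False,
        "hasNextPage": end < len(items),
        "startCursor": edges[0]["cursor"] if edges else None,
        "endCursor": edges[-1]["cursor"] if edges else None,
    }
    return edges, page_info
```

Because the cursor is just an encoded offset, the same scheme could be reused if we end up implementing our own pagination without Relay.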
query allFamilies($vstates: [String]){
allFamilies(states: $vstates ){
edges{
node{
name
label
tasks(states: $vstates) {
edges {
node {
name
label
state
}
}
}
families(states: $vstates) {
edges {
node {
name
state
}
}
}
parents{
edges{
node{
name
}
}
}
}
}
}
}
{
"vstates": ["held", "succeeded"]
}
query allFamilies($vstates: [String], $ndepth: Int){
allFamilies(states: $vstates, depth: $ndepth, items: ["root.*"]){
edges{
node{
name
label
tasks(states: $vstates, depth: $ndepth) {
edges {
node {
name
state
}
}
}
families(states: $vstates, depth: $ndepth) {
edges {
node {
name
state
tasks(states: $vstates, depth: $ndepth) {
edges {
node {
name
state
}
}
}
families(states: $vstates, depth: $ndepth) {
edges {
node {
name
state
tasks(states: $vstates) {
edges {
node {
name
state
}
}
}
families(states: $vstates, depth: $ndepth) {
edges {
node {
name
state
tasks(states: $vstates, depth: $ndepth) {
edges {
node {
name
state
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
{
"vstates": ["held", "succeeded", "waiting"],
"ndepth": 4
}
The query could simply be templated out from the maximum tree depth (treeDepth) given in the globalInfo query. BTW, it doesn't matter how deep your query is; the data will fill it out as far as it can (according to the filters or whatever else applies).
So, that's where I'm at, but there's no reason why we can't push forward with the GraphQL development. There's still a lot I haven't tried yet, but this alone is enough to convince me of its utility...
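A sketch of that templating: build the nested families query programmatically for a given depth, threading the `$vstates`/`$ndepth` variables through each level. The function name is hypothetical, and whether to pass treeDepth itself or treeDepth minus one is an assumption to check against the data.

```python
def build_families_query(depth, indent="  "):
    """Return an allFamilies query with families nested `depth` levels deep."""

    def level(remaining, pad):
        # Every level asks for its own fields plus its tasks.
        lines = [
            pad + "name",
            pad + "state",
            pad + "tasks(states: $vstates, depth: $ndepth) {",
            pad + indent + "edges { node { name state } }",
            pad + "}",
        ]
        # Recurse into child families while depth remains.
        if remaining > 0:
            lines.append(pad + "families(states: $vstates, depth: $ndepth) {")
            lines.append(pad + indent + "edges { node {")
            lines.extend(level(remaining - 1, pad + indent * 2))
            lines.append(pad + indent + "} }")
            lines.append(pad + "}")
        return lines

    body = "\n".join(level(depth, indent * 3))
    return (
        "query allFamilies($vstates: [String], $ndepth: Int) {\n"
        + indent + 'allFamilies(states: $vstates, depth: $ndepth, items: ["root.*"]) {\n'
        + indent * 2 + "edges { node {\n"
        + body + "\n"
        + indent * 2 + "} }\n"
        + indent + "}\n"
        + "}"
    )
```

The resulting string would be sent along with variables such as `{"vstates": ["held", "succeeded", "waiting"], "ndepth": 4}`.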
This is a re-posting of the pull request:
https://github.com/cylc/cylc/pull/2873
@dwsutherland, I'm having to learn a bit more about Relay in order to get it working with Vue. Looking at other Vue projects, it looks like most devs are adopting Vue + Apollo.
Q1/ do you know if the only part that we are using of Relay is pagination?
Q2/ do you think it would be too much to implement pagination for Apollo instead?
The reasons for Q2 are it being framework agnostic, and also the better integration with Vue (Vue + Relay might work, I just need more time, but that project is maintained by 1 dev I think...).
I think Graphene offers easy support for Relay, which is nice. But if that's not too complicated to replace.... :grimacing:
As we will likely use Tornado, we can combine
In the Tornado branch I also had to change scheduler.py due to the way Tornado's main loop works, so that would have to be replicated (and tested) with GraphQL.
Tornado + GraphQL seems pretty simple, and quite similar to Flask + GraphQL, I think.
@dwsutherland it worked! :tada:

Server with Tornado + graphene. Of course, querying won't return anything as there's nothing in scheduler.py. But if we go with the approach you suggested (i.e. Python 3 => Tornado), then perhaps it would just be a matter of you replicating some changes from your Flask branch onto the new work.
Minor note: the packaging might be useful here... tornado has 2 dependencies that I added manually. But tornado-graphql has another three that would have to be added too... adding these dependencies manually is starting to look a bit weird.
As we will likely use Tornado, we can combine ... the flask branch; the tornado branch; ....
@dwsutherland @kinow - just to note (for the record on this PR Issue): as discussed, we don't know yet if GraphQL will be needed or used in the suite server program, as opposed to the new UI Server component... so if the aforementioned combining is done it will be to investigate the technologies further, but not intended for merge. (And this PR should be closed and moved to an "exploratory" Issue with a link to the dev branch.)
I think that's what @dwsutherland did, @hjoliver.
@kinow - you're right - apologies, my mistake! (The perils of checking in too late at night...). Amending previous comment...
Not a problem. I did double check as I was replying in early morning too, pre coffee.
And, regarding "so if the aforementioned combining is done it will be to investigate the technologies further but not intended for merge":
Well noted. And the changes that I did for Tornado's main loop in scheduler.py have not been well tested. Definitely not ready for merge.
Yes to both Q1 and Q2 (but the proof is "in the pudding", so to speak). I guess it's just Vue that's the showstopper for using the Relay-compliant endpoint. But Apollo claims to work with any endpoint:
https://github.com/apollographql/apollo-client
Universally compatible, so that Apollo works with any build setup, any GraphQL server, and any GraphQL schema.
We may need to just implement our own pagination and cursors.
Well done on the Tornado GraphQL-endpoint implementation!
I've dropped the use of Relay (same branch) for ease of use with Apollo-Vue (it can be put back in place easily).
I've also tidied up the resolvers, so the Task/Family queries/Types all use the same function (at top of schema).
There’s one issue I’m trying to get my head around: meta data (suite/family/task) has a mix of predefined and custom/arbitrarily defined fields, so you cannot specify them in general for suites in the schema definition (if it were an individual suite it might be possible to do it on start, but not on reload, so that's not desirable). So we have a few options:
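1. A single JSON blob for all meta:
meta = graphene.JSONString()
2. A meta object type with the predefined fields, plus a JSON blob for the custom fields:
from graphene.types.resolver import dict_resolver

class QLMeta(graphene.ObjectType):
    """Meta data fields, and custom fields JSON blob"""
    class Meta:
        default_resolver = dict_resolver
    title = graphene.String(default_value=None)
    description = graphene.String(default_value=None)
    group = graphene.String(default_value=None)
    URL = graphene.String(default_value=None)
    custom = graphene.JSONString()
This would just mean an extra level in the data, and a json.dump of all fields that aren’t default.
3. A list of key/value pairs:
[{"key": "title", "value": "some suite"}, {"key": …]
4. The meta object type as a field, with the custom fields as a separate JSON blob alongside it:
meta = graphene.Field(QLMeta)
custom_meta = graphene.JSONString()
This would just reduce the depth while leaving meta more granular.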
Perhaps start with 1 or 4.
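For options 2 and 4, the data side is just a split of the raw meta dict into the predefined fields and everything else. A minimal sketch (function name hypothetical; the predefined field list is taken from QLMeta above), leaving serialisation of the custom part to whichever JSON scalar the schema uses:

```python
# Predefined meta fields, as listed in QLMeta.
PREDEFINED = ("title", "description", "group", "URL")


def split_meta(meta):
    """Split a raw meta dict into predefined fields plus a custom dict."""
    result = {key: meta.get(key) for key in PREDEFINED}
    # Anything not predefined is user-defined and goes in the blob.
    result["custom"] = {k: v for k, v in meta.items() if k not in PREDEFINED}
    return result
```

Missing predefined fields come back as None, mirroring the `default_value=None` on the QLMeta fields.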
I don’t think [meta]group has a corresponding [[[meta]]]group under runtime (but could include it as predefined for both I suppose).
There are other fields ([[[environment]]], [[[directives]]] ..etc) that more easily fit into the JSON blob category, if they are desired at the gui.
BTW - WRT the QL in front of the schema types; I can drop them and just use schema.Task externally for the sake of namespace (it was just an easy way to recognize what they were at a glance).
Some quick comments (with an old dinosaur hat on):
[meta]group should probably be retired? It was used to group suites in gscan but is not widely used. These days, suites can be registered with a directory hierarchy, so the functionality of group is less obvious.
[[[environment]]] and [[[directives]]] both need to be ordered. (E.g. an environment variable setting may reference another environment variable defined earlier. If our configuration file format has native support for lists, these settings should probably be defined as lists.)
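To illustrate why order matters: if environment settings are carried as an ordered list of (name, value) pairs, each value can be expanded against the variables defined before it. A minimal sketch using `string.Template` (function name hypothetical; real job scripts do this in the shell, not in Python):

```python
from string import Template


def resolve_environment(settings):
    """Expand an ordered list of (name, value) pairs in definition order.

    Each value may refer to variables defined earlier in the list.
    """
    env = {}
    for name, value in settings:
        # safe_substitute leaves references to undefined variables untouched.
        env[name] = Template(value).safe_substitute(env)
    return env
```

`[("A", "1"), ("B", "${A}2")]` gives `B == "12"`, whereas the reverse order leaves `B` as the unexpanded `"${A}2"`. A JSON object gives no such ordering guarantee, which is an argument for exposing these as lists.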
So we have a few options
If I understand it correctly option 2 looks superior to me.
We ought to be able to pull the default fields individually (as we are likely to want to use this data in the GUI); if users want custom fields I see no harm in giving them the whole dictionary. I doubt anyone is likely to use the API for this anyway.
[meta]group should probably be retired?
Been a long time coming, but I've made a lot of data structure changes (most satisfy recommendations), and now need more feedback/review:
(Note: all the nomenclature is up for change/review, i.e. if you don't like the use of proxy for task cycle point instance)
Task-Job Separation
This is actually a pseudo separation, although a true separation may be desirable in the future, and it involved:
job_pool.py, to store and manage job data objects.
The full-field query result being:
{
"data": {
"jobs": [
{
"id": "20170101T0000+13/baa/01",
"batchSysJobId": "3338",
"batchSysName": "background",
"batchSysConf": {},
"directives": {},
"environment": {
"GREETING": "Hello from baa!"
},
"envScript": "echo \"Hi first, I'm second\"",
"errScript": "echo 'Boo!'",
"exitScript": "echo 'Yay!'",
"extraLogs": [
"/home/sutherlander/startrek/captains.log"
],
"executionTimeLimit": null,
"finishedTime": 1551516906,
"finishedTimeString": "2019-03-02T21:55:06+13:00",
"host": "localhost",
"initScript": "echo 'Me first'",
"jobLogDir": "/home/sutherlander/cylc-run/baz/log/job/20170101T0000+13/baa/01",
"owner": null,
"paramEnvTmpl": {},
"paramVar": {},
"postScript": "sleep 10",
"preScript": "sleep 10",
"script": "sleep 10; echo \"$GREETING\"",
"shell": "/bin/bash",
"startedTime": 1551516876,
"startedTimeString": "2019-03-02T21:54:36+13:00",
"state": "succeeded",
"submitNum": 1,
"submittedTime": 1551516876,
"submittedTimeString": "2019-03-02T21:54:36+13:00",
"workSubDir": null,
"taskProxy": {
"id": "baa.20170101T0000+13"
}
}
]
}
}
Of course you'd query from the task:
{
taskProxies(id: "baa.*") {
id
jobs {
id
state
submitNum
}
}
}
Result:
{
"data": {
"taskProxies": [
{
"id": "baa.20170201T0000+13",
"jobs": []
},
{
"id": "baa.20170101T0000+13",
"jobs": [
{
"id": "20170101T0000+13/baa/01",
"state": "succeeded",
"submitNum": 1
},
{
"id": "20170101T0000+13/baa/02",
"state": "failed",
"submitNum": 2
},
{
"id": "20170101T0000+13/baa/03",
"state": "running",
"submitNum": 3
}
]
}
]
}
}
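A minimal sketch of the kind of store job_pool.py implies, with job ids of the form "point/name/NN" and a back-reference to the task proxy id "name.point" as in the results above. The class and method names are hypothetical; this is not the branch's implementation.

```python
from collections import defaultdict


class JobPool:
    """Minimal in-memory store of job data, keyed by job id."""

    def __init__(self):
        self.jobs = {}
        self.task_jobs = defaultdict(list)  # task proxy id -> job ids

    @staticmethod
    def job_id(point, name, submit_num):
        return "%s/%s/%02d" % (point, name, submit_num)

    def add_job(self, point, name, submit_num, state):
        job_id = self.job_id(point, name, submit_num)
        task_id = "%s.%s" % (name, point)
        self.jobs[job_id] = {
            "id": job_id,
            "state": state,
            "submitNum": submit_num,
            "taskProxy": {"id": task_id},
        }
        self.task_jobs[task_id].append(job_id)
        return job_id

    def jobs_for_task(self, task_id):
        """Return the task's jobs in submission order ([] if none)."""
        return [self.jobs[j] for j in self.task_jobs.get(task_id, [])]
```

Resolving `taskProxies { jobs }` then becomes a lookup by task proxy id, and a task with no submissions (like baa.20170201T0000+13 above) simply yields an empty list.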
Perhaps if the job objects were created prior to run, then they could be directly modified via the web gui (say for trigger-edit), and job script created from the object (perhaps with a true job-task separation).
Separation of Task/Family to definition & proxy/instance
This is to reduce the duplication of information, and to distinguish between the abstract task/family and its cycle point instance/proxy...
state_summary_mgr.py
Query:
{
tasks(id: "bar") {
meta {
title
description
URL
userDefined
}
proxies{
id
jobs {
id
state
submitNum
}
prerequisites {
expression
conditions{
taskId
exprAlias
reqState
satisfied
message
taskProxy{
state
}
}
satisfied
cyclePoints
}
}
}
}
Result
{
"data": {
"tasks": [
{
"meta": {
"title": "Some Top family",
"description": "some task bar",
"URL": "https://github.com/dwsutherland/cylc",
"userDefined": {
"importance": "Critical",
"alerts": "none"
}
},
"proxies": [
{
"id": "bar.20180101T0000+13",
"jobs": [
{
"id": "20180101T0000+13/bar/01",
"state": null,
"submitNum": 1
}
],
"prerequisites": [
{
"expression": "c0 | c1",
"conditions": [
{
"taskId": "foo.20180101T0000+13",
"exprAlias": "c0",
"reqState": "succeeded",
"satisfied": true,
"message": "unsatisfied",
"taskProxy": {
"state": "running"
}
},
{
"taskId": "qux.20180101T0000+13",
"exprAlias": "c1",
"reqState": "succeeded",
"satisfied": true,
"message": "satisfied naturally",
"taskProxy": {
"state": "succeeded"
}
}
],
"satisfied": true,
"cyclePoints": [
"20180101T0000+13"
]
},
{
"expression": "c0",
"conditions": [
{
"taskId": "bar.20171201T0000+13",
"exprAlias": "c0",
"reqState": "succeeded",
"satisfied": true,
"message": "satisfied naturally",
"taskProxy": {
"state": "succeeded"
}
}
],
"satisfied": true,
"cyclePoints": [
"20171201T0000+13"
]
}
]
},
{
"id": "bar.20171201T0000+13",
"jobs": [
.
.
.
So a query like:
{
taskProxies {
id
state
namespace
prerequisites {
conditions {
taskId
}
}
}
}
Would give you all the information required for the dependency graph.
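As a sketch of how a client might turn that result into graph edges: each prerequisite condition's taskId is an upstream node of the proxy that owns it. The function name is hypothetical, and it works on the plain JSON list under `data.taskProxies`.

```python
def dependency_edges(task_proxies):
    """Return sorted (upstream, downstream) edges from a taskProxies result."""
    edges = set()
    for proxy in task_proxies:
        for prereq in proxy.get("prerequisites", []):
            for condition in prereq.get("conditions", []):
                edges.add((condition["taskId"], proxy["id"]))
    return sorted(edges)
```

For bar.20180101T0000+13 in the earlier result this gives edges from foo, qux and bar.20171201T0000+13; the proxies' state and namespace fields supply node styling and grouping.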
The previously mentioned initial capabilities are still in place for the most part. There are other optimisations I've made, and obviously more to come. But next, and while waiting for review, I'll be working on:
import graphene


class StopSuite(graphene.Mutation):
    """Stop the suite."""

    class Arguments:
        stop_type = graphene.String(required=True)
        stop_item = graphene.String()
        stop_args = graphene.List(graphene.String)

    command_queued = graphene.Boolean()

    def mutate(self, info, stop_type, stop_item=None, stop_args=None):
        # Map the requested stop type onto the scheduler command name.
        if stop_type in ['now']:
            stop_cmd = 'stop_now'
        else:
            stop_cmd = 'set_stop_' + stop_type
        # Each listed argument becomes a boolean keyword flag.
        action = {key: True for key in (stop_args or [])}
        item = (stop_item,) if stop_item else ()
        schd = info.context.get('schd_obj')
        schd.command_queue.put((stop_cmd, item, action))
        return StopSuite(command_queued=True)


class Mutation(graphene.ObjectType):
    stop_suite = StopSuite.Field()
mutation {
stopSuite(stopType: "after_task", stopItem: "baa.20170101T0000+13"){
commandQueued
}
}
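To see what that mutation ends up queueing, the command-building part of `mutate` can be pulled out into a plain function. This is a hypothetical refactor for illustration, not something in the branch:

```python
def build_stop_command(stop_type, stop_item=None, stop_args=None):
    """Mirror StopSuite.mutate: return the (command, items, kwargs) tuple."""
    stop_cmd = "stop_now" if stop_type == "now" else "set_stop_" + stop_type
    items = (stop_item,) if stop_item else ()
    kwargs = {key: True for key in (stop_args or [])}
    return stop_cmd, items, kwargs
```

For the example above this yields `("set_stop_after_task", ("baa.20170101T0000+13",), {})`; keeping it separate from the graphene class makes it testable without a running scheduler.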
I'll make general improvements along the way, including documentation/descriptions on the objects (which are available via the endpoint), functionality, sophistication/features and approach (perhaps protobuf objects instead of GraphQL objects to hold the data).
The repo will shift to cylc/wip-graphql at some point soon, but it's still here.
@dwsutherland Cool! I'm going to have to read your comment in detail.
A first minor suggestion. Perhaps change submitMethodId to batchSysJobId? (To align with fields in job.status.)
Done. (updated above)
Hi @dwsutherland , I'm trying your branch flask-gevent-graphql, and as always I'm trying to run my all-time favourite etc/examples/tutorial/cycling/five/suite.rc.
I am running it with cylc run --no-detach --verbose --debug five as I normally do, but it failed due to a local variable temp used before assignment.
kinow@kinow-VirtualBox:~/Development/python/workspace/cylc$ cylc run --no-detach five
._.
| | The Cylc Suite Engine [7.8.1-25-gb22c27b]
._____._. ._| |_____. Copyright (C) 2008-2019 NIWA
| .___| | | | | .___| & British Crown (Met Office) & Contributors.
| !___| !_! | | !___. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____! This program comes with ABSOLUTELY NO WARRANTY;
.___! | see `cylc warranty`. It is free software, you
!_____! are welcome to redistribute it under certain
2019-03-04T00:31:11Z INFO - Suite server: url=http://kinow-VirtualBox:43082/ pid=29417
2019-03-04T00:31:11Z INFO - Run: (re)start=0 log=1
2019-03-04T00:31:11Z INFO - Cylc version: 7.8.1-25-gb22c27b
2019-03-04T00:31:11Z INFO - Run mode: live
2019-03-04T00:31:11Z INFO - Initial point: 20130808T0000Z
2019-03-04T00:31:11Z INFO - Final point: 20130812T0000Z
2019-03-04T00:31:11Z INFO - Cold Start 20130808T0000Z
2019-03-04T00:31:11Z INFO - [prep.20130808T0000Z] -submit-num=1, owner@host=kinow-VirtualBox
2019-03-04T00:31:11Z ERROR - local variable 'temp' referenced before assignment
Traceback (most recent call last):
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/scheduler.py", line 269, in start
self.run()
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/scheduler.py", line 1783, in run
has_updated = self.update_state_summary()
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/scheduler.py", line 1826, in update_state_summary
self.state_summary_mgr.update(self)
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/state_summary_mgr.py", line 79, in update
self._get_tasks_info(schd, parents_dict, ancestors_dict))
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/state_summary_mgr.py", line 332, in _get_tasks_info
prereq_list.append(prereq.api_dump())
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/prerequisite.py", line 261, in api_dump
expression = temp,
UnboundLocalError: local variable 'temp' referenced before assignment
2019-03-04T00:31:11Z ERROR - error caught: cleaning up before exit
2019-03-04T00:31:11Z INFO - Suite shutting down - ERROR: local variable 'temp' referenced before assignment
2019-03-04T00:31:11Z INFO - DONE
Traceback (most recent call last):
File "/home/kinow/Development/python/workspace/cylc/bin/cylc-run", line 25, in <module>
main(is_restart=False)
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/scheduler_cli.py", line 134, in main
scheduler.start()
File "/home/kinow/Development/python/workspace/cylc/lib/cylc/scheduler.py", line 300, in start
raise exc
UnboundLocalError: local variable 'temp' referenced before assignment
Mutations. And, as a sneak peek, I've added the all-in-one stop suite mutation:
Cool!
I've separated the network implementation and interface in the python3 branch so (with a small change) you could write a simple adapter to map the network endpoints onto your GraphQL layer which would save having to duplicate the schd.command_queue.put((stop_cmd,item,action)) type logic.
Something along the lines of:
-class SuiteRuntimeServer(ZMQServer):
- """Suite runtime service API facade exposed via zmq."""
+class SuiteRuntimeInterface:
+ """Suite runtime service API facade."""
API = 4 # cylc API version
@@ -652,7 +652,7 @@ class SuiteRuntimeServer(ZMQServer):
return (True, 'Command queued')
@authorise(Priv.CONTROL)
- @ZMQServer.expose
+ @expose # and so on for all the others
def trigger_tasks(self, items, back_out=False):
"""Trigger submission of task jobs where possible.
@@ -664,3 +664,17 @@ class SuiteRuntimeServer(ZMQServer):
self.schd.command_queue.put(
("trigger_tasks", (items,), {"back_out": back_out}))
return (True, 'Command queued')
+
+
+class SuiteRuntimeServer(ZMQServer, SuiteRuntimeInterface):
+
+ @staticmethod
+ def expose(fcn):
+ return ZMQServer.expose(fcn)
+
+
+class GraphQLAdapter(GraphQLServer, SuiteRuntimeInterface):
+
+ @staticmethod
+ def expose(fcn):
+ return GraphQLServer.expose(fcn)
+
+ # ...
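The same separation can also be sketched with composition instead of mixins: one transport-agnostic interface that queues commands, and a thin adapter that maps GraphQL mutation names onto it. All class and mapping names here are hypothetical, and this is a swapped-in design, not the diff above:

```python
import queue


class SuiteRuntimeInterface:
    """Transport-agnostic API: each endpoint just queues a scheduler command."""

    def __init__(self, command_queue):
        self.command_queue = command_queue

    def trigger_tasks(self, items, back_out=False):
        self.command_queue.put(
            ("trigger_tasks", (items,), {"back_out": back_out}))
        return (True, "Command queued")


class GraphQLAdapter:
    """Maps mutation names onto methods of the shared interface."""

    MUTATIONS = {"triggerTasks": "trigger_tasks"}

    def __init__(self, interface):
        self.interface = interface

    def mutate(self, name, **kwargs):
        method = getattr(self.interface, self.MUTATIONS[name])
        return method(**kwargs)


commands = queue.Queue()
adapter = GraphQLAdapter(SuiteRuntimeInterface(commands))
print(adapter.mutate("triggerTasks", items=["foo.1"]))  # (True, 'Command queued')
```

Either way, the `command_queue.put(...)` logic lives in one place, so the ZMQ and GraphQL layers cannot drift apart.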
Something to keep in mind over the whole "commandQueued" thing.
This is legacy from our old REST interface where we were restricted to a simple REQ-REP model.
Now that we are looking at using sockets for the HTTP and TCP interfaces, we can keep the socket open and trickle back data until the client loses interest, i.e.:
REQ - stop_suite, kill=True
REP - commandQueued
REP - commandSucceeded
I guess that's really more of a SUB-PUB model, but anyway this functionality would be really useful for the GUI (it could display a waiting symbol). The lack of this at present trips a lot of users up.
It's especially useful when commands fail: at present users are not informed of command failure (from where they issued the command) and have to look in the suite log to find out why it failed.
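A minimal sketch of that trickle-back idea as a Python generator: the caller gets "commandQueued" immediately, then a second update once the outcome is known. The status names, the `results` queue standing in for the scheduler reporting back, and the timeout are all hypothetical; in practice the generator would feed a websocket or GraphQL subscription.

```python
import queue


def run_command(command_queue, command, results, timeout=5.0):
    """Yield status updates for one command instead of a single reply.

    `results` receives True on success, or an error message on failure.
    """
    command_queue.put(command)
    yield "commandQueued"
    try:
        outcome = results.get(timeout=timeout)
    except queue.Empty:
        yield "commandTimedOut"
        return
    if outcome is True:
        yield "commandSucceeded"
    else:
        # Report the failure back to whoever issued the command.
        yield "commandFailed: %s" % outcome
```

Iterating it gives `["commandQueued", "commandSucceeded"]` for a successful command, and the failure reason appears in the stream rather than only in the suite log.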
@kinow - OK, I see the issue (it didn't fail for me): it needed to only include satisfied (like how it currently is). I've just put a fix in, which "should" fix it.
Yes, I thought about that while writing the mutations, but for the time being I stuck with how REST/HTTP was interacting with the suite.
With async and websockets it should be doable, I think (?). I'll wait until I re-implement with Tornado after your Python 3 merge. I may have to create/modify your ZeroMQ feed if I'm going to divorce it completely, but I might start with Tornado served alongside ZeroMQ and divorce it later; I have to think about it...
Modified the stopSuite mutation args and added some meta:
mutation {
stopSuite(stopType: "after_task", items: ["baa.20170201T0000+13"]){
commandQueued
}
}
mutation {
stopSuite(stopType: "cleanly", actions: {kill_active_tasks: false}) {
commandQueued
}
}
mutation {
putBroadcast(points: ["20170401T0000+13","20170301T0000+13"], namespaces: ["foo"],settings: [{environment: {GAME: "dangerous dave"}}]) {
modifiedSettings
badOptions
}
}
mutation {
clearBroadcast (points: ["20170401T0000+13"], namespaces: ["foo"],settings: [{environment: {GAME: "dangerous dave"}}]) {
modifiedSettings
badOptions
}
}
mutation {
expireBroadcast(cutoff: "20170301T0000+13") {
modifiedSettings
badOptions
}
}
query {
taskProxies(id: "foo"){
broadcasts
cyclePoint
}
}
{
"data": {
"taskProxies": [
{
"broadcasts": {
"environment": {
"GAME": "dangerous dave",
"GAME2": "dangerous dave2"
}
},
"cyclePoint": "20170301T0000+13"
},
{
"broadcasts": {
"environment": {
"GAME": "dangerous dave"
}
},
"cyclePoint": "20170201T0000+13"
},
{
"broadcasts": {
"environment": {
"GAME": "dangerous dave"
}
},
"cyclePoint": "20170101T0000+13"
}
]
}
}
At the moment the task proxy only updates when the state summary does (in the main loop, when the state or anything else changes), but that will change once I pull that proxy item out.
BTW - the cycle point arg in broadcast needs to be revisited in another ticket. Although it may be intentional for "*" to mean all current and future cycle points, it would be nice if we could use actual globs when putting and clearing broadcasts on a respective set of points. At the moment you have to use exact points (a glob in expire would be the largest/furthermost point).
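A sketch of what glob matching on points could look like, using `fnmatch` against the cycle points currently known to the suite (function name hypothetical). Note the design difference: a glob expanded this way only covers points that exist now, whereas the broadcast "*" also applies to future points.

```python
from fnmatch import fnmatchcase


def match_points(patterns, points):
    """Return the known cycle points matching any of the glob patterns."""
    return sorted(
        point for point in points
        if any(fnmatchcase(point, pattern) for pattern in patterns))
```

So `match_points(["2017*"], points)` would select every 2017 point present at the time of the put/clear, which could then be handed to the existing exact-point logic.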
@oliver-sanders - Nice! will force homogeneity also, to ensure they stay in sync (during dev at least).
@dwsutherland The broadcast API is quite different. It populates a data structure, but does not apply the settings right away. (The settings in the data structure are normally looked up on job submission. So, yes a * for cycle applies to all cycles, and a * for task applies to all tasks.)
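A rough sketch of that lookup-at-submission model: broadcasts live in a nested point -> namespace -> settings structure, with "*" matching any point or any task, and the applicable settings are merged only when a job is submitted. The function name is hypothetical, and the precedence order (exact point over "*", more specific namespace over more general) is an assumption, not a description of Cylc's actual rules.

```python
def get_broadcast(broadcasts, point, namespaces):
    """Merge the broadcast settings that apply to one task at submission.

    `namespaces` is the task's inheritance list, most general first
    (e.g. ["root", "FAM", "foo"]); later entries override earlier ones.
    """
    merged = {}
    for pt in ("*", point):
        for ns in ["*"] + list(namespaces):
            for key, value in broadcasts.get(pt, {}).get(ns, {}).items():
                if isinstance(value, dict):
                    # Merge sections such as [environment] key by key.
                    merged.setdefault(key, {}).update(value)
                else:
                    merged[key] = value
    return merged
```

Nothing in the pool is touched when a broadcast is put; a task only "sees" it when this kind of lookup runs for its next job.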
The other commands such as hold, release, trigger, reset, poll, kill, remove etc all search for tasks in the pool to apply action. (Note: insert is also an odd one...)
@dwsutherland I might have something else weird in my environment. Synced the repo, confirmed my branch is up to date, then a cylc run --no-detach --verbose --debug five starts and has a few errors
kinow@kinow-VirtualBox:~/Development/python/workspace/cylc$ cylc run --no-detach --verbose --debug five
2019-03-06T09:09:23+13:00 DEBUG - Loading site/user global config files
2019-03-06T09:09:23+13:00 DEBUG - Reading file /home/kinow/.cylc/global.rc
._.
| | The Cylc Suite Engine [7.8.1-26-g9b9b9]
._____._. ._| |_____. Copyright (C) 2008-2019 NIWA
| .___| | | | | .___| & British Crown (Met Office) & Contributors.
| !___| !_! | | !___. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____! This program comes with ABSOLUTELY NO WARRANTY;
.___! | see `cylc warranty`. It is free software, you
!_____! are welcome to redistribute it under certain
2019-03-06T09:09:23+13:00 DEBUG - creating suite run directory: /home/kinow/cylc-run/five
2019-03-06T09:09:23+13:00 DEBUG - creating suite log directory: /home/kinow/cylc-run/five/log/suite
2019-03-06T09:09:23+13:00 DEBUG - creating suite job log directory: /home/kinow/cylc-run/five/log/job
2019-03-06T09:09:23+13:00 DEBUG - creating suite config log directory: /home/kinow/cylc-run/five/log/suiterc
2019-03-06T09:09:23+13:00 DEBUG - creating suite work directory: /home/kinow/cylc-run/five/work
...
...
2019-03-05T20:09:24Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--', '/home/kinow/cylc-run/five/log/job', '20130808T0000Z/prep/01']
2019-03-05T20:09:24Z DEBUG - Performing suite health check
[2019-03-06 09:09:24,423] ERROR in app: Exception on /put_messages [POST]
Traceback (most recent call last):
File "/home/kinow/.local/lib/python2.7/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/kinow/.local/lib/python2.7/site-packages/flask/app.py", line 1816, in full_dispatch_request
return self.finalize_request(rv)
File "/home/kinow/.local/lib/python2.7/site-packages/flask/app.py", line 1833, in finalize_request
response = self.process_response(response)
File "/home/kinow/.local/lib/python2.7/site-packages/flask/app.py", line 2114, in process_response
self.session_interface.save_session(self, ctx.session, response)
File "/home/kinow/.local/lib/python2.7/site-packages/flask/sessions.py", line 384, in save_session
samesite=samesite
TypeError: set_cookie() got an unexpected keyword argument 'samesite'
127.0.0.1 - - [2019-03-06 09:09:24] "POST /put_messages HTTP/1.1" 500 412 0.003503
2019-03-05T20:09:24Z DEBUG - [jobs-submit cmd] cylc jobs-submit --debug --utc-mode -- /home/kinow/cylc-run/five/log/job 20130808T0000Z/prep/01
[jobs-submit ret_code] 0
[jobs-submit out]
[TASK JOB SUMMARY]2019-03-05T20:09:24Z|20130808T0000Z/prep/01|0|6608
[TASK JOB COMMAND]2019-03-05T20:09:24Z|20130808T0000Z/prep/01|[STDOUT] 6608
2019-03-05T20:09:24Z INFO - [prep.20130808T0000Z] -(current:ready) submitted at 2019-03-05T20:09:24Z
2019-03-05T20:09:24Z DEBUG - [prep.20130808T0000Z] -ready => submitted
...
...
(same error happens multiple times)
...
...
127.0.0.1 - - [2019-03-06 09:43:42] "POST /put_messages HTTP/1.1" 500 412 0.001593
2019-03-05T20:43:43Z DEBUG - Performing suite health check
2019-03-05T20:43:44Z DEBUG - Performing suite health check
2019-03-05T20:43:45Z DEBUG - Performing suite health check
2019-03-05T20:43:46Z DEBUG - Performing suite health check
2019-03-05T20:43:47Z DEBUG - Performing suite health check
2019-03-05T20:43:48Z DEBUG - Performing suite health check
2019-03-05T20:43:49Z DEBUG - Performing suite health check
2019-03-05T20:43:50Z DEBUG - Performing suite health check
2019-03-05T20:43:51Z DEBUG - Performing suite health check
2019-03-05T20:43:52Z DEBUG - Performing suite health check
2019-03-05T20:43:53Z DEBUG - Performing suite health check
(only this was printed for a long time, until I killed the suite)
Could you have another look, please?
@dwsutherland helped me on Riot, and now it's working. For the record, I upgraded my libraries until they looked like this:
==========================================================================================
Package (version requirements) Outcome (version found)
==========================================================================================
*REQUIRED SOFTWARE*
Python (2.6+, <3)............................FOUND & min. version MET (2.7.15.candidate.1)
*OPTIONAL SOFTWARE for the GUI & dependency graph visualisation*
Python:pygtk (2.0+)......................................FOUND & min. version MET (2.24.0)
graphviz (any)..............................................................FOUND (2.40.1)
Python:pygraphviz (any).....................................................FOUND (1.4rc1)
*OPTIONAL SOFTWARE for the HTTPS communications layer*
Python:Flask (any)...........................................................FOUND (1.0.2)
Python:graphene (any)........................................................FOUND (2.1.3)
Python:urllib3 (any)........................................................FOUND (1.24.1)
Python:gevent (any)..........................................................FOUND (1.3.7)
Python:Flask-GraphQL (any).......................................................FOUND (?)
Python:Flask-HTTPAuth (any)..................................................FOUND (3.2.4)
Python:OpenSSL (any)........................................................FOUND (19.0.0)
Python:requests (2.4.2+).................................FOUND & min. version MET (2.21.0)
*OPTIONAL SOFTWARE for the configuration templating*
Python:EmPy (any)............................................................NOT FOUND (-)
*OPTIONAL SOFTWARE for the HTML documentation*
Python:sphinx (1.5.3+)....................................FOUND & min. version MET (1.8.4)
==========================================================================================
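But the issue persisted. It went away after a pip install --upgrade flask-graphql, which upgraded Werkzeug from 0.12.2 to 0.14.1 and Jinja2 from 2.9.6 to 2.10 (the older Werkzeug appears to have been pinned by graphene-tornado, and Flask 1.0 passes samesite to set_cookie, which Werkzeug only accepts from 0.14 onwards):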
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Requirement already up-to-date: flask-graphql in /home/kinow/.local/lib/python2.7/site-packages (2.0.0)
Requirement already satisfied, skipping upgrade: graphql-core>=2.1 in /home/kinow/.local/lib/python2.7/site-packages (from flask-graphql) (2.1)
Requirement already satisfied, skipping upgrade: graphql-server-core>=1.1 in /home/kinow/.local/lib/python2.7/site-packages (from flask-graphql) (1.1.1)
Requirement already satisfied, skipping upgrade: flask>=0.7.0 in /home/kinow/.local/lib/python2.7/site-packages (from flask-graphql) (1.0.2)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/lib/python2.7/dist-packages (from graphql-core>=2.1->flask-graphql) (1.11.0)
Requirement already satisfied, skipping upgrade: rx>=1.6.0 in /home/kinow/.local/lib/python2.7/site-packages (from graphql-core>=2.1->flask-graphql) (1.6.1)
Requirement already satisfied, skipping upgrade: promise>=2.1 in /home/kinow/.local/lib/python2.7/site-packages (from graphql-core>=2.1->flask-graphql) (2.2.1)
Collecting Werkzeug>=0.14 (from flask>=0.7.0->flask-graphql)
Using cached https://files.pythonhosted.org/packages/20/c4/12e3e56473e52375aa29c4764e70d1b8f3efa6682bef8d0aae04fe335243/Werkzeug-0.14.1-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: click>=5.1 in /usr/lib/python2.7/dist-packages (from flask>=0.7.0->flask-graphql) (6.7)
Collecting Jinja2>=2.10 (from flask>=0.7.0->flask-graphql)
Using cached https://files.pythonhosted.org/packages/7f/ff/ae64bacdfc95f27a016a7bed8e8686763ba4d277a78ca76f32659220a731/Jinja2-2.10-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: itsdangerous>=0.24 in /home/kinow/.local/lib/python2.7/site-packages (from flask>=0.7.0->flask-graphql) (1.1.0)
Requirement already satisfied, skipping upgrade: typing>=3.6.4; python_version < "3.5" in /home/kinow/.local/lib/python2.7/site-packages (from promise>=2.1->graphql-core>=2.1->flask-graphql) (3.6.6)
Requirement already satisfied, skipping upgrade: MarkupSafe>=0.23 in /usr/lib/python2.7/dist-packages (from Jinja2>=2.10->flask>=0.7.0->flask-graphql) (1.0)
graphene-tornado 2.0.1 has requirement Jinja2==2.9.6, but you'll have jinja2 2.10 which is incompatible.
graphene-tornado 2.0.1 has requirement werkzeug==0.12.2, but you'll have werkzeug 0.14.1 which is incompatible.
Installing collected packages: Werkzeug, Jinja2
Found existing installation: Werkzeug 0.12.2
Uninstalling Werkzeug-0.12.2:
Successfully uninstalled Werkzeug-0.12.2
Found existing installation: Jinja2 2.9.6
Uninstalling Jinja2-2.9.6:
Successfully uninstalled Jinja2-2.9.6
Successfully installed Jinja2-2.10 Werkzeug-0.14.1
Thanks @dwsutherland ! Continuing with the tests now.
@kinow - BTW, I know a while back you requested that the fields be "non-null" so you didn't have to check whether they were null (presumably so that they simply aren't present when null?)...
That option (required=True or NonNull) works a little differently: it just means the field has to be populated/not-null on the server end, so if such a field is requested (even amongst others) but the resolving data object doesn't have it, the response will be an error...
https://docs.graphene-python.org/en/latest/types/list-and-nonnull/#nonnull
So if we have that on fields like meta, proxies and families, does that guarantee they will be at least an empty list/dictionary? I.e. is there any chance that elements which have child elements will be null?
Right now, when we display values in the UI, vue.js takes care of escaping characters and handling null, so I haven't had any issues so far. But if I start accessing the returned objects and one of them turns out to be null, then accessing something like taskA.namespace[0] would crash the app if .namespace was null.
Or is it guaranteed by GraphQL that they will always be empty data structures?
@kinow - We can set default_value for a graphene.List or GeneralScalar, but it's probably better to add this validation (non-null for both) so the data provision always returns the desired empty type ([] and {} respectively) and errors otherwise... Which puts the responsibility on the API developer to be consistent, instead of the client developer to jump through hoops.
https://graphql.org/learn/schema/#lists-and-non-null
That's perfect, @dwsutherland
> Lists work in a similar way: We can use a type modifier to mark a type as a List, which indicates that this field will return an array of that type. In the schema language, this is denoted by wrapping the type in square brackets, [ and ]. It works the same for arguments, where the validation step will expect an array for that value.
> The Non-Null and List modifiers can be combined. For example, you can have a List of Non-Null Strings:
So at least for the fields that are lists... having a guarantee that they will be a list, and never null, would already be extremely helpful.
> Which puts the responsibility on the API developer to be consistent, instead of the client developer to jump through hoops.
Nicely said! And in the end makes the app less likely to crash while used.
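To make the non-null behaviour discussed above concrete, here is a minimal, stdlib-only Python sketch of how a GraphQL executor treats a list field typed [String!]! (in graphene, graphene.NonNull(graphene.List(graphene.NonNull(graphene.String)))). The FieldError class, the complete_list and resolve_namespace helpers, and the task dictionaries are hypothetical illustrations, not graphene's or the PoC's code; a real executor also records the error and nulls the nearest nullable parent rather than simply raising.

```python
class FieldError(Exception):
    """Raised when a value breaks the field's non-null contract."""


def complete_list(value, list_non_null, item_non_null, field):
    """Mimic GraphQL value completion for a list-typed field.

    [String]   -> list_non_null=False, item_non_null=False
    [String!]! -> list_non_null=True,  item_non_null=True
    """
    if value is None:
        if list_non_null:
            # A null list in a non-null list field is an error.
            raise FieldError("Cannot return null for non-null field %s" % field)
        return None  # a nullable list may legitimately be null
    items = list(value)
    if item_non_null and any(item is None for item in items):
        raise FieldError("Null item in non-null list field %s" % field)
    return items


def resolve_namespace(task):
    """Hypothetical resolver: normalise a missing list to [] server-side."""
    value = task.get("namespace")
    return [] if value is None else value


# Hypothetical data: task_b has no namespace in the data store.
task_a = {"id": "foo", "namespace": ["foo", "FAM", "root"]}
task_b = {"id": "bar"}

print(complete_list(resolve_namespace(task_a), True, True, "namespace"))  # ['foo', 'FAM', 'root']
print(complete_list(resolve_namespace(task_b), True, True, "namespace"))  # []
```

Because the resolver guarantees [], the client can always iterate (or check the length before indexing) without a null check; if the data provision ever returned None, the request would fail loudly on the server side instead of handing the UI a null.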
Just tidied up the schema; centralised the query arguments, changed globalInfo to suiteInfo, added metadata to mutations and consolidated broadcast:

Update: I've added all the Mutations.
Some of the commands are redundant for the gui:
ping task - information available in taskProxies query (exists and running)
ping suite - information available in suiteInfo query.
remove_cycle - just remove task proxies with glob for name (taskActions).
put_message - put_messages does the same and more (no need for the back compat).
The task actions have been collated into one mutation, i.e.:

(note the use of aliases to use the same mutation multiple times in one request)
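As a rough illustration of the alias technique, the Python sketch below builds one request that calls the same mutation twice under different aliases. It assumes the collated mutation is exposed as taskActions (the name mentioned above); the argument names command and taskGlobs, the result sub-field and the build_aliased_mutation helper are hypothetical placeholders, since the PoC's actual signature is not shown here.

```python
import json


def build_aliased_mutation(actions):
    """Build one GraphQL document calling taskActions once per alias.

    `actions` is a list of (alias, command, task_globs) tuples. The argument
    names and the `result` sub-field are placeholders (hypothetical).
    """
    lines = []
    for alias, command, globs in actions:
        lines.append(
            "  %s: taskActions(command: %s, taskGlobs: %s) { result }"
            % (alias, json.dumps(command), json.dumps(globs))
        )
    return "mutation {\n%s\n}" % "\n".join(lines)


doc = build_aliased_mutation([
    ("holdFoo", "hold", ["foo.*"]),
    ("triggerBar", "trigger", ["bar.20170101T0000+13"]),
])
# A GraphQL-over-HTTP request usually carries the document in a JSON body.
body = json.dumps({"query": doc})
print(doc)
```

Without the aliases, selecting the same field twice with different arguments would fail GraphQL validation (the two selections conflict), so the aliases are what allow several actions per request; they also key the response, e.g. data.holdFoo and data.triggerBar.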
NEXT: Implementing for Cylc8 using Tornado web sockets, and add subscriptions.
This is very impressive & promising @dwsutherland, great work.
There is a point I feel compelled to raise, & will pre-register here but mainly hope to initiate some discussion on in today's video conference. Namely, we need to investigate the performance of the suggested data structure for a range of data-fetching instances that are realistic (or at least as realistic as we can mock at this stage) relative to real-life Cylc usage.
I think it is crucial we invest in optimising, with respect to what we foresee to be the most important scaling aspects, the query performance. If we otherwise decide upon & go ahead using a certain structure, & realise further down the line that response times are overly large for certain scenarios & therefore that amendments are needed, it will be a lot more difficult & expensive to change, given this will heavily influence development of both the back- & front-end. Also the GraphQL schema is intrinsically tied to the suite (or "workflow" as we now call it) status summary data, so this structure is central to our whole application. So, while what you have created here seems to work & looks very positive, & I imagine there is little to debate on the fields themselves, I think we need to be very careful to check the details of the design of the structuring i.e. the nesting, levels & grouping etc. of fields.
In particular, having done some initial research into GraphQL API performance considerations, I have observed that there are at least two key problems that can be run into & we need to ensure we avoid:
1) N+1 queries: a problem not occurring for REST, as outlined very well in numerous resources, but my favourites are blog posts: here & here.
2) Exponential query scaling cases: as outlined in a recent research paper which is summarised here & particularly in a less maths-heavy way here.
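A toy illustration of the exponential case: in the stdlib Python sketch below every node links to two others (a made-up cyclic graph standing in for parent/child links), so each extra level of nesting adds only a few characters of query text but roughly doubles the number of objects in the response. The LINKS graph and the resolve_depth and count_objects helpers are hypothetical.

```python
# Hypothetical three-node graph in which every node links to two others.
LINKS = {
    "a": ["b", "c"],
    "b": ["c", "a"],
    "c": ["a", "b"],
}


def resolve_depth(node_id, depth):
    """Expand `children { ... }` to `depth` levels, as a naive executor would.

    Each nested selection is resolved independently, so shared nodes are
    repeated in the response rather than deduplicated.
    """
    node = {"id": node_id}
    if depth > 0:
        node["children"] = [resolve_depth(c, depth - 1) for c in LINKS[node_id]]
    return node


def count_objects(tree):
    """Count objects in a response tree (the node plus all descendants)."""
    return 1 + sum(count_objects(child) for child in tree.get("children", []))


for depth in range(1, 6):
    print(depth, count_objects(resolve_depth("a", depth)))
# 1 3
# 2 7
# 3 15
# 4 31
# 5 63
```

Only three distinct nodes exist, yet a depth-5 query returns 63 objects (2^(d+1) - 1 in general), which is why limiting query depth or estimated cost may be worth considering for the schema's cyclic relationships.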
You or someone else may have already looked into performance, & if so can we hear more about your methodology & findings (via another comms channel if more appropriate)? If not, can we get started on some performance profiling/tracing? There seem to be a number of free load testing tools we can try, for example, but we should have some way to gather quantitative results for:
- response time (latency): server response speed per request;
- throughput: how many requests the server can handle in a set time interval.
Relating to the latter, we could consider using the utility DataLoader to reduce the number of requests & improve performance.
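To show what the N+1 problem and DataLoader-style batching look like in numbers, here is a stdlib-only Python sketch. BatchLoader is a deliberately simplified stand-in for the DataLoader idea (queue keys, fetch them in one batch, cache the results); it is not the API of the DataLoader utility or of any Python port, and CountingStore and the family/task records are hypothetical.

```python
class CountingStore:
    """Hypothetical data store that counts how many times it is queried."""

    def __init__(self, records):
        self.records = records
        self.calls = 0

    def get_many(self, ids):
        self.calls += 1
        return [self.records.get(i) for i in ids]


class BatchLoader:
    """Simplified DataLoader-style batching: queue keys, fetch once, cache."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.cache = {}
        self.queue = []

    def load(self, key):
        """Queue `key` and return a thunk to call after dispatch()."""
        if key not in self.cache and key not in self.queue:
            self.queue.append(key)
        return lambda: self.cache[key]

    def dispatch(self):
        """Fetch every queued key with a single batch_fn call."""
        if self.queue:
            keys, self.queue = self.queue, []
            for key, value in zip(keys, self.batch_fn(keys)):
                self.cache[key] = value


# Hypothetical family with 100 member task proxies.
tasks = {"t%d" % i: {"id": "t%d" % i, "state": "waiting"} for i in range(100)}
records = dict(tasks, FAM={"id": "FAM", "children": sorted(tasks)})

# Naive resolution: 1 query for the family + 1 per child (N+1).
naive = CountingStore(records)
fam = naive.get_many(["FAM"])[0]
naive_states = [naive.get_many([cid])[0]["state"] for cid in fam["children"]]

# Batched resolution: 1 query for the family + 1 batch for all children.
batched = CountingStore(records)
loader = BatchLoader(batched.get_many)
fam = batched.get_many(["FAM"])[0]
thunks = [loader.load(cid) for cid in fam["children"]]
loader.dispatch()
batched_states = [thunk()["state"] for thunk in thunks]

print(naive.calls, batched.calls)  # 101 2
```

In DataLoader itself the batch is scheduled automatically (keys requested within one tick of the event loop are collected together); here dispatch() is called by hand to keep the sketch synchronous.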
@sadielbartholomew - This does need to be looked at; excuse the brevity of my response (I'll read through it more thoroughly tomorrow).
> There is a point I feel compelled to raise, & will pre-register here but mainly hope to initiate some discussion on in today's video conference. Namely, we need to investigate the performance of the suggested data structure for a range of data-fetching instances that are realistic (or at least as realistic as we can mock at this stage) relative to real-life Cylc usage.
> Also the GraphQL schema is intrinsically tied to the suite (or "_workflow_" as we now call it) status summary data, so this structure is central to our whole application. So, while what you have created here seems to work & looks very positive, & I imagine there is little to debate on the fields themselves, I think we need to be very careful to check the details of the design of the structuring i.e. the nesting, levels & grouping etc. of fields.
> In particular, having done some initial research into GraphQL API performance considerations, I have observed that there are at least two key problems that can be run into & we need to ensure we avoid:
> - N+1 queries: a problem not occurring for REST, as outlined very well in numerous resources, but my favourites are blog posts: here & here.
> - Exponential query scaling cases: as outlined in a recent research paper which is summarised here & particularly in a less maths-heavy way here.
There's no actual _suggested data structure_. What I've done is define the relationships between data objects, and then showcase what can be done with these relationships.
The relationships are defined by adding IDs to a field in the parent object type, and fields are only resolved when asked for. Hence, the level of nesting calculated on the server is decided not by the code but by the query: if you want a completely flat structure, you can have one! If you want nesting to the extreme, that's possible! If you want to combine multiple queries to save on requests, easy!
So what we need now is to find the most efficient query structure for both the server and the front end, and of course a thorough review of the code/data-access efficiency in my resolvers.
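A minimal stdlib Python sketch of the relationship model described above, assuming a flat store in which parent objects hold child IDs (here a hypothetical childTasks field) rather than nested objects. The query's selection, not the code, decides how deep resolution goes: with no sub-selection only the IDs come back, and with one the references are followed. STORE, REFERENCE_FIELDS and resolve_selection are illustrative names, not the PoC's.

```python
# Hypothetical flat store: parents reference children by ID only.
STORE = {
    "FAM": {"id": "FAM", "state": "running", "childTasks": ["foo", "bar"]},
    "foo": {"id": "foo", "state": "succeeded", "childTasks": []},
    "bar": {"id": "bar", "state": "running", "childTasks": []},
}
REFERENCE_FIELDS = {"childTasks"}


def resolve_selection(node_id, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (return the raw value) or to a
    nested selection (follow the ID references and resolve recursively).
    Fields that are not requested are never looked at.
    """
    node = STORE[node_id]
    result = {}
    for field, sub_selection in selection.items():
        if field in REFERENCE_FIELDS and sub_selection is not None:
            result[field] = [resolve_selection(ref, sub_selection) for ref in node[field]]
        else:
            result[field] = node[field]
    return result


# Flat: ask for the IDs only.
flat = resolve_selection("FAM", {"id": None, "childTasks": None})
# Nested: follow the references one level down.
nested = resolve_selection("FAM", {"id": None, "childTasks": {"id": None, "state": None}})
print(flat)
print(nested)
```

The same store serves both shapes; only the selection changes, which mirrors the point that the client chooses between a flat and a nested response.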
> I think it is crucial we invest in optimising, with respect to what we foresee to be the most important scaling aspects, the query performance.
> If we otherwise decide upon & go ahead using a certain structure, & realise further down the line that response times are overly large for certain scenarios & therefore that amendments are needed, it will be a lot more difficult & expensive to change, given this will heavily influence development of both the back- & front-end.
> You or someone else may have already looked into performance, & if so can we hear more about your methodology & findings (via another comms channel if more appropriate)? If not, can we get started on some performance profiling/tracing? There seem to be a number of free load testing tools we can try, for example, but we should have some way to gather quantitative results for:
> - response time (latency): server response speed per request;
> - throughput: how many requests the server can handle in a set time interval.
> Relating to the latter, we could consider using the utility DataLoader to reduce the number of requests & improve performance.
I wasn't going to look into this until the schema model and design were settled (i.e. what we want from it), but you're more than welcome to start. We also need to add tests after the dust settles, as integration into the UI server is next.
Thanks!
(updated)
I've added in single node ID queries:
{
  job(id: "20170101T0000+13/foo/01") {
    id
    startedTime
    jobLogDir
  }
  task(id: "foo") {
    id
  }
  taskProxy(id: "foo.20170101T0000+13") {
    id
  }
  family(id: "FAM3") {
    id
  }
  familyProxy(id: "FAM4.20170101T0000+13") {
    id
  }
}
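The IDs in the query above follow a few distinct formats: point/name/submit number for jobs, name.point for task and family proxies, and the bare name for tasks and families. A small Python sketch of splitting them, assuming those formats hold in general (e.g. that task names never contain a "." and cycle points never contain a "/"); parse_job_id and parse_proxy_id are hypothetical helpers.

```python
def parse_job_id(job_id):
    """Split a job ID such as '20170101T0000+13/foo/01'.

    Returns (cycle_point, task_name, submit_number).
    """
    point, name, submit_num = job_id.split("/")
    return point, name, int(submit_num)


def parse_proxy_id(proxy_id):
    """Split a task/family proxy ID such as 'foo.20170101T0000+13'.

    Returns (name, cycle_point); assumes the name itself contains no '.'.
    """
    name, point = proxy_id.split(".", 1)
    return name, point


print(parse_job_id("20170101T0000+13/foo/01"))  # ('20170101T0000+13', 'foo', 1)
print(parse_proxy_id("FAM4.20170101T0000+13"))  # ('FAM4', '20170101T0000+13')
```

Definition-level nodes (task(id: "foo"), family(id: "FAM3")) use the bare name, so they need no parsing.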
And dependency graph edges (much like the current feed):
fragment tProxy on QLTaskProxy {
  id
  state
}
fragment fProxy on QLFamilyProxy {
  id
  state
}
query {
  edges(startPoint: "20170101T0000+13", endPoint: "20170101T0000+13", groupAll: true) {
    edges {
      tailNode {
        ... tProxy
        ... fProxy
      }
      headNode {
        ... tProxy
        ... fProxy
      }
      cond
      suicide
    }
    suitePollingTasks
    leaves
    feet
  }
}
{
  "data": {
    "edges": {
      "edges": [
        {
          "tailNode": {
            "id": "FAM.20170101T0000+13",
            "state": "succeeded"
          },
          "headNode": null,
          "cond": false,
          "suicide": false
        },
        {
          "tailNode": {
            "id": "FAM.20170101T0000+13",
            "state": "succeeded"
          },
          .
          .
          .
        {
          "tailNode": {
            "id": "baa.20170101T0000+13",
            "state": "waiting"
          },
          "headNode": {
            "id": "FAM4.20170101T0000+13",
            "state": "waiting"
          },
          "cond": false,
          "suicide": false
        },
        {
          "tailNode": {
            "id": "poll_jin.20170101T0000+13",
            "state": "running"
          },
          "headNode": null,
          "cond": false,
          "suicide": false
        },
        {
          "tailNode": {
            "id": "poll_jin.20170101T0000+13",
            "state": "running"
          },
          "headNode": {
            "id": "baa.20170101T0000+13",
            "state": "waiting"
          },
          "cond": false,
          "suicide": false
        }
      ],
      "suitePollingTasks": {
        "poll_jin": [
          "jin",
          "baz",
          "succeed",
          "<jin::baz>"
        ]
      },
      "leaves": [
        "poll_jin",
        "bar",
        "qux",
        "qaz",
        "baa",
        "foo"
      ],
      "feet": [
        "poll_jin",
        "FAM",
        "FAM4",
        "baa"
      ]
    }
  }
}
(obviously this may change with employed graphing technology)
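For clients that want a plain adjacency list, here is a Python sketch that turns the edges part of the response into one. It uses a trimmed copy of the sample response (the elided edges are left out) and treats a null headNode as "no downstream link recorded for this edge", which is an assumption about the feed's semantics; to_adjacency is a hypothetical helper.

```python
import json

# Trimmed copy of the sample response above (elided edges omitted).
RESPONSE = json.loads("""
{"data": {"edges": {"edges": [
  {"tailNode": {"id": "FAM.20170101T0000+13", "state": "succeeded"},
   "headNode": null, "cond": false, "suicide": false},
  {"tailNode": {"id": "baa.20170101T0000+13", "state": "waiting"},
   "headNode": {"id": "FAM4.20170101T0000+13", "state": "waiting"},
   "cond": false, "suicide": false},
  {"tailNode": {"id": "poll_jin.20170101T0000+13", "state": "running"},
   "headNode": null, "cond": false, "suicide": false},
  {"tailNode": {"id": "poll_jin.20170101T0000+13", "state": "running"},
   "headNode": {"id": "baa.20170101T0000+13", "state": "waiting"},
   "cond": false, "suicide": false}
]}}}
""")


def to_adjacency(edges):
    """Return ({node_id: [downstream node IDs]}, {node_id: state}).

    A null headNode is treated (by assumption) as "no downstream link in
    this edge": the tail is still recorded as a node.
    """
    adjacency = {}
    states = {}
    for edge in edges:
        tail = edge["tailNode"]
        adjacency.setdefault(tail["id"], [])
        states[tail["id"]] = tail["state"]
        head = edge["headNode"]
        if head is not None:
            adjacency[tail["id"]].append(head["id"])
            adjacency.setdefault(head["id"], [])
            states[head["id"]] = head["state"]
    return adjacency, states


adjacency, states = to_adjacency(RESPONSE["data"]["edges"]["edges"])
print(adjacency["poll_jin.20170101T0000+13"])  # ['baa.20170101T0000+13']
```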
Last Commit to Flask PoC branch
https://github.com/dwsutherland/cylc/tree/flask-gevent-graphql
Next step UI Server!
Looks like federated GraphQL is a thing: https://medium.com/@aaivazis/a-guide-to-graphql-schema-federation-part-1-995b639ac035
Understandable, as some people may have microservice architectures with multiple GraphQL endpoints. It may be useful if it provides links to tools that handle merging schemas, caching, etc. And I agree about the performance penalty, but I'm just looking at all possible alternatives 😬