Skip to main content

Introduction

There are many features of the zef ecosystem to explore but several are core concepts from which the rest can be understood. This tutorial aims to quickly demonstrate the core parts of the zef language. At the end, you should understand:

  • How to query a graph, by traversing relations and accessing atomic values.
  • Append new information to a graph using a GraphDelta and a shorthand syntax.
  • Understand how to use the lazy zef operator syntax.
  • Be able to share your graph with others and collaborate.

In particular, these programmatic operations and types will be covered:

  • Graph
  • ZefRef
  • GraphDelta
  • ET, RT
  • >>, <<, >, <
  • collect, run, for_each
  • Z["..."]
  • now, value, single, terminate, map, uid, all, frame

Goal

Let's create a database representation of something similar to the relation database shown at https://www.sqltutorial.org/sql-sample-database/.

This information is meant to describe a company's distribution of employees, jobs and departments. As we are using a graph database, we will not duplicate this exact structure as it is inflexible. Instead, we will show some of the unique flexibility of graph databases and zefDB's flat layout. In this tutorial, the equivalent graph layout to the above relational table layout will be:

where * indicates multiple relations may be present.

Filling in some structure

Firstly, create a blank graph and add to it some entities representing people and departments.

from zef import *
from zef.ops import *

g = Graph()

Creating a graph is done with Graph(). By default, this creates a local graph which is not synchronized with ZefHub.

z_alice = ET.Employee | g | run
z_bob = ET.Employee | g | run
z_charlie = ET.Employee | g | run
z_alex = ET.Employee | g | run

z_hr = ET.Department | g | run
z_research = ET.Department | g | run

(z_alice, RT.WorksFor, z_research) | g | run
(z_bob, RT.WorksFor, z_hr) | g | run
(z_charlie, RT.WorksFor, z_hr) | g | run
(z_charlie, RT.WorksFor, z_research) | g | run

The notation <template> | g creates an Effect to append a single fact to the graph. We run this effect, and the return value is a ZefRef which is a reference to that created entity or relation. In this list we have two "types" which are an entity type ET and a relation type RT. You can think of these as nearly equivalent to a string naming the type.

info

An ET.Something or RT.Something will add a new type or reuse an existing type. Under the hood, these are tokenized to integers for optimized accesses and are consistent across different graphs.

Don't be afraid to create new types. As you add new types, they will be added to your user namespace, and can be later "forgotten" if you would like to clean up your namespace.

A key difference here to a relational database is that the employee can have multiple relations to different departments, a feature which typically requires multiple table rows in a relational database. In the above, Charlie can work for both the HR and Research departments. In contrast, Alex works for no department (maybe they are a new hire or an intern).

The above is a little clumsy and verbose. We could compress many of the elements into fewer templates:

g = Graph()

[z_alice,z_bob,z_charlie,z_alex,
z_hr,z_research] = [ET.Employee, ET.Employee, ET.Employee, ET.Employee,
ET.Department, ET.Department] | g | run

(z_alice, RT.WorksFor, z_research) | g | run
(z_bob, RT.WorksFor, z_hr) | g | run
(z_charlie, RT.WorksFor, [z_hr, z_research]) | g | run

In the first line, a template consisting of 4 ET.Employees and 2 ET.Departments requests that 6 new entities be created. The return value is then "unpacked" into 6 different variables. The final line of code creates two relations from z_charlie to z_hr and z_research.

Even with the compressed syntax there is a problem: multiple database transactions are caused by each ... | g | run. Instead, we can write:

g = Graph()

receipt = [
ET.Employee["alice"],
ET.Employee["bob"],
ET.Employee["charlie"],
ET.Employee["alex"],

(Z["alice"], RT.WorksFor, Z["research"]),
(Z["bob"], RT.WorksFor, Z["hr"]),
(Z["charlie"], RT.WorksFor, Z["hr"]),
(Z["charlie"], RT.WorksFor, Z["research"]),

ET.Department["hr"],
ET.Department["research"],
] | transact[g] | run

z_alice = receipt["alice"]
z_bob = receipt["bob"]
z_charlie = receipt["charlie"]
z_alex = receipt["alex"]

z_hr = receipt["hr"]
z_research = receipt["research"]

Here, each item in the list passed to the GraphDelta is one fact to be added to the database, but all will be done in the one transaction. There are two kinds of facts shown here: a) a new entity (e.g. ET.Employee) and b) a new relation (a, RT.Something, c). To refer between the facts, each can be given an internal (and temporary) name. A string following an ET or RT sets this internal name (e.g. ET.Employee["alice"]) and the special object Z can be used to reference this name (e.g. Z["alice"]). After the GraphDelta has been run on the graph, a "receipt" is returned which allows access to the created entities and relations via their internal names.

The order is also not important - for example, the fact describing the relation between Z["alice"] and Z["research"] is given before the fact of "research" is determined (in ET.Department["research"]). You can use this to organise the GraphDelta as makes sense in your particular context.

tip

One can think of the Z["name"] notation as similar to using a variable named name. Because a GraphDelta uses a declaritive style (that is, it declares the facts to create) rather than an imperative style (which would list a series of actions to perform in a specific order) variables are not necessary.

For example, the GraphDelta

reciept = [
ET.Employee["alice"],
(Z["alice"], RT.WorksFor, Z["research"]),
ET.Department["research"]
] | transact[g] | run

z_alice = receipt["alice"]
z_research = receipt["research"]

could be re-imagined in an imperative style with the pseudo-code (notice the analogy between Z["alice"] and alice:

alice = create_entiy(ET.Employee)
research = create_entity(ET.Research)
link_entities_by_relation(alice, RT.Department, research)

Note this pseudo-code does not work and the functions create_entity and link_entities_by_relation do not exist!

Attaching some details

The above was only the high-level structure between some employees and some departments. Note that we didn't need to specify exactly what an ET.Employee entity is, or what properties it might have. In fact, in zefDB there is no such concept of a "property" of an entity. Instead, we use a flat representation to attach other details. For example:

[
(z_alice, RT.FirstName, "Alice"),
(z_alice, RT.LastName, "Smith"),
(z_alice, RT.Email, "alice.smith@invalid.address"),
(z_alice, RT.Email, "alice.smith52@backup.address"),
(z_alice, RT.HireDate, Time("2022-01-11")),
(z_alice, RT.Salary, QuantityFloat(73100.0, EN.Unit.AUD)),

(z_bob, RT.FirstName, "Bob"),
# ...
(z_charlie, RT.FirstName, "Charlie"),
# ...
(z_alex, RT.FirstName, "Alex"),
# ...

(z_hr, RT.Name, "HR"),
# ...
(z_research, RT.Name, "Research"),
# ...
] | transact[g] | run

We have shown a variety of types available for values. It is possible assign String, Int, Float, Time, QuantityFloat, QuantityInt to what are known as "atomic entities". In contrast, entities only have an identity but never a value. To show the current state of z_alice use the yo op:

yo(now(z_alice))                # option 1

z_alice | now | yo | collect # option 2

You should see something like

================================================== Historical  View ==================================================
===================================== Seen from: 3: 2022-01-12 12:36:28 (+0800) =================================
<...snip...>
1x: (z:ET.Employee) -------------------------(RT.WorksFor)-----------------------> (ET.Department)
(z) ----(603d3be2b5ba277a48cc05798023a1df)---> (264a2177d0a36cce48cc05798023a1df)

1x: (z:ET.Employee) ------------------------(RT.FirstName)-----------------------> (AET.String)
(z) ----(8e6809d8f232af7b48cc05798023a1df)---> (72f79ec427603a7448cc05798023a1df [latest val: Alice])
<...snip...>
1x: (z:ET.Employee) -------------------------(RT.HireDate)-----------------------> (AET.Time)
(z) ----(93b8c0bdc806fd6548cc05798023a1df)---> (21157012f1e9d0ad48cc05798023a1df [latest val: 2022-01-11 00:00:00 (+0800)])

1x: (z:ET.Employee) --------------------------(RT.Salary)------------------------> (AET.QuantityFloat.AUD)
(z) ----(061547c736a45a2648cc05798023a1df)---> (6bd9b5b7b42ea9bb48cc05798023a1df [latest val: <QuantityFloat: 73100 EN.Unit.AUD>])
<...snip...>

Alice is an employee with many relations heading outwards from the entity. These are all "at the same level", i.e. flat. The relation connecting Alice to the research department entity (of type ET.Department) is at the same level as the relation connecting Alice to the atomic entity containing her first name "Alice" (of type AET.String).

caution

If you simply ask for yo(z_alice) (try this) you will not see all of the above information. That's because each ZefRef is simply a pair of (identity, reference_frame) and z_alice was returned from a reference frame when the entity was first created, before any of Alice's properties were attached, although after the relation to the department. This should be noticable in the header of the yo output above, where the second line shows "Seen from 3". This emphasizes that the "time slice" (here 3) the information about the entity is shown from may be in the past.

Compared with the SQL table definition, Alice is missing several columns:

  • employee_id: this is not needed as every entity on the graph has a unique ID already (try uid(z_alice))
  • phone_number: we chose to not provide a phone number, which is valid on a graph. A relation database might choose to use a sentinel value to indicate a missing phone number instead.
  • department_id: instead of indirectly connecting the employee to a department, we have directly linked the employee using a relation.
  • email: we have provided two email addresses for Alice. This is easy to do in a graph database but requires much forethought in a relational database.
info

Each line in the GraphDelta that added one property (e.g. (z_alice, RT.FirstName, "Alice")) is a shorthand for several elementary operations. The same effect could be obtained via:

ae = AET.String | g | run              # Create the atomic entity without a value
ae | assign_value["Alice"] | run # Assign a value to the atomic entity
(z_alice, RT.FirstName, ae) | g | run # Connect the atomic entity to z_alice
# ... the above is the same as ...
(z_alice, RT.FirstName, "Alice") | g | run

If we did not know Alice's name and wanted to obtain this from the ZefRef of z_alice, then we could write the following:

first_name_ae = z_alice | now >> RT.FirstName | collect
name_string = first_name_ae | value | collect
# ... or ...
name_string = z_alice | now >> RT.FirstName | value | collect

Here, the operation >> says to "traverse the prior ZefRef along the following outgoing relation", that is RT.FirstName. The output of this is itself another ZefRef. There is an equivalent operation << for traversing along an incoming relation, hence z_alice >> RT.FirstName << RT.FirstName | collect == z_alice. Finally, having the ZefRef to the atomic value containing the name, we grab the value itself using value.

You might be wondering what the | collect operations are for, and why there seem to be | characters everywhere. This is due to the lazy evaluation features of zefDB. If we write

z_alice | now

or

z_alice >> RT.FirstName

we obtain an object of type LazyValue instead of a ZefRef that we might expect. This allows for the composition of expressions without evaluation, useful for functional design and an essential aspect of distributed queries and computing. To cause evaluation, we can pass this LazyValue through a collect.

tip

For named operations, like now and value, we can cause immediate evaluation using the function call syntax:

value(now(z_alice) >> RT.FirstName) == z_alice | now >> RT.FirstName | value | collect

There is no equivalent immediate evaluation for >> and collect must be used here for immediate evaluation.

To obtain a quick visualisation of the current graph layout, we can use the graphviz operator. This requires both the program graphviz and the python package graphviz to be installed. Then you can simply run:

# Make sure both the graphviz program and the python package graphviz are
# installed before running this command.
viz = g | now | all | graphviz | collect
# In Jupyter the above is sufficient, but at the REPL you should run viz.view()

This will autogenerate a visualisation with a layout that will likely require some zooming to view all of the individual parts:

Exploiting relations

Relations in all graph databases can connect entities, that is, a relation can begin or end at an entity. In Zef graphs, relations are also allowed to begin or end at a relation. This is very helpful to represent facts which are connected to a pair of entities. For example, if an employee is associated with multiple departments, like "Charlie" in our example above, we might want to attach additional facts that describe what role or what fraction of time Charlie spends in the HR and Research departments. This might be better explained by example, creating a new employee:

# We create a new employee to show how everything can be added in one step
receipt = [
ET.Employee["zaphod"],
(Z["zaphod"], RT.FirstName, "Zaphod"),

(Z["zaphod"], RT.WorksFor["zaphod-hr"], z_hr),
(Z["zaphod"], RT.WorksFor["zaphod-research"], z_research),

(Z["zaphod-hr"], RT.Role, "busy body"),
(Z["zaphod-hr"], RT.Fraction, 0.1),

(Z["zaphod-research"], RT.Role, "general nuisance"),
(Z["zaphod-research"], RT.Fraction, 0.9),
] | transact[g] | run

z_zaphod = receipt["zaphod"]

Notice that we can assign internal names to the RT.WorksFor relations and this is one method to create relations out of relations. Alternatively, we can first obtain a relation using the relation operator.

z_charlie_hr = relation(now(z_charlie), now(z_hr))
z_charlie_research = relation(now(z_charlie), now(z_research))

[
(z_charlie_hr, RT.Role, "Recruitment"),
(z_charlie_research, RT.Role, "Manager"),
] | transact[g] | run

The lack of a RT.Fraction for Charlie must be understood in the graph context. It could be interpreted as "Charlie spends equal time in all roles" or "Charlie provides floating support" or it could be that this is incomplete and represents an error in the data model. Exactly which of these interpretations apply would be part of the query model.

If we wish to remove a fact from the database, we can terminate it:

z_zaphod | terminate | run

# Alternatively...
# [terminate[z_zaphod]] | transact[g] | run

Terminating an entity such as z_zaphod will also terminate any relations that were pointing to it. Hence, the RT.WorksFor relation connecting z_zaphod to z_hr was also terminated, along with the RT.Role in turn.

tip

Knowing when to place facts on a relation between two entities, as opposed to one of the entities directly can often be determined by what behavior would be expected by terminate. For example, the RT.Role fact would become meaningless if either of z_zaphod or z_hr were removed, hence placing it on the connecting relation makes sense.

In contrast, a hire date associated with Zaphod makes more sense to be attached directly to z_zaphod, even though it might also be the same time as when Zaphod started in the HR department. If the department entity were terminated, then it would be incorrect to also remove the hire date of Zaphod.

Simple queries

So far, we have created items on the graph, and then accessed them using the ZefRef that was returned from the creation effect. If we instead approach the graph without any prior references, we need to be able to access these previously stored facts.

A zef query is not formulated like an SQL query. We can directly interact with a graph, and programmatically apply filtering, traversals and transformations. Here is how we could obtain all employee entities, and print their IDs and first names:

employees = g | now | all[ET.Employee] | collect

# Using zef ops
employees | map[lambda z: (uid(z), value(z >> RT.FirstName))] | for_each[unpack[print]]

# Using traditional python
for z in employees:
print(uid(z), value(z >> RT.FirstName))
a99ab42783bf8280aa357bee18c1170e8e888650f08dc743 Alice
8363e0e7ed1e19e1aa357bee18c1170e8e888650f08dc743 Bob
25cc025e51d68504aa357bee18c1170e8e888650f08dc743 Charlie
90fa2cd19ccf3102aa357bee18c1170e8e888650f08dc743 Alex

employees is a variable holding an object of type ZefRefs. This is a collection of ZefRef of the same reference frame, in this case the frame g | now.

Notice that there should be no "Zaphod" in the list, as this node was terminated. However, zefDB is an append-only database with time travel built in, so it is possible to go back and view the facts of the graph at an earlier time:

# This code assumes z_zaphod is the same variable as would have been created
# from the code earlier in this tutorial.

employees = z_zaphod | frame | all[ET.Employee] | collect
employees | map[lambda z: (uid(z), value(z >> RT.FirstName))] | for_each[unpack[print]]
a99ab42783bf828015fbde488dcb3f278e888650f08dc743 Alice
8363e0e7ed1e19e115fbde488dcb3f278e888650f08dc743 Bob
25cc025e51d6850415fbde488dcb3f278e888650f08dc743 Charlie
90fa2cd19ccf310215fbde488dcb3f278e888650f08dc743 Alex
2264ab71f672361615fbde488dcb3f278e888650f08dc743 Zaphod

Here, ... | frame gets the reference frame of z_zaphod, which is the frame in which the ET.Employee for Zaphod was created. We then ask for all ET.Employee in the context of this reference frame.

It is also possible ask for all employees that ever existed:

# First create a new employee:
z_trillian,_,_ = (ET.Employee, RT.FirstName, "Trillian") | g | run

employees = g | all[ET.Employee] | collect
employees | map[lambda z: (uid(z), value(z >> RT.FirstName, now(g)))] | for_each[unpack[print]]
a99ab42783bf82808e888650f08dc743 Alice
8363e0e7ed1e19e18e888650f08dc743 Bob
25cc025e51d685048e888650f08dc743 Charlie
90fa2cd19ccf31028e888650f08dc743 Alex
2264ab71f67236168e888650f08dc743 Zaphod
a5cd4cf65191e44a8e888650f08dc743 Trillian

Here, we simply ask for all ET.Employee of the graph, which not a reference frame, in contrast to g | now | all[...]. We see all employees, including both "Zaphod" and "Trillian" despite there never being a single time in which both were employeed (remember the order, the Zaphod entity was created, then terminated, then the Trillian entity was created).

caution

Objects returned from a reference frame are ZefRefs which are effectively a (identity, reference_frame) pair. The objects returned by g | all[...] are EZefRefs ("eternal ZefRefs"), which are the identity alone. More discussion on the differences between a ZefRef and EZefRef can be found here(todo).

To ask for a particular employee we can filter on these results. For example:

z_alice = (g | now | all[ET.Employee]
| filter[Z >> RT.FirstName
| value
| equals["Alice"]]
| single
| collect)

The filter op filters a list based on the predicate function/op curried into it. Note that Z >> RT.FirstName is effectively the same as lambda z: z >> RT.FirstName. Finally, the single op takes a list of a single element and returns that element. If the list is empty, or contains two or more elements, it would raise an error.

As this is programmatic, we can now continue to use the ZefRef result in z_charlie to query the graph:

print(f"The employee {value(z_alice >> RT.FirstName)} {value(z_alice >> RT.LastName)} was hired on {value(z_alice >> RT.HireDate)}")
The employee Alice Smith was hired on 2022-01-11 00:00:00 (+0800)

Complex queries

In SQL, joins between multiple tables are necessary to perform interesting queries. In graph databases, we instead traverse relations. We have already seen how we can traverse a single relation with z_zaphod >> RT.FirstName. We can also traverse multiple relations:

names = z_hr | now << L[RT.WorksFor] >> RT.FirstName | map[value] | collect
print(names)
['Bob', 'Charlie']

Here, z << L[RT.WorksFor] asks for all sources of RT.WorksFor relations incoming to z. It will always returna list, even if there is only a single relation, hence z << RT.WorksFor is equivalent to z << L[RT.WorksFor] | single. A shorthand notation also us to traverse a list of ZefRefs along RT.FirstName and obtain their names.

Finally, we have seen how to attach facts to relations, by allowing relations to start from other relations. To query for this we use > and < which are analogous to >> and <<:

roles = z_charlie | now > L[RT.WorksFor] >> RT.Role | map[value] | collect
print(roles)
['Recruitment', 'Manager']

The ... > L[RT.WorksFor] returns a list of the relations outgoing from the entity, instead of their targets, as would be returned by ... >> L[RT.WorksFor]. In fact, we can say that z >> L[RT.Something] == z > L[RT.Something] | target. An alternative way to describe this is ">> and << follow relations to one of their end, whereas > and < stop on the relation".

Labelling composite queries

Most of the time, we would like to reuse queries. As these are lazy operations, this as simple as assigning them to a variable:

get_roles_simple = now > L[RT.WorksFor] >> RT.Role | map[value]

get_roles = (Assert[is_a[ET.Employee]]
| now
> L[RT.WorksFor]
| map[Z >> O[RT.Role]] # Follow RT.Role if it exists, otherwise return None
| map[value_or["<missing role name>"]]) # Get the value, but if None return "<missing role name>"

z_charlie | get_roles | collect | run[print]

z_alice | get_roles | collect | run[print]
['Recruitment', 'Manager']
['<missing role name>']

The get_roles_simple form is nearly identical to a snippet from above. The more complex form underneath also instead an assertion (which will complain if the ZefRef passed in is not an ET.Employee) and proper handling for the case that a RT.Role is missing. This zefop can now be used more concisely in interesting cases:

all_roles = (g | now | all[ET.Employee]
| map[get_roles | collect]
| concat
| distinct
| collect)
print(all_roles)
['<missing role name>', 'Recruitment', 'Manager']

Note that we can also write a python function to implement get_roles and decorate it so that it can be used like an op:

@func
def get_roles2(z):
assert is_a(z, ET.Employee)
rels = z | now > L[RT.WorksFor]

roles = []
g = Graph(z)
for rel in rels:
if rel | has_out[RT.Role] | collect:
role = rel >> RT.Role | value | collect
else:
role = "<missing role name>"
roles.append(role)
return roles

z_charlie | get_roles2 | collect | run[print]
z_alice | get_roles2 | collect | run[print]

A function decorated with @func can also be permanently attached to a graph via the @func(g) syntax. See here(todo) for more detail.

Persisting data for later use

Although we have left this for last, it is actually one of the simplest steps in this tutorial. If you would like to save the graph you are working on to be able to return to it in the future, then simply call!!!:

g | sync | run
g | tag["employee-records"] | run

This will indicate you would like to synchronize this graph on ZefHub. The default graph permissions are private to your user. To access the graph in the future you can simply pass this tag as the requested graph:

g = Graph("employee-records")
caution

At the time of writing this, tags must be unique across all of ZefHub. It will soon be possible to set tags in a namespace for your user/project/globally. If you get an error tagging the graph, try a different unique tag name instead of "employee-records".

To share your graph with others, you can give them permissions to view/append/host the graph:

"user.name@email.com" | grant[KW.view][g] | run

"best.friend@email.com" | grant[KW.view][g] | run
"best.friend@email.com" | grant[KW.append][g] | run

These users can then access the graph via Graph("employee-records"). To remove privileges, the revoke op is similarly available.

caution

At time of writing, the error response for these grant requests is not reliable. If an email of an account is not present, you may receive a timeout error instead of a proper response.

But wait!

Eagle-eyed readers would have noticed a difference between the code in this tutorial and the initial layout image. In particular, instead of a RT.Role relation connecting a RT.WorksFor relation with an ET.Role, the RT.Role in the actual code we used simply pointed at a string.