Skip to main content

5 posts tagged with "zef"

View All Tags

· 9 min read

In this blog, we will see how we can use ZefOps to solve Advent of Code - Day 13!

We will see how ZefOps, compared to plain Python or Python with other libraries, allows us to write short, readable and composable code.

Before you start reading, get familiar with the problem we are solving today and even give it a shot here before digging in.

AOC

Problem Explanation ❓

After reading the problem statement, let's explain and simplify some points:

  • We can divide our input into 2 lists:

    • points (x,y)
    • fold instructions (axis, val)
  • Each fold instruction could only reduce or maintain the number of points, but never increase them.

  • 2 points overlapping become the same point, thus reducing the number of points.

With these points in my mind, let us attempt the solution to part 1!

Solution Breakdown 💡

Parsing the Input

We will use for this blog, the input I was given while solving the problem, that will differ from yours.

The first step of almost all of Advent of Code problems is to parse the input. This is usually a daunting problem to beginners, but with ZefOps it won't be.

As mentioned above let us manually divide our big input into 2 strings, str_points and str_folds.

from zef.ops import *       # don't forget to import ZefOps module from zef

str_points: str = """..."""
str_folds: str = """..."""

Parsing Points String

We start by parsing str_points using this simple ZefOp chain:

points =  str_points | split["\n"] | map[split[","] | map[int]] | collect                    

We can also stack the chain like this for better readability:

points = (
str_points
| split["\n"] # ["897,393", "...]
| map[split[","] | map[int]] # [[897,393], [...]
| collect
)

Let us look at the simplicity of the ZefOps in parsing the input from a single string to a list of points [x,y].

We first used split["\n"], as the name suggests to split the string into a list of individual strings on each line: ["897,393",...]

Then we used, map[split[","] | map[int]] to map each individual string onto the smaller ZefOp chain which splits the line on the character , and maps each resulting string onto an int: [[897,393], [...]

We see in this small example how composable ZefOps are. We are able to stack ZefOp chains inside ZefOp chains

Lastly, we collect to evaluate the lazy expression.

Oh you didn't know that ZefOps were lazy by nature! Well now you know. Computation will only happen on ZefOp chains once you call collect or run collect is a ZefOp used at the end of a Zef pipeline that makes it eager and returns a value. Without collect, the expression is just data.

Parsing Fold Instructions Strings

The second step is to similarly parse the fold instructions string str_folds.

foldings = (
str_folds
| split["\n"] # ["fold along y=6", ...]
| map[split[" "] | last | split["="]] # [["y", "6"], [...]
| map[lambda p: (int(p[0] == 'y'), int(p[1]))] # [(1, 6), ..]
| collect
)

As the first parsing, we split the long string over the new line character \n, then we map each string onto a small ZefOp chain split[" "] | last | split["="] which produces a list of lists of 2 strings ["axis", "value"].

We then map of these ["axis", "value"] lists, on a lambda expression that returns a tuple of (axis_encoded, value). The encoding of the axis will help us later. 0 for x and 1 for y.

Core logic

After parsing the input, it is time to write the logic of our algorithim to solve part 1!

Let me describe the algorithim flow in plain english and then we will see the equivilant Zef code.

  • For each point given a fixed axis 0 or 1 and a val to fold on,
    • If the point x or y, (0 or 1 i.e axis value) coordinate is larger than than val then return its new coordinate after the fold
    • Else just return the point itself, i.e it is on the side that won't be folded.
  • Cast the list of points on a set so that points that overlap become a single point, given that property of set.

The new coordinate of a point could be found using this simple algorithim.

Note either x or y will remain and the other will be mapped according to the equation:

  • If axis is x i.e 0 then:
    • point will be (val - (p[axis] - val), p[1])
  • Else
    • point will be (p[0], val - (p[axis] - val))

To understand how this work, make a simple matrix of points on a piece of paper and try to figure out how does a point's new location after a fold relates to its old position.

Putting this all together we can create a function fold that takes the list of points and fold_info and returns the new set of points after the fold has been performed.

def fold(points: set, fold_info: tuple) -> set:
axis, val = fold_info
new_coordinate = lambda p: ( (p[0], val - (p[axis] - val)) if axis == 1
else (val - (p[axis] - val), p[1]) )
dispatch = lambda p: new_coordinate(p) if p[axis] > val else tuple(p)
return set(
points
| map[dispatch]
| collect
)

As you can see from the code, we simply have 2 lambda functions to dispatch and return new coordinate and a simple ZefOp chain that maps each point onto the dispatch lambda function. The last step is to call set over the resulting list to remove duplicate points, points that now overlap.

Part 1

That is pretty much all the code we need to solve part 1. As stated in the problem statement, part 1 solution asks to find the number of points after the first fold instruction. So let's do that:

fold(points, foldings[0]) | length | collect 

We called fold with the starting lists of points and the first fold instruction, foldings[0]. As we said above, we will get back the set of points after the fold. All we gotta do is to just call length on that set, and that's the answer to part1!

Part 2

Part 2 is where it gets interesting as the problem statement makes it sound complicated but as a matter of fact, it is quite easy 😜

The idea is to run the points through the whole list of fold instructions in order and then somehow find a secret code 🕵🏻 encoded by the remaining points.

So this first step is to run our points through the whole list of fold instructions. But here is where it might be tricky, we run the points through one instruction and then with new set of points we run that over the next instruction. If that explanation wasn't a clue enough of what ZefOp to use, will we are going to use reduce. As the explanation suggests, with each fold instruction we reduce our points and pass that onto the next instruction. Until we have done all the instructions.

This looks as simple as:

foldings | reduce[fold][points]

After running reduce on the list of points, we need to now figure out the secret code.

So the idea, is to treat our remaining points as part of a Row x Col grid with Row being the highest y value in our list of points i.e number of rows, and respectively Col being the highest x value.

Now the fun part is to print this whole grid in our terminal, with 🟨🟨 for any (x,y) in our list of points and ⬜️⬜️ for any (x,y) that isn't in our points.

I know I know, this is all confusing but we are almost done! We put the logic above in a function output that takes the list of points!

def output(points):
for y in range((points | map[second] | max | collect) + 1):
for x in range((points | map[first] | max | collect) + 1):
if (x,y) in points: print("🟨🟨", end="")
else: print("⬜️⬜️", end="")
print("")

Let us put it all together now, where we run the result of the reduction on the output function like this:

foldings | reduce[fold][points] | run[output]

And the result is 🥁

Output

When I tell you I was shocked to see this, I am not lying. Probably one of the most exciting problems I got to do on Advent of Code!

Other Solutions 😓

For this portion of the blog, I scoured the internet for random Python solutions of AoC day 13 just to compare them with the ZefOp solution.

Bear in mind, beside the nice readability and composability of the ZefOp solution, if we remove the multiline chains, made for easier readability, the ZefOp solution stands at only 10 lines 😲!

Using Numpy, around 40 loc.

Using matplotlib around 45 loc.

Using plain python only around 50 loc.

Using plain python only around 70 loc.

Using Multiple Approaches around 35 loc.

These were some solutions that I randomly selected with no intention of offending their respective authors or critiquing their styles and solutions. The intention is to put these solutions beside the ZefOp solution to showcase its readability and easy composability.

Takeaways! 🔚

ZefOps make you think data first. It forces you to write clean, composable code!

Using ZefOps is an acquired skill, the more you push yourself to use ZefOps, instead of other libraries or even plain python, the more you discover cool shortcuts and ways of doing things you never knew existed.

Its like a superpower 🕷

· 7 min read

This is last blog of the wordle blog series. Be sure to check part 1 and part 2 before reading this blog!

In this blog, we are going to adapt the code we wrote in part 1 to create the GraphQL backend to our game Worduel 🗡

We will see how easy it is to dynamically generate a GraphQL backend using ZefGQL, run it using ZefFX, and deploy it using ZefHub.

In this blog, we won't implement all of the endpoints that are actually needed for Worduel to run, the full code, including the schema and all endpoints, is found in this Github repo.

Worduel

Let's start building 🏗

So to get started we have to create an empty Zef graph

g = Graph()

After that we will use a tool of ZefGQL which takes a string (that contains a GraphQL schema) and a graph to parse and create all the required RAEs relations, atomic entities, and entities on the graph.

Parsing GraphQL Schema

The link to schema used for this project can be found here.

schema_gql: str = "...."                # A string contains compatible GraphQL schema
generate_graph_from_file(schema_gql, g) # Graph g will now contain a copy of the GraphQL schema
schema = gql_schema(g) # gql_schema returns the ZefRef to ET.GQL_Schema on graph g
types = gql_types_dict(schema) # Dict of the GQL types connected to the GQL schema

Adding Data Model

After that we will add our data model/schema to the graph. We use delegates to create the schema. Delegates don't add any data but can be seen as the blueprint of the data that exists or will exist on the graph.

Psst: Adding RAEs to our graph automatically create delegates, but in this case we want to create a schema before adding any actual data

[
delegate_of((ET.User, RT.Name, AET.String)),
delegate_of((ET.Duel, RT.Participant, ET.User)),
delegate_of((ET.Duel, RT.Game, ET.Game)),
delegate_of((ET.Game, RT.Creator, ET.User)),
delegate_of((ET.Game, RT.Player, ET.User)),
delegate_of((ET.Game, RT.Completed, AET.Bool)),
delegate_of((ET.Game, RT.Solution, AET.String)),
delegate_of((ET.Game, RT.Guess, AET.String)),
] | transact[g] | run # Transact the list of delegates on the graph

If we look at the list of delegates closely we can understand the data model for our game.

Resolvers

ZefGQL allows developers to resolve data by connecting a type/field on the schema to a resolver. You don't have to instantiate any objects or write heaps of code just to define your resolvers.

ZefGQL lifts all of this weight from your shoulders! It dynamically figures out how to resolve the connections between your GraphQL schema and your Data schema to answer questions.

ZefGQL Resolvers come in 4 different kinds with priority of resolving in this order:

Default Resolvers

It is a list of strings that contain the type names for which resolving should be the default policy i.e mapping the keys of a dict to the fields of a type. We define the default resolvers for types we know don't need any special traversal apart from accessing a key in a dict or a property of an object using getattr

Example

default_list = ["CreateGameReturnType", "SubmitGuessReturnType", "Score"] | to_json | collect
(schema, RT.DefaultResolversList, default_list) | g | run

Delegate Resolvers

A way of connecting from a field of a ET.GQL_Type to the data delegate. Basically, telling the runtime how to walk on a specific relation by looking at the data schema.

Example

duel_dict = {
"games": {"triple": (ET.Duel, RT.Game, ET.Game)},
"players": {"triple": (ET.Duel, RT.Participant, ET.User)},
}
connect_delegate_resolvers(g, types['GQL_Duel'], duel_dict)

You can view this as telling ZefGQL that for the subfield games for Duel type, the triple given is how you should traverse the ZefRef you will get in runtime.

Function Resolvers

We use function resolvers, when resolving isn't as simple as walking on the data schema. In our example, for our mutation make_guess we want to run through special logic. Other usages of function resolvers include when the field you are traversing isn't concrete but abstract. An example is a field that returns the aggregate times by running a calculation.

Example

@func(g)
def user_duels(z: VT.ZefRef, g: VT.Graph, **defaults):
filter_days = 7
return z << L[RT.Participant] | filter[lambda d: now() - time(d >> L[RT.Game] | last | instantiated) < (now() - Time(f"{filter_days} days"))] | collect

user_dict = {
"duels": user_duels,
}
connect_zef_function_resolvers(g, types['GQL_User'], user_dict)

We are attaching the user's subfield duels to a function that traverse all of the user's duels but filters on the time of the last move on that duel to be less than 7 days old. We could have used a delegate resolver but we wouldn't be able to add the special filtering logic.

Fallback Resolvers

Fallback resolvers are used as a final resort when resolving a field. It also usually contains logic that can apply to multiple fields that can be resolved the same way. In the example below, we find a code snippet for resolving any id field.

Example

fallback_resolvers = (
"""def fallback_resolvers(ot, ft, bt, rt, fn):
from zef import RT
from zef.ops import now, value, collect
if fn == "id" and now(ft) >> RT.Name | value | collect == "GQL_ID":
return ('''
if type(z) == dict: return z["id"]
else: return str(z | to_ezefref | uid | collect)''')
else:
return "return None"
""")
(schema, RT.FallbackResolvers, fallback_resolvers) | g | run

The returns of the function should be of type str as this logic will be pasted inside the generated resolvers.

The function signature might be a bit ugly and shows a lot of the implementation details. This part will definitly be improved as more cases come into light.

Running the Backend 🏃🏻‍♂️

The final API code, will contain a mix of the above resolvers for all the types and fields in the schema. After defining all of the resolvers, we can now test it locally using the ZefFX system.

Effect({
"type": FX.GraphQL.StartServer,
"schema_root": gql_schema(g),
"port": 5010,
"open_browser": True,
}) | run

This will execute the effect which will start a web server that knows how to handle the incoming GQL requests. It will also open the browser with a GQL playground so that we can test our API.

It is literally as simple as that!

Deploying to prod 🏭

To deploy your GraphQL backend, you have to sync your graph and tag it. This way you can run your API from a different process/server/environment because it is synced to ZefHub:

g | sync[True] | run               # Sync your graph to ZefHub
g | tag["worduelapi/prod"] | run # Tag your graph

Now you are able to pull the graph from ZefHub by using the tag.

g = Graph("worduelapi/prod")

Putting it all together, the necessary code to run your GraphQL backend looks like this:

from zef import *
from zef.ops import *
from zef.gql import *
from time import sleep
import os

worduel_tag = os.getenv('TAG', "worduel/main3")
if __name__ == "__main__":
g = Graph(worduel_tag)
make_primary(g, True) # To be able to perform mutations locally without needing to send merge requests
Effect({
"type": FX.GraphQL.StartServer,
"schema_root": gql_schema(g),
"port": 5010,
"bind_address": "0.0.0.0",
}) | run

while True: sleep(1)

As a side-note: In the future, ZefHub will allow you it remotely deploy your backend from your local environment by running the effect on ZefHub. i.e: my_graphql_effect | run[on_zefhub]

Wrap up 🔚

Just like that, a dynamically-generated running GraphQL backend in no time!

This is the end of the Wordle/Worduel blog series. The code for this blog can be found here.

· 13 min read

The purpose of this blog is to show off the ease of importing from external data into Zef and using a proxy view to expose Zef to 3rd-party packages. It is also a diary of how the development of these features worked, allowing me to polish them as we go!

There are many reasons you could want to expose a Zef graph using networkX:

  • You have existing code that uses the networkX framework
  • You want to use a 3rd party library that can accept a networkX graph, for example a plotting library like plotly.
  • You want to use a graph analysis algorithm that isn't yet available in Zef.

The outline of this process is:

  1. Get some external data
  2. Import it into a Zef graph
  3. Expose the data using a "proxy"
  4. Do the analysis
  5. Spit out some pretty visualisations

In this post, I will focus only on the highlighted points 1 and 2, i.e. getting your data into a Zef graph.

1. Get some external data

We'll use the Northwind dataset as an example, which describes sales and orders for a company. This is available from here https://code.google.com/archive/p/northwindextended/downloads. To convert this to CSV files, I wrote a little script available export.py, which creates a temporary SQLite DB to export each table as its own CSV file. If you'd like to follow along as home, to save you the bother I've made these CSV exports available northwind.zip.

After running this script, we find there are 14 CSV files in this dataset. I'll use products.csv to demonstrate some of the features below and its first few rows look like:

ProductIDProductNameSupplierIDCategoryIDQuantityPerUnitUnitPriceUnitsInStockUnitsOnOrderReorderLevelDiscontinued
1Chai8110 boxes x 30 bags18390101
2Chang1124 - 12 oz bottles191740251
3Aniseed Syrup1212 - 550 ml bottles101370250
..............................

So long as the CSV files can be imported using the pandas python module, then we are able to import these into a Zef graph.

2. Import into Zef

If you have seen the "Import from CSV" how-to in our docs, you might first think to jump straight to the Zef pandas_to_gd op to import it. However, the CSV files in this dataset are in a representation that is best suited for SQL, with columns representing both fields AND relationships between different entities.

Instead, we need to provide a declaration for the set of tables that the CSV files represent, which all together produce the right graph structure. For example, we should be able to specify the purpose of each column in the products.csv table to be something like:

ProductIDProductNameSupplierIDCategoryIDUnitPrice...
PurposeIDFieldEntityEntityField...
ETET.ProductET.SupplierET.Category...
RTRT.IDRT.ProductNameRT.SuppliedByRT.InCategoryRT.UnitPrice...
Data typeIntStringIntIntQuantityFloat.dollars...

The above is just a layout for me to organise my thoughts on what each column should do. I could have instead said the above in sentences:

  • The ProductID column should represent the ID of the ET.Product entities which this table defines in each row. The IDs will be stored on the Zef graph using RT.ID relations.
  • The ProductName column gives fields for each row, which will have a relation type of RT.ProductName and be a string.
  • The SupplierID column is a different entity of type ET.Supplier, uniquely identified by the integer in this column. It is linked to the ET.Product via a RT.SuppliedBy relation.
  • ...

We will need to introduce a couple of more "purposes" in a moment, but otherwise this has nearly covered all of the main uses of the dataset.

Writing this all out by hand is tedious, so I wrote up a quick parser in Zef to produce the initial layout for you and then allow you to edit it. I made the following by running:!!!

from zef.experimental import sql_import

decl = sql_import.guess_csvs("products.csv")

decl | write_file["guess.yaml"] | run

then editing the file guess.yaml a little bit:

default_ID: ID
definitions:
- tag: products
data_source:
filename: products.csv
type: csv
kind: entity
ID_col: ProductID

cols:
- name: ProductID
purpose: id
RT: ID
data_type: Int

- name: ProductName
purpose: field
RT: ProductName
data_type: String

- name: SupplierID
purpose: entity
ET: Supplier
RT: SuppliedBy
data_type: Int

I then wrote up a function import_actions which will take this declaration of how to map the CSV data to the graph and do the busy-work to product a graph. Here is how to run that:

decl = "edited.yaml" | load_file | run | get["content"] | collect
g = Graph()
actions = sql_import.import_actions(decl)
actions | transact[g] | run

The import_actions function will use the information in decl to find the files from which to read the raw data.

While the above works okay on the products.csv table, I needed to include two more things to allow the import of all CSV files simulataneously: a way to tag a table as a "entity" or "relation" style and a source/target purpose.

Interlude into GUI land

Editing the yaml file by hand is clunky. However, it does have the benefit of being easy to a) read as plain text, and b) save the declaration of the import without custom data structures.

To get rid of the clunkiness, and also indulge myself to explore a new package, I decided to write a little UI using pyimgui to better edit the file. Try it out on my sample sql_import.yaml with:!!!

python -m zef.experimental.sql_ui.wizard sql_import.yaml

You should see something like:

The UI is in its early stages so might not work fully. It also requires installing the pyimgui module with sdl support (i.e. pip3 install pyimgui[sdl2] or the like). Weirdly this seems to have problems with python version 3.10 on macos... we will be looking into this.

To read more about using this GUI, check out our docs page on Multiple interlinked CSVs

"relation" kinds of tables

If there is a one-to-many or many-to-many relationship between two objects in a SQL database, then there are various ways to represent this. In the Northwind example, the "Order Details" table demonstrates this, as it allows each Order to contain multiple different products (the one-to-many relation), with particular (order,product)-specific prices, quantities and discounts.

OrderIDProductIDUnitPriceQuantityDiscount
102481114120.0
10248429.8100.0
102487234.850.0
102491418.690.0
...............

Hence we can think of this table as not being entity-centric but rather relation-centric. Any fields (e.g. the UnitPrice column) is information that should be attached to the relation between the order and the product. As Zef graphs we always have directed relations, so we need to also specify a "source" and "target" column:

OrderIDProductIDUnitPriceQuantityDiscount
PurposeSourceTargetFieldFieldField
ETET.OrderET.Product
RTRT.UnitPriceRT.QuantityRT.Discount
Data typeIntIntQuantityFloat.dollarsIntFloat

The import!

My full processing of the CSV files is shown below. My edits to the sql_import.yaml file are available here: sql_import.yaml.

from zef import *
from zef.ops import *
from zef.experimental import sql_import

decl = sql_import.guess_csvs("*.csv")
decl | write_file["sql_import.yaml"] | run

# ... edit sql_import.yaml externally...

decl = "sql_import.yaml" | load_file | run | get["content"] | collect
g = Graph()
actions = sql_import.import_actions(decl)
actions | transact[g] | run

After this import, the graph looks like:

>>> yo(g)

<...snip...>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Atomic Entities ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[6007 total, 6007 alive] AET.String
[2985 total, 2985 alive] AET.Float
[2527 total, 2527 alive] AET.Int
[2232 total, 2232 alive] AET.QuantityFloat.dollars
[2469 total, 2469 alive] AET.Time

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Entities ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1660 total, 1660 alive] ET.Order
[154 total, 154 alive] ET.Product
[12 total, 12 alive] ET.Employee
[93 total, 93 alive] ET.Customer
[53 total, 53 alive] ET.Territory
[4 total, 4 alive] ET.Region
[3 total, 3 alive] ET.Shipper
[29 total, 29 alive] ET.Supplier
[8 total, 8 alive] ET.Category

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Relations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[4 total, 4 alive] RT.RegionDescription
[4 total, 4 alive] (ET.Region, RT.RegionDescription, AET.String)
[2155 total, 2155 alive] RT.Order
[2155 total, 2155 alive] (ET.Order, RT.Order, ET.Product)
[9 total, 9 alive] RT.ReportsTo
[9 total, 9 alive] (ET.Employee, RT.ReportsTo, ET.Employee)
<...snip...>

If you don't want to do the import yourself, you can also look at the graph that I created at the tag zef/blog/northwind, that is, you can access it via:

g = Graph("zef/blog/northwind")

Final comments

As always, answering a simple question has opened up many more questions which I can't help myself but discuss below...

One final purpose: "field_on"

There's another addition that's needed to import many production SQL tables: a way to handle databases with optimised tables.

While we do not need this kind of purpose for the Northwind dataset, it is useful to mention it to "complete the story". The above example of the "Order Details" table which produced pure relations (and not entities) is because the Northwind dataset is in "first normal form".

A "denormalised" dataset could also be used for performance reasons. For example, obtaining all of the products ordered by a particular customer would require an SQL JOIN query to obtain:

SELECT DISTINCT(order_details.ProductID)
FROM order_details INNER JOIN orders
ON order_details.OrderID = orders.OrderID
WHERE orders.CustomerID = "VINET";

Instead, the join of orders and order_details could itself be stored as a table (or view) in the SQL database:

OrderIDCustomerIDOrderDate...ProductIDQuantity...
10248VINET1996-07-04...1112...
10248VINET1996-07-04...4210...
10248VINET1996-07-04...725...
10249TOMSP1996-07-05...149...
10249TOMSP1996-07-05...5140...
.....................

The concept of avoiding joins is rather weird when coming from a graph perspective, as the equivalent of joins are trivial on a graph. However, you may not have any choice in the data you want to import. In this case, we can mark those columns as belonging to the relation between the order and product using "field_on":

OrderIDCustomerIDOrderDateProductIDQuantity
PurposeIDEntityFieldEntityfield_on
ETET.OrderET.CustomerProduct
RTIDCustomerRT.OrderDateRT.ProductRT.Quantity
TargetProductID
Data typeIntIntTimeIntInt

Here the "Quantity" column cannot be set as a field as it would have multiple values for the same ET.Order. So instead we designate that it should be attached to the "target" column ProductID. This means we could write a Zef query for the total quantity of an order as:

z_order > L[RT.Product] >> RT.Quantity | value | add | collect

Batteries-included?

The imported tables still duplicate a lot of information. For example, each of the ET.Order, ET.Supplier, ET.Customer, ET.Employee have a RT.Region which is a scalar string. This is not very connected, as it would be better to have these RT.Regions point at a ET.Region. That way, we could ask for all suppliers in the same region as a customer without doing string matching.

I was tempted to add this in as another purpose, something like entity_from_matching_field. But this would have degenerated into providing an arbitrary language to describe the endless possible databases out there. Instead, we can post-process the imported Zef graph, which gives us access to the entire Zef ops capabilities.

You might be worried about exposing the gorey details of the import process and only the post-processed data. This is easy, if we export only g | now to a new graph after performing the post-processing.

A comment on speed

If you run the commands in this blog post on a large dataset, then you are going to be waiting several minutes for the import. Even as part of writing this post up, and using the Northwind dataset which is relatively small, I found I had to optimise some aspects of the GraphDelta implementation.

This is largely due to the current pure-python implementation of the evaluation engine for ZefOps. In the future this will be implemented in C++ along with the core of the GraphDelta code.

Import directly from SQL

To be honest this blog post is a little crude, using CSV files which lack type information rather than the SQL source directly. The information available in the SQL schema can also assist automatic detection of connected entities and whether a table represents a single entity or many-to-many relations.

The benefit of handling CSV files is it allows us to accomplish almost all kinds of imports, so long as we provide enough additional information. If we supported only SQL, then this would limit the flexibility.

The other reason that the SQL schema is not used directly, is that it's a fair bit more work to support a SQL connection or export. But our intent is to extend these tools and make imports as close to one-click as possible!

Analysis using external tools

I'll leave this for the next blog post! But as a sneak preview...

Wrap up

If you'd like to find out more about Zef and ZefHub (and get early access), get us out at zefhub.io.

· 6 min read

In the last blog post, we created a console-playable Wordle game in few lines of Python using ZefOps. In this blog post, we will write a Wordle solver (or more like your own Wordle assistant) that suggests what your next move could be 😎

So before digging deeper, be sure to check part 1!

Wordle

What will we do? 🤔

Our aim by the end of this blog post is to write a solver that given a list of guesses + discarded letters = a list of possible answers. So you can think of it as an eliminator of bad guesses given our previous guesses.

The idea is pretty straightforward, and given our first one or two guesses are good enough, we can arrive at the correct guess in around 4 guesses 😲 wordlist Let's look at an example:

["a", "b", "3", "c", "5"] | filter[is_alpha] | collect      # returns ["a", "b", "c"]

Each item of the list passes through the filter's predicate which evaluates to a boolean value True or False. If the value is True the item passes the filter, otherwise it gets discarded.

PS: is_alpha is a ZefOp that takes a string and checks if its is only consists of english alphabet and then returns True or False

So if we pass the wordlist through enough filters we will reduce our wordlist to only the possible guesses at that stage. So the more information we have, i.e correctly placed letters or misplaced letters, the more filters we can create.

Let's start building 🏗

  • Start by importing ZefOps and loading our word list
from zef import * 
from zef.ops import *

url = "https://raw.githubusercontent.com/charlesreid1/five-letter-words/master/sgb-words.txt"
wordlist = url | make_request | run | get['response_text'] | split['\n'] | map[to_upper_case] | collect
  • Let's add our discarded letters and guesses from the game we are stuck on
discard_letters = 'ACLNRT'

guesses = [
["_", "_", "_","_","[E]"],
["_", "U", "_", "[E]","S"]
]
  • Now let's write our filters generator ⚙️
def not_contained_filters(discard_letters: str):
return discard_letters | map[lambda c: filter[Not[contains[c]]]] | collect

def correct_or_misplaced_filters(guess: str):
misplaced = lambda p: [filter[Not[nth[p[0]] | equals[p[1][1]]]], filter[contains[p[1][1]]]]
correct = lambda p: [filter[nth[p[0]] | equals[p[1]]]]
return (guess
| enumerate
| filter[Not[second | equals['_']]]
| map[if_then_else_apply[second | is_alpha][correct][misplaced]]
| concat
| collect
)

Believe it or not, this is all we need. It might look complicated but it is simpler than it looks. So let's dissect it 🗡

Basically, these 2 functions use ZefOps to generate ZefOps of type filter with baked-in predicate functions given both the discarded_letters and our previous guess.

Function: not_contained_filters

Let's look at the first function not_contained_filters. The function takes the discarded_letters as a string and maps each letter c to a filter function that has a predicate function Not[contains[c]]] which is the ZefOp Not taking as an argument another ZefOp contain.

If this looks complex try to read it as an english sentence. filter what doesnot contain the letter c So given this example ["ZEFOP", "SMART"] | filter[Not[contains["A"]]] | collect only ZEFOP will pass the filter.

Do you wanna guess the output when we pass discard_letters = 'ACLNRT' to this function as an argument?

Well, we are mapping each letter of that string to a filter so we end up with this OUTPUT:

[
filter[Not[contains['A']]],
filter[Not[contains['C']]],
filter[Not[contains['L']]],
filter[Not[contains['N']]],
filter[Not[contains['R']]],
filter[Not[contains['T']]]
]

A list of filters one for each discard letter. This way any word that doesn't pass all these filters will not be part of our possible answers.

Function: correct_or_misplaced_filters

Let's look at the second function 'correct_or_misplaced_filters', which is pretty similar to the one above. We are returning filters for when we have a correctly placed letter or a misplaced letter. This function could be divided into 2 other functions, but with the if_then_else_apply ZefOp we can simply do it in the same function without duplicating the logic. The apply at the end of of the ZefOp name means we apply the passed functions to the first argument.

Let's take a closer look at the return statement of this function and run our second guess from our guesses list above to walk through the function logic:

OUTPUT:

misplaced = lambda p: [filter[Not[nth[p[0]] | equals[p[1][1]]]], filter[contains[p[1][1]]]]
correct = lambda p: [filter[nth[p[0]] | equals[p[1]]]]

guess = ["_", "U", "_", "[E]","S"]
(guess # ["_", "U", "_", "[E]", "S"]
| enumerate # [(0, "_"), (1, "U"), (2...]
| filter[Not[second | equals['_']]] # [(1, "U"), (3, "[E]"), (4, "S")]
| map[if_then_else_apply[second | is_alpha][correct][misplaced]] # [[filter[nth[p[0]] | equals[p[1]..]
| concat # [filter, filter, filter..]
| collect
)

The comments show the transformation the guess input is going through until we get out a list of filters that contain predicate functions that satisfy correct and misplaced letters requirements.

So the output of this snippet given the guess = ["_", "U", "_", "[E]", "S"] is this OUTPUT:

[
filter[nth[1] | equals['U']], # Second letter should equal U
filter[Not[nth[3] | equals['E']]], # Fourth letter should NOT equal E
filter[contains['E']], # Word contains an E
filter[nth[4] | equals['S']] # Fourth letter should equal S
]

Put it all together 🧩

When we put both of these functions along with our 2 inputs we end up with a pipeline of filters that we can run the whole wordlist through.

filters_pipeline = [
filter[length | equals[5]], # Just making sure it is a 5 letter word
not_contained_filters(discard_letters),
guesses | map[correct_or_misplaced_filters] | concat | collect
] | concat | as_pipeline | collect # Flatten all sublists and turn them into a pipeline

We are creating a list of filters coming from the discarded letters and the mapping of each guess in our guesses.

PS: as_pipeline takes a list of ZefOps and returns a single ZefOp that we can call or pipe things through

possible_solutions = wordlist | filters_pipeline | collect
possible_solutions | run[print]

We pipe our entire wordlist through the filters pipeline to end up with all possible solutions. In this example, given our wordlist and guesses+discarded letters the possible solutions are: ["GUESS"], who could have guessed that 😉

Wrap up 🔚

And just like that we used ZefOps to generate ZefOps that are used with other ZefOps on our wordlist.. Pheww, how Zef!

Given this code is pure ZefOps and ZefOps compose, we can reduce it into one line. But let's not do that, or may be...

Worduel 👀

In part 3 of this series, we are going to take this to the next level, where we will use ZefDB, ZefGQL, ZefFX to create a competitive web game of Wordle where you can take your friends, collegues, or your mom to a game of Worduel 😜

· 8 min read

In this blog post we are going to build a console-playable Wordle game using Python and zef in 30 lines 🔥

The purpose of this blog is to showcase the usage of ZefOps to create easy, readable, composable, extendable, highly-decoupled, and [enter more buzz words here 😍] code!

So before getting started, let's quickly review what is Wordle?

Wordle

What is Wordle? 🤔

Wordle is a simple game where you have six chances at guessing a five-letter word.

After each guess, the game will give you hints.

  1. A green tile means that you guessed the correct placement of a letter.

  2. A yellow tile means that the letter is in the word, but your guess had the wrong position.

  3. And lastly, a grey tile means the letter is not in the word.

Rules 🔢

So the rules are pretty straightforward. Given we are playing the game in a console, let us remap the rules a bit.

After each guess,

  1. A letter appearing by itself == Green tile 🟩

  2. A letter appearing with [ ] around it == Yellow tile 🟨

  3. A dash appearing means == Grey tile ◻️

Building the game 👷🏻

To run the code below, you'll need early access to Zef (it's free) - sign up here!

  • Let's import ZefOps. Any operator we might need should be there 😜
from zef import * 
from zef.ops import *
  • Then we load our 5-letter word list

For this example I am using this wordlist I found on Github.

Using ZefOps, we can either load the list from a link, a file stored locally, or simply from your clipboard 😲

# Load from request response (Choose this one)
url = "https://raw.githubusercontent.com/charlesreid1/five-letter-words/master/sgb-words.txt"
wordlist = url | make_request | run | get['response_text'] | split['\n'] | map[to_upper_case] | collect

# Load from local file
wordlist = 'wordlist.txt' | load_file | run | get['content'] | split['\n'] | map[to_upper_case] | collect

# Load from clipboard
wordlist = from_clipboard() | run | get['value'] | split['\n'] | map[to_upper_case] | collect

We can already see the power and ease of ZefOps. In just one line we are able to load a string of words, split it on new lines, then convert each string to uppercase.

We will get more familiar with the lazy nature of ZefOps and why we need "collect" in a bit. But notice, to go from one stage to another aka transform your input, you just have to pipe | operators. This way your input will flow through your operator chain aka pipeline giving you the output you need.

  • Now we initialize some game related variables
# Game Variables
counter, to_be_guessed = 6, random_pick(wordlist)
discard_words, discard_letters, guesses_list = set(), set(), []

Btw, random_pick is also a ZefOp. Given a list, a string, or a simple iterable, it returns a random item/character from the input. So here, given our words list, we choose a random word that we will have to guess.

Also notice we can call ZefOps similar to a function using zefop(args).

  • Now for some ZefOp 🪄 magic 🪄
# Predicate function constructed using zefops
is_eligible_guess = And[length | equals[5]][contained_in[wordlist]][Not[contained_in[discard_words]]]

Using ZefOps, we're able to pack a lot into a single line (and still maintain readability). Let's look into it:

  1. Firstly, this ZefOp declares a function that takes an input string and checks if its length is equal to 5 and is contained in the word list and it is not a previous guess.

  2. If you pay attention we didn't have to pass an input yet or even compute a result. You can think of this as a mini program, one that we can use in multiple places, and extend easily by piping more ops into it. We've also just designed our very own ZefOp composed of other ZefOps. The beauty of it all it is just data 0️⃣1️⃣ more on that later...

  3. "CRANE" | is_eligible_guess turns into a LazyValue. Put simply, a value + zefop is a LazyValue 🥱. A LazyValue is not computed until we do | collect to make it eagerly execute. We will see more value out of LazyValues later on.

  • Now for the meatiest 🥩🥩🥩 part of the code
def make_guess(guess, to_be_guessed, discard_letters):
def dispatch_letter(arg):
i, c = arg
nonlocal to_be_guessed
if c == to_be_guessed[i]: # Rule 1 🟩
to_be_guessed = replace_at(to_be_guessed, i, c.lower())
return f" {c} "
elif c in to_be_guessed: # Rule 2 🟨
to_be_guessed = replace_at(to_be_guessed, to_be_guessed.rindex(c), c.lower())
return f"[{c}]"
else: # Rule 3 ◻️
if Not[contains[c.lower()]](to_be_guessed): discard_letters.add(c)
return " _ "

return (guess # "CRANE"
| enumerate # ((0, "C"), (1, "R"), ...)
| map[dispatch_letter] # ["_", "[R]", "_", ...]
| join # " _ [E] _ _ _ "
| collect
), discard_letters

This is the main logic behind Wordle. After each guess, we match our guess characters with the actual word. The focus of this function is in the return statement of the function. It is a chain of ZefOps that takes our "guess" as an input.

We run our guess word through enumerate to get back a list of tuples of (character, index). We then pass that list to map, which is a ZefOp that, as the name suggests, maps each item of an input to an output given a dispatch function which we pass as the second argument using [ ]. The output of map is always a list of the individual outputs, so we pipe through | join to connect the list as a string. collect is finally piped so we evaluate this LazyValue.

discard_letters isn't part of the game logic but just makes it more playable.

Psst: join can be called with a joiner i.e join["_"] instead of the default which is empty string join[""]

Polishing and running ✨

Now we have to use these 2 simple functions along with couple of ifs and some more ZefOp 🪄 magic 🪄 to make the game playable!

"~Welcome to Wordle~" | run[print]    # boujee way of printing using zefops

while counter > 0:
guess = input("Your guess:").upper()
if is_eligible_guess(guess): # Calling our predicate zefop on the guess
counter -= 1
discard_words.add(guess)
guess_result, discard_letters = make_guess(guess, to_be_guessed, discard_letters)
discard_string = discard_letters | func[list] | sort | join | prepend [' [Not in word]: '] | collect
guess_string = guess_result | pad_right[20] | append[guess + discard_string] | collect
guesses_list = guesses_list | append[guess_string] | collect
guesses_list | join['\n'] | run[print]

if guess == to_be_guessed:
f"Your guess {guess} is correct!" | run[print]
counter = -1
else:
f"{'Previous guess' if guess in discard_words else f'Invalid guess {guess}'}! Try again." | run[print]

if counter == 0: f"Your ran out of trials, the word was {to_be_guessed}" | run[print]
  • Okay, so what's going on?

"While" allows you to loop until a condi... I am joking 🤡 I know you are looking at those lines with zefops.

  1. discard_string is a string of all the letters we guessed over time that aren't part of the word we are trying to guess. We compute it by taking the set of discard_letters piping it through func[list] which is equivalent to casting set to list. Then we sort it and join it into a string. Finally we prepend another string to the string we created. prepend/append work on both list and string.

  2. guess_string is the guess_result piped through pad_right which pads our string with whitespace to a specifc length. Then we append our guess and discard_string to the padded string.

  3. guesses_list is appended with the guess_string. This is just to print out the full list of guesses nicely in the console after each guess.

  4. join appears again but this time with the new line joiner. We pipe through run[print] to perform a side effect of printing to the console. collect is used when computing a result.

Wrap up 🔚

Just like that, in 30 lines (or less) we created Wordle in Python using ZefOps!

The takeaway from this is how easy ZefOps are. They are short. They are composable. They are lazy. They are data. They are extensible. They are pure. They are Zef!

If you'd like to find out more about Zef and get early access to run this code yourself, sign up on zefhub.io. It's completely free, we won't bombard you with emails, and we'll get you set up quick!

Stuck? 😰

If you are stuck and want some help, in part 2 we create a Wordle solver using Python and ZefOps.