The type safety in Elm is one of my favorite things about the language. Misspell the name of a field? The compiler’s type checker will tell you.
I cannot find a `first_nme` variable:
12| full_name = first_nme ++ " " ++ last_name
                ^^^^^^^^^
The `Main` module does not expose a `first_nme`
variable. These names seem close though:
first_name
Pass a string to your function instead of an integer? The type checker will tell you.
The 2nd argument to `add` is not what I expect:
3| add 1 "hello"
         ^^^^^^^
This argument is a string of type:
String
But `add` needs the 2nd argument to be:
number
Write a case statement that doesn’t cover all possible cases? Type checker.
This `case` does not have branches for all possibilities:
15|> case responseStatus of
16|> Succeeded ->
17|> displaySuccessMessage
18|>
19|> Failed ->
20|> displayFailureMessage
Missing possibilities include:
WaitingForResponse
These days I’m using Elixir on the backend, and while its Dialyzer type checking (via the dialyxir package) isn’t as nice as Elm’s, it is still very helpful in many of the same ways.
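To give a flavor of that, here is a minimal, hypothetical sketch mirroring the Elm add example above. With a @spec in place, running mix dialyzer (the dialyxir task) will flag a call like MathExample.add(1, "hello") as breaking the contract:
defmodule MathExample do
  # Hypothetical example module, not from the real project.
  # Dialyzer will report a contract violation for a call like
  # MathExample.add(1, "hello").
  @spec add(integer(), integer()) :: integer()
  def add(a, b), do: a + b
end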
The type checking in Elm and Elixir has helped me catch a lot of bugs a lot sooner than they would have been caught otherwise. That said, the one place where those checks completely fall down is in the communication between the frontend and the backend.
If my frontend asks the backend API for the current user, and expects a response that looks like this:
{
  user: {
    username: "some_username",
    email: "some_email@website.com"
  }
}
but I mess up on the back end and send a response that’s missing the email field:
{
  user: {
    username: "some_username"
  }
}
no amount of type-checking is going to stop that from resulting in an error on the frontend. Type checking within the frontend and the backend doesn’t cover cases of type mismatches between the frontend and the backend.
For me, trying to be diligent about test coverage hasn’t been enough to prevent the occasional “Oops I forgot to change this on both the frontend and the backend” error from happening. So I was intrigued when I read about the elm-graphql library and its promise that I would be able to generate Elm code on the frontend that was type-safe relative to a GraphQL schema that I could define on the backend.
Once I started looking into switching my application’s API to GraphQL to take advantage of this, it started me down a rabbit hole that ended with me having an application that has type safety built into every single point of interface between the backend and the frontend, and the interface boundary between TypeScript and Elm on the frontend as well.
Once I got type-safe API requests with GraphQL, I realized that I could also get some extra guarantees on the code that generates my GraphQL responses with some related tricks. Once I got those guarantees locked in, I realized I could get type-safe Elm app initialization data with similar techniques, too. After that, I incorporated the elm-ts-interop library for a final layer of type safety between my Elm and TypeScript code. After all of that, because of a combination of the elm-graphql library, the elm-ts-interop library, and a custom testing set up, data mismatches between my backend and frontend code should be all but impossible in the future, even if I stopped writing tests for the individual GraphQL operations altogether.
I think this is really cool, so I thought I would share how it works and the process of figuring it all out. If you’re interested, read on.
Part 1: Type Safety for API Requests and Responses
One of the nifty things about GraphQL is that when you create a GraphQL API, you automatically get the ability to ask it what queries you can make. If you ask it the right way, GraphQL will tell you every type of operation available and exactly what kind of data you can get in response.
The way elm-graphql works under the hood is that it runs one massive introspection query to learn every possible query you can make to your GraphQL API, and then it uses that information to generate type-safe Elm code that corresponds to it.
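To get a feel for what that introspection looks like, here is a tiny version run by hand with Absinthe (the backend tool that comes up below), against the Api.Schema module used later in this post; elm-graphql’s real query is a much bigger version of the same idea:
# Ask the schema for the name of every top-level query field.
{:ok, %{data: data}} =
  Absinthe.run(
    "{ __schema { queryType { fields { name } } } }",
    Api.Schema
  )

# `data` now lists every query operation the API supports.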
So if you can run a GraphQL query where the raw query looks like this:
{
  getCurrentUser {
    username
    email
  }
}
then elm-graphql will generate some Elm code that lets you build that query like this:
getCurrentUser =
    SelectionSet.succeed User
        |> with CurrentUser.username
        |> with CurrentUser.email
and because of Elm’s ironclad type safety, you can rest assured that you’ll get a User out of it with exactly the data you expect. If you misspell the username field:
getCurrentUser =
    SelectionSet.succeed User
        |> with CurrentUser.usernom
        |> with CurrentUser.email
You’ll get an Elm error:
I cannot find a `CurrentUser.usernom` variable:
12| |> with CurrentUser.usernom
^^^^^^^^^^^^
The `CurrentUser` module does not expose a `usernom`
variable. These names seem close though:
CurrentUser.username
If you forget to ask for the email field:
getCurrentUser =
    SelectionSet.succeed User
        |> with CurrentUser.username
You’ll get an Elm error:
Something is off with the body of the `getCurrentUser`
definition:
11|> SelectionSet.succeed User
12|> |> with CurrentUser.username
The body is:
SelectionSet (String -> User) CurrentUser
But the type annotation on `getCurrentUser` says it should be:
SelectionSet User CurrentUser
If email can be null but your code doesn’t handle the possibility that it might be null, you’ll get an Elm error:
13| |> with CurrentUser.email
^^^^^^^^^^^^^^^^^^^^^^
The argument is:
SelectionSet (String -> User) CurrentUser
But (|>) is piping it to a function that expects:
SelectionSet (Maybe String -> User) CurrentUser
Because the backend on this project is written with Elixir and Phoenix, I chose to use Absinthe, a popular GraphQL tool for Elixir, for the backend GraphQL functionality. So let’s get a picture of what the process of getting the current user via GraphQL looks like.
We define a GraphQL user type in Elixir with Absinthe:
object :user do
  field(:username, non_null(:string))
  field(:email, non_null(:string))
end
We define a GraphQL field for getting the current user data in Elixir with Absinthe:
field :current_user, non_null(:user) do
  resolve(fn _, _, %{context: %{current_user: current_user}} ->
    {:ok,
     %{
       username: current_user.username,
       email: current_user.email
     }}
  end)
end
We run the elm-graphql command to generate the GraphQL boilerplate code for Elm:
elm-graphql http://localhost:4000/graphql_api
We write our Elm code for sending the current_user request and creating an Elm User model from the response:
getCurrentUser =
    SelectionSet.succeed User
        |> with CurrentUser.username
        |> with CurrentUser.email
With this setup, as long as we run that elm-graphql command before deploying to production, our frontend GraphQL code can’t get out of sync with our backend GraphQL schema. If it does, the Elm compiler will spit out an error like the ones above.
It took me a while to get everything working this way for the entire backend API; there were a lot of endpoints to convert to the GraphQL way of doing things. But I liked the results: it’s always nice to have one less possible point of failure. I quickly realized, though, that there was a place this safety could still fall down: the frontend type checking ensured consistency between the GraphQL schema and the frontend, but it didn’t guarantee that I would generate the right data for those GraphQL responses in my Elixir code.
As I was working, I sometimes found that even though my GraphQL schema matched up perfectly with my Elm code, the Elixir code I wrote that generated the GraphQL responses wasn’t guaranteed to match up with what I had specified in the schema.
I could no longer easily forget the email field in my Elm code or my GraphQL schema without creating Elm compiler errors, but I could still forget the email field in the Absinthe resolve callback:
field :current_user, non_null(:user) do
  resolve(fn _, _, %{context: %{current_user: current_user}} ->
    {:ok,
     %{
       username: current_user.username,
       # If the email line gets removed,
       # everything will still compile just fine
       email: current_user.email
     }}
  end)
end
Granted, this is the sort of thing that writing good test cases should catch. But I wanted this kind of safety to be as automatic as possible, and to have good test cases be an extra layer on top of the automated type safety.
I wanted another layer of protection: a way to automatically check that the GraphQL resolver functions returned all of the correct fields, correctly written, with every field present and accounted for.
Part 2: GraphQL Response Comprehensiveness Checking
I couldn’t think of a simple way to accomplish this automatically with Dialyzer type checking. But remember: just as the queryability of the GraphQL schema can be used to ensure type safety between Elm and Phoenix, that same schema can be queried on the Elixir side as well.
So eventually, I thought, “What if I just used that querying capability to generate and run a unit test for every single possible query in the schema?”
If I had each query ask for every single possible field, then if any of the fields were missing or misnamed, the test would fail.
describe "all GraphQL operations" do
Here is a simplified version of what the test generation code that I ultimately wrote looks like:
# First we get every single operation available in the schema
for operation <- get_all_operations(Api.Schema) do
# Then we create a test case
test "operation `#{operation}` responds successfully" do
# Inside the test case, we run the operation
%{result: {status, response}} = run_operation(unquote(Macro.escape(operation)))
# And then we check to see that it returned successfully
assert status == :ok
assert Map.has_key?(response, :data)
refute Map.has_key?(response, :errors)
end
end
The bones are the get_all_operations function and the run_operation function. get_all_operations runs that big GraphQL schema query to get a list of all possible GraphQL operations, and run_operation recursively figures out every field that can be asked for in a specific operation, every subfield, and so on, down to every piece of data at every level, and asks for all of them in the response. If any of them don’t exist, the test will fail.
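Here is a rough sketch of what get_all_operations can look like. This is an illustration built on Absinthe’s schema reflection rather than the exact code from the project; Absinthe.Schema.lookup_type/2 is a real Absinthe function, and the rest is an assumption about how you might use it:
def get_all_operations(schema) do
  # The root query object holds one field per top-level operation.
  %{fields: fields} = Absinthe.Schema.lookup_type(schema, :query)

  fields
  |> Map.keys()
  # Skip GraphQL's built-in introspection fields like :__schema and :__type
  |> Enum.reject(fn name -> name |> Atom.to_string() |> String.starts_with?("__") end)
end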
For example, if a User looks like this:
type User {
  username: String
  email: String
  profile: UserProfile
}

type UserProfile {
  bio: String
  status: String
}
Then run_operation would generate a query that looks like this:
{
  user {
    username
    email
    profile {
      bio
      status
    }
  }
}
There are some other details that I’ve left out for the sake of simplicity, like generating dummy arguments for operations that need arguments. But the essentials are just the above. Figure out all the operations, figure out every piece of data you can ask them for, and ask for each one. This covers, automatically, every single operation in the entire GraphQL schema and ensures that all of them have every single field accurately specified.
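To make the recursive part a little more concrete, here is a simplified, hypothetical sketch of how the field expansion could work against Absinthe’s type structs. It is not the project’s actual code; the real version also handles arguments, unions, and other details:
# Build the `{ ... }` selection for a type: recurse into object fields,
# stop at scalars and enums.
defp queried_fields(schema, type_identifier) do
  case Absinthe.Schema.lookup_type(schema, type_identifier) do
    %Absinthe.Type.Object{fields: fields} ->
      inner =
        fields
        |> Map.values()
        |> Enum.map(fn field ->
          "#{field.name} #{queried_fields(schema, unwrap(field.type))}"
        end)
        |> Enum.join("\n")

      "{ #{inner} }"

    _scalar_or_enum ->
      ""
  end
end

# Strip non_null/list wrappers to get at the underlying type identifier.
defp unwrap(%{of_type: inner}), do: unwrap(inner)
defp unwrap(type), do: type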
So we’re doing pretty well. We have type safety between the frontend and the backend through elm-graphql, and we have comprehensiveness checking for GraphQL responses via an automatically generated testing suite. All of the interface points between the backend and frontend except one have some pretty nice guarantees.
The last place where the backend and frontend meet that so far has no guarantees is the initialization data for the Elm app.
Part 3: Type Safety for the Initialization Data
When the frontend Elm application for this project starts up, it gets initialized with data that is stored in special data attributes sent with the HTML that is generated on the server side. Here is a simplified version of what that data attribute might look like:
<div
  id='myapp'
  data-appdata='{
    "user": {
      "email": "some_email@example.com",
      "username": "some_username"
    }
  }'
>
</div>
So far, there is nothing to ensure that this data is exactly the shape that the application is expecting. The guarantees we have are for requests made to the GraphQL API, and this initialization data is generated in a completely different way. No elm-graphql, no guarantees.
Once I started down this type safety rabbit hole, I really wanted to see just how far I could get. And errors where I changed the shape of the data the frontend needed for initialization, but didn’t get that shape exactly right on the backend, absolutely were happening now and again over the course of development. I wanted type safety everywhere. So I thought about it for a while and eventually realized that the only good tool I really had for safety across the frontend/backend boundary was still elm-graphql.
So was it possible I could just…make the initialization data take the form of GraphQL responses, too? That would make it so that I could get type safety guarantees for the initialization data the same way I did it for the API.
This is, admittedly, a significant diversion from the way GraphQL is supposed to be used. The point of GraphQL is to be an API, and to make the request responses adaptable to exactly the data that an individual request requires. You can ask for exactly what you need without getting anything you don’t. In this case, though, I would be getting exactly the same data 100% of the time, and it wouldn’t be part of an actual API endpoint; I would just be server-rendering the response into my HTML.
But it could, at least in theory, work.
I decided to try it. I built a second GraphQL schema specifically for this initialization data, aka the “flags” that we pass into the Elm application.
defmodule MyAppWeb.Graphql.Flags.Schema do
  use Absinthe.Schema

  query do
    import_fields(:elm_app)
  end
end
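The :elm_app object it imports is just an ordinary Absinthe object describing whatever the Elm app needs when it starts up. A hypothetical, simplified version (the real project’s fields differ) might look like this:
object :elm_app do
  # Reuse the same :user type the main API schema defines
  field :current_user, non_null(:user) do
    resolve(fn _, _, %{context: %{current_user: current_user}} ->
      {:ok, %{username: current_user.username, email: current_user.email}}
    end)
  end
end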
Fortunately, from the comprehensiveness checking work from before, I already had Elixir code for making a GraphQL request that asked for every single possible field of data, so that part was no problem.
Query.comprehensive(%{
  operation_type: operation_type,
  operation_name: operation_name,
  schema: MyAppWeb.Graphql.Flags.Schema
})
Generating the corresponding Elm code was also no problem. I could generate it with the same elm-graphql command I used for the API.
elm-graphql http://localhost:4000/graphql_initialization_data
I set up the Elm app to be initialized with that GraphQL response, and to decode its contents on the Elm side.
@spec graphql_initialization_data(Conn.t()) :: binary()
def graphql_initialization_data(conn) do
  {:ok, flags} =
    # Get the query string with all of the fields included
    Graphql.Flags.full_query(:query, :elm_app)
    # Generate the GraphQL response to that query with Absinthe
    |> Absinthe.run(
      Graphql.Flags.Schema,
      context: %{current_user: CurrentUser.get(conn), conn: conn}
    )

  # Encode the result into a big JSON string
  Jason.encode!(flags)
end
<!-- put the data into an attribute on the Elm app's HTML div -->
<div
  id='myapp'
  data-appdata={graphql_initialization_data(@conn)}
>
</div>
const appDiv = document.getElementById('myapp')

// Grab the GraphQL response data from the div
const graphqlFlags = appDiv.dataset.appdata

Elm.Main.init({
  node: appDiv,
  // Pass the data to the Elm app when it initializes
  flags: graphqlFlags
})
-- Use the decoding functions generated by elm-graphql
-- to decode the JSON string into usable data types
graphqlFlagsDecoder : String -> Decoder Model
graphqlFlagsDecoder graphqlFlags =
    graphqlFlags
        |> JsDecode.decodeString
            (Graphql.Document.decoder graphqlSelectionSet)
When I tried to tie this all together, I hit a snag. All the bits and pieces seemed to be in their proper places and doing what they were supposed to do, but I was getting decoding errors when I tried to actually run the Elm app with this setup.
Specifically, Elm was spitting out messages like this:
Problem with the given value:
{"email":"some_email@example.com"}
Expecting an OBJECT with a field named `email3832528868`
Why was it expecting a field named email3832528868 instead of email? What was this random string of numbers doing at the end?
I did some digging into the elm-graphql codebase, and eventually I managed to figure out what was going on. elm-graphql is written, understandably, under the assumption that the responses it receives will be responses to requests it sent. To avoid naming collisions, it aliases some of the response fields by appending a hash of the field’s arguments to the end of the field name. It does this with every operation or leaf node that has arguments, and the hash it uses is the Murmur3 hashing algorithm.
If I wanted my decoding to work, I needed to generate the responses in the form that elm-graphql expected, which meant I needed them to include these hashes just like they would if elm-graphql had made the request.
So I got to work.
I found the Elixir murmur library, added it to the project, and updated the query generation code.
# note the `operation_alias` variable
"""
#{operation_type} #{operation_name}#{args_declaration} {
  #{operation_alias}#{operation_name}#{args_passage} #{queried_fields}
}
"""
It took a fair amount of reading and tweaking to figure out exactly how elm-graphql was hashing the fields and arguments, and to get the parallel Elixir implementation to line everything up in a precisely identical way, but I was eventually able to get it to work. Adding those hashes did, in the end, lead to a functioning app with type-safe GraphQL responses for the initialization data.
Fuck yeah.
Now, if the initialization data doesn’t exactly match what Elm is expecting, the application will error. On top of that, the same comprehensiveness checking being done for the API was trivial to adapt to the initialization data schema as well.
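In practice that adaptation was little more than pointing the same generated tests at the flags schema. A sketch, assuming run_operation can be told which schema to run against (that extra argument is an assumption, not the exact code):
describe "all flags GraphQL operations" do
  for operation <- get_all_operations(MyAppWeb.Graphql.Flags.Schema) do
    test "flags operation `#{operation}` responds successfully" do
      # The second argument here is assumed for illustration
      %{result: {status, response}} =
        run_operation(unquote(Macro.escape(operation)), MyAppWeb.Graphql.Flags.Schema)

      assert status == :ok
      assert Map.has_key?(response, :data)
      refute Map.has_key?(response, :errors)
    end
  end
end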
So, the API is type-safe. The initialization data is type-safe. The GraphQL responses for both are automatically checked for comprehensiveness. At this point, there’s only one boundary left that doesn’t have these same guarantees: the boundary between the Elm frontend code and the TypeScript frontend code.
Part 4: TypeScript-to-Elm Interop Type Safety
At the end of the day, as much as I would love to, you cannot, in fact, do everything there is to do on the frontend with Elm alone. There is some JavaScript functionality, especially Phoenix’s own JavaScript libraries, that Elm doesn’t cover, so there has to be some interoperation between Elm and (in my project’s case) TypeScript code.
Some context on Elm and JavaScript/TypeScript interoperation: in order to preserve the language’s type-safety guarantees, Elm cannot directly call JavaScript functions. Instead, events can be sent between the Elm app and other JavaScript code. For example, if I wanted to be able to send and receive websocket messages from an Elm app, I might write some code like this (example code borrowed from the elm-lang.org guide):
const app = Elm.Main.init({
  node: document.getElementById('myapp')
})

app.ports.sendMessage.subscribe(function(message) {
  socket.send(message)
})

socket.addEventListener("message", function(event) {
  app.ports.messageReceiver.send(event.data)
})
The question is how, in a situation like this, can we ensure that the information being passed into and out of the Elm app is exactly the type we’re expecting? In this case, there was a clear answer.
Enter: elm-ts-interop. elm-ts-interop works for Elm and TypeScript data types a lot like elm-graphql does for Elm and GraphQL data types. If you use elm-ts-interop’s data encoding and decoding toolkit in your Elm code, you can run a script that will generate corresponding TypeScript type definitions.
For a simple example, if I wanted to log messages and errors to the console, I might define a basic bare-bones elm-ts-interop encoder like this:
logMessageToConsole =
    TsEncode.string


logErrorToConsole =
    TsEncode.string
If those are the only outgoing events I define in my Elm code, then running the elm-ts-interop command will generate a Main.elm.d.ts TypeScript type definitions file with these definitions in it (among others):
export interface ElmApp {
  ports: {
    interopFromElm: PortFromElm<FromElm>
  }
}

export type FromElm =
  | { data: string; tag: "logMessageToConsole" }
  | { data: string; tag: "logErrorToConsole" }
and with those definitions imported, I could write an app.ts file that might look something like this:
const app = Elm.Main.init({
  node: document.getElementById('myapp')
})

app.ports.interopFromElm.subscribe((fromElm) => {
  switch (fromElm.tag) {
    case "logMessageToConsole":
      console.log(fromElm.data)
      break
    case "logErrorToConsole":
      console.error(fromElm.data)
      break
  }
})
Note the switch statement handling both possible outgoing event messages. If I make a mistake and forget to handle the "logMessageToConsole" case, TypeScript’s compiler will now report an issue. If I handle the "logMessageToConsole" case but pass fromElm.data to a function that doesn’t take string arguments, TypeScript will report an issue. And so on. The same sorts of checks happen for events sending data into the Elm app. For example, I might define an incoming event decoder in my Elm code like this:
focusedElementHasChanged =
    TsDecode.maybe TsDecode.string
which would lead to elm-ts-interop creating a TypeScript definitions file with this in it:
export interface ElmApp {
  ports: {
    interopToElm: PortToElm<ToElm>
  }
}

export type ToElm =
  | { data: string | JsonValue; tag: "focusedElementHasChanged" }
and I could then write some code in my app.ts file that might look something like this:
const app = Elm.Main.init({
  node: document.getElementById('myapp')
})

document.body.addEventListener('focusin', function (event) {
  app.ports.interopToElm.send({
    tag: "focusedElementHasChanged",
    data: event.target.id
  })
})
In this case, just like with the outgoing events, if I misspell "focusedElementHasChanged", pass the wrong type of data to the send function, or make any other kind of data type mismatch, TypeScript will now catch it.
This checking also works for the initialization data from Part 3. I can ensure that the encoded GraphQL data comes in as a string, and type check any additional data that comes in as well. For example, maybe my Elm app wants to know the browser’s current Unix timestamp when the app initializes. For that, we might have some Elm code like this:
flags =
    succeed Tuple.pair
        |> andMap (field "currentTime" Timestamp.decoder)
        |> andMap (field "graphqlFlags" graphqlFlagsDecoder)
which would cause elm-ts-interop to create a TypeScript type definitions file with these definitions in it:
export type Flags = { currentTime: number; graphqlFlags: string }

export namespace Main {
  function init(options: { node?: HTMLElement | null; flags: Flags }): ElmApp
}
so that there would be accurate type checking when I call the Elm.Main.init function in my app.ts file:
const app = Elm.Main.init({
  node: document.getElementById('myapp'),
  flags: {
    currentTime: new Date().getTime(),
    graphqlFlags: graphqlFlags
  }
})
Just like the others, if I were to forget a field, pass the wrong type of data to a field, or add a field that wasn’t explicitly included on the Elm side, TypeScript’s type checking will now catch it.
All Typed Up
So that’s that: type safety between the frontend and backend via the API and via the initialization data, type safety between the different languages on the frontend, and comprehensiveness checking of the returned fields in the GraphQL response generation code.
Now, when my code is compiling and all of my tests are passing, I have more confidence than ever that everything in the app is working exactly the way it’s supposed to.
I have left a lot of the nitty-gritty code out of my examples for the sake of clarity and brevity, particularly the hundreds of lines that make the gears turn when building those massive GraphQL queries on the backend. I have thought about trying to isolate some of that into a library, but that would be a whole other project on its own.
For anyone wanting to try to replicate all of this, I hope this serves as enough of an outline and starting point to get there. For anyone who was just curiously along for the ride, I hope you enjoyed it. Personally, getting this all wired together and working took a lot more work than I originally expected, but it has been an incredibly fascinating process and an incredibly satisfying accomplishment. And it really does catch a lot of errors much earlier in the development process than before.
With all of this done, it feels like the biggest point of weakness in terms of type safety in my application is now the Elixir backend, if only because Dialyzer doesn’t feel like it provides the same level of safety as the rest of the system does. And I think it’s saying something that the type safety within a single language feels less reliable than the safety that has been accomplished across the application’s language and framework boundaries. That said, with the news about the upcoming built-in type system in future versions of Elixir, I’m optimistic that that experience is about to improve, too.
If you have any thoughts or questions about any of the above, please drop me a message in my contact form. I’d love to know if others have found this as interesting as I did.