Musing on Coding in Ceptr: Making Sense of Data

This is #3 in a series of dev posts, which jump right into the deep end of the techie pool.

Everything in Ceptr gets built out of semantic trees.  Why? Alan Perlis’ epigram #106 on programming hints at the answer:

It’s difficult to extract sense from strings, but they’re the only communication coin we can count on.

Ceptr creates a new “communication coin” that we can still count on (in both senses) but from which it’s much easier to “extract sense.”  In fact, the “sense” gets baked in at the bottom layer, because everything in Ceptr is built out of semantic trees.  Extracting “sense,” however, has a lot to do with pattern matching.  Regular expressions provide a rich expressive capacity for discovering patterns in text.  We have built Semantic Tree Regular Expressions (Semtrex) as a way to do the same for our semantic trees.  As a first introduction to Semtrex, you should read Marc Clifton’s very good and detailed article, which gives some good backstory on semantic trees with examples. In this post I want to share a little about some of the coding issues involved in implementing them, which I’m very familiar with at the moment because of some bugs that recently showed up in my code!

My implementation is heavily based on the code and theory in Russ Cox’s great article on regular expression matching.  As he describes, regular expression implementations usually convert the regular expression to a state machine that then gets run against a string to see if it matches.  For example, here’s the state machine from Cox’s article for the regex abab|abbb:

RegEx State Machine

In these state diagrams a labeled transition means: read a character and, if it matches the label, move to the next state.  As he explains in the article, for any state that has two branches we can theoretically assume a machine that magically chooses the correct branch for any given string.  Practically, we implement this magic either by trying both branches sequentially with a backtracking algorithm, or by running both branches simultaneously.
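
The split-state idea can be sketched in a few lines. This is a hedged illustration, not Cox’s or Ceptr’s actual code: the `Char`, `Split`, and `run` names are invented, and the runner uses the sequential backtracking strategy.

```python
# Toy NFA for abab|abbb: Char states consume one matching character;
# a Split state has two outgoing branches, tried with backtracking.

class Char:
    def __init__(self, c, nxt):
        self.c, self.nxt = c, nxt

class Split:
    def __init__(self, a, b):
        self.a, self.b = a, b

MATCH = object()  # the accepting state

def run(state, s, i=0):
    """Backtracking simulation: at a Split, try branch a, then branch b."""
    if state is MATCH:
        return i == len(s)
    if isinstance(state, Split):
        return run(state.a, s, i) or run(state.b, s, i)
    # Char state: read a character and, if it matches the label, advance
    return i < len(s) and s[i] == state.c and run(state.nxt, s, i + 1)

def build(word, tail=MATCH):
    """Chain of Char states spelling out a literal word."""
    for ch in reversed(word):
        tail = Char(ch, tail)
    return tail

machine = Split(build("abab"), build("abbb"))
```

Running `run(machine, "abbb")` first tries the abab branch, fails at the third character, backtracks to the split, and succeeds down the abbb branch.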

Now here’s the thing about regular expressions for semantic trees: because trees are not linear (like strings), the transition from one state to the next also has to include some instructions about how to move through the tree.

Here’s an example state diagram for the Semtrex:  /A/(B|C/D,E)


(Note that any up arrow, which pops up the number of levels indicated, also has an implied move to the next child.)
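
To make the tree moves concrete, here is a small sketch (invented names, not the actual C implementation) of a cursor supporting the three transitions a Semtrex machine needs: Down (to first child), Next (to next sibling), and Up(n) with its implied move to the next child.

```python
def tree(symbol, *children):
    """A node is a (symbol, children) pair."""
    return (symbol, list(children))

class Cursor:
    """path[k] = (node, index of that node among its parent's children)."""
    def __init__(self, root):
        self.path = [(root, 0)]

    @property
    def symbol(self):
        return self.path[-1][0][0]

    def down(self):
        node, _ = self.path[-1]
        self.path.append((node[1][0], 0))   # descend to first child

    def next(self):
        _, i = self.path.pop()
        parent = self.path[-1][0]
        self.path.append((parent[1][i + 1], i + 1))  # step to next sibling

    def up(self, n):
        for _ in range(n):                  # pop n levels...
            self.path.pop()
        self.next()                         # ...then the implied next child

t = tree("A", tree("C", tree("D")), tree("E"))
c = Cursor(t)
c.down()   # A -> C
c.down()   # C -> D
c.up(1)    # pop to C's level, implied next child: E
```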

Debugging the code that runs this machine is tricky, because it’s hard to see the state of the whole machine in the gdb debugger.  To make this easier, I came up with a textual representation for the state machine.  The machine above looks like this:


Each state is an item in parens.  If the state has a transition that matches a symbol, the symbol is in the parens.  Special states are splits (s) and match (m).  The trick with splits is how to represent both branches.  One branch follows right out of the split, and the second comes later in square brackets.  The outgoing arrow just points to “X,” which means some other part of the machine, so you have to know which.  I know, kinda ugly, but it’s enough for debugging.

Below is actual debugging output (only slightly doctored) of that machine being run against the tree (A (C (D)) (E)).  The output uses ANSI color coding to show in red the current state and tree position.

D_STX_MATCH: FSA:(A)-Down->(s)-->(B)-Next->(E)-Up1->(m)[-->(C)-Down->(D)-Up1->X]
D_STX_MATCH: tree:(A (C (D)) (E))
D_STX_MATCH: transition: down
D_STX_MATCH: FSA:(A)-Down->(s)-->(B)-Next->(E)-Up1->(m)[-->(C)-Down->(D)-Up1->X]
D_STX_MATCH: tree:(A (C (D)) (E))
D_STX_MATCH: pushing split branch for backtracking to state (C)
D_STX_MATCH: FSA:(A)-Down->(s)-->(B)-Next->(E)-Up1->(m)[-->(C)-Down->(D)-Up1->X]
D_STX_MATCH: tree:(A (C (D)) (E))
D_STX_MATCH: Fail: so popping to--(C (D))
D_STX_MATCH: FSA:(A)-Down->(s)-->(B)-Next->(E)-Up1->(m)[-->(C)-Down->(D)-Up1->X]
D_STX_MATCH: tree:(A (C (D)) (E))
D_STX_MATCH: transition: down
D_STX_MATCH: FSA:(A)-Down->(s)-->(B)-Next->(E)-Up1->(m)[-->(C)-Down->(D)-Up1->X]
D_STX_MATCH: tree:(A (C (D)) (E))
D_STX_MATCH: transition: popping -1
D_STX_MATCH: FSA:(A)-Down->(s)-->(B)-Next->(E)-Up1->(m)[-->(C)-Down->(D)-Up1->X]
D_STX_MATCH: tree:(A (C (D)) (E))
D_STX_MATCH: transition: popping -1
D_STX_MATCH: Matched!

Adding this visualization was enough for me to find the bug where the state machine wasn’t popping back up enough in some circumstances, and it has revealed another bug, which I’m still working on, in the case of grouping, where we collect up matched data.

If this kind of stuff intrigues you, feel free to show up and ask questions at one of our twice-monthly dev chats, which happen every 1st and 3rd Wednesday of the month at 3pm Eastern/Noon Pacific.

Hope to see you there!


Musings on Coding in Ceptr: Signaling

This is #2 in a series of dev posts, which jump right into the deep end of the techie pool.

At the core of Ceptr you will find agent-like receptors which send each other messages.  We’ve provided a simple yet sufficient instruction set for rich message programming, taking into account both synchronous and asynchronous programming needs.

The overall signaling model looks like this:

  1. signals are sent on a carrier
  2. “through” an aspect
  3. and optionally keyed to a conversation or request
  4. are processed in the receptor’s membrane
  5. by being matched against expectations associated with the aspect/carrier/conversation
  6. each of which triggers an action which either sends more signals or transforms the receptor’s state
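
The steps above can be sketched as a toy dispatch loop. Everything here (the `Membrane` class, callable patterns and actions) is an invented illustration of steps 4 through 6, not Ceptr’s API:

```python
# A membrane holds expectations keyed by (aspect, carrier); a signal
# arriving on a matching aspect/carrier is tested against each pattern,
# and every matching expectation triggers its action.

class Membrane:
    def __init__(self):
        self.expectations = []   # (aspect, carrier, pattern, action)

    def expect(self, aspect, carrier, pattern, action):
        self.expectations.append((aspect, carrier, pattern, action))

    def receive(self, aspect, carrier, message):
        results = []
        for a, c, pattern, action in self.expectations:
            if a == aspect and c == carrier and pattern(message):
                results.append(action(message))   # expectation fires
        return results

m = Membrane()
m.expect("public", "greeting", lambda msg: msg.startswith("hello"),
         lambda msg: "ack: " + msg)
acks = m.receive("public", "greeting", "hello world")
```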

Requests:  Some signals are requests, which, when received, create a stateful condition of expecting a response to the requesting signal identifier.  Additionally, requests have end conditions, e.g. timeouts, response counts (there can be more than one), etc.

Listening: The LISTEN instruction allows you to add expectations (with their patterns and actions) on the flux (the receptor’s membrane).

Conversations: Some signals are identified as being part of a conversation.  Conversations are created with the CONVERSE scoping instruction.  You can think of this instruction kind of like the “block” or “scope” construct that exists in most programming languages which creates a lexical scope for variables.  CONVERSE creates a scope for sending and listening to signals.  CONVERSE instructions can be nested, such that the system recognizes one conversation as a sub-part of another.  The system also allows having multiple SAY/REQUEST instructions inside of a conversation to different destinations.  This provides underlying support for various multi-party protocols.
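
Since no Ceptr code appears here, a loose Python analogy may help (all names invented): CONVERSE behaves like a lexical scope, which can be modeled with a context manager and a stack so that nested conversations register as sub-parts of their parent.

```python
import contextlib

conversation_stack = []   # innermost conversation is on top

@contextlib.contextmanager
def converse(topic):
    ident = (topic, len(conversation_stack))
    conversation_stack.append(ident)      # open the conversation scope
    try:
        yield ident                       # body can send/listen in scope
    finally:
        conversation_stack.pop()          # completion cleans the scope up

with converse("couch-matching") as outer:
    with converse("payment") as inner:    # nested sub-conversation
        innermost = conversation_stack[-1]
```

Inside the nested block, the stack records that “payment” is a sub-part of “couch-matching,” and leaving each scope cleans its conversation up.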

Sync/Async: The signaling instructions have waiting and non-waiting versions.  This allows you to program in both synchronous and asynchronous style.  For example the REQUEST instruction has an optional action parameter.  If provided, the REQUEST instruction reduces to the request signal’s identifier, and the action process gets executed sometime later when the response arrives.  Otherwise the tree with the REQUEST instruction goes into the waiting state and later, when the response message arrives, reduces directly to its body.  The action parameter to the LISTEN instruction is similarly optional and indicates whether the tree with the LISTEN should wait or run asynchronously.  Similarly for conversations, the CONVERSE instruction can be set to wait or not for the conversation to be completed.
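
The waiting/non-waiting distinction for REQUEST can be sketched as follows. This is a hedged toy model with invented names; Ceptr’s REQUEST is a tree reduction, not a Python call, and `deliver_response` stands in for the runtime’s handling of an arriving response signal.

```python
pending = {}   # request identifier -> action awaiting a response

def request(message, action=None, respond=lambda m: "response to " + m):
    """`respond` stands in for the remote receptor answering a request."""
    if action is None:
        return respond(message)           # waiting: reduce to the response
    rid = ("req", len(pending) + 1)       # non-waiting: reduce to the id...
    pending[rid] = action                 # ...and run the action later
    return rid

def deliver_response(rid, response):
    """Called when the response signal arrives: fire the stored action."""
    pending.pop(rid)(response)

blocking_result = request("ping")         # synchronous style
out = []
rid = request("ping", action=out.append)  # asynchronous style
deliver_response(rid, "pong")             # later: the action executes
```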

Addressing: Receptor addressing in Ceptr has to take into account the fact that receptors exist inside of other receptors.  In other words, the network topology isn’t a two dimensional plane.  Also, as a programmer, it helps to remember that the “place” where programs execute is in the membrane of the receptor.  Thus you can visualize that any signals you generate are either internally bound, i.e. to the inside of the receptor, externally bound to a peer receptor inside the same parent, or bound to some receptor outside the parent.  This means that each receptor gets to define its own address space to manage the coherence of messaging inside itself, as well as what messages go in and out.  More on addressing in a future post…

Here’s a summary of the instructions from the Ceptr instruction set that have to do with signaling:

SAY (to_address, key_by_aspect, over_carrier, message)
  Reduces to the signal’s identifier.

REQUEST (of_address, key_by_aspect, on_carrier, message, expect_response_on, until, *action)
  If an action is provided, the request reduces to the request signal’s identifier and the action runs later when the response arrives.  If no action is provided, blocks and reduces to the response’s message.

RESPOND (respond_on_carrier, response_message)
  Reduces to the response signal’s identifier.

LISTEN (at_aspect, for_carrier, match_pattern, *action, *with_params, *until_end_conditions)
  If an action is provided, adds an expectation with the pattern to the flux and reduces to REDUCTION_ERROR_SYMBOL(NULL_SYMBOL).  If no action is provided, builds a special END_CONDITION of COUNT=1, blocks processing, and later reduces to the first signal that matches the pattern.

CONVERSE (scope, *end_conditions, *wait, *topic)
  Defaults: end_conditions: UNLIMITED, wait: FALSE, topic: none.
  Use end_conditions to specify when the system will terminate the conversation.  Use wait to specify whether the CONVERSE instruction should block until the conversation completes.  If you use asynchronous REQUEST or LISTEN instructions in the scope of the conversation and you don’t set wait to TRUE, you will want to use the THIS_SCOPE instruction to get the conversation identifier so you can call the COMPLETE instruction somewhere later, or the conversation will never get cleaned up.

COMPLETE (with_something, *conversation_ident)
  Cleans up a conversation and causes the CONVERSE instruction to reduce to the value of with_something.  If conversation_ident is provided, completes that conversation; otherwise assumes the value of THIS_SCOPE.  If the specified conversation doesn’t exist (e.g. because it was cleaned up by its end conditions), COMPLETE invokes the error handler.  Note that it’s possible the CONVERSE instruction was already reduced, in which case the value of with_something gets ignored.  If the COMPLETE instruction is not inside the CONVERSE scope, it reduces to the value of conversation_ident.

THIS_SCOPE ()
  Reduces to the identifier of the current conversation.  Executing THIS_SCOPE anywhere other than inside a CONVERSE invokes the error handler.

SELF_ADDR ()
  Reduces to the receptor address of the current receptor.

(Parameters marked with * are optional.)


Musings on Coding in Ceptr: First Dev Post

This is the first in a series of posts I want to write about both building Ceptr and coding in Ceptr. In many of them I jump right into the middle of deep tech, so hold on to your hats, and enjoy!

I want to start out with a few key points of what Ceptr might look like to you from a programmer’s perspective:

  1. In Ceptr you should not think of yourself as programming a computer, but rather a network of simple computers, some of which are embedded inside each other. We call these simple computers receptors.
  2. The fundamental data unit in Ceptr is not a 32 or 64 bit word; it’s a semantic tree.  Now this may sound weird, but it creates an important mental shift.  Think of each receptor as a computer whose CPU operates by evaluating tree structures, not linear sequences of bits.  The “program counter” doesn’t move forward along a linear memory space, loading the next instruction into a register for evaluation but rather walks along a tree structured memory space recursively reducing branches down to their result.
  3. Not surprisingly, Ceptr programs are just another type of semantic tree.  This means that fundamentally the Ceptr programming environment is homoiconic.
  4. Ceptr uses an expanded version of Regular Expressions for pattern matching, that we call SemTrex (SEMantic Tree Regular Expressions).  Regular Expressions are designed to scan a linear sequence of characters, but SemTrex expressions operate on trees and can match on semantics as well as value.
  5. Ceptr provides a native signaling system for you to send messages between these receptors.  Much of Ceptr programming is about expectations which trigger code when signals arrive, based both on explicit pattern matching on the content of the arriving signal (using SemTrex) and on the entry path of the signal (kind of like well-known port numbers in TCP/IP). Thus, the Ceptr “instruction set” includes support for rich inter-receptor communication, including single message sending, request-response patterns, persistent connection-based “conversations,” and, most importantly, patterns for defining multi-receptor protocols.
  6. These protocols can be defined as abstractions, kind of like template classes.  Thus, lots of Ceptr programming involves thinking about what agents are doing in a given protocol, and what the interactions of that protocol are, and finally building more complicated protocols out of simpler ones. 
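
To illustrate point 2 above, here is a toy reducer (the tuple representation and symbols are invented, not Ceptr’s format): evaluation walks a semantic tree and recursively reduces each branch to its result, rather than stepping a program counter through linear memory.

```python
def reduce_tree(node):
    """Recursively reduce a (symbol, child, child, ...) tree to a value."""
    if not isinstance(node, tuple):        # a leaf is already reduced
        return node
    symbol, children = node[0], node[1:]
    args = [reduce_tree(c) for c in children]  # reduce branches first
    if symbol == "+":
        return sum(args)
    if symbol == "*":
        result = 1
        for a in args:
            result *= a
        return result
    raise ValueError("unknown symbol: %r" % symbol)

# (+ (* 2 3) (* 4 5)) reduces branch by branch to 26
total = reduce_tree(("+", ("*", 2, 3), ("*", 4, 5)))
```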

I know this is all abstract.  In future posts I’ll get into the nitty gritty with actual code examples.  Here’s a list of some of the topics I hope to cover in more depth:

  • Semantic trees, semantic alternation and, especially, what it feels like to code when you can’t use data without thinking about what’s the semantic use of that data
  • Homoiconicity, and how in Ceptr, like in a few other programming languages, a program’s representation and structure match closely, making meta-programming much easier.
  • Templates, slots, grammars & parameters: the marriage between the known and the unknown, and the core of composability.
  • Transcoding: the semantic version of cast or type conversion.
  • Phenotype vs. Genotype, or as programmers would say: class/type/prototype vs. instance.
  • Coherence contexts: beyond namespaces.
  • Protocols as live, pluggable code rather than as specifications for coding.
  • Signaling between receptors: Carrier/Signal/Protocol, i.e. our version of multi-agent systems or actor-model computation
  • Scaffolding systems into existence, or, as Alan Perlis put it in his epigrams: “Everything should be built top-down, except the first time,” and how this applies to Ceptr.

So if any of these sounds particularly interesting, tell me in the comments, and I’ll push it higher on my priority list.

January 2016 Events in SFO/Bay Area

January is a HUGE month for us at the MetaCurrency Project as we approach the release of Ceptr, the next economy / next Internet tools we’ve been building these past years.

Come play and explore with us at one of our events in the San Francisco / Bay Area!

Jan 20 6pm – Thrivable Future Salon at IFTF in Palo Alto.

Jan 25 7pm – Ceptr Technology: Under the Hood at Impact HUB Oakland.

Jan 26-28 – Deep Wealth Design 3 Day workshop in Richmond. Register at

Jan 29 10am-5pm – Living Wisdom Labs / Land Weaving / Wealth Stewardship Circles (contact us for an invitation)


Distributed Receptor-Based Computing

[I shared this brief in preparation for the Rebooting the Web of Trust event in San Francisco November 3&4 2015.]

The MetaCurrency Project’s requirements for decentralization, combined with a commitment to leveraging organizing principles of living systems, led us to invest a lot of time in the past 5 years developing Ceptr. Ceptr is a rebuild of much of the computing stack optimized for decentralized sense-making, computation and composability. This means semantics baked into the lowest levels of memory storage, self-describing protocols which let anything talk with anything else, and blockchain-like abilities for decentralized data storage and computation.

Our approach addresses many issues people are tackling in a variety of projects (web of trust, blockchain, semantic web, decentralized applications, federated identity, ease of interoperability, rapid code evolution with massive reuse, mesh networks, etc.) through a unified approach.  We know providing such a different approach creates barriers to adoption and understanding, but we also know that applications in Ceptr will be able to be developed faster and more easily, and should significantly outperform projects which have been cobbled together from more traditional tools.

There are a lot of things we’ve worked out pretty well, like:
  • Lightweight virtual machines (receptors) for distributed applications/computing
  • Configurable byzantine fault tolerance options for distributed data storage with intrinsic data integrity in signed chains
  • Deeply integrated semantics: semantic memory, semantic stacks, and a generalized parsing and pattern matching system via our semantic tree regular expression engine
  • Built-in code repository for sharing vocabularies, code and protocols
  • Optimization for rapid evolution and easy deprecation (of code, vocabs, protocols, etc.)
  • Pluggable, self-describing protocols (Ceptr protocols are not textual descriptions for human protocol developers, but pluggable semantic code. A receptor can install a protocol to read a message after receiving the message it didn’t know how to read.)
  • Network-wide event subscription (through persistent “listeners”)
  • Massive interoperability, reusability, mashability, and composability
  • Ensuring fractal coherence (This needs a paper of its own to explain why it’s important.)

Some of Ceptr is operational, some is barely operational code-scaffolding, and some is yet to be built (especially these next topics). And we could use some support designing good implementation strategies for some of these things.

Hierarchical Deterministic Keys and Key Management: Since every agent that wants to communicate on the Ceptr Network gets its own address and keys for signing & encryption, we’d love ideas for making it easy for people to manage a lot of keys. Also, in many configurations we’d like to use hierarchical keys (e.g. a master receptor with synched slave instances, or an agent/person and the receptors they create, whether a standalone receptor or sharded peers).

Capabilities-Based Security: Any tips, tricks, techniques or warnings about using capabilities-based security tokens + authorized users (Ceptr messages are always signed/authenticated by source), building appropriate chains, enforcing at appropriate times/sequences.

Lightweight Signing & Multi-Party Signatures: Every network message in Ceptr is signed. We’d like to keep that as light as possible. We also would like to build a few multi-party signature scenarios into our BFT algorithms.

Resource Balancing in Distributed Systems with Diverse Capacities: When you have a handful of receptor instances with different resources (CPU, GPU, bandwidth, storage, network centrality, etc.), are there existing tools or algorithms for “load balancing” based on these capacity differentials?

Ceptr Identity and Web of Trust

Network Addresses – Network addresses in Ceptr are permanent IDs associated with the agent (person, program, or legal entity) communicating on the network. These UUIDs have a history and are accountable for their activities. Addresses are not temporary, reassigned, or segmented for routing.

We use a progressive trust approach, where people are able to self-assert into the network as an anonymous client. Later they may upgrade via a triangulation/vouching process to create a trusted peer address. When they register their address (from a sparse UUID space) on the network, they must publish a public key (to a sharded DHT), and may provide a public key-server address, revocation-server address, and identity server address. Identity information can be selectively shared via capacity-based tokens.

Nodes on the network can configure their own thresholds for serving or routing traffic (whether to service requests from anonymous clients, or from addresses with a history of spamming) based on activity/reputation metrics built into the Ceptr routing protocols.

Since every Ceptr Network message is signed by the sender, you can authenticate the message source against the sender’s published public key. Confirming the sender’s identity can be done by means of their authorized identity service(s). We expect Ceptr to have a rich ecosystem of capabilities-based identity services from peers and trusted authorities, ranging from validating a pseudonym to confirming government-issued passport/citizenship/voter registration or financial/credit reporting. Users can control access via capabilities tokens and manage rights and restrictions on shared information via smart contracts.

Reading List

People have been asking how they can find out more about Ceptr… What it is… How it works… etc.  There are a few layers of answers to this question. The first is that we need to publish more; nonetheless, there’s some pretty substantial stuff for those who want to dig in.

And there’s still HOPE

Marc Clifton is a very prolific and thorough technical writer who has written about Ceptr & HOPE (Higher Order Programming Environment). A bunch of the design of HOPE is connected to his interactions with us around Ceptr, but it is an environment he built in C# to play with. It has a number of differences, but is still a good way to see what Ceptr-like stuff looks like in action.  The following articles are listed newest first (if you want to be thorough, you can start at the bottom of the list and work your way up).  This may actually be the largest bulk of documentation about Ceptr that exists so far, even though it’s not precisely about Ceptr.  Many of the articles have videos which really help you see what it all looks like in action. Watch them!

  1. Strong Type Checking with Semantic Types
  2. Distributed Semantic Computing
  3. Introducing Semtrex (This one is important and includes critical Ceptr code (Semtrex) compiled to integrate with his C# interface. Eric and Art are listed as authors, but Marc still did the bulk of the writing.)
  4. The Kitchen Sink
  5. The Semantic Database In Action
  6. Semantic Database: Concept, Architecture and Implementation
  7. The Semantic Web and Natural Language Processing
  8. Hunt the Wumpus
  9. APOD Website Scraper, a HOPE demonstration
  10. HOPE – Higher Order Programming Environment

Semantic Trees

Ceptr is built out of Receptors — lightweight virtual machines which can be organized fractally. This means you can compose new communication and computing patterns out of receptors as coherent building blocks. But receptors themselves are built out of another coherent structure: semantic trees.

This is similar to how our bodies are made out of cells, and the organization of cells is what makes different kinds of organisms. Cells, themselves, are built out of amino-acid complexes at a lower level.

We produced a couple short videos showing how semantic trees work for both DATA and PROCESS for our MIT-KIT Webinar about Ceptr. This blog post just focuses on the excerpt about Semantic trees for data structures and execution of code.

Continue reading


Announcing MIT / KIT Webinar — April 2, 2015

We’ve been asked to present our work on Ceptr via an MIT webinar as part of their Kerberos and Internet Trust series highlighting notable new technologies.  You can find the event on the MIT web site and participate. We’ll post a recording to this site afterward.

We’ve been building all kinds of low-level underlying tech in Ceptr, but it’s hard to make that kind of stuff visible to an end-user rather than a programmer. For this call, so that people can actually see what’s happening under the hood, we’ve created some video demos and some visualization interfaces to render the underlying semantic trees in a web-browser via JavaScript.

Here’s a preview of some of what we’ll be showing.

Continue reading

Building Meaning through Semantic Alternation

In designing Ceptr we’ve discovered a pattern in how systems build meaning. It’s likely that someone else has already written about this, but we haven’t found it, so we don’t know what anyone else calls it. We’re calling it Semantic Alternation.

Something has meaning or significance in a particular context. This object (a physical structure) is a footstool when in front of the easy chair, a table when holding the chess board between two chairs, and a chair when we’re low on seating for an event and put it at the end of the dining table. (See more pictures below.)

Meaning is not fixed and independent; it is fluid and bound to a context.

We build up meaning that way. For example, there are phones (sounds) we make when speaking which are the same sound, but which, based on the context of surrounding sounds, appear in some cases to be a vowel (like “u” in “round” or “rule”) and in other contexts to be a consonant (like “w” in “way”). At the next level up, the word “rule” may be used as a noun (“the golden rule”) or a verb (“to rule with an iron fist”).  Sometimes phrases even have different meanings based on the surrounding phrases in a sentence. And a sentence can be interpreted differently based on its surrounding context of other sentences.

This process of assembling parts into a structure and giving meaning to those parts in the context of that assemblage is central to how we build meaning. So, for example, an integer in this group of fields may be an age, and in another context may be the number of children you have. This is quite common, but we don’t seem to have a name for the fractal pattern of repeating this across multiple levels of meaning.

Let’s take a look at an example to make this clearer.  Suppose I have a couch-matching profile (our imaginary version of couch-surfing that we often talk about in Ceptr discussions) with my home address, which has been looked up to create a “Home_Location” in the form of a latitude and longitude.  You could picture it like this:

Home_Location is a name given to a structure called LatLong in the context of a couch-matching profile. The LatLong is built from assembling a float structure called Latitude and a float called Longitude (along with some definitions of constraints and processes for mapping 2D coordinates on a sphere).  The float is built by assembling an integer called Mantissa and an integer called Exponent (along with a process for using those parts to represent significant digits and raising them to a power).  The integer is built by assembling a sequence of 32 (or 64) things called bits together and additively interpreting each successive bit to be the next power of 2 (little-endian or big-endian).  The bits are built by assembling a set of transistors in a flip-flop circuit in the computer memory and “naming” one of the outputs (or output states) ON vs. OFF. And so on.
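
The layering described above can be sketched in a few lines. The `label` helper and the dictionary fields are invented for illustration; this just shows structuring and naming alternating as we build upward from a float to Home_Location.

```python
import math

def label(name, structure):
    """Semantics step: attach a name to an existing structure."""
    return {"semantic": name, "structure": structure}

# a float is itself an assemblage of a Mantissa and an Exponent
mantissa, exponent = math.frexp(37.8044)   # exact binary decomposition

# structuring and semantics alternate at each level:
latitude  = label("Latitude", 37.8044)             # a float, named in context
longitude = label("Longitude", -122.2712)
latlong   = label("LatLong", (latitude, longitude))  # new assemblage, named
home      = label("Home_Location", latlong)        # named again in the profile
```

The same tuple of floats only becomes a “LatLong” when named as such, and the same LatLong only becomes a “Home_Location” in the context of the profile.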

There is an alternating pattern of Structuring and Semantics (naming the elements assembled together in the context of that structure). Hence the name “Semantic Alternation.”

In Ceptr, this is how you define new types/structures and then name them so that they can be used for storing something.

These pictures illustrate the real world example of a single “structure” assembled into different semantic contexts as a “table,” “chair,” and “footstool.”

Notes for later idea development: Meaningfulness is constructed by alternating cycles of attributing meaning to an existing structure, then creating new assemblages of meaningful units (a new structure), then assigning meaning to the new assemblage/structure (semantics/naming), then assembling anew based on the new meaning.  See the Lat/Long example. Give biological examples: stereoscopy, auditory triangulation, RGB rods/cones, neural structures, cellular behavior, hormonal signals.