
2.   VSM  (Visual Syntax Method)

(This is older text. The latest version is in the official publication, see vsmjs.github.io.)

i. Goal

In developing VSM, we aimed for a method that enables people to consider any unit of information, and reformulate it into a semantically precise, computable form that clarifies its full meaning and context. [Note 1: Please read the Introduction page if you still need to understand why this is important. Or at least read the Blue Boxes with our key concepts: ‘Piece of Information’, ‘Context details’, ‘Computable Information’.]

See also the introduction’s Venn diagram.

VSM should be applicable to the widest range of use-cases, so it works for diverse, heterogeneous information. At the same time, VSM should be easily usable by many people with diverse expertise, which requires that it keep the inherent complexity of real-world information manageable for humans. So, VSM enables people to manually create computable information, with high ease of use and high expressive power.
VSM achieves this by drawing particular strengths from prominent ways to capture information: natural language, controlled languages, ontologies and RDF triples.
[Note 1:
• Controlled language: Wiki:Controlled_natural_language, Wiki:Formal_language;
• Natural language: e.g. English, Dutch;
• RDF: Wiki:RDF, W3C, tutorial;
• Ontology: Wiki:Ontology_(information_science), Wiki:Open_Biomedical_Ontologies.]

One can compare VSM with how a high-level programming language enables people to make a procedure or algorithm explicit, by encoding it in a way that both humans and computers can manage. Only now, VSM is a tool to make an idea, thought, or piece of information explicit, by encoding it with intuitive, elemental components, in a way that both humans and computers can readily manage.

ii. Design

VSM is designed as a procedure in combination with a user interface that enables people to construct computable information.

As a User-Interface (UI), VSM is an input-component that can be embedded in other software, e.g. a web page. It enables people to enter (i.e. make explicit to the computer) a possibly complex piece of information, one per input-component.

As a procedure, VSM is like a language. It has a simple, small set of rules for constructing information as a so-called VSM-sentence. This is a quasi-linear statement form that is flexible and intuitive for humans to work with. [Note 2: This ‘statement’ form is inspired by controlled languages. But unlike them, VSM does not use their complicated rules on word order. Instead, VSM uses a simpler and more powerful way to clarify structure, as we will show.] Just as a natural language consists of words and a grammar, VSM has two building blocks too: VSM-terms and VSM-connectors. VSM’s rules describe how these can be meaningfully combined into virtually limitless combinations, in order to semantically represent essentially any type of complex information.

iii. Current status

We have built a prototype VSM user-interface, for the purpose of demonstrating VSM’s utility. It is embedded in the page with interactive examples. We have also been using it at our lab for a specific biocuration project, to test its use in real life. A full-fledged and well-designed, easily reusable web-component for curation based on VSM, however, needs a large programming effort. This is currently happening in the ‘vsmjs’ organization on GitHub. [Note 3: There we’re building several modules that culminate in the vsm-box web-component.]

iv. Caveat

Even though VSM’s design incorporates several ideas from existing technologies, we would like to alert the reader that it also includes significant differences. It has been our repeated experience that experts with backgrounds in related areas may make initial assumptions that cause them to misunderstand key concepts of VSM. In what follows, we address such assumptions through extensive clarifications, but we would like to advise people familiar with related technologies to be especially mindful of these intentional differences.

v. Approach of this text

VSM is a practical way of constructing meaning that is inspired by human thinking; i.e. it combines a focus on usability with a new semantic framework. For most of VSM’s intended audience, the prime interest will likely be its usability aspect. Therefore we have written this first paper on VSM from that perspective. Each of the semantic principles that are required to make VSM work is introduced starting from the perspective of usability, or inspired by how humans/curators like to think. Future papers that further explore VSM’s semantics, technical implementations of its user interface, etc., will be published in other specialized publications.

To get a quick overview, you can just read the ‘Short Story’ sections below. These introduce the ideas in informal, easily consumable language, for casual readers and new users of VSM. But please keep critical comments or questions until after you read the detailed Full Stories too, which focus on detailed justifications of design choices.

While VSM was designed to solve a problem in the biosciences, we believe that VSM is much more generally applicable. This is why this site includes many non-biological examples as well.

1. VSM-terms

VSM-terms are like the words of a language. But they are clearer, not ambiguous.

The Short Story 1/5

This is a screenshot of the user interface: an input component called a VSM-box :

When you type a word in it, you get an autocomplete list. It offers terms from one or more pre-loaded dictionaries, i.e. lists of words for which people already agreed on a precise definition. – This example just uses a toy-dictionary called Persons:
[Note 4: To use a VSM-box in a real application, it must be linked to some dictionary/ies, e.g. some of the many available biological Ontologies (like on BioPortal) or Controlled Vocabularies (CV).
Also, these CVs are often incomplete works-in-progress, so you may need to create extra terms, or even new dictionaries, for your research field.
Agreeing on what words mean what is an essential step anyway for communicating with colleagues! Building an agreed-upon dictionary just formalizes this process and creates a common work of reference.]

Then by selecting an item from the list, you enter a VSM-term: …

… and the computer knows exactly which John you talk about, because the blue term is linked to an ID (an identifier, a representative long number) that the computer can use for computation.

To check a term’s definition and ID, you can mouse-hover or tap it:

Next, you can add more VSM-terms. A cool thing is that this works like Facebook search: if you have two friends with the same name, it’s no problem: the autocomplete shows extra information (photo, city) to help you choose the right one. – Likewise, a VSM UI helps you disambiguate between terms from multiple dictionaries. Here we select what kind of ‘chicken’ we mean:

and what kind of ‘with’ we mean:

In the end we create a human-readable sequence of words, with 5 terms that are each linked to an unambiguous ID:

Note: each blue VSM-term is always a specific ‘thing’. So it’s not Johns in general, but a particular John, in a particular situation. This is because we’re capturing information embedded in a context – e.g. the specific observation that some entity behaved in some way in some particular situation / experiment.

Next, we’ll have to clarify to the computer how these 5 terms hang together, i.e. the syntax of the sentence. – But that’s for after the Full Story on VSM-terms, if you like :

(or jump to Short Story 2 →)

The Full Story  (formal story)

A VSM-box, the user-interface of VSM, supports the entering of terms by an autocomplete function, by way of a drop-down-list selection panel that lets users choose specific terms (Fig. 1a). Each term represents a single concept, or a ‘unit of meaning’ [Note 2: Wikipedia:Concept]. A term can consist of one or more words, for instance “homo sapiens”, “is located in”, or “binds to”. A term is in fact a human-friendly representation for a unique identifier (ID): a number or code that computers work with. Each shown term and its linked ID come from one of many lists, such as biological ontologies or controlled vocabularies (e.g. Gene Ontology), lists of genes (e.g. from NCBI Gene) or even curator-built lists.
[Note 3: http://www.geneontology.org/ ; Gene Ontology Consortium: going forward, Nucleic Acids Res, 2015; https://www.ncbi.nlm.nih.gov/gene/]

In order to embed a VSM-box component into other software, one must link it to one or more of these term-lists; this gives the user access to available terminology and IDs.

If a term occurs in multiple lists, then the autocomplete helps the user to disambiguate between the available concepts/IDs. For example, for a mouse and a human gene with the same name, the selection panel would also display the species of each. Once selected, only the term remains visible to the user while the ID is stored in the background, by the curation system. The UI can still show full information when mouse-hovering a term, or could display a clarifying icon or other customizations.
[Note 5: The VSM-box UI that we use in the Short Story and other examples is just a prototype and demo implementation. In the Full Story sections, we also discuss possible alternatives or extensions. Some of these are already implemented in the ‘vsmjs’ organization on GitHub, and some are plans or ideas for implementation.]
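The disambiguation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the vsmjs implementation: the `Entry` class, the `autocomplete` function, the dictionary names, the gene name and all IDs are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary entry: a human-readable term linked to an ID."""
    term_id: str     # hypothetical identifier, what the computer works with
    label: str       # the text shown to the user
    dictionary: str  # the term-list this entry comes from
    hint: str        # extra information shown in the selection panel


# Two toy term-lists that contain a gene with the same name (a homonym).
ENTRIES = [
    Entry("HG:0001", "abc1", "HumanGenes", "Homo sapiens"),
    Entry("MG:0001", "abc1", "MouseGenes", "Mus musculus"),
    Entry("PRS:0010", "John", "Persons", "example person"),
]


def autocomplete(typed: str, entries=ENTRIES) -> list:
    """Return every entry whose label starts with the typed text."""
    prefix = typed.casefold()
    return [e for e in entries if e.label.casefold().startswith(prefix)]


# The user types "abc": the panel lists both genes, each with its species.
for entry in autocomplete("abc"):
    print(f"{entry.label}  ({entry.hint}, {entry.dictionary})")

# After selection only the label stays visible; the ID is what gets stored.
selected = autocomplete("abc")[1]
stored_id = selected.term_id
```

The point of the sketch is that the label alone is ambiguous, while the stored ID is not; the extra `hint` field plays the role of the species shown in the selection panel.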

In addition to supporting the above terms with multiple meanings, or homonyms, a VSM UI also supports synonyms: multiple terms may be used to represent the same concept. This enhances user-friendliness in several ways. [Note 4: Jointly creating digital abstracts: dealing with synonymy and polysemy, BMC Res Notes, 2012.] First, it enables VSM-sentences to correspond better to the text of the publication that is curated, because gene and ontology terms often have many synonyms that are frequently used in the biological literature. When curating, synonyms (linked to the correct ID) may make it easier to mentally map a long VSM-sentence onto terminology used in the publication, especially for subsequent curators who double-check the VSM-sentence. One could still add a UI-option to display terms by their official name if wished.
Second, some official terms may be excessively long (e.g. in GO), and curators may prefer to use an available abbreviation to keep produced statements readable. [Note 5: Gene Ontology (geneontology.org) is a dictionary about gene functionality etc. – Example of a long term: “positive regulation of transcription from RNA polymerase II promoter”, and there are even longer ones. Some terms are that long because GO classifies biological concepts in an extensive tree structure of ever-more-specific terminology.]
And third, it makes it possible to have VSM-sentences look more like natural language sentences. For example, consider a single relation concept that has two synonyms: the verb “is-located-in” and the preposition “in”, so that both represent the same ID. Then the VSM-sentence “A activates B in C” is most natural to read when one can use the preposition “in” instead of the verb form (Fig. 1b vs 1c).

In summary: VSM‘s support for synonyms and homonyms enables a curator to write easily readable information, with reduced basis for confusion; and once specified, each term in a statement represents one specific ID.
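To make the synonym idea concrete, here is a minimal sketch in Python. The table `LABEL_TO_ID` and every ID in it are hypothetical, and the sketch assumes that homonyms were already resolved when each term was selected, so that a label maps to exactly one ID.

```python
# Hypothetical label -> ID table; several labels may share one ID (synonyms).
LABEL_TO_ID = {
    "A": "EX:001",
    "B": "EX:002",
    "C": "EX:003",
    "activates": "EX:100",
    "is-located-in": "EX:200",
    "in": "EX:200",  # preposition synonym of "is-located-in"
}


def to_ids(labels):
    """Translate the labels a human reads into the IDs a computer uses."""
    return [LABEL_TO_ID[label] for label in labels]


natural = ["A", "activates", "B", "in", "C"]            # as in Fig. 1b
formal = ["A", "activates", "B", "is-located-in", "C"]  # as in Fig. 1c

# Both formulations carry exactly the same five IDs.
same_ids = to_ids(natural) == to_ids(formal)
print(same_ids)
```

Choosing one label over the other therefore only affects readability for humans, not what is stored.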

Figure 1. Entering VSM terms. (a) A curator enters a sequence of terms, each linked to an identifier that represents a particular ‘concept’. (b) The full sequence of terms. This is just as easily readable as a natural language sentence, among other reasons because the user was able to use the preposition “in” for the relation “is located in”, both of which would here represent the same identifier. (c) This sequence represents the same five IDs but illustrates a less ‘natural’ choice of terms.

Note: in text that follows, we will enclose example VSM-sentences and VSM-terms in double quotes, and replace spaces in multi-word terms by dashes, like in this example VSM-sentence: “A is-located-in C”. When we want to accentuate natural language phrases, we use ‘single quotes’.


2. VSM-connectors

VSM-connectors correspond to the grammar or syntax of a language. But they are simpler, yet just about as powerful. There are three main types, yet you can achieve plenty with just the first one:

2.1. Indicating structure with the trident

The Short Story 2/5

Next, how can we make a computer understand the syntax of the above 5-term sentence, like we do?

Simple! You organize terms into ‘triple‘ units. Each triple is a trio of terms that relate to each other as a Subject, Relation, and Object. And you indicate a triple by clicking above the three terms, in said order. By doing so you add a trident connector in the VSM-box, with a distinctly drawn ‘leg’ based on the role you gave to each term.

The first trident / triple is obvious :

Next, we can easily repeat this, once we realize that the preposition “with” is just the same as a relation, being: “using”.
[Note 6: I like to say that they are different ‘avatars’: i.e. manifestations, appearances, different lexical forms that represent the same thing. (Much more about that in The Full Story.)
VSM-terms for “with” / “using” / “uses” / … would in fact all have the same ID.
So it makes no difference for a computer’s understanding which form you choose. Still, by choosing the form that we’d expect to see in e.g. English, we can create VSM-sentences that are much easier to read for humans.]
Realizing that, we can spot the triple: “the eating (of chicken, by John)” [=subject]   “happens using” [=relation]   “a fork” [=object].   So we add a second trident:

All the terms are connected now. – You can hover over any connector to highlight it. This is useful in longer sentences, to easily see which terms it connects:


Try it yourself! Add those connectors in this interactive VSM-box :

Easy!

Btw, here is a non-example: an imaginary case where a chicken holding a fork is being eaten:

It’s interactive too. So unless the chicken actually has the fork, you may want to correct it : just remove the second trident (hover it and click the top-right ‘x’) and re-add a correct one.


In case you wonder why the (correct) second trident connects not to John, but to eats : If you would connect it to John, then it would only mean that John is using the fork for something, but not necessarily that he’s eating chicken with it: then he could e.g. be eating chicken with his hands, and use the fork to hold a tomato! – So it’s the eating (by John), done by-use-of/with a fork.
[Note 7: A note to impatient knowledge-representation experts: this intuitive description will become elegantly rigorous during the Full Story sections.
Note #2: The John-holds-tomato case is shown as a VSM-sentence later, on the Examples page.]


The only bit of thinking you need to do, is to make sure that you (re)formulate your information as a series of VSM-terms, in such a way that you can identify triples among them, with VSM-connectors. And that will be quite easy for more complex cases too. – But that’s for after the Full Story :

(or jump to Short Story 3 →)

The Full Story

Once a sequence of VSM-terms is entered, they may look like a word-disambiguated natural language statement. But such a sentence is typically not copied verbatim from a paper (read also Note 8):
[Note 8: In fact, we are able to make it look a lot like natural language, because we are allowed to make VSM-terms appear as conjugated verbs, nouns, prepositions, etc.
That ensures that we can keep complex information (i.e. longer sentences) easily readable for us, humans, too!
A computer doesn’t understand (and thus doesn’t use/need) conjugations etc., so we need to clarify the structure of a sentence in another (preferably easy!) way.]

📖 The First Founding Principle of VSM.  A VSM-sentence is a curator’s interpretation, or structured rephrasing, of some information’s essence. A curator needs to reflect this understanding by entering terms in such a way that a syntax, i.e. a structure of how VSM-terms link together to create a composed meaning, can be specified with VSM’s syntax connectors or VSM-connectors. [Note 9: VSM-connectors show the conceptual structure that exists in the information explicitly. While VSM-terms’ text can help in showing a readable formulation, the VSM-terms’ IDs and the VSM-connectors capture an intuitive, underlying conceptualization.] These VSM-connectors are added through the user interface, and they are available in three main types: the trident (including its three bident subtypes), the list-connector, and the co-reference.

In what follows, we explain how VSM-connectors are used to syntactically specify valid sets of terms, from simple statements to statements with complex hierarchical structures that convey rich contextual information.

We start with the trident. A curator adds structure to a statement by identifying triples, or subunits consisting of three terms. These triples look similar to RDF-triples initially. [Note 6: RDF in general: Wiki:RDF, W3C, tutorial. E.g. triples represented in RDF Turtle.]
The UI enables a user to specify triples by linking terms with a trident connector. A trident is added via three consecutive clicks, one above each term according to the role it has in the triple, in the order of: subject, relation, and direct object. The corresponding parts of the trident, which we call its legs, are: a simple line, a line decorated with a filled up-pointing triangle, and an arrow, respectively. Fig. 2a shows an example of a one-triple statement specified with a trident, which expresses that a “John” “eats” a “chicken”.
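As a rough illustration of the three-click procedure just described, a trident can be modelled as three term positions, recorded in click order. The names `Trident`, `trident_from_clicks` and `read_triple` are assumptions made for this sketch; they are not taken from the vsmjs code.

```python
from typing import NamedTuple


class Trident(NamedTuple):
    """A connector whose three legs each attach to one term position."""
    subject: int   # leg drawn as a simple line
    relation: int  # leg decorated with a filled triangle
    obj: int       # leg drawn as an arrow


def trident_from_clicks(clicks):
    """Build a trident from three clicks, in subject-relation-object order."""
    if len(clicks) != 3:
        raise ValueError("a trident needs exactly three clicks")
    return Trident(*clicks)


def read_triple(trident, terms):
    """Return the (subject, relation, object) terms a trident connects."""
    return (terms[trident.subject],
            terms[trident.relation],
            terms[trident.obj])


terms = ["John", "eats", "chicken"]
first = trident_from_clicks([0, 1, 2])
print(read_triple(first, terms))
```

Because the roles come from the click order and not from word order, the same function reads a triple correctly when the terms appear in a different sequence in the sentence.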

Figure 2. Identifying triples with tridents. (a-b) The user adds VSM trident connectors that indicate triplets among the terms and define the intended syntax. The resulting composite semantics is explained in the main text. (b-d) show how meaning changes when attaching the second connector’s subject leg to a different term.

Next we show by example how to use multiple tridents, after which we can explain more founding principles. Fig. 2b shows the addition of more terms and context to the first statement, via a second trident. This specifies a second triple subunit that expresses: this “eating” happens “with” (or “using”), a “fork”. We represented this second triple’s relation by a preposition-type synonym, for natural-language-like readability. As contrast, Fig. 2c shows an alternative version in which a chicken is using a fork instead. This illustrates that attaching a connector leg to a different term fundamentally changes the meaning. Similarly, Fig. 2d states that John uses a fork, although it does not say that he eats chicken with it (he could be eating chicken with his hand, while using the fork for another purpose). Only by connecting the second trident to the “eating” term does one capture the correct meaning: ‘the eating, by John, is by use of a fork’.

As shown, any change in connecting a VSM connector leg can profoundly affect the meaning. This is further elaborated in:

📖 The Second Founding Principle of VSM.  A connector leg always attaches to one particular term, and never to an entire other triple. This is even true when a leg attaches to a term that is used as a relation in another triple; and this makes VSM different from RDF. [Note 10: In RDF you need to make a ‘reification’ construct in order to point to a relation. I.e. you need to create an additional ‘object’ that represents that ‘relation’ (or in fact, its entire embedding triple), and that you can then point to, because RDF treats ‘relations’ as fundamentally different things than ‘objects’.]
This difference is intentional, and lies at the basis of making the syntax-definition process simple, flexible, scalable, and intuitive, following the thinking of the curator. It is rooted in VSM’s specific approach to treat all terms equally: both so-called entities and relations are first-class citizens. In VSM, any term is only viewed as a relation versus an entity, in the context of a particular trident that connects to it. Because of that, any term that functions as a relation under one trident can function as a subject or object (i.e. an entity) when viewed under another connecting trident.

For instance in Fig. 1b, the verb term “activates” can (and should) equally be thought of as a noun: ‘activation’. In Fig. 2b, the verb term “eats” is in meaning identical to its noun-form ‘the eating’. If both lexical alternatives are available as terms in a dictionary for VSM, they should carry the same ID.

In conclusion: every VSM-term should primarily be thought of as an entity.
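The locality of roles stated in this Principle can be sketched as follows. The representation (a list of terms plus tridents given as position triples) and the function `roles_of` are assumptions chosen for illustration; the example sentence is the one from Fig. 2b.

```python
terms = ["John", "eats", "chicken", "with", "fork"]

# Each trident lists the positions of its subject, relation and object legs.
tridents = [
    (0, 1, 2),  # John eats chicken
    (1, 3, 4),  # (the) eating (happens) with fork
]

ROLES = ("subject", "relation", "object")


def roles_of(term_index, tridents):
    """List the role a term plays under each trident attached to it."""
    found = []
    for number, legs in enumerate(tridents):
        for role, attached in zip(ROLES, legs):
            if attached == term_index:
                found.append((number, role))
    return found


# "eats" is a relation under the first trident, a subject under the second.
eats_roles = roles_of(1, tridents)
print(eats_roles)
```

Nothing in the term itself says whether it is a relation or an entity; that information lives only in the legs that attach to it.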

(See also Note 11. Really, read it.)
[Note 11: This Principle is crucial, so let’s repeat it in other words:
Only when some trident attaches to a VSM-term with its relation-leg, only then is the term made to be seen as a relation. And then only so, under that one trident, i.e. in that one triple, locally.

So anything that e.g. RDF views as a relation, VSM views only locally as a relation (under a particular connector) and views it elsewhere as an entity / as ‘reified’ / ‘as a noun’.
(The RDF phrase “re-ified” comes from Latin, meaning ~‘thing-ified’ or ‘made into an object’).

This was an essential and necessary step for designing VSM’s expressivity and simplicity in what will follow: think of all VSM-terms as nouns / ‘thingified concepts’ / something that you can mentally point to in the same way as one points to any other VSM-term.]

2.2. Scaling up to more complex information

The Short Story :  Just keep adding tridents 3/5

1)  Mechanism

Here is a biological example. It says: ‘some molecule A activates a molecule B in a location C’ :

Hereby “in” represents the same ID as “is-located-in”, but it’s easier to read like that in a sentence.
The two tridents express, resp.:

Did you see what happened here? : The first trident turned “activates” into a more specific term “the activation of B by A”  (i.e. it is ‘placed in the context of A and B ‘).  And that enabled the second trident to connect to that particular’ized concept, and specify it further: “that-A-B-activation  is-located-in  C”.

Now, this ‘context sharing’ happens for all connected terms. For example the first term now represents ‘the A that activates B, in C’, and the last term ‘the C in which…’, etc.
So every time you add a connector,  each of the connected terms  receives extra context  from all others.
[Note 12: Some more ways to say this:  ‘each term is placed in the context of / accumulates extra context-meaning from all others’, or:  ‘all connected VSM-terms enrich each other’s meaning’.
This context meaning is shared with indirectly connected VSM-terms too.]

In the example, even the term “in” now specifically represents ‘the being located in C of the A-B-activation’. This is really useful. Because then you can say more about that concept again. For example: ‘that being-located in…’ is only “probable”. Then we can form the sentence: “A activates B probably in C”. (To add adverbs, see next Short Story).
And that turns each VSM-term into a specific thing that you can refer to again, with a new connector, to add more and more context.

2)  Teaser

Let’s extend the above sentence. Let’s add an extra context detail, saying that our “A” is also ‘bound to a molecule D’. In a VSM-box we can enter two more terms at the end, drag them in place for human readability, and connect them to “A”:

That last step shows that new connectors get auto-sorted for visual comfort, after you add them on top.

Now, using the same principle, we can keep adding nested context with VSM, just like we can in natural language.

3)  Showcase

So let’s build an example from a real scientific paper. [Note 7: Dai 2007: ‘A WUSCHEL-LIKE HOMEOBOX Gene Represses a YABBY Gene Expression Required for Rice Leaf Development’.]  We will spell out each added trident:

And we got an easy to read sentence, an intuitive structure, and a clear, computable piece of information!  – So VSM enables us to easily build complex statements, by reusing the same building blocks. [Note 13: Easy to read: relative to the complexity of the information. – And of course (just like natural language) relative to the reader’s knowledge of the research field.]

To quote one of our curators:
“From a user’s side, I don’t think that complexity is difficult to handle at all.
In fact, I’m excited about all the complexity we can now handle in such an elegant manner.
It allows us to focus on the biology; no more need to worry about the entry format so much.”

Notes: 1) This is not a sentence copied verbatim from text. It is reformulated information, boiled down to its essence by writing it with VSM.  – 2) The connectors are manually added, though it should be possible to create computer-assistance. [Note 14: There was once this guy at a conference who mistakenly thought that the connectors were a ‘parse-tree’ (as in: generated by a text-mining algorithm), because they look a bit like it. – He later admitted he had been answering an email during my presentation, though. (I thank him for immunizing us against one more possible misunderstanding.)]

(you could jump to Short Story 4 →)

The Full Story

📖 The Third Founding Principle of VSM.  In order to scale this method up to statements with many more terms, one must keep in mind another key principle, which will be crucial for correctly creating and interpreting the meaning of a VSM-sentence. It defines: by adding a trident, each of its three connected terms receives extra ‘context’ or meaning from the other two terms. In other words: each term becomes a more specific instance, being further specified by all the VSM-terms it is connected to, directly and also indirectly. For example in Fig. 2a, the first trident makes “eats” become ‘the eating, of a chicken, by John’; likewise “chicken” becomes ‘the chicken eaten by John’; and likewise also “John” becomes a more specific concept.

When one connects a second trident to any of these three terms, one in fact refers to that term’s full, context-enriched meaning, as created by its connection under the first trident. So, with the second trident in Fig. 2b, one expresses: “the-eating-of-chicken-by-John happens-with fork”. And by induction, all five of its terms now carry an enriched meaning, accumulated from the other four. Next, each term is individually referable again, in the same way. Extending this way, one can keep adding further context to every single term, recursively.
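The accumulation of context described here can be sketched as a reachability computation: a term’s context consists of every term it is connected to, directly or indirectly. The function `context_of` and the position-based representation are illustrative assumptions, not part of VSM’s published tooling.

```python
def context_of(term, connectors):
    """Return all terms that (directly or indirectly) enrich `term`."""
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for legs in connectors:
            if current in legs:
                # Every other term under this connector contributes context.
                for other in legs:
                    if other not in seen:
                        seen.add(other)
                        frontier.append(other)
    return seen - {term}


terms = ["John", "eats", "chicken", "with", "fork"]

one_trident = [(0, 1, 2)]              # Fig. 2a
two_tridents = [(0, 1, 2), (1, 3, 4)]  # Fig. 2b

# With one trident, "eats" is 'the eating, of a chicken, by John'.
print(sorted(terms[i] for i in context_of(1, one_trident)))

# With two tridents, each term draws context from all four others.
print(sorted(terms[i] for i in context_of(0, two_tridents)))
```

The sketch only tracks which terms contribute context; how that context changes a term’s meaning is what the Principle itself defines.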

For instance to express ‘John eats a chicken, occasionally using a fork’, one would add two terms and connect a new trident to “using”, creating a new triple that states: ‘the use of fork (for eating of chicken by John), happens occasionally’.  (See Bidents (section 2.3) for how to intuitively add adverbs).

It is essential to realize that a trident in itself carries no meaning. This should be evident from the fact that one can never connect to a trident, but only to individual terms. The reality is that VSM-connectors only act as operators on individual VSM-terms’ meanings. They specify or narrow down what a term could possibly mean. For example in Fig. 2a, “eating” is not specified to be done in any particular way – that context is not given, so it could be with hands, a stick, fork and knife etc. But in Fig. 2b, the meaning is narrowed down by making explicit what is used for John’s action of eating.

(Additional perspective can be found in Note 15, and in Note 16 on ‘context’.)
[Note 15: While Principle 2 defines the meaning of individual VSM-terms, Principle 3 defines how the meaning of each of them changes by attaching one, or multiple connectors to them.
At this point, Principle 3 may be intuitive or even seem trivial for simple cases as in Fig 2, but it will be a crucial insight for working with more advanced cases that need the coreference connector (sections 2.5 and 2.6).]
[Note 16: In other words, ‘adding more context’ means: ‘further narrowing down the meaning of a concept, as to what range of possible meanings it may represent’ (making it more precise), or ‘removing some ambiguities’, or ‘further eliminating unknowns’.
E.g. in just “John eats chicken”, “eats” could happen in any way. But after we add the “eats with fork” trident, we narrow down the meaning of “eats” by anchoring down at least one more aspect of how it happens.
(Meanwhile, other unknowns that could be specified about it still remain: where he eats, together with whom, how quickly, etc., but that would lead us to Principle 4 on a further page already).]


Fig. 3a-c show longer VSM-sentences created in this way, representing information curated from scientific publications. For example in Fig. 3a, the second trident triple from the left expresses that “the-being-twisted-of-leaf-lamina pertains-to Oryza-that-underexpresses-YAB3-by-RNA-interference”. Here, the term “in” will have been linked by the curator to the concept “pertaining-to” rather than “is-located-in”. Each term represents an identifier from a controlled vocabulary, e.g. “leaf lamina” is provided by PO (Plant Ontology) and “twisted” by PATO (Phenotype And Trait Ontology).

We want to emphasize that any additional biological context, like the details captured here, helps to increase the quality and value of the information. Added detail could for instance resolve two seemingly contradictory ‘bare triple’ statements through reconciliatory contextual details. [Note 17: E.g. both triple statements “cat is alive” and “cat is dead” can be true. But they are true in their own specific context. We can make this context explicit by expanding the statements to e.g. “cat is alive in 2020” and “cat is dead in 2090”.] Added detail is also vital when the collected information needs to be filtered down, for instance based on various quality measures that correspond to different use-cases.

The vertical stacking order in which connectors are added or displayed is irrelevant for the connection structure. Still, some orderings may appear more intuitive than others, for example based on which subunit one thinks of as central in the statement (see ‘Head’ later). Various visual sort algorithms may be developed for this UI aspect. In our prototype, we implemented one that automatically rearranges the stacking of connectors as shown in e.g. Fig. 3, no matter in what order they were added.

Note that although a VSM-sentence can be mapped to RDF [Note 10: See this Wikipedia page, or the Resource Description Framework project site.], either directly when two non-relation legs connect to a term, or via RDF-reification when a non-relation leg attaches to a term also used as a relation, such RDF graph renditions quickly become unintelligible to a curator (see later on the Discussion page for a full example).
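To give a feeling for why such renditions grow quickly, here is a rough sketch of one possible mapping, using plain Python tuples instead of an RDF library. It is an assumption-laden illustration and not the mapping used by the authors: the `ex:` names are invented, the function `to_rdf` is hypothetical, and it assumes that each term serves as a relation under at most one trident. Only the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) is taken from RDF itself.

```python
def to_rdf(terms, tridents):
    """Map tridents to triples, reifying triples whose relation is pointed to."""
    # Which trident uses a given term position as its relation?
    relation_of = {r: n for n, (_s, r, _o) in enumerate(tridents)}
    reified = set()

    def node(i):
        # A term used as a relation elsewhere cannot be a plain RDF node here:
        # we point to a statement node that stands for its whole triple.
        if i in relation_of:
            reified.add(relation_of[i])
            return f"_:stmt{relation_of[i]}"
        return terms[i]

    triples = [(node(s), terms[r], node(o)) for s, r, o in tridents]

    # Describe each statement node with the RDF reification vocabulary.
    for n in sorted(reified):
        s, r, o = tridents[n]
        stmt = f"_:stmt{n}"
        triples += [
            (stmt, "rdf:type", "rdf:Statement"),
            (stmt, "rdf:subject", node(s)),
            (stmt, "rdf:predicate", terms[r]),
            (stmt, "rdf:object", node(o)),
        ]
    return triples


terms = ["ex:John", "ex:eats", "ex:chicken", "ex:with", "ex:fork"]
tridents = [(0, 1, 2), (1, 3, 4)]

rdf_triples = to_rdf(terms, tridents)
for triple in rdf_triples:
    print(triple)
```

Two tridents over five terms already expand into six RDF triples in this sketch, which hints at how longer VSM-sentences become hard to read in that form.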

Figure 3. VSM-sentences based on scientific papers. The VSM-sentences are structured, reformulated versions of information from life science publications: (a-b) from [Note 11: Dai 2007: ‘A WUSCHEL-LIKE HOMEOBOX Gene Represses a YABBY Gene Expression Required for Rice Leaf Development’], (c) from [Note 12: Jiang 1999: ‘Multistep regulation of DNA replication by Cdk phosphorylation of HsCdc6’]. Some of their terms come from ontologies, some others are yet to be added to official controlled vocabularies, especially relation terms.


A consequence of Principle 3’s definition of VSM‘s semantics is that tridents must never create loops. A loop exists when one can follow a path of connections from one term that leads back to that term.

Consider for instance a VSM-sentence that would express “John eating apple calls Eve buying apple”. The trident that connects “Eve buying apple” must connect to a second “apple” VSM-term (Fig. 4b). It must not ‘reuse’ the first “apple” that would already be connected in “John eating apple” (Fig. 4a). Because if it did, it would create a semantic interpretation problem: a VSM-sentence that is read as ‘John, eating an apple, → that is being bought by Eve, who is called by John, eating an apple, → …’. The sentence would have to be read like that, because every connection that a VSM-term has, must be followed when interpreting the meaning.

So with a connection loop, any interpretation would get stuck in an infinite semantic loop. Therefore, with tridents (and bidents and list-connectors, see later), one must only create hierarchical structures.

This may also be clear intuitively, because two distinct apples are at play here, so two “apple” VSM term concepts are needed.

[Note 18: In other words, two distinct instances of “apple” are needed here. Each instance would be stored with its own (instance-) identifier. This, however, is a more technical aspect of VSM that is covered by VSM-Graphs, see later. (Note that if the sentence mentioned one and the same apple twice, then a coreference connector would be needed; see later.)]

Figure 4. Tridents must never create semantic loops. (a) shows an incorrectly constructed VSM-sentence. Here, the interpretation of meaning, performed by following connectors that further specify terms’ meanings, would result in an infinite loop, as described in the main text. (b) is the correct version that can be properly interpreted, and captures what was intended.
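
The no-loops rule can be checked mechanically. The following is a minimal Python sketch, not part of the VSM prototype. It assumes a hypothetical representation in which terms are numbered by their position in the sentence, and each trident is given as a tuple of the term positions it attaches to. The connector layout used for Fig. 4 is our reading of the example sentence (“John eating apple”, “John calls Eve”, “Eve buying apple”). A loop is detected with a union-find structure: if two legs of a new connector land on terms that are already connected, the connector closes a loop.

```python
def has_loop(num_terms, connectors):
    """Return True if the connectors form a loop among the terms.

    `connectors` is a list of tuples of term positions (hypothetical format).
    """
    parent = list(range(num_terms))

    def find(i):
        # Follow parent links up to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for legs in connectors:
        roots = [find(t) for t in legs]
        # Two legs landing in the same already-connected group close a loop.
        if len(set(roots)) < len(roots):
            return True
        for r in roots[1:]:
            parent[r] = roots[0]
    return False


# Positions: John(0) eating(1) apple(2) calls(3) Eve(4) buying(5) apple(6)
# Fig. 4a: "Eve buying" reuses the first "apple" (position 2).
fig_4a = [(0, 1, 2), (0, 3, 4), (4, 5, 2)]
# Fig. 4b: "Eve buying" connects to a second "apple" term (position 6).
fig_4b = [(0, 1, 2), (0, 3, 4), (4, 5, 6)]

print(has_loop(7, fig_4a))  # True
print(has_loop(7, fig_4b))  # False
```

Coreference connectors are deliberately left out of `connectors`, because the rule concerns only connectors that enrich a term's meaning.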

2.3. A variation on the trident: bidents

The Short Story 4/5

A phrase like ‘white mouse’ is not a triplet of terms. So how do we write this with VSM?

Simple. Just read the phrase silently as: it’s a “mouse [being] white”. Meanwhile you add a trident, from “mouse” as a subject, to “white” as an object, but you skip the implicit relation’s leg: just click twice over “white”.
[Note 19: You could also read it as “mouse [specified-to-be] white”, or as “mouse [has-color] white”. And the computer would understand it like that too, because it should already know that “white” is a color, and so it could infer the more specific “[has‑color]” if needed.]
The connector now has only two legs, so we call it a bident:

Such ‘attribute’ connections keep VSM-sentences easy to read and write. And that is a main goal in VSM’s design. Still, the Full Story discusses some limitations to ensure computer-understanding.


This 1-minute VSM-box demo Youtube video illustrates what we’ve learned so far.

From here on we’ll explain the rest of VSM as Full Stories, mainly.  A preview:

(you could skip past all that, and jump to Short Story 5 →)

The Full Story

Some information may feel unnatural to express with triples. For instance, some verbs have no object, as in “plant flowers”. Although this can be captured with a triple (as RDF would require) like “plant belongs-to class-of-flowering-things”, the task of stretching and reformulating such information to fit a triple structure is a burden that is best not imposed on the curator. VSM is focused on usability for the curator, and therefore the VSM UI supports tridents where one leg can be left out: bidents. In the example above, no object needs to be defined and the object leg is left out (Fig. 5a). A user can create this first type of bident just like a trident, but with the third mouse click outside the VSM component, or by pressing Escape instead of the third mouse click.

Bidents also make VSM’s knowledge representation structurally consistent. For instance, compare the statements “X stimulates: A activates B”, and “X stimulates activation-of B”, where the latter uses a second type of bident, one that omits the subject leg (Fig. 5b-c). Apart from using the appropriate synonym for the “activation” relation (with same ID), the principal difference is that the activator is unspecified in the latter case. Appropriately, the structure of both statements is similar, thanks to the bident. This consistency would not be present when using a second triple “activation having-object B” with an artificial relation inserted. Note that this bident is also useful for capturing passive constructs where a subject is irrelevant, as could be the case in e.g. “X accelerates destruction of B” or “to write (= writing of) papers is rewarding”. One starts this bident by two clicks above the relation.

[Note 20, for experts: One could also enter “activation of B”, by using an explicit VSM-term “of”, which would have the meta-meaning ‘has object’.
Note #2: please do not misunderstand and think that ‘VSM knows’ whether two sentences (two graph structures) represent the same, rephrased meaning. VSM does not ‘know’. VSM is just a representation form, to be used by people and algorithms, just like ‘English does not know’ what you mean. Your brain does, and it uses English to represent information at some level.]

A third type of bident omits the relation leg. Again this helps usability for the curator, in this case by allowing statements to better reflect a basic structure also used in natural language: the attribute. Attributes include adjectives, adverbs, and numbers. For example, a statement about a ‘white mouse’ would require a “mouse having-color white” when using only the triple structure. However, if the term “white” is already classified as ‘a color’ in its vocabulary or ontology, then making curators explicitly insert the relation “having-color” would be a waste of effort, since it can be inferred automatically. Then a bident structure can be used, as in Fig. 5d-e. One adds it with two clicks above the object term.

In order to make the semantics of relation-omitting bidents clear:
• In general, any category of terms that can be used as attributes can be associated with its own default, implicit attribute relation. For example: numbers (“.. [has-count] 5”), measures (“.. [has-size] big”), or posttranslational modifications (“.. [has-state] phosphorylated”); see also Fig. 5f-g.
• In other cases, it could be associated with the attribute’s linked concept as well. For example: “<tree‑x> leaf” would mean “leaf [from-plant-of-species] <tree‑x>”, e.g. “oak leaf”. Or “<gene‑x> expression” would mean “expression [of] <gene‑x>”, e.g. “Cas9 expression”.
• However, for attribute constructions in natural language that use an implicit relation that is not yet automatically inferable, one must use either an explicit triple or a single term. For example, ‘actin filament’ should then be represented as “filament composed-of actin” or as “actin-filament”.
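
The inference of an omitted relation can be pictured as a simple lookup. The sketch below is illustrative only: the category names, the `DEFAULT_RELATION` and `TERM_CATEGORY` tables, and the `expand_bident` helper are hypothetical, and in practice a term dictionary or ontology would supply the categories. It expands a relation-omitting bident into an explicit triple when the attribute's category has a default relation, and otherwise signals that an explicit triple or a single term is needed.

```python
# Hypothetical mapping: category of the attribute term -> default implicit relation.
DEFAULT_RELATION = {
    "color": "has-color",
    "number": "has-count",
    "size": "has-size",
    "modification-state": "has-state",
}

# Hypothetical term dictionary: term -> category.
TERM_CATEGORY = {
    "white": "color",
    "5": "number",
    "big": "size",
    "phosphorylated": "modification-state",
}


def expand_bident(subject, attribute):
    """Return (subject, relation, attribute), or None if nothing can be inferred."""
    category = TERM_CATEGORY.get(attribute)
    relation = DEFAULT_RELATION.get(category)
    if relation is None:
        # E.g. 'actin filament': use an explicit triple or a single term instead.
        return None
    return (subject, relation, attribute)


print(expand_bident("mouse", "white"))     # ('mouse', 'has-color', 'white')
print(expand_bident("filament", "actin"))  # None
```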

The above demonstrates that knowledge representation in VSM more closely follows the thinking of the curator, or the conceptualization in the human mind, and that is what makes it more intuitive.

Figure 5. Bidents. The user can add a trident that misses any one of its three legs, i.e. a bident, for ease of use as explained in the main text. The bident in (a) accommodates a verb without object. The bident in (c) represents “activation-of B” with an unknown subject, which is structurally similar to case (b) where a subject “A” is known. The bident in (d) frees the curator from the obligation to explicitly define a relation that could be computationally inferred, as in (e). (f-g) show other attributes with an implicit, inferable relation: “has-precision”, “has-count”, and “has-concentration”. Bidents are drawn with a tiny indicator for the omitted leg, to remind the user of their semantic equivalence with tridents.
[Note 21: This figure also shows that numbers can be represented with VSM-terms. This makes sense because one “5” can be conceptually different from another “5”: just like other VSM-terms, a number‑concept can be embedded in a particular context. For example, one could be an “approximately 5”, another a “5 ± 2”, and another an “at-least 5”.
For this, the autocomplete UI needs to support the entry of numbers, by immediately creating a new VSM-term for any of them as needed, after the user presses Enter.
(This was not yet implemented in the prototype used on these web pages, but it is implemented now in the vsm-dictionary module (see its spec), which supports ‘vsm-box’.)]

2.4. Group types that combine items : the list‑connector

Consider a statement like ‘A, B, and C react to D’. In order to build this with triples only, as in RDF, one would first make an “A and B”, then connect their “and” to “and C”, and then connect this second “and” to “react-to D”. However, it may be unknown in which order reactants combine. This construct with triples is artificial, as it may suggest that A and B combine first and then together combine with C. In addition, the longer the list that needs to be made, the more burdensome all the artificial triples would become to the curator.

With VSM‘s list-connector, one can express that any number of list-item terms all come together as one group, in a way specified by the meaning of a list-relation term. For example, Fig. 6a states that reactants all together react to D, as a group with unspecified order. The fact that group order is unspecified, would be embedded in the meaning/ID of the chosen list-relation “and” (or: “and-(as-unordered-set)”). The curator is free to choose other list-relations as well: for instance a term “and-(as-ordered-list)”; or “sum‑of”; or “either‑or”, which is used in Fig. 6b to state that ‘either A, B, or C bind to D’. This term “either‑or” is also an example of a single conceptual component that is formulated with two separated parts in natural language, and that must be captured with a single term in VSM.

The position of a list-relation among its list-items has no semantic importance, although some placing can make a sentence easier to read; like putting “and” second-to-last in Fig. 6a. Just like with tridents, only the resulting connection- or graph-structure is important. There is only one exception to this rule: the order in which list-items are placed is important when the list-relation attaches meaning to the ordering, as with an “and-(as-ordered-list)”.
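
Whether item order matters can thus be treated as a property of the list-relation. The sketch below assumes a hypothetical `ORDERED_RELATIONS` set and compares the items of two list-connectors accordingly. The relation labels follow the examples in the text; everything else is made up for illustration.

```python
from collections import Counter

# Assumption: the list-relations that attach meaning to the order of their items.
ORDERED_RELATIONS = {"and-(as-ordered-list)"}


def same_group(relation, items_a, items_b):
    """Tell whether two item lists denote the same group under one list-relation."""
    if relation in ORDERED_RELATIONS:
        return list(items_a) == list(items_b)
    # Unordered: the same items in any order form the same group.
    return Counter(items_a) == Counter(items_b)


print(same_group("and", ["A", "B", "C"], ["C", "A", "B"]))                    # True
print(same_group("and-(as-ordered-list)", ["A", "B", "C"], ["C", "A", "B"]))  # False
```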

As a semantic operator, a list-connector turns its list-relation VSM-term into the specific concept/idea of a group of items combined in the way defined by this list-relation. Similar to how tridents operate, the list-connectors in Fig. 6a-b contextualize their list-relation term into a specific “the-A-B-C-group”, and (for lack of a less artificial way of phrasing) “the-eitherOr-union-of-A-B-C”, respectively. Then once again, another connector can link to this term as it represents the specific, whole group.

Figure 6. List-connectors. In (a), the list-connector assigns “and” as a list-relation that groups a number of list-elements. The connected list-relation “and” represents the combination of the list-elements ‘as a whole’, and as such can represent the left side of a reaction. Other terms can be used as a list-relation. In (b), the list-connector makes the VSM-term “either-or” represent the phrase ‘either A, B, or C’.

In Fig. 6 we show a UI that distinguishes list-connectors by drawing them with a double ‘backbone’, a leg ending in a filled square that connects the list-relation, and undecorated legs that connect list-items. The UI should support switching between trident and list-connector creation modes, for instance by holding the Shift button down while first clicking above the list-relation, and next above all list-items. If order is important for the list-relation, then it is not the order of clicking above terms, but the order of terms as they appear in the VSM-sentence that counts.

2.5. Referring and further specifying: the co‑reference

The coreference connector provides the extra functionality needed for both common and advanced use cases. Its first, basic function is to let users build semantically correct, easily readable sentences with a term that refers back to another term; for instance an “it” as in Fig. 7. We call this “it” the child term. The child term is a placeholder that receives meaning from its parent term: the term it refers to. A child term can have any label (“that”, “them”, etc.), and represents the same ‘thing’ as its parent. So for example in Fig. 7a, “it” refers to the same “apple” as its parent term; so this contrasts with the earlier Fig. 4b, where two distinct “apple”s were mentioned. – Note that the no-loops rule (that followed from Principle 3) does not apply to coreferences, as will become clear later. – In Fig. 7, the UI draws the coreference in dashed lines, with an undecorated leg connecting to the child, and an open triangle arrow connecting to the parent; and the child term is surrounded with dashes. Users add a coreference by first Ctrl+clicking above the child term, and next above the parent term.

The second function of the coreference is that it enables us to think about two distinct concepts. Although the parent and child term refer to one and the same ‘thing’ in the real world, they refer to that thing as it occurs in two different situations, or perspectives, or contexts.

Consider for example “Bob activates device in evening causes it beeps in morning” (Fig. 7c). The meaning embedded into the parent concept “device” is that it is activated by someone, but should not be that it is beeping yet. Therefore (and because of no-loops), we should not connect “device” directly to “beeps” with a trident. Instead, we create a child concept “it”, which inherits the context of ‘having been activated in the evening’ from its parent, and in addition receives the extra context of “beeping in morning” from its own connection environment. These two concepts can then be separately enriched with more context, or referred to later on, e.g. “Jane charges it (#1)” and “Bob silences it (#2)”. – A biological example could involve a molecule being tagged in one context, e.g. in some cell cycle phase or location, which causes it to form a bond in a next/further context – or see Fig. 7b.

Figure 7. The co-reference connector. (a,b,c) The coreference has two functions. First, it allows one to build VSM-sentences in which the same concept is referred to twice (here: same apple/protein/device), that have no loops (created by tridents/bidents/lists), and that are natural to read (via a placeholder with a label like “it”). Second, it enables one to work with two distinct concepts, each at a different stage of specification: in (c) the first “device” is not yet beeping, while the second occurrence of the device, represented by the child-term “it”, is beeping. A curator could still add further specifics about the device, in both of its contexts. Similarly in (a), the parent-term “apple” is not yet described as being eaten by Eve, and in (b), the first occurrence of “protein B” is not specified to be degrading.

Now we can clarify why the no-loops rule, which followed from Principle 3, only applies to tridents, bidents, and list-connectors, but not to coreferences. Unlike the other connectors, a coreference does not enrich the meaning of the parent term, and so it should not be followed when interpreting the parent’s meaning. As a semantic operator, the coreference works only on the child concept; but also here it does not enrich an existing concept. In fact, it creates a new copy of the parent’s context-embedded meaning and places that at the position of the child, where it can be further contextualized.
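
This one-way flow of context can be sketched in a few lines of Python. The data layout is hypothetical: `own_context` holds what each term receives from its own tridents, bidents, or list-connectors (reduced to plain strings here), and `parent_of` records coreferences from child to parent. The sketch assumes that coreference chains do not circle back on themselves.

```python
def context_of(term, own_context, parent_of):
    """Collect a term's context: inherited from its parent (if any), then its own."""
    inherited = []
    if term in parent_of:
        # A child receives a copy of the parent's context-embedded meaning.
        inherited = context_of(parent_of[term], own_context, parent_of)
    return inherited + own_context.get(term, [])


# Fig. 7c, simplified to strings.
own_context = {
    "device": ["activated by Bob in evening"],
    "it": ["beeps in morning"],
}
parent_of = {"it": "device"}

print(context_of("it", own_context, parent_of))
# ['activated by Bob in evening', 'beeps in morning']
print(context_of("device", own_context, parent_of))
# ['activated by Bob in evening']
```

The parent's result never includes what is attached to the child, which mirrors why a coreference is not followed when interpreting the parent.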

The two distinct concepts (parent and child) are not particularly about past versus present, but about how much context is present in each part of the discourse. Hereby it is helpful to think of the coreference as a unidirectional ‘wind arrow’ that points to where context is ‘flowing from’, as in ‘inherited from’. This notion is crucial to understand the following issue: the case where (trident/bident/list-)connector attachments are non-interchangeable, as explained in the next section.

2.6. Non-interchangeable connections   (advanced, but nice topic)

When connecting multiple tridents or bidents to a term, we use one assumption: that the order of connecting them does not matter. This is usually true. In natural language one can test this by observing that the meaning of some statement does not change when switching attribute positions; e.g. ‘big white mouse’ has the same meaning as ‘white big mouse’. Likewise in VSM, each new connection simply adds context in an interchangeable, additive way: in Fig. 8a, “mouse” is specified to be both “big” and “white” at the same time.

However, in some cases order does matter, as e.g. in the statement “half-of black dogs”. By connecting “half-of” and “black” to “dogs”, one can construct three different meanings (see the sets in Fig. 8b1-3):

  1. ‘some half of the group of dogs, that consists of only black ones’: here “dogs” is specified to “half-of” and “black” at the same time (while still being half-of in total);
  2. ‘some half of: black ones of the dogs’;
  3. ‘black ones of: some half of the dogs’.

In case 1, the attributes are ‘applied’ simultaneously, while in cases 2 and 3 they are applied in one of the two possible orders.
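
The three readings can be made concrete with plain sets. In this sketch the six dogs, the choice of which ones are black, and the `pick` helper (which stands in for ‘some half’ by taking the first members in sorted order) are all arbitrary assumptions chosen for illustration.

```python
def pick(group, count):
    """Stand-in for an unspecified choice: the first `count` members in sorted order."""
    return set(sorted(group)[:count])


dogs = {"d1", "d2", "d3", "d4", "d5", "d6"}
black = {"d3", "d4", "d5", "d6"}
black_dogs = dogs & black

# Case 1: a half of all dogs that consists only of black ones.
case1 = pick(black_dogs, len(dogs) // 2)
# Case 2: some half of the black dogs.
case2 = pick(black_dogs, len(black_dogs) // 2)
# Case 3: the black ones within some half of all dogs.
case3 = pick(dogs, len(dogs) // 2) & black

print(sorted(case1))  # ['d3', 'd4', 'd5']
print(sorted(case2))  # ['d3', 'd4']
print(sorted(case3))  # ['d3']
```

With these made-up data the three cases yield three different sets, which is why the order of applying the attributes cannot be ignored here.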

What happens when order matters, conceptually, is that we first compose an idea, and then we isolate it and refer to it, without letting the further added context mix in with it. This is in fact exactly how the coreference connector works too: in e.g. case 2 in Fig. 8b2 one can first construct a parent concept (“black dogs”), and then create a placeholder child concept (e.g. labeled “them”), which points with a coreference connector to the parent. This isolates the parent’s meaning and transfers it onto the child, unidirectionally. Finally, one can use this child term “them” for building the rest of the VSM-sentence, e.g. by connecting it to “half-of” and “escape”. The coreference connector points from the further-enriched child to its isolated, meaning-providing parent.
[Note 22: “them” is a somewhat arbitrarily chosen label for the referring (child) term. It ‘reads’ rather nicely in Fig. 8b2, though less so in 8b3. Any other label can be used too, like “it” or “these”.]

In case 3 (Fig. 8b3), the same reasoning is applied: here one first constructs an isolated “half-of dogs”, then one limits that conceptual unit of dogs to those being black, and specifies that they escape. Note that one can still further specify the isolated part, as in e.g.: ‘black ones of: some female half-of dogs’.

Figure 8. Interchangeable vs. non-interchangeable attributes. In (a), attribute order does not matter. But in advanced cases, the order in which multiple connections are added to a term can affect meaning. Here, the attributes “half-of” and “black” can be applied either (b1) simultaneously, or (b2-3) in a particular order. In the latter cases, one applies the first connection to the term “dogs”, then isolates that concept as parent for a coreferencing child placeholder, that is e.g. labeled ‘them’, and one then further connects this child.


Notes (which may go a bit too deep for a first reading) :

• The implicit relation between “dogs” and “black” would be automatically inferable to be “limited-to-all-those-having-color” (and not just “having-color”), because black would be classified as a color, and dogs represents not just one dog, but a group of dogs. Similarly, “half-of” pertains to the group of dogs, so it does not create the meaning of half of one dog.
[Note 23: A note on how to create the meaning of some ‘group of dogs’, starting from a single “dog” concept. Consider a bident-based construct like “11 dog”, or stated explicitly: “dog <has-amount> 11”. The association with the relation “has-amount” is what converts the meaning of “dog” into a group. That is how that relation would work, i.e. how it would have to be semantically interpreted. Alternatively, this relation could be made explicit, but for this example’s brevity and focus we used an implicit relation.]

• Each “half-of” creates an unspecified subset that is some half of the group. So in case 1 where “black” also specifies that this half subset consists of only black dogs, and in Fig. 8b where more than half of the dogs are black, the final subset remains partly unspecified. Therefore we had to choose an arbitrary subset of black dogs in Fig. 8b1. This is consistent with the other two cases, where the ‘half’ (dark blue line) is also not further specified and thus had to be chosen arbitrarily in Fig. 8b2-3.

• In natural language one can imply multiple pieces of information simultaneously. For example, a phrase like ‘the black half of dogs escapes’ would not only express the information of case 1, but also tell that (another) “half-of dogs not has-color black”, which would need to be captured with a second VSM-sentence in addition.
[Note 24: Here “has-color” would be attached (as subject) with a relation-less bident to “not” (as object).]
[Note 25, advanced: One might think of representing this as “exactly half-of dogs has-color black” instead, inspired by a similar expression in natural language. However, natural language is ambiguous.
- The primary meaning of “exactly” is: exact, as in not more nor less than the stated amount (in contrast to “around”, which defines uncertainty about the exactness of the amount).
- A secondary meaning would imply, or intend to incorporate, a meaning like: for the other part of the group, the opposite of what is said is true. This is what the meaning of “exactly” would have to be linked to in that sentence. (And from that, an algorithm could infer that “half‑of dogs not has‑color black”, which is the additional information we proposed in the main text.)

It is up to a VSM-box’s term-provider whether or not to include such intricate, compacted meanings.

In any case, when dealing with concepts that represent groups, curators will need to learn to pay attention to ambiguities in natural language. Or at least they should interpret VSM expressions like “exactly half‑of” in a plain, literal way. With the primary meaning, it would quite simply express “half‑of <has‑precision> exact”, i.e. ‘a/some exact half’.]

• Note: The issue of non-interchangeable connections was first noticed when the designer of VSM heard someone talk about their ‘old new me’ vs. ‘new new me’. Also here, the ‘new me’ is first contextually isolated, before being referred to again with the extra adjective.

3. Additional functionality

While terms and connectors provide the core functionality of VSM, some additional features can further raise both the usability and expressivity of VSM. We built some of these functions into our prototype UI already, while some others are initial ideas that can be further developed.

3.1. Templates

The Short Story 5/5

How can we make VSM immediately support the way people work today? After all, people often search through scientific literature for one type of information (initially at least), and then enter it repetitively into a spreadsheet or database form.

Just as you’d expect: with VSM-templates. These are partial, pre-constructed VSM-sentences:

… with empty fields where a VSM-term can be filled in:

VSM-templates imitate spreadsheet rows or entry forms. But in addition they offer a convenient autocomplete (which is quicker and less error-prone than spreadsheets), with each field linked to its own, particular set of term dictionaries. And they offer an intuitive clarity of what to fill in where, because of the readable ‘sentence’ format. Our test-users literally loved all this.

[Note 27: Each dictionary (or Controlled Vocabulary, CV) is a list of terms or words, with an agreed-upon meaning and definition, within a particular topic.
Note for experts: In our prototype UI, we can specify a number of “preferred CVs” for each empty field. Terms from those CVs are then conveniently ranked on top of the autocomplete suggestion list. Alternatively, one may want to configure the template-UI to leave less flexibility, and to require that users enter terms only from a specific CV.]
[Note 28: Isn’t it funny that ‘literally’ has become a figuratively used expression? But really, our test-users would be horrified if they had to go back to the technology they used before VSM-templates.]

They described this new curation technology – VSM – and what could be done automatically for them as a result, as being a real time‑saver, and even as ‘Magic!’.


What you may not expect (compared to current technology), is that VSM-templates are still extensible!
Just like for any other VSM-sentence, you can always add extra context details: just add or insert new terms and connect them up:


Boom!
Now we have a curation technology that works just like what exists, but is better at the same time (with term lookup), and that is immediately extensible to its full power!
So VSM is: easy to use, extremely flexible, and ready to be computed upon! Exactly what we wanted.

(you could jump to the Summary → or the Examples → or the Discussion → page)


The Full Story

While VSM enables flexible information capture, some curation tasks involve the repetitive entry of similarly structured facts. To streamline a routine entry of facts, one can envision a VSM-template. This is a predefined structure of connectors, with a series of terms, and empty fields that need to be filled in with a term by the user. – In a typical curation project, template designers would accompany a template with a set of curation guidelines. These describe what type of term is expected in empty fields, in addition to how to interpret text in relevant scientific literature (which requires the most effort). Note that VSM-sentences (and -templates) can closely resemble natural language, which makes it clear to the curator what to fill in where. This keeps the effort of interacting with the data entry software minimal.

Fig. 9 shows a number of templates. Each empty field can be associated with one or more preferred dictionaries, e.g. genes or cell types. This can help the autocomplete function to either rank these terms easily accessible on top, or display these terms only. Curators need not be restricted to capture only what a template was designed for: since a template is just a partial VSM-sentence, they can still insert any extra terms and connect them to the rest of the structure. For broader curation tasks, a curation platform could provide access to several VSM-templates in a menu, each dedicated to a particular information type that the curator may want to capture from a paper. The software may even enable users to design new templates.

Figure 9. VSM-templates. In the VSM-template in (a), the first two empty fields would be associated with protein terms, and the third one expects cell types and cell lines, which this template also clarifies in grey text. (b) is an example of this template filled in, and the result is a normal VSM-sentence. (c) is a template where the main relation is easily selectable from a list of pre-associated, one-click terms; and the last empty field would limit its term-lookup to a controlled vocabulary of experiment types.
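
As a rough illustration of how preferred dictionaries could steer the autocomplete of one empty field, the following sketch ranks a list of matches. The record layout, the dictionary names, and the `rank_suggestions` function are hypothetical and do not reflect the prototype's actual code.

```python
def rank_suggestions(matches, preferred_dicts, only_preferred=False):
    """Put matches from preferred dictionaries first, or keep only those."""
    preferred = [m for m in matches if m["dict"] in preferred_dicts]
    if only_preferred:
        return preferred
    others = [m for m in matches if m["dict"] not in preferred_dicts]
    return preferred + others


# Made-up matches for the typed text "he".
matches = [
    {"label": "heat", "dict": "general-terms"},
    {"label": "HeLa", "dict": "cell-lines"},
    {"label": "hepatocyte", "dict": "cell-types"},
]
field_prefs = {"cell-types", "cell-lines"}

print([m["label"] for m in rank_suggestions(matches, field_prefs)])
# ['HeLa', 'hepatocyte', 'heat']
print([m["label"] for m in rank_suggestions(matches, field_prefs, only_preferred=True)])
# ['HeLa', 'hepatocyte']
```

The `only_preferred` flag corresponds to the stricter option of displaying terms from the field's dictionaries only.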

3.2. External coreference

In natural language, one can build a story by referring back to any concept that was conveyed in an earlier sentence (e.g. ‘John buys cheese’; ‘One hour after that, a mouse eats it’). Because each term in a VSM-sentence is a specific concept, one can refer back to any of them; and not only from within a VSM-sentence but from other VSM-sentences as well. Software for managing multiple VSM-sentences could support such coreferencing between sentences, visually or through internal IDs. This could for instance be used for unambiguously defining the steps of biological experiment protocols.

3.3. Head

As explained earlier, each term receives extra ‘context’ from all its connected terms. As a consequence, every term actually represents the entire content of the VSM-sentence, although from its own ‘perspective’: e.g. in Fig. 2b the fifth term can be read as ‘the fork used for eating of chicken by John’. In natural language though, a sentence conveys information only from one particular perspective, e.g. in Fig. 2b, the sentence implies focus on ‘the eating’. This perspective or ‘intended meaning’ is embedded in the term “eats”; and this term is also the one that subsequent sentences would refer to when stating e.g. “that was observed in 2020”. The VSM UI could enable a user to assign this ‘focal term’ or ‘head’ in a VSM-sentence via an Alt+click on or above the term, and draw dashes above the term and under the connectors.

Next, when one wants to refer to a sentence as a whole, e.g. for adding meta-information, software could automatically make an external coreference toward this intended term.

3.4. General and Data terms

As VSM terms are embedded into a particular context, they represent specific concepts. Ontologies, however, define relations between general concepts, i.e. concepts that are not bound to a particular context. In addition, RDF is able to work with literal text-strings or data, which do not represent any ‘concept’. In order to support all these three fundamentally different types, the VSM UI may let the user change a term’s type between specific, general, or data (via Ctrl+click); and then shows them with a different background color. Then, a VSM-sentence could express information like “duck [in general] is-a-type-of bird [in general] according-to ontology-version-X [specific]”, or “bird [in general] has 2 wings”, or “chicken [in general] has-alias hen [literal string]”, or “proteinX has-sequence MRHIAHTQ [literal, data]”.
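
The three term types could be modeled as a small enumeration. The sketch below is an illustration under assumptions: the `TermType` names mirror the text, but the cycling order and the `next_type` helper are invented here and say nothing about how the prototype handles the Ctrl+click.

```python
from enum import Enum


class TermType(Enum):
    SPECIFIC = "specific"  # default: a concept embedded in a particular context
    GENERAL = "general"    # a concept not bound to a particular context
    DATA = "data"          # a literal text-string or data value


def next_type(current):
    """Cycle specific -> general -> data -> specific (assumed order)."""
    order = list(TermType)
    return order[(order.index(current) + 1) % len(order)]


print(next_type(TermType.SPECIFIC).value)  # general
print(next_type(TermType.DATA).value)      # specific
```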

In fact, the distinction between general and specific concepts is described in 📖 The Fourth Founding Principle of VSM, but we can explain that much better after we show a picture of VSMGraphs, later on the VSMGraphs page.

Note that in case one would use VSM as an interface to generate RDF or OWL models, VSM does not impose the use of either OWL Instances or OWL Classes. Still, it makes sense to translate (the default, specific) VSM-terms to OWL Instances, as bioscience findings are typically reported only under particular experimental conditions or biological context. [Note 29: In the VSM view, the only difference between (VSM’s) specific and general is whether a concept is thought of as being attached to any explicit or implicit context, or not, respectively.]

A further treatise on the meaning of general vs. specific concepts in VSM, a topic that covers the subject of semantics alone, falls outside the scope of this page and will be discussed later. The focus of this page is to present VSM as a user-oriented method for the construction of any contextualized knowledge; a method that is founded on VSM‘s well-defined and human-thought inspired semantics, and reflected in a robust, intuitive, and general-purpose UI design.








    Blue notes list:

  1. 1 Please read the Introduction page if you still need to understand why this is important.
    Or at least read the Blue Boxes with our key concepts: ‘Piece of Information‘, ‘Context details‘, ‘Computable Information‘.
    See also the introduction’s Venn diagram.
  2. 2 This ‘statement’ form is inspired by controlled languages. But unlike them, VSM does not use their complicated rules on word order. Instead, VSM uses a simpler and more powerful way to clarify structure, as we will show.
  3. 3 There we’re building several modules that culminate in the vsm-box web-component.
  4. 4 To use a VSM-box in a real application, it must be linked to some dictionary/ies, e.g. some of the many available biological Ontologies (like on BioPortal) or Controlled Vocabularies (CV).
    Also, these CVs are often incomplete works-in-progress, so you may need to create extra terms, or even new dictionaries, for your research field.
    It is an essential step anyway for communicating with colleagues, to agree on what words mean what! Building an agreed-upon dictionary just formalizes this process and creates a common work of reference.
  5. 5 The VSM-box UI that we use in the Short Story and other examples, is just a prototype and demo implementation.
    In the Full Story sections, we also discuss possible alternatives or extensions. Some of these are already implemented in the ‘vsmjs’ organization on GitHub, and some are plans or ideas for implementation.
  6. 6 I like to say that they are different ‘avatars’: i.e. manifestations, appearances, different lexical forms that represent the same thing. (Much more about that in The Full Story).
    VSM-terms for “with” / “using” / “uses” / … would in fact all have the same ID.
    So it makes no difference for a computer’s understanding which form you choose. Still, by choosing the form that we’d expect to see in e.g. English, we can create VSM-sentences that are much easier to read for humans.
  7. 7 A note to impatient knowledge-representation experts: this intuitive description will become elegantly rigorous during the Full Story sections.
    Note #2: The John-holds-tomato case is shown as a VSM-sentence later, on the Examples page.
  8. 8 In fact, we are able to make it look a lot like natural language, because we are allowed to make VSM-terms appear as conjugated verbs, nouns, prepositions, etc.
    That ensures that we can keep complex information (i.e. longer sentences) easily readable for us, humans, too!
    A computer doesn’t understand (and thus doesn’t use/need) conjugations etc., so we need to clarify the structure of a sentence in another (preferably easy!) way.
  9. 9 VSM-connectors show the conceptual structure that exists in the information explicitly. While VSM-terms’ text can help in showing a readable formulation, the VSM-terms’ IDs and the VSM-connectors capture an intuitive, underlying conceptualization.
  10. 10 In RDF you need to make a ‘reification’ construct in order to point to a relation. That is, you need to create an additional ‘object’ that represents that ‘relation’ (or in fact, its entire embedding triple), which you can then point to.
    This is because RDF treats ‘relations’ as fundamentally different things from ‘objects’.
  11. 11 This Principle is crucial, so let’s repeat it in other words:
    Only when some trident attaches to a VSM-term with its relation-leg is the term made to be seen as a relation. And even then, only under that one trident, i.e. in that one triple, locally.
    So anything that e.g. RDF views as a relation, VSM views only locally as a relation (under a particular connector) and views it elsewhere as an entity / as ‘reified’ / ‘as a noun’.
    (The RDF phrase “re-ified” comes from Latin, meaning ~‘thing-ified’ or ‘made into an object’).
    This was an essential and necessary step for designing VSM’s expressivity and simplicity in what will follow: think of all VSM-terms as nouns / ‘thingified concepts’ / something that you can mentally point to in the same way as one points to any other VSM-term.
  12. 12 Some more ways to say this:  ‘each term is placed in the context of / accumulates extra context-meaning from all others’, or:  ‘all connected VSM-terms enrich each other’s meaning’.
    This context meaning is shared with indirectly connected VSM-terms too.
    In the example, even the term “in” now specifically represents ‘the being located in C of the A-B-activation’. This is really useful. Because then you can say more about that concept again. For example: ‘that being-located in…’ is only “probable”. Then we can form the sentence: “A activates B probably in C”. (To add adverbs, see next Short Story).
  13. 13 Easy to read: relative to the complexity of the information. – And of course (just like natural language) relative to the reader’s knowledge of the research field.
    To quote one of our curators:
    “From a user’s side, I don’t think that complexity is difficult to handle at all.
    In fact, I’m excited about all the complexity we can now handle in such an elegant manner.
    It allows us to focus on the biology; no more need to worry about the entry format so much.”
  14. 14 There was once this guy at a conference who mistakenly thought that the connectors were a ‘parse-tree’ (as in: generated by a text-mining algorithm), because they look a bit like one. – He later admitted he had been answering an email during my presentation. (I thank him for immunizing us against one more possible misunderstanding, though.)
  15. 15 While Principle 2 defines the meaning of individual VSM-terms, Principle 3 defines how the meaning of each of them changes by attaching one or multiple connectors to them.
    At this point, Principle 3 may be intuitive or even seem trivial for simple cases as in Fig 2, but it will be a crucial insight for working with more advanced cases that need the coreference connector (sections 2.5 and 2.6).
  16. 16 In other words, ‘adding more context’ means: ‘further narrowing down the meaning of a concept, as to what range of possible meanings it may represent’ (making it more precise), or ‘removing some ambiguities’, or ‘further eliminating unknowns’.
    E.g. in just “John eats chicken”, “eats” could happen in any way. But after we add the “eats with fork” trident, we narrow down the meaning of “eats” by anchoring down at least one more aspect of how it happens.
    (Meanwhile, other unknowns that could be specified about it still remain: where he eats, together with whom, how quickly, etc. But that would lead us to Principle 4, on a further page.)
  17. 17 E.g. both triple statements “cat is alive” and “cat is dead” can be true. But they are true in their own specific context. We can make this context explicit by expanding the statements to e.g. “cat is alive in 2020” and “cat is dead in 2090”.
  18. 18 In other words, two distinct instances of “apple” are needed here. Each instance would be stored with its own (instance-) identifier. This, however, is a more technical aspect of VSM that is covered by VSM-Graphs, see later.
  19. 19 You could also read it as
    “mouse [specified-to-be] white”, or as
    “mouse [has-color] white”.
    And the computer would understand it like that too, because it should already know that “white” is a color, and so it could infer the more specific “[has‑color]” if needed.
  20. 20 Note for experts:
    One could also enter “activation  of  B”, by using an explicit VSM-term “of”, which would have the meta-meaning ‘has object’.
    Note #2: please do not misunderstand this to mean that ‘VSM knows’ whether two sentences (two graph structures) represent the same meaning, rephrased. VSM does not ‘know’. VSM is just a representation form, to be used by people and algorithms. – Just like ‘English does not know’ what you mean. Your brain does, and it uses English to represent information at some level.
  21. 21 This figure also shows that numbers can be represented with VSM-terms. This makes sense because one “5” can be conceptually different from another “5”; because just like other VSM-terms, a number‑concept can be embedded in a particular context. For example, one could be an “approximately  5”, another a “5  ±  2”, and another an “at-least  5”.
    For this, the autocomplete UI needs to support the entry of numbers, by immediately creating a new VSM-term for any of them as needed, after the user presses Enter.
    (This was not yet implemented in the prototype used on these web pages, but it is implemented now in the vsm-dictionary module (see its spec), which supports ‘vsm-box’).
  22. 22 “them” is a somewhat arbitrarily chosen label for the referring (child) term. It ‘reads’ rather nicely in Fig. 8b2, though less so in 8b3. Any other label can be used too, like “it” or “these”.
  23. 23 A note on how to create the meaning of some ‘group of dogs’, starting from a single “dog” concept.
    Consider a bident-based construct like “11 dog”, or stated explicitly: “dog <has-amount> 11”. The association with the relation “has-amount” is what converts the meaning of “dog” into a group. That is how that relation would work, i.e. how it would have to be semantically interpreted.
  24. 24 Here “has-color” would be attached (as subject) with a relation-less bident to “not” (as object).
  25. 25 [Advanced note]: One might think of representing this as “exactly half-of dogs has-color black”, instead, inspired by a similar expression in natural language. However, natural language is ambiguous.
    - The primary meaning of “exactly” is: exact, as in not more nor less than the stated amount; (in contrast to “around”, which defines uncertainty about the exactness of the amount).
    - A secondary meaning would imply, or intend to incorporate, a meaning like: for the other part of the group, the opposite of what is said is true. This is what the meaning of “exactly” would have to be linked to in that sentence. (And from that, an algorithm could infer that “half‑of dogs not has‑color black”, which is the additional information we proposed in the main text).
    It is up to a VSM-box’s term-provider whether or not to include such intricate, compacted meanings.
    In any case, when dealing with concepts that represent groups, curators will need to learn to pay attention to ambiguities in natural language. Or at least they should interpret VSM expressions like “exactly half‑of” in a plain, literal way. With the primary meaning, it would quite simply express “half‑of <has‑precision> exact”, i.e. ‘a/some exact half’.
  26. 26 The issue of non-interchangeable connections was first noticed when the designer of VSM heard someone talk about their ‘old new me’ vs. ‘new new me’. Also here, the ‘new me’ is first contextually isolated, before being referred to again with the extra adjective.
  27. 27 Each dictionary (or Controlled Vocabulary, CV) is a list of terms or words, with an agreed-upon meaning and definition, within a particular topic.
    Note for experts:  In our prototype UI, we can specify a number of “preferred CVs” for each empty field. Terms from those CVs are then conveniently ranked on top of the autocomplete suggestion list. – Alternatively, one may want to configure the template-UI to leave less flexibility, and to require that users enter terms only from a specific CV.
  28. 28 Isn’t it funny that ‘literally’ has become a figuratively used expression? — But really, our test-users would be horrified if they had to go back to the technology they used before VSM-templates.
    They described this new curation technology – VSM – and what could be done automatically for them as a result, as being a real time‑saver, and even as ‘Magic!’.
  29. 29 In the VSM view, the only difference between (VSM’s) specific and general is whether a concept is thought of as being attached to any explicit or implicit context, or not, respectively.
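
    Illustration for note 10: the RDF ‘reification’ construct can be sketched with plain Python tuples. This is only a minimal sketch, not part of VSM or of any RDF library. The names rdf:Statement, rdf:subject, rdf:predicate and rdf:object are standard RDF vocabulary; every “ex:” name (including ex:stmt1 and ex:locatedIn) and the helper function reify are hypothetical placeholders chosen for this example.

```python
# Minimal sketch of RDF reification using plain tuples (no RDF library).
# rdf:Statement, rdf:subject, rdf:predicate and rdf:object are standard RDF
# vocabulary terms; every "ex:" name below is a hypothetical placeholder.

def reify(triple, statement_id):
    """Return the four triples that describe `triple` as an object."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

# The plain fact "A activates B".
activation = ("ex:A", "ex:activates", "ex:B")

# To say *where* the activation happens, RDF first needs an extra node
# (ex:stmt1) that stands for the whole triple ...
graph = [activation] + reify(activation, "ex:stmt1")

# ... and only then can a further triple point to that node.
graph.append(("ex:stmt1", "ex:locatedIn", "ex:C"))

for triple in graph:
    print(triple)
```

    The sketch needs four helper triples before anything can be said about the activation. As notes 10 and 11 describe, VSM avoids this step because another connector can attach directly to the term “activates”.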


    Grey notes list:

  1. 1 • Controlled language: Wiki:Controlled_natural_language, Wiki:Formal_language;
    • Natural language: e.g. English, Dutch;
    • RDF: Wiki:RDF, W3C, tutorial;
    • Ontology: Wiki:Ontology_(information_science), Wiki:Open_Biomedical_Ontologies.
  2. 2 Wikipedia:Concept
  3. 3 http://www.geneontology.org/
    Gene Ontology Consortium: going forward, Nucleic Acids Res, 2015.
    https://www.ncbi.nlm.nih.gov/gene/
  4. 4 Jointly creating digital abstracts: dealing with synonymy and polysemy, BMC Res Notes, 2012.
  5. 5 Gene Ontology (geneontology.org) is a dictionary about gene functionality etc. – Example of a long term: “positive regulation of transcription from RNA polymerase II promoter”, and there are even longer ones.
    Some terms are that long because GO classifies biological concepts in an extensive tree structure of ever-more-specific terminology.
  6. 6 • RDF in general: Wiki:RDF, W3C, tutorial.
    • E.g. triples represented in RDF Turtle.
  7. 7 Dai 2007: ‘A WUSCHEL-LIKE HOMEOBOX Gene Represses a YABBY Gene Expression Required for Rice Leaf Development’.
  8. 8 The term “leaf lamina” represents the Plant Ontology ID ‘PO:0020039’. You can check this ID, term, synonyms and a precise definition on this page.
  9. 9 The term “twisted” represents the Phenotype And Trait Ontology ID ‘PATO:0001989’. You can check this ID, term, synonyms and a precise definition on this page.
    Note that “sinuous” has been assigned as this ID’s preferred (synonymous) term, since we made this example. But the VSM-term’s ID and associated meaning remain the same, of course.
  10. 10 See this Wikipedia page, or the Resource Description Framework project site.
  11. 11 Dai 2007: ‘A WUSCHEL-LIKE HOMEOBOX Gene Represses a YABBY Gene Expression Required for Rice Leaf Development’.
  12. 12 Jiang 1999: ‘Multistep regulation of DNA replication by Cdk phosphorylation of HsCdc6’.