API Documentation

Conventions:

  1. The API returns all results in JSON format.
  2. We'll use the terms "category", "element", and "link" for the data.
    Each category has 0..n elements, and each element has exactly 1 category. Category names are unique. Element names are unique within their category. Links connect exactly two elements (no matter whether they're in the same or in two different categories) and are undirected.
  3. Double-URL-encode strings in URLs (e.g. if you want to query /api/category/<catname> with catname := "text/javascript", then query /api/category/text%252Fjavascript).
  4. Some elements have a special syntax (similar to markdown):
    1. Links: [title](url). If ] or [ is needed in the title, it must be escaped with a \. Like so: [Cool \] bracket](url). The full matching RegEx with anonymous title and url group reads as ^\[((?:[^\]\[]|\\\[|\\\])*)\]\(([^()]*)\)$ For links the title is also a valid ID in URLs. Therefore, no two links with the same title can exist within one category. E.g. [cool] element and [\[cool\] element](http://with-link.com) would collide inside the same category.

      Sanity checks

      There is no server-side sanity check done on the links. There might be http:, ftp:, or even javascript: URIs among the links.
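
Conventions 3 and 4 can be sketched on the client side as follows. This is a minimal sketch, not the server's implementation: the helper names double_quote and parse_link are made up for illustration, and the pattern is copied from the convention above.

```python
import re
from urllib.parse import quote

# Pattern copied from convention 4.1: group 1 = title, group 2 = url.
LINK_RE = re.compile(r"^\[((?:[^\]\[]|\\\[|\\\])*)\]\(([^()]*)\)$")


def double_quote(name):
    """Percent-encode a name twice, so that "/" ends up as "%252F"."""
    return quote(quote(name, safe=""), safe="")


def parse_link(element_name):
    """Return (title, url) for a link-style element name, else None."""
    match = LINK_RE.match(element_name)
    if match is None:
        return None
    title, url = match.groups()
    # Undo the backslash escaping of brackets inside the title.
    return re.sub(r"\\([\[\]])", r"\1", title), url
```

Since the server does no sanity check on the url part, a client would still have to decide for itself which URI schemes it is willing to render.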

Each successful call to the API results in a response with the header field ETag set. The value of that field is the timestamp of the latest database version. When you send a request to the API, you may pass the database version your client is currently working on in the ETag header field, or as the parameter db_version if you prefer. This helps to avoid race conditions. If you try to change things in the database while passing an outdated database version, the server responds with a 409 status code. If you specify both the ETag header and the db_version parameter, the parameter is preferred and the header field is ignored.
You are free to omit the database version (sending no ETag in the header and setting no db_version parameter). The server will then skip the version check and just try to apply the change. Beware that this may cause race conditions.

The settings are excluded from all of this, because they are stored separately and don't have any kind of versioning.
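
A client could react to the 409 status code described above roughly like this. The sketch makes no network calls itself: send and resync are hypothetical callables that the caller supplies (send performs the PUT/DELETE with the given version and returns the HTTP status code, resync brings the client up to date and returns the version it is now working on).

```python
def write_with_version(send, resync, db_version, max_attempts=3):
    """Call send(db_version) until the server stops answering 409.

    Returns the last status code and the database version used last.
    """
    status = None
    for _ in range(max_attempts):
        status = send(db_version)
        if status != 409:
            break
        # 409: our version was outdated, so catch up before retrying.
        db_version = resync()
    return status, db_version
```

Whether an automatic retry is appropriate depends on the change. After a 409 the client should normally re-check that its change still makes sense against the newer data.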


GET /api/all_timestamps

This will return a list of timestamps as strings:

[
  "<string:timestamp1>",
  "<string:timestamp2>",
  ...
]
        

Some TagTool instances may be secured by a password. There are three ways to authenticate. Furthermore, the read-only option might be enabled which allows all GET requests (except /api/token) even without previous authentication.

  • Send the password or the auth-token along with the auth parameter.
  • Send the password along with the password parameter.
  • Send the auth-token in a cookie called TagToolAuth.

For every successful request against an endpoint of a password protected TagTool instance, a Set-Cookie header will be set in the response. The new cookie will correspond to the one described as third authentication method above.
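
The three authentication methods could be used from Python's standard library as sketched below. The function name and the host are placeholders, and the sketch assumes that the auth and password parameters are sent in the query string. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def authenticated_request(url, method="GET", auth=None, password=None,
                          token_cookie=None):
    """Build a request that uses one (or more) of the three methods."""
    params = {}
    if auth is not None:
        params["auth"] = auth          # password or auth token
    if password is not None:
        params["password"] = password  # password only
    if params:
        url = url + ("&" if "?" in url else "?") + urlencode(params)
    request = Request(url, method=method)
    if token_cookie is not None:
        # Third method: the auth token in a cookie called TagToolAuth.
        request.add_header("Cookie", "TagToolAuth=" + token_cookie)
    return request
```

Passing the result to urllib.request.urlopen would send it. The Set-Cookie header of the response then carries the cookie for later requests.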


GET /api/token

This will return the auth token. Of course, you have to be authenticated to do so. Note that 204 with empty body is returned if the instance isn't password protected.


GET /api/auth_status

This will return a JSON status object like this:

{
  "passwordSet": <boolean if instance is password protected>,
  "readOnly":    <boolean if read-only mode is enabled>,
  "loggedIn":    <boolean if the request is authenticated>
}
        

All PUT and DELETE calls (except for the settings) as well as the bulk action accept a parameter strict that can be set to true or false (default). If set to true, the server will return status code 404 if you try to delete a non-existing entity and 400 if you try to add an existing one. If set to false, those errors will be ignored silently. This applies recursively: if you e.g. delete an element, which automatically results in the deletion of all of its links, it would be silently ignored if one or more of those links don't exist.

PUT/DELETE /api/category/<string:name>

PUT will create the category, while DELETE will delete it.


GET /api/category/<string:name>

This will return all elements of the specified category as a nested list like this:

[
  [
    "<string:category_name>",
    "<string:element_name>"
  ],
  [
    ...
  ],
  ...
]
        

Or you can get all categories, their elements, and number of links of each element by calling

GET /api/categories

The result will look like this:

{
  "<string:category_name>": {
    "<string:element_name>": <integer:link_count>,
    "<string:element_name>": <integer:link_count>,
    ...
  },
  "<string:category_name>": {
    ...
  },
  ...
}
        
PUT/DELETE /api/category/<string:category_name>/<string:element_name>

PUT will create the element, while DELETE will delete it.


You can get all links of an element (links are undirected).
GET /api/category/<string:category_name>/<string:element_name>
The result will look like this:
[
  [
    [
      "<string:category_name_source>",
      "<string:element_name_source>"
    ],
    [
      "<string:category_name_target>",
      "<string:element_name_target>"
    ]
  ],
  [
    ...
  ]
]
        

Links are represented as lists of two [category, element] pairs. The order (i.e. which element will be source and which target) is not specified!


You can search for elements.

GET /api/search?q=<query>&cat=<cat>&forcefuzzy=<force fuzzy search>&fuzzthresh=<fuzzy threshold>&fuzzlimit=<fuzzy limit>

This will return a list of [category, element] pairs, like so:

[
  [
    "<string:category_name>",
    "<string:element_name>"
  ],
  [
    ...
  ],
  ...
]
        

Parameters:

  • q (string, default: ""): the query string
  • cat (string, default: no restriction): Restrict the search to a category. Raises 404 if the category doesn't exist.
  • forcefuzzy (bool, default: true): If true, only a fuzzy search will be performed. Otherwise the search works in 5 stages: (1) perfect match, (2) case-insensitive match, (3) case- and whitespace-insensitive match, (4) case- and whitespace-insensitive infix match, (5) fuzzy search. The first stage that yields results cancels the later ones, so the fuzzy search only runs as the 5th stage if none of the 4 previous stages finds a match.
  • fuzzthresh (float, default: 60): Threshold for the fuzzy search (between 0 and 100). It gives the minimum wanted ratio of the most similar substring.
  • fuzzlimit (int, default: 3): Limit the number of fuzzy results. Results of stages 1-4 (see forcefuzzy) are returned either way. Only if these are fewer than fuzzlimit will the results be extended with fuzzy results (if existing) to have (at most) length fuzzlimit. Pass -1 to return all results.
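
The four exact stages described for forcefuzzy=false can be sketched as follows. This only illustrates the staging and is not the server's code: the function name is made up, "whitespace-insensitive" is assumed to mean that all whitespace is removed before comparing, and the fuzzy stage is left out (stage number 5 only signals that it would run next).

```python
import re


def _fold(text):
    """Lower-case the text and remove all whitespace (an assumption)."""
    return re.sub(r"\s+", "", text).lower()


def staged_search(names, q):
    """Return (stage, matches) for the first stage that finds something."""
    stages = [
        lambda name: name == q,                  # 1: perfect match
        lambda name: name.lower() == q.lower(),  # 2: case-insensitive
        lambda name: _fold(name) == _fold(q),    # 3: + whitespace-insensitive
        lambda name: _fold(q) in _fold(name),    # 4: infix match
    ]
    for number, matches in enumerate(stages, start=1):
        hits = [name for name in names if matches(name)]
        if hits:
            return number, hits
    return 5, []  # nothing found: the fuzzy search would run now
```

For example, with the names "Hans Peter" and "hans peter", the query "hanspeter" is only found in stage 3, while "ans p" needs the infix stage.
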
PUT/DELETE '/api/link/between/<string:e1_cat>/<string:e1_elem>/and/<string:e2_cat>/<string:e2_elem>'

"e1" and "e2" stand for "element 1" and "element 2" (order doesn't matter). "cat" stands for "category", "elem" for "element". PUT will create the link while DELETE will destroy it. If you use non-existent categories and/or elements while creating a new link, the server will automatically create them as needed.


Call

GET /api/links

to get a list of all links, or

GET /api/links/of/<string:category>/<string:element>

to get a list of all links of a specified element. The resulting list will look like this for both calls:

[
  [
    [
      "<string:category_name_source>",
      "<string:element_name_source>"
    ],
    [
      "<string:category_name_target>",
      "<string:element_name_target>"
    ]
  ],
  [
    ...
  ]
]
        

The order (i.e. which element will be source and which target) is not specified!
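
Because the order inside a link is not specified, a client that compares or deduplicates links may want to normalize them first. A minimal sketch (the function name is made up for illustration):

```python
def normalize_link(link):
    """Return an order-independent, hashable form of a link.

    link is [[category, element], [category, element]] as returned by the API.
    """
    first, second = (tuple(endpoint) for endpoint in link)
    return tuple(sorted((first, second)))
```

Two links that differ only in which element is listed first then compare equal and collapse into one entry when stored in a set.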

GET /api/similarities

A list of pairs and their similarity will be returned. Every pair that doesn't appear in the list has a similarity of 0 or isn't disjoint. The pairwise similarity is defined as $s_{i, j} = \frac{|l_i\ \cap\ l_j|}{0.5\ \cdot\ (\ |l_i|\ +\ |l_j|)}$ where $l_i$ is the set of all links of element $i$. I.e. it is a measure of how many links $i$ and $j$ have in common.

The resulting list of triples will look like this:

[
  [
    [
      "<string:category_name_1>",
      "<string:element_name_1>"
    ],
    [
      "<string:category_name_2>",
      "<string:element_name_2>"
    ],
    <float:similarity_1_2>
  ],
  [
    ...
  ]
]
        
From the formula, one can see that similarity $\in\ [0, 1]$. Please note that the pairs are not sensitive to order: if the pair $(i, j)$ is in the list, $(j, i)$ won't be.
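
The formula can be tried out with a few lines of Python. The sketch assumes that $l_i$ is represented as the set of elements that $i$ is linked to. That reading, the function name, and the handling of two elements without any links (similarity 0) are assumptions.

```python
def similarity(linked_to_i, linked_to_j):
    """Shared links divided by the mean number of links of both elements."""
    total = len(linked_to_i) + len(linked_to_j)
    if total == 0:
        return 0.0  # assumption: no links at all means no similarity
    return len(linked_to_i & linked_to_j) / (0.5 * total)
```

Two elements with three links each, two of which are shared, get 2 / (0.5 * 6) = 2/3. The result does not depend on the order of the arguments.
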
GET /api/diff/query?start=<start>&end=<end>
The parameters start and end are expected to be timestamps. The default value for start is the first database version. The default value for end is the latest database version. You have several options to combine the parameters with the ETag header (see above for more details):
  1. Nothing is passed.
    This will return a diff from an empty db to the most recent db, i.e. a dump of the current database in diff format.
  2. Neither start nor end but ETag is passed, or only end is passed.
    Both will result in a diff from an empty db to the version identified by the ETag or by end, respectively.
  3. Only start is passed.
    This will result in a diff from the specified start to the latest version.
  4. start and ETag
    This will result in a diff from the specified start to the version identified by the ETag.
  5. end and ETag
    This will result in a diff from the version identified by the ETag to the specified end.
  6. start, end and ETag will cause an HTTPException
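
The six cases can be summarized as a small function that returns the (from, to) pair a query should cover. This is only a reading of the list above, not server code: the placeholders EMPTY and LATEST are made up, and the combination of start and end without an ETag is assumed to mean a diff from start to end.

```python
EMPTY = None       # placeholder for "an empty db"
LATEST = "latest"  # placeholder for "the latest database version"


def diff_range(start=None, end=None, etag=None):
    """Return the (from, to) versions for a combination of parameters."""
    if start is not None and end is not None and etag is not None:
        raise ValueError("start, end and ETag must not all be passed")  # case 6
    if start is not None:
        # Cases 3 and 4, plus start and end without an ETag (assumption).
        upper = end if end is not None else etag
        return start, upper if upper is not None else LATEST
    if end is not None:
        # Case 5 (the ETag is the lower bound) or "only end" from case 2.
        return (etag if etag is not None else EMPTY), end
    # Cases 1 and 2: neither start nor end.
    return EMPTY, etag if etag is not None else LATEST
```
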

Diff results look like this:
{
  "from": <old_timestamp>,
  "to": <new_timestamp>,
  "deletions": {
    "categories": ["<cat_name1>", "<cat_name2>", ...],
    "elements": [
      [<cat_name>, <elem_name1>],
      ...
    ],
    "links": [
      [
       [<src_cat_name>, <src_elem_name1>],
       [<dest_cat_name>, <dest_elem_name1>]
      ],
      ...
    ]
  },
  "insertions": {
    "categories": [...],
    "elements": [...],
    "links": [...]
  }
}
        
Thus an empty diff would look like this:
{
  "from": null,
  "to": null,
  "deletions": {
    "categories": [],
    "elements": [],
    "links": []
  },
  "insertions": {
    "categories": [],
    "elements": [],
    "links": []
  }
}
        
Whenever you query a diff starting at an empty db, <old_timestamp> will be null.
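
A client that keeps a local copy of the data could apply such a diff as sketched below. The layout of the local state and the function name are assumptions, and deletions are applied before insertions (the order is not specified for diffs returned by the server).

```python
def apply_diff(state, diff):
    """Apply a diff object to a local copy of the database, in place.

    state: {"categories": set of names,
            "elements":   set of (category, element) tuples,
            "links":      set of frozensets of two such tuples}
    """
    def link_key(link):
        # Links are undirected, so store them without an order.
        return frozenset(tuple(endpoint) for endpoint in link)

    for section, change in (("deletions", set.difference_update),
                            ("insertions", set.update)):
        part = diff.get(section, {})
        change(state["categories"], part.get("categories", []))
        change(state["elements"],
               {tuple(elem) for elem in part.get("elements", [])})
        change(state["links"],
               {link_key(link) for link in part.get("links", [])})
    return state
```

Applying the empty diff shown above leaves the state unchanged.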

It is possible to be notified by the server, if the database changes. You can find more about this here: WebSocket > Diff

PUT '/api/rename/<old_name>/to/<new_name>'

This will rename the category old_name to new_name. The call will result in a 404 error if you try to rename a non-existent category, and in a 400 if a category with the name new_name already exists.

The settings for old_name will persist but also be copied to new_name.


PUT '/api/rename/<category>/<old_name>/to/<new_name>'

This will rename the element old_name in the category category to new_name. The call will result in a 404 error if you try to rename a non-existent element (i.e. the category or the element does not exist), and in a 400 if an element with the name new_name already exists in the specified category.

PUT '/api/rename/<old_category>/<old_name>/to/<new_category>/<new_name>'

This will rename the element old_name in the category old_category to new_name and move it to the category new_category. The call will result in a 404 error if you try to rename a non-existent element (i.e. the category or the element does not exist), and in a 400 if an element with the name new_name already exists in the new category. Furthermore, it will result in a 404 if the new category does not exist.

If you'd like to e.g. delete a whole bunch of elements or rename multiple elements with one request, this is the way to go.

POST '/api/bulkaction'

The request expects a diff parameter. Such a diff is a JSON object that looks similar to the diff described under Diffs > Structure, except that the from and to fields don't exist and a renamings key exists.

This example would rename the category Persons to Person, then rename the Person 'Hans' to 'Hans-Peter', move the Hobby 'Marwin' to the category Person under the name 'Marvin', and finally delete the link between 'Hans-Peter' and the Person 'Karl':

{
  "insertions": {
    "categories": [],
    "elements": [],
    "links": []
  },
  "renamings": {
    "categories": [["Persons", "Person"]],
    "elements": [[["Person", "Hans"], ["Person", "Hans-Peter"]],
                 [["Hobby", "Marwin"], ["Person", "Marvin"]]]
  },
  "deletions": {
    "links": [[["Person", "Hans-Peter"], ["Person", "Karl"]]],
    "elements": [],
    "categories": []
  }
}
        

The order in which the single actions are executed is exactly as shown above:

  1. Insertions
    1. Categories
    2. Elements
    3. Links
  2. Renamings
    1. Categories
    2. Elements
  3. Deletions
    1. Links
    2. Elements
    3. Categories

Empty entries can also be left out. So the following diff is equivalent to the one above:

{
  "renamings": {
    "categories": [["Persons", "Person"]],
    "elements": [[["Person", "Hans"], ["Person", "Hans-Peter"]],
                 [["Hobby", "Marwin"], ["Person", "Marvin"]]]
  },
  "deletions": {
    "links": [[["Person", "Hans-Peter"], ["Person", "Karl"]]]
  }
}
        

If one (or more) of the specified actions fails, all other actions will be dropped/rolled back as well.
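
Since empty entries can be left out, a client that builds bulk diffs programmatically might shrink them before sending. A minimal sketch (the function name is made up for illustration):

```python
def compact_diff(diff):
    """Drop empty lists, and then empty sections, from a bulk-action diff."""
    compact = {}
    for section, entries in diff.items():
        kept = {kind: items for kind, items in entries.items() if items}
        if kept:
            compact[section] = kept
    return compact
```

Applied to the first example above, this yields exactly the shorter second example. The order of execution is fixed by the server, so the order of the keys in the object does not matter.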

GET /api/recently_changed?seconds=<integer>&ignore_if_all=<boolean>

This will return a list of recently changed elements and categories. “Recently” is specified by the seconds parameter, which can be omitted to use the server side default (which is configured individually by the server admin). Categories are considered to be changed if they have been added recently. Elements are considered to be changed if they either

  • have been added recently,
  • new links involving them have been added recently, or
  • links involving them have been deleted recently.

The result will be a list of [<category>, <element>] pairs and category strings, like this:

[
  ["<cat_name>", "<changed_elem_name>"],
  ["<another_cat_name>", "<another_changed_elem_name>"],
  "<new_cat_name>",
  ...
]
        

If ignore_if_all is set to true (default) the server will return an empty result list, if all elements and all categories are considered as recently changed. Set it to false to return the full list instead in such cases.
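
Because the result mixes pairs and plain strings, a client has to tell them apart by type. A small sketch (the function name is made up for illustration):

```python
def split_recently_changed(result):
    """Split the mixed result list into category names and element pairs."""
    categories = [item for item in result if isinstance(item, str)]
    elements = [tuple(item) for item in result if not isinstance(item, str)]
    return categories, elements
```
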

GET /api/user_count

The response will contain a positive integer or zero. It is the approximated number of users currently "connected" with the server. It's basically the number of WebSockets connected to the server.


It's possible to get the current number of users with the WebSocket provided by the api (WebSocket > User count).

The API offers a Socket.IO interface. To connect to it, you may want to use an existing library for your programming language, as Socket.IO adds its own protocol on top of WebSocket. Different kinds of events are sent over the WebSocket, namely diff, user_count, and settings events (see below). On connection, a diff event is triggered which informs you of the current state of the database. Alternatively, you can pass a db version (see ETag) on connect to obtain only the diff from that timestamp to the current version (see our example below on how to achieve that).


See Diffs > Structure on how a diff is structured.

Whenever the diff event is fired, it will pass a JSON encoded diff of the most recent change(s).


The value of the user_count event will be a non-negative integer. Find more information here: User count > GET.


Whenever the settings change, the settings event will be fired.

See Settings > Structure for what the settings object looks like.


Here is a draft of a JavaScript client for the Socket.IO interface:

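// Optionally pass the db version the client already has, to receive only the diff since then.
var socket = io('http://<server>', { query: 'db_version=' + '<some_version_optional>' });

// Register the handlers once. Registering them inside a 'connect' handler
// would add duplicates on every reconnect.
socket.on('diff', function(data) {
    var diff = JSON.parse(data); // diffs arrive JSON encoded
    doSomethingWithTheDiff(diff);
});

socket.on('user_count', function(data) {
    var userCount = parseInt(data, 10); // JSON.parse(data) will work as well here
    updateUserCounter(userCount);
});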
        

This whole part is not affected by the ETag specifications. There is no versioning of the settings.

The server stores default settings for the visualizations of the clients.
The client must be able to handle "missing" settings and have an own default.

The settings can be queried with

GET /api/settings

It'll return a JSON object as described below.


PUT /api/settings

will update the settings. You must set the header Content-Type: application/json and send the data as described under Settings > Structure in the request body.
If you set null as a value, the corresponding default setting will be removed.
You can also send only a partial settings object.

For clarification: Sending an empty object ({}) would be a valid action as well since an empty object is a partial settings object too.
But of course this request would have no effect.

Example

Assume the server side settings look like this:

{
  "sizing": {
    "foo": "none",
    "bar": "links",
    "fb": "links"
  }
}
        

If you now PUT this data:

{
  "sizing": {
    "foo": "links",
    "foobar": "none",
    "fb": null
  }
}
        

The server side settings will look like this:

{
  "sizing": {
    "foo": "links",
    "bar": "links",
    "foobar": "none"
  }
}
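
The update rule shown in this example can be sketched as a small merge function. It mirrors the example only: the function name is made up, and how the server treats cases the example does not cover (e.g. a whole group set to null) is not specified here.

```python
def merge_settings(current, update):
    """Return the settings that result from PUTting update onto current."""
    merged = {group: dict(options) for group, options in current.items()}
    for group, options in update.items():
        target = merged.setdefault(group, {})
        for category, value in options.items():
            if value is None:
                target.pop(category, None)  # null removes the setting
            else:
                target[category] = value    # add or overwrite
    return merged
```

Merging an empty object returns the settings unchanged, which matches the remark that such a request has no effect.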
        

The full settings object looks like this:

{
  "sizing": {
    "<cat_name_1>": <sizing_opt_1>,
    "<cat_name_2>": <sizing_opt_2>,
    ...
  },
  "display-priority": {
    "<cat_name_1>": <dp_opt_1>,
    "<cat_name_2>": <dp_opt_2>,
    ...
  },
  "display-type": {
    "<cat_name_1>": <dt_opt_1>,
    "<cat_name_2>": <dt_opt_2>,
    ...
  }
}
        

Where each cat_name_n may be any string identifying a category.

  • Each sizing_opt_n is either "none" or "links".
    "none" means all elements should be displayed the same size while "links" means that elements should be displayed bigger the more links they have.
  • Each dp_opt_n is an integer (positive or negative).
    It can be understood as analogous to the CSS property z-index, but for the display order of the categories: the higher the display-priority of a category, the closer to the beginning it should be displayed. If two categories have the same display-priority, it's up to the client to find an order. If no display-priority is set, the client may use 0 as default.
  • Each dt_opt_n is one of "cloud" or "links".
    "cloud" means that all elements should be displayed in a tag cloud style while "links" means that the names should be treated as links (see conventions for syntax).

For PUT requests only, you also have the option to set sizing_opt_n and/or dp_opt_n to null. This will cause the given setting to be removed (see Settings > PUT).
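
On the client side, the display-priority rule could be applied like this. It is a sketch under the assumptions stated in the list above: missing priorities default to 0, and ties simply keep the order in which the categories were passed in (the function name is made up for illustration).

```python
def display_order(categories, settings):
    """Order category names by descending display-priority (default 0)."""
    priorities = settings.get("display-priority", {})
    # sorted() is stable, so categories with equal priority keep their order.
    return sorted(categories, key=lambda name: -priorities.get(name, 0))
```
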

GET /api/io?format=<string:format>

format must be csv or excel. Default is csv.

The method will return a file as attachment. The file is called <prefix>-<timestamp of db>.<csv or xlsx> where the prefix is some string depending on the server config. The table has a header row with 4 columns: Category1,Element1,Category2,Element2. Each other row has either the first 1, 2, or 4 columns filled. The first rows will contain all links (first 4 columns filled), the next rows will contain all elements that have no links (first 2 columns filled), and the last rows all categories without any elements (first column filled). For more details see IO > Table Structure.

A timestamp can be specified to get a desired version of the database (see ETag). If none is provided, the latest version will be returned.


POST (Import)

POST /api/csv

Expects an argument file that contains a multi-part encoded file. The content will be added to the database. Please see below on how the file has to be structured.

Takes an optional argument format that must be csv or excel, depending on what file format you upload. Default is guessing (inefficient).

The table header (first row) is optional. If present, it will be stripped.

This method allows you to compress the table, as rows with fewer than 4 columns will be padded with empty cells. Furthermore, it fills empty category cells with the last available category of the column. This, for example:

cat1,elem1,cat2,elem2
    ,elem3
    ,     ,    ,elem4
    ,elem5,cat1,elem1
        

will be translated to:

cat1,elem1,cat2,elem2
cat1,elem3,cat2,
cat1,     ,cat2,elem4
cat1,elem5,cat1,elem1
        

The semantics are:

  • Add a link between [cat1, elem1] and [cat2, elem2] and create categories and elements as needed.
  • Add element [cat1, elem3], category cat2, and category cat1 if needed.
  • Add category cat1, element [cat2, elem4], and category cat2 if needed.
  • Add a link between [cat1, elem5] and [cat1, elem1] and create categories and elements as needed.

You might have noticed that filling up the category columns causes additional actions, but all of those actions will have no effect as the corresponding categories would have been created by a previous row either way.

Note that the first cell of the first row cannot be empty, unless the Table is entirely empty.
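
The expansion of a compressed table can be sketched as follows. This is not the server's import code (which uses pandas). It uses the csv module from the standard library, assumes that whitespace around cells is only there for alignment, and does not deal with the optional header row. The function name is made up for illustration.

```python
import csv
import io


def expand_table(text):
    """Pad rows to 4 cells and fill empty category cells (columns 1 and 3)."""
    rows = []
    last_category = {0: "", 2: ""}  # last category seen per category column
    for raw in csv.reader(io.StringIO(text)):
        row = [cell.strip() for cell in raw]
        row += [""] * (4 - len(row))  # pad short rows with empty cells
        for column in (0, 2):
            if row[column]:
                last_category[column] = row[column]
            else:
                row[column] = last_category[column]
        rows.append(row)
    return rows
```

Run on the compressed example above, this produces the four translated rows shown there.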


The Table has 1..n rows. The first row is the header:

Category1,Element1,Category2,Element2

Each other row has 4 columns with either

  • first 4 columns filled: <cat1>,<elem1>,<cat2>,<elem2>
    Representing a link between [cat1, elem1] and [cat2, elem2],
  • first 2 columns filled: <cat1>,<elem1>
    Representing the element [cat1, elem1], or
  • first column filled: <cat1>
    Representing category cat1.

Note that the import accepts a compressed form of this table as described under IO > POST (Import).


CSV

CSV files will be handled with pandas' read_csv with all the default settings.

  • Encoding: UTF-8
  • Line-Ending: Unix-style (\n), old Mac OS (\r), or Windows-Style (\r\n)
  • Delimiter: ,
  • Escape Char: None (see quoting)
  • QuoteChar: "
  • Quoting: Minimal
  • Double Quotes: True

More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Excel

Excel import is handled with pandas' read_excel, with the xlrd library as engine.

More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html