API Documentation
Conventions:
- The API returns all results in JSON format.
- We'll use the terms "category", "element", and "link" for the data.
Each category has 0..n elements, and each element belongs to exactly one category. Category names are unique. Element names are unique within their category. Links connect exactly two elements (no matter whether they're in the same or in two different categories) and are undirected.
- Double URL-encode strings in URLs (e.g. if you want to query /api/category/<catname> with catname := "text/javascript", then query /api/category/text%252Fjavascript).
- Some elements have a special syntax (similar to markdown):
  - Links: [title](url). If ] or [ is needed in the title, it must be escaped with a \. Like so: [Cool \] bracket](url). The full matching RegEx with anonymous title and url groups reads as ^\[((?:[^\]\[]|\\\[|\\\])*)\]\(([^()]*)\)$
    For links the title is also a valid ID in URLs. Therefore, no two links with the same title can exist within one category, and a link also collides with a plain element whose name equals its title. E.g. [cool] element and [\[cool\] element](http://with-link.com) would collide inside the same category.
Sanity checks
There is no server-side sanity check done on the links. There might be http:, ftp:, or even javascript: URIs among the links.
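The encoding and link conventions above can be sketched in JavaScript. This is a minimal sketch and not part of the API: the helper names doubleEncode, parseLinkName, and elementId are hypothetical, and the rule that the ID of a link element is its title with the escaping backslashes removed is an assumption inferred from the collision example.

```javascript
// Hypothetical helper: double URL-encode a string for use in an API path.
function doubleEncode(value) {
  return encodeURIComponent(encodeURIComponent(value));
}

// The link RegEx from the conventions; group 1 is the title, group 2 the url.
const LINK_PATTERN = /^\[((?:[^\]\[]|\\\[|\\\])*)\]\(([^()]*)\)$/;

// Hypothetical helper: split an element name written in link syntax.
// Returns null if the name does not use the link syntax.
function parseLinkName(name) {
  const match = LINK_PATTERN.exec(name);
  return match ? { title: match[1], url: match[2] } : null;
}

// Assumption: the ID of a link element is its title without the escaping
// backslashes; the ID of any other element is its plain name.
function elementId(name) {
  const link = parseLinkName(name);
  return link ? link.title.replace(/\\([\[\]])/g, "$1") : name;
}
```

With these helpers, doubleEncode("text/javascript") yields text%252Fjavascript, and the two colliding names from the example map to the same ID.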
The ETag
Each successful call to the API results in a response with the header field ETag set. The value of that field is the timestamp of the latest database version. When you send a request to the API, you may pass the database version your client is currently working on in the ETag header field, or as the parameter db_version if you prefer. This helps to avoid race conditions: if you try to change things in the database while passing an outdated database version, the server will respond with a 409 status code. If you specify both the ETag header and the db_version parameter, the parameter is preferred and the header field is ignored.
You are free to omit the database version (sending no ETag in the header and setting no db_version parameter). The version check is then skipped and the server just tries to apply the change. Beware that this may cause race conditions.
The settings are excluded from all of this, because they are stored separately and don't have any kind of versioning.
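As an illustration of the version handling described above, here is a minimal sketch that attaches the client's database version as the db_version parameter and recognizes the 409 response. The helper names versionedUrl and isOutdated as well as the base URL are hypothetical; the sketch only builds the URL and leaves the actual HTTP call to the client.

```javascript
// Hypothetical helper: build a request URL that carries the client's database version.
// baseUrl is a placeholder for the address of your TagTool instance.
function versionedUrl(baseUrl, path, dbVersion) {
  const url = new URL(path, baseUrl);
  if (dbVersion !== undefined && dbVersion !== null) {
    // The db_version parameter takes precedence over an ETag header.
    url.searchParams.set("db_version", String(dbVersion));
  }
  return url;
}

// Hypothetical helper: a 409 status means the passed database version was outdated.
function isOutdated(status) {
  return status === 409;
}
```

A client that receives a 409 would typically fetch the changes it missed before retrying the modification.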
Getting the timestamps
GET /api/all_timestamps
This will return a list of timestamps as strings:
[ "<string:timestamp1>", "<string:timestamp2>", ... ]
Authorization
Some TagTool instances may be secured by a password. There are three ways to authenticate:
- Send the password or the auth token along with the auth parameter.
- Send the password along with the password parameter.
- Send the auth token in a cookie called TagToolAuth.
Furthermore, the read-only option might be enabled, which allows all GET requests (except /api/token) even without previous authentication.
For every successful request against an endpoint of a password-protected TagTool instance, a Set-Cookie header will be set in the response. The new cookie corresponds to the third authentication method described above.
Getting the auth token
GET /api/token
This will return the auth token. Of course, you have to be authenticated to do so. Note that 204 with an empty body is returned if the instance isn't password protected.
Getting the auth settings
GET /api/auth_status
This will return a JSON status object like this:
{ "passwordSet": <boolean if instance is password protected>, "readOnly": <boolean if read-only mode is enabled>, "loggedIn": <boolean if the request is authenticated> }
Strict mode
All PUT and DELETE calls (except for the settings) as well as the bulk action accept a parameter strict that can be set to true or false (default).
If set to true, the server will return status code 404 if you try to delete a non-existing entity and 400 if you try to add an existing one.
If set to false, those errors will be ignored silently.
This applies recursively: if you e.g. delete an element, which automatically results in the deletion of all of its links, it is silently ignored if one or more of those links don't exist.
Working with categories
Adding, Deleting
PUT/DELETE /api/category/<string:name>
PUT will create the category, while DELETE will delete it.
Getting
GET /api/category/<string:name>
This will return all elements of the specified category as a nested list like this:
[ [ "<string:category_name>", "<string:element_name>" ], [ ... ], ... ]
Or you can get all categories, their elements, and number of links of each element by calling
GET /api/categories
The result will look like this:
{ "<string:category_name>": { "<string:element_name>": <integer:link_count>, "<string:element_name>": <integer:link_count>, ... }, "<string:category_name>": { ... }, ... }
Working with elements
Adding, Deleting
PUT/DELETE /api/category/<string:category_name>/<string:element_name>
PUT will create the element, while DELETE will delete it.
Getting
You can get all links of an element (links are undirected).
GET /api/category/<string:category_name>/<string:element_name>
The result will look like this:
[ [ [ "<string:category_name_source>", "<string:element_name_source>" ], [ "<string:category_name_target>", "<string:element_name_target>" ] ], [ ... ] ]
Each link is represented as a list of two [category, element] pairs. The order (i.e. which element will be source and which target) is not specified!
Searching
You can search for elements.
GET /api/search?q=<query>&cat=<cat>&forcefuzzy=<force fuzzy search>&fuzzthresh=<fuzzy threshold>&fuzzlimit=<fuzzy limit>
This will return a list of [category, element] pairs like so:
[ [ "<string:category_name>", "<string:element_name>" ], [ ... ], ... ]
Parameters:
- q (string, default: ""): The query string.
- cat (string, default: no restriction): Restrict the search to a category. Raises 404 if the category doesn't exist.
- forcefuzzy (bool, default: true): If true, only a fuzzy search will be performed. Otherwise the search works in 5 stages: perfect match, case-insensitive match, case- and whitespace-insensitive match, case- and whitespace-insensitive infix match, and fuzzy search. The first stage that yields results will cancel the later ones. Only if none of the 4 previous stages finds a match will the fuzzy search be run as the 5th stage.
- fuzzthresh (float, default: 60): Threshold for the fuzzy search (between 0 and 100). It gives the minimum wanted ratio of the most similar substring.
- fuzzlimit (int, default: 3): Limit the number of fuzzy results. Results of stages 1-4 (see forcefuzzy) are returned either way. Only if these are fewer than fuzzlimit will the results be extended with fuzzy results (if any) to have (at most) length fuzzlimit. Pass -1 to return all results.
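The search parameters above could be assembled as in the following minimal sketch. The helper name searchUrl and the base URL are hypothetical. It is also an assumption that query parameters are encoded once by URLSearchParams and that booleans are sent as the strings true and false; the source only specifies double encoding for strings in URLs.

```javascript
// Hypothetical helper: build the search URL from an options object.
// Parameters that are left undefined are omitted, so the server defaults apply.
function searchUrl(baseUrl, options = {}) {
  const url = new URL("/api/search", baseUrl);
  for (const name of ["q", "cat", "forcefuzzy", "fuzzthresh", "fuzzlimit"]) {
    if (options[name] !== undefined) {
      url.searchParams.set(name, String(options[name]));
    }
  }
  return url;
}
```

For example, searchUrl("http://localhost", { q: "hans", forcefuzzy: false, fuzzlimit: -1 }) requests the staged search and asks for all fuzzy results.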
Working with links
Adding, Deleting
PUT/DELETE '/api/link/between/<string:e1_cat>/<string:e1_elem>/and/<string:e2_cat>/<string:e2_elem>'
"e1" and "e2" stand for "element 1" and "element 2" (order doesn't matter). "cat" stands for "category", "elem" for "element". PUT will create the link, while DELETE will destroy it. If you use non-existent categories and/or elements while creating a new link, the server will automatically create them as needed.
Getting
Call
GET /api/links
to get a list of all links, or
GET /api/links/of/<string:category>/<string:element>
to get a list of all links of a specified element. The resulting list will look like this for both calls:
[ [ [ "<string:category_name_source>", "<string:element_name_source>" ], [ "<string:category_name_target>", "<string:element_name_target>" ] ], [ ... ] ]
The order (i.e. which element will be source and which target) is not specified!
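Because the order of the two endpoints is unspecified, a client may want an order-independent way to compare links. The following minimal sketch shows one option, together with a builder for the link path. The helper names doubleEncode, linkPath, and linkKey are hypothetical and not part of the API.

```javascript
// Hypothetical helper: double URL-encode a path segment (see the conventions).
function doubleEncode(value) {
  return encodeURIComponent(encodeURIComponent(value));
}

// Hypothetical helper: build the path for PUT/DELETE of a link.
// Both arguments are [category, element] pairs.
function linkPath(e1, e2) {
  const [cat1, elem1] = e1.map(doubleEncode);
  const [cat2, elem2] = e2.map(doubleEncode);
  return `/api/link/between/${cat1}/${elem1}/and/${cat2}/${elem2}`;
}

// Hypothetical helper: a key that is identical for [a, b] and [b, a],
// obtained by sorting the serialized endpoints.
function linkKey(link) {
  return link.map((endpoint) => JSON.stringify(endpoint)).sort().join("|");
}
```

Two links returned in different endpoint order then compare equal via linkKey(a) === linkKey(b).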
Similarities (or second order links)
GET
GET /api/similarities
A list of pairs and their similarity will be returned. Every pair that doesn't appear in the list has a similarity of 0 or isn't disjoint. The pairwise similarity is defined as $s_{i, j} = \frac{|l_i\ \cap\ l_j|}{0.5\ \cdot\ (\ |l_i|\ +\ |l_j|)}$ where $l_i$ is the set of all links of element $i$. In other words, it is a measure of how many links $i$ and $j$ have in common.
The resulting list of triples will look like this:
[ [ [ "<string:category_name_1>", "<string:element_name_1>" ], [ "<string:category_name_2>", "<string:element_name_2>" ], <float:similarity_1_2> ], [ ... ] ]
From the formula, one can see that similarity $\in\ [0, 1]$.
Please note that the pairs are not sensitive to order. If the pair $(i, j)$ is in the list, $(j, i)$ won't be.
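To make the formula concrete, here is a minimal sketch that evaluates it for two elements. It is an interpretation, not the server's implementation: $l_i$ is modeled as the set of elements that $i$ is linked to (given as arbitrary string keys), and a pair of elements without any links is assigned a similarity of 0 to avoid a division by zero. The function name similarity is hypothetical.

```javascript
// Hypothetical helper: evaluate s = |l_i ∩ l_j| / (0.5 * (|l_i| + |l_j|)).
// Both arguments are iterables of string keys identifying the linked elements.
function similarity(linkedToI, linkedToJ) {
  const setI = new Set(linkedToI);
  const setJ = new Set(linkedToJ);
  const total = setI.size + setJ.size;
  if (total === 0) {
    return 0; // assumption: two elements without links have similarity 0
  }
  let shared = 0;
  for (const key of setI) {
    if (setJ.has(key)) {
      shared += 1;
    }
  }
  return shared / (0.5 * total);
}
```

Two elements with three links each, two of which are shared, get a similarity of 2 / 3; identical link sets yield 1 and disjoint ones 0.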
Working with diffs
Querying
GET /api/diff/query?start=<start>&end=<end>
The parameters start and end are expected to be timestamps. The default value for start is the first database version. The default value for end is the latest database version. You have several options to combine the parameters with the ETag header (see The ETag above for more details):
- Nothing is passed. This will return a diff from an empty db to the most recent db, i.e. a dump of the current database in diff format.
- Neither start nor end but ETag is passed, or only end is passed. Both will result in a diff from an empty db to the version identified by the ETag or by end, respectively.
- Only start is passed. This will result in a diff from the specified start to the latest version.
- start and ETag are passed. This will result in a diff from the specified start to the version identified by the ETag.
- end and ETag are passed. This will result in a diff from the version identified by the ETag to the specified end.
- start, end and ETag are passed. This will cause an HTTPException.
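The combinations listed above can be summarized as a small decision function. This is a client-side illustration of the documented rules, not the server's code: the name resolveDiffRange and the LATEST marker are hypothetical, null stands for the empty db, and the case "start and end without ETag" is assumed to simply yield the range from start to end.

```javascript
const EMPTY_DB = null; // the diff starts at an empty database
const LATEST = "latest"; // hypothetical marker for the most recent version

// Hypothetical helper: determine the range of the diff for a parameter combination.
function resolveDiffRange({ start, end, etag } = {}) {
  const has = (value) => value !== undefined && value !== null;
  if (has(start) && has(end) && has(etag)) {
    throw new Error("start, end and ETag must not be combined");
  }
  if (has(start)) {
    // start is the lower bound; the upper bound is end, else the ETag, else the latest version.
    return { from: start, to: has(end) ? end : has(etag) ? etag : LATEST };
  }
  if (has(end)) {
    // Without start, an ETag (if present) becomes the lower bound.
    return { from: has(etag) ? etag : EMPTY_DB, to: end };
  }
  return { from: EMPTY_DB, to: has(etag) ? etag : LATEST };
}
```

For instance, resolveDiffRange({ end: "t2", etag: "t1" }) describes a diff from version t1 to version t2.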
Diff results
Diff results look like this:
{
  "from": <old_timestamp>,
  "to": <new_timestamp>,
  "deletions": {
    "categories": ["<cat_name1>", "<cat_name2>", ...],
    "elements": [ [<cat_name>, <elem_name1>], ... ],
    "links": [ [ [<src_cat_name>, <src_elem_name1>], [<dest_cat_name>, <dest_elem_name1>] ], ... ]
  },
  "insertions": { "categories": [...], "elements": [...], "links": [...] }
}
Thus an empty diff would look like this:
{ "from": null, "to": null, "deletions": { "categories": [], "elements": [], "links": [] }, "insertions": { "categories": [], "elements": [], "links": [] } }
Whenever you query a diff starting at an empty db, <old_timestamp> will be null.
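A client that keeps a local copy of the data could apply such a diff as in the following minimal sketch. The state layout (three sets of keys) and the names keyOf, linkKeyOf, and applyDiff are hypothetical. Applying deletions before insertions is an assumption; the source does not specify an order for diff results.

```javascript
// Serialize a value so that it can be stored in a Set.
const keyOf = (value) => JSON.stringify(value);

// Links are undirected, so the two endpoints are sorted to get an order-independent key.
const linkKeyOf = (link) => keyOf(link.map(keyOf).sort());

// Hypothetical helper: return a new state with the diff applied; the input state is not modified.
function applyDiff(state, diff) {
  const next = {
    categories: new Set(state.categories),
    elements: new Set(state.elements),
    links: new Set(state.links),
  };
  const { deletions = {}, insertions = {} } = diff;
  // Assumption: deletions are applied before insertions.
  for (const link of deletions.links ?? []) next.links.delete(linkKeyOf(link));
  for (const element of deletions.elements ?? []) next.elements.delete(keyOf(element));
  for (const category of deletions.categories ?? []) next.categories.delete(category);
  for (const category of insertions.categories ?? []) next.categories.add(category);
  for (const element of insertions.elements ?? []) next.elements.add(keyOf(element));
  for (const link of insertions.links ?? []) next.links.add(linkKeyOf(link));
  return next;
}
```

Starting from three empty sets and applying the diff returned for an empty db reproduces the current database on the client.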
Diffs via WebSocket
It is possible to be notified by the server, if the database changes. You can find more about this here: WebSocket > Diff
Renaming things
Renaming a category
PUT '/api/rename/<old_name>/to/<new_name>'
This will rename the category old_name to new_name.
The call will result in a 404 error if you try to rename a non-existing category, and in a 400 if a category with the name new_name already exists.
The settings for old_name will persist but also be copied to new_name.
Renaming an element
PUT '/api/rename/<category>/<old_name>/to/<new_name>'
This will rename the element old_name in the category category to new_name.
The call will result in a 404 error if you try to rename a non-existing element (i.e. the category or the element does not exist), and in a 400 if an element with the name new_name already exists in the specified category.
PUT '/api/rename/<old_category>/<old_name>/to/<new_category>/<new_name>'
This will rename the element old_name in the category old_category to new_name and move it to the category new_category.
The call will result in a 404 error if you try to rename a non-existing element (i.e. the category or the element does not exist), and in a 400 if an element with the name new_name already exists in the new category. Furthermore, it will result in a 404 if the new category does not exist.
Bulk actions
Doing more than one thing at a time
If you'd like to e.g. delete a whole bunch of elements or rename multiple elements with one request, this is the way to go.
POST '/api/bulkaction'
The request expects a diff parameter. Such a diff is a JSON object that looks similar to the diff described under Diff results, except that the from and to fields don't exist and a key renamings exists.
This example would rename the category Persons to Person, then rename the Person 'Hans' to 'Hans-Peter', move the element 'Marwin' from the category Hobby to the category Person while renaming it to 'Marvin', and finally delete the link between 'Hans-Peter' and the Person 'Karl':
{ "insertions": { "categories": [], "elements": [], "links": [] }, "renamings": { "categories": [["Persons", "Person"]], "elements": [[["Person", "Hans"], ["Person", "Hans-Peter"]], [["Hobby", "Marwin"], ["Person", "Marvin"]]] }, "deletions": { "links": [[["Person", "Hans-Peter"], ["Person", "Karl"]]], "elements": [], "categories": [] } }
The order in which the single actions are executed is exactly as shown above:
- Insertions
  - Categories
  - Elements
  - Links
- Renamings
  - Categories
  - Elements
- Deletions
  - Links
  - Elements
  - Categories
Empty entries can also be left out. So the following diff is equivalent to the one above:
{ "renamings": { "categories": [["Persons", "Person"]], "elements": [[["Person", "Hans"], ["Person", "Hans-Peter"]], [["Hobby", "Marwin"], ["Person", "Marvin"]]] }, "deletions": { "links": [[["Person", "Hans-Peter"], ["Person", "Karl"]]] } }
If one (or more) of the specified actions fails, all other actions will be dropped/rolled back as well.
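Since empty entries can be left out, a client may strip them before sending a bulk diff. The following minimal sketch does that; the helper name compactBulkDiff is hypothetical, and how the diff parameter is transmitted is left to the client.

```javascript
// Hypothetical helper: drop empty lists and then empty action groups from a bulk diff.
function compactBulkDiff(diff) {
  const compact = {};
  for (const [action, groups] of Object.entries(diff)) {
    const kept = Object.entries(groups).filter(([, entries]) => entries.length > 0);
    if (kept.length > 0) {
      compact[action] = Object.fromEntries(kept);
    }
  }
  return compact;
}
```

Applied to the first example above, this yields the shorter, equivalent diff.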
Recent Changes
GET
GET /api/recently_changed?seconds=<integer>&ignore_if_all=<boolean>
This will return a list of recently changed elements and categories. “Recently” is specified by the seconds parameter, which can be omitted to use the server-side default (which is configured individually by the server admin).
Categories are considered to be changed if they have been added recently.
Elements are considered to be changed if
- they have been added recently,
- new links involving them have been added recently, or
- links involving them have been deleted recently.
The result will be a list of [<category>, <element>] pairs and category strings, like this:
[ ["<cat_name>", "<changed_elem_name>"], ["<another_cat_name>", "<another_changed_elem_name>"], "<new_cat_name>", ... ]
If ignore_if_all is set to true (default), the server will return an empty result list if all elements and all categories are considered as recently changed. Set it to false to return the full list instead in such cases.
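Because the result mixes pairs and plain strings, a client has to tell them apart. A minimal sketch, with the hypothetical helper name splitRecentlyChanged and a hypothetical shape for the returned object:

```javascript
// Hypothetical helper: separate category strings from [category, element] pairs.
function splitRecentlyChanged(entries) {
  const categories = [];
  const elements = [];
  for (const entry of entries) {
    if (typeof entry === "string") {
      categories.push(entry);
    } else {
      elements.push({ category: entry[0], element: entry[1] });
    }
  }
  return { categories, elements };
}
```

An empty input list (for example when ignore_if_all suppresses the result) simply produces two empty lists.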
User counter
GET
GET /api/user_count
The response will contain a positive integer or zero. It is the approximate number of users currently "connected" to the server. It's basically the number of WebSockets connected to the server.
WebSocket
It's possible to get the current number of users with the WebSocket provided by the api (WebSocket > User count).
WebSocket
The API offers a Socket.IO interface. To connect to it, you may want to use an existing library for your programming language, as Socket.IO is not plain WebSocket but adds its own protocol on top of it. Different kinds of events are sent over the WebSocket, namely diff, user_count, and settings events (see below). On connection, a diff event will be triggered which informs you of the current state of the database. However, you can pass a db version (see The ETag) on connect to only obtain the diff from that timestamp to the current version (see Stack Overflow or our example below on how to achieve that).
Database changes: diff events
See Working with diffs > Diff results on how a diff is structured.
Whenever the diff event is fired, it will pass a JSON-encoded diff of the most recent change(s).
User count: user_count events
The value will be a positive integer (including 0). Find more information here: User counter > GET.
Settings: settings events
Whenever the settings change, this event will be fired.
See Server side default settings > Data structure on what the settings object looks like.
A JavaScript client example
Here is a draft of a JavaScript client for the Socket.IO interface:
var socket = io('http://<server>', { query: 'db_version=' + '<some_version_optional>' });

// Register the handlers once, outside of 'connect', so that reconnects don't add duplicates.
socket.on('diff', function (data) {
  var diff = JSON.parse(data); // the diff arrives JSON encoded
  doSomethingWithTheDiff(diff);
});

socket.on('user_count', function (data) {
  var userCount = parseInt(data, 10); // JSON.parse(data) will work as well here
  updateUserCounter(userCount);
});
Server side default settings
This whole part is not affected by the ETag specifications. There is no versioning of the settings.
The server stores default settings for the visualizations of the clients.
The client must be able to handle "missing" settings and have an own default.
GET
The settings can be queried with
GET /api/settings
It'll return a JSON object as described below.
PUT
PUT /api/settings
will update the settings. You must set the header Content-Type: application/json and send the data as described under Data structure in the request body.
If you set null as a value, the default setting will be removed.
You can also send only a partial settings object.
For clarification: Sending an empty object ({}) would be a valid action as well, since an empty object is a partial settings object too. But of course this request would have no effect.
Example
Assume the server side settings look like this:
{ "sizing": { "foo": "none", "bar": "links", "fb": "links" } }
If you now PUT this data:
{ "sizing": { "foo": "links", "foobar": "none", "fb": null } }
The server side settings will look like this:
{ "sizing": { "foo": "links", "bar": "links", "foobar": "none" } }
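The update semantics of the example can be modeled on the client as in the following minimal sketch. It illustrates the documented behaviour (partial objects are merged, null removes a setting) and is not the server's implementation; the function name mergeSettings is hypothetical.

```javascript
// Hypothetical helper: compute the settings that result from a PUT with a partial settings object.
function mergeSettings(current, patch) {
  const merged = {};
  // Copy the current settings so that the input is not modified.
  for (const [section, values] of Object.entries(current)) {
    merged[section] = { ...values };
  }
  for (const [section, values] of Object.entries(patch)) {
    merged[section] = merged[section] ?? {};
    for (const [category, value] of Object.entries(values)) {
      if (value === null) {
        delete merged[section][category]; // null removes the setting
      } else {
        merged[section][category] = value;
      }
    }
  }
  return merged;
}
```

Merging an empty object leaves the settings unchanged, which matches the note that such a request has no effect.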
Data structure
The full settings object looks like this:
{ "sizing": { "<cat_name_1>": <sizing_opt_1>, "<cat_name_2>": <sizing_opt_2>, ... }, "display-priority": { "<cat_name_1>": <dp_opt_1>, "<cat_name_2>": <dp_opt_2>, ... }, "display-type": { "<cat_name_1>": <dt_opt_1>, "<cat_name_2>": <dt_opt_2>, ... } }
Where each cat_name_n may be any string identifying a category.
- Each sizing_opt_n is either "none" or "links". "none" means all elements should be displayed the same size, while "links" means that elements should be displayed bigger the more links they have.
- Each dp_opt_n is an integer (positive or negative). It can be understood as analogous to the CSS property z-index, but for the display order of the categories: the higher the display-priority of a category, the closer to the beginning it should be displayed. If two categories have the same display-priority, it's up to the client to find an order. If no display-priority is set, the client may use 0 as default.
- Each dt_opt_n is one of "cloud" or "links". "cloud" means that all elements should be displayed in a tag cloud style, while "links" means that the names should be treated as links (see Conventions for the syntax).
For PUT requests only, you also have the option to set sizing_opt_n and/or dp_opt_n to null. This will cause the given setting to be removed (see Settings > PUT).
IO (Import and Export)
GET (Export)
GET /api/io?format=<string:format>
format must be csv or excel. Default is csv.
The method will return a file as attachment. The file is called <prefix>-<timestamp of db>.<csv or xlsx>, where the prefix is some string depending on the server config. The table has a header row with 4 columns: Category1,Element1,Category2,Element2. Each other row has either the first 1, 2, or 4 columns filled. The first rows will contain all links (first 4 columns filled), the next rows will contain all elements that have no links (first 2 columns filled), and the last rows all categories without any elements (first column filled). For more details see IO > Table Structure.
A timestamp can be specified to get a desired version of the database (see ETag). If none is provided, the latest version will be returned.
POST (Import)
POST /api/csv
Expects an argument file that contains a multi-part encoded file. The content will be added to the database. Please see below on how the file has to be structured.
Takes an optional argument format that must be csv or excel, depending on what file format you upload. By default the format is guessed (inefficient).
The table header (first row) is optional. If present, it will be stripped.
This method allows you to compress the table, as rows with fewer than 4 columns will be padded with empty cells. Furthermore, it fills empty category cells with the last available category of the column. This, for example:
cat1,elem1,cat2,elem2
,elem3
, , ,elem4
,elem5,cat1,elem1
will be translated to:
cat1,elem1,cat2,elem2
cat1,elem3,cat2,
cat1, ,cat2,elem4
cat1,elem5,cat1,elem1
The semantics are:
- Add a link between [cat1, elem1] and [cat2, elem2] and create categories and elements as needed.
- Add element [cat1, elem3], category cat2, and category cat1 if needed.
- Add category cat1, element [cat2, elem4], and category cat2 if needed.
- Add a link between [cat1, elem5] and [cat1, elem1] and create categories and elements as needed.
You might have noticed that filling up the category columns causes additional actions, but all of those actions will have no effect as the corresponding categories would have been created by a previous row either way.
Note that the first cell of the first row cannot be empty, unless the Table is entirely empty.
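The expansion of a compressed table can be sketched as follows. This is an illustration of the rules described above, not the server's implementation. The function name expandCompressedRows is hypothetical, the rows are assumed to be already parsed into arrays of cells with the header stripped, and treating whitespace-only cells as empty is an assumption.

```javascript
// Hypothetical helper: pad rows to 4 columns and fill empty category cells
// (columns 0 and 2) with the last available category of that column.
function expandCompressedRows(rows) {
  const lastCategory = { 0: "", 2: "" };
  return rows.map((row) => {
    const cells = row.map((cell) => cell.trim()); // assumption: blank cells count as empty
    while (cells.length < 4) {
      cells.push("");
    }
    for (const column of [0, 2]) {
      if (cells[column] === "") {
        cells[column] = lastCategory[column];
      } else {
        lastCategory[column] = cells[column];
      }
    }
    return cells;
  });
}
```

Feeding the four compressed rows of the example into this function yields the four translated rows shown above.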
Table Structure
The Table has 1..n rows. The first row is the header:
Category1,Element1,Category2,Element2
Each other row has 4 columns with either
- the first 4 columns filled: <cat1>,<elem1>,<cat2>,<elem2>
  Representing a link between [cat1, elem1] and [cat2, elem2],
- the first 2 columns filled: <cat1>,<elem1>
  Representing the element [cat1, elem1], or
- the first column filled: <cat1>
  Representing the category cat1.
Note that the import accepts a compressed form of this table as described under IO > POST (Import).
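A reader of an exported table could classify its rows as in this minimal sketch. The function name classifyRow and the shape of the returned objects are hypothetical. The sketch only covers the uncompressed structure described in this section, so rows in the compressed import form are reported as unknown.

```javascript
// Hypothetical helper: classify one row (an array of up to 4 cells) of the exported table.
function classifyRow([cat1 = "", elem1 = "", cat2 = "", elem2 = ""]) {
  if (cat1 && elem1 && cat2 && elem2) {
    return { type: "link", source: [cat1, elem1], target: [cat2, elem2] };
  }
  if (cat1 && elem1 && !cat2 && !elem2) {
    return { type: "element", element: [cat1, elem1] };
  }
  if (cat1 && !elem1 && !cat2 && !elem2) {
    return { type: "category", category: cat1 };
  }
  return { type: "unknown" }; // not one of the three documented row forms
}
```

Rows shorter than 4 cells are handled by the default values, so a row consisting of a single cell is classified as a category.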
IO Format
CSV
CSV files will be handled with pandas' read_csv with all the default settings.
- Encoding: UTF-8
- Line-Ending: Unix-style (\n), old Mac OS (\r), or Windows-style (\r\n)
- Delimiter: ,
- Escape Char: None (see quoting)
- QuoteChar: "
- Quoting: Minimal
- Double Quotes: True
More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Excel
Excel import is handled with pandas' read_excel and the xlrd library as engine.
More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html