WhiteDB shared memory database
==============================


Principles and goals
---------------------

WhiteDB is a lightweight database library operating fully in main memory.
The disk is used only for dumping and restoring the database and for logging.

Data is kept persistently in the shared memory area: it is available simultaneously
to all processes and is kept intact even if no processes are currently using the
database.

WhiteDB has no server process. Data is read and written directly from/to memory;
no sockets are used between WhiteDB and the application using WhiteDB.

WhiteDB keeps data as N-tuples: each database record is a tuple of N elements. 
Each element (record field) may have an arbitrary type amongst the types provided
by WhiteDB. Each record field contains exactly one integer (4 bytes or 8 bytes). 
Datatypes that do not fit into one integer are allocated separately
and the record field contains an (encoded) pointer to the real data.

WhiteDB is written in pure C in a portable manner and should compile and function
without additional porting at least under Linux (gcc) and Windows
(native Windows C compiler cl). It has Python and experimental Java bindings.

The Python bindings and their usage are explained in the separate manual
'python.txt'.

WhiteDB has several goals:

- speed
- portability
- small footprint and low memory usage
- usability as an RDF database
- usability as an extended RDF database, an XML database and outside these scopes
- integration with the Gandalf rule engine (work in progress)

NOTE: The name 'wgdb' is also used in some places, such as the names of loadable
modules and libraries. In the documentation it may be used interchangeably with
WhiteDB. The letter 'G' refers to the Gandalf reasoner.


Obtaining and licence
---------------------

WhiteDB releases can be obtained from http://www.whitedb.org

The development version can be obtained from the source repository:
https://github.com/priitj/whitedb

WhiteDB is licensed under GPL version 3.


Using WhiteDB in applications
-----------------------------

See 'demo.c' and 'query.c' in the 'Examples' directory of the
distribution package for complete examples of basic database usage.

Compiling and linking against WhiteDB installation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Include the API headers in your programs:

[source,C]
----
#include <whitedb/dbapi.h>
#include <whitedb/rdfapi.h> /* only for using the raptor API */
#include <whitedb/indexapi.h> /* only for using the index API */
----

- Add `-lwgdb` to LDFLAGS in your Makefile or to the linker arguments.

If you used a non-standard installation prefix, using -I and -L
compiler/linker flags is required as usual.

Dynamic linking under Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Include the API headers

[source,C]
----
#include <dbapi.h>
#include <rdfapi.h> /* only for using the raptor API */
#include <indexapi.h> /* only for using the index API */
----

This requires providing the header file directory to the compiler.

- Compile and link against the library

  cl.exe /I"..\whitedb-0.6\Db" yourprog.c ..\whitedb-0.6\wgdb.lib

This produces 'yourprog.exe' that requires 'wgdb.dll' to run.

The 'compile.bat' script and the 'Examples' directory also contain compilation
examples.

Compiling with database source files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See 'Examples/compile_demo.sh' ('Examples\compile_demo.bat' under
Windows). This compiles the demo program 'demo.c' with the WhiteDB source
files.

These programs and scripts may be used as templates for creating
database applications.

Additionally, the shell script 'unite.sh' creates an amalgamation consisting of
two files: 'whitedb.c' and 'whitedb.h'. To embed WhiteDB, compile 'whitedb.c'
together with your application and include the 'whitedb.h' header, without
having to manage the individual source and header files.

Database API
-------------

The database API prototypes and macros are all found in the 'Db/dbapi.h' file.
You should include this single header file in all the files of your application
calling WhiteDB functions.

The database API has functions for:

- creating and deleting the database
- creating and deleting records
- setting and reading record fields
- encoding and decoding data stored in the record fields
- dumping and restoring database contents to/from disk
- read and write locking the database for concurrency control

It is a good idea to check the usage of API calls in the example program
'Examples/demo.c'.



Preliminaries
~~~~~~~~~~~~~

All the API calls follow these principles:

- each function has a wg_ prefix.
- function names are all lower case, _ used as a separator
- each function takes the pointer to the database as a first argument.

The database pointer is obtained when creating a new database or attaching to 
an existing one.

You can have several databases open at any time:
they will simply have different pointers. Observe that the pointer
you will get from two different processes for the same database will
be different.

The record pointer is returned when creating records or when fetching query
results. This `void *` type pointer points directly to the record data in the
shared memory segment and should be used with all the functions that read or
manipulate record fields. You can also encode a record pointer and write it
into another record, forming a link between records.

All the record fields are ordinary C integers (32 or 64 bits).
In order to allow exact control over the integer length the datatype

`wg_int`

is used for all encoded data. This datatype is in normal usage
equivalent (typedef-d) to an int (or a 64-bit integer if the database
is configured as 64-bit).

Strings given to the API functions are ordinary 0-terminated C strings,
their length is an ordinary C string length as computed by strlen.

Checks and errors
~~~~~~~~~~~~~~~~~

The WhiteDB library performs a few checks for most library operations to ensure
sanity. Checking causes a very small speed penalty and can be disabled by
setting '--disable-checking' during installation.

One of the standard checks verifies that the database pointer passed as the
first argument is not NULL and that the first segment of the database area
contains the specific integer indicating that the segment was actually
created as a WhiteDB database.

Whenever a record field is accessed, WhiteDB checks that the field number is
not larger than the record length. Validity checks are also performed during
data decoding and encoding.

In case WhiteDB recognizes an error, the API function called returns an error
value specified in the API doc. For example, failed record creation returns a
NULL pointer. A WG_ILLEGAL value is returned in case of encoding error, NULL
in case of string decoding errors, -1 in case of length decoding errors.

In addition to returning a specific error value, WhiteDB prints an error
message to stderr. In some cases the error message is a small error trace
through several layers of internal calls. Printing to stderr can be inhibited
by defining a macro WG_NO_ERRPRINT during WhiteDB compilation.

Notice that in error cases, nothing is printed to stdout (only stderr) and
WhiteDB does not exit: the corresponding API call returns an error value which
should be handled by the code calling the API function.
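
The calling convention described above can be sketched as follows. This is
only an illustration: the helper name `checked_write` and its own return
values are hypothetical and not part of the WhiteDB API. It relies on the
documented behaviour that field-setting functions return 0 when OK and a
negative value on error, and it assumes that checking has not been disabled.

[source,C]
----
#include <stdio.h>
#include <whitedb/dbapi.h>

/* Hypothetical helper: returns 0 if the write succeeded, -1 if WhiteDB
 * reported an error (for example, a field number past the record end). */
int checked_write(void *db, void *rec, wg_int fieldnr, wg_int value)
{
    if (wg_set_int_field(db, rec, fieldnr, value) != 0) {
        /* WhiteDB has already printed its own message to stderr (unless
         * built with WG_NO_ERRPRINT) and the process keeps running, so
         * the caller decides what to do next. */
        fprintf(stderr, "application: write to field %ld failed\n",
                (long) fieldnr);
        return -1;
    }
    return 0;
}
----

With a record of length 2, `checked_write(db, rec, 0, 7)` would be expected to
succeed, while `checked_write(db, rec, 5, 7)` would be expected to return -1.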


Creating and deleting the database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Functions:

[source,C]
----
void* wg_attach_database(char* dbasename, wg_int size);
void* wg_attach_existing_database(char* dbasename);
void* wg_attach_logged_database(char* dbasename, wg_int size);
void* wg_attach_database_mode(char* dbasename, wg_int size, int mode);
void* wg_attach_logged_database_mode(char* dbasename, wg_int size, int mode);
int wg_detach_database(void* dbase);
int wg_delete_database(char* dbasename);

void* wg_attach_local_database(wg_int size);
void wg_delete_local_database(void* dbase);
----

Details:

 void* wg_attach_database(char* dbasename, wg_int size)

Returns a pointer to the database, NULL on failure. The size is given in bytes.
The created database is a contiguous block of shared memory of
size bytes. It cannot be shrunk or extended later.

The returned pointer should be passed to all the WhiteDB API calls as
the first parameter. 

The database name should be an integer, passed as a string (for example "1000").
The call wg_attach_database(NULL, 0) creates a database with a
default name ("1000") and default size 10000000 (10 megabytes).
Both defaults can be configured from 'Db/dbmem.h'.

If the size parameter is greater than 0 and the named shared memory segment
already exists but is smaller than the given size, the call returns NULL.

NOTE: The typical default shared memory allocatable size of a Linux system
is under 100 megabytes.
You can see the allocatable size in bytes by doing
`cat /proc/sys/kernel/shmmax`.
You can set the shared memory size by becoming root and doing
`echo shared_memory_size > /proc/sys/kernel/shmmax`
where shared_memory_size is a number of bytes.

 void* wg_attach_existing_database(char* dbasename)
 
Like `wg_attach_database()`, but does not create a new database when no
database with the name dbasename exists; in that case it returns NULL.

 void* wg_attach_logged_database(char* dbasename, wg_int size)

Like `wg_attach_database()`, but starts journal logging when the
database is initialized. If the named segment already exists and does
not have logging enabled, the function returns NULL.

 void* wg_attach_database_mode(char* dbasename, wg_int size, int mode)

Like `wg_attach_database()`, but creates the memory segment with the
given permissions. The parameter `mode` consists of the nine permission bits
(three each for user, group and others), usually given in octal. For example,
0660 gives read-write permission to the user and group and
no permissions to others.

NOTE: read-only permissions do not work. Also, this parameter has no
effect on the Windows platform currently.
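
To make the meaning of the nine permission bits concrete, the following
stand-alone sketch renders a mode value in the familiar `rwxrwxrwx` notation.
The function `mode_to_string` is a hypothetical helper written for this
illustration; it does not call WhiteDB and is not part of its API.

[source,C]
----
/* Hypothetical helper: writes the nine permission bits of `mode` into
 * `out` as an "rwxrwxrwx"-style string. `out` must hold 10 bytes. */
void mode_to_string(int mode, char out[10])
{
    const char symbols[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++) {
        /* Bit 8 is user-read, bit 0 is others-execute. */
        out[i] = (mode & (1 << (8 - i))) ? symbols[i] : '-';
    }
    out[9] = '\0';
}
----

For the mode 0660 this yields `rw-rw----`: read-write for the user and the
group, nothing for others.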

 void* wg_attach_logged_database_mode(char* dbasename, wg_int size, int mode)

Like `wg_attach_logged_database()`, but creates the memory segment with the
given permissions. See the function `wg_attach_database_mode()` for details.

 int wg_detach_database(void* dbase)

Detaches a database: returns 0 if OK. 
Exiting from the process detaches database automatically.

 int wg_delete_database(char* dbasename)

Deletes a database: returns 0 if OK.
NB! The database is not deleted unless all processes that have previously
attached to it have detached and at least one process has made
a delete call.

 void* wg_attach_local_database(wg_int size)

Returns a pointer to local memory database, NULL if failure. Size
is given in bytes. The database is allocated in the private memory
of the process and will neither be readable to other processes nor
persist when the process closes.

In every other aspect the database behaves similarly to a shared
memory database.

 void wg_delete_local_database(void* dbase)

Deletes a local memory database. Memory allocated for the database
will be freed.


Creating, deleting, scanning records
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Functions:

[source,C]
----
void* wg_create_record(void* db, wg_int length);
void* wg_create_raw_record(void* db, wg_int length);
wg_int wg_delete_record(void* db, void *rec);
void* wg_get_first_record(void* db);
void* wg_get_next_record(void* db, void* record);
void *wg_get_first_parent(void* db, void *record);
void *wg_get_next_parent(void* db, void* record, void *parent);
----

Details:

 void* wg_create_record(void* db, wg_int length)

Creates a new record of length length and initialises all fields
to 0 (used as a NULL value in WhiteDB).
Returns NULL on error, a pointer to the record otherwise.

 void* wg_create_raw_record(void* db, wg_int length)

Same as wg_create_record(), except the initial field values
are not indexed. Use together with wg_set_new_field().

NOTE: using this together with index templates has complex and probably
unexpected consequences. Not recommended.

 wg_int wg_delete_record(void* db, void *rec)

Deletes a record with a pointer rec. 
Returns 0 if OK, non-0 on error.
You should not worry about deallocation of data in the record
fields: this is done automatically.

 void* wg_get_first_record(void* db)

Returns first record pointer, NULL when error or no records available. 

 void* wg_get_next_record(void* db, void* record)

Returns next record pointer, NULL when error or no records available.
The record parameter is a pointer to the (previous) record.

 void *wg_get_first_parent(void* db, void *record)

Returns the first parent of the record. Record A is a parent of record B if
record A contains a link to record B. Records may have more than one parent.
Consult the section "Encoding and decoding data stored in the record fields"
on how to link records.

Returns the pointer to the first parent record or NULL if there are no
parents or when the database is not configured to track parents.

 void *wg_get_next_parent(void* db, void* record, void *parent)

Returns the next parent of the record. The argument 'parent' is a record
returned by a previous call of `wg_get_first_parent()` or
`wg_get_next_parent()`. Returns NULL if there are no more parent records.

NOTE: the current implementation of this function can be slow if there are
many parents to a record.
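
A full scan with `wg_get_first_record()` and `wg_get_next_record()` might look
like the sketch below. The helper name `count_records` is hypothetical. Note
that NULL is returned both at the end of the scan and on error, so this loop
cannot tell the two apart.

[source,C]
----
#include <stddef.h>
#include <whitedb/dbapi.h>

/* Hypothetical helper: counts the records by walking the database from
 * the first record until no further record is returned. */
long count_records(void *db)
{
    long n = 0;
    void *rec = wg_get_first_record(db);

    while (rec != NULL) {
        n++;
        rec = wg_get_next_record(db, rec);
    }
    return n;
}
----

In a database shared with other processes, such a scan would normally be
performed under a read lock.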

Setting and reading record fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Functions:

[source,C]
----
wg_int wg_get_record_len(void* db, void* record);

wg_int wg_set_field(void* db, void* record, wg_int fieldnr, wg_int data);
wg_int wg_set_new_field(void* db, void* record, wg_int fieldnr, wg_int data);

wg_int wg_get_field(void* db, void* record, wg_int fieldnr);   
wg_int wg_get_field_type(void* db, void* record, wg_int fieldnr); 

wg_int wg_set_int_field(void* db, void* record, wg_int fieldnr, wg_int data);
wg_int wg_set_double_field(void* db, void* record, wg_int fieldnr, double data);
wg_int wg_set_str_field(void* db, void* record, wg_int fieldnr, char* data);

wg_int* wg_field_addr(void* db, void* record, wg_int fieldnr);
----


Details:


 wg_int wg_get_record_len(void* db, void* record)

Gives the record length (0,...). Returns a negative int on error.
 
 wg_int wg_set_field(void* db, void* record, wg_int fieldnr, wg_int data)

Sets the value of field fieldnr to the encoded data. Field numbers start from 0.
The passed data must be 0 (NULL value) or encoded (see the section
"Encoding and decoding data stored in the record fields").
Returns a negative int on error, 0 when OK.

Do not worry about deallocating earlier data in the field: this is done
automatically.

 wg_int wg_set_new_field(void* db, void* record, wg_int fieldnr, wg_int data)

Same as wg_set_field() except it can only be used to write the contents
of newly created fields that do not have values. Writing will be somewhat
faster than with wg_set_field(). It is the responsibility of the caller
to ensure that the field to be written really is one that contains no earlier
data. Use together with wg_create_raw_record().

NOTE: using this together with index templates has complex and probably
unexpected consequences. Not recommended.

 wg_int wg_get_field(void* db, void* record, wg_int fieldnr)

Returns the encoded data in field fieldnr. The data should be decoded before
ordinary use; see the section "Encoding and decoding data stored in the record
fields".

 wg_int wg_get_field_type(void* db, void* record, wg_int fieldnr)

Returns the datatype of field fieldnr. Datatypes are defined by the following
macros. Avoid using the corresponding numbers, since these may change:

[source,C]
----
#define WG_NULLTYPE 1
#define WG_RECORDTYPE 2
#define WG_INTTYPE 3
#define WG_DOUBLETYPE 4
#define WG_STRTYPE 5
#define WG_XMLLITERALTYPE 6
#define WG_URITYPE 7
#define WG_BLOBTYPE 8
#define WG_CHARTYPE 9
#define WG_FIXPOINTTYPE 10
#define WG_DATETYPE 11
#define WG_TIMETYPE 12
---- 

The following are convenience functions for common datatypes:


 wg_int wg_set_int_field(void* db, void* record, wg_int fieldnr, wg_int data)

Like wg_set_field but automatically encodes data: pass ordinary integer.

 wg_int wg_set_double_field(void* db, void* record, wg_int fieldnr, double data)

Like wg_set_field but automatically encodes data: pass ordinary double.

 wg_int wg_set_str_field(void* db, void* record, wg_int fieldnr, char* data)

Like wg_set_field but automatically encodes data: pass ordinary null-terminated string.


The following is a macro returning an address (C pointer) of a field:

  wg_int* wg_field_addr(void* db, void* record, wg_int fieldnr)

Avoid wg_field_addr in normal cases: use wg_get_field and wg_set_field instead. 
The wg_field_addr macro performs no checks whatsoever: it is useful only for achieving maximum
speed. While it is safe to read a value from the address returned, use it with extreme caution
when storing data to the field. It is OK to directly store an encoded value to the field only if it 
currently contains an immediate value (immediates are NULL, short integer, date, time, char), 
is not indexed and no logging is used. 
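
The functions of this section can be combined as in the following sketch. It
uses a local (private memory) database so that it does not depend on shared
memory settings. The function name `field_demo` and the size of 10000000
bytes are arbitrary choices for the illustration, and the program is assumed
to be linked with `-lwgdb`.

[source,C]
----
#include <stddef.h>
#include <whitedb/dbapi.h>

/* Hypothetical demo: returns 0 if every step succeeded, -1 otherwise. */
int field_demo(void)
{
    int result = -1;
    void *rec;
    /* Private (non-shared) database; the size is an arbitrary choice. */
    void *db = wg_attach_local_database(10000000);

    if (db == NULL)
        return -1;

    rec = wg_create_record(db, 3);   /* three fields, numbered 0..2 */
    if (rec != NULL
        && wg_set_int_field(db, rec, 0, 42) == 0
        && wg_set_double_field(db, rec, 1, 2.5) == 0
        && wg_set_str_field(db, rec, 2, "hello") == 0
        && wg_get_record_len(db, rec) == 3
        && wg_get_field_type(db, rec, 0) == WG_INTTYPE
        && wg_get_field_type(db, rec, 1) == WG_DOUBLETYPE
        && wg_get_field_type(db, rec, 2) == WG_STRTYPE) {
        result = 0;
    }

    wg_delete_local_database(db);    /* frees the private memory */
    return result;
}
----

The type checks compare against the `WG_...TYPE` macros rather than their
numeric values, as recommended above.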


Encoding and decoding data stored in the record fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The general principle of data storage in records is that each datatype has
to be encoded before storage and decoded after reading before ordinary usage.

Data stored in the fields is deallocated automatically if not used any more
in any records. 

Hence you should not use the decoded data in your own variables after storage,
unless you are sure the corresponding records are not deleted before you are
using your variables again.

The encoding principles are as follows, from smallest and fastest to
largest and slowest:

- 0, small (28 bit) integers, fixpoint doubles, chars, dates and times
  are stored directly in the field: no additional allocation and no
  special deallocation is done.
  
- Records are encoded as an offset from the start of the shared memory segment
  to the start of the record. The encoded value is stored directly in a field.
  It can be decoded into a direct pointer to the start of the record data in
  the shared memory.

- Large integers and doubles are allocated one copy per data item, in a 4
  byte or 8 byte chunk.
  
- Short simple strings up to 32 bytes are allocated one copy per data item,
  always 32 bytes. 
  
- Long strings, strings with added language property, xmlliterals, uris, blobs
  are kept uniquely: only one copy of each item is allocated. They are deallocated
  automatically when the reference count falls to zero (reference counting
  garbage collection is used).
  
- Long strings, xmlliterals, uris and blobs have different types (not equal even
  if they look the same when printed) and they all contain two strings:
  * main part (string, xmlliteral, uri, blob)
  * extra part (string language, xmlliteral namespace, uri prefix, blob type)
    where all these are ordinary 0-terminated C strings except blob, which is not
    0-terminated. 
  It is always possible to give a NULL value as an extra part.   
  
- Strings and blobs returned by decoding strings, xmlliterals, uris and blobs
  should not be changed or used directly except for immediate copying to a buffer.
  Prefer to use the decode...copy functions instead of direct decode functions
  giving a pointer to a string in the database.
  
- A WG_ILLEGAL value is returned in case of encoding error. 
  A value returned in case of decoding error is sometimes not recognizable as
  an error. In string-type value decoding NULL is returned in case of 
  decoding errors, length and date/time decoding errors return  -1.


Functions:


[source,C]
----
wg_int wg_get_encoded_type(void* db, wg_int data);
wg_int wg_free_encoded(void* db, wg_int data);

wg_int wg_encode_null(void* db, wg_int data);
wg_int wg_decode_null(void* db, wg_int data);

wg_int wg_encode_int(void* db, wg_int data);
wg_int wg_decode_int(void* db, wg_int data);

wg_int wg_encode_char(void* db, char data);
char wg_decode_char(void* db, wg_int data); 

wg_int wg_encode_record(void* db, void* data);
void* wg_decode_record(void* db, wg_int data);

wg_int wg_encode_double(void* db, double data);
double wg_decode_double(void* db, wg_int data);

wg_int wg_encode_fixpoint(void* db, double data);
double wg_decode_fixpoint(void* db, wg_int data);

wg_int wg_encode_date(void* db, int data);
int wg_decode_date(void* db, wg_int data);

wg_int wg_encode_time(void* db, int data);
int wg_decode_time(void* db, wg_int data);

int wg_current_utcdate(void* db);
int wg_current_localdate(void* db);
int wg_current_utctime(void* db);
int wg_current_localtime(void* db);

int wg_strf_iso_datetime(void* db, int date, int time, char* buf);
int wg_strp_iso_date(void* db, char* buf);
int wg_strp_iso_time(void* db, char* inbuf);

int wg_ymd_to_date(void* db, int yr, int mo, int day);
int wg_hms_to_time(void* db, int hr, int min, int sec, int prt);
void wg_date_to_ymd(void* db, int date, int *yr, int *mo, int *day);
void wg_time_to_hms(void* db, int time, int *hr, int *min, int *sec, int *prt);

wg_int wg_encode_str(void* db, char* str, char* lang);

char* wg_decode_str(void* db, wg_int data);
char* wg_decode_str_lang(void* db, wg_int data);

wg_int wg_decode_str_len(void* db, wg_int data); 
wg_int wg_decode_str_lang_len(void* db, wg_int data); 
wg_int wg_decode_str_copy(void* db, wg_int data, char* strbuf, wg_int buflen);
wg_int wg_decode_str_lang_copy(void* db, wg_int data, char* langbuf, wg_int buflen);                         

wg_int wg_encode_xmlliteral(void* db, char* str, char* xsdtype);
char* wg_decode_xmlliteral(void* db, wg_int data);
char* wg_decode_xmlliteral_xsdtype(void* db, wg_int data);

wg_int wg_decode_xmlliteral_len(void* db, wg_int data);
wg_int wg_decode_xmlliteral_xsdtype_len(void* db, wg_int data);
wg_int wg_decode_xmlliteral_copy(void* db, wg_int data, char* strbuf, wg_int buflen);
wg_int wg_decode_xmlliteral_xsdtype_copy(void* db, wg_int data, char* strbuf, wg_int buflen);

wg_int wg_encode_uri(void* db, char* str, char* nspace); 
char* wg_decode_uri(void* db, wg_int data);   
char* wg_decode_uri_prefix(void* db, wg_int data); 

wg_int wg_decode_uri_len(void* db, wg_int data);
wg_int wg_decode_uri_prefix_len(void* db, wg_int data);                          
wg_int wg_decode_uri_copy(void* db, wg_int data, char* strbuf, wg_int buflen);                                                 
wg_int wg_decode_uri_prefix_copy(void* db, wg_int data, char* strbuf, wg_int buflen);

wg_int wg_encode_blob(void* db, char* str, char* type, wg_int len);
char* wg_decode_blob(void* db, wg_int data);
char* wg_decode_blob_type(void* db, wg_int data);
wg_int wg_decode_blob_len(void* db, wg_int data);
wg_int wg_decode_blob_copy(void* db, wg_int data, char* strbuf, wg_int buflen);
wg_int wg_decode_blob_type_len(void* db, wg_int data);
wg_int wg_decode_blob_type_copy(void* db, wg_int data, char* langbuf, wg_int buflen);
                                
wg_int wg_encode_var(void* db, wg_int data);
wg_int wg_decode_var(void* db, wg_int data);
----
  
Details:


 wg_int wg_get_encoded_type(void* db, wg_int data)

Return a type of the encoded data (see the documentation for
`wg_get_field_type()`)

 wg_int wg_free_encoded(void* db, wg_int data)

Deallocate encoded data. 

You need to deallocate data if and only if you have encoded it yourself 
(not read from the field) and have not stored it into any fields.

In case the data is stored in a field, you should never deallocate it,
otherwise unexpected errors will occur.

In case a field is written over or a record is deleted, deallocation
is done automatically and properly.

 wg_int wg_encode_null(void* db, wg_int data)
 wg_int wg_decode_null(void* db, wg_int data)

Not strictly needed; encoded value 0 stands for NULL.

 wg_int wg_encode_int(void* db, wg_int data)
 wg_int wg_decode_int(void* db, wg_int data)

Encode/decode integers. Observe that shorter integers (28 bits) take
less space and are a bit faster: they are kept directly in the field.

 wg_int wg_encode_char(void* db, char data)
 char wg_decode_char(void* db, wg_int data)

Encode/decode a single char. Kept directly in the field.

 wg_int wg_encode_record(void* db, void* data)
 void* wg_decode_record(void* db, wg_int data)

Encode/decode a pointer to a record.

 wg_int wg_encode_double(void* db, double data)
 double wg_decode_double(void* db, wg_int data)

Encode/decode ordinary doubles. Allocated separately.

 wg_int wg_encode_fixpoint(void* db, double data)
 double wg_decode_fixpoint(void* db, wg_int data)

Encode/decode doubles as small and fast fixpoint numbers.
Data must be a double between -800...800; four decimal places
are kept after rounding.
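
The precision and range stated above can be illustrated with a small
stand-alone model. Both functions are hypothetical helpers and do not
reproduce the internal encoding used by WhiteDB; they only show which values
fit and what "four decimal places after rounding" means. Treating the bounds
-800 and 800 as inclusive is an assumption.

[source,C]
----
/* Hypothetical helper: 1 if x lies in the documented fixpoint range
 * (bounds assumed inclusive), 0 otherwise. */
int fixpoint_in_range(double x)
{
    return x >= -800.0 && x <= 800.0;
}

/* Hypothetical helper: x expressed in units of 0.0001, rounded to the
 * nearest unit (halves rounded away from zero). */
long fixpoint_scaled(double x)
{
    return (long) (x * 10000.0 + (x >= 0.0 ? 0.5 : -0.5));
}
----

For example, 3.14159265 corresponds to 31416 units, that is 3.1416, so the
digits after the fourth decimal place are lost. Values that need more
precision or a wider range should be stored with `wg_encode_double()`.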

 wg_int wg_encode_date(void* db, int data)
 int wg_decode_date(void* db, wg_int data)

Unencoded date is the number of days since year 0.
Use 1 as the first year.

Kept directly in the field.

 wg_int wg_encode_time(void* db, int data)
 int wg_decode_time(void* db, wg_int data)

Unencoded time is the number of hundredths of a second
past midnight.

Kept directly in the field.
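
The unencoded time representation can be reproduced with plain arithmetic.
The function below is a hypothetical stand-alone helper that mirrors the
description above; it is not the library function `wg_hms_to_time()`, and
returning -1 for out-of-range input is a choice made for this sketch.

[source,C]
----
/* Hypothetical helper: converts a wall-clock time to hundredths of a
 * second past midnight. Returns -1 if any component is out of range. */
int hms_to_hundredths(int hr, int min, int sec, int hundredths)
{
    if (hr < 0 || hr > 23 || min < 0 || min > 59 ||
        sec < 0 || sec > 59 || hundredths < 0 || hundredths > 99) {
        return -1;
    }
    return ((hr * 60 + min) * 60 + sec) * 100 + hundredths;
}
----

Under this representation 12:59:00.33 is the value 4674033, and the largest
possible value, 23:59:59.99, is 8639999.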

 int wg_current_utcdate(void* db)
 int wg_current_localdate(void* db)
 int wg_current_utctime(void* db)
 int wg_current_localtime(void* db)

Gives current unencoded date or time, either utc or local.

 int wg_strf_iso_datetime(void* db, int date, int time, char* buf)

Stores unencoded date and time in buf as an ISO datetime with hundredths of
seconds, for example 2010-03-31T12:59:00.33

 int wg_strp_iso_date(void* db, char* buf)
 int wg_strp_iso_time(void* db, char* inbuf)

Parses an unencoded date or time from the corresponding part of an ISO string,
like 2010-03-31 or 12:59:00.33, and returns it.

 int wg_ymd_to_date(void* db, int yr, int mo, int day)
 int wg_hms_to_time(void* db, int hr, int min, int sec, int prt)

Return a scalar date or time like the above ISO string parsing
functions, except that the parameters are given as integer
values (for example: 2010, 1, 7).

 void wg_date_to_ymd(void* db, int date, int *yr, int *mo, int *day)
 void wg_time_to_hms(void* db, int time, int *hr, int *min, int *sec, int *prt)

Reverse conversion functions for scalar date and time into separate
integer values.

 wg_int wg_encode_str(void* db, char* str, char* lang)
 char* wg_decode_str(void* db, wg_int data)
 char* wg_decode_str_lang(void* db, wg_int data)
 wg_int wg_decode_str_len(void* db, wg_int data)
 wg_int wg_decode_str_lang_len(void* db, wg_int data)
 wg_int wg_decode_str_copy(void* db, wg_int data, char* strbuf, wg_int buflen)
 wg_int wg_decode_str_lang_copy(void* db, wg_int data, char* langbuf, wg_int buflen)

All strings are 0-terminated standard C strings. 

The lang parameter is the extra-string, which may be NULL.
The simple decode functions return a pointer to the string.
`wg_decode_str_copy()` copies the string to the given buffer of size buflen.

A WG_ILLEGAL value is returned in case of encoding error, NULL in case
of string decoding errors, -1 in case of length decoding errors.
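
A typical round trip through the string functions might look like the sketch
below. The helper name `str_roundtrip` is hypothetical. The sketch assumes
that `wg_decode_str_len()` reports the length without the terminating 0 (as
strlen does) and that `wg_decode_str_copy()` returns a negative value on
error; neither detail is spelled out in this section.

[source,C]
----
#include <stddef.h>
#include <whitedb/dbapi.h>

/* Hypothetical helper: stores `text` in field 0 of a fresh record, reads
 * it back and copies it into `out`. Returns 0 on success, -1 on error. */
int str_roundtrip(void *db, char *text, char *out, wg_int outlen)
{
    void *rec;
    wg_int enc, len;

    rec = wg_create_record(db, 1);
    if (rec == NULL)
        return -1;

    enc = wg_encode_str(db, text, NULL);   /* NULL: no language given */
    if (enc == WG_ILLEGAL)
        return -1;
    if (wg_set_field(db, rec, 0, enc) != 0) {
        /* Encoded by us but never stored in a field: free it ourselves. */
        wg_free_encoded(db, enc);
        return -1;
    }

    enc = wg_get_field(db, rec, 0);
    len = wg_decode_str_len(db, enc);
    if (len < 0 || len >= outlen)          /* keep room for the final 0 */
        return -1;
    if (wg_decode_str_copy(db, enc, out, outlen) < 0)
        return -1;
    return 0;
}
----

Copying into a caller-owned buffer avoids holding a pointer into the database
after the record may have been changed or deleted.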

 wg_int wg_encode_xmlliteral(void* db, char* str, char* xsdtype)
 char* wg_decode_xmlliteral(void* db, wg_int data)
 char* wg_decode_xmlliteral_xsdtype(void* db, wg_int data)
 wg_int wg_decode_xmlliteral_len(void* db, wg_int data)
 wg_int wg_decode_xmlliteral_xsdtype_len(void* db, wg_int data)
 wg_int wg_decode_xmlliteral_copy(void* db, wg_int data, char* strbuf, wg_int buflen)
 wg_int wg_decode_xmlliteral_xsdtype_copy(void* db, wg_int data, char* strbuf, wg_int buflen)

Analogous to str functions, the extra-string represents xmlliteral xsdtype,
may be NULL.

 wg_int wg_encode_uri(void* db, char* str, char* nspace)
 char* wg_decode_uri(void* db, wg_int data)
 char* wg_decode_uri_prefix(void* db, wg_int data)
 wg_int wg_decode_uri_len(void* db, wg_int data)
 wg_int wg_decode_uri_prefix_len(void* db, wg_int data)
 wg_int wg_decode_uri_copy(void* db, wg_int data, char* strbuf, wg_int buflen)
 wg_int wg_decode_uri_prefix_copy(void* db, wg_int data, char* strbuf, wg_int buflen)

Analogous to str functions, the extra-string represents uri prefix,
may be NULL.

 wg_int wg_encode_blob(void* db, char* str, char* type, wg_int len)
 char* wg_decode_blob(void* db, wg_int data)
 char* wg_decode_blob_type(void* db, wg_int data)
 wg_int wg_decode_blob_len(void* db, wg_int data)
 wg_int wg_decode_blob_copy(void* db, wg_int data, char* strbuf, wg_int buflen)
 wg_int wg_decode_blob_type_len(void* db, wg_int data)
 wg_int wg_decode_blob_type_copy(void* db, wg_int data, char* langbuf, wg_int buflen)

Analogous to str functions, except that:

- data is not 0-terminated, so the length must always be passed
- the extra-string represents blob type, may be NULL

 wg_int wg_encode_var(void* db, wg_int data)
 wg_int wg_decode_var(void* db, wg_int data)

Data to be encoded is a variable identifier which is an integer. Values
up to 28 bit size may be safely used on any modern hardware.


Dumping and restoring database contents to/from disk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Functions:

[source,C]
----
wg_int wg_dump(void* db, char* fileName);
wg_int wg_import_dump(void* db, char* fileName);

wg_int wg_start_logging(void *db);
wg_int wg_stop_logging(void *db);
wg_int wg_replay_log(void *db, char *filename);
----

Details:

 wg_int wg_dump(void * db,char* fileName)

Dump shared memory database to the disk. If the database has journal logging
enabled, this will also restart the journal (creating a fresh journal file).
Returns 0 on success, -1 on non-fatal error and -2 on a fatal error. In case of
a fatal error, the database is in a corrupt state and should not (or cannot) be
used further.

 wg_int wg_import_dump(void * db,char* fileName)

Import database from the disk. If the database has journal logging enabled,
this will also start the journal log (creating a fresh journal file) when the
import is completed. Note that whether the journal is enabled is determined by
the *current* memory segment, not the state of the database at the moment the
dump was created.

Returns 0 on success, -1 on non-fatal error and -2 on a fatal error. In case
of a fatal error, the database is in a corrupt state. Otherwise, the
import failed (dump file not found or incompatible format), but the
memory image was not modified.

 wg_int wg_start_logging(void *db)

Start the journal log. The journal logs are created in the directory
determined at compilation time and have a name following the pattern
'wgdb.journal.<shmname>' where 'shmname' is the name of the database.
Call to this function always causes a new journal file to be created. When
a previous journal file exists at the time the journal is started, it is
backed up into a file named 'wgdb.journal.<shmname>.<serial>' where 'serial'
is the next available suffix. If there are too many backups already present,
the oldest one is overwritten instead.

Returns 0 on success, -1 when logging is already active, -2 when the function
failed and logging is not active, and -3 when, additionally, the log file was
possibly destroyed.

NOTE: Normally, the journal is started upon the database creation by
calling `wg_attach_logged_database()` and it is not necessary to
call this function.

 wg_int wg_stop_logging(void *db)

Suspend the journal log. None of the writes by any client connected
to the database will be logged from this point. Returns 0 on success,
non-zero on failure.

NOTE: Normally it is not necessary to manually stop and start the journal.

 wg_int wg_replay_log(void *db, char *filename)

Restore the database from the journal. If logging is enabled, this function
will also suspend the journal during the restore and restart it afterwards
(creating a fresh journal file). Returns 0 on success, -1 on non-fatal error
and -2 on a fatal error. In case of a fatal error, the database is in a corrupt
state.  Otherwise, the replay failed, but the database currently in memory was
not modified.

Journal restarts and filenames
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The current journal file always has the name 'wgdb.journal.<shmname>' (for
example, 'wgdb.journal.1000'). If the database has logging enabled, all of the
writes will be recorded in that file.

Journal restarts will cause the current journal to be backed up and the
'wgdb.journal.<shmname>' file will be replaced with a fresh journal.

Example 1:

Only 'wgdb.journal.99' exists. The database is dumped to the disk, causing
a journal restart. The filenames after the restart will be:

 wgdb.journal.99 --> wgdb.journal.99.0
 a new empty journal --> wgdb.journal.99

Example 2:

The current journal is 'wgdb.journal.1000'. There is also an older backup
with the name 'wgdb.journal.1000.0'. The new filenames:

 wgdb.journal.1000 --> wgdb.journal.1000.1
 wgdb.journal.1000.0 --> wgdb.journal.1000.0 (unchanged)
 a new empty journal --> wgdb.journal.1000

Example 3:

There are 10 backups (the maximum number the database is configured to handle).
The oldest one of them is 'wgdb.journal.1000.0', the newest one is
'wgdb.journal.1000.9'. There is also the current journal 'wgdb.journal.1000'.
After the restart, the filenames are:

 wgdb.journal.1000 --> wgdb.journal.1000.0 (overwriting the oldest backup)
 wgdb.journal.1000.1 --> wgdb.journal.1000.1 (unchanged)
 ...
 wgdb.journal.1000.9 --> wgdb.journal.1000.9 (unchanged)
 a new empty journal --> wgdb.journal.1000
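
The three examples can be condensed into a small selection rule. The following is only a sketch of one plausible reading of that rule, not WhiteDB's actual code: the names `pick_backup_serial()`, `backup_name()` and `MAX_BACKUPS` are hypothetical, and it assumes that "oldest" is decided by file modification time and that the "next available suffix" is the lowest unused one.

```c
#include <stdio.h>
#include <time.h>

#define MAX_BACKUPS 10  /* assumed limit, taken from Example 3 */

/* mtimes[i] is the modification time of backup ".i", or 0 if that
 * backup does not exist (hypothetical representation).
 * Returns the serial that the current journal would be backed up to. */
int pick_backup_serial(const time_t mtimes[MAX_BACKUPS]) {
  int i, oldest = 0;

  /* Examples 1 and 2: use the first suffix that is still free. */
  for (i = 0; i < MAX_BACKUPS; i++) {
    if (mtimes[i] == 0)
      return i;
  }
  /* Example 3: all slots are taken, overwrite the oldest backup. */
  for (i = 1; i < MAX_BACKUPS; i++) {
    if (mtimes[i] < mtimes[oldest])
      oldest = i;
  }
  return oldest;
}

/* Format 'wgdb.journal.<shmname>.<serial>' into buf.
 * Returns the value returned by snprintf(). */
int backup_name(char *buf, size_t buflen, const char *shmname, int serial) {
  return snprintf(buf, buflen, "wgdb.journal.%s.%d", shmname, serial);
}
```

With no backups present the sketch selects serial 0, with only '.0' present it selects 1, and with all ten slots filled it selects whichever slot has the smallest modification time.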

Interaction between dump files and journal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the database has journal logging enabled, the latest database
state is normally recoverable by importing the latest dump (if it
exists) and replaying the journal created after that dump (or at
the initialization of the database, if there is no dump).

When dumping the database, the journal will be restarted and will
be generated into a new file. Importing a dump will also have the
same effect.

Journal replay also causes the journal to be restarted so that
the point of restore is distinguishable later. However, in this
situation, the latest database state will be represented, incrementally,
by the latest dump, the recovered journal and the new journal (until
a new dump is created).


Read and write locking the database for concurrency control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Functions:

[source,C]
----
wg_int wg_start_write(void * dbase);          /* start write transaction */
wg_int wg_end_write(void * dbase, wg_int lock); /* end write transaction */
wg_int wg_start_read(void * dbase);           /* start read transaction */
wg_int wg_end_read(void * dbase, wg_int lock);  /* end read transaction */
----

Overview
^^^^^^^^

Concurrency control in WhiteDB is achieved using a single
database-level shared/exclusive lock. It is currently implemented
independently of the rest of the database API; therefore, using the
locking routines does not by itself guarantee isolation.

Generally, a database-level lock is characterized by very low overhead
but maximum possible contention. This means that processes should spend
as little time as possible between acquiring a lock and releasing it.


Implementation and current limitations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are three alternative implementations.

-  Simple reader-preference lock using a single global spinlock
   (described by Mellor-Crummey & Scott '92). Reader-preference
   means that this lock can cause writer starvation. Tests have
   shown good performance under N>>P conditions (N is the number of
   processes, P is the number of CPUs).

-  A writer-preference version of the spinlock.

-  A task-fair lock implemented using a queue. This lock is not
   susceptible to starvation, but has higher overhead compared to
   the spinlocks. The waiting processes are synchronized using the
   futex kernel interface.
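
To illustrate the reader-preference idea from the first item, here is a minimal sketch in the style of the simple Mellor-Crummey & Scott lock, written with C11 atomics. It is not the code in 'dblock.c': the `rp_*` names, the single lock word layout and the `max_spins` bound are assumptions made for the example. Like the documented locking functions, the acquire calls return non-zero on success and 0 on failure.

```c
#include <stdatomic.h>

#define WAFLAG  1   /* bit 0: a writer is active */
#define RC_INCR 2   /* each reader adds 2 to the lock word */

typedef struct { atomic_int word; } rp_lock;  /* hypothetical type */

void rp_init(rp_lock *l) { atomic_init(&l->word, 0); }

/* A writer may enter only when the word is 0, that is, when there is no
 * active writer and no registered reader. Readers that keep arriving
 * can therefore starve writers. */
int rp_start_write(rp_lock *l, long max_spins) {
  long i;
  for (i = 0; i < max_spins; i++) {
    int expected = 0;
    if (atomic_compare_exchange_strong(&l->word, &expected, WAFLAG))
      return 1;
  }
  return 0;  /* timed out */
}

void rp_end_write(rp_lock *l) { atomic_fetch_sub(&l->word, WAFLAG); }

/* A reader registers itself first, then waits for an active writer
 * (if any) to leave. */
int rp_start_read(rp_lock *l, long max_spins) {
  long i;
  atomic_fetch_add(&l->word, RC_INCR);
  for (i = 0; i < max_spins; i++) {
    if ((atomic_load(&l->word) & WAFLAG) == 0)
      return 1;
  }
  atomic_fetch_sub(&l->word, RC_INCR);  /* timed out: withdraw */
  return 0;
}

void rp_end_read(rp_lock *l) { atomic_fetch_sub(&l->word, RC_INCR); }
```

Because a reader increments the word before checking for a writer, a pending reader blocks any writer that has not yet acquired the lock, which is exactly what makes the lock reader-preferring.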

Current limitations:

- dead processes hold locks indefinitely.
- maximum timeout with spinlocks is 2000 ms.
- the task-fair lock is only supported on Linux.

Configuration
^^^^^^^^^^^^^

By default, WhiteDB is compiled with the task-fair lock if it is available
and reader-preference spinlock otherwise. The writer-preference lock is
selected by `./configure --enable-locking=wpspin`. The reader-preference lock
is selected by `./configure --enable-locking=rpspin`.

When using manual build, the LOCK_PROTO macro in 'config.h' (or 'config-w32.h')
can be modified to select the locking method.

For platforms that do not support the atomic operations, use
`./configure --disable-locking` or edit the appropriate header file and
comment out the LOCK_PROTO macro. This will allow the code to compile
correctly, but the database should be used by a single user or process only.

Usage
^^^^^

Getting a shared (read) lock:

[source,C]
----
wg_int lock_id;
void *db; /* should be initialized before calling wg_start_read() */

...

/* acquire lock. This function normally blocks until the lock
 * is acquired
 */
lock_id = wg_start_read(db);
if(!lock_id) {
  /* getting the lock failed, do something */
} else {

  ... one or more database reads ...

  /* release the lock */
  if(!wg_end_read(db, lock_id)) {
    /* this is unlikely to fail, but if it does, the consequences
     * could be severe, so this error should also be handled. */
  }
}
----

Getting an exclusive (write) lock is similar:

[source,C]
----
wg_int lock_id;

...

/* acquire lock. */
lock_id = wg_start_write(db);
if(!lock_id) {
  /* getting the lock failed, do something */
} else {

  ... one or more database write operations ...

  /* release the lock */
  if(!wg_end_write(db, lock_id)) {
    /* handle error */
  }
}
----

Porting
^^^^^^^

For platforms that support neither the GNU C nor the Win32 builtin functions
that implement the atomic operations in 'dblock.c', appropriate code should
be added to each of the platform-specific helper functions.

The macro _MM_PAUSE can generally be defined as empty on platforms that
do not support the Pentium 4/Athlon64-specific "pause" instruction. This will
not have a significant effect (in other words, the "pause" instruction is
only actually useful on the aforementioned processor families).


Writing safely without a write lock
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Although it is in general crucial to use wg_start_write before writing any data in a 
concurrent setting, in simple special cases it is possible to safely avoid write locks 
while writing data. 

The following atomic functions all assume that the field contains an immediate value (NULL,
short integer, char, date or time), the value written is also immediate, the field is not
indexed and logging is not activated. This guarantees that no allocation operations are
performed and thus it is safe to rely on read locks (wg_start_read and wg_end_read)
while writing data:

[source,C]
----
wg_int wg_update_atomic_field(void* db, void* record, wg_int fieldnr, wg_int data, wg_int old_data);
wg_int wg_set_atomic_field(void* db, void* record, wg_int fieldnr, wg_int data);
wg_int wg_add_int_atomic_field(void* db, void* record, wg_int fieldnr, int data);
----

Details:

  wg_update_atomic_field(void* db, void* record, wg_int fieldnr, wg_int data, wg_int old_data);
  
Given the assumptions described above, write data to a field which currently contains old_data.
If the field does not contain old_data at the time of writing, an error is returned and the write
is cancelled; this is checked by an atomic compare-and-swap operation. Returns 0 if the operation
was successful.

  wg_set_atomic_field(void* db, void* record, wg_int fieldnr, wg_int data);
  
Perform the wg_update_atomic_field operation using the current value as old_data iteratively 
until it succeeds. In a normal situation the operation is expected to succeed immediately
without any iterations. All the preconditions described before are checked. 
Returns 0 if the operation was successful.

  wg_add_int_atomic_field(void* db, void* record, wg_int fieldnr, int data);
  
Increase or decrease an existing short integer value in the field by adding integer data to this value.
Performs an atomic update operation iteratively until it succeeds. In a normal situation the operation 
is expected to succeed immediately without any iterations. 
All the preconditions described before are checked. Returns 0 if the operation was successful.

The three atomic functions may return any of these errors: 

- -1 if wrong db pointer
- -2 if wrong fieldnr
- -10 if new value non-immediate
- -11 if old value non-immediate
- -12 if cannot fetch old data
- -13 if the field has an index
- -14 if logging is active
- -15 if the field value has been changed from old_data 
- -16 if the result of the addition does not fit into a smallint 
- -17 if atomic assignment failed after a large number (1000) of tries 
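
The compare-and-swap pattern behind these three functions can be sketched with C11 atomics. This is an illustration only, not the WhiteDB implementation: `field_t`, `update_atomic()`, `set_atomic()` and `add_atomic()` are hypothetical names, the field is a plain `long` rather than an encoded value, and the precondition checks (immediate values, no index, no logging, smallint range) are left out. Only the -15 and -17 codes from the list above are modelled.

```c
#include <stdatomic.h>

#define ERR_CHANGED  (-15)  /* field no longer holds old_data */
#define ERR_TOO_MANY (-17)  /* gave up after MAX_TRIES attempts */
#define MAX_TRIES    1000

typedef atomic_long field_t;  /* stand-in for one record field */

/* Single attempt: write data only if the field still holds old_data. */
int update_atomic(field_t *field, long data, long old_data) {
  long expected = old_data;
  if (atomic_compare_exchange_strong(field, &expected, data))
    return 0;
  return ERR_CHANGED;
}

/* Retry the update with the freshly read value as old_data. */
int set_atomic(field_t *field, long data) {
  int tries;
  for (tries = 0; tries < MAX_TRIES; tries++) {
    long cur = atomic_load(field);
    if (update_atomic(field, data, cur) == 0)
      return 0;
  }
  return ERR_TOO_MANY;
}

/* Read, add and write back; retry if another writer got in between. */
int add_atomic(field_t *field, int delta) {
  int tries;
  for (tries = 0; tries < MAX_TRIES; tries++) {
    long cur = atomic_load(field);
    if (update_atomic(field, cur + delta, cur) == 0)
      return 0;
  }
  return ERR_TOO_MANY;
}
```

Without contention each loop succeeds on its first pass, which matches the expectation stated above that the operations normally complete without any iterations.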

Semi-structured data
~~~~~~~~~~~~~~~~~~~~

[source,C]
----
wg_int wg_parse_json_file(void *db, char *filename);
wg_int wg_parse_json_document(void *db, char *buf, void **document);
wg_int wg_parse_json_fragment(void *db, char *buf, void **document);
----

Details:

 wg_int wg_parse_json_file(void *db, char *filename)

Parses JSON data from a file and inserts it into the database as structured
records. The parsing is done in two passes - first a syntax checking pass and
then data insertion pass. If the JSON data is invalid, the database contents
are not modified.

Returns 0 if the data is stored successfully, -1 if there is a non-fatal (most
likely syntax) error and -2 if the storing fails and the database is
inconsistent after the function returns.

 wg_int wg_parse_json_document(void *db, char *buf, void **document)

Parses JSON data from the buffer and inserts it into the database. There is
no separate syntax checking pass so the JSON input should be valid.

Returns 0 if the data is stored successfully, -1 if there is a non-fatal error
and -2 if the storing fails and the database is inconsistent after the function
returns. If the function is successful, '*document' is set to point to the
top-level record in the structure. The resulting data is marked as a whole
document, meaning that it can be treated as a single unit of data by
API functions that are JSON-aware.

 wg_int wg_parse_json_fragment(void *db, char *buf, void **document)

Like `wg_parse_json_document()` except the resulting data is not marked as
a whole document.

Data representation
^^^^^^^^^^^^^^^^^^^

The JSON is converted to a semi-structured schema by creating a hierarchy of
records. The top-level record in this hierarchy is additionally marked as
a document, which allows JSON documents to be handled as single units of data.

Conversion of JSON to whitedb semi-structured schema:

|==========================================================================
| JSON              | whitedb                                   | top-level
| object or mapping | record containing references to key-value pairs | yes
| -                 | record containing a single key-value pair       | no
| array             | record (same length as array)                   | yes
|==========================================================================

Array record fields and value fields may contain either immediate values
or recursively other arrays or objects. The object record fields always
contain links to key-value pair records. The key-value pair record holds
the key (a string) in column 1 and the value in column 2.

Example:

The JSON '{"a" : [1, 2, 3], "b" : "c"}' would be converted to

          [ record link | record link ]*
            /                   \
    [  | "a" | record link ]  [  | "b" | "c" ]
                  /
           [ 1 | 2 | 3 ]

Where '*' marks the top-level record in the document.
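
The same hierarchy can be modelled with plain C structures to make the layout concrete. This is a sketch for illustration, not WhiteDB's storage format: the `record` and `field` types and the functions `object_get()` and `example_document()` are hypothetical. Key-value pair records are given three fields as in the diagram, with the key in column 1 and the value in column 2; column 0 is left empty because its purpose is not described here.

```c
#include <stddef.h>
#include <string.h>

typedef enum { F_EMPTY, F_INT, F_STR, F_LINK } ftype;

struct record;
typedef struct {
  ftype type;
  union { long i; const char *s; struct record *link; } v;
} field;

typedef struct record {
  int is_document;  /* set on the top-level record only */
  size_t len;
  field *fields;
} record;

/* {"a" : [1, 2, 3], "b" : "c"} built bottom-up */
static field arr_f[3] = {
  { F_INT, { .i = 1 } }, { F_INT, { .i = 2 } }, { F_INT, { .i = 3 } } };
static record arr = { 0, 3, arr_f };

static field kv_a_f[3] = {
  { F_EMPTY, { .i = 0 } }, { F_STR, { .s = "a" } }, { F_LINK, { .link = &arr } } };
static record kv_a = { 0, 3, kv_a_f };

static field kv_b_f[3] = {
  { F_EMPTY, { .i = 0 } }, { F_STR, { .s = "b" } }, { F_STR, { .s = "c" } } };
static record kv_b = { 0, 3, kv_b_f };

static field top_f[2] = {
  { F_LINK, { .link = &kv_a } }, { F_LINK, { .link = &kv_b } } };
static record top = { 1, 2, top_f };

const record *example_document(void) { return &top; }

/* Look up a key in an object record by following the links to its
 * key-value pair records. Returns the value field or NULL. */
const field *object_get(const record *obj, const char *key) {
  size_t i;
  for (i = 0; i < obj->len; i++) {
    const record *kv;
    if (obj->fields[i].type != F_LINK)
      continue;
    kv = obj->fields[i].v.link;
    if (kv->len == 3 && kv->fields[1].type == F_STR &&
        strcmp(kv->fields[1].v.s, key) == 0)
      return &kv->fields[2];
  }
  return NULL;
}
```

Looking up "a" yields a link to the three-element array record, while looking up "b" yields the immediate string value "c".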

Utilities
~~~~~~~~~

 void wg_print_db(void *db)

Print the entire database contents to stdout, row by row.

 void wg_print_record(void *db, wg_int* rec)

Print just one row, pointed to by rec.

 void wg_snprint_value(void *db, wg_int enc, char *buf, int buflen)

Print a single, encoded value into a character buffer.

 wg_int wg_parse_and_encode(void *db, char *buf)

Parse a value from a string and encode it for WhiteDB. Returns WG_ILLEGAL if
the value could not be parsed or encoded. The following types are detected
automatically from the input:

- NULL - empty string
- int - plain integer
- double - floating point number in fixed decimal notation
- date - ISO8601 date
- time - ISO8601 time+fractions of second.
- string - input data that does not match the above types

Does NOT support ambiguous types:

- fixpoint - floating point number in fixed decimal notation
- uri - string starting with a URI prefix
- char - single character

Does NOT support types which would require a special encoding
scheme in string form:
record, XML literal, blob, anon const, variables

Note that double values need to have CSV_DECIMAL_SEPARATOR as the
decimal marker, independent of the system locale settings.
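
The detection order can be pictured as a small classifier. The sketch below is not the parser used by WhiteDB; `classify()` and the `vtype` enum are hypothetical, and the accepted formats are assumptions: '.' as the decimal separator, `YYYY-MM-DD` for dates and `HH:MM:SS` with an optional two-digit fraction for times. It only names the detected type and does not encode anything.

```c
#include <ctype.h>
#include <string.h>

#define DECIMAL_SEP '.'  /* assumed stand-in for CSV_DECIMAL_SEPARATOR */

typedef enum { T_NULL, T_INT, T_DOUBLE, T_DATE, T_TIME, T_STR } vtype;

/* Match s against pattern p, where 'd' in p stands for any digit and
 * every other pattern character must match literally. */
static int match_pat(const char *s, const char *p) {
  for (; *p; s++, p++) {
    if (*s == '\0')
      return 0;
    if (*p == 'd') {
      if (!isdigit((unsigned char) *s))
        return 0;
    } else if (*s != *p) {
      return 0;
    }
  }
  return *s == '\0';
}

/* Number of leading decimal digits in s. */
static size_t digits(const char *s) {
  size_t n = 0;
  while (isdigit((unsigned char) s[n]))
    n++;
  return n;
}

vtype classify(const char *s) {
  const char *p = s;
  size_t n;

  if (*s == '\0')
    return T_NULL;                       /* empty string */
  if (match_pat(s, "dddd-dd-dd"))
    return T_DATE;
  if (match_pat(s, "dd:dd:dd") || match_pat(s, "dd:dd:dd.dd"))
    return T_TIME;
  if (*p == '+' || *p == '-')
    p++;
  n = digits(p);
  if (n > 0 && p[n] == '\0')
    return T_INT;                        /* plain integer */
  if (n > 0 && p[n] == DECIMAL_SEP) {
    size_t m = digits(p + n + 1);
    if (m > 0 && p[n + 1 + m] == '\0')
      return T_DOUBLE;                   /* fixed decimal notation */
  }
  return T_STR;                          /* anything else */
}
```

Input such as "1e5" falls through to the string type in this sketch, since only fixed decimal notation is recognized as a double.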

 wg_int wg_parse_and_encode_param(void *db, char *buf)

Like `wg_parse_and_encode()`, except the returned value is encoded as a query
parameter. Values encoded like this should be freed with wg_free_query_param()
and cannot be used interchangeably with other encoded values.


Query functions
~~~~~~~~~~~~~~~

[source,C]
----
wg_query *wg_make_query(void *db, void *matchrec, wg_int reclen,
  wg_query_arg *arglist, wg_int argc);
void *wg_fetch(void *db, wg_query *query);
void wg_free_query(void *db, wg_query *query);

wg_int wg_encode_query_param_null(void *db, char *data);
wg_int wg_encode_query_param_record(void *db, void *data);
wg_int wg_encode_query_param_char(void *db, char data);
wg_int wg_encode_query_param_fixpoint(void *db, double data);
wg_int wg_encode_query_param_date(void *db, int data);
wg_int wg_encode_query_param_time(void *db, int data);
wg_int wg_encode_query_param_var(void *db, wg_int data);
wg_int wg_encode_query_param_int(void *db, wg_int data);
wg_int wg_encode_query_param_double(void *db, double data);
wg_int wg_encode_query_param_str(void *db, char *data, char *lang);
wg_int wg_encode_query_param_xmlliteral(void *db, char *data, char *xsdtype);
wg_int wg_encode_query_param_uri(void *db, char *data, char *prefix);
wg_int wg_free_query_param(void* db, wg_int data);
----

 wg_query *wg_make_query(void *db, void *matchrec, wg_int reclen,
  wg_query_arg *arglist, wg_int argc)

Build a query using parameters in match record and argument list formats. The
match record is an array of encoded values of wg_int type. This can either be
allocated by the caller, in which case the reclen should contain the size of
the array, or point to an existing database record, in which case reclen must
be zero.

The argument list format consists of an array of:

[source,C]
----
typedef struct {
  gint column;      /** column (field) number this argument applies to */
  gint cond;        /** condition (equal, less than, etc) */
  gint value;       /** encoded value */
} wg_query_arg;
----

Available conditions are:

 WG_COND_EQUAL       =
 WG_COND_NOT_EQUAL   !=
 WG_COND_LESSTHAN    <
 WG_COND_GREATER     >
 WG_COND_LTEQUAL     <=
 WG_COND_GTEQUAL     >=

argc is the size of the array (at least 1 is required if arglist parameter
is given). The function returns NULL if there is an error, otherwise a pointer
to a query object is returned. When the query is no longer used,
wg_free_query() should be called to release its memory.

If arglist and matchrec are NULL, the query has no parameters and will return
all the rows in the database.
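
The argument list is conjunctive: a row is part of the result only if every argument holds. The following sketch shows that evaluation on plain integers. It is an illustration rather than the query engine: `query_arg`, `cond_t` and `row_matches()` are local stand-ins for `wg_query_arg` and the `WG_COND_*` constants, and real queries compare encoded values of any type.

```c
#include <stddef.h>

/* Local stand-ins for the WG_COND_* constants. */
typedef enum {
  C_EQUAL, C_NOT_EQUAL, C_LESSTHAN, C_GREATER, C_LTEQUAL, C_GTEQUAL
} cond_t;

typedef struct {
  int column;    /* column (field) number this argument applies to */
  cond_t cond;   /* condition */
  long value;    /* plain value instead of an encoded one */
} query_arg;

static int check(long field, cond_t cond, long value) {
  switch (cond) {
    case C_EQUAL:     return field == value;
    case C_NOT_EQUAL: return field != value;
    case C_LESSTHAN:  return field <  value;
    case C_GREATER:   return field >  value;
    case C_LTEQUAL:   return field <= value;
    case C_GTEQUAL:   return field >= value;
  }
  return 0;
}

/* Returns 1 if the row satisfies all arguments, 0 otherwise.
 * An empty argument list matches every row. */
int row_matches(const long *row, size_t rowlen,
                const query_arg *args, size_t argc) {
  size_t i;
  for (i = 0; i < argc; i++) {
    if (args[i].column < 0 || (size_t) args[i].column >= rowlen)
      return 0;  /* the row has no such column */
    if (!check(row[args[i].column], args[i].cond, args[i].value))
      return 0;
  }
  return 1;
}
```

For the row `{10, 20, 30}`, the arguments "column 0 >= 10" and "column 2 < 31" both hold, so the row matches; changing the second argument to "column 2 < 30" rejects it.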


 void *wg_fetch(void *db, wg_query *query)

Fetch next row from the query result. Returns a pointer to the next
row (same as `wg_get_next_record()`). Returns NULL if there are no more rows.


 void wg_free_query(void *db, wg_query *query)

Release the memory pointed to by query.


 wg_int wg_encode_query_param_*()

Family of functions to prepare the parameters for `wg_make_query()`. They
return a WhiteDB encoded value when successful or WG_ILLEGAL on failure.
Locking the database when using these functions is not required,
since they do not access shared memory.


 wg_int wg_free_query_param(void* db, wg_int data)

Free the storage allocated for the encoded data which has been prepared
with the `wg_encode_query_param_*()` family of functions. It is not advisable
to call this on data encoded with other functions.

Simplified query functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

[source,C]
----
void *wg_find_record(void *db, wg_int fieldnr, wg_int cond, wg_int data,
    void* lastrecord);

void *wg_find_record_null(void *db, wg_int fieldnr, wg_int cond, char *data,
    void* lastrecord);
void *wg_find_record_record(void *db, wg_int fieldnr, wg_int cond, void *data,
    void* lastrecord);
void *wg_find_record_char(void *db, wg_int fieldnr, wg_int cond, char data,
    void* lastrecord);
void *wg_find_record_fixpoint(void *db, wg_int fieldnr, wg_int cond,
    double data, void* lastrecord);
void *wg_find_record_date(void *db, wg_int fieldnr, wg_int cond, int data,
    void* lastrecord);
void *wg_find_record_time(void *db, wg_int fieldnr, wg_int cond, int data,
    void* lastrecord);
void *wg_find_record_var(void *db, wg_int fieldnr, wg_int cond, wg_int data,
    void* lastrecord);
void *wg_find_record_int(void *db, wg_int fieldnr, wg_int cond, int data,
    void* lastrecord);
void *wg_find_record_double(void *db, wg_int fieldnr, wg_int cond, double data,
    void* lastrecord);
void *wg_find_record_str(void *db, wg_int fieldnr, wg_int cond, char *data,
    void* lastrecord);
void *wg_find_record_xmlliteral(void *db, wg_int fieldnr, wg_int cond,
    char *data, char *xsdtype, void* lastrecord);
void *wg_find_record_uri(void *db, wg_int fieldnr, wg_int cond, char *data,
    char *prefix, void* lastrecord);
----

These functions provide a simplified alternative to the query functions.

  void *wg_find_record(void *db, wg_int fieldnr, wg_int cond, wg_int data,
    void* lastrecord);

Returns the first record in the database for which "fieldnr" "cond" "data"
is true. `data` is an encoded value. `cond` is one of the conditions listed
under "Query functions".

The `wg_find_record_*()` group of functions are convenience functions for
using unencoded data directly. The user is not required to encode or free
encoded data when using these functions.

Comparison of the query interfaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|=========================================================================
|                                  | Full query       | simplified query
| query type                       | conjunctive      | one clause only
| without index                    | slower           | faster
| with index, fetch one row        | slower           | faster
| with index, fetch many (>5) rows | faster           | much slower
| isolation                        | "read committed" | none
|=========================================================================

NOTE: the isolation level given here is only an approximation. Up to
"serializable" is currently possible with the use of `wg_start_*()`
and `wg_end_*()` functions, but this may become relaxed during future
development.

Child databases
~~~~~~~~~~~~~~~

NOTE: Child database support is not compiled in by default. Use `./configure --enable-childdb`
or for manual build, edit the appropriate 'config-xxx.h' file and enable the
USE_CHILD_DB macro.

 wg_int wg_register_external_db(void *db, void *extdb)

Store information in db about an external database extdb. This allows storing
data from extdb inside db. Returns 0 on success, negative on error.

 wg_int wg_encode_external_data(void *db, void *extdb, wg_int encoded)

Translate an encoded value from extdb to another encoded value which may
be stored into db. Physically the data (assuming there is any memory
allocated) continues to reside in extdb.

Child databases are databases which contain references to data (fields and
records) located in another database, called parent. The requirement is
that both the child and parent are located in the same virtual address
space. A typical scenario is that a "main" shared memory database is used
as the parent and temporary, local memory databases are created as children.

The main difference between referring to local and external data is that
external references are (intentionally) not tracked by the parent database.
This allows instantly deleting the child databases. On the other hand,
extra measures must be taken to ensure that the referenced external data
stays intact while in use by the child database. Read locking the parent
database should be sufficient there.

Typical usage scenario
^^^^^^^^^^^^^^^^^^^^^^

(assuming parent is already created) Create a child database and
assign the parent.

[source,C]
----
  childdb = wg_attach_local_database(size);
  wg_register_external_db(childdb, parentdb);
----

Use parent data in child database. Encoded data from parent
database must be re-encoded before writing it to the child database.

[source,C]
----
  tmp = wg_encode_external_data(childdb, parentdb, parentdata);
  wg_set_field(childdb, childrec, 0, tmp);
----

Free child database, when done.

[source,C]
----
  wg_delete_local_database(childdb);
----

There are three main restrictions when using external references:

- External references may not be written into shared memory databases. For this
  reason, `wg_register_external_db()` may only be called with a local (non-shared)
  database as the first argument.

- Once an external database X is registered inside another database Y, the
  database Y may no longer be dumped/restored.

- A database that contains external references cannot be indexed.

Getting information about the database state
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[source,C]
----
wg_int wg_database_freesize(void *db);
wg_int wg_database_size(void *db);
----

These functions provide information about the database size and available
free space.

 wg_int wg_database_size(void *db)

Returns the total memory segment size for the database, in bytes.

 wg_int wg_database_freesize(void *db)

Returns the amount of free space in the database memory segment, in bytes.
Note that this is a conservative estimate, meaning that the actual amount
of free space may be more, but no less, than reported.


RDF parsing / exporting API
---------------------------

This API is dependent on libraptor. It is not available on Win32. When
compiling WhiteDB without autotools (using `compile.sh`) the API can be
enabled by defining HAVE_RAPTOR in 'config.h' and modifying build scripts
to link with appropriate libraries.


[source,C]
----
#include "rdfapi.h"

wg_int wg_import_raptor_file(void *db, wg_int pref_fields, wg_int suff_fields,
  wg_int (*callback) (void *, void *), char *filename);
wg_int wg_import_raptor_rdfxml_file(void *db, wg_int pref_fields,
  wg_int suff_fields, wg_int (*callback) (void *, void *), char *filename);
wg_int wg_rdfparse_default_callback(void *db, void *rec);
wg_int wg_export_raptor_file(void *db, wg_int pref_fields, char *filename,
  char *serializer);
wg_int wg_export_raptor_rdfxml_file(void *db, wg_int pref_fields,
  char *filename);
----


 wg_int wg_import_raptor_file(void *db, wg_int pref_fields, wg_int suff_fields,
  wg_int (*callback) (void *, void *), char *filename)

Imports RDF file. Creates records with length = pref_fields + 3 + suff_fields.
The data will be stored as follows:

| pref_fields .. | predicate | subject | object | suff_fields |

The file type is determined automatically from filename. Callback function
should match the prototype of `wg_rdfparse_default_callback()` and can be used
to calculate contents of fields other than the RDF triple.

 wg_int wg_import_raptor_rdfxml_file(void *db, wg_int pref_fields,
  wg_int suff_fields, wg_int (*callback) (void *, void *), char *filename)

As above, but the file type is assumed to be RDF/XML.

 wg_int wg_rdfparse_default_callback(void *db, void *rec)

Does nothing. Called when importing RDF files with the 'wgdb' commandline tool.
May be modified to add field initialization functionality to commandline
importing.

 wg_int wg_export_raptor_file(void *db, wg_int pref_fields, char *filename,
  char *serializer)

Export triple data to file. The format is selected by the raptor serializer.
More information about serializers can be found at http://librdf.org/raptor/;
the libraptor API also provides a function for enumerating the serializers. The
pref_fields parameter marks the start position of the triple in
WhiteDB records (the storage schema is assumed to be the same as described
above for the `wg_import_raptor_file()` function).

 wg_int wg_export_raptor_rdfxml_file(void *db, wg_int pref_fields,
  char *filename)

Export triple data to file in RDF/XML format.


Index API
---------

[source,C]
----
#include <whitedb/indexapi.h>

wg_int wg_create_index(void *db, wg_int column, wg_int type,
  wg_int *matchrec, wg_int reclen);
wg_int wg_drop_index(void *db, wg_int index_id);
wg_int wg_column_to_index_id(void *db, wg_int column, wg_int type,
  wg_int *matchrec, wg_int reclen);
wg_int wg_get_index_type(void *db, wg_int index_id);
void * wg_get_index_template(void *db, wg_int index_id, wg_int *reclen);
void * wg_get_all_indexes(void *db, wg_int *count);
----

The index API header exposes functions to create, drop and inspect indexes.

 wg_int wg_create_index(void *db, wg_int column, wg_int type,
  wg_int *matchrec, wg_int reclen)

Create an index on column. Index type must be specified. Currently
supported index types:

 WG_INDEX_TYPE_TTREE - T-tree index on single column

If matchrec is NULL, a normal index is created. If matchrec is non-null,
the index will be created with a template. In this case reclen must specify
the length of the array pointed to by matchrec. If an index has a template,
only records that match the template are inserted into the index. Wildcards
in the template are specified using WG_VARTYPE values.

This function returns 0 if successful and non-0 in case of an error.
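
The template rule ("only records that match the template are inserted into the index") can be sketched as follows. This is a simplified model, not WhiteDB's code: `tmpl_field` and `matches_template()` are hypothetical, wildcards are marked with an explicit flag instead of WG_VARTYPE values, fields are plain integers, and treating a record shorter than the template as a non-match is an assumption.

```c
#include <stddef.h>

typedef struct {
  int is_var;   /* non-zero: wildcard, matches any value */
  long value;   /* required value when is_var is 0 */
} tmpl_field;

/* Returns 1 if the record would be inserted into a templated index,
 * 0 otherwise. */
int matches_template(const long *rec, size_t reclen,
                     const tmpl_field *tmpl, size_t tmpllen) {
  size_t i;
  if (reclen < tmpllen)
    return 0;  /* assumption: record too short to be compared */
  for (i = 0; i < tmpllen; i++) {
    if (!tmpl[i].is_var && rec[i] != tmpl[i].value)
      return 0;
  }
  return 1;
}
```

With the template `{var, var, 6}`, a record `{1, 2, 6}` qualifies for the index while `{1, 2, 7}` does not.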

 wg_int wg_drop_index(void *db, wg_int index_id)

Delete the specified index.

Returns 0 on success, non-0 on error.

 wg_int wg_column_to_index_id(void *db, wg_int column, wg_int type,
  wg_int *matchrec, wg_int reclen)

Find an index on a column. If type is specified, the first index with
a matching type is returned. If type is 0, indexes of any type may be
returned.

If matchrec is non-NULL and WhiteDB is configured with USE_INDEX_TEMPLATE
option, the provided match record will be used to locate an index with
a specified template. If matchrec is NULL, this function finds a full
index.

Returns an index id on success. Returns -1 on error.

 wg_int wg_get_index_type(void *db, wg_int index_id)

Finds index type.

Returns type (>0) on success, -1 if the index was not found.

 void * wg_get_index_template(void *db, wg_int index_id, wg_int *reclen)

Finds index template.

Returns a pointer to the gint array used for the index template. reclen is set
to the length of the array. The pointer must not be freed and its contents
should be accessed read-only.

If the index is not found or has no template, NULL is returned. In that case
contents of *reclen are unmodified.

 void * wg_get_all_indexes(void *db, wg_int *count)

Returns a pointer to a newly allocated array of index ids. *count is set
to the number of indexes in the array. The caller should free the array when
it is no longer needed.

Returns NULL if there are no indexes.


Examples
~~~~~~~~

Create a T-tree index on a column conditionally:

[source,C]
----
  if(wg_column_to_index_id(db, col, WG_INDEX_TYPE_TTREE, NULL, 0) == -1) {
    if(wg_create_index(db, col, WG_INDEX_TYPE_TTREE, NULL, 0)) {
      printf("index creation failed.\n");
    } else {
      printf("index created.\n");
    }
  }
----

Create an index on column 0 that only contains rows where column 2
is equal to 6 (requires that WhiteDB is compiled with USE_INDEX_TEMPLATE
defined in config.h):

[source,C]
----
  wg_int matchrec[3];
  matchrec[0] = wg_encode_var(db, 0);
  matchrec[1] = wg_encode_var(db, 0);
  matchrec[2] = wg_encode_int(db, 6);
  if(wg_create_index(db, 0, WG_INDEX_TYPE_TTREE, matchrec, 3)) {
    printf("index creation failed.\n");
  }
----

Delete all indexes in the database that have a template:

[source,C]
----
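  wg_int i, count = 0;
  wg_int *indexes = wg_get_all_indexes(db, &count);
  if(!indexes)
    count = 0; /* NULL means that there are no indexes */
  for(i=0; i<count; i++) {
    wg_int index_id = indexes[i];
    wg_int len; /* wg_int, to match the reclen parameter */
    void *tmpl = wg_get_index_template(db, index_id, &len);
    if(!tmpl) {
      printf("%d had no template\n", (int) index_id);
    } else {
      wg_drop_index(db, index_id);
      printf("dropped %d\n", (int) index_id);
    }
  }
  free(indexes); /* the array was allocated by wg_get_all_indexes() */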
----
