NanoDB - Nano Database Part I
I see IT developers nowadays becoming more and more completely dependent on a database, which usually requires a license when their app becomes commercial. Even though most database functionality is similar in the application, databases still differ somewhere in some details, so a “migration” from database x to database y is always required. For example, click HERE to learn more about a MySQL to Oracle RDB migration.
We know that most apps don’t fully utilize the database features provided, so it’s a “real” waste to run them on a paid database. OOPLs like Java, C#, C++, etc. don’t give you the basic database features like READ, ADD, DELETE, UPDATE, COMMIT, or ROLLBACK, except for some simple sequential reads and writes on a file. However, JAVA, C#, and C++ give you a more convenient RandomAccessFile (RAF). RAF lets you read or write “anywhere” in the file you want, but you have to be careful that the size of what you read or write is correct and accurate. To be able to perform such random accesses, Python needs to map the file contents in memory (hence: mmap). The downside of memory mapping, however, is that your app can quickly run out of memory (memory leak) if the file is very, very large.
Migrating from database X to database Y requires knowledge of both databases and maintaining both versions will not be an easy task. When working with RAFs, you only need to worry about the read/write position and the amount (size) of data you want to access. However, such work is easier to maintain and port to other OOPL than from database to database if you are a real IT developer who can invent something without screaming for a ready-to-use API or having to depend on any database.
When I was still active and needed to show a big JAVA customer how powerful JAVA can be for database development, I designed and developed a “Nano” database file for the R&D department of this big customer in Germany. The file is a key-indexed file that covers all basic DB functions like READ, ADD, DELETE, UPDATE, LOCK, UNLOCK, COMMIT and ROLLBACK as well as self-commit rollback in case of emergency (e.g. unexpected abort, power failure, etc.). Today I will give you the simplified source codes and call them the “NanoDB package” (four APIs: NanoDB, NanoDBManager, NanoDBWorker and NanoDBConnect), as well as two examples in SWING that show you how to work directly with NanoDB (NanoDBdirect) and as a client (NanoDBClient) with NanoDBServer (also in SWING).
A few words about databases in general and the NanoDB package.
When people talk about databases, they usually think of relational databases or RDBs. Before RDB, we had hierarchical databases like ADABAS from SAG. Object Oriented Databases OODB or object databases are more or less new and still in their infancy. MongoDB is one of the OODB players of the OODB era. Objects are indivisible units. Data from ODB is accessed through the object itself, while data from RDB is accessed through a unique primary key and some alternate (auxiliary) keys, which do not have to be unique. To do this, the RDB data is split into tables with rows and columns. Rows or columns are accessed through primary keys or auxiliary keys. And that’s all. Access becomes sluggish when the tables are very large. In other words, data mining with RDB is a real problem when the data is very large (numerous rows) and very diverse (numerous columns). The problems of numerous rows and countless columns do not exist with OODB data. The object is an entity and is accessed by name (similar to primary key) and its inner data is retrieved via the provided getter/setter methods. So no auxiliary key is needed and the task of data mining is simpler and easier. And we live in a world where data is produced in mass quantities on a daily basis (visual, audio, factual, document data, etc.), so we are in real trouble if we are still struggling with RDB technology.
However, OODB also has problems. Like RDB with its SQL dialects which are slightly different but also slightly incompatible. OODB access languages are the OOP languages (OOPL) like Java, C++, C#, Python etc. which provide various ways to store (“freeze”) the active objects as serialized objects. For example, Java serialized objects are not compatible in C# and vice versa. And these are also the problems of MongoDB and similar ones. To overcome the differences of serialized objects, MongoDB develops a metalanguage (similar to SQL for RDB) that allows users to describe and map OOPL objects. However, this incurs overhead due to unnecessary forward and backward mapping. And this eats performance, like SQL with numerous rows and countless columns.
Note: Since Java 11 JavaFX has been unbundled and becomes OpenJFX in close conjunction with OpenJDK and I’m tired of constantly jumping from version to version, I decide to stay with JDK 9 where most and best features (e.g. lambda expression, stream, enum switch case, etc.) are already included. Newfangled features like “var” for local variables or “return value through break keyword” and then “yield” instead of break, etc. are not so significant for me to need to run after the latest JDK version.
NanoDB was developed and implemented in JAVA-9. This means that Java serialized objects or so-called POJOs (Plain Old Java Object) are the data used and are physically stored in a Random Access File (RAF). RAF allows you to read and write anywhere randomly but precisely and accurately. As mentioned, the object is accessed by name and the inner data via implemented getters or setters or as public declared variables. Example: POJO People.java
import java.io.*;
import java.awt.*;
import java.net.URL;
import javax.swing.*;
import java.nio.file.*;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
// @author Joe T. Schwarz (c)
public class People implements java.io.Serializable {
private static final long serialVersionUID = 1234L;
public People(String name, String image) {
this.name = name;
this.image = image;
}
private String name, image;
public String toString() {
return "Name: "+name+", Image: "+image;
}
public String[] getData() {
return new String[] {name, image };
}
public void print() {
System.out.println("Name: "+name+System.lineSeparator()+", Link to Image: "+image);
}
public void picture(JFrame jf) {
(new Picture(jf)).setVisible(true);
}
class Picture extends JDialog {
public Picture(JFrame jf) {
setTitle(name);
setDefaultCloseOperation(DISPOSE_ON_CLOSE);
JButton but = getButton( );
but.setContentAreaFilled(false);
but.setBorderPainted(false);
add(but, BorderLayout.CENTER);
setLocationRelativeTo(jf);
pack();
}
//
private JButton getButton( ) {
try {
int w, h;
ByteArrayInputStream bis = null;
if (image.indexOf("://") > 0) {
byte[] buf = new byte[65536];
ByteArrayOutputStream bao = new ByteArrayOutputStream();
InputStream is = (new URL(image)).openStream();
while ((w = is.read(buf)) != -1) bao.write(buf, 0, w);
bis = new ByteArrayInputStream(bao.toByteArray());
} else if ((new File(image)).exists()) {
bis = new ByteArrayInputStream(Files.readAllBytes((new File(image)).toPath()));
} else { // something wrong with image
JButton but = new JButton("NO IMAGE");
return but;
}
BufferedImage bImg = ImageIO.read(bis);
w = bImg.getWidth(); h = bImg.getHeight();
double ratio = 1d, rw = 400d/w, rh = 300d/h;
if (rw < 1d || rh < 1d) {
if (rw < rh) ratio = rh;
else ratio = rw;
}
if (ratio < 1d) {
// resizing the image
w = (int)(w * ratio);
h = (int)(h * ratio);
Image img = bImg.getScaledInstance(w, h, Image.SCALE_SMOOTH);
bImg = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
bImg.getGraphics().drawImage(img, 0, 0 , null);
}
JButton but = new JButton(new ImageIcon(bImg));
return but;
} catch (Exception ex) { }
JButton but = new JButton("NO IMAGE");
return but;
}
}
}
Besides the provided methods (getters and setters), you can have some fancy methods like print() and picture() as you see below. Such functionalities are not possible with RDB.
NanoDB is the interface between physical storage and application. A database in a nutshell, if you will. The object data is indexed using an unique key and stored under the specified NanoDB name without a suffix in the specified path. If a NanoDB content is smaller than 2 MB, the entire content is cached after open(). Otherwise, data or objects are accessed and cached when needed and you can work with it in a similar way to a database. NanoDB is thread-safe, multiple-user. A NanoDB file consists of two areas: KEY area and DATA area.
Each (non)serialized object is stored in the DATA area and associated with an unique key in the KEY area. Each key entry in the KEY area consists of the length of the key, the data size (in the DATA area) and the key itself.
This format allows storing keys and objects of variable size. A key length of two bytes can have a maximum of 32736 bytes for a key, and an object size of four bytes can have a maximum of 2147483647 bytes (or two GB) for an object. However, the larger an object entry is, the slower access is. NanoDB uses RandomAccessFile to access files, which can be very, very large (maximum size: 9,223,372,036,854,775,807 bytes, or 9 exa bytes).
NanoDB API note: For performance reasons, some methods contain similar, repetitive codes that are so trivial that it is not worth turning them into a standalone method. See, for example, addObject(), deleteObject(), and updateObject().
package nanodb;
//
import java.io.*;
import java.nio.*;
import java.util.*;
import java.nio.file.*;
import java.util.concurrent.*;
import java.nio.charset.Charset;
import java.nio.channels.FileLock;
/**
Nano Database.
<br>NanoDB consists of 2 areas: key area and data or record area
<br>- The key entry in the key area has 3 fields: key length, data/record size, key itself
<br>- The data area contains the records, which can be serializable objects or strings
<br>- The key of addObject is automatically locked and must be released (unlocked) by the owner for the others
<br>- The key of deleteObject remains locked until it is released (unlocked) by the owner.
<br>Data must be committed before closed or saved. Otherwise data will be lost.
@author Joe T. Schwarz (c)
*/
public class NanoDB {
/**
constructor
@param fName String, file name
@exception Exception thrown by JAVA
*/
public NanoDB(String fName) {
this.fName = fName;
cs = Charset.forName("UTF-8");
}
/**
constructor
@param fName String, file name
@param charsetName String, character set name (e.g. "UTF-8");
@exception Exception thrown by JAVA
*/
public NanoDB(String fName, String charsetName) {
this.fName = fName;
cs = Charset.forName(charsetName);
}
/**
autoCommit (default: false)
@param auto boolean, true: always aoto-commit after delete/update/add and no rollback, false: commit needed
*/
public void autoCommit(boolean auto) {
this.auto = auto;
}
/**
getKeys() returns an ArrayList of all NanoDB keys
@return ArrayList of strings (as keys) or an empty arraylist if NanoDB must be created
*/
public ArrayList<String> getKeys() {
return new ArrayList<>(keysList);
}
/**
isAutoCommit.
@return boolean true if autoCommit is set
*/
public boolean isAutoCommit( ) {
return auto;
}
/**
lock key.
@param userID String
@param key String
@return boolean true if key is locked, false if key is already locked by other user
*/
public boolean lock(String userID, String key) {
if (lockedList.contains(key)) return false;
List<String> kList = keysLocked.get(userID);
if (kList == null) kList = Collections.synchronizedList(new ArrayList<>());
kList.add(key); // include key
keysLocked.put(userID, kList);
lockedList.add(key);
return true;
}
/**
unlock a locked key
@param userID String
@param key String
@return boolean true if key is unlocked, false if key is already unlocked or unknown
*/
public boolean unlock(String userID, String key) {
if (!lockedList.contains(key)) return false;
List<String> kList = keysLocked.get(userID);
if (kList == null || kList.size() == 0) return false;
kList.remove(key); // remove key
keysLocked.put(userID, kList);
lockedList.remove(key);
return true;
}
/**
isLocked.
@param key String
@return boolean true if key is locked
*/
public boolean isLocked(String key) {
return lockedList.contains(key);
}
/**
isExisted. Excl. deleted key
@param key String key
@return boolean true if querried key exists (deleted keys won't count)
*/
public boolean isExisted(String key) {
return keysList.contains(key);
}
/**
isKeyDeleted
@param key String key
@return boolean true if querried key is deleted
*/
public boolean isKeyDeleted(String key) {
return oCache.containsKey(key) && !keysList.contains(key);
}
/**
readObject
@param userID String
@param key String
@return byte array for (non)serializable object
@exception Exception thrown by JAVA
*/
public byte[] readObject(String userID, String key) throws Exception {
if (!keysList.contains(key)) throw new Exception("Unknown "+key);
// check for lock by other users
List<String> kList = keysLocked.get(userID);
if ((kList == null || !kList.contains(key)) &&
lockedList.contains(key)) throw new Exception(key+" is locked by other User.");
//
if (cache.containsKey(key)) return cache.get(key);
// read from file
byte[] buf = new byte[sizes.get(key)];
raf.seek(pointers.get(key));
raf.read(buf);
return buf;
}
/**
addObject with key by userID
@param userID string
@param key String
@param obj serializable object
@exception Exception thrown by JAVA
*/
public void addObject(String userID, String key, Object obj) throws Exception {
ByteArrayOutputStream bao = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bao);
oos.writeObject(obj);
oos.flush();
oos.close();
addObject(userID, key, bao.toByteArray());
}
/**
addObject with key by userID
@param userID string
@param key String
@param buf byte array for (non)serializable object
@exception Exception thrown by JAVA
*/
public void addObject(String userID, String key, byte[] buf) throws Exception {
if (keysList.contains(key)) throw new Exception(key+" exists.");
if (!auto) {
List<String> kList = keysLocked.get(userID);
if (kList == null) kList = Collections.synchronizedList(new ArrayList<>());
kList.add(key);
lockedList.add(key);
keysLocked.put(userID, kList);
oCache.put(key, new byte[] {});
} else committed = true;
cache.put(key, buf);
keysList.add(key);
}
/**
deleteObject with key by userID
@param userID String
@param key String
@exception Exception thrown by JAVA
*/
public void deleteObject(String userID, String key) throws Exception {
if (!keysList.contains(key)) throw new Exception(key+" is not existed.");
List<String> kList = keysLocked.get(userID);
if (kList == null || !kList.contains(key)) {
if (lockedList.contains(key)) throw new Exception(key+" is locked by other.");
throw new Exception(key+" is unlocked.");
}
if (!auto) {
if (cache.containsKey(key)) oCache.put(key, cache.remove(key));
else { // from file
byte[] buf = new byte[sizes.get(key)];
raf.seek(pointers.get(key));
raf.read(buf);
oCache.put(key, buf);
}
} else committed = true;
keysList.remove(key);
}
/**
updateObject
@param userID String
@param key String
@param obj serializable object
@exception Exception thrown by JAVA
*/
public void updateObject(String userID, String key, Object obj) throws Exception {
ByteArrayOutputStream bao = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bao);
oos.writeObject(obj);
oos.flush();
oos.close();
updateObject(userID, key, bao.toByteArray());
}
/**
updateObject
@param userID String
@param key String
@param buf byte array for (non)serializable object
@exception Exception thrown by JAVA
*/
public void updateObject(String userID, String key, byte[] buf) throws Exception {
if (!keysList.contains(key)) throw new Exception(key+" is not existed.");
List<String> kList = keysLocked.get(userID);
if (kList == null || !kList.contains(key)) {
if (lockedList.contains(key)) throw new Exception(key+" is locked by other.");
throw new Exception(key+" is unlocked.");
}
if (!auto) {
if (cache.containsKey(key)) oCache.put(key, cache.get(key));
else { // from file
byte[] bb = new byte[sizes.get(key)];
raf.seek(pointers.get(key));
raf.read(bb);
oCache.put(key, bb);
}
} else committed = true;
cache.put(key, buf);
}
/**
commit transaction of the given key.
@param userID String
@param key String
@return boolean true if successful
*/
public boolean commit(String userID, String key) {
List<String> kList = keysLocked.get(userID);
if (auto || kList == null || !kList.contains(key) || oCache.remove(key) == null) return false;
committed = true;
return true;
}
/**
commit all uncommitted transactions
@param userID String
*/
public void commitAll(String userID) {
List<String> kList = keysLocked.get(userID);
if (auto || kList == null || oCache.size() == 0) return;
for (String key : kList) if (oCache.remove(key) != null) committed = true;
}
/**
rollback() rollbacks the LAST modified/added action
@param userID String
@param key String, key of object to be rollbacked
@return boolean true if rollback is successful, false: unknown key, no rollback
*/
public boolean rollback(String userID, String key) {
List<String> kList = keysLocked.get(userID);
if (auto || kList == null || !kList.contains(key) || oCache.size() == 0) return false;
byte[] bb = oCache.remove(key);
if (bb == null) return false;
if (bb.length == 0) { // add
keysList.remove(key);
cache.remove(key);
return true;
}
// update or delete. Restore key if it's delete
if (!keysList.contains(key)) keysList.add(key);
cache.put(key, bb);
return true;
}
/**
rollback all uncommitted transactions
@param userID String
*/
public void rollbackAll(String userID) {
List<String> kList = keysLocked.get(userID);
if (auto || kList == null || oCache.size() == 0) return;
for (String key : kList) {
byte[] bb = oCache.remove(key);
if (bb != null) {
if (bb.length == 0) {
keysList.remove(key);
cache.remove(key);
} else { // update or delete. Restore key if it's delete
if (!keysList.contains(key)) keysList.add(key);
cache.put(key, bb);
}
}
}
}
/**
open NanoDB.
<br>If the specified fName from Constructor does not exist, it will be created with this fName.
@exception Exception thrown by JAVA
*/
public void open() throws Exception {
cache.clear();
oCache.clear();
keysList = Collections.synchronizedList(new ArrayList<String>(512));
pointers = new ConcurrentHashMap<>(512);
sizes = new ConcurrentHashMap<>(512);
// load keysList, pointers and sizes
existed = (new File(fName)).exists();
raf = new RandomAccessFile(fName, "rw");
fLocked = raf.getChannel().lock();
// start watchdog
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() { // watch for the unexpected
if (raf != null) try {
close( );
} catch (Exception ex) { }
}
});
if (!existed) return; // new NanoDB
// cached only if size < 2MB (2097152)
cached = raf.length() < 0x200000;
long pt = (long)raf.readInt();
byte[] all = new byte[(int)(pt-4)];
raf.read(all); // get the KeysList block
// keys block-format: keyLength + dataLength + key = 2+4+n bytes
for (int kl, dl, d, i = 0; i < all.length; i += (6+kl)) {
// compute key Length and Data leng
kl = (((int)(all[i]) & 0xFF) << 8) |(((int)all[i+1]) & 0xFF);
dl = (((int)(all[i+2]) & 0xFF) << 24)|(((int)(all[i+3])& 0xFF) << 16)|
(((int)(all[i+4]) & 0xFF) << 8) |(((int)all[i+5]) & 0xFF);
// cache keysList and pointers and sizes
String key = new String(all, i+6, kl, cs);
if (cached) { // cached
byte[] bb = new byte[dl];
raf.read(bb); // read
cache.put(key, bb);
}
pointers.put(key, pt);
sizes.put(key, dl);
keysList.add(key);
pt += dl;
}
}
/**
close and save NanoDB.
<br>If not autoCommit, all changes must be committed before close.
<br>Otherwise all changes will be lost (empty NanoDB file if it must be created)
@exception Exception thrown by JAVA
*/
public void close() throws Exception {
if (!committed) {
fLocked.release();
raf.close();
} else { // committed, rollback the Uncommitted
if (committed && oCache.size() > 0) { // something in oCache
List<String> keys = new ArrayList<>(oCache.keySet());
for (String key:keys) { // recover the uncommitted
byte[] bb = oCache.remove(key);
if (bb.length > 0) { // It's update
if (cache.containsKey(key)) cache.replace(key, bb);
else { // It's delete key
keysList.add(key);
cache.put(key, bb);
}
} else { // It's add
cache.remove(key);
keysList.remove(key);
}
}
}
if (cache.size() > 0) {
ConcurrentHashMap<String, Long> pts = new ConcurrentHashMap<>(pointers.size());
ConcurrentHashMap<String, Integer> szs = new ConcurrentHashMap<>(sizes.size());
String tmp = String.format("%s_tmp", fName);
if (!existed || cached) {
fLocked.release();
pts = pointers;
szs = sizes;
tmp = fName;
raf.close(); // data in cache: so delete
if (cached) (new File(fName)).delete();
}
RandomAccessFile rTmp = new RandomAccessFile(new File(tmp), "rw");
ByteArrayOutputStream bao = new ByteArrayOutputStream(65536);
FileLock fL = rTmp.getChannel().lock();
// the key block
for (String key : keysList) {
int kl = key.length();
int dl = cache.containsKey(key)? cache.get(key).length:sizes.get(key);
bao.write(new byte[] { (byte)((kl & 0xFF00) >> 8),
(byte) (kl & 0xFF),
(byte)((dl & 0xFF000000) >> 24),
(byte)((dl & 0xFF0000) >> 16),
(byte)((dl & 0xFF00) >> 8),
(byte) (dl & 0xFF)
}
);
bao.write(key.getBytes(cs));
}
bao.flush();
long pt = 4+bao.size();
rTmp.write(new byte[] {(byte)(((int)pt & 0xFF000000) >> 24),
(byte)(((int)pt & 0xFF0000) >> 16),
(byte)(((int)pt & 0xFF00) >> 8),
(byte) ((int)pt & 0xFF)
}
);
rTmp.write(bao.toByteArray());
bao.close();
// now the data block
for (String key : keysList) {
byte[] bb = cache.get(key);
if (bb == null) {
bb = new byte[sizes.get(key)];
raf.seek(pointers.get(key));
raf.read(bb);
}
// save data
rTmp.write(bb, 0, bb.length);
szs.put(key, bb.length);
pts.put(key, pt);
pt += bb.length;
}
fL.release();
rTmp.close();
if (existed && !cached) {
fLocked.release();
raf.close();
File fi = new File(fName);
fi.delete(); // delete the old
TimeUnit.MICROSECONDS.sleep(20);
(new File(tmp)).renameTo(fi);
}
}
}
keysLocked.clear();
lockedList.clear();
keysList.clear();
pointers.clear();
oCache.clear();
sizes.clear();
cache.clear();
raf = null;
}
/**
removeLockedKeys from userID
@param userID String
*/
public void removeLockedKeys(String userID) {
List<String> kList = keysLocked.remove(userID); // get locked keyList
if (kList != null) for (String key:kList) lockedList.remove(key);
}
//---------------------------------------------------------------------------------------
private ConcurrentHashMap<String, List<String>> keysLocked = new ConcurrentHashMap<>(256);
private volatile boolean existed = false, cached = false, committed = false, auto = false;
private List<String> lockedList = Collections.synchronizedList(new ArrayList<>());
private ConcurrentHashMap<String, byte[]> oCache = new ConcurrentHashMap<>(256);
private ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>(256);
private ConcurrentHashMap<String, Integer> sizes;
private ConcurrentHashMap<String, Long> pointers;
private List<String> keysList;
private RandomAccessFile raf;
private FileLock fLocked;
private String fName;
private Charset cs;
}
NOTE:
- The key entry in the key area has 3 fields: key length, data/record size, key itself
- The data area contains the records, which can be serializable objects or strings
- The key of addObject is automatically locked and must be released (unlocked) by the owner for others
- The key of deleteObject remains locked until it is released (unlocked) by the owner.
- Data must be committed before closed or saved. Otherwise data will be lost.
End of Part I