the DistributedDB.net Wiki

Main Page

Modified: 2008/06/30 01:29 by jdavis - Categorized as: Home

Welcome to the DistributedDB.net Wiki!



This is a work in progress, currently in early planning phases. Visit Jon's blog to learn about the initial ideas for this project.



DistributedDB.net is a project to provide an Internet-distributed virtual database system for generic, publicly definable, distributed data in RDBMS-style tabular format.

The premise of DistributedDB.net (iDDB) is a service and software solution that:

  1. Allows people to declare public schemas,
  2. Allows services to "subscribe" to CRUD operations on the schemas using standard APIs such as REST, and
  3. Optionally implements a first subscription to CRUD operations.
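The three parts above can be sketched in a few lines of Python. This is only an illustration of the concept, not an iDDB API: the names `CrudEvent`, `SchemaRegistry`, `InMemoryStore`, and the example `books` schema are all hypothetical, and a real service would expose these calls over something like REST rather than in-process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical; iDDB defines no concrete API yet.


@dataclass(frozen=True)
class CrudEvent:
    schema: str     # name of the public schema the event targets
    operation: str  # "create", "read", "update" or "delete"
    record: dict    # the row payload


Subscriber = Callable[[CrudEvent], None]


@dataclass
class SchemaRegistry:
    schemas: Dict[str, Dict[str, type]] = field(default_factory=dict)
    subscribers: Dict[str, List[Subscriber]] = field(default_factory=dict)

    def declare(self, name: str, columns: Dict[str, type]) -> None:
        """Part 1: declare a public schema as column name -> Python type."""
        self.schemas[name] = dict(columns)
        self.subscribers.setdefault(name, [])

    def subscribe(self, name: str, handler: Subscriber) -> None:
        """Part 2: register a service callback for CRUD events on a schema."""
        self.subscribers[name].append(handler)

    def publish(self, event: CrudEvent) -> int:
        """Check the record against the schema, then notify every subscriber."""
        columns = self.schemas[event.schema]
        for col, value in event.record.items():
            if col not in columns or not isinstance(value, columns[col]):
                raise ValueError(f"bad column {col!r} for schema {event.schema!r}")
        for handler in self.subscribers[event.schema]:
            handler(event)
        return len(self.subscribers[event.schema])


class InMemoryStore:
    """Part 3: an optional 'first subscription' that actually keeps the rows."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def __call__(self, event: CrudEvent) -> None:
        if event.operation == "create":
            self.rows.append(dict(event.record))
        elif event.operation == "delete":
            self.rows = [r for r in self.rows if r != event.record]


registry = SchemaRegistry()
registry.declare("books", {"title": str, "year": int})
store = InMemoryStore()
registry.subscribe("books", store)
registry.publish(CrudEvent("books", "create", {"title": "Dune", "year": 1965}))
```

In this sketch the store is just one more subscriber, which is what makes the "first subscription" optional: a schema with no storing subscriber is still a valid schema.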

See this early concept in introductory detail: The Three Parts of DistributedDB.net

CRUD operations are strictly virtualized, authenticated events that propagate to subscribers. A subscriber might update a data store, or it might instead execute SaaS methods. Similarly, CRUD queries can be redirected, for example to allow a geographically or contextually optimized data store to take over the operation with the client before the result is optionally replicated.
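As a rough sketch of the redirection idea, the router below answers a query with the address of the store that should take it over, chosen by the client's region. Everything here is an assumption made for illustration: the `Query`, `Redirect`, and `QueryRouter` names, the region codes, and the `example.org` URLs are hypothetical, and authentication and replication are left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical types; iDDB does not define a redirect format.


@dataclass(frozen=True)
class Query:
    schema: str
    client_region: str  # e.g. "eu" or "us"; the region scheme is assumed


@dataclass(frozen=True)
class Redirect:
    store_url: str  # where the client should continue the operation


class QueryRouter:
    def __init__(self, default_url: str) -> None:
        self.default_url = default_url
        # (schema, region) -> URL of a store optimized for that region
        self.regional: Dict[Tuple[str, str], str] = {}

    def register_store(self, schema: str, region: str, url: str) -> None:
        self.regional[(schema, region)] = url

    def route(self, query: Query) -> Redirect:
        """Prefer a regional store for this schema, else the default one."""
        key = (query.schema, query.client_region)
        return Redirect(self.regional.get(key, self.default_url))


router = QueryRouter(default_url="http://store.example.org/")
router.register_store("books", "eu", "http://eu.store.example.org/")
eu_target = router.route(Query("books", "eu"))
us_target = router.route(Query("books", "us"))
```

Here `eu_target` points at the regional store and `us_target` falls back to the default, since no store was registered for that region.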

In introductory detail: Options Of A Schema

The objectives for this project include:

  • To optionally distribute data to multiple servers for redundancy and/or performance such as geographically targeted multi-homing of data services
  • To optionally replicate public data to archive services that would consistently maintain data even if another data host abandons the sharing of data services
  • To genericize the process of performing CRUD operations with data services, making the choice among MySQL, Oracle, MSSQL, XML, and other data store solutions a moot discussion by completely abstracting them behind generic, open-source querying APIs.
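The last objective, hiding the data store behind a generic API, might look something like the following. This is a minimal sketch under stated assumptions: the `DataStore` interface and both backends are hypothetical, SQLite stands in for "any RDBMS" only because it ships with Python, and table and column names are interpolated into SQL without validation, which a real implementation would have to guard against.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List


class DataStore(ABC):
    """Hypothetical generic interface; callers never see the backend."""

    @abstractmethod
    def create(self, table: str, record: dict) -> None: ...

    @abstractmethod
    def read(self, table: str) -> List[dict]: ...


class ListStore(DataStore):
    """Backend 1: plain Python lists held in memory."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def create(self, table: str, record: dict) -> None:
        self.tables.setdefault(table, []).append(dict(record))

    def read(self, table: str) -> List[dict]:
        return [dict(r) for r in self.tables.get(table, [])]


class SqliteStore(DataStore):
    """Backend 2: an in-memory SQLite database."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = sqlite3.Row

    def create(self, table: str, record: dict) -> None:
        cols = list(record)
        col_sql = ", ".join(f'"{c}"' for c in cols)
        marks = ", ".join("?" for _ in cols)
        # Sketch only: identifiers are not validated before interpolation.
        self.conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
        self.conn.execute(
            f'INSERT INTO "{table}" ({col_sql}) VALUES ({marks})',
            [record[c] for c in cols],
        )

    def read(self, table: str) -> List[dict]:
        # Unlike ListStore, this raises if the table was never created.
        return [dict(row) for row in self.conn.execute(f'SELECT * FROM "{table}"')]


def post_book(store: DataStore) -> List[dict]:
    """Client code is identical whichever backend is plugged in."""
    store.create("books", {"title": "Dune", "year": 1965})
    return store.read("books")


same_result = post_book(ListStore()) == post_book(SqliteStore())
```

The point of the sketch is `post_book`: it depends only on the generic interface, so swapping the backend does not change the calling code.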

Despite what the brief descriptions above might suggest, the target "audience" of this project is not enterprise applications but the "public web". The end mission is to allow tabular data to be "posted to the iDDB" in much the same way that discussion posts are posted to newsgroups on Usenet.



Other Distributed Database Systems

Systems worth noting: MapReduce, Hadoop

Distributed database technology is a vaguely defined science that is relatively new to database computing, owing to the young age of the Internet itself. The vagueness also arises because requirements for distributed databases vary from one application to the next. Enterprise applications might define distributed databases as being synonymous with grid computing, which is inherently supported by some data store implementations such as MySQL. These systems are also briefly researched and documented in various resources:


But there are other, less stringent definitions of distributed databases, such as peer-to-peer networks.

Internet Systems

Technically, the Internet operates entirely on distributed databases. Look at DNS. It is a distributed database system that can be run privately but also happens to have a public network. It is, however, limited to IP host, namespace, and service identities.

NNTP is also a distributed database system that happens to have a public network called Usenet, but it is limited to discussion data.

The World Wide Web, as an Internet application, is itself a non-replicating distributed database. It uses URIs as the naming convention for data identifiers and the whole of the Internet as a common network (thanks to URIs, which so many other Internet applications also support), but it has no structure other than HTML and the technologies that go with it.

RSS & Atom, the whole semantic web idea, is a distributed database infrastructure, but the data is very specifically tied to an application (syndication), and here we have a distributed database that has no common network. Instead, we have lots of small networks, with lots of "hubs" like Ping servers.

Peer-to-peer applications like the original Napster, Kazaa, et al. are definitely distributed databases, but they are intended for fetching single binary objects such as multimedia files, and they run on proprietary networks.

BitTorrent is a distributed database system similar to Kazaa which, like RSS & Atom, hijacks multiple networks such as the World Wide Web for distribution seeding. However, it is only for fetching single, very large binary objects, and it is also a proprietary, trademarked application despite its popularity and near-ubiquitous adoption. The Azureus network, which is essentially BitTorrent with a popular face, has a wiki with a notable definition of the distributed database implementation that it runs on.

On The Need For Public Distributed Tabular Data

There are tens of thousands of Internet services for data storage and querying, but there is still no public network for tabular data. So the idea for DistributedDB.net was to push for the idea of, heck, why can't any Joe Schmoe just throw something out there on the Internet's distributed database, in his own little table, and be able to call on it and allow others to call on it, without actually hosting it, in the same way one can do as much with Usenet or P2P networks? Specifically this is about structured, tabular data--strings, integers, records, tables, the kind of data you see in an RDBMS rather than a binary download.

That is the idea of DistributedDB.net. It will implement a seeds database for various iDDB-supporting applications. Those applications can then share tabular data in a fashion similar to Usenet or BitTorrent: less peer-level than BitTorrent (nodes would consist of services rather than personal computers) but less human-conversation-oriented than Usenet.

In fact, the end-result network applications of Usenet (NNTP), Azureus (BitTorrent), e-mail (SMTP), RSS feeds and blog pings, and more could theoretically be reimplemented on the iDDB infrastructure. The result probably would not be as performant, but it would theoretically be far more manageable at each layer of implementation.

 

This site is hosted on port 880 temporarily, to host it on a free Internet connection that blocks port 80. It will be moved to a stable port-80 server when this project gains some momentum.