Welcome!

Dave Jilk

Subscribe to Dave Jilk: eMailAlertsEmail Alerts
Get Dave Jilk via: homepageHomepage mobileMobile rssRSS facebookFacebook twitterTwitter linkedinLinkedIn


Related Topics: Cloud Computing, Azure Cloud on Ulitzer, Platform as a Service, Salesforce.com Journal, Google App Engine

Article

Different Approaches to Databases in PaaS: A Round-Up

Most PaaS offerings provide some kind of database services

In an application deployed directly on IaaS, you know and control everything about the database; in a SaaS application you know little and control nothing.

But how does it work in PaaS?

Since a PaaS is essentially a container that runs application code, and virtually every application requires a persistent data store, most PaaS offerings provide some kind of database services. Not surprisingly, Resource PaaS offerings most closely resemble SaaS in that they hide more deployment details, while Server PaaS offerings are more flexible but potentially more complex. (For more on Resource PaaS vs. Server PaaS, see Keys to the PaaSing Game: Multi-Tenancy.) What is surprising is that some Resource PaaS offerings use a proprietary and non-standard database. Let's take a closer look at how several of the leading Platform-as-a-Service offerings handle databases and file system access for applications running on them.

Google's AppEngine
The Google AppEngine PaaS provides permanent storage only through a system called "BigTable," a highly scalable non-relational data store, accessed through a limited SQL-like language called GQL. Because AppEngine only allows applications to use its provided API libraries, using a third-party database or database service is possible but awkward; the code must funnel requests through a URL-fetching API that is subject to various limitations. Although there is no direct access to a filesystem, file-like storage can be achieved through a specialized API built on BigTable. Recognizing that the lack of a relational database has been slowing the adoption of AppEngine, Google released a preview version of Google Cloud SQL in October 2011. Cloud SQL is a database-as-a-service (DBaaS) based on MySQL, offering nearly all MySQL commands, except those involving files. It automatically handles its own variant of geographic replication, but does not allow MySQL replication. During the preview, Cloud SQL databases are limited to 10 GB in size, and can only be accessed through AppEngine applications. The former constraint seems likely to be relaxed when the preview period ends; it is not clear if - or when - the latter will be.

Force.com
Salesforce.com's Force.com also provides a built-in non-relational object database, which was recently split out into a DBaaS called database.com. Database.com is accessed through the proprietary Apex language from within Force.com, or via API calls from outside, using a SQL-like language called SOQL. Configuration is limited to tables and fields only. Force.com applications can only access external databases or file stores through HTTP callouts, which are subject to certain limitations. It seems unlikely that Salesforce.com will offer a relational data store with Force.com, since they have long claimed that their approach is superior. Nevertheless, this limits the customer's ability to port applications in or out, find developers experienced in the approach, or build capabilities that require true relational power.

Microsoft Azure
Microsoft Azure offers access to the SQL Azure DBaaS, a high-performance, scalable, and fully managed service. Azure applications cannot access external databases, and external applications cannot access SQL Azure; however, SQL Azure databases can be synchronized with on-premises SQL Server databases. Developers can only access database-level configuration, and while there is no filesystem access, file-like storage is available in Azure through a blob-based emulation system. Like AppEngine and Force.com, Azure is something of a walled garden, but in Microsoft's garden at least you have all the basic food groups.

Heroku
Heroku
treats the database as distinct from the application container, allowing the application to use any database or database service. It also provides a built-in DBaaS, freeing developers from the need to provision and manage the database deployment. The Heroku DBaaS is based on PostgreSQL, can be operated in a shared or dedicated mode, and can be accessed from external applications as well. Developers have unlimited access to the database (via a database client command line), but no access to database deployment configuration settings or versions, in keeping with the Resource PaaS approach. For similar reasons, Heroku does not allow applications to write to the filesystem; a third-party service or a database-mapped approach is required.

CloudFoundry.com
CloudFoundry.com
has a DBaaS that supports four different databases natively: MySQL and PostgreSQL for relational, and MongoDB and Redis for NoSQL. Developers have access only to the database and not to its configuration settings. External databases can be accessed via APIs using an HTTP proxy; in the future, a Service Broker is promised that will enable more direct access to external databases and other services. With the recently added Caldecott tool, customers can also tunnel into their CloudFoundry.com database service from the outside. Although it is possible to write to the filesystem in CloudFoundry.com, the files are actually ephemeral and should not be relied upon as a data store. Note, by the way, that CloudFoundry.com is a service that uses the Cloud Foundry open source software as one of its foundational technologies, but the two are distinct.

Server PaaS Options
Server PaaS offerings, like AWS Elastic Beanstalk, RightScale, Engine Yard, and Standing Cloud, generally have fewer constraints on the database to be used by the application. Filesystem access by the application is unimpaired in all of these Server PaaS offerings.

Both Engine Yard and Standing Cloud automate deployment and management of supported databases, primarily MySQL or PostgreSQL of particular versions. Both also allow database-level and server-level configuration changes, with certain limits. Engine Yard relies on Chef for database deployment, so configuration changes must be performed with a Chef recipe or by working around it. Standing Cloud requires that configuration changes be embedded in a post-deployment script and that the basic deployment structure (filenames and directory paths) remains unchanged. Unsupported databases can also be used by applications running in Engine Yard or Standing Cloud, but they must be deployed and managed separately.

Elastic Beanstalk always requires the developer to deploy and manage the database separately. This can be done within Amazon Web Services using their RDS (Relational Database Service), but it is a separate step that is outside of the PaaS proper. RightScale offers tools (server templates and scripts) that make deployment, integration, and management of the database easier and repeatable, but generally you are on your own. As compensation for the extra effort, of course, you gain complete flexibility.

You may have noticed that I covered these offerings in order of increasing flexibility. When you reached the point in this article where the constraints of the PaaS did not seem too onerous, that suggests a good place to start if you are considering building or moving your application to the cloud. Keep in mind that you may also have different needs for development and test than you do for production - in some cases, scale is crucial; in others, control - and these needs change over time. Because of the inevitability of change, flexibility is important. That's why I recommend that you avoid building an application using a database environment where you would be permanently locked in.

More Stories By Dave Jilk

Dave Jilk has an extensive business and technical background in both the software industry and the Internet. He currently serves as CEO of Standing Cloud, Inc., a Boulder-based provider of cloud-based application management solutions that he cofounded in 2009.

Dave is a serial software entrepreneur who also founded Wideforce Systems, a service similar to and pre-dating Amazon Mechanical Turk; and eCortex, a University of Colorado licensee that builds neural network brain models for defense and intelligence research programs. He was also CEO of Xaffire, Inc., a developer of web application management software; an Associate Partner at SOFTBANK Venture Capital (now Mobius); and CEO of GO Software, Inc.

Dave earned a Bachelor of Science degree in Computer Science from the Massachusetts Institute of Technology.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.