May 18 2010

Amazon RDS – Multi-AZ Deployments For Enhanced Availability & Reliability

“Amazon RDS simplifies many of the common tasks associated with the deployment, operation, and scaling of a relational database. You don’t have to worry about acquiring and installing hardware, loading an operating system, installing and configuring MySQL, or managing backups. In addition, scaling the processing power or storage space available to your database is as simple as an API call.

When we rolled out Amazon RDS last October, we also announced plans to have a “High Availability” option in the future. That option is now ready for you to use, and it’s called “Multi-AZ Deployments.” AZ is short for “Availability Zone”; each of the four AWS Regions is comprised of two or more such zones, each with independent power, cooling, and network connectivity.

[...]

It is really easy to benefit from the enhanced availability and data durability provided by a DB Instance deployment that spans multiple Availability Zones. All you need to do is supply one additional parameter to the CreateDBInstance function and Amazon RDS will take care of the rest.

To be more specific, when you launch a DB Instance with the “Multi-AZ” parameter set to true, Amazon RDS will create a primary in one Availability Zone, and a hot standby in a second Availability Zone in the same Region. Data written to the primary will be synchronously replicated to the standby. If the primary fails, the standby becomes the primary and a new standby is created automatically. Amazon RDS automatically detects failure and takes care of all of this for you. The entire failover process takes about five minutes. In addition, existing standard DB Instance deployments can be converted to Multi-AZ deployments by changing the Multi-AZ parameter to true with the ModifyDBInstance function (a hot standby will be created for your current primary).
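
As a rough illustration of the "one additional parameter" point, the sketch below builds the query parameters for a CreateDBInstance request and for a ModifyDBInstance request with MultiAZ set to true. The instance identifier, instance class, engine string, and credentials are placeholders chosen for the example, and request signing and the HTTP call itself are deliberately left out, so this shows the shape of the request rather than a working client.

```python
from urllib.parse import urlencode


def create_db_instance_params(identifier, multi_az=False):
    """Build query parameters for a CreateDBInstance request.

    Every value here is an illustrative placeholder, not a recommendation.
    """
    params = {
        "Action": "CreateDBInstance",
        "DBInstanceIdentifier": identifier,
        "DBInstanceClass": "db.m1.small",   # placeholder instance class
        "Engine": "MySQL",                  # placeholder engine string
        "AllocatedStorage": "10",
        "MasterUsername": "admin",
        "MasterUserPassword": "change-me",
    }
    if multi_az:
        # The one extra parameter that requests a standby in a second zone.
        params["MultiAZ"] = "true"
    return params


def modify_to_multi_az_params(identifier):
    """Build query parameters that convert an existing instance to Multi-AZ."""
    return {
        "Action": "ModifyDBInstance",
        "DBInstanceIdentifier": identifier,
        "MultiAZ": "true",
    }


# Unsigned query strings; a real request would also need authentication.
print(urlencode(create_db_instance_params("mydb", multi_az=True)))
print(urlencode(modify_to_multi_az_params("mydb")))
```

The only difference between the standard and Multi-AZ create requests in this sketch is the single MultiAZ entry.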

When automatic failover occurs, your application can remain unaware of what’s happening behind the scenes. The CNAME record for your DB instance will be altered to point to the newly promoted standby. Your MySQL client library should be able to close and reopen the connection in the event of a failover. If your application needs to know that a failover has occurred, you can use the function to check for the appropriate event.
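
The close-and-reopen behaviour described above can be sketched as a small retry wrapper. This is a minimal outline under stated assumptions: `connect` and `operation` are hypothetical callables supplied by the application, and the built-in `ConnectionError` stands in for whatever exception a real MySQL driver raises when the connection drops.

```python
import time


def run_with_reconnect(connect, operation, retries=5, delay=1.0):
    """Run operation(conn), opening a fresh connection after each failure.

    connect:   zero-argument callable that returns a new connection
    operation: callable that takes a connection and returns a result
    """
    conn = None
    for attempt in range(retries + 1):
        try:
            if conn is None:
                # Connecting by hostname again lets the client follow the
                # DNS record once it points at the promoted standby.
                conn = connect()
            return operation(conn)
        except ConnectionError:
            conn = None  # discard the broken connection
            if attempt == retries:
                raise
            time.sleep(delay)
```

Because a failover can take minutes, the retry count and delay in a real application would need to cover that whole window.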

If you have set up an Amazon RDS DB Instance as a Multi-AZ deployment, automated backups are taken from the standby to enhance DB Instance availability (by avoiding I/O suspension on the primary). The standby also plays an important role in patching and DB Instance scaling. In order to minimize downtime during planned maintenance, patches are installed on the standby and then an automatic failover makes the standby into the new primary. Similarly, scaling to a larger DB Instance type takes place on the standby, followed by an automatic failover.

Multi-AZ deployments also offer enhanced data protection and reliability in unlikely failure modes. For example, in the unlikely event a storage volume backing a Multi-AZ DB Instance fails, you are not required to initiate a Point-in-Time restore to the LatestRestorableTime (typically five minutes prior to the failure). Instead, Amazon RDS will simply detect that failure and promote the hot standby where all database updates are intact.
Putting it all together, this new feature means that your AWS-powered application can remain running in the face of a disk, DB Instance, or Availability Zone failure. Once again, you can focus on your application and let AWS handle the “dirty work” for you.

While you cannot use the synchronous standby in a Multi-AZ deployment to serve read traffic, we are also working on a Read Replica feature. This feature will make it easier to take advantage of MySQL’s built-in asynchronous replication functionality if you need to scale your read traffic beyond the capacity of a single DB Instance. You’ll be able to provision multiple “Read Replicas” for a given source DB Instance.

– Jeff;”

Source


May 6 2010

A Few Reasons to Switch to PostgreSQL 9.0

PostgreSQL is widely considered to be the main open source competitor to MySQL. It’s been five years since the release of PostgreSQL 8.0, but this week, PostgreSQL developers showed that their project has evolved significantly in that time. The beta release of PostgreSQL 9.0 has burst onto the scene with handsome new features such as Hot Standby and Streaming Replication, which make switching from MySQL to PostgreSQL even more attractive.

1. Hot Standby
This feature allows connection to a server that is in archive recovery mode. Users with a server in recovery mode can still process read-only queries and move to normal operations without disconnecting. There are only a few usage and administrative differences when a server is running queries in recovery mode. Hot Standby greatly improves tasks such as log shipping replication and precise restoration of a backup state.

2. Streaming Replication
Streaming replication allows a standby server to connect to the primary server and receive a stream of WAL (Write Ahead Log) records as they are generated, instead of having to wait for those records to be written to disk and picked up later. This allows the standby to be more up to date than it is with file-based log shipping. Streaming replication is asynchronous, so there’s a slight delay between committing a transaction in the primary and seeing it in the standby, but it’s much shorter than with file-based shipping. Thanks to streaming replication, archive_timeout is no longer required to reduce the data loss window.

3. A Few Other Features
The LISTEN/NOTIFY events in PostgreSQL 9.0 have been moved from a system table to a memory queue for better performance. NOTIFY is now able to pass an optional string to listeners. PostgreSQL 9.0 also includes SQL compliant per-column triggers and anonymous functions using the DO statement. Server side language support has been enhanced as well.
EXCLUDE can now be used in the CREATE TABLE statement as a non-traditional constraint. In the documentation you can find an example of how EXCLUDE can ensure that no two records contain overlapping circles.

4. More in Store
If you’re considering migrating to a project, where it is going is just as important as (if not more important than) where it currently is. A blog from Robert Haas indicates that the PostgreSQL project will continue adding features that distinguish it from competitors. There are several big patches planned for versions 9.1, 9.2, and beyond. KNNGIST will allow the use of indices to accelerate ORDER BY queries. Partitioning support will be improved with built-in syntax to reduce the amount of manual setup required. Another feature will be index-only scans, which reduce I/O for index scans by intelligently skipping or delaying tuple visibility checks.

Source


Apr 20 2010

NoSQL Needed For Cloud-Sized Data

At the Under the Radar showcase for cloud start-ups, I was struck by how the relational database, one of the defining technologies of a previous era, has become outmoded in this one. In example after example, it was obvious SQL and structured data tables are no longer the right way to go about handling data.

That statement has to do with a particular type of data, the kind that gets generated copiously in a day’s activity on the Internet. Each day sees 15 million tweets, 60 million Facebook updates and 1.6 billion people active online in a variety of other ways. It’s hard for relational systems to keep up. Relational systems have to work hard at decomposing this data, storing it in tables and building indexes on it — they work so hard on it that you don’t really want your system to undertake the task. It’s too expensive.

“When you scale up relational systems, you introduce single points of failure… You lose the advantage of their precision but you gain the overhead,” as you try to make the system work on a larger and larger data set, said John Quinn, VP of engineering at Digg, the social networking site, and lead off speaker at the Under the Radar’s cloud event April 16 on the Microsoft Campus in Mountain View, Calif.

Those NoSQL systems you’ve been hearing about, on the other hand, scale out by distributing their operations across more nodes in a server cluster. “There’s nothing wrong with relational database…You just need to use the right tool for the right job,” Quinn said, throwing in the fact that NoSQL stands for “Not Only SQL,” although there were a few knowing smiles at that one.

Quinn is a leading member of the generation that doesn’t want to try to capture terabytes of data with relational systems. He prompted the changeover from the MySQL open source relational database at the social networking site, Digg, to Cassandra, a key value store system. Cassandra performs many of the data sorting operations of a relational database but allows data reads to be done in advance of full updates. The practice sometimes leads to momentary consistency problems, since one user of the data might get a version that differs slightly from the next one, although both sought identical sets.

The large, distributed key value store system “sacrifices consistency to slave lag,” or tolerates the lapse between when an update occurs on a distributed node and when it’s replicated on other servers. In most NoSQL systems, assured consistency is less an issue — and less a virtue — than in relational systems.

The NoSQL approach allows “tune-able consistency. You can trade off consistency for speed,” Quinn noted.
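
One common way to picture this trade-off is with read and write quorums: if a write reaches W of N replicas and a read consults R of them, the read is guaranteed to see the latest write whenever R + W > N. The toy simulation below illustrates that rule. It is not Cassandra's implementation, and the class and function names are invented for the example.

```python
import random


class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None


def write(replicas, value, version, w):
    """Apply the write to w randomly chosen replicas; the rest lag behind."""
    for replica in random.sample(replicas, w):
        replica.version = version
        replica.value = value


def read(replicas, r):
    """Ask r randomly chosen replicas and return the newest value seen."""
    contacted = random.sample(replicas, r)
    newest = max(contacted, key=lambda rep: rep.version)
    return newest.value


def fresh_read_rate(n, w, r, trials=1000):
    """Fraction of trials in which a read returns the latest write."""
    fresh = 0
    for _ in range(trials):
        replicas = [Replica() for _ in range(n)]
        write(replicas, "old", 1, n)   # every replica starts with the old value
        write(replicas, "new", 2, w)   # the update reaches only w replicas
        if read(replicas, r) == "new":
            fresh += 1
    return fresh / trials


print(fresh_read_rate(n=3, w=2, r=2))  # R + W > N: reads and writes overlap
print(fresh_read_rate(n=3, w=1, r=1))  # faster, but reads can be stale
```

Lowering W and R means fewer replicas to wait for on each operation, which is the speed side of the trade; the price is that some reads return the older value.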

Because a server in a NoSQL system automatically creates duplicates of the data on at least one other node, a server in the cluster can fail and no data is lost; the NoSQL system keeps processing and the application keeps running. In addition to Cassandra, MongoDB, Voldemort, and CouchDB are NoSQL systems in the public arena. Google and Amazon operate their own internally.

Quinn did implicitly point to a potential NoSQL shortcoming. Although indexes are associated with relational systems, if you do need an index, you may need an external system to build it. So far, the NoSQL systems have only rudimentary indexing.

That’s why the NoSQL enthusiasts say their systems are not for financial or other time-sensitive transactions. Relational systems are. On the other hand, if you’re updating your Zynga Farmville plot, then Cassandra makes a lot of sense for capturing that information.

Of 24 companies presenting at this event, six had big data handling, analytics or storage systems in mind. They included Sones, Cloudant, GenieDB, GoodData, neotechnology and Maxiscale.

Each start-up presented its business and product plans in six minutes at the event, then faced questioning from a three-judge panel of reviewers.

Source


Apr 12 2010

Sun MySQL Head Joins EnterpriseDB

Karen Tegan Padir, the head of MySQL development at Sun Microsystems, did not join Oracle when Sun was merged into the database company at the end of February. Instead, she joined another open source database company, EnterpriseDB.

Padir was the figure who, in the midst of Oracle takeover skepticism, stood up and said Oracle had established a record of dealing with open source code communities and projects and MySQL users were going to have to keep their trust in Oracle. That was during the MySQL annual user group meeting in April 2009. Sometime later, as the long takeover process reached its completion, she packed her bags and headed east.

EnterpriseDB is located in Westford, Mass., and Padir’s family lives nearby. EnterpriseDB is a commercial product based on the PostgreSQL open source database system, a system that is more ANSI SQL compliant than MySQL. It is one of the oldest open source projects, with a wide body of contributors and committers, but it never achieved the marketplace acceptance of the easier-to-use MySQL.

“Step one will be to broaden its appeal,” Padir said in an interview Friday. “Postgres Plus works really well in both a transactions environment and a Web serving environment,” she said.

She said EnterpriseDB is increasing its commitment to the open source project by hiring a fifth committer, Robert Haas. EnterpriseDB already employs Bruce Momjian, lead coordinator of the project, and Dave Page, a leading architect and committer.

Padir will be head of engineering and product marketing for EnterpriseDB’s Postgres Plus. She formerly headed up Sun’s development and marketing of Sun’s market-leading identity management server, Glassfish, and other Java middleware as VP of MySQL and software infrastructure at Sun.

Source


Mar 22 2010

Exploiting hard filtered SQL Injections

While participating in some CTF challenges like Codegate10 or OWASPEU10 recently, I noticed that it is extremely trendy to build SQL injection challenges with very tough filters that can be circumvented thanks to the flexible MySQL syntax. In this post I will show some example filters and how to exploit them, which may also be interesting when exploiting real-life SQL injections that seem unexploitable at first glance.
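
To give a feel for the kind of filter bypass the post describes, here is a small sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it runs anywhere, and the filter, table, and payload are invented for illustration rather than taken from the post. The filter rejects whitespace, yet SQL comments can stand in for the missing spaces; a bound parameter avoids the problem entirely.

```python
import sqlite3


def naive_filter(value):
    """A deliberately weak filter: reject any input containing whitespace."""
    if any(ch.isspace() for ch in value):
        raise ValueError("whitespace not allowed")
    return value


def vulnerable_lookup(conn, name):
    # String concatenation: the filtered input still becomes part of the SQL.
    sql = "SELECT name FROM users WHERE name = '" + naive_filter(name) + "'"
    return conn.execute(sql).fetchall()


def safe_lookup(conn, name):
    # Bound parameter: the input is treated as data, never as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Empty comments take the place of the spaces the filter forbids.
payload = "'/**/OR/**/'1'='1"

print(vulnerable_lookup(conn, payload))  # the filter passes it; all rows match
print(safe_lookup(conn, payload))        # no user has this literal name
```

The point of the sketch is that blacklist-style filtering has to anticipate every alternative spelling the SQL dialect allows, while parameter binding does not depend on the input's contents at all.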

Source


Mar 11 2010

Saying Yes to NoSQL; Going Steady with Cassandra

The last six months have been exciting for Digg’s engineering team. We’re working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we’re also rolling out a new client and server architecture. And if that doesn’t sound like a big enough challenge, we’re replacing most of our infrastructure components and moving away from LAMP.

Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who’s been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.

What’s Wrong with MySQL?

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.

Relational database technology can be a blunt instrument and we’re motivated to find a tool that matches our specific needs closely. Our domain area, news, doesn’t exact strict consistency requirements, so (according to Brewer’s theorem) relaxing this allows gains in availability and partition tolerance (i.e. operations completing, even in degraded system states). We’re confident that our engineers can implement application level consistency controls much more efficiently than MySQL does generically.

As our system grows, it’s important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.

Choosing an Alternative

Digg is committed to the use and development of open source software and we’re keen to avoid the cost of proprietary large-scale storage solutions. We were inspired by Google and Amazon’s broad use of their non-relational BigTable and Dynamo systems. We evaluated all the usual open source NoSQL suspects. After considerable debate, we decided to go with Cassandra.

Put simply, Cassandra is a distributed database with a BigTable data model running on a Dynamo-like infrastructure. It is column-oriented and allows for the storage of relatively structured data. It has a fully decentralized model; every node is identical and there is no single point of failure. It’s also extremely fault tolerant; data is replicated to multiple nodes and across data centers. Cassandra is also very elastic; read and write throughput increase linearly as new machines are added.
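
The decentralized, replicated layout described above is often explained with a hash ring: each node owns a position on the ring, and a key is stored on the next few nodes clockwise from its own position. The sketch below is a simplified illustration of that idea, not Cassandra's actual partitioner or replica placement logic; the node names, key, and class are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a ring position (MD5 is used only to spread keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A hash ring with one token per node and a fixed replication factor."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas_for(self, key):
        """Walk clockwise from the key's token and collect the next nodes."""
        start = bisect.bisect_right(self.tokens, token(key))
        count = min(self.replication_factor, len(self.entries))
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas_for("story:12345"))
```

Any node can compute this placement locally, so no central coordinator is needed, and losing one node still leaves the other replicas of each key available.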

We experimented on our live site, replacing a relatively high scale MySQL component with a Cassandra alternative. These tests went well. You can read more about these experiments here.

Where We Are

At the time of writing, we’ve reimplemented most of Digg’s functionality using Cassandra as our primary datastore. We’ve supplemented Cassandra-based indexing using full text, relational and graph indexing systems. We’re getting used to dealing with eventual consistency.

We’ve been working on Cassandra itself too. We’ve made massive performance improvements: increased comparator speed, added better compaction threading, reduced logging overhead, added row-level caching and implemented multi-get capability. We’ve also implemented native atomic counters using Zookeeper (you can probably guess why we were motivated to add that feature :)

We’ve tested and improved the operational capabilities of Cassandra: we upgraded its Rackaware capability, added slow query logging, improved the bulk import functionality and implemented Scribe support for improved logging. We’ve also done a ton of operational testing.

We’re open sourcing all our work on Cassandra.

What’s Next?

Currently our main focus is getting Digg’s latest release into general availability, but we’ll continue to lead the way in championing Cassandra’s development and adoption.

If you’re interested in joining a world-class team using cutting edge, NoSQL technology at scale, check out http://jobs.digg.com

Take it easy,
John Quinn. VP Engineering. (Digg: doofdoofsf, Twitter: doofdoofsf)

Source


Mar 9 2010

Cloud Connect: A Convergence Of Expertise

The Cloud Connect conference March 15-18 will feature leaders of the NoSQL movement speaking on how to handle large data sets in the cloud. The NoSQL movement and other cloud practitioners are likely to be out in force at the Cloud Connect 2010 conference March 15-18 in Santa Clara, Calif., one of the first major gatherings of the year on cloud computing.

One of the workshop instructors March 15 will be Dwight Merriman, CEO and co-founder of 10gen and the architect of the DoubleClick ad serving system, DART. DART is now serving billions of ads a day. Merriman will instruct a first day workshop on MongoDB and why it and other NoSQL systems, such as CouchDB and Hadoop, are preferable to traditional database systems for operations in the cloud.

MongoDB is a cluster or cloud-based data management system that does not rely on relational database principles. Cloud users try to get away from relational databases for operations on large data sets because SQL queries tend to consume CPU cycles and “thrash the disk” as they pull data off it.

“NoSQL” systems work with data in memory, or upload chunks of data from many disks in parallel. 10gen is a New York-based company that sponsors the MongoDB open source project and provides commercial support for it.

Alistair Croll, an organizer of the event, said Merriman is one of several cloud computing professionals recruited to speak based on their credentials as “doers” in the cloud environment.

Another is Bradford Stephens, founder of Drawn to Scale, a firm which designs systems to deal with Web-sized masses of data. He will speak on “Introduction to Big Data and Storage at Scale” at 8:15-9:15 a.m. on March 18. His co-speaker will be Florian Leibert, software engineer, research, at Twitter.

The topic “Processing Big Data” at 9:30 a.m. March 18 will feature Chris Wensel, CTO and founder of Concurrent, a supplier of tools for creating applications that execute on parallel computing clusters, and Nathan Marz, lead engineer for BackType.com, a Web site that searches blogs and social networking sites for particular topics of discussion.

“Learning from Big Data with Scalable Analytics” will be the topic of a talk at 10:45 a.m. March 18 given by Michael Driscoll, founder of Dataspora, a firm producing software for data analytics and visualization, and Ted Dunning, CTO of Deepdyve, an aggregator of medical knowledge.

The Cloud Connect conference at the Santa Clara Convention Center is organized by TechWeb and is billed as bringing cloud computing stakeholders together in one event.

“These are the people who are the experts in a given domain, the guy who wrote the thing or the guy who invented it, ” said Croll. There will be many cloud computing vendors both on the show floor and in the ranks of speakers, but Croll said the conference was seeking to make their presentations “non-partisan” and focused on their subject expertise.

Source


Feb 24 2010

Open Source NoSQL Databases

For almost a year now, the idea of “NoSQL” has been spreading due to the demand for relational database alternatives. Maybe the biggest motivation behind NoSQL is scalability. Relational databases don’t lend themselves well to the kind of horizontal scalability that’s required for large-scale social networking or cloud applications, and ORMs can abstract away impedance mismatch only so much. In other cases, companies just don’t need as many of the complex features and rigid schemas provided by relational databases. Most people are not suggesting that we all ditch the RDBMS; in fact, many companies don’t really need to switch. Relational databases will probably be necessary for many applications years and years from now. In essence, NoSQL is a movement that aims to reexamine the way we structure data and draw attention to innovation in hopes of finding the solution to the next generation’s data persistence problems.

Check the source for details on various types of NoSQL.

Source


Feb 3 2010

Old security flaws still a major cause of breaches, says report

An over-emphasis on tackling new and emerging security threats may be causing companies to overlook older but far more frequently exploited vulnerabilities, says a recent report.

The report, from TrustWave Inc., is based on an analysis of data gathered from over 1900 penetration tests and over 200 data breach investigations conducted on behalf of clients such as American Express, MasterCard, Discover, Visa and several large retailers.

The analysis showed that major global companies are employing “vulnerability chasers” and searching out the latest vulnerabilities and zero-day threats while overlooking the most common ones, the report said.

As a result, companies continue to be felled by old and supposedly well-understood vulnerabilities rather than by newfangled attack tools and methods.

For instance, the top three ways hackers gained initial access to corporate networks in 2009 were via remote access applications, trusted internal network connections and SQL injection attacks, TrustWave found.

All three attack points have been well researched and known about for several years. SQL injection vulnerabilities, for instance, have been known about for at least 10 years, but still continue to be widely prevalent in Web-based, database-driven applications, TrustWave said.

The most common vulnerability that TrustWave discovered during its external network penetration tests had to do with the management interfaces for Web application engines such as WebSphere and ColdFusion. In many cases, the management interfaces were accessible directly from the Internet and had little or no password protection, potentially allowing attackers to deploy their own malicious applications on the Web server.

Similarly unprotected network infrastructure components such as routers, switches and VPN concentrators represented the second most common vulnerability unearthed by TrustWave. The tendency by many companies to host internal applications on the same server that also hosts external content was another common vulnerability, as were misconfigured firewall rules, default or easy-to-guess passwords and DNS cache poisoning.

Meanwhile TrustWave’s wireless penetration tests unearthed common weaknesses such as the continued use of WEP encryption, legacy 802.11 networks with minimal to no security controls and wireless clients using public “guest” networks instead of secured private networks.

In almost all of the cases, the most common vulnerabilities unearthed by TrustWave were well-understood issues that should have been addressed a long time ago, said Nicholas Percoco, senior vice president at TrustWave’s SpiderLabs research unit.

“There are basically two themes,” Percoco said. “Through our study in 2009 we found some very old vulnerabilities present within enterprises, some as old as 20 to 30 years.” The second theme is that attackers are targeting these old flaws to break into enterprises, then using increasingly sophisticated tools to harvest data from companies, he said.

In addition to older keystroke logging and packet sniffing tools, malicious attackers are increasingly employing tools such as memory parsers and credentialed malware to steal data, Percoco said. Memory parsers are used to monitor the random access memory associated with a certain process and to extract specific data from it. Credentialed malware programs are a new class of multi-user programs that have typically been used to steal money and payment card numbers from ATMs.

There are several measures companies can take to mitigate the risks posed by older and often overlooked vulnerabilities, TrustWave said. One step is to maintain a complete asset inventory. Many companies are often unaware of all the IT assets they own or of the risks they pose to data, so maintaining an up to date list of assets is vital to protecting them, TrustWave said.

Decommissioning older legacy systems as much as possible can also help mitigate the risk. Also, in 80% of the cases that TrustWave looked at, third parties were responsible for introducing vulnerabilities, so monitoring third-party relationships is key, according to the company. Other recommended measures included internal network segmentation, data encryption and stronger Wi-Fi security policies.

Source


Feb 3 2010

Oracle Hacker Gets The Last Word

In 2001, Larry Ellison brashly proclaimed in a keynote speech at the computing conference Comdex that his database software was “unbreakable.” David Litchfield has devoted the last nine years to making the Oracle chief executive regret that marketing stunt.

At the Black Hat security conference Tuesday afternoon, Litchfield unveiled a new bug in Oracle’s 11g database software, a critical, unpatched vulnerability that would allow a hacker to take control of an Oracle database and access or modify information at any security level. “Anything that God can do on that database, you can do,” Litchfield told Forbes in an interview following his talk.

The attack that Litchfield laid out for Black Hat’s audience of hackers and cybersecurity researchers exploits a combination of flaws in Oracle’s software. Two sections of code within the company’s database application–one that allows data to be moved between servers and another that allows management of Oracle’s implementation of Java–are left open to any user, rather than only to privileged administrators. Those vulnerable subroutines each have their own simple flaws that allow the user to gain complete access to the database’s contents.

Litchfield says he warned Oracle about the flaws in November, but they haven’t been patched. Oracle didn’t immediately respond to a request for comment.

The bug is far from the first that 34-year-old Litchfield has outed on Oracle’s behalf. As a cybersecurity researcher and penetration tester, Litchfield has exposed more than a thousand database software security flaws, mostly in Oracle’s code.

Source