16 January 2008

How Grid Computing Can Improve Database Performance

Nathan Segal interviews Benny Souder, Vice President of Distributed Database Development for Oracle, and Jeff Jones of IBM.

According to Benny Souder, Vice President of Distributed Database Development for Oracle, Grid Computing is where you have a network of computers which tap into a main server. "The concept comes from the electrical grid and would be arranged in a system that functions in a similar fashion. If you take an appliance and plug it into a wall outlet, then you become a client of the electrical grid. As a client, you don't know how the grid is implemented, whether the power station is in the next state or next door. All you want is power; you plug in and you get it. That's the highest logical level of Grid Computing."

N: How do you maximize the potential of Grid Computing?
S: "Through centralization. This includes consolidation, centralization, and cost savings. As the nodes or points in the grid get bigger and you have a small number of large nodes, you can do a more effective job of Grid Computing, just as a power company has a small number of large power generators, rather than a power generator per house. The power company works this way because they're trying to get real efficient utilization of their resources, because that keeps the rates down."

"If you have little islands of computation, you have to size them for peak, but most of the time they're pretty idle. A good way to get high utilization is to pool these islands into larger nodes. If you then have the right technology and software, you can dynamically allocate these computers to the priorities of your business."
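The utilization argument can be made concrete with a little arithmetic. The following is a minimal Python sketch, not anything Oracle ships: the hourly demand figures and the names `website_demand`, `warehouse_demand`, and `utilization` are invented for illustration. It compares two islands, each sized for its own peak, with one pooled node sized for the peak of the combined load.

```python
# Hypothetical CPU demand per time slot for two workloads (invented numbers).
website_demand = [2, 2, 8, 10, 3, 2]
warehouse_demand = [9, 8, 2, 1, 7, 9]


def utilization(demand, capacity):
    """Average fraction of the available capacity that is actually used."""
    return sum(demand) / (capacity * len(demand))


# Separate islands: each one must be sized for its own peak.
island_capacity = max(website_demand) + max(warehouse_demand)

# Pooled node: sized for the peak of the combined demand.
combined = [w + d for w, d in zip(website_demand, warehouse_demand)]
pooled_capacity = max(combined)

island_util = utilization(combined, island_capacity)
pooled_util = utilization(combined, pooled_capacity)

print(f"islands: {island_capacity} CPUs, {island_util:.0%} utilized")
print(f"pooled:  {pooled_capacity} CPUs, {pooled_util:.0%} utilized")
```

Because the two invented workloads peak at different times, the pooled node needs fewer CPUs for the same work. Workloads that peak together would show little or no gain.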

N: Can you offer an example of Grid Computing in actual practice?
S: "Yes. Let's pretend that you're an Internet retailer selling books on the web and you've got two databases, one that powers your website and keeps track of all the books, and the other is a data warehouse of all the click stream data, etc. Right now, you need every computer you've got powering your website, because if the website is slow, people are going to leave."

"In December, a mountain of data is collected about transactions on your website, but in January, you'll want to analyze that data and begin planning for next Christmas. If you use separate SMP (symmetric multiprocessor) machines for those two databases, it's very hard to put all your CPUs behind the website and then switch them 30 days later and have almost all your CPUs on the data warehouse."

"To get around the problem, you would use Oracle technology and some new hardware called Server Blades. You could do it with SMP, but you'd have to take the cabinet and machines apart. That's a big job and while you're doing it, the website's down for sure."

N: What is the advantage of using Server Blades?
S: "Server Blades are like a computer on a board, with a CPU, some memory, a local disk for caching stuff and a backplane plug. These blades (about the size of a skinny pizza box) plug into a rack, which has a power supply, a cooling fan and a network connection. Typically, there are 10-30 blades in a rack. Since there are commodity CPUs on these boards and they share a common power supply, they're very economical to make, about 80-90 percent cheaper than SMP."

"With the blade technology, we can run our database as well as real applications. Other database vendors will tell you that these blades are great, but don't put the database on them. The reason is that their database on blades doesn't run real applications. Their cluster database runs benchmarks. There's no application vendor that's certified on their cluster database. Whereas on our database, what we call Real Application Clusters, SAP is certified as well as Oracle applications, and we have more than a hundred production reference customers who are running their business on this cluster database."

N: What happens if you attempt to run applications that are not certified?
S: "They don't work. If you call up SAP, they will tell you that it's not supported. We can take a blade off or add a blade to our database while it's running. So if you're running your website and data warehouse on our blades, you can move the blades back and forth without any down time. That means it's really easy to allocate computing to what your business priority is. That's the first thing we've got for grid computing."

"The second thing is information sharing technology. For example, we have this stuff called Transportable Tablespaces. This lets you snap data off one database and snap it onto another. The file is on a disk, meaning that you don't have to load or unload the data. We also have Oracle Streams, which is a complete solution for asynchronous information sharing. It does messaging, replication, events, publishing, subscribing, and has a rules engine all in one integrated database."

"The third thing is that we're completely portable. So the application you've already written on your SMP machine ports right into this grid technology; you don't have to rewrite the application."

"The fourth thing we've got is Globus, a small organization that's trying to develop open source software for grid computing. They built this thing called the Globus Toolkit that we've integrated with the Oracle database. We have a customized version of the Globus Toolkit, integrated with the Oracle database and free to download, so you don't have to figure out how to make these two things work together. We do that for you."

A different perspective was shared by Jeff Jones of IBM. He said: "Grid Computing is an effort to make computing resources appear to be utilities that you tap into as necessary. In DB2 (Version 8), several aspects have been enhanced, making it a good candidate for that type of processing. The first is scalability. A grid requires and expects an enormous amount of data to be supported and an enormous number of users to be coming after that data. So very large scale processing is the norm in a grid."

"Our experience with Grid Computing includes the life sciences grid built with Oxford University in England to support breast cancer research. Another one, at the University of Pennsylvania, is a mammography sharing grid. Both have been built on DB2."

N: How does DB2 work with Grid Computing?
J: "With DB2, we have a shared-nothing architecture. Here, any number of physical servers can be clustered together and you can run one instance of DB2 across all of them. One server takes the requests, breaks them up into pieces and farms the pieces out to all the other servers to work in parallel, then reassembles everything at the end and provides a complete answer back when questions are asked."

"Each server in the cluster receives an independent subset of the complete set of data and operates separately and independently on its subset of the problem to be solved. This form of independent cluster processing is extremely scalable with little loss of efficiency as you add more servers to the cluster."
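The scatter-and-gather pattern Jones describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DB2 code: the four "servers" are just in-memory lists, rows are assigned by key modulo the server count as a stand-in for a real hash function, and the names `partition`, `local_sum`, and `coordinator_total` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SERVERS = 4  # assumed cluster size


def partition(rows, num_servers):
    """Give each server its own disjoint subset of rows (key modulo server count)."""
    parts = [[] for _ in range(num_servers)]
    for key, amount in rows:
        parts[key % num_servers].append((key, amount))
    return parts


def local_sum(part):
    """Work done by one server, using only the rows it owns."""
    return sum(amount for _, amount in part)


def coordinator_total(parts):
    """Farm the request out to every server in parallel, then merge the partial answers."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(local_sum, parts))
    return sum(partials)


# 100 invented orders: (order_id, amount)
rows = [(order_id, order_id * 10) for order_id in range(1, 101)]
parts = partition(rows, NUM_SERVERS)
total = coordinator_total(parts)
print(total)
```

No server ever reads another server's rows. The only shared step is the final merge on the coordinator, which is why adding servers costs little in this model.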

N: How does this compare to Oracle?
J: "Their approach is to have a very large common memory. Each instance of the database shares a common memory and is being gone after by the same user population, so traffic management becomes the hard problem to be solved."

N: Oracle spoke about server blades. Would you have to shut your system down to add more servers?
J: "No. Server blades are new form factors for servers that can be rack mounted in very large numbers and can be pulled in and pulled out and plugged back in again. It's not a new paradigm; it's just a more efficient way of clustering hardware. Their approach and our approach both enable clusters to be grown or shrunk with not nearly as much agony as in the past."

"With DB2, our approach is to offer utilities. When we add a new server to a cluster, we apply a rebalancing utility that allows you to redistribute the data and populate the new server. This type of housecleaning has to be done on anybody's system. You shouldn't let any vendor convince you that it's painless, but today both vendors have made it much more bearable and much of the process can be done with nothing coming down."
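One common way to keep this kind of rebalancing cheap is to hash rows into a fixed set of buckets and move whole buckets to the new server. The sketch below illustrates that general idea only; it is not IBM's utility. The bucket count and the names `initial_map`, `add_server`, and `server_for` are assumptions made for the example.

```python
from collections import Counter

NUM_BUCKETS = 12  # hypothetical fixed number of hash buckets


def initial_map(num_servers):
    """Assign buckets to servers round-robin."""
    return [bucket % num_servers for bucket in range(NUM_BUCKETS)]


def add_server(bucket_map, new_server):
    """Hand buckets to the new server until it holds its fair share."""
    new_map = list(bucket_map)
    fair_share = NUM_BUCKETS // (new_server + 1)
    for _ in range(fair_share):
        counts = Counter(new_map)
        donors = [s for s in counts if s != new_server]
        # Take one bucket from whichever existing server owns the most.
        donor = max(donors, key=lambda s: counts[s])
        new_map[new_map.index(donor)] = new_server
    return new_map


def server_for(key, bucket_map):
    """Route a row to the server that owns its bucket."""
    return bucket_map[key % NUM_BUCKETS]


old_map = initial_map(3)          # three servers, numbered 0 to 2
new_map = add_server(old_map, 3)  # add a fourth server, numbered 3
moved = sum(1 for old, new in zip(old_map, new_map) if old != new)
print(new_map, "buckets moved:", moved)
```

Only the buckets handed to the new server change owner, so most rows stay where they are while the new server is populated.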

"In a fault tolerance sense, this is good, because servers can be paired together and one can serve as the idle standby for the other. This is something that both Oracle and IBM do in a similar fashion. You can have an eight-server cluster where four of the servers are actually doing work, while the other four are twins of the first four, waiting to be failed over to if something goes wrong. This is a very common fault-tolerant, high-availability configuration for servers. And racks and blade servers simply make that more efficient."
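The eight-server arrangement described above, four working servers each with an idle twin, can be modeled as follows. This is a minimal Python sketch with invented blade names and a hypothetical `ServerPair` class; real failover also involves health checks and data synchronization, which are left out here.

```python
class ServerPair:
    """One working server plus its idle standby twin."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.primary_up = True

    def fail_primary(self):
        """Simulate the working server going down."""
        self.primary_up = False

    def active(self):
        """Return whichever server should receive requests right now."""
        return self.primary if self.primary_up else self.standby


# Eight blades: blade0 to blade3 do the work, blade4 to blade7 are their twins.
pairs = [ServerPair(f"blade{i}", f"blade{i + 4}") for i in range(4)]


def route(key):
    """Send a request to the active server of the pair that owns the key."""
    return pairs[key % len(pairs)].active()


before = route(6)        # key 6 belongs to pair 2
pairs[2].fail_primary()  # the working server in pair 2 fails
after = route(6)         # the twin takes over
print(before, "->", after)
```

Requests for the other three pairs are unaffected by the failure, since each pair fails over independently.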

Souder said that the goal of Grid Computing is a world where "you want information, answers, computation, and get it. That's the fundamental idea, the dream and the goal. We're a long way from being there, but that's the direction that we're moving in."
