Friday, May 16, 2008


New tricks and old tools

Kim Cameron follows up on Clayton Donley's post with some thoughts of his own. And ends by quoting Clayton:
"The real solution here is a combination of virtualization with more standardized publish/subscribe for delivery of changes. This gets us away from this ad-hoc change discovery that makes meta-directories miserable, while ensuring that the data gets where it needs to go for transactions within an application."

and adding: "As soon as applications understand they are PART OF a wider distributed fabric, they could propagate changes using a publication pattern that retains the closed-loop verification of self-converging metadirectory."

I couldn't agree more with these two erudite gentlemen.

Unfortunately, today's applications (and especially yesterday's applications still hanging around on our networks), and even tomorrow's applications for some time to come, won't be written to be part of a "wider distributed fabric," especially since that fabric doesn't yet exist in any meaningful way. And, as Kim said in an earlier posting, "Here’s the problem. Infrastructure people cannot dictate how application developers should build their applications." We can build the infrastructure that will excel in a publish-subscribe world, but getting the application developers to buy in to that model, well, that's something else. I'm all for building the infrastructure and plumbing of the future, but we need to adapt today's tools so that we can get the job done while waiting for the new plumbing.
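To put the pattern in concrete (if toy) terms, here's a minimal sketch of publish/subscribe change propagation in Python. Nothing here reflects a real product or a real fabric; the topic names, the event shape, and the closed-loop acknowledgement are just illustrations of the idea Kim and Clayton are describing.

    # Illustrative only: a toy "identity bus" where an application publishes
    # identity changes and subscribers (a metadirectory, a provisioning
    # system, etc.) pick them up. No real fabric or product API is implied.
    from collections import defaultdict

    class IdentityBus:
        def __init__(self):
            self._subscribers = defaultdict(list)

        def subscribe(self, topic, handler):
            # e.g. topic = "employee.hired"; handler is any callable taking the event
            self._subscribers[topic].append(handler)

        def publish(self, topic, event):
            # Deliver the change to every subscriber and collect their
            # acknowledgements, giving the publisher a closed loop to verify
            # that the change actually got where it needed to go.
            return [handler(event) for handler in self._subscribers[topic]]

    # The HR application publishes a hire; downstream systems converge on it.
    bus = IdentityBus()
    bus.subscribe("employee.hired", lambda evt: "provisioned %s" % evt["employee_id"])
    acks = bus.publish("employee.hired", {"employee_id": 4711, "name": "A. Example"})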



Monday, May 12, 2008


Optimization and expense

Neil Macehiter comments on the last post:

But the issue is not with the language you use to perform the query: it's where the data is located. If you have data in separate physical databases then it's necessary to pull the data from the separate sources and join them locally. So, in Kim's example, if you have 5000 employees and have sold 10000 computers then you need to pull down the 15000 records over the network and perform the join locally (unless you have an incredibly smart distributed query optimiser which works across heterogeneous data stores). This is going to be more expensive than if the computer order and employee data are colocated.


The "expense" is there no matter how you do it. Putting all of your potentially useful data in one RDBMS is incredibly wasteful of storage space and comes at the cost of slowing down all queries. It also means that synchronizations need to be done almost constantly in order for the most up to date data to be available, a network "expense". But the search can be optimized before any data is pulled. For example, query the HR database for the lowest employee number issued after the first date you're interested in (assuming that employee numbers are issued sequentially). Then query the orders for PC purchases by that employee number or higher. Yes, it's two steps, but it's also faster than pulling down all the records to do a local join. And, I hold, less "expensive" than maintaining a huge silo of all potentially useful data.



Getting more violent all the time

The distinguished Mr. Cameron has restated what he thinks is our major disagreement over synchronization and replication of identity data on the so-called "identity bus." He says:

"Sometimes an application needs to do complex searches involving information 'mastered' in multiple locations. I’ll make up a very simple 'two location' example to demonstrate the issue:

'What purchases of computers were made by employees who have been at the company for less than two years?'

Here we have to query 'all the purchases of computers' from the purchasing system, and 'all employees hired within the last two years' from the HR system, and find the intersection.

Although the intersection might only represent a few records, performing this query remotely and bringing down each result set is very expensive. No doubt many computers have been purchased in a large company, and a lot of people are likely to have been hired in the last two years. If an application has to perform this type of query with great efficiency and within a controlled response time, the remote query approach of retrieving all the information from many systems and working out the intersection may be totally impractical.

Compare this to what happens if all the information necessary to respond to a query is present locally in a single database. I just do a 'join' across the tables, and the SQL engine understands exactly how to optimize the query so the result involves little computing power and 'even less time'. Indexes are used and distributions of values well understood: many thousands of really smart people have been working on these optimizations in many companies for the last 40 years."

What Kim fails to note, however, is that a well-designed virtual directory (see Radiant Logic's offering, for example) will let you issue a SQL query against the virtual tables! You get the best of both worlds: up-to-date data (today's new hires and purchases included) with the speed of a SQL join, and all without having to replicate or synchronize the data. I'm happy, the application is happy, and Kim should be happy too. We are in violent agreement about what the process should look like at the 40,000-foot level and only disagree about the size and shape of the paths, or, more likely, whether they should be concrete or asphalt.
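As a sketch of what that looks like from the application's side: the view names (vd_employees, vd_orders) are hypothetical, and I'm simply assuming the virtual directory exposes some SQL endpoint reachable through an ordinary DB-API connection; substitute whatever interface your product actually provides.

    # One SQL join against the virtual views; the virtual directory resolves it
    # against the live HR and purchasing systems, so nothing is replicated locally.
    # View names and the connection mechanism are hypothetical stand-ins.
    QUERY = """
        SELECT o.order_id, e.employee_id, e.hire_date
          FROM vd_employees AS e
          JOIN vd_orders    AS o ON o.employee_id = e.employee_id
         WHERE o.item = 'PC'
           AND e.hire_date >= :cutoff
    """

    def recent_pc_purchases(connection, cutoff_date):
        # 'connection' is any DB-API 2.0 connection to the virtual directory's
        # SQL endpoint; adjust the parameter style to whatever your driver uses.
        cur = connection.cursor()
        cur.execute(QUERY, {"cutoff": cutoff_date})
        return cur.fetchall()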




Wednesday, April 02, 2008


Get on the bus!

Everybody else is. Dale Olds has commented. So has Phil Hunt. Let's all get together at the European ID Conference in Munich later this month and talk about the Identity Hub, the Identity Bus, the death of the metadirectory and so much more. Suggestions for a suitable meeting place (i.e., biergarten) near the Deutsches Museum are welcome; just leave them in the comments.

See you there!


