Monday, March 23, 2015

The Product Owner role in business application development

At the core of all agile/lean practices is the minimization of work in progress (WIP). The more WIP you have, the less visibility and control you have over the actual output of work. You may have made a major error which has not yet been discovered and which will force rework of all other current work in progress. If your WIP is small (hours or days) then you have minimized this impact. If your WIP spans several months you may have a huge task ahead of you. The top three things required to manage WIP down are:

  • Quick decisions - we need to be able to turn around a question or clarification to a decision quickly. If not, it is all too easy to just start on the next piece of work or make assumptions to keep moving.
  • Regular checking - when we have a requirement for quick decisions we need to accept that some may be wrong (and we need to change direction). We can only confirm we're heading in the right direction with regular checking (automated and manual) to maintain the confidence that the WIP is complete.
  • A team culture that focuses on collaboration - A command / control process with many 'quality' gates slows down progress and drives up WIP. To turn around a decision to a deliverable requires a highly collaborative approach where contributions from all parties are encouraged.

In Scrum, the Product Owner role plays a critical part in all three of these and is almost entirely responsible for the first two. So it's critical that you get a Product Owner on the project who truly understands their role. This role needs to combine authority (to make low to mid level decisions without the need to convene committees) and speed of decision making (so that WIP is minimized). So in theory the perfect Product Owner has complete authority to make decisions and is available to turn around those decisions quickly (same day at a minimum).

Whilst this is certainly achievable in small organisations or for software solutions that support a small silo in an organisation, in most reasonably sized business projects this position doesn't naturally exist. All too often we don't identify a single Product Owner and adopt alternative methods instead:

  • The IT project manager (or Scrum Master) assumes the role with inevitable consequences. Assumptions are made to keep the delivery moving and WIP down, business users reject those assumptions later and rework is required, which will lead the project manager to become more cautious and require signoff from stakeholders, eventually moving the role to a low authority / low speed point - exactly the opposite of where we want it.
  • A steer co is set up to handle decisions. Whilst this is appropriate at a high level (should we undertake this project or not) it is not appropriate for the minutiae of decisions required during a project. It results in a very slow decision making process and drives up WIP in an attempt to maintain delivery pace.
  • An Executive sponsor takes the decisions - they have the authority, but are not highly available.
  • Stakeholders are relied on - they are usually more available and responsive than execs, but if there are more than a couple the decision making authority is completely diluted.

With none of the above options being ideal, a Product Owner should be appointed with:

  • Authority from steer co and sponsor to make decisions on their behalf (monthly steer co sessions to keep them on track)
  • Knowledge of the stakeholders' business processes and the ability to look ahead in the project backlog and consult with them on future requirements
  • A close working relationship with the IT project manager (or scrum master or scrum team) to provide fast turnaround on clarifications / decisions to keep them moving in the right direction

Getting as close as possible to the ideal Product Owner is probably the single biggest factor in successfully delivering agile solutions. I've been lucky enough to have worked with a few business people who are 'naturals' in this role, but for each of them there are many more who fail either through indecision (I'll need to check with...) or abdication (Can't IT just figure it out?).

Tuesday, March 10, 2015

Development skills interchangeability - 'Online', 'Backend', and 'Mobile'

Many business systems these days are written as web applications - either hosted in the cloud in a true multi-tenanted environment or on the internal corporate network. They are often written using the same technology set as consumer web sites. It would not be a stretch to assume that developers who have worked in these technologies can switch between them freely. Whilst this is true in general there are many issues which are particular to one type of system and not the other. For example:

  • Performance. Whilst performance is an issue for both types of development the causes of performance issues are often quite different. In online apps it's often caused by sheer volume of requests - an architecture supporting a high read/write ratio is going to be of most use - caching, separation of read and write responsibility (CQRS), scalable architecture. In business systems performance issues are more generally the result of low level algorithm choices. The former is a more challenging architectural issue, the latter a more challenging coding issue (which is why back end developers believe they are the 'real' coders).
  • Deployment. Again both environments have deployment challenges. But back end deployments are usually more 'big bang', easier to schedule (any weekend is fine), but are likely to have more integration points with other systems and more complex pre and post deployment testing cycles. On the other hand online deployments are about limiting or eliminating any outages required to perform the deployment. Again this comes back to a question of system architecture: if your deployment takes a scorched earth approach where you spin up new servers, deploy new code, run automated tests, then instantly cut over, you can (with careful state management) perform outage free deployments.
  • Feedback. User feedback is critical to any good software. With business systems you can just ask the users. With public sites you can do that as well but you also want to track and analyse the data collected from the site.
  • Development process. I've been a proponent of agile practices for a long time and I think they're suitable for both environments. However, they fit the online environment perfectly, as you want to be making lots of small incremental changes and have the ability to quickly change direction should your analytics prove you made a misstep. Most software development groups would now be at the stage of using unit tests and continuous integration. Online teams should also be considering continuous deployment and strong feature management (so we can disable a feature that has been deployed if required, or A/B test a feature we want to assess).
  • Browser support. Business systems are developed for 'captive' audiences where a standard browser / operating system can be mandated or controlled. This makes it easier for developers to develop and support. Online needs to consider both browser and device. With a huge percentage of web traffic moving to mobile devices, all online content needs to be responsive and designed appropriately for all browsers and screen resolutions available.
  • Usability. Business systems are used by regular users doing their job. Their user interface needs to allow them to complete that job as efficiently as possible. It is acceptable to place a lot of controls on one screen IF it is achieving that aim. The online user is usually an infrequent (and at best a casual) user of your site. Their UI needs to be as intuitive as possible (and if under an advertising revenue model one that delivers more page impressions!). Whilst UX is important in both cases the lack of available user feedback means you need to pay more attention to it in the online space.
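The 'feature management' idea in the development process point can be sketched in a few lines. This is only an illustration under assumptions: the flag names, the in-memory FLAGS store, and the percentage rollout scheme are all hypothetical, and a real team would more likely use a configuration service or an existing feature flag product.

```python
import hashlib

# Hypothetical in-memory flag store (an assumption for this sketch).
# "enabled" acts as a kill switch; "rollout_percent" drives a simple A/B split.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 50},
    "beta_search": {"enabled": False, "rollout_percent": 100},
}


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, flags=FLAGS) -> bool:
    """Return True if the feature should be shown to this user."""
    flag = flags.get(feature)
    if flag is None or not flag["enabled"]:
        # Unknown or switched-off features are never shown.
        return False
    # The same user always lands in the same bucket, so their experience
    # stays consistent between requests while the test is running.
    return bucket(user_id, feature) < flag["rollout_percent"]
```

Setting "enabled" to False disables a deployed feature without a redeploy, and changing "rollout_percent" widens or narrows the group that sees it.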

Of course whilst we're talking about online development there's the question of mobile. Native Apps, Hybrid Apps, Web Apps, Mobile friendly web sites all come into the picture and choices of which to adopt depend on a range of factors including device feature accessibility (GPS, camera, etc), performance of the application, level of disconnected use supported, usability requirements, marketing / discoverability, and obviously cost of development (and ongoing maintenance). In the modern world of web development - where changes can be deployed at will with little regard for backward compatibility issues - the ongoing effort to maintain native apps should not be underestimated. It's more akin to client/server than web development (and it's not really like either). Native apps should only be used when device features, access to specialised gestures, or speed is a major concern.

So can developers move freely between these environments? Of course they can, but whoever is leading the team (and determining the architecture of the solution) needs to be familiar with the potential issues.

Sunday, September 07, 2014

Graph Databases

Graph Databases are not new - sites like LinkedIn and Facebook are based on highly connected data which is not managed on traditional RDBMS (Relational Database Management System) infrastructure. Graph DB technology is being rapidly commoditised with platforms like Neo4J and OrientDB leading the way. I believe they will become a new de facto standard in developing all sorts of business and online applications once the inertia of 30+ years of RDBMS thinking is slowly broken down.

Often when I describe what a graph database is - the ideal way to store highly connected data - most people just shrug and say that was solved years ago with the RDBMS platforms - they have 'Relational' in the name after all, right there as the first letter of the acronym! This post is an attempt to explain what makes them a better choice for many applications.

Firstly let's take a look at an example. Say you have a permissioning service which manages permissions for various systems grouped by roles which in turn have a list of functions. In a relational model we may end up with tables for 'System', 'Role', 'Function', and 'Person' with additional join tables for 'Role_Function' and 'Person_Role'. A typical query of this model would be to determine which functions Person 'A' has permissions to for System 'X'. The most basic TSQL implementation would be something like:

-- Functions that Person 'A' can use in System 'X'
SELECT [Function].Id, [Function].Name
FROM [Function]  -- FUNCTION is a reserved keyword in TSQL, so the name is bracketed
INNER JOIN Role_Function ON Role_Function.FunctionId = [Function].Id
INNER JOIN Role ON Role.Id = Role_Function.RoleId
INNER JOIN System ON System.Id = Role.SystemId
INNER JOIN Person_Role ON Person_Role.RoleId = Role.Id
INNER JOIN Person ON Person.Id = Person_Role.PersonId
WHERE Person.Name = 'A'
AND System.Name = 'X'
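
To try the relational version without a SQL Server instance, the model can be approximated in an in-memory SQLite database from Python. This is a sketch under assumptions: the column names follow the query above, but the sample rows, the DISTINCT, and the ORDER BY are additions for the demo, and SQLite is standing in for TSQL.

```python
import sqlite3

# Minimal schema implied by the query above (column types are assumptions).
SCHEMA = """
CREATE TABLE System (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Role (Id INTEGER PRIMARY KEY, Name TEXT, SystemId INTEGER REFERENCES System(Id));
CREATE TABLE "Function" (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Role_Function (RoleId INTEGER, FunctionId INTEGER);
CREATE TABLE Person_Role (PersonId INTEGER, RoleId INTEGER);
"""

QUERY = """
SELECT DISTINCT f.Id, f.Name
FROM "Function" AS f
INNER JOIN Role_Function AS rf ON rf.FunctionId = f.Id
INNER JOIN Role AS r ON r.Id = rf.RoleId
INNER JOIN System AS s ON s.Id = r.SystemId
INNER JOIN Person_Role AS pr ON pr.RoleId = r.Id
INNER JOIN Person AS p ON p.Id = pr.PersonId
WHERE p.Name = ? AND s.Name = ?
ORDER BY f.Id
"""


def build_sample_db():
    """Create an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO System VALUES (?, ?)", [(1, "X"), (2, "Y")])
    conn.executemany(
        "INSERT INTO Role VALUES (?, ?, ?)",
        [(1, "Editor", 1), (2, "Viewer", 1), (3, "Admin", 2)],
    )
    conn.executemany(
        'INSERT INTO "Function" VALUES (?, ?)',
        [(1, "View"), (2, "Edit"), (3, "Configure")],
    )
    conn.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "A"), (2, "B")])
    conn.executemany(
        "INSERT INTO Role_Function VALUES (?, ?)", [(1, 1), (1, 2), (2, 1), (3, 3)]
    )
    conn.executemany("INSERT INTO Person_Role VALUES (?, ?)", [(1, 1), (1, 3), (2, 2)])
    return conn


def functions_for(conn, person_name, system_name):
    """Return (Id, Name) rows for the functions a person holds in a system."""
    return conn.execute(QUERY, (person_name, system_name)).fetchall()
```

With this sample data, Person 'A' holds the Editor role in System 'X', so the query returns the View and Edit functions for that pair.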

Of course, if you add in some reasonable complexity, like supporting the fact that some functions may imply permissions to other functions (to edit a record you need to be able to view or search for it), or that you might have profiles linked to positions rather than people, you end up with an explosion in the JOIN factory and the TSQL becomes many times more complicated.

In a graph world however each row in each table simply becomes a node in the graph. The person A node would have an 'IS_IN_ROLE' relationship to a bunch of Roles which would be linked to systems with a 'HAS_ROLE' relationship and to functions with an 'INCLUDES' relationship. Functions could relate to each other in a hierarchy. You could add Profile nodes which a Person could hold which can have Roles of their own etc.

[Figure: example graph of Person, Role, System, Function, and Profile nodes]

With graph technology come new querying languages / syntaxes. Neo4J provides a very elegant Cypher language which allows you to query the graph very succinctly. E.g. our complex non-performant TSQL statement might look like this in a graph world:

MATCH (:System {Name:"X"})-->(r:Role)-[*]->(f:Function),
      (p:Person {Name:"A"})-[*]->(r)
RETURN DISTINCT f

This query would accommodate function hierarchies and any sort of connection from a Person to a role (e.g. via positions). In fact whilst it's already simpler than the TSQL version above it is also more powerful, performant, and flexible to changes in the underlying model. We actually have this permissioning system in our organisation written with an RDBMS back end and, with the additional complexities mentioned, the TSQL query to retrieve functions for a user takes up over 200 lines, which in Cypher condenses down to the simple 3-line statement above.
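
To make the variable-length match less magical, here is a rough in-memory analogue of what the Cypher pattern asks for. It is a sketch only: the node names, the Position node, and the plain adjacency dictionaries are assumptions for illustration, and this is not a description of how Neo4J executes the query.

```python
from collections import deque

# Hypothetical sample graph: node id -> label.
NODES = {
    "sys_x": "System", "sys_y": "System",
    "role_editor": "Role", "role_viewer": "Role", "role_admin": "Role",
    "fn_edit": "Function", "fn_view": "Function", "fn_configure": "Function",
    "person_a": "Person", "person_b": "Person",
    "position_mgr": "Position",
}

# Directed relationships: node id -> directly related node ids.
EDGES = {
    "sys_x": ["role_editor", "role_viewer"],
    "sys_y": ["role_admin"],
    "role_editor": ["fn_edit"],
    "role_viewer": ["fn_view"],
    "role_admin": ["fn_configure"],
    "fn_edit": ["fn_view"],          # function hierarchy: edit implies view
    "person_a": ["position_mgr"],    # person linked to a role via a position
    "position_mgr": ["role_editor"],
    "person_b": ["role_viewer"],
}


def reachable(start, edges):
    """All nodes reachable from start by following one or more relationships."""
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(edges.get(node, []))
    return seen


def functions_for(person, system, nodes=NODES, edges=EDGES):
    """Functions reachable from roles that belong to the system and to the person."""
    system_roles = {n for n in edges.get(system, []) if nodes[n] == "Role"}
    person_roles = {n for n in reachable(person, edges) if nodes[n] == "Role"}
    result = set()
    for role in system_roles & person_roles:
        result |= {n for n in reachable(role, edges) if nodes[n] == "Function"}
    return result
```

Because the traversal only follows relationships out of the starting nodes, adding the Position hop or the function hierarchy needs no change to functions_for, which is the same property the Cypher query relies on.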

So with the intro out of the way, what are the benefits?

  • One of the obvious benefits of graph DBs is the types of queries that are easily supported and often DO NOT require changing even for changes to the graph structure itself, greatly speeding up development time.
  • Another benefit is performance. In the TSQL world there are many index lookups going on to find data in separate tables to JOIN on. In the graph world each node has direct references to its related nodes meaning that traversing the graph (given known starting points like the Person with Name "A" and system with name "X") is super fast as it only ever considers related nodes to see if they match the query. In fact, although indexes are supported in graph DBs they are generally only used to 'anchor' the query to fixed starting points in the graph not to find the data being retrieved.
  • Flexibility to requirements changes. In an agile development world (which is everywhere now really right?) Graph databases accommodate changes to requirements far more easily. The rise of ORMs was due largely to the impedance mismatch between Object Oriented development and the RDBMS data storage structure. Graph DBs remove this issue by allowing data to be stored in a way that more closely matches the code. In fact Graph DBs do not strictly have schemas (though this is somewhat dependent on the technology used) - there is nothing to prevent one node representing a Person having an 'Eye Colour' attribute and another node having a 'Height' attribute. Obviously for use in business applications you will expect some conformity but this is held and defined in code rather than in a separate DB schema as with RDBMS.
  • Deployment of changes is also simplified. Though there are gotchas to look out for with the lack of a schema driven model, you are free to add and remove nodes and relationships dynamically, meaning you could re-organise the structure of the graph in a live environment.

And the drawbacks?

  • Most obvious is the lack of mainstream support. Graph technology is new and untrusted in both enterprise architect and development worlds. This will change over time as exposure increases.
  • The market has not yet stabilized meaning even the most prominent players have not yet settled on a standardized querying language or code base (e.g. Neo4J have recently deprecated their original APIs)
  • There are some applications where a 'good old' RDBMS is still more suitable. Any application with serious aggregation / number crunching requirements or where the structure of the information is very static, not highly related, nor subject to frequent change is probably still going to be developed using an RDBMS backend. Though, I'd hazard a guess that there are fewer and fewer of these systems left. 
  • Reporting requirements are also probably better suited to a properly structured reporting cube maintained separately to the graph. This is actually true of systems running on an RDBMS but since TSQL can aggregate data well often reporting and transactional requirements are supported by a single DB. If you are a reporting purist in some ways this is another benefit of the Graph DB as it forces us to think about the reporting requirements separately to the transactional requirements of the system.

If you want to investigate Graph DBs some further reading and suggestions:

  • Neo4J - the self-proclaimed world's leading Graph DB has a free community edition and a fantastic query language in Cypher. Beware of the 'Oracle' like license model for enterprise implementations though. The site includes some great intro information and links to graph DB examples, demos, and tutorials.
  • OrientDB is another great graph DB tool which also operates as a document DB. Its query language is based on SQL to make it 'more familiar to TSQL developers'. It also supports more of a controlled definition of node types with inheritance from higher order nodes. Like Neo4J there is a community edition available and licensing for enterprise is very reasonable.

Sunday, May 11, 2014

Business Intelligence ETL

There appears to be a quiet revolution going on in the commoditisation of business intelligence. Microsoft (late as ever) has weighed in with BI platforms that are very compelling for existing MS shops. Utilising Excel and the VertiPaq columnstore in-memory tabular data models allows advanced and fast dashboard and reporting solutions to be achieved with very little effort and cost.

The key, as ever, is separating the transactional requirements of the business systems that provide the raw data from the reporting requirements of the business joining data from various sources. Too often vendor supported software assumes that its software will be the centre of your universe and that the reports that come with the application are all you will need.

Data needs to be centralised, and doing this in a simple, source-agnostic manner is a huge challenge which requires good business analysis, technical knowledge, and a degree of foresight. In order to address this challenge we're working on a framework to map raw data sources to data marts supporting various types of slowly changing data, in a way that supports current and point-in-time reporting in a tabular model utilizing Excel (and PowerPivot) as the platform to surface information.

The 'Transactional' to 'Reporting' format process supports the following features:

  • The mapping between the source data to the reporting format is defined in mapping configuration tables
  • The mapping configuration tables auto generate the TSQL required to update and insert reporting tables
  • The framework supports type 1 and type 6 updates to slowly changing data
  • The framework will support set-based operations to maximize the performance of the load. No row-by-solitary-row operations should be considered
  • The framework will track all operations including the row count of the updated / inserted rows
  • The framework will support loading of backdated data if available
  • The framework will support a denormalised input data source - aggregating this data into constituent dimension and fact tables in the target data mart
  • The framework validates denormalised input data to ensure there are no inconsistencies in the import - e.g. different customer names for the same customer ID in the import table
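
As an illustration of the 'mapping configuration generates the SQL' idea, the sketch below builds a set-based type 1 UPDATE and an INSERT for new dimension members from a small mapping list, then runs them against SQLite. Everything named here (ColumnMapping, the Staging and DimCustomer tables, the use of MAX to pick a value) is an assumption for the example, not the framework itself. Type 6 handling is not shown. Table and column names are interpolated directly, so the mapping is assumed to come from trusted configuration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One row of a hypothetical mapping configuration table."""
    source_column: str
    target_column: str
    is_natural_key: bool = False


def _key_match(mappings, source_alias, target_ref):
    keys = [m for m in mappings if m.is_natural_key]
    if not keys:
        raise ValueError("at least one natural key mapping is required")
    return " AND ".join(
        f"{source_alias}.{m.source_column} = {target_ref}.{m.target_column}" for m in keys
    )


def build_type1_update_sql(source_table, target_table, mappings):
    """Set-based type 1 update: overwrite non-key attributes in place."""
    attrs = [m for m in mappings if not m.is_natural_key]
    if not attrs:
        raise ValueError("nothing to update: all mapped columns are keys")
    match = _key_match(mappings, "s", target_table)
    assignments = ",\n    ".join(
        f"{m.target_column} = (SELECT MAX(s.{m.source_column}) "
        f"FROM {source_table} AS s WHERE {match})"
        for m in attrs
    )
    return (
        f"UPDATE {target_table}\nSET {assignments}\n"
        f"WHERE EXISTS (SELECT 1 FROM {source_table} AS s WHERE {match})"
    )


def build_insert_sql(source_table, target_table, mappings):
    """Set-based insert of dimension members not yet present in the target."""
    match = _key_match(mappings, "s", "t")
    target_cols = ", ".join(m.target_column for m in mappings)
    source_cols = ", ".join(f"s.{m.source_column}" for m in mappings)
    return (
        f"INSERT INTO {target_table} ({target_cols})\n"
        f"SELECT DISTINCT {source_cols}\nFROM {source_table} AS s\n"
        f"WHERE NOT EXISTS (SELECT 1 FROM {target_table} AS t WHERE {match})"
    )


def demo():
    """Run the generated statements against made-up data and return the dimension."""
    mappings = [
        ColumnMapping("customer_id", "CustomerId", is_natural_key=True),
        ColumnMapping("customer_name", "CustomerName"),
    ]
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Staging (customer_id INTEGER, customer_name TEXT, amount REAL);
        CREATE TABLE DimCustomer (CustomerId INTEGER, CustomerName TEXT);
        INSERT INTO Staging VALUES (1, 'Acme', 10), (1, 'Acme', 20), (2, 'Globex', 5);
        INSERT INTO DimCustomer VALUES (1, 'Acme Ltd');
    """)
    for sql in (
        build_type1_update_sql("Staging", "DimCustomer", mappings),
        build_insert_sql("Staging", "DimCustomer", mappings),
    ):
        cursor = conn.execute(sql)
        # Track the row count of each operation, as the feature list suggests.
        print(sql, "\n-- rows affected:", cursor.rowcount)
    rows = conn.execute(
        "SELECT CustomerId, CustomerName FROM DimCustomer ORDER BY CustomerId"
    ).fetchall()
    conn.close()
    return rows
```

In the demo the existing customer 1 has its name overwritten from the staging data (type 1) and customer 2 is inserted as a new dimension member, with no row-by-row loop in either statement.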

Why go with a denormalised input to the framework?

  • It enforces row by row transactionality - if a row is processed all the dimension and fact data contained in the row must be processed
  • It more clearly segregates the source data structure from the target structure. Without this denormalisation step it is all too tempting to replicate the source schema into the reporting schema
  • This segregation also makes it easier to swap out the source of the data with a new (or additional) system if the transactional systems get replaced for any reason
  • The data extraction process is simplified - a single (complex) query can gather all of the required information from the source system to be uploaded to the reporting data mart.
  • If historical extracts of data have been collected for 'human consumption' over time this method supports the backloading of this data into the reporting data mart, as this data is normally presented as a single denormalised set.
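
A small sketch of what working from a single denormalised set can look like: validate the rows for inconsistencies first, then split them into a dimension and a fact list. The column names (customer_id, customer_name, order_date, amount) and the function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def find_inconsistencies(rows, key, attribute):
    """Return {key value: sorted distinct attribute values} where more than one value appears."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[attribute])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


def split_rows(rows):
    """Split denormalised rows into a customer dimension and a list of sales facts.

    Raises ValueError if the same customer ID appears with different names,
    so an inconsistent import is rejected as a whole rather than part loaded.
    """
    problems = find_inconsistencies(rows, "customer_id", "customer_name")
    if problems:
        raise ValueError(f"inconsistent customer names: {problems}")
    dimension = {row["customer_id"]: row["customer_name"] for row in rows}
    facts = [(row["customer_id"], row["order_date"], row["amount"]) for row in rows]
    return dimension, facts
```

Each input row carries both its dimension attributes and its fact values, so a row is either processed completely or the import is rejected before anything is written.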

So what are the drawbacks?

  • The re-translation of the denormalised view of data to a star or snowflake schema required for reporting is not trivial to generate even by hand. Having a framework to autogenerate this code is a challenge.
  • Any historical denormalised data often lacks the internal surrogate keys from the source system and natural keys need to be identified and validated (what do you do if you find two John Smiths - is the second record a change of details or a new person?)
  • Auto-generated code is harder to performance tune, especially in any round-trip way. Of course indexes can be added to source and target tables to speed up the generated queries (the framework could even define these for you based on its knowledge of the natural keys in use)

Friday, January 10, 2014

IT jobs

When computers were first around almost everything revolved around the programmer - it took a programmer to write the code to make the computer actually run, to write the compiler to interpret the code, the code for the compiler to run, etc. Over time, things were abstracted away. Chip sets were standardised, then standard operating systems were introduced, then development platforms and databases made development and storage easier, then whole industry standard packages meant programmers weren't relied on to deliver a business function. 25 years ago a company may have considered writing their own software for handling accounts payable, 15 years ago they may have considered building a content management system for their web site, 10 years ago they may have considered building an enterprise service bus. All of those decisions, if made today, would be considered crazy.

In the meantime, hardware has also been steadily moving from an engineer-centric world to a commoditised virtual world - physical boxes to virtual machines and now to virtual data centres in the cloud. 10 years ago you might have required 4 weeks notice to procure a new server, now it can take minutes. In fact most of the software applications we develop now are deployed using a scorched earth policy - a new server is spun up, code deployed, tests run, DNS switched over, old server decommissioned automatically in minutes.

What does this mean for IT workers? My guess is that network engineers should consider other skills - especially DevOps as a natural progression. Developers continue to be needed, but expect more and more work in 'filling in the gaps' between off the shelf systems (be this integration or functional gaps). Developers with a few strings to their bow will be in demand, especially in regards to service buses, emerging technologies (graph DBs, mobile), and platforms (Salesforce, Dynamics, SharePoint). Of course if you're near retirement age and know COBOL there will still be a demand until at least 2023!

Friday, February 15, 2013

Sign of the times

I noticed today that GanttHead - a site I regularly visit and (mainly) full of well written and thought provoking articles - has changed its name to ProjectManagement. A subtle, but clear sign that linear planning PM methods are on the way out. The web site announcement stated:
Why the change?
Project Management is changing. When gantthead launched in 2000, every project worth managing was run using a gantt chart.
But times change. The change in our name is a recognition that many PMs who would benefit from being one of us may not even know what a gantt chart is.

Monday, July 30, 2012

The hidden cost of planning, precision, and predictability

There appears to be a perception that it would be irresponsible for us to do any work without a very clear idea of what and how we are going to deliver and how much this will cost. Whilst I’m not at odds with this idea, it denigrates all other notions of responsibility. Consider:
  • Is it responsible to make no decisions until all the facts are known even if, by delaying, it is too late to act?
  • Is it responsible to avoid proposing potentially risky courses of action which may yield high returns, because we're unsure of the effort / cost involved?
  • Is it responsible to invest time and energy defining solutions to problems rather than delivering solutions to problems?
Most sane people would agree that all of the above show some levels of irresponsible behaviour, but all too often this exact behaviour is hidden behind the notion that having a clear and precise plan is the only responsible course of action. Now, a clear and precise plan is a great thing to have, especially if you can come up with it at almost no cost, but as the military says ‘No plan survives first contact with the enemy’, and the corollary ‘if your attack is going to plan, it’s an ambush’…

So what is the responsible approach? I’d suggest it’s a compromise, i.e. that the responsible thing is to
  • Balance risks with rewards. If I know something will cost between $100 and $200 I don’t need to go into any more detail if the return is $300. I should switch to delivery mode immediately and get that benefit asap rather than spend time and effort working out that it should cost $173.24. On the other hand, if the return is $150 I might want to do more investigation.
  • Balance planning with action. Have a rough plan for the long term and a detailed plan only for the immediate short term, possibly with other levels between. Spend most of our time delivering value against the short term plan, and revise the long term plan less frequently.
  • Balance precision with effort. Being precise is admirable, but if the effort required to be precise is too high then the benefits of precision are undermined. E.g. if it costs me $20 to determine an investment will be between $100 and $200 (for a total cost of between $120 and $220) and $50 to know the cost is $173.24 (for a total cost of $223.24) I’ve realised no benefit from the extra effort involved in gaining that precision.
  • Balance predictability with adaptability. Having a predictable outcome is admirable, but if it means missing out on opportunities to change course and deliver a better value outcome the advantage is undermined. Knowing that I can spend $150 to achieve a return of $300 is great, but if it means missing out on an opportunity to increase that return to $400 for the same cost I have failed to maximise my effort.
  • Balance budgets with benefits. Rather than try to define a deliverable with a cost, define the benefit associated with an outcome and set a budget accordingly. If the budget appears unachievable reconsider the approach, but if it does appear achievable start delivering and continue to monitor the budget and benefits until the crossover point is reached – i.e. when the incremental benefit of more work is not worth the cost that will be incurred.
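The arithmetic in the points above can be written down as a few small functions. This is just a sketch: the function names and the simple 'worst case still pays' rule are my own framing of the examples, not a formal investment method.

```python
def net_return_range(cost_low, cost_high, expected_return, estimation_cost=0.0):
    """Return (worst case, best case) net return for a cost range."""
    worst = expected_return - cost_high - estimation_cost
    best = expected_return - cost_low - estimation_cost
    return worst, best


def should_start_delivery(cost_low, cost_high, expected_return):
    """Start delivering if even the most expensive outcome leaves a positive return."""
    worst, _best = net_return_range(cost_low, cost_high, expected_return)
    return worst > 0


def total_cost_range(cost_low, cost_high, estimation_cost):
    """Total cost range once the cost of producing the estimate is included."""
    return cost_low + estimation_cost, cost_high + estimation_cost


def increments_before_crossover(increments):
    """Count the (benefit, cost) increments worth delivering before the crossover point."""
    count = 0
    for benefit, cost in increments:
        if benefit <= cost:
            # The next piece of work no longer pays for itself: stop here.
            break
        count += 1
    return count
```

With a cost of $100 to $200 and a return of $300, should_start_delivery is True, while a return of $150 gives False and suggests more investigation. total_cost_range(100, 200, 20) gives the $120 to $220 range from the precision example, which sits below the $223.24 total of the precise estimate.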
That is to say that whilst there's benefit to planning, precision, and predictability they're useless unless coupled with taking action, delivering benefits, and being adaptable. Of course, whilst you're stuck in an environment where an investment has to be proposed and approved through a lengthy bureaucratic process this balanced approach is difficult to achieve, but unless the status quo is challenged effort will continue to be wasted and opportunities missed.