Monday, July 09, 2012

Managing Tacit Knowledge In Projects

In software projects, when we talk about knowledge management it is usually about managing explicit knowledge. Explicit knowledge is disseminated with the help of knowledge assets like documents, standard operating procedures, etc. Tacit knowledge cannot be passed on easily and deals mostly with implicit or unstated knowledge. The individuals who possess this knowledge either don't know that they possess some unique knowledge that can be shared with others, or they think that it is routine and do not know the value of the knowledge they possess. This is where the need for managing tacit knowledge arises, and most often, teams realize the importance of tacit knowledge in crisis situations.

To understand better, let us start with a real life scenario. Ben was the project manager of a maintenance project, which was running successfully for 1 year with 100% Service Level Agreement adherence. The project started with a team size of 20 out of which 13 were fresh people.

The system was developed in a legacy technology, so he had a combination of problems to tackle:
  • Train the team in the technology
  • Ensure the transition happens without any glitches

Due to the effort of the senior engineers in his project, they moved into the steady state and wrote success stories for 1 year. This was also due to the knowledge management practices that they adopted. The team developed Books of Knowledge on technology and the system they were maintaining. After 1.5 years Ben had to release 5 of his senior engineers for other opportunities. The release was planned 2 months in advance and Ben ensured that a proper transition took place. After all the planned activities were completed, the release happened. The next two days saw a flurry of cases from the customer as it was the quarter end. There were some major issues and suddenly the team was panicking. The frantic and clueless team approached Ben for the contact numbers of the engineers who had left the project.

Now what could have gone wrong here, after such careful planning and meticulous knowledge management? Why was the team not able to cope with the flurry of cases? The answer is simple: Ben missed a few tricks of tacit knowledge management in his project.

Some of the tacit knowledge management practices, which can be adopted in projects, are:
  • No knowledge is too small to share or too trivial to disclose: Ensure that the team understands the importance of the routine jobs they might be doing, and ensure that such routine jobs are identified and recorded properly.
  • Rotation of tasks: Ensure that people are rotated across modules in the same project and that proper training takes place during such rotation. The team should record special cases that were not handled in the transition, and which they came to know of only after working on the system. This can be identified easily when a particular task is taking a longer time to complete with a new set of people.
  • Working with gurus: In every project, we come across gurus. This is because they have grasped the technology/system better than others in the team, or they have found a way to work smarter. Allow people to work with gurus on a rotation basis and tell them to record any steps that the whole team is unaware of.
  • Eureka mails: Encourage the team to send eureka mails when they struggle with a task and eventually find a solution.
  • 10 min knowledge sessions: Schedule daily recap sessions of the issues encountered. A scribe should be allocated to take down the major issues. The scribe should record such issues in a daily tips repository and send out emails, whenever a major issue is solved.

From the above-mentioned points it is quite clear that interaction between team members is the key to managing tacit knowledge. The team is the strength of any project. Therefore, for a project manager, managing tacit knowledge equates to being a facilitator of interactions within the team and coming up with new ideas to make these interactions more productive and interesting.


Technology Integration Lab
Providence
www.providenceconsulting.in
The Agile Processes

A project manager friend once said that rigid processes and systems end up being roadblocks to innovation. I am sure he was wrong, but there is a message to consider: all of us grow up with CMMI processes all around us, and by the time we start managing projects, we have KPAs running through our veins, and we end up being hardened process geeks, or shall I say "processistas". Nothing wrong with that. These time-tested processes that we religiously follow, from requirement gathering, through 'build', to implementation methods, produce amazing results. They increase predictability and maturity while reducing risk, etc. But sometimes, just sometimes, we get boxed in and find it difficult to think outside of the processes. That's when injecting a dose of "agility" into our system can make a difference, especially when we are dealing with a world where paradigm shifts are commonplace. This is exactly what we did while managing projects for a client, resulting in amazing success.

The client was in the midst of a large organizational transformation program, building their Enterprise Resource Planning (ERP) system from scratch. Before Sceda Systems, the program was perennially troubled for over 2 years, with no deliverables to show. So when we came in for one large project in that program, we started with incomplete requirements, a business team with no confidence in the program, and stiff time-to-market and budget constraints. The client wanted the project to be delivered in 4 months. The processistas in us took us straight to the scheduling system that told us that the client was kidding us. We did pass on the advice to the client, but if we had stopped there we would have lost multi-million dollar revenues over the next 2 years. Instead, we created a tailor-made end-to-end development process for this client, taking our regular methodology and accelerating this by applying some agile principles. For this, we did the following:
  • Analyzed as-is process and methodology
  • Created a tailor-made development process
  • Applied innovation to make every step of the Software Development Life Cycle more efficient
  • Educated the client

The As-Is process

The client was following a half-baked agile methodology when we walked in. There were some obvious issues with their process, especially in the context of a large program with multiple development groups and diverse stakeholders. Some of these are listed below:
  • Due to the lack of Requirements management, Requirements Analysis (RA) was moving in circles for 2 years, losing the complete trust of businesses.
  • The starting code was built before the requirements were complete. This led to continuous rework at a catastrophic level.
  • Since there were many independent development groups, not having design artifacts - such as the Data Model - created immense churn in the program.
  • Down-stream projects - such as data migration and reporting - struggled because of changing design and data models.
  • A lack of project or program management structure, leading to a lack of predictability.

The client was then informed about the issues in their current process, and that delivering the entire program with decent quality in a few months was impossible for anyone. Consequently, we put on our innovation hat and started creating a solution tailor-made for this client.

The Tailor-made Suite

After analyzing the client's process we started creating a development methodology that would work. A typical waterfall would not work, as we had to show some deliverable to the client early on to bring confidence back in the program. At the same time, given the size of the program, the diverse teams, and stakeholders, we couldn't do pure agile. We, therefore, needed a strong and mature methodology, but with a dose of agility to accelerate the process and support stiff time-to-market needs.

We decided to go with an iterative development methodology, splitting the first project into iterations:
  • Initial discovery phase to arrive at an iteration plan and detailed SME plan
  • Each iteration runs from requirements through user testing
  • Detailed project plan provided at end of RA for an iteration
  • Separate teams and Project Managers for the first 2 iterations to expedite process
  • Created a strong RA process, using In-Flux, that brought back user confidence
  • Creation and freezing of Data Model at program level
  • Created a strong program management structure, including Risk and Change Management

We incorporated some 'Agile' principles into our methodology:
  • Iteration and feedback-based: Feedback from one iteration was used in the next for improvements
  • Efficient: Strived to make each stage as efficient and lean as possible
  • Optimized: Continuous improvement at every stage
  • Strong Customer Involvement: Strong SME team identified to expedite RA. Also, continuous feedback received from this SME team at every stage
  • Continuous Integration: Work done by all development teams would be integrated fully before every release, thereby reducing downstream risk

We also spent time educating the client about problems with their process, the need for a strong management framework, and how their current timelines and plans were unfeasible. After some initial resistance, the client started to realize that their timelines wouldn't work and started to have faith in our solution.

The Triumph
  • We successfully delivered the first 2 iterations, and that brought immense confidence into our solution, process and capabilities
  • We won all subsequent development projects, some from other vendors, and followed a similar methodology for development
  • The program is on track to go live, winning back business and executive stakeholder confidence
  • Heavy revenue stream from this client over the last 18 months.
  • Project running the gamut, including testing, Data Migration, and Enterprise BI
  • Primary vendor for the client, and we now control over 90% of the program
  • Trusted partner of the CIO

Moral of the Story: Let's all continue to be hardened processistas; after all, that's what is expected of us and that's what really put us on the map. But let's also have an agile mindset that brings in flexibility and nimbleness. In conclusion, let's be "Agile Processistas".


Technology Integration Lab
Sceda Systems
www.scedasystems.com
Comprehensive Project Estimation

Estimation is a critical planning activity in the life cycle of a project. With the IT industry and engineering processes still considerably immature, estimation continues to be a challenge. "It is based on estimations that budgets and commitments are made." Standish Group Research, in their latest edition, reports that more than 50% of projects overshoot their estimates. What could be the reason? Well, in my view, this is because project managers focus only on effort estimation. Various aspects like size, effort, schedule, cost, and so on need to be considered for comprehensive estimation.

The below listed points bring out some salient features of comprehensive estimation:
  • The Size of the application provides a sense of estimates. The industry provides a variety of techniques to estimate the size based on project types (like development, maintenance and so on). Development projects make use of FP or COSMIC FFP to estimate the size of the application to be developed. Use-case based estimation is also used in some cases. FTE estimation is widely used in a maintenance scenario. More recently, NESMA FP is being used for the same purpose.

  • Effort Estimation considers the productivity and size. Selection of the right productivity data determines the success factor. This productivity depends on various factors - such as people skills, process maturity, and so on. Every organization has to come up with standard productivity data for every variant of project type. They have to make use of the past data and organizational skills set and decide on the target productivity.

  • Estimation of the Right Timeframe (Schedule) for the execution of the project is a key facet in estimation. The schedule for a project needs to include factors such as people leveling, critical path identification, applying crashing or fast-tracking appropriately, and so on. Identifying all important activities forms the key to the success of this activity.

  • Cost Estimation not only includes the effort cost, but also involves various cost factors associated with the life cycle of the project for budget. Budgeting becomes an important activity during the course of the project. Every modification to a requirement requires proper analysis of impact right from size to cost.

Though the above points paint a comprehensive picture, there are other dimensions as well. For example, defect estimation, based on a prediction model, helps the project management to perform objective gating during the course of the project. It may also happen that after defect estimation, the project manager would like to revisit the effort estimation or schedule estimation. Hence, the comprehensive estimation revolves in a cycle, considering all these aspects hand-in-hand. Once the cycle is stabilized, the estimation is recorded and validated with all stakeholders.
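The chain from size to effort to schedule to cost described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `estimate_project`, the productivity figure, the hourly rate and every input value are hypothetical, and the schedule formula naively assumes the work divides evenly across the team, ignoring critical path, leveling and crashing.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    effort_hours: float
    schedule_weeks: float
    cost: float


def estimate_project(size_fp, hours_per_fp, team_size,
                     hours_per_person_week=40.0, hourly_rate=50.0,
                     other_costs=0.0):
    """Derive effort, schedule and cost from size (all inputs are hypothetical)."""
    if size_fp <= 0 or hours_per_fp <= 0 or team_size <= 0:
        raise ValueError("size, productivity and team size must be positive")
    # Effort = size x productivity (hours needed per function point).
    effort_hours = size_fp * hours_per_fp
    # Naive schedule: assumes the effort splits evenly across the team.
    schedule_weeks = effort_hours / (team_size * hours_per_person_week)
    # Cost = effort cost plus other life-cycle costs (tools, travel, ...).
    cost = effort_hours * hourly_rate + other_costs
    return Estimate(effort_hours, schedule_weeks, cost)


# A requirement change that adds 30 function points ripples from size to cost.
baseline = estimate_project(size_fp=200, hours_per_fp=8, team_size=5, other_costs=10_000)
changed = estimate_project(size_fp=230, hours_per_fp=8, team_size=5, other_costs=10_000)
print(baseline)
print("extra cost of the change:", changed.cost - baseline.cost)
```

Re-running the same function after each revision (for example, after defect estimation suggests more effort) is one simple way to keep the aspects in step during the estimation cycle.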

Monitoring these estimates is yet another activity that runs throughout all the life cycle phases, as a means to understand and predict risks effectively. Any variation from estimations has to be identified, analyzed and acted upon at the right time, so as to ensure success of the project. A well planned project is half complete. Comprehensive estimation paves the way for the execution of a well planned project. To know more details on comprehensive estimation, please enroll in the Sceda Systems' Certification for IT Project Management. This course covers the topic in-depth, providing IT related examples and case studies.


Technology Integration Lab
Sceda Systems
Ubiquitous Learning With Mobility

Micro-learning refers to the design of learning interventions using mobile devices, an approach commonly referred to as mobile learning and the recent buzz of the virtual learning domain. Many organizations are experimenting with this new and ever evolving technology to exploit its benefits. The most significant aspect of this tool is its availability and accessibility. The brands and quality may vary extensively, but the basic gadget is there with almost everyone. The urge to utilize this technology in various domains is obvious, and the education and training industry is no exception.

The benefits of learning on the move range from accessibility to addressing a vast audience and the various forms of interactions leading to a rich learning environment. However, the learner on the move operates in a continuously changing information sphere. This has led researchers of technology-mediated learning to deliberate on the appropriateness or otherwise of learning interventions that best suit mobile devices. With the challenge of learning keeping pace with changing technology, the concept of Micro-Learning emerged as a major research area. Although relatively new, this domain promises positive change by considerably reducing mobile learners' cognitive load.

As one of the sub-domains of educational technology, Micro-learning concentrates on designing learning as small steps in media-centered learning environments comprising facts, concepts, processes, or procedures, which together cover a macro curriculum independently or in conjunction with other learning modes. The design of Micro-learning incorporates five dimensions. These are:
  1. Time (a learner can spend)
  2. Content (small judicious components to address the short timeframe)
  3. Curriculum (self-contained micro curricula covering a macro curriculum)
  4. Media (varying nature of the media for Micro-learning)
  5. Learning Type (varying learning styles and domains)

The benefit of using these Micro-learning components is the ability to incorporate learning in a learner's routine activities rather than making it an independent and additional activity for the learner.

The size of "micro" may vary with domain and learner context. Therefore, Micro-learning demands a paradigm shift from common models of learning to newer models, keeping in view the micro dimensions of the learning process. Micro-learning, being an evolving domain, does not yet have any strict definition or coherent usage of the term. However, the increasing focus on Micro-learning activities is evident through the activities of internet users.

Micro-learning finds its use in various domains, and perhaps, it is the best approach for many. One example is a project comprising globally spread employees, some working in the field, others travelling regularly, and some overseeing the project. Other examples include upgrading knowledge about the features of a new product or process, providing just in time and just enough knowledge, and so on.

There are still many unanswered questions to which researchers of this domain are striving to find answers. Some of them are the extent to which independent micro learning can effectively address a macro curriculum, tracking and evaluating learners, especially in the offline mode, and the right place for micro learning in the overall learning domain.


Technology Integration Lab
Sceda Systems
www.scedasystems.com

Is Your Website Set Up for Conversion?


You have 3 million unique visitors a month and a well-designed website. So why aren't people signing up for your newsletter, downloading your software, or purchasing your products? The reason could be that your website, as beautiful as it may be, isn't set up to help visitors convert.

A common visual guide that organizations use to help explain the conversion process is known as the "conversion funnel." The conversion funnel is meant to show the journey from landing on your website all the way through the purchase, download, sign-up, or any other predetermined conversion.

1. Quality Content vs. Quantity of Content

The old adage "if you build it, they will come" isn't necessarily true when it comes to websites. Just because you have a well-designed site doesn't mean people are going to find your website and sign up for your weekly newsletter.

In order to truly be effective and have people engage with your brand online, you have to have quality content. Notice I didn't say quantity of content. Quality content is usually content geared toward a specific audience that has targeted keywords throughout and has a high chance of being shared on social channels. Having web visitors land on your site thanks to a piece of quality content usually implies that the web visitor will be interested in other aspects of your site, such as a product offering, whitepaper, blog posts, etc. Creating quality content is the starting point to bettering your website's conversion rate.

2. Top Tier Navigation

When a new visitor lands on your website for the first time, they will interact with the website based on their past experiences on the web. It is essential that your website's navigation is clear and easy to use in order for your visitors to quickly and efficiently find relevant content.

Navigation is part of the UI/UX (user interface/user experience). If visitors on your website have a positive experience, they are likely to return. Return visitors are more likely to convert.

Navigation is the key to helping a web visitor progress through the conversion funnel.

3. Easy-to-Understand Calls to Action

Complementary calls to action are essential for increasing your conversion rate on your website. After all, it doesn't hurt to ask (subtly). For example, let's say a web visitor is on your site reading an article about the top tech trends in 2012. They happen to see a call to action on the right rail of the page that says, "For more info on tech trends, sign up here." Wouldn't it be safe to assume that the user is more likely to sign up than someone who isn't interested in tech trends?

This goes back to the quality content point. If you are creating quality content and guiding your web visitors through the conversion funnel by top tier navigation and calls to action, you should begin to notice an increase in return visitors and a better conversion rate to boot.

4. Analytics and Tracking

The reason I left analytics at the end was to make sure it wasn't forgotten. If you aren't tracking web activity on your site and utilizing the data to make better business decisions, then your conversion rate will never reach its full potential.

Most free analytics software will give your organization the ability to track which content pages are the most popular, where exactly visitors are in the conversion funnel, and what keywords are driving traffic to your site. This knowledge should provide your organization with the insight necessary to make changes on your website and further meet the needs of your potential customers.
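To make "where exactly visitors are in the conversion funnel" concrete, here is a minimal sketch of the arithmetic behind a funnel report. The stage names and visitor counts are hypothetical, and `funnel_report` is an illustrative helper, not the API of any particular analytics product.

```python
def funnel_report(stages):
    """stages: ordered list of (name, visitor_count) pairs, top of funnel first.

    Returns (name, count, step_rate, overall_rate) for each stage, where
    step_rate is relative to the previous stage and overall_rate to the top.
    """
    if not stages:
        return []
    top = stages[0][1]
    report = []
    previous = top
    for name, count in stages:
        step_rate = count / previous if previous else 0.0
        overall_rate = count / top if top else 0.0
        report.append((name, count, step_rate, overall_rate))
        previous = count
    return report


# Hypothetical monthly counts, for illustration only.
stages = [
    ("landed on site", 10_000),
    ("viewed product page", 4_000),
    ("clicked call to action", 1_000),
    ("signed up", 250),
]
for name, count, step, overall in funnel_report(stages):
    print(f"{name:<24} {count:>6}  step={step:.1%}  overall={overall:.1%}")
```

The stage with the lowest step rate is usually the first place to look when deciding whether content, navigation, or the call to action needs work.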


Technology Integration Lab
Sceda Systems
www.scedasystems.com

Three Ways To Mint Money With Big Data

Big Data is now deep into the hype phase of the innovation cycle.  All the classic signs are there: you can eat buffet dinners all 52 weeks a year at Big Data conferences, Big Data tag lines are now common in emails from industry analysts, and even investment bankers are tossing around the phrase.  Any experienced businessperson has seen this movie before with earlier technologies ranging from the World Wide Web to CRM to Enterprise Data Warehouses.  As with these other innovations, however, there is real substance at the root of the hype.  And – like CRM, the Web, and data warehouses – Big Data is very likely to be a big part of running almost any large corporation in the future.

Most early movers among the users of these prior technologies lost a lot of money, but a small number created enormous shareholder value.  By definition, all of the early movers were willing to take risks.  But three characteristics distinguished the winners from the losers.
  • First was an unwillingness to be snowed by conventional wisdom, technical jargon or the fairy tales of universal knowledge that abound when everything is still mostly talk and potential.
  • Second was a strong bias to act quickly at low cost, learn what works from experience, and then reinforce strengths.  The ultimate goal was always to exploit the opportunity to pour cash into successful innovations before the competition, but these companies recognized that trial-and-error learning usually uncovers opportunities faster than master plans.
  • Third was a ruthless focus on profits in excess of capital costs within the foreseeable future as the success criteria for proposed investments of time or money.

This article will attempt to consider the Big Data opportunity from the point of view of the P&L-owning executive.  It will keep experience of these prior technologies front of mind and will focus on one question: How can I use Big Data analytics to increase shareholder value?

Consumer-focused businesses have the latent power to exploit what is now called Big Data.  This data has now been stored and maintained to create a multi-year history that grows year-by-year.  Integrated with this are continuously updated data streams for thousands of weather monitoring devices across continents, social media data from various leading services, detailed individual and micro-geographic demographics, comprehensive business census data, and numerous other datasets.  These transaction and other datasets are growing rapidly in terms of percentage coverage of all consumer transactions, variety of data sources, data granularity, and geographic coverage.

The strategic intent has been to triangulate between business strategy, algorithmic math, and database structures to develop software tools that can change decisions to measurably increase shareholder value.  The size and complexity of this data, in combination with the focus on the creation of shareholder value through competitively superior decision-making, has required a unique process that has led to some conclusions contrary to emerging received wisdom, starting with an unconventional definition of Big Data.

What follows outlines three major analytical approaches for unlocking the latent opportunities of Big Data.  Each can be exploited now, and at least some major corporations are doing so currently.  No source of competitive advantage lasts forever, but some last longer than others.  This review proceeds from those opportunities that we believe to be the most transitory to those that we believe to be the most sustainable basis for long-term success.

1. Do It the Old-Fashioned Way: Exploit Faster Clock Speeds First

Mary Smith walks into a grocery store, shops for twenty minutes, checks out and leaves.  What stored data describes this visit that can be used to improve future decisions?

In most real-life large retailers, the data would be Mary Smith's customer ID number (from the loyalty or credit card program), the store ID number, the time of the transaction, the list of items she purchased, and the price paid (including discount codes) for each.  The retailer might also collect and maintain address, phone number, email and other contact information for Mary Smith, and might also purchase information that describes her credit score and other financial data from third-party service bureaus.  All of the transaction records for all customers for the last several years are collated to create the transaction data warehouse.  This core database is usually on the order of 1 to 100 terabytes for a Fortune 500 company.

Typically, various data "cubes" that can be queried by normal Business Intelligence (BI) tools are hived off by taking abstractions (e.g., transactions summarized to units, sales and margin by product by store by day) or subsets (e.g., all transactions in one product department for the last 12 months) of the complete database.  Major BI tools answer descriptive questions such as "What is the most common product to be bought with diapers?" or "On what day of the week do we sell the most beer in Pittsburgh?"  Cubes are created because of processing speed constraints.
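As a rough illustration of what "taking abstractions" means, the sketch below summarizes individual transactions into a small cube of units and sales by product, store and day, and then rolls it up to answer a descriptive question. The rows, field layout and function names are hypothetical; a real BI tool would do this over far larger data with its own storage engine.

```python
from collections import defaultdict

# Hypothetical transaction rows:
# (customer_id, store_id, day, product, units, price_paid)
transactions = [
    ("C1", "S1", "2012-07-06", "diapers", 1, 12.00),
    ("C1", "S1", "2012-07-06", "beer", 6, 9.00),
    ("C2", "S1", "2012-07-06", "beer", 12, 17.00),
    ("C3", "S2", "2012-07-07", "diapers", 2, 24.00),
]


def build_cube(rows):
    """Summarize transactions to units and sales by (product, store, day)."""
    cube = defaultdict(lambda: {"units": 0, "sales": 0.0})
    for _customer, store, day, product, units, price_paid in rows:
        cell = cube[(product, store, day)]
        cell["units"] += units
        cell["sales"] += price_paid
    return dict(cube)


def units_by_product(cube):
    """Roll the cube up further to total units sold per product."""
    totals = defaultdict(int)
    for (product, _store, _day), cell in cube.items():
        totals[product] += cell["units"]
    return dict(totals)


cube = build_cube(transactions)
print(cube[("beer", "S1", "2012-07-06")])
print(units_by_product(cube))
```

Note that the cube no longer knows which customer bought what: the abstraction is faster to query precisely because it throws detail away, which is the trade-off the processing speed constraint forces.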

Ten to fifteen years ago a database measured in terabytes was very Big Data.  It required capabilities like specialized database software, query tools, and massive IT support to maintain a whole system around this.  Companies could achieve material competitive advantage by being cleverer about how to structure the data model, design queries, and so forth.  Walmart is probably the most famous example of this.

Ten to fifteen years from now, such a transaction database will be 'small'.  In absolute terms, it will very likely be less than a factor of 10 larger than its current size, while processing and storage productivity will very likely increase by a factor of hundreds to thousands over the same period.  Consumer-sized devices and databases will be able to handle it.  The only cube required will be the database itself.  Faster algorithms, more efficient data models and the like won't matter much, because available processing power will render them insignificant (though better businesspeople will always ask smarter questions and use the answers more effectively than others).

However, this transition won't happen all at once, and a lot of money can be made over the next decade by those who manage it better than others.  The raw data storage itself has already become quite cheap.  In simplified terms, as each click forward in Moore's Law happens, the next most processing-intensive analytical method that used to require a lot of IT investment will then become executable with much cheaper IT tools, and therefore analytical processes that were previously not feasible will then become feasible with high-cost infrastructure.  Eventually, there will be no logically-definable queries that cannot be run on such a database with cheap IT tools.

In practice, we are already at the point where everything but the most complex queries for the very largest of these databases can be executed on general purpose IT tools.  This means that this opportunity is mostly about applying low-cost methods first.  In such cases, legacy systems and expertise are little help, and are often a hindrance.

2. Integrate and Use New Data Types

Numerous digital data sources are becoming available because of a combination of increasing data storage, data processing and data transmission productivity (effective available bandwidth appears to double approximately every 21 months).  Even though these data sources are 'small' when seen in isolation, they are properly considered part of Big Data because the infrastructure required to use them in practice is only now emerging and depends on increasing processing and transmission productivity. Consider as a practical example weather data.  Currently several thousand weather stations across the U.S., and proportionate numbers in other advanced countries, collect weather data and make it available on Web sites or through electronic transmission.  This can now be automatically scraped, transferred into a corporate data warehouse, and integrated with other data to provide useful information.  When the illustrative Mary Smith made her grocery store visit, the weather at that time could then be appended to her shopping record to provide useful context data on her shopping trip.  Similar data is collected and available on everything from demographics to shopping habits to traffic flow.
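Appending weather to a shopping record is, at its simplest, a keyed lookup. The following sketch assumes weather observations have already been collected and keyed by store, date and hour; the records, field names and the `append_weather` helper are hypothetical and stand in for whatever the data warehouse actually uses.

```python
# Hypothetical weather observations keyed by (store_id, date, hour).
weather = {
    ("S1", "2012-07-06", 17): {"temp_f": 91, "condition": "sunny"},
    ("S1", "2012-07-06", 18): {"temp_f": 88, "condition": "thunderstorm"},
}

# Hypothetical shopping visits from the transaction data.
visits = [
    {"customer_id": "C1", "store_id": "S1", "date": "2012-07-06", "hour": 17},
    {"customer_id": "C2", "store_id": "S2", "date": "2012-07-06", "hour": 9},
]


def append_weather(visits, weather):
    """Return copies of the visit records with a 'weather' field attached.

    Visits with no matching observation get weather=None rather than
    being dropped, so the transaction history stays complete.
    """
    enriched = []
    for visit in visits:
        key = (visit["store_id"], visit["date"], visit["hour"])
        enriched.append({**visit, "weather": weather.get(key)})
    return enriched


for record in append_weather(visits, weather):
    print(record)
```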

A truism in predictive modeling is that "better data beats better algorithms."  However, 'better' usually doesn't mean 'more data points of the same type,' but rather 'integrating a new type of relevant data into an analytical cube.'  This is one reason that a cloud services architecture is so essential to a Big Data analytics system: it makes the ongoing capture of such data economically feasible.  Systems which rely only on internally generated company data will be analytically outperformed by systems which can use this ever-expanding plethora of data.

The volume of data in one of these alternative data streams is a function of granularity (number of measurements) and intensity (bits per measurement).  Return to Mary Smith shopping at a grocery store.  Instead of just the transaction data plus weather as context, we might also capture and integrate all relevant social media postings, which would become a much larger database.  Next, we might have RFID chips embedded in shopping carts, shelves and her loyalty card that would identify the path she took through the store, what items she inspected but did not buy and so forth.  In the extreme, we might have full-motion video that shows her every motion from entry to exit.

So, how can we apply data analytics to this to create shareholder value?  A lot of attention is being paid to so-called noSQL methods that try to avoid the computation overhead of current relational databases.  Progress will surely be made here.  Just as with enterprise transaction databases over the past 10 – 15 years, there will be a constant competition between extracting abstractions from these Big databases that are more analytically tractable, building noSQL and similar technologies that allow direct analysis of the large databases themselves, and Moore's Law, which will continue to convert a given data size from Big to 'small.'  And just as with enterprise data warehouses, we will see the development of something analogous to cubes (or more broadly, pre-abstractions of data) as a large part of how these high-volume data streams are used.

What is certain about exploiting the large shareholder value opportunities available from integrating new data sources – low-volume as well as high-volume – is that the ability to flexibly and cheaply extract data from cloud services and use them is essential.  The trade-offs involved in using abstracted data in relational databases versus non-abstracted data in non-relational databases, relying on pre-abstraction versus self-abstraction of high-volume data streams, and so on are not obvious, will change as technology evolves, and will depend on the company and the problem.

The thing that is clear is that today – right now – large consumer companies can begin taking advantage of many of these data streams by capturing them at an abstracted level, incorporating them in data schemas, and using them to improve decisions.  Some are already doing this.  Just as with the experience of successful innovators in earlier technology waves, smart marketers shouldn't wait to do this. When and how to move beyond these to the inevitable management of ever-larger datasets with ever-improving technology is best done by trial-and-error and reinforcement of demonstrated successes. This highlights the last and most sustainable source of competitive advantage offered by Big Data.

3. Use Test & Learn to Improve Faster
As business moves into a Big Data environment, the most advantaged paradigm for the analysis and improvement of business programs migrates from model-building based on historical data to Test & Learn – in plain English, trying out new ideas in small subsets of the business, and making predictions based on these tests.  Over the past year, many industry analysts have started to recognize that Test & Learn is central to making Big Data create value.
  • McKinsey's 2011 overall report on Big Data calls out five ways for Big Data to create value.  The second is: "Enabling experimentation to discover needs, expose variability, and improve performance."
  • In 2012, a Forrester Research blog discussion of Big Data claimed that "real-world experiments will become the new development paradigm … it's clear that real-world experiments are infusing data management and advanced analytics development best practices more broadly."
  • The number one lesson called out at the 2012 MIT conference on how to exploit Big Data: "Faster insights with cheap experiments."

Big Data – and more broadly, radical reductions in the unit costs of storing, processing and transmitting data – drive this transition.  First, it has become practical to use IT to automate and semi-automate many aspects of the testing process, which lowers the cost of testing enormously.  Second, widespread sensor and other data streams allow more granular measurement of effects, and superior targeting of actions based on these tests.  Third, the explosion in data means that models built to exploit these much larger datasets are increasingly difficult to evaluate and calibrate due to their complexity, and experiments become essential to do this.
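
To make the idea of a cheap, automated test concrete, here is a minimal sketch of how one such experiment might be scored. It compares a test group of stores with a control group and uses a permutation test to ask how often a difference this large would appear by chance. The sales figures, the function names and the choice of a permutation test are all assumptions made for illustration, not a description of any particular Test & Learn platform.

```python
import random
from statistics import mean

def lift(test, control):
    """Relative difference in mean outcome between test and control groups."""
    return (mean(test) - mean(control)) / mean(control)

def permutation_p_value(test, control, n_shuffles=10_000, seed=0):
    """Share of random relabelings whose absolute mean difference is at
    least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(test) - mean(control))
    pooled = list(test) + list(control)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(test)]) - mean(pooled[len(test):]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical weekly sales (in thousands) for stores that ran the new
# program (test) and comparable stores that did not (control).
test_sales = [104, 110, 98, 115, 109, 112]
control_sales = [100, 97, 101, 95, 103, 99]

print(f"lift: {lift(test_sales, control_sales):.1%}")
print(f"p-value: {permutation_p_value(test_sales, control_sales):.3f}")
```

A small p-value suggests the lift is unlikely to be noise. In practice the hard part is choosing control stores that are genuinely comparable, which this sketch simply assumes.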

In my experience over the past decade, a Test & Learn capability for a major marketer requires a specialized analytical platform supported by a Big Data infrastructure, but also has several process and organizational components.

First, the sine qua non is executive commitment.  The person or small group with ultimate operational responsibility for shareholder value creation, typically the CEO or President, must legitimately desire reliable analytical knowledge of the business.  This implies several crucial recognitions: that intuition and experience are not sufficient to make all decisions in a way that maximizes shareholder value, that non-experimental methods are not up to the task of determining causality with sufficient reliability to guide many actions, and that experiments can be applied in practice to enough business issues to justify the costs of the capability.

Second, a distinct organizational entity, normally quite small, must be created to design experiments and then provide their canonical interpretation.  It must have analytical depth, a professional culture built around experimental learning, and an appropriate scope of interest that cuts across the various departments that will have programs subject to testing.  It should have no incentives other than scorekeeping; therefore, it should never develop program ideas, nor ever be a decision-making body.  The balance that must be struck, however, is that it should remain connected closely enough to the operational business that it does not become academic.

Third, a repeatable process must be put in place to institutionalize experimentation as a part of how the business makes decisions. This lowers the cost per test, ensures that learning is retained, and maximizes the chances of the experimental regime outlasting individual sponsors and team leaders. The orientation should not be to big, one-time "moon shot" tests, but instead toward many fast, cheap tests in rapid succession whenever this is feasible.


Technology Integration Lab
Sceda Systems
www.scedasystems.com

Friday, May 25, 2012

Cloud Services - Design Considerations


We all realize that SOA and Cloud computing go hand in hand and are complementary in nature. After all, we talk about everything as a service (XaaS) in the Cloud world. So the immediate question that comes to mind is: would a service that has been designed to be consumed in-house serve equally well if it were hosted on the cloud? The answer, apparently, is "Not really". Services designed as per SOA principles (clear contract, standalone functionality) would probably be much easier to migrate to the Cloud, but there certainly is a need to re-look at some of the design patterns that will be crucial in cloud-orienting the services. Let's identify some of the crucial ones.

Data Storage: Conventional services assume that transactional data is stored in a conventional RDBMS. However, that might not be the best approach when moving to the cloud. You can certainly host an RDBMS on the cloud, but infinite scalability becomes a problem because the data store is still centralized. Depending on your cloud platform, check whether it makes more sense to store data in platform-specific storage (e.g., Azure Table Storage or Amazon SimpleDB).

Enhanced algorithm: (Use less compute power) This, in my opinion, is something we should have been practicing all along. But we all come across smelly code and its associated inefficiencies. That still worked in the conventional environment. But with Cloud, we pay for the compute cycles, so non-optimized code and algorithms result in a direct cost to the organization. So the next time you see someone do a one-to-one string compare before sorting the strings or putting them in a hashtable, it would be worth taking the extra effort and time to get it modified, even if it means a missed deadline.

Messaging and Queuing: With services now being hosted on the cloud, extra care needs to be taken to ensure that the messages are not lost even if for some reason the underlying services are down. Remember, you are talking about cloud, where your capability is limited by the underlying cloud providers. Queuing provides you the additional layer of reliability where the messages are never lost and you have a robust implementation even on the cloud. 

Idempotent Capability: The same request even if sent multiple times should only be processed once. Though this is something which we have implemented in our conventional applications, the requirement is more so in a cloud transactional service. It wouldn't be a bad idea to send some unique identifiers in service headers for critical transactions to implement this capability. 
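
A minimal sketch of the unique-identifier approach is shown below. The `PaymentService` class, its `deposit` method and the `req-001` style of identifier are hypothetical names invented for this example. The in-memory dictionary stands in for what would need to be a durable, shared store in a real cloud service.

```python
class PaymentService:
    """Toy service that applies each request ID at most once."""

    def __init__(self):
        self._results = {}   # request_id -> response already returned
        self.balance = 0

    def deposit(self, request_id: str, amount: int) -> int:
        # A retry carrying a known ID gets the stored response back,
        # and the deposit is not applied a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance

service = PaymentService()
first = service.deposit("req-001", 50)
retry = service.deposit("req-001", 50)   # duplicate delivery of the same request
print(first, retry, service.balance)
```

The caller generates the identifier once and reuses it on every retry, so the service can tell a resend from a genuinely new transaction. In a multi-instance deployment the check and the write would also have to be atomic, which this single-process sketch does not address.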

Security - Federated model: This I am sure would be faced by almost all Cloud architects. While moving services to the cloud, you would surely want to authenticate and authorize the requestor. It probably is a good idea to leverage your existing LDAP rather than creating one from scratch on the cloud. A federated LDAP (ex: ADFS) would surely help. 

Multi-tenancy: Multi-tenancy is going to be commonplace in the cloud environment, especially in the SaaS world. So if you want to design services that will be used by multiple consumers in a similar manner, put multi-tenancy at the core of your design.

Integration: In all likelihood, your cloud services need to interact with your in-house services. A conventional middleware might just work, but do look at intricacies like transaction maintenance, seamless security, back-end data integration etc. Microsoft has come up with the Azure Service Bus, specifically designed for the purpose.

This gives an idea of the intricacies that one can broadly expect while moving to the cloud.



Thursday, April 05, 2012

Everything You Wanted to Know About Data Mining but Were Afraid to Ask

Big data is everywhere we look these days. Businesses are falling all over themselves to hire 'data scientists,' privacy advocates are concerned about personal data and control, and technologists and entrepreneurs scramble to find new ways to collect, control and monetize data. We know that data is powerful and valuable. But how?  This article is an attempt to explain how data mining works and why you should care about it. Because when we think about how our data is being used, it is crucial to understand the power of this practice. Without data mining, when you give someone access to information about you, all they know is what you have told them. With data mining, they know what you have told them and can guess a great deal more. Put another way, data mining allows companies and governments to use the information you provide to reveal more than you think. 

To most of us data mining goes something like this: tons of data is collected, then quant wizards work their arcane magic, and then they know all of this amazing stuff. But, how? And what types of things can they know? Here is the truth: despite the fact that the specific technical functioning of data mining algorithms is quite complex -- they are a black box unless you are a professional statistician or computer scientist -- the uses and capabilities of these approaches are, in fact, quite comprehensible and intuitive.

For the most part, data mining tells us, about very large and complex data sets, the kinds of information that would be readily apparent about small and simple things. For example, it can tell us that "one of these things is not like the other" a la Sesame Street, or it can show us categories and then sort things into pre-determined categories. But what's simple with 5 datapoints is not so simple with 5 billion datapoints.

And these days, there's always more data. We gather far more of it than we can digest. Nearly every transaction or interaction leaves a data signature that someone somewhere is capturing and storing. This is, of course, true on the internet; but ubiquitous computing and digitization have made it increasingly true about our lives away from our computers (do we still have those?). The sheer scale of this data has far exceeded human sense-making capabilities. At these scales patterns are often too subtle and relationships too complex or multi-dimensional to observe by simply looking at the data. Data mining is a means of automating part of this process to detect interpretable patterns; it helps us see the forest without getting lost in the trees.

Discovering information from data takes two major forms: description and prediction. At the scale we are talking about, it is hard to know what the data shows. Data mining is used to simplify and summarize the data in a manner that we can understand, and then allow us to infer things about specific cases based on the patterns we have observed. Of course, specific applications of data mining methods are limited by the data and computing power available, and are tailored for specific needs and goals. However, there are several main types of pattern detection that are commonly used. These general forms illustrate what data mining can do.

Anomaly detection: in a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For instance, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.
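
One of the simplest statistical versions of this idea is a z-score check: measure how many standard deviations each value sits from the mean and flag the ones beyond a threshold. The sketch below applies it to a made-up list of deduction amounts. The data, the function name and the threshold of 2.0 are assumptions for illustration, and nothing here reflects how the IRS actually selects returns.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical deduction amounts claimed on ten similar tax returns.
deductions = [4200, 3900, 4100, 4500, 4000, 4300, 3800, 4400, 4150, 19000]

print(find_anomalies(deductions))
```

Real systems model many variables at once and use measures that are less easily distorted by the outliers themselves, but the principle is the same: describe the typical case, then look for departures from it.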

Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
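
The strength of a rule like this is usually described with three numbers: support (how often the items appear together), confidence (how often the rule holds when its first half is present) and lift (how much more likely the purchase is than it would be by chance). The sketch below computes them over a handful of invented shopping baskets. The baskets and function names are hypothetical, and this is not a claim about how Amazon's or Netflix's systems work.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses", "ice bucket"},
    {"cocktail shaker", "recipe book"},
    {"recipe book", "apron"},
    {"martini glasses"},
    {"ice bucket", "apron"},
]

rule_from = {"cocktail shaker", "recipe book"}
rule_to = {"martini glasses"}

conf = confidence(baskets, rule_from, rule_to)
rule_lift = conf / support(baskets, rule_to)
print(f"confidence: {conf:.2f}, lift: {rule_lift:.2f}")
```

A lift above 1 means shaker-and-book buyers are more likely than the average customer to buy martini glasses, which is what makes the rule useful for targeting.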

Cluster detection: one type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups. This is one of the black-box types of algorithm that are hard to understand. But in a simple example - again with purchasing behavior - we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.
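
To open the black box a little, here is a minimal sketch of k-means, one common clustering algorithm, written in plain Python. Each customer is a point whose coordinates are hypothetical monthly spend on gardening, fishing and model kits. The starting centroids are picked with a simple farthest-point rule so the result is repeatable; that choice, the data and the function name are all assumptions for illustration.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly assigning each point to
    its nearest centroid and moving each centroid to its cluster's mean."""
    # Farthest-point initialization: start from the first point, then keep
    # adding the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical spend per customer: (gardening, fishing, model kits).
customers = [
    (90, 5, 0), (5, 85, 0), (5, 0, 90),
    (80, 10, 5), (10, 90, 5), (0, 10, 85),
    (95, 0, 5), (0, 95, 10), (10, 5, 95),
]

for cluster in kmeans(customers, k=3):
    print(cluster)
```

Nobody told the algorithm that gardeners, fishermen and model builders exist; the three groups fall out of the distances between customers. The analyst still has to choose k and decide what each cluster means.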

Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systematic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this - large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.
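
The word-usage idea can be sketched with a tiny naive Bayes classifier, one classic technique behind spam filters. It counts how often each word appears in labeled spam and legitimate ("ham") messages, then scores a new message by which label makes its words more probable. The six training messages and the function names are invented for this example, and real filters train on vastly more data.

```python
from collections import Counter
from math import log

def train(examples):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Log prior for the label, plus the log likelihood of each word.
        # Adding 1 to every count keeps unseen words from zeroing the score.
        score = log(label_counts[label] / total_messages)
        for word in text.lower().split():
            score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical pre-classified messages.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project status and agenda", "ham"),
]

word_counts, label_counts = train(training)
print(classify("claim your free money", word_counts, label_counts))
print(classify("agenda for monday meeting", word_counts, label_counts))
```

The "rules" the filter learns are nothing more than word frequencies per category, which is why more labeled examples translate so directly into better accuracy.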

Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
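
A real engagement model would combine many variables, but the mechanics are easiest to see with one. The sketch below fits a straight line by ordinary least squares to predict next month's sessions from the number of photos a user tagged. The numbers are invented, the single-variable setup is a deliberate simplification, and none of it describes Facebook's actual models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical users: photos tagged this month vs. sessions next month.
photos_tagged = [0, 2, 4, 6, 8]
sessions_next_month = [5, 9, 14, 17, 22]

slope, intercept = fit_line(photos_tagged, sessions_next_month)
predicted = slope * 10 + intercept   # forecast for a user who tagged 10 photos
print(f"sessions = {slope:.2f} * photos + {intercept:.2f}")
print(f"predicted sessions for 10 photos: {predicted:.1f}")
```

Comparing such predictions with what users actually do next month is the honing step: variables whose coefficients keep proving useful stay in the model, and the rest are reweighted or dropped.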

The patterns detected and structures revealed by the descriptive data mining are then often applied to predict other aspects of the data. Amazon offers a useful example of how descriptive findings are used for prediction. The (hypothetical) association between cocktail shaker and martini glass purchases, for instance, could be used, along with many other similar associations, as part of a model predicting the likelihood that a particular user will make a particular purchase. This model could match all such associations with a user's purchasing history, and predict which products they are most likely to purchase. Amazon can then serve ads based on what that user is most likely to buy.

Data mining, in this way, can grant immense inferential power. If an algorithm can correctly classify a case into a known category based on limited data, it is possible to estimate a wide range of other information about that case based on the properties of all the other cases in that category. This may sound dry, but it is how most successful Internet companies make their money and from where they draw their power.

