Recommender Systems for Social Impact

A fair portion of our work, especially in the areas of education, school improvement, and sustainability analytics, involves systems designed primarily to assess – to collect data on something's or someone's performance, and then present analysis. And for anything like a successful action-research cycle, performance assessment and related metrics are critical. They're also often easy to model in software: forms can collect directly solicited information (a survey, a questionnaire), and broader information can be gathered passively (time spent engaged with a tool, contextual environmental data, past performance indicators). From there, simple statistical analysis against personal history, benchmarks, and peers can yield "results" – an evaluation, a scorecard, a set of findings, a compelling visualization.
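As a rough illustration of that last step, the sketch below compares a single score with a subject's own history, a fixed benchmark, and a peer group, using only Python's standard `statistics` module. The function name, the choice of z-score and percentile as the peer measures, and the numbers are assumptions for illustration, not a description of any particular platform.

```python
from statistics import mean, stdev


def compare_to_references(score, history, benchmark, peers):
    """Compare one assessment score with three reference points (hypothetical helper).

    history   -- this subject's earlier scores on the same measure
    benchmark -- an externally set target value
    peers     -- current scores of comparable subjects (at least two)
    """
    return {
        # Change relative to the subject's own past average
        "vs_history": score - mean(history),
        # Distance from the target; negative means below target
        "vs_benchmark": score - benchmark,
        # Standard score within the peer group (sample standard deviation)
        "peer_z": (score - mean(peers)) / stdev(peers),
        # Percentage of peers scoring strictly lower
        "peer_percentile": 100 * sum(p < score for p in peers) / len(peers),
    }


result = compare_to_references(72, history=[60, 65, 70], benchmark=80, peers=[55, 70, 75, 85])
print(result)
```

With these made-up numbers, the subject is 7 points above their own average, 8 points short of the benchmark, and at the 50th percentile of peers (z ≈ 0.06).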

The thing is, though, the ultimate goal of these systems is not actually to generate a report. It's to identify deficiencies so they can be corrected. That got us scratching our heads: could we engineer a recommendation system that ties into assessment platforms to suggest reasonable next steps for improvement?

Predictive analytics and recommender systems, though still rapidly evolving, are already a mainstay of e-commerce. Of course Amazon suggests a pouf after you've purchased an area rug, and Netflix recommends The French Connection after you've watched The Conversation. But what about rational suggestions for improving the outcomes of classroom teachers' professional development activities?

E-commerce recommender algorithms, after perhaps factoring in some contextual and demographic data, often rely on frequency of grouping: which items tend to be chosen together. "Collaborative filtering" provides a handy mechanism for finding likely-to-be-popular suggestions based on existing patterns of users and items. But in social systems, popularity, though perhaps helpful as a palatability check, is not necessarily the best guide to direction. Social improvement, I think it's safe to say, rarely follows the crowd.
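"Frequency of grouping" can be as simple as counting which items show up together. The sketch below is a bare-bones item co-occurrence count, about the simplest form of item-based collaborative filtering; real e-commerce recommenders are far more sophisticated, and the function names and basket data here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count, for every item, how often each other item appears in the same basket."""
    counts = {}
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is visited once per basket
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_chosen(counts, item, k=3):
    """Return up to k items most often grouped with `item` (empty if unseen)."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories, one set of items per customer
baskets = [
    {"area rug", "pouf", "lamp"},
    {"area rug", "pouf"},
    {"area rug", "lamp"},
    {"area rug", "pouf", "throw pillow"},
]
counts = co_occurrence(baskets)
print(also_chosen(counts, "area rug", k=2))  # ['pouf', 'lamp']
```

Note that the ranking is pure popularity: it says nothing about whether a suggestion is actually good for the person receiving it.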

Another challenge in making recommendations within social-system improvement frameworks is that the volume of historical data available is usually minuscule. Though plenty of global knowledge might exist – someone working in the field probably has both anecdotal and methodically collected data on what's likely to work in a given situation – an extensive historical pattern of assessment -> recommendation -> action -> result -> assessment within a system's own database is unlikely.

As an initial step toward a possible solution, we came up with what we believe is a unique algorithm composed of three primary components:

  • Metrics and benchmarks. Metrics-based analyses are integral to many improvement processes, and are often used in the social responsibility domain to certify supply chains, assess school quality, index health equity and social determinants, evaluate the environmental performance of buildings, and so on – areas for which we've already helped build systems. How does an assessment result benchmark against its historical self, its peers, or a specific ideal?

  • Human expert opinion. What options are actually available to recommend? Which have been found (in contexts outside of what has been recorded in the system's history) to be effective? Which are deemed detrimental?

  • Popularity. A dose of collaborative filtering logic, to increase the likelihood that a particular recommendation will be found acceptable enough to adopt.
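To make the three-component idea more concrete, here is a minimal sketch of one way such signals could be blended. It is an illustration under stated assumptions, not the algorithm itself: the `Option` fields, the linear weighting, the specific weights, and the sample teacher-development data are all hypothetical. It shows a benchmark gap, an expert rating (with options deemed detrimental vetoed outright), and a popularity signal combining into a single ranking.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A candidate next step (all fields are hypothetical, for illustration)."""
    name: str
    metric: str           # the metric this option is meant to improve
    expert_rating: float  # expert judgment in [-1, 1]; negative = considered detrimental
    adoptions: int        # how many peers have adopted it (popularity signal)


def recommend(scores, benchmarks, options, w_gap=0.5, w_expert=0.3, w_pop=0.2, k=3):
    """Rank options by a weighted blend of benchmark gap, expert opinion and popularity.

    Assumes every benchmark value is positive.
    """
    max_adopt = max((o.adoptions for o in options), default=0) or 1
    ranked = []
    for o in options:
        if o.expert_rating < 0:
            continue  # experts consider it harmful: never recommend it
        target = benchmarks[o.metric]
        # Relative shortfall against the benchmark; 0 when the benchmark is already met
        gap = max(0.0, (target - scores[o.metric]) / target)
        # Adoption count scaled to [0, 1] relative to the most-adopted option
        popularity = o.adoptions / max_adopt
        total = w_gap * gap + w_expert * o.expert_rating + w_pop * popularity
        ranked.append((total, o.name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:k]]


scores = {"student_engagement": 60, "lesson_planning": 85}
benchmarks = {"student_engagement": 80, "lesson_planning": 80}
options = [
    Option("peer classroom observation", "student_engagement", 0.8, 5),
    Option("project-based unit redesign", "student_engagement", 0.6, 30),
    Option("lesson plan template workshop", "lesson_planning", 0.9, 20),
    Option("reward points for attendance", "student_engagement", -0.5, 50),
]
print(recommend(scores, benchmarks, options, k=2))
# ['project-based unit redesign', 'peer classroom observation']
```

With these made-up numbers, the attendance-rewards option is the most popular but is never suggested because experts flag it as detrimental, and the lesson-planning workshop drops to third because that benchmark is already met. Shifting weight toward `w_pop` pulls the ranking toward what peers already do; shifting it toward `w_expert` favors what the field considers effective.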

So far, a few trial applications of the algorithm concept seem to work well. That is, by adjusting various parameters within each of the three components, and adjusting how the three integrate and interact with each other, we've created model systems that can generate plausible recommendations for given scenarios. And that's exciting!

The real question now, however, is how useful it will be in the wild. Even with a recommender system that sees high event traffic, assessing efficacy can be challenging. Did someone click on a recommended book title? Would they have looked for it anyway? Would they have enjoyed an alternate suggestion more? In a social system where there might be only a few hundred users total, and a recommended course of action might take months to implement and even longer to show results, quantitatively assessing the performance of the recommendation algorithm may be a tough nut to crack.

We shall see...