Monthly Archives: February 2012

Data assumptions can be deluding

Here’s a slightly critical add-on to my previous post about Learning Analytics and EDM. This year’s Learning and Knowledge Analytics course (LAK12), once again, brings up some highly valuable perspectives and opportunities for developing new insights, models, and harvesting possibilities for learning in general. However, this should not stop us from being aware of delusions and assumptions that are somehow orphaned in the ongoing discussions. In particular, I want to mention three potential pitfalls that are perhaps taken too much for granted:

  • data cleanliness
  • weighting
  • meaningful traces

When developers talk about their products, everything looks shiny and wonderful. All the examples shown work smoothly and give meaningful results. This makes me pause: while the ideation of new analytics technology is a wonderful thing and anticipated with much enthusiasm, the data isn’t always as clean as it is presented to be in the theoretical context. Most, if not all, data processes have to undergo a cleansing process to get rid of datasets that are “contaminated”. A good example is the typical “teacher test” content found in virtually every VLE database. It’s not always clearly indicated as “test” either, so in many cases nonsensical data has to be eliminated through extensive manual processes. It should, therefore, be standard practice to report how much data was actually “thrown away” and on what basis. Not that this would in any way discredit the usefulness of the remaining dataset, but it would indicate the amount of automated or manual selection that has gone into it.
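To make the reporting idea concrete, here is a minimal sketch of a cleansing step that records why each record was dropped. The record fields, the sample data, and the exclusion rules are all assumptions made up for illustration; a real VLE export will look different.

```python
from collections import Counter

# Hypothetical VLE records; the field names are assumptions for illustration.
records = [
    {"user": "s01", "course": "Biology 101", "role": "student"},
    {"user": "t01", "course": "test course", "role": "teacher"},
    {"user": "s02", "course": "Biology 101", "role": "student"},
    {"user": "t02", "course": "Sandbox", "role": "teacher"},
    {"user": "s03", "course": "", "role": "student"},
]


def exclusion_reason(record):
    """Return the reason a record is discarded, or None if it is kept."""
    if not record["course"]:
        return "missing course"
    if "test" in record["course"].lower():
        return "labelled as test"
    if record["course"].lower() == "sandbox":
        # Stands in for content that had to be identified manually.
        return "teacher sandbox (manual list)"
    return None


def clean_with_report(records):
    """Split records into kept and discarded, counting each discard reason."""
    kept = []
    reasons = Counter()
    for record in records:
        reason = exclusion_reason(record)
        if reason is None:
            kept.append(record)
        else:
            reasons[reason] += 1
    report = {
        "total": len(records),
        "kept": len(kept),
        "discarded": len(records) - len(kept),
        "reasons": dict(reasons),
    }
    return kept, report


kept, report = clean_with_report(records)
print(report)
```

Publishing a report like this alongside the results would show readers how much was discarded and on what basis.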

This necessarily leads to questioning the weighting of data. By which mechanism are some datasets selected as being meaningful, and at which priority level over others? Very often, the rationale behind the selection of variables is not exposed, and neither is their priority relative to other variables in the same dataset. Still, it must be transparent whether e.g. the timing, the duration, or the location of an event is given more weight when predicting whether a pedagogic intervention is required. After all, a young person’s future may depend on it.
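One way to picture transparent weighting is a score whose weights are stated openly and whose per-variable contributions are returned with the result. This is only a sketch: the weights, the feature values, and the linear form are invented for illustration and are not taken from any actual Learning Analytics tool.

```python
# Hypothetical weights; a real system would have to justify each of them.
WEIGHTS = {"timing": 0.5, "duration": 0.3, "location": 0.2}


def intervention_score(features, weights=WEIGHTS):
    """Return the weighted score together with each variable's contribution."""
    contributions = {name: weights[name] * features[name] for name in weights}
    return sum(contributions.values()), contributions


# Feature values are assumed to be normalised to the range 0..1.
score, contributions = intervention_score(
    {"timing": 1.0, "duration": 0.5, "location": 0.0}
)
print(score, contributions)
```

Because the contributions are returned next to the score, anyone reviewing a prediction can see which variable drove it.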

The two limitations above lead to a third, which concerns the question of what counts as a meaningful trace that a user leaves on the system. We know users leave data behind when conducting an electronic activity. These data can be sequenced by time, but it is far from clear where the useful cut-off points of a sequence or ‘trace’ are. Say you had a string of log data A-B-C-D-E-F-G-H. Does it make more sense to assume that BCD constitutes meaning, or would CDEF perhaps be better – and why would it be better?
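The cut-off problem becomes tangible once you enumerate the candidates. The sketch below lists every contiguous sub-sequence of the example log within some length bounds; the function name and the bounds of 3 and 4 are arbitrary assumptions, chosen only so that BCD and CDEF both show up.

```python
def candidate_traces(events, min_len=3, max_len=4):
    """Enumerate every contiguous sub-sequence within the given length bounds."""
    traces = []
    for length in range(min_len, max_len + 1):
        for start in range(len(events) - length + 1):
            traces.append("".join(events[start:start + length]))
    return traces


log = list("ABCDEFGH")
traces = candidate_traces(log)
print(len(traces), traces)
```

Even this tiny log yields eleven candidate traces of length three or four, and nothing in the data itself says which of them carries meaning.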

I realise that these questions could be interpreted as destructive criticism, but we have one other possibility, which is to just take the results conjured up in a black box at face value and see if they look plausible no matter how they were derived. This we could call the Google approach.

The twinning of EDM and Learning Analytics

After listening to Ryan Baker’s presentation on Educational Data Mining (EDM), I am more convinced than ever that EDM and Learning Analytics are actually the same side of the same coin. Despite attempts being made to explain them into different zones of influence or different missions, I fail to see such differences, and from reading other LAK12 participants’ reflections, I am not alone in this. Baker’s view that Learning Analytics are somewhat more “holistic” can be refuted with a simple “depends”. What is more, historically, EDM and LA don’t even originate from different scientific communities, such as is the case with metadata communities versus librarians, or with electric versus magnetic force physics – now of course known as electromagnetism.

Both approaches (if there are indeed two) are based on examining datasets to find ‘invisible’ patterns that can be translated into information useful to improve the success and efficiency of the learning processes. A good example Baker mentioned was the detection of students that digress, misunderstand, game the system, or disengage. It’s all in the data.

I would also like to believe that predicting the future leads to changing the future; at the very least, it could give users the air of being in control of their destination. As a promotional message this has quite some power. But the same can be postulated in support of reflection: knowing past performance can help your future performance! So, once again, there is a strong overlap between predictive and reflective applications of data analytics.

For me, all of this can only lead one way: instead of using efforts and energies to differentiate the two domains, which would only lead to reduced communities at both ends and friction in between, we need to think big and marry them into one large community and domain: Let’s twin EDM and LA!

Crowdsourcing while learning

This is an interesting approach. Duolingo promises to translate the Web while users learn a language. I liked their nice intro video, which explains how it works: Duolingo uses your native speaker skills to have you translate foreign sentences into your own language. This is a sensible approach which is also used in translation sciences and interpreting – always translate into your native tongue.

How do you translate from a language you don’t understand? Duolingo adjusts to your competence level and provides help on the fly, such as translation suggestions. I suppose this approach works reasonably well with languages that have an association to yours, e.g. English and Spanish (through their shared Latin vocabulary: library – librería). But I’d be interested to see how this is done with Chinese or Finnish.

Google does similar stuff with its translation service, but what’s innovative here is that Duolingo promises learning in return for your translations. There are two open questions for me. Firstly, what does translating the Web mean, i.e. how are the translations fed back to the Web, and will they be free and open? Secondly, what is the learning model behind it, since merely translating sentences only gets you so far in language learning? In language learning you want to be able to produce the other idiom, not only understand it passively. How are repetition and grammatical structure analysis incorporated in the tool?

It’s in private beta, but I am curious about the didactical model once I get access to it.

End for for-profit HE in the UK

A good post by Sean Mehan commenting on the UK government dropping a bill that would have allowed private for-profit companies to enter the HE market. Referring to a news item in the Telegraph, Sean writes:

The legislation would have allowed state loans to go into profits for for-profits, even allowing foreign companies (yes, companies, not institutions, for that is what they are), into the mix. So, UK taxpayer money goes to profits in a foreign country, while the national infrastructure is forced to compete or rot.

Quite right! Add to this monetary concern the socio-intellectual one: such a privatisation move would severely damage the mission of higher education to serve wider society, not a handful of shareholders.



Metacognition and Learning Analytics

Following the first live session in the latest MOOC on Learning and Knowledge Analytics (LAK12), I did some reflection on the direction that Learning Analytics has taken over the past year or two. As far as I can see, Learning Analytics follows to a large extent the line of web analytics, but with the intention to improve learning by gaining insights into hitherto invisible connections between user characteristics and actions.

However, web analytics has, I believe, a very different objective when analysing people’s navigation patterns and tracking their activities online. This objective is to better influence users’ behaviour in order to direct them (without their knowledge and in a personalised way) to the pages and activities that matter – to the company, not the user. Almost in parallel, the attitude expressed and the examples brought forward in favour of Learning Analytics put the main focus on understanding and influencing learner behaviour, and only to an extremely limited extent, if at all, on learners’ cognitive development.

An often mentioned example is that of a jogger who trains for a marathon and, through the collection of performance data, becomes more motivated, is able to see progress, compares this to other runners, etc. Similarly, tools that track the usage of software applications on your computer provide feedback that is useful if you think you should change the amount of time you spend on e-mails. Equally, tracking your own smoking or eating habits will hopefully lead to achieving a personal goal. These are all valid examples of where and how feedback loops can improve a person’s accustomed performance.

It is vitally important, though, that if Learning Analytics is supposed to make a beneficial impact on (self-directed) learning, it does not stop at manipulating learners in such a way that they are merely conditioned into different behaviours! It is not enough to check behaviour patterns of learners, even though some such feedback might be helpful at times. We need more LA applications that support metacognition and cognitive development. Even memory joggers are quite useful for this. One of the oldest kinds I am familiar with, and which I have used to great benefit, is the vocabulary trainer. In using those, I could see that in the first run I was able to answer maybe 46% of a given wordlist, increasing to 65% in the next run. Over only a few runs I was able to answer 96% of all questions. Not only was this summative feedback in % a motivator and excellent for my own benchmarking; I was also able to detect decline in memorised vocabulary and identify which words I was most likely to forget once I stopped actively revising (say three weeks later).
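The kind of feedback a vocabulary trainer gives can be sketched in a few lines. The word list and results below are invented for illustration (they are not my actual scores), and the last run is assumed to take place some weeks after active revision stopped.

```python
# Hypothetical results: word -> outcome per run (True = answered correctly).
# The last run is assumed to happen a few weeks after revision stopped.
results = {
    "Haus": [True, True, True, True],
    "Baum": [False, True, True, False],
    "Katze": [False, False, True, True],
    "Fenster": [True, True, True, False],
}


def score_per_run(results):
    """Percentage of words answered correctly in each run."""
    n_runs = len(next(iter(results.values())))
    return [
        round(100 * sum(outcomes[i] for outcomes in results.values()) / len(results))
        for i in range(n_runs)
    ]


def forgotten(results):
    """Words known in the second-to-last run but missed in the last one."""
    return sorted(
        word
        for word, outcomes in results.items()
        if outcomes[-2] and not outcomes[-1]
    )


print(score_per_run(results))
print(forgotten(results))
```

The percentages give the summative feedback, while the list of forgotten words is the part that supports reflection on what to revise next.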

Since I am most interested in cognitive development and less in learning behaviour patterns, I would like to see more Learning Analytics tools that allow this to happen.