Saturday, 3 March 2018

The Trouble With Algorithms

With so much of our attention taken up with the ongoing effects of the TR omnishambles, there's hardly any scope for discussion of practice issues, but I spotted this on Facebook recently, posted by David Raho:-

Do the various organisations now involved in delivering Probation services bring with them ingrained prejudices from their original jurisdictions? Are the systems and algorithms we use in assessment neutral, or loaded (e.g. by culture, race or religion) against particular groups? Do we perpetuate the inequalities in the system by tapping into our own deeply held values and beliefs, or do we constantly challenge and revise them?

When I first joined Probation in the 1980s I was given what I have always considered to be good advice. This was to distrust all tools such as those used to produce risk-of-offending scores, particularly those said to work in other jurisdictions, and to be deeply suspicious of everything ever produced as an instruction or directive by the employers (then the Home Office and local authority), as it was invariably nonsense. I was also advised to constantly challenge my own values and beliefs and my personal motivation for doing Probation work; if I wasn't fairly certain why I wanted to do it, or happy about doing it for the people I was doing it for, then it was best to go and do something else. I was also told that the best training for Probation for graduates was six months on a building site. I have always found that advice useful, and have only come a cropper when I have not followed it, or lingered in roles or teams I should have left far sooner than I did.

As for predictions of criminal behaviour: this is a very imprecise science, but it has always been the case that past behaviour is a fairly good predictor of future behaviour. So unpicking the reasons for offending, and working through them in some systematic or purposeful way in supervision, may be useful in reducing it in many cases, i.e. not just a standard tick-box check-in.


Read The Black Swan by Taleb. He argues that risk tools (in his case, for financial risk) are built on mass data and so regress to the mean: sudden shocks are unpredictable, because rare events do not arise from the predictable norms. Probation risk tools are exactly analogous. They are built on mass data. They CANNOT predict who will do it. All they can do is give an idea of the TYPE of person who MIGHT.

But any tool will drown you in false positives. Suppose you have an assessment tool with 99% accuracy, 100,000 people and 100 murderers. It will rate 1 murderer as safe; identify 99 correctly as killers; correctly clear roughly 98,900 non-killers; and wrongly flag around 1,000 non-killers as killers.

Every percentage point of lost accuracy costs you roughly one killer spotted and adds roughly another 1,000 false positives. So it matters not a fig how many 'factors' you use. None will get you to 100%, and anything less buries you in noise. The proof of this is: do you use OASys to inform your day-to-day work on risk? No, me neither! Do you take much account of OASys once completed? Also no, me neither. What we actually do is use our informed instinct and gut to decide who to worry about... and then another person entirely does something.
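The arithmetic above can be made explicit with a short sketch. The numbers are the comment's own; the assumption that "99% accuracy" means both a 99% true-positive rate and a 99% true-negative rate is mine:

```python
# Illustrative numbers from the comment above: 100,000 people screened,
# 100 of them murderers, and a tool with "99% accuracy" -- assumed here to
# mean a 99% true-positive rate AND a 99% true-negative rate.
population = 100_000
murderers = 100
accuracy = 0.99

non_murderers = population - murderers
true_positives = murderers * accuracy              # killers correctly flagged
missed = murderers - true_positives                # killers rated safe
false_positives = non_murderers * (1 - accuracy)   # non-killers wrongly flagged

flagged = true_positives + false_positives
precision = true_positives / flagged               # share of flagged who really are killers

print(int(true_positives), int(missed), int(false_positives))  # 99 1 999
print(round(precision, 2))  # 0.09 -- over 90% of those flagged are false alarms
```

Even at an implausibly high 99% accuracy, false positives outnumber true ones by roughly ten to one, because the base rate of killers is so low; at realistic accuracies the ratio is far worse.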

RSR proves the pointlessness of the whole Risk Fetish process. You 'score' 6.8% and you are so scary-dangerous that you must be in the NPS. Yet at that score, more than 9 in 10 are NOT going to do anything serious even if we do NOTHING AT ALL, and still they all need close monitoring. The whole thing is Voodoo Probation. Note that a 6.8% score means 93.2% of those flagged will not go on to commit a serious offence. Ponder that for a moment.

Risk is constantly dynamic; only regular contact can measure it. Static measures, however, should remain the reality check, no matter what. That is the right balance.

If you are way out of whack with the static factors you need a solid argument as to why. The static factors as such tell you very little.


Here's the American article that prompted the discussion:-

Computerized Criminal Behavior Predictions Are No More Effective Than Untrained Humans: Report

Without effective scrutiny, algorithm-based software could hurt those who are already the most vulnerable.

The effectiveness of the criminal justice system has been debated since its creation. There is great difficulty in developing a uniform system when criminal defendants’ circumstances are variable. Thanks to coverage of police shootings, sexual assault cases and self-defense trials over the last few years, the criminal justice system has become interwoven with our daily news of politics, government and pop culture. It doesn’t take long to see that the system operates in favor of those with power and influence while being disadvantageous for those with a history of systemic vulnerability. It is inescapable, and it is becoming increasingly apparent that the system is flawed.

We had hoped that in the age of technology, we could eradicate bias by putting computer programs in place of our old systems. With algorithm-based systems, we can make faster, less variable predictions about the likelihood of people ending up in the criminal justice system again, or recidivism. But it’s become increasingly apparent that automating the process made things worse, because we have taken old bias and embedded it by teaching it to computers. We hoped machines could provide the fair treatment humans have failed to give criminal defendants and past offenders—but they haven’t. And it turns out, machines may not be any more effective than humans at predicting recidivism.

“People hear words like ‘big data’ and ‘machine learning’ and often assume that these methods are both accurate and unbiased simply because of the amount of data used to build them,” said Julia Dressel, whose senior undergraduate honors thesis in computer science at Dartmouth College is gaining national attention.

Earlier this year, Dressel released a report in conjunction with Dartmouth computer science professor Hany Farid, titled “The accuracy, fairness, and limits of predicting recidivism,” in the journal Science Advances. The two evaluated the risk assessment software COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) by comparing its predictions against those of untrained people recruited through an online survey, and the results were striking. Participants had only seven details about each offender, compared to the 137 features available to COMPAS, yet were accurate 67 percent of the time, about the same as the software.

The conclusion is that untrained people can match COMPAS's criminal predictions with considerably less information.

“We have shown that a commercial software that is widely used to predict recidivism is no more accurate than the predictions of people with no criminal justice expertise who responded to an online survey. We have also shown that the COMPAS prediction algorithm is equivalent to a very simple classifier,” says Dressel.

Predicting criminal behavior is serious business. It’s critical that we use this and similar research to evaluate ways to improve the programs we are using to make determinations on people’s futures. Without effective scrutiny, algorithm-based software could hurt those who are already the most vulnerable.

Dressel’s research isn’t the first time COMPAS has come under scrutiny. In 2016, ProPublica published an in-depth analysis of the software that found high levels of racial bias. Its findings included false predictions of recidivism made disproportionately against black defendants, white defendants falsely rated as lower risk, and black defendants misclassified as being at higher risk of violent offenses.

“Algorithmic tools sound impressive, so people are quick to assume that a tool’s predictions are inherently superior to human predictions. It’s dangerous if judges assume COMPAS produces accurate predictions, when in reality its accuracy is around 65 percent. Therefore, it’s important to expose when software like COMPAS isn’t performing as we expect. We cannot take any algorithm’s accuracy for granted, especially when the algorithm is being used to make decisions that can have serious consequences in someone’s life,” Dressel continued.

Along with being equally effective at prediction, participants were equally biased, particularly in the area of race.

“Our research suggests that the two most predictive criteria of recidivism are age and total number of previous convictions. On a national scale, black people are more likely to have prior crimes on their record than white people are (black people in America are incarcerated in state prisons at a rate that is 5.1 times that of white Americans, for example). Within the dataset used in our study, white defendants had an average of 2.59 prior crimes, whereas black defendants had an average of 4.95 prior crimes. The racial bias that appears in both the algorithmic and human predictions is a result of this discrepancy,” Dressel explained.
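Dressel's point, that bias can appear even when race is never an input, can be illustrated with a toy simulation. The prior-conviction averages are from the quote above; the Poisson shape, sample size and risk threshold are my illustrative assumptions:

```python
import math
import random

random.seed(0)
N = 10_000

def poisson(lam):
    # Knuth's method for drawing a Poisson-distributed count
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# Synthetic prior-conviction counts matched to the averages quoted above
# (2.59 for white defendants, 4.95 for black defendants in the study's data).
group_a = [poisson(2.59) for _ in range(N)]
group_b = [poisson(4.95) for _ in range(N)]

THRESHOLD = 4  # hypothetical cut-off: 4+ priors -> flagged "high risk"
rate_a = sum(x >= THRESHOLD for x in group_a) / N
rate_b = sum(x >= THRESHOLD for x in group_b) / N

# Race is never an input, yet the flag rates diverge sharply, because the
# score keys on a feature whose distribution differs between the groups.
print(round(rate_a, 2), round(rate_b, 2))
```

The score here uses nothing but prior convictions, yet one group is flagged far more often than the other: the disparity rides in on a correlated feature rather than on any explicit race variable.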

False predictions about criminal defendants and recidivism affect people’s lives. The predictions made by COMPAS and similar software are used to determine bail totals, prison sentences and eligibility for parole. Unfortunately, more often than not, individuals of color, black men in particular, are the ones whose lives are destroyed by these mistakes.

Over the last 30 years, the Sentencing Project has been at the forefront of the discussion around criminal justice reform. Their research efforts help spread awareness of the way race affects experience within the criminal justice system, the long-term isolation of felons and suggestions for improving sentencing.

“More than 60 percent of the people in prison today are people of color. Black men are nearly six times as likely to be incarcerated as white men, and Hispanic men are 2.3 times as likely. For black men in their 30s, 1 in every 10 is in prison or jail on any given day,” says Kara Gotsch, the director of strategic initiatives at the Sentencing Project.

Given our nation's history, this information is troubling, yet not surprising. The real conflict comes in when we consider the struggles associated with life after prison and its continuous stigma.

“People with criminal records face significant collateral consequences of their conviction, including barriers to voting, employment, housing, and financial public assistance. These barriers complicate the reintegration process after incarceration and likely increase the odds a person will recidivate,” Gotsch explained.

The biases people of color are exposed to when interacting with the criminal justice system are paired with the everyday experiences of personal and systemic racism. Think of it this way—a black man with a college degree has fewer employment opportunities than a white man with a high school diploma. If you add a criminal record, finding gainful employment can be nearly impossible. Our criminal justice system takes the usual oppression and multiplies it, leaving additional barriers to self-sufficiency for black and brown people.

According to research conducted at the Sentencing Project, the problems with our criminal justice system go much deeper than furthering racism. Incarceration is often used as a substitution for issues of addiction and mental illness that would be better treated with personalized treatment programs than with removal from society.

“Public education about the United States’ exceptionalism in its use of incarceration is critical to demonstrate that there is another way to address public safety concerns. The U.S. relies on incarceration and severe punishments to address circumstances that really result from our broken social safety net, like inadequate access to medical and mental health treatment, including drug treatment. Incarceration will not solve these problems, but community investment in services can,” says Gotsch.

If we want to do more to assist individuals who are at risk for committing crimes instead of hiding them away, we must spend more time evaluating how we interact with criminal defendants. Dressel’s honors project shedding light on both the flaws of algorithm-based punishments and ever-present human implicit bias is a step in the right direction.

Since the study’s publication, Northpointe, Inc. (now Equivant), which owns the COMPAS software, has posted an “Official Response” on its website alleging that Dressel’s research “actually adds to a growing number of independent studies that have confirmed that COMPAS achieves good predictability and matches the increasingly accepted AUC standard.” However, the company makes no mention of the racial bias and says it will review the materials from the study for accuracy.

Our criminal justice system needs fixing, and following technological shortcuts isn’t the answer. We need to do more as a nation to make sure we aren’t derailing someone’s life for things we could have fixed through early intervention.

The takeaway? According to Dressel, it’s being aware that algorithms are imperfect and that reducing bias will require intentional effort—but there is hope. “Algorithms can have racially biased predictions even if race isn’t a feature of the algorithm. It could be possible to develop algorithms void of racial bias. However, in this field, as research continues it’s important that we assess the algorithms for bias at every step of the way to ensure the tools are performing as we expect,” she concluded.

If companies won’t take responsibility for making racial bias permanent, it’s up to us as a community to bring attention to racial disparities. Both the Sentencing Project and Dressel’s research are a step in the right direction.

Rochaun Meadows-Fernandez

A. Rochaun is a writer, speaker, and activist with a passion for learning, located in Wyoming. Due to natural talent, her career has gained noteworthy momentum in a very short amount of time. While capable of writing on a wide variety of subjects, she specializes in Black health, parenting, and diversity education. You can read her content in the Washington Post (Print and Web), New York Mag, Athena Insight, and many more. Her work is also republished in the Chicago Tribune and other long-standing, reputable media outlets. She is an honor graduate of Texas Woman’s University and member of Alpha Kappa Alpha Sorority as well as Omicron Delta Kappa.


A couple of comments:-

Mostly, this article makes some good points. The so-called experts are often wrong in predicting future crimes. These experts frequently have their own biases and prejudices, and that could interfere with the accuracy of their predictions. Even with algorithmic systems that run on computerized calculations, the algorithms that are used in making such programs are often influenced by the biases and prejudices of those designing the programs. Nobody can always predict what influences may occur in people's lives that might turn them around.

However, increased mental "health care", which this article made a few statements promoting, can still be oppressive. Psychiatry has its own biases. Its drugs often have suppressive and debilitating effects. Psychiatry has often incarcerated people. Psychiatric "diagnoses" are often influenced by the biases of the psychiatrists doing the "diagnosing". While the criminal justice system in this country is far from perfect, defendants get better due process there than most people who are dragged before psychiatrists. Dissident psychiatrist Thomas Szasz mentioned the last point in his book "Psychiatric Justice" (szasz.com).

Good intentions but totally misses the key flaw.

Basing incarceration, whether initial or continued, on any form of prediction, whether by humans or algorithms, is a form of preventive detention and should be recognized as such. The key problem here is the highly pathological and completely dysfunctional attitude that once the judicial system has someone in its clutches, it should not release them unless and until it is convinced they will not reoffend.

Incarceration has only two valid purposes: punitive, as a disincentive to reoffend, and rehabilitative, to give the incarcerated the abilities and resources to deal with the problems that led them to offend in the first place. Preventive detention, when recognized as such, has always been treated as the severe cultural and political pathology it is. Disguising it under a façade of computer algorithms or "predictions" by humans should make no difference in how it is treated.

A miserable accuracy rate of 65% means the tool does only 15 percentage points better than pure chance, and that roughly a third of those incarcerated on the basis of such a prediction should not have been. This should be considered totally unacceptable in any society with any pretensions whatsoever of being a democracy.
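The commenter's arithmetic can be checked with a short sketch. Note the unstated assumption doing the heavy lifting: that the assessed population splits 50/50 into reoffenders and non-reoffenders; with a skewed base rate the error share among those flagged would differ.

```python
# Sketch of the comment's arithmetic, assuming a 50/50 base rate.
accuracy = 0.65
chance = 0.50
points_over_chance = accuracy - chance   # 15 percentage points above a coin flip

n = 1000
reoffenders = n // 2
non_reoffenders = n - reoffenders

true_positives = reoffenders * accuracy              # correctly flagged
false_positives = non_reoffenders * (1 - accuracy)   # flagged in error
wrong_share = false_positives / (true_positives + false_positives)

print(round(points_over_chance, 2))  # 0.15
print(round(wrong_share, 2))         # 0.35 -- roughly a third flagged wrongly
```

Under that 50/50 assumption the two figures in the comment both fall out directly; the "almost 1/3" claim is really a statement about this particular base rate, not about the tool alone.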

The real irony here is that many people were driven to committing crimes in the first place by constant and unrelenting abuse through the application of algorithms instead of being judged on their personal performance and history. Using algorithms to save the short-term costs of judgments based on individual records and characteristics always comes at the expense of far greater long-term costs due to inevitably higher inaccuracies. [ADDED: It's always done because those doing it pocket the minor short-term savings while society pays the vastly higher long-term costs. The USA's pathological version of capitalism is deeply predisposed to such rank abuses.]

The victims of algorithm-based judgments almost inevitably become aware of exactly what is going on, and that awareness has highly corrosive effects on their judgment and activity. You can control your behavior, but you have almost no control over many of the characteristics used in algorithms. So the deeply pathological message that victims of algorithm-based judgments receive is that what they do is almost irrelevant.


  1. Anyone who has ever had an OASys report done on them knows full well that as a tool for calculating risk it's a load of absolute cobblers.

    It's not helped by the fact that probation officers fail every single day to accurately record information about the client in their records, as they are legally obligated to under the Data Protection Act 1998. Nor do they check that the information third parties give them is accurate, which skews what OASys spews out even further. A result is only as good as the information put into the assessment, quite apart from the effectiveness of the actual tool in the first place.

    And as for actual risk: considering every human on the planet is capable of murder in the right set of circumstances, it's actually impossible to predict the risk of anyone committing a crime. Some people you think will, won't; others who have offended but never been caught will carry on; and some who have never committed an arrest-worthy crime suddenly will. It all depends on what happens to that person on any given day, and none of us knows, when we get up in the morning, what the day will bring.

  2. I was voluntarily involved in the genesis of OASys (I know!! but I thought a single assessment tool was a good idea in principle) & whilst it was a fascinating exercise to begin with I was variously horrified, dismayed & angered by the arrogance of the lead psychologist who had a very clear Home Office agenda within rigid parameters. I had a similar experience when being introduced to Thornton's RM2000 in that the lead clinician would not entertain any discussion of exceptions. In both cases my frustrations were subordinate to the tsunami of nausea generated by the fawning & fanboy/fangirl responses of colleagues; & management in particular.

    1. Context - a single assessment tool being a good idea in principle as opposed to each probation area having their own homegrown version, which was problematic &/or confusing when cases were being transferred. Points made by 09:02 are valid & were among the issues & concerns dismissed or ignored when they were raised with lead clinicians.

  3. The trouble with Al Gore-isms ... is that they're apocryphal. But one of my favourites is:

    "A zebra does not change its spots"

    1. Mi diaspascio - I have a lisp. It still didn't stop Gore claiming he'd "invented the internet"!!!

  4. “Risk management” justifies “enforcement” and is the biggest lie probation officers delude themselves into believing! For the new breed of POs (those that don’t know, or don’t care, what probation really is, because it ceased having any meaningful purpose some years ago), “risk management” and “public protection” make them feel purposeful and important.

  5. I have frequently asked the service users I worked with whether they thought they were dangerous, what kind of dangers they posed, to whom, and what tends to trigger the behaviours that pose risks to others. Within a good working relationship those kinds of conversations beat a thousand OASys assessments. I ask the same questions of myself. Driving is where I’m risky.

  6. The police also are having problems with algorithms. I found this article in Wired this week pretty interesting.


    1. An algorithm designed to help UK police make custody decisions has been altered amid concerns that it could discriminate against people from poorer areas. A review of its operation also found large discrepancies between human predictions and those made by the system.

      For the last five years Durham Constabulary and computer science academics have been developing the Harm Assessment Risk Tool (HART). The artificial intelligence system is designed to predict whether suspects are at a low, moderate or high risk of committing further crimes over a two-year period.

      The algorithm is one of the first to be used by police forces in the UK. It does not decide whether suspects should be kept in custody but is intended to help police officers pick if a person should be referred to a rehabilitation programme called Checkpoint. The scheme is designed to intervene in proceedings rather than push people through the UK's court system.

      HART uses data from 34 different categories – covering a person's age, gender and offending history – to rate people as a low, moderate or high risk. Within these data categories is postcode information. The police force is now removing the primary postcode field, which includes the first four characters of Durham postcodes, from the AI system. "HART is currently being refreshed with more recent data, and with an aim of removing one of the two postcode predictors," a draft academic paper, published in September 2017, reviewing the use of the algorithm reads. The paper was co-authored by one member of the police force.

      "I have a concern about the primary postcode predictor being in there," says Andrew Wooff, a criminology lecturer at Edinburgh Napier University, who specialises in the criminal justice system. Wooff adds that including location and socio-demographic data can reinforce existing biases in policing decisions and the judicial system. "You could see a situation where you are amplifying existing patterns of offending, if the police are responding to forecasts of high risk postcode areas."

      The academic paper, the first review of HART to be published, states that postcode data could be related to "community deprivation". A person's "residence may be a relevant factor because of the aims of the intervention," the paper continues. If the postcode data is relied upon for building future models of reoffending then it could draw more attention to the neighbourhoods. "It is these predictors that are used to build the model – as opposed to the model itself – that are of central concern," the paper states.

      The paper also highlights a "clear difference of opinion between human and algorithmic forecasts". During initial trials of the algorithm, members of the police force were asked to mimic its outcomes by predicting whether a person would be at a low, moderate or high risk of reoffending. Almost two-thirds of the time (63.5 per cent) police officers ranked offenders in the moderate category. "The model and officers agree only 56.2 per cent of the time," the paper explains.

      WIRED contacted Durham Constabulary with questions about its alterations to the algorithm but had not received a response at the time of publication.