
Your algorithmic future: weapons of maths creation and destruction

Science fiction writer William Gibson said, “The future is already here, it’s just not evenly distributed.” When you look around you can see the truth of that statement. Most of the technologies that will influence us over the next few decades already exist. In many ways it feels like we’re living in parts of that future. We can 3D print replacement jaws for people, and 3D printing was invented over 30 years ago. In NDRC, where I work, we have companies working on embedded sensors for post-operative bleed detection, and on helping kids with focus and ADHD problems through neurofeedback gameplay. [1] In many ways technology is enriching our lives. In reality the title of this piece is less ‘Our Algorithmic Future’ than ‘Our Algorithmic Present’.

As a technophile I find that very exciting. I have a deep and abiding love of science and the wonderful possibility of technology. I grew up reading Isaac Asimov (his science and his fiction), Arthur C. Clarke and Carl Sagan, and watching Star Trek, Tomorrow’s World and other optimistic visions of technology and the future.

At the same time there is a darker side to technology. Paul Ehrlich said, “To err is human, to really foul things up requires a computer.” It’s not hard to find examples. California released 450 high-risk, violent prisoners on an unsuspecting public in 2011 due to a mistake in its computer programming. ‘We-Connect’, an app-based vibrator, captures the date and time of each use and the selected vibration settings, and transmits that data, along with the user’s personal email address, to its servers in Canada “unbeknownst to its customers”, a number of whom are now suing the company.[2]

And darkest of all is the case of the firing of elementary school teacher Sarah Wysocki by Washington DC Public Schools. The school system used “VAR”, a value-added statistical tool, to measure a teacher’s direct contribution to students’ test results. Despite being highly regarded in classroom observations, she was fired on the basis of the algorithm’s low score. There was no recourse or appeal, and no way to really understand the workings of VAR, as it is copyrighted and cannot be viewed.[3]

Computer Says No

There is this abstract notion of what the computer said or what the data tells us. Much as the complex gibberish that underlay the risk models of economists and financial services companies in the run-up to the crash wasn’t questioned (because maths), the issue here isn’t the algorithms so much as people and their magical thinking.

I came across this quote from IPPN Director Sean Cottrell in his address to 1,000 primary school principals at Citywest Hotel in 2011.[4] He commented:

‘Every calf, cow and bull in the State is registered by the Department of Agriculture & Food in the interests of food traceability. Why isn’t the same tracking technology in place to capture the health, education and care needs of every child?’

Well-intentioned as it might be, this shows a poor understanding of cows, a worse understanding of technology and a dreadful misunderstanding of children and their needs. I find this thinking deeply disturbing, and profoundly creepy, so I decided to unpack it a little.

This is how we track cows:
[Image: cow]
And this is how we start that process, by tracking calves:
[Image: calf]
And I wondered, is this how he’d like to track children? (H/T to @Rowan_Manahan for that last image)
[Image]

Then I realised that we are already tracking children.
[Image: kids]
Only it’s not the Primary Principals’ Network that is doing it; it is private companies doing the tracking and tagging. It is Google and Facebook and Snapchat, with some interesting results and some profound ethical questions. We now know that Instagram photos can reveal predictive markers of depression, and that Facebook can influence mood and people’s purchasing habits.[5]

Our algorithmic present is composed of both data and algorithms. We have had exponential growth in processing capability over the last number of years, which has enabled some really amazing developments in technology. Neural networks first emerged in the 1950s, dimmed in the late 1960s, re-emerged in the 1980s and have taken off like wildfire in the last few years. The neural network explosion is down to the power, cheapness and availability of GPUs, together with improvements in the algorithms themselves. And neural networks are really, really good at some kinds of pattern analysis. We are getting to the point where they are helping radiologists spot overlooked small breast cancers. [6]

There is also a very big problem with algorithms: the problem of the black box. The proprietary nature of many algorithms and data sets means that only certain people can look at them. Worse, we are building systems whose internal workings and rules we don’t necessarily understand very well at all.
[Image: black box diagram]
Black boxes look like this. In many systems we see some of the input and the output, but most of what happens in between is not only hidden, it is not understood. In a classic machine learning model we feed in data, apply certain initial algorithms, and then use the resulting model for prediction or classification. But we need to be careful of the consequences. As Cathy O’Neil cleverly put it, Donald Trump is an object lesson in bad machine learning: iterate on how the crowd reacts to what he says and over-optimise for that output, the classic problem of a model trained on a bad data set. We need to think about what the systems we’re building are optimising for. [7]
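
A minimal sketch of that failure mode, in Python with entirely made-up numbers (not any real system): a trivial “black box” is fitted to historical decisions that already contain a bias, and it faithfully reproduces the bias as its rule.

```python
from collections import defaultdict

# Hypothetical historical records: (applicant_group, was_hired)
history = [("A", True)] * 80 + [("A", False)] * 20 + \
          [("B", True)] * 30 + [("B", False)] * 70

# "Training": estimate the past hiring rate per group from those decisions.
counts = defaultdict(lambda: [0, 0])   # group -> [hired_count, total]
for group, hired in history:
    counts[group][0] += int(hired)
    counts[group][1] += 1

def predict(group):
    hired, total = counts[group]
    return hired / total >= 0.5        # recommend hiring if the past rate was >= 50%

print(predict("A"))  # True  - the model has learned the historical preference
print(predict("B"))  # False - and rejects group B, treating past bias as signal
```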

George Box said that “All models are wrong but some are useful.” Korzybski put it more simply: “The map is not the territory.” It’s important to remember that an algorithm is a model, and much as the human mind creates fallible, biased models, we can also construct fallible computer models. Cathy O’Neil put it bluntly: “A model is no more than a formal opinion embedded in code.” The challenge is that the models are more often than not created by young white males from an upper-middle-class or upper-class background. It is not that human brains are perfect model makers, but we have spent a long time building social processes to cope with their biases. The scientific method itself is one of the most powerful tools we’ve invented to overcome them.
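
As a toy illustration of “a formal opinion embedded in code”, here is a hypothetical scoring function loosely in the spirit of a value-added score. The weights are not facts discovered in the data; they are choices the modeller made, and changing them changes the verdict on the same teacher.

```python
def teacher_score(test_score_gain, classroom_observation, attendance_rate):
    """Return an 'effectiveness' score; the weights are opinions, not facts."""
    return (0.7 * test_score_gain          # the modeller decided test gains dominate...
            + 0.2 * classroom_observation  # ...that human observation counts for little...
            + 0.1 * attendance_rate)       # ...and that attendance barely matters.

# The same (hypothetical) teacher under two different "formal opinions":
print(teacher_score(40, 90, 95))        # 55.5 - tests weighted most heavily
print(0.2 * 40 + 0.7 * 90 + 0.1 * 95)   # 80.5 - observation weighted most heavily
```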

As we unleash these models on education (Sarah Wysocki’s case), policing (pre-crime in Chicago), health and hiring, we need to be aware of the challenges they pose. Suman Deb Roy has pointed out:

Algorithmic systems are not a settled science, and fitting it blindly to human bias can leave inequality unchallenged and unexposed.  Machines cannot avoid using data.  But we cannot allow them to discriminate against consumers and citizens. We have to find a path where software biases and unfair impact is comprehended not just in hindsight. This is a new kind of bug. And this time, punting it as ‘an undocumented feature’ could ruin everything. [8]

Bernard Marr illustrates this with an example:

Hiring algorithms. More and more companies are turning to computerized learning systems to filter and hire job applicants, especially for lower wage, service sector jobs. These algorithms may be putting jobs out of reach for some applicants, even though they are qualified and want to work. For example, some of these algorithms have found that, statistically, people with shorter commutes are more likely to stay in a job longer, so the application asks, “How long is your commute?” Applicants who have longer commutes, less reliable transportation (using public transportation instead of their own car, for example) or who haven’t been at their address for very long will be scored lower for the job. Statistically, these considerations may all be accurate, but are they fair? [9]

There is an old saying in tech: “GIGO: Garbage In, Garbage Out”. The risk now is that this becomes BIBO: “Bias In, Bias Out”.
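
A rough sketch of the hiring-screen pattern Marr describes, with invented field names and thresholds: the commute and “reliable transport” proxies may be statistically accurate on average, yet they quietly gate out an equally qualified applicant.

```python
def screen_applicant(qualified, commute_minutes, owns_car):
    """Score an application; higher is better. Thresholds are illustrative only."""
    score = 50.0 if qualified else 0.0
    if commute_minutes <= 30:
        score += 30        # short commutes correlated with retention in past data
    if owns_car:
        score += 20        # "reliable transport" proxy
    return score

# Two equally qualified applicants, different circumstances:
print(screen_applicant(True, commute_minutes=20, owns_car=True))    # 100.0
print(screen_applicant(True, commute_minutes=55, owns_car=False))   # 50.0
# The second applicant may never reach an interview, although nothing about
# their ability to do the job was measured differently.
```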

As we gather vast amounts of data the potential for problems increases. There can be unusual downstream consequences, and also the opportunity to create perverse incentives. We are embedding sensors in cars and looking at the idea that safer drivers will be given better rates. The challenge is that personalised insurance breaks the concept of shared risk pools and can drive dysfunctional behaviour. Goodhart said, “When a measure becomes a target, it ceases to be a good measure.” We had a significant recent Irish example with crime statistics, where the CSO pointed out problems with both the under-recording of crime by police and the downgrading of a number of reported crimes. [10]
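
A back-of-the-envelope sketch, with made-up numbers, of why personalisation erodes the pool: in a shared pool everyone pays the average expected loss, while fully personalised pricing pushes the riskiest drivers towards premiums they may not be able to pay.

```python
# Hypothetical expected annual claim cost per driver.
expected_annual_loss = {
    "low_risk_driver": 200.0,
    "average_driver": 500.0,
    "high_risk_driver": 2300.0,
}

# Shared pool: everyone pays the pool's average expected loss.
pooled_premium = sum(expected_annual_loss.values()) / len(expected_annual_loss)
print(f"pooled premium for everyone: {pooled_premium:.0f}")   # 1000

# Fully personalised: each premium tracks the individual's own expected loss,
# so the pooling (and the affordability it provides) disappears.
for driver, loss in expected_annual_loss.items():
    print(f"{driver}: {loss:.0f}")
```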

At one level I see our future as a choice between Iron Man (technology that augments) and Iron Maiden (technology controlled by a few that inflicts damage on the many). Technology to augment or to constrict. Technological changes that threaten the self also offer ways to strengthen the self, if used wisely and well.


It is clear that technology does not self-police. We could use technology to cut off the use of phones in cars, so that they can’t be used while driving, but the companies that could do so currently choose not to.

In Europe we have our own bill of rights: the Charter of Fundamental Rights, enshrined in the Lisbon Treaty, which guarantees that “Everyone has the right of access to data which has been collected concerning him or her, and the right to have it rectified.” This right was used to challenge the export of data from the EU to the US in the Schrems decision of the European Court of Justice. [11]

My belief is that we need to extend these rights in the algorithmic era. We need to create a “Charter of Algorithmic Rights” for our algorithmic age. Not a Magna Carta, which really just empowered the lords against the king without doing much for the peasants. We need algorithmic rights of the people, by the people and for the people.

[Image: crash test]
Simply put, we need airbags for the algorithmic age. For decades cars have been safer for men than for women, because the standard crash test dummy is built to a male size standard and biases the development of safety towards the average male. As I said, technology is not self-policing. [12]

We are going to have to create better tools. We need to be able to detect and correct bias, and to audit for and ensure fairness rather than simply chasing efficiency. Otherwise we are tying things together in unforeseeable ways that can have profound consequences at the individual and societal level. Tools such as Values in Design and thought experiments help, but we need to go much further.
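
One concrete shape such a tool could take, sketched here with hypothetical data, is an audit that treats the scoring system as a black box and compares selection rates across groups, in the spirit of the “four-fifths” disparate-impact rule of thumb.

```python
from typing import Callable, Dict, Iterable, Tuple

def selection_rates(applicants: Iterable[Tuple[str, dict]],
                    decide: Callable[[dict], bool]) -> Dict[str, float]:
    """Fraction of applicants selected, per group, for a black-box decision rule."""
    totals: Dict[str, int] = {}
    chosen: Dict[str, int] = {}
    for group, features in applicants:
        totals[group] = totals.get(group, 0) + 1
        chosen[group] = chosen.get(group, 0) + int(decide(features))
    return {g: chosen[g] / totals[g] for g in totals}

def passes_four_fifths_rule(rates: Dict[str, float]) -> bool:
    """Flag disparate impact if any group's rate falls below 80% of the highest."""
    highest = max(rates.values())
    return all(rate >= 0.8 * highest for rate in rates.values())

# Hypothetical applicants and a hypothetical scoring rule under audit:
applicants = [("group_a", {"score": 72}), ("group_a", {"score": 65}),
              ("group_b", {"score": 68}), ("group_b", {"score": 58})]
decide = lambda features: features["score"] >= 70

rates = selection_rates(applicants, decide)
print(rates)                           # {'group_a': 0.5, 'group_b': 0.0}
print(passes_four_fifths_rule(rates))  # False - this rule would be flagged for review
```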

Kate Crawford, writing in Nature, says:

“A social-systems analysis could similarly ask whether and when people affected by AI systems get to ask questions about how such systems work. Financial advisers have been historically limited in the ways they can deploy machine learning because clients expect them to unpack and explain all decisions. Yet so far, individuals who are already subjected to determinations resulting from AI have no analogous power.” [13]

Augmentation

While this is necessary, I don’t believe it is sufficient. We need a “Charter of Algorithmic Rights”. While looking to the opportunities these technologies can afford, we need to recognise their biases and limitations. What appears to be augmentation may not really be augmentation at all; it may restrict and rule rather than enable.

We need to ensure that our tools are creative and reflect the diversity of human experience.


We are better off managing them than being managed by them in our algorithmic future.

Footnotes.

[1] The companies mentioned are Enterasense and Cortechs.

[2] ‘Computer errors allow violent California prisoners to be released unsupervised’ can be found here, and the story on the app-based vibrator is here.

[3] One link to the Sarah Wysocki story is here; for more details read Cathy O’Neil’s excellent book “Weapons of Math Destruction” or take a look at Cathy’s blog.

[4] The original link was tweeted by Simon McGarr. The piece is here: http://www.ippn.ie/index.php/advocacy/press-releases/5000-easier-to-trace-cattle-than-children

[5] How an Algorithm Learned to Identify Depressed Individuals by Studying Their Instagram Photos https://www.technologyreview.com/s/602208/how-an-algorithm-learned-to-identify-depressed-individuals-by-studying-their-instagram/ and https://arxiv.org/pdf/1608.03282.pdf; Everything We Know About Facebook’s Secret Mood Manipulation Experiment http://www.theatlantic.com/technology/archive/2014/06/everything-we-know-about-facebooks-secret-mood-manipulation-experiment/373648/

[6] Computer technology helps radiologists spot overlooked small breast cancers: http://www.cancernetwork.com/articles/computer-technology-helps-radiologists-spot-overlooked-small-breast-cancers. Neural nets may be so good because they map onto some fundamental principles of physics: http://arxiv.org/abs/1608.08225

[7] Trump as a bad Machine Learning Algorithm https://mathbabe.org/2016/08/11/donald-trump-is-like-a-biased-machine-learning-algorithm/

[8] Genesis of the Data-Driven Bug https://www.eiuperspectives.economist.com/technology-innovation/genesis-data-driven-bug

[9] Bernard Marr The 5 Scariest Ways Big Data is Used Today http://data-informed.com/the-5-scariest-ways-big-data-is-used-today/

[10] What is the new Central Statistics Office report on Garda data and why does it matter?
http://www.irishtimes.com/news/crime-and-law/q-a-crime-rates-and-the-underreporting-of-offences-1.2268154
and CSO (2016) http://www.cso.ie/en/media/csoie/releasespublications/documents/crimejustice/2016/reviewofcrime.pdf

[11] DRI welcomes landmark data privacy judgement https://www.digitalrights.ie/dri-welcomes-landmark-data-privacy-judgement/ and Schrems v. Data Protection Commissioner https://epic.org/privacy/intl/schrems/

[12] Why Carmakers Always Insisted on Male Crash-Test Dummies
https://www.bloomberg.com/view/articles/2012-08-22/why-carmakers-always-insisted-on-male-crash-test-dummies

[13] There is a blind spot in AI research, Kate Crawford & Ryan Calo
http://www.nature.com/news/there-is-a-blind-spot-in-ai-research-1.20805


Infovore

A friend once described me as an “Eater of Books” for the rate at which I consumed them…

Mitch Joel has the right of it when talking about Infovores, which is another way I’d describe myself.

The good

Personally, I have a hard time watching a dance competition on TV knowing full-well that iTunes U is stuffed to the digital rafters with audio and video Podcasts from some of the leading universities and given by the best professors… and that’s just one, small channel.

And the less good

The other side of the challenge is that there is simply not enough time to follow, consume and deeply ingest everything. You will never be able to read every e-newsletter, Blog post, tweet or listen/watch every Podcast or interesting YouTube video. As an Infovore, I’ve become quite comfortable with a diet that consists of both grazing and then taking the time to really enjoy a full and hearty meal (I tried to read one book every week). The mightiest of Infovore’s embrace the “mark all as read” button and take refuge in knowing that it’s not about consuming everything.


Minister’s Reply to Primary Online Database Complaint

I received the following reply to my email complaint in relation to the Primary Online Database (POD). The link takes you to the full letter. The fundamental questions still remain unanswered.

My comments are interspersed with text from the Minister’s letter.

On retention of data

The current retention policy for Primary Online Database (POD) data is for records to be maintained for the longer of either the period up to the pupil’s 30th Birthday or for a period of ten years since the student was last enrolled in a primary school

The Department’s retention policy is for audit and accounting purposes as pupil’s data is used in the allocation of teaching posts and funding to schools. The policy also serves to trace retention trends in the education system, is important for longitudinal research and policy formation, as well as being an important statistical indicator nationally and internationally.

Aggregate and not individual data is used for the majority of these purposes

This reads to me as “we’ll hold data until the kids are 30 even though we only need aggregate information for statistical purposes.”

There is a clear conflict in need between aggregate information, information for the allocation of resources while children are in school, and holding detailed information until the children are 30 (or possibly longer, given we don’t know what processes will be in place to remove the information in 18 years’ time).

On the racist nature of the cultural/ethnic categories

We are committed to reviewing the questions asked in POD. As part of this we have reviewed our question in POD on the collection of information on Ethnic or Cultural Background. We feel that the question used to collect data on ethnic or cultural background should be harmonised across all the education partners and other bodies who collect this type of information. As the CSO is the National Statistical Office, we are taking our lead from them. However, while the question asked in POD is not the exact same as the question asked in the Census of Population, it is based on the question.

I’d describe this as “some of the questions we asked were a bit racist so we’re changing the question and taking our lead from the CSO”. It’s important to note that the CSO’s question is itself problematic:

In this regard the Statistics Section of the Department met with the CSO’s Census of Population Division to discuss concerns such as yours. They too accept that the variant of the ethnicity question on the 2016 census may fall short of what could be expected in today’s multi-racial Ireland. Unfortunately, given the ‘no-change’ census approach being adopted for Census of Population 2016 it is not possible to change the CSO question at this stage. However the CSO has indicated that it is considering holding a seminar to examine how the data in this area can be improved from the point of view of maximising the number of write-in responses to increase the variety of ethnic description captured.

As pointed out on Twitter, the religion question on the Census is also problematic, and the CSO doesn’t appear to be anxious to change it. The religion question in the POD is similarly problematic.

In terms of the complaints (in italics) I made to the Minister and the Department, this is how I’d summarise it:

1. Excessive retention of data. The retention of data until children are 30 years of age is clearly excessive.

Not addressed.

2. Not using data for the purpose it was collected. I shared data with my school for very specific purposes. I have not consented to transferring this information to the Department. As it is not clear why some of this information is being collected at all, there is a clear lack of purpose in the collection of the data.

Not addressed. The answer here seems to be “we’ll decide what is appropriate even if we clearly don’t understand why we’re collecting the information.”

3. Collection of unnecessary highly sensitive information. Some of the data being requested is highly sensitive (medical, psychological data) and there are no clear grounds for collecting this information.

Not addressed at all 

4. Lack of appropriate security and safeguards around the data (including transmission of the data between schools and the Department). It is not clear how or where the data is being retained and stored, and the proposed mechanisms for transmission of data are hard to implement and easy to make mistakes with.

Partly addressed in terms of the storage of data, but not addressed in terms of access to the information, or of the transmission of data to schools and its retention in schools.

5. Data is supposed to be accurate. It will be impossible to ensure accurate information given the free-format text fields, as any information can be held in these fields.

Partially and poorly addressed. 

The “Notes” area is for schools’ use only, it will only be accessible to the school where the child is currently enrolled, and will not be transferable from one school to the next if the child is moving school. It is intended to keep administrative information which is required at school level only.

It’s not clear why this data field is here, or why data that is for schools’ use only is being held in a central database and retained until the child is 30.

6. The categorisation of the data on ethnic and cultural grounds is clearly racist and undermines the ability to store accurate information. The usage of the data for state purposes is also undermined by the racist classification scheme.

Partially addressed as discussed above.

7. The Department is acting beyond its power. The Department of Social Welfare hasn’t been informed of, or consented to, the use of PPSNs.

Addressed in terms of the formal right to use the information through Dept of Social Welfare. Not addressed in terms of retention of the data.


Dept of Education and Primary Online Database

What is the Department of Education up to with the Primary Online Database?

Simon McGarr has a hypothesis


A Short Video on what happens when the data is all connected.

The ACLU produced a video a number of years ago about ordering pizza in the future. Warning or prediction? Sometimes it’s hard to tell the difference. About the only part that doesn’t make sense is the voice explaining what all the problems are. Algorithms don’t explain.


How do you categorise your children?

The Primary Online Database is one of the worst ideas I’ve come across. Dave Molloy wrote a good piece about how easy abuse of the system could be.

I want to pick at one of the little threads: the ethnic or cultural background categorisation.

It’s useful when looking at a system to take apart the assumptions underlying it. There is a much longer piece to be written about categorisation, but we’ll take one element here. This is the list of drop-down choices for one of the pieces of information in the Primary Online Database.

Ethnic or cultural background (drop-down list)

White Irish
Irish Traveller
Roma
Any other White Background
Black African
Any other Black Background
Chinese
Any other Asian background
Other (inc. mixed background)
No consent

The comment I used on Twitter when I first saw this list: “The word you’d use to describe the list of ethnic/cultural choices in the Dept of Education’s planned Primary Schools Database is WRONG.”

Categories are artificial ways of slicing up the world. Dave Snowden wrote an interesting post on categories recently. In it he quoted a passage from Aldous Huxley’s ‘Brave New World’ which is worth quoting again here.

Alpha children wear grey. They work much harder than we do, because they’re so frightfully clever. I’m awfully glad I’m a Beta, because I don’t work so hard. And then we are much better than the Gammas and Deltas. Gammas are stupid. They all wear green, and Delta children wear khaki. Oh no, I don’t want to play with Delta children. And Epsilons are still worse. They’re too stupid to be able to read or write. Besides they wear black, which is such a beastly color. I’m so glad I’m a Beta.

I was going to describe the categories above as “not even wrong”; a better description is probably “wronger than wrong.” They are wrong in that they are a very poor and very distorting classification. As Dave said in his piece:

The problem with categories is that things are made to fit within the boundaries

What I wonder about is the mindset of someone who comes up with these particular categories.