

That kind of data sanitization is just standard practice. You need some level of confidence in your data’s accuracy, and for anything normally distributed, it’s safe to throw out obvious outliers.
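For anyone curious what that looks like in practice, here’s a minimal sketch of one common rule, a z-score cutoff. It assumes the data is roughly normal. The function name `drop_outliers` and the 3-sigma default are my own illustrative choices, not anything Google is known to use.

```python
from statistics import mean, stdev

def drop_outliers(values, threshold=3.0):
    """Keep only values within `threshold` sample standard deviations of the mean.

    Illustrative z-score filter; assumes the data is roughly normally distributed.
    """
    values = list(values)
    if len(values) < 2:
        # stdev() needs at least two points, so there is nothing to compare against
        return values
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values identical: nothing can be an outlier
        return values
    return [v for v in values if abs(v - mu) <= threshold * sigma]

# Twenty consistent readings plus one wild one
print(drop_outliers([10] * 20 + [100]))
```

One caveat: a single extreme value also inflates the standard deviation, so in small samples a lower threshold or a median-based rule may be needed to catch it.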
We do get what you mean (an extremely condescending and reductive take, if you ask me). I was thinking rigidly along the lines of data engineering, as this is, well, a data engineering problem… There just aren’t 30% of people doing this on Google captchas, and this isn’t a “take”, just the reality of the scale and number of people interacting with Google products. Have fun all you want; if you do this, your data most likely gets thrown out, that’s all.
We’re still talking about image recognition, aren’t we? This feels like general commentary on how Big Tech sees its customer base, which I don’t disagree with, but in my mind that’s another discussion entirely…