Well, let’s start the “scary” way: never analyze or report on your collected data until you have taken a closer look at ‘who’ may be contaminating it, and ‘why’!
While the vast majority of respondents fill in surveys in good faith, there are sometimes rotten apples in the crowd. This is especially an online phenomenon, and these individuals are sometimes referred to as “trolls”. The anonymity of online environments can provide a safe place for the rotten apples to freely express what goes on in their sick minds, ruin others’ lives, spread hate, and destroy what others are trying hard to build or improve. They may have any type of motivation to do such things, and unfortunately surveys and research projects sometimes fall victim to these cheaters too. Influencing the final results of a study, research project, or competition may bring them a lot of benefits.
Some professionals estimate that up to a third of online survey respondents these days are not real respondents. They may not even be real humans. The rotten apples and robots unfortunately propagate themselves and ruin research that could otherwise be insightful.
Fortunately, there are many things you can do to keep cheaters, trolls, and robots from affecting your survey results, or at least to minimize their impact.
1: Use what our technology already offers
SurveyLegend offers a series of nifty features that can help you find cheaters or block the spam robots.
Built-in spam bot protection
Bots (robots) are small computer programs which are written to quickly do boring and monotonous tasks which would take a lot of time for humans to do. So theoretically, a programmer with bad intentions is able to write some code to fake a respondent who is participating in a survey. Then the bot can open the survey several times, and vote in their desired way to affect the results.
However, we have made an anti-spam-bot system which observes and measures many parameters to see whether a robot or a human is taking the survey. When our mitigation system detects a spam bot, it silently ignores its votes, so nothing is collected or registered in our database. To the bot, however, it appears that its responses are registered. Additionally, if our anti-spam system suspects that a respondent is a bot but cannot judge with 100% certainty, it will instead mark their responses in the exported data as “suspected spam bot”. This way users can decide for themselves and filter them out. Further, this system automatically goes through old collected data and removes any spam-bot responses it finds.
Note:
We cannot reveal how our bot mitigation system works, as that would make it easier for bot programmers to find ways to cheat more efficiently. We will keep improving our spam-bot protection code, and constantly make it smarter to mitigate more types of spam bots.
Enable the IP tracking system
To help with flagging and cutting specific respondents for quality problems, you may want to take a look at respondents’ IP addresses. You can enable IP tracking in the Configure step, and you can read more about collecting IP addresses from your survey respondents here.
An IP address represents the network via which a respondent is opening your survey. It could be, for example, a WiFi network at somebody’s house, a shared internet network at a public library, or a student accommodation.
If you suspect that there are cheaters in your audience, you will probably see many of them coming from the same or similar IP addresses. Repeated IPs may mean that there is one person (or several persons) sitting in the same place, reloading your survey and voting over and over again. But keep in mind that a few repeated IPs in the list may simply mean that a few different (and real) individuals are voting from a shared internet network, for example their WiFi at home.
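As a rough illustration of the idea above, here is a minimal Python sketch that counts how often each IP address appears in a set of responses and flags the ones above a threshold. The record layout (a list of dictionaries with an "ip" key) and the threshold of 3 are assumptions made for this example, not the actual format of the SurveyLegend export, so adapt both to your own data.

```python
from collections import Counter


def flag_repeated_ips(responses, max_per_ip=3):
    """Return {ip: count} for every IP that appears more than max_per_ip times."""
    # Skip records without an IP (e.g. when IP tracking was disabled).
    counts = Counter(r["ip"] for r in responses if r.get("ip"))
    return {ip: n for ip, n in counts.items() if n > max_per_ip}


# Hypothetical sample data, using IP ranges reserved for documentation.
responses = (
    [{"id": i, "ip": "203.0.113.7"} for i in range(1, 6)]        # 5 votes
    + [{"id": i, "ip": "198.51.100.20"} for i in range(6, 8)]    # 2 votes
    + [{"id": 8, "ip": "192.0.2.44"}]                            # 1 vote
)

print(flag_repeated_ips(responses))  # {'203.0.113.7': 5}
```

A flagged IP is only a reason to look closer, not proof of cheating: a household or an office sharing one network will also show up here, so pick the threshold with your audience in mind.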
If you want to go even deeper with your analysis, you can use tools which verify whether an IP is blacklisted in any anti-spam databases, such as What Is My IP Address: Blacklist Check.
Note:
IP addresses are considered “personal information” under the European GDPR law. Therefore, collecting and storing people’s IPs without their consent is not wise. Please read this article to learn more about how to do it the right way.
Disable ‘multiple participation’
All questionnaires which are made in our platform have a feature which blocks people from ‘submitting’ a survey more than once. The setting that controls this behavior is located in the ‘configure’ step of each survey and is called ‘Allow multiple participations’, which is OFF by default.
Make sure this setting is OFF, but also remember that this feature is only capable of preventing normal internet users from re-opening your survey and polluting your data. This is because the technology that our system uses to know whether the same person is trying to participate in the same survey again relies on ‘cookies’. Due to legal restrictions, we cannot track people and save their data without their consent. Therefore, what we do is put a small cookie file in their browser. The file does not contain any crazy methods to identify and track a person or match them against a secret database to “prevent cheating”. If we did that, that’d be quite illegal. So the file is only able to tell our system that this browser has previously opened the same survey link, and that’s it. Our system can then redirect them to another page instead of opening the survey.
This means that a respondent can re-submit the survey if they:
- open the survey again in another browser (which has its own separate cookies), or
- surf the internet in private browsing mode (which discards cookie files as soon as you close the private window), or
- clear their internet history and start again
However, since most people won’t have this technical knowledge, this feature protects you from rookie cheaters quite well ;).
Review time stamps
One of the pieces of information that our system captures is the time which each respondent spends on a survey, that is, the time from the moment they open the survey until they either ‘close’ it or ‘submit’ it. The ‘Time spent’ information is located in the Individual responses view, under the tab. You can also find this information in the exported raw data.
You can give your survey to a few respondents and ask them to finish it, while observing how long it takes them on average to read the questions, comprehend them, and respond to them. This will give you a good idea of the average time spent on this specific survey (for a specific audience).
When reviewing your data, if you notice some very long participation times (which are usually infrequent), that should be OK. It usually means that the respondents got interrupted and resumed the survey later on.
However, very short times which are notably below the average are not OK. Those most probably mean that somebody rushed and clicked through the answers, without really reading the questions. Hmmmm… that should either mean their responses are really low quality and they don’t care, or that they are just cheating.
Additionally, multiple time stamps with similar durations and sequentially increasing ‘start time’ and ‘end time’ values can also reveal attempts from spam bots (and these are often submitted/completed surveys).
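The two checks described above can be sketched in a few lines of Python. This is only an illustration on exported data: the input format (time spent in seconds, start times as seconds since some reference point) and the thresholds (30% of the median, gaps of 5 seconds, runs of 3) are assumptions chosen for the example, and the median is used instead of the mean so that a few very long sessions do not distort the baseline.

```python
from statistics import median


def flag_rushed(durations, fraction=0.3):
    """Return the indexes of responses finished in under fraction * median time."""
    cutoff = median(durations) * fraction
    return [i for i, d in enumerate(durations) if d < cutoff]


def find_bursts(start_times, max_gap=5, min_run=3):
    """Return runs of at least min_run starts that follow each other within max_gap seconds."""
    ordered = sorted(start_times)
    bursts = []
    run = ordered[:1]
    for t in ordered[1:]:
        if t - run[-1] <= max_gap:
            run.append(t)          # still part of the same rapid sequence
        else:
            if len(run) >= min_run:
                bursts.append(run)
            run = [t]              # start a new candidate run
    if len(run) >= min_run:
        bursts.append(run)
    return bursts


# Hypothetical sample data.
durations = [180, 210, 12, 195, 240, 9, 600]          # seconds spent per response
start_times = [0, 400, 1000, 1003, 1006, 1009, 2500]  # seconds since first response

print(flag_rushed(durations))    # [2, 5]
print(find_bursts(start_times))  # [[1000, 1003, 1006, 1009]]
```

Note that the 600-second response is not flagged, in line with the advice that long participation times are usually harmless.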
Review device information
Another type of information which SurveyLegend collects from respondents is about their devices. Under the tab in Individual responses, you will find information regarding each respondent’s browser model and version, as well as their OS (operating system).
Identical responses coming from the same device and browser should set off the alarm for possible fraud attempts.
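If you work with exported data, a small script can surface such duplicates for you. The sketch below groups responses by operating system, browser, and the exact list of answers, and reports every group with more than one member. The field names ("os", "browser", "answers") are hypothetical and only serve the example; they are not the column names of the actual export.

```python
from collections import defaultdict


def find_duplicate_submissions(responses):
    """Group response ids that share the same OS, browser, and identical answers."""
    groups = defaultdict(list)
    for r in responses:
        # Tuples are hashable, so the full answer list can be part of the key.
        key = (r["os"], r["browser"], tuple(r["answers"]))
        groups[key].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data.
responses = [
    {"id": 1, "os": "Windows 10", "browser": "Chrome 120", "answers": ["A", "Yes", "3"]},
    {"id": 2, "os": "Windows 10", "browser": "Chrome 120", "answers": ["A", "Yes", "3"]},
    {"id": 3, "os": "iOS 17", "browser": "Safari 17", "answers": ["B", "No", "5"]},
    {"id": 4, "os": "Windows 10", "browser": "Chrome 120", "answers": ["A", "Yes", "3"]},
]

print(find_duplicate_submissions(responses))  # [[1, 2, 4]]
```

Popular device and browser combinations are shared by many honest respondents, so short surveys will naturally produce some identical answer sets. Treat the output as a list of candidates to review together with IP addresses and time stamps.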
2: Apply these best practices
Technology only helps you partially; there are also many methodological things you can do to prevent cheaters and fraudsters from infecting your survey results.
Use simple psychology tricks
Simply put, scare off the cheaters right at the beginning: make them understand that their cheating efforts are worthless, because you have means of monitoring the participation process which can easily flag cheaters. This is still useful, even if you do not have any decent way of monitoring responses.
You as the survey creator can ask respondents to provide a valid email address, to be able to continue. Simply put the email question first (which automatically validates emails to some extent), then make it required, and add a Page break after that. This way, respondents cannot continue without providing an email address.
*To make sure NO cheating happens and everyone has a fair chance of winning in the competition, please type your valid email address for the vote you’re providing to be counted as a valid vote.
NOTE: This survey is protected with SurveyLegend’s spam-bot mitigation system. Additionally, if we notice human patterns of cheating, such as invalid emails or sudden spikes in votes coming from the same IP, location, or device, we will disqualify both such votes and the competitor who is competing in this contest.
Have a look at this similar live example.
Get a commitment from respondents
Sometimes ‘cheating’ is about providing answers which are not based on honest opinions or personal knowledge or information from respondents.
How do you obtain a commitment from respondents not to cheat? According to this research, asking a question like the one below, where you make them commit to a specific positive action (clicking yes), does the trick. Answering such a question constitutes a specific behavioral commitment, which actually works.
*It is important to us that you do NOT use outside sources like the Internet to search for the correct answer. Will you answer the following questions without help from outside sources?
Additionally, some studies suggest that you can easily reduce the number of people who are gaming your survey by adding “warnings” to your questions. A warning like this one substantially improved data quality:
*We check responses carefully in order to make sure that people have read the instructions for the task and responded carefully. We try to only use data from participants who clearly demonstrate that they have read and understood the survey. Again, there will be some very simple questions in what follows that test whether you are reading the instructions. If you get these wrong, we may not be able to use your data. Do you understand?
Add extra validations, using logic flows
You can validate responses which are typed by your respondents, easily using our survey logic flows.
For instance, in the previous example where you ask for email addresses, you can exclude email providers which generate fake and temporary inboxes and email addresses for users. Some examples of such providers are: Fake Email, Fake Mail Generator, Mailinator, see more examples here.
Such fake providers can be filtered out if you make a logic rule like this: if > answer to the email question > contains > [@mailinator.com] or [@email-fake.com], [etc…] > then > skip to > thank you page. Read more about adding logic for text based questions here.
Alternatively, you can use a logic flow to accept only certain email addresses as valid. For example, you can allow only respondents with emails from your company’s domain to enter the survey, by making a logic flow like this: if > answer to the email question > contains > [@my-company-site.com] > then > show > page 2.
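If you also want to double-check collected email addresses after the fact, the same two rules can be applied to exported data. The Python sketch below is not how the logic flows work internally; it is just an offline check, and the function names and the short block list are assumptions made for this example (a real block list would be much longer).

```python
# Example block list built from the providers mentioned above; extend as needed.
BLOCKED_DOMAINS = {"mailinator.com", "email-fake.com"}


def email_domain(address):
    """Return the lower-cased part after the last '@' in the address."""
    return address.strip().lower().rpartition("@")[2]


def is_disposable(address, blocked=BLOCKED_DOMAINS):
    """True if the address belongs to a known throwaway email provider."""
    return email_domain(address) in blocked


def is_allowed(address, allowed):
    """True if the address belongs to one of the explicitly allowed domains."""
    return email_domain(address) in allowed


print(is_disposable("Bob@Mailinator.com"))                               # True
print(is_allowed("ann@my-company-site.com", {"my-company-site.com"}))    # True
```

Comparing the whole domain is a little stricter than a “contains” rule: an address such as ann@my-company-site.com.evil.example contains the text @my-company-site.com, but it is rejected by the exact comparison used here.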
Which logic validations to add depends on your survey case. You can come up with other validations and build more elaborate screening paths. For example, you can create multiple-response questions whose options conflict with each other. This way you can detect whether a respondent is rushing through the survey, answering randomly, or selecting many options just to get in. If your criterion is met, the logic flows won’t let them in. You will see your qualifying incidence drop dramatically.
Avoid ‘river sampling’, and instead go for ‘panel sampling’
River sampling refers to techniques used to mass-invite respondents to take a survey. In this method, researchers use online banners, ads, promotions, offers, and social media invitations. They try to catch the attention of potential respondents, make them click on the link, and hopefully take the survey. Surveyors have no clear idea who the respondents will be, or what psychographic or demographic features they will represent. And usually, following up with them after survey completion is not a part of the plan.
Panel sampling, on the other hand, is about inviting respondents from an affiliate site. On registration, members (aka “panelists”) of such sites are asked to confirm that they are interested in participating in multiple surveys. Researchers then invite specific panel participants by email, based on their qualifying demographic and psychographic characteristics. Members of the panel list are easily trackable and researchers can reach out to them again at any time. Panelists are then fairly compensated for their time and efforts.
While some types of research may work better with river sampling, this is not always the best way of doing research. Many market researchers, for instance, use river sampling, but only as a complement to their ongoing panel sampling. Panel sampling gives a more dedicated and reliable data source. However, such a carefully selected source of information can sometimes result in rigid or over-processed data.
But generally, by using the river sampling method, businesses are not able to pinpoint the exact characteristics of their respondents or know who they really are, because unlike panels, there is no double opt-in verification to make sure that the survey invitees are real, individual people. Anyone with any intention may come in.
Include at least one open ended question
The best place to add such a question may be at the end of the survey. Make it a ‘required’ question for everyone, and make sure that all respondents will be able to answer it thoughtfully. When analyzing the results, review every response to evaluate whether it is answered thoughtfully. Bad respondents tend to give you bad, short, and silly answers. Sometimes it’s just random letters and meaningless words, and many may type in irrelevant information or completely generic responses which are not elaborated well.
You can give this question a minimum acceptable range to make respondents type at least a certain quantity of characters or words. Later under ‘individual responses’ view, you can easily view and disqualify such respondents.
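When there are too many open-ended answers to read one by one, a crude pre-filter can help you decide which ones to look at first. The Python sketch below flags answers that are very short or that look like keyboard mashing. Everything in it is an assumption for illustration: the thresholds are arbitrary, and the vowel-ratio check is a rough heuristic that only makes sense for English text. It does not replace reading the answers.

```python
import re


def looks_low_effort(text, min_words=5, min_chars=20, min_vowel_ratio=0.2):
    """Heuristically flag answers that are too short or look like random letters."""
    cleaned = text.strip()
    words = re.findall(r"[A-Za-z']+", cleaned)
    if len(cleaned) < min_chars or len(words) < min_words:
        return True
    # Ordinary English prose is roughly 40% vowels; random key presses
    # usually contain far fewer, so a very low ratio is suspicious.
    letters = [c for c in cleaned.lower() if c.isalpha()]
    vowels = sum(c in "aeiou" for c in letters)
    return bool(letters) and vowels / len(letters) < min_vowel_ratio


print(looks_low_effort("ok"))                                  # True
print(looks_low_effort("asdfgh jkl qwer zxcv asdf lkjh"))      # True
print(looks_low_effort(
    "The checkout page was slow, but support answered quickly."
))                                                             # False
```

Generic but well-formed answers will pass this filter, so you still need to read the remaining responses to judge whether they are thoughtful.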
Add a simple ‘attention test’
You can add a simple question, where you ask respondents to choose a certain choice from a list of provided choices. To do so, add a multiple selection question or a single selection one, then in the question ask them either to choose only one certain choice, or to skip the question entirely. If respondents are actually paying attention and are not robots, they will do as you ask.
How many people live in your household? If you answer this question we will not count your vote, because you are either not paying attention and your responses are therefore not valid, or you are a spam robot.
- Only me
- 5 or more
When analyzing collected data, you can easily filter out respondents who have answered this question, or whose answer is different from what you asked them to choose.
Another idea could be adding logic flows which automatically disqualify the respondents who answer this question, or answer wrong.
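On exported data, the filtering step could look like the Python sketch below. The field name "household_size" and the record layout are hypothetical. With expected=None the trap question must be left unanswered, as in the household example above; pass a choice label instead if you asked respondents to pick one specific option.

```python
def split_by_attention_check(responses, trap_key="household_size", expected=None):
    """Split response ids into (passed, failed) based on the trap question."""
    passed, failed = [], []
    for r in responses:
        # Treat a missing value and an empty string both as "skipped".
        answer = r.get(trap_key) or None
        (passed if answer == expected else failed).append(r["id"])
    return passed, failed


# Hypothetical sample data.
responses = [
    {"id": 1, "household_size": None},
    {"id": 2, "household_size": "Only me"},
    {"id": 3, "household_size": ""},
    {"id": 4, "household_size": "5 or more"},
]

passed, failed = split_by_attention_check(responses)
print(passed)  # [1, 3]
print(failed)  # [2, 4]
```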
No matter what we do, there are always people who cheat or provide dishonest answers. But using a combination of our technologies and these cheat prevention techniques will dramatically reduce the number of cheaters or spam bots in your surveys, polls, and questionnaires. Make sure to use at least a few of these tricks to gather quality data when you collect feedback or run surveys, exams, or online competitions.