To help prevent the spread of the coronavirus, shopping malls, subway stations, offices and other public places began installing terminals in early 2020 that recognize faces, take thermal images and collect data all at once. In all but a minority of applications, the data was collected without the permission, or even the knowledge, of the people scanned.
An October survey by the Southern Metropolis Daily listed the scenarios in which people found data collection most unacceptable. These included shopping malls using facial recognition to track customer behavior and shopping habits, universities recording students’ micro-expressions and teachers’ gestures during class, and photo editing apps demanding photos for face swapping or virtual makeup.
“The collection of facial information is rather invasive because the data is gathered from a distance, accumulating over a long period and at a large scale without people ever noticing,” Lao said. She is most concerned about who stores the collected data and how securely it is kept.
The CCTV report pointed out that, with no unified standards in place, vast amounts of facial data sit in the databases of app operators or technology suppliers. The outside world has no idea whether sensitive data is redacted, which data is used for algorithm training, or which is shared with partners.
In September, Kai-Fu Lee, CEO of venture capital firm Sinovation Ventures, caused an uproar after saying at the HICOOL Global Entrepreneur Summit in Beijing that he had helped AI company Megvii build partnerships with photo editing app Meitu and Alibaba’s fintech affiliate Ant Group, through which Megvii gained a massive amount of facial data. Ant Group later denied this, and Lee said he had misspoken.
Megvii started as a facial recognition company in 2011. For startups in this field, amassing facial data is crucial to the accuracy of their products, so the appetite for data is strong. According to technicians working in the field, companies initially relied on public datasets provided by research institutes and universities, and many paid volunteers to collect samples. Later it became standard practice to harvest photos uploaded online, even though the legitimacy of this has been questioned.
How AI companies handle data in their dealings with customers is another major concern. Megvii states in its service agreement that it has the right to store customer data and use it for internal research to “improve the accuracy of facial recognition, update algorithms and improve our products and services.”
An employee of CloudWalk, a Chinese AI company founded in 2015, told NewsChina that customers usually store the data they collect themselves and may be unwilling to share it with facial recognition companies. “This is particularly so when we work with banks and public security systems. Our systems are deployed on their private servers inside their intranets. There is no way to access the data from outside.”
Respondents to the Southern Metropolis Daily survey said their biggest worry is how the firms that collect data will protect it and keep it safe.
In the early years, tech firms paid only lip service to data protection. Huang Hao (pseudonym), who worked at Microsoft Research Asia (MSRA), Microsoft’s research arm in the Asia-Pacific region, said the risk is highest when a firm outsources data-related work to other companies, which may not be secure. He claimed he knew of cases where outsourced data had been exposed online, though he declined to name the firms involved. Huang added that proper data protection may simply cost more than some startups can afford.
Even today, the storage and protection of data is a vulnerability for many companies, according to Zeng Yi, an AI specialist at the Institute of Automation of the Chinese Academy of Sciences.
In February 2019, Victor Gevers, a security researcher at the Netherlands-based NGO GDI Foundation, revealed that SenseNets, a Shenzhen-based technology provider contracted by a local public security system, had left its database unprotected for months. The personal information of millions of people was exposed to anyone who visited the database, meaning anyone with malicious intent could have sold the data on.