SUNY Korea leads the future of higher education

Dr. Gerald Stokes, DTS Chair, wrote a column in Mail Business Newspape…

Press Releaseㅣ2020-11-17 11:37

Data

Gerald Stokes 


We are in the midst of the information age and are awash in information. As this flood surrounds and envelops us, we cannot forget the importance of the primary data that should be at the root of all information streams. It is the primary data that connects information to the real world. Primary data are the facts. Without them our information threads are, in the words of Shakespeare’s Macbeth, “full of sound and fury, signifying nothing”. Concerns about fact checking have considerable merit.

This is not a new observation. A famous US Senator noted, “Everyone is entitled to his own opinion, but not his own facts.” Others in the computer industry have stated it more directly: “Garbage in, garbage out”. As with many things, maintaining the integrity of data is easier said than done. The essential observation is that it is easier to recover from a bad analysis than from bad data.

Since the beginning of the pandemic, I, like many others, have followed news of the worldwide spread of the virus. The ebb and flow of the disease in the nearly 200 countries of the world is fascinating. However, as one watches the data, one very quickly becomes aware that countries do not have common standards for reporting either who has the virus or who may have died from it. On the one hand, this observation has made me incredibly appreciative of the open and transparent reporting and testing process that the KCDC, now the KDCA, has sustained throughout the pandemic. On the other, it is difficult to see how many other countries in the rest of the world will bring things under control without reliable and relevant data.

In the best businesses the mantra is “you cannot manage what you cannot measure”. Good management therefore comes from good data. The data must be of known and reasonable quality and be appropriate to the problem at hand. This three-part test (known quality, reasonable quality, relevance to the question) is essential. It implies that data must be validated and curated before being organized and analyzed.

Validation and curation begin with the collection of the data itself. The collection of data is subject to many potential biases and errors that are often not obvious. Knowing how good your data is, or is not, is essential. Scientists and engineers spend a great deal of time calibrating their instruments. The process of calibration is one of assurance. How well does the instrument measure what it is intended to measure? Curation establishes the origin and history of the data from its collection to its state when used. It is the pedigree of the data.

Polls and surveys are instruments as well. One test of their reasonableness is the size of the sample and another is how the sample is selected. Selecting ten people on a street corner in Seoul and asking what nearby restaurant they might recommend provides much more reliable data for dining than asking them who they would vote for in the next mayoral election in an effort to predict the outcome.
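
The effect of sample size can be made concrete with a rough calculation. The sketch below is an illustration rather than anything from the column itself: it uses the standard normal-approximation formula for the margin of error of a proportion, assumes a simple random sample, and the function name `margin_of_error` is hypothetical. A street-corner sample is not random, so for the election question its real error would be larger than the figure shown.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample (an assumption a street-corner
    poll does not satisfy). proportion=0.5 is the worst case.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare ten respondents with larger samples.
for n in (10, 100, 1000):
    print(f"n={n:>4}: +/- {margin_of_error(n):.1%}")
```

With ten respondents the margin is roughly plus or minus 31 percentage points, far too wide to call an election, and it narrows only with the square root of the sample size.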


We know all these things about data and its importance. We also know that data is deliberately altered, fabricated, or ignored on a systematic basis all around the world. What can we do? This is an essential question for governments and companies if they are going to serve the public and society. This is certainly true here in Korea where both the government and corporations see their future tied to ICT, “Big Data”, the digital economy and all the other terms we use to capture this remarkable time of transformation.

There is a value chain for data that extends from its collection to its use. Technology enables this process, and, in some cases, technologies are emerging that help ensure the integrity of parts of that chain. An excellent example is blockchain. This technology uses a cryptographic process to ensure that data is not altered along the value chain. So, once we have data, we can maintain its integrity. This technology is in the early stages of deployment in several sectors, most notably the financial sector, and I see it as an essential part of all big data applications over time, including utility data and medical records.
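
The cryptographic idea can be sketched in a few lines. The example below is a minimal illustration, not a description of any deployed system: it shows only a hash chain, in which each record stores the SHA-256 hash of the record before it, so that a later alteration becomes detectable. Real blockchains add distributed consensus on top of this. The helper names and the utility-meter readings are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(payload: dict, previous_hash: str) -> str:
    """Hash a record's payload together with the previous record's hash."""
    body = json.dumps(payload, sort_keys=True) + previous_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> None:
    """Add a record that is cryptographically linked to the one before it."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": record_hash(payload, previous_hash),
    })


def chain_is_intact(chain: list) -> bool:
    """Recompute every hash; any altered payload or broken link fails."""
    previous_hash = GENESIS_HASH
    for record in chain:
        if record["previous_hash"] != previous_hash:
            return False
        if record["hash"] != record_hash(record["payload"], previous_hash):
            return False
        previous_hash = record["hash"]
    return True


# Hypothetical utility readings.
ledger = []
append_record(ledger, {"meter": "A-17", "kwh": 412})
append_record(ledger, {"meter": "A-17", "kwh": 398})
print(chain_is_intact(ledger))  # True

ledger[0]["payload"]["kwh"] = 12  # tamper with an early reading
print(chain_is_intact(ledger))  # False
```

Note that the chain protects data only after it has been recorded; it says nothing about whether the original reading was accurate, which is why validation at collection still matters.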

At the end of the chain is a piece of information used by a government, a company or an individual to make some decision. In the domain of social media, where many attempt to influence the public, instances of data alteration and fabrication are increasingly sophisticated. For example, “deep fakes” are video and audio products, made with very sophisticated deep learning and processing methods, that have a high potential to deceive. Fact checking is growing, but this is a labor-intensive activity that can hardly keep up with the speed of social media and the fertile imaginations of those who seek to deceive.

There are attempts to eliminate things like hate speech by companies like Facebook, and some governments have all-too-effective automated censorship, all enabled by AI. My computer’s grammar checker also informs me whether my messages are “friendly”, “optimistic” or “direct”. Assessments of truth and faithfulness to fact are harder to automate. At the university, we have software that helps us identify plagiarism. I think what we ultimately need is an automated assessment of the connection to the facts and data, from collection to interpretation. This is probably a long time away for the basic consumer, but investors and government agencies have this need now for the evaluation of everything from financial filings to “expert” testimony. Ultimately, these tools will benefit not only the consumers of information but also the generators of information who are sincerely trying to ensure their analysis is grounded in reality.



Read the original article on Mail Business (Korean)
