Evolution of data and data analysis: A geographer’s perspective

By Linda Kaidan

How has our technology evolved from basic data concepts to big-data concepts in just 60 years?

Examining the history of data evolution and Geographic Information Systems is an excellent way to understand the progress of data storage, management and analysis as a whole. In the beginning, data was stored as written records on paper. Throughout history, maps have been significant in navigation, as they continue to be today. In the 1950s, slide rules and mechanical calculators were commonly used for calculations, and many mapping applications were completed by teams of mathematicians performing spatial and geodetic analysis that today is accomplished in microseconds.

In the 1950s, maps were simple. They had their place in vehicle routing, new development planning and locating points of interest. Early mapping did not have the advantage of computers; however, satellite and aircraft systems were early uses of computer technology. In the 1950s, a self-contained dead reckoning system was developed by government contractors, and the first space programs developed navigation systems that could function without external references. US government contractors engineered commercial inertial navigation systems in the 1970s. These were capable of computing present position, distance to waypoints, direction and heading using onboard, self-contained systems, without external navigation input (Wyatt).

Digital Equipment Corporation built the PDP-1, widely regarded as the first user-friendly computer, in 1959, and introduced the PDP-11/70 in 1975. The PDP-11/70 was used to integrate satellite-derived data and imagery from remote sensing, and it supported stereographic image analysis, aircraft sensor data and data analysis. In 1972, the Defense Mapping Agency began providing mapping, charting and geodetic information in paper and electronic formats to the US Department of Defense (US Government Publishing Office).

Esri's ARC/INFO offered a GIS with a graphical user interface for desktop computers. "ARC" refers to the line segments of map elements; "INFO" refers to the data stored in its information system. Released in 1982, it was the first microcomputer-based geographic information system. ArcView later provided a PC-based environment supporting a basic geospatial model. Georeferencing mapped world geographic coordinate systems onto digital map elements.
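
To make georeferencing concrete, here is a minimal sketch of the six-parameter affine geotransform that tools such as GDAL use to map raster pixel indices to world X, Y coordinates; the transform values and coordinates below are hypothetical, not taken from ARC/INFO.

```python
# Minimal sketch of georeferencing: mapping pixel (col, row) indices of a
# raster to world X, Y coordinates with a six-parameter affine geotransform
# (the convention used by GDAL). Parameter values here are hypothetical.

def pixel_to_world(col, row, geotransform):
    """Convert a pixel index to georeferenced X, Y coordinates."""
    origin_x, pixel_w, row_rot, origin_y, col_rot, pixel_h = geotransform
    x = origin_x + col * pixel_w + row * row_rot
    y = origin_y + col * col_rot + row * pixel_h
    return x, y

# Hypothetical geotransform: upper-left corner at (-74.0, 41.0) degrees,
# 0.001-degree pixels, no rotation (pixel height is negative because row
# numbers increase downward while latitude decreases).
gt = (-74.0, 0.001, 0.0, 41.0, 0.0, -0.001)

print(pixel_to_world(250, 100, gt))   # -> (-73.75, 40.9)
```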

Data representation featured vector models of points, lines and polygons mapped to georeferenced X, Y coordinates. Topological models were used to relate map elements to each other, while spaghetti models simply connected independent map elements, layering them atop one another. Raster models stored data that varies continuously, such as aerial or satellite surface imagery.
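
As a rough illustration of the vector model just described (not ARC/INFO's internal format), the sketch below represents a point, a line and a polygon as GeoJSON-style features with georeferenced X, Y coordinates; the feature names and attributes are invented.

```python
# Toy vector data model: a point, a line and a polygon stored as GeoJSON-style
# features whose coordinates are georeferenced (longitude, latitude) pairs.
# Feature names and attributes are invented for illustration.

well = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.9, 40.7]},
    "properties": {"name": "Municipal well 12"},
}

water_main = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[-73.9, 40.7], [-73.89, 40.71], [-73.88, 40.71]],
    },
    "properties": {"name": "Water main A", "diameter_in": 8},
}

parcel = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",   # first and last vertex close the ring
        "coordinates": [[[-73.9, 40.7], [-73.9, 40.71],
                         [-73.89, 40.71], [-73.89, 40.7], [-73.9, 40.7]]],
    },
    "properties": {"parcel_id": "017-223"},
}

# Independent features layered atop one another, spaghetti-model style.
utility_layer = {"type": "FeatureCollection",
                 "features": [well, water_main, parcel]}
```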

As early as the 1980s, ARC GIS systems were being used to map municipal data in layers of substructures. Municipalities maintain control of information about public utilities such as water, electric and sewer operations, along with the detailed locations of buildings and other structures. Whenever excavations are made, the Dig Safe program retrieves data from municipal GIS systems to ensure that electrical lines, gas lines and other buried infrastructure are not damaged in the process.

ArcGIS stores data in proprietary files and has also used Oracle's relational database system. Unlike many technologies that have fallen by the wayside, ArcGIS is available today as cloud-based software as a service (ARCGIS.COM). Today's ArcGIS supports the utility industry by making GIS system design affordable to cooperatives and municipalities with limited budgets, using data model templates for project implementation (ESRI.COM).

Open-source Geographic Information Systems data is available to all, including Landsat satellite imagery and US Census TIGER data. ArcGIS Online stores massive amounts of spatial data, and GIS software is being developed through open-source collaborative efforts and made available to the public free of charge (ESRI.COM).

Oracle is the second-largest software company in the world. It offered the first commercial database software employing highly structured relational modeling for data storage, retrieval and analysis. Though nearly 40 years have passed since the company's founding in 1977, Oracle's RDBMS remains the most widely used in the world.

Well-structured data based on relational modeling is essential for organizations like financial institutions, consumer-based organizations and medical research companies, and it will likely remain valuable for a very long time to come. However, with our ever-increasing capacity to create, access and store data, what is popularly called Big Data is an ever-growing and ever more important source of information.

Big Data can be well structured, just like data stored in a relational database, or unstructured, like the text found in a periodical, research paper or essay. Data can also be stored in user-defined, self-describing structures such as those supported by JSON or XML.

Where does Big Data come from? It may come from sensors in manufacturing facilities, elevator banks in the World Trade Center or automated, self-contained agriculture facilities in isolated areas. It may even come from biometric data collected by sensors on our bodies. Mining and manufacturing operations monitor production and distribution processes with sensors. That data can provide real-time quality control, predict when maintenance is necessary and flag malfunctions as they occur. The result is less downtime and greater efficiency. When used properly, Big Data can answer important questions relating to safety, quality, efficiency and market trends.
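
As a toy illustration of that kind of sensor monitoring (not any vendor's actual system), the sketch below scans a simulated stream of readings and flags values that drift outside an assumed safe operating band, the sort of check that feeds real-time quality control and predictive maintenance.

```python
# Toy real-time monitoring sketch: flag sensor readings that drift outside an
# assumed operating band so maintenance can be scheduled early. The readings
# and thresholds are invented for illustration.

LOW, HIGH = 40.0, 85.0   # hypothetical safe operating range, in degrees C

def monitor(readings):
    """Yield (index, value, status) for a stream of sensor readings."""
    for i, value in enumerate(readings):
        status = "ok" if LOW <= value <= HIGH else "ALERT: schedule maintenance"
        yield i, value, status

stream = [62.1, 63.0, 64.8, 88.3, 90.1, 61.7]   # simulated sensor feed
for i, value, status in monitor(stream):
    print(f"reading {i}: {value:.1f} C -> {status}")
```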

Big Data can be analyzed in real time as it streams or after collection. Data analytics is a relatively new science offering techniques to glean the most from ever-increasing volumes of data. Oracle defines Big Data by referencing:

  • Volume of data processed which can reach into the petabytes (1000 terabytes)
  • Velocity – the rate at which data is received and acted upon
  • Value – an ongoing discovery process fueled by business executives and analysts.  Determining value is a function of asking the right questions, identifying patterns, making perceptive assumptions, and predicting behavior.

Data Representation: Structured Versus Unstructured

Files, relational models, self-defined structures and unstructured data all contribute to how data can be represented and stored. Commercial and scientific software often use proprietary, product-specific file formats: Photoshop uses PSD files, Microsoft Word uses DOCX, and standard image formats include JPG and PNG.

Relational data is modeled and stored in relational databases like MySQL, SQL Server and Oracle. The relational approach is based on the relational mathematics for mapping data sets developed by Edgar F. Codd in the 1970s. Such databases are sets of normalized data organized into tables, records and columns, with well-defined relationships between tables. Relational structures support communication, sharing, searchability, organization and reporting, and Structured Query Language (SQL) offers easy access to the data from programs and scripts (Techopedia).
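
To make the relational ideas concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for MySQL, SQL Server or Oracle; the tables and values are invented for the example.

```python
# Minimal relational example using Python's built-in sqlite3 module.
# Two normalized tables with a well-defined relationship (customer -> orders),
# queried with SQL. Table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
""")

cur.execute("INSERT INTO customers VALUES (1, 'Acme Utilities')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 199.00), (2, 1, 42.50)])

# SQL gives structured access across the related tables.
cur.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
""")
print(cur.fetchall())   # [('Acme Utilities', 2, 241.5)]
conn.close()
```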

Unstructured data may be impossible to fit into an RDBMS. Such data often originates in mobile, social and cloud computing data feeds. Some estimates suggest that 90 percent of the data in existence was created in the last two years, and most of it is unstructured. NoSQL databases and frameworks like Hadoop offer the opportunity to capture, store and analyze vast quantities of unstructured data as it is generated and to use it for business intelligence.

XML is a data format that carries both the data and a definition of that data's structure, which makes it self-describing. XML can be used by any individuals or organizations that want to share data consistently, in a format they define and choose. Clusterpoint Server is database software for high-speed storage and large-scale processing of XML and JSON data on commodity hardware clusters. It works as a schema-free, document-oriented DBMS platform with an open-source API, and it supports immediate search and access to billions of documents along with fast analytics over structured and unstructured data.
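
As a small illustration of self-describing XML (built with Python's standard library rather than Clusterpoint's API), the sketch below writes and re-parses a record whose tags carry its own structure; the element names and values are invented.

```python
# Self-describing XML: the tags themselves define the record's structure.
# Built and parsed with Python's standard xml.etree module; element names
# and values are invented for illustration.
import xml.etree.ElementTree as ET

station = ET.Element("station", id="NY-042")
ET.SubElement(station, "name").text = "Mohawk Valley gauge"
ET.SubElement(station, "latitude").text = "43.10"
ET.SubElement(station, "longitude").text = "-75.23"

xml_text = ET.tostring(station, encoding="unicode")
print(xml_text)   # <station id="NY-042"><name>Mohawk Valley gauge</name>...

# Any recipient can recover both the data and its structure from the text.
parsed = ET.fromstring(xml_text)
print(parsed.get("id"), parsed.findtext("latitude"))
```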

JSON stands for JavaScript Object Notation. It is an easily readable and usable data-interchange format that machines can parse and humans can read and understand. It is based on a subset of the JavaScript programming language, Standard ECMA-262.
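
A matching JSON sketch, using Python's json module on the same hypothetical record as the XML example, shows the human-readable interchange text that any program can parse back into native objects.

```python
# JSON interchange: human-readable text that machines parse directly.
# The record is hypothetical, mirroring the XML example above.
import json

record = {"station": {"id": "NY-042",
                      "name": "Mohawk Valley gauge",
                      "latitude": 43.10,
                      "longitude": -75.23}}

text = json.dumps(record, indent=2)    # serialize to readable text
print(text)

restored = json.loads(text)            # parse back into native objects
print(restored["station"]["name"])     # Mohawk Valley gauge
```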

Conclusion

Our ongoing data evolution began in the 1950s. We have used data to do amazing things, like sending satellites into space and using those satellites to acquire and transmit geospatial and meteorological information. As time has gone by, each generation of technologists has built upon the work of the last. We've created library upon library of mathematical functions, utilities and models that allow us to build systems that are more specialized, inclusive and integrated. We've created an amazing toolbox of wonders that will continue to grow as long as we do. We have not only expanded our data and data analysis resources; we also have new open-source opportunities that allow us to share our data technology and further accelerate our learning.

In such a short period of time we've become capable of monitoring and analyzing the oceans, climate, forest fires, epidemics, seismic events and financial catastrophes. We've recently begun hearing about quantum computers, which are not binary but can occupy multiple states at once. This multi-state structure supports more adaptable algorithms and may enable us to solve certain problems millions of times faster, leading to solutions for previously insoluble problems. We've done all of this in just 60 years.

One challenge for the future will be to manage this ever-increasing torrent of information and to find ways to locate and make use of everything we seek to discover. At the same time, Big Data increases threats to both personal and corporate privacy: with every click of our mouse we give up information to parties unknown.

Linda Kaidan was a Principal Software Developer at Oracle Corporation. She’s also developed public utilities software at IBM and was a senior software design engineer at the Jet Propulsion Laboratory. Kaidan has a BA in Geography from the Hebrew University of Jerusalem and an MS in Computer Science from American University. Early in her career, she worked as a cartographer at the National Geospatial Intelligence Agency. Kaidan is the author of Surviving Climate Change: Decide to Live.

 

References

ARCGIS. Introduction to ARCGIS software. Retrieved from https://www.wou.edu/las/physci/taylor/es341/arcGIS_intro.pdf

bigdata-madesimple.com. A Deep Dive into NoSQL. Retrieved from http://bigdata-madesimple.com/a-deep-dive-into-nosql-a-complete-list-of-nosql-databases/

ESRI. GIS for Municipalities. Retrieved from https://www.esri.com/~/media/Files/Pdfs/library/brochures/pdfs/gis-for-municipalities.pdf

GIS Geography. Landsat satellite imagery. Retrieved from http://gisgeography.com/usgs-earth-explorer-download-free-landsat-imagery/

JSON.org. JavaScript Object Notation. Retrieved from http://www.json.org/fatfree.html

ORACLE. What is Big Data? Retrieved from https://www.oracle.com/big-data/

Porter, Claire. Little privacy in the age of big data. Retrieved from https://www.theguardian.com/technology/2014/jun/20/little-privacy-in-the-age-of-big-data

Techopedia. Relational Database (RDB). Retrieved from https://www.techopedia.com/definition/1234/relational-database-rdb

United States Government Publishing Office. Defense Mapping Agency. Retrieved from https://www.gpo.gov/fdsys/pkg/GOVMAN-1996-05-31/pdf/GOVMAN-1996-05-31-Pg233.pdf

Wyatt, David. Aircraft Flight Instruments and Guidance Systems.

Stop Our Runaway Global Warming Emergency Now

by Linda Kaidan

Temperature rate of increase means time for emergency measures

Time is running out to control runaway global warming. Most humans will die when temperatures reach a very humid 140°F, and it takes only one day of peak temperature above the line of survivability to kill almost all life in an affected area.

There are solutions, but we need to begin applying them now.

Scary NOAA data reveals that while New York State’s average max temperature increased just 1-2 degrees during the 20th century, its average temperature from 2011 to 2014 was 2-4 degrees above the 20th century average.

Flee to Alaska? Not a wise choice

Alaska's temperature increase is double that of the rest of the United States, and average annual temperatures there are likely to rise by 2°F to 4°F over the next 34 years. If global emissions continue increasing as they have been, increases of 10°F to 12°F are anticipated by 2100. Heat kills directly by hyperthermia, when it becomes too hot for a living organism to function. It kills indirectly through ocean flooding caused by polar melting and through the spread of disease as heat drives pest populations up.

Hyperthermia – In 2003, 14,802 residents of France died during a heat wave that reached 104°F. Most were elderly. France is an economically secure and technologically advanced country, yet people were taken by surprise; they were simply unprepared for the sudden extreme highs.

Rising Oceans – In 2012, a mother, father and teenage daughter were swept out of the second floor of their Staten Island, New York home by rising ocean waters; only the mother survived. Sandy brought devastating coastal flooding to the East Coast, and most of the storm's deaths were from drowning.

Pest population increases – The current Zika epidemic may be related to global warming. A number of devastating diseases are spread by mosquitoes, and mosquito populations soar when it's very hot. Malaria is also spread by mosquitoes, and Lyme disease by ticks.

Geoengineering to the rescue?

Geoengineering offers emergency measures to stop global warming cold. There are two basic approaches. The first is to remove greenhouse gases from the environment – possibly using systems of atmospheric filters. The second is to prevent solar radiation from heating up Earth’s atmosphere, possibly using solar space shades. NASA describes these geoengineering techniques as risky, but even riskier is leaving temperatures to soar out of control and kill us all.

NASA recommends geoengineering as a near-term strategy for slowing global warming of our atmosphere and oceans and helping ensure humanity’s survival. Immediate implementation of such emergency measures may well save billions of plant, animal and human lives. But we have to act now!

Tiny new satellites can guarantee that we will not all die from global warming

by Linda Kaidan

NASA Nanosatellite Cube Launcher, 2013

Tiny new satellites are being developed at MIT's Space Propulsion Laboratory. This breakthrough technology can support the delivery of solar shades in space to prevent Earth's atmosphere from continuously heating up. With advances in nanotechnology, satellites can be as small as half an inch cubed, and that tiny size matters because it makes the cost of launching them comparatively low.

Tiny new satellites called nanosatellites are a foundation technology for global warming prevention

This emerging generation of satellites is much smarter than those we have now. They have highly effective navigation and propulsion systems for ease of placement, and they can return to Earth on their own without adding to the space junk already orbiting the planet.

Each satellite can deliver a space shade cluster capable of covering huge areas with reflective lightweight fabric that can be repositioned to precisely control the amount of solar radiation reaching the earth. With space shades that we can easily move and control, we can accurately regulate how much heat our atmosphere absorbs from the sun.

Nanosatellite Launchers are under development by NASA

Space shades don't solve all our problems. We need to stop polluting our soil, water and air while cleaning up the mess we've already made. Nanosatellite-delivered space shades can give us the time we need to recover from our ongoing climate change disaster, and perhaps this will become a major step toward planetary weather control. The first flight of NASA's Nanosatellite Launch Adapter System took place in November 2013.

 

How is a Quantum Computer different from yours?

How is a Quantum computer different from the computers we use today? They’re much faster, more complex and far more expensive.

Multi-state structure supports more adaptable algorithms

Quantum computing is based on a non-binary system in which the basic unit of data – the quantum counterpart of the bit, called a qubit – can exist in a superposition of states rather than only a simple 0 or 1; its structure is somewhat like that of a molecule. This is significant because it enables us to represent and process massive, complex data more effectively.
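
For a rough feel of what that means, here is a toy single-qubit simulation in NumPy (an illustration of the mathematics, not how real quantum hardware is programmed): a Hadamard gate puts the qubit into an equal superposition of 0 and 1, and the measurement probabilities come from the squared amplitudes.

```python
# Toy single-qubit simulation: a qubit is a length-2 complex state vector,
# gates are 2x2 unitary matrices, and measurement probabilities are the
# squared amplitudes. Illustration only, not real quantum hardware code.
import numpy as np

ket0 = np.array([1.0, 0.0], dtype=complex)           # the classical state |0>

H = np.array([[1, 1],
              [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate

state = H @ ket0                      # equal superposition of |0> and |1>
probabilities = np.abs(state) ** 2    # Born rule: squared amplitudes

print(state)           # approx [0.707+0j, 0.707+0j]
print(probabilities)   # [0.5, 0.5] -> 0 and 1 each measured half the time
```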

Data structured in this unique way can help us understand complex information that is difficult and unruly to work with even on the largest supercomputers. A topological approach based on data features can make the representation intuitive, enabling us to extract valuable knowledge from huge data sets with elegance and simplicity.

What kind of outcomes can be expected from this revolutionary type of computing?

The Institute for Advanced Study advises that quantum computers would likely prove useful for needle-in-a-haystack problems, completing a search over a million items in roughly a thousand steps instead of a million – about one thousandth of the time a conventional search might take.
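
To put that speed-up in numbers, here is a back-of-the-envelope sketch based on the square-root scaling of quantum search (Grover's algorithm); the million-item haystack is a hypothetical example, and the snippet is ordinary arithmetic, not a quantum program.

```python
# Back-of-the-envelope comparison for a needle-in-a-haystack search over
# N items: a classical scan needs on the order of N checks, while quantum
# (Grover-style) search needs on the order of sqrt(N) steps.
import math

N = 1_000_000                              # hypothetical haystack size

classical_checks = N                       # worst case for an unsorted scan
quantum_steps = round(math.pi / 4 * math.sqrt(N))   # ~ optimal Grover iterations

print(classical_checks)                    # 1000000
print(quantum_steps)                       # about 785
print(classical_checks / quantum_steps)    # roughly 1300x fewer steps
```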

A quantum computer can be in multiple states at the same time, which enables parallel processing without parallel processors. That sounds fast and effective, but the problem arises in accessing the results: measuring just a single result makes the others disappear. Quantum interference addresses this by combining the many outcomes into meaningful, measurable data.

Is a Quantum Computer different or better?

Quantum computers are expected to revolutionize artificial intelligence, transportation, space exploration, medicine and the development of pharmaceuticals, but they won't be cheap. The price tag is not within the reach of most consumers or small businesses.

In 2014, Time magazine discussed the $10 million cost of the first commercially available quantum computer, built by the Canadian company D-Wave and backed by the CIA, Jeff Bezos of Amazon and NASA. These colossal data users were the perfect partners because of the massive amounts of data they process.

A quantum computer may be in your future if you use Amazon products, use any kind of transportation or appreciate space research and technology.

Linda Kaidan

 

Climate Change Survival

Space shades can defend us from global warming

Climate change survival is an attainable goal that requires the hard work of communities, industries, educational institutions and governments worldwide. Unless we make sensible changes to the way we live, starting now, life on this planet could be extinct within 1,000 years. One of the most eminent scientists on our planet, England's Stephen Hawking, has warned that 1,000 years from now Earth could have an atmosphere quite similar to that of Venus if we do not take appropriate action.

Problems Nano Technology Can Solve

Miracle substances made from nano-sized material can help us live longer despite serious disease, and can help save our planet from the devastating effects of climate change. Products made from these materials will be researched, designed and produced in the Mohawk Valley of New York.

Graphene is a two-dimensional material made from carbon. In 2010, groundbreaking experimentation with this substance won Konstantin Novoselov and Andre Geim the Nobel Prize in Physics. A stellar performer in material science, graphene’s electrical, chemical, mechanical and optical properties make it a miracle for innovators in many different fields.

Graphene is one atom thick: a hexagonal lattice with a single carbon atom at each of its six vertices. It's ironic that one way to overcome the long-term threat that climate change poses to life on Earth is to use the very element all life on Earth is built from: carbon.

A recent MIT Technology Review article discusses how super-thin graphene can mitigate climate change. Graphene may be the catalyst that protects us from a slow death by global warming: it can trap the carbon that contributes to CO2 pollution and the greenhouse effect, the process that is causing average temperatures to rise all over the Earth. Through graphene-based sequestration, this seemingly miraculous material could convert the harmful CO2 it captures into a valuable building material.

Researchers are optimistic about graphene’s huge potential for reducing atmospheric temperature. They calculate that given an area less than 10 percent of the size of the Sahara Desert, the method could remove enough carbon dioxide to make global atmospheric levels return to preindustrial levels within 10 years, even if we keep emitting the greenhouse gas at a high rate during that period.

Another nano-filter solution of global impact is desalination. Many parts of our planet are experiencing severe drought. California has long been a region where much of US produce is grown, but its devastating water shortage may severely impact agricultural production. With graphene, though, large-scale and inexpensive desalination can come to the rescue. Reuters reports that defense contractor Lockheed Martin is developing graphene filters to cleanse wastewater contaminated by oil, using one-atom-thick membranes with precisely designed openings of 1 nanometer.

Graphene sheets can be produced with precisely sized holes as small as 1 nanometer, or a billionth of a meter (0.000000001, or 10⁻⁹ meters). Lockheed eventually plans to apply this technology to desalination as well. The methodology has proven successful, but further refinement is needed to make the solution cost-effective.

Filtering carbon from the atmosphere and removing salt and contaminants from ocean water are critical uses of graphene, an incredibly versatile nano material vital to solving some of the earth’s most pressing problems.

Linda Kaidan