Month: October 2019

BI Administration, Business Intelligence Platform, Database Administration, Databases October 26, 2019

BI Application getting ORA-00257 Error

One day this week, we got the following error showing up on our BI dashboards.
“ORA-00257: Archiver error. Connect AS SYSDBA only until resolved.”
This is an Oracle database error (as the “ORA” prefix suggests), not an error raised by the BI application itself.
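As a rough illustration of telling a database error apart from an application error, here is a minimal Python sketch that pulls the ORA code out of an error message. The function names and the small lookup table are hypothetical and made up for this example; only the ORA-00257 message itself comes from our dashboards.

```python
import re

# Hypothetical lookup table for this sketch; only ORA-00257 comes from the post.
KNOWN_ERRORS = {
    257: "Archiver error: redo logs cannot be archived; contact your DBA.",
}

# Oracle error codes are written as "ORA-" followed by five digits.
ORA_PATTERN = re.compile(r"ORA-(\d{5})")


def extract_ora_code(message):
    """Return the numeric ORA code found in the message, or None if absent."""
    match = ORA_PATTERN.search(message)
    return int(match.group(1)) if match else None


def describe_error(message):
    """Give a short hint about where the error is likely coming from."""
    code = extract_ora_code(message)
    if code is None:
        return "No ORA code found; the error may come from the BI application."
    default = f"Oracle error ORA-{code:05d}; look it up in the Oracle documentation."
    return KNOWN_ERRORS.get(code, default)


if __name__ == "__main__":
    msg = "ORA-00257: Archiver error. Connect AS SYSDBA only until resolved."
    print(describe_error(msg))
```

A check like this only tells you which layer to look at first. It does not diagnose or fix the underlying database problem.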

If you get this error, it means that the database’s archiver process cannot archive the filled redo logs, usually because the designated archive area has run out of space, although some other issue can also be the cause. In our case, the “some other issue” was a problem with Commvault, a software application used for data backup and recovery, among other things.

When this happens, the database will not accept new connections from ordinary users, such as the BI application user in our case. The only exception is that users connecting AS SYSDBA are still allowed in.

If you are not the database administrator (DBA), you will most likely work with your DBA (as we do) to get this error resolved.
After the underlying issue is resolved and the redo logs can be archived again, the database, and therefore the BI application, will accept new connections as normal.

Thanks for reading and I hope you found this helpful.

Big Data October 13, 2019

Learning Hadoop: The benefits of Hadoop commercial distributions

What are the benefits of using a commercial distribution of Hadoop? And what are the popular commercial distributions of Hadoop?

Hadoop, the preeminent open-source platform for retrieving, processing, storing and analyzing very large amounts of data, has grown tremendously from its core components, which were inspired by papers published by Google, into a powerful ecosystem of supporting tools. There are various tools for integrating, streaming, storing, searching, and retrieving data, and tools for security and resource management, among others. New tools keep emerging at a rapid pace.

Keeping these tools at mutually compatible versions, keeping patches up to date, plugging in new tools as they become available, and making sure it all works well together, on top of the normal management of the Hadoop cluster, can become overwhelming for a small team. Using a commercial distribution of Hadoop alleviates this problem.

Commercial distributions of Hadoop bundle the various tools of the ecosystem using compatible versions, ensure that they all work together, apply patches, package the software so that it is easy to download and install, and provide tools for managing the platform. For production projects created to help meet important business goals, it’s best to use a commercial distribution instead of trying to handle it all on your own. This will allow your team more time to focus on building business solutions instead of solving pesky technology issues.

Some of the most popular commercial distributions of Hadoop (not in any specific order) are:

Cloudera Hadoop Distribution (CDH)
- Some major technology vendors, such as Oracle and Dell, provide their flavors of CDH
Hortonworks Data Platform (HDP)
Amazon Elastic MapReduce
MapR Hadoop Distribution
IBM Open Platform
Microsoft Azure’s HDInsight
Pivotal Big Data Suite
Datameer Professional
DataStax Enterprise Analytics

I will provide details of the various distributions in future posts.

Analytics, Big Data, Data Analysis, Data Integration, Data Science October 10, 2019

Learning Hadoop: The key features and benefits of Hadoop

What are the key features and benefits of Hadoop? Why is Hadoop such a successful platform?

Apache Hadoop, usually called simply Hadoop, is a software framework and platform for reading, processing, storing and analyzing very large amounts of data. There are several features of Hadoop that make it a very powerful solution for data analytics.

Hadoop is Distributed

With Hadoop, anywhere from a few to hundreds or thousands of commodity servers (called nodes) can be connected (forming a cluster) to work together to achieve whatever processing power and storage capability is needed. The software platform enables the nodes to work together, passing work and data between them. Data and processing are distributed across the nodes, which spreads the load and significantly reduces the impact of any single failure.
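To make the idea of spreading data and processing across nodes more concrete, here is a toy Python sketch. It is not Hadoop code and does not use any Hadoop API; the function names are made up for illustration, and the "nodes" are just lists in a single process. Each node counts words in its own share of the data, and the partial results are then combined.

```python
from collections import Counter


def split_across_nodes(records, node_count):
    """Deal records out round-robin so each node holds a roughly equal share."""
    shards = [[] for _ in range(node_count)]
    for i, record in enumerate(records):
        shards[i % node_count].append(record)
    return shards


def count_words_on_node(shard):
    """Work done locally by one node: count the words in its own shard."""
    counts = Counter()
    for line in shard:
        counts.update(line.split())
    return counts


def distributed_word_count(records, node_count):
    """Split the data, count on each node, then merge the partial counts."""
    shards = split_across_nodes(records, node_count)
    # On a real cluster these per-node counts would run in parallel.
    partials = [count_words_on_node(shard) for shard in shards]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    lines = ["big data", "big cluster", "data data"]
    print(distributed_word_count(lines, node_count=2))
```

The final counts are the same whether the data is split across two nodes or twenty. Only the amount of work each node has to do changes.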

Hadoop is Scalable

In the past, to achieve extremely powerful computing, a company would have to buy very expensive, large, monolithic computers. As data growth exploded, eventually even those supercomputers would become insufficient. With Hadoop, anywhere from a few to hundreds or thousands of commodity servers can be connected relatively easily to work together to achieve whatever processing power and storage capability is needed. This allows a company or project to start out small and then grow inexpensively as needed, with little concern about outgrowing the platform.

Hadoop is Fault Tolerant

Hadoop was designed and built around the fact that there will be frequent failures on the commodity hardware servers that make up the Hadoop cluster. When a failure occurs, the software handles the automatic reassignment of work and replication of data to other nodes in the cluster, and the system continues to function properly without manual intervention. When a node recovers, from a reboot for example, it will rejoin the cluster automatically and become available for work.
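The re-replication idea can be sketched in a few lines of Python. This is a simplified model rather than how HDFS actually implements it: the node names, block names and function names are hypothetical, and the replication factor of 3 is used here only because it is the HDFS default.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes) so the chosen nodes are distinct.
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {
            nodes[(i + k) % len(nodes)] for k in range(replication)
        }
    return placement


def handle_node_failure(placement, failed, live_nodes, replication=3):
    """Drop the failed node, then copy under-replicated blocks elsewhere."""
    for holders in placement.values():
        holders.discard(failed)
        for candidate in live_nodes:
            if len(holders) >= replication:
                break
            if candidate != failed and candidate not in holders:
                holders.add(candidate)
    return placement


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(["b1", "b2", "b3", "b4"], nodes)
    handle_node_failure(placement, "n2", ["n1", "n3", "n4"])
    for block, holders in sorted(placement.items()):
        print(block, sorted(holders))
```

After the simulated failure of one node, every block is back to three copies on the remaining nodes, without anyone stepping in manually.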

Hadoop is backed by the power of Open Source

Hadoop is open source software, which means that it can be downloaded, installed, used and even modified for free. It is managed by the renowned non-profit group, the Apache Software Foundation (ASF), hence the name Apache Hadoop. The group is made up of many brilliant people from all over the world, many of whom work at some of the top technology companies, who commit their time to managing the software. In addition, there are many developers who contribute code to enhance Hadoop, to add new features and functionality, or to add new tools that work with Hadoop. The various tools that have been built over the years to complement core Hadoop make up what is called the Hadoop ecosystem. With a large community of people from all over the world continuously adding to the Hadoop ecosystem in a well-managed way, it will only get better and become useful for many more use cases.

These are the reasons Hadoop has become such a force within the data world. Although there is some hype around the big data phenomenon, the benefits and solutions based on the Hadoop ecosystem are real.

You can learn more at https://hadoop.apache.org