The "For Hive 0.14 and Newer" part of the list describes the configuration steps; set the following. (The original wording is worth reading, so I quote it as-is.)
Set the following in hive-site.xml (a combined XML sketch follows this list): on Cloudera, enter these in the "Hive Service Advanced Configuration Snippet for hive-site.xml".
hive.server2.enable.doAs to false: on Cloudera this can be changed directly in the Hive configuration.
hive.users.in.admin.role to the list of comma-separated users who need to be added to admin role. Note that a user who belongs to the admin role needs to run the "set role" command before getting the privileges of the admin role, as this role is not in current roles by default.
Add org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly to hive.security.metastore.authorization.manager. (It takes a comma separated list, so you can add it along with StorageBasedAuthorization parameter, if you want to enable that as well). This setting disallows any of the authorization api calls to be invoked in a remote metastore. HiveServer2 can be configured to use embedded metastore, and that will allow it to invoke metastore authorization api. Hive cli and any other remote metastore users would be denied authorization when they try to make authorization api calls. This restricts the authorization api to privileged HiveServer2 process. You should also ensure that the metastore rdbms access is restricted to the metastore server and hiveserver2.
hive.security.authorization.manager to org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory. This will ensure that any table or views created by hive-cli have default privileges granted for the owner.
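Put together, the hive-site.xml entries above (or the Cloudera snippet) would look roughly like this; the admin user list is just an example:
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.users.in.admin.role</name>
<value>admin</value>
</property>
<property>
<name>hive.security.metastore.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly</value>
</property>
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory</value>
</property>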
Set the following in hiveserver2-site.xml: this file may not exist yet. On Cloudera, enter these in the "HiveServer2 Advanced Configuration Snippet for hive-site.xml".
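The Hive documentation lists roughly these hiveserver2-site.xml entries; double-check the exact values against the version you are running:
<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
<name>hive.security.authorization.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.security.authenticator.manager</name>
<value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>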
After adding the settings above, connect with beeline and test. In the settings above I put the admin account in the admin role, and the account also needs to exist in Hue. If the documented settings alone work for you, it should look like the following. In my case it did not work right away.
[irteam@dnaa-opr-t1001 ~]$ beeline -u jdbc:hive2://[host address]:10000 -n admin
Beeline version 1.1.0-cdh5.16.1 by Apache Hive
0: jdbc:hive2://dnaa-opr-t1001:10000> show current roles;
INFO : OK
+---------+--+
| role |
+---------+--+
| public |
+---------+--+
1 row selected (2.028 seconds)
0: jdbc:hive2://dnaa-opr-t1001:10000> select * from default.aaa; => confirms access is denied without privileges
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=admin, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.aaa]] (state=42000,code=40000)
0: jdbc:hive2://dnaa-opr-t1001:10000> set role admin;
INFO : OK
No rows affected (0.022 seconds)
0: jdbc:hive2://dnaa-opr-t1001:10000> show current roles;
INFO : Compiling command(queryId=hive_20200515160404_c358a275-56b9-4be4-b9df-b8c0e9d0bb13): show current roles
INFO : OK
+--------+--+
| role |
+--------+--+
| admin |
+--------+--+
0: jdbc:hive2://dnaa-opr-t1001:10000> select * from default.aaa;
INFO : OK
+---------+--+
| aaa.id |
+---------+--+
| 222 |
| 111 |
| 333 |
+---------+--+
3 rows selected (0.169 seconds)
As I wrote above, in my case it did not work with just that. If so, try adding whichever of the following configs you need.
The Factory-related settings above probably also require the following:
<property>
<name>hive.security.authorization.task.factory</name>
<value>org.apache.hadoop.hive.ql.parse.authorization.HiveAuthorizationTaskFactoryImpl</value>
</property>
The following setting is needed when privilege (grant) statements cannot be executed from Hue:
<property>
<name>hive.security.authorization.sqlstd.confwhitelist.append</name>
<value>hive\.server2\.proxy\.user</value>
</property>
The following setting is optional. Once authorization is enabled there are no grants at all and you have to add privileges manually; with this setting, a grant is added automatically for the table owner at CREATE TABLE time.
<property>
<name>hive.security.authorization.createtable.owner.grants</name>
<value>ALL</value>
</property>
Once you have done all of the above, the configuration is complete.
When connecting through beeline or Hue, accounts without privileges can no longer access the tables.
For a table you want open to everyone, grant privileges to the public role.
Note, however, that the hive CLI still bypasses this and other accounts can access the data through it,
so for security you should block access to the hive command for everyone except admin.
From here, refer to the main reference and assign privileges with GRANT; a quick sketch follows.
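A minimal example of granting privileges from a beeline session connected as admin; the role, user, and extra table names below are placeholders, not from the original setup:
set role admin;
create role analyst;
grant analyst to user someuser;
grant select on default.aaa to role analyst;
-- to open a table to everyone, grant to the public role
grant select on default.open_table to role public;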
If it still does not work, check the following.
Check that hadoop.proxyuser.hive.groups is * (the default).
Check the permission settings on HDFS:
– sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
– sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
In Hive, too, data is ingested in Parquet format, and in that case the Hive table definition looks like this:
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
Then a request came in to use the JSON file format in Hive as well. Applying this is easier than it sounds: put the Hive HCatalog library (hive-hcatalog-core-1.1.0-cdh5.8.0.jar) into the Cloudera Hive library directory and add the following to hive-site.xml.
<property>
<name>hive.aux.jars.path</name>
<value>[path to the directory containing the jar]</value>
</property>
The Cloudera configuration is as follows.
In my case Hive HCatalog was already installed and the library was under /opt/cloudera/parcels/CDH/jars. After applying the setting, restart the Hive-related services, then check that JSON-formatted tables load correctly in Hive.
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
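As a rough usage sketch (table name, column, and location are placeholders), a JSON-backed table declared with this SerDe looks like:
CREATE EXTERNAL TABLE default.json_sample (id STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/path/to/json/data';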
export SPARK_HOME=/usr/lib/spark
# The settings below are optional.
# set hadoop conf dir
export HADOOP_CONF_DIR=/etc/hadoop/conf
# set options to pass spark-submit command
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
# extra classpath. e.g. set classpath for hive-site.xml
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/conf
The master (yarn) modes available are listed below; choose the appropriate one. Usually yarn-client or yarn-cluster is used (see the example after the list).
local[*] in local mode
spark://master:7077 in standalone cluster
yarn-client in Yarn client mode
yarn-cluster in Yarn cluster mode
mesos://host:5050 in Mesos cluster
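In zeppelin-env.sh this is typically set through the MASTER variable; yarn-client below is just the common choice:
export MASTER=yarn-client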
To change the Java settings or the PySpark Python version, you can add settings such as the following.
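For example, a minimal sketch; the paths below are placeholders, not values from my setup:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3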
This completes the basic setup; in most cases the settings above are enough to use Zeppelin comfortably.
The following applies when Hadoop runs in Secure Mode with Kerberos. Add the settings below.
# This was optional above, but here it is required.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export KRB5_CONFIG=[path to the krb5 config]
export LIBHDFS_OPTS="-Djava.security.krb5.conf=${KRB5_CONFIG} -Djavax.security.auth.useSubjectCredsOnly=false"
export HADOOP_OPTS="-Djava.security.krb5.conf=${KRB5_CONFIG} -Djavax.security.auth.useSubjectCredsOnly=false ${HADOOP_OPTS}"
export ZEPPELIN_INTP_JAVA_OPTS="-Djava.security.krb5.conf=${KRB5_CONFIG} -Djavax.security.auth.useSubjectCredsOnly=false"
export JAVA_INTP_OPTS="-Djava.security.krb5.conf=${KRB5_CONFIG} -Djavax.security.auth.useSubjectCredsOnly=false ${JAVA_INTP_OPTS}"
# Depending on your setup, the following may also be needed for spark-submit.
export SPARK_SUBMIT_OPTIONS="--keytab [keytab path] --principal [principal]"
Without the -Djavax.security.auth.useSubjectCredsOnly=false option above, you may hit the error below. (This one cost me a lot of time.)
ERROR [2019-12-20 17:41:57,275] ({pool-2-thread-2} TSaslTransport.java[open]:315) - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:425)
at org.apache.zeppelin.jdbc.JDBCInterpreter.access$000(JDBCInterpreter.java:91)
at org.apache.zeppelin.jdbc.JDBCInterpreter$2.run(JDBCInterpreter.java:474)
at org.apache.zeppelin.jdbc.JDBCInterpreter$2.run(JDBCInterpreter.java:471)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:471)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 43 more
INFO [2019-12-20 17:41:57,278] ({pool-2-thread-2} HiveConnection.java[openTransport]:194) - Could not open client transport with JDBC Uri: jdbc:hive2://[hive host]/;principal=hive/[principal host]
ERROR [2019-12-20 17:41:57,278] ({pool-2-thread-2} JDBCInterpreter.java[getConnection]:478) - Error in doAs
java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:471)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://[hive host]/;principal=hive/[principal host]: GSS initiate failed
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:215)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:425)
at org.apache.zeppelin.jdbc.JDBCInterpreter.access$000(JDBCInterpreter.java:91)
at org.apache.zeppelin.jdbc.JDBCInterpreter$2.run(JDBCInterpreter.java:474)
at org.apache.zeppelin.jdbc.JDBCInterpreter$2.run(JDBCInterpreter.java:471)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
... 14 more
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190)
... 33 more
INFO [2019-12-20 17:41:57,285] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:120) - Job 20191220-170645_132084413 finished by scheduler org.apache.zeppelin.jdbc.JDBCInterpreter1849124988
Then add the Kerberos settings to ${SPARK_HOME}/conf/spark-defaults.conf.
spark.yarn.principal
spark.yarn.keytab
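With placeholder values (not from the original environment) these two lines would look like:
spark.yarn.principal    zeppelin@EXAMPLE.COM
spark.yarn.keytab       /path/to/zeppelin.keytab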
Check that these settings are actually picked up; if they are not, add them to the Zeppelin interpreter settings instead.
Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections
SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
After Hadoop security is enabled, you may see SIMPLE authentication errors like the ones above. They occur because SIMPLE authentication is not allowed; putting the following setting into core-site.xml resolves it.
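As far as I know, the client-side property commonly used for this is ipc.client.fallback-to-simple-auth-allowed; verify it against your Hadoop version:
<property>
<name>ipc.client.fallback-to-simple-auth-allowed</name>
<value>true</value>
</property>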