<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.gokhanatil.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.gokhanatil.com/" rel="alternate" type="text/html" hreflang="en" /><updated>2025-10-21T18:20:48+00:00</updated><id>https://www.gokhanatil.com/feed.xml</id><title type="html">Gokhan Atil’s Technology Blog</title><subtitle>Copyright &amp;copy; 2008-2023 Gokhan Atil.</subtitle><author><name>Gokhan Atil</name></author><entry><title type="html">How to Disable Caching in Snowflake for Testing Query Performance?</title><link href="https://www.gokhanatil.com/how-to-disable-caching-in-snowflake-for-testing-query-performance/" rel="alternate" type="text/html" title="How to Disable Caching in Snowflake for Testing Query Performance?" /><published>2023-02-20T00:00:00+00:00</published><updated>2023-02-20T00:00:00+00:00</updated><id>https://www.gokhanatil.com/how-to-disable-caching-in-snowflake-for-testing-query-performance</id><content type="html" xml:base="https://www.gokhanatil.com/how-to-disable-caching-in-snowflake-for-testing-query-performance/"><![CDATA[<p>How to disable caching in Snowflake for testing is a common question. Although the question sounds straightforward, the answer is a bit complicated: we need to understand the cache layers of Snowflake to answer it.</p>

<p>There are three cache layers in Snowflake:</p>

<p>1) Metadata cache: The Cloud Services Layer has a cache for metadata. It impacts compilation time and metadata-based operations such as the SHOW command. Users may see slow compilation times when the metadata cache required by their query has expired. This cache cannot be turned off, and end users have no way to see whether the metadata cache was used.</p>

<p>2) Warehouse cache: Each node in a warehouse has local SSD storage. Snowflake caches recently accessed micro-partitions (fetched from cloud storage) on this local SSD storage of the warehouse nodes. So similar queries may read these cached micro-partitions instead of accessing remote storage. This cache cannot be turned off, but it’s possible to see how much data was read from the warehouse cache via the query profile:</p>

<p><img src="/assets/readingfromcache.png" alt="Reading from cache" /></p>

<!--more-->

<p>You may try to suspend/wait/resume the warehouse to clear the warehouse cache, but if the same nodes are assigned to the warehouse, your query may continue to use the cached data:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="n">WAREHOUSE</span> <span class="n">YOU_WH</span> <span class="n">SUSPEND</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="n">WAREHOUSE</span> <span class="n">YOU_WH</span> <span class="n">RESUME</span><span class="p">;</span>
</code></pre></div></div>
<p>You may run the query on a new warehouse to avoid the warehouse cache, but it will be challenging if you need to re-run the query multiple times.</p>
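<p>One way to sidestep the warehouse cache is to spin up a throwaway warehouse just for the test run. The sketch below uses placeholder names and sizing – adjust them for your account:</p>

```sql
-- A hypothetical throwaway warehouse for cache-free test runs
CREATE WAREHOUSE test_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  INITIALLY_SUSPENDED = TRUE;
USE WAREHOUSE test_wh;
-- ... run the test query here ...
DROP WAREHOUSE test_wh;
```

<p>Note that only the first run on the fresh warehouse avoids the cache; subsequent runs will warm it again.</p>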

<p>3) Result cache: Snowflake stores the results of queries on cloud storage. If a query is re-executed within 24 hours and the underlying data hasn’t changed (<a href="https://docs.snowflake.com/en/user-guide/querying-persisted-results#retrieval-optimization">check the other requirements</a>), Snowflake can return the result without executing the query. Each time the persisted result for a query is reused, Snowflake resets the 24-hour retention period for the result, up to 31 days from the date and time the query was first executed. After 31 days, the result is purged, and the next time the query is submitted, a new result is generated and persisted.</p>

<p>The USE_CACHED_RESULT parameter can control this cache. The following command will disable it for the current session:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">SESSION</span> <span class="k">SET</span> <span class="n">USE_CACHED_RESULT</span><span class="o">=</span><span class="k">FALSE</span><span class="p">;</span>
</code></pre></div></div>

<p>You can check the query profile to see if a query uses the result cache:</p>

<p><img src="/assets/usingresultcache.png" alt="Using result cache" /></p>

<p>Please note that even if the result cache is used for a query, the query still needs to be compiled! Sometimes the total elapsed time is longer when the result cache is used, because the metadata cache has expired (slowing down compilation) while the query itself executes quickly anyway.</p>
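<p>Putting the pieces together, a typical test session could look like the following sketch (the warehouse and table names are placeholders):</p>

```sql
-- Disable the result cache for this session
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Use a dedicated warehouse for the test runs
USE WAREHOUSE test_wh;

-- Run the query several times and average the execution times
SELECT COUNT(*) FROM my_table;  -- placeholder query
```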

<p>So when testing the query performance, I suggest you ignore the metadata and warehouse cache. Just disable the result cache and run the query on a dedicated warehouse multiple times to get an average execution time. This will give you a better estimation of how the query will perform in the production environment.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[It’s a common question to ask how to disable caching in Snowflake for testing. Although it’s a very straightforward question, the answer is a bit complicated, and we need to understand the cache layers of Snowflake to answer this question. There are three cache layers in Snowflake: 1) Metadata cache: The Cloud Service Layer has a cache for metadata. It impacts compilation time and metadata-based operations such as SHOW command. The users may see slow compilation times when the metadata cache required by their query is expired. This cache cannot be turned off and is not visible to end-users if the metadata cache is used. 2) Warehouse cache: Each node in a warehouse has an SSD storage. Snowflake caches recently accessed micro-partitions (from the Cloud storage) in this local SSD storage on the warehouse nodes. So running similar queries may use these cached micro-partitions instead of accessing remote storage. This cache cannot be turned off, but it’s possible to see how much warehouse cache is used via the query profile:]]></summary></entry><entry><title type="html">PySpark Examples</title><link href="https://www.gokhanatil.com/pyspark-examples/" rel="alternate" type="text/html" title="PySpark Examples" /><published>2023-02-16T00:00:00+00:00</published><updated>2023-02-16T00:00:00+00:00</updated><id>https://www.gokhanatil.com/pyspark-examples</id><content type="html" xml:base="https://www.gokhanatil.com/pyspark-examples/"><![CDATA[<p>This post contains some sample PySpark scripts. 
During my “Spark with Python” presentation, I said I would share example codes (with detailed explanations). I posted them separately earlier but decided to put them together in one post.</p>

<h3 id="grouping-data-from-csv-file-using-rdds">Grouping Data From CSV File (Using RDDs)</h3>

<p>For this sample code, I use the <a href="https://files.grouplens.org/datasets/movielens/ml-100k/u.user">u.user</a> file of MovieLens 100K Dataset. I renamed it as “users.csv”, but you can use it with the current name if you want.</p>

<p><img src="/assets/pyspark1.png" alt="Pyspark1" /></p>

<!--more-->

<p>Using this simple data, I will group users based on gender and find the number of men and women in the users data. As you can see, the 3rd element indicates the gender of a user, and the columns are separated with a pipe symbol instead of a comma. So I wrote the script below:</p>

<script src="https://gist.github.com/d37ac1eda43990629d602edf0153aba4.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Line 1) Each Spark application needs a Spark Context object to access Spark APIs. So we start with importing the SparkContext library.</li>
  <li>Line 3) Then I create a Spark Context object (as “sc”). If you run this code in a PySpark client or a notebook such as Zeppelin, you should ignore the first two steps (importing SparkContext and creating sc object) because SparkContext is already defined. You should also skip the last line because you don’t need to stop the Spark context.</li>
  <li>Line 5) The sc.textFile method reads from a file and returns the content as an RDD (the file is actually read when we call an action, because RDDs are lazily evaluated). The print command will write out the result.</li>
  <li>Line 6) I use “map” to apply a function to all rows of RDD. Instead of defining a regular function, I use the “lambda” function. The lambda functions have no name and are defined inline where they are used. My function accepts a string parameter (called X), parses the X string to a list, and returns the combination of the 3rd element of the list with “1”. So we get Key-Value pairs like (‘M’,1) and (‘F’,1). By the way, the index of the first element is 0.</li>
  <li>Line 7) reduceByKey method is used to aggregate each key using the given reduce function. The previous “map” function produced an RDD which contains (‘M’,1) and (‘F’,1) elements. So the reduceByKey will group ‘M’ and ‘F’ keys, and the lambda function will add these 1’s to find the number of elements in each group. The result will be a Python list object: [(u’M’, 670), (u’F’, 273)]</li>
  <li>Line 8) Collect is an action to retrieve all returned rows (as a list), so Spark will process all RDD transformations and calculate the result.</li>
  <li>Line 10) sc.stop will stop the context – as I said, it’s not necessary for PySpark client or notebooks such as Zeppelin.</li>
</ul>
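<p>If you don’t have a Spark environment handy, you can see what the map and reduceByKey steps do with plain Python. The following sketch is only an illustration of the logic (the sample lines are made up in the pipe-delimited u.user format), not Spark code:</p>

```python
# Made-up sample lines in the u.user format: id|age|gender|occupation|zip
lines = ["1|24|M|technician|85711",
         "2|53|F|other|94043",
         "3|23|M|writer|32067"]

# Equivalent of map(lambda x: (x.split("|")[2], 1)):
# build (gender, 1) pairs; index 2 is the 3rd column
pairs = [(line.split("|")[2], 1) for line in lines]

# Equivalent of reduceByKey(lambda x, y: x + y):
# sum the 1's for each distinct key
counts = {}
for key, value in pairs:
    counts[key] = counts.get(key, 0) + value

print(counts)  # {'M': 2, 'F': 1}
```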

<p>If you’re not familiar with the lambda functions, let me share the same script with regular functions:</p>

<script src="https://gist.github.com/5efd9dc59527c4fec47d9a04dfd16972.js"> </script>

<p>It produces the same result with the same performance. Now let me write another one. This time, I will group the users based on their occupations:</p>

<script src="https://gist.github.com/2182798bd287c4592eb04c1a96d22890.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1,3,14) I already explained them in the previous code block.</li>
  <li>Line 5) Instead of writing the output directly, I will store the result of the RDD in a variable called “result”. sc.textFile opens the text file and returns an RDD.</li>
  <li>Line 6) I parse the columns and get the occupation information (4th column)</li>
  <li>Line 7) I filter out the users whose occupation information is “other”</li>
  <li>Line 8) Calculating the counts of each group</li>
  <li>Line 9) I sort the data based on “counts” (x[0] holds the occupation info, x[1] contains the counts) and retrieve the result.</li>
  <li>Line 11) Instead of print, I use a “for” loop so the output of the result looks better.</li>
</ul>
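<p>The occupation script follows the same pattern with two extra steps, filtering and sorting. Again in plain Python, as an illustration only (the sample data is made up):</p>

```python
# Made-up sample lines in the u.user format: id|age|gender|occupation|zip
lines = ["1|24|M|technician|85711",
         "2|53|F|other|94043",
         "3|23|M|writer|32067",
         "4|24|M|technician|43537"]

# map: take the occupation (4th column), then filter out "other"
occupations = [line.split("|")[3] for line in lines
               if line.split("|")[3] != "other"]

# count per occupation, then sort ascending by count
counts = {}
for occupation in occupations:
    counts[occupation] = counts.get(occupation, 0) + 1
result = sorted(counts.items(), key=lambda x: x[1])

print(result)  # [('writer', 1), ('technician', 2)]
```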

<h3 id="grouping-data-from-csv-file-using-dataframes">Grouping Data From CSV File (Using Dataframes)</h3>

<p>This time, I will use DataFrames instead of RDDs. DataFrames are distributed data collections organized into named columns (in a structured way). They are similar to tables in relational databases. They also provide a domain-specific language API to manipulate your distributed data, so it’s easier to use.</p>

<p>The Spark SQL module provides DataFrames, which serve as the primary API for Spark’s Machine Learning library and Structured Streaming modules. Spark developers recommend using DataFrames instead of RDDs because Catalyst (the Spark optimizer) will optimize your execution plan and generate better code to process the data.</p>

<p>I will use the “u.user” file of MovieLens 100K Dataset again. I will find the total number of men and women in the users data. I recommend you compare these codes with the previous ones (in which I used RDDs) to see the difference.</p>

<script src="https://gist.github.com/7be70195ed87181d6f5b34d86d4dfda6.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1-5,12) I already explained them in previous code blocks.</li>
  <li>Line 7) I use DataFrameReader object of spark (spark.read) to load CSV data. As you can see, I don’t need to write a mapper to parse the CSV file.</li>
  <li>Line 8) If the CSV file has headers, DataFrameReader can use them, but our sample CSV has no headers, so I give the column names.</li>
  <li>Line 9) Instead of reduceByKey, I use the groupby method to group the data.</li>
  <li>Line 10) I calculate the counts, add them to the grouped data, and the show method prints the output.</li>
</ul>

<p>What if we want to group the users based on their occupations?</p>

<script src="https://gist.github.com/f0d09252ed4591575dd4cec24cf57021.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1-5,14) I already explained them in previous code blocks.</li>
  <li>Line 7) I use DataFrameReader object of spark (spark.read) to load CSV data. As you can see, I don’t need to write a mapper to parse the CSV file.</li>
  <li>Line 8) If the CSV file has headers, DataFrameReader can use them, but our sample CSV has no headers, so I give the column names.</li>
  <li>Line 9) “Where” is an alias for “filter” (but it sounds more SQL-ish, so I use it). I use the “where” method to select the rows whose occupation is not “other”.</li>
  <li>Line 10) I group the users based on occupation.</li>
  <li>Line 11) Count them, and sort the output ascending based on counts.</li>
  <li>Line 12) I use the show method to print the result.</li>
</ul>

<p>Please compare these scripts with RDD versions. You’ll see that using DataFrames is more straightforward, especially when analyzing data.</p>

<h3 id="spark-sql-module">Spark SQL Module</h3>

<p>Spark SQL Module provides DataFrames (and DataSets – but Python doesn’t support DataSets because it’s a dynamically typed language) to work with structured data. First, let’s start by creating a temporary table from a CSV file and running a query on it. I will use the “u.user” file of the MovieLens 100K Dataset (I saved it as users.csv).</p>

<script src="https://gist.github.com/f187f980e076a10d4f7511ddff8368e7.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1-5,13) I already explained them in previous code blocks.</li>
  <li>Line 7) I use DataFrameReader object of spark (spark.read) to load CSV data. As you can see, I don’t need to write a mapper to parse the CSV file.</li>
  <li>Line 8) If the CSV file has headers, DataFrameReader can use them, but our sample CSV has no headers, so I give the column names.</li>
  <li>Line 9) Using the “createOrReplaceTempView” method, I register my data as a temporary view.</li>
  <li>Line 11) I run SQL to query my temporary view using the Spark session’s sql method. The result is a DataFrame, so I can use the show method to print the result.</li>
</ul>

<p>When I check the tables with “show tables”, I see that the “users” table is temporary, so when our session (job) is done, the table will be gone. What if we want to make our data persistent? If our Spark environment is configured to connect to Hive, we can use the DataFrameWriter object’s “saveAsTable” method. We can also save the data as a parquet table, CSV file, or JSON file.</p>

<script src="https://gist.github.com/90550a9f4e3ed5ed2f5b20b5038dcdcd.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1-5,21) I already explained them in previous code blocks.</li>
  <li>Line 7) I use DataFrameReader object of spark (spark.read) to load CSV data. The result will be stored in df (a DataFrame object)</li>
  <li>Line 8) If the CSV file has headers, DataFrameReader can use them, but our sample CSV has no headers, so I give the column names.</li>
  <li>Line 10) I use the saveAsTable method of DataFrameWriter (write property of a DataFrame) to save the data directly to Hive. The “mode” parameter lets me overwrite the table if it already exists.</li>
  <li>Line 12) I save data as JSON files in the “users_json” directory.</li>
  <li>Line 14) I save data as parquet files in the “users_parquet” directory.</li>
  <li>Line 16) I save data as CSV files in the “users_csv” directory.</li>
  <li>Line 18) Spark SQL’s direct read capabilities are incredible. You can directly run SQL queries on supported files (JSON, CSV, parquet). Because I selected a JSON file for my example, I did not need to name the columns. The column names are automatically generated from JSON files.</li>
</ul>

<p>Spark SQL module also enables you to access various data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data from different data sources.</p>

<h3 id="discretized-streams-dstreams">Discretized Streams (Dstreams)</h3>

<p>Spark supports two different ways of streaming: Discretized Streams (DStreams) and Structured Streaming. DStreams is the basic abstraction in Spark Streaming. It is a continuous sequence of RDDs representing a stream of data. Structured Streaming is the newer way of streaming built on the Spark SQL engine.</p>

<p>When you search for any example scripts about DStreams, you find sample codes that read data from TCP sockets. So I decided to write a different one: My sample code will read from files in a directory. The script will check the directory every second and process the new CSV files it finds. Here’s the code:</p>

<script src="https://gist.github.com/1a3c5bfd606b686d37f8f90a6976f3b6.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1,4) I already explained this in previous code blocks.</li>
  <li>Line 2) For DStreams, I import the StreamingContext library.</li>
  <li>Line 5) I create a Streaming Context object. The second parameter indicates the interval (1 second) for processing streaming data.</li>
  <li>Line 7) Using textFileStream, I set the source directory for streaming, and create a DStream object.</li>
  <li>Line 8) This simple function parses the CSV file.</li>
  <li>Line 10) This is the action command for the DStream object. pprint method writes the content.</li>
  <li>Line 12) Starts the streaming process.</li>
  <li>Line 14) Waits until the script is terminated manually.</li>
</ul>

<p>Every second, the script will check the “/tmp/stream” folder; if it finds a new file, it will process the file and write the output. For example, if we put a file that contains the following data in the folder:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fatih,5
Cenk,4
Ahmet,3
Arda,1
</code></pre></div></div>

<p>The script will print:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-------------------------------------------
Time: 2023-02-16 13:31:53
-------------------------------------------
['Fatih', '5']
['Cenk', '4']
['Ahmet', '3']
['Arda', '1']
</code></pre></div></div>

<p><strong>pprint</strong> is a perfect function to debug your code, but you probably want to store the streaming data to an external target (such as a Database or HDFS location). DStream object’s foreachRDD method can be used for it. Here’s another code to save the streaming data to JSON files:</p>

<script src="https://gist.github.com/60ccd038d5dc1f72bee4b52c03a196eb.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1,5,6,19,21) I already explained them in previous code blocks.</li>
  <li>Line 2) Because I’ll use DataFrames, I also import the SparkSession library.</li>
  <li>Line 3) For DStreams, I import the StreamingContext library.</li>
  <li>Line 7) I create a Streaming Context object. The second parameter indicates the interval (1 second) for processing streaming data.</li>
  <li>Line 9) Using textFileStream, I set the source directory for streaming and create a DStream object.</li>
  <li>Line 10) This simple function parses the CSV file.</li>
  <li>Line 12) I define a function accepting an RDD as a parameter.</li>
  <li>Line 13) This function will be called every second – even if there’s no streaming data – so I check that the RDD is not empty.</li>
  <li>Line 14) Convert the RDD to a DataFrame with columns “name” and “score”.</li>
  <li>Line 15) Write the data to the points_json folder as JSON files.</li>
  <li>Line 17) Assign the saveresult function for processing streaming data</li>
</ul>

<p>After storing all this data in JSON format, we can run a simple script to query it:</p>

<script src="https://gist.github.com/35b701e5c80d4c0a016ad67fee3c939d.js"> </script>

<h3 id="structured-streaming">Structured Streaming</h3>

<p>Structured Streaming is a stream processing engine built on the Spark SQL engine. It supports File and Kafka sources for production; Socket and Rate sources for testing. Here is a very simple example to demonstrate how structured streams work:</p>

<script src="https://gist.github.com/cf94c0b0379021cb54fb4462e8a9ac91.js"> </script>

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Lines 1-5) I already explained them in previous code blocks.</li>
  <li>Line 7) I create a DataFrame to process streaming data.</li>
  <li>Line 8) It will read CSV files in the path (/tmp/stream/), and the CSV files will contain the name (string) and points (int) data. By default, Structured Streaming from file-based sources requires you to specify the schema, rather than rely on Spark to infer it automatically.</li>
  <li>Line 9) The data will be grouped based on the “name” column, and the points will be summed.</li>
  <li>Line 10) The data will be ordered based on points (descending)</li>
  <li>Line 12) The output will be written to the console and the application will wait for termination.</li>
</ul>

<p>For testing, I created 2 CSV files:</p>

<p>1.csv:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fatih,5
Cenk,4
Ahmet,3
Arda,1
</code></pre></div></div>

<p>2.csv:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fatih,1
Cenk,1
Ahmet,2
Osman,1
David,2
</code></pre></div></div>

<p>Then I started the script, and on another terminal, I copied the above files one by one to the /tmp/stream/ directory (if you don’t have the directory, you should create it):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp 1.csv /tmp/stream 
cp 2.csv /tmp/stream
</code></pre></div></div>

<p>Here is the output of the PySpark script:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-------------------------------------------                                     
Batch: 0
-------------------------------------------
+-----+-----------+
| name|sum(points)|
+-----+-----------+
|Fatih|          5|
| Cenk|          4|
|Ahmet|          3|
| Arda|          1|
+-----+-----------+

-------------------------------------------                                     
Batch: 1
-------------------------------------------
+-----+-----------+
| name|sum(points)|
+-----+-----------+
|Fatih|          6|
| Cenk|          5|
|Ahmet|          5|
|David|          2|
| Arda|          1|
|Osman|          1|
+-----+-----------+
</code></pre></div></div>
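<p>You can verify the running totals in “Batch: 1” by hand: Structured Streaming keeps the aggregation state between batches, so the second batch adds the points from 2.csv on top of the first. Here is a plain-Python illustration of that stateful aggregation (not Spark code):</p>

```python
# The rows of 1.csv and 2.csv as (name, points) tuples
batches = [
    [("Fatih", 5), ("Cenk", 4), ("Ahmet", 3), ("Arda", 1)],                 # 1.csv
    [("Fatih", 1), ("Cenk", 1), ("Ahmet", 2), ("Osman", 1), ("David", 2)],  # 2.csv
]

# The aggregation state survives between batches
totals = {}
for batch in batches:
    for name, points in batch:
        totals[name] = totals.get(name, 0) + points
    # like orderBy descending on the summed points
    print(sorted(totals.items(), key=lambda x: -x[1]))
```

<p>The second list printed by this sketch carries the same totals as the “Batch: 1” table above.</p>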
<p>Although I also talked about GraphFrames and Spark’s Machine Learning capabilities in my presentation, I will not include examples of them in this blog post. I hope this blog post will be helpful.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[This post contains some sample PySpark scripts. During my “Spark with Python” presentation, I said I would share example codes (with detailed explanations). I posted them separately earlier but decided to put them together in one post. Grouping Data From CSV File (Using RDDs) For this sample code, I use the u.user file of MovieLens 100K Dataset. I renamed it as “users.csv”, but you can use it with the current name if you want.]]></summary></entry><entry><title type="html">How To Find Storage Occupied by Each Internal Stage in Snowflake?</title><link href="https://www.gokhanatil.com/how-to-find-storage-occupied-by-each-internal-stage-in-snowflake/" rel="alternate" type="text/html" title="How To Find Storage Occupied by Each Internal Stage in Snowflake?" /><published>2022-11-18T00:00:00+00:00</published><updated>2022-11-18T00:00:00+00:00</updated><id>https://www.gokhanatil.com/how-to-find-storage-occupied-by-each-internal-stage-in-snowflake</id><content type="html" xml:base="https://www.gokhanatil.com/how-to-find-storage-occupied-by-each-internal-stage-in-snowflake/"><![CDATA[<p>The account usage view has two views related to stages: STAGES and STAGE_STORAGE_USAGE_HISTORY. The STAGES view helps list all the stages defined in your account but does not show how much storage each stage consumes. The STAGE_STORAGE_USAGE_HISTORY view shows the total usage of all stages but doesn’t show detailed use.</p>

<p>I wrote the following script to list the internal stages (and their occupied storage) in all available databases:</p>

<script src="https://gist.github.com/abbc604b0e69ff0c545d014c167b24ba.js"> </script>

<!--more-->

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Line 1) Declaration of variables</li>
  <li>Line 2) A resultset is a SQL data type that points to the result set of a query.</li>
  <li>Line 3) A variant variable to hold the list of stages and sizes</li>
  <li>Line 5) Total_size will store the size of a stage</li>
  <li>Line 6) This query will convert the variant data to a table with two columns (stage_name and total_bytes)</li>
  <li>Lines 7-10) Defining a cursor to list all internal stages</li>
  <li>Line 11) Beginning of the anonymous code block</li>
  <li>Line 12) Initializing the rpt variable as an empty array</li>
  <li>Line 13) A loop for each record in the cursor (each stage in the database)</li>
  <li>Line 14) Beginning of the block in which we handle exceptions</li>
  <li>Line 15) The name of the stage is assigned to the name variable</li>
  <li>Line 16) Execute the LS command for the stage and store the result in the RES variable</li>
  <li>Line 17) Open a new cursor to process the rows in the RES variable</li>
  <li>Line 18) Total size is set to 0</li>
  <li>Lines 19-21) The size of each file in the stage is added to the total_size variable</li>
  <li>Line 22) Add the stage name and the total_size as a new element to the RPT array</li>
  <li>Lines 23-25) Exception handling section to return -1 as the size if we can’t access the stage</li>
  <li>Line 26) End of the block in which we handle exceptions</li>
  <li>Line 27) Repeat this process for each stage. The loop was started on line 13</li>
  <li>Line 28) Convert the array to a table with two columns (stage_name and total_bytes)</li>
  <li>Line 29) Return the result as a table</li>
  <li>Line 30) End of the anonymous block</li>
</ul>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[The account usage view has two views related to stages: STAGES and STAGE_STORAGE_USAGE_HISTORY. The STAGES view helps list all the stages defined in your account but does not show how much storage each stage consumes. The STAGE_STORAGE_USAGE_HISTORY view shows the total usage of all stages but doesn’t show detailed use. I wrote the following script to list the internal stages (and their occupied storage) in all available databases:]]></summary></entry><entry><title type="html">Amazon QLDB and the Missing Command Line Client</title><link href="https://www.gokhanatil.com/qldb-the-missing-command-line-client/" rel="alternate" type="text/html" title="Amazon QLDB and the Missing Command Line Client" /><published>2019-09-12T00:00:00+00:00</published><updated>2019-09-12T00:00:00+00:00</updated><id>https://www.gokhanatil.com/qldb-the-missing-command-line-client</id><content type="html" xml:base="https://www.gokhanatil.com/qldb-the-missing-command-line-client/"><![CDATA[<p>Amazon <a href="https://aws.amazon.com/qldb/">Quantum Ledger Database</a> is a fully managed ledger database that tracks all changes in user data and maintains a verifiable history of changes over time. It was announced at AWS re:Invent 2018 and is now available in five AWS regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo).</p>

<p>You may ask why you would like to use QLDB (a ledger database) instead of your traditional database solution. We all know that it’s possible to create history tables for our fact tables and keep them up to date using triggers, stored procedures, or even our application code (by writing changes of the main table to its history table). You can also say that your database has write-ahead/redo logs, so it’s possible to track and verify the changes in all your data as long as you keep them in your archive. On the other hand, this creates extra workload and complexity for the database administrator and the application developer. At the same time, it does not guarantee that the data is intact and reliable. What if your DBA directly modifies the data and history table after disabling the triggers and altering the archived logs? You may say it’s too hard, but you know that it’s technically possible. In a legal dispute or a security compliance investigation, this might be enough to question the integrity of the data.</p>

<p><img src="/assets/journal-structure.png" alt="Journal-Structure" /></p>

<!--more-->

<p>QLDB solves this problem with a cryptographically verifiable journal. When an application needs to modify data in a document, the changes are logged into the journal files first (WAL concept). The difference is that each block is hashed (SHA-256) for “verification” and has a sequence number to specify its address within the journal. QLDB calculates this hash value using the journal block’s content and the previous block’s hash value. So the journal blocks are chained by the hash values! The QLDB users do not have access to the immutable logs. If someone modifies data, they also need to update the journal blocks related to the data. This will cause a new hash to be generated for the journal block, and all the following blocks will have a different hash value than before.</p>

<p>As a devil’s advocate, you may wonder, “what happens if my data is modified without my permission and all the journal blocks are regenerated with new hash values?” It’s a very unlikely situation, but what if it happens? Honestly, this was my first question when I heard about the chain mechanism between the journal blocks. QLDB lets you download (and store) the last generated hash value (the digest), and this digest can be used to verify all previously committed transactions. So you have the key (the digest) to verify the integrity of the data. If any change is made to your journals or your data, even if a single byte changes, the digest will be different and will not match yours!</p>
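<p>The hash-chain technique itself is easy to demonstrate. The sketch below is not QLDB’s actual implementation – just a minimal plain-Python illustration of chaining blocks with SHA-256, where the last hash plays the role of the digest:</p>

```python
import hashlib

def chain(blocks, prev_hash=b""):
    """Hash each block together with the previous block's hash."""
    hashes = []
    for content in blocks:
        prev_hash = hashlib.sha256(prev_hash + content).digest()
        hashes.append(prev_hash)
    return hashes

# A toy journal of three revisions
journal = [b"INSERT vehicle A", b"UPDATE vehicle A", b"DELETE vehicle A"]
digest = chain(journal)[-1]  # the value you would store for verification

# Tampering with an earlier block changes every later hash,
# so the chain no longer matches the stored digest
tampered = [b"INSERT vehicle B", b"UPDATE vehicle A", b"DELETE vehicle A"]
print(chain(tampered)[-1] == digest)  # False
```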

<p>What else do you need to know about QLDB?</p>

<ol>
  <li>Journal-first: The system of record is the journal instead of the table storage.</li>
  <li>Immutable: You can see the full history of data (even deleted data) because nothing can be deleted from the journal.</li>
  <li>Cryptographically verifiable: Hash-chaining provides data integrity.</li>
  <li>Highly scalable: It’s serverless, and you don’t need to maintain the underlying structure or resources.</li>
  <li>Easy to use: Supports PartiQL – SQL-compatible access to relational, semi-structured, and nested data.</li>
  <li>Document based: All records are stored in Amazon Ion format.</li>
</ol>

<p>If you create a ledger, you’ll see that you can access data in two ways: write a Java application or use the query editor. Unfortunately, there’s no “data import” tool or command line client for now. So I wrote a basic command line tool; you can download the JAR file and the sources from the GitHub repository:</p>

<p><a href="https://github.com/gokhanatil/qldbcli"><img src="/assets/gitHub-download-button.png" alt="Download" /></a></p>

<p>It supports importing CSV files into existing tables in your ledger. Of course, it’s just a sample application and not designed for production work. To use it, you need to configure the AWS CLI and set the region where your QLDB ledger resides.</p>

<p>After that, you can run it with “java -jar qldbcli.jar -l LEDGERNAME” (My ledger name is <strong>Deneme</strong>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar qldbcli.jar -l Deneme
----------------------------------------------------------
QLDB "the missing" Command Line Client v0.2 by Gokhan Atil
----------------------------------------------------------
PartiQL (Deneme) &gt; select * from Vehicle
VIN,Type,Year,Make,Model,Color
"3HGGK5G53FM761765","Motorcycle",2011,"Ducati","Monster 1200","Yellow"
"1HVBBAANXWH544237","Semi",2009,"Ford","F 150","Black"
"KM8SRDHF6EU074761","Sedan",2015,"Tesla","Model S","Blue"
"1C4RJFAG0FC625797","Sedan",2019,"Mercedes","CLK 350","White"
"1N4AL11D75C109151","Sedan",2011,"Audi","A5","Silver"
</code></pre></div></div>

<p>To quit the client, use the “quit” command; to connect to another ledger, use the “CONN LedgerName” command. I usually use the tool for exporting/importing sample data between my tables. You can export a ledger table as CSV: all you need to do is run a SELECT query with the “-q” parameter and redirect the output to a file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar qldbcli.jar -l Deneme -q "SELECT * FROM Vehicle" &gt; Vehicle.csv
</code></pre></div></div>

<p>You can also import data from a CSV file into a table you created on the ledger (please note that I created the NewVehicle table by running the “CREATE TABLE NewVehicle” PartiQL command before running the import). The application expects the filename (with the “-f” parameter) and the target table (with the “-t” parameter):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar qldbcli.jar -l Deneme -f Vehicle.csv -t NewVehicle
</code></pre></div></div>
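<p>Importing rows like the ones shown earlier requires quote-aware CSV splitting: a plain split on commas would break values that contain commas. Here is a minimal sketch of that kind of parsing (an illustration with a made-up class name, not the actual qldbcli source; it does not handle escaped quotes inside fields):</p>

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLine {

    // Splits one CSV line into fields, treating commas inside double quotes as data.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;           // toggle quoted mode, drop the quote itself
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());         // the last field
        return fields;
    }
}
```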

<p>You can enable verbose mode with the “-v” parameter to debug connection problems or errors returned by PartiQL commands:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java -jar qldbcli.jar -l Deneme -v
</code></pre></div></div>

<p>Please keep in mind that the ledger, table, and field names are all case-sensitive. As I said, my sample application is not designed for production use, so please do not expect to import millions of records with it. On the other hand, if you examine the source code, it might help you write your own QLDB application.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[Amazon Quantum Ledger Database is a fully managed ledger database that tracks all changes in user data and maintains a verifiable history of changes over time. It was announced at AWS re:Invent 2018 and is now available in five AWS regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). You may ask why you would like to use QLDB (a ledger database) instead of your traditional database solution. We all know that it’s possible to create history tables for our fact tables and keep them up to date using triggers, stored procedures, or even with our application code (by writing changes of the main table to its history table). You can also say that your database has write-ahead/redo logs, so it’s possible to track and verify the changes in all your data as long as you keep them in your archive. On the other hand, this will create an extra workload and complexity for the database administrator and the application developer. At the same time, it does not guarantee that the data was intact and reliable. What if your DBA directly modifies the data and history table after disabling the triggers and altering the archived logs? You may say it’s too hard, but you know that it’s technically possible. 
In a legal dispute or a security compliance investigation, this might be enough to question the integrity of the data.]]></summary></entry><entry><title type="html">Sample AWS Lambda Function to Monitor Oracle Databases</title><link href="https://www.gokhanatil.com/sample-aws-lambda-function-to-monitor-oracle-databases/" rel="alternate" type="text/html" title="Sample AWS Lambda Function to Monitor Oracle Databases" /><published>2019-09-04T00:00:00+00:00</published><updated>2019-09-04T00:00:00+00:00</updated><id>https://www.gokhanatil.com/sample-aws-lambda-function-to-monitor-oracle-databases</id><content type="html" xml:base="https://www.gokhanatil.com/sample-aws-lambda-function-to-monitor-oracle-databases/"><![CDATA[<p>I wrote a simple AWS Lambda function to demonstrate how to connect to an Oracle database, gather tablespace usage information, and send these metrics to CloudWatch. I first wrote this Lambda function in Python, and then I had to re-write it in Java. As you may know, you need the cx_Oracle module to connect to Oracle databases from Python. This extension module requires libraries shipped with the Oracle Database Client (oh God!), so it’s a little bit tricky to package it for AWS Lambda.</p>

<p>Here’s the main class which a Lambda function can use:</p>

<script src="https://gist.github.com/37cb009db0109240014cdb00648b7f52.js"> </script>

<!--more-->

<p>Here is the step-by-step explanation of the above script:</p>

<ul>
  <li>Line 15) I created a class named Monitoring – this will be the handler class. To be able to test it, it accepts JSON objects; this is why it implements the “RequestHandler” interface (with “String” as the result type).</li>
  <li>Line 16) Definition of the function required to handle requests</li>
  <li>Line 18) Taken from the AWS documentation. It’s the object we use to push metrics to CloudWatch</li>
  <li>Lines 20-24) Loading the JDBC driver class to be able to connect to an Oracle database</li>
  <li>Lines 26-29) Defining the connection and credentials. As you can see, I fetch the required parameters from the environment, because Lambda lets you define environment variables</li>
  <li>Lines 32-33) Connecting to the database</li>
  <li>Line 35) Creating a statement object to run queries</li>
  <li>Line 37) As a sample, I query dba_tablespace_usage_metrics to get the usage percentage of each tablespace</li>
  <li>Line 38) Loop for each record</li>
  <li>Lines 40-47) Preparing the metric I want to push to CloudWatch – the metric will be stored as “Space Used (pct)” for each tablespace, under the “Databases” namespace in the custom metrics</li>
  <li>Line 49) Pushes the metric data</li>
  <li>Lines 53-55) Mandatory exception handling block</li>
  <li>Line 57) Returns a result (success)</li>
</ul>

<p>You may notice a lot of “System.getenv” calls. I used them to read the environment variables defined for the Lambda function.</p>
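<p>Since a Lambda function cannot prompt for input, environment variables are the natural place for this kind of configuration. A tiny helper like the one below (a hypothetical name, not part of the sample project) keeps the lookups in one place and supplies a default when a variable is missing:</p>

```java
public class EnvConfig {

    // Returns the named environment variable, or the fallback when it is unset or empty.
    static String getOrDefault(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }
}
```

<p>For example, getOrDefault("DB_PORT", "1521") lets the function fall back to the default Oracle listener port when DB_PORT is not set.</p>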

<p><a href="https://github.com/gokhanatil/samplelambda"><img src="/assets/gitHub-download-button.png" alt="Download" /></a></p>

<p>You can download the Maven project from my GitHub repository. After you build the maven project, you need to upload the JAR file to your S3 bucket:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws s3 cp target/samplelambda-0.1-jar-with-dependencies.jar s3://yourbucketname/
</code></pre></div></div>

<p>I used this JAR file to create a Lambda function through the AWS Lambda console. My package name is “com.gokhanatil.samplelambda”, my class name is “Monitoring”, and the function name is “handleRequest”. This is why I entered “com.gokhanatil.samplelambda.Monitoring::handleRequest” as the handler. Please note that you need to attach the permissions required to access CloudWatch (e.g. CloudWatchFullAccess) and the AWSLambdaVPCAccessExecutionRole. You also need to enter the Virtual Private Cloud (VPC) information (subnets: pick two subnets that can reach your Oracle instance; security groups: select one that gives you access to your Oracle instance).</p>

<p>To schedule this job to run periodically, you can use “CloudWatch Events”. This event will be triggered every 15 minutes and will launch the Lambda function.</p>

<p><img src="/assets/cloudwatch-metrics.png" alt="Metrics" /></p>

<p>After enabling the lambda function, new metrics appear in the CloudWatch dashboard. It’s possible to create an alarm for these metrics using the console or AWS CLI commands.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[I wrote a simple AWS Lambda function to demonstrate how to connect an Oracle database, gather the tablespace usage information, and send these metrics to CloudWatch. First, I wrote this lambda function in Python, and then I had to re-write it in Java. As you may know, you need to use the cx_oracle module to connect Oracle Databases with Python. This extension module requires some libraries shipped by Oracle Database Client (oh God!). It’s a little bit tricky to pack it for the AWS Lambda. Here’s the main class which a Lambda function can use:]]></summary></entry><entry><title type="html">How to Build A Cassandra Cluster On Docker?</title><link href="https://www.gokhanatil.com/how-to-build-cassandra-cluster-on-docker/" rel="alternate" type="text/html" title="How to Build A Cassandra Cluster On Docker?" /><published>2018-02-13T00:00:00+00:00</published><updated>2018-02-13T00:00:00+00:00</updated><id>https://www.gokhanatil.com/how-to-build-cassandra-cluster-on-docker</id><content type="html" xml:base="https://www.gokhanatil.com/how-to-build-cassandra-cluster-on-docker/"><![CDATA[<p>In this blog post, I’ll show how to build a three-node Cassandra cluster on Docker for testing. I’ll use official Cassandra images instead of creating my images, so all processes will take only a few minutes (depending on your network connection). I assume you have Docker installed on your PC, have an internet connection (I was born in 1976, so it’s normal for me to ask this kind of question), and have at least 8 GB RAM. First, we need to assign about 5 GB RAM to Docker (in case it has less RAM) because each node will require 1.5+ GB RAM to work properly.</p>

<p><img src="/assets/dockermemory.png" alt="Docker Memory" /></p>

<p>Open the Docker preferences, click the Advanced tab, set the memory to 5 GB or more, and click “Apply and Restart” to restart the Docker service. Then launch a terminal window and run the “docker pull cassandra” command to fetch the latest official Cassandra image.</p>

<p>I’ll use cas1, cas2, and cas3 as the node names, and the name of my Cassandra cluster will be “MyCluster” (a very creative and unique name). I’ll also configure cas1 and cas2 as if they are placed in datacenter1, and cas3 as if it’s placed in datacenter2. So we’ll have three nodes, two of them in datacenter1 and one in datacenter2 (to test Cassandra’s multi-DC replication support). For multi-DC support, my Cassandra nodes will use “GossipingPropertyFileSnitch”. This extra information can be passed to the docker containers using environment variables (with the -e parameter).</p>

<!--more-->

<p>Now it’s time to start the first node:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 -d cassandra
</code></pre></div></div>

<p>The -p parameter publishes the container’s port to the host, so I can connect to the Cassandra service from outside the docker container (for example, using DataStax Studio or DevCenter). After the first node is up, I’ll add the cas2 and cas3 nodes, but I need to tell them the IP address of cas1 so they can use it as the seed node and join the cluster. We can find the IP address of cas1 by running the following command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker inspect --format='' cas1
</code></pre></div></div>

<p>I’ll add it to the docker run commands for cas2 and cas3:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run --name cas2 -e CASSANDRA_SEEDS="$(docker inspect --format='' cas1)" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 -d cassandra

docker run --name cas3 -e CASSANDRA_SEEDS="$(docker inspect --format='' cas1)" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter2 -d cassandra
</code></pre></div></div>

<p>I gave a different datacenter name (datacenter2) while creating the cas3 node. Run them one by one, give time to the new nodes to join the cluster, and then run the “nodetool status” command from cas1 (or any other node):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker exec -ti cas1 nodetool status
</code></pre></div></div>

<p>The above command connects to the cas1 node and runs the “nodetool status” command. If everything went fine, you should see something similar to the output below.</p>

<p><img src="/assets/nodetoolstatus.png" alt="Node Tool Status" /></p>

<p>The status column of each node should show UN (node is <strong>UP</strong> and its state is <strong>Normal</strong>). If you see “UJ” that means your node is joining, just wait a while and recheck it. If your new nodes didn’t appear in the list, they probably crashed before joining the cluster. In this case, you may restart the missing nodes. For example, if cas3 (the last node) didn’t join the cluster and it’s down, you can run the “docker start cas3” command to start it. It’ll try to join the cluster automatically.</p>

<p>Now let’s create a keyspace (database) that will be replicated to datacenter1 and datacenter2, and a table in this newly created keyspace. I’ll use NetworkTopologyStrategy for replicating the data; each datacenter will store one copy of the data. Here are the CQL (Cassandra Query Language) commands to create the keyspace and the table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="n">KEYSPACE</span> <span class="n">mykeyspace</span>
<span class="k">WITH</span> <span class="n">replication</span> <span class="o">=</span> <span class="p">{</span>
	<span class="s1">'class'</span> <span class="p">:</span> <span class="s1">'NetworkTopologyStrategy'</span><span class="p">,</span>
	<span class="s1">'datacenter1'</span> <span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
	<span class="s1">'datacenter2'</span> <span class="p">:</span> <span class="mi">1</span>
<span class="p">};</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">mykeyspace</span><span class="p">.</span><span class="n">mytable</span> <span class="p">(</span>
	<span class="n">id</span> <span class="nb">int</span> <span class="k">primary</span> <span class="k">key</span><span class="p">,</span>
	<span class="n">name</span> <span class="nb">text</span>
<span class="p">);</span>
</code></pre></div></div>

<p>We can execute these commands using cqlsh by connecting one of our nodes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker exec -ti cas1 cqlsh
</code></pre></div></div>

<p>Or we can execute them using a client program such as DevCenter (you need to register on the DataStax website to be able to download it). I tried to find a stable GUI for Cassandra, and DevCenter looks fine to me:</p>

<p><img src="/assets/devcenter.png" alt="Node Tool Status" /></p>

<p>After we created the keyspace, we can run “nodetool status” to check the data distribution:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker exec -ti cas1 nodetool status mykeyspace
</code></pre></div></div>

<p>As you can see, I gave the name of the keyspace as a parameter to nodetool, so it will show the distribution of our newly created keyspace.</p>

<p><img src="/assets/datadistribution.png" alt="Data Distribution" /></p>

<p>Did you notice that the nodes in datacenter1 share the data almost evenly, while the node in datacenter2 holds a copy of all the data? Remember the replication strategy of our keyspace: each datacenter stores one copy. Because there are two nodes in datacenter1, the data is distributed evenly between those two nodes.</p>
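<p>The percentages follow from simple arithmetic: each datacenter holds the number of copies requested in the keyspace definition, spread across that datacenter’s nodes. A rough estimate, assuming tokens are evenly distributed (illustrative code with a made-up name, not a Cassandra API):</p>

```java
public class OwnershipEstimate {

    // Approximate fraction of the keyspace a node owns under NetworkTopologyStrategy:
    // replicas requested for the datacenter, divided by the nodes in that datacenter.
    static double perNodeShare(int replicasInDc, int nodesInDc) {
        return Math.min(1.0, (double) replicasInDc / nodesInDc);
    }
}
```

<p>With our keyspace, datacenter1 gives 1/2 = 50% per node and datacenter2 gives 1/1 = 100%, which matches the nodetool output.</p>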

<p>You can shut down nodes using “docker stop cas1 cas2 cas3” and start them again with “docker start cas1 cas2 cas3”. So, we have a working Cassandra cluster that is deployed to multiple data centers.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[In this blog post, I’ll show how to build a three-node Cassandra cluster on Docker for testing. I’ll use official Cassandra images instead of creating my images, so all processes will take only a few minutes (depending on your network connection). I assume you have Docker installed on your PC, have an internet connection (I was born in 1976, so it’s normal for me to ask this kind of question), and have at least 8 GB RAM. First, we need to assign about 5 GB RAM to Docker (in case it has less RAM) because each node will require 1.5+ GB RAM to work properly. Open the docker preferences, click the advanced tab, set the memory to 5 GB or more, and click “apply and restart” docker service. Launch a terminal window, and run the “docker pull cassandra” command to fetch the latest official Cassandra image. I’ll use cas1, cas2, cas3 as the node names, and the name of my Cassandra cluster will be “MyCluster” (a very creative and unique name). I’ll also configure cas1 and cas2 like they are placed in datacenter1 and cas3 like it’s placed in datacenter2. So we’ll have three nodes, two of them in datacenter1 and one in datacenter2 (to test Cassandra’s multi-DC replication support). For multi-DC support, my Cassandra nodes will use “GossipingPropertyFileSnitch”. 
This extra information can be passed to docker containers using environment variables (with -e parameter).]]></summary></entry><entry><title type="html">Oracle Enterprise Manager Cloud Control: Write Powerful Scripts With EMCLI</title><link href="https://www.gokhanatil.com/oracle-enterprise-manager-cloud-control-write-powerful-scripts-with-emcli/" rel="alternate" type="text/html" title="Oracle Enterprise Manager Cloud Control: Write Powerful Scripts With EMCLI" /><published>2016-09-25T00:00:00+00:00</published><updated>2016-09-25T00:00:00+00:00</updated><id>https://www.gokhanatil.com/oracle-enterprise-manager-cloud-control-write-powerful-scripts-with-emcli</id><content type="html" xml:base="https://www.gokhanatil.com/oracle-enterprise-manager-cloud-control-write-powerful-scripts-with-emcli/"><![CDATA[<p>Last week, I attended the Oracle Open World and gave a presentation about writing scripts with EMCLI. If you’re unfamiliar with EMCLI, it’s the command line interface for Oracle Enterprise Manager Cloud Control. Here’s my presentation:</p>

<div style="position: relative; margin: 1.5em 0; padding-bottom: 56.25%;">
  <iframe style="position: absolute;" src="//www.slideshare.net/slideshow/embed_code/key/zeRPQ2zBlTcyon" width="100%" height="100%" frameborder="0" allowfullscreen=""></iframe>
</div>

<!--more-->

<p>Although EMCLI is a very specific topic that appeals only to advanced users, many people attended my session. I want to thank <a href="https://oramanageability.com/">Ray Smith</a> (IOUG Director of Education) for his support. He did his best to inform people about my session.</p>

<p>If you attended my session, or if you have just seen the presentation slides, and have questions about EMCLI scripting, please do not hesitate to ask me.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[Last week, I attended the Oracle Open World and gave a presentation about writing scripts with EMCLI. If you’re unfamiliar with EMCLI, it’s the command line interface for Oracle Enterprise Manager Cloud Control. Here’s my presentation:]]></summary></entry><entry><title type="html">How To Recover The Weblogic Administrator Password Of The Enterprise Manager?</title><link href="https://www.gokhanatil.com/how-to-recover-weblogic-administrator-password-of-enterprise-manager/" rel="alternate" type="text/html" title="How To Recover The Weblogic Administrator Password Of The Enterprise Manager?" /><published>2015-03-31T00:00:00+00:00</published><updated>2015-03-31T00:00:00+00:00</updated><id>https://www.gokhanatil.com/how-to-recover-weblogic-administrator-password-of-enterprise-manager</id><content type="html" xml:base="https://www.gokhanatil.com/how-to-recover-weblogic-administrator-password-of-enterprise-manager/"><![CDATA[<p>As you know, Weblogic is a part of the Enterprise Manager Cloud Control environment, and it’s automatically installed and configured by the EM installer. The Enterprise Manager asks you to enter a username and password for Weblogic administration. This information is stored in secure files; you usually do not need them unless you use the Weblogic console. So it’s easy to forget this username and password, and that’s what happened to me. Fortunately, there’s a way to recover them without resetting a new user/password. Here are the steps:</p>

<p>First, we need to know the DOMAIN_HOME directory. My OMS is located in “/u02/Middleware/oms”. You can find yours by reading “/etc/oragchomelist”. If the full path of OMS is “/u02/Middleware/oms”, the middleware home is “/u02/Middleware/”. Under my middleware home, I need to go to the GCDomain folder:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oracle@db-cloud /$ cd /u02/Middleware
oracle@db-cloud Middleware$ cd gc_inst/user_projects/domains/GCDomain
</code></pre></div></div>

<p>Then we get the encrypted information from the boot.properties file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oracle@db-cloud GCDomain$ cat servers/EMGC_ADMINSERVER/security/boot.properties

# Generated by Configuration Wizard on Wed Jun 04 10:22:47 EEST 2014
username={AES}nPuZvKIMjH4Ot2ZiiaSVT/RKbyBA6QITJE6ox56dHvk=
password={AES}krCf4h1du93tJOQcUg0QSoKamuNYYuGcAao1tFvHxzc=
</code></pre></div></div>
<!--more-->

<p>The encrypted information starts with {AES} and ends with an equal (=) sign. To decrypt the username and password, we will create a simple Java application:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">recoverpassword</span> <span class="o">{</span>
 <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="nc">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span>
 <span class="o">{</span>
  <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span>
  <span class="k">new</span> <span class="n">weblogic</span><span class="o">.</span><span class="na">security</span><span class="o">.</span><span class="na">internal</span><span class="o">.</span><span class="na">encryption</span><span class="o">.</span><span class="na">ClearOrEncryptedService</span><span class="o">(</span>
  <span class="n">weblogic</span><span class="o">.</span><span class="na">security</span><span class="o">.</span><span class="na">internal</span><span class="o">.</span><span class="na">SerializedSystemIni</span><span class="o">.</span><span class="na">getEncryptionService</span><span class="o">(</span><span class="n">args</span><span class="o">[</span><span class="mi">0</span><span class="o">]</span>
   <span class="o">)).</span><span class="na">decrypt</span><span class="o">(</span><span class="n">args</span><span class="o">[</span><span class="mi">1</span><span class="o">]));</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Save it as “recoverpassword.java”. To compile (and run) it, we need to set environment variables (we’re still in the GCDomain folder). We’ll give the encrypted part as the last parameter:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oracle@db-cloud GCDomain$ . bin/setDomainEnv.sh
oracle@db-cloud GCDomain$ javac recoverpassword.java
oracle@db-cloud GCDomain$ java -cp $CLASSPATH:. recoverpassword $DOMAIN_HOME {AES}nPuZvKIMjH4Ot2ZiiaSVT/RKbyBA6QITJE6ox56dHvk=
oracle@db-cloud GCDomain$ java -cp $CLASSPATH:. recoverpassword $DOMAIN_HOME {AES}krCf4h1du93tJOQcUg0QSoKamuNYYuGcAao1tFvHxzc=
</code></pre></div></div>

<p>The correct CLASSPATH and DOMAIN_HOME are set when we issue the “setDomainEnv.sh” command. When we run the last two commands, we should see the WebLogic administrator username and password in plain text. By the way, WebLogic uses the cipher key stored in the “security/SerializedSystemIni.dat” file when encrypting and decrypting, so even if you use the same password as me, you will see different encrypted text.</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[As you know, Weblogic is a part of the Enterprise Manager Cloud Control environment, and it’s automatically installed and configured by the EM installer. The Enterprise Manager asks you to enter a username and password for Weblogic administration. This information is stored in secure files; you usually do not need them unless you use the Weblogic console. So it’s easy to forget this username and password, and that’s what happened to me. Fortunately, there’s a way to recover them without resetting a new user/password. Here are the steps: First, we need to know the DOMAIN_HOME directory. My OMS is located in “/u02/Middleware/oms”. You can find yours if you read “/etc/oragchomelist”. If the full path of OMS is “/u02/Middleware/oms”, the middleware home is “/u02/Middleware/”. 
Under my middleware home, I need to go GCDomains folder: oracle@db-cloud /$ cd /u02/Middleware oracle@db-cloud Middleware$ cd gc_inst/user_projects/domains/GCDomain Then we get the encrypted information from boot.properties file: oracle@db-cloud GCDomain$ cat servers/EMGC_ADMINSERVER/security/boot.properties # Generated by Configuration Wizard on Wed Jun 04 10:22:47 EEST 2014 username={AES}nPuZvKIMjH4Ot2ZiiaSVT/RKbyBA6QITJE6ox56dHvk= password={AES}krCf4h1du93tJOQcUg0QSoKamuNYYuGcAao1tFvHxzc=]]></summary></entry><entry><title type="html">How To Retrieve Passwords From The Named Credentials in EM12c?</title><link href="https://www.gokhanatil.com/how-to-retrieve-passwords-from-named-credentials-in-em12c/" rel="alternate" type="text/html" title="How To Retrieve Passwords From The Named Credentials in EM12c?" /><published>2015-02-05T00:00:00+00:00</published><updated>2015-02-05T00:00:00+00:00</updated><id>https://www.gokhanatil.com/how-to-retrieve-passwords-from-named-credentials-in-em12c</id><content type="html" xml:base="https://www.gokhanatil.com/how-to-retrieve-passwords-from-named-credentials-in-em12c/"><![CDATA[<p>The username, password, and role name of the named credentials are stored in the em_nc_cred_columns table. When we examine it, we can see that there’s one-to-many relation with em_nc_creds using the target_guid column, and the sensitive information is stored in the cred_attr_value column. That column is encrypted using the em_crypto package. The encryption algorithm uses a secret key which is stored in the “Admin Credentials Wallet” and a salt (random data for additional security). The wallet file is located in:</p>

<p>$MIDDLEWARE_HOME/gc_inst/em/EMGC_OMS1/sysman/config/adminCredsWallet/cwallet.sso</p>

<p>And the salt data can be found in the cred_salt column of the em_nc_cred_columns table. Here’s what it looks like:</p>

<p><img src="/assets/encrypted_credentials.png" alt="encrypted_credentials" /></p>

<!--more-->

<p>To decrypt the information, we need to call the decrypt function of the em_crypto package, but if we call it without opening the wallet, we get the following error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ORA-06512: at line 1
28239. 00000 -  "no key provided"
*Cause:    A NULL value was passed in as an encryption or decryption key.
*Action:   Provide a non-NULL value for the key.
</code></pre></div></div>

<p>How can we read the secret key from that wallet? The easiest way is to make Enterprise Manager open the wallet and store the secret key in the repository database. So we issue the following command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oracle@db-cloud ~$ /u02/Middleware/oms/bin/emctl config emkey -copy_to_repos
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Enter Enterprise Manager Root (SYSMAN) Password :
The EMKey has been copied to the Management Repository. 
This operation will cause the EMKey to become unsecure.
After the required operation has been completed, 
secure the EMKey by running "emctl config emkey -remove_from_repos".
</code></pre></div></div>

<p>It asks for the SYSMAN password. If you enter the correct password, it reads the wallet file and stores the secret key in the repository database. Of course, it makes your system insecure. If you issue the command “emctl config emkey -remove_from_repos”, you can remove the key from the repository.</p>

<p>If you issued the above command and stored the secret key in the repository, you can use the following query to fetch the decrypted information:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="k">c</span><span class="p">.</span><span class="n">cred_owner</span><span class="p">,</span>
<span class="k">c</span><span class="p">.</span><span class="n">cred_name</span><span class="p">,</span>
<span class="k">c</span><span class="p">.</span><span class="n">target_type</span><span class="p">,</span> 
<span class="p">(</span><span class="k">SELECT</span> <span class="n">em_crypto</span><span class="p">.</span><span class="n">decrypt</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">cred_attr_value</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">cred_salt</span><span class="p">)</span> 
<span class="k">FROM</span> <span class="n">em_nc_cred_columns</span> <span class="n">p</span> <span class="k">WHERE</span> <span class="k">c</span><span class="p">.</span><span class="n">cred_guid</span>  <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">cred_guid</span> 
<span class="k">AND</span> <span class="k">lower</span><span class="p">(</span><span class="n">P</span><span class="p">.</span><span class="n">CRED_ATTR_NAME</span><span class="p">)</span> <span class="k">LIKE</span> <span class="s1">'%user%'</span><span class="p">)</span> <span class="n">username</span><span class="p">,</span>
<span class="p">(</span><span class="k">SELECT</span> <span class="n">em_crypto</span><span class="p">.</span><span class="n">decrypt</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">cred_attr_value</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">cred_salt</span><span class="p">)</span> 
<span class="k">FROM</span> <span class="n">em_nc_cred_columns</span> <span class="n">p</span> <span class="k">WHERE</span> <span class="k">c</span><span class="p">.</span><span class="n">cred_guid</span>  <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">cred_guid</span> 
<span class="k">AND</span> <span class="k">lower</span><span class="p">(</span><span class="n">P</span><span class="p">.</span><span class="n">CRED_ATTR_NAME</span><span class="p">)</span> <span class="k">LIKE</span> <span class="s1">'%role%'</span><span class="p">)</span> <span class="n">rolename</span><span class="p">,</span>
<span class="p">(</span><span class="k">SELECT</span> <span class="n">em_crypto</span><span class="p">.</span><span class="n">decrypt</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">cred_attr_value</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">cred_salt</span><span class="p">)</span> 
<span class="k">FROM</span> <span class="n">em_nc_cred_columns</span> <span class="n">p</span> <span class="k">WHERE</span> <span class="k">c</span><span class="p">.</span><span class="n">cred_guid</span>  <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">cred_guid</span> 
<span class="k">AND</span> <span class="k">lower</span><span class="p">(</span><span class="n">P</span><span class="p">.</span><span class="n">CRED_ATTR_NAME</span><span class="p">)</span> <span class="k">LIKE</span> <span class="s1">'%password%'</span><span class="p">)</span> <span class="n">password</span>
<span class="k">FROM</span> <span class="n">em_nc_creds</span> <span class="k">c</span>
<span class="k">WHERE</span> <span class="k">c</span><span class="p">.</span><span class="n">cred_owner</span> <span class="o">&lt;&gt;</span> <span class="s1">'&lt;SYSTEM&gt;'</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">cred_owner</span><span class="p">;</span>
</code></pre></div></div>

<p>Sample output:</p>

<p><img src="/assets/decrypted_credentials.png" alt="decrypted_credentials" /></p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[The username, password, and role name of the named credentials are stored in the em_nc_cred_columns table. When we examine it, we can see that there’s one-to-many relation with em_nc_creds using the target_guid column, and the sensitive information is stored in the cred_attr_value column. That column is encrypted using the em_crypto package. The encryption algorithm uses a secret key which is stored in the “Admin Credentials Wallet” and a salt (random data for additional security). The wallet file is located in: $MIDDLEWARE_HOME/gc_inst/em/EMGC_OMS1/sysman/config/adminCredsWallet/cwallet.sso And the salt data can be found in the cred_salt column of the em_nc_cred_columns table. Here’s what it looks like:]]></summary></entry><entry><title type="html">BBED Block Browser Editor For Oracle 11g</title><link href="https://www.gokhanatil.com/bbed-block-browser-editor-oracle-11g/" rel="alternate" type="text/html" title="BBED Block Browser Editor For Oracle 11g" /><published>2014-10-08T00:00:00+00:00</published><updated>2014-10-08T00:00:00+00:00</updated><id>https://www.gokhanatil.com/bbed-block-browser-editor-oracle-11g</id><content type="html" xml:base="https://www.gokhanatil.com/bbed-block-browser-editor-oracle-11g/"><![CDATA[<p>BBED (Block Browser Editor) is a tool intended for Oracle internal use that lets you read and manipulate data at the Oracle Database block level. Needless to say, it’s very powerful and extremely dangerous, because you can corrupt data and header blocks. There’s an unofficial but very comprehensive manual for BBED, written by Graham Thornton, which you can download as a PDF: http://orafaq.com/papers/dissassembling_the_data_block.pdf</p>

<p>The object code of BBED ships with earlier releases of Oracle; all you need to do is compile it. On Oracle 11g, the required files are no longer shipped, so you need to copy the following files from an Oracle 10g home into the Oracle 11g home:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ORACLE_HOME/rdbms/lib/sbbdpt.o
$ORACLE_HOME/rdbms/lib/ssbbded.o
$ORACLE_HOME/rdbms/mesg/bbedus.msb
$ORACLE_HOME/rdbms/mesg/bbedus.msg
</code></pre></div></div>

<!--more-->

<p>What will you do if you don’t have access to any Oracle 10g software home? As you know, Oracle no longer provides a download link for Oracle 10g. You may open a service request and ask for it, but there’s an easier way: you can extract the required files from the 10.2.0.5 patchset on My Oracle Support. Download p8202632_10205_Linux-x86-64.zip, and then issue the following commands (assuming the Oracle environment variables are already set):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unzip -j p8202632_10205_Linux-x86-64.zip */oracle.rdbms/10.2.0.5.0/1/DataFiles/filegroup48.1.1.jar -d /tmp

unzip -j p8202632_10205_Linux-x86-64.zip */oracle.rdbms.util/10.2.0.5.0/1/DataFiles/filegroup6.1.1.jar -d /tmp

unzip -j /tmp/filegroup48.1.1.jar sbbdpt.o ssbbded.o -d /tmp

unzip -j /tmp/filegroup6.1.1.jar bbedus.ms* -d /tmp

cp /tmp/s*bd*.o $ORACLE_HOME/rdbms/lib

cp /tmp/bbedus.ms* $ORACLE_HOME/rdbms/mesg
</code></pre></div></div>

<p>Once the files are in place, you can compile BBED:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk BBED=$ORACLE_HOME/bin/bbed $ORACLE_HOME/bin/bbed
</code></pre></div></div>

<p>BBED will ask you for a password when you try to run it. The password is not hard to find if you can use the GNU debugger; you can even find it by examining the strings in the binary. Here is the password: BLOCKEDIT</p>
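<p>Once compiled, BBED reads its settings from a parameter file. Here is a minimal sketch of a session in read-only (browse) mode; the parameter file name, block size, and data file entry below are examples for illustration, not values from a specific system:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat bbed.par
blocksize=8192
listfile=filelist.txt
mode=browse

$ cat filelist.txt
1 /u01/app/oracle/oradata/ORCL/users01.dbf 104857600

$ $ORACLE_HOME/bin/bbed parfile=bbed.par
Password:

BBED&gt; set dba 4,100
BBED&gt; dump
</code></pre></div></div>

<p>The listfile maps file numbers to data file paths and sizes, so you can address a block with set dba (file number, block number) and then inspect it with dump. Switch mode to edit only when you really intend to modify blocks.</p>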

<p>Be sure to read Graham Thornton’s great manual, and be careful when playing with BBED!</p>]]></content><author><name>Gokhan Atil</name></author><summary type="html"><![CDATA[BBED (Block Browser Editor) is a tool for Oracle internal use, and it helps you to read and manipulate data at the Oracle Database block level. No need to say that it’s very powerful and extremely dangerous because you can corrupt data/header blocks. There’s an unofficial but very comprehensive manual for BBED. It’s written by Graham Thornton. You can download it as PDF: http://orafaq.com/papers/dissassembling_the_data_block.pdf The object code of BBED is shipped for earlier releases of Oracle. All you need is to compile it. On Oracle 11g, the required files are not shipped. So you need to copy the following files from an Oracle 10g home to Oracle 11g home: $ORACLE_HOME/rdbms/lib/sbbdpt.o $ORACLE_HOME/rdbms/lib/ssbbded.o $ORACLE_HOME/rdbms/mesg/bbedus.msb $ORACLE_HOME/rdbms/mesg/bbedus.msg]]></summary></entry></feed>