I want someone to finish my final questions and get full points.
In this final project you will implement a cache simulator. Your simulator will be configurable and will be able to handle caches with varying capacities, block sizes, levels of associativity, replacement policies, and write policies. The simulator will operate on trace files that indicate memory access properties. All input files to your simulator will follow a specific structure so that you can parse the contents and use the information to set the properties of your simulator.
After execution is finished, your simulator will generate an output file containing information on the number of cache misses, hits, and miss evictions (i.e. the number of block replacements). In addition, the file will also record the total number of (simulated) clock cycles used during the situation. Lastly, the file will indicate how many read and write operations were requested by the CPU.
It is important to note that your simulator is required to make several significant assumptions for the sake of simplicity.
- You do not have to simulate the actual data contents. We simply pretend that we copied data from main memory and keep track of the hypothetical time that would have elapsed.
- Accessing a sub-portion of a cache block takes the exact same time as it would require to access the entire block. Imagine that you are working with a cache that uses a 32 byte block size and has an access time of 15 clock cycles. Reading a 32 byte block from this cache will require 15 clock cycles. However, the same amount of time is required to read 1 byte from the cache.
- In this project assume that main memory RAM is always accessed in units of 8 bytes (i.e. 64 bits at a time).
When accessing main memory, it’s expensive to access the first unit. However, DDR memory typically includes buffering which means that the RAM can provide access to the successive memory (in 8 byte chunks) with minimal overhead. In this project we assume an overhead of 1 additional clock cycle per contiguous unit.
For example, suppose that it costs 255 clock cycles to access the first unit from main memory. Based on our assumption, it would only cost 257 clock cycles to access 24 bytes of memory.
- Assume that all caches utilize a “fetch-on-write” scheme if a miss occurs on a Store operation. This means that you must always fetch a block (i.e. load it) before you can store to that location (if that block is not already in the cache).
L2/L3 Cache Implementation (required for CS/ECE 572 students)
Implement your cache simulator so that it can support up to 3 layers of cache. You can imagine that these caches are connected in a sequence. The CPU will first request information from the L1 cache. If the data is not available, the request will be forwarded to the L2 cache. If the L2 cache cannot fulfill the request, it will be passed to the L3 cache. If the L3 cache cannot fulfill the request, it will be fulfilled by main memory.
It is important that the properties of each cache are read from the provided configuration file. As an example, it is possible to have a direct-mapped L1 cache that operates in cohort with an associative L2 cache. All of these details will be read from the configuration file. As with any programming project, you should be sure to test your code across a wide variety of scenarios to minimize the probability of an undiscovered bug.
Graduate Students (CS/ECE 572)
Part 1: Summarize your work in a well-written report. The report should be formatted in a professional format. Use images, charts, diagrams or other visual techniques to help convey your information to the reader.
Explain how you implemented your cache simulator. You should provide enough information that a knowledgeable programmer would be able to draw a reasonably accurate block diagram of your program.
- What data structures did you use to implement your multi-level cache simulator?
- What were the primary challenges that you encountered while working on the project?
- Is there anything you would implement differently if you were to re-implement this project?
- How do you track the number of clock cycles needed to execute memory access instructions?
Part 2: Using trace files provided by the instructor (see the sample trace files section), how does the miss rate and average memory access time (in cycles) vary when you simulate a machine with various levels of cache? Note that you can compute the average memory access time by considering the total number of read and write operations (requested by the CPU), along with the total number of simulated cycles that it took to fulfill the requests.
Research a real-life CPU (it must contain at least an L2 cache) and simulate the performance with L1, L2, (and L3 caches if present). You can choose the specific model of CPU (be sure to describe your selection in your project documentation). This could be an Intel CPU, an AMD processor, or some other modern product. What is the difference in performance when you remove all caches except the L1 cache? Be sure to run this comparison with each of the three instructor-provided trace files. Provide written analysis to explain any differences in performance. Also be sure to provide graphs or charts to visually compare the difference in performance.
Part 3: If you chose to implement any extra credit tasks, be sure to include a thorough description of this work in the report.
You will submit both your source code and a PDF file containing the typed report.
Any chart or graphs in your written report must have labels for both the vertical and horizontal axis!
For the source code, you must organize your source code/header files into a logical folder structure and create a tar file that contains the directory structure. Your code must be able to compile on flip.engr.oregonstate.edu. If your code does not compile on the engineering servers you should expect to receive a 0 grade for all implementation portions of the grade.
Your submission must include a Makefile that can be used to compile your project from source code. It is acceptable to adapt the example Makfile from the starter code. If you need a refresher, please see this helpful page (Links to an external site.). If the Makefile is written correctly, the grader should be able to download your TAR file, extract it, and run the “make” command to compile your program. The resulting executable file should be named: “cache_sim”.
ECE/CS 572 Extra Credit Opportunities
10 points – Implement and document write-back cache support for a system that contains only an L1 cache.
10 points (additional) – Extend your implementation so that it works with multiple layers of write-back caches. E.g. if a dirty L1 block is evicted, it should be written to the L2 cache and the corresponding L2 block should be marked as dirty. Assuming that the L2 cache has sufficient space, the main memory would not be updated (yet).