I decided to do a blitz and get a simple working version of the DCPU16 CPU in hardware.
This was my journey.
General Architecture
The DCPU16 architecture is not exactly standard, nor was it optimised for hardware implementation. However, since it is a simple CPU, the architecture itself is not too difficult to deal with.
Pipeline
A regular instruction passes through the following stages: fetch, decode, load A, load B, execute, and save A. My design uses an 8-stage pipeline, with one clock cycle per stage, overlapping two instructions. Each instruction therefore effectively takes four stages, or four clocks, to complete.
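To make the overlap concrete, here is a rough timing model rather than the actual hardware. It assumes the six named stages are followed by two spare slots (where the spare slots really sit in the design is a guess) and that a new instruction enters every four clocks. The names `STAGES`, `ISSUE_INTERVAL` and `in_flight` are made up for this illustration.

```python
# Hypothetical stage order: the six named stages plus two spare slots.
# The position of the spare slots is an assumption, not taken from the design.
STAGES = ["fetch", "decode", "load A", "load B",
          "execute", "save A", "spare", "spare"]
ISSUE_INTERVAL = 4  # assumed: a new instruction enters every 4 clocks


def in_flight(cycle):
    """Return {instruction index: stage name} for the given clock cycle."""
    active = {}
    newest = cycle // ISSUE_INTERVAL
    # Only the newest instruction and the one before it can be in the pipe.
    for instr in (newest - 1, newest):
        if instr < 0:
            continue
        stage = cycle - instr * ISSUE_INTERVAL
        if 0 <= stage < len(STAGES):
            active[instr] = STAGES[stage]
    return active


if __name__ == "__main__":
    for cycle in range(12):
        print(cycle, in_flight(cycle))
```

Under these assumptions there are never more than two instructions in flight, and one instruction retires every four clocks once the pipe is full.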
Memory
To feed this pipeline, two independent memory buses are needed. I’d hazard to call it a Harvard architecture because one memory bus is used purely for loading data while the other bus is used for fetching instructions and storing data. This works fine as long as only internal memory is accessed. However, if memory-mapped I/O has to be supported, this will need to be modified slightly.
There are two ways to modify it. One is to use one bus for data load/store operations (operand A) and the other for instruction fetch and data load (operand B) operations. The other is to use the two spare cycles to do external memory access instead.
Decoder
The decoder’s main job is to work out the effective address calculation. Decoding this can be a bit of a pain as the processor supports a large number of addressing modes: various direct, indirect and immediate modes. So, this is the trickiest part of the core. In fact, this is also the file with the most code in it.
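To give a feel for the number of cases, here is a software sketch of operand classification. It assumes the instruction layout and operand encodings of version 1.1 of the DCPU-16 spec (`bbbbbbaaaaaaoooo`); other spec revisions differ, and the hardware decoder is not structured like this. The function names and mode labels are made up for this illustration.

```python
REGISTERS = ["A", "B", "C", "X", "Y", "Z", "I", "J"]
SPECIAL = {0x18: "POP", 0x19: "PEEK", 0x1A: "PUSH",
           0x1B: "SP", 0x1C: "PC", 0x1D: "O"}


def split_instruction(word):
    """Split a 16-bit word into (opcode, a, b), assuming the 1.1 layout."""
    return word & 0xF, (word >> 4) & 0x3F, (word >> 10) & 0x3F


def decode_operand(code):
    """Classify a 6-bit operand field.

    Returns (mode, detail, needs_next_word). The mode labels are
    informal names chosen for this sketch.
    """
    if not 0 <= code <= 0x3F:
        raise ValueError("operand field is 6 bits wide")
    if code <= 0x07:
        return ("register", REGISTERS[code], False)
    if code <= 0x0F:
        return ("register indirect", REGISTERS[code - 0x08], False)
    if code <= 0x17:
        return ("indexed indirect", REGISTERS[code - 0x10], True)
    if code in SPECIAL:
        return ("special", SPECIAL[code], False)
    if code == 0x1E:
        return ("absolute indirect", None, True)
    if code == 0x1F:
        return ("immediate", None, True)
    return ("short literal", code - 0x20, False)


if __name__ == "__main__":
    opcode, a, b = split_instruction(0x7C01)  # SET A, <next word>
    print(opcode, decode_operand(a), decode_operand(b))
```

The `needs_next_word` flag matters for the pipeline because those modes consume an extra word from the instruction stream.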
ALU
There is not much to say here except that it uses unsigned numbers, which is fine for the adder but not so fine for the multiplier. My design therefore uses a 17×17 multiplier instead of a 16×16 one. The condition testing for conditional instructions is also part of the ALU decoding, though this can be changed in later revisions.
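One way to see why the extra bit helps is the small model below. It assumes the underlying multiplier treats its inputs as signed two's-complement numbers, which is common for hard multiplier blocks but is an assumption here. Zero-extending each 16-bit operand to 17 bits keeps the sign bit clear, so a signed multiply gives the unsigned product. The helper names are made up for this illustration.

```python
MASK16 = 0xFFFF


def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def signed_multiply(a, b, bits):
    """Model a signed bits x bits multiplier (assumed behaviour).

    Returns the product as a 2*bits-wide bit pattern.
    """
    product = as_signed(a, bits) * as_signed(b, bits)
    return product & ((1 << (2 * bits)) - 1)


def mul16_unsigned(a, b):
    """Unsigned 16x16 multiply using the signed 17x17 model.

    Returns (low word, high word) of the 32-bit product.
    """
    full = signed_multiply(a & MASK16, b & MASK16, 17) & 0xFFFFFFFF
    return full & MASK16, (full >> 16) & MASK16


if __name__ == "__main__":
    # A signed 16x16 multiplier sees 0xFFFF as -1 and gets the wrong answer.
    print(hex(signed_multiply(0xFFFF, 0xFFFF, 16)))
    print([hex(w) for w in mul16_unsigned(0xFFFF, 0xFFFF)])
```

With a signed 16×16 multiplier, 0xFFFF × 0xFFFF comes out as 1, whereas the 17-bit version gives the full unsigned product 0xFFFE0001.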
Hazards
Due to pipelining, there will be some data and control hazards. My design does not handle data hazards at the moment. My assumption is that compilers will take care of this, or that code can be manually reordered slightly.
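As an illustration of what a compiler or a careful programmer would be looking for, here is a deliberately conservative check. It flags any instruction that reads a register written by the instruction immediately before it. Whether such a pair actually conflicts in this core depends on the exact stage timing, so treat this as an assumption. The `(writes, reads)` representation and the function name are made up for this illustration.

```python
def find_raw_hazards(program):
    """Return indices i where instruction i reads a register that i-1 writes.

    Each instruction is a hypothetical (writes, reads) pair of sets of
    register names. This is a conservative read-after-write check, not a
    model of the real pipeline timing.
    """
    hazards = []
    for i in range(1, len(program)):
        prev_writes, _ = program[i - 1]
        _, reads = program[i]
        if prev_writes & reads:
            hazards.append(i)
    return hazards


if __name__ == "__main__":
    # SET A, 1 / ADD B, A / SET C, 2
    program = [({"A"}, set()), ({"B"}, {"A", "B"}), ({"C"}, set())]
    print(find_raw_hazards(program))

    # Moving the independent SET C, 2 between the other two removes the flag.
    reordered = [program[0], program[2], program[1]]
    print(find_raw_hazards(reordered))
```

The second listing shows the kind of slight manual reordering I have in mind: an unrelated instruction is slotted between the producer and the consumer.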
All in all, it took me almost a week plus two iterations to get it done.
Now, it’s released on GitHub.