This being the internet, let me preface this by saying these are honest questions rather than attacks. I did try to read through the docs before asking, but I don't see the answers addressed directly.
Especially on the FPGA side, how does this interact with all the features of Go that seem ill-suited to an FPGA implementation? Can I write functions that generate and consume closures? Where is my garbage going on the FPGA side and how is it collected? Or is the FPGA code being written only in a subset of Go?
I understand the idea of wrapping the primitives offered by the FPGA hardware itself into channels, but I'm unclear on how one can sensibly implement a Go runtime on top of that in the FPGA without making it too difficult to understand the cost model of your Go code.
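For instance, I have no idea what a compiler would do with something like this on the fabric (perfectly ordinary, if contrived, Go; just my own illustration of the kind of construct I mean):

    // A generator that returns closures, each capturing its own
    // heap-allocated state. On a CPU the runtime and GC make this trivial;
    // what does it cost, and is it even expressible, on an FPGA?
    package main

    import "fmt"

    // counter returns a closure that captures n by reference.
    func counter(start int) func() int {
        n := start
        return func() int {
            n++
            return n
        }
    }

    func main() {
        c := counter(10)
        fmt.Println(c(), c(), c()) // 11 12 13
    }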
It might be like OpenCL: a subset of C99 with some extensions already works for parallel hardware, including FPGAs, so why not a subset of Go targeting FPGAs?
"A subset of Go" is certainly one sensible option, but I don't see it documented. The obvious subset that one could get with the least effort is a rather brutally cut down subset of Go, to the point that I'd consider claiming it to be "Go running on an FPGA" to be nearly a lie. On the other extreme, you've got a full FPGA on hand, so nothing stops them from bringing up a simple ARM CPU die and putting a bit of general-purpose RAM there, so it's theoretically possible that you could write general-purpose FPGA code and have that FPGA-CPU gloss over any runtime issues, but that's where I get my question about how a programmer would be expected to model the costs incurred by given constructs sensibly. (Presumably, if performance is not a big deal to the programmer, they're not using FPGAs at all, so I'm going in assuming performance is at least in the top 3 concerns of anyone who might use this, if not #1.) (Also I'm not 100% sure about the intersection of what ARM cores might be available vs. what Go runs on; IIRC Go does ARM but only very high-end ones. So, take the principle of my point rather than the literal text. With enough work they can do "anything" with the FPGA code.)
Shameless plug for a friend's project, but there's also Connectal, another open source project aiming to do the same thing, but for Bluespec SystemVerilog: https://github.com/cambridgehackers/connectal
The real shame, of course, is that while Connectal is open source, BSV itself is not...
Spatial is built on Chisel, and from what I can tell, basically offers you an "SDSoC" experience for Scala.
In Chisel, you write Scala that is compiled to Verilog, and you put that into your synthesis toolchain. But Chisel is mostly just that: it's an HDL, and not much more. And if you want to talk to an FPGA, especially from software, you still have to write another pile of glue that handles interfacing to your device, over your peripherals, etc.
Spatial gives you more on top of Chisel. Instead of writing that glue yourself, you write a single program, say "Accelerate this bit", and it generates both the hardware and the software and glues it all together. You write the program once and the compiler generates all the interfacing for you, so the usage is more seamless.
This is what "SDSoC" from Xilinx does, but for C/C++. You simply write a C program and annotate functions as "Accelerate" and it compiles both the hardware and software for you and generates the interconnections. Spatial is like that, for Scala.
We are concerned not only with FPGAs but also CGRAs (we are developing our own: Plasticine). For FPGAs, we target Chisel. Chisel is Verilog without the boilerplate. We aim to be more high-level by also generating all the control flow.
I've been playing around with implementing this, but I'm stuck on sequential logic (I think it will have to use closures and function generators of some sort)...
I wrote it in three days; although it's very young, some of its strengths are emission of human-readable Verilog, and the ability to build the Verilog into C++ (using Verilator) and do continuous testing without ever leaving Julia.
Interesting. This is a neat approach to building a register-transfer language using some high-level bindings. Go (at a very high level) seems to have a pretty good match with the async nature of transistor-level bit-flow operations when you use channels. In theory you could map this to any language with strongly-typed async operations.
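Roughly, the mental model would be: each goroutine is a free-running hardware block and each channel is a typed FIFO between blocks. A sketch of that shape (illustrative only, not any particular tool's API):

    // Each function below is a free-running "block"; the channels between
    // them play the role of typed FIFOs.
    package pipeline

    func scale(in <-chan uint16, out chan<- uint16) {
        for v := range in {
            out <- v << 1 // one stage of combinational work per item
        }
        close(out)
    }

    func threshold(in <-chan uint16, out chan<- bool) {
        for v := range in {
            out <- (v > 4096)
        }
        close(out)
    }

    // Pipeline wires the stages together, much like instantiating and
    // connecting blocks in an HDL.
    func Pipeline(in <-chan uint16, out chan<- bool) {
        mid := make(chan uint16, 8) // the FIFO between the two blocks
        go scale(in, mid)
        go threshold(mid, out)
    }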
You get the advantages of Go's type-checker, though you're probably limited to a very small subset of the language. Note that you won't be able to use a lot of third-party packages unless their translation is really good or the package code is very simple.
I think the approach of mapping a high-level language to hardware is not necessarily novel, but using Go for it is.
I used a different approach to build a bit-flow graph in Java for a project in the past. Rather than map the whole language to the circuit, I created some APIs that would generate the graph and export it. It looks fairly similar to what you see here.
The website by itself doesn't give me an understanding of what the hell they're actually doing. I'm a computer engineering undergrad and I've done FPGAs before, and I don't see what "code in Go and deploy FPGAs to the cloud" means in practice. I think putting some code and actual use cases on the website would be nice.
Looking at some of the examples, it seems to me that you'd still need to know hardware programming, memory, etc. My comment may seem very snarky, but I still think it's a huge achievement to have gotten this far, and I wish them luck! I just don't get the target user base.
This is great, I've waited 20 years for this (computer engineering degree, 1999). For all the naysayers: what has gone wrong with computing, why Moore's law no longer works, etc., is that we've gone from general-purpose computing to proprietary narrow-use computing, thanks to Nvidia and others. VHDL and Verilog are basically assembly language and are not good paradigms for multicore programming.
The best languages to take advantage of chips that aren't compute-limited* are things like Erlang, Elixir, Go, MATLAB, R, Julia, Haskell, Scala, Clojure... I could go on. Most of those are the assembly languages of functional programming and are also not really usable by humans for multicore programming.
I personally vote no confidence on any of this taking off until we have a JavaScript-like language for concurrent programming. Go is the closest thing to that now, although Elixir or Clojure are better suited for maximum scalability because they are pure functional languages. I would give MATLAB a close second because it makes dealing with embarrassingly parallel problems embarrassingly easy. Most of the top-rated articles on HN lately for AI are embarrassingly parallel or embarrassingly easy when you aren't compute-limited. We just aren't used to thinking in those terms.
* For now, let's call compute-limited any chip that can't give you 1000 cores per $100.
I would argue that Elixir is far closer to being a "JavaScript-like language for concurrent programming" than Go, due to its dynamic typing and the relative freedom it affords in comparison to the others you mentioned (except for Clojure, which is actually quite similar in a lot of ways).
Although it is technically a purely functional language, you can almost mutate variables (in reality it creates a new immutable variable with the same name, which takes precedence). Concurrency feels very natural.
> Although it is technically a purely functional language [..]
This (purity) stirred my interest, but as far as I can see it's incorrect. This [1] Wikipedia page on pure languages does not list Elixir, and the Elixir Wikipedia page itself does not mention purity at all.
Can anyone clarify?
[1] https://en.wikipedia.org/wiki/List_of_programming_languages_...
> The best languages to take advantage of chips that aren't compute-limited*
FPGAs may or may not deserve that distinction, depending on your point of view... but even if you concede that, they're still heavily bandwidth-limited. An SDRAM interface takes up quite a bit of floor space, especially if you want more than one FIFO to move data with -- and even then, you're still stuck behind a relatively slow memory interface. There's SRAM on newer chips, but it's still too paltry an amount to do anything close to general-purpose computing on... especially with the languages you've mentioned.
> why Moore's law no longer works
Moore's law no longer works because we hit 4GHz in silicon; there's nowhere to go but sideways now, and that's true whether you're in dedicated or reconfigurable chips.
It sort of depends on coding style - if you're hand-instantiating gates and wiring them up in Verilog, then yes, it's a lot like assembly... on the other hand, if you're coding in a high-level way and having your design compiled to gates, then no, it's not at all like assembly.
It also may depend on where you learned your craft - my decade as a logic designer seemed to show that people who started life as an EE and went straight into logic design coded at a lower level than people who started as programmers (who tended to be more productive as a result)...
Actually, there is already a Scala version that is well defined. From the website https://chisel.eecs.berkeley.edu/: 'Chisel is an open-source hardware construction language developed at UC Berkeley that supports advanced hardware design using highly parameterized generators and layered domain-specific hardware languages.'
People who want to use an FPGA should learn VHDL or Verilog. There have been a lot of projects to make C compile to VHDL/Verilog, and it's generally accepted that it does not work very well.
What is the advantage of using Go for the same purpose?
As a long-time HW FPGA guy, I think you might want to take a look at the C projects again. I don't know whether Go has any advantages, but the concept of a higher-level language for development is being used by the major FPGA companies.
Both Xilinx and Altera have High Level Synthesis (HLS) tools. These use C or C++. If you know how FPGA work is generally done, you can separate the hype from the reality and you can understand how to use it for a real application.
The vendors have lots of libraries for IP. You don't write RTL from scratch. It would take too long to verify. You tie IP together. It can be DSP or generic maths or a video codec thing. The VHDL is done for you.
You write your algorithm in C++ in a particular format using compatible data types and calling HLS libraries. You run it all in C++ first and make sure it does exactly what you want in SW. This is where the algorithm is developed.
THEN you fire up the HLS tool and, a couple of hours of synthesizing later (lol), you get to load a bitstream onto an FPGA to verify it.
Of course there can be problems in that translation. It takes good engineering to dive down into the design and find the issues.
My current work does not touch any HLS. I am doing the VHDL stuff. But I know the algorithms all started from SW first. It always does. For the bulk of the work, verification, it is somewhat irrelevant whether it is manually converted to RTL or done via tools.
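Presumably the pitch for Go here is that the same software-first loop works with stock tooling: the kernel is just Go, so you can unit test it before anything gets synthesized. A made-up example (nothing to do with any vendor's flow, names invented):

    // In a file like saxpy_test.go: the "algorithm" and its software test
    // live together and run with plain go test, before any synthesis.
    package saxpy

    import "testing"

    // Saxpy computes a*x[i] + y[i] over fixed-size buffers.
    func Saxpy(a int32, x, y, out []int32) {
        for i := range x {
            out[i] = a*x[i] + y[i]
        }
    }

    func TestSaxpy(t *testing.T) {
        x := []int32{1, 2, 3}
        y := []int32{10, 20, 30}
        out := make([]int32, 3)
        Saxpy(2, x, y, out)
        want := []int32{12, 24, 36}
        for i := range want {
            if out[i] != want[i] {
                t.Fatalf("out[%d] = %d, want %d", i, out[i], want[i])
            }
        }
    }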
Another HW FPGA guy here, albeit one who has never used HLS. My concern with the whole idea of HLS is that it fails to take advantage of the parallelization capability of FPGAs, which in my opinion is one of the main reasons to use an FPGA in the first place. It sounds great for designs that are linear in nature, that is, putting data through a bunch of sequential processing blocks and then outputting some result. But for most of those cases, why not just use a processor + DSP SoC? Or even something like a Zynq? It will probably be faster.
Seeing as FPGAs do not operate in the linear way that software does on a processor, why are we trying to make them work that way? It would make more sense to me to design a high-level synthesis language with a paradigm that is also not imperative: functional programming. For example, how would this kind of C code even be synthesized in hardware?
    int A, B_out, C_out;
    A = 5;
    B_out = A + 3;  /* depends only on the first value of A */
    A = 6;
    C_out = A;      /* depends only on the second value of A */
"A" is used as two different things, which is totally fine when the code is run sequentially, which must be what is happening when code like this is synthesized, but that's wasteful on an FPGA, because B_out and C_out don't actually have dependence on each other and could be computed concurrently, which is what would happen if we used VHDL to do something similar. We need a high-level synthesis language that describes a system which solves the algorithm we want, the same way VHDL does, except with more abstraction capabilities. In my opinion this could be a functional language.
Agreed.
However, MyHDL (www.myhdl.org) allows for programming using Python, and works very well: I haven't written any VHDL (other than at the top level) for years now.
It would be interesting to see how far they can take it using Go.
Some great questions (and some other really exciting projects!)
It's early days for us at reconfigure.io; we're just working with a few core early users at the moment, and we'll be bringing more examples, benchmarks, and increased access over time.
> Especially on the FPGA side, how does this interact with all the features of Go that seem ill-suited to an FPGA implementation? Can I write functions that generate and consume closures?
Could you elaborate on why closures are ill-suited to FPGAs? Thanks.
We, a Stanford lab, are pursuing similar goals, but open source and from a Scala DSL, although our doc (http://spatial-lang.readthedocs.io/en/latest/tutorial/starti...) is not that up to date:
https://github.com/stanford-ppl/spatial-lang
https://github.com/interplanetary-robot/Verilog.jl
Docs seem to be available here (thanks to another comment in this thread): http://docs.reconfigure.io/welcome.html
github: https://github.com/ReconfigureIO
Err...I respectfully disagree. They're HDLs and more akin to hardware design than any traditional software abstraction, assembly included.
Not sure why they wouldn't use it instead.