Hello, this is Ewa Dusza with another episode of the “Code & Chatter” series. Today, we will talk about artificial intelligence for the first time. To discuss AI tools for developers, I’ve invited Krzysztof Wawer, Piotr Moszkowicz, Łukasz Duda, and Konrad Nowacki. Welcome!

Piotr, maybe we could start with you? Can you briefly tell us about these tools, when they were created, and how they work?

Historical Context and Evolution of AI Tools

Piotr Moszkowicz: Sure. I looked into the history of these tools a bit, though not very deeply. I believe the major revolution began with LLMs (Large Language Models). These are the AI models behind tools like ChatGPT, which we’re all familiar with. ChatGPT, developed by OpenAI, is only one of many. These models clearly showed that they can also work well in developer tools. Let me briefly add that LLMs are often based on Transformer neural networks, which were introduced around 2017. I will talk about LLMs a little later.

I also have a scientific article here titled “Evaluating Large Language Models Trained on Code”, which describes exactly how GitHub Copilot was created and is dated July 2021. So I’d say that was the moment when these tools started to take off.

Regarding the types of tools, we’ll talk more about Copilot later, but these tools don’t just code for developers or complete lines of code; they offer many other functionalities, which we’ll discuss soon. To summarize, these tools often support the code review process by summarizing code sections, checking if tests are added, or even writing tests, as well as assisting in reviewing code quality from a programmer’s perspective. Those are the main applications. There are some CLI-related uses, but I’ll talk more about that when we get to Copilot.

AI in Code Review: Current State and Future

Ewa Dusza: OK, thanks. Should we start with the topic of code review? Łukasz, can AI tools fully replace a developer and improve the code, or are we not there yet?

Łukasz Duda: I wish I could say we’re there, but it’s still a bit ahead of us. Let me talk about what AWS prepared for us in 2020. I agree with what Piotr said. These were the years when AWS came out with its services, primarily AWS CodeGuru.

This service offers machine learning models for improving our apps’ quality. They split this service into three sub-products: CodeGuru Reviewer, which allows for code review; Profiler, which builds a heatmap of code calls and identifies areas for improvement; and CodeGuru Security, still in preview, which identifies, for instance, hardcoded credentials in the codebase. Profiler also does its job in an interesting way, because it shows how a specific change would affect what we spend on AWS services, which is quite nice for companies or projects keeping an eye on AWS costs.

Focusing on CodeGuru Reviewer, AWS built a base of detectors for Python, Java, and JavaScript. Based on this, machine learning models prepare reports on our codebase’s issues. Recommendations are based on OWASP guidelines and AWS’s internal advice. OWASP, the Open Web Application Security Project, is a project run by security specialists and testers that collects rules and information about security vulnerabilities in web applications. AWS’s internal advice means recommendations on how to use AWS libraries; for example, I got a finding saying that Lambda was being initialized incorrectly, and things like that.

While it looks good on paper, my JavaScript testing yielded no findings, even on AWS’s own test code. I ran the same test with Python and got good results, but for JavaScript I didn’t get any. I also wanted to check this against another source. There is a well-known security provider, a de facto pioneer of this trend: Snyk. I used Snyk to scan the repository that AWS prepared with deliberate JavaScript errors, and it caught a lot of these artificially prepared problems.

So, I’d recommend this service to Java and Python developers but not to JavaScript ones. We have a long way to go before these automatic reviewers truly make sense, before they show real intelligence rather than just searching for rules that were defined in advance.
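To make the point about “searching for rules that were defined in advance” concrete, here is a minimal sketch of a purely rule-based scanner. It is not how CodeGuru or Snyk work internally; the rule names and the password pattern are made up for illustration, and only the general shape of an AWS access key ID (`AKIA` followed by 16 uppercase letters or digits) is taken as given.

```python
import re

# Hypothetical rules for illustration only; real detectors are far more elaborate.
RULES = [
    ("hardcoded-aws-access-key", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("hardcoded-password", re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE)),
]


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for rule_name, pattern in RULES:
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


if __name__ == "__main__":
    sample = 'const key = "AKIAIOSFODNN7EXAMPLE";\nconst password = "hunter2";'
    for line_number, rule_name in scan(sample):
        print(f"line {line_number}: {rule_name}")
```

A scanner like this only finds what its rules describe. Splitting the key into two concatenated strings, for instance, is already enough to slip past it, which is the limitation described above.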

CodeWhisperer: A Deep Dive

Ewa Dusza: OK, so as we mentioned at the beginning, when it comes to code review, everything is ahead of us in this regard. And what about such a key tool, which is the tool for code completion? Among others, there’s CodeWhisperer. Krzysztof, could you tell us?

Krzysztof Wawer: Yes, the undeniable advantage of CodeWhisperer from AWS, from Amazon, is that it is a free tool, which we can use without incurring any costs. It supports the three languages mentioned by Łukasz, but it also supports Ruby, which I use daily. So I was able to test it on a Ruby project.

It undoubtedly does very well when we need to create some repetitive code (e.g. interfaces) or map the structure of external files to application code. Then it completes the code perfectly and creates unit tests without any issues. Of course, it’s about adding line by line because it doesn’t create whole methods, it only suggests what should be in a given line. It handles these cases exceptionally well, to my surprise.

However, when starting a new file, it struggles with choosing the class name that should go in that file. In Ruby, we have a naming convention for the typical files in a project. Unfortunately, CodeWhisperer doesn’t quite handle it. It often treats files as database models when they are actually controllers. So CodeWhisperer will need to improve in these areas.

I use RubyMine daily, a product from JetBrains. It’s a paid product. But CodeWhisperer can also be used in Microsoft’s Visual Studio Code. When I integrated CodeWhisperer with RubyMine, there was also a security scan option. In the free version, you can run 3 checks a month. Unfortunately, it didn’t support Ruby, so I couldn’t test it for Ruby projects. That’s about it.

I will also share my feelings about CodeWhisperer. Initially, I was writing the code myself, knowing what I wanted to write. But once CodeWhisperer started suggesting code after a few lines, I fell into the trap of pausing after typing to wait for its suggestion, which took milliseconds, maybe less than a second. In this case, AI, in my opinion, can be a slight trap for developers. Over time, as we use AI more and more, we may become lazy and not work on memorizing the code we create. So, in my opinion, this could be a slight threat to us.

Ewa Dusza: For the developer himself?

Krzysztof Wawer: Yes.

GitHub Copilot: Leading the AI Code Completion Revolution

Ewa Dusza: OK, thanks. Piotrek, I wanted to get back to you, as you mentioned earlier that you’d talk more about Copilot, which is perhaps one of the most popular, if not the most popular tool for code completion.

Piotr Moszkowicz: Yes, so back to Copilot. I didn’t delve too deeply, but I tried to read as much as possible of the scientific paper I mentioned earlier. I would like to summarize it, not just with regard to Copilot, but to the entire ecosystem. This is because the paper introduces, for example, a neat framework for testing these models using meaningful metrics that indicate how good a model is compared to others, for example, to the classic GPT-3. I would also like to discuss how this model was created, on what data it was trained, and only then move to my experiences with Copilot. Finally, I’d like to mention the newer version, Copilot X.

So, regarding the framework, the authors created a set of problems for testing AI models. Their main interest was generating code from docstrings: you write a function’s signature and a brief piece of documentation, and expect the AI to complete the entire body of the function. This is what the authors focused on, and they created a number of synthetic problems to evaluate the quality of the model. They noticed that GPT-3, at that time, in mid-2021, struggled with code-related problems, and decided to create their own model based on GPT-3, trained more heavily on code. But where to get this code?

Before answering, a digression. Since the authors were OpenAI employees, GitHub approached OpenAI and said, “Hey, could you create such a service for us? It’d be great.” OpenAI did provide such a service at the time; as I understand it, subsequent versions of Copilot are being developed internally by GitHub. Of course, these companies are connected, because GitHub belongs to Microsoft, and Microsoft invested a lot of money in OpenAI, so they all exist within one ecosystem.

But back to the data. It was collected in May 2020, and the paper was released in July 2021, so, given that work like this usually starts with data collection, it took a little over a year. For the training dataset they, of course, used public repositories on GitHub. This also somewhat resolves the problem of which code we train these networks on. Namely, publicly available code was used, in accordance with licenses. So there’s no question of any legal breach. They further filtered the data and decided to focus mainly on the Python language. They found 179 gigabytes of Python files, each no more than 1 MB. They set some limits, deciding that larger files made no sense. They also filtered out files where the average line of code was longer than 100 characters, believing these to be data or something similar, so they thought it was pointless to train on them. They also removed files that had at least one line with more than 1000 characters. After this filtering, 159 gigs of data remained. So, is that a lot or a little? Well, for code, it’s quite a lot. And the network was then trained on this.
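The filtering rules described above can be sketched roughly as follows. This is only an illustration of the three thresholds mentioned (file size, average line length, longest line), not the authors’ actual pipeline; the function name, the exact byte limit, and the handling of empty files are assumptions.

```python
MAX_FILE_BYTES = 1_000_000   # "no more than 1 MB" (the exact byte count is an assumption)
MAX_AVG_LINE_LENGTH = 100    # files with a longer average line are treated as data, not code
MAX_LINE_LENGTH = 1000       # a single line longer than this disqualifies the file


def keep_for_training(source: str) -> bool:
    """Return True if a Python file passes the three size and line-length filters."""
    if len(source.encode("utf-8")) > MAX_FILE_BYTES:
        return False
    lines = source.splitlines()
    if not lines:
        return False  # assumption: an empty file carries no training signal
    average_length = sum(len(line) for line in lines) / len(lines)
    if average_length > MAX_AVG_LINE_LENGTH:
        return False
    if max(len(line) for line in lines) > MAX_LINE_LENGTH:
        return False
    return True


if __name__ == "__main__":
    print(keep_for_training("def add(a, b):\n    return a + b\n"))  # ordinary code
    print(keep_for_training("DATA = [" + "1, " * 500 + "]"))        # one very long line
```

An ordinary source file passes, while a file that is really an embedded data table (one huge line, or many very long lines) is rejected.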

So, diving into the actual finished product. GitHub Copilot is available to all GitHub users, but it’s a paid service; it’s not free. GitHub offers its Pro package, which includes free access to Copilot, for instance, for academic institutions. So, it’s worth noting that students, academic staff, PhD candidates, and so on, have free access to these tools. An interesting point highlighted in the article promoting the Codex model on OpenAI’s website is that the model’s memory, meaning the amount of code it can take as input, is as large as 14 KB for Python code. 14 KB of code is quite a lot. So, this model knows much more about its environment compared to, say, ChatGPT.

Now, strictly speaking about this model, it essentially supports all programming languages and frameworks. However, on the GitHub site, it’s specifically mentioned that it works best with Python, JavaScript, TypeScript, Ruby, Go, C#, and C++. As you can see, the support here is considerably better than in competing tools, at least the ones we’ve discussed so far.

I’ve been using GitHub Copilot for, honestly, I’m not sure how long, but certainly for a good few months. And from my personal experience (oh, and I also use it on a TypeScript codebase, which is important, although I’ve occasionally used it with Python too), I feel it’s a great tool in terms of enhancing developer productivity. This is also what GitHub boasts about when summarizing Copilot’s successes in terms of specific metrics, which I’ll touch on shortly. No one is talking about replacing developers in the coding process, but about providing additional tools that optimize the coding process. Meaning, producing higher-quality code at a faster pace. Typically, it’s one or the other, let’s be honest.

Regarding what was mentioned earlier about code suggestions and waiting for AI responses, that’s absolutely correct. However, I think in IntelliJ, which I use, it worked quite decently, and I wasn’t bored waiting for the AI response. I, myself, would not have written such a large amount of code at the pace AI suggests. But, there was a time when I had a poor internet connection, and IntelliJ started acting up. It couldn’t connect to GitHub’s API, which generates the hints, and the memory used by the editor skyrocketed to huge values like 20 gigabytes. As a result, the IDE began to run very slowly. So, it’s something to keep in mind if you have a bad internet connection. I’m not sure if it’s just me, but I’d suggest turning off this tool in such situations.

As I mentioned earlier, GitHub Copilot has a newer version called Copilot X, based on the GPT-4 model. This version is available only through a waitlist. Firstly, one needs to have access to GitHub Copilot to sign up for this waitlist. Later on, GitHub grants access to this tool. I managed to get access. However, as I mentioned before, I mainly use IntelliJ. And since it’s still in beta, Copilot X is only supported by Visual Studio and Visual Studio Code. Yet, I took a brief detour to Visual Studio Code to see the unique features offered by this tool. Here, I wasn’t solely focused on auto code-completion to judge whether it’s significantly better or maybe worse than the standard Copilot. Because, to be honest, GitHub Copilot as a code suggestion tool is good enough for me when I need a boost in my productivity. It understands the context of the entire file. For instance, if there are static properties at the top of a class, I don’t have to remember the property’s name. It knows if I want to change that specific property. It’s pretty neat, especially when mapping between one data set and another. It captures and completes these things really well.

However, in GitHub Copilot X, I tried a few tools. For instance, there are tools categorized as Brushes, which allow you to modify existing code. To use them, you choose a particular function and choose one of these Brushes. Available sub-tools include Make More Readable, Add Types, Fix Bug, Debug, Make Robust, and Custom. I primarily used the Make More Readable tool. Mainly, this tool added comments to the code. The critical question is, if the code was of lower quality, would it then rewrite it, altering variable names or other elements? It’s hard for me to answer since I didn’t test the tool on such code. However, it manages to add comments pretty well. Based on the existing code, it can comprehend the developer’s logic, adding comments of a quality akin to what the developer might have added.

I also looked at the Language Translation feature, which rewrites code from one language to another. But the code snippet I used relied on various libraries. Because I only selected the functions and not the library imports, Copilot X had no idea what these references were or their purpose. It completely ignored them when translating to another language. Thus, if we’re using external libraries, we should try to select a broader amount of code and observe its behavior then.

Another thing I decided to test is writing unit tests. This works on a similar principle: you select a specific function, and GitHub Copilot X offers us chat functionality. This means we’re not limited to just the specific options available in the IDE, but we can also chat with this AI, just like with ChatGPT. And it knows, for example, that in another window (tab) we have selected a specific function, and if I write “generate unit tests for my code”, it takes the highlighted code and writes unit tests for it.

When it comes to writing unit tests, it writes complete test cases, including initializing stubs and so on. I specifically tested a piece of code that added nodes to a Neo4j graph database. It easily recognized the Neo4j driver, but it couldn’t initialize it; it did, however, leave a comment stating that the driver needs initialization. It handled resource cleanup well: in the “After All” hook, it closed the database driver. There were two branches in the code, and it generated reasonable test cases for both, in my opinion. The downside is that these were, in fact, integration tests. It added the entries to a live database and didn’t try to stub the database. And it didn’t clean up the data afterward, meaning that after running these tests, the nodes added to the database would remain there. It assumes that this is some special database created specifically for the tests. Nonetheless, I think it’s quite a good feature, because not many people use Neo4j, I believe, especially not in TypeScript. So, it did a nice job with these tests. I was genuinely impressed.
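For contrast with the generated integration tests, a unit test that stubs the database could look roughly like this. The sketch is in Python rather than the TypeScript used in the project described, and `add_person` is a hypothetical function written only for this example; the driver is replaced by a `MagicMock`, so no real database is touched and nothing is left behind to clean up.

```python
from unittest.mock import MagicMock


def add_person(driver, name: str) -> None:
    """Hypothetical function under test: adds one node through a Neo4j-style driver."""
    with driver.session() as session:
        session.run("CREATE (p:Person {name: $name})", name=name)


def test_add_person_runs_create_query() -> None:
    # The stub stands in for the real driver; MagicMock also supports "with" blocks.
    driver = MagicMock()
    session = driver.session.return_value.__enter__.return_value

    add_person(driver, "Ada")

    # The test checks the query sent to the driver instead of reading it back from a database.
    session.run.assert_called_once_with("CREATE (p:Person {name: $name})", name="Ada")


if __name__ == "__main__":
    test_add_person_runs_create_query()
    print("ok")
```

The trade-off is the usual one: a stubbed test is fast and leaves no nodes behind, but it only verifies that the expected query was issued, not that the database accepts it.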

The entire Copilot X toolset also provides us with Copilot X CLI. This is a tool where, in the command line, we enter what we’d like to do, and it suggests the entire command we should simply run in this CLI. Unfortunately, I don’t have access to this tool because it turns out there’s a separate waitlist for it, so I can’t mention anything more about how it behaves.

And the third tool in this Copilot X package is a tool for code review. It’s a tool that summarizes pull requests. It doesn’t try to replace a programmer in evaluating code but rather tries to write nice descriptions of pull requests based on the code that’s been added, so the programmer doesn’t have to write this, and other programmers know exactly what’s been changed. Also, this tool – to be honest, I’m not entirely sure how it works in all its details – but it tries to detect if the added code is tested. So, it somehow detects whether in the entire codebase, we have unit tests for that specific part of the code that we added in the pull request. And if not, it adds this as a “not passed check” to the pull request, but it also tries to suggest some additional test cases or full tests. That’s at least how GitHub advertises it. So, I think here’s a really, really interesting set of tools that are still evolving.

But right now, I genuinely think that GitHub is the leading company working on these tools. Metrics also show that the acceptance rate, which is the number of suggestions that were accepted divided by the number of suggestions that were generated, rises the longer people use these tools and the more they are developed, and is around 35%. So, the average developer accepts roughly every third suggestion from AI. I think that’s quite a good result.
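The acceptance rate defined above is a simple ratio. As a small sketch (the function name and the sample counts are made up for illustration):

```python
def acceptance_rate(accepted: int, generated: int) -> float:
    """Share of generated suggestions that the developer accepted."""
    if generated <= 0:
        raise ValueError("generated must be a positive number of suggestions")
    if not 0 <= accepted <= generated:
        raise ValueError("accepted must be between 0 and generated")
    return accepted / generated


if __name__ == "__main__":
    # Hypothetical counts: 35 accepted out of 100 shown corresponds to the ~35% quoted above.
    print(f"{acceptance_rate(35, 100):.0%}")
```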

Two months ago, GitHub boasted in articles about the impact of their tool on the ecosystem, saying that by 2030 it will save one and a half trillion dollars in developer salaries, in a sense, and that the tool will add productivity equivalent to 15 million developers. Those are genuinely impressive, huge numbers. I think we’ll look at these figures more closely in the summary. But that is how I would generally describe GitHub Copilot: currently the leading tool, and one that truly sets trends.

CodiumAI: A Free Alternative to Copilot

Ewa Dusza: Thanks. You mentioned at the very beginning that it’s a paid tool. I know that you, Konrad, tested its free counterpart. How did it go? What are the pros, and cons? Where does it excel, and where does it fall short compared to Copilot?

Konrad Nowacki: Yes, the tool is called CodiumAI, and in various documents, it even compares itself to Copilot on many fronts. So here, as Piotr mentioned and as we all believe, Copilot is the market leader that everyone refers to. Codium directly acknowledges this without hiding it. However, what does Codium do differently?

Firstly, it’s free. By registering on their website, we can start using it integrated with our IDE. As for language support, it’s very broad – including even COBOL, some more exotic languages like Lua, and of course, all the popular ones like Java, Python, Kotlin, TypeScript, C++, etc.

What does it do differently? Its codebase is completely different. Codium believes this to be its advantage, especially concerning copyright issues, which we’ll discuss more later.

It’s about the GPL, the GNU General Public License, under which the code is publicly available. However, to use it commercially, we should obtain the permission of the owner, the creator of the code. Codium’s main criticism of Copilot is that its legal position is unclear. Can GPL code, and code based on it, be used? As they demonstrate, in many cases Copilot can replicate someone’s code verbatim. One developer proved it; the example can be found in the documentation, where he input his function and Copilot replicated his entire code exactly. Without a second thought, it spit out the full code.

Codium operates differently. They have a limited dataset and, importantly, our code is not sent to their database. They assert that our code isn’t shared. They operate on this fixed base.

To be honest, their hints aren’t always as brilliant or comprehensive as Copilot’s. Copilot often generates an entire block of code or even an entire piece of functionality based on the title. For example, during migrations, I was surprised that Copilot, based on other migration files, instantly produced SQL for adding a column and immediately provided a rollback option in case the migration failed. With Codium, real-time analysis is evident. It reaches the same point as Copilot, but only when we start defining specific variables and naming them.

Another thing they show about Codium and the GPL is that if we want to write out a GPL license, Codium won’t suggest it. It simply can’t, since it didn’t have it in its training data. But with Copilot, they show in a video how it can generate a standard GPL text line by line.

Of course, it’s a free version of Codium, so it may have limitations since companies want to profit. There’s also a Pro version. The major conceptual change in Pro is that we can host the entire code engine on our server, meaning we can have our instance of the analyzer and let Codium analyze our code internally. This provides security assurance that our code won’t leak. We control everything. Our instance or server hosting the Pro version of Codium won’t communicate with the external world. This is their solution. It also provides a better match to our repository since Codium, in this Pro version, will scan our entire repository. Hence, while coding, it will try to adapt to our style.

What else is there? There’s a chat option, which I honestly never used because I didn’t notice it. Only when preparing for the discussion this weekend did I see and briefly check it. It’s similar to what Piotr described about Copilot. There’s a code generation function, a refactoring function, and a code translation function, which Piotr also mentioned. There’s also an “Explain” function, probably similar to comments, explaining code blocks, etc.

It’s evident that the ideas are the same, or that some are copying from others; it’s hard to say. Everyone’s heading in the same direction, so breakthrough solutions might be hard to come by. After all, the functionalities developers need are consistent.

Regarding the comparison, I started with Codium as a free alternative. It was pleasant to use, providing quick suggestions, but usually single lines, rarely blocks of code. Switching to Copilot was a big leap in productivity. Indeed, Copilot can rapidly generate entire code blocks that either require a glance or minor adjustments. And it excels when the function name clearly indicates its purpose.

In conclusion, I genuinely recommend trying Codium. It’s an interesting tool and a fantastic concept. Currently, all features are free, and they say they will remain free. But there’s a chance, as they mention, that future features that required more effort from their team might only be available in the Pro version. It’s a fair approach and an intriguing way to promote their product. One can experiment with it for free. Then, companies might be enticed. The Pro version, with its instance ensuring no code leaks and that copyrights and licenses won’t be violated, seems an appealing option, especially for businesses wary of allegations or actions from competitors.

Ewa Dusza: Thanks. AI and its tools are a field that is constantly evolving, and keeping up with all these innovations must surely be very challenging. I’m curious, and I think our readers are too, where do you get your theoretical knowledge from? How do you know when new versions of these tools are released (besides the fact that you get them yourselves and then, as you’ve mentioned before, build your knowledge based on them)? Are there dedicated websites or YouTube channels for this? How do you learn about these tools?

Piotr Moszkowicz: To be honest, I mainly use YouTube and don’t spend a lot of time specifically searching for new tools, whether in the context of AI or otherwise. There’s a fairly popular YouTube channel called Fireship, which addresses programming-related topics in a somewhat humorous and sometimes ironic manner. Given that generative AI is a big trend lately, many videos on this topic have been appearing there. It’s a great source informing us about new, interesting tools. However, when it comes to deeper research, once I know the names of these tools, I always prefer to visit their websites and take a look. As a representative of the academic community, I like to see if there are any scientific articles related to these tools, which delve deeper into how these tools work and how they were developed. That’s my approach to this topic.

Ewa Dusza: Does anyone have other sources?

Łukasz Duda: I’d like to add that I also recommend the Fireship channel on YouTube. It’s simply fantastic. The videos presented there are really great. I highly recommend it. It’s quite an unusual program and channel, touching on various programming challenges from different angles. As for AI, I can recommend an aggregator of various AI projects called AI Valley. There, we have categorized projects, for instance in design, audio, coding, fun tools, and life assistance. There’s also a special category for free tools and some that were recently added. So, for anyone wanting to stay up-to-date, it’s a good source to follow.

Ewa Dusza: In your comments, you emphasized with specific examples that AI tools are more of an aid or support, and that you can’t really do without a developer. Do you think that in the future these tools might fully replace the job of a programmer in some areas? Or will issues like copyright effectively prevent this, since we already know that some countries or companies are banning the use of these tools because of the subsequent problems that arise? Do you have any personal reflections on this topic?

Konrad Nowacki: I believe the issue of copyright will certainly be regulated in some way. Looking at Codium, for instance, the tool might explicitly state that it doesn’t use any other sources. And I forgot to mention, Copilot has made an effort to combat this issue as well by implementing a post-filter, designed to discard GPL results. However, the problem is that merely altering the sequence of parameters or arguments – as Codium demonstrates – can retrieve the exact same GPL-based code. It’s a point I remembered from their documentation. I recommend checking it out; you can learn more about Copilot than what GitHub wishes to reveal, which isn’t surprising.
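Why a post-filter can be sidestepped by reordering parameters is easier to see with a toy example. The sketch below is not how Copilot’s filter works (its details are not covered here); it assumes a naive filter that rejects a suggestion only when it matches a known snippet exactly, ignoring whitespace. The snippet and all names are made up for illustration.

```python
def normalize(code: str) -> str:
    """Collapse all whitespace so that formatting differences do not matter."""
    return " ".join(code.split())


# Hypothetical stand-in for a database of GPL-licensed code.
KNOWN_GPL_SNIPPETS = {
    normalize("def area(width, height):\n    return width * height"),
}


def passes_post_filter(suggestion: str) -> bool:
    """Toy filter: block a suggestion only if it matches a known snippet exactly."""
    return normalize(suggestion) not in KNOWN_GPL_SNIPPETS


if __name__ == "__main__":
    verbatim = "def area(width, height):\n    return width * height"
    reordered = "def area(height, width):\n    return width * height"
    print(passes_post_filter(verbatim))   # blocked by the filter
    print(passes_post_filter(reordered))  # same logic, but it slips through
```

Any filter based on matching text has to decide how much variation still counts as “the same code”, and that is exactly where trivially altered copies can slip through.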

So, I’m not sure if copyrights will be the issue; I think it will come down to regulations. Because the question is: what if two individuals simultaneously arrive at the same piece of code? Theoretically, if there’s a patent, the first one gets the credit. But with code, it’s more intricate; a typical snippet is not something beyond the reach of most developers. It’s an idea, but you can’t really call it groundbreaking or a significant event.

As for AI replacing developers, that’s a long-standing debate. I remember discussions suggesting AI would replace developers even before tools like Copilot appeared. However, developers are still essential for analyzing requirements, adapting, and Agile operations where requirements change mid-process. I find it hard to believe, for example, that a Project Manager or a Product Owner could communicate directly with a machine to generate the exact code they desire. I think it might be more about awaiting hints from these tools. But our intervention remains necessary. In manufacturing, humans initially did the packing, then robots took over, but humans are still needed to operate the robots. I believe it’s more about a mindset shift and adapting our skills to the tools we have to be more efficient. But developers will always be needed, perhaps just in different capacities.

Łukasz Duda: I’ve been pondering over this copyright issue. And honestly, I’m not convinced it will be easily resolved. Like with the Codium versus Copilot comparison, from what I understand, Copilot doesn’t check licenses at all, right, Piotr? Is there any sort of check?

Piotr Moszkowicz: As mentioned, there is this “post-filter”, but its effectiveness is debatable.

Konrad Nowacki: And it’s disabled by default. When you turn it on, Copilot’s performance degrades significantly, which is also crucial to note.

Łukasz Duda: This pertains to all results generated by AI in various fields, be it graphics, music, or anything. Legal disputes are increasingly arising. More original authors are demanding compensation from big corporations that use their pieces of art or code and make substantial profits. I just can’t see how individual countries or even a single corporation’s guidelines can suddenly make everything right and compliant. I’m skeptical. I think high-level international regulations, like those within the European Union or even global ones, will need to be established. The issue is just too vast. What’s hidden behind these AI models can be anything, and currently, no one can really trace the origins of these AI models.

Ewa Dusza: So, we still have all the regulations ahead of us, right?

Łukasz Duda: That’s my opinion, yes.

Piotr Moszkowicz: The authors of the article I mentioned earlier, which discusses the Codex model on which GitHub Copilot operates, added an entire chapter titled “Broader Impacts and Hazard Analysis.” It outlines several factors they identified as potential risks.

One of these is legal implications. They defend themselves in an interesting manner here. They argue that AI rarely suggests code that’s identical to what was in the training set. They claim this happens in less than 0.1% of cases and that they’ve conducted studies on this topic. It’s hard to judge because if it’s not a 1-to-1 match, legally it might be challenging to argue that it’s the same code. Even if minor changes are made, I think skilled lawyers would argue that the code was independently generated.

They also touch upon other interesting considerations, like environmental impact. Training these machine learning models requires significant computational power. So, what about that aspect?

There are also biases, as seen in ChatGPT. These tools can generate inappropriate or even racist content, for example. Similar problems might arise with generative tools for coding, albeit probably not in the same way. It might not be “racist code”, but certain biases can appear.

They also highlight an often overlooked issue in a subsection titled “Misalignment”, which describes how such tools might intentionally suggest lower-quality code, even if they could suggest better code, just because the codebase provided by the developer is of inferior quality. This adaptation to our codebase can sometimes negatively impact code quality. It’s an intriguing threat that not many mention.

And of course, there’s over-reliance on AI-generated code, which poses various risks, including security and copyright issues. The code might seem to work but not actually work, or seem to address a problem without truly resolving it. These pitfalls might be overlooked, and people might place too much trust in these tools, leading to worsening quality.

In the context of whether AI will replace us, there always needs to be data for these models to learn from. Will we reach a stage where AI writes code and then learns from that very code? Probably, sooner or later. But the question is whether it’s a good thing. What’s the basis for the code generation? Will the quality of code suggested by these tools deteriorate over time? ChatGPT showed us that the more people used it, the less accurate and more nonsensical its responses became. In a sense, people managed to “dumb down” the AI. Similar phenomena might occur with these developer tools.

As for the impact, I mentioned those large numbers earlier, and I’d like to clarify something: they are a forecast for 2030, not what these tools have already achieved. The authors of the GitHub article, backed by a scientific paper involving people from Harvard and Keystone.AI, make these predictions. They project how much AI can save.

Łukasz Duda: But you’re talking about the costs saved, right?

Piotr Moszkowicz: Yes, precisely about those saved costs and those additional 15 million “free” developers, in a sense. Adding to the research source, I found that the study examined how over 930,000 GitHub Copilot users utilized it. So, we can safely say that by June 2023, when this article was written, almost a million developers were using these AI tools. The data for the research had to come from somewhere. It’s an interesting figure showing that many people are indeed using these tools.

Two months ago, GitHub boasted in articles about the impact of its tool on the ecosystem, saying that by 2030 it will save one and a half trillion dollars in developer salaries, in a sense, and that its tool will add productivity equivalent to 15 million developers. Those are genuinely impressive, huge numbers.

GitHub also emphasizes in this research that these tools are most beneficial for junior developers or those with less experience. I suspect, and this is my personal speculation, not based on any article or data, that it’s primarily because less experienced developers often tackle repetitive problems. These aren’t necessarily deep problems, research work, or implementations of new technologies, but problems that have been solved similarly or even identically before. In such cases, AI can, of course, resolve these issues much more efficiently.

Ewa Dusza: Thank you very much.

Konrad Nowacki: So, the question arises, how will these developers learn if they are handed some pre-made solutions?

Piotr Moszkowicz: Indeed, it’s about over-reliance. The issue is, how does an organization that uses such tools foster a culture around their usage? I personally view these tools as productivity boosters. And of course, every line that these tools write for me is evaluated. The real question is, how will especially the younger developers, who use these tools daily, be educated? How much trust will they have in them? I think a significant responsibility lies with team managers, team leaders, or leading figures in remote developer teams to cultivate a good strategy and a deep understanding of the pros, cons, and potential hazards of these tools.

Krzysztof Wawer: In my opinion, it’s a threat to those entering IT as programmers. If AI is suggesting code to them, they won’t be scrutinizing and writing code as thoroughly. We know from studies that when we write something ourselves, whether typing or even by hand, we remember what we produced much better. Relying entirely or mostly on AI could have long-term repercussions on quality.

Konrad Nowacki: So what? Introduce a seniority level from which one can use Copilot in companies? Is that the proposition we’re discussing today? 😉

Ewa Dusza: I think this is yet another testament that humans can’t be wholly replaced by artificial intelligence. At some point, that mentor, that senior figure, will still be essential for learning, rather than relying solely on a tool.

Łukasz Duda: I’d like to touch on this because I recently had a discussion with a colleague. In these post-pandemic times, as most of IT has shifted to remote work, the situation where a junior goes to the kitchen to grab coffee and discuss a problem with a senior colleague sort of vanishes. For these individuals, I believe it becomes challenging to learn new things and solve problems correctly with the assistance of more experienced colleagues. I think there used to be more opportunities to interact in the “real world,” so to speak. I believe that young developers have it harder now, even if they can subscribe to Copilot. It’s just not the same. That’s my view.

Piotr Moszkowicz: I think the code review process becomes crucial here, both in terms of educating these younger developers and catching potential AI errors. Sometimes this code review process is sidelined, and the code is quickly skimmed through. Honestly, I feel that because a significant portion of the code might be auto-generated, it should be scrutinized much more, and not just by one person, but as part of the broader code review process.

Ewa Dusza: Perhaps, on this optimistic note that the human factor remains indispensable in our work, we should wrap up. Thank you all for today, for this discussion, and for shedding light on the pros and cons of these tools. I think we can conclude that they do boost productivity and support you in your daily tasks. A big thank you to you all. Until next time, readers. See you!